From: Ralph Corderoy
Subject: Re: [Nmh-workers] A useless but interesting exercise: Design MH from scratch in the 2014 context
Date: Thu, 20 Feb 2014 11:37:22 +0000

Hi Jeff,

> Just because you can use a lot of disk space doesn't mean you should.
> If you use a database backend for indexing the messages you detect and
> avoid duplicate copies of attachments.  The actual pieces of the
> message could still exist in directories with duplicate copies linked
> in.  Attachments, and copies of attachments, are the main source of
> disk usage in my mail folders.

I've been idly thinking about this quite a bit, and not just for MH's
~/mail.  I think deduplication is better done once by the filesystem
than separately by every application.  If the filesystem doesn't provide
it then you get bloat, but we already have that.

Hard links act coarsely on whole files and have only one inode for
metadata.  btrfs, a Linux filesystem, has IIRC a filesystem format that
lets a sequence of stored bytes be referred to by other stored data,
perhaps with a "compression" method involved.  The ioctl(2) interface
only works for runs of data that are a multiple of block size but the
on-disk format is more flexible and perhaps the code will catch up.
btrfs does some deduplication as it receives data but the rest is done
as a background task, and userspace gets involved AIUI, so it isn't
putting lots of application logic in a filesystem.  btrfs isn't alone in
supporting deduplication beyond hard links.
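
To illustrate the point about hard links, here is a minimal Python sketch,
assuming a POSIX filesystem that supports them.  The file names are made up
for the example.  It shows that two hard-linked names share a single inode,
so a metadata change made through one name is visible through the other.

```python
import os
import stat
import tempfile


def hard_link_demo():
    """Create a file plus a hard link to it and return stat results for both."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical names standing in for two copies of one attachment.
        original = os.path.join(d, "attachment.ps")
        duplicate = os.path.join(d, "attachment-copy.ps")
        with open(original, "wb") as f:
            f.write(b"same bytes\n")
        # A second directory entry for the same inode: whole-file sharing only.
        os.link(original, duplicate)
        # Change the mode through one name; there is only one inode to change.
        os.chmod(duplicate, 0o400)
        return os.stat(original), os.stat(duplicate)


a, b = hard_link_demo()
same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
link_count = a.st_nlink
mode_seen_via_original = stat.S_IMODE(a.st_mode)
```

The two names cannot carry different permissions, owners or timestamps, and
nothing smaller than the whole file can be shared, which is why hard links are
a coarse tool for deduplicating attachments.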

For a filesystem in "archive" mode one can imagine an email with a
base64-encoded tar file sitting in ~/mail, foo.tar being extracted and
including foo/bar.ps.  Over time, this might become a compressed bar.ps
stored on disk with foo/bar.ps referring to it, foo.tar referring to
foo/bar.ps along with other files, and *part* of ~/mail/inbox/42 being
stored using "compression" method base64 on foo.tar.
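
The base64 step of that chain can be sketched in Python.  This is only an
illustration, not how any filesystem does it: it assumes the mailer wrapped
the encoded text at 76 columns with bare newlines, which is what Python's
base64.encodebytes() emits.  The stand-in bytes and function names are
invented for the example.  The idea is that a message part need not be stored
if re-encoding the decoded data reproduces it byte for byte.

```python
import base64


def encode_part(raw: bytes) -> bytes:
    """Encode raw bytes as a base64 body, 76 columns per line, '\n' endings."""
    return base64.encodebytes(raw)


def is_reconstructible(body: bytes) -> bool:
    """True if the body can be regenerated exactly from its decoded bytes."""
    raw = base64.decodebytes(body)
    return base64.encodebytes(raw) == body


# Stand-in for the bytes of foo.tar; any binary data would do.
tar_bytes = bytes(range(256)) * 10
body = encode_part(tar_bytes)

# The stored tar bytes plus the encoding rule are enough to rebuild the part.
rebuildable = is_reconstructible(body)

# A body wrapped differently (CRLF here) decodes to the same tar bytes but
# would need its line-ending convention recorded as well.
crlf_body = body.replace(b"\n", b"\r\n")
crlf_rebuildable = is_reconstructible(crlf_body)
```

A body that fails the check can still decode to the same data; it just means
the "compression" method would have to record extra parameters such as line
length and line endings to reproduce the original message exactly.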

Yes, that costs CPU and access time, but then Facebook have developed a
Blu-ray disc changer for archive storage.

Cheers, Ralph.
