Re: GNU Parted & partition image

bug-parted

From:	Andrew Clausen
Subject:	Re: GNU Parted & partition image
Date:	Wed, 01 Nov 2000 18:43:56 +1100
Hi François,

(BTW: François wrote partimage, a partition imaging program.
partimage.sourceforge.org, IIRC)

François Dupoux wrote:
> 
> > Anyway, here are my ideas:
> > * perhaps we should make a function:
> >
> >       int ped_file_system_is_region_used (PedFileSystem* fs, PedGeometry* geom)
> >
> > Which would return 1 if part of the region, represented by geom inside the
> > file system has used blocks (which need to be copied)
> 
> Your idea is very interesting. I think you could use the code of my project a
> lot to write this function, because a big part of my code is written to know
> what blocks are used.

Exactly :-)

> Some questions:
> - What type of block will it be ?
>   * 512 bytes
>   * or a cluster for FAT/NTFS, a 4096 bytes block for Reiser...
> I think it's not possible to use "high level blocks" (as clusters for Ms File
> systems) because there is a problem with it: for example, on the FAT file
> sys, the beginning of the partition contains meta data which are not
> clusters. Then a cluster view of the disk would fail.

More importantly, 4k blocks may not be aligned with the start of the
partition, i.e. the sector number (512-byte sectors) of the start of a cluster,
relative to the start of the partition, may not be divisible by the cluster
size (4k, or whatever it is):

        cluster_location % cluster_size != 0

A solution to this is to ask the file system where the "data" area starts (i.e.
where the first cluster is).  However, I think it's better to use 512 byte
blocks (see below).
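
To illustrate the alignment problem, here is a minimal sketch.  The Layout
struct and both function names are made up for this example (they are not
libparted's), and the first cluster is simply numbered 0 here.

```c
#include <assert.h>

/* Hypothetical layout description, not a libparted type.
   All values are in 512-byte sectors. */
typedef struct {
        long long       data_start;       /* sector where the first cluster begins */
        long long       cluster_sectors;  /* sectors per cluster */
} Layout;

/* Sector, relative to the start of the partition, where a cluster begins. */
long long
cluster_to_sector (const Layout* l, long long cluster)
{
        assert (l->cluster_sectors > 0);
        return l->data_start + cluster * l->cluster_sectors;
}

/* Returns 1 if clusters start on multiples of the cluster size, 0 if not. */
int
clusters_are_aligned (const Layout* l)
{
        return cluster_to_sector (l, 0) % l->cluster_sectors == 0;
}
```

With a data area starting at sector 63 and 8-sector (4k) clusters, no cluster
starts on a 4k boundary, so a map in units of clusters can't simply be laid
over the partition from sector 0.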
 
> But there is a disadvantage if you use 512 bytes blocks to map the partition:
> file systems whose blocks size are big (as 4096 bytes for ext2fs sometimes,
> or reiserfs) will need a big bitmap to map the same disk space:
> For example, if you had an 8192-byte ext2fs partition, with 4096-byte blocks:
> - you need 2 bits to map the file system with 4096-byte blocks
> - you need 16 bits to map the same space with 512-byte blocks

OTOH, the 16 bits will probably compress quite well, since, for most clusters,
the entire cluster is used.  (BTW: my FAT code can tell you HOW MUCH of a
cluster is used B)

(E.g. for Huffman coding, you could expect 10 - 2 bits - to be "cluster entirely
used" and 11 - 2 bits - to be "cluster entirely empty", which would cover most
entries.  Obviously, we can let gzip/bzip2 handle this, but I'm pointing out
that gzip/bzip2 should be able to do a very good job.  If they can't do a good
job, we can do it ourselves ;-)
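
A rough sketch of the arithmetic, and of why a per-sector bitmap should
compress well.  Both helper names are hypothetical, and counting runs of equal
bytes is only a crude stand-in for what gzip/bzip2 would actually do.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bits needed to map `bytes` of disk with one bit per block. */
size_t
bitmap_bits (size_t bytes, size_t block_size)
{
        assert (block_size > 0);
        return (bytes + block_size - 1) / block_size;
}

/* Count runs of identical bytes in a bitmap.  Few runs compared with `len`
   means the bitmap is highly repetitive, so it should compress well. */
size_t
count_byte_runs (const unsigned char* bitmap, size_t len)
{
        size_t  runs = 0;
        size_t  i;

        for (i = 0; i < len; i++) {
                if (i == 0 || bitmap [i] != bitmap [i - 1])
                        runs++;
        }
        return runs;
}
```

bitmap_bits (8192, 4096) gives the 2 bits from the example above, and
bitmap_bits (8192, 512) gives the 16 bits.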
 
> Then, the best solution would be to use a different block size for each file
> system:
> * 512 bytes for FAT
> * logical block size for NTFS, HPFS, Extfs2, ReiserFS (because the bitmap of
> the partition maps all the data of the partition)

Obviously, I don't think it's necessary.

> Another problem is how to implement it:
> 
> If you use this code
> --------------------------
> fat_isBlockUsed(char *szPartition, long long int block)
> {
>   read the FAT table and the bootsect
> 
>   return (block == fat table || block == bootsect || block == used-data)
> }
> 
> fat_copyPartition(char *szPartition, char *szDest)
> {
>   for (i=0; i < block-count; i++)
>   {
>      if (fat_isBlockUsed(i))
>        copy-block(i);
>   }
> }
> 
> The problem in this code is you will have to read the file system metadata
> (boot sector, FAT allocation table) for each block to copy.

Why?  You have to ped_file_system_open() the file system first, which will load
all metadata.

BTW: with the existing libparted, for FAT, you can determine if a sector is 
used with:
(sectors are 512 bytes)

int
fat_is_sector_used (const PedFileSystem* fs, PedSector sector)
{
        const FatSpecific*    fs_info = FAT_SPECIFIC (fs);

        /* All data before the first cluster is important.  (Actually, this
           isn't quite true, but most of it is, so there's no point worrying
           about it.) */
        if (sector < fs_info->cluster_offset)
                return 1;
        else
                return fat_is_fragment_active (fs,
                                               fat_sector_to_frag (fs, sector));
}

So, the code is trivial ;-)  Likewise for ext2.  Because we already have lots of
code for ext2 and FAT, it is wiser to build this stuff on top of the existing
libparted code.  For other file systems (NTFS, Reiserfs, etc.), your code will
be useful :-)

BTW: to use this, it would look roughly like:

        fs = ped_file_system_open (&geom);      /* geom is some region on a disk */
        for (i = 0; i < geom.length; i++) {
                if (ped_file_system_is_sector_used (fs, i)) {
                        /* ... */
                }
        }
        ped_file_system_close (fs);

I don't advocate it looking exactly like this - I think 5000000 calls to
ped_file_system_is_sector_used() is going to use a lot of CPU, as it goes
through all the bureaucracy with polymorphism, etc.  This may even rival the
disk's (lack of) speed, on 486's, etc.  Better idea: make a PedBitmap:

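typedef unsigned char   PedBitmap;

/* Returns 1 if the bit for `sector` is set, 0 otherwise. */
int
ped_bitmap_get (PedBitmap* bitmap, PedSector sector)
{
        /* The outer parentheses matter: != binds tighter than &. */
        return (bitmap [sector / 8] & (1 << (sector % 8))) != 0;
}

/* Sets the bit for `sector` if `value` is non-zero, clears it otherwise. */
void
ped_bitmap_set (PedBitmap* bitmap, PedSector sector, int value)
{
        if (value)
                bitmap [sector / 8] |= 1 << (sector % 8);
        else
                bitmap [sector / 8] &= ~ (1 << (sector % 8));
}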

(We could complicate PedBitmap a bit, with range checking, etc., if we
wanted...)
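
For what it's worth, a range-checked variant might look roughly like this.  It
is only a sketch: CheckedBitmap and the checked_bitmap_* names are invented for
the example, and plain long long stands in for PedSector.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical range-checked bitmap; not part of libparted. */
typedef struct {
        unsigned char*  bits;
        long long       length;         /* number of sectors covered */
} CheckedBitmap;

/* Allocates a zeroed bitmap covering `length` sectors, or returns NULL. */
CheckedBitmap*
checked_bitmap_new (long long length)
{
        CheckedBitmap*  bm;

        if (length <= 0)
                return NULL;
        bm = malloc (sizeof *bm);
        if (!bm)
                return NULL;
        bm->bits = calloc ((size_t) ((length + 7) / 8), 1);
        if (!bm->bits) {
                free (bm);
                return NULL;
        }
        bm->length = length;
        return bm;
}

void
checked_bitmap_free (CheckedBitmap* bm)
{
        if (bm) {
                free (bm->bits);
                free (bm);
        }
}

/* Returns 1 or 0 for a valid sector, -1 if the sector is out of range. */
int
checked_bitmap_get (const CheckedBitmap* bm, long long sector)
{
        assert (bm != NULL);
        if (sector < 0 || sector >= bm->length)
                return -1;
        return (bm->bits [sector / 8] >> (sector % 8)) & 1;
}

/* Returns 1 on success, 0 if the sector is out of range. */
int
checked_bitmap_set (CheckedBitmap* bm, long long sector, int value)
{
        assert (bm != NULL);
        if (sector < 0 || sector >= bm->length)
                return 0;
        if (value)
                bm->bits [sector / 8] |= (unsigned char) (1u << (sector % 8));
        else
                bm->bits [sector / 8] &= (unsigned char) ~(1u << (sector % 8));
        return 1;
}
```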

And have the file system function be called:

int ped_file_system_mark_used_blocks (PedFileSystem* fs, PedGeometry* geom,
                                      PedBitmap* bitmap)

So, geom refers to a region (i.e. a contiguous group of blocks on the file
system), and the file system code should mark the corresponding bits in the
bitmap as 1 for used blocks, and 0 for unused blocks.  i.e. the first block in
the region (geom->start) corresponds to the first bit (i.e. 0) in the bitmap.
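
A minimal sketch of that bit layout, independent of any real file system.
Region, SectorUsedFn, mark_used_sectors and the toy predicate are all
hypothetical names; a real implementation would consult the file system's own
metadata instead of a callback.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a region: `length` sectors starting at `start`. */
typedef struct {
        long long       start;
        long long       length;
} Region;

/* Hypothetical per-file-system predicate: non-zero if `sector` is in use. */
typedef int (*SectorUsedFn) (void* ctx, long long sector);

/* Fills `bitmap` so that bit i is 1 if sector (region->start + i) is used.
   `bitmap` must hold at least (region->length + 7) / 8 bytes. */
void
mark_used_sectors (SectorUsedFn is_used, void* ctx, const Region* region,
                   unsigned char* bitmap)
{
        long long       i;

        assert (region->length >= 0);
        memset (bitmap, 0, (size_t) ((region->length + 7) / 8));
        for (i = 0; i < region->length; i++) {
                if (is_used (ctx, region->start + i))
                        bitmap [i / 8] |= (unsigned char) (1u << (i % 8));
        }
}

/* Toy predicate: every sector below the limit pointed to by `ctx` is used,
   a bit like the metadata area at the start of a FAT file system. */
int
sector_below_limit (void* ctx, long long sector)
{
        return sector < *(const long long*) ctx;
}
```

For a region starting at sector 8 with length 4, and a limit of 10, sectors 8
and 9 are used, so bits 0 and 1 of the first bitmap byte end up set.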

We could use this bitmap as the basic format of our partition images - just use
a constant bitmap size, of say, 2048 sectors (or whatever), and write out the
bitmap, followed by the used blocks, in order.  As I said earlier, these block
bitmaps should compress very well, since they will probably look something like
0, 255, 0, 0, 0, 255, etc. ;-)
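
One chunk of such an image could be produced roughly as follows.  This is a
sketch under assumptions: write_chunk and SECTOR_SIZE are invented names, the
chunk is built in a memory buffer rather than written to a file, and no header
or compression is included.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Writes one chunk to `out`: the bitmap for `n_sectors` sectors, followed by
   the contents of every sector whose bit is set, in order.  `out` must have
   room for (n_sectors + 7) / 8 + n_sectors * SECTOR_SIZE bytes.
   Returns the number of bytes written. */
size_t
write_chunk (const unsigned char* sectors, size_t n_sectors,
             const unsigned char* bitmap, unsigned char* out)
{
        size_t  bitmap_bytes = (n_sectors + 7) / 8;
        size_t  pos = bitmap_bytes;
        size_t  i;

        assert (sectors != NULL && bitmap != NULL && out != NULL);
        memcpy (out, bitmap, bitmap_bytes);
        for (i = 0; i < n_sectors; i++) {
                if (bitmap [i / 8] & (1u << (i % 8))) {
                        memcpy (out + pos, sectors + i * SECTOR_SIZE,
                                SECTOR_SIZE);
                        pos += SECTOR_SIZE;
                }
        }
        return pos;
}
```

Restoring would read the bitmap first, then copy each following sector back to
the position given by the next set bit.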
 
> Then, the isBlockUsed() function will need to have all meta-data....

Obviously, but you fetch all the metadata on ped_file_system_open(), and keep it
as necessary.  Or, another approach (which is necessary for "Real" file systems,
that can scale)... is to cache metadata.  As Lennert pointed out to me on IRC,
we can probably mmap() the ext2 block bitmaps (and everything else, for that
matter), and let Linux figure out when to load it in / throw it out of memory.

> The big advantage of your idea is you can use the same code to save all file
> systems to an image file. You just have to re-write the isBlockUsed()
> function.

Exactly ;-)

> About working on a mounted partition:
> is it possible to lock a partition before working on it ?

Well, the LVM people were discussing something like that...
The mail archives are at http://linux.msede.com/lvm/mlist/archive/
Unfortunately, they don't have October's archive up yet.  I haven't kept a copy
of the discussion, so I can't tell you more now... hopefully, the archive will
be up soon...  I didn't really pay much attention to it at the time :-(

> If that's easy when reading the partition, (just unlock after finish
> reading), how to do when you write to the partition. When the operation is
> finished, how to tell the system to reload all meta-data ?

Hehe, there's NO WAY you could restore a partition image, and keep it mounted!
I meant, when saving a partition to an image file.

Andrew Clausen