libcdio-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Libcdio-devel] [RFC] New API iso9660_statv2_t as API/ABI compatible way


From: Thomas Schmitt
Subject: [Libcdio-devel] [RFC] New API iso9660_statv2_t as API/ABI compatible way to read files >= 4 GiB
Date: Fri, 06 Jul 2018 20:09:19 +0200

Hi,

i have pushed the very large commit 87f5053 to branch ts-multiextent :

  New API around iso9660_statv2_t as successor of iso9660_stat_t for reading
  data files of 4 GiB or larger, while keeping old API and ABI valid.

  
http://git.savannah.gnu.org/cgit/libcdio.git/commit/?h=ts-multiextent&id=87f50538ea360c3e533a4bbee8d8e1b9feba4131

Before jumping on the changeset, reviewers should read the following
introduction to problem and proposed solution.

--------------------------------------------------------------------------

The lack of multi-extent support in iso9660_stat_t led to the development
of a successor type: iso9660_statv2_t
That's because any ABI compatible extension of iso9660_stat_t is prevented
by its last member, an array of variable size, and by the fact that the
struct behind iso9660_stat_t is public.

So i present iso9660_statv2_t for approval or rejection in branch
  ts-multiextent
  http://git.savannah.gnu.org/cgit/libcdio.git/log/?h=ts-multiextent

The new API is based on the work of Pete Batard in branch
  pbatard-multiextent2
where he implemented the proper assessment of multi-extent files while
reading ISO 9660 directory records. His proposal is much more concise than
iso9660_statv2_t.

I see only two problems with his proposal:

* It breaks the ABI.

* It will break the ABI again if ever the maximum number of extents
  ISO_MAX_MULTIEXTENT shall be incresed from 8 to a higher number.
  It is unattractive to choose a large value already now, because each
  extent slot costs 8 byte and iso9660_stat_t can have many instances.

  (This could be fixed without a new API by cherrypicking parts of my
   proposal. I will probably do, if iso9660_statv2_t gets rejected.)

It turned out that avoiding thee two ABI problems implies to stirr up
a lot of code.
Pete is officially allowed to laugh now.

--------------------------------------------------------------------------

Other than its predecessor the entrails of iso9660_statv2_t are completely
private and accessible only via API functions. This gives hope that future
changes will not cause API or ABI problems.

iso9660_stat_free and all API functions which return iso9660_stat_t are
now deprececated. Each of them has a successor function:

  iso9660_fs_stat                ->  iso9660_fs_statv2
  iso9660_fs_stat_translate      ->  iso9660_fs_statv2_translate
  iso9660_fs_readdir             ->  iso9660_fs_readdir_v2
  iso9660_ifs_stat               ->  iso9660_ifs_statv2
  iso9660_ifs_stat_translate     ->  iso9660_ifs_statv2_translate
  iso9660_ifs_readdir            ->  iso9660_ifs_readdir_v2

  iso9660_fs_find_lsn            ->  iso9660_fs_find_lsn_v2
  iso9660_fs_find_lsn_with_path  ->  iso9660_fs_find_lsn_with_path_v2
  iso9660_ifs_find_lsn           ->  iso9660_ifs_find_lsn_v2
  iso9660_ifs_find_lsn_with_path ->  iso9660_ifs_find_lsn_with_path_v2

  iso9660_stat_free              ->  iso9660_statv2_free

Further the list type for iso9660_stat_t members and its methods have
successors:

   CdioISO9660FileList_t         ->  CdioISO9660FileListV2_t
   iso9660_filelist_new          ->  iso9660_filelist_new_v2
   iso9660_filelist_free         ->  iso9660_filelist_free_v2

The new iso9660_statv2_t has access methods for its struct members:

                                     iso9660_statv2_get_rr
                                     iso9660_statv2_get_tm
                                     iso9660_statv2_get_total_size
                                     iso9660_statv2_get_extents
                                     iso9660_statv2_get_xa
                                     iso9660_statv2_get_type
                                     iso9660_statv2_get_filename

and a convenience function to obtain a legacy iso9660_stat_t:

                                     iso9660_statv2_get_v1

--------------------------------------------------------------------------

The old API around iso9660_stat_t is still provided without change.

Each application of libiso9660 which wants to access data files of 4 GiB
or larger will have to make the transition by own effort.
See example/extract.c for the alternative usage of both APIs.

There are two scenarios for adaption of applications to new iso9660_statv2_t:

* Many direct accesses to members of iso9660_stat_t.

* Few such accesses and mainly file data content reading.

--------------------------------------------------------------------------

If the application has many direct accesses to members of iso9660_stat_t,
then continue to use the pointer as a convenience object.
Program example/extract.c demonstrates this in the #else cases of #ifdef
LIBISO9660_USE_LEGACY_STAT.

  * Let us assume that the existing iso9660_stat_t is declared like this:

      iso9660_stat_t *p_stat;

  * As its source of information introduce a new pointer to iso9660_statv2_t
    or a list of such objects.
    E.g.

     + iso9660_statv2_t *p_statv2;

  * If you have lists of iso9660_stat_t, then change their type to the type
    for the new iso9660_statv2_t elements.
    E.g.

     - CdioISO9660FileList_t *p_filelist;
     + CdioISO9660FileListV2_t *p_filelist;
      
  * Change the old API calls to obtain the iso9660_stat_t object or a list
    of such objects to the corresponding calls of the new API.
    E.g.

     - p_stat = iso9660_fs_stat(...);
     + p_statv2 = iso9660_fs_statv2(...);

     - p_filelist = iso9660_fs_readdir(...);
     + p_filelist = iso9660_fs_readdir_v2(...);

  * When using list elements, obtain the iso9660_statv2_t object.
    E.g.

     - p_stat = (iso9660_stat_t *) _cdio_list_node_data();
     + p_statv2 = (iso9660_statv2_t *) _cdio_list_node_data();

  * Obtain the convenience iso9660_stat_t from the iso9660_statv2_t by
    call iso9660_statv2_get_v1().
    E.g.

     + p_stat = iso9660_statv2_get_v1(p_statv2);

  * Now use the convenience iso9660_stat_t for all purposes except for
    data file content.
    E.g.

       strcpy(name, p_stat.filename);

  * For data file content use the iso9660_statv2_t object.
    E.g.

     + uint32_t num_extents;
     + iso9660_extent_descr_t *extents
     
     + total_size = iso9660_statv2_get_total_size(p_statv2);
     + num_extents = iso9660_statv2_get_extents(p_statv2, &extents);

    The content of a datafile is comprised of the byte intervals which
    are described by iso9660_extent_descr_t:
    * iso9660_extent_descr_t.lsn   is the block address
    * iso9660_extent_descr_t.size  is the byte count of the interval
    All those intervals concatenated constitute the data file content.
    So you will have to loop over num_extents when reading data.
    See exaple/extract.c .

  * Replace iso9660_stat_t.secsize by a macro usage for each extent.
    E.g.

     - blocks = p_stat.secsize;
     + blocks = CDIO_EXTENT_BLOCKS(extents[index].size);

  * Finally, wherever the derived iso9660_stat_t is disposed, dispose the
    iso9660_statv2_t object, too:

     + iso9660_statv2_free(p_statv2);
       iso9660_stat_free(p_stat);

--------------------------------------------------------------------------

If only few direct accesses to members of iso9660_stat_t occur, then
it is unnecessary to have a convenience iso9660_stat_t.
Program example/isolist.c demonstrates this approach.

  * Change the types from old to new

     - iso9660_stat_t *p_stat
     + iso9660_statv2_t *p_stat

     - CdioISO9660FileList_t *p_filelist;
     + CdioISO9660FileListV2_t *p_filelist;

  * Change the old API calls to obtain the iso9660_stat_t object or a list
    of such objects to the corresponding calls of the new API.
    E.g.

     - p_stat = iso9660_fs_stat(...);
     + p_stat = iso9660_fs_statv2(...);

     - p_filelist = iso9660_fs_readdir(...);
     + p_filelist = iso9660_fs_readdir_v2(...);

  * When using list elements, obtain the iso9660_statv2_t object.
    E.g.

     - p_stat = (iso9660_stat_t *) _cdio_list_node_data();
     + p_stat = (iso9660_statv2_t *) _cdio_list_node_data();

  * Use the access methods of iso9660_statv2 rather than direct access to
    members of  iso9660_stat_t. (For member .secsize, see below.)
    E.g.

     - strcpy(name, p_stat.filename);
     + strcpy(name, iso9660_statv2_get_filename(p_stat));

  * Adapt to the new way to learn about data file content.
    E.g.

     + uint64_t total_size;
     + uint32_t num_extents;
     + iso9660_extent_descr_t *extents
     
     + total_size = iso9660_statv2_get_total_size(p_stat);
     + num_extents = iso9660_statv2_get_extents(p_stat, &extents);

    The content of a datafile is comprised of the byte intervals which
    are described by iso9660_extent_descr_t:
    * iso9660_extent_descr_t[index].lsn   is the block address
    * iso9660_extent_descr_t[index].size  is the byte count of the interval
    All those intervals concatenated constitute the data file content.
    So you will have to loop over num_extents when reading data.
    index ranges from 0 to num_extents - 1.
    See exaple/extract.c .

  * Replace iso9660_stat_t.secsize by a macro usage for each extent.
    E.g.

     - blocks = p_stat.secsize;
     + blocks = CDIO_EXTENT_BLOCKS(extents[index].size);

  * Finally use the new disposal call for the iso9660_statv2_t object:

     - iso9660_stat_free(p_stat);
     + iso9660_statv2_free(p_stat);

--------------------------------------------------------------------------

There are minimal runtime precautions to detect mismatch of type and
API function.
Typical messages are:

  cdio 3 message: Probable programming error with using 
iso9660_statv2_get_extents : Detected non-iso9660_statv2_t
  cdio ASSERT message: file iso9660_fs.c: line 2132 
(iso9660_statv2_get_extents): assertion failed: (_iso9660_demand_statv2(p_stat, 
"iso9660_statv2_get_extents"))

or

  !ASSERT: file iso9660_fs.c: line 2236 (iso9660_stat_free): assertion failed: 
(_iso9660_demand_stat(p_stat, "iso9660_stat_free"))

The names "_iso9660_demand_statv2" or "_iso9660_demand_stat" tell what
type was expected as parameter.  


==========================================================================

Implementation report:

* The expansion of API struct iso9660_stat_s by Pete Batard was reverted
  in include/cdio/iso9660.h .

* The new struct iso9660_statv2_s is defined in lib/iso9660/iso9660_fs.c.
  In include/cdio/iso9660.h there are only opaque declarations:
    struct iso9660_statv2_s;
    typedef struct iso9660_statv2_s   iso9660_statv2_t;

* All feature implementations in iso9660_fs.c were converted to the new
  type iso9660_statv2_t. The old API functions are now frontends to the
  new ones, which derive their resuling iso9660_stat_t from the
  iso9660_statv2_t which they obtain from the implementations.
  
* Creation and cloning of iso9660_statv2_t are now delegated to private
  methods: iso9660_statv2_new(), iso9660_statv2_clone().

* Initialization, cloning and disposal of entrails of iso_rock_statbuf_t
  are now delegated to semi-public functions in lib/iso9660/rock.c:
  iso9660_rock_statbuf_init(), iso9660_rock_statbuf_clone_entrails(),
  iso9660_rock_statbuf_free_entrails().

* Semi-public functions of lib/iso9660/rock.c which have iso9660_stat_s
  as parameter have been accompanied by equivalent functions which have
  iso_rock_statbuf_t instead:
    int get_rock_ridge_filename_rr(iso9660_dir_t * p_iso9660_dir,
                                   /*out*/ char * psz_name,
                                   /*in/out*/ iso_rock_statbuf_t *p_rr);
    int parse_rock_ridge_stat_rr(iso9660_dir_t *p_iso9660_dir,
                                 /*out*/ iso_rock_statbuf_t *p_rr);
  The old functions are not used by libcdio any more but they are declared
  in include/cdio/rock.h. So they still exist as frontends to the new
  functions.

* The definition of  bool_3way_t  in  include/cdio/types.h  got a comment
  which warns that the type may not get a value 3 or else breaks the runtime
  safety check which shall detect bad combinations of old and new types
  with API calls.

* Examples, tests, and src/ programs have been adapted to use the new API:
    example/extract.c   (shows usage of both APIs)
    example/isofile.c
    example/isofile2.c
    example/isolist.c
    src/iso-read.c       
    src/util.c
    src/cd-info.c
    src/iso-info.c
    test/testisocd2.c
    test/testisocd_joliet.c

* One test was braought back to pure legacy iso9660_stat_t use (i.e. i
  reverted Pete Batard's changes to it).
  Just to surely have such an application at compile time:
    test/testisocd.c

* Fixed wrong description of CdioList_t in iso9660.h. (Its destructor
  implementation iso9660_dirlist_free() submits free(3) as member destructor
  function. Its users inside libcdio submit text strings as members.
  External users of legacy iso9660_stat_t will suffer no memory leaks as
  long as only directories are submitted to the list. With iso9660_stat2_t
  there would be substantial memory leaking.)

--------------------------------------------------------------------------

Open decisions

* Legacy iso9660_stat_s has an anonymous enum for the file type.
  It is not possible to use the same enum names in another enum in successor
  iso9660_statv2_s.
  Alternative solutions are:

  - Define and use a named enum and thus change the souce code of the API
    definition.
    To my understanding the change is compatible. But that is a bet on
    the plausibility of compiler behavior. At least i cannot give hard
    reasons why this must stay API and ABI compatible under all circumstances.
    Define ISO9660_WITH_FILE_TYPE_ENUM to activate this alternative.

  - Simulate in iso9660_statv2_s the anonymous enum by some integer-ish type
    which bears the same values as publicly defined by the anonymous enum.
    This is a dirty hack, but leaves the legacy API literally unchanged.
    Do NOT define ISO9660_WITH_FILE_TYPE_ENUM to activate this alternative.

    This is currently enabled.

* Where to declare semi-public methods of iso_rock_statbuf_t ?
  I introduced three calls by which iso9660_fs.c can avoid to dig in the
  entrails of iso_rock_statbuf_t. Their prototypes should be exposed to
    lib/iso9660/iso9660_fs.c
    lib/iso9660/rock.c
  but not to the API.
  So i can hardly declare them in
    include/cdio/rock.h

--------------------------------------------------------------------------

Test runs on GNU/Linux

I configured libcdio with --disable-shared in order to get it ready for
valgrind (and for gdb, too).

The ISOs used are available as

  http://scdbackup.webframe.org/multi_extent_8k.iso.gz
  MD5 c2c57fa0bc428f6032b33f61ceae8bef
  (use gzip to uncompress)

  
https://cdimage.debian.org/debian-cd/current/i386/iso-cd/debian-9.4.0-i386-netinst.iso
  MD5 a9a477d5cd311635d502c4ab746743d3
After next Debian point release it will be in the archive:
  
http://cdimage.debian.org/mirror/cdimage/archive/9.4.0/i386/iso-cd/debian-9.4.0-i386-netinst.iso

Both ISOs are stored in some directory on disk, where a subdirectory
./test_dir exists

  isodir=/dvdbuffer

Sometimes they get mounted at

  mounted=/mnt/iso

Run-time memory checker valgrind was applied by this prefix to the
program runs:

  valgrind --leak-check=full \

Now for the particular runnable programs.

example/example.c :

  example/extract "$isodir"/debian-9.4.0-i386-netinst.iso \
                  "$isodir"/test_dir/debian-9.4.0-i386-netinst 2>&1 | less

  mount "$isodir"/debian-9.4.0-i386-netinst.iso "$mounted"
  # Symbolic links will be missing because example/extract prefers
  # Joliet unconditionally.
  diff -r "$isodir"/test_dir/debian-9.4.0-i386-netinst "$mounted" 2>&1 | less
  umount "$mounted"
  /bin/rm -r "$isodir"/test_dir/debian-9.4.0-i386-netinst

  example/extract "$isodir"/multi_extent_8k.iso \
                   "$isodir"/test_dir/multi_extent_8k 2>&1 | less
  mount "$isodir"/multi_extent_8k.iso "$mounted"
  # No difference should show up.
  diff -r "$isodir"/test_dir/multi_extent_8k "$mounted" 2>&1 | less
  umount "$mounted"
  /bin/rm -r "$isodir"/test_dir/multi_extent_8k


example/isofile.c :

  example/isofile "$isodir"/debian-9.4.0-i386-netinst.iso \
                  /.disk/mkisofs "$isodir"/test_dir/disk_mkisofs 2>&1 | less
  # Should bear the lengthy xorriso command by which the ISO was made
  view "$isodir"/test_dir/disk_mkisofs


example/isofile2.c :

  # Put debian-9.4.0-i386-netinst onto DVD+RW
  # (On Linux kernel: Eject and load after burning, so the kernel learns about
  #  the new state of medium and drive.)

  example/isofile2 /dev/sr4 md5sum.txt 2>&1 | less

  # Should bear a long list of MD5 sums and the file
  view md5sum.txt
  rm md5sum.txt


example/isolist.c :

  example/isolist "$isodir"/multi_extent_8k.iso 2>&1 | less
  example/isolist "$isodir"/debian-9.4.0-i386-netinst.iso 2>&1 | less


src/iso-read.c :

  # (valgrind reports memory leaks about function parse_options())
  src/iso-read "$isodir"/multi_extent_8k.iso \
               -e /multi_extent_file \
               -o "$isodir"/test_dir/multi_extent_file

  mount "$isodir"/multi_extent_8k.iso "$mounted"
  diff "$isodir"/test_dir/multi_extent_file "$mounted"/multi_extent_file \
        2>&1 | less
  umount "$mounted"
  rm "$isodir"/test_dir/multi_extent_file


src/cd-info.c :

  src/cd-info  -C /dev/sr4 --iso9660 --dvd 2>&1 | less


src/iso-info.c :

  src/iso-info -i "$isodir"/debian-9.4.0-i386-netinst.iso \
               --iso9660  2>&1 | less


test/testisocd2.c :

  test/testisocd2 2>&1 | less


test/testisocd_joliet.c :

  test/testisocd_joliet 2>&1 | less


test/testisocd.c :

  # Put DVD+RW into drive with highest number (e.g. /dev/sr4)

  (cd test ; ./testisocd ) 2>&1 | less


---------------------------------------------------------------------------

Have a nice day :)

Thomas




reply via email to

[Prev in Thread] Current Thread [Next in Thread]