[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Libcdio-devel] [RFC] New API iso9660_statv2_t as API/ABI compatible way
From: |
Thomas Schmitt |
Subject: |
[Libcdio-devel] [RFC] New API iso9660_statv2_t as API/ABI compatible way to read files >= 4 GiB |
Date: |
Fri, 06 Jul 2018 20:09:19 +0200 |
Hi,
i have pushed the very large commit 87f5053 to branch ts-multiextent :
New API around iso9660_statv2_t as successor of iso9660_stat_t for reading
data files of 4 GiB or larger, while keeping old API and ABI valid.
http://git.savannah.gnu.org/cgit/libcdio.git/commit/?h=ts-multiextent&id=87f50538ea360c3e533a4bbee8d8e1b9feba4131
Before jumping on the changeset, reviewers should read the following
introduction to problem and proposed solution.
--------------------------------------------------------------------------
The lack of multi-extent support in iso9660_stat_t led to the development
of a successor type: iso9660_statv2_t
That's because any ABI compatible extension of iso9660_stat_t is prevented
by its last member, an array of variable size, and by the fact that the
struct behind iso9660_stat_t is public.
So i present iso9660_statv2_t for approval or rejection in branch
ts-multiextent
http://git.savannah.gnu.org/cgit/libcdio.git/log/?h=ts-multiextent
The new API is based on the work of Pete Batard in branch
pbatard-multiextent2
where he implemented the proper assessment of multi-extent files while
reading ISO 9660 directory records. His proposal is much more concise than
iso9660_statv2_t.
I see only two problems with his proposal:
* It breaks the ABI.
* It will break the ABI again if ever the maximum number of extents
ISO_MAX_MULTIEXTENT shall be incresed from 8 to a higher number.
It is unattractive to choose a large value already now, because each
extent slot costs 8 byte and iso9660_stat_t can have many instances.
(This could be fixed without a new API by cherrypicking parts of my
proposal. I will probably do, if iso9660_statv2_t gets rejected.)
It turned out that avoiding thee two ABI problems implies to stirr up
a lot of code.
Pete is officially allowed to laugh now.
--------------------------------------------------------------------------
Other than its predecessor the entrails of iso9660_statv2_t are completely
private and accessible only via API functions. This gives hope that future
changes will not cause API or ABI problems.
iso9660_stat_free and all API functions which return iso9660_stat_t are
now deprececated. Each of them has a successor function:
iso9660_fs_stat -> iso9660_fs_statv2
iso9660_fs_stat_translate -> iso9660_fs_statv2_translate
iso9660_fs_readdir -> iso9660_fs_readdir_v2
iso9660_ifs_stat -> iso9660_ifs_statv2
iso9660_ifs_stat_translate -> iso9660_ifs_statv2_translate
iso9660_ifs_readdir -> iso9660_ifs_readdir_v2
iso9660_fs_find_lsn -> iso9660_fs_find_lsn_v2
iso9660_fs_find_lsn_with_path -> iso9660_fs_find_lsn_with_path_v2
iso9660_ifs_find_lsn -> iso9660_ifs_find_lsn_v2
iso9660_ifs_find_lsn_with_path -> iso9660_ifs_find_lsn_with_path_v2
iso9660_stat_free -> iso9660_statv2_free
Further the list type for iso9660_stat_t members and its methods have
successors:
CdioISO9660FileList_t -> CdioISO9660FileListV2_t
iso9660_filelist_new -> iso9660_filelist_new_v2
iso9660_filelist_free -> iso9660_filelist_free_v2
The new iso9660_statv2_t has access methods for its struct members:
iso9660_statv2_get_rr
iso9660_statv2_get_tm
iso9660_statv2_get_total_size
iso9660_statv2_get_extents
iso9660_statv2_get_xa
iso9660_statv2_get_type
iso9660_statv2_get_filename
and a convenience function to obtain a legacy iso9660_stat_t:
iso9660_statv2_get_v1
--------------------------------------------------------------------------
The old API around iso9660_stat_t is still provided without change.
Each application of libiso9660 which wants to access data files of 4 GiB
or larger will have to make the transition by own effort.
See example/extract.c for the alternative usage of both APIs.
There are two scenarios for adaption of applications to new iso9660_statv2_t:
* Many direct accesses to members of iso9660_stat_t.
* Few such accesses and mainly file data content reading.
--------------------------------------------------------------------------
If the application has many direct accesses to members of iso9660_stat_t,
then continue to use the pointer as a convenience object.
Program example/extract.c demonstrates this in the #else cases of #ifdef
LIBISO9660_USE_LEGACY_STAT.
* Let us assume that the existing iso9660_stat_t is declared like this:
iso9660_stat_t *p_stat;
* As its source of information introduce a new pointer to iso9660_statv2_t
or a list of such objects.
E.g.
+ iso9660_statv2_t *p_statv2;
* If you have lists of iso9660_stat_t, then change their type to the type
for the new iso9660_statv2_t elements.
E.g.
- CdioISO9660FileList_t *p_filelist;
+ CdioISO9660FileListV2_t *p_filelist;
* Change the old API calls to obtain the iso9660_stat_t object or a list
of such objects to the corresponding calls of the new API.
E.g.
- p_stat = iso9660_fs_stat(...);
+ p_statv2 = iso9660_fs_statv2(...);
- p_filelist = iso9660_fs_readdir(...);
+ p_filelist = iso9660_fs_readdir_v2(...);
* When using list elements, obtain the iso9660_statv2_t object.
E.g.
- p_stat = (iso9660_stat_t *) _cdio_list_node_data();
+ p_statv2 = (iso9660_statv2_t *) _cdio_list_node_data();
* Obtain the convenience iso9660_stat_t from the iso9660_statv2_t by
call iso9660_statv2_get_v1().
E.g.
+ p_stat = iso9660_statv2_get_v1(p_statv2);
* Now use the convenience iso9660_stat_t for all purposes except for
data file content.
E.g.
strcpy(name, p_stat.filename);
* For data file content use the iso9660_statv2_t object.
E.g.
+ uint32_t num_extents;
+ iso9660_extent_descr_t *extents
+ total_size = iso9660_statv2_get_total_size(p_statv2);
+ num_extents = iso9660_statv2_get_extents(p_statv2, &extents);
The content of a datafile is comprised of the byte intervals which
are described by iso9660_extent_descr_t:
* iso9660_extent_descr_t.lsn is the block address
* iso9660_extent_descr_t.size is the byte count of the interval
All those intervals concatenated constitute the data file content.
So you will have to loop over num_extents when reading data.
See exaple/extract.c .
* Replace iso9660_stat_t.secsize by a macro usage for each extent.
E.g.
- blocks = p_stat.secsize;
+ blocks = CDIO_EXTENT_BLOCKS(extents[index].size);
* Finally, wherever the derived iso9660_stat_t is disposed, dispose the
iso9660_statv2_t object, too:
+ iso9660_statv2_free(p_statv2);
iso9660_stat_free(p_stat);
--------------------------------------------------------------------------
If only few direct accesses to members of iso9660_stat_t occur, then
it is unnecessary to have a convenience iso9660_stat_t.
Program example/isolist.c demonstrates this approach.
* Change the types from old to new
- iso9660_stat_t *p_stat
+ iso9660_statv2_t *p_stat
- CdioISO9660FileList_t *p_filelist;
+ CdioISO9660FileListV2_t *p_filelist;
* Change the old API calls to obtain the iso9660_stat_t object or a list
of such objects to the corresponding calls of the new API.
E.g.
- p_stat = iso9660_fs_stat(...);
+ p_stat = iso9660_fs_statv2(...);
- p_filelist = iso9660_fs_readdir(...);
+ p_filelist = iso9660_fs_readdir_v2(...);
* When using list elements, obtain the iso9660_statv2_t object.
E.g.
- p_stat = (iso9660_stat_t *) _cdio_list_node_data();
+ p_stat = (iso9660_statv2_t *) _cdio_list_node_data();
* Use the access methods of iso9660_statv2 rather than direct access to
members of iso9660_stat_t. (For member .secsize, see below.)
E.g.
- strcpy(name, p_stat.filename);
+ strcpy(name, iso9660_statv2_get_filename(p_stat));
* Adapt to the new way to learn about data file content.
E.g.
+ uint64_t total_size;
+ uint32_t num_extents;
+ iso9660_extent_descr_t *extents
+ total_size = iso9660_statv2_get_total_size(p_stat);
+ num_extents = iso9660_statv2_get_extents(p_stat, &extents);
The content of a datafile is comprised of the byte intervals which
are described by iso9660_extent_descr_t:
* iso9660_extent_descr_t[index].lsn is the block address
* iso9660_extent_descr_t[index].size is the byte count of the interval
All those intervals concatenated constitute the data file content.
So you will have to loop over num_extents when reading data.
index ranges from 0 to num_extents - 1.
See exaple/extract.c .
* Replace iso9660_stat_t.secsize by a macro usage for each extent.
E.g.
- blocks = p_stat.secsize;
+ blocks = CDIO_EXTENT_BLOCKS(extents[index].size);
* Finally use the new disposal call for the iso9660_statv2_t object:
- iso9660_stat_free(p_stat);
+ iso9660_statv2_free(p_stat);
--------------------------------------------------------------------------
There are minimal runtime precautions to detect mismatch of type and
API function.
Typical messages are:
cdio 3 message: Probable programming error with using
iso9660_statv2_get_extents : Detected non-iso9660_statv2_t
cdio ASSERT message: file iso9660_fs.c: line 2132
(iso9660_statv2_get_extents): assertion failed: (_iso9660_demand_statv2(p_stat,
"iso9660_statv2_get_extents"))
or
!ASSERT: file iso9660_fs.c: line 2236 (iso9660_stat_free): assertion failed:
(_iso9660_demand_stat(p_stat, "iso9660_stat_free"))
The names "_iso9660_demand_statv2" or "_iso9660_demand_stat" tell what
type was expected as parameter.
==========================================================================
Implementation report:
* The expansion of API struct iso9660_stat_s by Pete Batard was reverted
in include/cdio/iso9660.h .
* The new struct iso9660_statv2_s is defined in lib/iso9660/iso9660_fs.c.
In include/cdio/iso9660.h there are only opaque declarations:
struct iso9660_statv2_s;
typedef struct iso9660_statv2_s iso9660_statv2_t;
* All feature implementations in iso9660_fs.c were converted to the new
type iso9660_statv2_t. The old API functions are now frontends to the
new ones, which derive their resuling iso9660_stat_t from the
iso9660_statv2_t which they obtain from the implementations.
* Creation and cloning of iso9660_statv2_t are now delegated to private
methods: iso9660_statv2_new(), iso9660_statv2_clone().
* Initialization, cloning and disposal of entrails of iso_rock_statbuf_t
are now delegated to semi-public functions in lib/iso9660/rock.c:
iso9660_rock_statbuf_init(), iso9660_rock_statbuf_clone_entrails(),
iso9660_rock_statbuf_free_entrails().
* Semi-public functions of lib/iso9660/rock.c which have iso9660_stat_s
as parameter have been accompanied by equivalent functions which have
iso_rock_statbuf_t instead:
int get_rock_ridge_filename_rr(iso9660_dir_t * p_iso9660_dir,
/*out*/ char * psz_name,
/*in/out*/ iso_rock_statbuf_t *p_rr);
int parse_rock_ridge_stat_rr(iso9660_dir_t *p_iso9660_dir,
/*out*/ iso_rock_statbuf_t *p_rr);
The old functions are not used by libcdio any more but they are declared
in include/cdio/rock.h. So they still exist as frontends to the new
functions.
* The definition of bool_3way_t in include/cdio/types.h got a comment
which warns that the type may not get a value 3 or else breaks the runtime
safety check which shall detect bad combinations of old and new types
with API calls.
* Examples, tests, and src/ programs have been adapted to use the new API:
example/extract.c (shows usage of both APIs)
example/isofile.c
example/isofile2.c
example/isolist.c
src/iso-read.c
src/util.c
src/cd-info.c
src/iso-info.c
test/testisocd2.c
test/testisocd_joliet.c
* One test was braought back to pure legacy iso9660_stat_t use (i.e. i
reverted Pete Batard's changes to it).
Just to surely have such an application at compile time:
test/testisocd.c
* Fixed wrong description of CdioList_t in iso9660.h. (Its destructor
implementation iso9660_dirlist_free() submits free(3) as member destructor
function. Its users inside libcdio submit text strings as members.
External users of legacy iso9660_stat_t will suffer no memory leaks as
long as only directories are submitted to the list. With iso9660_stat2_t
there would be substantial memory leaking.)
--------------------------------------------------------------------------
Open decisions
* Legacy iso9660_stat_s has an anonymous enum for the file type.
It is not possible to use the same enum names in another enum in successor
iso9660_statv2_s.
Alternative solutions are:
- Define and use a named enum and thus change the souce code of the API
definition.
To my understanding the change is compatible. But that is a bet on
the plausibility of compiler behavior. At least i cannot give hard
reasons why this must stay API and ABI compatible under all circumstances.
Define ISO9660_WITH_FILE_TYPE_ENUM to activate this alternative.
- Simulate in iso9660_statv2_s the anonymous enum by some integer-ish type
which bears the same values as publicly defined by the anonymous enum.
This is a dirty hack, but leaves the legacy API literally unchanged.
Do NOT define ISO9660_WITH_FILE_TYPE_ENUM to activate this alternative.
This is currently enabled.
* Where to declare semi-public methods of iso_rock_statbuf_t ?
I introduced three calls by which iso9660_fs.c can avoid to dig in the
entrails of iso_rock_statbuf_t. Their prototypes should be exposed to
lib/iso9660/iso9660_fs.c
lib/iso9660/rock.c
but not to the API.
So i can hardly declare them in
include/cdio/rock.h
--------------------------------------------------------------------------
Test runs on GNU/Linux
I configured libcdio with --disable-shared in order to get it ready for
valgrind (and for gdb, too).
The ISOs used are available as
http://scdbackup.webframe.org/multi_extent_8k.iso.gz
MD5 c2c57fa0bc428f6032b33f61ceae8bef
(use gzip to uncompress)
https://cdimage.debian.org/debian-cd/current/i386/iso-cd/debian-9.4.0-i386-netinst.iso
MD5 a9a477d5cd311635d502c4ab746743d3
After next Debian point release it will be in the archive:
http://cdimage.debian.org/mirror/cdimage/archive/9.4.0/i386/iso-cd/debian-9.4.0-i386-netinst.iso
Both ISOs are stored in some directory on disk, where a subdirectory
./test_dir exists
isodir=/dvdbuffer
Sometimes they get mounted at
mounted=/mnt/iso
Run-time memory checker valgrind was applied by this prefix to the
program runs:
valgrind --leak-check=full \
Now for the particular runnable programs.
example/example.c :
example/extract "$isodir"/debian-9.4.0-i386-netinst.iso \
"$isodir"/test_dir/debian-9.4.0-i386-netinst 2>&1 | less
mount "$isodir"/debian-9.4.0-i386-netinst.iso "$mounted"
# Symbolic links will be missing because example/extract prefers
# Joliet unconditionally.
diff -r "$isodir"/test_dir/debian-9.4.0-i386-netinst "$mounted" 2>&1 | less
umount "$mounted"
/bin/rm -r "$isodir"/test_dir/debian-9.4.0-i386-netinst
example/extract "$isodir"/multi_extent_8k.iso \
"$isodir"/test_dir/multi_extent_8k 2>&1 | less
mount "$isodir"/multi_extent_8k.iso "$mounted"
# No difference should show up.
diff -r "$isodir"/test_dir/multi_extent_8k "$mounted" 2>&1 | less
umount "$mounted"
/bin/rm -r "$isodir"/test_dir/multi_extent_8k
example/isofile.c :
example/isofile "$isodir"/debian-9.4.0-i386-netinst.iso \
/.disk/mkisofs "$isodir"/test_dir/disk_mkisofs 2>&1 | less
# Should bear the lengthy xorriso command by which the ISO was made
view "$isodir"/test_dir/disk_mkisofs
example/isofile2.c :
# Put debian-9.4.0-i386-netinst onto DVD+RW
# (On Linux kernel: Eject and load after burning, so the kernel learns about
# the new state of medium and drive.)
example/isofile2 /dev/sr4 md5sum.txt 2>&1 | less
# Should bear a long list of MD5 sums and the file
view md5sum.txt
rm md5sum.txt
example/isolist.c :
example/isolist "$isodir"/multi_extent_8k.iso 2>&1 | less
example/isolist "$isodir"/debian-9.4.0-i386-netinst.iso 2>&1 | less
src/iso-read.c :
# (valgrind reports memory leaks about function parse_options())
src/iso-read "$isodir"/multi_extent_8k.iso \
-e /multi_extent_file \
-o "$isodir"/test_dir/multi_extent_file
mount "$isodir"/multi_extent_8k.iso "$mounted"
diff "$isodir"/test_dir/multi_extent_file "$mounted"/multi_extent_file \
2>&1 | less
umount "$mounted"
rm "$isodir"/test_dir/multi_extent_file
src/cd-info.c :
src/cd-info -C /dev/sr4 --iso9660 --dvd 2>&1 | less
src/iso-info.c :
src/iso-info -i "$isodir"/debian-9.4.0-i386-netinst.iso \
--iso9660 2>&1 | less
test/testisocd2.c :
test/testisocd2 2>&1 | less
test/testisocd_joliet.c :
test/testisocd_joliet 2>&1 | less
test/testisocd.c :
# Put DVD+RW into drive with highest number (e.g. /dev/sr4)
(cd test ; ./testisocd ) 2>&1 | less
---------------------------------------------------------------------------
Have a nice day :)
Thomas