
[Savannah-register-public] [task #10138] Submission of Swiss army knife


From: Tong Sun
Subject: [Savannah-register-public] [task #10138] Submission of Swiss army knife of file signature checksum tool
Date: Mon, 01 Feb 2010 17:56:16 +0000
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091123 Iceweasel/3.5.5 (like Firefox/3.5.5; Debian-3.5.5-1)

URL:
  <http://savannah.gnu.org/task/?10138>

                 Summary: Submission of Swiss army knife of file signature
checksum tool
                 Project: Savannah Administration
            Submitted by: xpt
            Submitted on: Mon 01 Feb 2010 12:56:16 PM EST
         Should Start On: Mon 01 Feb 2010 12:00:00 AM EST
   Should be Finished on: Thu 11 Feb 2010 12:00:00 AM EST
                Category: Project Approval
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
        Percent Complete: 0%
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________

Details:

A new project has been registered at Savannah.
This project account will remain inactive until a site admin approves or
discards the registration.


= Registration Administration =

While this item is useful for tracking the registration process, *approving
or discarding the registration must be done on the specific Group
Administration
<https://savannah.gnu.org/siteadmin/groupedit.php?group_id=10450> page*,
which is accessible only to site administrators who are actually *logged in
as site administrators* (superuser):

* Group Administration
<https://savannah.gnu.org/siteadmin/groupedit.php?group_id=10450>


= Registration Details =

* Name: *Swiss army knife of file signature checksum tool*
* System Name:  *checksum*
* Type: non-GNU software & documentation
* License: Modified BSD License

----

==== Description: ====
checksum is a versatile checksum creator and verifier. It supports the CRC32
and MD5 algorithms. It can compute and verify signatures for files (named
either on the command line or piped in through stdin), for a given directory,
or for a text argument given on the command line. It understands and verifies
common file formats such as .sfv and .md5.
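The registration does not spell out the two file formats. As a sketch of what "understanding" them might involve, the C helpers below parse one line of an .sfv file (a filename, a space, then an 8-digit hex CRC32; lines starting with ';' are comments) and one line of a GNU md5sum-style .md5 file (32 hex digits, a space, a space or '*', then the filename). The function names are hypothetical and are not taken from checksum's own code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse one .sfv line: "<filename> <8 hex digits>". The CRC is the last
   space-separated token, so filenames containing spaces still work.
   Returns 1 on success, 0 for comment, blank or malformed lines. */
static int parse_sfv_line(const char *line, char *name, size_t name_size,
                          unsigned long *crc)
{
    assert(line != NULL && name != NULL && crc != NULL);
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        len--;                              /* drop the line terminator */
    if (len == 0 || line[0] == ';')
        return 0;                           /* blank line or comment */

    size_t sp = len;                        /* index just after the last space */
    while (sp > 0 && line[sp - 1] != ' ')
        sp--;
    if (sp == 0 || len - sp != 8)
        return 0;
    for (size_t i = sp; i < len; i++)
        if (!isxdigit((unsigned char)line[i]))
            return 0;

    size_t name_len = sp - 1;               /* trim the separating spaces */
    while (name_len > 0 && line[name_len - 1] == ' ')
        name_len--;
    if (name_len == 0 || name_len >= name_size)
        return 0;
    memcpy(name, line, name_len);
    name[name_len] = '\0';

    char hex[9];
    memcpy(hex, line + sp, 8);
    hex[8] = '\0';
    *crc = strtoul(hex, NULL, 16);
    return 1;
}

/* Parse one md5sum-style line: "<32 hex digits> <' ' or '*'><filename>".
   `hex` must have room for 33 bytes. Returns 1 on success, 0 otherwise. */
static int parse_md5_line(const char *line, char *hex, char *name,
                          size_t name_size)
{
    assert(line != NULL && hex != NULL && name != NULL);
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        len--;
    if (len < 35)                           /* 32 hex + 2 separators + name */
        return 0;
    for (int i = 0; i < 32; i++)
        if (!isxdigit((unsigned char)line[i]))
            return 0;
    if (line[32] != ' ' || (line[33] != ' ' && line[33] != '*'))
        return 0;                           /* text mode or binary mode marker */

    size_t name_len = len - 34;
    if (name_len >= name_size)
        return 0;
    memcpy(hex, line, 32);
    hex[32] = '\0';
    memcpy(name, line + 34, name_len);
    name[name_len] = '\0';
    return 1;
}
```

A verifier would then recompute the signature of `name` and compare it with the parsed value.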


Most importantly, it can remove duplicate files: it uses the file signatures
to identify distinct files with identical content within a given directory,
and links them together to save disk space.
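The registration does not say how the "link them together" step is done. One plausible POSIX approach, sketched below under that assumption, is to hard-link the kept file to a temporary name in the duplicate's directory and then rename() that link over the duplicate, so the duplicate's path never goes missing. The function name and the ".lnktmp" suffix are hypothetical. Hard links only work within one filesystem, so link() fails with EXDEV otherwise.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replace `dup` with a hard link to `keep`. The caller must already have
   confirmed that both files have identical content. Returns 0 on success
   or -1 with errno set. */
static int replace_with_hardlink(const char *keep, const char *dup)
{
    struct stat a, b;
    assert(keep != NULL && dup != NULL);
    if (stat(keep, &a) != 0 || stat(dup, &b) != 0)
        return -1;
    /* Already the same inode: nothing to do. (rename() between two links
       to the same file is a no-op and would leave the temp link behind.) */
    if (a.st_dev == b.st_dev && a.st_ino == b.st_ino)
        return 0;

    /* Hypothetical temp name "<dup>.lnktmp" in the duplicate's own
       directory, so the final rename() stays on one filesystem. */
    size_t n = strlen(dup) + sizeof ".lnktmp";
    char *tmp = malloc(n);
    if (tmp == NULL)
        return -1;
    snprintf(tmp, n, "%s.lnktmp", dup);

    /* link() fails with EXDEV if keep and dup are on different filesystems. */
    if (link(keep, tmp) != 0) {
        int saved = errno;
        free(tmp);
        errno = saved;
        return -1;
    }
    /* rename() replaces dup in one step: the name dup always refers to
       either the old file or the new link, never to nothing. */
    if (rename(tmp, dup) != 0) {
        int saved = errno;
        unlink(tmp);
        free(tmp);
        errno = saved;
        return -1;
    }
    free(tmp);
    return 0;
}
```

After a successful call both paths share one inode (link count 2), so the duplicate's data blocks are freed.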

Programming language: C

What is special about it?

There is only one other tool written in C that checks for and removes
duplicate files, dupmerge, but I found its algorithm inefficient and slow:

,-----
| Dupmerge works by quicksorting a list of path names, with the
| actual unlinking and relinking steps performed as side effects of
| the comparison function. The results of the sort are discarded.
`-----

The algorithm is inefficient and slow because:

- File *contents* are compared *every time* the O(n log n) sort compares two
files.
- It doesn't provide an option to check only files with the same name, which
would speed up the process.
Ref:
https://sourceforge.net/projects/dupmerge/forums/forum/427960/topic/3536512
- My approach not only avoids comparing file contents on every comparison, but
also brings the work down from O(n log n) content comparisons to a
near-linear number of comparisons, thanks to the file signatures.
- Another advantage over dupmerge: because it uses quicksort, dupmerge has to
store the whole file list in memory, which makes it hard to handle a large
number of files on a machine with little memory. My checksum tool does not
have this problem at all.
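The registration does not describe how the signatures are matched up. One way that fits both claims above (no repeated content comparisons, and no need to hold the whole file list in memory) is sketched below under that assumption: write one "<size> <crc32> <path>" record per file, sort the records so equal signatures become adjacent (for example with sort(1), which can spill to temporary files), then make a single pass that only remembers the first record of the current group. The record format and function name are hypothetical. The sort still costs O(n log n) comparisons, but each compares two short keys rather than two files' contents, and the grouping pass itself is linear.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record format, one per line, sorted so that equal
   signatures are adjacent:  "<size> <crc32-hex> <path>\n"

   Single pass over the sorted records. Only the first record of the
   current group is kept in memory, so memory use is constant no matter
   how many files are listed. Prints "<kept>\t<duplicate>" for each
   duplicate candidate and returns the number of candidates found. */
static long scan_sorted_signatures(FILE *in, FILE *out)
{
    char line[4096];
    char keep_path[4096] = "";
    unsigned long long keep_size = 0, size;
    unsigned long keep_crc = 0, crc;
    int have_group = 0;
    long dups = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        int off = -1;
        if (sscanf(line, "%llu %lx %n", &size, &crc, &off) != 2 ||
            off < 0 || line[off] == '\0')
            continue;                       /* skip malformed records */
        const char *path = line + off;

        if (have_group && size == keep_size && crc == keep_crc) {
            /* Same signature as the group's first file: a candidate.
               Different files can share a CRC32, so a byte-by-byte
               check should confirm the match before linking. */
            fprintf(out, "%s\t%s\n", keep_path, path);
            dups++;
        } else {
            /* New signature: this file starts a new group. */
            snprintf(keep_path, sizeof keep_path, "%s", path);
            keep_size = size;
            keep_crc = crc;
            have_group = 1;
        }
    }
    return dups;
}
```

Adding the file's base name to the record (and to the sort key) would give the "same-name files only" option mentioned above.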




==== Other Software Required: ====
MD5 message-digest algorithm, RSA Data Security, Inc.
=====================================================

 Copyright (C) 1991-2, RSA Data Security, Inc. Created 1991. All
rights reserved.

License to copy and use this software is granted provided that it
is identified as the "RSA Data Security, Inc. MD5 Message-Digest
Algorithm" in all material mentioning or referencing this software
or this function.

License is also granted to make and use derivative works provided
that such works are identified as "derived from the RSA Data
Security, Inc. MD5 Message-Digest Algorithm" in all material
mentioning or referencing the derived work.

RSA Data Security, Inc. makes no representations concerning either
the merchantability of this software or the suitability of this
software for any particular purpose. It is provided "as is"
without express or implied warranty of any kind.

These notices must be retained in any copies of any part of this
documentation and/or software.


32-bit CRC, ANSI X3.66
======================

The 32-bit CRC is the one used as the frame check sequence in ADCCP
(ANSI X3.66, also known as FIPS PUB 71 and FED-STD-1003, the U.S. versions
of CCITT's X.25 link-level protocol). The 32-bit FCS was added via the
Federal Register, 1 June 1982, p.23798.
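The notice above names the ADCCP CRC-32, whose generator polynomial is 0x04C11DB7. A common way to compute it, and the variant .sfv files use, is the reflected, table-driven form sketched below (bit-reversed polynomial 0xEDB88320, initial value and final XOR of all ones). Which variant checksum actually implements is an assumption here: POSIX cksum uses the same polynomial unreflected and also folds in the file length, so its values differ. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lookup table for the reflected CRC-32 polynomial (0xEDB88320 is
   0x04C11DB7 with its bits reversed). Call crc32_init() once first. */
static uint32_t crc_table[256];

static void crc32_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[n] = c;
    }
}

/* Update a running CRC with `len` bytes; start with crc = 0. The pre- and
   post-inversion are done here, so calls can be chained over successive
   buffers read from a file. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf,
                             size_t len)
{
    assert(buf != NULL || len == 0);
    crc = ~crc;
    for (size_t i = 0; i < len; i++)
        crc = crc_table[(crc ^ buf[i]) & 0xFFu] ^ (crc >> 8);
    return ~crc;
}
```

The standard check value for this variant is 0xCBF43926 for the ASCII string "123456789", which gives a quick self-test.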




==== Other Comments: ====
Halfway done, so there is no real release tarball to upload yet. I need the
Git repository to keep development versions. The uploaded tarball is only
there to satisfy the project registration.



==== Tarball URL: ====
http://savannah.gnu.org/submissions_uploads/checksum.tgz






    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/task/?10138>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/




