[GNUnet-SVN] r9997 - Extractor-docs/WWW/man
From: gnunet
Subject: [GNUnet-SVN] r9997 - Extractor-docs/WWW/man
Date: Wed, 13 Jan 2010 17:25:37 +0100
Author: grothoff
Date: 2010-01-13 17:25:37 +0100 (Wed, 13 Jan 2010)
New Revision: 9997
Modified:
Extractor-docs/WWW/man/extract.html
Extractor-docs/WWW/man/libextractor.html
Log:
man2html update
Modified: Extractor-docs/WWW/man/extract.html
===================================================================
--- Extractor-docs/WWW/man/extract.html 2010-01-13 16:22:19 UTC (rev 9996)
+++ Extractor-docs/WWW/man/extract.html 2010-01-13 16:25:37 UTC (rev 9997)
@@ -2,7 +2,7 @@
<HTML><HEAD><TITLE>Man page of EXTRACT</TITLE>
</HEAD><BODY>
<H1>EXTRACT</H1>
-Section: User Commands (1)<BR>Updated: April 28, 2005<BR><A HREF="#index">Index</A>
+Section: User Commands (1)<BR>Updated: Dec 20, 2009<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
@@ -17,20 +17,18 @@
<B>extract</B>
[
-<B>-abdfhLnrsvV</B>
+<B>-bghLnvV</B>
]
[
-<B>-B</B>
+<B>-H</B>
-<I>language</I>
+<I>hash-algorithm</I>
]
[
-<B>-H</B>
+<B>-i</B>
-<I>hash-algorithm</I>
-
]
[
<B>-l</B>
@@ -39,7 +37,7 @@
]
[
-<B>-p </B>
+<B>-p</B>
<I>type</I>
@@ -58,59 +56,48 @@
<A NAME="lbAD"> </A>
<H2>DESCRIPTION</H2>
-This manual page documents version 0.5.11 of the
-<B>extract </B>
+This manual page documents version 0.6.0 of the
+<B>extract</B>
command.
<P>
<B>extract</B>
-tests each file specified in the argument list in an attempt to infer meta-information from it. Each file is subjected to the meta-data extraction libraries from
-<I>libextractor. </I>
+tests each file specified in the argument list in an attempt to infer meta-information from it. Each file is subjected to the meta-data extraction libraries from
+<I>libextractor.</I>
<P>
-libextractor classifies meta-information (also referred to as keywords) into types. A list of all types can be obtained with the
-<B>-L </B>
+libextractor classifies meta-information (also referred to as keywords) into types. A list of all types can be obtained with the
+<B>-L</B>
-option.
+option.
<P>
<A NAME="lbAE"> </A>
<H2>OPTIONS</H2>
<DL COMPACT>
-<DT><B>-a</B>
-
-<DD>
-Do not remove any duplicates, even if the keywords match exactly and have the same type (i.e. because the same keyword was found by different extractor libraries).
<DT><B>-b</B>
<DD>
-Display the output in BiBTeX format. This implies the
-<B>-d </B>
+Display the output in BiBTeX format.
+<DT><B>-g</B>
-option
-<DT><B>-B LANG</B>
-
<DD>
-Use the generic plaintext extractor for the language with the 2-letter
language code LANG. Supported languages are DA (Danish), DE (German), EN
(English), ES (Spanish), FI (Finnish), FR (French), GA (Gaelic), IT (Italian),
NO (Norwegian) and SV (Swedish).
-<DT><B>-d</B>
-
-<DD>
-Remove duplicates only if the types match exactly. By default, duplicates are
removed if the types match or if one of the types is I unknown (in this case,
the duplicate of unknown type is removed).
-<DT><B>-f</B>
-
-<DD>
-add the filename(s) (without directory) to the list of keywords.
+Use grep-friendly output (all keywords on a single line for each file). Use
the verbose option to print the filename first, followed by the keywords. Use
the verbose option twice to also display the keyword types. This option will
not print keyword types or non-textual metadata.
<DT><B>-h</B>
<DD>
Print a brief summary of the options.
-<DT><B>-H ALGORITHM</B>
+<DT><B>-i</B>
<DD>
-Use the ALGORITHM to compute a hash of each file (possible algorithms are sha1
and md5).
+Run plugins in-process (for debugging). By default, each plugin is run in its
own process.
+<DT><B>-l</B><I> libraries</I>
+
+<DD>
+Use the specified libraries to extract keywords. The general format of
libraries is .I [[-]LIBRARYNAME[:[-]LIBRARYNAME]*] where LIBRARYNAME is a
libextractor compatible library and typically of the form .Ijpeg. The minus
before the libraryname indicates that this library should be removed from the
existing list. To run only a few selected plugins, use -l in combination with
-n.
<DT><B>-L</B>
<DD>
@@ -119,14 +106,10 @@
<DD>
Do not use the default set of extractors (typically all standard extractors,
currently mp3, ogg, jpg, gif, png, tiff, real, html, pdf and mime-types), use
only the extractors specified with the .B -l option.
-<DT><B>-r</B>
+<DT><B>-p type</B>
<DD>
-Remove all duplicates disregarding differences in the keyword type.
-<DT><B>-s</B>
-
-<DD>
-Split keywords at delimiters (space, comma, colon, etc.) and list split
keywords to be of .I unknown type. This can also be done by loading the
split-library. Using this option guarantees that the splitting is performed
after all other libraries have been run. It is always performed before
duplicate elimination.
+Print only the keywords matching the specified type. By default, all keywords
that are found and not removed as duplicates are printed.
<DT><B>-v</B>
<DD>
@@ -134,28 +117,16 @@
<DT><B>-V</B>
<DD>
-Be verbose.
-<DT><B>-B</B>
+Be verbose. This option can be specified multiple times to increase verbosity
further.
+<DT><I>-x type</I>
<DD>
-Run the printable extractor (costly, generic extractor for binaries)
-<DT><B>-l</B><I> libraries</I>
-
-<DD>
-Use the specified libraries to extract keywords. The general format of
libraries is .I [[-]LIBRARYNAME[:[-]LIBRARYNAME]*] where LIBRARYNAME is a
libextractor compatible library and typically of the form .I
libextractor_jpeg.so. The minus before the libraryname indicates that this
library should be run after all the libraries that were specified so far. If
the minus is missing, the library is run before all previously specified
libraries.
-<DT><B>-p</B><I> type</I>
-
-<DD>
-Print only the keywords matching the specified type. By default, all keywords
that are found and not removed as duplicates are printed.
-<DT><B>-x</B><I> type</I>
-
-<DD>
Exclude keywords of the specified type from the output. By default, all
keywords that are found and not removed as duplicates are printed.
</DL>
<A NAME="lbAF"> </A>
<H2>SEE ALSO</H2>
-<B><A HREF="libextractor.html">libextractor</A></B>(3)
+<B><A HREF="/cgi-bin/man/man2html?3+libextractor">libextractor</A></B>(3)
- description of the libextractor library
<BR>
@@ -168,15 +139,14 @@
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
mimetype - image/jpeg
-$ extract -Vf -x comment test/test.jpg
+$ extract -V -x comment test/test.jpg
Keywords for file test/test.jpg:
mimetype - image/jpeg
-filename - test.jpg
$ extract -p comment test/test.jpg
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
-$ extract -nV -l libextractor_png.so -p comment test/test.jpg test/test.png
+$ extract -nV -l png.so -p comment test/test.jpg test/test.png
Keywords for file test/test.jpg:
Keywords for file test/test.png:
comment - Testing keyword extraction
@@ -184,7 +154,7 @@
</PRE><A NAME="lbAH"> </A>
<H2>LEGAL NOTICE</H2>
-libextractor and the extract tool are released under the GPL. libextractor is
a GNU project.
+libextractor and the extract tool are released under the GPL. libextractor is
a GNU package.
<P>
<A NAME="lbAI"> </A>
<H2>BUGS</H2>
@@ -196,15 +166,12 @@
<B>extract</B>
-was originally written by Christian Grothoff <<A
HREF="mailto:address@hidden">address@hidden</A>> and
-Vidyut Samanta <<A HREF="mailto:address@hidden">address@hidden</A>>. Use
<<A HREF="mailto:address@hidden">address@hidden</A>>
-to contact the current maintainer(s).
+was originally written by Christian Grothoff <<A
HREF="mailto:address@hidden">address@hidden</A>> and Vidyut Samanta <<A
HREF="mailto:address@hidden">address@hidden</A>>. Use <<A
HREF="mailto:address@hidden">address@hidden</A>> to contact the current
maintainer(s).
<P>
<A NAME="lbAK"> </A>
<H2>AVAILABILITY</H2>
-You can obtain the original author's latest version from
-<A HREF="http://gnunet.org/libextractor/">http://gnunet.org/libextractor/</A>
+You can obtain the original author's latest version from <A
HREF="http://www.gnu.org/software/libextractor/">http://www.gnu.org/software/libextractor/</A>
<P>
<HR>
@@ -222,6 +189,9 @@
<DT><A HREF="#lbAK">AVAILABILITY</A><DD>
</DL>
<HR>
-Time: 16:47:24 GMT, April 26, 2006
+This document was created by
+<A HREF="/cgi-bin/man/man2html">man2html</A>,
+using the manual pages.<BR>
+Time: 16:18:18 GMT, January 13, 2010
</BODY>
</HTML>
Modified: Extractor-docs/WWW/man/libextractor.html
===================================================================
--- Extractor-docs/WWW/man/libextractor.html 2010-01-13 16:22:19 UTC (rev 9996)
+++ Extractor-docs/WWW/man/libextractor.html 2010-01-13 16:25:37 UTC (rev 9997)
@@ -2,90 +2,76 @@
<HTML><HEAD><TITLE>Man page of LIBEXTRACTOR</TITLE>
</HEAD><BODY>
<H1>LIBEXTRACTOR</H1>
-Section: C Library Functions (3)<BR>Updated: Jul 14, 2005<BR><A HREF="#index">Index</A>
+Section: C Library Functions (3)<BR>Updated: Dec 14, 2009<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
<A NAME="lbAB"> </A>
<H2>NAME</H2>
-libextractor - meta-information extraction library 0.5.11
+libextractor - meta-information extraction library 0.6.0
<A NAME="lbAC"> </A>
<H2>SYNOPSIS</H2>
<P>
<B>#include <<A HREF="file:///usr/include/extractor.h">extractor.h</A>>
<P>
-<BR> typedef struct EXTRACTOR_Keywords {
-<BR> char * </B><I>keyword</I><B>;
-<BR> EXTRACTOR_KeywordType </B><I>keywordType</I><B>;
-<BR> struct EXTRACTOR_Keywords * </B><I>next</I><B>;
-<BR> } EXTRACTOR_KeywordList;FB
+const char *EXTRACTOR_metatype_to_string(enum EXTRACTOR_MetaType
</B><I>type</I><B>);
<P>
+const char *EXTRACTOR_metatype_to_description(enum EXTRACTOR_MetaType
</B><I>type</I><B>);
<P>
-<BR> EXTRACTOR_ExtractorList * EXTRACTOR_loadDefaultLibraries ();
+enum EXTRACTOR_MetaTypeEXTRACTOR_metatype_get_max (void);
<P>
-<BR> const char * EXTRACTOR_getKeywordTypeAsString (const EXTRACTOR_KeywordType </B><I>type</I><B>);
+struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_defaults(enum
EXTRACTOR_Options </B><I>flags</I><B>);
<P>
-<BR> EXTRACTOR_ExtractorList * EXTRACTOR_loadConfigLibraries (EXTRACTOR_ExtractorList * </B><I>prev</I><B>, const char * </B><I>config</I><B>);
+struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add (struct EXTRACTOR_PluginList
* </B><I>prev</I><B>, const char * </B><I>library</I><B>, const char *
</B><I>options</I><B>, enum EXTRACTOR_Options </B><I>flags</I><B>);
<P>
-<BR> EXTRACTOR_ExtractorList * EXTRACTOR_addLibrary (EXTRACTOR_ExtractorList * </B><I>prev</I><B>, const char * </B><I>library</I><B>);
<P>
-<BR> EXTRACTOR_ExtractorList * EXTRACTOR_addLibraryLast (EXTRACTOR_ExtractorList * </B><I>prev</I><B>, const char * </B><I>library</I><B>);
+struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_last(struct
EXTRACTOR_PluginList *</B><I>prev</I><B>, const char *</B><I>library</I><B>,
const char *</B><I>options</I><B>, enum EXTRACTOR_Options </B><I>flags</I><B>);
<P>
-<BR> EXTRACTOR_ExtractorList * EXTRACTOR_removeLibrary (EXTRACTOR_ExtractorList * </B><I>prev</I><B>, const char * </B><I>library</I><B>);
+struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_config (struct
EXTRACTOR_PluginList * </B><I>prev</I><B>, const char *</B><I>config</I><B>,
enum EXTRACTOR_Options </B><I>flags</I><B>);
+<TT> </TT><TT> </TT><BR>
+struct EXTRACTOR_PluginList *EXTRACTOR_plugin_remove(struct
EXTRACTOR_PluginList * </B><I>prev</I><B>, const char * </B><I>library</I><B>);
<P>
-<BR> void EXTRACTOR_removeAll (EXTRACTOR_ExtractorList * </B><I>prev</I><B>);
+void EXTRACTOR_plugin_remove_all(struct EXTRACTOR_PluginList
*</B><I>plugins</I><B>);
<P>
-<BR> EXTRACTOR_KeywordList * EXTRACTOR_getKeywords (EXTRACTOR_ExtractorList * </B><I>extractor</I><B>, const char * </B><I>filename</I><B>);
+void EXTRACTOR_extract(struct EXTRACTOR_PluginList *</B><I>plugins</I><B>,
const char *</B><I>filename</I><B>, const void *</B><I>data</I><B>, size_t
</B><I>size</I><B>, EXTRACTOR_MetaDataProcessor </B><I>proc</I><B>, void
*</B><I>proc_cls</I><B>);
<P>
-<BR> EXTRACTOR_KeywordList * EXTRACTOR_getKeywords (EXTRACTOR_ExtractorList * </B><I>extractor</I><B>, const char * </B><I>data</I><B>, size_t </B><I>size</I><B>);
+int EXTRACTOR_meta_data_print(void * </B><I>handle</I><B>, const char
*</B><I>plugin_name</I><B>, enum EXTRACTOR_MetaType </B><I>type</I><B>, enum
EXTRACTOR_MetaFormat </B><I>format</I><B>, const char
*</B><I>data_mime_type</I><B>, const char *</B><I>data</I><B>, size_t
</B><I>data_len</I><B>);
<P>
-<BR> EXTRACTOR_KeywordList * EXTRACTOR_removeEmptyKeywords (EXTRACTOR_KeywordList * </B><I>list</I><B>);
+EXTRACTOR_VERSION
<P>
-<BR> EXTRACTOR_KeywordList * EXTRACTOR_removeDuplicateKeywords (EXTRACTOR_KeywordList * </B><I>list</I><B>, const unsigned int </B><I>options</I><B>);
-<P>
-<BR> void EXTRACTOR_printKeywords (FILE * </B><I>handle</I><B>, EXTRACTOR_KeywordList * </B><I>keywords</I><B>);
-<P>
-<BR> void EXTRACTOR_freeKeywords (EXTRACTOR_KeywordList * </B><I>keywords</I><B>);
-<P>
-<BR> const char * EXTRACTOR_extractLast (const EXTRACTOR_KeywordType * </B><I>type</I><B>, EXTRACTOR_KeywordList * </B><I>keywords</I><B>);
-<P>
-<BR> const char * EXTRACTOR_extractLastByString (const char * </B><I>type</I><B>, EXTRACTOR_KeywordList * </B><I>keywords</I><B>);
-<P>
-<BR> unsigned int EXTRACTOR_countKeywords (EXTRACTOR_KeywordList * </B><I>keywords</I><B>);
-<P>
-<BR> EXTRACTOR_DEFAULT_LIBRARIES
-<P>
-<BR> EXTRACTOR_VERSION
-<P>
</B><A NAME="lbAD"> </A>
<H2>DESCRIPTION</H2>
+<P>
-libextractor is a simple library for keyword extraction. libExtractor does
not support all formats but supports a simple plugging mechanism such that you
can quickly add extractors for additional formats, even without recompiling
libExtractor. libExtractor typically ships with one or more helper-libraries
that can be used to obtain keywords from common file-types. If you want to
write your own extractor for some filetype, all you need to do is write a
little library that implements a single method with this signature:
+GNU libextractor is a simple library for keyword extraction. libextractor
does not support all formats but supports a simple plugging mechanism such that
you can quickly add extractors for additional formats, even without recompiling
libextractor. libextractor typically ships with dozens of plugins that can be
used to obtain meta data from common file-types. If you want to write your own
plugin for some filetype, all you need to do is write a little library that
implements a single method with this signature:
<P>
-<BR> <B>EXTRACTOR_KeywordList * LIBRARYNAME_extract(const char * </B><I>filename</I><B>,
-<BR> char * </B><I>data</I><B>,
-<BR> size_t </B><I>size</I><B>,
-<BR> EXTRACTOR_KeywordList * </B><I>prev</I><B>);
+<BR> <B>int EXTRACTOR_name_extract(const char *</B><I>data</I><B>, size_t </B><I>datasize</I><B>, EXTRACTOR_MetaDataProcessor </B><I>proc</I><B>, void *</B><I>proc_cls</I><B>, const char *</B><I>options</I><B>);
<P>
+<P>
-The filename is the name of the file, data is a pointer to the contents of the
file and size is the size of the file. The extract method must prepend
keywords that it finds to the linked list 'prev' and return the new head. The
library must allocate (malloc) the entry in the keyword list and the memory for
the filename since both will be free'ed by libExtractor once the application
calls freeKeywords. An example implementation can be found in
</B><I>mp3extractor.c</I>. The application extract gives an example how to use
libExtractor.
+Data is a pointer to the contents of the file and datasize is the size of
data. The extract method must call proc for meta data that it finds. The
interpretation of options is up to the plugin. The function should return 0 if
'proc' always returned 0, otherwise 1. After 'proc' returned a non-zero value,
proc should not be called again. An example implementation can be found in
</B><I>html_extractor.c</I>. Plugins should be automatically found and used
once they are installed in the respective directory (typically something like
/usr/lib/libextractor/).
<P>
-The basic use of libextractor is to load the plugins (for example with
<B>EXTRACTOR_loadDefaultLibraries</B>), then to extract the keyword list using
<B>EXTRACTOR_getKeywords</B>, processing the list (using application specific
code and possibly some of the postprocessing convenience functions like
<B>EXTRACTOR_removeDuplicateKeywords</B>), freeing the keyword list (using
<B>EXTRACTOR_freeKeywords</B>) and finally unloading the plugins (with
<B>EXTRACTOR_removeAll</B>).
+The application extract gives an example how to use libextractor.
+<P>
-The keywords obtained from libextractor are supposed to be UTF-8 encoded. The
EXTRACTOR_printKeywords function converts the UTF-8 keywords to the character
set from the current locale before printing them. Plugins are supposed to
convert meta-data to UTF-8 if necessary.
+The basic use of libextractor is to load the plugins (for example with
<B>EXTRACTOR_plugin_add_defaults</B>), then to extract the keyword list using
<B>EXTRACTOR_extract</B>, and finally unloading the plugins (with
<B>EXTRACTOR_plugin_remove_all</B>).
+<P>
+Textual meta data obtained from libextractor is supposed to be UTF-8 encoded
if the text encoding is known. Plugins are supposed to convert meta-data to
UTF-8 if necessary. The EXTRACTOR_meta_data_print function converts the
UTF-8 keywords to the character set from the current locale before printing
them.
+<P>
+
<A NAME="lbAE"> </A>
<H2>SEE ALSO</H2>
-<A HREF="extract.html">extract</A>(1)
+<A HREF="/cgi-bin/man/man2html?1+extract">extract</A>(1)
<P>
<A NAME="lbAF"> </A>
<H2>LEGAL NOTICE</H2>
-libextractor is released under the GPL and a GNU project (<A
HREF="http://www.gnu.org/).">http://www.gnu.org/).</A>
+libextractor is released under the GPL and a GNU package (<A
HREF="http://www.gnu.org/).">http://www.gnu.org/).</A>
<P>
<A NAME="lbAG"> </A>
<H2>BUGS</H2>
@@ -100,7 +86,7 @@
<A NAME="lbAI"> </A>
<H2>AVAILABILITY</H2>
-You can obtain the original author's latest version from <A
HREF="http://gnunet.org/libextractor/.">http://gnunet.org/libextractor/.</A>
+You can obtain the original author's latest version from <A
HREF="http://www.gnu.org/software/libextractor/.">http://www.gnu.org/software/libextractor/.</A>
<P>
<HR>
@@ -116,6 +102,9 @@
<DT><A HREF="#lbAI">AVAILABILITY</A><DD>
</DL>
<HR>
-Time: 16:47:32 GMT, April 26, 2006
+This document was created by
+<A HREF="/cgi-bin/man/man2html">man2html</A>,
+using the manual pages.<BR>
+Time: 16:18:28 GMT, January 13, 2010
</BODY>
</HTML>
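
For readers following the API change in libextractor.html: the new text says a plugin's extract method must call proc for each item it finds, return 0 if proc always returned 0 and 1 otherwise, and not call proc again after a non-zero return. The following is a minimal sketch of that contract, not code from the repository. It does not include extractor.h. The two enums are stand-ins with made-up members, the callback typedef is only modelled on the EXTRACTOR_meta_data_print signature shown in the synopsis, and the plugin name "demo", the "#!" check, and both processor functions are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins so the sketch compiles without extractor.h (assumption:
   the real header defines these enums with its own members). */
enum EXTRACTOR_MetaType { DEMO_METATYPE_COMMENT = 1 };
enum EXTRACTOR_MetaFormat { DEMO_METAFORMAT_UTF8 = 1 };

/* Callback shape modelled on EXTRACTOR_meta_data_print in the synopsis. */
typedef int (*EXTRACTOR_MetaDataProcessor)(void *cls,
                                           const char *plugin_name,
                                           enum EXTRACTOR_MetaType type,
                                           enum EXTRACTOR_MetaFormat format,
                                           const char *data_mime_type,
                                           const char *data,
                                           size_t data_len);

/* Hypothetical plugin: reports one comment if the data starts with "#!".
   Returns 0 if proc always returned 0, otherwise 1. */
int EXTRACTOR_demo_extract(const char *data, size_t datasize,
                           EXTRACTOR_MetaDataProcessor proc, void *proc_cls,
                           const char *options)
{
  static const char note[] = "script with interpreter line";

  (void) options; /* this sketch ignores plugin options */
  if (datasize < 2 || 0 != memcmp(data, "#!", 2))
    return 0; /* nothing found, proc never called */
  /* Passing the length including the terminator is an assumption here. */
  if (0 != proc(proc_cls, "demo", DEMO_METATYPE_COMMENT,
                DEMO_METAFORMAT_UTF8, "text/plain", note, strlen(note) + 1))
    return 1; /* proc asked us to stop; do not call it again */
  return 0;
}

/* Hypothetical processor: counts items and asks the plugin to continue. */
int count_items(void *cls, const char *plugin_name,
                enum EXTRACTOR_MetaType type, enum EXTRACTOR_MetaFormat format,
                const char *data_mime_type, const char *data, size_t data_len)
{
  (void) plugin_name; (void) type; (void) format;
  (void) data_mime_type; (void) data; (void) data_len;
  ++*(unsigned int *) cls;
  return 0;
}

/* Hypothetical processor: aborts extraction on the first item. */
int stop_immediately(void *cls, const char *plugin_name,
                     enum EXTRACTOR_MetaType type,
                     enum EXTRACTOR_MetaFormat format,
                     const char *data_mime_type, const char *data,
                     size_t data_len)
{
  (void) cls; (void) plugin_name; (void) type; (void) format;
  (void) data_mime_type; (void) data; (void) data_len;
  return 1;
}
```

A caller would pass count_items together with a pointer to an unsigned counter as proc_cls. With stop_immediately, the extract method returns 1 after the first item.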