[Bug-gnupedia] Re: Classification difficulty and incompleteness


From: <address@hidden>
Subject: [Bug-gnupedia] Re: Classification difficulty and incompleteness
Date: Thu, 18 Jan 2001 10:41:45 +1300

> Good stuff ... but we need some 'dotted' classification system such as:
> 
> article.science.biology.genetics.human.gene ... ala Dewey Decimal ... so we
> can do effective searches.
> 
> Could be pull down menus on the submission site etc...
> 
> I also think we want a user-feedback system to correct bad classifications
> and even (pray tell) rate articles for usefulness etc...

Classification has been studied in library science since Alexandria. What it 
comes down to is that classifications are ontologies: sets of assumptions we 
make about how the world works and the relative importance of its different 
parts.

Classification is incomplete in the mathematical sense, and it is unclear 
whether all documents should be classified [see below for explanations of 
these points]. What has been found to work best is:

*) the ability to assign multiple subjects to a document, so a document can be 
both science.biology.genetics.human and ethics.biology
*) separation of subject from format (so films, biographies and articles on a 
topic can be found in the same place).
*) using multiple classification schemes, preferably ones known to and 
understood by the users (this means LoC and Dewey mainly).
*) pointers from one category into another (exemplified by the Yahoo system)
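
The four points above can be sketched as a small data model. This is only an 
illustration in Python, not a proposed implementation: the class names 
(Document, Catalogue), the scheme name "dotted", and the sample titles and 
categories are all hypothetical. It assumes exact category matches and follows 
cross-reference pointers one hop only.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    fmt: str  # format (film, biography, article) is stored apart from subject
    # scheme name -> set of categories assigned under that scheme
    subjects: dict = field(default_factory=lambda: defaultdict(set))

    def classify(self, scheme, category):
        """Assign one more subject; a document may carry many."""
        self.subjects[scheme].add(category)


class Catalogue:
    def __init__(self):
        self.documents = []
        # (scheme, category) -> set of (scheme, category) pointers
        self.see_also = defaultdict(set)

    def add(self, doc):
        self.documents.append(doc)

    def cross_reference(self, source, target):
        """Point from one category into another (one direction only)."""
        self.see_also[source].add(target)

    def find(self, scheme, category, follow_pointers=True):
        wanted = {(scheme, category)}
        if follow_pointers:
            wanted |= self.see_also[(scheme, category)]
        return [
            doc for doc in self.documents
            if any(cat in doc.subjects.get(sch, ()) for sch, cat in wanted)
        ]


# Hypothetical sample data, for illustration only.
catalogue = Catalogue()
paper = Document("An article on human genetics", fmt="article")
paper.classify("dotted", "science.biology.genetics.human")
paper.classify("dotted", "ethics.biology")
film = Document("A film on bioethics", fmt="film")
film.classify("dotted", "ethics.biology")
catalogue.add(paper)
catalogue.add(film)
catalogue.cross_reference(
    ("dotted", "science.biology.genetics.human"),
    ("dotted", "ethics.biology"),
)

# The film and the article sit in the same subject despite differing formats.
same_place = [d.title for d in catalogue.find("dotted", "ethics.biology")]
# Following the pointer from genetics also reaches the ethics material.
via_pointer = [
    d.title for d in catalogue.find("dotted", "science.biology.genetics.human")
]
```

A second scheme (Dewey or LoC, say) would simply be another key passed to 
classify(), so the same document can be found under several schemes at once.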

What has been found not to work is:

*) X.X.general categories as in the ACM classification system.
*) numbering the categories as in both the LoC and Dewey systems, because a 
fixed numbering freezes how much room each subject gets (the Dewey system has 
as much space for Christian material as for all other religious material).
*) unchanging labels on categories. Fifty years ago `automobiles' was a fine 
name for a category; the LoC system still uses `automobile' where `car' would 
be much better.

So what should we use? Personally I believe we should classify using the Dewey 
Decimal and LoC systems in parallel, both with a generous number of 
cross-references. When we have enough articles on a topic (physics, computers, 
etc.) to justify it, we can also include subject-specific classification 
systems.

What the world doesn't need is another ad-hoc classification system.

stuart

An Undecidable Classification Problem
=====================================

Consider a digital library classification scheme that denotes whether a 
document uses humor and, further, whether or not the humor is funny. Consider 
an author writing a piece of humor which relies entirely for its humor on 
being classified as not funny. If classified as funny, the humor fails 
and the document is mis-classified; if classified as not funny, the humor 
succeeds and the document is mis-classified. Either way, the document is 
mis-classified.
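
The argument can be checked mechanically. The following Python sketch is only 
a toy model of the paragraph above: it assumes the joke's funniness depends on 
nothing but the label it receives, and the function names are hypothetical. It 
tries both labels and collects those that would be correct.

```python
LABELS = ("funny", "not funny")


def is_funny(assigned_label):
    """The joke works only when it has been classified as not funny."""
    return assigned_label == "not funny"


def is_correct(assigned_label):
    """A label is correct if it matches how the joke then turns out."""
    true_label = "funny" if is_funny(assigned_label) else "not funny"
    return assigned_label == true_label


# Labels under which the document would be correctly classified.
consistent = [label for label in LABELS if is_correct(label)]
```

Under these assumptions the list of consistent labels is empty: neither 
available label describes the document once it has been applied.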

Such classification schemes exist and are useful in the real world. Consider, 
for example, the newsgroup rec.humor.funny, a moderated newsgroup which tries 
to carry only `funny' humour. Pathological jokes have been attempted (by 
myself) and submitted, but without response from the moderators (who must 
judge the humour of the joke).

It was suggested that this apparent paradox can be resolved because the joke 
is impossible to construct, as it contains an internal paradox (i.e. it's only 
true when it's false). The problem with this argument is that jokes are a 
literary form which has no requirement of internal consistency; indeed many 
famous examples (much of Lewis Carroll's work, for example) contain internal 
contradictions.

Should all documents be classified?
===================================

Consider a new document that is sufficiently metaphorical and allusive that 
it could be about anything (something like the prophecies of Nostradamus). Any 
assignment of a subject classification to the document by a classifier 
instantly places that subject at the forefront of a reader's mind when 
interpreting the document; thus the classifier biases all subsequent readers.


--    stuart yeates <address@hidden> aka `loam'
"Oh, havoc," cried Pooh, as he let slip the heffalumps of war.
X-no-archive:yes



