Re: switch to XHTML
From: Pascal Bourguignon
Subject: Re: switch to XHTML
Date: Sat, 9 Mar 2002 02:55:54 +0100 (CET)
> From: Pete French <pete@twisted.org.uk>
> Date: Fri, 08 Mar 2002 23:42:07 +0000
>
> > designed to be parsed by MILLIONS of machines, and produced by THOUSANDS of
> > people, and it's case sensitive? It is to laugh.
>
> Case sensitivity is an artefact of the Latin alphabet (and various
> derivatives). It doesn't exist as a concept in most human languages, and
> doesn't make any sense anyway - why are 'a' and 'A' the same, other than an
> accident of description? You wouldn't want a case-sensitive filesystem,
> so why a case-sensitive markup language? The same arguments apply
> to programming languages - you don't want SWITCH and WHILE to be valid
> C do you, so why should <WML> and <wml> be equivalent?
Either I don't understand what you mean, or you seem to contradict yourself.
1) 'a' and 'A' are the same for humans.
2) One would not want to have while = WHILE in C.
Then you choose that <WML> be different from <wml>. Aren't you human? ;-)
In Roman scripts, there is this notion of majuscule and minuscule.
I've been told that in Arabic script, each letter can be written four
different ways, depending on its position in the word: isolated,
initial, medial, or final, IIRC.
In Hebrew, most letters have only one form, but some are written
differently when they come last in the word.
In the case of Arabic and Hebrew, one could say that the form is only a
presentation matter and can be determined automatically, therefore
it's meaningless (but not when writing about the Hebrew or Arabic
scripts; then you need to write each form independently).
In European languages, the difference between majuscule and minuscule is
very significant. Obviously so in German, where all nouns are written with
a majuscule. Obviously so in English and in French, where all proper nouns
are written with a majuscule. poubelle != Poubelle. In the first case,
it's a trash can; in the second, it's M. Poubelle, the inventor of the
poubelle. Very significant, I would say.
On the other hand, if you see a sign written "EXIT", "Exit" or "exit"
it does not matter, you know it's where you have to go to exit.
In conclusion, it all depends on the context.
In programming languages and other computer languages, some are case
sensitive (MODULA-2, MODULA-3, C), some are not (LISP, Pascal, Ada).
In general, seasoned programmers or computer users will prefer case
sensitivity. It allows one to express nuances like Poubelle/poubelle. It
is also easier and faster to compare strings in a case-sensitive way,
because you can have a bijection between the characters and the codes.
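To make the bijection point concrete, here is a minimal Python sketch (the function names are made up for illustration). A case-sensitive comparison only has to compare the sequences of codes, while a case-insensitive one must first map every character through a folding table.

```python
def code_points(text: str) -> list:
    """Map each character to its code; chr() undoes the mapping one-to-one."""
    return [ord(ch) for ch in text]


def equal_case_sensitive(a: str, b: str) -> bool:
    # Comparing the codes is all that is needed.
    return code_points(a) == code_points(b)


def equal_case_insensitive(a: str, b: str) -> bool:
    # An extra folding pass is required, and the nuance
    # between Poubelle and poubelle is lost on the way.
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print(equal_case_sensitive("Poubelle", "poubelle"))
    print(equal_case_insensitive("Poubelle", "poubelle"))
```

The first comparison keeps M. Poubelle and the trash can apart; the second cannot.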
However, people in general are not sensitive to case (they're not
case sensitive!). That's why user-oriented file systems are not case
sensitive (Macintosh, MS-DOS), and why teaching languages like Pascal
and Basic are not case sensitive either. But then you have problems
such as matching accented letters (in a French dictionary, é = è = ê =
e = É = È = Ê = E, but in a Spanish dictionary, n != ñ; mind you, in
a Spanish dictionary rr and ll are considered single letters to be
put after r and l, not after rq and lk). And of course, that means
that you now have to know all the encodings your data may be in before
you can compare strings, and you need to know the language in which
your data is written. In ISO Latin-1, the relation between majuscule
and minuscule accented letters is homogeneous, but that's not the case
in the Macintosh encoding. That means that you need to know exactly which
encoding you're working with before you're able to be case insensitive
with accented letters. Unicode did not improve this situation, because you
have several equivalent encodings for the same letter, not counting
case and accents... So you end up with quite complicated algorithms to
compare strings in a case-insensitive way. That's why there's a
whole "Manager" dedicated to string comparison and sorting in MacOS.
Here, you need a lot of context before you can meaningfully compare
strings: language, country, script, encoding, etc.
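A rough Python sketch of two of these points, under some assumptions: Python's "latin-1" and "mac_roman" codecs stand in for the two encodings mentioned, and the helper names are invented for the example. It measures the byte distance between upper- and lower-case accented letters in each encoding, then shows that two equivalent Unicode spellings of "é" only compare equal after normalization.

```python
import unicodedata

# (upper, lower) pairs: E acute, N tilde, A grave, written as escapes
# so that the source file's own encoding does not matter.
PAIRS = [("\u00c9", "\u00e9"), ("\u00d1", "\u00f1"), ("\u00c0", "\u00e0")]


def case_offsets(encoding: str) -> set:
    """Byte distances between the upper and lower form of each pair."""
    return {lo.encode(encoding)[0] - up.encode(encoding)[0] for up, lo in PAIRS}


def caseless_equal(a: str, b: str) -> bool:
    """Compare after decomposing, case folding, and decomposing again."""
    def key(s: str) -> str:
        decomposed = unicodedata.normalize("NFD", s)
        return unicodedata.normalize("NFD", decomposed.casefold())
    return key(a) == key(b)


if __name__ == "__main__":
    print(case_offsets("latin-1"))    # a single constant offset
    print(case_offsets("mac_roman"))  # more than one offset
    precomposed = "\u00e9"            # one code point
    combining = "e\u0301"             # 'e' followed by a combining acute accent
    print(precomposed == combining)
    print(caseless_equal("\u00c9", combining))
```

This only handles case and canonical equivalence. Dictionary rules such as é = e in French, or the place of ñ in Spanish, still need language-specific collation on top.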
HTML is not case sensitive because it was conceived to be written by
humans. What was overlooked is that it's interpreted by
computers a lot more than it's written by humans, and soon enough it
was even generated automatically by computers. How many HTML geeks
exist who write their own HTML like me? In retrospect, it would have
been more efficient if HTML had been case sensitive, with all tags
written in only one form.
Therefore, it's natural that new markup languages such as XML or
XHTML be case sensitive (I would say code sensitive!), because that
simplifies and accelerates the string matching needed to parse the
documents. You, err, not you, the computer only has to compare the
bytes.
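As an illustration only (this is not how any particular parser is written, and the function names are hypothetical), the difference for tag names can be sketched in Python on raw bytes:

```python
def xml_tag_equal(a: bytes, b: bytes) -> bool:
    # Case-sensitive markup: a plain byte-for-byte comparison.
    return a == b


def html_tag_equal(a: bytes, b: bytes) -> bool:
    # Case-insensitive markup: both names must be folded first.
    # bytes.lower() only folds the ASCII letters A-Z.
    return a.lower() == b.lower()


if __name__ == "__main__":
    print(xml_tag_equal(b"wml", b"WML"))
    print(html_tag_equal(b"wml", b"WML"))
```

The case-insensitive version does strictly more work per tag, and it only stays this simple as long as tag names are restricted to ASCII.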
> > There are more standards than there are applications to comply with them.
> > Every new standard, including XHTML, makes the problem worse.
>
> This is true. But I have yet to see a good argument as to why HTML should not
> have been made case-sensitive in the first place.
>
> -bat.
>
--
__Pascal_Bourguignon__ (o_ Software patents are endangering
() ASCII ribbon against html email //\ the computer industry all around
/\ and Microsoft attachments. V_/ the world http://lpf.ai.mit.edu/
1962:DO20I=1.100 2001:my($f)=`fortune`; http://petition.eurolinux.org/
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/IT d? s++:++(+++)>++ a C+++ UB+++L++++$S+X++++>$ P- L+++ E++ W++
N++ o-- K- w------ O- M++$ V PS+E++ Y++ PGP++ t+ 5? X+ R !tv b++(+)
DI+++ D++ G++ e+++ h+(++) r? y---? UF++++
------END GEEK CODE BLOCK------