discuss-gnustep

Re: switch to XHTML


From: Pascal Bourguignon
Subject: Re: switch to XHTML
Date: Sat, 9 Mar 2002 02:55:54 +0100 (CET)

> From: Pete French <pete@twisted.org.uk>
> Date: Fri, 08 Mar 2002 23:42:07 +0000
> 
> > designed to be parsed by MILLIONS of machines, and produced by THOUSANDS of
> > people, and it's case sensitive? It is to laugh.
> 
> Case sensitivity is an artefact of the Latin alphabet (and various
> derivatives). It doesn't exist as a concept in most human languages, and
> doesn't make any sense anyway - why are 'a' and 'A' the same, other than an
> accident of description? You wouldn't want a case-sensitive filesystem,
> so why a case-sensitive markup language? The same arguments apply
> to programming languages - you don't want SWITCH and WHILE to be valid
> C, do you, so why should <WML> and <wml> be equivalent?

Either I don't understand what you mean, or you are contradicting yourself.

       1) 'a' and 'A' are the same for humans.
       2) One would not want while = WHILE in C.

Then you choose to make <WML> different from <wml>. Aren't you human?  ;-)

In Roman scripts, there is the notion of majuscule and minuscule.

I've been told that in Arabic script each letter can be written in up to
four different ways, depending on its position in the word: isolated,
initial, medial and final.

In Hebrew, most letters have only one form, but a few are written
differently when they come last in a word.

In the case of Arabic and Hebrew, one could say that the form is only a
matter of presentation and can be determined automatically, and is
therefore meaningless (except when writing about the Hebrew or Arabic
scripts themselves, where you need to write each form separately).

In European languages, the difference between majuscule and minuscule is
very significant. Obviously so in German, where all nouns are written
with a majuscule. Obviously too in English and in French, where all
proper nouns are written with a majuscule. poubelle != Poubelle. In the
first case it's a trash can; in the second it's M. Poubelle, the
inventor of the poubelle. Very significant, I would say.

On the other hand, if you see a sign reading "EXIT", "Exit" or "exit",
it does not matter: you know that's where you go to get out.

In conclusion, it all depends on the context.
 

Among programming languages and other computer languages, some are case
sensitive (Modula-2, Modula-3, C) and some are not (Lisp, Pascal, Ada).

In general, seasoned programmers and computer users prefer case
sensitivity. It lets you express nuances like Poubelle/poubelle. It is
also easier and faster to compare strings in a case-sensitive way,
because there is a bijection between the characters and their codes.
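As a rough illustration of that last point, here is a minimal Python sketch (the function names are hypothetical): a case-sensitive comparison is a plain byte comparison, while even the simplest case-insensitive one must fold each side first. It only handles ASCII letters, which is exactly the case where folding is easy.

```python
def same_case_sensitive(a: bytes, b: bytes) -> bool:
    # One code per character: equality is a plain byte-by-byte comparison.
    return a == b


def same_ignoring_ascii_case(a: bytes, b: bytes) -> bool:
    # Both sides must be folded first; bytes.lower() maps only A-Z to a-z
    # and leaves every other byte (including accented Latin-1 letters) alone.
    return a.lower() == b.lower()


print(same_case_sensitive(b"Poubelle", b"poubelle"))       # False: M. Poubelle vs. a trash can
print(same_ignoring_ascii_case(b"Poubelle", b"poubelle"))  # True: the nuance is lost
print(same_ignoring_ascii_case(b"EXIT", b"exit"))          # True: what the reader of a sign expects
```

Note that same_ignoring_ascii_case still treats the Latin-1 bytes for "É" and "é" as different, which is where the trouble described next begins.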

However, people in general are not sensitive to case (they're not case
sensitive!). That's why user-oriented file systems are not case
sensitive (Macintosh, MS-DOS), and why teaching languages like Pascal
and Basic are not case sensitive either. But then you have problems
such as matching accented letters (in a French dictionary, é = è = ê =
e = É = È = Ê = E, but in a Spanish dictionary, n != ñ; mind you, in a
Spanish dictionary rr and ll are considered single letters, sorted after
r and l, not after rq and lk).

And of course, that means you now have to know every encoding your data
may be in before you can compare a string, and you need to know the
language your data is written in. In ISO Latin-1, the relationship
between accented majuscules and accented minuscules is uniform, but
that's not the case in the Macintosh encoding. So you need to know
exactly which encoding you're working with before you can be case
insensitive with accented letters. Unicode did not improve the
situation, because it has several equivalent encodings for the same
letter, not counting case and accents... So you end up with quite
complicated algorithms to compare strings case-insensitively. That's
why MacOS has a whole "Manager" dedicated to string comparison and
sorting. You need a lot of context before you can meaningfully compare
strings: language, country, script, encoding, etc.
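To make the encoding problem concrete, here is a minimal Python sketch, assuming Python 3 with its standard unicodedata module and mac_roman codec (the helper names nfc, strip_accents and french_key are hypothetical). It shows one letter with two Unicode encodings, a case mapping that is regular in ISO Latin-1 but not in the Macintosh encoding, and an accent-insensitive key that suits French but not Spanish. Real collation uses locale-aware rules rather than anything this naive.

```python
import unicodedata

# Two Unicode encodings of the same visible letter "é".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
print(precomposed == decomposed)  # False: different code points


def nfc(s: str) -> str:
    # Canonical composition makes canonically equivalent sequences identical.
    return unicodedata.normalize("NFC", s)


print(nfc(precomposed) == nfc(decomposed))  # True


# The same pair of letters in two 8-bit encodings: ISO Latin-1 keeps a fixed
# 0x20 offset between É and é, the Mac Roman encoding does not.
for codec in ("latin-1", "mac_roman"):
    upper, lower = "É".encode(codec)[0], "é".encode(codec)[0]
    print(codec, hex(upper), hex(lower), lower - upper)
# latin-1 0xc9 0xe9 32
# mac_roman 0x83 0x8e 11


def strip_accents(s: str) -> str:
    # Decompose each letter, then drop the combining marks.
    parts = unicodedata.normalize("NFD", s)
    return "".join(c for c in parts if not unicodedata.combining(c))


def french_key(s: str) -> str:
    # A French-dictionary-style key: ignore accents and case.
    return strip_accents(s).casefold()


print(french_key("Été") == french_key("ete"))  # True
# The same rule is wrong for Spanish, where ñ is a separate letter:
print(strip_accents("año") == strip_accents("ano"))  # True, yet they are different words
```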


HTML is not case sensitive because it was conceived to be written by
humans. What was overlooked is that it is interpreted by computers far
more often than it is written by humans, and soon enough it was even
generated automatically by computers. How many HTML geeks like me still
write their own HTML by hand? In retrospect, it would have been more
efficient if HTML had been case sensitive, with all tags written in
only one form.

Therefore, it's natural that new markup languages such as XML and XHTML
are case sensitive (I would say code sensitive!), because it simplifies
and speeds up the string matching needed to analyze the documents. You,
err, not you, the computer, only has to compare the bytes.
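A minimal Python sketch of that difference, using only the standard library's html.parser and xml.etree.ElementTree modules (the TagCollector class and the sample document are hypothetical): the HTML parser folds tag names to lower case before passing them on, while the XML parser compares names exactly and rejects a close tag whose case does not match.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records tag names as Python's HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # tag names arrive already lowercased

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


doc = "<WML><P>Hello</p></wml>"

collector = TagCollector()
collector.feed(doc)
collector.close()
print(collector.tags)  # ['wml', 'p', '/p', '/wml']: HTML folds case before matching

try:
    ET.fromstring(doc)
    xml_rejected = False
except ET.ParseError as err:
    xml_rejected = True  # XML compares names exactly, so <P> does not match </p>
    print("XML parser:", err)

# In XML, names that differ only in case are simply different names.
print(ET.fromstring("<wml/>").tag, ET.fromstring("<WML/>").tag)  # wml WML
```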


> > There are more standards than there are applications to comply with them.
> > Every new standard, including XHTML, makes the problem worse. 
> 
> This is true. But I have yet to see a good argument as to why HTML should not
> have been made case-sensitive in the first place.
> 
> -bat.


-- 
__Pascal_Bourguignon__              (o_ Software patents are endangering
()  ASCII ribbon against html email //\ the computer industry all around
/\  and Microsoft attachments.      V_/ the world http://lpf.ai.mit.edu/
1962:DO20I=1.100  2001:my($f)=`fortune`;  http://petition.eurolinux.org/

-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/IT d? s++:++(+++)>++ a C+++  UB+++L++++$S+X++++>$ P- L+++ E++ W++
N++ o-- K- w------ O- M++$ V PS+E++ Y++ PGP++ t+ 5? X+ R !tv b++(+)
DI+++ D++ G++ e+++ h+(++) r? y---? UF++++
------END GEEK CODE BLOCK------




