
Re: [Unac-devel] problem encoding ß

From: mark warren bracher
Subject: Re: [Unac-devel] problem encoding ß
Date: Fri, 06 Sep 2002 11:51:37 -0700
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2a) Gecko/20020906

Loic Dachary wrote:
    > mark warren bracher writes:
    >  > I downloaded the latest unac lib and the Text::Unaccent Perl
    >  > module, and it all looks great.
    >  >
    >  > I started throwing as much Spanish/French/German as I remember
    >  > at it, and I've come up with one oddity.  The German S-set (or
    >  > sz; those are the only two names I know it by) ß passes straight
    >  > through any attempt to unaccent.  It should be encoded as
    >  >
    >  >    ß -> ss
    >  >
    >  > in much the same way that the ae ligature is
    >  >
    >  >    æ -> ae
    >  >
    >        I could not say which transformation is the correct one. Is it
    > ß -> ß or is it ß -> ss? Could you get an authoritative answer? I'll
    > implement whatever is needed to comply with it.
    >        Cheers,

Ok, that's a bit of a complex theological question...


The two are equivalent, but...  Grammar rules are moving towards
using ß in some instances (after long vowels and diphthongs) and ss in
others (after short vowels).  Which is to say that the desire is for
the spelling to be evident from the pronunciation.

As to mapping rules, I suppose it comes down to the intent of the unac
lib.  My specific intent is to use it to augment keyword buckets in a
search engine, so that matches can be found without requiring accented
characters in the query string.  Someone searches for 'Maná' and finds
Maná; they search for 'Mana' and find Maná.  A search for 'Straße' finds
Straße; a search for 'Strasse' finds Straße.

From a practical perspective, when typing German on a keyboard without
a mapping for ß, I was taught to use ss as the grammatical equivalent.

I'm not sure that's a definitive answer; this probably bears more
discussion...  Are there any native German speakers on this list who
care to chime in?  I grew up in the U.S., but my parents are
fluent in German and it was spoken at home.  Since I am not a
'native' speaker, I don't necessarily trust my judgement as an
'authoritative answer'.

- mark
