
Re: [Maposmatic-dev] Missing city name in result files!

From: Jeroen van Rijn
Subject: Re: [Maposmatic-dev] Missing city name in result files!
Date: Thu, 14 Jan 2010 11:35:34 +0100

On Thu, Jan 14, 2010 at 09:12, David MENTRE <address@hidden> wrote:
> Two short answers, in addition to those already provided:
>  * This is known and expected behaviour that we decided on when we
> started MapOSMatic. We did it out of fear of security issues when
> including unknown characters in file names. Yes, it was a
> conservative approach! :-)
>  * This choice is obviously wrong and we should do something about
> it. But as Thomas said, the answer is not obvious. Maybe we should
> start with a simple solution, as suggested by Jeroen;

True, punycode is easy to implement, and because the result is a
latinized string there are no security implications in using it to
name a file. It would be a good stopgap solution that is relatively
painless and quick to implement, at that. The user receiving the file
then has the option of renaming it, as they already do.
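To make the stopgap concrete, here is a minimal sketch using Python's built-in "punycode" codec (the helper name and the example city are mine, not anything from the MapOSMatic code):

```python
# Sketch: derive an ASCII-safe filename from a city name using Python's
# built-in "punycode" codec. Names here are illustrative only.

def safe_filename(city, extension="pdf"):
    """Punycode-encode a city name so the resulting filename is plain ASCII."""
    encoded = city.encode("punycode").decode("ascii")
    return "%s.%s" % (encoded, extension)

# "münchen" comes out as the latinized string "mnchen-3ya", so the file
# name contains only ASCII characters and stays trivially safe to handle.
print(safe_filename(u"m\u00fcnchen"))
```

The encoding is lossless, so a curious user (or a later version of the service) can always recover the original name with `.decode("punycode")`.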

Going with UTF-8 encoded filenames later on, after researching the
options (the RFC 3987 mentioned by Thomas comes to mind), remains
possible as a gradual enhancement to the MapOSMatic service.

As far as the end-user is concerned, I know that NTFS can store
Unicode filenames (barring the usual candidates like : and \), and
most if not all Unix filesystems in use have no problem handling UTF-8
filenames either. This then brings us to browsers, which are presented
with a UTF-8 URL, are asked to convey it to the server, and then
receive a file with that name set as a hint.

After all, for a file download served to the browser, the name is
nothing more than a hint that most browsers honour because it's
convenient. A browser which doesn't understand the filename could just
as easily accept the file data in question but name it
'whatthehellisthisname.doc'.
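One way to hedge against such browsers, sketched below, is to send both a plain-ASCII fallback and a percent-encoded UTF-8 name in the Content-Disposition header, in the RFC 2231 extended-parameter style (this is an illustration of the convention, not MapOSMatic's actual code):

```python
from urllib.parse import quote

def content_disposition(filename):
    """Build a Content-Disposition value carrying an ASCII fallback
    plus a percent-encoded UTF-8 name (RFC 2231 extended parameter).
    Illustrative sketch only, not production code."""
    # Fallback for browsers that only understand plain filename=...
    fallback = filename.encode("ascii", "replace").decode("ascii")
    # Percent-encode every byte of the UTF-8 form for filename*=...
    encoded = quote(filename, safe="")
    return ("attachment; filename=\"%s\"; filename*=UTF-8''%s"
            % (fallback, encoded))

print(content_disposition(u"K\u00f6ln.pdf"))
```

A browser that understands the extended parameter gets the full Unicode name; one that doesn't falls back to the mangled-but-usable ASCII hint, which matches the "the name is only a hint" observation above.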

From my limited study this morning, FF 3+ and Chrome (the beta I'm
running, at least) have no problem browsing the Arabic Wikipedia site
and clicking through to its articles. I know that as of a year ago,
such Wikipedia URLs used URL encoding for non-Latin characters. This
suggests that maybe they detect the browser and present URL-encoded or
UTF-8 URLs depending on what they know its capabilities to be.
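The URL-encoding half of that is easy to check from the standard library; a quick sketch (the Cyrillic title below is merely an illustrative example, not one of the articles I tested):

```python
from urllib.parse import quote, unquote

# Percent-encode a non-Latin path segment the way Wikipedia-style URLs
# do, then verify the round trip back to the original string.
title = u"\u041c\u043e\u0441\u043a\u0432\u0430"  # "Москва"
encoded = quote(title, safe="")                  # %D0%9C%D0%BE...
decoded = unquote(encoded)

print(encoded)
print(decoded == title)
```

Since the percent-encoded form is pure ASCII, it works in any browser; the open question is only which browsers can also display and follow the raw UTF-8 form.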

Feel free to assign me a task on the Savannah tracker to investigate
UTF-8, URL-encoded and/or punycoded URLs.

There's a third possible approach, somewhere between punycoding and
going full UTF-8: writing a Russian-to-Latin transliteration class, an
Arabic-to-Latin one, a KOI8-to-Latin one, a Big5-to-Latin one, and so
on. That effort is perhaps better spent on finding a more permanent
and satisfying solution, as Wikipedia apparently did.

"That's like saying that a squirrel is 48% juicier than an orange -
maybe it's true, but anybody who puts the two in a blender to compare
them is kind of sick." -- Linus Torvalds
