qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] decodetree: Open files with encoding='utf-8'


From: Philippe Mathieu-Daudé
Subject: Re: [PATCH] decodetree: Open files with encoding='utf-8'
Date: Fri, 8 Jan 2021 17:44:32 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0

On 1/8/21 4:38 PM, Peter Maydell wrote:
> On Fri, 8 Jan 2021 at 15:16, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>>
>> When decodetree.py was added in commit 568ae7efae7, QEMU was
>> using Python 2 which happily reads UTF-8 files in text mode.
>> Python 3 requires either UTF-8 locale or an explicit encoding
>> passed to open(). Now that Python 3 is required, explicit
>> UTF-8 encoding for decodetree sources.
>>
>> This fixes:
>>
>>   $ /usr/bin/python3 scripts/decodetree.py test.decode
>>   Traceback (most recent call last):
>>     File "scripts/decodetree.py", line 1397, in <module>
>>       main()
>>     File "scripts/decodetree.py", line 1308, in main
>>       parse_file(f, toppat)
>>     File "scripts/decodetree.py", line 994, in parse_file
>>       for line in f:
>>     File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>>       return codecs.ascii_decode(input, self.errors)[0]
>>   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
>>   ordinal not in range(128)
>>
>> Reported-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> ---
>>  scripts/decodetree.py | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
>> index 47aa9caf6d1..fa40903cff1 100644
>> --- a/scripts/decodetree.py
>> +++ b/scripts/decodetree.py
>> @@ -1304,7 +1304,7 @@ def main():
>>
>>      for filename in args:
>>          input_file = filename
>> -        f = open(filename, 'r')
>> +        f = open(filename, 'r', encoding='utf-8')
>>          parse_file(f, toppat)
>>          f.close()
> 
> Should we also be opening the output file explicitly as
> utf-8 ? (How do we say "write to sys.stdout as utf-8" for
> the case where we're doing that?)

I have been wondering about it, but the content written
in the output file is plain C code using only ASCII,
which any locale is able to process. But indeed maybe
we prefer ignore the user locale... I'm not sure.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]