Re: [PATCH] decodetree: Open files with encoding='utf-8'
From: Philippe Mathieu-Daudé
Subject: Re: [PATCH] decodetree: Open files with encoding='utf-8'
Date: Fri, 8 Jan 2021 17:44:32 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0

On 1/8/21 4:38 PM, Peter Maydell wrote:
> On Fri, 8 Jan 2021 at 15:16, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>>
>> When decodetree.py was added in commit 568ae7efae7, QEMU was
>> using Python 2 which happily reads UTF-8 files in text mode.
>> Python 3 requires either a UTF-8 locale or an explicit encoding
>> passed to open(). Now that Python 3 is required, pass an explicit
>> UTF-8 encoding when opening decodetree sources.
>>
>> This fixes:
>>
>> $ /usr/bin/python3 scripts/decodetree.py test.decode
>> Traceback (most recent call last):
>>   File "scripts/decodetree.py", line 1397, in <module>
>>     main()
>>   File "scripts/decodetree.py", line 1308, in main
>>     parse_file(f, toppat)
>>   File "scripts/decodetree.py", line 994, in parse_file
>>     for line in f:
>>   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>>     return codecs.ascii_decode(input, self.errors)[0]
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: ordinal not in range(128)
>>
>> Reported-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> ---
>> scripts/decodetree.py | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
>> index 47aa9caf6d1..fa40903cff1 100644
>> --- a/scripts/decodetree.py
>> +++ b/scripts/decodetree.py
>> @@ -1304,7 +1304,7 @@ def main():
>>
>>      for filename in args:
>>          input_file = filename
>> -        f = open(filename, 'r')
>> +        f = open(filename, 'r', encoding='utf-8')
>>          parse_file(f, toppat)
>>          f.close()
>
> Should we also be opening the output file explicitly as
> utf-8 ? (How do we say "write to sys.stdout as utf-8" for
> the case where we're doing that?)
I have been wondering about that, but the content written
to the output file is plain C code using only ASCII,
which any locale can encode. Still, maybe we would
prefer to ignore the user's locale altogether... I'm not sure.
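
As for "write to sys.stdout as utf-8": one possible approach is to wrap
the binary stream underneath sys.stdout in a UTF-8 text layer. The
following is only a sketch, not taken from decodetree.py; the helper
names utf8_writer() and open_output() are made up for illustration.

```python
import io
import sys


def utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding='utf-8')


def open_output(path=None):
    """Hypothetical helper: open `path` for writing as UTF-8.

    When `path` is None, wrap stdout instead of opening a file.
    """
    if path is None:
        # sys.stdout.buffer is the byte stream under sys.stdout;
        # wrapping it sidesteps the locale-derived encoding.
        return utf8_writer(sys.stdout.buffer)
    return open(path, 'w', encoding='utf-8')
```

Note that closing the wrapper also closes the underlying stdout buffer,
so the caller should flush() it rather than close() it in the stdout
case. On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8')
does the same job in place, but the traceback above comes from a
Python 3.6 host where that method is not available.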