[gnulib-tool-py] Python 2 vs Python 3: operating with strings
From: address@hidden
Subject: [gnulib-tool-py] Python 2 vs Python 3: operating with strings
Date: Sat, 28 Apr 2012 21:32:01 +0000
Hello everyone.
I have one problem: some functions in the code capture stdout after executing
shell commands. Everything that comes back from the shell has type 'str' in
Python 2 and type 'bytes' in Python 3. Everything works in Python 2, because
plain English (ASCII) 'str' values are converted to 'unicode' implicitly, but
Python 3 is strict about string types: you cannot concatenate 'str' and
'bytes' without saying which encoding to use. So we have a portability
problem. What options do we have?
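To make the problem concrete, here is a minimal sketch. It assumes only that a child process prints ASCII text; sys.executable is used as the child so the example does not depend on any particular shell command, and the variable names are illustrative.

```python
import subprocess
import sys

# Capture the stdout of a child process. On Python 3 this is 'bytes';
# on Python 2 it is 'str'.
raw = subprocess.check_output([sys.executable, "-c", "print('hello')"])

# On Python 3, mixing 'str' and 'bytes' raises TypeError.
# On Python 2 the same concatenation succeeds.
try:
    combined = "gnulib " + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decoding with an explicit encoding works on both versions.
decoded = raw.decode("ascii")
combined = "gnulib " + decoded.strip()
```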
1. We can check which encodings are used on different systems. All we need is
to evaluate a few expressions:
a) sys.getdefaultencoding()
b) sys.stdout.encoding
c) sys.getfilesystemencoding()
As far as I know, Linux uses 'UTF-8' almost everywhere, but on native Windows
we have 'cp1251', 'cp866', 'mbcs', and it depends on the locale too. I think
that Cygwin uses 'UTF-8' as well. However, we would need to check everything
and then write conditions for how to convert bytes to strings under Python 3.
2. We could use my streaming and fileutils package, which would solve this
problem completely. There are two drawbacks: 1) I haven't yet ported this
package to Python 3; 2) the package must be recompiled for each system.
However, both approaches can be improved.
1. Maybe we don't need to know all the encodings. I think we can do something
like this:
    import subprocess
    import sys
    PY3K = sys.version_info[0] >= 3  # one possible definition of the flag
    result = subprocess.check_output(args)
    if PY3K:
        result = str(result, sys.stdout.encoding)
    result = 'gnulib ' + result
2. I can take from my streaming module only the parts that work cross-platform
and create equivalent bstream and ustream classes. That would let us avoid
conditions when converting bytes to strings. This approach really combines
options 1 and 2. Another plus is that it can be used as a separate module. Of
course, this module would be pure Python, not Cython.
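As a rough illustration of hiding the conversion in one place, here is a minimal pure-Python sketch. It is not the bstream/ustream design from the streaming module; to_text and run are hypothetical names, and falling back from sys.stdout.encoding to the locale encoding and then to 'utf-8' is an assumption about what a reasonable default would be.

```python
import locale
import subprocess
import sys


def to_text(data, encoding=None):
    """Return data as text (hypothetical helper).

    Byte strings are decoded; anything else is returned unchanged.
    On Python 2 'bytes' is an alias for 'str', so the result is 'unicode'.
    """
    if not isinstance(data, bytes):
        return data
    # Assumed fallback order: explicit argument, stdout, locale, UTF-8.
    encoding = (encoding
                or getattr(sys.stdout, "encoding", None)
                or locale.getpreferredencoding(False)
                or "utf-8")
    # Replace undecodable bytes rather than raising an exception.
    return data.decode(encoding, "replace")


def run(args):
    """Run a command and return its stdout as text (hypothetical helper)."""
    return to_text(subprocess.check_output(args))
```

With helpers like these, calling code could write 'gnulib ' + run(args) without checking the Python version itself.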
So what do you think? The problem needs to be solved. I think the second way
is better, but I would like to know your opinion.