gnulib-tool-py

[gnulib-tool-py] Python 2 vs Python 3: operating with strings


From: address@hidden
Subject: [gnulib-tool-py] Python 2 vs Python 3: operating with strings
Date: Sat, 28 Apr 2012 21:32:01 +0000

Hello everyone.

I have one problem: there are some functions in the code which capture stdout 
after executing shell commands. All content coming from the shell has type 'str' 
in Python 2 and type 'bytes' in Python 3. Everything works in Python 2, because 
ASCII-only 'str' values are converted to 'unicode' implicitly, but Python 3 is 
strict about string types. You cannot concatenate 'str' and 'bytes' without 
decoding the bytes with an explicit encoding. So we have a portability problem. 
What options do we have?

1. We can check which encodings are used on different systems. All we need is to 
evaluate a few expressions:
    a) sys.getdefaultencoding()
    b) sys.stdout.encoding
    c) sys.getfilesystemencoding()
I know that on Linux we usually have 'UTF-8', but on native Windows we have 
'cp1251', 'cp866' or 'mbcs', and it depends on the locale too. I think that Cygwin 
uses 'UTF-8' as well. However, we would need to check everything and then write 
conditions for how to convert bytes to strings when running under Python 3.
2. We could use my package, streaming and fileutils, which could solve this 
problem completely. There are two drawbacks: 1) I haven't converted this 
package to Python 3 yet; 2) the package must be recompiled for each system.
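
The three checks from option 1 can be collected in a few lines. This is only a 
sketch; the name report_encodings is hypothetical and is not part of 
gnulib-tool-py.

```python
import sys


def report_encodings():
    """Return the three encodings listed in option 1 as a dict."""
    return {
        'default': sys.getdefaultencoding(),
        # May be None, e.g. in Python 2 when stdout is redirected.
        'stdout': getattr(sys.stdout, 'encoding', None),
        'filesystem': sys.getfilesystemencoding(),
    }


if __name__ == '__main__':
    for name, value in sorted(report_encodings().items()):
        print('%-10s %s' % (name, value))
```

Running this once per target system (Linux, native Windows, Cygwin) would show 
how much the values really differ before we write any conversion conditions.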

However, both approaches can be improved.

1. Maybe we don't need to know all the encodings. I think that we can do 
something like this:

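import subprocess
import sys
PY3K = sys.version_info[0] >= 3
result = subprocess.check_output(args)
if PY3K:
  # check_output returns 'bytes' in Python 3; decode before concatenating.
  result = str(result, sys.stdout.encoding)
result = 'gnulib ' + result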

2. I can take from my streaming module only the parts that work cross-platform and 
create the same bstream and ustream classes. That would let us avoid 
conditions when converting bytes to strings. This approach really combines options 
1 and 2. Another advantage is that it can be used as a separate module. Of 
course this module will use pure Python, not Cython.
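
To make the two improvements more concrete, here is a rough pure-Python sketch. 
All names in it (guess_encoding, to_text, TextStream, run_text) are hypothetical; 
it does not reproduce the real bstream and ustream classes from my streaming 
module, and the fallback to the locale encoding is an assumption rather than a 
tested rule.

```python
import io
import locale
import subprocess
import sys


def guess_encoding():
    """Pick an encoding for decoding command output (hypothetical helper)."""
    # sys.stdout.encoding can be None, so fall back to the locale, then UTF-8.
    return (getattr(sys.stdout, 'encoding', None)
            or locale.getpreferredencoding(False)
            or 'utf-8')


def to_text(data, encoding=None):
    """Return text for either bytes or text input."""
    if isinstance(data, bytes) and not isinstance(data, str):
        # True only on Python 3, where 'bytes' and 'str' are distinct types.
        return data.decode(encoding or guess_encoding(), 'replace')
    return data


class TextStream(object):
    """Hypothetical wrapper: reads a binary stream, returns text on Python 3."""

    def __init__(self, raw, encoding=None):
        self.raw = raw
        self.encoding = encoding

    def read(self):
        return to_text(self.raw.read(), self.encoding)


def run_text(args):
    """Run a command and return its stdout ready for concatenation."""
    return to_text(subprocess.check_output(args))


if __name__ == '__main__':
    print('gnulib ' + run_text([sys.executable, '-c', 'print("ok")']))
    print('gnulib ' + TextStream(io.BytesIO(b'tool'), 'ascii').read())
```

With helpers like these the calling code no longer needs an "if PY3K" branch: 
the type check lives in one place, and callers can concatenate the result 
directly.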

So what do you think? The problem needs to be solved. I think the second 
approach is better, but I would like to hear your opinion.


