Porting to Python 3 Redux

(lucumr.pocoo.org)

103 points | dous | 13y ago | 18 comments

aidos 13y ago

It's great to see Armin's continued work on this. Things have obviously progressed since he wrote his post on the subject [0] (over a year ago). He's written so much of the code I rely on daily that I was concerned that he'd lose interest in Python during the 2-3 transition (I know it's not completely on him to support it but he is a key contributor to Python's use on the web).

[0] http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/

kmike84 13y ago

Great writeup (and a cool metaclass workaround!)

However, I think that "Drop 2.5, 3.1 and 3.2" advice is bad - dropping 2.5 and 3.1 is the way to go (hey, drop 2.5 even if you're not porting to 3.x), but dropping 3.2 is not necessary in most cases.

In my experience (porting and maintaining 20 open-source packages that work with Python 2 and Python 3 using a single codebase, including NLTK) Python 3.2 has never been a problem - I don't see how NLTK code and code of my other packages could be improved by dropping Python 3.2 compatibility.

The main argument for dropping Python 3.2 support seems to be that u'strings' are not supported in Python 3.2. There are 3 "types" of strings in Python:

* b"bytes",

* "native strings" # bytes in 2.x and unicode in 3.x

* u"unicode"

By adding a `from __future__ import unicode_literals` line to the top of the file, code compatible with 2.6-3.2 could be written like this:

* b"bytes"

* str("native string") # bytes in 2.x and unicode in 3.x

* "unicode"

In my opinion this is not a hack (unlike six.b and six.u, which are necessary for 2.5 support), and it is arguably closer to Python 3.x semantics (unicode strings are the default). So IMHO while using the u"unicode" feature from Python 3.3 makes porting somewhat easier (less stupid search-replace), it could also make code worse and more cluttered, and Python 3.2-compatible syntax is just fine.
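The three spellings above can be sketched as a small module. This is only an illustration of the `unicode_literals` approach under the assumption of Python 2.6+ or 3.x; the `describe` helper and the variable names are hypothetical and not taken from NLTK or any other package.

```python
# The future import must be the first statement in the module
# (only comments and a docstring may come before it).
from __future__ import unicode_literals

raw = b"bytes"                 # bytes on 2.6+ and on 3.x
native = str("native string")  # bytes on 2.x, unicode text on 3.x
text = "unicode"               # unicode on both, because of the future import

# With the future import, type("") is `unicode` on 2.x and `str` on 3.x.
text_type = type("")


def describe(value):
    """Hypothetical helper: report whether a string value is text or bytes."""
    return "text" if isinstance(value, text_type) else "bytes"
```

On 3.x `describe(native)` reports text, on 2.x it reports bytes, which is exactly the "native string" behaviour the list describes.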

It is true that 3.3 brings other improvements (Armin mentioned binary codecs), but it is quite rare that the library actually needs them (even libraries as big as NLTK and Django are fine with 3.2 stdlib).

3.2 is the default 3.x Python in the current Ubuntu LTS (EOL in 2017) and in the recently released Debian Wheezy; 3.2 will be around for a long time, and not supporting it will hurt. So if you're doing Python 3.x porting, please just fix those stupid u'strings' with the unicode_literals future import - your code will be more idiomatic and also 3.2 compatible.

There is also advice in the article to encode __repr__ and __str__ results to utf8 under Python 2.x; this is fine (other approaches are not better), but it has some non-obvious consequences (like breaking the REPL in some setups) that developers should be aware of, see http://kmike.ru/python-with-strings-attached/
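A minimal sketch of that pattern, assuming a class decorator in the spirit of the one the article describes: `__str__` is written once and returns text, and under 2.x it is moved to `__unicode__` while `__str__` returns UTF-8 bytes. The `City` class is a hypothetical example, and the REPL caveat above still applies to the 2.x branch.

```python
import sys

PY2 = sys.version_info[0] == 2


def implements_to_string(cls):
    """Class decorator: on 2.x, make __str__ return UTF-8 encoded bytes.

    The class defines a single __str__ that returns text. On 3.x nothing
    changes; on 2.x that method becomes __unicode__ and __str__ encodes it.
    """
    if PY2:
        cls.__unicode__ = cls.__str__
        cls.__str__ = lambda self: self.__unicode__().encode("utf-8")
    return cls


@implements_to_string
class City(object):
    """Hypothetical example class holding a text name."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


# Built from bytes so the source stays valid on 2.6 through 3.x.
zurich = City(b"Z\xc3\xbcrich".decode("utf-8"))
```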

For lower-level 2.x-3.x compatible C/C++ extensions Cython is great. In fact, many libraries (e.g. lxml) are compatible with Python 3.x because they are written in Cython which generates compatible code (modulo library changes) by default.

the_mitsuhiko 13y ago

> Python 3.2 has never been a problem

It's not a problem if you are willing to litter your code with calls or upgrade a ton of code in 2.x to unicode accidentally. There are just too many cases in 2.x where that is a terrible idea and introduces subtle bugs. I very strongly recommend against `from __future__ import unicode_literals`. If anything go with six.

In regards to supporting 3.2: I don't think anyone cares. The number of people currently using Python 3 is pretty low and a lot of libraries are already dropping 3.2 support. Requests, MarkupSafe, Jinja2 now all dropped 3.2 support and with that a lot of stuff that pulls in dependencies to those will now also depend on 3.3.

I still think people should stick to 2.7 for at least another one or two years, and at that point a lot will have changed.

//EDIT: wrt __str__ returning utf-8 data: __str__'s encoding is undefined but usually accepted to be > ASCII. Django and Jinja2 for instance returned utf-8 there for years and it did not cause any problems.

kmike84 13y ago

In case of NLTK unicode_literals ("unicode by default") fixed a lot of bugs and made other bugs visible, so mileage may vary :)

Could you give an example of cases where unicode_literals is a terrible idea?

3.2 is important for newcomer experience IMHO; it is very common for people starting with Python to use 3.x version and wonder why the code doesn't work. It's a pity high-profile packages are dropping 3.2 support, I wasn't aware Requests and Jinja2 dropped it.

utf8 __str__ definitely caused issues for Django (e.g. `print mymodel` sometimes fails in REPL in Windows with Russian locale); people using REPL in Windows are too used to such errors so they don't complain and blame Windows for this, but that doesn't mean there is no issue.


JulianWasTaken 13y ago

> There is also advice in the article to encode __repr__ and __str__ results to utf8 under Python 2.x; this is fine (other approaches are not better), but it has some non-obvious consequences (like breaking the REPL in some setups) that developers should be aware of, see http://kmike.ru/python-with-strings-attached/

I don't see `__repr__` mentioned there, but `__repr__` should basically always be ascii (which a quick glance at your article looks like it mentions).

I'm fine with `__str__` returning (encoding to) `utf-8` generally, as if someone wants something else they can always encode the unicode themselves to what they want, but `.encode(locale.getpreferredencoding())` is also fine with me if you want to be even more polite.

kmike84 13y ago

You're right that __repr__ was not mentioned, my bad.

I think `.encode(locale.getpreferredencoding())` is awful because this changes string encoding from run to run, and because locale.getpreferredencoding() could be different (and is different by default e.g. in Cyrillic Windows XP) from both `sys.stdout.encoding` (used for printing) and `sys.getdefaultencoding()` (used for implicit type conversions).
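To see whether the three encodings mentioned above actually disagree on a given machine, a short script like the following can be run. It is only a diagnostic sketch; `encoding_report` is a hypothetical helper name, and `sys.stdout.encoding` may be `None` when output is redirected or replaced.

```python
import locale
import sys


def encoding_report():
    """Collect the three encodings discussed above into a dict."""
    return {
        "locale.getpreferredencoding()": locale.getpreferredencoding(),
        # stdout may be replaced by an object without an encoding attribute
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print("%-32s %s" % (name, value))
```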


wallunit 13y ago

> If you have a C module written on top of the Python C API: shoot yourself. There is no tooling available for that yet from what I know and so much stuff changed.

I don't agree with that one. Two years ago I added Python 3 support to the Python bindings for libssh2, and it was straightforward. First of all, it is still C, so you don't have to care about the syntax changes in Python. Just add some #if PY_MAJOR_VERSION < 3 for API calls that have changed, or even better, wrap that code in macros. If you already support multiple versions of 2.x, you probably have some backwards-compatibility switches/macros like that in your code anyway. So adding some more for Python 3 isn't that big a deal.

At least at that time, when six and modernizer weren't available, it was way easier and more straightforward to support Python 2 and 3 from the same codebase in extension modules than in actual Python code. And it seems that if you don't want to drop support for Python <= 2.5 or <= 3.2, it still is.

rogerbinns 13y ago

My C module is APSW - a python wrapper around SQLite - https://code.google.com/p/apsw/

It supports every version of Python from 2.3 onwards with the exception of 3.0. I provide binaries for Windows and astonishingly people are still downloading the 2.3 version.

As you stated, most of the work is done by feeding the C preprocessor relevant information - https://code.google.com/p/apsw/source/browse/src/pyutil.c

It did take me considerably longer to make my test suite work. This is because I have 99.6% code coverage, and it exercises a lot of edge/error conditions. The test suite code (in Python) is written to run under both Python 2 and 3 as is and has to use some of the similar tricks with exec as the article mentioned. Fun challenges are constructing invalid UTF8 sequences in all Python versions and that sort of thing.
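As an illustration of the "invalid UTF8 in all Python versions" problem, here is one way such test data could be built so that the same source runs on 2.6+ and 3.x. This is a sketch, not APSW's actual test code; `make_bytes` and `is_valid_utf8` are hypothetical helpers, and versions older than 2.6 would need a different approach because they lack `bytearray`.

```python
def make_bytes(*values):
    """Build a byte string from integers; works on 2.6+ and 3.x."""
    return bytes(bytearray(values))


INVALID_UTF8 = [
    make_bytes(0xC3, 0x28),  # two-byte lead followed by a non-continuation byte
    make_bytes(0xFF),        # 0xFF never appears in UTF-8
    make_bytes(0xE2, 0x82),  # three-byte sequence cut off after two bytes
]


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Going through `bytearray` avoids relying on `bytes([..])`, which builds a byte string on 3.x but produces the text of a list on 2.x.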

wiredfool 13y ago

My experience is that at least with Py 2.6, 2.7, 3.2, and 3.3, it's not all that bad. I'm helping maintain Pillow, a PIL fork, and the commit to add Python 3 support touched a lot of things, but it wasn't that complicated. We've got a py3k.h file that has some ifdefs in it, all the print statements got changed to functions, and there's a few other bits and pieces.

Prior to Pycon, I wasn't really ready for python 3, now I'm missing it in my main codebase (which is on 2.7).

the_mitsuhiko 13y ago

That again depends on how much you do with strings and integers and how many modules you construct. PyInt is gone, PyUnicode is now PyStr, module construction uses a vastly different system and on 3.x you want to support the stable ABI which looks a bit different.

wallunit 13y ago

Except for module creation, you can easily add some very simple compatibility macros. I don't see how that would be different from your _compat module. Module creation, however, can't be abstracted into a uniform macro, because it requires defining a PyModuleDef struct and the module's init function got a return value in Python 3. But I'm fine with using some #if PY_MAJOR_VERSION >= 3 here.

After all, you have to deal with far fewer compatibility issues in extension modules than in actual Python code. And if needed you can always do a simple version switch. You don't have to care about changes in the syntax of Python. You also don't have to care about changes to the __*__ magic methods, because you don't call them directly, and when defining classes you use slots for stuff like that.


mynegation 13y ago

I also found that maintaining a single codebase for both 2 and 3 is the only sane way. Running 2to3 during the build is just too intrusive.

I liked the approach of 'six', but it is not shipped as a system module. Having something like that as a default system module in Python 2.6, 2.7, and 3.x would go a long way towards adoption of Python 3.

I found that I end up either using six or implementing some subset of it if I do not want to introduce the dependency.
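A hand-rolled subset of that kind might look like the sketch below. The names mirror ones that six exposes (`text_type`, `string_types`, `integer_types`, `iteritems`), but this is an illustrative stand-in rather than six's implementation, and `to_text` is a hypothetical extra helper.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # These names only exist on 2.x; this branch is never run on 3.x.
    text_type = unicode            # noqa: F821
    string_types = (str, unicode)  # noqa: F821
    integer_types = (int, long)    # noqa: F821

    def iteritems(d):
        return d.iteritems()
else:
    text_type = str
    string_types = (str,)
    integer_types = (int,)

    def iteritems(d):
        return iter(d.items())


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return text, decoding byte strings if needed."""
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)
```

Keeping this in one small module means the rest of the codebase imports from it and never checks the interpreter version directly.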

bdarnell 13y ago

This is a good writeup. On Tornado I went through a similar transition from 2to3 to a single codebase. As long as you can drop Python 2.5 support you can probably avoid 2to3, but if you do need it I wrote some tools to make it less painful: http://bdarnell.github.io/blog/2012/03/13/cross-python-devel...

hoodoof 13y ago

So Armin, what did not come across from your blog post is how you feel about Python 3.

Until now you have been seen as one of the people holding out strongly against it. Where are you at now? Are you going to move to Python 3?

What's the future for Jinja2 now? Are you enthused about maintaining it? If it's headed for the deadpool please let us know as we can move to technologies that are going to have a future.

What's the future for Flask?

What are your current thoughts on Python 3?
