API design for Python 2 / 3 compatibility

Stefan Schwarzer

Hello,

I'm currently changing the FTP client library ftputil [1]
so that the same code of the library works with Python
2 (2.6 and up) and Python 3. (At the moment the code is for
Python 2 only.) I've run into an API design issue where I
don't know which API I should offer ftputil users under
Python 2 and Python 3.

[1] http://ftputil.sschwarzer.net/

Some important background information: A key idea in ftputil
is that it uses the same APIs as the Python standard library
where possible. For example, with ftputil you can write code
like this:

    with ftputil.FTPHost(host, user, password) as ftp_host:
        # Like `os.path.isdir`, but works on the FTP server.
        if ftp_host.path.isdir("hello_dir"):
            # Like `os.chdir`, but works on the FTP server.
            ftp_host.chdir("hello_dir")
            # Like the `open` builtin, but opens a remote file.
            with ftp_host.open("new_file", "w") as fobj:
                # Like `file.write` and `file.close`
                fobj.write("Hello world!")
                fobj.close()

Since most of Python 2's and Python 3's filesystem-related
APIs accept either bytes or character strings (and return
the same type if they return anything string-like at all),
the design here is rather clear to me.

However, I have some difficulty with ftputil's counterpart
of the `open` builtin function when files are opened for
reading in text mode. Here are the approaches I've been
thinking of so far:

* Approach 1

When opening remote text files for reading, ftputil will
return byte strings from `read(line/s)` when run under
Python 2 and unicode strings when run under Python 3.

Pro: Each of the Python versions has ftputil behavior
which matches the Python standard library behavior of
the respective Python version.

Con: Developers who want to use ftputil under Python 2
_and_ 3 have to program against two different APIs since
their code "inherits" ftputil's duality.

Con: Support for two different APIs will make the
ftputil code (at least a bit) more complicated than just
returning unicode strings under both Python versions.

* Approach 2

When opening remote text files for reading, ftputil will
always return unicode strings from `read(line/s)`,
regardless of whether it runs under Python 2 or Python 3.

Pro: Uniform API, independent of the underlying Python
version.

Pro: Supporting a single API will result in cleaner code
in ftputil than when supporting different APIs (see
above).

Con: This approach might break some code which expects
the returned strings under Python 2 to be byte strings.

Con: Developers who only use Python 2 might be confused
if ftputil returns unicode strings from `read(line/s)`
since this behavior doesn't match files opened with
`open` in Python 2.

Which approach do you recommend and why do you prefer that
approach? Are there other approaches I have overlooked? Do
you have other suggestions?

Best regards,
Stefan
 
Terry Jan Reedy

Approach 2 matches (or should match) io.open, which became builtin open
in Python 3. I would simply document that ftp_host.open mimics io.open
in the same way that ftp_host.chdir, etcetera, match os.chdir, etc. Your
principle will remain intact.

Anyone writing *new* Py 2 code with any idea of ever running on Py 3
should be using io.open anyway. That is why it was backported. You might
be able to reuse some io code or subclass some io classes for your
implementation.
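
A minimal sketch of what mimicking io.open means in practice, using a local temporary file as a stand-in for a remote one. The helper name `roundtrip` is made up for illustration and is not part of ftputil. With an explicit encoding, io.open should hand back unicode text under both Python 2.6+ and Python 3:

```python
import io
import os
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write `text` to a temporary file and read it back via io.open."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # io.open encodes on write and decodes on read in text mode.
        with io.open(path, "w", encoding=encoding) as fobj:
            fobj.write(text)
        with io.open(path, "r", encoding=encoding) as fobj:
            return fobj.read()
    finally:
        os.remove(path)


result = roundtrip(u"Hello w\u00f6rld!\n")
# The type name differs between Python 2 and 3, but both are unicode text.
print(type(result).__name__)
```

Code written against this behavior does not need a version check around `read`, which is the uniformity Approach 2 is after.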
 
Ethan Furman

> Which approach do you recommend and why do you prefer that
> approach?

Approach 2, because it is much saner to deal with unicode inside the program, and only switch back to some kind of
encoding when writing to files/pipes/etc. Since you are going to support python 3 as well you can bump the major
version number and note the backward incompatibility.
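
As a rough illustration of that pattern (decode at the input boundary, work on unicode, encode only on output), here is a small sketch. The function name `shout` and the UTF-8 default are assumptions chosen for the example, not anything from ftputil:

```python
def shout(raw_bytes, encoding="utf-8"):
    """Upper-case encoded text while keeping unicode inside the function."""
    text = raw_bytes.decode(encoding)   # bytes -> unicode at the input boundary
    processed = text.upper()            # all processing happens on unicode text
    return processed.encode(encoding)   # unicode -> bytes at the output boundary


encoded_in = u"gr\u00fcn".encode("utf-8")
encoded_out = shout(encoded_in)
```

Upper-casing the undecoded UTF-8 bytes would leave the non-ASCII character untouched, which is one reason to do the work on decoded text.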
 
Stefan Schwarzer

Terry, Ethan:

Thanks a lot for your excellent advice. :)

> Approach 2 matches (or should match) io.open, which became
> builtin open in Python 3. I would simply document that
> ftp_host.open mimics io.open in the same way that
> ftp_host.chdir, etcetera, match os.chdir, etc. Your
> principle will remain intact.

I didn't know about `io.open` (or had forgotten it).

> Anyone writing *new* Py 2 code with any idea of ever
> running on Py 3 should be using io.open anyway. That is
> why it was backported. You might be able to reuse some io
> code or subclass some io classes for your implementation.

Since I use `socket.makefile` to create the underlying file
objects, I can use `BufferedReader`/`BufferedWriter` and
`TextIOWrapper` to supply buffering and encoding/decoding.
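
A minimal sketch of this layering, not ftputil's actual implementation: `socket.socketpair()` stands in for the FTP data connection, and the names `text_reader` and `demo` are hypothetical. As written it assumes Python 3, where `makefile("rb")` already returns a buffered `io` object; the object returned by Python 2's `socket.makefile` would probably need an adapter before `io` classes can wrap it.

```python
import io
import socket


def text_reader(sock, encoding="utf-8"):
    """Return a text-mode file object that decodes bytes read from `sock`."""
    raw = sock.makefile("rb")  # buffered binary reader on Python 3
    # newline=None enables universal newlines, so "\r\n" is read as "\n".
    return io.TextIOWrapper(raw, encoding=encoding, newline=None)


def demo():
    # A connected socket pair stands in for the FTP data connection.
    server, client = socket.socketpair()
    try:
        server.sendall(u"Gr\u00fc\u00dfe\r\nzweite Zeile\r\n".encode("utf-8"))
        server.shutdown(socket.SHUT_WR)  # signal end of data to the reader
        reader = text_reader(client)
        try:
            return reader.readlines()
        finally:
            reader.close()
    finally:
        server.close()
        client.close()


lines = demo()
```

The wrapper takes care of both decoding and newline translation, so the caller only ever sees unicode lines ending in "\n".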

> Approach 2, because it is much saner to deal with unicode
> inside the program, and only switch back to some kind of
> encoding when writing to files/pipes/etc.

Yes, this is a much saner design. I just was hesitant
because of the introduced backward incompatibility and
wanted to get others' opinions.

> Since you are going to support python 3 as well you can
> bump the major version number and note the backward
> incompatibility.

Actually I plan to increase the version number from 2.8 to
3.0 because of the Python 3 support and already intend to
change some module names that will be visible to client
code. So this is also a good opportunity to clean up the
file interface. :)

Best regards,
Stefan
 
