Re: Problem with -3 switch

Discussion in 'Python' started by Chris Rebert, Jan 9, 2009.

  1. Chris Rebert

    On Fri, Jan 9, 2009 at 9:17 AM, Aivar Annamaa <> wrote:
    > Hi
    >
    > I'm getting started with Python, and in order to get good habits for Python
    > 3, I'd like to run my Python 2.6.1 in Python 3 warning mode.
    >
    > When I run
    > python -3
    >
    > and execute the statement
    >>>> print 4
    >
    > I expect to see a warning, because I understand that this statement
    > is not valid in Python 3.
    >
    > However, no warning appears.
    >
    > Have I misunderstood something?


    As was recently pointed out in a nearly identical thread, the -3
    switch only points out problems that the 2to3 converter tool can't
    automatically fix. Changing print to print(), on the other hand, is
    something 2to3 handles easily.
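Since -3 stays silent about print, one way to build the print() habit on 2.6 itself is the print_function future import, which is available from 2.6 onward and is a no-op on 3.x. A minimal sketch; the report() helper is hypothetical, not something from the thread:

```python
from __future__ import print_function  # 2.6+: print becomes a function; no-op in 3.x

import sys


def report(label, value, out=None):
    """Write 'label: value' to out (sys.stdout by default) using print()."""
    if out is None:
        out = sys.stdout  # looked up at call time, so redirection still works
    # With the future import, sep= and file= work on 2.6 exactly as on 3.x.
    print(label, value, sep=": ", file=out)
```

Code written this way is left alone by the print fixer, so the 2.6 and 2to3-generated 3.x versions print the same thing.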

    Cheers,
    Chris

    --
    Follow the path of the Iguana...
    http://rebertia.com
    Chris Rebert, Jan 9, 2009

  2. Steve Holden

    Aivar Annamaa wrote:
    >> As was recently pointed out in a nearly identical thread, the -3
    >> switch only points out problems that the 2to3 converter tool can't
    >> automatically fix. Changing print to print() on the other hand is
    >> easily fixed by 2to3.
    >>
    >> Cheers,
    >> Chris
    >>

    >
    > I see.
    > So i gotta keep my own discipline with print() then :)
    >

    Only if you don't want to run your 2.x code through 2to3 before you use
    it as Python 3.x code.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
    Steve Holden, Jan 9, 2009

  3. Carl Banks

    On Jan 9, 12:36 pm, "J. Cliff Dyer" <> wrote:
    > On Fri, 2009-01-09 at 13:13 -0500, Steve Holden wrote:
    > > Aivar Annamaa wrote:
    > > >> As was recently pointed out in a nearly identical thread, the -3
    > > >> switch only points out problems that the 2to3 converter tool can't
    > > >> automatically fix. Changing print to print() on the other hand is
    > > >> easily fixed by 2to3.

    >
    > > >> Cheers,
    > > >> Chris

    >
    > > > I see.
    > > > So i gotta keep my own discipline with print() then :)

    >
    > > Only if you don't want to run your 2.x code through 2to3 before you use
    > > it as Python 3.x code.

    >
    > > regards
    > >  Steve

    >
    > And mind you, if you follow that route, you are programming in a
    > mightily crippled language.


    How do you figure?

    I expect that it'd be a PITA in some cases to use the transitional
    dialect (like getting all your Us in place), but that doesn't mean the
    language is crippled.


    > It's about as bad as trying to write
    > cross-browser CSS.  Don't put yourself through that pain if you don't
    > have to.


    Have you tried doing that, or are you imagining how it will be? I'm
    curious about people's actual experiences.

    Problem is, a lot of people use the "bang at it with a hammer till it
    works" approach to programming, and really have no shame when it comes
    to engaging in questionable practices like relying on accidental side
    effects, rather than taking the time to try to program robustly. I
    expect people with that style of programming will have many more
    issues with the transition.


    Carl Banks
    Carl Banks, Jan 9, 2009
  4. John Machin

    On Jan 10, 6:58 am, Carl Banks <> wrote:
    > On Jan 9, 12:36 pm, "J. Cliff Dyer" <> wrote:
    >
    > > On Fri, 2009-01-09 at 13:13 -0500, Steve Holden wrote:
    > > > Aivar Annamaa wrote:
    > > > >> As was recently pointed out in a nearly identical thread, the -3
    > > > >> switch only points out problems that the 2to3 converter tool can't
    > > > >> automatically fix. Changing print to print() on the other hand is
    > > > >> easily fixed by 2to3.

    >
    > > > >> Cheers,
    > > > >> Chris

    >
    > > > > I see.
    > > > > So i gotta keep my own discipline with print() then :)

    >
    > > > Only if you don't want to run your 2.x code through 2to3 before you use
    > > > it as Python 3.x code.

    >
    > > > regards
    > > >  Steve

    >
    > > And mind you, if you follow that route, you are programming in a
    > > mightily crippled language.

    >
    > How do you figure?
    >
    > I expect that it'd be a PITA in some cases to use the transitional
    > dialect (like getting all your Us in place), but that doesn't mean the
    > language is crippled.


    What is this "transitional dialect"? What does "getting all your Us in
    place" mean?

    Steve & Cliff are talking about the rather small subset of Python that
    is not only valid syntax in both 2.x and 3.x but also has the same
    meaning in 2.x and 3.x.

    >
    > > It's about as bad as trying to write
    > > cross-browser CSS.  Don't put yourself through that pain if you don't
    > > have to.

    >
    > Have you tried doing that, or are you imagining how it will be?  I'm
    > curious about people's actual experiences.


    I maintain two packages, xlrd which supports 2.1 to 2.6, and xlwt
    which supports 2.3 to 2.6. I've done suck-it-and-see trials on being
    able to support 3.x as well from the same codebase, and it's turned
    out reasonably well. xlrd already had a module called timemachine
    which caters for version-dependent stuff. Extending this to 3.x was
    more a voyage of discovery than a PITA. timemachine.py is "crippled"
    in Cliff's sense, in that because I'm the principal user I need to
    make it robust and idiot-proof, so it has been written under the
    following constraints:
    (1) not one copy of timemachine.py for 2.1, one for 2.2, one for
    2.3, ... etc; just one copy, period.
    (2) means that it must use syntax that's valid in all supported
    versions
    (3) must be able to be processed by 2to3 without causing a commotion
    (4) the original version and the 2to3 output must have the same effect
    when imported by 3.x.

    So one ends up with code like:
        glued = BYTES_NULL.join(list_of_pieces_of_a_binary_file)
    which is supported by timemachine definitions like these (one or the
    other, depending on the running version):
        BYTES_NULL = bytes(0) # 3.x ... note b'' is not valid before 2.6
        BYTES_NULL = '' # 2.x

    BYTES_NULL.join() may be ugly, but it's not crippled, it's fully
    functional, and it would be very easy to find and change in the future
    in two possible scenarios (1) drop 2.x support (2) change codebase to
    be mostly 3.x, support 2.x by a (mythical, hoped-for) 3to2 mechanism.
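The two BYTES_NULL lines are alternatives, not consecutive assignments (on 2.6, where bytes is just an alias for str, bytes(0) would give '0'). A minimal sketch of how such a module might pick one, assuming a check on sys.version_info; the glue() helper is hypothetical and the real timemachine.py may be organised differently:

```python
import sys

# Pick the empty "binary string" for the running interpreter.
if sys.version_info[0] >= 3:
    BYTES_NULL = bytes(0)  # b'' on 3.x
else:
    BYTES_NULL = ''        # on 2.x, str already holds raw bytes


def glue(pieces):
    """Concatenate chunks read from a binary file into one byte string."""
    return BYTES_NULL.join(pieces)
```

Because every call site goes through BYTES_NULL, dropping 2.x support later means editing this one definition.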

    > Problem is, a lot of people use the "bang at it with a hammer till it
    > works" approach to programming, and really have no shame when it comes
    > to engaging in questionable practices like relying on accidental side
    > effects, rather than taking the time to try to program robustly.  I
    > expect people with that style of programming will have many more
    > issues with the transition.


    Those with many more issues are likely to be those who don't have
    adequate tests and/or can't debug their way out of a wet paper bag --
    could well be we're talking about the same bunch :)

    Cheers,
    John
    John Machin, Jan 10, 2009
  5. Carl Banks

    On Jan 9, 6:11 pm, John Machin <> wrote:
    > On Jan 10, 6:58 am, Carl Banks <> wrote:
    >
    > > On Jan 9, 12:36 pm, "J. Cliff Dyer" <> wrote:

    >
    > > > On Fri, 2009-01-09 at 13:13 -0500, Steve Holden wrote:
    > > > > Aivar Annamaa wrote:
    > > > > >> As was recently pointed out in a nearly identical thread, the -3
    > > > > >> switch only points out problems that the 2to3 converter tool can't
    > > > > >> automatically fix. Changing print to print() on the other hand is
    > > > > >> easily fixed by 2to3.

    >
    > > > > >> Cheers,
    > > > > >> Chris

    >
    > > > > > I see.
    > > > > > So i gotta keep my own discipline with print() then :)

    >
    > > > > Only if you don't want to run your 2.x code through 2to3 before you use
    > > > > it as Python 3.x code.

    >
    > > > > regards
    > > > >  Steve

    >
    > > > And mind you, if you follow that route, you are programming in a
    > > > mightily crippled language.

    >
    > > How do you figure?

    >
    > > I expect that it'd be a PITA in some cases to use the transitional
    > > dialect (like getting all your Us in place), but that doesn't mean the
    > > language is crippled.

    >
    > What is this "transitional dialect"? What does "getting all your Us in
    > place" mean?


    Transitional dialect is the subset of Python 2.6 that can be
    translated to Python 3 with the 2to3 tool. Getting all your Us in place
    refers to prepending a u to strings to make them unicode objects,
    which is something 2to3 users are highly advised to do to keep hassles
    to a minimum. (Getting Bs in place would be a good idea too.)
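As an alternative to typing a u on every text literal, 2.6 and later also accept from __future__ import unicode_literals, which makes every unprefixed literal in the module unicode. This is a different technique from the one Carl describes; the names text_type, is_text and the two constants are illustrative assumptions:

```python
from __future__ import unicode_literals  # 2.6+: plain "..." literals are unicode; no-op in 3.x

import sys

if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (only evaluated under 2.x)

CAFE = "caf\u00e9"          # text (unicode) in both 2.6+ and 3.x
OLE2_PREFIX = b"\xd0\xcf"   # still binary: str in 2.6+, bytes in 3.x


def is_text(value):
    """Return True if value is a text (unicode) string on this interpreter."""
    return isinstance(value, text_type)
```

Literals that must stay binary then need an explicit b prefix, which lines up with the "Bs" Carl mentions.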


    > Steve & Cliff are talking about the rather small subset of Python that
    > is not only valid syntax in both 2.x and 3.x but also has the same
    > meaning in 2.x and 3.x.


    That would be a crippled language, yes. But I do not believe that's
    what Steve and Cliff are referring to. Steve wrote of "running your
    code through 2to3", and that was what Cliff followed up to, so I
    believe they are both referring to writing valid code in 2.6 which is
    able to be translated through 2to3, and then generating 3.0 code using
    2to3. That is not a crippled language at all, just a PITA sometimes.


    > > > It's about as bad as trying to write
    > > > cross-browser CSS.  Don't put yourself through that pain if you don't
    > > > have to.

    >
    > > Have you tried doing that, or are you imagining how it will be?  I'm
    > > curious about people's actual experiences.

    >
    > I maintain two packages, xlrd which supports 2.1 to 2.6, and xlwt
    > which supports 2.3 to 2.6. I've done suck-it-and-see trials on being
    > able to support 3.x as well from the same codebase, and it's turned
    > out reasonably well. xlrd already had a module called timemachine
    > which caters for version-dependent stuff. Extending this to 3.x was
    > more a voyage of discovery than a PITA. timemachine.py is "crippled"
    > in Cliff's sense, in that because I'm the principal user I need to
    > make it robust and idiot-proof, so it has been written under the
    > following constraints:
    > (1) not one copy of timemachine.py for 2.1, one for 2.2, one for
    > 2.3, ... etc; just one copy, period.
    > (2) means that it must use syntax that's valid in all supported
    > versions
    > (3) must be able to be processed by 2to3 without causing a commotion
    > (4) the original version and the 2to3 output must have the same effect
    > when imported by 3.x.
    >
    > So one ends up with code like:
    >    glued = BYTES_NULL.join(list_of_pieces_of_a_binary_file)
    > which is supported by timemachine definitions like
    > BYTES_NULL = bytes(0) # 3.x ... note b'' is not valid in 2.x
    > BYTES_NULL = '' # 2.x
    >
    > BYTES_NULL.join() may be ugly, but it's not crippled, it's fully
    > functional, and it would be very easy to find and change in the future
    > in two possible scenarios (1) drop 2.x support (2) change codebase to
    > be mostly 3.x, support 2.x by a (mythical, hoped-for) 3to2 mechanism.


    Cool, thanks.


    Carl Banks
    Carl Banks, Jan 12, 2009
  6. John Machin

    On Jan 12, 12:23 pm, Carl Banks <> wrote:
    > On Jan 9, 6:11 pm, John Machin <> wrote:
    >
    > > On Jan 10, 6:58 am, Carl Banks <> wrote:

    >
    > > > On Jan 9, 12:36 pm, "J. Cliff Dyer" <> wrote:

    >
    > > > > On Fri, 2009-01-09 at 13:13 -0500, Steve Holden wrote:
    > > > > > Aivar Annamaa wrote:
    > > > > > >> As was recently pointed out in a nearly identical thread, the -3
    > > > > > >> switch only points out problems that the 2to3 converter tool can't
    > > > > > >> automatically fix. Changing print to print() on the other hand is
    > > > > > >> easily fixed by 2to3.

    >
    > > > > > >> Cheers,
    > > > > > >> Chris

    >
    > > > > > > I see.
    > > > > > > So i gotta keep my own discipline with print() then :)

    >
    > > > > > Only if you don't want to run your 2.x code through 2to3 before you use
    > > > > > it as Python 3.x code.

    >
    > > > > > regards
    > > > > >  Steve

    >
    > > > > And mind you, if you follow that route, you are programming in a
    > > > > mightily crippled language.

    >
    > > > How do you figure?

    >
    > > > I expect that it'd be a PITA in some cases to use the transitional
    > > > dialect (like getting all your Us in place), but that doesn't mean the
    > > > language is crippled.

    >
    > > What is this "transitional dialect"? What does "getting all your Us in
    > > place" mean?

    >
    > Transitional dialect is the subset of Python 2.6 that can be
    > translated to Python 3 with the 2to3 tool.


    I'd never seen it called "transitional dialect" before.

    >  Getting all your Us in place
    > refers to prepending a u to strings to make them unicode objects,
    > which is something 2to3 users are highly advised to do to keep hassles
    > to a minimum.  (Getting Bs in place would be a good idea too.)


    Ummm ... I'm not understanding something. 2to3 changes u"foo" to
    "foo", doesn't it? What's the point of going through the code and
    changing all non-binary "foo" to u"foo" only so that 2to3 can rip the
    u off again? What hassles? Who's doing the highly-advising where and
    with what supporting argument?

    "Getting Bs into place" is necessary eventually. Whether it is
    worthwhile trying to find these in advance, or waiting for them to be
    picked up at testing time is a bit of a toss-up.

    Let's look at this hypothetical but fairly realistic piece of 2.x
    code:
        OLE2_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
        def is_ole2_file(filepath):
            return open(filepath, "rb").read(8) == OLE2_SIGNATURE

    This is already syntactically valid 3.x code, and won't be changed by
    2to3, but it won't work in 3.x because b"x" != "x" for all x. In this
    case, the cause of test failures should be readily apparent; in other
    cases the unexpected exception or test failure may happen at some
    distance.

    The 3.x version needs to have the effect of:
        OLE2_SIGNATURE = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
        def is_ole2_file(filepath):
            return open(filepath, "rb").read(8) == OLE2_SIGNATURE

    So in my regional variation of the transitional dialect, this becomes:
        from timemachine import *
        OLE2_SIGNATURE = BYTES_LITERAL("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")
        def is_ole2_file(filepath):
            return open(filepath, "rb").read(8) == OLE2_SIGNATURE
            # NOTE: don't change "rb"
        ...
    and timemachine.py contains (amongst other things):
        import sys
        python_version = sys.version_info[:2] # e.g. version 2.4 -> (2, 4)
        if python_version >= (3, 0):
            BYTES_LITERAL = lambda x: x.encode('latin1')
        else:
            BYTES_LITERAL = lambda x: x
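A self-contained version of the OLE2 check, folding the timemachine idea into one module so it can run directly. It is a sketch, not xlrd's code: it uses a with block to close the file (so it needs 2.6+, or 2.5 with the with_statement future import), and the single-module layout is an assumption:

```python
import sys

# Write the literal once as a str; turn it into bytes only on 3.x.
# Latin-1 maps code points 0-255 to byte values 0-255, so "\xD0" -> 0xD0.
if sys.version_info[0] >= 3:
    BYTES_LITERAL = lambda x: x.encode('latin1')
else:
    BYTES_LITERAL = lambda x: x

OLE2_SIGNATURE = BYTES_LITERAL("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")


def is_ole2_file(filepath):
    """True if the file starts with the 8-byte OLE2 compound-document signature."""
    # "rb" matters: on 3.x read() then returns bytes, comparable to OLE2_SIGNATURE.
    with open(filepath, "rb") as f:
        return f.read(8) == OLE2_SIGNATURE
```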

    It is probably worthwhile taking an up-front inventory of all file
    open() calls and [c]StringIO.StringIO() calls -- is the file being
    used as a text file or a binary file?
    If a text file, check that any default encoding is appropriate.
    If a binary file, ensure there's a "b" in the mode (real file) or you
    supply (in 3.X) an io.BytesIO() instance, not an io.StringIO()
    instance.
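For that inventory, io.open() (present in 2.6, and the very same object as the built-in open() on 3.x) makes the text/binary decision explicit on both versions. A minimal sketch; the helper names and the UTF-8 default are assumptions:

```python
import io


def read_text(path, encoding="utf-8"):
    """Read a text file, naming the encoding instead of relying on a default."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()  # unicode on 2.6+, str on 3.x


def read_binary(path):
    """Read a binary file; the "b" in the mode keeps the data as bytes."""
    with io.open(path, "rb") as f:
        return f.read()  # str (bytes) on 2.6+, bytes on 3.x


def in_memory_binary(data):
    """Stand-in for a binary file: io.BytesIO, never io.StringIO."""
    return io.BytesIO(data)
```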

    >
    > > Steve & Cliff are talking about the rather small subset of Python that
    > > is not only valid syntax in both 2.x and 3.x but also has the same
    > > meaning in 2.x and 3.x.

    >
    > That would be a crippled language, yes.  But I do not believe that's
    > what Steve and Cliff are referring to.  Steve wrote of "running your
    > code through 2to3", and that was what Cliff followed up to, so I
    > believe they are both referring to writing valid code in 2.6 which is
    > able to be translated through 2to3, and then generating 3.0 code using
    > 2to3.  That is not a crippled language at all, just a PITA sometimes.


    Uh huh; I assumed that "crippled" was being applied to the worse of
    the two options :)

    Cheers,
    John
    John Machin, Jan 12, 2009
  7. Steve Holden

    Carl Banks wrote:
    > On Jan 9, 6:11 pm, John Machin <> wrote:
    >> On Jan 10, 6:58 am, Carl Banks <> wrote:

    [...]
    >> Steve & Cliff are talking about the rather small subset of Python that
    >> is not only valid syntax in both 2.x and 3.x but also has the same
    >> meaning in 2.x and 3.x.

    >
    > That would be a crippled language, yes. But I do not believe that's
    > what Steve and Cliff are referring to. Steve wrote of "running your
    > code through 2to3", and that was what Cliff followed up to, so I
    > believe they are both referring to writing valid code in 2.6 which is
    > able to be translated through 2to3, and then generating 3.0 code using
    > 2to3. That is not a crippled language at all, just a PITA sometimes.
    >

    Correct. The recommended way of maintaining a dual-version code base is
    to paraphrase your 2.6 code in such a way that the 2to3 converter will
    produce correct 3.0 code that requires no further attention. If you
    don't do this you are making a rod for your own back.
    [...]

    regards
    Steve
    Steve Holden, Jan 12, 2009
  8. Carl Banks

    On Jan 12, 12:32 am, John Machin <> wrote:
    > On Jan 12, 12:23 pm, Carl Banks <> wrote:
    >
    > > On Jan 9, 6:11 pm, John Machin <> wrote:

    >
    > > > On Jan 10, 6:58 am, Carl Banks <> wrote:

    >
    > > > > On Jan 9, 12:36 pm, "J. Cliff Dyer" <> wrote:

    >
    > > > > > On Fri, 2009-01-09 at 13:13 -0500, Steve Holden wrote:
    > > > > > > Aivar Annamaa wrote:
    > > > > > > >> As was recently pointed out in a nearly identical thread, the -3
    > > > > > > >> switch only points out problems that the 2to3 converter tool can't
    > > > > > > >> automatically fix. Changing print to print() on the other hand is
    > > > > > > >> easily fixed by 2to3.

    >
    > > > > > > >> Cheers,
    > > > > > > >> Chris

    >
    > > > > > > > I see.
    > > > > > > > So i gotta keep my own discipline with print() then :)

    >
    > > > > > > Only if you don't want to run your 2.x code through 2to3 before you use
    > > > > > > it as Python 3.x code.

    >
    > > > > > > regards
    > > > > > >  Steve

    >
    > > > > > And mind you, if you follow that route, you are programming in a
    > > > > > mightily crippled language.

    >
    > > > > How do you figure?

    >
    > > > > I expect that it'd be a PITA in some cases to use the transitional
    > > > > dialect (like getting all your Us in place), but that doesn't mean the
    > > > > language is crippled.

    >
    > > > What is this "transitional dialect"? What does "getting all your Us in
    > > > place" mean?

    >
    > > Transitional dialect is the subset of Python 2.6 that can be
    > > translated to Python 3 with the 2to3 tool.

    >
    > I'd never seen it called "transitional dialect" before.


    I had hoped the context would make it clear what I was talking about.

    > >  Getting all your Us in place
    > > refers to prepending a u to strings to make them unicode objects,
    > > which is something 2to3 users are highly advised to do to keep hassles
    > > to a minimum.  (Getting Bs in place would be a good idea too.)

    >
    > Ummm ... I'm not understanding something. 2to3 changes u"foo" to
    > "foo", doesn't it? What's the point of going through the code and
    > changing all non-binary "foo" to u"foo" only so that 2to3 can rip the
    > u off again?


    It does a bit more than that.

    > What hassles? Who's doing the highly-advising where and
    > with what supporting argument?


    You add the u so that the constant will be the same data type in 2.6 as
    it becomes in 3.0 after applying 2to3. str and unicode objects aren't
    always smooth with each other, and you have a much better chance
    of getting the same behavior in 2.6 and 3.0 if you use an actual
    unicode string in both.

    An example of this, though not with string constants, was posted here
    recently. Someone found that reading from urlopen() gives a bytes object in
    Python 3.0, which tripped him up because in 2.x he was running regexp
    searches on the output. If he had been taking care to use only
    unicode objects in 2.x (in this case, by explicitly decoding the
    output) then it wouldn't have been an issue.
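A sketch of the fix Carl describes: decode the response once, then search text. To keep it offline, an io.BytesIO object stands in for whatever urlopen() returns, and the UTF-8 default is an assumption (real code should honour the charset the server declares):

```python
import re

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_title(response, charset="utf-8"):
    """Return the <title> text from a file-like HTTP response, or None."""
    raw = response.read()          # bytes on 3.x, str on 2.x
    text = raw.decode(charset)     # text (unicode) from here on, on both versions
    match = TITLE_RE.search(text)  # str pattern on text data: no bytes/str mix-up
    return match.group(1).strip() if match else None
```

On 3.x, running the same str pattern over the undecoded bytes raises TypeError, which is the failure described above.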


    > "Getting Bs into place" is necessary eventually. Whether it is
    > worthwhile trying to find these in advance, or waiting for them to be
    > picked up at testing time is a bit of a toss-up.
    >
    > Let's look at this hypothetical but fairly realistic piece of 2.x
    > code:
    > OLE2_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
    > def is_ole2_file(filepath):
    >      return open(filepath, "rb").read(8) == OLE2_SIGNATURE
    >
    > This is already syntactically valid 3.x code, and won't be changed by
    > 2to3, but it won't work in 3.x because b"x" != "x" for all x. In this
    > case, the cause of test failures should be readily apparent; in other
    > cases the unexpected exception or test failure may happen at some
    > distance.
    >
    > The 3.x version needs to have the effect of:
    > OLE2_SIGNATURE = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
    > def is_ole2_file(filepath):
    >      return open(filepath, "rb").read(8) == OLE2_SIGNATURE
    >
    > So in my regional variation of the transitional dialect, this becomes:
    > from timemachine import *
    > OLE2_SIGNATURE = BYTES_LITERAL("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")
    > def is_ole2_file(filepath):
    >      return open(filepath, "rb").read(8) == OLE2_SIGNATURE
    > # NOTE: don't change "rb"
    > ...
    > and timemachine.py contains (amongst other things):
    > import sys
    > python_version = sys.version_info[:2] # e.g. version 2.4 -> (2, 4)
    > if python_version >= (3, 0):
    >     BYTES_LITERAL = lambda x: x.encode('latin1')
    > else:
    >     BYTES_LITERAL = lambda x: x
    >
    > It is probably worthwhile taking an up-front inventory of all file
    > open() calls and [c]StringIO.StringIO() calls -- is the file being
    > used as a text file or a binary file?
    > If a text file, check that any default encoding is appropriate.
    > If a binary file, ensure there's a "b" in the mode (real file) or you
    > supply (in 3.X) an io.BytesIO() instance, not an io.StringIO()
    > instance.


    Right. "Taking care of the Us" referred specifically to the act of
    prepending Us to string constants, but figuratively it means making
    explicit your intentions with all string data. 2to3 can only do so
    much; it can't always guess whether your string usage is supposed to
    be character or binary.

    It's definitely going to be the hardest part of the transition since
    it's the most drastic change.



    Carl Banks
    Carl Banks, Jan 12, 2009
  9. John Machin

    On Jan 12, 7:29 pm, Carl Banks <> wrote:
    > On Jan 12, 12:32 am, John Machin <> wrote:
    >
    > > On Jan 12, 12:23 pm, Carl Banks <> wrote:

    >
    > > > On Jan 9, 6:11 pm, John Machin <> wrote:

    >
    > > > > On Jan 10, 6:58 am, Carl Banks <> wrote:


    > > > > > I expect that it'd be a PITA in some cases to use the transitional
    > > > > > dialect (like getting all your Us in place), but that doesn't mean the
    > > > > > language is crippled.

    >
    > > > > What is this "transitional dialect"? What does "getting all your Us in
    > > > > place" mean?

    >
    > > > Transitional dialect is the subset of Python 2.6 that can be
    > > > translated to Python 3 with the 2to3 tool.

    >
    > > I'd never seen it called "transitional dialect" before.

    >
    > I had hoped the context would make it clear what I was talking about.


    In vain.

    >
    > > >  Getting all your Us in place
    > > > refers to prepending a u to strings to make them unicode objects,
    > > > which is something 2to3 users are highly advised to do to keep hassles
    > > > to a minimum.  (Getting Bs in place would be a good idea too.)

    >
    > > Ummm ... I'm not understanding something. 2to3 changes u"foo" to
    > > "foo", doesn't it? What's the point of going through the code and
    > > changing all non-binary "foo" to u"foo" only so that 2to3 can rip the
    > > u off again?

    >
    > It does a bit more than that.


    Like what?

    >
    > > What hassles? Who's doing the highly-advising where and
    > > with what supporting argument?

    >
    > You add the u so that the constant will be the same data type in 2.6 as
    > it becomes in 3.0 after applying 2to3.  str and unicode objects aren't
    > always smooth with each other, and you have a much better chance
    > of getting the same behavior in 2.6 and 3.0 if you use an actual
    > unicode string in both.


    (1) Why specifically 2.6? Do you mean 2.X, or is this related to the
    "port to 2.6 first" theory?
    (2) We do assume we are starting off with working 2.X code, don't we?
    If we change "foo" to u"foo" and get a different answer from the 2.X
    code, is that still "working"?

    >
    > An example of this, though not with string constants,


    And therefore irrelevant.

    I would like to hear from someone who has actually started with
    working 2.x code and changed all their text-like "foo" to
    u"foo" [except maybe unlikely suspects like open()'s mode arg]:
    * how many places where the 2.x code broke and so did the 3.x code
    [i.e. the problem would have been detected without prepending u]
    * how many places where the 2.x code broke but the 3.x code didn't
    [i.e. prepending u did find the problem]
    * whether they thought it was worth the effort

    In the meantime I would be interested to hear from anybody with a made-
    up example of code where the problem would be detected (sooner |
    better | only) by prepending u to text-like string constants.

    > 2to3 can only do so
    > much; it can't always guess whether your string usage is supposed to
    > be character or binary.


    AFAICT it *always* guesses text rather than binary; do you have any
    examples where it guesses binary (rightly or wrongly)?
    John Machin, Jan 12, 2009
  10. John Machin

    On Jan 12, 11:05 pm, Christian Heimes <> wrote:
    > John Machin wrote:
    >
    > > And therefore irrelevant.

    >
    > No, Carl is talking about the very same issue.
    >
    > > I would like to hear from someone who has actually started with
    > > working 2.x code and changed all their text-like "foo" to
    > > u"foo" [except maybe unlikely suspects like open()'s mode arg]:
    > > * how many places where the 2.x code broke and so did the 3.x code
    > > [i.e. the problem would have been detected without prepending u]
    > > * how many places where the 2.x code broke but the 3.x code didn't
    > > [i.e. prepending u did find the problem]
    > > * whether they thought it was worth the effort

    >
    > Perhaps you would also like to hear from a developer who has worked on Python
    > 3.0 itself and who has done lots of work with internationalized
    > applications. If you want to get it right you must
    >
    > * decode incoming text data to unicode as early as possible
    > * use unicode for all internal text data
    > * encode outgoing unicode as late as possible.
    >
    > where incoming data is read from the file system, database, network etc.
    >
    > This rule applies not only to Python 3.0 but to *any* application
    > written in *any* language.


    The above is a story with which I'm quite familiar. However it is
    *not* the issue!! The issue is why would anyone propose changing a
    string constant "foo" in working 2.x code to u"foo"?

    > The urlopen example is a very good example
    > for the issue. The author didn't think of decoding the incoming bytes to
    > unicode. In Python 2.x it works fine as long as the site contains ASCII
    > only. In Python 3.0 however an error is raised because binary data is no
    > longer implicitly converted to unicode.


    All very true but nothing to do with the "foo" -> u"foo" issue.
    Somebody please come up with an example of how changing "foo" to
    u"foo" could help a port from 2.x working code to a single codebase
    that supports 2.x and 2to3ed 3.x.
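Christian's three rules quoted above (decode early, text inside, encode late) can be sketched as three small boundary functions. The function names, the whitespace/title-case normalisation step, and the UTF-8 encoding are all illustrative assumptions:

```python
def decode_input(data, encoding="utf-8"):
    """Boundary in: turn incoming bytes into text as early as possible."""
    return data.decode(encoding)


def normalize_name(name):
    """Internal logic only ever sees text."""
    return " ".join(name.split()).title()


def encode_output(text, encoding="utf-8"):
    """Boundary out: turn text back into bytes as late as possible."""
    return text.encode(encoding)


def handle(raw_record):
    """Bytes in, bytes out; everything in between is text."""
    return encode_output(normalize_name(decode_input(raw_record)))
```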
    John Machin, Jan 12, 2009
  11. John Machin

    On Jan 13, 12:06 am, Christian Heimes <> wrote:
    > >> Perhaps you would also like to hear from a developer who has worked on Python
    > >> 3.0 itself and who has done lots of work with internationalized
    > >> applications. If you want to get it right you must

    >
    > >> * decode incoming text data to unicode as early as possible
    > >> * use unicode for all internal text data
    > >> * encode outgoing unicode as late as possible.

    >
    > >> where incoming data is read from the file system, database, network etc.

    >
    > >> This rule applies not only to Python 3.0 but to *any* application
    > >> written in *any* language.

    >
    > > The above is a story with which I'm quite familiar. However it is
    > > *not* the issue!! The issue is why would anyone propose changing a
    > > string constant "foo" in working 2.x code to u"foo"?

    >
    > Do I really have to repeat "use unicode for all internal text data"?
    >
    > "foo" and u"foo" are two totally different things. The former is a byte
    > sequence "\x66\x6f\x6f" while the latter is the text 'foo'. It just
    > happens that "foo" and u"foo" are equal in Python 2.x because
    > "foo".decode("ascii") == u"foo". Python 3.x does it right: b"foo" is
    > unequal to "foo".
    >


    Again, all very true, but irrelevant. b"foo" is *not* involved.

    You're ignoring the effect of 2to3:

    Original 2.x code: assert "foo" == u"foo" # works
    output from 2to3: assert "foo" == "foo" # works

    Original 2.x code with u prepended: assert u"foo" == u"foo" # works
    output from 2to3: assert "foo" == "foo" # works

    I say again, show me a case of working 2.5 code where prepending u to
    an ASCII string constant that is intended to be used in a text context
    is actually worth the keystrokes.
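The comparisons argued over here can be checked on a 3.x interpreter. The u prefix was a syntax error in 3.0 to 3.2 and was only re-allowed in 3.3 (PEP 414), so this sketch assumes 3.3 or later:

```python
# Assumes Python 3.3+, where u"..." is accepted again and means plain str.
text = "foo"
unicode_literal = u"foo"
binary = b"foo"

same_type = type(text) is type(unicode_literal)    # both are str
text_equals_bytes = (text == binary)               # str never equals bytes in 3.x
decoded_equals = (binary.decode("ascii") == text)  # decode first, then compare
```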
    John Machin, Jan 12, 2009
  12. Carl Banks

    On Jan 12, 5:26 am, John Machin <> wrote:
    > On Jan 12, 7:29 pm, Carl Banks <> wrote:
    >
    > > On Jan 12, 12:32 am, John Machin <> wrote:

    >
    > > > On Jan 12, 12:23 pm, Carl Banks <> wrote:

    >
    > > > > On Jan 9, 6:11 pm, John Machin <> wrote:

    >
    > > > > > On Jan 10, 6:58 am, Carl Banks <> wrote:
    > > > > > > I expect that it'd be a PITA in some cases to use the transitional
    > > > > > > dialect (like getting all your Us in place), but that doesn't mean the
    > > > > > > language is crippled.

    >
    > > > > > What is this "transitional dialect"? What does "getting all your Us in
    > > > > > place" mean?

    >
    > > > > Transitional dialect is the subset of Python 2.6 that can be
    > > > > translated to Python 3 with the 2to3 tool.

    >
    > > > I'd never seen it called "transitional dialect" before.

    >
    > > I had hoped the context would make it clear what I was talking about.

    >
    > In vain.


    You were the one who was mistaken about what Steve and Cliff were talking
    about, chief. Maybe if you had paid better attention you would
    have gotten it?


    > > > >  Getting all your Us in place
    > > > > refers to prepending a u to strings to make them unicode objects,
    > > > > which is something 2to3 users are highly advised to do to keep hassles
    > > > > to a minimum.  (Getting Bs in place would be a good idea too.)

    >
    > > > Ummm ... I'm not understanding something. 2to3 changes u"foo" to
    > > > "foo", doesn't it? What's the point of going through the code and
    > > > changing all non-binary "foo" to u"foo" only so that 2to3 can rip the
    > > > u off again?

    >
    > > It does a bit more than that.

    >
    > Like what?


    Never mind; I was confusing it with a different tool. (Someone had a
    source code processing tool that replaced strings with their reprs a
    while back.) My bad.


    > > > What hassles? Who's doing the highly-advising where and
    > > > with what supporting argument?

    >
    > > You add the u so that the constant will be the same data type in 2.6 as
    > > it becomes in 3.0 after applying 2to3.  str and unicode objects don't
    > > always play smoothly with each other, and you have a much better chance
    > > of getting the same behavior in 2.6 and 3.0 if you use an actual
    > > unicode string in both.

    >
    > (1) Why specifically 2.6? Do you mean 2.X, or is this related to the
    > "port to 2.6 first" theory?


    It's not a theory. 2to3 was designed to translate a subset of 2.6
    code to 3.0. It's not designed to translate arbitrary 2.6 code, nor
    any 2.5 or lower code. It might work well enough from 2.5, but it
    wasn't designed for it.

    > (2) We do assume we are starting off with working 2.X code, don't we?
    > If we change "foo" to u"foo" and get a different answer from the 2.X
    > code, is that still "working"?


    Of course it's not "working" in 2.6, and that's the point: you want it
    to work in 2.6 with Unicode strings because it has to run in 3.0 with
    Unicode strings.


    > > An example of this, though not with string constants,

    >
    > And therefore irrelevant.


    Well, it wasn't from my viewpoint, which was "make sure you are using
    only unicode and bytes objects, never str objects". But if you want
    to talk about string constants specifically, OK.


    > I would like to hear from someone who has actually started with
    > working 2.x code and changed all their text-like "foo" to
    > u"foo" [except maybe unlikely suspects like open()'s mode arg]:
    > * how many places where the 2.x code broke and so did the 3.x code
    > [i.e. the problem would have been detected without prepending u]


    I think you're missing the point. This isn't merely about detecting
    errors; it's about making the code in 2.6 behave as similarly to 3.0
    as possible, and that includes internal behavior. When you have mixed
    str and unicode objects, 2.6 has to do a lot of encoding and decoding
    under the covers; in 3.0 that won't be happening. That increases the
    risk of divergent behavior, and is something you want to avoid.
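The internal difference described here can be seen from the Python 3 side. Where 2.x silently decodes a str (as ASCII) when it meets a unicode object, 3.x refuses to mix text and bytes, so the conversion has to be written out. The helper `join_text` is hypothetical, and it assumes undecoded input is UTF-8.

```python
def join_text(prefix, data):
    """Hypothetical helper: concatenate text with data that may still be bytes.

    Assumes undecoded input is UTF-8; use the real encoding of your source.
    """
    if isinstance(data, bytes):
        data = data.decode("utf-8")  # the explicit step 2.x used to do implicitly
    return prefix + data

# 3.x rejects the implicit mix that 2.x would have decoded behind the scenes.
try:
    "abc" + b"def"
except TypeError as exc:
    print("implicit mix rejected:", exc)

print(join_text("abc", b"def"))  # abcdef
print(join_text("abc", "def"))   # abcdef
```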

    If you think your test suite is invincible and can catch every
    possible edge case where some encoding or decoding mishap occurs, be
    my guest and don't do it.

    Also, I'm not sure why you think it's preferable to run tests on 3.0
    and then, to fix each failure, go back to the 2.6 codebase, run 2to3
    again, apply the patch again, and retest. I don't know, maybe it makes
    sense for people with a timemachine.py module, but I don't think it'll
    make sense for most people.


    > * how many places where the 2.x code broke but the 3.x code didn't
    > [i.e. prepending u did find the problem]


    If you think this was the main benefit of doing that you are REALLY
    missing the point. The point isn't to find problems in 2.6, it's to
    modify 2.6 to behave as similarly to 3.0 as possible.


    > * whether they thought it was worth the effort
    >
    > In the meantime I would be interested to hear from anybody with a made-
    > up example of code where the problem would be detected (sooner |
    > better | only) by prepending u to text-like string constants.


    Here's one for starters. The mistake was using a multibyte character
    in a str object in 2.6. 2to3 would have converted this to a script
    that has different behavior. If the u"" had been present on the
    string it would have the same behavior in both 2.6 and 3.0. (Well,
    the repr is different but it's a repr of the same object in both.)

    # coding: utf-8
    print repr("abcd¥")
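To see why the two versions diverge, compare what each one stores for that literal. Under the coding declaration a 2.6 str holds the UTF-8 bytes of the source file, while the 3.0 str that 2to3 produces holds characters. This Python 3 sketch uses `.encode("utf-8")` as a stand-in for the 2.6 byte string, assuming the source file really is saved as UTF-8.

```python
text = "abcd¥"                   # the str 2to3's output holds in 3.x: five characters
py26_str = text.encode("utf-8")  # stand-in for the 2.6 str literal: the raw UTF-8 bytes

print(len(text), len(py26_str))  # 5 6
print(py26_str)                  # b'abcd\xc2\xa5'
print(py26_str[-1:])             # b'\xa5': only half of the two-byte yen sign
print(text[-1] == "¥")           # True: the last character is the whole yen sign
```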

    Out of curiosity, do


    > > 2to3 can only do so
    > > much; it can't always guess whether your string usage is supposed to
    > > be character or binary.

    >
    > AFAICT it *always* guesses text rather than binary; do you have any
    > examples where it guesses binary (rightly or wrongly)?


    Again, not the point. It's not whether 2to3 guesses correctly, but
    whether the runtime does different things in the two versions.


    Carl Banks
    Carl Banks, Jan 12, 2009
    #12
  13. Chris Rebert

    John Machin Guest

    On Jan 13, 6:12 am, Christian Heimes <> wrote:
    > > I say again, show me a case of working 2.5 code where prepending u to
    > > an ASCII string constant that is intended to be used in a text context
    > > is actually worth the keystrokes.

    >
    > Eventually you'll learn it the hard way. *sigh*


    And the hard way involves fire and brimstone, together with weeping,
    wailing and gnashing of teeth, correct? Hmmm, let's see. Let's take
    Carl's example of the sinner who didn't decode the input: """ Someone
    found that urllib.open() returns a bytes object in Python 3.0, which
    messed him up since in 2.x he was running regexp searches on the
    output. If he had been taking care to use only unicode objects in 2.x
    (in this case, by explicitly decoding the output) then it wouldn't
    have been an issue. """

    3.0 says:
    | >>> re.search("foo", b"barfooble")
    | Traceback (most recent call last):
    |   File "<stdin>", line 1, in <module>
    |   File "C:\python30\lib\re.py", line 157, in search
    |     return _compile(pattern, flags).search(string)
    | TypeError: can't use a string pattern on a bytes-like object

    The problem is diagnosed at the point of occurrence with a 99%-OK
    exception message. Why only 99%? Because it only vaguely hints at this
    possibility:
    | >>> re.search(b"foo", b"barfooble")
    | <_sre.SRE_Match object at 0x00FACD78>

    Obvious solution (repent and decode):
    | >>> re.search("foo", b"barfooble".decode('ascii'))
    | <_sre.SRE_Match object at 0x00FD86B0>
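Here is the same session as a small Python 3 script. `body` is a stand-in for the bytes a 3.x HTTP response returns. The wording of the TypeError message differs between 3.x releases, so the script prints it rather than relying on it.

```python
import re

body = b"barfooble"  # stand-in for a bytes response body

# A str pattern applied to bytes raises TypeError in Python 3.
try:
    re.search("foo", body)
except TypeError as exc:
    print("TypeError:", exc)

# Either match bytes with a bytes pattern...
print(re.search(b"foo", body).group())                 # b'foo'
# ...or decode first and keep working with text.
print(re.search("foo", body.decode("ascii")).group())  # foo
```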

    This is "messed him up"? One can get equally "messed up" when AFAICT
    one is doing the right thing in 2.X e.g. one is digging XML documents
    out of a ZIP file (str in 2.X, bytes in 3.x). ElementTree.parse()
    requires a file, so in 2.X one uses cStringIO.StringIO. 2to3 changes
    that to io.StringIO [quite reasonable; no easy way of knowing if
    BytesIO would be better; StringIO more probable]. 3.X barfs on the
    io.StringIO(xml_bytes) with a reasonable message "TypeError: can't
    write bytes to text stream" [momentarily puzzling -- write? Oh yeah,
    it happens in self.write(initial_value)] so there's a need to set up a
    BytesIO that's conditional on Python version -- not a big deal at all.
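A self-contained Python 3 sketch of the ZIP/ElementTree case. The archive is built in memory purely for illustration, the member name `doc.xml` is made up, and the version-conditional 2.x setup is left out. It shows `io.StringIO` rejecting the bytes and `io.BytesIO` handing ElementTree a file object it can parse.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Build a small ZIP archive in memory; real code would open a file on disk.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("doc.xml", '<?xml version="1.0" encoding="utf-8"?><root><item>hello</item></root>')

# Reading a member yields bytes in 3.x (it was a str in 2.x).
with zipfile.ZipFile(archive) as zf:
    xml_bytes = zf.read("doc.xml")

# io.StringIO accepts only text, so wrapping the bytes raises TypeError.
try:
    io.StringIO(xml_bytes)
except TypeError as exc:
    print("StringIO rejected bytes:", exc)

# io.BytesIO wraps the bytes as a binary file object for ElementTree.
tree = ET.parse(io.BytesIO(xml_bytes))
print(tree.getroot().find("item").text)  # hello
```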

    I see no /Inferno/ here :)
    John Machin, Jan 12, 2009
    #13
  14. Chris Rebert

    Carl Banks Guest

    On Jan 12, 5:03 pm, John Machin <> wrote:
    > On Jan 13, 6:12 am, Christian Heimes <> wrote:
    >
    > > > I say again, show me a case of working 2.5 code where prepending u to
    > > > an ASCII string constant that is intended to be used in a text context
    > > > is actually worth the keystrokes.

    >
    > > Eventually you'll learn it the hard way. *sigh*

    >
    > And the hard way involves fire and brimstone, together with weeping,
    > wailing and gnashing of teeth, correct? Hmmm, let's see. Let's take
    > Carl's example of the sinner who didn't decode the input: """ Someone
    > found that urllib.open() returns a bytes object in Python 3.0, which
    > messed him up since in 2.x he was running regexp searches on the
    > output.  If he had been taking care to use only unicode objects in 2.x
    > (in this case, by explicitly decoding the output) then it wouldn't
    > have been an issue. """
    >
    > 3.0 says:
    > | >>> re.search("foo", b"barfooble")
    > | Traceback (most recent call last):
    > |   File "<stdin>", line 1, in <module>
    > |   File "C:\python30\lib\re.py", line 157, in search
    > |     return _compile(pattern, flags).search(string)
    > | TypeError: can't use a string pattern on a bytes-like object
    >
    > The problem is diagnosed at the point of occurrence with a 99%-OK
    > exception message. Why only 99%? Because it only vaguely hints at this
    > possibility:
    > | >>> re.search(b"foo", b"barfooble")
    > | <_sre.SRE_Match object at 0x00FACD78>
    >
    > Obvious solution (repent and decode):
    > | >>> re.search("foo", b"barfooble".decode('ascii'))
    > | <_sre.SRE_Match object at 0x00FD86B0>
    >
    > This is "messed him up"?


    If you believe in waiting for bugs to occur and then fixing them,
    rather than programming to avoid bugs, there is no helping you.

    P.S. The "obvious" solution is wrong, although I'm not sure if you
    were making some kind of ironic point.
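The P.S. does not say what is wrong with the "obvious" fix. One plausible reading is that hard-coding 'ascii' only works while the response happens to be pure ASCII. A sketch, assuming the server declares its encoding in the Content-Type header. In 3.x a urllib.request response exposes that header as an `email.message.Message` subclass, so `response.headers.get_content_charset()` is available. `decode_body` is a hypothetical helper, and the Latin-1 body is made-up sample data.

```python
import re
from email.message import Message

def decode_body(raw, content_type, default="utf-8"):
    """Hypothetical helper: decode a body with the charset named in its
    Content-Type header, falling back to `default` when none is declared."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    return raw.decode(charset)

# Stand-in for a Latin-1 response body containing one non-ASCII byte.
raw = "café foo".encode("latin-1")

# Hard-coding 'ascii' fails as soon as the data is not pure ASCII.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii decode failed:", exc.reason)

text = decode_body(raw, "text/html; charset=ISO-8859-1")
print(re.search("foo", text).start())  # 5
```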


    Carl Banks
    Carl Banks, Jan 13, 2009
    #14
  15. > Perhaps you also like to hear from a developer who has worked on
    > Python 3.0 itself and who has done lots of work with internationalized
    > applications. If you want to get it right you must
    >
    > * decode incoming text data to unicode as early as possible
    > * use unicode for all internal text data
    > * encode outgoing unicode as late as possible.
    >
    > where incoming data is read from the file system, database, network
    > etc.


    Amen to that. I hate all those apps/libs that only work with ASCII.
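The three rules quoted above map onto a small Python 3 pipeline. `process_stream` is a hypothetical function and UTF-8 is an assumed encoding. The point is where decoding and encoding happen, not the processing step itself. The last line shows what an ASCII-minded shortcut on raw bytes does instead.

```python
import io

def process_stream(infile, outfile, encoding="utf-8"):
    """Hypothetical pipeline: bytes in, str inside, bytes out."""
    text = infile.read().decode(encoding)   # decode incoming data as early as possible
    result = text.upper()                   # all internal work happens on str objects
    outfile.write(result.encode(encoding))  # encode outgoing text as late as possible

src = io.BytesIO("café".encode("utf-8"))
dst = io.BytesIO()
process_stream(src, dst)
print(dst.getvalue())                  # b'CAF\xc3\x89'

# Upper-casing the raw bytes instead only touches ASCII letters.
print("café".encode("utf-8").upper())  # b'CAF\xc3\xa9'
```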





    --
    дамјан ( http://softver.org.mk/damjan/ )

    Spammers scratch here with a diamond to find my address:
    |||||||||||||||||||||||||||||||||||||||||||||||
    Дамјан Георгиевски, Jan 13, 2009
    #15

Similar Threads
  1. Mikael Östberg
    Replies: 5, Views: 447
    Kevin Spencer, Oct 22, 2005
  2. Christian Seberino
    Replies: 0, Views: 289
    Christian Seberino, Oct 21, 2003
  3. Switch-case problem
    Chih-Hsu Yen, Apr 12, 2005, in forum: C Programming
    Replies: 10, Views: 560
    Chih-Hsu Yen, Apr 13, 2005
  4. switch problem - interesting !!
    onkar, Dec 15, 2006, in forum: C Programming
    Replies: 1, Views: 228
    Richard Heathfield, Dec 15, 2006
  5. Switch Within A Switch
    Apr 22, 2006, in forum: Javascript
    Replies: 7, Views: 96
    Lasse Reichstein Nielsen, Apr 22, 2006