Hebrew in IDLE and Eclipse (Windows)

Discussion in 'Python' started by iu2, Jan 16, 2008.

  1. iu2

    iu2 Guest

    Hi all,

    I'd really appreciate your help with this:

    I read data from a database containing Hebrew words.
    When the application is run from IDLE a word looks like this, for
    example:
    \xe8\xe9\xe5

    But when I run the same application from eclipse or the Windows shell
    I get the 'e's replaced with '8's:
    \x88\x89\x85

    The IDLE way is the way I need, since Hebrew words typed into the
    program text itself (as keys in a dict, for example) yield similar
    strings (with 'e's). When running from Eclipse I get a KeyError for
    this dict.
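A minimal Python 3 sketch of why the lookup fails, assuming the two byte strings above are the same three Hebrew letters in two different code pages (cp1255 and cp856, the ones identified later in this thread); the dict and its value are made up for illustration. Byte strings compare byte for byte, so the keys only match once both sides are decoded to text.

```python
# Hypothetical dict keyed by a Hebrew word typed into the source (cp1255 bytes).
ansi_key = b"\xe8\xe9\xe5"    # what IDLE shows
oem_value = b"\x88\x89\x85"   # what Eclipse / the Windows shell shows

names = {ansi_key: "found"}

# The raw bytes differ, so this lookup misses (the KeyError from the post).
print(oem_value in names)  # False

# Decoding each side with its own code page yields the same text, so it matches.
text_names = {ansi_key.decode("cp1255"): "found"}
print(oem_value.decode("cp856") in text_names)  # True
```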

    What do I need to do to run my app the way IDLE does?

    Thanks
    iu2
     
    iu2, Jan 16, 2008
    #1

  2. > What do I need to do run my app like IDLE does?

    Can you please show the fragment of your program that prints
    these strings?

    Regards,
    Martin
     
    Martin v. Löwis, Jan 17, 2008
    #2

  3. iu2

    iu2 Guest

    On Jan 17, 6:59 am, "Martin v. Löwis" <> wrote:
    > > What do I need to do run my app like IDLE does?

    >
    > Can you please show the fragment of your program that prints
    > these strings?
    >
    > Regards,
    > Martin


    Hi,
    I use pymssql to get the data from a database, just like this (this is
    from the pymssql doc):

    import pymssql

    con = pymssql.connect(host='192.168.13.122', user='sa', password='', database='tempdb')
    cur = con.cursor()
    cur.execute('select firstname, lastname from [users]')
    lines = cur.fetchall()

    print lines

    or

    print lines[0]

    'lines' is a list containing tuples of 2 values, for firstname and
    lastname. The names are Hebrew, and their bytes look different when I'm
    running it from IDLE than when running it from the Windows shell or
    Eclipse, as I described in my first post.


    Important: this doesn't happen when I read text from a file containing
    Hebrew text. In that case both IDLE and Eclipse give the same result
    (the Hebrew word itself is printed to the console).
     
    iu2, Jan 17, 2008
    #3
  4. > 'lines' is a list containing tuples of 2 values, for firstname and
    > lastname. The names are Hebrew and their code looks different when I'm
    > runnig it from IDLE than when running it from Windows shell or
    > eclipse, as I described in my first post.


    Ok. Please understand that there are different ways to represent
    characters as bytes; these different ways are called "encodings".

    Please also understand that you have to make a choice of encoding
    every time you convert between characters and bytes: when you read them
    from a database, and when you print them to a file or to the terminal.

    Please further understand that interpreting bytes in an encoding
    different from the one they were meant for results in a phenomenon
    called "mojibake" (Japanese for "character transformation"). You get
    some text, but it makes no sense (or individual characters are incorrect).
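To make the idea concrete, here is a small Python 3 sketch (not from the thread) that reads the same three bytes with the two code pages discussed later on. Decoding with cp856 gives Hebrew letters; decoding with cp1255 also "works" but gives unrelated symbols, which is exactly mojibake. Code points are printed instead of glyphs so the output survives any console encoding.

```python
# The same three bytes interpreted with two different code pages.
raw = b"\x88\x89\x85"

as_oem = raw.decode("cp856")    # Hebrew letters
as_ansi = raw.decode("cp1255")  # non-Hebrew symbols: mojibake

# Show code points rather than glyphs so printing never fails on the console.
print([hex(ord(c)) for c in as_oem])
print([hex(ord(c)) for c in as_ansi])
```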

    So you need to find out
    a) what the encoding is that your data have in MS SQL Server
    b) what the encoding is that is used when printing in IDLE
    c) what the encoding is that is used when printing into
    a terminal window.

    b) and c) are different on Windows: the b) encoding is called
    the "ANSI code page", and c) is called the "OEM code page".
    Which code pages these are depends on your Windows
    version and local system settings.
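One way to see which code pages a given machine uses is to ask Windows directly. The sketch below assumes Python 3 and calls the Win32 functions `GetACP` and `GetOEMCP` through `ctypes`; the helper name `windows_code_pages` is hypothetical, and it returns `None` on other platforms. Per this thread, a Hebrew Windows install would be expected to report 1255 for the ANSI code page.

```python
import ctypes
import sys


def windows_code_pages():
    """Return (ANSI, OEM) code page numbers, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    # GetACP / GetOEMCP return the active ANSI and OEM code page numbers.
    return kernel32.GetACP(), kernel32.GetOEMCP()


pages = windows_code_pages()
if pages is None:
    print("not running on Windows")
else:
    ansi, oem = pages
    print("ANSI code page: cp%d, OEM code page: cp%d" % (ansi, oem))
```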

    As for a): that's a choice somebody made when the database
    was created; I don't know how to figure out what encoding
    MS SQL Server uses.

    In principle, rather than doing

    print lines[0]

    you should decode each string field of the row when printing to the
    console (lines[0] is a tuple, so decode its elements), e.g.

    print lines[0][0].decode("<a-encoding>").encode("<c-encoding>")

    Fortunately, you can also write this as

    print lines[0][0].decode("<a-encoding>")

    as Python will figure out the console encoding by itself, but
    it can't figure out the MS SQL Server encoding (or at least doesn't,
    the way you use it).
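Since `fetchall()` returns tuples, the decoding has to be applied field by field. A minimal Python 3 sketch, assuming the driver hands back raw `bytes` for text columns; `decode_row` is a hypothetical helper, and `"cp856"` merely stands in for the `<a-encoding>` placeholder above. Non-bytes values (numbers, `None`) are passed through unchanged.

```python
def decode_row(row, encoding):
    """Decode every bytes field of a DB-API row tuple; leave other values alone."""
    return tuple(
        field.decode(encoding) if isinstance(field, bytes) else field
        for field in row
    )


# Hypothetical row as a driver might return it: (firstname, lastname) as raw bytes.
row = (b"\x88\x89\x85", b"\x88\x89\x85")
decoded = decode_row(row, "cp856")  # "cp856" stands in for "<a-encoding>"

# ascii() escapes the Hebrew so this prints on any console.
print([ascii(name) for name in decoded])
```

Decoding once, right where the data leaves the database, keeps the rest of the program working with text instead of code-page-specific bytes.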

    Regards,
    Martin
     
    Martin v. Löwis, Jan 17, 2008
    #4
  5. iu2

    iu2 Guest

    Thanks for the detailed explanation. I'll try that.
     
    iu2, Jan 20, 2008
    #5
  6. iu2

    iu2 Guest

    On Jan 17, 10:35 pm, "Martin v. Löwis" <> wrote:
    > ...
    > print lines[0].decode("<a-encoding>").encode("<c-encoding>")
    > ...
    > Regards,
    > Martin


    Ok, I've got the solution, but I still have a question.

    Recall:
    When I read data using sql I got a sequence like this:
    \x88\x89\x85
    But when I entered Hebrew words directly in the print statement (or as
    a dictionary key),
    I got this:
    \xe8\xe9\xe5

    Now, scanning the encodings package, I discovered that cp1255 maps
    '\u05d9' to \xe9
    while cp856 maps '\u05d9' to \x89,
    so transforming \x88\x89\x85 to \xe8\xe9\xe5 is done by

    s.decode('cp856').encode('cp1255')

    ending up with the pattern you suggested.
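For reference, the same transformation in Python 3 syntax, where `decode` lives on `bytes` and `encode` on `str`. The function name `oem_to_ansi` is made up for this sketch, and the two code pages are the ones found above; going through a Unicode string in the middle is what makes the re-encoding work.

```python
def oem_to_ansi(data: bytes) -> bytes:
    """Re-encode Hebrew text from cp856 bytes to cp1255 bytes via Unicode."""
    text = data.decode("cp856")   # bytes -> Unicode, using the source code page
    return text.encode("cp1255")  # Unicode -> bytes, using the target code page


print(oem_to_ansi(b"\x88\x89\x85"))  # b'\xe8\xe9\xe5'
```

If the goal is only matching dict keys, it may be simpler to stop at the decoded text and key the dict by Unicode strings instead of re-encoding.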

    My question is: is there a way I can deduce cp856 and cp1255 from the
    string itself? Is there a function that does it? (That would make the
    transformation more robust.)

    I don't know how IDLE guessed cp856, but it must have done it.
    (perhaps because it uses tcl, and maybe tcl guesses the encoding
    automatically?)

    thanks
    iu2
     
    iu2, Jan 22, 2008
    #6
  7. > Recall:
    > When I read data using sql I got a sequence like this:
    > \x88\x89\x85
    > But when I entered heberw words directly in the print statement (or as
    > a dictionary key)
    > I got this:
    > \xe8\xe9\xe5
    >
    > Now, scanning the encoding module I discovered that cp1255 maps
    > '\u05d9' to \xe9
    > while cp856 maps '\u05d9' to \x89,
    > so trasforming \x88\x89\x85 to \xe8\xe9\xe5 is done by


    Hebrew Windows apparently uses cp1255 (aka windows-1255) as
    the "ANSI code page", used in all GUI APIs, and cp856 as the
    "OEM code page", used in terminal window - and, for some reason,
    in MS SQL.

    > My qestion is, is there a way I can deduce cp856 and cp1255 from the
    > string itself?


    That's not possible. You have to know where the string comes from
    to know what its encoding is.

    In the specific case, if the string comes out of MS SQL, it apparently
    has cp856 (but I'm sure you can specify the client encoding somewhere
    in SQL server, or in pymssql)
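A quick Python 3 sketch of why the encoding can't be deduced from the bytes alone: the same three bytes decode without any error under several single-byte code pages, each giving different text. cp856 and cp1255 come from this thread; latin-1 is added only because it accepts every byte value.

```python
raw = b"\x88\x89\x85"

# None of these raises an exception, so a successful decode proves nothing.
decoded = {codec: raw.decode(codec) for codec in ("cp856", "cp1255", "latin-1")}

for codec, text in decoded.items():
    # Show code points rather than glyphs so the output is readable on any console.
    print(codec, [hex(ord(c)) for c in text])
```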

    > I don't know how IDLE guessed cp856, but it must have done it.


    I don't know why you think it did. You said you entered \xe9 directly
    into the source code in IDLE, so
    a) this is windows-1255, not cp856, and
    b) IDLE just *used* windows-1255 (i.e. the ANSI code page), it did
    not guess it.

    If you are claiming that the program

    import pymssql

    con = pymssql.connect(host='192.168.13.122', user='sa', password='', database='tempdb')
    cur = con.cursor()
    cur.execute('select firstname, lastname from [users]')
    lines = cur.fetchall()
    print repr(lines[0])

    does different things depending on whether it is run in IDLE or in a
    terminal window - I find that hard to believe. IDLE/Tk has nothing to
    do with that. It's the *repr* that you are printing, i.e. all escaping
    has been done before IDLE/Tk even sees the text. So it must have been
    pymssql that returns different data in each case.
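The argument can be checked directly: `repr()` already turns every non-ASCII byte into a backslash escape, so the display layer only ever receives plain ASCII. A Python 3 sketch (Python 3's `repr` of bytes adds a `b` prefix that Python 2's `str` repr did not have):

```python
data = b"\xe8\xe9\xe5"

shown = repr(data)
# repr() has already escaped every non-ASCII byte, so whatever window
# displays this string only ever sees plain ASCII characters.
print(shown)            # b'\xe8\xe9\xe5'
print(shown.isascii())  # True
```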

    It could be that the DB-API does such things, see

    http://msdn2.microsoft.com/en-us/library/aa937147(SQL.80).aspx

    Apparently, they do the OEMtoANSI conversion when you run a console
    application (i.e. python.exe), whereas they don't convert when running
    a GUI application (pythonw.exe).

    I'm not quite sure how they find out whether the program is a console
    application or not; the easiest thing to do might be to turn the
    autoconversion off on the server.


    Regards,
    Martin
     
    Martin v. Löwis, Jan 23, 2008
    #7
  8. iu2

    iu2 Guest

    On Jan 23, 11:17 am, "Martin v. Löwis" <> wrote:

    > If you are claimaing that the program
    >
    > Apparently, they do the OEMtoANSI conversion when you run a console
    > application (i.e. python.exe), whereas they don't convert when running
    > a GUI application (pythonw.exe).
    >
    > I'm not quite sure how they find out whether the program is a console
    > application or not; the easiest thing to do might be to turn the
    > autoconversion off on the server.
    >
    > Regards,
    > Martin


    True! It's amazing: I've just written a little code that reads from
    the database and writes the data to a file.
    Then I ran the code with both python.exe and pythonw.exe and got the
    two kinds of results, the IDLE one and the Eclipse one!
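A sketch of that kind of diagnostic, in Python 3 syntax: write the escaped form of each fetched row to a file named after the running interpreter, so the python.exe and pythonw.exe runs can be compared side by side (pythonw.exe has no console to print to). Both function names and the file-name scheme are hypothetical; `ascii()` is used so the file contains only escapes, never raw Hebrew.

```python
import os
import sys


def format_rows(rows):
    """One ascii() line per row: every non-ASCII byte or character is escaped."""
    return "".join(ascii(row) + "\n" for row in rows)


def dump_rows(rows, path=None):
    """Write the rows to a file named after the running interpreter executable."""
    if path is None:
        # e.g. "rows-python.txt" under python.exe, "rows-pythonw.txt" under pythonw.exe
        exe = os.path.splitext(os.path.basename(sys.executable))[0]
        path = "rows-%s.txt" % exe
    with open(path, "w", encoding="ascii") as f:
        f.write(format_rows(rows))
    return path
```

Usage would be `dump_rows(cur.fetchall())` right after the query, run once under each interpreter, followed by a diff of the two files.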
     
    iu2, Jan 23, 2008
    #8
