unicode regex example: trouble

Discussion in 'Python' started by marek, May 21, 2004.

  1. marek

    marek Guest

    I'm trying to get this example to print a MatchObject reference, but it fails (prints None).
    Does anybody know what I am doing wrong?

    # -*- coding: cp1251 -*-

    import re

    # pattern in Ukrainian ('привіт')
    p = '\377\376?\004@\0048\0042\004V\004B\004'

    # data (pattern is in the middle of the string)
    d = '\377\376t\000e\000s\000t\000?\004@\0048\0042\004V\004B\004t\000t\000'

    re_test = re.compile(p, re.UNICODE)

    print re_test.search(d, re.UNICODE)
    marek, May 21, 2004

  2. Peter Otten

    Peter Otten Guest

    marek wrote:

    > I'm trying to get this example to print a MatchObject reference, but it fails
    > (prints None). Does anybody know what I am doing wrong?
    >
    > # -*- coding: cp1251 -*-
    >
    > import re
    >
    > # pattern in Ukrainian ('привіт')
    > p = '\377\376?\004@\0048\0042\004V\004B\004'
    >
    > # data (pattern is in the middle of the string)
    > d = '\377\376t\000e\000s\000t\000?\004@\0048\0042\004V\004B\004t\000t\000'
    >
    > re_test = re.compile(p, re.UNICODE)
    >
    > print re_test.search(d, re.UNICODE)


    What you have here are byte strings of funny 8-bit characters, not unicode objects:

    >>> print p, d
    ÿþ?@82VB ÿþtest?@82VBtt
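
Two details of the original snippet may explain the None on their own. This is a minimal sketch in Python 3 syntax (the thread's code is Python 2, so the `b'...'` literals and `print()` calls are adaptations, and the names `raw_match`, `literal`, `byte_match` and `pos_match` are made up for illustration). First, the low byte of U+043F is `?`, which the regex engine reads as a quantifier rather than a literal byte. Second, the second positional argument of a compiled pattern's `search()` is a start position, not a flags value, and `re.UNICODE` is numerically 32, which is past the end of the 26-byte data.

```python
import re

# The same byte sequences as in the original post (UTF-16 with a leading BOM).
p = b'\377\376?\004@\0048\0042\004V\004B\004'
d = b'\377\376t\000e\000s\000t\000?\004@\0048\0042\004V\004B\004t\000t\000'

# 1. Used as-is, '?' makes the preceding byte optional and the BOM is part
#    of the pattern, so nothing in the data matches.
raw_match = re.search(p, d)

# Dropping the BOM and escaping the metacharacters gives a literal byte search.
literal = re.compile(re.escape(p[2:]))
byte_match = literal.search(d)

# 2. Passing re.UNICODE as the second argument of search() sets the start
#    position to 32, beyond the end of the data.
pos_match = literal.search(d, int(re.UNICODE))

print(raw_match)                   # None
print(byte_match.span())           # (10, 22)
print(len(d), int(re.UNICODE))     # 26 32
print(pos_match)                   # None
```

Matching on the raw bytes is shown only to illustrate the failure; a byte-level search over UTF-16 data can in principle match at an odd offset, so decoding first is the safer route.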

    I guess the encoding is UTF-16 (the leading \377\376 looks like a byte order mark), therefore:

    >>> du = d.decode("utf-16")
    >>> pu = p.decode("utf-16")
    >>> r = re.compile(pu)
    >>> m = r.search(du)
    >>> m
    <_sre.SRE_Match object at 0x40392090>
    >>> print m.group(0).encode("utf-16")
    ÿþ?@82VB

    Works as expected :)
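
For readers on Python 3, the same decode-then-search approach might look like the sketch below. It assumes the data arrives as `bytes`; the names `pu` and `du` follow the session above, and wrapping the pattern in `re.escape()` is an addition here so that any regex metacharacters in the search text are treated literally.

```python
import re

# UTF-16 bytes from the original post; the leading \377\376 is the BOM.
p = b'\377\376?\004@\0048\0042\004V\004B\004'
d = b'\377\376t\000e\000s\000t\000?\004@\0048\0042\004V\004B\004t\000t\000'

# Decoding consumes the BOM and yields ordinary text strings.
pu = p.decode("utf-16")
du = d.decode("utf-16")

# Search in text space, where one Cyrillic letter is one character.
m = re.compile(re.escape(pu)).search(du)

print(pu)          # привіт
print(du)          # testпривітtt
print(m.span())    # (4, 10)
```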

    Here's what the docs say about the unicode flag:

    UNICODE
    Make \w, \W, \b, and \B dependent on the Unicode character properties
    database. New in version 2.0.

    You may or may not need that when you refine your regexp.
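
To see what the flag changes, here is a small sketch in Python 3, where the situation is reversed: `str` patterns are Unicode-aware by default, and `re.ASCII` gives roughly the behaviour Python 2 had without `re.UNICODE`. The sample string and variable names are made up for illustration.

```python
import re

text = "test привіт"

# Default for str patterns in Python 3: \w follows Unicode character properties.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the Cyrillic word is skipped.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)   # ['test', 'привіт']
print(ascii_words)     # ['test']
```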

    Peter
    Peter Otten, May 21, 2004
