Regex with ASCII and non-ASCII chars

Discussion in 'Python' started by TOXiC, Jan 31, 2007.

  1. TOXiC

    TOXiC Guest

    Hello everybody.
    How can I do a regex match on a string containing both ASCII and non-ASCII
    characters? For example:

    import re

    regex = re.compile(r"(ÿÿ‹ð…öÂty)", re.IGNORECASE)
    match = regex.search("ÿÿ‹ð…öÂty")
    if match:
        result = match.group()
        print result
    else:
        result = "No match found"
        print result

    It returns "No match found" even though the two strings are equal.
    Please help!
    Thanks in advance :)
    TOXiC, Jan 31, 2007
    #1

  2. Peter Otten

    Peter Otten Guest

    TOXiC wrote:

    > How can I do a regex match on a string containing both ASCII and non-ASCII
    > characters? For example:
    >
    > regex = re.compile(r"(ÿÿ‹ð…öÂty)", re.IGNORECASE)
    > match = regex.search("ÿÿ‹ð…öÂty")
    > if match:
    >     result = match.group()
    >     print result
    > else:
    >     result = "No match found"
    >     print result
    >
    > It returns "No match found" even though the two strings are equal.


    For equal strings you should get a match:

    >>> re.compile("Zäöü", re.IGNORECASE).search("yadda zäöü yadda")
    <_sre.SRE_Match object at 0x401e0a68>
    >>> print _.group()
    zäöü
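A minimal Python 3 sketch of why the byte-string version behaves as it does (the thread itself is Python 2, where plain literals without u"" are byte strings). It assumes the non-ASCII letters are Latin-1 encoded ("ä" = 0xE4, "Ä" = 0xC4). With a bytes pattern, re.IGNORECASE folds only ASCII letters, so "Z" matches "z" while the non-ASCII bytes must be identical.

```python
import re

# Byte strings, like Python 2 literals without the u"" prefix.
# Assumption: Latin-1 encoding, so "ä" = 0xE4, "ö" = 0xF6, "ü" = 0xFC, "Ä" = 0xC4.
pattern = re.compile(b"Z\xe4\xf6\xfc", re.IGNORECASE)

# "Z" and "z" differ only in ASCII case and the other bytes are identical,
# so this matches.
ascii_fold = pattern.search(b"yadda z\xe4\xf6\xfc yadda")
print(ascii_fold.group())  # b'z\xe4\xf6\xfc'

# Case folding for bytes patterns is ASCII-only: 0xE4 ("ä") does not match 0xC4 ("Ä").
latin1_fold = re.search(b"\xe4", b"\xc4", re.IGNORECASE)
print(latin1_fold)  # None
```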

    For case-insensitive matching of non-ASCII characters your best bet is unicode:

    >>> re.compile(u"äöü", re.IGNORECASE|re.UNICODE).search(u"ÄÖÜ")
    <_sre.SRE_Match object at 0x401e09f8>
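For readers on Python 3, a short sketch of the same idea: str patterns are Unicode-aware by default, so the u"" prefix and re.UNICODE flag from the Python 2 session are no longer required. Adding re.ASCII switches case folding back to ASCII-only, which is shown for contrast.

```python
import re

# Python 3 str patterns use Unicode case folding by default.
m = re.compile("äöü", re.IGNORECASE).search("ÄÖÜ")
print(m.group())  # ÄÖÜ

# re.ASCII restricts case-insensitive matching to ASCII letters,
# so "ä" no longer matches "Ä".
m_ascii = re.compile("äöü", re.IGNORECASE | re.ASCII).search("ÄÖÜ")
print(m_ascii)  # None
```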

    Peter
    Peter Otten, Jan 31, 2007
    #2

  3. TOXiC

    TOXiC Guest

    Thanks, it works perfectly.
    What if I want to search a file stream?

    file = open(fileName, "r")
    text = file.read()
    file.close()

    regex = re.compile(u"(ÿÿ‹ð…öÂ)", re.IGNORECASE)
    match = regex.search(text)
    if match:
        result = match.group()
        print result
        WritePatch(fileName, text, result)
    else:
        result = "No match found"
        print result

    It returns "No match found" (the file does contain the string "ÿÿ‹ð…öÂ",
    but...).
    Thanks in advance for the help!
    TOXiC, Jan 31, 2007
    #3
  4. Peter Otten

    Peter Otten Guest

    TOXiC wrote:

    > Thanks, it works perfectly.
    > What if I want to search a file stream?
    >
    > file = open(fileName, "r")
    > text = file.read()
    > file.close()


    Convert the bytes read from the file to unicode. For that you have to know
    the encoding, e.g.:

    file_encoding = "utf-8" # replace with the actual encoding
    text = text.decode(file_encoding)
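The decode step can be sketched in Python 3, where passing encoding= to open() does the equivalent of text.decode(file_encoding). The helper name search_text_file is hypothetical, and the demo assumes a UTF-8 file; as the reply says, the real encoding of the file has to be known.

```python
import os
import re
import tempfile


def search_text_file(path, pattern, encoding):
    """Hypothetical helper: decode a file with a known encoding, then search it."""
    # Text mode with an explicit encoding returns str (Unicode),
    # so a str pattern can be matched against it directly.
    with open(path, "r", encoding=encoding) as f:
        text = f.read()
    return re.search(pattern, text, re.IGNORECASE)


# Self-contained demo: write a small UTF-8 file, then search it.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("yadda ÄÖÜ yadda\n")
    path = tmp.name

try:
    match = search_text_file(path, "äöü", encoding="utf-8")
    print(match.group() if match else "No match found")  # ÄÖÜ
finally:
    os.remove(path)
```

If the file is opened with the wrong encoding, the decoded text no longer contains the characters the pattern expects, which is the same failure the thread runs into.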

    > regex = re.compile(u"(ÿÿ‹ð…öÂ)", re.IGNORECASE)
    > match = regex.search(text)
    > if match:
    >     result = match.group()
    >     print result
    >     WritePatch(fileName, text, result)
    > else:
    >     result = "No match found"
    >     print result
    >
    > It returns "No match found" (the file does contain the string "ÿÿ‹ð…öÂ",
    > but...).
    > Thanks in advance for the help!


    Peter
    Peter Otten, Jan 31, 2007
    #4
  5. TOXiC

    TOXiC Guest

    It won't work with UTF-8, ISO or ASCII...
    TOXiC, Jan 31, 2007
    #5
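One plausible explanation, sketched in Python 3: the target looks like binary data (the result is passed to WritePatch), and the posted characters are assumed here to be the Windows-1252 (cp1252) rendering of those raw bytes. Under that assumption, each of the codecs tried maps the bytes to different text, or to no valid text at all.

```python
# The characters from the thread, as they appear in the posts.
needle = "ÿÿ‹ð…öÂ"

# Assumption: these are cp1252 renderings of single raw bytes.
cp1252_bytes = needle.encode("cp1252")
print(cp1252_bytes)  # b'\xff\xff\x8b\xf0\x85\xf6\xc2'

# The same text encoded as UTF-8 is a different, longer byte sequence,
# so a UTF-8 reading of the file will not reproduce these characters.
utf8_bytes = needle.encode("utf-8")
print(len(cp1252_bytes), len(utf8_bytes))  # 7 16

# Latin-1 decodes 0x8B and 0x85 to control characters, not to "‹" and "…".
latin1_text = cp1252_bytes.decode("latin-1")
print(latin1_text == needle)  # False

# ASCII cannot decode any byte above 0x7F.
try:
    cp1252_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii:", exc.reason)
```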
  6. TOXiC

    TOXiC Guest

    On 31 Jan, 17:30, "TOXiC" wrote:
    > It won't work with UTF-8, ISO or ASCII...


    I think the best way is to search for the hex values in the file stream,
    but I tried \hxx (in the regex) and it doesn't work...
    TOXiC, Jan 31, 2007
    #6
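Searching for raw byte values can be sketched in Python 3 by opening the file in binary mode ("rb") and using a bytes pattern. The escape for a byte given in hex is \xhh (for example \xff), not \hxx. The target sequence FF FF 8B F0 85 F6 C2 is an assumption (the cp1252 reading of "ÿÿ‹ð…öÂ"), and find_bytes is a hypothetical helper. In the thread's Python 2, the rough equivalent would be open(fileName, "rb") with a plain "\xff\xff..." pattern string.

```python
import os
import re
import tempfile

# Assumption: the bytes to find are FF FF 8B F0 85 F6 C2.
# \xhh in a regex matches the single byte with hex value hh.
PATTERN = re.compile(rb"\xff\xff\x8b\xf0\x85\xf6\xc2")


def find_bytes(path):
    """Hypothetical helper: return the offset of the first match, or None."""
    # "rb" reads raw bytes: no decoding and no newline translation.
    with open(path, "rb") as f:
        data = f.read()
    match = PATTERN.search(data)
    return match.start() if match else None


# Self-contained demo with a small binary file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"\x00\x01" + b"\xff\xff\x8b\xf0\x85\xf6\xc2" + b"ty")
    path = tmp.name

try:
    offset = find_bytes(path)
    print(offset)  # 2
finally:
    os.remove(path)
```

For a fixed byte sequence like this one, data.find(...) would work just as well; a regex only pays off when the pattern needs wildcards or alternatives.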
