Re: string / split method on ASCII code?

Discussion in 'Python' started by Carsten Haese, Mar 12, 2008.

  1. On Wed, 2008-03-12 at 15:29 -0500, Michael Wieher wrote:
    > Hey all,
    >
    > I have these annoying text files that are delimited by the ASCII char
    > for << (only it's a single character) and >> (again a single character)
    >
    > Their codes are 174 and 175, respectively.
    >
    > My datafiles are in the moronic form
    >
    > X<<Y>>Z


    Those are decidedly not ASCII codes, since ASCII only goes to 127.

    The easiest approach is probably to replace those markers with some
    other character that occurs nowhere else, for example good old NUL, and
    then split on that:

    line = line.replace(chr(174), "\0")
    line = line.replace(chr(175), "\0")
    result = line.split("\0")
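A minimal sketch of the same replace-then-split idea wrapped in a function, assuming Python 3 and that NUL really never occurs in the data. The names `split_on_markers`, `LEFT`, `RIGHT` and `PLACEHOLDER` are made up for illustration and are not from the original post.

```python
# Illustrative names only; the delimiter codes come from the original question.
LEFT = chr(174)
RIGHT = chr(175)
PLACEHOLDER = "\0"  # assumption: NUL never appears in the data


def split_on_markers(line):
    """Split a line on either delimiter by mapping both to one placeholder."""
    if PLACEHOLDER in line:
        # Guard the assumption instead of silently producing extra fields.
        raise ValueError("placeholder already present in the data")
    for marker in (LEFT, RIGHT):
        line = line.replace(marker, PLACEHOLDER)
    return line.split(PLACEHOLDER)


fields = split_on_markers("X" + LEFT + "Y" + RIGHT + "Z")
print(fields)  # ['X', 'Y', 'Z']
```

The explicit check is a design choice: if the placeholder does turn up in real data, failing loudly is usually preferable to returning wrongly split fields.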

    HTH,

    --
    Carsten Haese
    http://informixdb.sourceforge.net
    Carsten Haese, Mar 12, 2008
    #1

  2. Sorry for breaking threading by replying to a reply, but I don't seem to
    have the original post.

    On Wed, 2008-03-12 at 15:29 -0500, Michael Wieher wrote:
    > Hey all,
    >
    > I have these annoying text files that are delimited by the ASCII char
    > for << (only it's a single character) and >> (again a single character)
    >
    > Their codes are 174 and 175, respectively.
    >
    > My datafiles are in the moronic form
    >
    > X<<Y>>Z


    The glyph that looks like "<<" is a left quote in some European countries
    (and a right quote in others, sigh...), and similarly for ">>". They are
    usually known as left and right "angle quotation marks", chevrons or
    guillemets. And yes, that certainly looks like a moronic form for a data
    file.

    But whatever the characters are, we can work with them as normal, if you
    don't mind ignoring that they don't display properly everywhere:

    >>> lq = chr(174)
    >>> rq = chr(175)
    >>> s = "x" + lq + "y" + rq + "z"
    >>> print s
    x�y�z
    >>> s.split(lq)
    ['x', 'y\xafz']
    >>> s.split(rq)
    ['x\xaey', 'z']
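The session above is Python 2, where `chr(174)` is a single byte. In Python 3 `chr(174)` is a Unicode character instead, so a rough equivalent works on `bytes`. This is only a sketch, assuming the file content arrives as raw bytes with the delimiters stored as the single bytes 174 and 175; the sample data is made up.

```python
# Assumption: the data is raw bytes, e.g. as returned by open(path, "rb").read().
raw = b"x\xaey\xafz"

lq = bytes([174])  # single byte 0xAE
rq = bytes([175])  # single byte 0xAF

print(raw.split(lq))  # [b'x', b'y\xafz']
print(raw.split(rq))  # [b'x\xaey', b'z']

# Decoding as Latin-1 maps each byte to the code point with the same number,
# so chr(174) and chr(175) work as delimiters on the resulting str.
text = raw.decode("latin-1")
parts = text.split(chr(174))
```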


    And you can use regular expressions as well. Assuming that the quotes are
    never nested:

    >>> import re
    >>> r = re.compile(lq + '(.*?)' + rq)
    >>> r.search(s).group(1)
    'y'
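If a line holds more than one quoted field, the same non-greedy pattern can be used with `findall`. A small sketch in Python 3, still assuming the quotes are never nested; the sample string is invented for illustration.

```python
import re

lq = chr(174)
rq = chr(175)

# re.escape is a precaution in case a delimiter is ever a regex metacharacter.
quoted = re.compile(re.escape(lq) + "(.*?)" + re.escape(rq))

sample = "a" + lq + "b" + rq + "c" + lq + "d" + rq
fields = quoted.findall(sample)
print(fields)  # ['b', 'd']
```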


    If you want to treat both characters the same:

    >>> s = s.replace(lq, rq)
    >>> s.split(rq)
    ['x', 'y', 'z']
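Treating both characters the same can also be done in one step with `re.split` and a character class, which avoids the intermediate `replace`. A minimal sketch in Python 3, not the poster's own method:

```python
import re

lq = chr(174)
rq = chr(175)
s = "x" + lq + "y" + rq + "z"

# A character class matches either delimiter, so one split handles both.
either = re.compile("[" + re.escape(lq) + re.escape(rq) + "]")
parts = either.split(s)
print(parts)  # ['x', 'y', 'z']
```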



    --
    Steven
    Steven D'Aprano, Mar 13, 2008
    #2