text processing

Discussion in 'Python' started by jitenshah78@gmail.com, Sep 25, 2008.

  1. Guest

    I have a string like the following:
    12560/ABC,12567/BC,123,567,890/JK

    I want to split it into groups like this:
    (12560,ABC)
    (12567,BC)
    (123,567,890,JK)

    I tried a regular expression and can get the first two groups, but not
    the third. Can a regular expression split the data into these groups?
     
    , Sep 25, 2008
    #1

  2. On Thu, 25 Sep 2008 15:51:28 +0100, wrote:

    > I have string like follow
    > 12560/ABC,12567/BC,123,567,890/JK
    >
    > I want above string to group like as follow (12560,ABC)
    > (12567,BC)
    > (123,567,890,JK)
    >
    > i try regular expression i am able to get first two not the third one.
    > can regular expression given data in different groups


    Without regular expressions:

    def group(string):
        # Collect fields until one contains '/', which closes the group.
        result = list()
        for item in string.split(','):
            if '/' in item:
                result.extend(item.split('/'))
                yield tuple(result)
                result = list()
            else:
                result.append(item)

    def main():
        string = '12560/ABC,12567/BC,123,567,890/JK'
        print(list(group(string)))

    main()
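One property of the generator above is that fields after the last "number/word" pair are dropped without notice (for example the `2,3` in `1/A,2,3`). A minimal sketch of a stricter variant follows; the name `group_strict` and the choice to raise `ValueError` are assumptions for illustration, not part of the original post.

```python
def group_strict(string):
    """Group comma-separated fields; a field containing '/' closes a group."""
    result = []
    for item in string.split(','):
        if '/' in item:
            result.extend(item.split('/'))
            yield tuple(result)
            result = []
        else:
            result.append(item)
    if result:
        # Leftover fields never reached a closing "number/word" field.
        raise ValueError('unterminated group: %r' % (result,))


print(list(group_strict('12560/ABC,12567/BC,123,567,890/JK')))
# prints [('12560', 'ABC'), ('12567', 'BC'), ('123', '567', '890', 'JK')]
```

Because it is a generator, the error only surfaces once the caller has consumed all complete groups.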

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Sep 25, 2008
    #2

  3. kib2 Guest

    You can do it with regexps too:

    >------------------------------------------------------------------

    import re
    # findall returns (number, letter) tuples; the group names are not used here.
    to_watch = re.compile(r"(?P<number>\d+)[/](?P<letter>[A-Z]+)")

    final_list = to_watch.findall("12560/ABC,12567/BC,123,567,890/JK")

    for number, word in final_list:
        print("number:%s -- word: %s" % (number, word))
    >------------------------------------------------------------------


    The output is:

    number:12560 -- word: ABC
    number:12567 -- word: BC
    number:890 -- word: JK
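The third line shows only `890` because `\d+` matches a single integer directly before the slash, so `123,567` is skipped. A possible variation, sketched here under the assumption that the number group should cover the whole comma-separated run, uses `finditer` so the named groups can actually be read by name. The names `pattern`, `numbers`, `word`, and `named_groups` are illustrative, not from the post.

```python
import re

# Assumption: "numbers" is widened to a comma-separated run of integers,
# which the pattern in the post does not attempt.
pattern = re.compile(r"(?P<numbers>\d+(?:,\d+)*)/(?P<word>[A-Z]+)")


def named_groups(text):
    """Return one dict per match, keyed by the group names in the pattern."""
    return [m.groupdict() for m in pattern.finditer(text)]


for fields in named_groups("12560/ABC,12567/BC,123,567,890/JK"):
    print("numbers:%s -- word: %s" % (fields["numbers"], fields["word"]))
# prints:
# numbers:12560 -- word: ABC
# numbers:12567 -- word: BC
# numbers:123,567,890 -- word: JK
```

Splitting `fields["numbers"]` on `","` then gives the individual integers as strings.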

    See you,

    Kib².
     
    kib2, Sep 25, 2008
    #3
  4. MRAB Guest

    On Sep 25, 6:34 pm, Marc 'BlackJack' Rintsch <> wrote:
    > On Thu, 25 Sep 2008 15:51:28 +0100, wrote:
    > > I have string like follow
    > > 12560/ABC,12567/BC,123,567,890/JK
    >
    > > I want above string to group like as follow (12560,ABC)
    > > (12567,BC)
    > > (123,567,890,JK)
    >
    > > i try regular expression i am able to get first two not the third one.
    > > can regular expression given data in different groups
    >
    > Without regular expressions:
    >
    > def group(string):
    >     result = list()
    >     for item in string.split(','):
    >         if '/' in item:
    >             result.extend(item.split('/'))
    >             yield tuple(result)
    >             result = list()
    >         else:
    >             result.append(item)
    >
    > def main():
    >     string = '12560/ABC,12567/BC,123,567,890/JK'
    >     print list(group(string))
    >

    How about:

    >>> import re
    >>> string = "12560/ABC,12567/BC,123,567,890/JK"
    >>> r = re.findall(r"(\d+(?:,\d+)*/\w+)", string)
    >>> r
    ['12560/ABC', '12567/BC', '123,567,890/JK']
    >>> [tuple(x.replace(",", "/").split("/")) for x in r]
    [('12560', 'ABC'), ('12567', 'BC'), ('123', '567', '890', 'JK')]
     
    MRAB, Sep 25, 2008
    #4
  5. Paul McGuire Guest

    On Sep 25, 9:51 am, "" <>
    wrote:
    > I have string like follow
    > 12560/ABC,12567/BC,123,567,890/JK
    >
    > I want above string to group like as follow
    > (12560,ABC)
    > (12567,BC)
    > (123,567,890,JK)
    >
    > i try regular expression i am able to get first two not the third one.
    > can regular expression given data in different groups


    Looks like each item is:
    - a comma-delimited list of 1 or more integers
    - a slash
    - a word composed of alpha characters

    And the whole thing is a comma-delimited list of items.

    Now to implement that in pyparsing:

    >>> data = "12560/ABC,12567/BC,123,567,890/JK"
    >>> from pyparsing import Suppress, delimitedList, Word, alphas, nums, Group
    >>> SLASH = Suppress('/')
    >>> dataitem = delimitedList(Word(nums)) + SLASH + Word(alphas)
    >>> dataformat = delimitedList(Group(dataitem))
    >>> list(map(tuple, dataformat.parseString(data)))
    [('12560', 'ABC'), ('12567', 'BC'), ('123', '567', '890', 'JK')]

    Wah-lah! (as one of my wife's 1st graders announced in one of his
    school papers)
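For readers without pyparsing installed, the same three-part item description can be approximated with only the standard library. This is a rough sketch under stated assumptions, not an equivalent of the pyparsing grammar: it assumes Python 3 (`re.fullmatch`), ASCII letters only, and no whitespace between tokens, and the names `ITEM`, `WHOLE`, and `parse_groups` are made up for illustration.

```python
import re

# One item: comma-delimited integers, a slash, then an alphabetic word.
ITEM = r"\d+(?:,\d+)*/[A-Za-z]+"
# The whole input: one or more items separated by commas.
WHOLE = re.compile(r"(?:%s)(?:,(?:%s))*" % (ITEM, ITEM))


def parse_groups(data):
    """Validate the whole string, then split each item into a tuple."""
    if WHOLE.fullmatch(data) is None:
        raise ValueError("input does not match the expected format: %r" % data)
    # Within an item, both ',' and '/' separate fields.
    return [tuple(re.split(r"[,/]", item)) for item in re.findall(ITEM, data)]


print(parse_groups("12560/ABC,12567/BC,123,567,890/JK"))
# prints [('12560', 'ABC'), ('12567', 'BC'), ('123', '567', '890', 'JK')]
```

Validating the full string first means malformed input such as a doubled comma raises an error instead of being partially matched.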

    -- Paul
     
    Paul McGuire, Sep 26, 2008
    #5
