Tag parsing in python

Discussion in 'Python' started by agnibhu, Aug 28, 2010.

  1. agnibhu

    agnibhu Guest

    Hi all,

    I'm a newbie in Python. I'm trying to create a library for parsing
    certain keywords.
    For example, say I have keywords like abc:, bcd:, cde: and so on. The
    user may write something like
    abc: How are you bcd: I'm fine cde: ok

    So I have to extract "How are you", "I'm fine" and "ok", and
    assign them to abc:, bcd: and cde: respectively. Combinations of
    keywords may be introduced in future, like abc: xy: How are
    you
    so that new keywords qualify other keywords, and so on.
    I would like to know the Pythonic way of doing this. Is there an
    existing library that would make my work easier?

    ~
    Agnibhu
     
    agnibhu, Aug 28, 2010
    #1

  2. Tim Chase

    Tim Chase Guest

    On 08/28/10 11:14, agnibhu wrote:
    > For example say I've key words like abc: bcd: cde: like that... So the
    > user may use like
    > abc: How are you bcd: I'm fine cde: ok
    >
    > So I've to extract the "How are you" and "I'm fine" and "ok"..and
    > assign them to abc:, bcd: and cde: respectively..


    For this, you can do something like

    >>> s = "abc: how are you bcd: I'm fine cde: ok"
    >>> import re
    >>> r = re.compile(r'(\w+):\s*((?:[^:](?!\w+:))*)')
    >>> r.findall(s)
    [('abc', 'how are you'), ('bcd', "I'm fine"), ('cde', 'ok')]

    Yes, it's a bit of a gnarled regexp, but it seems to do the job.
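
    For readers who find the one-liner hard to follow, here is a minimal sketch (not part of the original reply) that spells out the same pattern with re.VERBOSE and collects the pairs into a dict. The names TAG_RE and parse_tags are made up for illustration, and a keyword that appears twice would simply overwrite its earlier value.

```python
import re

# Same pattern as above, written in verbose form and annotated piece by piece.
TAG_RE = re.compile(r"""
    (\w+):            # keyword name, then the colon that ends it
    \s*               # optional whitespace after the colon
    (                 # body:
      (?:
        [^:]          #   any character except a colon ...
        (?!\w+:)      #   ... unless the next keyword starts right after it
      )*
    )
""", re.VERBOSE)


def parse_tags(text):
    """Return a dict mapping each keyword to the text that follows it."""
    return {name: body for name, body in TAG_RE.findall(text)}


print(parse_tags("abc: how are you bcd: I'm fine cde: ok"))
# {'abc': 'how are you', 'bcd': "I'm fine", 'cde': 'ok'}
```

    The negative lookahead is what stops each body one character before the whitespace that precedes the next keyword, so no trailing space ends up in the value.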

    > There may be combination of keywords introduced in future.
    > like abc: xy: How are you So new keywords qualifying the other
    > keywords so on.


    I'm not sure I understand this bit of what you're asking. If you have

    s = "abc: xy: How are you"

    why should that not be parsed as

    >>> r.findall("abc: xy: How are you")
    [('abc', ''), ('xy', 'How are you')]

    as your initial description prescribes?
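
    If the other reading is what's wanted, where a keyword with no text of its own qualifies the keyword after it, one possible approach is to match a whole run of consecutive keywords and use the run as the key. This is only a sketch under that assumption, not something from the original post; TAG_RUN and parse_qualified are hypothetical names.

```python
import re

# One or more "word:" keywords in a row, then the body text up to the
# whitespace that precedes the next keyword (same lookahead trick as above).
TAG_RUN = re.compile(r'((?:\w+:\s*)+)((?:[^:](?!\w+:))*)')


def parse_qualified(text):
    """Map each tuple of consecutive keywords to the text that follows it."""
    result = {}
    for tags, body in TAG_RUN.findall(text):
        # split the matched run back into its individual keyword names
        key = tuple(re.findall(r'(\w+):', tags))
        result[key] = body.strip()
    return result


print(parse_qualified("abc: xy: How are you"))
# {('abc', 'xy'): 'How are you'}
```

    With this reading "abc: xy: How are you" yields a single entry keyed by ('abc', 'xy'), while the three-keyword example still yields one entry per keyword.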

    -tkc
     
    Tim Chase, Aug 28, 2010
    #2

  3. Josh English

    Josh English Guest

    Have you looked at pyparsing? (http://pyparsing.wikispaces.com/) It
    may be possible to use that library to do this.

    Josh
     
    Josh English, Aug 28, 2010
    #3
  4. Paul McGuire

    Paul McGuire Guest

    Here's how pyparsing can parse your keyword/tags:

    from pyparsing import (Combine, Word, alphas, Group, OneOrMore, empty,
                           SkipTo, LineEnd)

    text1 = "abc: How are you bcd: I'm fine cde: ok"
    text2 = "abc: xy: How are you"

    # a tag is a run of letters immediately followed by ':'
    tag = Combine(Word(alphas) + ":")
    # one or more tags, then the body text up to the next tag or end of line
    tag_defn = (Group(OneOrMore(tag))("tag") + empty
                + SkipTo(tag | LineEnd())("body"))

    for text in (text1, text2):
        print(text)
        for td in tag_defn.searchString(text):
            print(td.dump())
        print()

    Prints:

    abc: How are you bcd: I'm fine cde: ok
    [['abc:'], 'How are you']
    - body: How are you
    - tag: ['abc:']
    [['bcd:'], "I'm fine"]
    - body: I'm fine
    - tag: ['bcd:']
    [['cde:'], 'ok']
    - body: ok
    - tag: ['cde:']

    abc: xy: How are you
    [['abc:', 'xy:'], 'How are you']
    - body: How are you
    - tag: ['abc:', 'xy:']



    Now here's how to further use pyparsing to actually use those tags as
    substitution macros:

    from pyparsing import (Forward, MatchFirst, Literal, And, replaceWith,
                           FollowedBy)

    # now combine macro detection with substitution
    macros = {}
    macro_substitution = Forward()

    def make_macro_sub(tokens):
        # save this tag and its body (or overwrite a previous definition)
        macros[tuple(tokens.tag)] = tokens.body

        # define macro substitution
        macro_substitution << MatchFirst(
            [(Literal(k[0]) if len(k) == 1
              else And([Literal(kk) for kk in k])).setParseAction(replaceWith(v))
             for k, v in macros.items()]) + ~FollowedBy(tag)

        # return empty string, so the definition itself is dropped from the output
        return ""

    tag_defn.setParseAction(make_macro_sub)

    scan_pattern = macro_substitution | tag_defn

    test_text = (text1 + "\nBob said, 'abc:?' I said, 'bcd:.'" + text2 +
                 "\nThen Bob said 'abc: xy:?'")

    print(test_text)
    print(scan_pattern.transformString(test_text))


    Prints:

    abc: How are you bcd: I'm fine cde: ok
    Bob said, 'abc:?' I said, 'bcd:.'abc: xy: How are you
    Then Bob said 'abc: xy:?'

    Bob said, 'How are you?' I said, 'I'm fine.'
    Then Bob said 'How are you?'
     
    Paul McGuire, Aug 29, 2010
    #4
  5. Paul McGuire

    Paul McGuire Guest

    I got to thinking more about your keywords-qualifying-keywords
    example, and I thought this would be a good way to support
    locale-specific tags. I also thought about how one might want to have
    tags within tags, to be substituted later. That requires an escaped
    form "abc::" of "abc:", so that it is replaced with the value of tag
    "abc:" as a late binding.

    It wasn't too hard to modify what I posted yesterday, and now I
    rather like it.

    -- Paul


    # tag_substitute.py

    from pyparsing import (Combine, Word, alphas, FollowedBy, Group,
                           OneOrMore, empty, SkipTo, LineEnd, Optional,
                           Forward, MatchFirst, Literal, And, replaceWith)

    # a tag is letters plus a single ':'; 'abc::' is the escaped form
    tag = Combine(Word(alphas) + ~FollowedBy("::") + ":")
    tag_defn = (Group(OneOrMore(tag))("tag") + empty
                + SkipTo(tag | LineEnd())("body")
                + Optional(LineEnd().suppress()))


    # now combine macro detection with substitution
    macros = {}
    macro_substitution = Forward()

    def make_macro_sub(tokens):
        # unescape '::' and substitute any embedded tags
        tag_value = macro_substitution.transformString(
            tokens.body.replace("::", ":"))

        # save this tag and value (or overwrite previous)
        macros[tuple(tokens.tag)] = tag_value

        # define overall macro substitution expression
        macro_substitution << MatchFirst(
            [(Literal(k[0]) if len(k) == 1
              else And([Literal(kk) for kk in k])).setParseAction(replaceWith(v))
             for k, v in macros.items()]) + ~FollowedBy(tag)

        # return empty string, so macro definitions don't show up in final
        # expanded text
        return ""

    tag_defn.setParseAction(make_macro_sub)

    # define pattern for macro scanning
    scan_pattern = macro_substitution | tag_defn


    sorry = """\
    nm: Dave
    sorry: en: I'm sorry, nm::, I'm afraid I can't do that.
    sorry: es: Lo siento nm::, me temo que no puedo hacer eso.
    Hal said, "sorry: en:"
    Hal dijo, "sorry: es:" """
    print(scan_pattern.transformString(sorry))

    Prints:

    Hal said, "I'm sorry, Dave, I'm afraid I can't do that."
    Hal dijo, "Lo siento Dave, me temo que no puedo hacer eso."
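
    For comparison, the same define-then-substitute idea can be roughed out with only the standard library. This is a simplified sketch, not a port of the pyparsing version: it assumes every definition occupies a whole line, that tags in a qualified key are separated by exactly one space when used, and it uses plain string replacement, so it lacks the "not followed by another tag" check in the listing above. The names expand, DEFN and TAGS are made up for illustration.

```python
import re

# One or more tags at the start of a line; "name::" (escaped) is not a tag.
TAGS = r'(?:[A-Za-z]+:(?!:)\s*)+'
DEFN = re.compile(r'^(' + TAGS + r')(\S.*)$')


def expand(text):
    """Collect tag definitions line by line and expand tags in other lines."""
    macros = {}

    def substitute(s):
        # try longer keys first so "sorry: en:" is not shadowed by a shorter tag
        for key in sorted(macros, key=len, reverse=True):
            s = s.replace(key, macros[key])
        return s

    out = []
    for line in text.splitlines():
        m = DEFN.match(line)
        if m:
            # normalise the tag run to "tag: tag:" and store the expanded body,
            # unescaping "::" first so embedded tags are resolved at this point
            key = ' '.join(re.findall(r'[A-Za-z]+:', m.group(1)))
            macros[key] = substitute(m.group(2).replace('::', ':'))
        else:
            out.append(substitute(line))
    return '\n'.join(out)


sample = '''nm: Dave
sorry: en: I'm sorry, nm::, I'm afraid I can't do that.
sorry: es: Lo siento nm::, me temo que no puedo hacer eso.
Hal said, "sorry: en:"
Hal dijo, "sorry: es:"'''
print(expand(sample))
```

    As in the pyparsing version, definition lines produce no output and "nm::" is resolved when the enclosing tag is defined, so "nm:" has to be defined before the tags that refer to it.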
     
    Paul McGuire, Aug 29, 2010
    #5
  6. agnibhu

    agnibhu Guest

    Thanks all for giving me great solutions. I'm happy to see the
    responses.
    I will try these out and post a reply soon.

    Thanks once again,
    Agnibhu..
     
    agnibhu, Aug 30, 2010
    #6
