Tag parsing in Python

agnibhu

Hi all,

I'm a newbie in Python. I'm trying to create a library for parsing
certain keywords.
For example, say I have keywords like abc:, bcd:, cde: and so on. The
user might write:
abc: How are you bcd: I'm fine cde: ok

I then have to extract "How are you", "I'm fine" and "ok" and
assign them to abc:, bcd: and cde: respectively. Combinations of
keywords may be introduced in the future, like abc: xy: How are
you, where new keywords qualify other keywords, and so on.
I would like to know the Pythonic way of doing this. Is there an
existing library that would make my work easier?

~
Agnibhu
 
Tim Chase

For example say I've key words like abc: bcd: cde: like that... So the
user may use like
abc: How are you bcd: I'm fine cde: ok

So I've to extract the "How are you" and "I'm fine" and "ok"..and
assign them to abc:, bcd: and cde: respectively..

For this, you can do something like
>>> s = "abc: how are you bcd: I'm fine cde: ok"
>>> import re
>>> r = re.compile(r'(\w+):\s*((?:[^:](?!\w+:))*)')
>>> r.findall(s)
[('abc', 'how are you'), ('bcd', "I'm fine"), ('cde', 'ok')]

Yes, it's a bit of a gnarled regexp, but it seems to do the job.
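If you want the values keyed by tag rather than as a list of pairs, the same pattern can be wrapped in a small helper. This is only a sketch: the names TAG_RE and parse_tags are made up for illustration, and it assumes each tag appears at most once (a repeated tag would overwrite the earlier value).

```python
import re

# Same pattern as above: a tag name, a colon, then body text up to the next tag.
TAG_RE = re.compile(r'(\w+):\s*((?:[^:](?!\w+:))*)')

def parse_tags(text):
    """Return a dict mapping each tag name to its body text."""
    return {name: body.strip() for name, body in TAG_RE.findall(text)}

result = parse_tags("abc: How are you bcd: I'm fine cde: ok")
print(result)
```

The dict preserves the order in which the tags were found, so iterating over it walks the tags left to right.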
There may be combinations of keywords introduced in the future,
like abc: xy: How are you, so new keywords qualify the other
keywords, and so on.

I'm not sure I understand this bit of what you're asking. If you
have

s = "abc: xy: How are you"

why should that not be parsed as
[('abc', ''), ('xy', 'How are you')]

as your initial description prescribes?
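If, on the other hand, a run of tags with no body is meant to qualify the tag that follows, one possible reading is to collect the empty-bodied tags and attach them to the next tag that does have a body. The sketch below assumes that interpretation; parse_qualified_tags is a hypothetical helper name, and trailing tags that never get a body are silently dropped.

```python
import re

TAG_RE = re.compile(r'(\w+):\s*((?:[^:](?!\w+:))*)')

def parse_qualified_tags(text):
    """Group runs of empty-bodied tags with the tag that carries the body."""
    result = {}
    pending = []  # tags seen so far that had no body of their own
    for name, body in TAG_RE.findall(text):
        pending.append(name)
        if body.strip():
            # key is the whole tag sequence, e.g. ('abc', 'xy')
            result[tuple(pending)] = body.strip()
            pending = []
    return result

qualified = parse_qualified_tags("abc: xy: How are you")
print(qualified)
```

With this reading, "abc: xy: How are you" yields a single entry keyed by the tuple ('abc', 'xy'), while unqualified tags get one-element tuple keys.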

-tkc
 
Josh English

Have you looked at pyparsing? (http://pyparsing.wikispaces.com/) It may
be possible to use that library to do this.

Josh
 
Paul McGuire

Here's how pyparsing can parse your keyword/tags:

from pyparsing import (Combine, Word, alphas, Group, OneOrMore, empty,
    SkipTo, LineEnd)

text1 = "abc: How are you bcd: I'm fine cde: ok"
text2 = "abc: xy: How are you"

# a tag is a word immediately followed by a colon, e.g. "abc:"
tag = Combine(Word(alphas) + ":")
# one or more tags, then body text up to the next tag or end of line
tag_defn = (Group(OneOrMore(tag))("tag") + empty
            + SkipTo(tag | LineEnd())("body"))

for text in (text1, text2):
    print(text)
    for td in tag_defn.searchString(text):
        print(td.dump())
    print()

Prints:

abc: How are you bcd: I'm fine cde: ok
[['abc:'], 'How are you']
- body: How are you
- tag: ['abc:']
[['bcd:'], "I'm fine"]
- body: I'm fine
- tag: ['bcd:']
[['cde:'], 'ok']
- body: ok
- tag: ['cde:']

abc: xy: How are you
[['abc:', 'xy:'], 'How are you']
- body: How are you
- tag: ['abc:', 'xy:']



Now here's how to further use pyparsing to actually use those tags as
substitution macros:

from pyparsing import (Forward, MatchFirst, Literal, And, replaceWith,
    FollowedBy)

# now combine macro detection with substitution
macros = {}
macro_substitution = Forward()

def make_macro_sub(tokens):
    # save this tag sequence and its body
    macros[tuple(tokens.tag)] = tokens.body

    # define macro substitution
    macro_substitution << MatchFirst(
        [(Literal(k[0]) if len(k) == 1
          else And([Literal(kk) for kk in k])).setParseAction(replaceWith(v))
         for k, v in macros.items()]) + ~FollowedBy(tag)

    # return empty string, so the definition itself is removed from the text
    return ""

tag_defn.setParseAction(make_macro_sub)

scan_pattern = macro_substitution | tag_defn

test_text = (text1 + "\nBob said, 'abc:?' I said, 'bcd:.'" + text2 +
             "\nThen Bob said 'abc: xy:?'")

print(test_text)
print(scan_pattern.transformString(test_text))


Prints:

abc: How are you bcd: I'm fine cde: ok
Bob said, 'abc:?' I said, 'bcd:.'abc: xy: How are you
Then Bob said 'abc: xy:?'

Bob said, 'How are you?' I said, 'I'm fine.'
Then Bob said 'How are you?'
 
Paul McGuire

I got to thinking more about your keywords-qualifying-keywords
example, and I thought this would be a good way to support
locale-specific tags. I also thought about how one might want to have
tags within tags, to be substituted later. That requires an escaped
form "abc::" of "abc:", so that the escaped tag is replaced with the
value of tag "abc:" as a late binding.
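To illustrate just the escaping idea in isolation, here is a minimal sketch using plain re instead of pyparsing. It is not the implementation posted here; the helper name expand_escaped and the pre-filled macros dict are assumptions made for the example. It replaces each escaped "name::" with the value stored for tag "name:" and leaves unknown names untouched.

```python
import re

# hypothetical table of already-defined tag values, keyed by tag name
macros = {"nm": "Dave"}

def expand_escaped(body, macros):
    """Replace each escaped 'name::' with the stored value of tag 'name:'."""
    def lookup(match):
        name = match.group(1)
        # leave unknown tags untouched rather than guessing a value
        return macros.get(name, match.group(0))
    return re.sub(r'(\w+)::', lookup, body)

expanded = expand_escaped("I'm sorry, nm::, I'm afraid I can't do that.", macros)
print(expanded)
```

Because the lookup happens when the enclosing definition is processed, "nm:" only has to be defined before the tag that embeds "nm::".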

Wasn't too hard to modify what I posted yesterday, and now I rather
like it.

-- Paul


# tag_substitute.py

from pyparsing import (Combine, Word, alphas, FollowedBy, Group, OneOrMore,
    empty, SkipTo, LineEnd, Optional, Forward, MatchFirst, Literal,
    And, replaceWith)

# a tag is a word plus ':', but not the escaped form 'word::'
tag = Combine(Word(alphas) + ~FollowedBy("::") + ":")
tag_defn = (Group(OneOrMore(tag))("tag") + empty
            + SkipTo(tag | LineEnd())("body")
            + Optional(LineEnd().suppress()))


# now combine macro detection with substitution
macros = {}
macro_substitution = Forward()

def make_macro_sub(tokens):
    # unescape '::' and substitute any embedded tags
    tag_value = macro_substitution.transformString(
        tokens.body.replace("::", ":"))

    # save this tag and value (or overwrite previous)
    macros[tuple(tokens.tag)] = tag_value

    # define overall macro substitution expression
    macro_substitution << MatchFirst(
        [(Literal(k[0]) if len(k) == 1
          else And([Literal(kk) for kk in k])).setParseAction(replaceWith(v))
         for k, v in macros.items()]) + ~FollowedBy(tag)

    # return empty string, so macro definitions don't show up in final
    # expanded text
    return ""

tag_defn.setParseAction(make_macro_sub)

# define pattern for macro scanning
scan_pattern = macro_substitution | tag_defn


sorry = """\
nm: Dave
sorry: en: I'm sorry, nm::, I'm afraid I can't do that.
sorry: es: Lo siento nm::, me temo que no puedo hacer eso.
Hal said, "sorry: en:"
Hal dijo, "sorry: es:" """
print(scan_pattern.transformString(sorry))

Prints:

Hal said, "I'm sorry, Dave, I'm afraid I can't do that."
Hal dijo, "Lo siento Dave, me temo que no puedo hacer eso."
 
agnibhu

Thanks all for giving me great solutions. I'm happy to see the
responses.
I'll try these out and post a reply soon.

Thanks once again,
Agnibhu..
 
