pyparsing Combine without merging sub-expressions

Steven Bethard · Jan 20, 2007

Within a larger pyparsing grammar, I have something that looks like::

wsj/00/wsj_0003.mrg

When parsing this, I'd like to keep around both the full string, and the
AAA_NNNN substring of it, so I'd like something like::

>>> foo.parseString('wsj/00/wsj_0003.mrg')
(['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})

How do I go about this? I was using something like::
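
>>> import pyparsing as pp
>>> digits = pp.Word(pp.nums)
>>> alphas = pp.Word(pp.alphas)
>>> wsj_name = pp.Combine(alphas + '_' + digits)
>>> wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name +
...                       '.mrg')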

But of course then all I get back is the full path::
(['wsj/00/wsj_0003.mrg'], {})

I could leave off the final Combine and add a parse action::
... wsj_name = tokens[4]
... return ''.join(tokens), wsj_name
... ([('wsj/00/wsj_0003.mrg', 'wsj_0003')], {})

But that then allows whitespace between the pieces of the path, which
there shouldn't be::
([('wsj/00/wsj_0003.mrg', 'wsj_0003')], {})

How do I make sure no whitespace intervenes, and still have access to
the sub-expression?

Thanks,

STeVe

Dennis Lee Bieber · Jan 21, 2007

Within a larger pyparsing grammar, I have something that looks like::

wsj/00/wsj_0003.mrg

When parsing this, I'd like to keep around both the full string, and the
AAA_NNNN substring of it, so I'd like something like::
(['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})

If you're working with file names/paths, why not use the functions in
os.path? (The only problem may be on Windows, where the native separator
is \.)

Or just split on the /, first...

>>> paths = sample.split("/")
>>> paths
['wsj', '00', 'wsj_0003.mrg']
>>> os.path.splitext(paths[-1])
('wsj_0003', '.mrg')

But that then allows whitespace between the pieces of the path, which
there shouldn't be::

If you didn't have whitespace coming in, there shouldn't be any
going out. If you do, you likely have malformed data and probably should
detect it earlier... Or need to define a more complete grammar for what
determines a filename/path...

([('wsj/00/wsj_0003.mrg', 'wsj_0003')], {})

>>> sample = "wsj / 00 / wsj_0003.mrg"
>>> paths = [part.strip() for part in sample.split("/")]
>>> paths
['wsj', '00', 'wsj_0003.mrg']
>>> os.path.splitext(paths[-1])
('wsj_0003', '.mrg')
>>> os.path.join(*paths)  # on Windows, the native separator is a backslash
'wsj\\00\\wsj_0003.mrg'

How do I make sure no whitespace intervenes, and still have access to
the sub-expression?

Thanks,

STeVe

--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/

Paul McGuire · Jan 21, 2007

Steven said:
Within a larger pyparsing grammar, I have something that looks like::

wsj/00/wsj_0003.mrg

When parsing this, I'd like to keep around both the full string, and the
AAA_NNNN substring of it, so I'd like something like::
(['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})

How do I go about this? I was using something like::
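
>>> digits = pp.Word(pp.nums)
>>> alphas = pp.Word(pp.alphas)
>>> wsj_name = pp.Combine(alphas + '_' + digits)
>>> wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name +
...                       '.mrg')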

But of course then all I get back is the full path::
(['wsj/00/wsj_0003.mrg'], {})

The tokens are what the tokens are, so if you want to replicate a
sub-field, then you'll need a parse action to insert it into the
returned tokens. BUT, if all you want is to be able to easily *access*
that sub-field, then why not give it a results name? Like this:

wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")

Leave everything else the same, but now you can access the name field
independently from the rest of the combined tokens.

result = wsj_path.parseString('wsj/00/wsj_0003.mrg')
print(result.dump())
print(result.name)
print(result.asList())
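
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the results-name approach, reusing the grammar from the original question. The second half is only an illustration of the "parse action to insert it into the returned tokens" route mentioned above; the names `append_name` and `wsj_path_with_name` are hypothetical and not from the thread.

```python
import pyparsing as pp

digits = pp.Word(pp.nums)
alphas = pp.Word(pp.alphas)

# Name the sub-expression so it stays reachable after the outer Combine
# has merged every token into a single string.
wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")
wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name + '.mrg')

result = wsj_path.parseString('wsj/00/wsj_0003.mrg')
full_path = result[0]      # the combined path string
name = result["name"]      # the named sub-field

# Hypothetical parse action: replicate the named sub-field as a second
# token, which gives the token list shape asked for in the question.
def append_name(tokens):
    return [tokens[0], tokens["name"]]

wsj_path_with_name = wsj_path.copy().setParseAction(append_name)
both = wsj_path_with_name.parseString('wsj/00/wsj_0003.mrg').asList()

print(full_path)
print(name)
print(both)
```

The camelCase method names match the ones used in this thread; newer pyparsing releases also offer snake_case spellings of the same methods.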

-- Paul

Steven Bethard · Jan 22, 2007

Dennis said:
Within a larger pyparsing grammar, I have something that looks like::

wsj/00/wsj_0003.mrg

When parsing this, I'd like to keep around both the full string, and the
AAA_NNNN substring of it, so I'd like something like::

>>> foo.parseString('wsj/00/wsj_0003.mrg')
(['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})

If working file name/paths, why not use the functions in os.path?

Two reasons. First, as I mentioned, this is within a larger pyparsing
grammar so it's not as easy to switch back and forth between the two.
Second, I do want to do some data validation (e.g. the name of the file
needs to be in a particular format) so I either need to post-process the
os.path approach or just do it in pyparsing.
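
To make the trade-off concrete, here is a rough sketch of what post-processing the os.path approach might look like. It is an assumption-laden illustration, not code from the thread: `split_wsj_path` and `NAME_RE` are hypothetical names, the AAA_NNNN format is assumed to mean ASCII letters, an underscore, then digits, and `posixpath` is used instead of `os.path` so that "/" is the separator on every platform. Note that it only validates the file name, not the directory parts.

```python
import posixpath
import re

# Assumed AAA_NNNN format: ASCII letters, an underscore, then digits.
NAME_RE = re.compile(r'[A-Za-z]+_[0-9]+')

def split_wsj_path(path):
    """Return (full path, file name without extension) or raise ValueError."""
    _, filename = posixpath.split(path)
    name, ext = posixpath.splitext(filename)
    # The validation that os.path does not do for us.
    if ext != '.mrg' or not NAME_RE.fullmatch(name):
        raise ValueError('not a valid WSJ path: %r' % (path,))
    return path, name

print(split_wsj_path('wsj/00/wsj_0003.mrg'))
```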

If you didn't have whitespace coming in, there shouldn't be any
going out. If you do, you likely have malformed data and probably should
detect it earlier...

Well that's the intention of using pyparsing here. With a proper
grammar, pyparsing can detect the malformed data for me and throw an error.
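
As a small sketch of that, assuming the grammar from the original question: Combine requires its pieces to be adjacent by default, so a path with embedded whitespace raises ParseException. The helper `is_valid_path` is a hypothetical name, and passing `parseAll=True` (to also reject trailing text) is a choice made for this example.

```python
import pyparsing as pp

digits = pp.Word(pp.nums)
alphas = pp.Word(pp.alphas)
wsj_name = pp.Combine(alphas + '_' + digits)
wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name + '.mrg')

def is_valid_path(text):
    """Return True if text parses as a WSJ path, False otherwise."""
    try:
        wsj_path.parseString(text, parseAll=True)
    except pp.ParseException:
        return False
    return True

ok = is_valid_path('wsj/00/wsj_0003.mrg')
spaced = is_valid_path('wsj / 00 / wsj_0003.mrg')   # whitespace between pieces
bad_name = is_valid_path('wsj/00/wsj0003.mrg')      # missing underscore

print(ok, spaced, bad_name)
```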

STeVe

Steven Bethard · Jan 22, 2007

Paul said:
Steven said:

Within a larger pyparsing grammar, I have something that looks like::

wsj/00/wsj_0003.mrg

When parsing this, I'd like to keep around both the full string, and the
AAA_NNNN substring of it, so I'd like something like::

>>> foo.parseString('wsj/00/wsj_0003.mrg')
(['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})

How do I go about this? I was using something like::

>>> digits = pp.Word(pp.nums)
>>> alphas = pp.Word(pp.alphas)
>>> wsj_name = pp.Combine(alphas + '_' + digits)
>>> wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name +
...                       '.mrg')

[snip]
BUT, if all you want is to be able to easily *access*
that sub-field, then why not give it a results name? Like this:

wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")

Leave everything else the same, but now you can access the name field
independently from the rest of the combined tokens.

Works great. Thanks!

STeVe
