pyparsing Combine without merging sub-expressions

Steven Bethard

Within a larger pyparsing grammar, I have something that looks like::

wsj/00/wsj_0003.mrg

When parsing this, I'd like to keep around both the full string, and the
AAA_NNNN substring of it, so I'd like something like::

    >>> foo.parseString('wsj/00/wsj_0003.mrg')
    (['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})

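How do I go about this? I was using something like::

    >>> import pyparsing as pp
    >>> digits = pp.Word(pp.nums)
    >>> alphas = pp.Word(pp.alphas)
    >>> wsj_name = pp.Combine(alphas + '_' + digits)
    >>> wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name +
    ...                       '.mrg')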

But of course then all I get back is the full path::

    >>> wsj_path.parseString('wsj/00/wsj_0003.mrg')
    (['wsj/00/wsj_0003.mrg'], {})

I could leave off the final Combine and add a parse action::

    ...     wsj_name = tokens[4]
    ...     return ''.join(tokens), wsj_name
    ...
    ([('wsj/00/wsj_0003.mrg', 'wsj_0003')], {})

But that then allows whitespace between the pieces of the path, which
there shouldn't be::
([('wsj/00/wsj_0003.mrg', 'wsj_0003')], {})
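
The behaviour described here can be reproduced with a sketch along the following lines. The grammar pieces follow the ones quoted elsewhere in this thread; the names `loose_path` and `keep_name` are hypothetical, chosen only for illustration. Without the outer Combine, each element skips leading whitespace by default, so the spaced-out path parses just like the clean one.

```python
import pyparsing as pp

digits = pp.Word(pp.nums)
alphas = pp.Word(pp.alphas)
wsj_name = pp.Combine(alphas + '_' + digits)

# No outer Combine here: every element may be preceded by whitespace.
loose_path = alphas + '/' + digits + '/' + wsj_name + '.mrg'


def keep_name(tokens):  # hypothetical name for the parse action
    # tokens is ['wsj', '/', '00', '/', 'wsj_0003', '.mrg'],
    # so index 4 is the AAA_NNNN piece.
    return ''.join(tokens), tokens[4]


loose_path.setParseAction(keep_name)

clean = loose_path.parseString('wsj/00/wsj_0003.mrg')
spaced = loose_path.parseString('wsj / 00 / wsj_0003.mrg')

print(clean.asList())
print(spaced.asList())  # same result: the whitespace was silently accepted
```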

How do I make sure no whitespace intervenes, and still have access to
the sub-expression?

Thanks,

STeVe
 
Dennis Lee Bieber

Steven said:
Within a larger pyparsing grammar, I have something that looks like::

wsj/00/wsj_0003.mrg

When parsing this, I'd like to keep around both the full string, and the
AAA_NNNN substring of it, so I'd like something like::
(['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})

If you're working with file names/paths, why not use the functions in
os.path? (The only problem may be if one is using Windows, where the
native separator is \ )

Or just split on the /, first...

>>> import os.path
>>> sample = "wsj/00/wsj_0003.mrg"
>>> paths = sample.split("/")
>>> paths
['wsj', '00', 'wsj_0003.mrg']
>>> os.path.splitext(paths[-1])
('wsj_0003', '.mrg')

Steven said:
But that then allows whitespace between the pieces of the path, which
there shouldn't be::
([('wsj/00/wsj_0003.mrg', 'wsj_0003')], {})

If you didn't have whitespace coming in, there shouldn't be any
going out. If you do, you likely have malformed data and probably should
detect it earlier... Or you need to define a more complete grammar for
what determines a filename/path...

>>> sample = "wsj / 00 / wsj_0003.mrg"
>>> paths = [p.strip() for p in sample.split("/")]
>>> paths
['wsj', '00', 'wsj_0003.mrg']
>>> os.path.splitext(paths[-1])
('wsj_0003', '.mrg')
>>> os.path.join(*paths)  # Windows...!
'wsj\\00\\wsj_0003.mrg'
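
On the separator issue: as a small sketch, the standard library's `posixpath` and `ntpath` modules (the platform-specific implementations behind `os.path`) can be imported directly on any platform, so the forward-slash form can be kept regardless of where the code runs. The variable names below are illustrative only.

```python
import ntpath
import posixpath

paths = ['wsj', '00', 'wsj_0003.mrg']

# posixpath always joins with '/', ntpath always joins with '\\',
# independent of the operating system the script runs on.
posix_joined = posixpath.join(*paths)
windows_joined = ntpath.join(*paths)

print(posix_joined)    # wsj/00/wsj_0003.mrg
print(windows_joined)  # wsj\00\wsj_0003.mrg
```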
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 
Paul McGuire

Steven said:
Within a larger pyparsing grammar, I have something that looks like::

wsj/00/wsj_0003.mrg

When parsing this, I'd like to keep around both the full string, and the
AAA_NNNN substring of it, so I'd like something like::
(['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})

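How do I go about this? I was using something like::

    >>> import pyparsing as pp
    >>> digits = pp.Word(pp.nums)
    >>> alphas = pp.Word(pp.alphas)
    >>> wsj_name = pp.Combine(alphas + '_' + digits)
    >>> wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name +
    ...                       '.mrg')

But of course then all I get back is the full path::

    >>> wsj_path.parseString('wsj/00/wsj_0003.mrg')
    (['wsj/00/wsj_0003.mrg'], {})

The tokens are what the tokens are, so if you want to replicate a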
sub-field, then you'll need a parse action to insert it into the
returned tokens. BUT, if all you want is to be able to easily *access*
that sub-field, then why not give it a results name? Like this:

wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")

Leave everything else the same, but now you can access the name field
independently from the rest of the combined tokens.

result = wsj_path.parseString('wsj/00/wsj_0003.mrg')
print(result.dump())
print(result.name)
print(result.asList())
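
Put together as a self-contained sketch (assuming the grammar definitions quoted in this thread), the results-name approach looks roughly like this. The second half goes one step further than the suggestion above and is only an illustration: a parse action on the outer Combine copies the named sub-field into the token list, which gives the two-element result originally asked for. The variable names `whitespace_rejected` and `both` are hypothetical.

```python
import pyparsing as pp

digits = pp.Word(pp.nums)
alphas = pp.Word(pp.alphas)
wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")
wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name + '.mrg')

result = wsj_path.parseString('wsj/00/wsj_0003.mrg')
print(result.asList())   # the combined path as a single token
print(result["name"])    # the sub-field, reachable by its results name

# The outer Combine requires its pieces to be adjacent, so embedded
# whitespace is a parse error rather than being silently skipped.
try:
    wsj_path.parseString('wsj / 00 / wsj_0003.mrg')
    whitespace_rejected = False
except pp.ParseException:
    whitespace_rejected = True
print(whitespace_rejected)

# Illustration only: replicate the named sub-field in the token list.
wsj_path.setParseAction(lambda t: [t[0], t["name"]])
both = wsj_path.parseString('wsj/00/wsj_0003.mrg')
print(both.asList())
```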

-- Paul
 
Steven Bethard

Dennis said:
If working file name/paths, why not use the functions in os.path?

Two reasons. First, as I mentioned, this is within a larger pyparsing
grammar so it's not as easy to switch back and forth between the two.
Second, I do want to do some data validation (e.g. the name of the file
needs to be in a particular format) so I either need to post-process the
os.path approach or just do it in pyparsing.
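
For comparison, post-processing the path-splitting approach might look something like the sketch below. Everything here is an assumption made for illustration: the function name `split_and_validate` is hypothetical, and the regular expression is only one reading of the AAA_NNNN format (ASCII letters, an underscore, then digits). Note that it validates the file name only, not the directory parts, which is part of why keeping the check inside the grammar is attractive.

```python
import posixpath
import re

# Assumed format of the file name stem: letters, '_', digits.
NAME_PATTERN = re.compile(r'[A-Za-z]+_[0-9]+')


def split_and_validate(path):
    """Return (path, stem) or raise ValueError if the name is malformed."""
    directory, filename = posixpath.split(path)
    stem, ext = posixpath.splitext(filename)
    if ext != '.mrg' or NAME_PATTERN.fullmatch(stem) is None:
        raise ValueError('unexpected file name: %r' % filename)
    return path, stem


print(split_and_validate('wsj/00/wsj_0003.mrg'))
```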

Dennis said:
If you didn't have whitespace coming in, there shouldn't be any
going out. If you do, you likely have malformed data and probably should
detect it earlier...

Well, that's the intention of using pyparsing here. With a proper
grammar, pyparsing can detect the malformed data for me and throw an error.

STeVe
 
Steven Bethard

Paul said:
Steven said:
Within a larger pyparsing grammar, I have something that looks like::

wsj/00/wsj_0003.mrg

When parsing this, I'd like to keep around both the full string, and the
AAA_NNNN substring of it, so I'd like something like::

    >>> foo.parseString('wsj/00/wsj_0003.mrg')
    (['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})

How do I go about this? I was using something like::

    >>> digits = pp.Word(pp.nums)
    >>> alphas = pp.Word(pp.alphas)
    >>> wsj_name = pp.Combine(alphas + '_' + digits)
    >>> wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name +
    ...                       '.mrg')
[snip]
BUT, if all you want is to be able to easily *access*
that sub-field, then why not give it a results name? Like this:

wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")

Leave everything else the same, but now you can access the name field
independently from the rest of the combined tokens.

Works great. Thanks!

STeVe
 
