pyparsing: how to negate a grammar

knguyen · Jan 8, 2005

Hi,

I want to define a rule for a line that does NOT start with a given
Literal. How do I do that? I tried the following, and my program just
hangs:

BodyLine = ~Literal("HTTP/1.1") + restOfLine

Thanks,
Khoa

Paul McGuire · Jan 9, 2005

Khoa -

pyparsing can be run in several modes: one tokenizes and extracts data
according to a given grammar (parseString), one scans for pattern matches
(scanString), and one translates matched patterns into other patterns
(transformString). It's not clear from your e-mail what you are trying to
do. There is nothing in your statement that would cause Python to "just hang
there"; what else is your program doing?

-- Paul

knguyen · Jan 9, 2005

Hi Paul,

I am trying to extract HTTP response codes from an HTTP page sent from
a web server. Below is my test program. The program just hangs.

Thanks,
Khoa
##################################################

#!/usr/bin/python

from pyparsing import (ParseException, Dict, CharsNotIn,
                       Group, Literal, Word, ZeroOrMore, OneOrMore,
                       Suppress, nums, alphas, alphanums, printables, restOfLine)

data = """HTTP/1.1 200 OK
body line some text here
body line some text here
HTTP/1.1 400 Bad request
body line some text here
body line some text here

HTTP/1.1 500 Bad request
body line some text here
body line some text here
"""

print "================="
print data
print "================="

HTTPVersion = (Literal("HTTP/1.1")).setResultsName("HTTPVersion")
StatusCode = (Word(nums)).setResultsName("StatusCode")
ReasonPhrase = restOfLine.setResultsName("ReasonPhrase")
StatusLine = Group(HTTPVersion + StatusCode + ReasonPhrase)

nonHTTP = ~Literal("HTTP/1.1")
BodyLine = Group(nonHTTP + restOfLine)
Response = OneOrMore(StatusLine + ZeroOrMore(BodyLine))
respFields = Response.parseString(data)
print respFields

Paul McGuire · Jan 9, 2005

Khoa -

Thanks for supplying a little more information to go on. The problem you
are struggling with has to do with pyparsing's handling or non-handling of
whitespace, which I'll admit takes some getting used to.

In general, pyparsing works its way through the input string, matching input
characters against the defined pattern. This gets a little tricky when
dealing with whitespace (which includes '\n' characters). In particular,
restOfLine will read up to the next '\n', but will not go past it - AND
restOfLine will match an empty string. So if you have a grammar that
includes repetition, such as OneOrMore(restOfLine), this will read up to the
next '\n' and then keep matching an empty string at that same position,
forever. This is essentially the case you have in your code,
ZeroOrMore(BodyLine), in which BodyLine is
BodyLine = Group(nonHTTP + restOfLine)
You need to include something to consume the terminating '\n', which is the
purpose of the LineEnd() class. Change BodyLine to
BodyLine = Group(nonHTTP + restOfLine + LineEnd())
and this will break the infinite looping that occurs at the end of the first
body line. (If you like, use LineEnd().suppress() to keep the '\n' tokens
out of your other parsed data.)
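To see why the repetition never finishes, here is a small sketch that uses Python 3's standard re module as a stand-in for restOfLine. It only mimics the behavior described above (match up to, but not including, the next '\n', possibly matching nothing); it is not how pyparsing is implemented. The function name repeat_rest_of_line and its max_steps cap are hypothetical; the cap takes the place of "the program hangs".

```python
import re

# Stand-in for restOfLine (an illustration, not pyparsing's own code):
# any run of characters other than '\n', including an empty run.
REST_OF_LINE = re.compile(r"[^\n]*")


def repeat_rest_of_line(text, consume_newline, max_steps=100):
    """Apply REST_OF_LINE over and over, roughly like ZeroOrMore(restOfLine).

    Returns (matched_lines, reached_end). max_steps is a hypothetical cap:
    if the position stops moving, give up after that many attempts instead
    of looping forever.
    """
    pos = 0
    matched = []
    for _ in range(max_steps):
        if pos >= len(text):
            return matched, True
        m = REST_OF_LINE.match(text, pos)  # always succeeds, possibly empty
        matched.append(m.group())
        pos = m.end()  # stops in front of the '\n', never past it
        if consume_newline and text.startswith("\n", pos):
            pos += 1  # the role LineEnd() plays in the grammar
    return matched, False


print(repeat_rest_of_line("a\nb\n", consume_newline=False)[1])  # False
print(repeat_rest_of_line("a\nb\n", consume_newline=True))  # (['a', 'b'], True)
```

Without the newline step, every pass after the first matches an empty string at the same '\n', so the list just fills with '' until the cap is hit. With it, each pass moves past one line and the loop reaches the end of the text.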

Now there is one more problem - another infinite loop at the end of the
string. By similar reasoning, it is resolved by changing
nonHTTP = ~Literal("HTTP/1.1")
to
nonHTTP = ~Literal("HTTP/1.1") + ~StringEnd()

After making those two changes, your program runs to completion on my
system.
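Putting both changes together, here is a sketch of the whole program for Python 3 (print() as a function), assuming a pyparsing version that still accepts the camelCase names used in this thread (parseString, setResultsName, restOfLine). One addition beyond the two changes above is an assumption of this sketch: StatusLine also ends with LineEnd().suppress(), so the '\n' at the end of each status line is not picked up as an empty first body line. The names status_codes and body_lines are just illustrative.

```python
from pyparsing import (Group, LineEnd, Literal, OneOrMore, StringEnd, Word,
                       ZeroOrMore, nums, restOfLine)

data = """HTTP/1.1 200 OK
body line some text here
body line some text here
HTTP/1.1 400 Bad request
body line some text here
body line some text here

HTTP/1.1 500 Bad request
body line some text here
body line some text here
"""

HTTPVersion = Literal("HTTP/1.1").setResultsName("HTTPVersion")
StatusCode = Word(nums).setResultsName("StatusCode")
ReasonPhrase = restOfLine.setResultsName("ReasonPhrase")
# Assumption beyond the two fixes: the status line consumes its own '\n' too.
StatusLine = Group(HTTPVersion + StatusCode + ReasonPhrase + LineEnd().suppress())

# Second fix: a body line is neither a status line nor the end of the input.
nonHTTP = ~Literal("HTTP/1.1") + ~StringEnd()
# First fix: LineEnd() consumes the '\n' that restOfLine stops in front of.
BodyLine = Group(nonHTTP + restOfLine + LineEnd().suppress())

Response = OneOrMore(StatusLine + ZeroOrMore(BodyLine))
respFields = Response.parseString(data)

# Each group is either a status line or a body line.
status_codes = [g["StatusCode"] for g in respFields if g[0] == "HTTP/1.1"]
body_lines = [g[0] for g in respFields if g[0] != "HTTP/1.1"]
print(status_codes)  # ['200', '400', '500']
print(len(body_lines))  # 6
```

The blank line before the third response never becomes a body line: the ~Literal("HTTP/1.1") lookahead skips the '\n', sees the next status line, and fails, so the status line's own whitespace skipping absorbs the blank line instead.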

Usually, when someone has some problems with this kind of "line-sensitive"
parsing, I recommend that they consider using pyparsing in a different
manner, or use some other technique. For instance, you might use
pyparsing's scanString generator to match on the HTTP lines, as in

for toks,start,end in StatusLine.scanString(data):
    print toks,toks[0].StatusCode, toks[0].ReasonPhrase
    print start,end

which gives
[['HTTP/1.1', '200', ' OK']] 200 OK
0 15
[['HTTP/1.1', '400', ' Bad request']] 400 Bad request
66 90
[['HTTP/1.1', '500', ' Bad request']] 500 Bad request
142 166

If you need the intervening body text, you can use the start and end values
to extract it in slices from the input data string.
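A minimal sketch of that slicing, in Python 3 and again assuming a pyparsing version that accepts the camelCase names: each body is taken to run from the end of one status line to the start of the next (or to the end of the data for the last response). The responses list of (status code, body lines) pairs is a hypothetical structure, and blank lines at the edges of a body are dropped by the strip("\n").

```python
from pyparsing import Group, Literal, Word, nums, restOfLine

data = """HTTP/1.1 200 OK
body line some text here
body line some text here
HTTP/1.1 400 Bad request
body line some text here
body line some text here

HTTP/1.1 500 Bad request
body line some text here
body line some text here
"""

StatusLine = Group(
    Literal("HTTP/1.1").setResultsName("HTTPVersion")
    + Word(nums).setResultsName("StatusCode")
    + restOfLine.setResultsName("ReasonPhrase")
)

# scanString yields (tokens, start, end) for each status line it finds.
matches = list(StatusLine.scanString(data))

responses = []  # hypothetical structure: (status code, list of body lines)
for i, (toks, start, end) in enumerate(matches):
    # Body = text between this status line's end and the next one's start.
    next_start = matches[i + 1][1] if i + 1 < len(matches) else len(data)
    body = data[end:next_start].strip("\n").splitlines()
    responses.append((toks[0].StatusCode, body))

for code, body in responses:
    print(code, body)
```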

Or, since your data is reasonably well-formed, you could just use readlines(),
or data.split('\n'), and find the HTTP lines using startswith(). While this
is a brute-force approach, it will almost certainly run many times faster
than pyparsing.
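A minimal sketch of that brute-force approach, written for Python 3 with no pyparsing at all. The dictionary keys simply mirror the results names from the grammar above and are illustrative. The sketch assumes, as in the sample data, that every status line starts with "HTTP/1.1", and that blank lines and any text before the first status line can be ignored.

```python
data = """HTTP/1.1 200 OK
body line some text here
body line some text here
HTTP/1.1 400 Bad request
body line some text here
body line some text here

HTTP/1.1 500 Bad request
body line some text here
body line some text here
"""

responses = []
for line in data.split("\n"):
    if line.startswith("HTTP/1.1"):
        # e.g. "HTTP/1.1 400 Bad request" -> ["HTTP/1.1", "400", "Bad request"]
        parts = line.split(" ", 2)
        responses.append({
            "StatusCode": parts[1] if len(parts) > 1 else "",
            "ReasonPhrase": parts[2] if len(parts) > 2 else "",
            "body": [],
        })
    elif responses and line:
        # Non-empty, non-status line: attach it to the most recent response.
        responses[-1]["body"].append(line)

for r in responses:
    print(r["StatusCode"], r["ReasonPhrase"], len(r["body"]))
```

Splitting on at most two spaces keeps multi-word reason phrases such as "Bad request" intact.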

In any event, best of luck using pyparsing, and write back if you have other
questions.

-- Paul

