Pyparsing - Dealing with a Blank Value

Steve · Jan 26, 2007

Hi All,

I've picked up the PyParsing module and am trying to figure out how to
do a simple parsing of some HTML source code. My specific problem is
dealing with a <TD></TD> element that is blank.

from pyparsing import *
import sys

integer = Word("0123456789")

trStart = Literal("<TR>").suppress()
trEnd = Literal("</TR>").suppress()

tdStart = Literal("<TD>").suppress()
tdEnd = Literal("</TD>").suppress()

#dataItem = Word(alphas)
#BlankItem = Word('')  # unused; an empty character set cannot match anything
dataItem = Word(alphanums + " " + "," + ":")  # works with spaces in data
MultiItem = Optional(OneOrMore(dataItem))

TestLine = ['<TR><TD>Group</TD><TD>Year</TD><TD>City</TD></TR>',
            '<TR><TD>AAA</TD><TD>1992</TD><TD>Los Angeles</TD></TR>',
            '<TR><TD>BBB</TD><TD>2007</TD><TD>Santa Cruz</TD></TR>',
            '<TR><TD></TD><TD>2001</TD><TD>Santa Cruz</TD></TR>']

# Parentheses let the expression span several lines
htmlLine = (trStart
            + tdStart + MultiItem.setResultsName('status') + tdEnd
            + tdStart + MultiItem.setResultsName('year') + tdEnd
            + tdStart + MultiItem.setResultsName('title') + tdEnd
            + trEnd)

for CurrentLine in TestLine:
    print('Line = %s' % CurrentLine)

    for srvrtokens, startloc, endloc in htmlLine.scanString(CurrentLine):
        print('tokens = %s %d %d \n' % (srvrtokens, startloc, endloc))

Output :

Line = <TR><TD>Group</TD><TD>Year</TD><TD>City</TD></TR>
tokens = ['Group', 'Year', 'City'] 0 49

Line = <TR><TD>AAA</TD><TD>1992</TD><TD>Los Angeles</TD></TR>
tokens = ['AAA', '1992', 'Los Angeles'] 0 54

Line = <TR><TD>BBB</TD><TD>2007</TD><TD>Santa Cruz</TD></TR>
tokens = ['BBB', '2007', 'Santa Cruz'] 0 53

*** Blank 1st element - only shows 2 elements - need 3 elements to be
consistent ***

Line = <TR><TD></TD><TD>2001</TD><TD>Santa Cruz</TD></TR>
tokens = ['2001', 'Santa Cruz'] 0 50

Any assistance would be greatly appreciated!

Steve

Gabriel Genellina · Jan 26, 2007

Sorry for not answering your question exactly, but I'd use
BeautifulSoup instead, it works even if the HTML is not well formed.

--
Gabriel Genellina
Softlab SRL

Paul McGuire · Jan 26, 2007

Just define a default value to be returned for MultiItem if the
Optional expression is not found:

MultiItem = Optional(OneOrMore(dataItem),default="")

Define default to be whatever string you choose.
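For reference, here is a minimal sketch of how the default plays out on the rows from the original post. The `cell` expression and the `parse_row` helper are names made up for this illustration, and the results names are left out to keep it short:

```python
from pyparsing import Literal, OneOrMore, Optional, Word, alphanums

trStart = Literal("<TR>").suppress()
trEnd = Literal("</TR>").suppress()
tdStart = Literal("<TD>").suppress()
tdEnd = Literal("</TD>").suppress()

dataItem = Word(alphanums + " ,:")
# When no dataItem is found, Optional returns the default as a placeholder token
MultiItem = Optional(OneOrMore(dataItem), default="")

# Hypothetical helper expression: one <TD>...</TD> cell
cell = tdStart + MultiItem + tdEnd
htmlLine = trStart + cell + cell + cell + trEnd


def parse_row(line):
    """Return the cell values of one three-column row as a plain list."""
    return list(htmlLine.parseString(line))


for line in ['<TR><TD>AAA</TD><TD>1992</TD><TD>Los Angeles</TD></TR>',
             '<TR><TD></TD><TD>2001</TD><TD>Santa Cruz</TD></TR>']:
    print(parse_row(line))
# ['AAA', '1992', 'Los Angeles']
# ['', '2001', 'Santa Cruz']
```

With the placeholder in position 0, every row yields three tokens, so the columns stay aligned even when a cell is blank.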

-- Paul

Paul McGuire · Jan 26, 2007

I'd also suggest using the makeHTMLTags helper function for the TR and TD
tags:

trStart,trEnd = makeHTMLTags("TR")
tdStart,tdEnd = makeHTMLTags("TD")

makeHTMLTags includes a much more robust definition than just
Literal("<tag>"), including recognition of attributes and tolerance of
upper/lower case.
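A rough sketch of the two suggestions combined is below. One thing to watch: unlike the suppressed Literals in the original post, the expressions returned by makeHTMLTags add their own tokens (tag name, attributes) to the results, so this sketch suppresses them explicitly. The `cell` expression, the `parse_row_tags` helper, and the sample line with attributes and mixed case are invented for the illustration:

```python
from pyparsing import OneOrMore, Optional, Word, alphanums, makeHTMLTags

trStart, trEnd = makeHTMLTags("TR")
tdStart, tdEnd = makeHTMLTags("TD")

dataItem = Word(alphanums + " ,:")
MultiItem = Optional(OneOrMore(dataItem), default="")

# Suppress the tag expressions so only the cell contents are returned
cell = tdStart.suppress() + MultiItem + tdEnd.suppress()
htmlLine = trStart.suppress() + cell + cell + cell + trEnd.suppress()


def parse_row_tags(line):
    """Return the cell values of one three-column row as a plain list."""
    return list(htmlLine.parseString(line))


# Hypothetical input: lower-case tags and attributes that Literal("<TD>") would miss
sample = '<tr class="odd"><td></td><TD align="right">2001</TD><td>Santa Cruz</td></tr>'
print(parse_row_tags(sample))
```

If the attribute values are needed, drop the suppress() on the start tag and read them from the parsed results instead.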

-- Paul

Steve · Jan 26, 2007

Hi Paul!

Thanks for your suggestions on the default value (I didn't know you
could do that!!) and the use of the makeHTMLTags helper!

Steve
