ANN: pyparsing-1.3.3 released

Paul McGuire

Pyparsing 1.3.3 contains mostly bugfixes and minor enhancements over
previous releases, including some improvement in Unicode support. Here
are the change notes:

Version 1.3.3 - September 12, 2005
----------------------------------
- Improved support for Unicode strings returned by srange. Added
greetingInKorean.py example, for a Korean version of
"Hello, World!" using Unicode. (Thanks, June Kim!)

- Added 'hexnums' string constant (nums+"ABCDEFabcdef") for defining
hexadecimal value expressions.
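
As a rough illustration of what the new constant contains, the sketch
below rebuilds the same character set with the standard library only.
The helper is_hex_word is a hypothetical name used for this example; it
is not part of pyparsing.

```python
import string

# Same character set the change note gives for hexnums: nums + "ABCDEFabcdef"
nums = string.digits
hexnums = nums + "ABCDEFabcdef"


def is_hex_word(text):
    """Hypothetical helper: True if text is non-empty and uses only hexnums characters."""
    return bool(text) and all(ch in hexnums for ch in text)


print(hexnums)                    # 0123456789ABCDEFabcdef
print(is_hex_word("DEADbeef01"))  # True
print(is_hex_word("0xFF"))        # False ('x' is not a hex digit)
```

In pyparsing itself the constant would normally be passed to Word, in
the same way nums and alphas are.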

- NOTE: ===THIS CHANGE MAY BREAK EXISTING CODE===
Modified tag and results definitions returned by makeHTMLTags(),
to better support the looseness of HTML parsing. Tags to be
parsed are now caseless, and keys generated for tag attributes are
now converted to lower case.

Formerly, makeHTMLTags("XYZ") would return a tag with a results
name of "startXYZ"; this has been changed to "startXyz". If this
tag is matched against '<XYZ Abc="1" DEF="2" ghi="3">', the
matched keys formerly would be "Abc", "DEF", and "ghi"; keys are
now converted to lower case, giving keys of "abc", "def", and
"ghi". These changes were made to address the often inconsistent
capitalization of start and end tags in many HTML pages.

No changes were made to makeXMLTags(), which assumes more rigorous
parsing rules.

Also, cleaned up case-sensitivity bugs in closing tags, and
switched to using Keyword instead of Literal class for tags.
(Thanks, Steve Young, for getting me to look at these in more
detail!)
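
The renaming described in this note can be sketched in plain Python.
This only mirrors the before/after values quoted above; the function
names are hypothetical, and using str.capitalize() to get "Xyz" is an
assumption, not a statement about how pyparsing builds the name.

```python
def start_results_name(tag):
    """Hypothetical: build the results name quoted in the note ("XYZ" -> "startXyz")."""
    # str.capitalize() is an assumption chosen to reproduce the quoted example
    return "start" + tag.capitalize()


def normalize_attr_keys(attrs):
    """Hypothetical: lower-case attribute names, leaving the values untouched."""
    return {key.lower(): value for key, value in attrs.items()}


print(start_results_name("XYZ"))
# startXyz
print(normalize_attr_keys({"Abc": "1", "DEF": "2", "ghi": "3"}))
# {'abc': '1', 'def': '2', 'ghi': '3'}
```

Code that looked up results under the old names ("startXYZ", "Abc",
"DEF") is what this change may break.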

- Added two helper parse actions, upcaseTokens and downcaseTokens,
which will convert matched text to all uppercase or lowercase,
respectively.
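
A minimal sketch of what such parse actions do, assuming the usual
(string, location, tokens) calling convention for parse actions. The
functions below are stand-ins written for this example, not pyparsing's
own definitions.

```python
def upcase_tokens(s, loc, toks):
    """Stand-in for upcaseTokens: return every matched token in upper case."""
    return [tok.upper() for tok in toks]


def downcase_tokens(s, loc, toks):
    """Stand-in for downcaseTokens: return every matched token in lower case."""
    return [tok.lower() for tok in toks]


# s is the original input string and loc the match location; both are unused here
print(upcase_tokens("select x from t", 0, ["select", "from"]))    # ['SELECT', 'FROM']
print(downcase_tokens("SELECT x FROM t", 0, ["SELECT", "FROM"]))  # ['select', 'from']
```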

- Deprecated Upcase class, to be replaced by upcaseTokens parse
action.

- Converted messages formerly sent to stderr to use the warnings
module, such as the message issued when constructing a Literal
with an empty string (use the Empty() class or the empty helper
instead).
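
One practical effect of using the warnings module is that callers can
capture, silence, or escalate these messages with standard filters. The
sketch below uses a hypothetical make_literal function as a stand-in
for a constructor that warns on an empty string; the message text and
the SyntaxWarning category are assumptions made for this example.

```python
import warnings


def make_literal(match_string):
    """Hypothetical stand-in for a constructor that warns on an empty match string."""
    if not match_string:
        # Message text and warning category are assumptions for illustration
        warnings.warn("empty string passed to Literal; use Empty() instead",
                      SyntaxWarning, stacklevel=2)
    return match_string


# Capture the warning instead of letting it print to stderr
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    make_literal("")

print(len(caught))                   # 1
print(caught[0].category.__name__)   # SyntaxWarning
```

Calling warnings.simplefilter("error") instead turns the same message
into an exception, which can be handy in a test suite.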

- Added ' ' (space) as an escapable character within a quoted
string.

- Added helper expressions for common comment types, in addition
to the existing cStyleComment (/*...*/) and htmlStyleComment
(<!-- ... -->):
. dblSlashComment = // ... (to end of line)
. cppStyleComment = cStyleComment or dblSlashComment
. javaStyleComment = cppStyleComment
. pythonStyleComment = # ... (to end of line)
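
For readers who want a feel for what these comment forms match, here is
a rough regular-expression approximation using only the standard
library. The patterns are written for this example and are not
pyparsing's own definitions; in particular, they do not notice comment
markers that appear inside string literals.

```python
import re

# Approximations of the comment forms listed above (assumed patterns)
C_STYLE = r"/\*.*?\*/"       # /* ... */, may span several lines
DBL_SLASH = r"//[^\n]*"      # // ... to end of line
PYTHON_STYLE = r"#[^\n]*"    # # ... to end of line

# cppStyleComment is described as cStyleComment or dblSlashComment
cpp_style = re.compile(C_STYLE + "|" + DBL_SLASH, re.DOTALL)
python_style = re.compile(PYTHON_STYLE)

source = "int x; /* multi\nline */ int y; // tail\n"
print(cpp_style.findall(source))
# ['/* multi\nline */', '// tail']
print(python_style.findall("a = 1  # note\nb = 2"))
# ['# note']
```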



Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul


========================================

Pyparsing is a pure-Python class library for quickly developing
recursive-descent parsers. Parser grammars are assembled directly in
the calling Python code, using classes such as Literal, Word,
OneOrMore, Optional, etc., combined with operators '+', '|', and '^'
for And, MatchFirst, and Or. No separate code-generation or external
files are required. Pyparsing can be used in many cases in place of
regular expressions, with a shorter learning curve and greater
readability and maintainability. Pyparsing comes with a number of
parsing examples, including:
- "Hello, World!" (English and Korean)
- chemical formulas
- configuration file parser
- web page URL extractor
- 5-function arithmetic expression parser
- subset of CORBA IDL
- chess portable game notation
- simple SQL parser
- Mozilla calendar file parser
- EBNF parser/compiler
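
To illustrate the idea of assembling a grammar directly in Python with
'+', '|', and '^', here is a toy sketch that uses only the standard
library. The Parser class and the literal and word helpers are
hypothetical stand-ins written for this example; they are not
pyparsing's classes and leave out nearly everything the real library
does.

```python
import string


class Parser:
    """Toy expression: fn(text, pos) returns (tokens, new_pos) or None."""

    def __init__(self, fn):
        self.fn = fn

    def __add__(self, other):   # '+': match self, then other (in the spirit of And)
        def seq(text, pos):
            first = self.fn(text, pos)
            if first is None:
                return None
            second = other.fn(text, first[1])
            if second is None:
                return None
            return first[0] + second[0], second[1]
        return Parser(seq)

    def __or__(self, other):    # '|': first alternative that matches (MatchFirst)
        def first_match(text, pos):
            result = self.fn(text, pos)
            return result if result is not None else other.fn(text, pos)
        return Parser(first_match)

    def __xor__(self, other):   # '^': try both, keep the longer match (Or)
        def longest(text, pos):
            results = [r for r in (self.fn(text, pos), other.fn(text, pos))
                       if r is not None]
            return max(results, key=lambda r: r[1]) if results else None
        return Parser(longest)


def skip_spaces(text, pos):
    """Advance pos past any whitespace."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def literal(s):
    """Match the exact string s, after optional leading whitespace."""
    def match(text, pos):
        pos = skip_spaces(text, pos)
        return ([s], pos + len(s)) if text.startswith(s, pos) else None
    return Parser(match)


def word(chars):
    """Match one or more characters drawn from chars."""
    def match(text, pos):
        pos = skip_spaces(text, pos)
        end = pos
        while end < len(text) and text[end] in chars:
            end += 1
        return ([text[pos:end]], end) if end > pos else None
    return Parser(match)


greeting = word(string.ascii_letters) + literal(",") + word(string.ascii_letters) + literal("!")
tokens, end = greeting.fn("Hello, World!", 0)
print(tokens)  # ['Hello', ',', 'World', '!']
```

The difference between the two alternation operators shows up when one
alternative is a prefix of the other: in this toy version,
(literal("in") | literal("int")) stops at "in" for the input "int",
while (literal("in") ^ literal("int")) keeps the longer "int".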
 
