[ANN] pyparsing 1.4.5 released

Paul McGuire

This latest version of pyparsing has a few minor bug fixes and
enhancements, plus a performance improvement that can as much as
double parsing speed.

This release also includes some new examples:
- parsePythonValue.py - parses strings representing lists, dicts,
and tuples, with nesting support
- sql2dot.py - SQL diagram generator, parsed from schema table
definitions
- htmlStripper.py - strips HTML tags from HTML pages, leaving only
body text

Download pyparsing 1.4.5 at http://pyparsing.sourceforge.net. The
pyparsing Wiki is at http://pyparsing.wikispaces.com

-- Paul

========================================
Pyparsing is a pure-Python class library for quickly developing
recursive-descent parsers. Parser grammars are assembled directly in
the calling Python code, using classes such as Literal, Word,
OneOrMore, Optional, etc., combined with operators '+', '|', and '^'
for And, MatchFirst, and Or. No separate code-generation or external
files are required. Pyparsing comes with a number of parsing examples,
including:
- "Hello, World!" (English, Korean, and Greek)
- chemical formulas
- configuration file parser
- web page URL extractor
- 5-function arithmetic expression parser
- subset of CORBA IDL
- chess portable game notation
- simple SQL parser
- Mozilla calendar file parser
- EBNF parser/compiler
- Python value string parser (lists, dicts, tuples, with nesting) (new)
- HTML tag stripper (new)
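
As a rough illustration of the grammar-building style described above,
here is a minimal sketch in the spirit of the "Hello, World!" example.
The names end_mark and greeting are made up for this sketch, and it
assumes an installed pyparsing that still exposes the camelCase method
names used in this announcement.

```python
from pyparsing import Literal, OneOrMore, Optional, Word, alphas

# '|' builds a MatchFirst: accept either closing punctuation mark.
end_mark = Literal("!") | Literal("?")

# '+' builds an And: a word, an optional comma, one or more words,
# and then the closing mark.
greeting = Word(alphas) + Optional(Literal(",")) + OneOrMore(Word(alphas)) + end_mark

tokens = greeting.parseString("Hello, World!").asList()
question = greeting.parseString("Hi there folks?").asList()
print(tokens)
print(question)
```

The grammar is ordinary Python objects, so it can be built, combined,
and reused without any separate code-generation step.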


Version 1.4.5 - December, 2006
------------------------------
- Removed debugging print statement from QuotedString class. Sorry
for not stripping this out before the 1.4.4 release!

- A significant performance improvement, the first one in a while!
For my Verilog parser, this version of pyparsing is about double the
speed - YMMV.

- Added support for pickling of ParseResults objects. (Reported by
Jeff Poole, thanks Jeff!)

- Fixed minor bug in makeHTMLTags, which did not recognize tag
attributes with embedded '-' or '_' characters. Also, added support for
passing expressions to makeHTMLTags and makeXMLTags, and used this
feature to define the globals anyOpenTag and anyCloseTag.

- Fixed error in alphas8bit; I had omitted the y-with-umlaut character.

- Added punc8bit string to complement alphas8bit - it contains all the
non-alphabetic, non-blank 8-bit characters.

- Added commonHTMLEntity expression, to match common HTML "ampersand"
codes, such as "&lt;", "&gt;", "&amp;", "&nbsp;", and "&quot;". This
expression also defines a results name 'entity', which can be used
to extract the entity field (that is, "lt", "gt", etc.). Also added
built-in parse action replaceHTMLEntity, which can be attached to
commonHTMLEntity to translate "&lt;", "&gt;", "&amp;", "&nbsp;", and
"&quot;" to "<", ">", "&", " ", and "'".

- Added example, htmlStripper.py, that strips HTML tags and scripts
from HTML pages. It also translates common HTML entities to their
respective characters.
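
A few of the changes listed above can be tried out together. The
following is only a sketch, not code from the release: the sample
strings and the variable names (entry, meta_open, translate, and so on)
are invented here, and it assumes an installed pyparsing that still
exposes the camelCase names used in these notes.

```python
import pickle

from pyparsing import (
    Word, alphas, nums,
    makeHTMLTags, commonHTMLEntity, replaceHTMLEntity,
)

# 1. ParseResults objects can be pickled and restored.
entry = Word(alphas).setResultsName("name") + Word(nums).setResultsName("count")
parsed = entry.parseString("widgets 42")
restored = pickle.loads(pickle.dumps(parsed))

# 2. makeHTMLTags recognizes attribute names with an embedded '-'.
meta_open, meta_close = makeHTMLTags("meta")
meta = meta_open.parseString('<meta http-equiv="refresh" content="5">')

# 3. commonHTMLEntity defines the 'entity' results name; attaching
#    replaceHTMLEntity to a copy translates entities to characters.
entity_name = commonHTMLEntity.parseString("&lt;")["entity"]
translate = commonHTMLEntity.copy().setParseAction(replaceHTMLEntity)
plain = translate.transformString("x &lt; y &amp;&amp; y &gt; z")

print(restored.asList(), restored["name"])
print(meta["http-equiv"], meta["content"])
print(entity_name, "|", plain)
```

The parse action is attached to a copy of commonHTMLEntity so that the
shared module-level expression is left unmodified for other callers.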
 
