(mostly-)POSIX regular expressions

Sébastien Boisgérault · May 27, 2006

Hi,

I'm searching for a POSIX 1003.2 compatible regular expression engine.
The Python binding "pregex" by Neal Becker may do the job, but I did
not manage to download it as the original link
ftp://ftp.ctd.comsat.com/pub/
seems dead.

Does any old-timer (<wink>) have a copy of this package?

Cheers,

SB

Paddy · May 28, 2006

maybe this: http://www.pcre.org/pcre.txt and ctypes might work for you?
(I was surprised to find out that PCRE supported POSIX, but I don't know
what version it supports or how well).

- Pad

Sébastien Boisgérault · May 28, 2006

Very good hint! I wouldn't have found it on my own ...

I have to study the doc, but the "THE DFA MATCHING ALGORITHM" section may do
what I need. Obviously, I didn't expect the Perl-Compatible Regular
Expressions library to implement
"an alternative algorithm, provided by the pcre_dfa_exec() function,
that operates in a different way, and is not Perl-compatible".

Maybe the lib should be renamed to PCREWSO, for:
Perl-compatible regular expressions ... well, sort of

Cheers,

SB

Sébastien Boisgérault · May 28, 2006

Paddy wrote:

maybe this: http://www.pcre.org/pcre.txt and ctypes might work for you?

Well, in the end, it doesn't fit. What I need is a "longest match" policy
in patterns like "(a)|(b)|(c)" and NOT a "left-to-right" policy.
Additionally, I need to be able to obtain the matched ("captured")
substring, and PCRE does not allow this in DFA mode.
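
To illustrate the difference between the two policies, here is a minimal
sketch using Python's own re module (not PCRE). The pattern and the input
string are made up for the example.

```python
import re

# Python's re module tries alternatives left to right and keeps the first
# one that succeeds, even when a later alternative would match more text.
leftmost_first = re.match(r"(<)|(<=)", "<=")
print(leftmost_first.group(0))   # the shorter "<" wins

# The usual work-around is to reorder by hand: longest alternative first.
reordered = re.match(r"(<=)|(<)", "<=")
print(reordered.group(0))
```

A POSIX "leftmost-longest" engine would be expected to return "<=" for both
orderings.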

Too bad ...

SB

John Machin · May 28, 2006

Paddy wrote:

Well, in the end, it doesn't fit. What I need is a "longest match" policy
in patterns like "(a)|(b)|(c)" and NOT a "left-to-right" policy.
Additionally, I need to be able to obtain the matched ("captured")
substring, and PCRE does not allow this in DFA mode.

Perhaps you might like to be somewhat more precise with your
requirements. "POSIX-compliant" made me think of yuckies like [:fubar:]
in character classes.

The operands of | are such that the length is not fixed and so you can't
write them in descending length order? Care to tell us some more detail
about those operands?

If those operands are simple strings (LOGICAL|LOGIC|LOG) and you've got
more than a handful of them, try Danny Yoo's ahocorasick module.
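
As a rough sketch of the descending-length idea for plain literal strings,
using only the standard re module: the helper name longest_first_pattern is
hypothetical, and this does not use or imitate the ahocorasick module's API.

```python
import re

def longest_first_pattern(words):
    """Build an alternation of literal strings, longest first."""
    ordered = sorted(set(words), key=len, reverse=True)
    # re.escape keeps characters such as "." or "+" literal.
    return re.compile("|".join(re.escape(w) for w in ordered))

pattern = longest_first_pattern(["LOG", "LOGIC", "LOGICAL"])
match = pattern.match("LOGICAL AND")
print(match.group(0))
```

This only helps when every operand has a fixed length. It does not apply to
operands such as identifiers or numbers, whose matched length varies.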

HTH,
John

Sébastien Boisgérault · May 29, 2006

John said:
Perhaps you might like to be somewhat more precise with your
requirements.

Sure. More on this below.

"POSIX-compliant" made me think of yuckies like [:fubar:]
in character classes

Yep. I do not need POSIX *syntax* for regular expressions but POSIX
*semantics*, at least the "leftmost-longest" part (in contrast to the
"first then longest" used in Python, Perl, .NET, etc.)

The operands of | are such that the length is not fixed and so you can't
write them in descending length order? Care to tell us some more detail
about those operands?

Basically, I'd like to use John Aycock's (excellent) Python module SPARK
to build an (extended) C lexer. To do so, I need to specify the patterns
that match my tokens as well as a priority between them. SPARK then builds
one big alternation of patterns that begins with the high priority
patterns and ends with the low priority patterns, and runs a match.

The problem is that I have to be very careful and specify the priorities
explicitly to get the desired results: "<=" must rank higher than "<",
decimal numbers higher than integers, etc., when most of the time what
you really want is to match the longest pattern ...

Worse, the priority work-around does not work well when you
compare keywords and (other) identifiers. To match "fortune"
as an identifier, you would need to give identifiers a higher
priority than keywords, and that is a problem: "for" would then be
matched as an identifier when it is a keyword.

I can come up with possible work-arounds for the "id vs
keyword" issue, but nothing that really makes me happy ...
Therefore, I was studying the possible replacement of
Python's native regular expression engine with a "POSIX
semantics" regular expression engine that would give the
longest match and save me a lot of extra work ...
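
One way to approximate the longest-match policy with the standard re module
is to try every token pattern at the current position and keep the longest
result, using priority only to break ties. The sketch below is an
illustration under that assumption. It is not SPARK's API, and the token
names and patterns in TOKEN_SPECS are hypothetical.

```python
import re

# Hypothetical token table, highest priority first. Priority is used only
# to break ties between matches of equal length.
TOKEN_SPECS = [
    ("KEYWORD", r"for|if|while"),
    ("IDENT",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("FLOAT",   r"[0-9]+\.[0-9]*"),
    ("INT",     r"[0-9]+"),
    ("LE",      r"<="),
    ("LT",      r"<"),
    ("SPACE",   r"\s+"),
]
COMPILED = [(name, re.compile(rx)) for name, rx in TOKEN_SPECS]

def tokenize(text):
    """Yield (name, lexeme) pairs using a longest-match policy."""
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # Strict '>' keeps the earlier (higher-priority) pattern on ties
            # and rejects empty matches.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError("no token matches at position %d" % pos)
        if best_name != "SPACE":
            yield best_name, text[pos:best_end]
        pos = best_end

print(list(tokenize("for fortune")))
print(list(tokenize("x <= 3.14")))
```

With this scheme "fortune" comes out as an identifier because it is the
longer match, while "for" stays a keyword because the tie goes to the
higher-priority pattern. Each individual pattern is still matched with
Python's own semantics, and every pattern is tried at every position, so
this costs more than a single combined alternation.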

I hope it's clearer now.

Any advice?

Cheers

SB

Anthony · May 29, 2006

I have a problem with the os.times() function: on different Python
versions and machines, I get different output:

Server1# python
Python 2.3.4 (#1, Feb 2 2005, 11:44:13)
[GCC 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import time
>>> import os
>>>
>>> print os.times()[4]

4880406.62

----------------------------------
Server2% python
Python 2.3.2 (#4, Sep 14 2004, 09:41:45) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.

>>> import time
>>> import os
>>>
>>> print os.times()[4]

-21464227.74

---------------
Server3% python
Python 2.4.1 (#1, May 16 2005, 15:19:29)
[GCC 4.0.0 20050512 (Red Hat 4.0.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import time
>>> import os
>>>
>>> print os.times()[4]

18390711.21

On all 3 servers, the shell command `date` returns the same value.

Any suggestions?
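
A likely explanation, going by the Python documentation for os.times(): the
fifth field is elapsed real time measured from a fixed but unspecified
point in the past, so its absolute value is not expected to agree between
machines. The sketch below (written for Python 3, unlike the sessions above)
shows the field used only as a difference, with time.time() as the
wall-clock alternative.

```python
import os
import time

# The fifth field of os.times() is elapsed real time measured from an
# arbitrary, platform-dependent reference point, so only differences
# between two readings are meaningful.
start = os.times()[4]
time.sleep(0.2)
elapsed = os.times()[4] - start
print("elapsed: %.2f s" % elapsed)

# For a value comparable to the `date` command, use the wall clock.
now = time.time()
print("seconds since the epoch: %.0f" % now)
```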
