look-behind regexp ?


Yukihiro Matsumoto

Hi,

In message "Re: look-behind regexp ?"

|Are there any plans to support look-behinds in the core regexp engine?

The Oniguruma regexp engine in 1.9 already has one.

matz.
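
For readers who want to try it, here is a minimal sketch of the look-behind syntax, assuming Ruby 1.9 or later (where Oniguruma, or its successor, is the built-in engine). The sample string and variable names are made up for illustration.

```ruby
# Hypothetical sample text, used only for illustration.
text = "cost: $42, qty: 7"

# Positive look-behind: digits that are immediately preceded by "$".
price = text[/(?<=\$)\d+/]

# Negative look-behind: digits NOT preceded by "$" or by another digit.
qty = text[/(?<![\$\d])\d+/]

puts price
puts qty
```

The look-behind part is not included in the match itself, so `price` holds only the digits and not the "$".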
 

Austin Ziegler

Ah, yes. Thanks. I should have Googled first. :)

But reading through that and the documentation on the same site, I am
still looking for a rationale document. Why Oniguruma and not, say,
PCRE? Why a new regexp parser?

1. Licensing. PCRE's licensing has been somewhat fluid. The current
release seems OK.
2. Control. In many ways, such a core feature to Ruby should be native to Ruby.
3. Native concepts. Ruby REs are a bit different because they end up
being objects.

-austin
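
As a rough illustration of point 3, the sketch below shows a regexp and its match result as ordinary Ruby objects. It uses only the standard `Regexp` and `MatchData` classes; the pattern and input string are invented for the example.

```ruby
# A regexp is a first-class object; this is equivalent to /(\d+)-(\d+)/.
re = Regexp.new("(\\d+)-(\\d+)")

puts re.class    # the literal and Regexp.new both give a Regexp
puts re.source   # the pattern text can be read back from the object

# Matching returns another object, a MatchData, rather than a bare string.
m = re.match("range 10-20")
first = m[1]
last  = m[2]
before = m.pre_match

puts m.class
puts "#{first}..#{last} (text before match: #{before.inspect})"
```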
 

B. K. Oxley (binkley)

Austin said:
1. Licensing. PCRE's licensing has been somewhat fluid. The current
release seems OK.
2. Control. In many ways, such a core feature to Ruby should be native to Ruby.
3. Native concepts. Ruby REs are a bit different because they end up
being objects.

Hrm.

In all honesty, these objections seem weak to me.

If the licensing is not a problem right now, why would it necessarily
become one in the future? (Although I don't know the history of
licensing in PCRE, so perhaps it has a record of arbitrariness.)

Control is not so important when you have the source code. And Ruby can
contribute to the development of PCRE.

I'm unsure what you mean in point three. I presume that a Ruby regexp
implementation would use PCRE for implementation, wrapping any details
so that the implementation is not visible, and only objects remain.

Not to be so nitpicky, I only used PCRE as an example. I have an
inherent dislike of wheel-reinvention (my natural laziness at play), so
my ears perk up when I see something like a rewrite of regexp parsers
when so many fine ones are already around.


Cheers,
--binkley
 

Yukihiro Matsumoto

Hi,

In message "Re: look-behind regexp ?"

|But reading through that and the documentation on the same site, I am
|still looking for a rationale document. Why Oniguruma and not, say,
|PCRE? Why a new regexp parser?

PCRE only supports UTF-8 (as far as I know), not multiple
encodings like Ruby does. Oniguruma supports UTF-8, UTF-16,
ISO-8859-*, EUC-JP, Shift_JIS, and a lot more.

matz.
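
A small sketch of what multi-encoding matching looks like from the Ruby side, assuming Ruby 1.9 or later, where strings and regexps carry an encoding. The sample text and variable names are made up; this only shows the behaviour at the Ruby level, not how the engine implements it.

```ruby
# Hypothetical sample: three Japanese characters plus ASCII, first as UTF-8.
utf8 = "\u65E5\u672C\u8A9E text"

# The same text transcoded to Shift_JIS (two bytes per Japanese character).
sjis = utf8.encode("Shift_JIS")

# A regexp built from a Shift_JIS string is itself a Shift_JIS regexp.
re_sjis = Regexp.new("\u672C".encode("Shift_JIS"))

# The match position is counted in characters, not bytes.
idx = sjis =~ re_sjis

# An ASCII-only pattern can be applied to the Shift_JIS string directly.
word = sjis[/[a-z]+/]

puts re_sjis.encoding
puts idx
puts word
```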
 

B. K. Oxley (binkley)

Yukihiro said:
PCRE only supports UTF-8 (as far as I know), not multiple
encodings like Ruby does. Oniguruma supports UTF-8, UTF-16,
ISO-8859-*, EUC-JP, Shift_JIS, and a lot more.

Ah. I inferred as much from the prominence given the list of encodings,
but wanted to find out more.


Thanks,
--binkley
 

Yukihiro Matsumoto

In message "Re: look-behind regexp ?"

|Ah. I inferred as much from the prominence given the list of encodings,
|but wanted to find out more.

Here's the list of encodings supported by default:

ASCII BIG5 EUC-KR EUC-JP EUC-TW
ISO8859-1 ISO8859-2 ISO8859-3
ISO8859-4 ISO8859-5 ISO8859-6
ISO8859-7 ISO8859-8 ISO8859-9
ISO8859-10 ISO8859-11 ISO8859-13
ISO8859-14 ISO8859-15 ISO8859-16
KOI8 KOI8-R Shift_JIS UTF-8
UTF-16BE UTF-16LE UTF-32BE UTF-32LE

And more importantly, its encoding support is pluggable: you can add
new encoding support by writing callback routines.

matz.
 
