RXParse module (by robic0), Version 0.1000


John Bokma

robic0 wrote:

: I actually like all the regulars here.

Must be a strange feeling of being liked this way.

: [...] I stay away from it with alcohol and drugs because
: it wrecks my body. It forces my body to [...] hours of creativity
: that it, my body, just can't keep up with. It's like the universe
: flashes on me in an instant.
: I can't control it. It's very weird.

Take up long-distance running.

Long distance running is not going to help with such things if the troll
is serious about it.
It may help you. It will also keep you
away from a computer keyboard long enough to allow you to rethink
your stuff before mindlessly keying it into heaps of 1s and 0s.

You mean like you just did? Wow.
 

Matt Garrish

Uri Guttman said:
normally i avoid this troll and of course posting any code reviews of
his crappy modules. but i am still drawn to skim its code for laughs and
for future examples of how not to code in perl. and i came across this
nugget:


r> while ($$ref_parse_ln =~ /$RxParse/g)
r> {
r> ## handle contents
r> if (defined $14) {
r> $content .= $14;

and MUCH MUCH later:

r> $RxParse =
r>
qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:META(.*?))|(?:($Name)((?:\s+$Name\s*=\s*["'][^<]*['"])+)\s*(\/*))|(?:\?(.*?)\?)|(?:!(?:(?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?[^-])--)|(?:ATTLIST(.*?))|(?:ENTITY(.*?)))))>)|(.+?)/s;

yes, that is one long line of regex with at least 14 grabs. read it and
weep. you may want to gouge out your eyes and i sympathize with you.

the regex is assigned way away from where it is used (and it's a
horrible regex to boot. does this moron really think he can parse SGML
type stuff with a regex?). but the use of $14 is one of the worst pieces
of perl code i have ever seen. and there is an amazing amount of bad
perl out there (easy to find on the web and in too much of cpan). but i
have never seen $14 used before. that takes a really microencephalic
brain to use a numbered grab that large, with such an ugly regex and
being so far away from the regex. but we know this troll well enough to
know it can code this poorly and now we have proof.

We had this out with him just before you came back. A few others and I
tried to explain to him that his xml regex parsing thing would never fly,
and he actually acknowledged as much at one point after falling hopelessly
on his face. It seems he's gone back to the
this-is-my-parser-that-does-nothing-useful-anyone-would-want-and-doesn't-follow-xml-rules-and-I-don't-care
attitude:

http://tinyurl.com/p84zb

Matt
 

robic0

I got some new ideas, gonna post them on the outer level. The core is filled with foul
language.

Just to let you know, that thread link you posted was indeed code by me. It was a specific
rolling substitution parser that created a hash of the elements. It was not serial.

The core regexps may look similar, but this is an entirely different thing. This is a
rolling, serialized, pre-compiled regexp with no substitution. This is a real XML parser.
That's all it does. Uri fails to understand regex in his statement. This parser actually
parses a complex 380k xhtml file in a quarter of a second (on my machine). That's all it does for now.
As to the 14 captures Uri was talking about, what he fails to realize is that only one capture
succeeds on each successful match; otherwise the performance wouldn't be realized.

You can try to make me out to be an idiot. The silence now results from the real close look everybody has had
by now. Remember, you can't take back what you post here. I'm used to insults, but not the personal ones.
I posted this with caveats. None of those mentioned a conceptual error. The regexp logic was formulated
using the regexp pieces of the W3C XML 1.1 standard. By no means do they tell you how to write the code posted
here.

I welcome your feedback always
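
Editorial sketch for readers following the $14 argument. This is not the RXParse regex. It is a toy tokenizer with hypothetical names (`$token_rx`, `tokenize`) that assumes Perl 5.10 or later for named captures. It illustrates the two points under dispute: in a top-level alternation only the branch that matched sets its capture, and naming the captures removes the need to count parentheses up to `$14`.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10+

# Toy tokenizer (NOT the RXParse regex): one alternation, one named
# capture per branch, so at most one name is set on any given match.
my $token_rx = qr{
      <!--  (?<comment> .*? ) -->             # comment body
    | </    (?<close>   [\w:.-]+ ) \s* >      # closing tag name
    | <     (?<open>    [\w:.-]+ ) [^<>]* >   # opening tag name
    |       (?<text>    [^<]+ )               # character data
}xs;

sub tokenize {
    my ($input) = @_;
    my @tokens;
    while ($input =~ /$token_rx/g) {
        # keys %+ lists only the named groups that captured in this
        # match, which here is exactly one.
        my ($kind) = keys %+;
        push @tokens, [ $kind, $+{$kind} ];
    }
    return @tokens;
}

my @tokens = tokenize('<a>hi<!--note--></a>');
print join(' ', map { "$_->[0]=$_->[1]" } @tokens), "\n";
# prints: open=a text=hi comment=note close=a
```

The handler code then dispatches on `$kind` rather than testing `defined $14`, so adding or reordering a branch does not renumber anything.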
 

robic0

Version 0.1000

I forgot to mention (maybe I did; too lazy to look) that 'namespace' is not
parsed out yet.

Since the main regexp is the performance key to this, every single addition/subtraction
was benched and tuned multiple times.

If I'm going to add really nice stuff (see below), I want there to be a minimal effect on
performance. If it creates a big hit, I'd rather create a parallel regexp within the module.
But doing that increases code size; the other way increases complexity.

I have no idea if this benches competitively with a C DLL or not. I assume it does, based on my experience.
There is nothing in the main compiled regexp that is *not* also necessary in a C alternative, in terms of
conditionals (alternations and factoring). In fact it can't be any other way.

I was never one to sit down and write finished code from start to end. Now that it's in this unfinished state,
one of the reasons I stopped working on it is the fork in the road that exists now. This is the
point where it can diverge into multiple modules. I don't like that approach. I want to dwell on it a while
and see what's possible.

One thing that is definitely possible is the ability to rewrite the xml as it is being parsed:
starting a parallel output file with additions, or without some of the existing content.
This includes all elements and content. I've seen a desire for this. I've also seen a desire
for filtering mentioned, by namespace for example. This is possible. Of course, if the main regexp is
even mildly altered it will affect performance. However, there is no penalty for having multiple main regexps
without altering the 'if' handlers. The 'if' handlers and sub-regexps are more relaxed about this restriction.

Its up to you. What you think VanderDick?
 

robic0


Well, with minimal alterations, certainly padding captures to an envelope is possible. The overlap
in code in the switch may be beneficial. Rewriting the xml with real-time editing is always beneficial
and absolutely possible.
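
Editorial sketch of the "rewrite while parsing" idea described above. It does not use RXParse. The names (`$token_rx`, `rewrite`, the `$edit` callback) are hypothetical, the three-branch tokenizer is a deliberately crude assumption rather than a conforming XML parser, and output goes to a string instead of a parallel file to keep the example short. Each token is passed to a callback that returns the text to emit, so filtering and editing happen in the same pass as tokenizing.

```perl
use strict;
use warnings;
use 5.010;    # named captures, %+ and // need Perl 5.10+

# Crude token regex (an assumption for illustration, not real XML rules).
my $token_rx = qr{
      (?<comment> <!-- .*? --> )
    | (?<tag>     < /? [\w:.-]+ [^<>]* > )
    | (?<text>    [^<]+ )
}xs;

# Copy $input to the result token by token. $edit->($kind, $raw) returns
# the text to emit: unchanged, modified, or '' to drop the token.
sub rewrite {
    my ($input, $edit) = @_;
    my $out = '';
    while ($input =~ /\G$token_rx/gc) {
        my ($kind) = keys %+;        # exactly one branch matched
        my $raw    = $+{$kind};      # copy before the callback runs
        $out .= $edit->($kind, $raw);
    }
    # Anything the toy regex could not tokenize is passed through as-is.
    $out .= substr($input, pos($input) // 0);
    return $out;
}

my $result = rewrite(
    '<p>keep<!--drop me--></p>',
    sub {
        my ($kind, $raw) = @_;
        return ''      if $kind eq 'comment';   # filter comments out
        return uc $raw if $kind eq 'text';      # edit content on the fly
        return $raw;                            # tags pass through
    },
);
print "$result\n";    # prints: <p>KEEP</p>
```

Because the edits live in the callback, the token regex itself stays untouched, which is the property the post cares about for performance.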
 

cmic

Hello comp.lang.perl.misc & al.

Frankly, robic0 makes me laugh every day.
I can't believe this guy is serious. It's too much.
Frankly, I love this group:
1- for *serious* Perl learning (thx Uri, Gunnar, Tad, Ila, Xicheng,
Peter J., Anno, etc.)
2- for the fun tempest (call it trolling if you like) robic0 managed to
launch
3- and for American idioms and insults...

fr.comp.lang.perl is too ... academic in this regard.
Regards.
 

Michele Dondi

i made a mistake. it uses at least up to $17 which is worse than $14. i
apologize for that error in my review.

Nitpick: if it's at least up to $17, then it's at least up to $14.
It's not a mistake nor an error: it was an estimate; then you got a
better estimate, period.


Michele
 

robic0

Well you surely are a French idiot cmic, I'm surely glad we we Americans
died in the hundreds of thousands to save your fuckin country in ww2.
Looking back on it now, it was a payback for the French help during the
American Revolution. As far as I can see, the debt is paid. We're even.
So ahhh, peeessseee of u faggat twat !!!!!!!!!
 
