Oh great gurus of the list, I need help with a regular expression please

cate

I have a perl file that will read an XML file and search for the tag
<pop> (and read the entire tag) with this expression:
$/='<'.$ARGV[1].'>';

where pop has been passed in as the argument. I now have to do the
same with an XML file that arrives in this format:
<pop loc="AA11"> or <pop loc="A24"> or <pop loc="AA1">. I have tried a
number of different combinations but I just can't seem to get this
right. After this one I gave up:
$/='<'.$ARGV[1].' loc="\w+">';

If anyone can assist me on this, I'd be very grateful.

Thanks,

Cate
 
Paul Lalli

Subject: [...] I need help with a regular expression please
I have a perl file that will read an XML file and search

STOP. Regular expressions are not the correct tool for this job.
Regular expressions cannot parse XML. Any solution you come up with
will be buggy and will fail in fantastic and mysterious ways when
presented with perfectly valid XML that you were not counting on.

Go to http://search.cpan.org. Search for 'XML'. There are about a
hundred different modules that you should be using. I would recommend
XML::Simple for an introductory module, until/unless you find that
it's too "simple" for your needs.
right. After this one I gave up:
$/='<'.$ARGV[1].' loc="\w+">';

$/ is a string. You cannot put a regular expression into it. (Though
see File::Stream for a way around that restriction)

Paul Lalli
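
A minimal sketch of the point about $/ being a plain string, for anyone who wants to see it in action: set $/ to a fixed string (here the closing tag) and apply the regex to each record after it has been read. The sample data, the variable names, and the choice of the closing tag as separator are all assumptions made for illustration. This is still pattern matching rather than XML parsing, so the warning above about valid-but-unexpected XML applies in full.

```perl
use strict;
use warnings;

# Hypothetical sample data; a real script would open the XML file instead.
my $xml = qq{<root><pop loc="AA11">one</pop>\n}
        . qq{<pop loc="A24">two</pop><pop loc="AA1">three</pop></root>};
my $tag = 'pop';    # stands in for $ARGV[1]

# Read from an in-memory file so the example is self-contained.
open my $in, '<', \$xml or die "Cannot open in-memory file: $!";

my @records;
{
    # $/ is compared literally, so give it a fixed string: the closing tag.
    local $/ = "</$tag>";
    while (my $chunk = <$in>) {
        # The regex is applied to the record after reading, not to $/.
        # \Q...\E quotes the tag name; the loc attribute is optional.
        if ($chunk =~ /(<\Q$tag\E(?:\s+loc="(\w+)")?\s*>.*)\z/s) {
            push @records, { loc => $2, text => $1 };
        }
    }
}
close $in;

for my $rec (@records) {
    my $loc = defined $rec->{loc} ? $rec->{loc} : '(none)';
    print "$loc => $rec->{text}\n";
}
```

Attributes other than loc, single-quoted values, nested <pop> elements, comments, and CDATA sections would all defeat this pattern, which is why a CPAN XML module remains the safer route.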
 
alorinna

Thank you. This program has been working great for all of my normal
XML files (we break them down from 10-140mb original files into
manageable sizes using Perl) but this has simply thrown me for a loop.
I'll check out your suggestions, and really appreciate your advice.

Cate
 
Paul Lalli

Thank you. This program has been working great for all
of my normal XML files (we break them down from 10-140mb
original files into manageable sizes using Perl) but this
has simply thrown me for a loop.

..... which is exactly what I said would happen. That's why you don't
use Regexps for parsing XML. :)
I'll check out your suggestions, and really appreciate
your advice.

Good luck.

Paul Lalli
 
Clenna Lumina

Paul said:
Subject: [...] I need help with a regular expression please
I have a perl file that will read an XML file and search

STOP. Regular expressions are not the correct tool for this job.
Regular expressions cannot parse XML. Any solution you come up with
will be buggy and will fail in fantastic and mysterious ways when
presented with perfectly valid XML that you were not counting on.

Go to http://search.cpan.org. Search for 'XML'. There are about a
hundred different modules that you should be using. I would recommend
XML::Simple for an introductory module, until/unless you find that
it's too "simple" for your needs.
right. After this one I gave up:
$/='<'.$ARGV[1].' loc="\w+">';

$/ is a string. You cannot put a regular expression into it. (Though
see File::Stream for a way around that restriction)

Being able to use a regex as $/ would be rather nifty, though, if you
think about it. True, it would mimic split() in some ways, but the big
difference is that every 'line' read would be delimited by the regex
rather than "\n". As it stands, you have to read in either big chunks or
the whole file and _then_ apply the split, because a split applied to
normal ("\n"-delimited) lines cannot catch delimiters that span
multiple lines. At least, nowhere near as elegantly and cleanly as
this potentially would.

Bottom line: applying regex as an $INPUT_RECORD_SEPARATOR would be
infinitely useful, and to me seems right up Perl's alley.

An idea for Perl 6 perhaps?
 
Brad Baxter

Paul said:
right. After this one I gave up:
$/='<'.$ARGV[1].' loc="\w+">';
$/ is a string. You cannot put a regular expression into it. (Though
see File::Stream for a way around that restriction)

Being able to use a regex as $/ would be rather nifty, though, if you
think about it. True, it would mimic split() in some ways, but the big
difference is that every 'line' read would be delimited by the regex
rather than "\n". As it stands, you have to read in either big chunks or
the whole file and _then_ apply the split, because a split applied to
normal ("\n"-delimited) lines cannot catch delimiters that span
multiple lines. At least, nowhere near as elegantly and cleanly as
this potentially would.

Bottom line: applying regex as an $INPUT_RECORD_SEPARATOR would be
infinitely useful, and to me seems right up Perl's alley.

An idea for Perl 6 perhaps?


Remember: the value of $/ is a string, not a regex.
awk has to be better for something. :)
-- perlvar

Apparently, it was a (rejected) idea for Perl 5 way back when ...
 
Clenna Lumina

Brad said:
Paul said:
right. After this one I gave up:
$/='<'.$ARGV[1].' loc="\w+">';
$/ is a string. You cannot put a regular expression into it.
(Though see File::Stream for a way around that restriction)

Being able to use a regex as $/ would be rather nifty, though, if you
think about it. True, it would mimic split() in some ways, but the big
difference is that every 'line' read would be delimited by the regex
rather than "\n". As it stands, you have to read in either big chunks or
the whole file and _then_ apply the split, because a split applied to
normal ("\n"-delimited) lines cannot catch delimiters that span
multiple lines. At least, nowhere near as elegantly and cleanly as
this potentially would.

Bottom line: applying regex as an $INPUT_RECORD_SEPARATOR would be
infinitely useful, and to me seems right up Perl's alley.

An idea for Perl 6 perhaps?


Remember: the value of $/ is a string, not a regex.
awk has to be better for something. :)
-- perlvar

Apparently, it was a (rejected) idea for Perl 5 way back when ...

Was there a reason it was rejected? I think the usefulness it would
represent would of made it a shoe in? I would imagine it would not be
difficult to implement that, given the existing regex framework. I would
assume:

$/ = qr/\d+/; [1]
while (<INFILE>) { ...

would differ little from:

while ($string =~ /\d+/g) { ...

only instead of keeping track of the position in a string you're keeping
track of the position in a file. I can understand the potential for
extraneous reads (especially in huge files), but even that could be
taken care of with some sort of buffering system.

Well, it is just an idea and if it can be pulled off in an optimized
manner (where reads don't suffer (too much)) it could prove to be an
invaluable feature.

Thoughts?



[1]
I would assume in such a scenario, if $/ were given a normal string it
would revert to the normal (current) behavior.
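
The buffering idea above can be sketched in plain Perl without touching $/ at all. What follows is only an illustration under stated assumptions: regex_record_reader is a hypothetical helper (not a Perl built-in and not the File::Stream interface), and the chunk size and sample data are arbitrary. It reads fixed-size chunks into a buffer and hands back one record per call, each ending with whatever the regex matched, which mirrors how $/ leaves the separator on the record.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator that yields records from $fh,
# each terminated by a match of the regex $sep, reading $chunk_size bytes
# at a time. Zero-length matches are ignored.
sub regex_record_reader {
    my ($fh, $sep, $chunk_size) = @_;
    $chunk_size ||= 4096;
    my $buffer = '';
    my $eof    = 0;

    return sub {
        while (1) {
            # A match that touches the end of the buffer might grow once
            # more data arrives, so only trust it when it ends earlier
            # (or when there is nothing left to read).
            if ($buffer =~ /$sep/
                && $+[0] > 0
                && ($eof || $+[0] < length $buffer)) {
                my $end = $+[0];
                return substr($buffer, 0, $end, '');   # cut record off buffer
            }
            if ($eof) {
                return undef unless length $buffer;
                my $rest = $buffer;                    # final, unterminated record
                $buffer = '';
                return $rest;
            }
            my $n = read($fh, $buffer, $chunk_size, length $buffer);
            die "read failed: $!" unless defined $n;
            $eof = 1 if $n == 0;
        }
    };
}

# Usage: split an in-memory "file" on runs of digits, 4 bytes per read.
my $data = "alpha12beta345gamma";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $next = regex_record_reader($fh, qr/\d+/, 4);

my @records;
while (defined(my $rec = $next->())) {
    push @records, $rec;
}
close $fh;
print join('|', @records), "\n";   # prints: alpha12|beta345|gamma
```

The sketch sidesteps the hard cases: a pattern with lookahead or alternation can match differently on a partial buffer than it would on the full input, and a separator that never appears makes the buffer grow to the size of the file. Those are presumably the sort of costs a built-in version would have to deal with.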
 
Clenna Lumina

Tad said:
^^^^^^^^
^^^^^^^^ heh. So it's a case of mistaken identity huh?

Oh, so now I'm the only one on the planet who doesn't know the correct
spelling of "shoo-in", heh.. brilliant detective work once again...
(note the sarcasm.)

And you've called me a troll in another thread? You're the one who KEEPS
"flame-baiting" and quite frankly this IS a form of trolling. So who is
the real troll here? The one being accused of it without merit, or the
person acting like one?

All I see is you following my every step as I walk down the sidewalk to
see if I stumble, ready to shout "ah ha!" at the slightest misstep,
and it makes you look all the more foolish.
 
Paul Lalli

Paul Lalli wrote:

Although being able to use $/ as a regex would be rather nifty,
if you think about it.

As I said, see File::Stream, which allows you to do just that.

Paul Lalli
 
Art VanDelay

^^^^^^^^
^^^^^^^^ heh. So it's a case of mistaken identity huh?


I agree, "would have" would have been more grammatically correct, but
last time I checked, no one's perfect. Oh wait, except you, Mr.
Infallible? Right?

In the end, you're just trolling and, from the looks of it, trying to
start a crap-filled offshoot like the one you started a few threads down.



So what?


http://www.google.com/search?as_q="shoe+in"&hl=en&num=100
http://groups.google.com/groups?as_q="shoe+in"&hl=en&num=100

1,010,000 occurrences, 54,100 in news groups alone. Hardly a mistake of
a single individual.

Half the world knows "shoe in" incorrectly, assuming it is incorrect.
Google doesn't even attempt to offer a correction.

http://www.google.com/search?as_q="shoo+in"&hl=en&num=100
http://groups.google.com/groups?as_q="shoo+in"&hl=en&num=100

354,000 from the web-side, 15,500 from news groups. What does this say?
That either
- a) "shoe in" is correct, or
- b) "shoe in" is far more common despite being incorrect.

Either way, it should not be surprising to see a mistake like that.
Making a huge deal about it makes you appear small and insecure, like a
schoolyard bully who has nothing better to do.



So I ask again: So what?



--
Art

P.S. I know I don't post here regularly, and there are some here who
might say I have no right to comment (I do read this group regularly),
but there are times when something is so ridiculously stupid, I feel
something just has to be said.
 
Mumia W.

Tad said:
^^^^^^^^
^^^^^^^^ heh. So it's a case of mistaken identity huh?

Oh, so now I'm the only one on the planet who doesn't know the correct
spelling of "shoo-in", heh.. brilliant detective work once again...
(note the sarcasm.)

And you've called me a troll in another thread? You're the one who KEEPS
"flame-baiting" and quite frankly [...]

Now I'm beginning to think you're a troll. Clenna, I advise you to just
drop it.

(My spidey sense tells me that you're a morph of PG, but my killfile
won't care.)
 
Tad McClellan

Mumia W. said:
Now I'm beginning to think you're a troll. Clenna,


Just now beginning?

You're way behind everybody who was here in 2002 when it was
here last...

I advise you to just
drop it.


That won't happen as long as people keep following up to its posts.

(My spidey sense tells me that you're a morph of PG,


Gurl is most definitely not the Jsut Troll, she is a more
"honorable" person, if that can be said of any troll, than
Jsut is.

She only displays 2 of the 6 Jsut telltales, while we have now
seen all 6 of them with Clenna Lumina and friends.

but my killfile
won't care.)


If you are willing to go through the captcha, you can get a couple
dozen of the addresses it has used in the past:

http://groups.google.com/group/comp.lang.perl.misc/msg/52eedd87f97ff558
 
Art VanDelay

I must be getting a different news feed than you, because Tad is the one who
keeps bringing it up. Yet it's Clenna who needs to drop it? What the hell is
this?

That won't happen as long as people keep following up to its posts.


Do you even hear yourself? YOU, Tad, are the one who keeps coming back with
this crap. Why don't YOU drop it and admit you might be wrong, as you have yet
to provide any real evidence Clenna is this troll you speak of. Since Clenna
recently entered this group, I've seen him or her do nothing wrong,
so what is your real agenda here, hmmm?



Care to share with the class on that? What has Clenna done wrong since he or
she has started posting here as of late?

Gurl is most definitely not the Jsut Troll, she is a more
"honorable" person, if that can be said of any troll, than
Jsut is.


Again, what has Clenna done since she started to warrant this? How do you
KNOW this is the same person? You all keep saying this as if there is no
doubt, yet continue to fail to provide any evidence.

She only displays 2 of the 6 Jsut telltales, while we have now
seen all 6 of them with Clenna Lumina and friends.


You have failed to provide any proof Clenna is actually one of these
people. I did not intend to become his or her lawyer, but when I see someone
getting stoned in the courtyard for just passing through, I think it's
worth it to speak up because obviously no one else will. Either they are too
afraid to get digitally stoned as well or don't care one way or another.

What you all seem to be doing just seems wrong, and none of you have even
provided any justification, nor any evidence. You tell Clenna to drop it yet
it's perfectly ok for you to keep bringing it up. Great sense of justice we
have here ...
 
Clenna Lumina

Mumia W. said:
[...]
And you've called me a troll in another thread? You're the one who KEEPS
"flame-baiting" and quite frankly [...]

Now I'm beginning to think you're a troll. Clenna, I advise you to just
drop it.

Excuse me? Care to explain your logic here? Tad started both sub-threads so
far, for no foreseeable reason, yet I'm the one who is being told to drop
it. Why don't you ask Tad to stop this, as he is the one who started it
both times?
(My spidey sense tells me that you're a morph of PG, but my killfile won't
care.)

Can't speak to that, as I am not familiar with the complete history of this
group. I've only read it from time to time along with many other groups.
There is only one group I've really been involved in and it relates to DNS.

I don't care if you don't understand that you're wrong, or even have the
decency to admit you *could* be wrong or so much as back up any of your
claims. I've wasted enough time trying to defend myself from people who make
it their mission to make others feel miserable and unwelcome.

I do want to pose one question, however. I am familiar with one piece of
history here, from a thread I once read in the Google Groups archives some
time ago, regarding one kira/Purl Gurl/Godzilla. In that thread, as I
recall, this person referred to the greater whole of this group who was
after her as "Frank". Now tell me, Tad, are you not doing the same thing,
lumping together everyone who comes along and fits your criteria (i.e., making
typos like "jsut"?), as this infamous "troll" did? Do you really want to say that
your stalking tactics are not the least bit trollish, by any definition most
newsgroup denizens might give?
 
