how to find such strings?

mozilla.bugzilla

hi, greeting,

I am a newer for Perl, here is my question.

This is the text I got from the server,

<form name="ecomm_frm" method="post"
action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
<input type="hidden" name="TARGET" value="Button" />
<input type="hidden" name="ARGUMENT" value="" />
<input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
/>


How can I extract the value ("wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs") for STATE
from this text ? The length of string for "value" is not a constant.
can you guys help me to figure this out? Thanks


bugzilla.
 
Gunnar Hjalmarsson

I am a newer for Perl,

It serves no good purpose to make that statement every time you post.

This is the text I got from the server,

<form name="ecomm_frm" method="post"
action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
<input type="hidden" name="TARGET" value="Button" />
<input type="hidden" name="ARGUMENT" value="" />
<input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
/>

How can I extract the value ("wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs") for STATE
from this text ? The length of string for "value" is not a constant.

There are at least three approaches:

1) Use the substr() and index() functions.

perldoc -f substr
perldoc -f index

The length of the value string doesn't need to be constant for that:

my $ident = 'name="STATE" value="';
my $pos1 = index($text, $ident) + length $ident;
my $pos2 = index $text, '"', $pos1;
print substr($text, $pos1, $pos2-$pos1), "\n";

2) Capture it with a regex in the m// operator.

perldoc perlop (where the m// operator is described)

perldoc perlrequick
perldoc perlretut
perldoc perlre

Chris gave you an example of that.

3) Use a module for parsing HTML

http://search.cpan.org/search?query=HTML+parse

Even if the third approach gives you the most robust code, there is
always a risk that your solution fails if the structure of the document
changes.
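
As a rough sketch of approach 1 with that risk in mind: index() returns -1 when the search string is not found, so checking for it lets the code report a changed document instead of printing garbage. The helper name extract_value is made up for this example, and it assumes the name attribute is immediately followed by a double-quoted value attribute, as in the posted text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the value of the
# hidden field called $name, or undef if the expected text is missing.
# Assumes the tag contains exactly: name="..." value="..."
sub extract_value {
    my ($text, $name) = @_;
    my $ident = qq{name="$name" value="};

    my $start = index $text, $ident;
    return if $start < 0;                # marker not found
    $start += length $ident;

    my $end = index $text, '"', $start;
    return if $end < 0;                  # closing quote not found

    return substr $text, $start, $end - $start;
}

my $text = <<'HTML';
<input type="hidden" name="ARGUMENT" value="" />
<input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
/>
HTML

my $state = extract_value($text, 'STATE');
print defined $state ? "$state\n" : "STATE not found\n";
```

If the attributes are ever reordered or the quoting changes, this returns undef rather than a wrong substring, which is about as much as the substr()/index() approach can promise.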
 
A. Sinan Unur

(e-mail address removed) wrote:
I am a newer for Perl,

I guess the correct English would be "I am new to Perl". Please note
that I am a non-native speaker as well. I think correcting persistent
errors in language usage is very important in the learning process.

That said, no one here is interested in whether you are just picking up
Perl, or have written many books on the topic. We are interested in
seeing well thought-out questions, and enjoy answering such questions.
As the posting guidelines also suggest, mentioning experience level in
posts and using nonsensical subject lines do bias some of us (myself
included) toward not answering such posts.

Not to mention that your chosen ID resembles a certain person whose name
I shall not speak :)

This is the text I got from the server,

That looks like HTML to me.

<form name="ecomm_frm" method="post"
action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
<input type="hidden" name="TARGET" value="Button" />
<input type="hidden" name="ARGUMENT" value="" />
<input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
/>

How can I extract the value ("wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs") for
STATE from this text ?

I would suggest using an HTML parser. There are quite a few such modules
on CPAN.

Note that your chances of getting a useful response increase
exponentially if you post a reasonable amount of code showing your
attempt to first tackle the problem yourself.

Sinan
 
A. Sinan Unur

(e-mail address removed) wrote:
If all your text is in $text, then this should do it..

if ( $text =~ m!<input type="hidden" name="STATE" value="(.*?)"/>!s )
{
    print "$1\n";
}

You should use an HTML parser to parse HTML. The regex above does not
match the text you posted, as this test shows:

#!/usr/bin/perl

use strict;
use warnings;

my $form = do { local $/; <DATA> };

if ( $form =~ m!<input type="hidden" name="STATE" value="(.*?)"/>!s ) {
    print "$1\n";
}

__END__
<form name="ecomm_frm" method="post"
action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
<input type="hidden" name="TARGET" value="Button" />
<input type="hidden" name="ARGUMENT" value="" />
<input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
/>

D:\Home> ttt

D:\Home>

One can, instead, use a proper HTML parser to parse HTML:

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TokeParser::Simple;

my $form = do { local $/; <DATA> };

my $p = HTML::TokeParser::Simple->new(\$form);

while ( my $t = $p->get_token ) {
    if ( $t->is_start_tag('input')
         and 'STATE' eq $t->get_attr('name') ) {
        print $t->get_attr('value')."\n";
    }
}

__END__
<form name="ecomm_frm" method="post"
action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
<input type="hidden" name="TARGET" value="Button" />
<input type="hidden" name="ARGUMENT" value="" />
<input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
/>

D:\Home> ttt
wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs

Incidentally, your signature delimiter is incorrect. It should be two
dashes followed by a space on a line by itself.
 
Brian Wakem

Chris said:
If all your text is in $text, then this should do it..

if ( $text =~ m!<input type="hidden" name="STATE" value="(.*?)"/>!s ) {
    print "$1\n";
}


That regex won't match, as I believe there will be a space before the /.

I would use:

if ( $text =~ m!<input type="hidden" name="STATE" value="([^"]+)"!s ) {
    print "$1\n";
}

as there may or may not be a space, and the / is not guaranteed to be there
either. Of course, an HTML parsing module would avoid all of those issues.
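
For what it's worth, here is a slightly more forgiving sketch of the same regex idea. It is not a substitute for a parser: it assumes double-quoted attributes and that name comes before value inside the tag, and the helper name state_value is made up for this example. It uses [^"]* rather than [^"]+ so that an empty value is still returned.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the STATE value,
# or undef if no matching <input> tag is found.
# [^>]* keeps the match inside a single tag, so other attributes and
# whitespace (including a newline before "/>") do not matter.
sub state_value {
    my ($html) = @_;
    return $html =~ m{<input\b[^>]*\bname\s*=\s*"STATE"[^>]*\bvalue\s*=\s*"([^"]*)"}
        ? $1
        : undef;
}

my $text = <<'HTML';
<input type="hidden" name="TARGET" value="Button" />
<input type="hidden" name="ARGUMENT" value="" />
<input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
/>
HTML

my $state = state_value($text);
print defined $state ? "$state\n" : "STATE not found\n";
```

It still breaks on single-quoted attributes or on value appearing before name, which is the kind of thing a parsing module handles for you.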
 
