how to find such strings?

Discussion in 'Perl Misc' started by mozilla.bugzilla@gmail.com, Jul 10, 2005.

  1. Guest

    hi, greeting,

    I am a newer for Perl, here is my question.

    This is the text I got from the server,

    <form name="ecomm_frm" method="post"
    action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
    <input type="hidden" name="TARGET" value="Button" />
    <input type="hidden" name="ARGUMENT" value="" />
    <input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
    />


    How can I extract the value ("wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs") for STATE
    from this text ? The length of string for "value" is not a constant.
    can you guys help me to figure this out? Thanks


    bugzilla.
     
    , Jul 10, 2005
    #1

  2. wrote:
    > I am a newer for Perl,


    It serves no good purpose to make that statement every time you post.

    > This is the text I got from the server,
    >
    > <form name="ecomm_frm" method="post"
    > action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
    > <input type="hidden" name="TARGET" value="Button" />
    > <input type="hidden" name="ARGUMENT" value="" />
    > <input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
    > />
    >
    > How can I extract the value ("wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs") for STATE
    > from this text ? The length of string for "value" is not a constant.


    There are at least three approaches:

    1) Use the substr() and index() functions.

    perldoc -f substr
    perldoc -f index

    The length of the value string doesn't need to be constant for that:

    my $ident = 'name="STATE" value="';
    my $pos1 = index($text, $ident) + length $ident;
    my $pos2 = index $text, '"', $pos1;
    print substr($text, $pos1, $pos2-$pos1), "\n";

    2) Capture it with a regex in the m// operator.

    perldoc perlop (where the m// operator is described)

    perldoc perlrequick
    perldoc perlretut
    perldoc perlre

    Chris gave you an example of that.

    3) Use a module for parsing HTML

    http://search.cpan.org/search?query=HTML parse

    Even if the third approach gives you the most robust code, there is
    always a risk that your solution fails if the structure of the document
    changes.
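
A minimal sketch of approach 1 with the failure cases handled, since index() returns -1 when the substring is missing and the snippet above would then print garbage. The helper name extract_between is hypothetical, not something from this thread, and the sample text is just the tail of the form from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text between $prefix and the next
# $terminator, or undef when either of them cannot be found.
sub extract_between {
    my ($text, $prefix, $terminator) = @_;

    my $start = index $text, $prefix;
    return if $start < 0;                 # prefix not present
    $start += length $prefix;             # skip past the prefix itself

    my $end = index $text, $terminator, $start;
    return if $end < 0;                   # no closing delimiter

    return substr $text, $start, $end - $start;
}

# Tail of the form from the original post.
my $text = <<'HTML';
<input type="hidden" name="ARGUMENT" value="" />
<input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
/>
HTML

my $state = extract_between($text, 'name="STATE" value="', '"');
print defined $state ? "$state\n" : "STATE not found\n";
```

This still depends on the exact spelling and order of the attributes, so it is no more robust than the original snippet against changes in the markup; it only avoids silently returning the wrong substring.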

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jul 10, 2005
    #2

  3. wrote in news:1120951290.667637.235660@z14g2000cwz.googlegroups.com:

    > I am a newer for Perl,


    I guess the correct English would be "I am new to Perl". Please note
    that I am a non-native speaker as well. I think correcting persistent
    errors in language usage is very important in the learning process.

    That said, no one here is interested in whether you are just picking up
    Perl, or have written many books on the topic. We are interested in
    seeing well thought-out questions, and enjoy answering such questions.
    As the posting guidelines also suggest, mentioning your experience level
    in posts and using nonsensical subject lines do bias some of us (myself
    included) toward not answering such posts.

    Not to mention that your chosen ID resembles a certain person whose name
    I shall not speak :)

    > This is the text I got from the server,


    That looks like HTML to me.

    > <form name="ecomm_frm" method="post"
    > action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
    > <input type="hidden" name="TARGET" value="Button" />
    > <input type="hidden" name="ARGUMENT" value="" />
    > <input type="hidden" name="STATE"
    > value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
    > />
    >
    > How can I extract the value ("wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs") for
    > STATE from this text ?


    I would suggest using an HTML parser. There are quite a few such modules
    on CPAN.

    Note that your chances of getting a useful response increase
    exponentially if you post a reasonable amount of code showing your
    attempt to first tackle the problem yourself.

    Sinan

    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Jul 10, 2005
    #3
  4. Chris Lowth <> wrote in
    news:LNZze.26101$:

    > wrote:

    ....
    >> <form name="ecomm_frm" method="post"
    >> action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
    >> <input type="hidden" name="TARGET" value="Button" />
    >> <input type="hidden" name="ARGUMENT" value="" />
    >> <input type="hidden" name="STATE"
    >> value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs" />
    >>
    >>
    >> How can I extract the value ("wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs") for
    >> STATE from this text ? The length of string for "value" is not a
    >> constant. can you guys help me to figure this out? Thanks

    ....
    > If all your text is in $text, then this should do it..
    >
    > if ( $text =~ m!<input type="hidden" name="STATE" value="(.*?)"/>!s )
    > {
    > print "$1\n";
    > }


    You should use an HTML parser to parse HTML. That regex does not match
    the posted input:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $form = do { local $/; <DATA> };

    if ( $form =~ m!<input type="hidden" name="STATE" value="(.*?)"/>!s ) {
        print "$1\n";
    }

    __END__
    <form name="ecomm_frm" method="post"
    action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
    <input type="hidden" name="TARGET" value="Button" />
    <input type="hidden" name="ARGUMENT" value="" />
    <input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
    />

    D:\Home> ttt

    D:\Home>

    One can, instead, use a proper HTML parser to parse HTML:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use HTML::TokeParser::Simple;

    my $form = do { local $/; <DATA> };

    my $p = HTML::TokeParser::Simple->new(\$form);

    while ( my $t = $p->get_token ) {
        if ( $t->is_start_tag('input')
            and 'STATE' eq $t->get_attr('name') ) {
            print $t->get_attr('value')."\n";
        }
    }

    __END__
    <form name="ecomm_frm" method="post"
    action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
    <input type="hidden" name="TARGET" value="Button" />
    <input type="hidden" name="ARGUMENT" value="" />
    <input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
    />

    D:\Home> ttt
    wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs

    > --
    > http://www.lowth.com/rope - Scriptable IP packet match logic for
    > linux/iptables.


    Incidentally, your signature delimiter is incorrect. It should be two
    dashes followed by a space on a line by itself.

    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Jul 10, 2005
    #4
  5. Brian Wakem Guest

    Chris Lowth wrote:

    > wrote:
    >> hi, greeting,
    >>
    >> I am a newer for Perl, here is my question.
    >>
    >> This is the text I got from the server,
    >>
    >> <form name="ecomm_frm" method="post"
    >> action="process.aspx?c=us&amp;l=en&amp" id="ecomm_frm">
    >> <input type="hidden" name="TARGET" value="Button" />
    >> <input type="hidden" name="ARGUMENT" value="" />
    >> <input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
    >> />
    >>
    >>
    >> How can I extract the value ("wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs") for STATE
    >> from this text ? The length of string for "value" is not a constant.
    >> can you guys help me to figure this out? Thanks
    >>
    >>
    >> bugzilla.

    >
    > If all your text is in $text, then this should do it..
    >
    > if ( $text =~ m!<input type="hidden" name="STATE" value="(.*?)"/>!s ) {
    > print "$1\n";
    > }



    That regex won't match, as I believe there will be a space before the /

    I would use:

    if ( $text =~ m!<input type="hidden" name="STATE" value="([^"]+)"!s ) {
        print "$1\n";
    }

    as there may or may not be a space and the / is not guaranteed to be there
    either. Of course an HTML parsing module would avoid all of those issues.
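
The difference between the two patterns can be checked with a short script. This is only a sketch: the variable names $strict and $loose are made up for illustration, and the sample text is the tail of the form from the original post, including the line break before the closing "/>".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tail of the form from the original post; note the newline before "/>".
my $text = <<'HTML';
<input type="hidden" name="ARGUMENT" value="" />
<input type="hidden" name="STATE" value="wxMDcyMzEwNzIyO3Q8O2w8aTwwPjs"
/>
HTML

# Earlier pattern: insists on "/>" immediately after the closing quote.
my $strict = qr!<input type="hidden" name="STATE" value="(.*?)"/>!s;

# Pattern from this reply: stops at the closing quote and ignores what follows.
my $loose = qr!<input type="hidden" name="STATE" value="([^"]+)"!s;

for my $case ( [ strict => $strict ], [ loose => $loose ] ) {
    my ($label, $re) = @$case;
    if ( $text =~ $re ) {
        print "$label: $1\n";
    }
    else {
        print "$label: no match\n";
    }
}
```

Both patterns still assume single spaces between attributes and a fixed attribute order, and [^"]+ will not match an empty value, so neither is a substitute for a real parser.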


    --
    Brian Wakem
     
    Brian Wakem, Jul 10, 2005
    #5
