Using Perl to get data from website

Discussion in 'Perl Misc' started by fiazidris, Mar 7, 2008.

  1. fiazidris

    fiazidris Guest

    I previously wrote a Perl script to access data from this URL:

    http://www.bangkokflightservices.com/our_cargo_track.php

    Some sample: MAWB - Master Airwaybill Number

    724-26332482
    724-61480672
    724-61441122

    and this was the final URL:

    http://203.151.118.123:8090/showc_track.php?m_prefix=724&m_sn=26332482&h_prefix=HWB&h_sn=

    But the website has since changed, and the same script no longer
    extracts the data. One change I noticed is that the URL has changed
    to:

    <iframe src="http://203.151.118.123:8090/showc_track.php?m_prefix=724&m_sn=26332482&h_prefix=HWB&h_sn=&ecy=e076438db64c6190f7b9689a379b7f7093368f1652d14db65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc072b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b3fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af74be&ch=" frameborder="0" scrolling="yes" height="700" width="100%"> </iframe>

    How can I programmatically obtain data for a list of MAWBs?

    Here is a sample script that I wrote which previously worked:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Long 'ecy' value copied from the iframe URL above
    my $ecy = 'e076438db64c6190f7b9689a379b7f7093368f1652d14db65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc072b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b3fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af74be';

    while (<>) {
        chomp;

        # Skip lines too short to hold a MAWB such as 724-26332482
        next if length($_) < 12;

        my $mprefix = substr($_, 0, 3);
        my $msn     = substr($_, 4, 8);

        # Build the URL on one line; a newline inside it breaks the request
        my $currurl = 'http://203.151.118.123:8090/showc_track.php?m_prefix='
            . $mprefix . '&m_sn=' . $msn
            . '&h_prefix=HWB&h_sn=&ecy=' . $ecy . '&ch=';

        my $currresult = qx{curl -s '$currurl'};
        my $result     = '';

        # Collect the text of every line that carries the "style12" class
        for my $currline (split /\n/, $currresult) {
            next unless $currline =~ m#style12#i;
            $result .= " / " . $1 if $currline =~ m#.*>(.*?)<.*#i;
        }

        print "***$result\n";
    }
    fiazidris, Mar 7, 2008
    #1
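The iframe quoted in post #1 carries a long `ecy` value. The thread does not say whether that value changes between visits, so hard-coding it may or may not keep working. One alternative is to read the iframe's `src` out of the outer page each time. The sketch below shows only that extraction step; the sub name `iframe_src` and the shortened `ecy=abc123` sample are made up for illustration.

```perl
use strict;
use warnings;

# Return the src attribute of the first <iframe> in $html, or undef.
# Whitespace inside the attribute (for example from line wrapping) is removed.
sub iframe_src {
    my ($html) = @_;
    my ($src) = $html =~ m{<iframe\b[^>]*?\bsrc\s*=\s*"([^"]*)"}is
        or return;
    $src =~ s/\s+//g;
    return $src;
}

# Sample markup modelled on the iframe in post #1; the ecy value is shortened.
my $page = <<'HTML';
<iframe src="http://203.151.118.123:8090/showc_track.php?
m_prefix=724&m_sn=26332482&h_prefix=HWB&h_sn=&ecy=abc123&ch=
" frameborder="0" scrolling="yes" height="700" width="100%"> </iframe>
HTML

print iframe_src($page), "\n";
```

A regex is enough for a single known tag like this one; for anything less regular, an HTML parser would be the safer choice.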

  2. Ben Morrow

    Ben Morrow Guest

    Quoth fiazidris <>:
    > Previously, I have written a perl script to access data from this URL:
    >
    > http://www.bangkokflightservices.com/our_cargo_track.php
    >
    > Some sample: MAWB - Master Airwaybill Number
    >
    > 724-26332482
    > 724-61480672
    > 724-61441122
    >
    > and this was the final URL:
    >
    > http://203.151.118.123:8090/showc_track.php?m_prefix=724&m_sn=
    > 26332482&h_prefix=HWB&h_sn=
    >
    > But, now there is a change on the website and I couldn't extract
    > through the same script. One change I noticed is the URL has changed
    > to:
    >

    [url trimmed]
    > <iframe src="http://203.151.118.123:8090/showc_track.php?
    > m_prefix=724&m_sn=26332482&h_prefix=HWB&h_sn=&ecy=e076438db64c61..."
    > frameborder="0" scrolling="yes" height="700" width="100%"> </iframe>
    >
    > How can I programmatically obtain data for a list of MAWBs.


    Yuck, what a horrible page. <input> without <form>... I would use
    something like

    #!/usr/bin/perl

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $baseurl =
        'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
    my $hawb = 'h_prefix=HAWB&h_sn=';

    # auto_check makes Mechanize die on any failed request
    my $M = WWW::Mechanize->new(auto_check => 1);

    while (<>) {
        chomp;

        # A MAWB looks like 724-26332482: prefix, hyphen, serial
        my ($mprefix, $msn) = /(...)-(........)/ or do {
            warn "invalid MAWB: '$_'";
            next;
        };

        $M->get("$baseurl?m_prefix=$mprefix&m_sn=$msn&$hawb");
        $M->follow_link(url_regex => qr/showc_track/);
        my $content = $M->content;

        # process $content as before
    }

    You may need to adjust the follow_link call if there are several links on
    the same page that match that regex; see perldoc WWW::Mechanize for the
    arguments. If the server checks the Referer, you may also need to ->get
    /our_cargo_track.php first.

    Ben
    Ben Morrow, Mar 7, 2008
    #2
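The MAWB check in post #2 can be pulled into a small function so that bad input is rejected before any request is made. This is a minimal sketch, assuming MAWBs are always three digits, a hyphen, and eight digits, as in the three samples in post #1. The sub names `parse_mawb` and `tracking_query` are made up here, and `h_prefix=HWB` is taken from the original URL in post #1.

```perl
use strict;
use warnings;

# Split "724-26332482" into its 3-digit prefix and 8-digit serial.
# Returns an empty list when the input does not look like a MAWB.
sub parse_mawb {
    my ($line) = @_;
    return $line =~ /^\s*(\d{3})-(\d{8})\s*$/;
}

# Build the query string that the tracking page expects.
sub tracking_query {
    my ($prefix, $serial) = @_;
    return "m_prefix=$prefix&m_sn=$serial&h_prefix=HWB&h_sn=";
}

for my $line ('724-26332482', '72426332482') {
    if (my ($prefix, $serial) = parse_mawb($line)) {
        print tracking_query($prefix, $serial), "\n";
    }
    else {
        warn "invalid MAWB: '$line'\n";
    }
}
```

Anchoring the pattern with `^` and `$` means a line with stray characters is reported rather than silently matched in part.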

  3. ifiaz

    ifiaz Guest

    > You may need to adjust the follow_link call if there are several links
    > on the same page that match that regex; see perldoc WWW::Mechanize for
    > the arguments. If the server checks the Referer, you may also need to
    > ->get /our_cargo_track.php first.
    >
    > Ben

    Thank you for your prompt response.

    When I used the code with minor modifications, I still could not
    access the data: the process sends me to another page, as shown
    below.

    This is what $content contains:

    <script> window.open ('http://www.bangkokflightservices.com/our_cargo_track.php') ;
    setTimeout("window.close();", 10);
    </script>

    How do I get to the actual data page? Please guide me, as I am a
    newbie.

    I don't know how to implement the Referer and all that.


    ### This is the complete code I used.
    #!/usr/bin/perl

    use WWW::Mechanize;

    my $baseurl =
        'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
    my $hawb = 'h_prefix=HAWB&h_sn=';

    my $M = WWW::Mechanize->new(auto_check => 1);

    ## Added code for testing only
    my $F = WWW::Mechanize->new(auto_check => 1);
    $F->get("http://www.bangkokflightservices.com/our_cargo_track.php");
    my $contentF = $F->content;
    #print "$contentF\n";
    #$M->add_header(Referer => 'http://www.bangkokflightservices.com/our_cargo_track.php');

    while (<>) {
        chomp;

        my ($mprefix, $msn) = /(...)-(........)/ or do {
            warn "invalid MAWB: '$_'";
            next;
        };

        print "$mprefix $msn\n";

        $M->get("$baseurl?m_prefix=$mprefix&m_sn=$msn&$hawb");
        $M->follow_link(url_regex => qr/showc_track/);
        my $content = $M->content;

        print "$content\n"; # for debugging

        # process $content as before
        my $result = '';
        for my $currline (split /\n/, $content) {
            next unless $currline =~ m#style12#i;
            $result .= " / " . $1 if $currline =~ m#.*>(.*?)<.*#i;
        }
        print "***$result\n";
    }
    ifiaz, Mar 7, 2008
    #3
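The `$content` shown in post #3 is a JavaScript stub that opens the search page, not the tracking data. A script can at least detect that case and report it instead of printing an empty result. The sketch below assumes the stub always has the `window.open('...')` form quoted in post #3; the sub name `bounce_target` is made up for illustration.

```perl
use strict;
use warnings;

# If $content is a JavaScript stub that calls window.open('URL'),
# return that URL with any wrapped whitespace removed; otherwise undef.
sub bounce_target {
    my ($content) = @_;
    my ($url) = $content =~ m{window\.open\s*\(\s*'([^']+)'}s
        or return;
    $url =~ s/\s+//g;
    return $url;
}

# The response body quoted in post #3.
my $content = <<'HTML';
<script> window.open ('http://www.bangkokflightservices.com/
our_cargo_track.php') ;
setTimeout("window.close();", 10);
</script>
HTML

if (defined(my $target = bounce_target($content))) {
    warn "got the bounce page, which opens $target\n";
}
```

This only identifies the problem. Why the server answers with the stub is not settled in the thread.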
  4. ifiaz

    ifiaz Guest

    Also, just so you know, in

    my $baseurl =
    'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
    my $hawb = 'h_prefix=HAWB&h_sn=';

    h_prefix should be HWB and not HAWB.

    I have fixed that in my code, but I still have the same problem: it
    sends me to a different page.



    ifiaz, Mar 8, 2008
    #4
  5. fiazidris

    fiazidris Guest

    On Mar 8, 10:34 pm, ifiaz <> wrote:
    > Also, please so you know,
    >
    > my $baseurl =
    > 'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
    > my $hawb = 'h_prefix=HAWB&h_sn=';
    >
    > h_prefix should be HWB and not HAWB.
    >
    > I have fixed that in my code and still the same problem that it throws
    > me to a different page.
    >


    I have got to the point where the following URL works in a browser
    (the prefix and serial can be changed):

    http://203.151.118.123:8090/showc_t...94fd938ea4dd2b28c53fea6af74be&ch=%A0%A0%A0%A0

    but this URL doesn't return results when fetched with Perl or curl.

    Ben Morrow, please help.
    fiazidris, Mar 10, 2008
    #5
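Post #5 reports a URL that loads in a browser but not from Perl or curl, and the thread does not say why. One thing that can be ruled out cheaply is the request headers, since a browser sends a User-Agent and a Referer that a bare script does not. The sketch below uses the core HTTP::Tiny module; the sub names, the agent string, and the choice of headers are assumptions for illustration, and cookies are not handled.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Headers a browser would normally send along with the request.
sub browser_like_headers {
    my ($referer) = @_;
    return {
        'Referer' => $referer,
        'Accept'  => 'text/html',
    };
}

# Fetch $url while presenting $referer as the page we came from.
# Dies with the HTTP status if the request fails.
sub fetch_page {
    my ($url, $referer) = @_;
    my $http = HTTP::Tiny->new(
        agent   => 'Mozilla/5.0 (compatible; tracking-script)',
        timeout => 20,
    );
    my $res = $http->get($url, { headers => browser_like_headers($referer) });
    die "$res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Usage: perl fetch.pl TRACKING_URL REFERER_URL
print fetch_page(@ARGV) if @ARGV == 2;
```

If the output is still the bounce page, the headers are probably not the cause, and the next things to compare against the browser are cookies and the exact query string.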