perl curl get data from website

Discussion in 'Perl Misc' started by SVCitian, Oct 16, 2010.

  1. SVCitian

    SVCitian Guest

    These 3 URLs work in a browser and return the same results, in both
    Firefox and IE.

    But I want to retrieve this programmatically using curl or Perl,
    with the prefix and sn (serial number) changed each time. How can I
    make it work?

    Can you provide a simple curl command line, or a Perl HTTP GET, to
    demonstrate the retrieval? Thanks.


    http://www.bangkokflightservices.co...94fd938ea4dd2b28c53fea6af74be&ch=%A0%A0%A0%A0

    http://www.bangkokflightservices.co...efix=176&m_sn=75064953&h_prefix=HWB&h_sn=&ch=

    http://www.bangkokflightservices.co...m_prefix=176&m_sn=75064953&h_prefix=HWB&h_sn=
    SVCitian, Oct 16, 2010
    #1

  2. SVCitian <> wrote:
    >These 3 URLs work on a browser.. and return the same results... both
    >Firefox and IE.
    >
    >But, I want to retrieve this programmatically using curl or perl..
    >with the prefix and sn serial number changed each time... How can i
    >make it work..
    >
    >Can you provide a simple curl command line .. or perl get http.. to
    >demonstrate the retrieval.. thanks.


    See the FAQ: perldoc -q "HTML file"
    "How do I fetch an HTML file?"

    jue
    Jürgen Exner, Oct 16, 2010
    #2

  3. SVCitian

    SVCitian Guest

    On Oct 16, 9:28 pm, Jürgen Exner <> wrote:
    > SVCitian <> wrote:
    > >These 3 URLs work on a browser.. and return the same results... both
    > >Firefox and IE.

    >
    > >But, I want to retrieve this programmatically using curl or perl..
    > >with the prefix and sn serial number changed each time... How can i
    > >make it work..

    >
    > >Can you provide a simple curl command line .. or perl get http.. to
    > >demonstrate the retrieval.. thanks.

    >
    > See the FAQ: perldoc -q "HTML file"
    >         "How do I fetch an HTML file?"
    >
    > jue


    Actually, I know how to use curl from within Perl, and how to use the
    Perl HTML modules.

    But the problem is that the above URL doesn't work even in the simplest
    case of:

    curl "http://www.google.com/url?sa=D&q=http://
    www.bangkokflightservices.com/our_cargo_track%26trace.php%3Fm_prefix%3D176%26m_sn%3D75064953%26h_prefix%3DHWB%26h_sn%3D&usg=AFQjCNFh02ikp7CSs9lxi_S7ec0Edw9m5g"

    I even tried the "Tamper Data" Firefox add-on to look behind the
    scenes at the GET, POST, etc., but I can't get any further than the
    URLs given above.

    Why? It may have something to do with Ajax, cookies, the user agent, or
    whatever. I have tried some combinations, but none works.

    It works in the browser right out of the box, even when changing the
    prefix and serial numbers.

    So I want to find out what I am missing that hinders the data
    retrieval.
    SVCitian, Oct 17, 2010
    #3
  4. SVCitian <> wrote:
    >But, the problem is the above URL doesn't work even in the simplest
    >case of:
    >
    >curl "http://www.google.com/url?sa=D&q=http://
    >www.bangkokflightservices.com/our_cargo_track%26trace.php%3Fm_prefix%3D176%26m_sn%3D75064953%26h_prefix%3DHWB%26h_sn%3D&usg=AFQjCNFh02ikp7CSs9lxi_S7ec0Edw9m5g"
    >
    >I even tried to user "tamper data" firefox add to get behind the
    >scenes of GET, POST, etc... but I can't proceed any further than the
    >URLs given above.


    An HTTP request using that URL above returns

    <!-- This page yong codeing. Please don't copy idea or code before owne
    argee. if you copy this code than see it's you code and you project. you
    is fucking man -->
    <script> window.open
    ('http://www.bangkokflightservices.com/our_cargo_track.php') ;
    setTimeout("window.close();", 10);
    </script>

    Were you expecting something different?

    >It works on the browser just right out of the box.. even changing the
    >prefix and serial numbers.


    Please define "works"/"doesn't work".
    I am getting just a blank page (FireFox 3.6.8; yeah, I am going to
    update now) which is not surprising given the HTTP response above.

    To me that is "doesn't work", but of course YMMV.

    >So, i want to find out what i am missing that hinders the data
    >retrieval.


    Are you getting something different from your script?

    jue
    Jürgen Exner, Oct 17, 2010
    #4
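A short sketch of why curl and LWP see nothing useful here: neither runs JavaScript, so a body like the one quoted above is all they get. The helper below (`js_redirect_target` is a made-up name, not part of any module) checks whether a response body is such a `window.open` stub and pulls out the URL it points to. The body is pasted from the post, minus the site's HTML comment.

```perl
use strict;
use warnings;

# Response body as quoted in the post above (HTML comment omitted).
my $body = <<'HTML';
<script> window.open
('http://www.bangkokflightservices.com/our_cargo_track.php') ;
setTimeout("window.close();", 10);
</script>
HTML

# Hypothetical helper: return the URL passed to window.open(),
# or undef if the body contains no such call.
sub js_redirect_target {
    my ($html) = @_;
    return $html =~ /window\.open\s*\(\s*(['"])(.*?)\1/s ? $2 : undef;
}

my $target = js_redirect_target($body);
print defined $target
    ? "JavaScript stub, opens: $target\n"
    : "No window.open call found\n";
```

If a script gets this kind of stub back, the data has to come from some other request that the browser makes on its own.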
  5. SVCitian

    SVCitian Guest

    On Oct 17, 11:56 am, Jürgen Exner <> wrote:
    > SVCitian <> wrote:
    > >But, the problem is the above URL doesn't work even in the simplest
    > >case of:

    >
    > >curl "http://www.google.com/url?sa=D&q=http://
    > >www.bangkokflightservices.com/our_cargo_track%26trace.php%3Fm_prefix%..."

    >
    > >I even tried to user "tamper data" firefox add to get behind the
    > >scenes of GET, POST, etc... but I can't proceed any further than the
    > >URLs given above.

    >
    > An HTTP request using that URL above returns
    >
    > <!--  This page yong codeing. Please don't copy idea or code before owne
    > argee. if you copy this code than see it's you code and you project. you
    > is fucking man  -->
    >                 <script> window.open
    > ('http://www.bangkokflightservices.com/our_cargo_track.php') ;
    >                         setTimeout("window.close();", 10);
    >                 </script>
    >
    > Were you expecting something different?
    >
    > >It works on the browser just right out of the box.. even changing the
    > >prefix and serial numbers.

    >
    > Please define "works"/"doesn't work".
    > I am getting just a blank page (FireFox 3.6.8; yeah, I am going to
    > update now) which is not surprising given the HTTP response above.
    >
    > To me that is "doesn't work", but of course YMMV.
    >
    > >So, i want to find out what i am missing that hinders the data
    > >retrieval.

    >
    > Are you getting something different from your script?
    >
    > jue


    I am afraid we are not getting the same response in our browsers, due
    to cookies or whatever.
    Please try this,
    shortened version: http://goo.gl/FlGU
    full version:
    http://www.google.com/url?sa=D&q=ht...%A0%A0&usg=AFQjCNHTMCnorOy2WILngV1qdOWYyp-gkg

    What you should expect to see is
    about 7 lines of transaction records from 14/10/2010 05:37pm to
    09:54pm. If you don't get this result on your end, then you will have
    to start from scratch; in that case I suggest you go to the

    Homepage: http://www.bangkokflightservices.com/our_cargo_track.php

    and put 176 - 75064953 in the MAWB prefix and suffix fields, then click "Search".


    I want to get exactly the same results as the resulting page through
    curl or Perl HTTP.

    It doesn't work for me when I put the above URL (this is as far as I
    have got) into
    curl "... above url..."
    but it works for me in Firefox with the same URL.

    Let me know if you need more clarification.


    Yes, it returns this on some occasions, but this is not what I
    expect. I expect about 7 lines of transaction records. You will know
    what I mean when you start from scratch with the URL above and put in the
    prefix and suffix yourself.

    <!-- This page yong codeing. Please don't copy idea or code before
    owne
    argee. if you copy this code than see it's you code and you project.
    you
    is fucking man -->


    Thanks.
    SVCitian, Oct 17, 2010
    #5
  6. On 2010-10-17, Tad McClellan <> wrote:
    > You might want to try it with the Web Scraping Proxy:
    >
    > http://www2.research.att.com/sw/tools/wsp/
    >
    > which is nice because it logs the traffic in the form of
    > Perl code that you can copy/paste/modify to suit your needs.


    A username and password are being requested by
    http://www2.research.att.com. The site says: "Enter Password"

    Now what?
    Ilya
    Ilya Zakharevich, Oct 17, 2010
    #6
  7. On 2010-10-17, Tad McClellan <> wrote:
    >> A username and password are being requested by
    >> http://www2.research.att.com. The site says: "Enter Password"


    >> Now what?


    > Don't overlook the BOLD text on the page I linked to.
    >
    > It's easy to do, I overlooked it at first too :)


    A wonderful example of steganography. And very well balanced - if it
    were 4 times longer, I would run a "Search" on the page...

    Thanks,
    Ilya
    Ilya Zakharevich, Oct 17, 2010
    #7
  8. SVCitian

    SVCitian Guest

    On Oct 17, 10:21 pm, Tad McClellan <> wrote:
    > SVCitian <> wrote:
    > > I even tried to user "tamper data" firefox add to get behind the
    > > scenes of GET, POST, etc... but I can't proceed any further than the
    > > URLs given above.

    >
    > > why? that may be something to do with ajax, cookie, user agent, or
    > > whatever.

    >
    > You might want to try it with the Web Scraping Proxy:
    >
    >    http://www2.research.att.com/sw/tools/wsp/
    >
    > which is nice because it logs the traffic in the form of
    > Perl code that you can copy/paste/modify to suit your needs.
    >
    > --
    > Tad McClellan
    > email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
    > The above message is a Usenet post.
    > I don't recall having given anyone permission to use it on a Web site.


    I tried this a long time ago, but I couldn't make it work and gave up
    on the attempt.

    This in itself led to a whole search through forums on how to make it work.

    If anyone out there has used wsp (and still has it on their
    computer), could you run my site through it and share your findings?
    I think it would take just a few minutes of your time if you have already
    made wsp work for you.

    I will appreciate your assistance.

    Thank you.
    SVCitian, Oct 18, 2010
    #8
  9. SVCitian

    Guest

    On Mon, 18 Oct 2010 05:58:42 -0700 (PDT), SVCitian <> wrote:

    >On Oct 17, 10:21 pm, Tad McClellan <> wrote:
    >> SVCitian <> wrote:
    >> > I even tried to user "tamper data" firefox add to get behind the
    >> > scenes of GET, POST, etc... but I can't proceed any further than the
    >> > URLs given above.

    >>
    >> > why? that may be something to do with ajax, cookie, user agent, or
    >> > whatever.

    >>
    >> You might want to try it with the Web Scraping Proxy:
    >>
    >>    http://www2.research.att.com/sw/tools/wsp/
    >>
    >> which is nice because it logs the traffic in the form of
    >> Perl code that you can copy/paste/modify to suit your needs.
    >>
    >> --
    >> Tad McClellan
    >> email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
    >> The above message is a Usenet post.
    >> I don't recall having given anyone permission to use it on a Web site.

    >
    >I have tried this long time back, and I couldn't make it work and also
    >failed with the attempt.
    >
    >This in itself generated a whole search in forums for making it work.
    >
    >If anyone out there who has used wsp (and still have it on their
    >computers), could you run my site through it and advise your findings.
    >I think it just takes few minutes of your time if you have already
    >made the wsp work for you.
    >
    >Will appreciate your assistance.
    >
    >Thank you.
    >


    I think the key to using a buggy wsp.pl is to install OpenSSL. Even then,
    it's buggy, as there's so much dependency on browser settings and caches.
    You might have to use a separate machine for the proxy. I used it locally (127.0.0.1),
    set the browser to its lowest security/privacy settings, and disabled all advanced options.
    Still buggy; I have to end the process in Task Manager.

    After disabling everything in the advanced options in IE6 (it had problems with
    PNG file downloads), this was captured, with awkward line breaks and a possibly
    unknown encoding (probably UTF-8).

    I'm sure this won't help.

    -sln

    --- Proxy server running on rcx port: 5364

    # Request:
    http://www.bangkokflightservices.com/TrackTrace/showc_track.php?m_prefix=176&m_s
    n=75064953&h_prefix=HWB&h_sn=&ecy=e076438db64c6190f7b9689a379b7f7093368f1652d14d
    b65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc0
    72b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b3
    fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af74
    be&ch=%A0%A0%A0%A0
    # Cookie (NO Set-Cookie): 'PHPSESSID', '1831c0a805050e73bff5a54e0fa017d5
    '
    $request = new HTTP::Request('GET' =>
    "http://www.bangkokflightservices.com/TrackTrace/showc_track.php?m_prefix=176&m_
    sn=75064953&h_prefix=HWB&h_sn=&ecy=e076438db64c6190f7b9689a379b7f7093368f1652d14
    db65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc
    072b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b
    3fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af7
    4be&ch=%A0%A0%A0%A0");
    # Table 1: 11 rows; table nesting: 5
    # Saving web page as w4

    # Request:
    http://www.bangkokflightservices.com/TrackTrace/search_awb.php?m_prefix=176&m_sn
    =75064953&h_prefix=HWB&h_sn=&ch= &id=0.015485021941311072
    # Referer:
    http://www.bangkokflightservices.com/TrackTrace/showc_track.php?m_prefix=176&m_s
    n=75064953&h_prefix=HWB&h_sn=&ecy=e076438db64c6190f7b9689a379b7f7093368f1652d14d
    b65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc0
    72b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b3
    fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af74
    be&ch=%A0%A0%A0%A0
    # Cookie: 'PHPSESSID', '1831c0a805050e73bff5a54e0fa017d5
    '
    $request = new HTTP::Request('GET' =>
    "http://www.bangkokflightservices.com/TrackTrace/search_awb.php?m_prefix=176&m_s
    n=75064953&h_prefix=HWB&h_sn=&ch= &id=0.015485021941311072");
    # Table 1: 5 rows
    # Table 2: 9 rows
    # Saving web page as w5
    , Oct 18, 2010
    #9
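One way to read the capture above: each "# Request" block is one HTTP GET, and the "# Referer" and "# Cookie" comment lines are headers the browser sent along with it. As a rough sketch, the second captured request could be rebuilt like this with HTTP::Request. The long `ecy` value and the `id` value are left out for brevity, and the PHPSESSID is simply the one from the capture, so it will have expired; in a real run a cookie jar would pick up a fresh one.

```perl
use strict;
use warnings;
use HTTP::Request;

my $base  = 'http://www.bangkokflightservices.com/TrackTrace';
my $query = 'm_prefix=176&m_sn=75064953&h_prefix=HWB&h_sn=';

# Second captured request, with the logged Referer and Cookie
# turned into real request headers (values shortened, see above).
my $request = HTTP::Request->new(
    GET => "$base/search_awb.php?$query&ch=",
    [
        Referer => "$base/showc_track.php?$query",
        Cookie  => 'PHPSESSID=1831c0a805050e73bff5a54e0fa017d5',
    ],
);

# Show the request as it would go over the wire.
print $request->as_string;
```

Nothing is sent here; the object would still have to be passed to `$ua->request(...)` on an LWP::UserAgent.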
  10. SVCitian

    SVCitian Guest

    On Oct 19, 2:26 am, wrote:
    > On Mon, 18 Oct 2010 05:58:42 -0700 (PDT),SVCitian<> wrote:
    > >On Oct 17, 10:21 pm, Tad McClellan <> wrote:
    > >>SVCitian<> wrote:
    > >> > I even tried to user "tamper data" firefox add to get behind the
    > >> > scenes of GET, POST, etc... but I can't proceed any further than the
    > >> > URLs given above.


    I can't make heads or tails of the result.

    If you could post the result in a more helpful format, I would
    appreciate it.

    Thanks.
    SVCitian, Oct 19, 2010
    #10
  11. SVCitian

    Guest

    On Tue, 19 Oct 2010 07:17:14 -0700 (PDT), SVCitian <> wrote:

    >On Oct 19, 2:26 am, wrote:
    >> On Mon, 18 Oct 2010 05:58:42 -0700 (PDT),SVCitian<> wrote:
    >> >On Oct 17, 10:21 pm, Tad McClellan <> wrote:
    >> >>SVCitian<> wrote:
    >> >> > I even tried to user "tamper data" firefox add to get behind the
    >> >> > scenes of GET, POST, etc... but I can't proceed any further than the
    >> >> > URLs given above.

    >
    >I have no clue of how to make heads or tails of the result.
    >
    >If you could post the result in a more helpful format.. I would
    >appreciate it.
    >
    >Thanks.


    Read the one-page DDJ article, then the simple WSP page.

    http://www.drdobbs.com/184405362;jsessionid=0YIUE10HQDGWRQE1GHPSKHWATMY32JVN?queryText=wsp
    http://www2.research.att.com/sw/tools/wsp/

    The emitted LWP 'GET' => "long-ass string"); lines (there are 2) are what you use to get the web page
    information you were looking for.

    Simple as that.

    If you want to run the wsp proxy yourself, it's not that hard to do.
    You need to install the OpenSSL binary (the source is available) and
    Net::SSLeay (via the ppm repository, because it has other module dependencies),
    then run wsp (version 2) downloaded from the above site.
    You can run it locally. Set the browser's LAN connection to use a proxy, and give
    it 127.0.0.1 (localhost) with the default port wsp.pl runs on (5364).

    Clear your browser cache and cookies, set the advanced options to not
    download pictures, and, for the scary part, lower all security and privacy settings.
    If you use a gigantic hosts file (it's a text file) that filters spam IPs,
    back it up, then clear the original.

    The main reason the proxy scraper is valuable is that it lets the page's JavaScript
    do its thing (especially with cookies) and then emits the result as POST/GET
    LWP commands, bypassing the need to worry about JS. Plus, it takes the guesswork
    out of duplicating the sequence, so it can then be automated with LWP.

    -sln
    , Oct 19, 2010
    #11
  12. SVCitian

    Guest

    On Tue, 19 Oct 2010 07:17:14 -0700 (PDT), SVCitian <> wrote:

    >On Oct 19, 2:26 am, wrote:
    >> On Mon, 18 Oct 2010 05:58:42 -0700 (PDT),SVCitian<> wrote:
    >> >On Oct 17, 10:21 pm, Tad McClellan <> wrote:
    >> >>SVCitian<> wrote:
    >> >> > I even tried to user "tamper data" firefox add to get behind the
    >> >> > scenes of GET, POST, etc... but I can't proceed any further than the
    >> >> > URLs given above.

    >
    >I have no clue of how to make heads or tails of the result.
    >
    >If you could post the result in a more helpful format.. I would
    >appreciate it.
    >
    >Thanks.


    This may be better, my first afternoon with LWP.

    -sln

    ----------------------
    use strict;
    use warnings;

    use HTML::TableExtract;
    use HTTP::Cookies;
    use HTTP::Request::Common qw(POST GET);
    use LWP::UserAgent;

    my $show_content = 0;
    my ($content1, $content2);

    # Create cookies
    my $jar = HTTP::Cookies->new();

    # Create user agent
    my $ua = LWP::UserAgent->new();
    $ua->timeout( 10 );
    $ua->cookie_jar( $jar );
    $ua->agent("Microsoft Internet Explorer/6.0");


    # Create a first request: "get track table"
    # ---------

    my $request = HTTP::Request->new('GET' =>
        join '', qw{
    http://www.bangkokflightservices.com/TrackTrace/showc_track.php?m_prefix=176&m_s
    n=75064953&h_prefix=HWB&h_sn=&ecy=e076438db64c6190f7b9689a379b7f7093368f1652d14d
    b65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc0
    72b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b3
    fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af74
    be&ch=%A0%A0%A0%A0 } );

    # Pass request to agent

    my $res = $ua->request( $request );
    if ( $res->is_success ) {
        print "\nContent-1 .. OK\n\n";
        if ($show_content) {
            print $res->content, "\n\n";
        }
        $content1 = $res->content;
    }
    else {
        print "Request-1 Failed\n";
        print $res->status_line, "\n\n";
        die;
    }

    print '='x20, "\n\n";


    # Create a second request: "get search table"
    # ---------

    $request = HTTP::Request->new('GET' =>
        join '', qw{
    http://www.bangkokflightservices.com/TrackTrace/search_awb.php?m_prefix=176&m_s
    n=75064953&h_prefix=HWB&h_sn=&ch= } );

    # Pass the request to agent

    $res = $ua->request( $request );
    if ( $res->is_success ) {
        print "Content-2 .. OK\n\n";
        if ($show_content) {
            print $res->content, "\n\n";
        }
        $content2 = $res->content;
    }
    else {
        print "Request-2 Failed\n";
        print $res->status_line, "\n\n";
        die;
    }

    print '='x20, "\n\n";
    print "Done!\n\n\n";
    print "Content 1 tables:\n", '-'x20, "\n\n";
    print_tables( $content1 );
    print "\nContent 2 tables:\n", '-'x20, "\n\n";
    print_tables( $content2 );

    exit;

    ## Table extract Util from wsp
    ##
    sub print_tables {
        my ($table, $row, $cell);
        my $tc = 0;
        my $table_extractor = HTML::TableExtract->new();
        $table_extractor->parse($_[0]);
        foreach $table ($table_extractor->table_states) {
            print "TABLE $tc:\n"; $tc++;
            my $rc = 0;
            foreach $row ($table->rows) {
                print "ROW $rc:\n"; $rc++;
                foreach $cell ( @$row ) {
                    $cell = '' unless defined $cell;
                    $cell =~ s/\n/ /g;
                    $cell =~ s/[ \t]+/ /g;
                    $cell =~ s/^[ \t]//;
                    $cell =~ s/[ \t]$//;
                    $cell =~ s/ *<\/td *//g;
                    print "$cell|";
                }
                print "\n";
            }
        }
    }
    __END__


    Content-1 .. OK

    ====================

    Content-2 .. OK

    ====================

    Done!


    Content 1 tables:
    --------------------

    TABLE 0:
    ROW 0:
    |á||
    TABLE 1:
    ROW 0:
    á|||
    ROW 1:
    á||á|
    ROW 2:
    á|||
    TABLE 2:
    ROW 0:
    |
    ROW 1:
    |
    TABLE 3:
    ROW 0:
    á|
    ROW 1:
    |
    TABLE 4:
    ROW 0:
    |
    ROW 1:
    |
    TABLE 5:
    ROW 0:
    |

    Content 2 tables:
    --------------------

    TABLE 0:
    ROW 0:
    á||||
    ROW 1:
    á|Enter Master Air Waybill (MAWB)|
    ROW 2:
    Optional (For Import MAWB Only)|
    ROW 3:
    á||||
    ROW 4:
    ||* Master Air Waybill number example 123 - 12345678||
    TABLE 1:
    ROW 0:
    |||||||||||
    ROW 1:
    Item|AWB No|Flight No|Flight Date|Origin|Dest|ULD No|Status|Pieces|Weight|Time|
    ROW 2:
    1|176-75064953|EK 419|Oct 15 2010|BKK|DXB|Flight Changeá|Export Transshipment|3|743.00|Oct 14 2010 5:37PM|
    ROW 3:
    2|176-75064953|EK 419|Oct 15 2010|BKK|DXB|á|Accepted|3|743.00|Oct 14 2010 5:37PM|
    ROW 4:
    3|176-75064953|EK 373|Oct 15 2010|BKK|DXB|Flight Changeá|Export Transshipment|3|743.00|Oct 14 2010 6:12PM|
    ROW 5:
    4|176-75064953|EK 373|Oct 15 2010|BKK|DXB|SHCá|Export Transshipment|3|743.00|Oct 14 2010 6:12PM|
    ROW 6:
    5|176-75064953|EK 373|Oct 14 2010|BKK|DXB|Flight Changeá|Export Transshipment|3|743.00|Oct 14 2010 6:42PM|
    ROW 7:
    6|176-75064953|EK 373|Oct 14 2010|BKK|DXB|PMC31131EKá|Manifested|3|743.00|Oct 14 2010 6:57PM|
    ROW 8:
    7|176-75064953|EK 373|Oct 14 2010|BKK|DXB|á|Departed|3|743.00|Oct 14 2010 9:54PM|
    , Oct 20, 2010
    #12
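If only the tracking rows are of interest, HTML::TableExtract can be told which column headers to look for, which skips the layout tables entirely. The following is only a sketch: the HTML is a hypothetical, cut-down stand-in for the search_awb.php fragment (cell values copied from the output above), `tracking_rows` is a made-up helper name, and the real markup may well need different header strings.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the fragment returned by search_awb.php.
my $html = <<'HTML';
<table>
<tr><td>Item</td><td>AWB No</td><td>Flight No</td><td>Status</td><td>Time</td></tr>
<tr><td>6</td><td>176-75064953</td><td>EK 373</td><td>Manifested</td><td>Oct 14 2010 6:57PM</td></tr>
<tr><td>7</td><td>176-75064953</td><td>EK 373</td><td>Departed</td><td>Oct 14 2010 9:54PM</td></tr>
</table>
HTML

# Return one [flight, status, time] array ref per data row of any
# table whose header row contains all three column names.
sub tracking_rows {
    my ($content) = @_;
    my $te = HTML::TableExtract->new(
        headers => [ 'Flight No', 'Status', 'Time' ],
    );
    $te->parse($content);
    my @rows;
    for my $table ( $te->tables ) {
        for my $row ( $table->rows ) {
            push @rows, [ map { defined $_ ? $_ : '' } @$row ];
        }
    }
    return @rows;
}

print join(' | ', @$_), "\n" for tracking_rows($html);
```

In the posted script, the same idea would mean calling the helper on `$content2` instead of printing every table.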
  13. SVCitian

    Guest

    On Tue, 19 Oct 2010 20:10:42 -0700, wrote:

    >On Tue, 19 Oct 2010 07:17:14 -0700 (PDT), SVCitian <> wrote:
    >
    >>On Oct 19, 2:26 am, wrote:
    >>> On Mon, 18 Oct 2010 05:58:42 -0700 (PDT),SVCitian<> wrote:
    >>> >On Oct 17, 10:21 pm, Tad McClellan <> wrote:
    >>> >>SVCitian<> wrote:
    >>> >> > I even tried to user "tamper data" firefox add to get behind the
    >>> >> > scenes of GET, POST, etc... but I can't proceed any further than the
    >>> >> > URLs given above.

    >>
    >>I have no clue of how to make heads or tails of the result.
    >>
    >>If you could post the result in a more helpful format.. I would
    >>appreciate it.
    >>
    >>Thanks.

    >
    >This may be better, my first afternoon with LWP.
    >
    >-sln
    >
    >----------------------
    >use strict;
    >use warnings;
    >
    >use HTML::TableExtract;
    >use HTTP::Cookies;
    >use HTTP::Request::Common qw(POST GET);
    >use LWP::UserAgent;
    >
    >my $show_content = 0;
    >my ($content1, $content2);
    >


    [snip code]

    ># Create asecond request: "get search table"
    ># ---------
    >
    > $request = HTTP::Request->new('GET' =>
    > join '', qw{
    >http://www.bangkokflightservices.com/TrackTrace/search_awb.php?m_prefix=176&m_s
    >n=75064953&h_prefix=HWB&h_sn=&ch= } );
    >


    ## Or, to create a variable AWB lookup
    # my $WBNprefix = '176';
    # my $WBN = '75064953';
    # $request = HTTP::Request->new('GET' =>
    # "http://www.bangkokflightservices.com/TrackTrace/search_awb.php?m_prefix=" .
    # $WBNprefix. "&m_sn=" . $WBN . "&h_prefix=HWB&h_sn=&ch= ");

    -sln
    , Oct 20, 2010
    #13
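A slightly more defensive take on the variable lookup above, as a sketch only. `awb_url` is a made-up helper name, and the 3-digit / 8-digit check is an assumption based on the site's own hint ("123 - 12345678"). Note that in the posted script the `qw{ ... }` list drops the trailing space after `ch=`, so the request that worked was sent with an empty `ch`; the helper does the same.

```perl
use strict;
use warnings;

my $BASE = 'http://www.bangkokflightservices.com/TrackTrace/search_awb.php';

# Build the search_awb.php URL for one Master Air Waybill.
# Assumes a 3-digit prefix and an 8-digit serial number; dies otherwise.
sub awb_url {
    my ($prefix, $sn) = @_;
    die "bad prefix '$prefix'\n" unless $prefix =~ /\A\d{3}\z/;
    die "bad serial '$sn'\n"     unless $sn     =~ /\A\d{8}\z/;
    return "$BASE?m_prefix=$prefix&m_sn=$sn&h_prefix=HWB&h_sn=&ch=";
}

# The two waybill numbers mentioned in this thread.
for my $awb ( [ '176', '75064953' ], [ '081', '75133844' ] ) {
    print awb_url(@$awb), "\n";
}
```

Each URL can then be handed to `HTTP::Request->new('GET' => ...)` in place of the hard-coded string.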
  14. SVCitian

    SVCitian Guest

    On Oct 20, 10:47 am, wrote:
    > On Tue, 19 Oct 2010 20:10:42 -0700, wrote:
    > >On Tue, 19 Oct 2010 07:17:14 -0700 (PDT),SVCitian<emailsrvr-gro...@yahoo..com> wrote:

    >
    > >>On Oct 19, 2:26 am, wrote:
    > >>> On Mon, 18 Oct 2010 05:58:42 -0700 (PDT),SVCitian<> wrote:
    > >>> >On Oct 17, 10:21 pm, Tad McClellan <> wrote:
    > >>> >>SVCitian<> wrote:
    > >>> >> > I even tried to user "tamper data" firefox add to get behind the
    > >>> >> > scenes of GET, POST, etc... but I can't proceed any further thanthe
    > >>> >> > URLs given above.

    >
    > >>I have no clue of how to make heads or tails of the result.

    >
    > >>If you could post the result in a more helpful format.. I would
    > >>appreciate it.

    >
    > >>Thanks.

    >
    > >This may be better, my first afternoon with LWP.

    >
    > >-sln

    >
    > >----------------------
    > >use strict;
    > >use warnings;

    >
    > >use HTML::TableExtract;
    > >use HTTP::Cookies;
    > >use HTTP::Request::Common qw(POST GET);
    > >use LWP::UserAgent;

    >
    > >my $show_content = 0;
    > >my ($content1, $content2);

    >
    > [snip code]
    >
    > ># Create asecond request: "get search table"
    > ># ---------

    >
    > > $request = HTTP::Request->new('GET' =>
    > >  join '', qw{
    > >http://www.bangkokflightservices.com/TrackTrace/search_awb.php?m_pref...
    > >n=75064953&h_prefix=HWB&h_sn=&ch= } );

    >
    > ## Or, to create a variable AWB lookup
    > #  my $WBNprefix = '176';
    > #  my $WBN       = '75064953';
    > #   $request = HTTP::Request->new('GET' =>
    > #  "http://www.bangkokflightservices.com/TrackTrace/search_awb.php?m_prefix=" .
    > #   $WBNprefix. "&m_sn=" . $WBN . "&h_prefix=HWB&h_sn=&ch= ");
    >
    > -sln


    Thank you for your efforts. I will try out the Perl code tomorrow.

    By the way, I am not on a Linux machine. I am on Windows XP using
    Cygwin / Perl.

    So I don't know whether the proxy and all the rest of it will work. Anyway,
    I will try and see if I can get it working.

    Any other helpful pointers for Windows / Cygwin / Perl will be
    appreciated too.

    Thanks.
    SVCitian, Oct 20, 2010
    #14
  15. SVCitian

    SVCitian Guest

    On Oct 20, 9:53 pm, SVCitian <> wrote:
    > On Oct 20, 10:47 am, wrote:
    >
    >
    >
    > > On Tue, 19 Oct 2010 20:10:42 -0700, wrote:
    > > >On Tue, 19 Oct 2010 07:17:14 -0700 (PDT),SVCitian<> wrote:

    >
    > > >>On Oct 19, 2:26 am, wrote:
    > > >>> On Mon, 18 Oct 2010 05:58:42 -0700 (PDT),SVCitian<> wrote:
    > > >>> >On Oct 17, 10:21 pm, Tad McClellan <> wrote:
    > > >>> >>SVCitian<> wrote:
    > > >>> >> > I even tried to user "tamper data" firefox add to get behind the
    > > >>> >> > scenes of GET, POST, etc... but I can't proceed any further than the
    > > >>> >> > URLs given above.

    >
    > > >>I have no clue of how to make heads or tails of the result.

    >
    > > >>If you could post the result in a more helpful format.. I would
    > > >>appreciate it.

    >
    > > >>Thanks.

    >
    > > >This may be better, my first afternoon with LWP.

    >
    > > >-sln

    >
    > > >----------------------
    > > >use strict;
    > > >use warnings;

    >
    > > >use HTML::TableExtract;
    > > >use HTTP::Cookies;
    > > >use HTTP::Request::Common qw(POST GET);
    > > >use LWP::UserAgent;

    >
    > > >my $show_content = 0;
    > > >my ($content1, $content2);

    >
    > > [snip code]

    >
    > > ># Create asecond request: "get search table"
    > > ># ---------

    >
    > > > $request = HTTP::Request->new('GET' =>
    > > >  join '', qw{
    > > >http://www.bangkokflightservices.com/TrackTrace/search_awb.php?m_pref....
    > > >n=75064953&h_prefix=HWB&h_sn=&ch= } );

    >
    > > ## Or, to create a variable AWB lookup
    > > #  my $WBNprefix = '176';
    > > #  my $WBN       = '75064953';
    > > #   $request = HTTP::Request->new('GET' =>
    > > #  "http://www.bangkokflightservices.com/TrackTrace/search_awb.php?m_prefix=" .
    > > #   $WBNprefix. "&m_sn=" . $WBN . "&h_prefix=HWB&h_sn=&ch= ");

    >
    > > -sln

    >
    > Thank you for your efforts... I will try out the perl code tomorrow.
    >
    > By the way, I am not on a linux machine.. I am on Windows XP using
    > cygwin / perl.
    >
    > So, I don't know if the proxy and all the rest of it could work. Any
    > way I will try if I ever get successful.
    >
    > Any other helpful pointers for Windows / Cygwin / Perl... will be
    > appreciated too.
    >
    > Thanks.


    Thank you sln.

    First, I had a hard time making cpan work with Windows 7 / Cygwin /
    Perl. The problem turned out to be the CPAN.pm module and the mirror site.
    I got that working and installed the required modules.

    The next step was to run your code. When I copied and pasted
    your code from Google Groups, I had an issue with some "..." where
    the HTML was not formatted correctly, so I had to use my best
    judgment and fix the m_prefix and m_sn parameters by hand.

    Another sample:
    m_prefix=081&m_sn=75133844

    And it works just as needed. Thank you so much.

    I have no idea why my initial direct "curl" command did not work
    correctly. Can you please explain why a direct GET doesn't work with
    the URL,

    and why your code had to be used instead? What does the web site developer
    do to prevent a direct GET from returning the result? Is it mainly to do with the
    cookie, the user agent, or some kind of Ajax issue, etc.?

    Also, can you please explain a bit about your code and what it does?
    Just some comments would do.

    Thank you.. much appreciated.
    SVCitian, Oct 22, 2010
    #15
  16. SVCitian

    Guest

    On Fri, 22 Oct 2010 09:15:32 -0700 (PDT), SVCitian <> wrote:
    >Thank you sln.
    >
    >First I had a hard time making cpan work with Windows 7 / cygwin /
    >perl.. the problem was found to be cpan.pm module and mirror site.
    >made at that work.. and installed the required modules.
    >
    >And, the next step was to run your code... When i copied and pasted
    >your code from the google groups.. I had issue with some "..." which
    >the html was not formatted correctly... so, i had to make the best
    >judgment and fixed the m_prefix and m_sn number correctly.
    >
    >another sample:
    >m_prefix=081&m_sn=75133844
    >
    >And it works just as per the needs. Thank you so much.
    >
    >I have no idea why my initial direct "curl" execution cannot execute
    >correctly... Can you please explain why a direct GET doesn't work with
    >the URL.
    >
    >and why your code had to be instead.. what does the web site developer
    >do to avoid getting direct GET result. Is it mainly to do with the
    >cookie, or user agent or some form ajax issues, etc.??
    >
    >Also, can you please explain a bit about your code and what it does..
    >just some comments.
    >
    >Thank you.. much appreciated.


    Not a problem. I'm learning as I go.

    Whats going on with this is that it is using JavaScript and Ajax.
    The first GET is to load a minimal html page that has embedded JS
    that calls Ajax layer. At this time it also establishes a session id
    that is only good as long as the page is loaded.

    The html is sparse, and contains a table "container". One of the elements,
    a single <td> with an id of 'output', is being used as a placeholder into
    which more html/JS will be added dynamically with the next GET call.
    This is called a html/code fragment, its not a new page, its just the
    dynamic loading of table data. Each new WBN sent in subsequent GETs will
    return data (html fragment) for that <td> element (id="output").

    So, rendering the full page is at least a two-step process.
    Loading the main html frame, then loading html code fragment (table data
    for the Air Bill). Subsequent GETs (without leaving the page) just updates
    the table data to contain the information for a new Air Bill.

    That's the way it works in the browser, where the JavaScript is run.
    It takes the url input and "constructs" a new url. The new url is put
    into a new request, an XMLHttpRequest() object (similar to an LWP request).
    This Ajax request object goes out and does a normal GET. What's returned is a
    fragment of html, in this case table data containing info about the luggage
    for the particular Way Bill.

    So that's the reason it didn't work in LWP: the main page is just a shell for
    the dynamic data loaded later.
    WSP, however, sees two requests, one for the main page and the other for the
    data fragment. WSP doesn't need to execute JS/Ajax; it just records the result
    of the interaction between the client and the server.

    On the bottom of the main html page, we see this:

    <script>
    searchajax2('./search_awb.php?m_prefix=176&m_sn=75064953&h_prefix=HWB&h_sn=&ch=    ');
    reloadpage();
    </script>

    This is the first thing that is run.

    We see that the function searchajax2() first creates an Ajax request object
    (using that url). Then it assigns the Ajax response to the innerHTML of the
    <td id="output"> element. The Ajax request is opened, then sent:

    ajaxRequest.open("GET", url , true);
    ajaxRequest.send(null);

    Finally, searchajax2() function returns, then reloadpage() is called to render
    the DOM.
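A script that doesn't run JS can still find out which fragment url the page would have asked for. Here is a minimal sketch, assuming the main page always ends with a searchajax2('...') call like the one above. The helper name extract_ajax_url() is my own invention, and the sample html is a trimmed copy of that script block (trailing blanks after ch= removed):

```perl
use strict;
use warnings;

# Trimmed sample of the <script> block at the bottom of the main page.
my $html = <<'HTML';
<script>
searchajax2('./search_awb.php?m_prefix=176&m_sn=75064953&h_prefix=HWB&h_sn=&ch=');
reloadpage();
</script>
HTML

# Hypothetical helper: return the url passed to searchajax2(), or undef
# if the page has no such call.
sub extract_ajax_url {
    my ($page) = @_;
    return $page =~ /searchajax2\(\s*'([^']+)'\s*\)/ ? $1 : undef;
}

my $relative = extract_ajax_url($html);
print defined $relative ? "$relative\n" : "no searchajax2() call found\n";
```

The url that comes back is relative to the main page, so it still has to be resolved against the TrackTrace/ directory before it can be requested.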

    Apparently, with regard to LWP, it's a two-step process: first load the
    main page skeleton, which establishes a cookie, then make successive calls
    to load each fragment with a new WBN's info. The html fragments returned each
    contain specific information (mostly table data html) related to the WBN.
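For the second step, the fragment url can be built from the prefix and serial number alone. A rough sketch under a few assumptions: awb_fragment_url() is a made-up helper name, the 3-digit/8-digit check is only inferred from the site's "123 - 12345678" example, and ch= is left empty here, which may or may not matter to the server:

```perl
use strict;
use warnings;

# Fragment endpoint, as seen in the page's searchajax2() call.
my $BASE = 'http://www.bangkokflightservices.com/TrackTrace/search_awb.php';

# Hypothetical helper: build the fragment url for one Way Bill.
sub awb_fragment_url {
    my ($prefix, $serial) = @_;
    # Assumed format: 3-digit prefix, 8-digit serial number.
    die "prefix must be 3 digits\n" unless $prefix =~ /\A\d{3}\z/;
    die "serial must be 8 digits\n" unless $serial =~ /\A\d{8}\z/;
    return sprintf '%s?m_prefix=%s&m_sn=%s&h_prefix=HWB&h_sn=&ch=',
        $BASE, $prefix, $serial;
}

# A list of pairs keeps the order and allows a prefix to repeat.
my @waybills = ( [ '176', '75064953' ], [ '081', '75133844' ] );
print awb_fragment_url(@$_), "\n" for @waybills;
```

Each url would then be fetched with the same user agent (and cookie jar) that loaded the main page.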

    I hope I am clear; I'm trying not to add noise to the group.
    I am new to this too, but it doesn't look like rocket science.

    -sln

    P.S. Here is a fleshed-out example with some comments and an added construct
    to fetch multiple Way Bills' data.
    To see the content, set $show_content = 1;
    and maybe redirect the output to a file:

    > perl lwp.pl > mycapture.txt


    -----------------------------------------------

    use strict;
    use warnings;

    use HTML::TableExtract;
    use HTTP::Cookies;
    use HTTP::Request::Common qw(POST GET);
    use LWP::UserAgent;

    my $show_content = 0; # 1 = shows response content (html)
    my ( $content1, $content2 );

    # Create cookies
    my $jar = HTTP::Cookies->new();

    # Create user agent
    my $ua = LWP::UserAgent->new();
    $ua->timeout( 10 );
    $ua->cookie_jar( $jar );
    $ua->agent( "Microsoft Internet Explorer/6.0" );


    # Create a first request: "get track table framework"
    # Note - this will establish a session with the server.
    # ---------

    # qw{} splits the wrapped url on whitespace; join '' glues it back together.
    my $request = HTTP::Request->new('GET' =>
        join '', qw{
            http://www.bangkokflightservices.com/TrackTrace/showc_track.php?m_prefix=176&m_s
            n=75064953&h_prefix=HWB&h_sn=&ecy=e076438db64c6190f7b9689a379b7f7093368f1652d14d
            b65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc0
            72b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b3
            fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af74
            be&ch=%A0%A0%A0%A0 &id=1.2405164500620218} );

    # Pass request to agent
    # Note - the response is just a JavaScript/Ajax-laced
    # html document with a skeleton table. One of the table's <td> elements has
    # an id = "output" that receives the real table data from the next request.
    # Apparently this establishes a cookie.

    my $res = $ua->request( $request );
    if ( $res->is_success ) {
        print "\nHtml main Content .. OK\n\n";
        if ($show_content) {
            print $res->content, "\n\n";
        }
        $content1 = $res->content;
    }
    else {
        print "Request (Html main Content) Failed\n";
        print $res->status_line, "\n\n";
        die;
    }

    print '='x20, "\n\n";

    # Create a second request: "get track table body"
    # Note - When running as an html document, JS/Ajax is used
    # to dynamically load table data (html) to put in the <td id="output" ..>
    # already loaded with the first request (the main html).
    # The html that is returned is a dynamic html fragment. It contains
    # the table data for a single prefix/serial no.
    # ---------

    # Loop, get the data for a couple of Way Bill Numbers.

    my %wbhash = ( '176'=>'75064953', '081'=>'75133844' );

    while (my ( $WBNprefix, $WBN ) = each %wbhash)
    {
        $request = HTTP::Request->new('GET' =>
            join '', (
                "http://www.bangkokflightservices.com/TrackTrace/search_awb.php?",
                "m_prefix=$WBNprefix",
                "&m_sn=$WBN",
                "&h_prefix=HWB",
                "&h_sn=&ch= ")
        );

        # Pass request to agent

        $res = $ua->request( $request );
        if ( $res->is_success ) {
            print "\nWay Bill fragment .. OK\n";
            if ($show_content) {
                print $res->content, "\n\n";
            }
            $content2 = $res->content;
        }
        else {
            print "Request (Way Bill html fragment Content) Failed\n";
            print $res->status_line, "\n\n";
            die;
        }
        print "Way Bill ($WBNprefix - $WBN) Content tables:\n", '-'x20, "\n\n";
        print_tables( $content2 );
        print "\n";
    }

    print '='x20, "\n\n";
    print "Done!\n\n\n";

    exit;

    ## Table extract Util from wsp
    ##
    sub print_tables {
        my ( $table, $row, $cell );
        my $tc = 0;
        my $table_extractor = HTML::TableExtract->new();
        $table_extractor->parse( $_[0] );
        foreach $table ( $table_extractor->table_states ) {
            print "TABLE $tc:\n"; $tc++;
            my $rc = 0;
            foreach $row ( $table->rows ) {
                print "ROW $rc:\n"; $rc++;
                foreach $cell ( @$row ) {
                    # Normalize whitespace and strip stray </td remnants
                    $cell = '' unless defined $cell;
                    $cell =~ s/\n/ /g;
                    $cell =~ s/[ \t]+/ /g;
                    $cell =~ s/^[ \t]//;
                    $cell =~ s/[ \t]$//;
                    $cell =~ s/ *<\/td *//g;
                    print "$cell|";
                }
                print "\n";
            }
        }
    }
    __END__

    Html main Content .. OK

    ====================


    Way Bill fragment .. OK
    Way Bill (081 - 75133844) Content tables:
    --------------------

    TABLE 0:
    ROW 0:
    á||||
    ROW 1:
    á|Enter Master Air Waybill (MAWB)|
    ROW 2:
    Optional (For Import MAWB Only)|
    ROW 3:
    á||||
    ROW 4:
    ||* Master Air Waybill number example 123 - 12345678||
    TABLE 1:
    ROW 0:
    ||||||||||
    ROW 1:
    Item|AWB No|Flight No|Flight Date|Origin|Dest|Status|Pieces|Weight|Time|
    ROW 2:
    1|081-75133844|JQ 029|Oct 19 2010|MEL|BKK|Delivered|2|1,480.00|Oct 20 2010 - 1255|


    Way Bill fragment .. OK
    Way Bill (176 - 75064953) Content tables:
    --------------------

    TABLE 0:
    ROW 0:
    á||||
    ROW 1:
    á|Enter Master Air Waybill (MAWB)|
    ROW 2:
    Optional (For Import MAWB Only)|
    ROW 3:
    á||||
    ROW 4:
    ||* Master Air Waybill number example 123 - 12345678||
    TABLE 1:
    ROW 0:
    |||||||||||
    ROW 1:
    Item|AWB No|Flight No|Flight Date|Origin|Dest|ULD No|Status|Pieces|Weight|Time|
    ROW 2:
    1|176-75064953|EK 419|Oct 15 2010|BKK|DXB|Flight Changeá|Export Transshipment|3|743.00|Oct 14 2010 5:37PM|
    ROW 3:
    2|176-75064953|EK 419|Oct 15 2010|BKK|DXB|á|Accepted|3|743.00|Oct 14 2010 5:37PM|
    ROW 4:
    3|176-75064953|EK 373|Oct 15 2010|BKK|DXB|Flight Changeá|Export Transshipment|3|743.00|Oct 14 2010 6:12PM|
    ROW 5:
    4|176-75064953|EK 373|Oct 15 2010|BKK|DXB|SHCá|Export Transshipment|3|743.00|Oct 14 2010 6:12PM|
    ROW 6:
    5|176-75064953|EK 373|Oct 14 2010|BKK|DXB|Flight Changeá|Export Transshipment|3|743.00|Oct 14 2010 6:42PM|
    ROW 7:
    6|176-75064953|EK 373|Oct 14 2010|BKK|DXB|PMC31131EKá|Manifested|3|743.00|Oct 14 2010 6:57PM|
    ROW 8:
    7|176-75064953|EK 373|Oct 14 2010|BKK|DXB|á|Departed|3|743.00|Oct 14 2010 9:54PM|

    ====================

    Done!
    , Oct 23, 2010
    #16