regular expression

Discussion in 'Perl Misc' started by Luciano Tolomei, Apr 28, 2005.

  1. I have to match some content in an HTML file.

    The file is a single line.

    I have to get some cell content from it.

    I have built an expression, but it matches many <tr>...</tr> rows together instead of matching one at a time.

    The real line is more complex (with rowspan, etc.), I have to retrieve many cells, and there are many rows.

    Here is a simplified example:

    <tr><td><font color=\"#EFAD00\">xxxx</font></td></tr><tr><td><font color=\"#EFAD00\">yyyy</font></td></tr>

    <tr>(.*)<font color=\"#EFAD00\">([^<]*)(.*)</tr>

    It matches only yyyy; I need to match both xxxx and yyyy.

    I think I have to change (.*) so that it matches everything except </tr>,
    but I do not know how to do that.
     
    Luciano Tolomei, Apr 28, 2005
    #1

  2. Luciano Tolomei wrote:

    > i have to match some content in an html file.


    So put this in your subject. A regexp might not be the answer.

    Look at File::Slurp to read the file in one go.

    [ snip ]

    > i think that i have to change (.*) to make it match everything but not
    > the </tr> but i do not know how to do it.


    I think you have to have a peek at HTML::TreeBuilder

    see:
    http://johnbokma.com/perl/froogle-script.html
    http://johnbokma.com/perl/phpbb-remote-backup.html

    for examples.
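
As a rough illustration of the HTML::TreeBuilder approach, here is a minimal sketch. It assumes the rows sit inside a <table> element (the wrapper is added here for the example, it is not in the original sample) and that the backslashes in the posted sample are only string escaping. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample rows from the question, wrapped in <table> (an assumption).
my $html = '<table>'
         . '<tr><td><font color="#EFAD00">xxxx</font></td></tr>'
         . '<tr><td><font color="#EFAD00">yyyy</font></td></tr>'
         . '</table>';

# Parse the whole string into a tree of HTML::Element objects.
my $tree = HTML::TreeBuilder->new_from_content($html);

# In list context look_down returns every element matching the criteria;
# as_text gives the text inside the cell with nested tags removed.
my @cells = map { $_->as_text } $tree->look_down( _tag => 'td' );

# Free the tree explicitly, as the module documentation recommends.
$tree->delete;

print "$_\n" for @cells;
```

Because the parser works on elements rather than on characters, attributes such as rowspan or a differently ordered color attribute should not affect which cells are found.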

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Apr 28, 2005
    #2

  3. * Luciano Tolomei wrote:
    >
    > i have to match some content in an html file. the file is a single
    > line. i have to get some cell content from it. i have build an
    > expression but it match a lot of <tr></tr> together instead of
    > matching on a time...
    > the line is more complex (with rowspan... ) and i have to retrieve
    > a lot of cell's and there are a lot of rows. but here a simplified
    > example:
    >
    > <tr><td><font color=\"#EFAD00\">xxxx</font></td></tr><tr><td><font
    > color=\"#EFAD00\">yyyy</font></td></tr>
    >
    > <tr>(.*)<font color=\"#EFAD00\">([^<]*)(.*)</tr>
    >
    > it match yyyy, i need to match xxxx and yyyy. i think that i have to
    > change (.*) to make it match everything but not the </tr> but i do
    > not know how to do it.


    It seems you want to strip all the HTML tags from your data. If you
    did that with your example above, the string "xxxx yyyy" would remain.

    Have a look at HTML::Strip on CPAN. This module strips all the HTML
    markup. Afterwards you could split() your data as usual.

    regards,
    fabian
     
    Fabian Pilkowski, Apr 28, 2005
    #3
  4. [ Please limit your line lengths to the conventional 70-72 characters. ]


    Luciano Tolomei wrote:

    > Subject: regular expression



    That is not the Right Tool for your job.


    > i have to match some content in an html file.



    You should use a module that understands HTML for processing HTML data.


    > i have to get some cell content from it.


    > the line is more complex (with rowspan... ) and i have to retrieve
    > a lot of cell's and there are a lot of rows.



    There is a module that can handle that for you, no need to
    reinvent that wheel.


    > i think that i have to change (.*) to make it match everything



    I think you have to change to:

    use HTML::TableExtract;

    :)
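
For readers who want to see what that might look like, here is a minimal sketch using HTML::TableExtract. It assumes the rows are enclosed in a <table> element (added here for the example, since the module looks for tables) and that the backslashes in the posted sample are only string escaping. Variable names are illustrative.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Sample rows from the question, wrapped in <table> (an assumption).
my $html = '<table>'
         . '<tr><td><font color="#EFAD00">xxxx</font></td></tr>'
         . '<tr><td><font color="#EFAD00">yyyy</font></td></tr>'
         . '</table>';

# With no constraints, every table in the document is extracted.
my $te = HTML::TableExtract->new;
$te->parse($html);
$te->eof;

# Each row is an array reference of cell text; markup inside the
# cells is dropped by default.
my @cells;
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        push @cells, grep { defined } @$row;
    }
}

print "$_\n" for @cells;
```

The constructor also accepts constraints such as headers, depth and count to pick out one table among several, which may help with the more complex real page.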


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Apr 28, 2005
    #4
  5. On Thu, 28 Apr 2005 12:16:18 +0200, Luciano Tolomei wrote:

    > i think that i have to change (.*) to make it match everything but not the </tr>
    > but i do not know how to do it.


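    Have a look at (.*?) instead of (.*). The ? makes the quantifier
    non-greedy, so it matches as little as possible instead of as much as
    possible. But as others already noted, regexps are not the best way to
    tackle this.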
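
A minimal sketch of the non-greedy variant, assuming the backslashes in the posted sample are only string escaping and that every row contains the font tag. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# Simplified sample line from the question, without the escaping.
my $html = '<tr><td><font color="#EFAD00">xxxx</font></td></tr>'
         . '<tr><td><font color="#EFAD00">yyyy</font></td></tr>';

# .*? stops at the first point where the rest of the pattern can match,
# so each match stays within one row. With /g in list context the match
# returns the capture group from every match in the string.
my @cells = $html =~ m{<tr>.*?<font color="#EFAD00">([^<]*).*?</tr>}g;

print "$_\n" for @cells;
```

Note that if some row had no matching font tag, .*? would still run on into the next row to find one, which is one reason an HTML-aware module is the safer choice.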

    M4
    --
    Redundancy is a great way to introduce more single points of failure.
     
    Martijn Lievaart, Apr 28, 2005
    #5
  6. Tad McClellan wrote:


    > I think you have to change to:


    > use HTML::TableExtract;


    > :)



    I will.
    Thanks a lot.
     
    Luciano Tolomei, Apr 28, 2005
    #6