Please help me how is easiest way to extract text between some variable text

Discussion in 'Perl Misc' started by Mladen, Feb 20, 2011.

  1. Mladen

    Mladen Guest

    Please help me: what is the easiest way to extract the text between some variable text?



    Original text:



    <TH class=name width=100>New name</TH>     need to extract: New name

    <TH class=name width=50>Test name </TH>    need to extract: Test name

    <TH class=name width=65>Name 2</TH>        need to extract: Name 2



    Thanks in advance
    Mladen, Feb 20, 2011
    #1

  2. "Mladen" <> wrote:
    >Please help me how is easiest way to extract text between some variable text
    >
    >Original text
    ><TH class=name width=100>New name</TH> need to
    >extract: New name
    >
    ><TH class=name width=50>Test name </TH> need to
    >extract: Test name
    >
    ><TH class=name width=65>Name 2</TH> need
    >to extract: Name 2


    You have a well-defined data structure. Treating it and analysing it as
    if it were plain text would be foolish. Instead take advantage of the
    existing structure and use a parser that can parse this data structure.
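
One possible way to follow this advice is sketched below. It assumes the CPAN module HTML::Parser is installed (it is not part of core Perl), and the helper name `th_texts` is made up for illustration. The parser lowercases tag names by default, so `<TH>` arrives as `th`.

```perl
use strict;
use warnings;
use HTML::Parser ();    # CPAN module, assumed to be installed

# Hypothetical helper: return the text content of every <TH> element.
sub th_texts {
    my ($html) = @_;
    my @names;
    my $in_th = 0;

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Opening tag: start a new, empty entry when a <th> begins.
        start_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'th') { $in_th = 1; push @names, ''; }
        }, 'tagname' ],
        # Text may arrive in several chunks, so append to the current entry.
        text_h => [ sub { $names[-1] .= $_[0] if $in_th; }, 'dtext' ],
        # Closing tag: stop collecting text.
        end_h => [ sub { $in_th = 0 if $_[0] eq 'th'; }, 'tagname' ],
    );

    $parser->parse($html);
    $parser->eof;
    return @names;
}

my $html = join '',
    "<TH class=name width=100>New name</TH> need to extract: New name\n",
    "<TH class=name width=50>Test name </TH> need to extract: Test name\n",
    "<TH class=name width=65>Name 2</TH> need to extract: Name 2\n";

print "'$_'\n" for th_texts($html);
```

Unlike a hand-written pattern, this does not depend on the order or quoting of the attributes inside the tag.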

    jue
    Jürgen Exner, Feb 21, 2011
    #2

  3. Mladen

    Guest

    Re: Please help me how is easiest way to extract text between some variable text

    On Feb 20, 11:33 pm, "Mladen" <> wrote:
    > Please help me how is easiest way to extract text between some variable text
    >
    > Original text
    >
    > <TH class=name width=100>New name</TH>                            need to
    > extract: New name
    >
    > <TH class=name width=50>Test name </TH>                             need to
    > extract: Test name
    >
    > <TH class=name width=65>Name 2</TH>                                    need
    > to extract: Name 2
    >
    > Thanks in advance




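    #!/usr/local/bin/perl
    use strict;
    use warnings;
    local $\ = qq{\n};    # append a newline to every print

    # Matches one <...> tag; nested angle brackets are handled by recursion.
    my $np;
    $np = qr{
        [<]
        (?:
            (?> [^<>]+ )      # a run of non-bracket characters, no backtracking
            |
            (??{ $np })       # or a nested tag
        )*
        [>]
    }xms;

    my $var ='
    original text
    <TH class=name width=100>New name</TH>
    <TH class=name width=50>Test name </TH>
    need to
    <TH class=name width=65>Name 2</TH>
    need
    Thanks in advance
    ';

    # Find each tag, then capture from the end of that tag (\G) up to the next </TH>.
    while ($var =~ m/ $np /xmsg) {
        print $1 if $var =~ m/\G(.*?)<\/TH>/xmscg;
    }
    __END__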
    , Feb 21, 2011
    #3
  4. Mladen

    ccc31807 Guest

    Re: Please help me how is easiest way to extract text between some variable text

    On Feb 20, 1:33 pm, "Mladen" <> wrote:
    > Please help me how is easiest way to extract text between some variable text
    > <TH class=name width=100>New name</TH>       need to extract: New name


    A couple of weeks back, hymie! posted a thread entitled 'table -->
    pre'. He wanted to extract the content of an HTML table in order to
    preformat it. I posted the following script and output.

    Perl gives you a number of ways to do what you want, many of them
    simple-minded and primitive, others pretty sophisticated. I generally
    prefer the former: the more simple-minded and primitive, the better.
    You probably should approach a problem like this in an incremental
    fashion, by first matching the least possible amount of what you want,
    and adding to it little by little until you get what you want. You
    don't need to use a regular expression, index() and substr() will do
    the same kind of thing.
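
As a rough illustration of the index() and substr() route, built up one step at a time as suggested above, here is a minimal sketch. The helper name `th_text` is made up, and it assumes uppercase `<TH` tags with no `>` inside attribute values, as in the sample lines from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the first <TH ...> tag
# and the following </TH>, or undef when the pieces are not found.
sub th_text {
    my ($line) = @_;

    my $start = index($line, '<TH');             # step 1: find the opening tag
    return if $start < 0;

    my $gt = index($line, '>', $start);          # step 2: find where that tag ends
    return if $gt < 0;

    my $end = index($line, '</TH>', $gt + 1);    # step 3: find the closing tag
    return if $end < 0;

    # step 4: everything between the two positions is the wanted text
    return substr($line, $gt + 1, $end - $gt - 1);
}

for my $line (
    '<TH class=name width=100>New name</TH> need to extract: New name',
    '<TH class=name width=50>Test name </TH> need to extract: Test name',
    '<TH class=name width=65>Name 2</TH> need to extract: Name 2',
) {
    my $name = th_text($line);
    print "'$name'\n" if defined $name;
}
```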

    Other technologies will do the same kind of thing. I routinely do this
    in vi (vim), when I want to transfer some content from one function to
    another function, for instance, converting a SQL query to a hash
    declaration.

    CC.

    SCRIPT
    #! perl
    use strict;
    use warnings;

    my $content = '';
    while (<DATA>)
    {
        next unless /\w/;    # skip blank lines
        chomp;
        if ($_ =~ m!<(\/?)table!)
        {
            # <table> becomes <pre>, </table> becomes </pre>
            $content .= "<$1pre>";
            next;
        }
        elsif ($_ =~ m!<\/?tr!)
        {
            # every row boundary becomes a line break
            $content .= "\n";
            next;
        }
        elsif ($_ =~ m!<t[dh]>([^<]*)<\/t[dh]>!)
        {
            # each cell is left-aligned in a 20-character column
            $content .= sprintf("%-20s", $1);
            next;
        }
        else
        {
            warn "ERROR: $_\n";
        }
    }

    print $content;

    exit(0);

    __DATA__
    <table>
    <tr>
    <td>George</td>
    <td>Washington</td>
    <td>Virginia</td>
    <td>1788</td>
    </tr>
    <tr>
    <td>George</td>
    <td>Washington</td>
    <td>Virginia</td>
    <td>1792</td>
    </tr>
    <tr>
    <td>John</td>
    <td>Adams</td>
    <td>Massachusetts</td>
    <td>1796</td>
    </tr>
    <tr>
    <td>Thomas</td>
    <td>Jefferson</td>
    <td>Virginia</td>
    <td>1800</td>
    </tr>
    <tr>
    <td>Thomas</td>
    <td>Jefferson</td>
    <td>Virginia</td>
    <td>1804</td>
    </tr>
    </table>

    OUTPUT
    <pre>
    George              Washington          Virginia            1788

    George              Washington          Virginia            1792

    John                Adams               Massachusetts       1796

    Thomas              Jefferson           Virginia            1800

    Thomas              Jefferson           Virginia            1804
    </pre>
    ccc31807, Feb 21, 2011
    #4
  5. Mladen

    Guest

    On Sun, 20 Feb 2011 19:33:18 +0100, "Mladen" <> wrote:

    >Please help me how is easiest way to extract text between some variable text


    Output:
    'New name'
    'Test name '
    'Name 2'

    If you wish to run the @Contents elements through a sub-container to extract
    more, you must set up a sub that redefines the 'Container Expression' regex
    for each sub-container you need. There are variations on the theme of the
    container expressions, but this should at least get you started.

    -sln

    -------------
    i.e. (outline only; the full definitions follow below):
    my ($open, $close, $rx);
    my $comment = qr{ see below };
    my $attrib = qr{ see below };
    ...
    defineContainer ( '(?i:TH)' );
    ...
    defineContainer ( '(?i:TR)' );
    ...
    sub defineContainer {
        my $tag = shift;
        $open = qr{ see below <$tag ... };
        $close = qr{ see below </$tag ... };
        $rx = qr{ see below };
    }
    -------------------------
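
The outline above could be fleshed out roughly as follows. This is only a sketch: the `$comment` and `$attrib` primitives are copied from the full script in this post, while the helper names `make_container_rx` and `container_contents` are made up here, and the regex is returned from a sub instead of being stored in shared variables.

```perl
use strict;
use warnings;

# Primitives copied from the full script in this post.
my $comment = qr{(?xs)
    <! (?:\[CDATA\[.*?\]\]|--.*?--|\[[A-Z][A-Z\ ]*\[.*?\]\]) >
};
my $attrib = qr{(?x)
    (?:\s+ (?: [^>"'\/]* (?:"[^"]*"|'[^']*'|["']|(?:\/(?!>))?))++)
};

# Hypothetical helper: build the container regex for one tag name.
sub make_container_rx {
    my ($tag) = @_;
    my $open  = qr{(?x) <$tag (?: \s*|$attrib ) > };
    my $close = qr{(?x) </$tag \s*> };
    return qr{(?xs)
        $comment
      | (                   # group 1: the whole container, recursion target
          $open
          (                 # group 2: the container contents
            (?:
                $comment
              | (?:(?!$open|$close|$comment).)++
              | (?1)
            )*
          )
          $close
        )
    };
}

# Hypothetical helper: contents of every top-level container for a tag.
sub container_contents {
    my ($tag, $text) = @_;
    my $rx = make_container_rx($tag);
    my @contents;
    while ($text =~ /$rx/g) {
        # comment matches leave group 2 undefined, so skip them
        push @contents, $2 if defined $2;
    }
    return @contents;
}

my $row = '<TR><TH class=name width=100>New name</TH><TH>Name 2</TH></TR>';
for my $inner (container_contents('(?i:TR)', $row)) {
    print "'$_'\n" for container_contents('(?i:TH)', $inner);
}
```

Running the contents of one container through a second call is one way to handle the sub-container case described above.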

    use strict;
    use warnings;

    # Primitive Definitions
    #
    my $comment = qr{(?xs)
    <! (?:\[CDATA\[.*?\]\]|--.*?--|\[[A-Z][A-Z\ ]*\[.*?\]\]) >
    };

    my $attrib = qr{(?x)
    (?:\s+ (?: [^>"'\/]* (?:"[^"]*"|'[^']*'|["']|(?:\/(?!>))?))++)
    };

    my $open = qr{(?x) <TH (?: \s*|$attrib ) > };
    my $close = qr{(?x) </TH \s*> };

    # Container Expression
    #
    my $rx = qr{(?xs)
        $comment
      | (                   # Recursion group, the 'container'
          $open
          (                 # Container 'contents' to capture
            (?:
                $comment
              | (?:(?!$open|$close|$comment).)++
              | (?1)
            )*
          )
          $close
        )
    };

    # Parse Code
    #
    my $tog;
    my $text = join '', <DATA>;

    # Each match returns ($1, $2). The toggle keeps only every second value
    # ($2, the contents) and drops the undef pairs left by comment matches.
    my @Contents = map { !($tog=!$tog) && defined() ? $_ : () } $text =~ /$rx/g;

    for (@Contents) {
        print "'$_'\n";
    }


    __DATA__
    <TH class=name width=100>New name</TH> need to
    extract: New name

    <TH class=name width=50>Test name </TH> need to
    extract: Test name

    <TH class=name width=65>Name 2</TH> need
    to extract: Name 2
    , Feb 21, 2011
    #5
  6. Mladen

    Peter Scott Guest

    Re: Please help me how is easiest way to extract text between some variable text

    On Mon, 21 Feb 2011 07:38:33 -0600, Tad McClellan wrote:
    > <> wrote:
    >> }xms
    >    ^^
    >> ;
    >
    > The "m" modifier affects only the "^" and "$" anchors. It is a no-op if
    > your pattern does not contain those anchors.
    >
    > The "s" modifier affects only the "." metacharacter. It is a no-op if
    > your pattern does not contain that character.
    >
    > You should not enable special treatment if you are not going to make use
    > of that special treatment.


    The poster is following the principles in Damian Conway's "Perl Best
    Practices," which state: "Use the /xms flags on every regular expression
    you ever write [...] It takes about a week to accustom your fingers to
    automatically typing /xms on every [regex]..."
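
For readers who want to see what each flag in this exchange changes, here is a small sketch. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# /m changes only ^ and $: they also match at line boundaries inside the string.
my @without_m = $text =~ /^(\w+)/g;     # ^ matches at the start of the string only
my @with_m    = $text =~ /^(\w+)/mg;    # ^ matches at the start of every line

# /s changes only the dot: it also matches a newline.
my ($without_s) = $text =~ /first(.*)/;     # stops at the end of the first line
my ($with_s)    = $text =~ /first(.*)/s;    # runs on to the end of the string

# /x changes only the layout: whitespace and comments in the pattern are ignored.
my ($word) = $text =~ /
    ^ (\w+)     # first word of the string
    \s+ line    # followed by the word "line"
/x;

print "without /m: @without_m\n";
print "with /m:    @with_m\n";
print "without /s matched ", length($without_s), " characters\n";
print "with /s matched ", length($with_s), " characters\n";
print "with /x: $word\n";
```

A pattern with no `^`, `$` or `.` behaves identically with and without /m and /s, which is why both positions quoted above are consistent: the flags are harmless there, and whether to type them anyway is a style choice.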

    --
    Peter Scott
    http://www.perlmedic.com/ http://www.perldebugged.com/
    http://www.informit.com/store/product.aspx?isbn=0137001274
    http://www.oreillyschool.com/courses/perl3/
    Peter Scott, Feb 22, 2011
    #6
