regex newbie question

Discussion in 'Perl Misc' started by ZMAN, Mar 28, 2005.

  1. ZMAN

    ZMAN Guest

    Hello all!

    Reading in lines from a file.
    I want to ignore all the text before it gets to the line
    "<!--document_starts_here-->"
    and write out the remainder of the text to a file.

    This is the way I'm attempting this..
    Thanks in advance!
    BZ

    open DATAOUT, ">$data_file" or die "can't open $data_file $!";

    foreach $line (@lines)
    {

    if ($line =~ m/<!--document_starts_here-->/i)
    {
    print "This line contains the word : $line\n";

    #### write remainder of file out
    }

    print DATAOUT "$line";

    }


    close (DATAOUT)
     
    ZMAN, Mar 28, 2005
    #1

  2. "ZMAN" <> wrote in news:0nJ1e.21235$uw6.16103@trnddc06:

    > Reading in lines from a file.
    > I want to ignore all the text before it gets to the line
    > "<!--document_starts_here-->"
    > and write out the remainder of the text to a file.


    use strict;
    use warnings;

    are missing.

    > open DATAOUT, ">$data_file" or die "can't open $data_file $!";


    open my $data_out, '>', $data_file or die "Can't open $data_file: $!";

    See

    perldoc -q always

    The question, of course, is where are you reading the data in?

    > foreach $line (@lines)
    > {
    >
    > if ($line =~ m/<!--document_starts_here-->/i)
    > {
    > print "This line contains the word : $line\n";
    >
    > #### write remainder of file out
    > }
    >
    > print DATAOUT "$line";
    >
    > }
    >
    >
    > close (DATAOUT)


    Please post real code. Please see the posting guidelines for this group
    to find out how you can help others help you.

    It *seems* to me like you are initially slurping the file. There is no
    need for that.

    #! /usr/bin/perl

    use strict;
    use warnings;

    while(<DATA>) {
        next unless /<!--document_starts_here-->/i;
        while(<DATA>) {
            print;
        }
    }

    __END__
    <html>
    <head>
    <title>Test</title>
    </head>

    <!--document_starts_here-->
    <body>
    <h1>Some document</h1>
    <p>ya ba da ba doo</p>
    </body>
    </html>
     
    A. Sinan Unur, Mar 28, 2005
    #2

  3. ZMAN wrote:
    >
    > Reading in lines from a file.


    It looks like you are iterating over an array instead.


    > I want to ignore all the text before it gets to the line
    > "<!--document_starts_here-->"
    > and write out the remainder of the text to a file.


    If you were actually reading in from a file it would be a lot easier, like:

    while ( <DATAIN> ) {
        if ( /<!--document_starts_here-->/i .. eof DATAIN ) {
            print DATAOUT;
        }
    }
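
A minimal sketch of how the range (flip-flop) operator behaves in a loop like the one above. The helper name `lines_from_marker` and the in-memory filehandle are assumptions made for illustration and are not part of the thread. Note that with this form the marker line itself is included in the output.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the lines of $text
# from the first line matching the marker through end of input.
sub lines_from_marker {
    my ($text) = @_;

    # Read from a string through an in-memory filehandle so the sketch
    # needs no files on disk.
    open my $in, '<', \$text or die "Can't open in-memory handle: $!";

    my @kept;
    while (<$in>) {
        # The range operator is false until the left operand matches,
        # then stays true until the right operand (eof) becomes true.
        if ( /<!--document_starts_here-->/i .. eof $in ) {
            push @kept, $_;
        }
    }
    close $in;
    return @kept;
}

my @kept = lines_from_marker(
    "<html>\n<!--document_starts_here-->\n<body>\n</body>\n"
);
print @kept;
```

If the marker never appears, the left operand never becomes true and nothing is kept.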



    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Mar 28, 2005
    #3
  4. ZMAN <> wrote:

    > Reading in lines from a file.



    No you're not. Your code contains no input statements.


    > I want to ignore all the text before it gets to the line
    > "<!--document_starts_here-->"
    > and write out the remainder of the text to a file.



    while ( <> ) {
        last if $_ eq "<!--document_starts_here-->\n";
    }

    while ( <> ) {
        print;
    }


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Mar 28, 2005
    #4
  5. Tad McClellan wrote:
    > ZMAN <> wrote:
    >
    >>Reading in lines from a file.

    >
    > No you're not. Your code contains no input statements.
    >
    >>I want to ignore all the text before it gets to the line
    >>"<!--document_starts_here-->"
    >>and write out the remainder of the text to a file.

    >
    > while ( <> ) {
    > last if $_ eq "<!--document_starts_here-->\n";


    I think that the OP wanted a case insensitive match.

    > }
    >
    > while ( <> ) {
    > print;
    > }


    $_ = <> until /<!--document_starts_here-->/i;

    print while <>;


    # :)
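
One caveat with the `until` one-liner: if the marker never appears, the loop has no end-of-input check, so it keeps reading and matching against an undefined value. A hedged sketch of a guarded variant follows; the helper name `skip_to_marker` and the in-memory filehandle are assumptions for illustration only. Like the one-liner, it does not print the marker line itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read from $in until a line
# matches the marker. Returns 1 if the marker was found, 0 at end of input.
sub skip_to_marker {
    my ($in) = @_;
    while ( defined( my $line = <$in> ) ) {
        return 1 if $line =~ /<!--document_starts_here-->/i;
    }
    return 0;
}

my $text = "<head>\n<!--document_starts_here-->\n<body>\n";
open my $in, '<', \$text or die "Can't open in-memory handle: $!";

if ( skip_to_marker($in) ) {
    print while <$in>;    # everything after the marker line
}
close $in;
```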

    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Mar 28, 2005
    #5
  6. ZMAN

    ZMAN Guest

    John
    Thanks! That's just what I needed.
    My goal was to pull over some html, remove the PHP, write them to a temp
    dir, and then have SWISH index them.

    In my haste I didn't post all of my code.
    So, here it is, works great and thanks again!

    #!/usr/local/bin/perl -w
    use strict;

    my $dest_dir = "../../temp/";
    my $src_dir = "../../html/";

    mkdir("$dest_dir",0777) || die "cannot mkdir $dest_dir: $!";

    opendir(DIR, $src_dir);
    my @files = readdir(DIR);
    closedir(DIR);
    my $file;

    print "Indexing the following files\n";

    foreach $file (@files)
    {

        next if($file eq '.');
        next if($file eq '..');
        next if($file =~ /\.php/);
        next if($file =~ /\.inc/);
        next if($file =~ /\.pl/);
        next if($file =~ /\.cgi/);
        next if($file =~ /\.txt/);
        next if($file =~ /_R\.html$/);
        next if(! ($file =~ /\.html$/));

        print "$file\n";

        open(FILE, "<$src_dir$file") or die "Can't open $file : $!";

        open (DATAOUT, "> $dest_dir$file") or die "can't open $dest_dir$file $!";

        while ( <FILE> ) {
            if ( /<!--document_starts_here-->/i .. eof FILE ) {
                print DATAOUT;
            }
        }

    }
    close DATAOUT;

    close FILE;

    system("./swish -c swish.conf");
    system("rm -rf $dest_dir");
     
    ZMAN, Mar 29, 2005
    #6
  7. ZMAN <> wrote:

    > mkdir("$dest_dir",0777) || die "cannot mkdir $dest_dir: $!";



    perldoc -q vars

    What's wrong with always quoting "$vars"?

    mkdir($dest_dir, 0777) || die "cannot mkdir $dest_dir: $!";


    > opendir(DIR, $src_dir);



    You should test the return value from opendir() just like you did
    with mkdir().
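
A minimal sketch of a checked `opendir`, assuming a lexical directory handle in place of the bareword `DIR`. The helper name `list_dir` and the temporary directory are hypothetical and exist only to make the example runnable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): list directory entries,
# dying with the OS error if the directory cannot be opened.
sub list_dir {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "cannot opendir $dir: $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    @entries = sort @entries;
    return @entries;
}

# Demo against a throwaway directory so the sketch runs anywhere.
my $tmp = tempdir( CLEANUP => 1 );
open my $fh, '>', "$tmp/index.html" or die "Can't create $tmp/index.html: $!";
close $fh;

print "$_\n" for list_dir($tmp);

# A missing directory now fails loudly instead of yielding an empty list.
eval { list_dir("$tmp/no_such_dir"); 1 } or print "caught: $@";
```

Without the check, a failed `opendir` leaves `readdir` with nothing to read, and the loop over `@files` silently does no work.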


    > next if($file =~ /\.php/);
    > next if($file =~ /\.inc/);
    > next if($file =~ /\.pl/);
    > next if($file =~ /\.cgi/);
    > next if($file =~ /\.txt/);



    next if $file =~ /\.(php|inc|pl|cgi|txt)$/; # anchor to end of string


    > next if($file =~ /_R\.html$/);



    You anchored here, but not the earlier ones.


    > next if(! ($file =~ /\.html$/));



    next unless $file =~ /\.html$/;

    You don't need the earlier batch of tests if you have this
    one anyway...
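
A hedged sketch of the reduced filter: only the `.html` test and the `_R.html` exclusion remain. The helper name `wanted_html` and the sample file names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): keep names ending in .html,
# except those ending in _R.html.
sub wanted_html {
    return grep { /\.html$/ && !/_R\.html$/ } @_;
}

my @names = ( '.', '..', 'index.html', 'form.php', 'notes.txt', 'old_R.html' );
print "$_\n" for wanted_html(@names);    # prints "index.html"
```

One difference from the original batch of tests: because those patterns were unanchored, a name such as `page.php.html` was skipped before, whereas this filter keeps it.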



    [ snip TOFU ]

    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Mar 29, 2005
    #7