Silly li'l perl script, plx improve.

George Orwell · Jan 3, 2007

#!/usr/bin/perl
# Public Domain
use strict;
use warnings;
open(IN, "$ARGV[0]") or die("No input file. Use perldoc $0 for more
info.\n");
my @in = <IN>;
if($ARGV[1]){
open(OUT, ">>", "$ARGV[1]");
select OUT;
}
# The regex below does all the work... (I'm sure you could write this
program in a one-liner...)
foreach(@in){
if($_=~m/(http:\/\/\S+\.[\w\d.\/_%]+)/){
print "$1\n";
}
}
close(IN);

Keith Keller · Jan 4, 2007

}
# The regex below does all the work... (I'm sure you could write this
program in a one-liner...)
foreach(@in){
if($_=~m/(http:\/\/\S+\.[\w\d.\/_%]+)/){
print "$1\n";
}

perl -n -e 'print "$1\n" if (m/(your_pattern_here)/)' infile > outfile

I'll let somebody else nitpick the regex. I will say that it's hard to
suggest improvements when you don't state what you expect to send to the
program or what you expect to be output. Have you read the Posting
Guidelines that are frequently posted here?

--keith

Eric Schwartz · Jan 4, 2007

George Orwell said:
#!/usr/bin/perl
# Public Domain
use strict;
use warnings;

Good so far.

open(IN, "$ARGV[0]") or die("No input file. Use perldoc $0 for more
info.\n");

You needlessly quote $ARGV[0] there; I'm not sure why. This can be
improved with lexical filehandles and the three-arg form of open(),
but major kudos for handling the failure with die. However, there are
more reasons for failing to open than the file not existing-- you may
not have read permissions, for instance. Just let Perl tell you why
it didn't work:

open my $infile, '<', $ARGV[0] or die "Couldn't open $ARGV[0]: $!";

my @in = <IN>;

You're only processing it line-by-line below, so why bother reading it
in all at once? If the file is large, this could use a lot more
memory than you are expecting. Omit this line entirely.

if($ARGV[1]){
open(OUT, ">>", "$ARGV[1]");
select OUT;
}

See above-- you need to check for failure to open the output file as
well, because you might not have permissions to write in that
directory, or the file could already exist, and you don't have
permissions to write to it, or any number of other things.

if($ARGV[1]){
open my $outfile, ">>", $ARGV[1])
or die "Couldn't open $ARGV[1] for output: $!";
select $outfile;
}

Note: because $outfile is declared in the if block, you can't refer to
it later. If you want to modify the program to use it later, you'll
need to declare $outfile outside the loop:

my $outfile;
if ($ARGV[1]) {....}

# The regex below does all the work... (I'm sure you could write this
program in a one-liner...)
foreach(@in){
if($_=~m/(http:\/\/\S+\.[\w\d.\/_%]+)/){
print "$1\n";
}
}
close(IN);

It's hard to tell what you want from your regex, as you are not
matching URLs with port numbers in them, or with GET parameters
(following a '?' character). I'll fix it in a VERY ugly way below.
Use HTML::LinkExtor instead.

# this reads one line at a time from $infile, which means you don't have
# to have the whole thing in memory at once.
# also, the regex is untested, and probably misses a few special cases--
# kinda rushed through rfc1808
while(<$infile>) {
print $1 if m|(http:// # protocol
[-\w.]+ # host
(?: :\d+)? # port number (optional)
/[\`-%$.+!*'(),\w

&=]* # path
(?: \?[\`-%$.+!*'(),\w;/?

&=]) # query
)
|igsmx;
}

Ugh! That's ugly (and probably broken, because I can't be arsed to
test it.)

Here's something pretty:

#!/usr/bin/perl
use warnings;
use strict;
use HTML::LinkExtor;

my $extractor = HTML::LinkExtor->new();

$extractor->parse_file($ARGV[0]);

if($ARGV[1]){
open my $outfile, ">>", $ARGV[1])
or die "Couldn't open $ARGV[1] for output: $!";
select $outfile;
}

print "Links found in $ARGV[0]:\n"
foreach ($extractor->links()) {
my ($tag, %links) = @{$_};
print join("\n", grep /^http:/ values %links);
}

NOTE: also untested, but I'm a heck of a lot more confident in that
one than I am in the previous one.

-=Eric

Klaus · Jan 4, 2007

George said:
#!/usr/bin/perl
# Public Domain
use strict;
use warnings;
open(IN, "$ARGV[0]") or die("No input file. Use perldoc $0 for more
info.\n");
my @in = <IN>;
if($ARGV[1]){
open(OUT, ">>", "$ARGV[1]");
select OUT;
}
# The regex below does all the work... (I'm sure you could write this
program in a one-liner...)
foreach(@in){
if($_=~m/(http:\/\/\S+\.[\w\d.\/_%]+)/){

I have some doubts about the second "." in the regexp (the one after
\d.), but I'll ignore my doubts...

print "$1\n";
}
}
close(IN);

#!/usr/bin/perl
# Public Domain
use strict;
use warnings;

defined $ARGV[0] or die "Error - No parameters";

my $OUT;
if (defined $ARGV[1]){
open $OUT, '>>', $ARGV[1] or die "Error - Open >>$ARGV[1]: $!";
select $OUT;
}

open my $IN, '<', $ARGV[0] or die "Error - Open <$ARGV[0]: $!";

while (<$IN>) {
# The regex below does all the work...
# (I'm sure you could write this program in a one-liner...)
if(m{(http://\S+\.[\w\d./_%]+)}){ print $1, "\n" }
}

Klaus · Jan 4, 2007

Eric said:
if($ARGV[1]){
open my $outfile, ">>", $ARGV[1])
or die "Couldn't open $ARGV[1] for output: $!";
select $outfile;
}

Note: because $outfile is declared in the if block, you can't refer to
it later. If you want to modify the program to use it later, you'll
need to declare $outfile outside the loop:

I agree, and I would like to add what I think will happen in the above
code fragment under the condition that $ARGV[1] is defined and not ""
and not "0" and not 0:

- the condition if($ARGV[1]) is true, so we enter the if-branch
- the file $ARGV[1] is opened as $outfile for output (append)
- and, given that the open was successful,
- sets the default to $outfile for all subsequent prints (select)
- the file $outfile is closed as the scope of the if exits

Now, for the rest of the program, all subsequent prints fail silently
because print now defaults to the closed filehandle $outfile.

I did not test the above scenario, but would like to ask anyway:
Am I right ?

my $outfile;
if ($ARGV[1]) {....}

Eric Schwartz · Jan 4, 2007

Klaus said:
Eric said:

if($ARGV[1]){
open my $outfile, ">>", $ARGV[1])
or die "Couldn't open $ARGV[1] for output: $!";
select $outfile;
}

Note: because $outfile is declared in the if block, you can't refer to
it later. If you want to modify the program to use it later, you'll
need to declare $outfile outside the loop:

Click to expand...

I agree, and I would like to add what I think will happen in the above
code fragment under the condition that $ARGV[1] is defined and not ""
and not "0" and not 0:

How about testing what does happen? It probably would have taken you
less time than it took to compose your post. That's not an insult, by
the way-- I'm just pointing out that reality trumps logic by a large
factor.

$ cat /tmp/foo
if ($ARGV[0]) {
open my $outfile, '>', $ARGV[0] or die "couldn't open: $!";
select $outfile;
}

for (0..20_000_000) {
print
}

emschwar@aragorn:~$ perl /tmp/foo /tmp/beh
emschwar@aragorn:~$ cat /tmp/beh
01234567891011121314151617181920212223242526272829303132.....

- the condition if($ARGV[1]) is true, so we enter the if-branch
- the file $ARGV[1] is opened as $outfile for output (append)
- and, given that the open was successful,
- sets the default to $outfile for all subsequent prints (select)

So far, you're correct.

- the file $outfile is closed as the scope of the if exits

Where do you get this last assertion? I don't find it anywhere in
perldoc -f open-- in fact, it suggests you use IO::File if you want
the behaviour you're asking for. In this case I cheated, and let the
OS close all open filehandles on exit for me. Yes, it's a bit
naughty, but not enough (in this context only) to keep me up at
nights. Toss in a

close();

at the end if you like.

Now, for the rest of the program, all subsequent prints fail silently
because print now defaults to the closed filehandle $outfile.

I did not test the above scenario, but would like to ask anyway:
Am I right ?

Why didn't you test it? It's very easy to create such a test (see
above, took me about 2 minutes), and you'd have answered your own
question before I even had my morning caffeine.

my $outfile;
if ($ARGV[1]) {....}

Click to expand...

That gives you a filehandle that is accessible in a much larger scope
than you need it. Which is not evil, but I prefer to keep things as
tightly scoped as possible.

-=Eric

Help me to Improve	11	Oct 7, 2011
Brocade Switch Perl Script	1	Aug 19, 2016
terminating fifo in perl	0	Dec 6, 2012
wrapper script calling anotherr script	1	Sep 15, 2012
Need help/advice to improve script	24	Jul 21, 2010
Need suggestions to improve code	2	Aug 24, 2013
Problem Splitting Text String	2	Dec 28, 2022
Push regex search result into hash with multiple values	14	May 19, 2014

Silly li'l perl script, plx improve.

George Orwell

Keith Keller

Eric Schwartz

Klaus

Klaus

Eric Schwartz

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads