best way to do this?

MJL

I'm sure this is not the most efficient way to accomplish my goal of
taking a file of text and converting it into a list of individual
words and punctuation symbols. It works, but I am curious about how
to do it differently. Thanks!

#!/usr/bin/perl
open INF, "./testfile1.txt";
while (<INF>)
{
@words = split;
push @list, @words;
}

foreach(@list)
{
/\S+\w+/;
if ($& ne "") {push @list2, "$&\n";}
if ($' ne "") {push @list2, "$'\n";}
}


open OUTF, ">./testfile2.txt";
print OUTF @list2;
close INF;
close OUTF;
 
Gunnar Hjalmarsson

MJL said:
I'm sure this is not the most efficient way to accomplish my goal
of taking a file of text and converting it into a list of
individual words and punctuation symbols. It works, but I am
curious about how to do it differently. Thanks!

#!/usr/bin/perl
open INF, "./testfile1.txt";
while (<INF>)
{
@words = split;
push @list, @words;
}

foreach(@list)
{
/\S+\w+/;
if ($& ne "") {push @list2, "$&\n";}
if ($' ne "") {push @list2, "$'\n";}
}


open OUTF, ">./testfile2.txt";
print OUTF @list2;
close INF;
close OUTF;

Well, I think this accomplishes the same thing, but without the @arrays:

#!/usr/bin/perl
use strict;
use warnings;
open INF, './testfile1.txt' or die $!;
open OUTF, '> ./testfile2.txt' or die $!;
while (<INF>) {
while( /(\S+\w+)(\S+)?/g ) {
print OUTF "$1\n";
print OUTF "$2\n" if $2;
}
}
close INF;
close OUTF;
__END__

Another thing is whether it actually does what you want...
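
To make that caveat concrete, here is a small sketch that runs the same pattern over a made-up sample line and collects what it captures. The sample line and the @tokens array are illustrative only, not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line, chosen to include a one-letter word
# and punctuation on both sides of a word.
my $line = 'I like (easy to use) code.';

my @tokens;
while ( $line =~ /(\S+\w+)(\S+)?/g ) {
    push @tokens, $1;                  # the "word" part
    push @tokens, $2 if defined $2;    # trailing non-space characters, if any
}
print "$_\n" for @tokens;
```

On this input the one-letter word "I" never matches, because \S+\w+ needs at least two characters, and the opening parenthesis stays glued to "easy". Only trailing punctuation is split off.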
 
Anno Siegel

MJL said:
I'm sure this is not the most efficient way to accomplish my goal of
taking a file of text and converting it into a list of individual
words and punctuation symbols. It works, but I am curious about how
to do it differently. Thanks!

#!/usr/bin/perl
open INF, "./testfile1.txt";
while (<INF>)
{
@words = split;
push @list, @words;
}

foreach(@list)
{
/\S+\w+/;
if ($& ne "") {push @list2, "$&\n";}
if ($' ne "") {push @list2, "$'\n";}
}


open OUTF, ">./testfile2.txt";
print OUTF @list2;
close INF;
close OUTF;

You can get more out of the first split if you split not only on
whitespace but on word boundaries too. That way the string separates
neatly into consecutive pieces of word characters and punctuation,
with blanks removed.

There is also no good reason to collect the parts first. You might
as well separate them right in the loop. So:

my ( @words, @punct);
while ( <DATA> ) {
for ( split /\s+|\b/ ) {
if ( /\w/ ) {
push @words, $_;
} else {
push @punct, $_;
}
}
}

or, in more compact form

while ( <DATA> ) {
push @{ /\w/ ? \ @words : \ @punct}, $_ for split /\s+|\b/;
}
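
For anyone who wants to try this without a data file, a self-contained sketch of the same split follows. The sample line is made up, and splitting a string instead of reading <DATA> is just an assumption made for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line.
my $line = 'practical (easy to use, efficient)';

my ( @words, @punct );
for my $piece ( split /\s+|\b/, $line ) {
    next if $piece eq '';    # defensive: skip any empty field
    if ( $piece =~ /\w/ ) { push @words, $piece }
    else                  { push @punct, $piece }
}

print "words: @words\n";
print "punct: @punct\n";
```

Note that adjacent symbols such as ")." would stay together as one piece, since there is neither whitespace nor a word boundary between them.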

Anno
 
MJL

Thanks to all for great alternatives! I am having a great time
running and dissecting all of these suggestions.

I should clarify my goal: I want to write a program that takes a text
file or a text string and turns it into an HTML file or string. Each
individual word is to become a link to a definition of that word.
Punctuation is to be excluded, of course, and each word is to be defined
only once. I wrote a version that works as a CGI program. It still
needs a little work. I apologize for any poor or inefficient use of
the language. This is not a homework assignment or anything. I'm
just playing around, trying to learn a little Perl. Thanks again!

#!/usr/bin/perl

# process a string and turn it into a webpage with internal links to
# definitions...

use CGI qw(:standard);

$_ = param("mytext");
@list = split;
foreach(@list)
{
/\S+\w+/;
if ($& ne "")
{
push @list2, "<a href=\"#defn_$&\">$&</a> \n";
$ins = "<a name=defn_$&>definition of $&:</a>\n\n<p>\n\n\n</p>\n<hr>\n\n";
$chk = 0;
foreach(@list4)
{
if ($_ eq $ins) {$chk = 1; last;}
}
if ($chk == 0)
{
push @list4, $ins;
}
}
if ($' ne "") {push @list2, "$'\n";}
}

print header(), start_html("definitions"), h1("Definitions");
foreach(@list2) {print;}
print h1("definitions");
foreach(@list4) {print;}
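
Since each word should be defined only once, a hash is a common way to do the bookkeeping instead of rescanning @list4 for every word. The following is only a sketch under assumptions: linkify() and its return shape are hypothetical, words are taken to be runs of \w characters, and the CGI and page-header parts are left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns references to the link markup and the
# definition stubs for a piece of text. Name and return shape are
# illustrative only.
sub linkify {
    my ($text) = @_;
    my ( @links, @defs, %seen );
    for my $word ( $text =~ /\w+/g ) {    # punctuation is simply not matched
        push @links, qq(<a href="#defn_${word}">${word}</a>);
        # %seen turns the "already defined?" check into one hash lookup.
        next if $seen{$word}++;
        push @defs, qq(<a name="defn_${word}">definition of ${word}:</a>\n<p></p>\n<hr>);
    }
    return ( \@links, \@defs );
}

my ( $links, $defs ) = linkify('the cat saw the other cat');
print "$_\n" for @$links, @$defs;
```

As written the lookup is case-sensitive, so "The" and "the" would get separate definitions; keying %seen on lc($word) would be one way to merge them if that is what is wanted.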
 
David K. Wall

message said:
bowsayge said:
Abigail said to us:

[ Splitting a file into words and symbols question ]
[snip]
The above line folds all consecutive words together.


Yes, now that Bowsayge removed the map() which prevented this.

What map()?
No, don't. Just leave the correct answer Abigail gave alone.

See below...

How is that correct? If I change INF to DATA to make it
self-contained:

<code>
my @list;
while (<DATA>) {
s/\s+//g;
push @list => map {"$_\n"} split /(\w+)/;
}
print @list;

__DATA__
The language is intended to be practical (easy to use,
efficient, complete) rather than beautiful (tiny,
elegant, minimal).
</code>


...then the above code produces this output:


<output>

Thelanguageisintendedtobepractical
(
easytouse
,

efficient
,
complete
)
ratherthanbeautiful
(
tiny
,

elegant
,
minimal
).
</output>


That doesn't look correct, and I was careful to cut-and-paste the
code from Abigail's post (not the followup), making only the change
mentioned. (INF to DATA)
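
For comparison, one way to sidestep the whitespace problem entirely is to match the tokens rather than delete whitespace and then split. This is a different technique from the code quoted above, offered only as a sketch. The pattern treats every non-word, non-space character as its own symbol, which is an assumption about what "punctuation symbols" should mean here, and the @lines array stands in for the input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for lines read from a file.
my @lines = ( "(tiny,\n", "elegant, minimal).\n" );

my @list;
for my $line (@lines) {
    # A token is either a run of word characters or a single symbol.
    # Whitespace is never matched, so words cannot run together.
    push @list, $line =~ /\w+|[^\w\s]/g;
}
print "$_\n" for @list;
```

With this pattern the closing ")." comes out as two separate entries rather than one.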
 
David K. Wall

message said:
David K. Wall said:
in message
Abigail said to us:

[ Splitting a file into words and symbols question ]
[snip]

Or you could do:

while (<INF>) {
s/\s+//g;

The above line folds all consecutive words together.


Yes, now that Bowsayge removed the map() which prevented this.

What map()?


The map() which he removed from Abigail's first example (which
works correctly).

Ah, OK. I thought you meant the second example instead of the
first. Never mind. :)
 
