Trim whitespace with cookbook recipe does not result in trimmed array

io

Kind folks of Perlidom-

My intent is to slurp a big text file (say, a chapter of English
literature). I then want to trim all the whitespace and
newlines, so I get an array of compact text. I've looked into the
Perl Cookbook, and came up with this.
However, it doesn't work. The array slurped up has as many spaces as
the original. In fact, it looks the same.

####################################
#!/usr/bin/perl
use warnings;

open INPUT, "textfile" or die $1;
my @data;

while(<INPUT>) {
my $fh = <INPUT>;
chomp $fh;
$fh =~ s/^\s+//;
$fh =~ s/\+$//;
push @data, $fh;
}

print @data, "\n";
###################################

Is the issue the print statement? Is it the stream? Is it a scope
issue with push @data?

I don't get it! Please help...

TIA.
 
usenet

io said:
#!/usr/bin/perl
use warnings;

Using warnings() is good, but it's only part of what's really important
here - you forgot to use strict(); which is THE most important
statement in any Perl program.
open INPUT, "textfile" or die $1;

What do you think $1 will contain here? Check out

perldoc perlvar

and look for variables related to "error" (especially $!)
my @data;

while(<INPUT>) {
my $fh = <INPUT>;

Make up your mind how you want to read the file. Each time you say
<INPUT> you read from the file. I think you want to leave off the
second statement completely and just do:

while (my $fh = <INPUT>) {

($fh is a terrible name for the variable, BTW, since it usually means
"file handle" which it is not - the filehandle here is called *INPUT).
chomp $fh;
$fh =~ s/^\s+//;

OK, fine, strip off trailing linefeed and leading whitespace.
$fh =~ s/\+$//;

huh? Did you cut-and-paste that, or are you re-typing code into usenet
(bad idea). If that is actual cut-and-paste, be advised that you are
removing all trailing plus signs.
push @data, $fh;
}

print @data, "\n";
###################################

Your main problem is the two <INPUT> reads. Secondary problem is the
faulty second s/// statement.
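A minimal sketch of the loop with both fixes applied: one read per iteration, and \s+ in the trailing-whitespace substitution. To keep it self-contained, it reads from an in-memory string through a lexical filehandle. The sample text and the names $sample, $in and $line are assumptions standing in for "textfile" and the original variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input standing in for the contents of "textfile".
my $sample = "   It was the best of times,   \n\tit was the worst of times.  \n";

# Open the string as a read-only in-memory filehandle.
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my @data;
while (my $line = <$in>) {    # exactly ONE read per iteration, into $line
    $line =~ s/^\s+//;        # strip leading whitespace
    $line =~ s/\s+$//;        # strip trailing whitespace, newline included
    push @data, $line;
}
close $in;

print "$_\n" for @data;
```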
 
it_says_BALLS_on_your_forehead

Using warnings() is good, but it's only part of what's really important
here - you forgot to use strict(); which is THE most important
statement in any Perl program.


What do you think $1 will contain here? Check out

perldoc perlvar

and look for variables related to "error" (especially $!)


Make up your mind how you want to read the file. Each time you say
<INPUT> you read from the file. I think you want to leave off the
second statement completely and just do:

while (my $fh = <INPUT>) {

($fh is a terrible name for the variable, BTW, since it usually means
"file handle" which it is not - the filehandle here is called *INPUT).


OK, fine, strip off trailing linefeed and leading whitespace.
actually, this is trimming *leading* whitespace...
 
it_says_BALLS_on_your_forehead

it_says_BALLS_on_your_forehead said:
actually, this is trimming *leading* whitespace...
....which is exactly what you said. perhaps i should learn to read.
 
it_says_BALLS_on_your_forehead

io said:
Kind folks of Perlidom-

My intent is to slurp a big text file (say, a chapter of English
literature). I then want to trim all the whitespace and
newlines, so I get an array of compact text. I've looked into the
Perl Cookbook, and came up with this.
However, it doesn't work. The array slurped up has as many spaces as
the original. In fact, it looks the same.

####################################
#!/usr/bin/perl
use warnings;

open INPUT, "textfile" or die $1;

i think you missed the shift key :)

open INPUT, "textfile" or die $!;

although really this should be:

my $file = 'textfile';
open ( my $fh, '<', $file ) or die "can't open $file: $!\n";
# now s/INPUT/\$fh/g;
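The following small sketch shows why $! is the variable to report, using the three-argument open with a lexical handle. The path is hypothetical and is assumed not to exist, so the open fails. In that case $! holds the operating system's reason for the failure. By contrast, $1 only ever holds the first capture group of the last successful regex match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical path, assumed NOT to exist, so that open() fails.
my $file = '/no/such/dir/textfile';

my $error;
if (open my $fh, '<', $file) {
    close $fh;
} else {
    # $! holds the OS error text (e.g. "No such file or directory");
    # $1 would be unrelated: it belongs to the last successful regex match.
    $error = "can't open $file: $!";
}

print defined $error ? "$error\n" : "opened $file\n";
```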
my @data;

while(<INPUT>) {
my $fh = <INPUT>;
chomp $fh;
$fh =~ s/^\s+//;
$fh =~ s/\+$//;

i think you mean:
$fh =~ s/\s+$//; # i don't think you need the chomp if you're doing this...
push @data, $fh;
}

print @data, "\n";
###################################
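Here is a quick check of the claim that the chomp becomes redundant. Because \s matches "\n", s/\s+$// removes the line ending along with any trailing blanks. The input line is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input line, as read from a file: trailing spaces plus newline.
my $line = "  some text  \n";

my $chomped = $line;
chomp $chomped;            # removes only the trailing "\n"
$chomped =~ s/\s+$//;      # then removes the trailing spaces

my $trimmed = $line;
$trimmed =~ s/\s+$//;      # \s also matches "\n", so this alone is enough

print "[$chomped] [$trimmed]\n";   # both still keep the two leading spaces
```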

Is the issue the print statement? Is it the stream? Is it a scope
issue with push @data?

I don't get it! Please help...

technically, the above code does not 'slurp'. you are performing
line-by-line processing. check out Uri Guttman's article on slurping:
http://www.perl.com/pub/a/2003/11/21/slurp.html
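For comparison, here is a sketch of actually slurping, without File::Slurp. Reading a filehandle in list context returns all lines at once. Undefining $/ (the input record separator) returns the whole file as a single string. An in-memory string stands in for the file here, and the trimming is done afterwards on the whole array.

```perl
use strict;
use warnings;

# Hypothetical text standing in for a chapter of a book.
my $chapter = "Line one.  \n   Line two.\n\nLine four.\n";

# List context: one read returns every line, each still ending in "\n".
open my $fh, '<', \$chapter or die "Cannot open: $!";
my @lines = <$fh>;
close $fh;

# Scalar context with $/ undefined: the whole file as one string.
open $fh, '<', \$chapter or die "Cannot open: $!";
my $all = do { local $/; <$fh> };
close $fh;

# Trim each line after slurping, rather than while reading.
s/^\s+//, s/\s+$// for @lines;

printf "%d lines; %d characters in the slurped string\n", scalar @lines, length $all;
```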
 
io

io wrote:

What do you think $1 will contain here? Check out

Your main problem is the two <INPUT> reads. Secondary problem is the
faulty second s/// statement.

Hi --

I followed your recommendations. I still get an unmodified array when I print it.
I'm really confused as to why no destructive modification was made to
the array. I still get output with spaces.
Maybe I formulated the problem in a bad way...I'd like to have *no*
spaces between characters, as well as no carriage return (Unix,
Windows, etc). Could that be the problem?
I'll 'fess up that I don't grok regexes yet.


#!/usr/bin/perl
use warnings;
use strict;

open INPUT, "textfile" or die $!;
my @data;

my @element;
while(my $file = <INPUT>) { # filehandle in *INPUT
# chomp $file; # don't need the chomp because if the second regex
$file =~ s/^\s+//;
push @data, $file;
$file =~ s/\s+$//;
push @data, $file;


my $element;
foreach $element (@data) {
print $element;
}
}


Any ideas anyone?

TIA.
 
Paul Lalli

io said:
I followed your recommendations. I still get an unmodified array when I print it.

You have not made any attempt at modifying an array. What part of your
code did you think was doing this?
I'm really confused as to why no destructive modification was made to
the array. I still get output with spaces.
Maybe I formulated the problem in a bad way...I'd like to have *no*
spaces between characters, as well as no carriage return (Unix,
Windows, etc). Could that be the problem?

What problem?
I'll 'fess up that I don't grok regexes yet.

Your problem is that you are copy and pasting code that you've seen
elsewhere without knowing what it does.
#!/usr/bin/perl
use warnings;
use strict;

open INPUT, "textfile" or die $!;
my @data;

my @element;
while(my $file = <INPUT>) { # filehandle in *INPUT
# chomp $file; # don't need the chomp because if the second regex
$file =~ s/^\s+//;

This removes all space from the BEGINNING OF THE LINE
push @data, $file;

This adds the current line to @data.
$file =~ s/\s+$//;

This removes all space from the END OF THE LINE
push @data, $file;

This adds the SAME line, now with ending spaces removed, to @data.
my $element;
foreach $element (@data) {

This loops through each line that you put into @data (which you did
twice for each line)
print $element;

This prints each element, one per loop

Your for loop is inside your while loop. You are printing the entire
contents of @data once for every line of the file, meaning you are
getting output similar to:
line 1
line 1
line 2
line 1
line 2
line 3
line 1
line 2
line 3
line 4
}
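The effect Paul describes can be measured. This sketch uses a hypothetical four-line input and captures output in in-memory filehandles instead of STDOUT. It prints @data from inside the read loop, as the posted code does, and then again once after the loop has finished.

```perl
use strict;
use warnings;

# Hypothetical input: four lines, as they would come from the file.
my @input = map { "line $_\n" } 1 .. 4;

# Capture printed output in strings so the two layouts can be compared.
my ($inside, $after) = ('', '');
open my $out_inside, '>', \$inside or die "Cannot open: $!";
open my $out_after,  '>', \$after  or die "Cannot open: $!";

# Layout from the question: the print loop sits INSIDE the read loop.
my @data;
for my $line (@input) {
    push @data, $line;
    print {$out_inside} $_ for @data;   # reprints everything read so far
}

# Corrected layout: finish reading first, then print once.
print {$out_after} $_ for @data;

close $out_inside;
close $out_after;

my $n_inside = () = $inside =~ /\n/g;   # count printed lines
my $n_after  = () = $after  =~ /\n/g;
print "inside: $n_inside lines, after: $n_after lines\n";
```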


Any ideas anyone?

Yes. Learn. Do not simply copy and paste. Make an effort to
understand what your program is doing.

Go read:
perldoc perlretut
perldoc perlre
perldoc perlreref

To remove *all* spaces from a scalar value, regardless of where:

$file =~ s/\s+//g;
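To see what the /g modifier contributes, compare the same substitution with and without it. The input line below is invented for the example.

```perl
use strict;
use warnings;

my $line = "  Then we turned,\tand took\r\n";   # hypothetical line with mixed whitespace

(my $first_only = $line) =~ s/\s+//;    # no /g: only the FIRST run of whitespace goes
(my $everything = $line) =~ s/\s+//g;   # /g: every run, wherever it is in the string

print "[$first_only]\n[$everything]\n";
```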


Please go read the Posting Guidelines for this group, and follow their
advice. Specifically, show your sample input, your desired output, and
your actual output. Post a *self-contained* program, using the
__DATA__ marker and <DATA> pseudo-filehandle, rather than just asking
us to accept the fact that you have an input file that you've opened
for reading.

Paul Lalli
 
Ch Lamprecht

io said:
Hi --

I followed your recommendations. I still get an unmodified array when I print it.
I'm really confused as to why no destructive modification was made to
the array. I still get output with spaces.
Maybe I formulated the problem in a bad way...I'd like to have *no*
spaces between characters, as well as no carriage return (Unix,
Windows, etc).

Hi,
I can hardly believe that this really is what you want:
No spaces, no newlines...

use warnings;
use strict;

my $text;

while(my $file = <DATA>) {
$file =~ s/\s+//g;
$text.=$file;
}
print $text;

__DATA__
Hi --

I undertook your recommendations. I still get an unmodified array when
I print it.
I'm really confused as to why no detructive modification was made to
the array. I stil get an output with spaces.
Maybe I formulated the problem in a bad way...I'd like to have *no*
spaces between characters, as well as no carriage return (Unix,
Windows, etc). Could taht be the problem?
I'll 'fess up that I don't grok regexes yet.
 
it_says_BALLS_on_your_forehead

Ch said:
Hi,
I can hardly believe that this really is what you want:
No spaces, no newlines...

use warnings;
use strict;

my $text;

while(my $file = <DATA>) {
$file =~ s/\s+//g;
$text.=$file;
}
print $text;

i agree. i thought about suggesting s/\s+//g, but that wouldn't really
be *trimming* whitespace, that would be removing it altogether.

what we need is a more exact definition of your problem/goal.
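The three readings of the request give different results on the same input. In this sketch, which uses invented input, trimming removes whitespace only at the ends of the line, and removing deletes every whitespace character. Collapsing is an extra option that nobody in the thread proposed: it trims the line and then turns each interior run of whitespace into a single space.

```perl
use strict;
use warnings;

my $line = "   the   children of Esau  \n";    # hypothetical input line

(my $trimmed   = $line)    =~ s/^\s+|\s+$//g;   # trim: the two ends only
(my $removed   = $line)    =~ s/\s+//g;         # remove: every whitespace char
(my $collapsed = $trimmed) =~ s/\s+/ /g;        # collapse: one space per run

print "[$trimmed]\n[$removed]\n[$collapsed]\n";
```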
 
SomeDude

On Sun, 12 Feb 2006 18:50:16 -0200, io wrote:
Hi --

I followed your recommendations. I still get an unmodified array when I print it.
I'm really confused as to why no destructive modification was made to


Thanks for your answers guys.
Just a note: don't go assuming a newbie hasn't read.
The semantics of while loops and filehandles are not trivial (e.g., when
the implicit assignment to the global $_ happens), and I lost a good chunk of my
afternoon in the Camel book trying to understand and test some things.

Please look up DATA in the Camel book and check what page
it is on. You can't expect a newbie to know that. If you expect
messages to be posted with data "in place", then don't complain to newbies;
modify the Posting Guidelines, where no recommendation to use <DATA> and
__DATA__ can be found. Those are in Chapter 10 of a very thick Perl book
(Ed Peschko's).

http://groups.google.de/group/comp....?q=POsting+Guidelines&rnum=3#cc5d6f2ea37a3190

Yes, the regex code was pasted from the Cookbook, that's what it's for,
but my English comprehension and my limited regex got me stuck.
 
Tad McClellan

My intent is to slurp a big text file (say, a chapter of English
literature). I then want to trim all the whitespace and
newlines,


Here you say _all_ whitespace, but your code appears to be trying
to delete only leading and trailing whitespace.

Which is it?

The value of the implementation is directly proportional to
the value of the specification you know.

so I get an array of compact text.


The array slurped up has as many spaces as
the original. In fact, it looks the same.
^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^

I am afraid that I do not believe you...

####################################
#!/usr/bin/perl
use warnings;

open INPUT, "textfile" or die $1;
^
^
You missed the SHIFT key there.

It should be $! not $1.

while(<INPUT>) {


Here you read a line into the $_ variable, but you never output
it, so you should be missing every other line in your output,
ie. it won't "look the same" as the input file.

my $fh = <INPUT>;


Here you read a 2nd line (you read 2 lines for each loop iteration).

I'd say that $fh is a really poor choice of name for something
that is not a filehandle.
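Running the two-reads-per-iteration pattern on a small invented input shows the symptom Tad predicts. The line read by while(<...>) lands in $_ and is thrown away, so only every second line survives.

```perl
use strict;
use warnings;

# Hypothetical four-line input.
my $text = "one\ntwo\nthree\nfour\n";
open my $in, '<', \$text or die "Cannot open: $!";

my @kept;
while (<$in>) {              # 1st read: the line goes into $_ and is never used
    my $line = <$in>;        # 2nd read: the NEXT line
    last unless defined $line;
    chomp $line;
    push @kept, $line;
}
close $in;

print "@kept\n";
```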

$fh =~ s/\+$//;

Here you miss an "s": that should be \s+, not \+.

Please take more care in composing your posts, it is wasteful
to submit the wrong stuff to hundreds of people.

Is the issue the print statement?

No.


Is it the stream?

No.


Is it a scope
issue with push @data?

No.


I don't get it!


Perl is doing exactly what you told it to do.

Please help...


Tell Perl to do something else instead.
 
Uri Guttman

TM> my @data = map { s/^\s+//; s/\s+$// } <INPUT>; # untested

tad, i am ashamed of you for posting that line. :) you know the map block
returns its last value and not the original in $_? and even though that
is not in void context, i eschew any side effects in map/grep as they
are meant to be functional in style. so i would slurp first and trim
later:

use File::Slurp;

my @data = read_file( 'whatever_file' ) ;
s/^\s+//, s/\s+$// for @data ;
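Uri's point about map can be seen directly. In this sketch, which uses invented input, the value of the map block is the result of its last s///. That result is a substitution count or a false value, not the trimmed string. A side-effect-free alternative uses the /r modifier, which returns the modified copy. Note that /r is newer than this thread and requires perl 5.14 or later.

```perl
use strict;
use warnings;

my @lines = ("  alpha  \n", "beta");   # hypothetical input lines

# map yields the value of the LAST statement in its block: here the return
# value of s/// (a count, or false when nothing matched), not the string.
# @{[ @lines ]} makes a copy so @lines itself is left untouched.
my @from_map = map { s/^\s+//; s/\s+$// } @{[ @lines ]};

# Side-effect-free alternative: /r returns a trimmed copy (perl 5.14+).
my @trimmed = map { s/^\s+|\s+$//gr } @lines;

print "from map: ", scalar @from_map, " values; trimmed: @trimmed\n";
```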
TM> ^^^^^^^^^^^^^^^^^
TM> ^^^^^^^^^^^^^^^^^

TM> I am afraid that I do not believe you...

i don't either. since he is not being clear about his goal of removing
whitespace how could we trust his opinion of bad output? we don't even
have a proper spec to test against.

TM> Tell Perl to do something else instead.

and be accurate in telling us what you actually want done (best with
input and expected output examples) and why you think it is not
working. otherwise we are doing brain surgery on you while wearing
boxing gloves. do you want that?

uri
 
A. Sinan Unur

On Sun, 12 Feb 2006 18:50:16 -0200, io wrote:

....

If you expect
messages to be posted with data "in place", then don't complain to
newbies; modify the Posting Guidelines, where no recommendation to use
<DATA> and __DATA__ can be found.

That is a blatant lie. See the subsection with the title "Provide enough
information".

*PLONK*

Sinan
 
Paul Lalli

SomeDude said:
Thanks for your answers guys.
Just a note: don't go assuming a newbie hasn't read.

That is the only possible assumption when "a newbie" gives no
indication otherwise.
The semantics of while loops and filehandles are not trivial (e.g., when
the implicit assignment to the global $_ happens), and I lost a good chunk of my
afternoon in the Camel book trying to understand and test some things.

Please look up DATA in the Camel book and check what page
it is on.

Okay. Let's see, the index of my Camel book has an entry "DATA
filehandle" which points me to the page on special variables, which
says:

DATA

[PKG] This special filehandle refers to anything following either
the __END__ token or the __DATA__ token in the current file. The
__END__ token always opens the main::DATA filehandle, and so is used in
the main program. The __DATA__ token opens the DATA handle in whichever
package is in effect at the time, so different modules can each have
their own DATA filehandle, since they (presumably) have different
package names.
You can't expect a newbie to know that.

I beg to differ.
If you expect
messages to be posted with data "in place", then don't complain to newbies;
modify the Posting Guidelines, where no recommendation to use <DATA> and
__DATA__ can be found.

Ahem. You are either mistaken or outright lying. Go read the Posting
Guidelines again.

Tad McClellan wrote (hundreds of times) :
Describe *precisely* the input to your program. Also provide example
input data for your program. If you need to show file input, use the
__DATA__ token (perldata.pod) to provide the file contents inside of
your Perl program.

It tells you what to do, and gives you the pointer to precisely where
__DATA__ is described.
Those are in Chapter 10 of a very thick Perl book
(Ed Peschko's).

Never heard of him. Perhaps you need a better book. And why are you
talking about this book when you just asked us to go look it up in the
Camel?
http://groups.google.de/group/comp....?q=POsting+Guidelines&rnum=3#cc5d6f2ea37a3190

Yes, the regex code was pasted from the Cookbook, that's what it's for,

No, it's really not. It's for helping you understand how to make your
own Perl programs. It is not for blind copy and pastes.

This was an excellent way to get yourself plonked by many of the most
knowledgeable and helpful people in this newsgroup, by the way. Your
response was a very unfortunate choice to have made. The correct
response was "Oh, I'm sorry, I'll go re-read the Posting Guidelines and
fix my posts in the future."

Fare thee well,
Paul Lalli
 
Tad McClellan

io said:
I'd like to have *no*
spaces between characters,
I'll 'fess up that I don't grok regexes yet.


That's OK, because you do not need regexes to accomplish that task:

$str =~ tr/ \n\r\f\t//d; # delete _all_ whitespace characters


There is no regular expression there.
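Here is a sketch of tr///d in action, together with its counting mode. The input string is invented. Note that the search list covers only the five characters written in it, whereas \s in a regex can also match other whitespace characters.

```perl
use strict;
use warnings;

my $str = "Ye have\tcompassed\r\nthis mountain\n";   # hypothetical input

(my $deleted = $str) =~ tr/ \n\r\f\t//d;   # /d: delete every listed character
my $count = ($str =~ tr/ \n\r\f\t//);      # no /d, empty replacement: just counts

print "$deleted ($count whitespace characters)\n";
```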
 
SomeDude

*PLONK*

Sinan

Yeah, OK, my bad. Here it is:

Describe *precisely* the input to your program. Also provide example
input data for your program. If you need to show file input, use the
__DATA__ token (perldata.pod) to provide the file contents inside of
your Perl program.

It was like a foreign language to me. I don't think that was well written.
A little example might've gone a long way.
 
SomeDude

On Mon, 13 Feb 2006 04:59:48 -0800, Paul Lalli wrote:
SomeDude wrote:

Okay. Let's see, the index of my Camel book has an entry "DATA
filehandle" which points me to the page on special variables, which
says:
Which is page? Get serious. Or modify the Posting Guidelines. The way
it's written would get one to flunk essay writings in college. IMHO.
IT's called communicating clearly, and it goes a long, long way in
corporations and other well-paying jobs.

Anyways, this is what I wanted to do. I decided to use a suggestion that
used a string, it was faster than any modification I would've done, plus
I can use unpack easily.

Cheers and thanks very much and sorry for any misunderstandings.


#!/usr/bin/perl
use warnings;
use strict;

# When not using __DATA__
#open INPUT, "textfile" or die $!;

my $text;
my @chars;
my @array;

while(my $file = <DATA>) {
$file =~ s/\s+//g; # All whitespace (including newlines) removed globally
$text.=$file;
}
print $text;
print "\n---------------------------------------\n";

# Turn $text into an array of ASCII values (one number per character)
@array=unpack("C*", $text);
print "@array\n";


# turn string into an array of single characters;
# printing "@chars" separates them with spaces
@chars = split //, $text;
# just print it;
print "\n--------------------------------------\n";
print "@chars\n";



__DATA__
Deuteronomy, chapter 2


Compare with Revised Standard Version: Deut.02



1: Then we turned, and took our journey into the wilderness by the way of the Red sea, as the LORD spake unto me: and we compassed mount Seir many days.
2: And the LORD spake unto me, saying,
3: Ye have compassed this mountain long enough: turn you northward.
4: And command thou the people, saying, Ye are to pass through the coast of your brethren the children of Esau, which dwell in Seir; and they shall be afraid of you: take ye good heed unto yourselves therefore:
 
Tad McClellan

SomeDude said:
Just a note: don't go assuming a newbie hasn't read.


So you have so much experience with newbie postings that
you can conclude that with confidence? (rhetorical question)

People are likely to assume the most common case, whether you ask
them to assume the rare case or not.

It isn't your fault that the overwhelming majority of newbies
post before attempting any reading, but you still get to take
the heat. :-(

It is unfortunate, but it is also the reality.

Please look up DATA in the Camel book


I don't care what the Camel book says, it is only backup, it is
not the authority. I care what the real authority says, namely
the standard docs that ship with perl.

If you are programming in Perl, then you have surely read about
the data types available in the language.

The DATA token is described about halfway through perldata.pod.

You can't expect a newbie to know that.


I can expect the newbie to go away, read it, and then come back though.

If you expect
messages to be posted with data "in place", then don't complain to newbies;
modify the Posting Guidelines, where no recommendation to use <DATA> and
__DATA__ can be found.


... If you need to show file input, use the __DATA__
token (perldata.pod) ...

That looks like both a recommendation and a reference to me.
 
Matt Garrish

SomeDude said:
On Mon, 13 Feb 2006 04:59:48 -0800, Paul Lalli wrote:

Which is page? Get serious. Or modify the Posting Guidelines. The way
it's written would get one to flunk essay writings in college. IMHO.
IT's called communicating clearly, and it goes a long, long way in
corporations and other well-paying jobs.

Hmm, pot calling the kettle black...

Matt
 
Paul Lalli

SomeDude said:
On Mon, 13 Feb 2006 04:59:48 -0800, Paul Lalli wrote:

Which is page?

Why would anyone care what page something in the Camel is on? The
Camel is a reference. Do you care what page an entry is on when you
look something up in an encyclopedia or a dictionary?
Get serious. Or modify the Posting Guidelines.

The Posting Guidelines - at the very least this section of them - are
perfectly fine. They tell you what to do, and if you don't understand
the instruction, they tell you precisely where to get more information
about the instruction. The fact that you didn't bother reading them, or
didn't read them carefully enough, is your fault, not the Guidelines'.
The way
it's written would get one to flunk essay writings in college. IMHO.

Fortunately, the guidelines are not an essay.
IT's called communicating clearly, and it goes a long, long way in
corporations and other well-paying jobs.

Gee. I guess I imagined going to my job today, working at a national
banking corporation.
Anyways, this is what I wanted to do. I decided to use a suggestion that
used a string, it was faster than any modification I would've done, plus
I can use unpack easily.

Cheers and thanks very much and sorry for any misunderstandings.


#!/usr/bin/perl
use warnings;
use strict;

# When not using __DATA__
#open INPUT, "textfile" or die $!;

my $text;
my @chars;
my @array;

while(my $file = <DATA>) {
$file =~ s/\s+//g; # All whitespace (including newlines) removed globally
$text.=$file;
}

Why are you using four lines when two will do? More explicitly, why
are you forcing Perl to do all these consecutive reads and
substitutions? Why not just one?
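One possible reading of the shorter version is to slurp the whole input with $/ undefined and then run a single s/\s+//g over it. This is an assumption, since Paul does not spell out his version. The original program would read <DATA>; in this sketch, an in-memory string stands in for it.

```perl
use strict;
use warnings;

# Hypothetical text standing in for the DATA section / input file.
my $input = "1: Then we turned,\n\n2: And the LORD spake unto me, saying,\n";
open my $in, '<', \$input or die "Cannot open: $!";

# One read (slurp with $/ undefined), then one substitution over all of it.
(my $text = do { local $/; <$in> }) =~ s/\s+//g;
close $in;

print "$text\n";
```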

print $text;
print "\n---------------------------------------\n";

# Turn $text into an array of ASCII values (one number per character)
@array=unpack("C*", $text);
print "@array\n";


# turn string into an array of single characters;
# printing "@chars" separates them with spaces
@chars = split //, $text;
# just print it;
print "\n--------------------------------------\n";
print "@chars\n";

If you had ever once *said* that was your goal, it's quite likely
someone would have helped you achieve that result about 5 posts ago.

Go read the Posting Guidelines. Again.

Paul Lalli
 
