equivalent of chomp in perl

Keith Thompson

[with respect to perl chomp]
Are you sure? chomp is used for a specific purpose, which is removing
the newline from a whole line read from input.

That would be chop, not chomp. chomp removes the newline only
if the newline is there.

Yes, which is consistent with Jordan's description.

chop (which is the older function) blindly removes the last character
of a string, whether it's a newline or not.
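For comparison with the C versions discussed in this thread, here is a minimal sketch of a chop-like function in C. The name chop_last is hypothetical. Like Perl's chop, it removes the last character unconditionally and returns it. Returning 0 for an empty string is an assumption of this sketch, since there is nothing to remove.

```c
#include <string.h>

/* Hypothetical C analogue of Perl's chop: removes the last character
   whether or not it is a newline.  Returns the removed character (as an
   unsigned char value), or 0 if the string was already empty. */
int chop_last(char *s)
{
    size_t len = strlen(s);
    int removed;

    if (len == 0)
        return 0;
    removed = (unsigned char)s[len - 1];
    s[len - 1] = '\0';
    return removed;
}
```

Applied to a buffer holding "x\n" or one holding "xy", chop_last shortens either by one character; only a chomp checks what it is removing.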

More details to follow elsethread.
 
Keith Thompson

Jordan Abel said:
Anyway, in answer to the question

#include <string.h>

void
chomp(char *x) {
    char *p = strrchr(x,'\n');
    if(p) *p = 0;
}

Not quite.

Let's look at the definition from the Camel Book (the canonical book
on Perl, comparable to K&R). This is arguably a little off-topic, but
it's a good example of the pitfalls of emulating features of one
language in another language.

chomp VARIABLE
chomp LIST
chomp

This function (normally) deletes a trailing newline from the end
of a string contained in a variable. This is a slightly safer
version of chop (described next) in that it has no effect upon a
string that doesn't end in a newline. More specifically, it
deletes the terminating string corresponding to the current value
of $/, and not just any last character.

Unlike chop, chomp returns the number of characters deleted. If $/
is "" (in paragraph mode), chomp removes all trailing newlines
from the selected string (or strings, if chomping a LIST). You
cannot chomp a literal, only a variable.

Some of the features of chomp aren't applicable to a C version: it can
be applied to either a list or a single variable, and the argument
defaults to $_ (if you're not familiar with Perl, don't worry about
what that means). And C has no equivalent to $/ (which in Perl lets
you set the input record separator to something other than the default
"\n").

So a reasonable C chomp() would operate on a single string, would
remove the last character if and only if it's a '\n', and would return
1 if it removed a character and 0 if it didn't.

Jordan, your version clobbers the last '\n' in the string, even if
it's not at the end of the string.
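To make the objection concrete, here is Jordan's strrchr-based version again under a hypothetical name, chomp_last_newline, so that it can be compiled on its own. Applied to a buffer holding "one\ntwo", it truncates the string to "one", even though the string did not end in a newline.

```c
#include <string.h>

/* Jordan's logic, renamed: cuts the string at the LAST '\n' found
   anywhere in it, not only at a '\n' in the final position. */
void chomp_last_newline(char *x)
{
    char *p = strrchr(x, '\n');
    if (p)
        *p = '\0';
}
```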

Here's my attempt:

#include <string.h>

int chomp(char *s)
{
    size_t len = strlen(s);
    if (len == 0) {
        return 0;
    }
    else if (s[len-1] == '\n') {
        s[len-1] = '\0';
        return 1;
    }
    else {
        return 0;
    }
}

Another difference from Perl is that C arrays don't re-size
themselves. In Perl, strings are first-class objects. In C, a string
is a data format, not a data type; if you shorten a string like this,
the remainder of the string is still there in the array. (A C dynamic
string library would probably deal with this.)
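One way a dynamic string library might handle this is to store the length alongside the buffer. The struct dstr type and the dstr_chomp function below are hypothetical, not taken from any particular library. Chomping only updates the stored length; the buffer keeps its original allocated size, just as a plain array would.

```c
#include <string.h>

/* Hypothetical counted string: the length is stored, so it never has
   to be recomputed with strlen after the string is shortened. */
struct dstr {
    char   *data;   /* buffer, always '\0'-terminated */
    size_t  len;    /* current length, not counting the terminator */
    size_t  cap;    /* allocated size of data, unchanged by chomping */
};

/* Removes one trailing '\n' if present; returns 1 if removed, else 0. */
int dstr_chomp(struct dstr *d)
{
    if (d->len > 0 && d->data[d->len - 1] == '\n') {
        d->data[--d->len] = '\0';
        return 1;
    }
    return 0;
}
```

Actually returning memory to the system would still need something like realloc on a heap-allocated buffer; the stored length just keeps the bookkeeping honest.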
 
Jordan Abel

2006-11-03 said:
Jordan, your version clobbers the last '\n' in the string, even if
it's not at the end of the string.

I did mess up.

How about

#include <string.h>

chomp(s)
char *s;
{
    if(s && *s) {
        char *p = strrchr(s,0);
        if(p[-1]=='\n') {
            p[-1]=0;
            return 1;
        }
    }
    return 0;
}

may be slightly more efficient than yours if the compiler doesn't get
especially clever with strlen calls.

I threw in null pointer handling because it's practically free.

You could add an int INPUT_RECORD_SEPARATOR; variable, but for it to be
useful you'd need an fgets replacement.
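A minimal sketch of what such an fgets replacement might look like, assuming the separator is passed as a parameter rather than kept in a global like the INPUT_RECORD_SEPARATOR suggested above. Both read_record and chomp_sep are hypothetical names. read_record behaves like fgets except that it stops after the given separator character, and chomp_sep removes that separator only if it is the last character.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fgets-like reader that stops at an arbitrary separator
   instead of '\n'.  Stores at most size-1 characters, keeps the
   separator (as fgets keeps '\n'), and returns NULL if nothing was read. */
char *read_record(char *buf, size_t size, FILE *fp, int sep)
{
    size_t i = 0;
    int c;

    if (size == 0)
        return NULL;
    while (i + 1 < size && (c = getc(fp)) != EOF) {
        buf[i++] = (char)c;
        if (c == sep)
            break;
    }
    buf[i] = '\0';
    return i > 0 ? buf : NULL;
}

/* chomp generalized to one separator character: removes it only if it
   is the last character.  Returns 1 if removed, 0 otherwise. */
int chomp_sep(char *s, int sep)
{
    size_t len = strlen(s);
    if (len > 0 && s[len - 1] == (char)sep) {
        s[len - 1] = '\0';
        return 1;
    }
    return 0;
}
```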
 
Flash Gordon

Jordan said:
2006-11-03 said:
Jordan, your version clobbers the last '\n' in the string, even if
it's not at the end of the string.

I did mess up.

How about

chomp(s)
char *s;
{

Why use old style function definitions? It is very rare these days to
find a compiler that does not handle prototypes.

int chomp(char *s)

    if(s && *s) {
        char *p = strrchr(s,0);

I can't see any reason for strrchr to be noticeably faster than strlen
and it could be slower if the compiler does not do clever handling of
that call. After all, it has to find the end of the string before
starting to scan backwards for a value of 0 (which it will find on the
first check) and all strlen has to do is find the end of the string
either counting as it goes or doing a pointer subtraction at the end.

        if(p[-1]=='\n') {
            p[-1]=0;
            return 1;
        }
    }
    return 0;
}

may be slightly more efficient than yours if the compiler doesn't get
especially clever with strlen calls.

I don't think it is likely to be faster. It has the advantage of being
shorter than Keith's but you can achieve that by using

#include <string.h>

int chomp(char *s)
{
    if(s && *s) {
        s += strlen(s) - 1;
        if (*s == '\n') {
            *s = 0;
            return 1;
        }
    }
    return 0;
}

I don't expect any significant performance improvement. However, if you
want micro-optimisations, which is all I see in yours over Keith's, I've
removed the extra complexity of strrchr over strlen and removed the need
to index into an array twice by doing one subtraction.

I prefer a version using strlen to strrchr myself, but this is because
to me it expresses the intent better rather than for reasons of
efficiency since I don't see any of them as having significant
efficiency problems.
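Both approaches locate the same byte. The C standard treats the terminating null character as part of the string for strrchr, so strrchr(s, '\0') returns the same pointer as s + strlen(s). The sketch below uses hypothetical helper names; it shows only that the results are equal and says nothing about which is faster, which depends on the library and compiler.

```c
#include <string.h>

/* strrchr counts the terminating '\0' as part of the string, so
   searching for '\0' returns a pointer to the terminator. */
char *end_via_strrchr(const char *s)
{
    return strrchr(s, '\0');
}

/* The same address, obtained by adding the length to the start. */
char *end_via_strlen(const char *s)
{
    return (char *)s + strlen(s);
}
```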

Now for an even shorter version that I would *not* put in real code:

#include <string.h>

int chomp(char *s)
{
    return (s && *s && *(s += strlen(s) - 1) == '\n')?!(*s = 0):0;
}

I threw in null pointer handling because it's practically free.

Agreed, and that matches the Perl chomp better since the Perl chomp will
not crash on being passed an undef (the nearest Perl equivalent to
passing a null pointer instead of a string).

You could add an int INPUT_RECORD_SEPARATOR; variable, but for it to be
useful you'd need an fgets replacement.

To get the equivalent of Perl reading a line from a file you would need
an fgets replacement anyway. A few have been posted to this group in the
past, and some of them could easily be modified to suit.
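As one illustration of such a replacement, here is a sketch that reads a whole line of any length into a buffer obtained from malloc, growing it with realloc as needed. The name read_line is hypothetical. It keeps the newline, as fgets does, so the caller can then chomp it, and the caller is responsible for freeing the result. (POSIX systems also provide getline, which does something similar, but it is not part of standard C.)

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fgets replacement: reads one whole line of any length.
   The '\n' is kept.  Returns NULL at end of file with nothing read, or
   on allocation failure.  The caller frees the returned buffer. */
char *read_line(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF) {
        if (len + 1 >= cap) {            /* keep room for the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
        if (c == '\n')
            break;
    }
    if (len == 0) {                      /* EOF before any character */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A typical loop would call read_line(stdin) until it returns NULL, chomp each line, use it, and free it.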
 
Jordan Abel

2006-11-04 said:
Why use old style function definitions?

felt like it?
I can't see any reason for strrchr to be noticeably faster than strlen
and it could be slower if the compiler does not do clever handling of
that call. After all, it has to find the end of the string before
starting to scan backwards for a value of 0 (which it will find on the
first check) and all strlen has to do is find the end of the string
either counting as it goes or doing a pointer subtraction at the end.

It's doing a pointer subtraction that we'll then be adding back together
in the caller - I see that as wasteful.

The other bit about efficiency is the original was returning 0 from more
than one place from different 'if' statements.
 
james of tucson

Keith said:
If you happen to know what Perl's chomp function does

It removes zero or more instances of a globally defined character
equivalence from the end of each element of an input list or each element
of the list of values of a hash.

You obviously don't want to deal with the details of an off-topic
language, but you were inaccurate in your description.

Granted, it seems that there are perl programmers who don't know what
chomp actually does, because they only use it in one degenerate case.
 
james of tucson

Jordan said:
Are you sure? chomp is used for a specific purpose, which is removing
the newline from a whole line read from input.

That's like saying printf(...) is used for a specific purpose, which is
putting the characters "Hello, world!" on the console.
 
Jordan Abel

2006-11-04 said:
It removes zero or more instances of a globally defined character
equivalence from the end of each element of an input list or each element
of the list of values of a hash.

You obviously don't want to deal with the details of an off-topic
language, but you were inaccurate in your description.

Granted, it seems that there are perl programmers who don't know what
chomp actually does, because they only use it in one degenerate case.

Which is thus what it actually does. What it's capable of is a different
matter. Sure it has _potential_ in terms of removing stuff from lists
or hashes or characters other than newline, but cutting the newline off
of single strings, typically those read from input, is its day job.

Regardless, it has nothing to do with "getting rid of all unread data
from a stream up to the next newline" or anything like that.
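For contrast, the operation Jordan mentions, throwing away unread input up to the next newline, is usually written as a getc loop rather than anything resembling chomp. A sketch, with discard_rest_of_line as a hypothetical name:

```c
#include <stdio.h>

/* NOT a chomp: consumes and discards input up to and including the
   next '\n'.  Returns '\n', or EOF if the stream ended first. */
int discard_rest_of_line(FILE *fp)
{
    int c;
    while ((c = getc(fp)) != EOF && c != '\n')
        ;   /* discard */
    return c;
}
```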
 
james of tucson

Jordan said:
Which is thus what it actually does. What it's capable of is a different
matter.

So you'd agree that printf(...) is mainly for putting the characters
"Hello, world!" on the console, then?

I would almost bet that chomp() is implemented as a specialization of
map{}, which doesn't *have* a day job, but is pretty much the most
powerful thing in perl.
 
Flash Gordon

Jordan said:
felt like it?


It's doing a pointer subtraction that we'll then be adding back together
in the caller - I see that as wasteful.

It may not be doing a pointer subtraction; the processor might be
counting as it goes through (I've certainly used processors where that
could be done at zero cost in memory or time). As I also said, strrchr
then has to work out that it is at the '\0' character, or the compiler
has to special-case, in some manner, strrchr being called to search for
the '\0' that terminates the string. So strrchr is, in my opinion, at
least as likely to be wasteful.

The other bit about efficiency is the original was returning 0 from more
than one place from different 'if' statements.

Your code had two ifs, Keith's had two, so just as much branching. There
was unlikely to be any significant difference in performance.

Anyway, you didn't comment on my versions one of which only had one
return ;-)
 
Keith Thompson

james of tucson said:
It removes zero or more instances of a globally defined character
equivalence from the end of each element of an input list or each element
of the list of values of a hash.

You obviously don't want to deal with the details of an off-topic
language, but you were inaccurate in your description.

In the article to which you're replying, I did not attempt to fully
describe what Perl's chomp function does. I inferred what the OP was
trying to do from his description and from the subset of chomp's
behavior that is easily translated to C. Elsethread, I quoted the
actual definition of chomp from the Camel Book.

BTW, your description is ambiguous. I don't know what you mean by
"character equivalence", and your description could easily be read to
imply that chomp applied to "hello\n\n" will remove both newline
characters. chomp removes zero or one character from each string.
 
Keith Thompson

james of tucson said:
That's like saying printf(...) is used for a specific purpose, which is
putting the characters "Hello, world!" on the console.

Not really. Probably 99% of calls to chomp in Perl programs do
exactly that (though it's certainly far more versatile). Similarly,
printf() can do far more than print "Hello, world!"; the difference is
that in real programs it usually does.
 
Walter Roberson

[with respect to perl chomp]
and your description could easily be read to
imply that chomp applied to "hello\n\n" will remove both newline
characters. chomp removes zero or one character from each string.

$ perldoc -f chomp
"When in paragraph mode ($/ = ""), it removes all trailing newlines
from the string."
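A sketch of a C analogue of that paragraph-mode behavior, under the hypothetical name chomp_paragraph. It removes every trailing newline and returns the count, so for a buffer holding "hello\n\n" it returns 2, whereas the single-newline chomp discussed earlier in the thread would remove only one.

```c
#include <string.h>

/* Hypothetical analogue of chomp in Perl's paragraph mode ($/ = ""):
   removes ALL trailing newlines and returns how many were removed. */
size_t chomp_paragraph(char *s)
{
    size_t len = strlen(s), removed = 0;

    while (len > 0 && s[len - 1] == '\n') {
        s[--len] = '\0';
        removed++;
    }
    return removed;
}
```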
 
Walter Roberson

[perl chomp]
I would almost bet that chomp() is implemented as a specialization of
map{}, which doesn't *have* a day job, but is pretty much the most
powerful thing in perl.

You'd lose that bet, at least in perl 5.8.4 (and probably much the
same for all earlier versions.) Source file doop.c about lines 1004
to 1139, Perl_do_chomp() routine. It is clearly its own routine,
not a call to map, and clearly not a degenerate version of map.
Quite a bit of the routine is preoccupied with utf8 handling.
 
Walter Roberson

[perl chomp]
It removes zero or more instances of a globally defined character
equivalence from the end of each element of an input list or each element
of the list of values of a hash.

It does not, at least not at perl 5.8.4.

$ perldoc perlvar
$/ [...] You may set it to a multi-character string
to match a multi-character delimiter,
[...] Remember: the value of $/ is a string, not a regexp.
AWK has to be better for something :)

Not, in other words, a character equivalence: it is a literal match,
except in its treatment of "", undef, and "\n\n" .
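A sketch of the literal-match behavior for an ordinary, non-empty separator, using the hypothetical name chomp_str. It removes sep only if the string ends with exactly that sequence and returns the number of characters removed, mirroring chomp's return value. The special cases Walter lists ("", undef, and "\n\n") are deliberately not modeled.

```c
#include <string.h>

/* Hypothetical chomp with a literal, possibly multi-character separator,
   modelled on a non-empty $/.  No pattern matching: the trailing bytes
   must equal sep exactly.  Returns the number of characters removed. */
size_t chomp_str(char *s, const char *sep)
{
    size_t len = strlen(s), seplen = strlen(sep);

    if (seplen == 0 || seplen > len)     /* empty sep: not modelled */
        return 0;
    if (memcmp(s + len - seplen, sep, seplen) != 0)
        return 0;
    s[len - seplen] = '\0';
    return seplen;
}
```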
 
Keith Thompson

[with respect to perl chomp]
and your description could easily be read to
imply that chomp applied to "hello\n\n" will remove both newline
characters. chomp removes zero or one character from each string.

$ perldoc -f chomp
"When in paragraph mode ($/ = ""), it removes all trailing newlines
from the string."

Ok, I missed that (even though I recently posted it myself). But
that's not the way it's usually used, and I don't believe it's what
the OP was looking for.
 
