Split, variable delimiter

Heath

Uri said:
t> Can be done. Set $delim to "\s" not ' '.

rtfm some more please. first off you want \s+ to get something like
split ' '. and ' ' is not only special cased for splitting on any
whitespace but it emulates awk's behavior of skipping leading white
space.

Just setting $delim = "\s" or "\s+" gives me an 'Unrecognized escape \s'
warning. '\s+' works nicely but, as you stated, still doesn't take
care of the leading whitespace, which is what I'd like.
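
One way to get the leading-whitespace behaviour with a delimiter held in a variable is to strip one leading delimiter run before splitting. This is only a sketch: the helper name split_ws is hypothetical, and it assumes $delim holds a regex (a single-quoted string such as '\s+' or a qr// object).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): emulate the leading-whitespace
# skipping of split ' ' while taking the delimiter from a variable.
sub split_ws {
    my ($line, $delim) = @_;

    # Remove one leading run of the delimiter, if present, so that
    # split does not return an empty first field.
    $line =~ s/^(?:$delim)//;

    # Trailing empty fields are dropped by split itself.
    return split $delim, $line;
}

my $delim = '\s+';    # single quotes, so the backslash survives
print "[$_]\n" for split_ws("\t\t\tone\ttwo\tthree", $delim);
```

With '\s+' as the delimiter this should print [one], [two] and [three] with no empty leading field.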

Bo said:
By way of the qr construct. In a hypothetical future Perl version

my $delim=' ';
my @fields=split($delim);

would be equivalent to

my @fields=split(' ');

while

my $delim=qr/ /;
my @fields=split($delim);

would be equivalent to

my @fields=split(/ /);

and everyone would be happy. Right?


/Bo Lindbergh

This is actually how I thought it would work, but until your
hypothetical version of Perl is released I guess I'll have to:

my @fields = split ($delim ? $delim : ' ');

which is decidedly less pretty.

Aaron said:
The documentation explains this quite well. However, it never says
that ' ' can't be passed in a variable, as far as I could find. Since
this is such a special case (Is there any other case in perl when this
is true?), perhaps the documentation should have a few lines added to
make that clear.

I think that would definitely help. This actually got pretty annoying
until I caved in and posted.


Thanks to all for helping clear this up.
 
Uri Guttman

H> Just setting $delim = "\s" or "\s+" gives me a 'Unrecognized escape \s'
H> warning. '\s+' works nicely but, as you stated, still doesn't take
H> care of the leading whitespaces which is what I'd like.

be careful of what quote chars you use there. the 2 first ones are in ""
which will eat \ stuff and it is probably what is spitting out the error
as \s is not a valid escape char in "" string (like \n is). the ''
version doesn't look at \ (except before \ and ') so your regex gets
passed thru as is. better yet is to use qr// as it will even make sure
you pass in a clean regex.
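
The quoting difference can be seen directly. This is a small sketch; the helper name fields and the sample strings are illustrative and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split $line on whatever pattern $delim holds.
sub fields {
    my ($delim, $line) = @_;
    return split $delim, $line;
}

my $single = '\s+';      # single quotes keep the backslash: the regex \s+
my $qr     = qr/\s+/;    # qr// compiles the regex up front
# my $double = "\s+";    # double quotes drop the backslash, leaving "s+",
                         # and trigger the 'Unrecognized escape \s' warning

print join('|', fields($single, "one  two\tthree")), "\n";   # one|two|three
print join('|', fields($qr,     "one  two\tthree")), "\n";   # one|two|three
print join('|', fields($single, "  one two")), "\n";         # |one|two
```

The last line shows that '\s+' still yields an empty leading field when the input starts with whitespace.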


H> This is actually how I thought it would work, but until your
H> hypothetical version of Perl is released I guess I'll have to:

H> my @fields = split ($delim ? $delim : ' ');

H> which is decidedly less pretty.

since that only tests $delim for perl truth, you can just use:

@fields = split ($delim || ' ');


BTW, you should not make one followup to three different posts (and from
three different posters). it only confuses the threads.

uri
 
Tad McClellan

[ Please do not top-post.
Text rearranged into a sensible order.
]


thrill5 said:
Uri Guttman said:
"H" == Heath writes:

H> Yes, I read through that before I ever posted. The behavior I'm
H> after is that of [split ' '].
H> All I need is a value to assign to $delim such that a [split $delim]
H> will give me the same behavior as a [split].

can't be done. so choose another solution.
t> Can be done.


OK, let's see it then!

Got code that "does" it?

I think not.

t> Set $delim to "\s" not ' '.


The OP wants to ignore the leading empty fields, your suggestion
does not ignore the leading empty fields.


perl -le '$_="\t\t\tone\ttwo\tthree"; print for split'
one
two
three


perl -le '$_="\t\t\tone\ttwo\tthree"; $delim = "\s"; print for split $delim'
one two three
(there is no 's' to split on)

perl -le '$_="\t\t\tone\ttwo\tthree"; $delim = q(\s); print for split $delim'



one
two
three
(3 leading empty fields)

perl -le '$_="\t\t\tone\ttwo\tthree"; $delim = "\s"; print for split /\s/'



one
two
three
(3 leading empty fields again)



If you do not understand the question, it is best to refrain
from answering it.
 
thrill5

To Uri,

Yes, I do have a few more comments.

1) Your an ASSHOLE.

2) The answer was better than 'Can't be done'.

3) If jumping in my shit makes you feel like more of "programmer" than I
feel sorry for you and take back the 'ASSHOLE' comment.

4) Posting messages is a way to learn more and your "holier-than-thou'
attitude makes some people uncomfortable and they so they don't post. I am
not one of those people. Next time, please for the sake of the rest of us,
leave the attitude at the door.

Scott
 
A. Sinan Unur

To Uri,

Yes, I do have a few more comments.

1) Your an

You might want to learn how to spell properly if you are going to call
people names.

.....
Next time, please for the sake of the
rest of us, leave the attitude at the door.


Good advice for you. Bye.

Sinan
 
thrill5

Wow! Two birds with one stone!!

Scott
A. Sinan Unur said:
You might want to learn how to spell properly if you are going to call
people names.

....



Good advice for you. Bye.

Sinan
 
Uri Guttman

t> To Uri,
t> Yes, I do have a few more comments.

t> 1) Your an ASSHOLE.

s/your/you're/. please spell your insults correctly if you want them to
have the maximum effect.

t> 2) The answer was better than 'Can't be done'.

but it was wrong. hmmm which is better, a long and wrong answer or short
correct one. tough choice. think about it. don't hurt yourself doing it.

t> 3) If jumping in my shit makes you feel like more of "programmer" than I
t> feel sorry for you and take back the 'ASSHOLE' comment.

i have no need for any emotional connection with you. i was working in
the perl universe. too bad you don't get that. i don't feel sorry for
you.

t> 4) Posting messages is a way to learn more and your
t> "holier-than-thou' attitude makes some people uncomfortable and
t> they so they don't post. I am not one of those people. Next time,
t> please for the sake of the rest of us, leave the attitude at the
t> door.

hmm. pot meet kettle. in the world of programming, accuracy and
correctness are generally appreciated more than social niceties. but
then you may want to be told sweet lies about how great your code is and
how it satisfies the requirements. i prefer to have correct code and i
am open to anyone who could improve my work. kinda puts the work over my
personal glory. but you wouldn't understand that.

<snip of entire bottom quote>

and you still don't know about top posting. when will you learn
anything!?!

have fun!

uri
 
Paul Lalli

Uri said:
s/your/you're/. please spell your insults correctly if you want them to
have the maximum effect.

While I do, of course, agree whole-heartily with your entire post, I
can help but point out my amusement that someone who flat out refuses
to use the shift key to correctly capitalize his sentences is
complaining about someone else's spelling... ;-)

Paul Lalli
 
it_says_BALLS_on_your_forehead

Paul said:
While I do, of course, agree whole-heartily with your entire post, I
can help but point out my amusement that someone who flat out refuses
to use the shift key to correctly capitalize his sentences is
complaining about someone else's spelling... ;-)

you "can help" but choose not to, apparently ;-).
 
Paul Lalli

it_says_BALLS_on_your_forehead said:
you "can help" but choose not to, apparently ;-).

You know... I went over that post 3 times before submitting, because I
*really* wanted to avoid anyone calling me on my calling someone else
on their calling someone else on grammatical and spelling mistakes....
and I completely missed an entire lack of "'t". Sheesh.

Paul Lalli
 
Aaron Baugher

Paul Lalli said:
You know... I went over that post 3 times before submitting, because
I *really* wanted to avoid anyone calling me on my calling someone
else on their calling someone else on grammatical and spelling
mistakes.... and I completely missed an entire lack of "'t".
Sheesh.

That's one of the immutable laws of nature: when posting a grammar or
spelling flame, you're guaranteed to make a similar error of your
own. Usually a worse one.
 
Uri Guttman

PL> You know... I went over that post 3 times before submitting,
PL> because I *really* wanted to avoid anyone calling me on my calling
PL> someone else on their calling someone else on grammatical and
PL> spelling mistakes.... and I completely missed an entire lack of
PL> "'t". Sheesh.

SERVERS YU WRITE FOUR TROLING ON MY LAK UF CAPITALIZATION! I DON'T LYKE
YOU'RE ATITOOD!

uri
 
Heath

Uri said:
H> This is actually how I thought it would work, but until your
H> hypothetical version of Perl is released I guess I'll have to:

H> my @fields = split ($delim ? $delim : ' ');

H> which is decidedly less pretty.

since that only tests $delim for perl truth, you can just use:

@fields = split ($delim || ' ');

Well, that made sense to me too. It turns out we were both wrong:

use strict;
use warnings;

$_ = "\t\t\tthis\t\tis\tsome\t\t\ttext\t\t\n";

my $delim = '';

print ($delim || "\n\$delim is false\n"); # Sanity check.
print "split - variable w/or\n";
print "[$_]\n" for (split ($delim || ' '));

print "split - variable w/ternary\n";
print "[$_]\n" for (split ($delim ? $delim : ' '));

# Impractical usage of || and ?: just for illustration.

print "\nsplit - literal w/or\n";
print "[$_]\n" for (split ('' || ' '));

print "split - literal w/ternary\n";
print "[$_]\n" for (split ('' ? '' : ' '));


======== OUTPUT ============

$delim is false
split - variable w/or
[ this is some
text
]
split - variable w/ternary
[ this is some
text
]

split - literal w/or
[this]
[is]
[some]
[text]
split - literal w/ternary
[this]
[is]
[some]
[text]

======= END OUTPUT ========

My first guess was that in order to get the special case [split ' '],
the call must appear exactly that way in the code (or as [split]).
My last 4 lines of code shatter that theory, so I'm at a loss. Maybe
it has something to do with the levels of interpretation. ie: because
one of the operands of the or/ternary is a variable, the expression
isn't interpreted as a literal. I know very little about perl
internals so I'm just guessing here.

Uri said:
BTW, you should not make one followup to three different posts (and from
three different posters). it only confuses the threads.

Sorry, thanks for the tip.
 
Heath

A subtle yet accurate indicator of where this discussion
has gone:

==== my right border =====

Related Pages
Spelling errors galore
"Liason"? It should of course be "liaison". This mistake has
....
thestar.com.my

Who You Callin' Ungrammatical?
By Jan Freeman, the Boston Globe. WHOM IS disappearing from the ...
www.fcnp.com

perlfaq7 - Perl Language Issues
www.perl.com

==== END my right border =====

Listed in order of relevance.
 
Uri Guttman

H> Well, that made sense to me too. It turns out we were both wrong:

H> # Impractical usage of || and ?: just for illustration.

H> print "\nsplit - literal w/or\n";
H> print "[$_]\n" for (split ('' || ' '));

H> print "split - literal w/ternary\n";
H> print "[$_]\n" for (split ('' ? '' : ' '));

those are compile time folded constant expressions. so split will see
' ' only. seems that split's signature checking is compile time for ' '
and not runtime in any way.

so the only solution is different calls to split. that can be done
easily with anon subs or basic conditionals.

uri
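
The "basic conditionals" approach could be sketched as below. The helper name split_fields is hypothetical, and treating an undefined or empty delimiter as "use the default" is an assumption based on Heath's ternary. As far as I know, Perl 5.18 and later apply the special case to any expression that evaluates to a single space, so on those versions the explicit branch is redundant but harmless.

```perl
use strict;
use warnings;

# Hypothetical helper: choose between two separate split calls at runtime,
# so the special ' ' form always appears as a literal in the source.
sub split_fields {
    my ($line, $delim) = @_;

    # No delimiter, or a single space: use the literal ' ' form, which
    # skips leading whitespace and splits on runs of whitespace.
    return split ' ', $line
        if !defined $delim || $delim eq '' || $delim eq ' ';

    # Anything else is used as a regex (a string or a qr// object).
    return split $delim, $line;
}

my $text = "\t\t\tthis\t\tis\tsome\t\t\ttext\t\t\n";
print "[$_]\n" for split_fields($text, '');        # [this] [is] [some] [text]
print "[$_]\n" for split_fields($text, qr/\t/);    # keeps the empty fields
```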
 
robic0

robic0 wrote:


It is a bug (I mean a feature) of split. According to the docs

Uh, what part of the docs are you referring to?

the Perl parser seems to look for the single quoted space ' ' and that
differentiates it from a space " " as a pattern.

It does not. ' ', " ", q{ }, qq{ } are all the same in this context. They
are differentiated from / /, qr{ }, and $x where $x eq ' ';
split ' '; is NOT the same context as split " "; nor
$dl = ' '; split $dl;

The discussion is about the special parsing done for "split ' '" context,
it is not about what, for example, split $dl; where $dl = ' '; means.

Wake up man..........

Here are quotes from some earlier posts (on which the argument hinges).
If you don't want to read them, the gist is that it's a compiler with multiple forms
of intrinsic functions.......
==============================================

H> Yes, I read through that before I ever posted. The behavior I'm
H> after is that of [split ' ']. I don't get that behavior when I pass
H> the space char to split via a variable. I would simply just like to
H> know why that is and how I can get that behavior by passing a variable,
H> if it is possible at all.

rtfm some more:

As a special case, specifying a PATTERN of space
(' ') will split on white space just as "split" with
no arguments does. Thus, "split(' ')" can be used
to emulate awk's default behavior, ...

note that PATTERN is the actual literal passed to split so it can't be a
variable. otherwise how could it tell / / from ' ' from $foo = ' '? this

I don't know, I would consider this a bug, aka, left out check. Within split, if
the $foo name is passed as a literal name, the contents have to be obtained.
So if $foo = "' '", it should be fairly obvious what the meaning is.

But I don't think some intrinsics work that way. I think as far as the Pattern
in split, the parser looks for a split ' ' or split pattern and internally changes
the call to a different function with different parameters, than any other form of split.
There may be several internal split functions.
Since it has to be parsed anyway, its easier to redirect different "forms" to predefined
functions that handle specific ones. Thereby speeding up the processor.
is a very odd way to get that special behavior as it is inband and very
special cased. if you must have that vs other splits on demand, use a
sub to handle your cases and do 2 different splits based on
$foo eq ' '. or use code refs for each split type. many ways to handle it.

H> All I need is a value to assign to $delim such that a [split $delim]
H> will give me the same behavior as a [split].

can't be done. so choose another solution.

uri
====================
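
The "code refs for each split type" idea in the quoted post could look something like this. It is a sketch only: the table %splitter, the keys awk, tab and comma, and the helper split_by are all hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one code ref per kind of split, so each
# split call is written out with its own literal pattern.
my %splitter = (
    awk   => sub { split ' ',       $_[0] },   # skips leading whitespace
    tab   => sub { split /\t/,      $_[0] },   # keeps leading empty fields
    comma => sub { split /\s*,\s*/, $_[0] },
);

sub split_by {
    my ($kind, $line) = @_;
    my $code = $splitter{$kind} or die "unknown split kind '$kind'";
    return $code->($line);
}

print "[$_]\n" for split_by(awk => "\t\t\tone\ttwo\tthree");
```

The caller then stores a key such as 'awk' in its variable instead of the delimiter itself.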

"r" == robic0 <robic0> writes:


r> I don't know, I would consider this a bug, aka, left out
r> check. Within split, if the $foo name is passed as a literal name,
r> the contents have to be obtained. So if $foo = "' '", it should be
r> fairly obvious what the meaning is.

i consider you a genomic bug.

r> But I don't think some intrinsics work that way. I think as far as
r> the Pattern in split, the parser looks for a split ' ' or split
r> pattern and internally changes the call to a different function
r> with different parameters, than any other form of split. There may
r> be several internal split functions. Since it has to be parsed
r> anyway, its easier to redirect different "forms" to predefined
r> functions that handle specific ones. Thereby speeding up the
r> processor.

speeding up the processor? what kind of crack are you smoking? this
whole discussion has nothing to do with the speed of split. the various
special behaviors of split do not need separate implementations.

I'm speechless.. You just discounted all compiled and semi-compiled (fixup)
languages. You must think Perl core is written in Perl.

The "Processor" is commonly known as the "engine", the core. Perl follows
that and has multiple core implementations of intrinsics, it does a modified
compile at loadtime and further compiles at runtime.

OR it does just a HARDCODED replacement of the "split ' '" form!
Which is a hack patch of some feeble minded Perl programmer.
There are no other options at all!!
Eh?
I'm going to have to state this one more time:

OR it does just a HARDCODED replacement of the "split ' '" form!
Which is a hack patch of some feeble minded Perl programmer.

Where it does it i don't know. But i'll tell you one thing...
The PARAMETER is fuckin *not* passed as a literal to the
os subroutine as uri ghelik the spoon man mentioned!!
 
