Suggestion: str.itersplit()

Dustan

From my searches here, there is no equivalent to Java's
StringTokenizer in Python, which seems like a real shame to me.

str.split() works just as well, except that it builds the whole
list in one go. I suggest an itersplit be introduced for lazy
evaluation, for when you don't want to tie up resources; it could
be used just like Java's StringTokenizer.

Comments?
 
Marc 'BlackJack' Rintsch

Does it really make such a difference?

Ciao,
Marc 'BlackJack' Rintsch
 
subscriber123

That would be good, because then you could iterate over strings the
same way that you iterate over files:

for line in string.itersplit("\n"):
    ## for block ##
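
For comparison, something close to this is already possible without a new method by wrapping the string in a file-like object. The following is only a minimal sketch, assuming Python 3 and a newline as the only delimiter; the names `text` and `lines` are made up for illustration.

```python
import io

text = "first line\nsecond line\nthird line"

# io.StringIO wraps a string in a file-like object. Iterating over it
# yields one line at a time and keeps the trailing newline, just as
# iterating over an open text file does, so we strip it here.
lines = [line.rstrip("\n") for line in io.StringIO(text)]
print(lines)
```

Unlike the proposed itersplit, this only splits on line endings, so it does not cover arbitrary delimiters.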
 
Dustan

If anybody could inform me on how to get my hands on the python source
code, I might even be able to come up with an example of how it could
be implemented. I have no idea how to unzip that tgz or tar.bz2 file
on a windows machine, though (and that's not from lack of trying).
 
attn.steven.kuo

If your delimiter is a non-empty string, you
can use an iterator like:

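def it(S, sub):
    # Lazily yield the pieces of S separated by the non-empty string sub.
    start = 0
    sublen = len(sub)
    while True:
        idx = S.find(sub, start)
        if idx == -1:
            # No more delimiters: yield the tail and finish.
            yield S[start:]
            # A plain return ends the generator; raising StopIteration
            # here becomes a RuntimeError from Python 3.7 on (PEP 479).
            return
        else:
            yield S[start:idx]
            start = idx + sublen

target_string = 'abcabcabc'
for subs in it(target_string, 'b'):
    print(subs)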


For something more complex,
you may be able to use
re.finditer.
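
As a rough illustration of the re.finditer route, the sketch below yields the text between successive matches of a pattern. The helper name `itersplit` is hypothetical (it is not a standard-library function), and the sketch assumes the pattern never matches the empty string.

```python
import re

def itersplit(s, pattern):
    """Lazily yield the pieces of s between matches of pattern.

    Hypothetical helper for illustration only. Assumes pattern
    never matches the empty string.
    """
    start = 0
    for match in re.finditer(pattern, s):
        # Yield the text between the previous delimiter and this one.
        yield s[start:match.start()]
        start = match.end()
    # Yield whatever follows the last delimiter (possibly empty).
    yield s[start:]

# Split on commas or semicolons, swallowing surrounding whitespace.
pieces = list(itersplit("a, b;c ,d", r"\s*[,;]\s*"))
print(pieces)
```

Because it is a generator, a caller that only needs the first few pieces can stop early without the rest of the string being scanned.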
 
Jorge Godoy

Dustan said:
If anybody could inform me on how to get my hands on the python source
code, I might even be able to come up with an example of how it could
be implemented. I have no idea how to unzip that tgz or tar.bz2 file
on a windows machine, though (and that's not from lack of trying).

You can try WinZip. Last time I had to use a Windows machine it was
able to untar + gunzip some files perfectly fine (as we are able to
unzip and unrar on *nix...).
 
Alex Martelli

Dustan said:
If anybody could inform me on how to get my hands on the python source
code, I might even be able to come up with an example of how it could
be implemented. I have no idea how to unzip that tgz or tar.bz2 file
on a windows machine, though (and that's not from lack of trying).

Top search hit for
windows tar
is <http://gnuwin32.sourceforge.net/packages/tar.htm> , but its contents
suggest using <http://gnuwin32.sourceforge.net/packages/bsdtar.htm>
instead (it has "the ability to direcly create and manipulate .tar,
.tar.gz, tar.bz2, .zip, .gz and .bz2 archives, understands the most-used
options of GNU Tar, and is also much faster; for most purposes it is to
be preferred to GNU Tar", to quote).


Alex
 
Dustan

attn.steven.kuo said:
If your delimiter is a non-empty string, you
can use an iterator like:

def it(S, sub):
    start = 0
    sublen = len(sub)
    while True:
        idx = S.find(sub, start)
        if idx == -1:
            yield S[start:]
            # A plain return ends the generator; raising StopIteration
            # here becomes a RuntimeError from Python 3.7 on (PEP 479).
            return
        else:
            yield S[start:idx]
            start = idx + sublen

target_string = 'abcabcabc'
for subs in it(target_string, 'b'):
    print(subs)

Thanks.

Well, now I know it can be implemented reasonably efficiently in
pure Python (i.e. without creating throwaway intermediate strings,
as concatenation would). That's what I was mainly concerned about.

I still feel that it could be a builtin (seriously, the world
wouldn't end if it were, and neither would Python), but this'll work.
That's my last word on the subject.
 
Dustan

Thanks to both Jorge Godoy and Alex Martelli for their responses; I
went with WinZip. After spending about 10 minutes looking at the
source, I can easily conclude that having the code and understanding
the code are two very different things (and yes, I do have some
experience in C and C++). But that's a matter to tackle another day.
 
Stargaming

subscriber123 said:
That would be good, because then you could iterate over strings the
same way that you iterate over files:

for line in string.itersplit("\n"):
    ## for block ##

>>> (line missing)
... This is a comment.
... With a few more lines."""
>>> (line missing)
...     print line
...
Hello world.
This is a comment.
With a few more lines.
>>> (line missing)
...     print line
...
Hello world.
This is a comment.
With a few more lines.

Iterators would just speed up the whole thing and be more pythonic
(since development goes straight into the direction of converting all
and everything into iterators).
 
