Good String Tokenizer

JamesHoward

I have searched the board and noticed that there isn't really any sort
of good implementation of a string tokenizer that will tokenize based
on a custom set of tokens and return both the tokens and the parts
between the tokens.

For example, if I have the string:

"Hello, World! How are you?"

and my splitting points are the comma and the exclamation point, then I would
expect to get back:

["Hello", ",", " World", "!", " How are you?"]

Does anyone know of a tokenizer that will allow for this sort of use?

Thanks in advance,
Jim Howard
 
James Stroud

Pyparsing: http://pyparsing.wikispaces.com/

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
 
Duncan Booth

JamesHoward said:
I have searched the board

What board? I don't see any boards here.

JamesHoward said:
and my splitting points are the comma and the exclamation point, then I would
expect to get back:

["Hello", ",", " World", "!", " How are you?"]

Does anyone know of a tokenizer that will allow for this sort of use?

>>> import re
>>> re.split("([!,])", "Hello, World! How are you?")
['Hello', ',', ' World', '!', ' How are you?']
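
The call above works because the pattern is wrapped in a capturing group, so re.split keeps the matched delimiters in the result. As a minimal sketch of how that could be generalised to an arbitrary set of literal delimiters (the helper name `tokenize` and its parameters are hypothetical, not part of the re module or of anything posted in this thread):

```python
import re

def tokenize(text, delimiters):
    """Split text on any of the given literal delimiters, keeping them in the result.

    `tokenize` is an illustrative helper, not a standard-library function.
    """
    # Try longer delimiters first so that "->" is preferred over "-".
    alternatives = sorted(delimiters, key=len, reverse=True)
    # re.escape makes characters such as "?" or "." match literally.
    # The surrounding parentheses form a capturing group, which is what
    # makes re.split return the delimiters as well as the text between them.
    pattern = "(" + "|".join(re.escape(d) for d in alternatives) + ")"
    return re.split(pattern, text)

print(tokenize("Hello, World! How are you?", [",", "!"]))
# ['Hello', ',', ' World', '!', ' How are you?']
```

One thing to be aware of: when the string starts or ends with a delimiter, re.split returns an empty string at that end, so tokenize("a,", [","]) gives ['a', ',', '']. Filter those out if they are not wanted.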
 
