Splitting strings with Python

sbucking

I'm trying to split a string of this form (the string is from a
Japanese dictionary file with multiple definitions in English for each
Japanese word):


str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /


The variables I need are str* and def*.

Sometimes the (1) and (2) are not included; they appear only if
the word has two different meanings.


"..." means that there are sometimes more than two definitions per
meaning.


I'm trying to use the re.split() function, but with no luck.

Is this possible with Python, or am I dreaming?

All the best,

 
inhahe

I don't think you can do it with string.split. You could probably do
it with re.split, but I think it's easier to use re.findall:

import re
re.findall("[a-zA-Z][ a-zA-Z0-9]*", inputstring)

should work.
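
To make the suggestion above concrete, here is a minimal sketch of that pattern run against a made-up, ASCII-only line in the layout from the original post. The sample line and the names `raw` and `tokens` are assumptions for illustration, not real dictionary data.

```python
import re

# Hypothetical ASCII-only line in the layout from the original post.
inputstring = "str1 [str2] /(def1) (1) def2/def3/(2) def4/def5/"

# Runs that start with a letter and continue with letters, digits or spaces.
raw = re.findall(r"[a-zA-Z][ a-zA-Z0-9]*", inputstring)

# A match can end in a trailing space (e.g. "str1 "), so strip each one.
tokens = [t.strip() for t in raw]
print(tokens)
# ['str1', 'str2', 'def1', 'def2', 'def3', 'def4', 'def5']
```

Note that the (1) and (2) markers are dropped, so this gives a flat list and does not tell you which definitions belong to which meaning. The character class is also ASCII-only, so it will not pick up kanji or kana.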
 
sbucking

One problem is that str1 is Unicode (Japanese kanji), and str2 is
Japanese kana.

Can I still use re.findall(~)?

Thanks for your help!
 
sbucking

Sorry, I should be more specific about the encoding.

It's EUC-JP.

I googled a little, and you can still use re.findall with the Japanese
kana, but I didn't find anything about kanji.
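
One way to approach the encoding question, sketched under the assumption of Python 3: decode the EUC-JP bytes to a str first, then run the regex on the decoded text, where kanji and kana are treated the same way. The sample entry and the names `HEAD`, `word` and `reading` are made up for illustration; "euc_jp" is the name of the standard library codec.

```python
import re

# Hypothetical entry; it is encoded here only to stand in for bytes
# read from the EUC-JP dictionary file.
raw = "漢字 [かんじ] /(n) Chinese character/".encode("euc_jp")

# Decode once, then work on ordinary Unicode text.
text = raw.decode("euc_jp")

# Headword is everything up to the first whitespace; the reading is
# whatever sits inside the square brackets that follow.
HEAD = re.compile(r"^(\S+)\s+\[([^\]]+)\]")
m = HEAD.match(text)
word, reading = m.groups()
print(word, reading)
```

When reading the real file, something like open(path, encoding="euc_jp") would do the decoding for you, so the regex never sees raw bytes.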
 
Kent Johnson

Could you post a few examples of real data and what you want to extract from it? The format you posted raises a few questions:
- Are str* and def* single words, or can they include whitespace, commas, slashes, parentheses...?
- It's not clear what replaces the ... (or whether it is literal).

This might be a good job for PyParsing.

Kent
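
Until real data is posted, here is a rough sketch of a parser for the layout as described in the original post. It uses plain `re` and `str.split` from the standard library rather than PyParsing, and it rests on several assumptions: the headword contains no whitespace or slash, the reading (if any) is in square brackets, definitions are separated by slashes, and a parenthesised number at the start of a field opens a new meaning. The function name `parse_entry`, the returned dict keys and the sample line are all hypothetical.

```python
import re

# Headword, optional [reading], then everything after the first slash.
ENTRY = re.compile(
    r"^(?P<word>[^\s/\[]+)\s*"
    r"(?:\[(?P<reading>[^\]]*)\])?\s*"
    r"/(?P<body>.*)$"
)
# One parenthesised group at the start of a field, e.g. "(def1, ...)" or "(1)".
PAREN = re.compile(r"\(([^)]*)\)\s*")


def parse_entry(line):
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError("unrecognised entry: %r" % (line,))
    tags = []      # contents of non-numeric leading parentheses
    senses = [[]]  # one list of definitions per meaning
    for part in m.group("body").split("/"):
        part = part.strip()
        # Peel off leading "(...)" groups one at a time.
        while True:
            p = PAREN.match(part)
            if p is None:
                break
            content = p.group(1).strip()
            if content.isdigit():
                # "(1)", "(2)", ... starts a new meaning, unless the
                # current one is still empty.
                if senses[-1]:
                    senses.append([])
            else:
                tags.extend(t.strip() for t in content.split(",") if t.strip())
            part = part[p.end():]
        if part:
            senses[-1].append(part)
    return {
        "word": m.group("word"),
        "reading": m.group("reading"),
        "tags": tags,
        "senses": senses,
    }


entry = parse_entry("str1 [str2] /(def1) (1) def2/def3/(2) def4/def5/")
print(entry["senses"])
# [['def2', 'def3'], ['def4', 'def5']]
```

A definition that itself starts with a parenthesis would be misread as a tag here, which is one reason real examples would help.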
 
