Splitting strings with Python

Discussion in 'Python' started by sbucking@gmail.com, Jun 9, 2005.

  1. Guest

    I'm trying to split a string with this form (the string is from a
    Japanese dictionary file with multiple definitions in English for each
    Japanese word):


    str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /


    The variables I need are str*, def*.

    Sometimes the (1) and (2) are not included; they appear only if
    the word has two different meanings.


    "..." means that there are sometimes more than two definitions per
    meaning.


    I'm trying to use the re.split() function, but with no luck.

    Is this possible with Python, or am I dreaming?

    All the best,

     
    , Jun 9, 2005
    #1

  2. inhahe Guest

    I don't think you can do it with string.split. You could probably do it
    with re.split, but I think it's easier to use re.findall:

    import re
    re.findall(r"[a-zA-Z][ a-zA-Z0-9]*", inputstring)

    That should work, at least for the parts written in ASCII letters.
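
To see what that pattern actually returns, here is a small sketch run on a made-up line. The sample string is an assumption that simply mirrors the layout described in the question (no real data was posted), and `def1b` is a placeholder name added for illustration.

```python
import re

# Hypothetical sample line shaped like the format in the question.
sample = "str1 [str2] / (def1, def1b) (1) def2 / def3 / (2) def4/ def5 /"

# A match starts at an ASCII letter and continues over letters, digits
# and spaces. "(1)" and "(2)" are skipped because they start with a digit.
pattern = re.compile(r"[a-zA-Z][ a-zA-Z0-9]*")

# Matches can carry a trailing space (for example "str1 "), so strip them.
fields = [m.strip() for m in pattern.findall(sample)]
print(fields)
# ['str1', 'str2', 'def1', 'def1b', 'def2', 'def3', 'def4', 'def5']
```

Two limits are worth noting: the result is a flat list, so the grouping into meanings (1) and (2) is lost, and the ASCII-only character class will not match kanji or kana.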
     
    inhahe, Jun 9, 2005
    #2

  3. Guest

    One problem is that str1 is Unicode (Japanese kanji), and str2 is
    Japanese kana.

    Can I still use re.findall()?

    Thanks for your help!
     
    , Jun 9, 2005
    #3
  4. Guest

    Sorry, I should be more specific about the encoding.

    It's EUC-JP.

    I googled a little, and you can still use re.findall with the Japanese
    kana, but I didn't find anything about kanji.
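
One way to sidestep the kana/kanji question is to decode the EUC-JP bytes to text first and run the regular expression on the decoded string. The sketch below assumes Python 3, where `str` patterns are Unicode-aware by default; the sample entry is invented for illustration and is not taken from the dictionary file.

```python
import re

# Hypothetical entry; encoding it here stands in for bytes read from the file.
text = "日本語 [にほんご] /Japanese language/"
raw = text.encode("euc_jp")

# Decode once, then work with ordinary text.
line = raw.decode("euc_jp")

# \S+ matches any run of non-whitespace characters, kanji included;
# the reading is whatever sits between the square brackets.
m = re.match(r"(\S+)\s+\[([^\]]+)\]", line)
word, reading = m.group(1), m.group(2)
print(word, reading)
```

When reading from disk, passing `encoding="euc_jp"` to the built-in `open()` does the decoding step for you.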
     
    , Jun 9, 2005
    #4
  5. Kent Johnson Guest

    wrote:
    > I'm trying to split a string with this form (the string is from a
    > Japanese dictionary file with multiple definitions in English for each
    > Japanese word):
    >
    >
    > str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /
    >
    >
    > The variables I need are str*, def*.


    Could you post a few examples of real data and what you want to extract from it? The above raises a few questions:
    - Are str* and def* single words, or can they include whitespace, commas, slashes, parentheses...?
    - It is not clear what replaces the "..." (or whether the dots are literal).

    This might be a good job for PyParsing.
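
PyParsing is a third-party library, so as a rough stand-in here is a sketch of the same idea (parse the line into named parts rather than a flat list) using only the standard `re` module. It guesses answers to the questions above: the headword has no whitespace, the reading sits in square brackets, an optional leading parenthesised group holds comma-separated items, and `(1)`, `(2)` mark separate meanings. The function name, the dictionary keys, and the sample line are all hypothetical.

```python
import re

# Headword, optional [reading], then everything after the first slash.
HEAD = re.compile(
    r"^(?P<word>[^\s/\[]+)\s*(?:\[(?P<reading>[^\]]*)\])?\s*/(?P<body>.*)$"
)
LEADING_GROUP = re.compile(r"\s*\(([^)]*)\)")  # e.g. "(def1, ...)"
SENSE_MARK = re.compile(r"\(\d+\)")            # e.g. "(1)", "(2)"


def split_defs(chunk):
    """Split one meaning on "/" and drop empty pieces."""
    return [d.strip() for d in chunk.split("/") if d.strip()]


def parse_entry(line):
    m = HEAD.match(line.strip())
    if m is None:
        raise ValueError("unrecognised entry: %r" % (line,))
    body = m.group("body")

    # Optional leading group, unless it is really a "(1)" sense marker.
    notes = []
    g = LEADING_GROUP.match(body)
    if g and not g.group(1).strip().isdigit():
        notes = [n.strip() for n in g.group(1).split(",") if n.strip()]
        body = body[g.end():]

    # Each "(n)" marker starts a new meaning; empty chunks are discarded.
    senses = [split_defs(c) for c in SENSE_MARK.split(body)]
    senses = [s for s in senses if s]
    return {
        "word": m.group("word"),
        "reading": m.group("reading"),
        "notes": notes,
        "senses": senses,
    }


entry = parse_entry("str1 [str2] / (def1, def1b) (1) def2 / def3 / (2) def4/ def5 /")
print(entry)
```

Entries without `(1)`/`(2)` markers come back with a single meaning, and a missing reading is reported as `None`. Real data may well break these guesses, which is why seeing actual examples matters.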

    Kent
     
    Kent Johnson, Jun 9, 2005
    #5