Using wildcards...

Discussion in 'Python' started by Harlin Seritt, May 2, 2005.

  1. I looked all over the net but could not find out whether it is possible to
    use wildcards when splitting strings. What I am trying to do is this: I am
    trying to parse text from a Bible file. In case you're not familiar
    with the way the Bible organizes itself, it is broken down into Books >
    Chapters > Verses. The particular text I am working with is organized
    into Book files (*.txt -- flat text file). Here is what the file looks
    like:

    {1:1} Random text here. {1:2} More text here. and so on.

    Of course the {*} can be of any length, so I can't just do .split()
    based on the length of the bracket text. What I would like to do is to
    call .split() with something akin to this:

    textdata.split('{*}') # The '*' being a wildcard

    Is this possible to do? If so, how is it done?

    Thanks,

    Harlin Seritt
     
    Harlin Seritt, May 2, 2005
    #1

  2. Harlin Seritt wrote:

    > I looked all over the net but could not find if it is possible to
    > insert wildcards into strings. What I am trying to do is this: I am
    > trying to parse text from a Bible file. In case you're not familiar
    > with the way the Bible organizes itself, it is broken down into Books >
    > Chapters > Verses. The particular text I am working with are organized
    > into Book files (*.txt -- flat text file). Here is what the file looks
    > like:
    >
    > {1:1} Random text here. {1:2} More text here. and so on.
    >
    > Of course the {*} can be of any length, so I can't just do .split()
    > based on the length of the bracket text. What I would like to do is to
    > .split() using something akin to this:
    >
    > textdata.split('{*}') # The '*' being a wildcard
    >
    > Is this possible to do? If so, how is it done?


    You can use the split function in the re module with a suitable regular
    expression:

    >>> import re
    >>> re.split(r'{\d+:\d+}', textdata)
    ['', ' Random text here. ', ' More text here. and so on.']

    {\d+:\d+} means 'match {, then one or more digits, then :, then one or
    more digits, then }'.
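One way to see exactly what that pattern accepts is to test a few candidate tags with `re.fullmatch`, which succeeds only if the whole string matches. A small sketch; the sample tags are made up for illustration, and the braces are escaped here only to make the intent explicit (an unescaped `{` that cannot start a repetition is also taken literally):

```python
import re

# Literal '{', one or more digits, ':', one or more digits, literal '}'.
VERSE_TAG = re.compile(r"\{\d+:\d+\}")

for tag in ["{1:1}", "{23:10}", "{1:}", "{a:b}", "{1:1:1}"]:
    # fullmatch requires the entire tag to match, not just a prefix.
    print(tag, bool(VERSE_TAG.fullmatch(tag)))
```

This prints True for the first two tags and False for the other three.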

    re.split('{.*}', textdata) would be a more direct translation of your
    wildcard, but that doesn't work: .* matches as much as possible, so in
    your example it would match '{1:1} Random text here. {1:2}' instead of
    just '{1:1}' and '{1:2}'.
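The greediness problem described above can be checked directly. A sketch comparing the greedy `.*` with the non-greedy `.*?` quantifier (standard `re` syntax, though not something proposed in this thread), using the sample line from the question:

```python
import re

s = "{1:1} Random text here. {1:2} More text here. and so on."

# Greedy: .* runs on to the LAST '}' it can reach, swallowing verse 1:1's text.
greedy = re.split(r"\{.*\}", s)

# Non-greedy: .*? stops at the FIRST '}', so each tag is matched on its own.
lazy = re.split(r"\{.*?\}", s)

print(greedy)  # ['', ' More text here. and so on.']
print(lazy)    # ['', ' Random text here. ', ' More text here. and so on.']
```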

    --
    If I have been able to see further, it was only because I stood
    on the shoulders of giants. -- Isaac Newton

    Roel Schroeven
     
    Roel Schroeven, May 2, 2005
    #2

  3. Harlin Seritt wrote:
    > {1:1} Random text here. {1:2} More text here. and so on.
    >
    > Of course the {*} can be of any length, so I can't just do .split()
    > based on the length of the bracket text. What I would like to do is to
    > .split() using something akin to this:
    >
    > textdata.split('{*}') # The '*' being a wildcard
    >
    > Is this possible to do? If so, how is it done?
    >


    You should look into the re module.
    Regular expressions offer more flexible text-processing features than
    the string module or string methods.

    - Regular expression operations
    http://docs.python.org/lib/module-re.html
    - HOWTO
    http://www.amk.ca/python/howto/regex/

    In your case, the code would look like this:

    >>> text = '{1:1} Random text here. {1:2} More text here. and so on.'
    >>> import re
    >>> pattern = re.compile(r'{\d+:\d+}')
    >>> pattern.split(text)
    ['', ' Random text here. ', ' More text here. and so on.']
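If the chapter and verse numbers should be kept rather than thrown away, note that `re.split` also returns the text of any capturing groups in the pattern (documented `re` behavior). A minimal sketch, assuming one book's text is already loaded into a string; the names `VERSE_SPLIT` and `parse_verses` are hypothetical:

```python
import re

# Capturing groups make re.split return the chapter and verse numbers
# alongside the text between tags.
VERSE_SPLIT = re.compile(r"\{(\d+):(\d+)\}")

def parse_verses(text):
    """Return a list of (chapter, verse, text) tuples (hypothetical helper)."""
    parts = VERSE_SPLIT.split(text)
    # parts[0] is whatever precedes the first tag; after that the list
    # repeats in groups of three: chapter, verse, verse text.
    verses = []
    for i in range(1, len(parts), 3):
        chapter, verse, body = parts[i], parts[i + 1], parts[i + 2]
        verses.append((int(chapter), int(verse), body.strip()))
    return verses

text = "{1:1} Random text here. {1:2} More text here. and so on."
print(parse_verses(text))
# [(1, 1, 'Random text here.'), (1, 2, 'More text here. and so on.')]
```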

    --
    george

    http://www.dynkin.com/
     
    George Yoshida, May 2, 2005
    #3
  4. Harlin Seritt wrote:
    > I looked all over the net but could not find if it is possible to
    > insert wildcards into strings. What I am trying to do is this: I am
    > trying to parse text from a Bible file. In case you're not familiar
    > with the way the Bible organizes itself, it is broken down into Books >
    > Chapters > Verses. The particular text I am working with are organized
    > into Book files (*.txt -- flat text file). Here is what the file looks
    > like:
    >
    > {1:1} Random text here. {1:2} More text here. and so on.
    >
    > Of course the {*} can be of any length, so I can't just do .split()
    > based on the length of the bracket text. What I would like to do is to
    > .split() using something akin to this:
    >
    > textdata.split('{*}') # The '*' being a wildcard


    You can do this with the re module. For example:

    >>> import re
    >>> s = '{1:1} Random text here. {1:2} More text here. and so on.'
    >>> re.split(r'\{[^}]+\}', s)
    ['', ' Random text here. ', ' More text here. and so on.']
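The empty string at the start of each result appears because the text begins with a tag, so there is nothing before the first split point. A small sketch that strips the spaces left around each tag and drops empty pieces; `verse_texts` is a hypothetical helper name, and the pattern is the `[^}]+` one from this reply:

```python
import re

def verse_texts(s):
    """Split on {...} tags and return the non-empty, stripped pieces."""
    pieces = re.split(r"\{[^}]+\}", s)
    # Strip surrounding whitespace and skip empty strings, such as the
    # one produced when s starts with a tag.
    return [p.strip() for p in pieces if p.strip()]

s = "{1:1} Random text here. {1:2} More text here. and so on."
print(verse_texts(s))  # ['Random text here.', 'More text here. and so on.']
```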

    If you want to be a little stricter about what you accept as a split
    point, you could look explicitly for digits:
    >>> re.split(r'\{\d+:\d+\}', s)
    ['', ' Random text here. ', ' More text here. and so on.']
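The difference between the lenient and strict patterns only shows up when the braces contain something other than chapter:verse. A sketch using a made-up line with a hypothetical `{Note}` tag (the real Bible files may contain no such tags):

```python
import re

line = "{1:1} Random text here. {Note} More text here."

# Lenient: any non-empty run of characters other than '}' inside braces.
lenient = re.split(r"\{[^}]+\}", line)

# Strict: only digits:digits inside braces, so {Note} stays in the text.
strict = re.split(r"\{\d+:\d+\}", line)

print(lenient)  # ['', ' Random text here. ', ' More text here.']
print(strict)   # ['', ' Random text here. {Note} More text here.']
```

The strict form is the safer choice if stray braces could appear inside verse text.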

    Kent
     
    Kent Johnson, May 2, 2005
    #4
  5. George, that is what I'm looking for. Thanks, Harlin
     
    Harlin Seritt, May 2, 2005
    #5

Similar Threads
  1. Josh Martin
    Replies:
    6
    Views:
    1,861
    Josh Martin
    Nov 23, 2003
  2. Web learner
    Replies:
    0
    Views:
    745
    Web learner
    Apr 18, 2006
  3. Dipankar
    Replies:
    17
    Views:
    10,742
    Arne Vajhøj
    Aug 1, 2009
  4. William Hudspeth
    Replies:
    2
    Views:
    298
    Fabio FZero
    Mar 15, 2007
  5. William Hudspeth
    Replies:
    1
    Views:
    400
    Sion Arrowsmith
    Mar 16, 2007