Can python read up to where a certain pattern is matched?

Discussion in 'Python' started by Anthony Liu, Mar 6, 2004.

  1. Anthony Liu

    Anthony Liu Guest

    I am kinda new to Python, but not new to programming.
    I am a certified Java programmer.

    I don't want to read line by line, nor do I want to
    read the whole file all at once, so none of read(),
    readline() or readlines() is what I want. I want to
    read a text file sentence by sentence.

    A sentence, by my definition, is roughly the text
    between one full stop and the next full stop, "!" or "?".

    So, for example, for the following text:

    "Some words here, and some other words. Then another
    segment follows, and more. This is a question, a junk
    question, followed by a question mark?"

    It has 3 sentences (2 full stops and 1 question mark),
    and therefore I want to read it in 3 lumps and each
    lump gives me one complete sentence as follows:

    lump 1: Some words here, and some other words.

    lump 2: Then another segment follows, and more.

    lump 3: This is a question, a junk question, followed
    by a question mark?

    How can I achieve this? Do we have a readsentence()
    function?

    Please give a hint. Thank you!
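    Python has no built-in readsentence(), but one can be sketched. The version below assumes the rough definition above holds, i.e. every ".", "!" or "?" ends a sentence (so abbreviations such as "e.g." will be split). It reads fixed-size chunks with read(n), uses the re module to find each sentence ending, and carries the unfinished tail over to the next chunk. The names readsentences, SENTENCE and chunk_size are hypothetical.

```python
import io
import re

# Hypothetical pattern: a run of non-terminators followed by one terminator.
SENTENCE = re.compile(r"[^.!?]*[.!?]")


def readsentences(f, chunk_size=4096):
    """Yield sentences from the file object f, reading fixed-size chunks."""
    buf = ""
    while True:
        chunk = f.read(chunk_size)  # "" means end of file
        buf += chunk
        pos = 0
        for m in SENTENCE.finditer(buf):
            yield " ".join(m.group().split())  # collapse newlines and runs of spaces
            pos = m.end()
        buf = buf[pos:]  # keep the unfinished sentence for the next round
        if not chunk:
            break
    tail = " ".join(buf.split())
    if tail:
        yield tail  # text after the last terminator, if any


sample = io.StringIO(
    "Some words here, and some other words. Then another\n"
    "segment follows, and more. This is a question, a junk\n"
    "question, followed by a question mark?"
)
for n, s in enumerate(readsentences(sample, chunk_size=16), 1):
    print("lump %d: %s" % (n, s))
```

    The regular expression is only used to locate terminators within the buffered text; chunk_size trades memory for the number of read() calls and does not change the result.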


    Anthony Liu, Mar 6, 2004
    #1

  2. Anthony Liu

    William Park Guest

    Anthony Liu <> wrote:
    > I am kinda new to Python, but not new to programming.
    > I am a certified Java programmer.
    >
    > I don't want to read line after line, neither do I
    > want to read the whole file all at once. Thus none of
    > read(), readline(), readlines() is what I want. I want
    > to read a text file sentence by sentence.


    Question: How do I read sentence by sentence?
    Answer: Read input stream char by char.
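    Taken literally, that suggestion might look like the sketch below (sentences_by_char is a hypothetical name; the terminators are the ".", "!" and "?" from the question). Python file objects are buffered, so read(1) does not cost a system call per character, but the per-call overhead still makes this slower than reading larger blocks.

```python
import io


def sentences_by_char(f, ends=".!?"):
    """Yield sentences from the file object f, one character at a time."""
    buf = []
    while True:
        c = f.read(1)  # "" at end of file
        if not c:
            break
        buf.append(c)
        if c in ends:
            yield " ".join("".join(buf).split())  # tidy whitespace
            buf = []
    rest = " ".join("".join(buf).split())
    if rest:
        yield rest  # trailing text with no terminator


for s in sentences_by_char(io.StringIO("Hi there. How are you?\nFine!")):
    print(s)
```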

    --
    William Park, Open Geometry Consulting, <>
    Linux solution for data processing and document management.
     
    William Park, Mar 7, 2004
    #2

  3. On 7 Mar 2004 03:00:14 GMT, William Park <>
    declaimed the following in comp.lang.python:

    > Question: How do I read sentence by sentence?
    > Answer: Read input stream char by char.


    Ugh... Even my jaded neophyte self (as of Intro to FORTRAN,
    1976) wouldn't consider that... Of course, since FORTRAN was basically
    line-oriented, one would be biased toward other methods.

    I.e., write a wrapper subroutine that reads whole lines, looks for
    ".", and returns everything up to and including it; then shift the
    remainder down and append the next line for the subsequent call.
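    A sketch of that wrapper in Python, assuming a file object iterated line by line. It also accepts "!" and "?" as terminators, as the original question asks; sentences_by_line is a hypothetical name.

```python
import io


def sentences_by_line(f, ends=".!?"):
    """Yield sentences from the file object f, reading whole lines and
    carrying any unfinished sentence over to the next line."""
    pending = ""
    for line in f:
        pending += line  # append the next line to the remains
        while True:
            # position of the earliest terminator in the buffered text
            hits = [i for i in (pending.find(e) for e in ends) if i != -1]
            if not hits:
                break
            cut = min(hits) + 1
            yield " ".join(pending[:cut].split())  # up to and including it
            pending = pending[cut:]  # shift the remains
    rest = " ".join(pending.split())
    if rest:
        yield rest  # text after the last terminator, if any


text = ("Some words here, and some other words. Then another\n"
        "segment follows, and more. This is a question, a junk\n"
        "question, followed by a question mark?")
for s in sentences_by_line(io.StringIO(text)):
    print(s)
```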

    --
    > ============================================================== <
    > | Wulfraed Dennis Lee Bieber KD6MOG <
    > | Bestiaria Support Staff <
    > ============================================================== <
    > Home Page: <http://www.dm.net/~wulfraed/> <
    > Overflow Page: <http://wlfraed.home.netcom.com/> <
     
    Dennis Lee Bieber, Mar 7, 2004
    #3
  4. Anthony Liu

    F. Petitjean Guest

    On Fri, 5 Mar 2004, Anthony Liu <> wrote:
    > I am kinda new to Python, but not new to programming.
    >
    > I don't want to read line after line, neither do I
    > want to read the whole file all at once. Thus none of
    > read(), readline(), readlines() is what I want. I want
    > to read a text file sentence by sentence.
    >
    > A sentence by definition is roughly the part between a
    > full stop and another full stop or !, ?
    >
    > So, for example, for the following text:
    >
    > "Some words here, and some other words. Then another
    > segment follows, and more. This is a question, a junk
    > question, followed by a question mark?"
    >
    > It has 3 sentences (2 full stops and 1 question mark),
    > snip
    > How can I achieve this? Do we have a readsentence()
    > function?
    >
    > Please give a hint. Thank you!
    >

    The hint:
    import itertools
    help(itertools.takewhile)
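    The detail to watch for in help(itertools.takewhile): takewhile stops at the first item for which the predicate is false, and that item has already been pulled from the underlying iterator, so it is consumed and discarded. A short demonstration on a shared iterator:

```python
import itertools

chars = iter("Hi. Bye.")
# takewhile reads "H", "i", then "." (which fails the predicate and is dropped)
first = "".join(itertools.takewhile(lambda c: c not in ".!?", chars))
rest = "".join(chars)  # whatever is left in the shared iterator
print(repr(first))  # 'Hi'
print(repr(rest))   # ' Bye.'
```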

    # not tested (no python 2.3 on Debian gateway at home)

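    import itertools

    def readsentence(iterable, ends=(".", "!", "?"), yield_fn="".join):
        """generator function which yields sentences terminated by ends"""
        # ends is either a predicate or a collection of terminators
        if callable(ends):
            is_end = ends
        else:
            is_end = lambda c: c in ends
        seen = [None]  # last item pulled from the input

        def tracking(items):
            # record each item, so the terminator that takewhile
            # consumes (and drops) can be recovered afterwards
            for c in items:
                seen[0] = c
                yield c

        it = tracking(iterable)
        while True:
            seen[0] = None
            sentence = list(itertools.takewhile(lambda c: not is_end(c), it))
            if seen[0] is not None and is_end(seen[0]):
                sentence.append(seen[0])  # put the terminator back
            if not sentence:
                return  # input exhausted
            t = tuple(sentence)
            if callable(yield_fn):
                t = yield_fn(t)
            yield t

    text = """\
    Some words here, and some other words. Then another
    segment follows, and more. This is a question, a junk
    question, followed by a question mark?"""

    for sentence in readsentence(text):
        print(" ".join(sentence.split()))  # collapse newlines and spaces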
     
    F. Petitjean, Mar 7, 2004
    #4
