Can Python read up to where a certain pattern is matched?

Anthony Liu

I am kinda new to Python, but not new to programming.
I am a certified Java programmer.

I don't want to read line after line, neither do I
want to read the whole file all at once. Thus none of
read(), readline(), readlines() is what I want. I want
to read a text file sentence by sentence.

A sentence, by definition, is roughly the part between
one full stop and the next full stop, !, or ?.

So, for example, for the following text:

"Some words here, and some other words. Then another
segment follows, and more. This is a question, a junk
question, followed by a question mark?"

It has 3 sentences (2 full stops and 1 question mark),
and therefore I want to read it in 3 lumps, each of
which gives me one complete sentence, as follows:

lump 1: Some words here, and some other words.

lump 2: Then another segment follows, and more.

lump 3: This is a question, a junk question, followed
by a question mark?
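To pin down the sentence rule described above, here is a minimal sketch (written for Python 3) that applies it to an in-memory string with a regular expression. It reads the whole text at once, so it only illustrates the splitting rule, not the incremental read being asked for. The name `split_sentences` is hypothetical, and collapsing newlines into spaces is an assumption made so the lumps match the ones listed above.

```python
import re

# A sentence: a run of non-terminator characters followed by one or
# more of ".", "!" or "?".
SENTENCE_RE = re.compile(r"[^.!?]+[.!?]+")

def split_sentences(text):
    """Return the sentences in text, stripped, with internal whitespace
    (including newlines) collapsed to single spaces."""
    return [" ".join(m.group().split()) for m in SENTENCE_RE.finditer(text)]

text = """Some words here, and some other words. Then another
segment follows, and more. This is a question, a junk
question, followed by a question mark?"""

for n, lump in enumerate(split_sentences(text), 1):
    print("lump %d: %s" % (n, lump))
```

Note that `finditer` silently drops any trailing fragment that has no terminator.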

How can I achieve this? Do we have a readsentence()
function?

Please give a hint. Thank you!


William Park

Anthony Liu said:
> I am kinda new to Python, but not new to programming.
> I am a certified Java programmer.
>
> I don't want to read line after line, neither do I
> want to read the whole file all at once. Thus none of
> read(), readline(), readlines() is what I want. I want
> to read a text file sentence by sentence.

Question: How do I read sentence by sentence?
Answer: Read input stream char by char.
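A minimal sketch of the char-by-char answer, written for Python 3. The function name `read_sentences_by_char` is hypothetical, and `io.StringIO` stands in for a file returned by `open()`. In Python 3, `open()` returns a buffered text stream by default, so `read(1)` is served from a buffer rather than making one system call per character.

```python
import io

def read_sentences_by_char(f, ends=".!?"):
    """Yield sentences from the text file object f, reading one
    character at a time with f.read(1)."""
    buf = []
    while True:
        c = f.read(1)
        if not c:                # read() returns "" at end of file
            break
        if not buf and c.isspace():
            continue             # drop whitespace between sentences
        buf.append(c)
        if c in ends:            # terminator: emit the finished sentence
            yield "".join(buf)
            buf = []
    if buf:                      # trailing text with no terminator
        yield "".join(buf)

sample = io.StringIO("One. Two!\nThree? Four")
for s in read_sentences_by_char(sample):
    print(repr(s))
```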
 
Dennis Lee Bieber

> Question: How do I read sentence by sentence?
> Answer: Read input stream char by char.

Ugh... Even my jaded neophyte self (as of Intro to FORTRAN,
1976) wouldn't consider that... Of course, since FORTRAN was basically
line-oriented, one would be biased toward other methods.

I.e., write a wrapper subroutine that reads whole lines, looks for a
".", and returns everything up to and including it; then shift the
remainder to the front of the buffer and append the next line for the
subsequent call.
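The line-buffered wrapper described above might look like this as a Python 3 generator. This is a minimal sketch: the name `read_sentences_by_line` is hypothetical, it also treats "!" and "?" as terminators (following the original question), and it collapses internal newlines into spaces.

```python
import io
import re

# A run of terminator characters, e.g. "." or "?!".
_END = re.compile(r"[.!?]+")

def read_sentences_by_line(f):
    """Yield sentences from file object f, reading whole lines and
    carrying any unfinished sentence over to the next line."""
    pending = ""
    for line in f:                       # one line at a time, never the whole file
        pending += line
        while True:
            m = _END.search(pending)
            if m is None:
                break                    # no terminator yet: read another line
            sentence = pending[:m.end()] # up to and including the terminator
            pending = pending[m.end():]  # shift the remainder forward
            yield " ".join(sentence.split())
    if pending.strip():                  # trailing text with no terminator
        yield " ".join(pending.split())

sample = io.StringIO(
    "Some words here, and some other words. Then another\n"
    "segment follows, and more. This is a question, a junk\n"
    "question, followed by a question mark?"
)
for s in read_sentences_by_line(sample):
    print(s)
```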

F. Petitjean

> I am kinda new to Python, but not new to programming.
>
> I don't want to read line after line, neither do I
> want to read the whole file all at once. Thus none of
> read(), readline(), readlines() is what I want. I want
> to read a text file sentence by sentence.
>
> A sentence by definition is roughly the part between a
> full stop and another full stop or !, ?
>
> So, for example, for the following text:
>
> "Some words here, and some other words. Then another
> segment follows, and more. This is a question, a junk
> question, followed by a question mark?"
>
> It has 3 sentences (2 full stops and 1 question mark),
> [snip]
> How can I achieve this? Do we have a readsentence()
> function?
>
> Please give a hint. Thank you!
The hint:
import itertools
help(itertools.takewhile)
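A small Python 3 demonstration of why `takewhile` alone loses the sentence terminator: to decide where to stop, it must pull the first failing item from the iterator, and it does not yield that item, so it is gone from both the result and the remaining iterator. The sample string is arbitrary.

```python
import itertools

chars = iter("Hi. Yo!")
not_end = lambda c: c not in ".!?"

first = "".join(itertools.takewhile(not_end, chars))
rest = "".join(chars)
print(repr(first))  # 'Hi': the "." was consumed by takewhile and discarded
print(repr(rest))   # ' Yo!'
```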

# not tested (no python 2.3 on Debian gateway at home)

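def readsentence(iterable, ends=(".", "!", "?"), yield_fn=''.join):
    """Generator function which yields sentences terminated by ends.

    ends is either a collection of terminator characters or a
    predicate that returns True for a terminator character.
    """
    if callable(ends):
        is_end = ends
    else:
        is_end = lambda c: c in ends

    def emit(chars):
        # Yield a tuple of characters, or yield_fn(tuple) if callable.
        t = tuple(chars)
        if callable(yield_fn):
            t = yield_fn(t)
        return t

    sentence = []
    for c in iterable:
        if not sentence and c.isspace():
            continue  # skip whitespace between sentences
        sentence.append(c)
        # itertools.takewhile() must read the terminator to test it and
        # then drops it, so test each character by hand to keep it.
        if is_end(c):
            yield emit(sentence)
            sentence = []
    # Stop at end of input; yield any trailing text with no terminator.
    if sentence:
        yield emit(sentence)

text = """\
Some words here, and some other words. Then another
segment follows, and more. This is a question, a junk
question, followed by a question mark?"""

for sentence in readsentence(text):
    print(sentence)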
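Because `readsentence()` accepts any iterable of characters, it can be fed from a file without loading the file whole. A minimal sketch for Python 3, assuming a hypothetical helper `iter_chars` that reads the file in fixed-size blocks and hands out one character at a time:

```python
import io

def iter_chars(f, blocksize=4096):
    """Yield the characters of text file f one at a time, while
    reading the underlying file in blocks of blocksize characters."""
    while True:
        block = f.read(blocksize)
        if not block:        # "" signals end of file
            return
        for c in block:
            yield c

# io.StringIO stands in for a file returned by open().
print("".join(iter_chars(io.StringIO("abc. def?"), blocksize=2)))
```

Usage with a real file (the filename is hypothetical): `with open("input.txt") as f:` followed by `for sentence in readsentence(iter_chars(f)): print(sentence)`.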
 
