Scanning a file character by character

Spacebar265

Hi. Does anyone know how to scan a file character by character and
store each character so I can put it into a variable? I am attempting
to make a chatbot and need this to read the saved input, look for
spelling mistakes, and further analyze the user input.
Thanks
Spacebar265
 
Bard Aase

Spacebar265 said:
Hi. Does anyone know how to scan a file character by character and
have each character so I can put it into a variable. I am attempting
to make a chatbot and need this to read the saved input to look for
spelling mistakes and further analysis of user input.
Thanks
Spacebar265

You can read one byte at a time using the read() method on the file
object.
http://docs.python.org/library/stdtypes.html#file.read

e.g.:
f = open("myfile.txt")
byte = f.read(1)  # returns an empty string at end of file
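
A minimal sketch of how the read(1) call above could be turned into a loop over the whole file. The helper name read_chars is made up for illustration, and io.StringIO stands in for a real file so the example runs on its own. Note that in Python 3 a file opened in text mode returns one character per read(1), which may be more than one byte.

```python
import io


def read_chars(f):
    """Yield characters one at a time until read(1) returns an empty string."""
    while True:
        ch = f.read(1)
        if not ch:  # an empty string signals end of file
            break
        yield ch


# io.StringIO is an assumption standing in for open("myfile.txt")
sample = io.StringIO("Hi!\n")
chars = list(read_chars(sample))
print(chars)
```

Each character ends up as one element of the list, so no separate variable per character is needed.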
 
Gabriel Genellina

Spacebar265 said:
Hi. Does anyone know how to scan a file character by character and
have each character so I can put it into a variable. I am attempting
to make a chatbot and need this to read the saved input to look for
spelling mistakes and further analysis of user input.

Read the file one line at a time, and process each line one character at a
time:

with open(filename, "r") as f:
    for line in f:
        for c in line:
            process(c)

But probably you want to process one *word* at a time; the easiest way
(perhaps inaccurate) is to just split on whitespace:

...
for word in line.split():
    process(word)
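
For reference, here is a self-contained sketch that runs both loops side by side. The sample text and the lists chars and words are assumptions added for illustration, and io.StringIO stands in for the opened file.

```python
import io

# Stand-in for open(filename, "r"); the text itself is made up
text = io.StringIO("hello world\nsecond line\n")

chars = []  # every character, including spaces and newlines
words = []  # whitespace-separated words

for line in text:
    for c in line:
        chars.append(c)
    for word in line.split():
        words.append(word)

print(len(chars), words)
```

For spelling checks the words list is probably the more useful of the two.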
 
Jorgen Grahn

Spacebar265 said:
Hi. Does anyone know how to scan a file character by character and
have each character so I can put it into a variable. I am attempting
to make a chatbot and need this to read the saved input to look for
spelling mistakes and further analysis of user input.

That does not follow. To analyze a text, the worst possible starting
point is one variable for each character (what would you call them --
character_1, character_2, ... character_65802 ?)

/Jorgen
 
Spacebar265

Jorgen Grahn said:
That does not follow. To analyze a text, the worst possible starting
point is one variable for each character (what would you call them --
character_1, character_2, ... character_65802 ?)

/Jorgen

How else would you check for spelling mistakes? Because the input is
very unlikely to be lengthy paragraphs, I wouldn't even need very many
variables. If anyone could suggest an alternative method, it would be
much appreciated.
 
Steve Holden

I believe most people would read the input a line at a time and split
the lines into words. It does depend whether you are attempting
real-time spelling correction, though. That would be a different case.

regards
Steve
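
As a rough sketch of the line-at-a-time approach applied to spell checking: the word set KNOWN_WORDS, the helper find_unknown_words and the sample input are all hypothetical, and a real chatbot would load a proper word list instead.

```python
import io

# Hypothetical tiny dictionary; a real program would load a full word list
KNOWN_WORDS = {"hello", "how", "are", "you", "today"}


def find_unknown_words(lines, known=KNOWN_WORDS):
    """Return lowercased words, with edge punctuation stripped, not in `known`."""
    unknown = []
    for line in lines:
        for word in line.split():
            cleaned = word.strip(".,!?;:\"'").lower()
            if cleaned and cleaned not in known:
                unknown.append(cleaned)
    return unknown


# io.StringIO stands in for the saved input file
saved_input = io.StringIO("Hello, how are yuo today?\n")
print(find_unknown_words(saved_input))
```

This only flags words that are missing from the set; suggesting corrections would be a separate step.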
 
Spacebar265

Steve Holden said:
I believe most people would read the input a line at a time and split
the lines into words. It does depend whether you are attempting
real-time spelling correction, though. That would be a different case.

regards
 Steve

Thanks. How would I separate lines into words without scanning one
character at a time?
 
Steven D'Aprano

Spacebar265 said:
How would I separate lines into words without scanning one character
at a time?

Scan a line at a time, then split each line into words.


for line in open('myfile.txt'):
    words = line.split()


should work for a particularly simple-minded idea of words.
 
Hendrik van Rooyen

Spacebar265 said:
Thanks. How would I separate lines into words without scanning one
character at a time?

Type the following at the interactive prompt and see what happens:

s = "This is a string composed of a few words and a newline\n"
help(s.split)
help(s.rstrip)
help(s.strip)
dir(s)

- Hendrik
 
Steven D'Aprano

Steven D'Aprano said:
Scan a line at a time, then split each line into words.


for line in open('myfile.txt'):
    words = line.split()


should work for a particularly simple-minded idea of words.
Or for a slightly less simple minded splitting you could try re.split:
re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']


Perhaps I'm missing something, but the above regex does the exact same
thing as line.split() except it is significantly slower and harder to
read.

Neither deal with quoted text, apostrophes, hyphens, punctuation or any
other details of real-world text. That's what I mean by "simple-minded".
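
To illustrate one of those details, here is a hedged sketch of a pattern that keeps apostrophes and hyphens inside words. The pattern WORD_RE and the sample sentence are assumptions for illustration, not something proposed in the thread, and the pattern still ignores quoted text and non-ASCII letters.

```python
import re

# Runs of letters, optionally joined by a single apostrophe or hyphen,
# so "don't" and "well-known" each stay in one piece
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

s = "Don't split well-known words, please."
words = WORD_RE.findall(s)
print(words)  # ["Don't", 'split', 'well-known', 'words', 'please']
```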
 
Tim Chase

Or for a slightly less simple minded splitting you could try re.split:
re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']


Perhaps I'm missing something, but the above regex does the exact same
thing as line.split() except it is significantly slower and harder to
read.

Neither deal with quoted text, apostrophes, hyphens, punctuation or any
other details of real-world text. That's what I mean by "simple-minded".
>>> s = "The quick brown fox jumps, and falls over."
>>> import re
>>> re.split(r"(\w+)", s)[1::2]
['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
>>> s.split()
['The', 'quick', 'brown', 'fox', 'jumps,', 'and', 'falls',
'over.']

Note the difference in "jumps" vs. "jumps," (extra comma in the
string.split() version) and likewise the period after "over".
Thus not quite "the exact same thing as line.split()".

I think an easier-to-read variant would be
>>> re.findall(r"\w+", s)
['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']

which just finds words. One could also just limit it to letters with

re.findall("[a-zA-Z]+", s)

as "\w" is a little more encompassing (letters, digits and underscores)
if that's a problem.

-tkc
 
Rhodri James

Steven D'Aprano said:
On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote:

How would I do separate lines into words without scanning one
character at a time?

Scan a line at a time, then split each line into words.


for line in open('myfile.txt'):
    words = line.split()


should work for a particularly simple-minded idea of words.
Or for a slightly less simple minded splitting you could try re.split:
re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']


Perhaps I'm missing something, but the above regex does the exact same
thing as line.split() except it is significantly slower and harder to
read.

Neither deal with quoted text, apostrophes, hyphens, punctuation or any
other details of real-world text. That's what I mean by "simple-minded".

You're missing something :) Specifically, the punctuation gets swept
up with the whitespace, and the extended slice skips it. Apostrophes
(and possibly hyphenation) are still a bit moot, though.
 
Steven D'Aprano

Or for a slightly less simple minded splitting you could try re.split:

re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']


Perhaps I'm missing something, but the above regex does the exact same
thing as line.split() except it is significantly slower and harder to
read.
....

Note the difference in "jumps" vs. "jumps," (extra comma in the
string.split() version) and likewise the period after "over". Thus not
quite "the exact same thing as line.split()".

Um... yes. I'll just slink away quietly now... nothing to see here...
 
MRAB

Steven said:
Or for a slightly less simple minded splitting you could try re.split:

re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']

Perhaps I'm missing something, but the above regex does the exact same
thing as line.split() except it is significantly slower and harder to
read.
...

Note the difference in "jumps" vs. "jumps," (extra comma in the
string.split() version) and likewise the period after "over". Thus not
quite "the exact same thing as line.split()".

Um... yes. I'll just slink away quietly now... nothing to see here...

You could've used str.translate to strip out the unwanted characters.
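
A minimal sketch of the str.translate idea, written for Python 3 (the thread predates it, so this is an assumption about how one might do it today). It deletes every ASCII punctuation character before splitting, which also removes apostrophes and hyphens.

```python
import string

s = "The quick brown fox jumps, and falls over."

# Three-argument str.maketrans: every character in the third argument
# is mapped to None, i.e. deleted by translate()
table = str.maketrans("", "", string.punctuation)
words = s.translate(table).split()
print(words)
```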
 
Spacebar265

Scan a line at a time, then split each line into words.
for line in open('myfile.txt'):
    words = line.split()
should work for a particularly simple-minded idea of words.

Or for a slightly less simple minded splitting you could try re.split:
re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]

['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']

Using this code, how would I load each word into a temporary variable?
 
Rhodri James

How would I do separate lines into words without scanning one character
at a time?
Scan a line at a time, then split each line into words.
for line in open('myfile.txt'):
    words = line.split()
should work for a particularly simple-minded idea of words.

Or for a slightly less simple minded splitting you could try re.split:
re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]

['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
Using this code how would it load each word into a temporary variable.

Why on earth would you want to? Just index through the list.
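
A short sketch of what indexing through the list can look like. The loop variable word plays the role of the "temporary variable": it is rebound to the next word on every pass. The names words, word and third are chosen here for illustration.

```python
import re

words = re.findall(r"\w+", "The quick brown fox jumps, and falls over.")

# enumerate() supplies the position alongside each word
for index, word in enumerate(words):
    print(index, word)

# Direct indexing works too when a specific position is needed
third = words[2]
print(third)  # brown
```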
 
Josh Dukes

In [401]: import shlex

In [402]: shlex.split("""Joe went to 'the store' where he bought a "box of chocolates" and stuff.""")
Out[402]:
['Joe',
'went',
'to',
'the store',
'where',
'he',
'bought',
'a',
'box of chocolates',
'and',
'stuff.']

how's that work for ya?

http://docs.python.org/library/shlex.html

 
Tim Chase

Josh said:
In [401]: import shlex

In [402]: shlex.split("""Joe went to 'the store' where he bought a "box of chocolates" and stuff.""")

how's that work for ya?

It works great if that's the desired behavior. However, the OP
wrote about splitting the lines into separate words, not
"treating quoted items as a single word". (OP: "How would I do
separate lines into words without scanning one character at a time?")

But for pulling out quoted strings as units, the shlex is a great
module.

-tkc
 
rzed

On Feb 11, 1:06 am, Duncan Booth wrote: [...]
re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]

['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls',
'over']

Using this code how would it load each word into a temporary
variable.
>>> import re
>>> list_name = re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
>>> list_name[2]
'brown'

You see, temporary variables are set. Their names are spelled
'list_name[x]', where x is an index into the list. If your plan was
instead to have predefined names of variables, what would they be
called? How many would you have? With list variables, you will have
enough, and you will know their names.
 
