Scanning a file character by character

Discussion in 'Python' started by Spacebar265, Feb 5, 2009.

  1. Spacebar265

    Spacebar265 Guest

    Hi. Does anyone know how to scan a file character by character so
    that I can put each character into a variable? I am attempting to
    make a chatbot and need this to read the saved input, look for
    spelling mistakes, and do further analysis of the user input.
    Thanks
    Spacebar265
    Spacebar265, Feb 5, 2009
    #1

  2. Spacebar265

    Bard Aase Guest

    On 5 Feb, 07:48, Spacebar265 <> wrote:
    > Hi. Does anyone know how to scan a file character by character and
    > have each character so I can put it into a variable. I am attempting
    > to make a chatbot and need this to read the saved input to look for
    > spelling mistakes and further analysis of user input.
    > Thanks
    > Spacebar265


    You can read one byte at a time using the read() method on the file
    object.
    http://docs.python.org/library/stdtypes.html#file.read

    e.g.:
    f=open("myfile.txt")
    byte=f.read(1)
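
    A minimal sketch of how that read(1) call might be looped until the
    end of the file. It assumes Python 3, where a file opened in text mode
    returns one character (not one byte) per read(1). The helper name
    iter_chars and the io.StringIO stand-in for open("myfile.txt") are
    only for illustration.

```python
import io


def iter_chars(f):
    """Return an iterator over the characters of an open text file."""
    # f.read(1) returns "" at end of file; iter() stops on that sentinel.
    return iter(lambda: f.read(1), "")


# io.StringIO stands in for open("myfile.txt") so the sketch runs anywhere.
sample = io.StringIO("hi\nyou")
chars = list(iter_chars(sample))
print(chars)
```

    With a real file you would wrap the call in "with open(...) as f:" so
    the file is closed afterwards.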
    Bard Aase, Feb 5, 2009
    #2

  3. On Thu, 05 Feb 2009 04:48:13 -0200, Spacebar265 <> wrote:

    > Hi. Does anyone know how to scan a file character by character and
    > have each character so I can put it into a variable. I am attempting
    > to make a chatbot and need this to read the saved input to look for
    > spelling mistakes and further analysis of user input.


    Read the file one line at a time, and process each line one character at a
    time:

    with open(filename, "r") as f:
        for line in f:
            for c in line:
                process(c)

    But probably you want to process one *word* at a time; the easiest way
    (perhaps inaccurate) is to just split on whitespace:

    ...
        for word in line.split():
            process(word)
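
    Both loops can be tried together in one runnable sketch. The function
    name count_chars_and_words is hypothetical, the Counter is just one
    possible stand-in for process(), and io.StringIO replaces a real file
    so nothing has to exist on disk.

```python
import io
from collections import Counter


def count_chars_and_words(f):
    """Walk a text file line by line, visiting every character and word."""
    char_counts = Counter()
    words = []
    for line in f:
        for c in line:
            char_counts[c] += 1      # per-character processing
        words.extend(line.split())   # per-word processing (whitespace split)
    return char_counts, words


# io.StringIO stands in for open(filename, "r").
sample = io.StringIO("hello world\nhello again\n")
char_counts, words = count_chars_and_words(sample)
print(words)
```

    Note that each character is held in the single loop variable c for
    one iteration; no separate variable per character is needed.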

    --
    Gabriel Genellina
    Gabriel Genellina, Feb 5, 2009
    #3
  4. Spacebar265

    Jorgen Grahn Guest

    On Wed, 4 Feb 2009 22:48:13 -0800 (PST), Spacebar265 <> wrote:
    > Hi. Does anyone know how to scan a file character by character and
    > have each character so I can put it into a variable. I am attempting
    > to make a chatbot and need this to read the saved input to look for
    > spelling mistakes and further analysis of user input.


    That does not follow. To analyze a text, the worst possible starting
    point is one variable for each character (what would you call them --
    character_1, character_2, ... character_65802 ?)

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
    \X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
    Jorgen Grahn, Feb 6, 2009
    #4
  5. Spacebar265

    Spacebar265 Guest

    On Feb 7, 2:17 am, Jorgen Grahn <> wrote:
    > On Wed, 4 Feb 2009 22:48:13 -0800 (PST), Spacebar265 <> wrote:
    > > Hi. Does anyone know how to scan a file character by character and
    > > have each character so I can put it into a variable. I am attempting
    > > to make a chatbot and need this to read the saved input to look for
    > > spelling mistakes and further analysis of user input.

    >
    > That does not follow. To analyze a text, the worst possible starting
    > point is one variable for each character (what would you call them --
    > character_1, character_2, ... character_65802 ?)
    >
    > /Jorgen
    >
    > --
    >   // Jorgen Grahn <grahn@        Ph'nglui mglw'nafh Cthulhu
    > \X/     snipabacken.se>          R'lyeh wgah'nagl fhtagn!


    How else would you check for spelling mistakes? Because the input is
    very unlikely to be lengthy paragraphs, I wouldn't even need very many
    variables. If anyone could suggest an alternative method, it would be
    much appreciated.
    Spacebar265, Feb 9, 2009
    #5
  6. Spacebar265

    Steve Holden Guest

    Spacebar265 wrote:
    > On Feb 7, 2:17 am, Jorgen Grahn <> wrote:
    >> On Wed, 4 Feb 2009 22:48:13 -0800 (PST), Spacebar265 <> wrote:
    >>> Hi. Does anyone know how to scan a file character by character and
    >>> have each character so I can put it into a variable. I am attempting
    >>> to make a chatbot and need this to read the saved input to look for
    >>> spelling mistakes and further analysis of user input.

    >> That does not follow. To analyze a text, the worst possible starting
    >> point is one variable for each character (what would you call them --
    >> character_1, character_2, ... character_65802 ?)
    >>

    I believe most people would read the input a line at a time and split
    the lines into words. It does depend whether you are attempting
    real-time spelling correction, though. That would be a different case.
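
    As a rough illustration of the word-based approach, here is a sketch
    that checks one line of input against a word list. The tiny
    KNOWN_WORDS set and the check_line name are made up for the example; a
    real chatbot would need a proper dictionary. The suggestions come from
    the standard library's difflib.get_close_matches.

```python
import difflib

# Hypothetical, deliberately tiny dictionary; replace with a real word list.
KNOWN_WORDS = {"hello", "how", "are", "you", "today"}


def check_line(line, known=KNOWN_WORDS):
    """Return {unknown_word: [suggestions]} for one line of user input."""
    problems = {}
    for word in line.lower().split():
        if word not in known:
            # Up to three similar known words, best match first.
            problems[word] = difflib.get_close_matches(word, sorted(known), n=3)
    return problems


result = check_line("Helo how are yuo")
print(result)
```

    The whole line is split into words first, and each word is compared
    as a unit; nothing here looks at characters one by one.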

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
    Steve Holden, Feb 9, 2009
    #6
  7. Spacebar265

    Spacebar265 Guest

    On Feb 9, 5:13 pm, Steve Holden <> wrote:
    > Spacebar265 wrote:
    > > On Feb 7, 2:17 am, Jorgen Grahn <> wrote:
    > >> On Wed, 4 Feb 2009 22:48:13 -0800 (PST), Spacebar265 <> wrote:
    > >>> Hi. Does anyone know how to scan a file character by character and
    > >>> have each character so I can put it into a variable. I am attempting
    > >>> to make a chatbot and need this to read the saved input to look for
    > >>> spelling mistakes and further analysis of user input.
    > >> That does not follow. To analyze a text, the worst possible starting
    > >> point is one variable for each character (what would you call them --
    > >> character_1, character_2, ... character_65802 ?)

    >
    > I believe most people would read the input a line at a time and split
    > the lines into words. It does depend whether you are attempting
    > real-time spelling correction, though. That would be a different case.
    >
    > regards
    >  Steve
    > --
    > Steve Holden        +1 571 484 6266   +1 800 494 3119
    > Holden Web LLC              http://www.holdenweb.com/


    Thanks. How would I separate lines into words without scanning one
    character at a time?
    Spacebar265, Feb 10, 2009
    #7
  8. On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote:

    > How would I do separate lines into words without scanning one character
    > at a time?


    Scan a line at a time, then split each line into words.


    for line in open('myfile.txt'):
        words = line.split()


    should work for a particularly simple-minded idea of words.



    --
    Steven
    Steven D'Aprano, Feb 10, 2009
    #8
  9. "Spacebar265" <> wrote:

    >Thanks. How would I do separate lines into words without scanning one
    >character at a time?


    Type the following at the interactive prompt and see what happens:

    s = "This is a string composed of a few words and a newline\n"
    help(s.split)
    help(s.rstrip)
    help(s.strip)
    dir(s)
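
    For anyone reading without an interpreter at hand, a short sketch of
    what three of those methods return for that string. The variable
    names are arbitrary.

```python
s = "This is a string composed of a few words and a newline\n"

words = s.split()             # split on any run of whitespace; newline is dropped
no_newline = s.rstrip("\n")   # remove trailing newline characters only
trimmed = "  padded  ".strip()  # remove leading and trailing whitespace

print(words)
print(repr(no_newline))
print(repr(trimmed))
```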

    - Hendrik
    Hendrik van Rooyen, Feb 10, 2009
    #9
  10. On Tue, 10 Feb 2009 12:06:06 +0000, Duncan Booth wrote:

    > Steven D'Aprano <> wrote:
    >
    >> On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote:
    >>
    >>> How would I do separate lines into words without scanning one
    >>> character at a time?

    >>
    >> Scan a line at a time, then split each line into words.
    >>
    >>
    >> for line in open('myfile.txt'):
    >> words = line.split()
    >>
    >>
    >> should work for a particularly simple-minded idea of words.
    >>

    > Or for a slightly less simple minded splitting you could try re.split:
    >
    >>>> re.split("(\w+)", "The quick brown fox jumps, and falls over.")[1::2]

    > ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']



    Perhaps I'm missing something, but the above regex does the exact same
    thing as line.split() except it is significantly slower and harder to
    read.

    Neither deals with quoted text, apostrophes, hyphens, punctuation or any
    other details of real-world text. That's what I mean by "simple-minded".


    --
    Steven
    Steven D'Aprano, Feb 10, 2009
    #10
  11. Spacebar265

    Tim Chase Guest

    >> Or for a slightly less simple minded splitting you could try re.split:
    >>
    >>>>> re.split("(\w+)", "The quick brown fox jumps, and falls over.")[1::2]

    >> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']

    >
    >
    > Perhaps I'm missing something, but the above regex does the exact same
    > thing as line.split() except it is significantly slower and harder to
    > read.
    >
    > Neither deal with quoted text, apostrophes, hyphens, punctuation or any
    > other details of real-world text. That's what I mean by "simple-minded".


    >>> s = "The quick brown fox jumps, and falls over."
    >>> import re
    >>> re.split(r"(\w+)", s)[1::2]

    ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
    >>> s.split()

    ['The', 'quick', 'brown', 'fox', 'jumps,', 'and', 'falls',
    'over.']

    Note the difference in "jumps" vs. "jumps," (extra comma in the
    string.split() version) and likewise the period after "over".
    Thus not quite "the exact same thing as line.split()".

    I think an easier-to-read variant would be

    >>> re.findall(r"\w+", s)

    ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']

    which just finds words. One could also just limit it to letters with

    re.findall(r"[a-zA-Z]+", s)

    as "\w" is a little more encompassing (letters, digits and underscores)
    if that's a problem.
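
    If apostrophes and hyphens inside words matter, one possible pattern
    is sketched below. The pattern and the tokenize name are assumptions
    for illustration, not something proposed in this thread, and they only
    cover ASCII letters.

```python
import re

# One or more letters, optionally followed by groups of an apostrophe or
# hyphen plus more letters, so "Don't" and "well-known" stay in one piece.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def tokenize(text):
    """Return the words in text, keeping inner apostrophes and hyphens."""
    return WORD_RE.findall(text)


tokens = tokenize("Don't panic: the well-known fox jumps, and falls over.")
print(tokens)
```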

    -tkc
    Tim Chase, Feb 10, 2009
    #11
  12. Spacebar265

    Rhodri James Guest

    On Tue, 10 Feb 2009 22:02:57 -0000, Steven D'Aprano
    <> wrote:

    > On Tue, 10 Feb 2009 12:06:06 +0000, Duncan Booth wrote:
    >
    >> Steven D'Aprano <> wrote:
    >>
    >>> On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote:
    >>>
    >>>> How would I do separate lines into words without scanning one
    >>>> character at a time?
    >>>
    >>> Scan a line at a time, then split each line into words.
    >>>
    >>>
    >>> for line in open('myfile.txt'):
    >>> words = line.split()
    >>>
    >>>
    >>> should work for a particularly simple-minded idea of words.
    >>>

    >> Or for a slightly less simple minded splitting you could try re.split:
    >>
    >>>>> re.split("(\w+)", "The quick brown fox jumps, and falls over.")[1::2]

    >> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']

    >
    >
    > Perhaps I'm missing something, but the above regex does the exact same
    > thing as line.split() except it is significantly slower and harder to
    > read.
    >
    > Neither deal with quoted text, apostrophes, hyphens, punctuation or any
    > other details of real-world text. That's what I mean by "simple-minded".


    You're missing something :) Specifically, the punctuation gets swept
    up with the whitespace, and the extended slice skips it. Apostrophes
    (and possibly hyphenation) are still a bit moot, though.



    --
    Rhodri James *-* Wildebeeste Herder to the Masses
    Rhodri James, Feb 10, 2009
    #12
  13. On Tue, 10 Feb 2009 16:46:30 -0600, Tim Chase wrote:

    >>> Or for a slightly less simple minded splitting you could try re.split:
    >>>
    >>>>>> re.split("(\w+)", "The quick brown fox jumps, and falls
    >>>>>> over.")[1::2]
    >>> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']

    >>
    >>
    >> Perhaps I'm missing something, but the above regex does the exact same
    >> thing as line.split() except it is significantly slower and harder to
    >> read.


    ....

    > Note the difference in "jumps" vs. "jumps," (extra comma in the
    > string.split() version) and likewise the period after "over". Thus not
    > quite "the exact same thing as line.split()".


    Um... yes. I'll just slink away quietly now... nothing to see here...


    --
    Steven
    Steven D'Aprano, Feb 10, 2009
    #13
  14. Spacebar265

    MRAB Guest

    Steven D'Aprano wrote:
    > On Tue, 10 Feb 2009 16:46:30 -0600, Tim Chase wrote:
    >
    >>>> Or for a slightly less simple minded splitting you could try re.split:
    >>>>
    >>>>>>> re.split("(\w+)", "The quick brown fox jumps, and falls
    >>>>>>> over.")[1::2]
    >>>> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
    >>>
    >>> Perhaps I'm missing something, but the above regex does the exact same
    >>> thing as line.split() except it is significantly slower and harder to
    >>> read.

    >
    > ...
    >
    >> Note the difference in "jumps" vs. "jumps," (extra comma in the
    >> string.split() version) and likewise the period after "over". Thus not
    >> quite "the exact same thing as line.split()".

    >
    > Um... yes. I'll just slink away quietly now... nothing to see here...
    >

    You could've used str.translate to strip out the unwanted characters.
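
    A sketch of what that could look like, assuming Python 3 (where the
    deletion table is built with str.maketrans; the Python 2 call takes
    different arguments). The names _STRIP_PUNCT and split_words are made
    up for the example.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def split_words(line):
    """Strip punctuation, then split the line on whitespace."""
    return line.translate(_STRIP_PUNCT).split()


words = split_words("The quick brown fox jumps, and falls over.")
print(words)
```

    Be aware that this also deletes apostrophes and hyphens, so "don't"
    comes out as "dont".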
    MRAB, Feb 10, 2009
    #14
  15. Spacebar265

    Spacebar265 Guest

    On Feb 11, 1:06 am, Duncan Booth <> wrote:
    > Steven D'Aprano <> wrote:
    > > On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote:

    >
    > >> How would I do separate lines into words without scanning one character
    > >> at a time?

    >
    > > Scan a line at a time, then split each line into words.

    >
    > > for line in open('myfile.txt'):
    > >     words = line.split()

    >
    > > should work for a particularly simple-minded idea of words.

    >
    > Or for a slightly less simple minded splitting you could try re.split:
    >
    > >>> re.split("(\w+)", "The quick brown fox jumps, and falls over.")[1::2]

    >
    > ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
    >
    > --
    > Duncan Booth http://kupuguy.blogspot.com


    Using this code, how would I load each word into a temporary variable?
    Spacebar265, Feb 13, 2009
    #15
  16. Spacebar265

    Rhodri James Guest

    On Fri, 13 Feb 2009 03:24:21 -0000, Spacebar265 <>
    wrote:

    > On Feb 11, 1:06 am, Duncan Booth <> wrote:
    >> Steven D'Aprano <> wrote:
    >> > On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote:

    >>
    >> >> How would I do separate lines into words without scanning one

    >> character
    >> >> at a time?

    >>
    >> > Scan a line at a time, then split each line into words.

    >>
    >> > for line in open('myfile.txt'):
    >> >     words = line.split()

    >>
    >> > should work for a particularly simple-minded idea of words.

    >>
    >> Or for a slightly less simple minded splitting you could try re.split:
    >>
    >> >>> re.split("(\w+)", "The quick brown fox jumps, and falls

    >> over.")[1::2]
    >>
    >> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']

    > Using this code how would it load each word into a temporary variable.


    Why on earth would you want to? Just index through the list.
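
    A small sketch of what walking the list might look like. The loop
    variables position and word act as the "temporary variables", taking
    a new value on every pass. The seen list is only there so the example
    has something to show.

```python
import re

words = re.findall(r"\w+", "The quick brown fox jumps, and falls over.")

seen = []
for position, word in enumerate(words):
    # 'word' holds one word per iteration; do the per-word analysis here.
    seen.append((position, word.lower()))

print(words[2])   # direct indexing also works
print(seen)
```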


    --
    Rhodri James *-* Wildebeeste Herder to the Masses
    Rhodri James, Feb 13, 2009
    #16
  17. Spacebar265

    Josh Dukes Guest

    In [401]: import shlex

    In [402]: shlex.split("""Joe went to 'the store' where he bought a "box of chocolates" and stuff.""")
    Out[402]:
    ['Joe',
    'went',
    'to',
    'the store',
    'where',
    'he',
    'bought',
    'a',
    'box of chocolates',
    'and',
    'stuff.']

    how's that work for ya?

    http://docs.python.org/library/shlex.html

    On Tue, 10 Feb 2009 16:46:30 -0600
    Tim Chase <> wrote:

    > >> Or for a slightly less simple minded splitting you could try
    > >> re.split:
    > >>
    > >>>>> re.split("(\w+)", "The quick brown fox jumps, and falls
    > >>>>> over.")[1::2]
    > >> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']

    > >
    > >
    > > Perhaps I'm missing something, but the above regex does the exact
    > > same thing as line.split() except it is significantly slower and
    > > harder to read.
    > >
    > > Neither deal with quoted text, apostrophes, hyphens, punctuation or
    > > any other details of real-world text. That's what I mean by
    > > "simple-minded".

    >
    > >>> s = "The quick brown fox jumps, and falls over."
    > >>> import re
    > >>> re.split(r"(\w+)", s)[1::2]

    > ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
    > >>> s.split()

    > ['The', 'quick', 'brown', 'fox', 'jumps,', 'and', 'falls',
    > 'over.']
    >
    > Note the difference in "jumps" vs. "jumps," (extra comma in the
    > string.split() version) and likewise the period after "over".
    > Thus not quite "the exact same thing as line.split()".
    >
    > I think an easier-to-read variant would be
    >
    > >>> re.findall(r"\w+", s)

    > ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
    >
    > which just finds words. One could also just limit it to letters with
    >
    > re.findall("[a-zA-Z]", s)
    >
    > as "\w" is a little more encompassing (letters and underscores)
    > if that's a problem.
    >
    > -tkc



    --

    Josh Dukes
    MicroVu IT Department
    Josh Dukes, Feb 17, 2009
    #17
  18. Spacebar265

    Tim Chase Guest

    Josh Dukes wrote:
    > In [401]: import shlex
    >
    > In [402]: shlex.split("""Joe went to 'the store' where he bought a "box of chocolates" and stuff.""")
    >
    > how's that work for ya?


    It works great if that's the desired behavior. However, the OP
    wrote about splitting the lines into separate words, not
    "treating quoted items as a single word". (OP: "How would I do
    separate lines into words without scanning one character at a time?")

    But for pulling out quoted strings as units, the shlex is a great
    module.
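
    One caveat worth trying before feeding chat input to shlex: an
    unmatched apostrophe is treated as an unclosed quote. The sketch below
    shows both behaviours; the variable names are arbitrary.

```python
import shlex

# Quoted phrases come back as single items.
quoted = shlex.split('he bought a "box of chocolates" and stuff.')

# A lone apostrophe opens a quote that never closes, so shlex raises.
try:
    shlex.split("don't do that")
    apostrophe_ok = True
except ValueError:
    apostrophe_ok = False

print(quoted)
print(apostrophe_ok)
```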

    -tkc
    Tim Chase, Feb 17, 2009
    #18
  19. Spacebar265

    rzed Guest

    Spacebar265 <> wrote:

    > On Feb 11, 1:06 am, Duncan Booth <>
    > wrote:

    [...]
    >> >>> re.split("(\w+)", "The quick brown fox jumps, and falls
    >> >>> over.")[1::2]

    >>
    >> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls',
    >> 'over']

    >
    > Using this code how would it load each word into a temporary
    > variable.


    >>> import re
    >>> list_name = re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
    >>> list_name[2]
    'brown'

    You see, temporary variables are set. Their names are spelled
    'list_name[x]', where x is an index into the list. If your plan was
    instead to have predefined names of variables, what would they be
    called? How many would you have? With list variables, you will have
    enough, and you will know their names.

    --
    rzed
    rzed, Feb 22, 2009
    #19