Not able to read blank lines and spaces on a small text file

Ruben

Hello.

I am trying to read a small text file using the readline() method. I
can only read the first 2 records from the file. It stops at the blank
lines or at lines with only spaces. I have a while statement checking
for an empty string "", which I understand represents EOF in Python.
The text file has some lines containing only spaces and others that are
completely empty.

Thanks a lot.

Ruben

The following is the text file: The first line begins with OrgID.
OrgID: Joe S. Smith
OrgName: Smith Foundation




OrgID: Ronald K.Jones
OrgName: Jones Foundation

The following is my script:



#Open input file to be processed with READ access
input_file = open("C:/Documents and Settings/ruben/My Documents/Python/text.txt", "r")

empty_string_lines = 0

record = input_file.readline()
while record != "":

    try:

        record = input_file.readline()

        # Split words separated by delimiter(TAB) or separated by spaces
        # into elements of the list "key_value_pair"

        key_value_pair = record.split()

        key = key_value_pair[0]

        # Slice/delete first element of list "key_value_pair"

        value = key_value_pair[1:]

        # Join all elements from the list "value" and add a "blank space" in
        # between elements

        concatenated_value= ' '.join(value)

        print concatenated_value

        if record == "":

            empty_string_lines += 1
            print " Victor Empty string lines = ", empty_string_lines
            break

        # Get values from table

        if key == "OrgID:":
            org_id = value

        elif key == "OrgName:":
            org_name = value

        elif record == ' ':
            print "Blank line"

        elif record == '':
            print "END OF FILE"

        print "RECORD = ", record

    except IndexError:

        break

if record == "":
    print "EOF", record

elif record == '\0':
    print "NULL Characters found"

elif record == "\n":
    print "Newline found"

elif record == " ":
    print "Blank line found"

# Close file

input_file.close()
 
Peter Hickman

Ruben said:
Hello.

I am trying to read a small text file using the readline() method. I
can only read the first 2 records from the file. It stops at the blank
lines or at lines with only spaces. I have a while statement checking
for an empty string "", which I understand represents EOF in Python.
The text file has some lines containing only spaces and others that are
completely empty.

My brain is not really working, but a blank line is not how Python
signals EOF.

Try something like:

input_file = open("C:/Documents and Settings/ruben/My Documents/Python/text.txt", "r")

for line in input_file:
    print line,
 
Pierre Fortin

Ruben said:
I am trying to read a small text file using the readline() method. I
can only read the first 2 records from the file. It stops at the blank
lines or at lines with only spaces. I have a while statement checking
for an empty string "", which I understand represents EOF in Python.
The text file has some lines containing only spaces and others that are
completely empty.

An empty record is not the same as "no record"...

Change this:
while record != "":

to:

while record:
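
For what it's worth, a quick check suggests the two loop conditions behave the same for strings: only the empty string is false, while a line holding just a newline or spaces is still true. A minimal sketch (Python 3 syntax, so print is a function, unlike the Python 2 code in this thread; the sample values are made up for illustration):

```python
# Hypothetical sample values: what readline() could hand back for
# EOF, an empty line, a spaces-only line, and a normal record.
samples = ["", "\n", "   \n", "OrgID: Joe S. Smith\n"]

for record in samples:
    # For strings, both tests agree: only "" counts as false.
    print(repr(record), record != "", bool(record))

# True when the two conditions give the same answer for every sample.
same = all((record != "") == bool(record) for record in samples)
print(same)
```

So `while record:` is the tidier spelling, but on its own it probably won't change where the loop stops.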
 
Peter Hansen

Ruben said:
while record != "":
    try:
        record = input_file.readline()
        key_value_pair = record.split()
        key = key_value_pair[0]
        value = key_value_pair[1:]
        concatenated_value= ' '.join(value)
        if record == "":
            empty_string_lines += 1
            print " Victor Empty string lines = ", empty_string_lines
            break
    except IndexError:
        break

The first time this code reads a line which contains nothing but
whitespace (a bare newline counts), the record.split() call will
return an empty list, and the next line will try to retrieve the
first element from it, raising an IndexError, which will terminate
the loop.
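
That failure can be reproduced without the file at all. A small sketch (Python 3 syntax; `first_word` is a hypothetical helper for illustration, not part of Ruben's script):

```python
def first_word(record):
    """Return the first whitespace-separated word, or None if there is none."""
    words = record.split()   # whitespace-only input gives an empty list
    if not words:
        return None          # avoids words[0] raising IndexError
    return words[0]

print("\n".split())                          # []
print(first_word("OrgID: Joe S. Smith\n"))   # OrgID:
print(first_word("   \n"))                   # None

# Indexing the empty list directly is what ends the original loop.
try:
    "   \n".split()[0]
except IndexError as exc:
    caught = str(exc)
    print("IndexError:", caught)
```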

Your code doesn't seem to be following any of the usual
Python idioms. I suggest starting with the following pattern
instead, and growing the code from there:

input_file = open(...)
try:
    for record in input_file:
        if record.strip() == '': # blank line, ignore
            continue

        # record-processing code follows here
        key_value_pair = record.split()
        key = key_value_pair[0]
        ... etc....

finally:
    input_file.close()


The above pattern will allow the record-processing code
to handle *only* non-blank lines (lines that contain nothing
but whitespace are skipped too), simplifying everything immensely.
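
Here is a runnable sketch of the same pattern, under a few assumptions: Python 3 syntax, an in-memory `io.StringIO` standing in for the real file, and a hypothetical helper name `iter_fields`. With a real file, a `with open(...)` block could take the place of try/finally.

```python
import io

# Stand-in for the text file, including an empty line and a spaces-only line.
SAMPLE = (
    "OrgID: Joe S. Smith\n"
    "OrgName: Smith Foundation\n"
    "\n"
    "   \n"
    "OrgID: Ronald K.Jones\n"
    "OrgName: Jones Foundation\n"
)

def iter_fields(lines):
    """Yield (key, value) for every non-blank line."""
    for record in lines:
        if record.strip() == "":     # empty or whitespace-only: skip
            continue
        parts = record.split()
        yield parts[0], " ".join(parts[1:])

fields = list(iter_fields(io.StringIO(SAMPLE)))
for key, value in fields:
    print(key, value)
```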

-Peter
 
Peter Hansen

Peter said:
My brain is not really working, but a blank line is not how Python
signals EOF.

Actually, it is when using things like .readline(): an empty string
does mean EOF there, because a blank line in the file still comes back
with its newline \n at the end...
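
This is easy to confirm with an in-memory file. A minimal sketch (Python 3 syntax, `io.StringIO` standing in for the real file; the sample text is invented):

```python
import io

f = io.StringIO("first\n\n   \nlast\n")

seen = []
record = f.readline()
while record != "":          # "" is returned only at end of file
    seen.append(record)
    record = f.readline()    # read the next line at the bottom of the loop

print(seen)                  # ['first\n', '\n', '   \n', 'last\n']

# Blank lines are the ones that are empty once whitespace is stripped.
blank = sum(1 for r in seen if r.strip() == "")
print(blank)                 # 2
```

Reading the next line at the bottom of the loop also means the first record gets processed; the original script reads once before the loop and again at the top of it, so the first line is never handled.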

-Peter
 
Larry Bates

I think you were trying to make this a little harder
than it actually is. Try this:

#Open input file to be processed with READ access
input_file = open("C:/pytest.txt", "r")
empty_string_lines=0
#
# Use the fact that you can iterate over a file
# and you don't have to call readline or worry
# with EOF issues. input_file will return a single
# record each time through the loop and fall out
# at EOF.
#
for record in input_file:
    record=record.strip() # Strip surrounding whitespace, including the trailing "\n"
    if not record:
        empty_string_lines+=1
        print "Blank line"
        continue

    #
    # Split words separated by delimiter(TAB) into elements key, value.
    # Limit split to first tab so the value is left intact.
    #
    try:
        key, value=record.split('\t',1)
    except ValueError: # no tab in the record, so it can't be unpacked
        print "Bad record skipped, record=", record
        print "records must be of format key:<tab>value"
        continue

    print 'key=', key,' value=',value
    #
    # Get values from table
    #
    if key == "OrgID:": org_id = value
    elif key == "OrgName:": org_name = value
    #
    # Do something else with the values here
    #

# Close file
print "END OF FILE, empty_string_lines=", empty_string_lines
input_file.close()

Larry Bates
Syscon, Inc.
Carlos Ribeiro

Ruben

I hope you don't mind what I'm going to say. Your current solution is
a bit confusing, and there are better idioms in Python to solve your
problem. It seems that you either tried to port a program written in
another language such as C, or wrote this one with a C-like mindset.
There are several reasons why we here like Python, but writing C-like
code is not one of them. That said, I have to warn you that I'm not a
Python expert (there are a few terrific ones around here) and that my
opinion here is given as non-authoritative advice. Be warned :)

To loop over the lines in a text file, use the following snippet:

input_file = open("C:\\work\\readlines.txt", "r")
for line in input_file.readlines():
    print "[",line,"]"

There is no need to do a loop like you did. The loop above will handle
all conditions: EOF, empty files, and so on. Now, in order to process
your lines, you need to write something like a state machine. It's
easier done than said. You just have to read line by line, checking
what you have read, and building the complete record as you go. Try
this -- it's heavily commented, but it's very short:

import string

input_file = open("C:\\work\\readlines.txt", "r")

for line in input_file.readlines():
    # line may still have the \n line ending marker -- trim it
    # it will also remove any extraneous blank space. it's
    # not actually mandatory, but helps a little bit if you
    # need to print the line and analyze it.
    line = line.strip()

    # we'll use the split function here because it's simpler
    # you can also use regular expressions here, but it's
    # slightly more difficult to read first time. Let's keep
    # it simple. maxsplit is a keyword parameter that tells
    # split to stop after finding the first splitting
    # position.
    try:
        field_name, field_value = string.split(line, maxsplit=1)
    except ValueError:
        # if it can't properly split the line in two, it's
        # either an invalid record or a blank line. Just
        # skip it and continue
        continue

    if field_name == "OrgID:":
        record_id = field_value
    if field_name == "OrgName:":
        record_value = field_value
        # assuming that this is the last value read,
        # print the whole record
        print record_id, "-", record_value

input_file.close()

The result is:

Joe S. Smith - Smith Foundation
Ronald K.Jones - Jones Foundation

Please note that I purposefully avoided defining new classes here or
using other common Python constructs. The solution could be much more
elegantly written than this, but I still hope to have helped you.
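
As one possible next step, the same loop can collect each organisation into a dictionary instead of printing it straight away. This is only a sketch under assumptions: Python 3 syntax, an OrgID: line always starts a new record, `io.StringIO` stands in for the file, and `parse_orgs` and the "id"/"name" keys are made-up names.

```python
import io

def parse_orgs(lines):
    """Group OrgID:/OrgName: lines into a list of dicts."""
    records = []
    for line in lines:
        parts = line.split(None, 1)   # at most two pieces: field name, rest
        if len(parts) != 2:
            continue                  # blank or malformed line
        name, value = parts[0], parts[1].strip()
        if name == "OrgID:":
            records.append({"id": value})      # start a new record
        elif name == "OrgName:" and records:
            records[-1]["name"] = value        # complete the current one
    return records

sample = io.StringIO(
    "OrgID: Joe S. Smith\nOrgName: Smith Foundation\n\n  \n"
    "OrgID: Ronald K.Jones\nOrgName: Jones Foundation\n"
)
orgs = parse_orgs(sample)
for org in orgs:
    print(org["id"], "-", org.get("name", ""))
```

Keeping the records in a list makes it easier to do something else with them later than printing inside the loop does.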

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
 
