Reading newlines from a text file

Thomas Philips

I have a data file that I read with readline(), and would like to
control the formats of the lines when they are printed. I have tried
inserting escape sequences into the data file, but am having trouble
getting them to work as I think they should. For example, if my data
file has only one line which reads:
1\n234\n567

I would like to read it with a command of the form
x=datafile.readline()

and I would like
print x

to give me
1
234
567

The readline works like a charm, but the print gives me
1\n234\n567
Clearly the linefeeds are not interpreted as such. How can I get the
Python interpreter to correctly interpret escape sequences in strings
that are read from files?

Thomas Philips
 
Peter Otten

I see you are using "correct" as a synonym for "the way I want" :)
The Python interpreter does not interpret data read from a file, and you
would be in serious trouble if it did. However, you can process the data in
any way you like, and here's how to replace C-style escape sequences with
the corresponding characters:
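>>> file("tmp.txt", "w").write("\\ntoerichte\\nlogik\\nboeser\\nkobold\n")
>>> for line in file("tmp.txt"):
...     print line
...
\ntoerichte\nlogik\nboeser\nkobold

>>> import codecs
>>> for line in codecs.open("tmp.txt", "r", "string_escape"):
...     print line
...

toerichte
logik
boeser
kobold
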
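
For readers on Python 3, where the "string_escape" codec no longer exists, the same idea can be sketched roughly as follows. This is only an illustration under stated assumptions: decode_escapes is a hypothetical helper name, the file is created in a temporary directory for the demo, and the data is assumed to be ASCII or Latin-1 (other characters would make the encode step fail).

```python
import os
import tempfile


def decode_escapes(text):
    """Replace backslash escapes such as \\n or \\t with the characters they name.

    Hypothetical helper: it assumes the text holds only characters below
    U+0100, because latin-1 maps each of those to exactly one byte.
    """
    return text.encode("latin-1").decode("unicode_escape")


# Create a demo file holding one physical line with literal backslash-n pairs.
path = os.path.join(tempfile.mkdtemp(), "tmp.txt")
with open(path, "w") as f:
    f.write("1\\n234\\n567\n")

with open(path) as f:
    raw = f.readline()      # the backslashes are still ordinary characters here

decoded = decode_escapes(raw)
print(decoded, end="")      # the line already ends with a real newline
```

The decoding is an explicit step applied to data after it has been read; nothing in open() or readline() does it for you.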

Peter
 
John Roth

You've got a couple of misconceptions. If you use a standard
open(<file>, "rt"), Python will only recognize end of line sequences
for your system. Windows, unices and the Mac all have different
conventions, so if you're on Windows and your file has unix or mac
newline sequences, they won't be recognized. They'll come in on
one readline().

The second misconception seems to be that Python will strip
newlines when it reads in line mode. It doesn't. It does convert
the OS-dependent newline sequences to a standard \n, but that's
all. Each line read still has a newline at the end (except the last one,
if it was missing in the file).

Print, on the other hand, is going to add a newline regardless of
whether one exists in the data.
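
A small sketch of that behaviour, written for Python 3 (where print is a function) and using io.StringIO as a stand-in for an open text file; the sample data is made up for the illustration:

```python
import io

# Stand-in for a text file whose last line has no trailing newline.
data = io.StringIO("first\nsecond\nlast")

lines = []
while True:
    line = data.readline()
    if not line:            # readline() returns "" only at end of file
        break
    lines.append(line)

# Every line except the unterminated last one still carries its "\n".
print(lines)

# Strip the newline before printing so print() does not double it.
stripped = [line.rstrip("\n") for line in lines]
for line in stripped:
    print(line)
```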

You can solve the first problem by adding a "U" somewhere in
the open/file call. I'm not sure where, check the docs. To solve
the second problem, you need to do one of several things:

1) if you want to use print, strip the newline from the string
before writing it. The print statement will add it to the end.

2) if all you want is line endings for your system, open with
"wt" and put a \n at the end of each line. Python will take care
of the rest.

3) if you want line endings for a different system, open the file
as 'wb' and insert the proper sequence at the end yourself.
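
As a rough Python 3 illustration of these options (not the Python 2 calls discussed above): text mode there translates all three newline conventions on input by default, and the newline argument of open() selects the ending written on output. The file name and contents below are invented for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "endings.txt")

# Write raw bytes that mix Unix, Windows and old Mac line endings.
with open(path, "wb") as f:
    f.write(b"unix\nwindows\r\nmac\rend")

# Default text mode (newline=None) translates every ending to "\n" on read.
with open(path, "r") as f:
    lines = f.read().split("\n")

# Ask for Windows endings explicitly, whatever the current platform is.
with open(path, "w", newline="\r\n") as f:
    for line in lines:
        f.write(line + "\n")

# Binary mode shows exactly which bytes ended up on disk.
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```

Opening with 'wb' and writing the byte sequence yourself, as in option 3, gives the same result without relying on any translation.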

HTH
John Roth
 
Thomas Philips

Peter,
Very neat. That is exactly what I want it to do. On looking through
the help for open() (or its replacement, file()), I did not see a
reference to a "string_escape" option. Where is this documented (in
language comprehensible to a newbie)?

Also, I notice that you have two backslashes before each n in
"\\ntoerichte\\nlogik\\nboeser\\nkobold\n". Why is there a need for 2
backslashes -would not one have done the trick?


Thomas Philips
 
Diez B. Roggisch

Also, I notice that you have two backslashes before each n in
"\\ntoerichte\\nlogik\\nboeser\\nkobold\n". Why is there a need for 2
backslashes -would not one have done the trick?

Two backslashes escape the backslash. Otherwise the text in tmp.txt would
have contained _newlines_, not the \n escape sequence you wanted. Test
this:

>>> print "foo\nbar"
foo
bar
>>> print "foo\\nbar"
foo\nbar

And the encoding argument is in codecs.open, not in builtin open - so you'll
find it documented in the codecs-module.
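
The difference between the two literals can also be checked by counting characters. A minimal Python 3 sketch; the variable names are just for illustration:

```python
escaped = "foo\nbar"     # \n in the source becomes one newline character
literal = "foo\\nbar"    # \\ becomes one backslash, followed by a plain "n"
raw = r"foo\nbar"        # a raw string keeps the backslash as written

print(len(escaped))      # 7 characters, one of them a newline
print(len(literal))      # 8 characters, including a backslash and an "n"
print(literal == raw)    # both spell backslash followed by "n"
print(repr(literal))     # repr() shows the string the way you would type it
```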
 
Peter Otten

Thomas said:
Very neat. That is exactly what I want it to do.

After reading John Roth's post I'm no longer sure. If you want multiple
lines from a file you don't need escape sequences, just use

s = file("tmp.txt", "U").read()

to put it all in a single string. The codec hack is worth considering only
if you want to interpret each line of a file as a multi-line string. To
get the best solution, it would help if you posted the concrete problem
you are trying to solve.
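
In Python 3 the file() builtin is gone and universal newlines are the default in text mode, so the equivalent would look roughly like this. The temporary file and its contents are assumptions made for the demo:

```python
import os
import tempfile

# Demo file with three real lines, not escape sequences.
path = os.path.join(tempfile.mkdtemp(), "tmp.txt")
with open(path, "w") as f:
    f.write("1\n234\n567\n")

# Read everything into a single string; the newlines come along as data.
with open(path) as f:
    s = f.read()

print(s, end="")         # prints the three lines as they appear in the file

# splitlines() gives the individual rows without their newlines.
rows = s.splitlines()
print(rows)
```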

Thomas said:
On looking through the help for open() (or its replacement, file()), I did
not see a reference to a "string_escape" option. Where is this documented
(in language comprehensible to a newbie)?

The documentation is here (and on the neighbouring pages):

http://www.python.org/doc/current/lib/node127.html

Codecs are probably not the first thing you'll want to learn about Python,
though.

Thomas said:
Also, I notice that you have two backslashes before each n in
"\\ntoerichte\\nlogik\\nboeser\\nkobold\n". Why is there a need for 2
backslashes -would not one have done the trick?

I think Diez has already explained that. Another way to find out is to run
my example and then look into tmp.txt. There are actually two lines. Can
you find out why? Hint: the last \n is different.

Peter
 
