Reading files, splitting on a delimiter and newlines.

  • Thread starter Bruno Desthuilliers

Bruno Desthuilliers

(e-mail address removed) wrote:
Hello,

I have a situation where I have a file that contains text similar to:

myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3

My first approach was to open the file, use readlines to split the
lines on the "=" delimiter into a key/value pair (to be stored in a
dict).

After processing a couple of files I noticed it's possible that a newline
can be present in the value, as shown in myValue2.

In this case it's not an option to, say, remove the newlines if it's a
"multi line" value, as the value data needs to stay intact.

I'm a bit confused as to how to go about getting this to work.

Any suggestions on an approach would be greatly appreciated!

data = {}
key = None
for line in open('yourfile.txt'):
    line = line.strip()
    if not line:
        # skip empty lines
        continue
    if '=' in line:
        key, value = map(str.strip, line.split('=', 1))
        data[key] = value
    elif key is None:
        # first line without a '='
        raise ValueError("invalid format")
    else:
        # multiline
        data[key] += "\n" + line


print data
=> {'myValue3': 'contents of value3', 'myValue2': 'contents of value2 but\nwith a new line here', 'myValue1': 'contents of value1'}

HTH
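
The loop above is written for Python 2 (note the print statement). A minimal sketch of the same logic for Python 3 might look like the following; the function name parse_pairs and the inline sample are only illustrative, not part of the original post.

```python
def parse_pairs(lines):
    """Collect 'key = value' pairs; a line without '=' continues the previous value."""
    data = {}
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            # skip empty lines
            continue
        if "=" in line:
            # split on the first '=' only, so the value may contain '='
            key, value = (part.strip() for part in line.split("=", 1))
            data[key] = value
        elif key is None:
            # first line without a '='
            raise ValueError("invalid format")
        else:
            # continuation of a multi-line value
            data[key] += "\n" + line
    return data


sample = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""
result = parse_pairs(sample.splitlines())
print(result)
```

Passing an open file object instead of sample.splitlines() should work the same way, since the function only iterates over lines.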
 
Bruno Desthuilliers

(e-mail address removed) wrote:
Check the length of the list returned from split; this allows
you to append to the previously extracted value if need be.

import StringIO
import pprint

buf = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""

mockfile = StringIO.StringIO(buf)

record=dict()

for line in mockfile:
    kvpair = line.split('=', 2)

You want:
    kvpair = line.split('=', 1)
With a maxsplit of 2, a line containing two '=' signs is split into three
parts, for example:
['x ', ' 42 ', ' 33']

    if len(kvpair) == 2:
        key, value = kvpair
        record[key] = value
    else:
        record[key] += line

Also, this won't handle the case where the first line doesn't contain an
'=' (NameError: name 'key' is not defined).
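
To make both remarks concrete, here is a small Python 3 sketch. The string "x = 42 = 33" and the function name collect are assumptions chosen for illustration; the guard raises ValueError instead of letting the loop fail with a NameError.

```python
line = "x = 42 = 33"
two_splits = line.split("=", 2)  # up to three parts
one_split = line.split("=", 1)   # at most two parts: key and the rest
print(two_splits)  # ['x ', ' 42 ', ' 33']
print(one_split)   # ['x ', ' 42 = 33']


def collect(lines):
    """Same loop as above, with maxsplit=1 and a guard for a leading line without '='."""
    record = {}
    key = None
    for line in lines:
        kvpair = line.split("=", 1)
        if len(kvpair) == 2:
            key, value = kvpair
            record[key] = value
        elif key is None:
            # no key seen yet, so there is nothing to append to
            raise ValueError("first line does not contain '='")
        else:
            record[key] += line
    return record
```

As in the quoted code, keys and values are not stripped here, so they keep their surrounding whitespace and trailing newlines.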
 
chrispwd

Hello,

I have a situation where I have a file that contains text similar to:

myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3

My first approach was to open the file, use readlines to split the
lines on the "=" delimiter into a key/value pair (to be stored in a
dict).

After processing a couple of files I noticed it's possible that a newline
can be present in the value, as shown in myValue2.

In this case it's not an option to, say, remove the newlines if it's a
"multi line" value, as the value data needs to stay intact.

I'm a bit confused as to how to go about getting this to work.

Any suggestions on an approach would be greatly appreciated!
 
kyosohma

I'm confused. You don't want the newline to be present, but you can't
remove it because the data has to stay intact? If you don't want to
change it, then what's the problem?

Mike
 
Stargaming

It's obvious that simple line-by-line filtering won't handle multi-line
statements.

You could solve that by saving the last item you added something to and,
if the line currently being handled doesn't look like an assignment, appending
it to this item. You might run into problems with data such as:

foo = modern maths
proved that 1 = 1
bar = single

If your dataset always has indentation on subsequent lines, you might use
this. Or if the key's name is always just one word.

HTH,
Stargaming
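
A minimal Python 3 sketch of the indentation idea, assuming continuation lines always start with a space or tab (which the original poster's sample data does not show). The function name parse_indented is hypothetical.

```python
def parse_indented(lines):
    """Treat lines that start with whitespace as continuations of the previous value."""
    data = {}
    key = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # skip empty lines
            continue
        if line[0] in " \t":
            # indented line: belongs to the previous key, even if it contains '='
            if key is None:
                raise ValueError("continuation line before any key")
            data[key] += "\n" + line.strip()
        else:
            name, sep, value = line.partition("=")
            if not sep:
                raise ValueError("expected 'key = value', got %r" % line)
            key = name.strip()
            data[key] = value.strip()
    return data


sample = [
    "foo = modern maths\n",
    "    proved that 1 = 1\n",
    "bar = single\n",
]
parsed = parse_indented(sample)
print(parsed)
```

With this rule the '=' in the indented line is never mistaken for a new assignment, but it only helps if the files really are indented that way.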
 
John Machin

My take: all of the above, plus: Given that you want to extract stuff
of the form <LHS> = <RHS> I'd suggest developing a fairly precise
regular expression for LHS, maybe even for RHS, and trying this on as
many of these files as you can.

Why an RE for RHS? Consider:

foo = somebody said "I think that
REs = trouble
maybe_better = pyparsing"

:)
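
As a rough sketch of the regular-expression idea in Python 3, assuming the LHS is always a single word made of letters, digits and underscores (the pattern KEY_LINE and the function name parse_with_regex are illustrative, not taken from any real file format):

```python
import re

# LHS: one word at the start of the line; RHS: everything after the first '='
KEY_LINE = re.compile(r"^(\w+)\s*=\s*(.*)$")


def parse_with_regex(lines):
    """Start a new key only when a line matches KEY_LINE; otherwise append to the last value."""
    data = {}
    key = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # skip empty lines
            continue
        match = KEY_LINE.match(line)
        if match:
            key, value = match.groups()
            data[key] = value
        elif key is None:
            raise ValueError("first line is not of the form <LHS> = <RHS>")
        else:
            data[key] += "\n" + line
    return data


tricky = ["foo = modern maths", "proved that 1 = 1", "bar = single"]
parsed = parse_with_regex(tricky)
print(parsed)
```

Here "proved that 1 = 1" is kept as part of foo because its left-hand side is not a single word. The quoted example above is still misread by this sketch: "REs = trouble" looks like a valid assignment, which is why an RE (or a real parser) for the RHS may also be needed.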
 
attn.steven.kuo

Check the length of the list returned from split; this allows
you to append to the previously extracted value if need be.

import StringIO
import pprint

buf = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""

mockfile = StringIO.StringIO(buf)

record=dict()

for line in mockfile:
    kvpair = line.split('=', 2)
    if len(kvpair) == 2:
        key, value = kvpair
        record[key] = value
    else:
        record[key] += line

pprint.pprint(record)

# lstrip() to remove newlines if needed ...
 
chrispwd

Great, thank you! That was the logic I was looking for.
 
Hendrik van Rooyen

I'm confused. You don't want the newline to be present, but you can't
remove it because the data has to stay intact? If you don't want to
change it, then what's the problem?

I think the OP's trouble is that the value he wants gets split up by the
newline at the end of the line when he uses readline().

One can try adding the single value to the previous value in the previous
key/value pair when the split does not yield two values - a bit hackish,
but given structured input data it might work.

- Hendrik
 