Stripping non-numbers from a file parse without nested lists?

daku9999

There has got to be a better way of doing this:

I'm reading in a file that has a lot of garbage, but eventually has
something that looks similar to:
(some lines of garbage)
dip/dir.
(some more lines of garbage)
55/158
(some more lines of garbage)
33/156
etc.

and I'm stripping out the 55/158 values (with error checking
removed):
------
def read_data(filename):
    fh = open(filename, "r", encoding="ascii")

    for line in fh:
        for word in line.lower().split():
            if "/" in word and "dip" not in word:
                temp = word.partition("/")
                dip.append(temp[0])
                dir.append(temp[2])
-----

I can't figure out a nicer way of doing it without turning the thing
into a nested list (non-ideal). I could put the entire tuple inside
of a list, but that gets ugly with retrieval. I'm sure there is an
easier way to store this. I was having trouble with dictionaries due
to non-unique keys when I tried that route.

Any ideas for a better way to store it? This ultimately parses a
giant amount of data (ascii dxf's) and spits the information into a
csv, and I find the writing of nested lists cumbersome and I'm sure
I'm missing something as I'm quite new to Python.

Thanks.
 
daku9999

On Mar 31, 6:47 pm, "Rhodri James" <[email protected]> wrote:

What you're doing (pace error checking) seems fine for the data
structures that you're using.  I'm not entirely clear what your usage
pattern for "dip" and "dir" is once you've got them, so I can't say
whether there's a more appropriate shape for them.  I am a bit curious
though as to why a nested list is non-ideal?

...
     if "/" in word and "dip" not in word:
        dip_n_dir.append(word.split("/", 1))

is marginally shorter, and has the virtue of making it harder to use
unrelated dip and dir values together.
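
A pair list like that can also be split back into separate sequences
whenever they're needed; a small sketch with made-up values:

```python
# dip_n_dir as built above: a list of [dip, dir] pairs
dip_n_dir = [["55", "158"], ["33", "156"]]

# zip(*pairs) transposes the list of pairs into one tuple per column,
# e.g. for writing separate CSV columns later
dips, dirs = zip(*dip_n_dir)
print(dips)  # ('55', '33')
print(dirs)  # ('158', '156')
```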

Rhodri,

Thanks. That works better than what I had before and I learned a new
method of parsing what I was looking for.

Now I'm on to jumping a set number of lines from a given positive
search match:

....(lines of garbage)...
5656 (or some other value I want, but don't explicitly know)
....(18 lines of garbage)...
search object
....(lines of garbage)...

I've tried:

def read_poles(filename):
    index = 0
    fh = None
    try:
        fh = open(filename, "r")
        lines = fh.readlines()
        while True:
            if "search object" in lines[index]:
                poles = int(lines[index - 18])
                print(poles)
            index += 1
    except IndexError:
        pass
    finally:
        if fh is not None:  # close file
            fh.close()

------------------

Which half works. If it's not found, IndexError is caught and passed
(avoids quitting when lines[index] goes out of range). The print(poles)
properly displays the value I am looking for (_always_ 18 lines before
the search object).

However, since it is assigned using the index variable, the value of
poles doesn't keep (poles is always zero when referenced outside of
the read_poles function). I'm assuming it's because I'm pointing to a
certain position of an object and once index moves on, it no longer
points to anything valid. My Python book suggested using
copy.deepcopy, but that didn't get around the fact that I am calling
it on (index-18).

Any experience jumping back (or forward) a set number of lines once a
search object is found? This is the only way I can think of doing it
and it clearly has some problems.

Reading the file line by line using for line in blah works for finding
the search object, but I can't see a way of going back the 18 lines to
grab what I need.

Thanks for the help! I'm slowly getting this mangled mess of a file
into something automated (hand investigating the several thousand
files I need to do would be unpleasant).
 
Lorenzo

Maybe you can try a regex, something like

------
import re
pattern = re.compile(r'^(\d+)/(\d+)')

def read_data(filename):
    fh = open(filename, "r", encoding="ascii")

    for line in fh:
        m = pattern.match(line)
        if m:
            dip_, dir_ = m.groups()
            dip.append(dip_)
            dir.append(dir_)
-----
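
The same pattern could also collect every dip/dir pair in one pass with
findall over the whole file text; an untested sketch along those lines
(the sample text is made up):

```python
import re

# re.MULTILINE makes ^ match at the start of every line, and findall
# returns a list of (dip, dir) group tuples, one per matching line
pattern = re.compile(r'^(\d+)/(\d+)', re.MULTILINE)
text = "garbage\n55/158\nmore garbage\n33/156\n"
pairs = pattern.findall(text)
print(pairs)  # [('55', '158'), ('33', '156')]
```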
 
jay logan

# You could try using a deque holding the previous lines and search
# using that deque. This is untested, but here's a try (>= Python 3.0)
from collections import deque
import itertools as it
import sys


def read_poles(filename):
    with open(filename) as f:
        line_iter = iter(f)
        # hold the current line plus the 18 lines before it
        d = deque(it.islice(line_iter, 18), maxlen=19)

        for line in line_iter:
            d.append(line)

            if 'search object' in line:
                poles = int(d[0])  # the line 18 back from the match
                print(poles)
                return poles
        else:
            print('No poles found in', filename, file=sys.stderr)
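
The maxlen trick is what makes this work: once the deque is full, each
append silently drops the oldest item off the left end, so d[0] always
trails the newest item by a fixed distance. A tiny illustration:

```python
from collections import deque

d = deque([10, 20], maxlen=3)
d.append(30)   # full now: deque([10, 20, 30])
d.append(40)   # 10 falls off the left: deque([20, 30, 40])
print(d[0])    # 20 -- always two appends behind the newest item
```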
 
jay logan

Notice that I returned the "pole" from the function so you could catch
the return value as follows:
pole = read_poles(filename)

if pole is None:
    pass  # no poles found
else:
    print('Function returned this pole:', pole)

If you need a list of poles, then return a list:


def read_poles(filename):
    all_poles = []
    with open(filename) as f:
        line_iter = iter(f)
        d = deque(it.islice(line_iter, 18), maxlen=19)

        for line in line_iter:
            d.append(line)

            if 'search object' in line:
                all_poles.append(int(d[0]))
    return all_poles


....
poles = read_poles(filename)

if poles:
    print('Here are the poles:\n', '\n'.join(map(str, poles)))
else:
    print('There were no poles found in', filename)
 
daku9999


I think I found an easier (if possibly uglier way) of doing it:

for filenames in files.split():
    fh = None
    try:
        fh = open(filenames.replace("/", "\\"), "r")
        lines = fh.readlines()
    except IOError as err:
        print(filenames, err)
    finally:
        if fh is not None:
            fh.close()
    print(read_poles4(lines))

.... which opens my file (always < 10 megs) into the list lines

def read_poles4(lines):
    try:
        poles = lines[lines.index("Poles Plotted\n") - 18].rstrip()
        return poles
    except ValueError as err:
        return err

....

Seems like the simpler solution, at least for small files where I can
hold the entire thing in memory.
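
If a file ever contains more than one "Poles Plotted" marker, list.index
also accepts a start position, so the same approach extends to collecting
every occurrence; an untested sketch (marker text and 18-line offset
assumed as above):

```python
def read_all_poles(lines, marker="Poles Plotted\n", offset=18):
    # Walk through every occurrence of the marker, grabbing the line
    # a fixed number of positions before each one
    poles = []
    start = 0
    while True:
        try:
            i = lines.index(marker, start)
        except ValueError:
            break  # no more markers in the list
        poles.append(lines[i - offset].rstrip())
        start = i + 1
    return poles
```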

Thanks!
 
