Stripping non-numbers from a file parse without nested lists?

daku9999

There has got to be a better way of doing this:

I'm reading in a file that has a lot of garbage, but eventually has
something that looks similar to:
(some lines of garbage)
dip/dir.
(some more lines of garbage)
55/158
(some more lines of garbage)
33/156
etc.

and I'm stripping out the 55/158 values (with error checking
removed):
------
def read_data(filename):
    fh = open(filename, "r", encoding="ascii")

    for line in fh:
        for word in line.lower().split():
            if "/" in word and "dip" not in word:
                temp = word.partition("/")
                dip.append(temp[0])
                dir.append(temp[2])
-----

I can't figure out a nicer way of doing it without turning the thing
into a nested list (non-ideal). I could put the entire tuple inside
of a list, but that gets ugly with retrieval. I'm sure there is an
easier way to store this. I was having trouble with dictionaries due
to non-unique keys when I tried that route.

Any ideas for a better way to store it? This ultimately parses a
giant amount of data (ascii dxf's) and spits the information into a
csv, and I find the writing of nested lists cumbersome and I'm sure
I'm missing something as I'm quite new to Python.

Thanks.
 
daku9999

On Mar 31, 6:47 pm, "Rhodri James" <[email protected]> wrote:

What you're doing (pace error checking) seems fine for the data
structures that you're using.  I'm not entirely clear what your usage
pattern for "dip" and "dir" is once you've got them, so I can't say
whether there's a more appropriate shape for them.  I am a bit curious
though as to why a nested list is non-ideal?

...
     if "/" in word and "dip" not in word:
        dip_n_dir.append(word.split("/", 1))

is marginally shorter, and has the virtue of making it harder to use
unrelated dip and dir values together.
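
A pair list like that can also be split back into separate sequences
whenever they're needed; a small sketch with made-up values:

```python
# dip_n_dir as built above: a list of [dip, dir] pairs
dip_n_dir = [["55", "158"], ["33", "156"]]

# zip(*pairs) transposes the list of pairs into one tuple per column,
# e.g. for writing separate CSV columns later
dips, dirs = zip(*dip_n_dir)
print(dips)  # ('55', '33')
print(dirs)  # ('158', '156')
```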

Rhodri,

Thanks. That works better than what I had before and I learned a new
method of parsing what I was looking for.

Now I'm on to jumping a set number of lines from a given positive
search match:

....(lines of garbage)...
5656 (or some other value I want, but don't explicitly know)
....(18 lines of garbage)...
search object
....(lines of garbage)...

I've tried:

def read_poles(filename):
    index = 0
    fh = None
    try:
        fh = open(filename, "r")
        lines = fh.readlines()
        while True:
            if "search object" in lines[index]:
                poles = int(lines[index - 18])
                print(poles)
            index += 1
    except IndexError:
        pass
    finally:
        if fh is not None:  # close file
            fh.close()

------------------

Which half works. If it's not found, IndexError is caught and passed
(avoids quitting when lines[index] goes out of range). The print(poles)
properly displays the value I am looking for (_always_ 18 lines before
the search object).

However, since it is assigned using the index variable, the value of
poles doesn't keep (poles is always zero when referenced outside of
the read_poles function). I'm assuming it's because I'm pointing to a
certain position of an object and once index moves on, it no longer
points to anything valid. My Python book suggested using
copy.deepcopy, but that didn't get around the fact that I am calling
it on (index-18).

Any experience jumping back (or forward) a set number of lines once a
search object is found? This is the only way I can think of doing it
and it clearly has some problems.

Reading the file line by line using for line in blah works for finding
the search object, but I can't see a way of going back the 18 lines to
grab what I need.

Thanks for the help! I'm slowly getting this mangled mess of a file
into something automated (hand investigating the several thousand
files I need to do would be unpleasant).
 
Lorenzo

Maybe you can try a regex, something like

------
import re
pattern = re.compile(r'^(\d+)/(\d+)')

def read_data(filename):
    fh = open(filename, "r", encoding="ascii")

    for line in fh:
        m = pattern.match(line)
        if m:
            dip_, dir_ = m.groups()
            dip.append(dip_)
            dir.append(dir_)
-----
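
The same pattern could also collect every dip/dir pair in one pass with
findall over the whole file text; an untested sketch along those lines
(the sample text is made up):

```python
import re

# re.MULTILINE makes ^ match at the start of every line, and findall
# returns a list of (dip, dir) group tuples, one per matching line
pattern = re.compile(r'^(\d+)/(\d+)', re.MULTILINE)
text = "garbage\n55/158\nmore garbage\n33/156\n"
pairs = pattern.findall(text)
print(pairs)  # [('55', '158'), ('33', '156')]
```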
 
jay logan

# You could try using a deque holding the previous lines and search
# using that deque. This is untested, but here's a try (>= Python 3.0)
from collections import deque
import itertools as it
import sys


def read_poles(filename):
    with open(filename) as f:
        line_iter = iter(f)
        # hold the current line plus the 18 lines before it
        d = deque(it.islice(line_iter, 18), maxlen=19)

        for line in line_iter:
            d.append(line)

            if 'search object' in line:
                poles = int(d[0])  # the line 18 back from the match
                print(poles)
                return poles
        else:
            print('No poles found in', filename, file=sys.stderr)
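
The maxlen trick is what makes this work: once the deque is full, each
append silently drops the oldest item off the left end, so d[0] always
trails the newest item by a fixed distance. A tiny illustration:

```python
from collections import deque

d = deque([10, 20], maxlen=3)
d.append(30)   # full now: deque([10, 20, 30])
d.append(40)   # 10 falls off the left: deque([20, 30, 40])
print(d[0])    # 20 -- always two appends behind the newest item
```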
 
jay logan

Notice that I returned the "pole" from the function so you could catch
the return value as follows:
pole = read_poles(filename)

if pole is None:
    pass  # no poles found
else:
    print('Function returned this pole:', pole)

If you need a list of poles, then return a list:


def read_poles(filename):
    all_poles = []
    with open(filename) as f:
        line_iter = iter(f)
        d = deque(it.islice(line_iter, 18), maxlen=19)

        for line in line_iter:
            d.append(line)

            if 'search object' in line:
                all_poles.append(int(d[0]))
    return all_poles


....
poles = read_poles(filename)

if poles:
    print('Here are the poles:\n', '\n'.join(map(str, poles)))
else:
    print('There were no poles found in', filename)
 
daku9999


I think I found an easier (if possibly uglier way) of doing it:

for filenames in files.split():
    fh = None
    try:
        fh = open(filenames.replace("/", "\\"), "r")
        lines = fh.readlines()
    except IOError as err:
        print(filenames, err)
    finally:
        if fh is not None:
            fh.close()
    print(read_poles4(lines))

.... which opens my file (always < 10 megs) into the list lines

def read_poles4(lines):
    try:
        poles = lines[lines.index("Poles Plotted\n") - 18].rstrip()
        return poles
    except ValueError as err:
        return err

....

Seems like the simpler solution, at least for small files where I can
hold the entire thing in memory.
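
If a file ever contains more than one "Poles Plotted" marker, list.index
also accepts a start position, so the same approach extends to collecting
every occurrence; an untested sketch (marker text and 18-line offset
assumed as above):

```python
def read_all_poles(lines, marker="Poles Plotted\n", offset=18):
    # Walk through every occurrence of the marker, grabbing the line
    # a fixed number of positions before each one
    poles = []
    start = 0
    while True:
        try:
            i = lines.index(marker, start)
        except ValueError:
            break  # no more markers in the list
        poles.append(lines[i - offset].rstrip())
        start = i + 1
    return poles
```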

Thanks!
 
