Help on implementing a list of dicts with no data pattern

rlelis

Hi guys,

I'm working on a long file, where I have to keep reading and storing different excerpts of text (data) in different variables (lists).

Once that is done, I want to store the data from those lists in dicts. I want them in a list of dicts for later RDBMS purposes.

The data I'm working with doesn't have a fixed pattern (see the example below), so for each row I want to store word/value combinations (key-value pairs) to keep track of all the data.

My problem is this: I iterate over the list (the original one, a.k.a. file_content in the link) and nest several if clauses to match the keys I want. Then I give the selected keys their values, and lastly I append that dict to a new list. I always end up with the last line repeated several times, once for each row it finds.

Please take a look at what I have now:
http://pastebin.com/A9eka7p9
 
Dave Angel


Please read this http://wiki.python.org/moin/GoogleGroupsPython.

If you want a response here, please post the code here. If it's over 50
lines, simplify it first. And be sure and tell us what version of
Python and what other dependencies you might have. And what OS you're
targeting.
 
MRAB

You're creating a dict for highway_dict and a dict for aging_dict, and
then using those dicts for every iteration of the 'for' loop.

You're also appending both of the dicts onto the 'queue_row' list for
every iteration of the 'for' loop.

I think that what you meant to do was, for each match, to create a
dict, populate it, and then append it to the list.
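
A minimal sketch of that difference, using made-up rows rather than the pastebin code: reusing one dict appends the same object every time, so every list entry shows the last row, while creating the dict inside the loop gives one independent dict per row. The names `rows`, `shared`, `repeated`, and `fresh` are illustrative only.

```python
rows = [["aging", "0", "100"], ["aging", "2", "115"]]

# Buggy pattern: one dict created before the loop and reused.
shared = {}
repeated = []
for name, total, age in rows:
    shared["total"], shared["age"] = total, age
    repeated.append(shared)  # appends a reference to the same dict

# Fixed pattern: a new dict for each match.
fresh = []
for name, total, age in rows:
    fresh.append({"total": total, "age": age})

print(repeated)  # every entry is the same object holding the last row
print(fresh)     # one independent dict per row
```

Checking `repeated[0] is repeated[1]` is a quick way to confirm that the list holds the same dict twice rather than two dicts.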
 
rlelis

Sorry, I thought that a link to pastebin would be helpful, since it preserves the syntax highlighting and spacing. The code there isn't fifty lines long. The 25 lines below are there to give you guys a picture of what is going on, to make it more intuitive.
This is what I have for now:

highway_dict = {}
aging_dict = {}
queue_row = []
for content in file_content:
    if 'aging' in content:
        # aging 0 100
        collumns = ''.join(map(str, content[:1])).replace('-','_').lower()
        total_values = ''.join(map(str, content[1:2]))
        aging_values = ''.join(map(str, content[2:]))

        aging_dict['total'], aging_dict[collumns] = total, aging_values
        queue_row.append(aging_dict)

    if 'highway' in content:
        # highway | 4 | disable | 25
        collumns = ''.join(map(str, content[:1])).replace('-','_').lower()
        lanes_values = ''.join(map(str, content[1:2]))
        state_values = ''.join(map(str, content[2:3])).strip('')
        limit_values = ''.join(map(str, content[3:4])).strip('')

        highway_dict['lanes'], highway_dict['state'], highway_dict['limit(mph)'] = lanes, state, limit_values
        queue_row.append(highway_dict)
 
Dave Angel

The entire following set of comments is probably outdated since you
apparently did NOT use readlines() or equivalent to get file_content.
So you'd better give us some sample data, a program that can actually
run without getting exceptions due to misnamed variables, and a
description of just what you expected to be in each result variable.

It'd also be smart to mention what version of Python you're targeting.

..... what follows was a waste of my time ...

file_content is not defined, but we can guess you have read it from a
text file with readlines(), or more efficiently that it's simply a file
object for a file opened with "r". Can we see sample data, maybe for 3
or four lines?

file_content = [
    "A4 value2 aging",
    "b8 value99 paging",
    "-1 this is aging a test",
    "B2 repeaagingts",
]

The sample, or the description, should indicate if repeats of the
"columns" column are allowed, as with b and B above.

highway_dict = {}
aging_dict = {}
queue_row = []
for content in file_content:
    if 'aging' in content:
        # aging 0 100
        collumns = ''.join(map(str, content[:1])).replace('-','_').lower()
        total_values = ''.join(map(str, content[1:2]))
        aging_values = ''.join(map(str, content[2:]))

Those three lines would be much more reasonable and readable if you
eliminated all the list stuff, and just did what was needed. Also,
calling a one-character string "collumns" or "total_values" makes no
sense to me.

        collumns = content[:1].replace('-','_').lower()
        total_values = content[1:2]
        aging_values = content[2:]

        aging_dict['total'], aging_dict[collumns] = total, aging_values

That line tries to get clever, and ends up obscuring what's really
happening. Further, the value in total, if any is NOT what you just
extracted in total_values.

        aging_dict['total'] = total
        aging_dict[collumns] = aging_values

        queue_row.append(aging_dict)

Just what do you expect to be in the aging_dict here? If you intended
that each item of queue_row contains a dict with just one item, then you
need to clear aging_dict each time through the loop. As it stands the
list ends up with a bunch of dicts, each with possibly one more entry
than the previous dict.

All the same remarks apply to the following code. Additionally, you
don't use collumns for anything, and you use lanes and state when you
presumably meant lanes_values and state_values.

    if 'highway' in content:
        # highway | 4 | disable | 25
        collumns = ''.join(map(str, content[:1])).replace('-','_').lower()
        lanes_values = ''.join(map(str, content[1:2]))
        state_values = ''.join(map(str, content[2:3])).strip('')
        limit_values = ''.join(map(str, content[3:4])).strip('')

        highway_dict['lanes'], highway_dict['state'], highway_dict['limit(mph)'] = lanes, state, limit_values
        queue_row.append(highway_dict)

Now, when
 
Neil Cerutti

Can you provide a short example of input and what you had hoped
to see in the lists and dicts at the end?
 
rlelis

I apologize once again.
This is my first post here, and I'm getting used to the group as I get feedback on my errors from you guys.
I'm using Python 2.7.3 with no dependencies; I'm simply using the standard library.
Here is the "big picture" of the scenario (I have added it to the pastebin link too):

FILE OUTPUT DESIRED:
aging:
aging |total |age
aging |0 |100
aging |2 |115
aging |3 |1
aging |4 |10

highway:
highway | lanes | state | limit(mph)
highway | 4 | disable | 25
highway | 2 | disable | 245
highway | 3 | disable | 125
highway | 2 | enable | 255
highway | 3 | disable | 212
highway | 8 | disable | 78

FILE INPUT EXCERPT EXAMPLE:
aging 0 100
aging 2 115
aging 3 1
highway 4 disable 25
highway 2 disable 245
highway 0 enable 125

Meanwhile I have changed the code a little bit and achieved an output closer to what I want:

highway_dict = {}
aging_dict = {}
queue_counters = {}
queue_row = []
for content in file_content:
    if 'aging' in content:
        # aging 0 100
        columns = ', '.join(map(str, content[:1])).replace('-','_').lower()
        total_values = ''.join(map(str, content[1:2]))
        aging_values = '\t'.join(map(str, content[2:]))

        aging_dict['total'], aging_dict[columns] = total, aging_values
        queue_counters[columns] = aging_dict
    if 'highway' in content:
        # highway | 4 | disable | 25
        columns = ''.join(map(str, content[:1])).replace('-','_').lower()
        lanes_values = ''.join(map(str, content[1:2]))
        state_values = ''.join(map(str, content[2:3])).strip('')
        limit_values = ''.join(map(str, content[3:4])).strip('')

        highway_dict['lanes'], highway_dict['state'], highway_dict['limit(mph)'] = lanes, state, limit_values
        queue_counters[columns] = highway_dict
    queue_row.append(queue_counters)

Now I'm adding the different dicts to a "main" one (queue_counters). The problem is that I keep running into the iteration issue: I only get the last row in my output. My last printout was:

queue_counters: {'aging': {'age': '10', 'total': '4'}, 'highway': {'lanes': '8','state': 'disable', 'limit': '78'}}

@Dave Angel

"The sample, or the description, should indicate if repeats of the
"columns" column are allowed, as with b and B above. "
- Yes, repetition of the columns is allowed.

"That line tries to get clever, and ends up obscuring what's really
happening. Further, the value in total, if any is NOT what you just
extracted in total_values."
- The variable name total_values means that I'm storing the content (the different values) of the total column, and the same applies to the other columns (they might not be the best names, but don't forget that we are just prototyping here). I had to set the names total, age, etc. myself, since they don't come with the original file, but it is important to name them to help with the RDBMS work later, and besides, it's handy, right?

The columns variable refers to the object name (aging and highway). I'm doing that because in the original source I have to deal with string formatting of strange names. Remember that this is for prototyping; that's why I'm trying to summarize things here.
 
Dave Angel

You're still missing the file read code. My earlier assumption that it
was a simple readlines() was bogus, or the line:
if "aging" in file-content:

would never work.

Perhaps you have something like:

infile = open("xxxx","r")
file_content = [line.split() for line in infile]

If you confirm it, I'll try to go through the code again, trying to make
sense of it.
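
To illustrate why the shape of file_content matters, here is a small sketch of how `in` behaves for the two possible shapes (the sample string is invented): on a plain string it is a substring test, while on a list of tokens it matches whole items only.

```python
line = "b8 value99 paging"

# On a string, `in` is a substring test, so "paging" matches too.
substring_hit = "aging" in line

# On a list of tokens, `in` compares whole items.
tokens = line.split()
token_hit = "aging" in tokens

print("substring test: %s, token test: %s" % (substring_hit, token_hit))
```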
 
rlelis

On Thursday, May 9, 2013 12:47:47 AM UTC+1, rlelis wrote:
@Dave Angel
This is how I manage to read and store the data from the file:

data = []
# read data
f = open(source_file, 'r')
for line in f:
    header = (line.strip()).lower()
    # conditions (if/else clauses) on the header content to filter desired data
    data.append(header)
 
Dave Angel

From earlier message:
highway_dict = {}
aging_dict = {}
queue_counters = {}
queue_row = []
for content in file_content:
    if 'aging' in content:

So your data is a list of strings, each representing one line of the
file. When you run this, you'll get something like:

NameError: name 'file_content' is not defined

If I assume you've just neglected to include the assignment,
file_content = data, then the next problem is

if 'aging' in content:

will never fire. Likewise
if 'highway' in content:

will never fire. So the rest of the code is irrelevant.
 
rlelis

On Thursday, May 9, 2013 7:19:38 PM UTC+1, Dave Angel wrote:

Yes, it's a list of strings. I don't get the NameError: name 'file_content' is not defined in my code.

After I appended the headers, I wanted to trim the data list a little bit more, because there was some data (imagine some other columns) to the left that I didn't need.

file_content = []
for d in data:
    file_content.append(d[1:])

From this point on it's the code I've already shown:

highway_dict = {}
aging_dict = {}
queue_counters = {}
queue_row = []
for content in file_content:
    if 'aging' in content:
        # aging 0 100
        # code here

etc, etc
 
Dave Angel

Yes it's a list of string. I don't get the NameError: name 'file_content' is not defined in my code.

That's because you have the 3 lines below which we hadn't seen yet.

file_content = []
for d in data:
    file_content.append(d[1:])

OK, so I now have some code I can actually run. Unfortunately, it
produces an error:

Traceback (most recent call last):
  File "ricardo.py", line 23, in <module>
    aging_dict['total'], aging_dict[columns] = total, aging_values
NameError: name 'total' is not defined

So I'll make a reasonable guess that you meant total_values there. I
still can't understand how you're testing this code, when there are
trivial bugs in it.

Next, I get:

Traceback (most recent call last):
  File "ricardo.py", line 32, in <module>
    highway_dict['lanes'], highway_dict['state'], highway_dict['limit(mph)'] = lanes, state, limit_values
NameError: name 'lanes' is not defined

and then:

Traceback (most recent call last):
  File "ricardo.py", line 32, in <module>
    highway_dict['lanes'], highway_dict['state'], highway_dict['limit(mph)'] = lanes_values, state, limit_values
NameError: name 'state' is not defined

Each of those not-defined errors was pointed out by me earlier in the
thread.

I don't see any output logic, so I guess it's up to us to guess what the
meanings and scope of the various lists and dicts are. I figure the
queue_row is your final collection that you hope to get results from.
It's a list containing many references to a single queue_counters
object. So naturally, they all look the same.

If you want them to be different, you have to create a new one each
time. Move the line:
queue_counters={}

inside the loop, right after the line:
for content in file_content:

There are a bunch of other things wrong, like not lining up the columns
when you're substringing content, but this may be your major stumbling
block. Note: you may have to also move the highway_dict and
aging_dict; I haven't figured out what they're for, yet.
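
On the column alignment point, a small sketch (assuming whitespace-separated lines, as in the input excerpt): slicing a string takes characters, so `content[1:2]` is a single character, whereas slicing the result of `split()` takes whole fields. The variable names here are illustrative.

```python
content = "highway 4 disable 25"

# Slicing the string works on characters, not columns.
by_char = content[1:2]

# Splitting first turns the line into a list of whole fields.
fields = content.split()
name, lanes, state, limit = fields[0], fields[1], fields[2], fields[3]

print(repr(by_char))
print("%s %s %s %s" % (name, lanes, state, limit))
```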

Following is the code I've been using to try to figure out what you were
intending:


file_content = [
    "aging 0 100",
    "aging 2 115",
    "aging 3 1",
    "highway 4 disable 25",
    "highway 2 disable 245",
    "highway 0 enable 125",
]

highway_dict = {}
aging_dict = {}
#queue_counters={}
queue_row = []
for content in file_content:
    queue_counters={}
    if 'aging' in content:
        # aging 0 100
        columns = ', '.join(map(str, content[:1])).replace('-','_').lower()
        print "columns:", columns
        total_values = ''.join(map(str, content[1:2]))
        aging_values = '\t'.join(map(str, content[2:]))

        aging_dict['total'], aging_dict[columns] = total_values, aging_values
        queue_counters[columns] = aging_dict
    if 'highway' in content:
        # highway | 4 | disable | 25
        columns = ''.join(map(str, content[:1])).replace('-','_').lower()
        lanes_values = ''.join(map(str, content[1:2]))
        state_values = ''.join(map(str, content[2:3])).strip('')
        limit_values = ''.join(map(str, content[3:4])).strip('')

        highway_dict['lanes'], highway_dict['state'], highway_dict['limit(mph)'] = lanes_values, state_values, limit_values
        queue_counters[columns] = highway_dict
    queue_row.append(queue_counters)


print
print "h dict:", highway_dict
print
print "aging dict:", aging_dict
print
print "q counters:", queue_counters
for key, item in queue_counters.iteritems():
    print key, item

print
print "q row:", queue_row
for item in queue_row:
    print item
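
For comparison, a minimal sketch of the loop with a new dict created for every row. It is a guess at the intent, not the original program: it assumes each line is whitespace-separated as in the input excerpt, splits it into fields instead of slicing characters, and takes the column names from the desired output. The `kind` key is an addition for illustration. It is written so that it should run on Python 3 as well as 2.7.

```python
file_content = [
    "aging 0 100",
    "aging 2 115",
    "aging 3 1",
    "highway 4 disable 25",
    "highway 2 disable 245",
    "highway 0 enable 125",
]

queue_row = []
for content in file_content:
    fields = content.split()
    if not fields:
        continue  # skip blank lines
    kind = fields[0].replace('-', '_').lower()
    if kind == 'aging' and len(fields) >= 3:
        # a fresh dict per row, so rows do not overwrite each other
        row = {'kind': kind, 'total': fields[1], 'age': fields[2]}
    elif kind == 'highway' and len(fields) >= 4:
        row = {'kind': kind, 'lanes': fields[1],
               'state': fields[2], 'limit(mph)': fields[3]}
    else:
        continue  # unknown or malformed line
    queue_row.append(row)

for row in queue_row:
    print(row)
```

Because `row` is rebound to a new dict on every pass, each element of `queue_row` is a separate object, which is the property the shared-dict version lacks.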
 
Neil Cerutti

Heroic efforts, Dave!

To rlelis:

Do not start to program until you understand what you want to do.
Work it out on a sheet of paper, or at least in your mind.

If you can't provide sample input and the expected output from
it, chances are you aren't ready to start writing code.
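
One way to act on this, sketched under the assumption that the input excerpt earlier in the thread is representative: write the sample input and the expected result down as data first, then check a small parser against them. The function name `parse_rows`, the constants, and the grouping by record name are illustrative choices, not something from the original code.

```python
SAMPLE_INPUT = """\
aging 0 100
aging 2 115
highway 4 disable 25
"""

# Expected result, written down before any parsing code exists.
EXPECTED = {
    'aging': [{'total': '0', 'age': '100'},
              {'total': '2', 'age': '115'}],
    'highway': [{'lanes': '4', 'state': 'disable', 'limit(mph)': '25'}],
}

# Column names per record type (taken from the desired output).
COLUMNS = {
    'aging': ['total', 'age'],
    'highway': ['lanes', 'state', 'limit(mph)'],
}


def parse_rows(text):
    """Group whitespace-separated lines into lists of dicts by record name."""
    tables = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] not in COLUMNS:
            continue  # skip blank or unrecognised lines
        name, values = fields[0], fields[1:]
        # dict(zip(...)) builds a new dict for every row
        tables.setdefault(name, []).append(dict(zip(COLUMNS[name], values)))
    return tables


assert parse_rows(SAMPLE_INPUT) == EXPECTED
```

If the final assertion fails, either the parser or the written-down expectation is wrong, and finding out which is usually the quickest way to clarify what the program is meant to do.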
 
