Parsing Data, Storing into an array, Infinite Backslashes

supercomputer

I am using this function to parse data I have stored in an array.

This is what the array looks like:

[['Memory', '0', 'Summary', '0'], ['Memory', '0', 'Speed',
'PC3200U-30330'], ['Memory', '0', 'Type', 'DDR SDRAM'], ['Memory', '0',
'Size', '512'], ['Memory', '0', 'Slot', 'DIMM0/J11'], ['Memory', '0',
'ConfigurationType', '2'], ['Memory', '1', 'Summary', '0'], ['Memory',
'1', 'Speed', 'PC3200U-30330'], ['Memory', '1', 'Type', 'DDR SDRAM'],
['Memory', '1', 'Size', '512'], ['Memory', '1', 'Slot', 'DIMM1/J12'],
['Memory', '1', 'ConfigurationType', '2'], ['Memory', '2', 'Summary',
'0'], ['Memory', '2', 'Speed', 'PC3200U-30330'], ['Memory', '2',
'Type', 'DDR SDRAM'], ['Memory', '2', 'Size', '512'], ['Memory', '2',
'Slot', 'DIMM2/J13'], ['Memory', '2', 'ConfigurationType', '2'],
['Memory', '3', 'Summary', '0'], ['Memory', '3', 'Speed',
'PC3200U-30330'], ['Memory', '3', 'Type', 'DDR SDRAM'], ['Memory', '3',
'Size', '512'], ['Memory', '3', 'Slot', 'DIMM3/J14'], ['Memory', '3',
'ConfigurationType', '2']]

This is the code to parse the array:

count=0
place=0
query=[]
while 1:
    try:
        i=fetch.next()
    except StopIteration:
        break
    if i[1] != count:
        ++count
        query.append(count)
    qval=`query[count]`
    query[count]=qval+i[2]+"="+i[3]+", "

print qval,"\n"

When it runs I get an output similar to this.

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'Type=DDR
SDRAM,
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'Size=512,
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'Slot=DIMM2/J13,
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'ConfigurationType=2,
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'Summary=0,
\\\\\\\\\\\\\\\'Speed=PC3200U-30330, \\\\\\\'Type=DDR SDRAM,
\\\'Size=512, \'Slot=DIMM3/J14, '

When it's supposed to print just the plain text with the numbers etc.

I have changed these lines:

qval=`query[count]`
query[count]=qval+i[2]+"="+i[3]+", "

To this:

query[count]=query[count]+i[2]+"="+i[3]+", "

I get this error:

Traceback (most recent call last):
  File "infnode.py", line 60, in ?
    query[count]=query[count]+i[2]+"="+i[3]+", "
TypeError: unsupported operand type(s) for +: 'int' and 'str'

So I tried to fix it by doing this:

query[count]=`query[count]`+i[2]+"="+i[3]+", "

Can someone please point me in the right direction? I am sure that the
`query[count]` is causing the backslashes.

Thanks in advance.
 

Jeff Epler

Your code is needlessly complicated.

Instead of this business:
    while 1:
        try:
            i = fetch.next()
        except StopIteration:
            break
simply write:
    for i in fetch:
(If there's an explicit 'fetch = iter(somethingelse)' in code you did
not show, then get rid of that and just loop 'for i in somethingelse'.)
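A minimal sketch of the same point in modern Python 3, assuming `ed` is the list of four-element records (the name comes from the later post where `fetch = iter(ed)` appears; the three-row list here is only an illustrative excerpt). In Python 3 the `.next()` method became the builtin `next()`, but a plain `for` loop handles `StopIteration` for you either way.

```python
# Illustrative excerpt of the records; the real list has 24 entries.
ed = [['Memory', '0', 'Summary', '0'],
      ['Memory', '0', 'Speed', 'PC3200U-30330'],
      ['Memory', '1', 'Summary', '0']]

# Manual version: what the while/try/except loop does by hand.
manual = []
fetch = iter(ed)
while True:
    try:
        row = next(fetch)      # Python 3 spelling of fetch.next()
    except StopIteration:
        break
    manual.append(row)

# Idiomatic version: the for statement calls next() and stops on
# StopIteration itself, and a list can be looped over without iter().
simple = []
for row in ed:
    simple.append(row)

print(manual == simple)        # True
```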

i[1] will never compare equal to count, because i[1] is always a string
and count is always an integer. Integers and strings are never equal to
each other.
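A quick check of the comparison described above (shown in Python 3; Python 2 gives the same results for `==` and `!=` here). Converting the field with `int()` is one way to make the comparison meaningful.

```python
record = ['Memory', '0', 'Summary', '0']
count = 0

print(record[1] == count)        # False: the string '0' is not the integer 0
print(record[1] != count)        # True, for every record
print(int(record[1]) == count)   # True once the field is converted to an int
```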

Writing code like
    x = "a string " + 3
does not work in Python. You can either convert to a string and then
use the + operator to concatenate:
    x = "a string " + str(3)
or you can use %-formatting:
    x = "a string %s" % 3
("%s" accepts any sort of object, not just strings.)
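The three spellings above, run side by side. This is written for Python 3, where the TypeError message is worded differently from the Python 2 traceback in the question, but the behaviour is the same.

```python
# Concatenating an int onto a str raises TypeError.
try:
    x = "a string " + 3
except TypeError as exc:
    print("TypeError:", exc)

x1 = "a string " + str(3)    # convert first, then concatenate
x2 = "a string %s" % 3       # %-formatting does the conversion for you
x3 = "size %s" % [512]       # %s also accepts non-string objects
print(x1)
print(x2)
print(x3)
```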

Using repr(...) (`...` is just a shorthand for this) is what is really
introducing the backslashes. When repr() is applied to a string, it wraps
the string in quotes and escapes any quotes and backslashes already inside
it with more backslashes. Because you pass the previously built string
through it on every pass, the backslashes keep doubling.
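A small reproduction of the effect, written for Python 3 using field pairs taken from the posted data. It mimics the question's loop by re-applying `repr()` to everything built so far on each pass, and compares that with plain concatenation.

```python
fields = [("Summary", "0"), ("Speed", "PC3200U-30330"), ("Type", "DDR SDRAM"),
          ("Size", "512"), ("Slot", "DIMM0/J11"), ("ConfigurationType", "2")]

# Buggy version: repr() of the accumulated string on every pass.
buggy = ""
slashes = []                  # number of backslashes after each pass
for key, value in fields:
    buggy = repr(buggy) + key + "=" + value + ", "
    slashes.append(buggy.count("\\"))

# Fixed version: plain concatenation, no repr().
fixed = ""
for key, value in fields:
    fixed = fixed + key + "=" + value + ", "

print(slashes)                # backslash count grows once quotes pile up
print(buggy)
print(fixed)
```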

Below is a program I wrote to process the data in your message. It prints
out:
    Memory 2 Summary=0, Speed=PC3200U-30330, Type=DDR SDRAM, Size=512, Slot=DIMM2/J13, ConfigurationType=2
    Memory 3 Summary=0, Speed=PC3200U-30330, Type=DDR SDRAM, Size=512, Slot=DIMM3/J14, ConfigurationType=2
    Memory 0 Summary=0, Speed=PC3200U-30330, Type=DDR SDRAM, Size=512, Slot=DIMM0/J11, ConfigurationType=2
    Memory 1 Summary=0, Speed=PC3200U-30330, Type=DDR SDRAM, Size=512, Slot=DIMM1/J12, ConfigurationType=2
The result is out of order because calling .items() on a dict returns
the items in an arbitrary order.

Jeff

s = [['Memory', '0', 'Summary', '0'], ['Memory', '0', 'Speed',
'PC3200U-30330'], ['Memory', '0', 'Type', 'DDR SDRAM'], ['Memory', '0',
'Size', '512'], ['Memory', '0', 'Slot', 'DIMM0/J11'], ['Memory', '0',
'ConfigurationType', '2'], ['Memory', '1', 'Summary', '0'], ['Memory',
'1', 'Speed', 'PC3200U-30330'], ['Memory', '1', 'Type', 'DDR SDRAM'],
['Memory', '1', 'Size', '512'], ['Memory', '1', 'Slot', 'DIMM1/J12'],
['Memory', '1', 'ConfigurationType', '2'], ['Memory', '2', 'Summary',
'0'], ['Memory', '2', 'Speed', 'PC3200U-30330'], ['Memory', '2',
'Type', 'DDR SDRAM'], ['Memory', '2', 'Size', '512'], ['Memory', '2',
'Slot', 'DIMM2/J13'], ['Memory', '2', 'ConfigurationType', '2'],
['Memory', '3', 'Summary', '0'], ['Memory', '3', 'Speed',
'PC3200U-30330'], ['Memory', '3', 'Type', 'DDR SDRAM'], ['Memory', '3',
'Size', '512'], ['Memory', '3', 'Slot', 'DIMM3/J14'], ['Memory', '3',
'ConfigurationType', '2']]

query = {}

for a, b, c, d in s:
    if not query.has_key((a,b)): query[(a,b)] = []
    query[(a,b)].append("%s=%s" % (c, d))

for (a,b), v in query.items():
    print a, b, ", ".join(v)
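For readers on Python 3, a hedged port of the program above: `dict.has_key()` was removed in Python 3 and `print` became a function. Sorting the items (here by the integer value of the module number, assuming that field is always a decimal string) also gives a stable output order. The data is an abbreviated excerpt.

```python
# Abbreviated copy of the posted data (two modules, two fields each).
s = [['Memory', '0', 'Summary', '0'], ['Memory', '0', 'Size', '512'],
     ['Memory', '1', 'Summary', '0'], ['Memory', '1', 'Size', '512']]

query = {}
for a, b, c, d in s:
    # setdefault() creates the empty list the first time (a, b) is seen,
    # replacing the has_key() test.
    query.setdefault((a, b), []).append("%s=%s" % (c, d))

# Sort by the numeric module number so the order is predictable.
lines = []
for (a, b), v in sorted(query.items(), key=lambda kv: int(kv[0][1])):
    lines.append("%s %s %s" % (a, b, ", ".join(v)))

print("\n".join(lines))
```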

 

Steven D'Aprano

> I am using this function to parse data I have stored in an array.
>
> This is what the array looks like:
>
> [['Memory', '0', 'Summary', '0'], ['Memory', '0', 'Speed',
> 'PC3200U-30330'], ['Memory', '0', 'Type', 'DDR SDRAM'], ... ]

[snip more of the array]
> This is the code to parse the array:

> count=0
> place=0

place is not used in your function. Remove it.
> query=[]
> while 1:
>     try:
>         i=fetch.next()

What is fetch and what does fetch.next() do?

It is considered bad programming practice to use a variable named i for
anything except an index, as in "for i in range(...)": i for "index",
not i for "next record".
>     except StopIteration:
>         break
>     if i[1] != count:

What is i? A list? A tuple? A dict? What is stored in it?
>         ++count
>         query.append(count)

Why are you appending the numeric value of count to the list query? Since
count starts at zero, and increases by one, your list is just [1, 2, 3, ...]
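One detail worth checking in the quoted `++count` line: Python has no `++` increment operator. `++count` parses as two unary plus signs, so it evaluates to `count` and changes nothing, which means the list would actually fill up with zeros rather than 1, 2, 3. `count += 1` is the Python spelling. A minimal demonstration:

```python
count = 0
++count            # parsed as +(+count): evaluated, then discarded
print(count)       # 0

# Consequence for the original loop: every append adds 0.
query = []
for _ in range(3):
    ++count
    query.append(count)
print(query)       # [0, 0, 0]

count += 1         # the real increment
print(count)       # 1
```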
>     qval=`query[count]`

It looks like you are setting the variable qval to the string
representation of a number. Backticks are being deprecated; you should
write this as qval = str(query[count]).

But if I have understood your program logic correctly, query[count] up
to this point is just count. So a much simpler way is to just use qval =
str(count).
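Why the choice between `repr()` and `str()` matters here: for an integer they produce the same text, but for a string `repr()` adds quotes (and escapes), which is exactly what piles up in the output. Backticks were removed entirely in Python 3, so this sketch uses the function forms.

```python
n = 0
print(repr(n), str(n))      # 0 0  (identical for integers)

text = "Summary=0, "
print(repr(text))           # 'Summary=0, '  (quotes added)
print(str(text))            # Summary=0,     (unchanged)
```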
>     query[count]=qval+i[2]+"="+i[3]+", "

Impossible to know what this does since we don't know what i is. Hint: it
is easier to read and parse expressions by adding a small amount of
whitespace:

query[count] = qval + i[2] + "=" + i[3] + ", "
> print qval,"\n"

> When it runs I get an output similar to this.


> When it's supposed to print just the plain text with the numbers etc.


See below for some further hints.

> I have changed these lines:
>
> qval=`query[count]`
> query[count]=qval+i[2]+"="+i[3]+", "
>
> To this:
>
> query[count]=query[count]+i[2]+"="+i[3]+", "
>
> I get this error:
>
> Traceback (most recent call last):
>   File "infnode.py", line 60, in ?
>     query[count]=query[count]+i[2]+"="+i[3]+", "
> TypeError: unsupported operand type(s) for +: 'int' and 'str'

Yes. query[count] is an integer equal to count. i[2] is who-knows-what.
"=" is a string. You can't add strings to ints.
> So I try and fix it by doing this:
>
> query[count]=`query[count]`+i[2]+"="+i[3]+", "

That is functionally equivalent to your first version.
> Can someone please point me in the right direction I am sure that the
> `query[count]` is causing the backslashes.

I doubt it very much. But you can test it by adding some print lines in
your code: change this:

qval=`query[count]`
query[count]=qval+i[2]+"="+i[3]+", "

to this:

print "Count is: ", count
print "query[count] is: ", query[count]
qval=`query[count]`
print "qval is: ", qval
query[count]=qval+i[2]+"="+i[3]+", "
print "query[count] changed.\nNew value is: ", query[count]
 

Dennis Lee Bieber

> count=0
> place=0
> query=[]
> while 1:
>     try:
>         i=fetch.next()

Where is the fetch object defined? And what is it supposed to be
returning?
>     except StopIteration:
>         break
>     if i[1] != count:
>         ++count
>         query.append(count)
>     qval=`query[count]`
>     query[count]=qval+i[2]+"="+i[3]+", "
>
> print qval,"\n"
> When it's supposed to print just the plain text with the numbers etc.
Which numbers? The "memory" number IN the data, or some
incremental counter you are hoping will match?

Watch out for text wrapping in the browser...

-=-=-=-=-=-=-=-=-=-

# I've deliberately re-ordered some of the items
inData = [ ['Memory', '0', 'Summary', '0'],
['Memory', '0', 'Speed', 'PC3200U-30330'],
['Memory', '0', 'Type', 'DDR SDRAM'],
['Memory', '0', 'Size', '512'],
['Memory', '0', 'Slot', 'DIMM0/J11'],
['Memory', '0', 'ConfigurationType', '2'],
['Memory', '1', 'Size', '512'],
['Memory', '1', 'Slot', 'DIMM1/J12'],
['Memory', '1', 'ConfigurationType', '2'],
['Memory', '2', 'Summary', '0'],
['Memory', '2', 'Speed', 'PC3200U-30330'],
['Memory', '2', 'Type', 'DDR SDRAM'],
['Memory', '2', 'Size', '512'],
['Memory', '1', 'Summary', '0'],
['Memory', '1', 'Speed', 'PC3200U-30330'],
['Memory', '1', 'Type', 'DDR SDRAM'],
['Memory', '2', 'Slot', 'DIMM2/J13'],
['Memory', '2', 'ConfigurationType', '2'],
['Memory', '3', 'Summary', '0'],
['Memory', '3', 'Speed', 'PC3200U-30330'],
['Memory', '3', 'Type', 'DDR SDRAM'],
['Memory', '3', 'Size', '512'],
['Memory', '3', 'Slot', 'DIMM3/J14'],
['Memory', '3', 'ConfigurationType', '2'] ]

# Since I scrambled the order, let's build a dictionary to recollect stuff
tDict = {}

for ln in inData:
    if ln[0] != "Memory":
        print "Bad data entry"
        print ln
    else:
        # add a dictionary entry for memory ln[1], with key ln[2] and value ln[3]
        tDict.setdefault(ln[1], {})[ln[2]] = ln[3]

# input data has been reformatted, process each subdictionary for output
for k, v in tDict.items():
    for sk, sv in v.items():
        print "%5s:\t%s=%s" % (k, sk, sv)
    print
-=-=-=-=-=-=-=-=-=-=-=-
E:\UserData\Dennis Lee Bieber\My Documents>script1.py
1: Slot=DIMM1/J12
1: Speed=PC3200U-30330
1: Summary=0
1: ConfigurationType=2
1: Type=DDR SDRAM
1: Size=512

0: Slot=DIMM0/J11
0: Speed=PC3200U-30330
0: Summary=0
0: ConfigurationType=2
0: Type=DDR SDRAM
0: Size=512

3: Slot=DIMM3/J14
3: Speed=PC3200U-30330
3: Summary=0
3: ConfigurationType=2
3: Type=DDR SDRAM
3: Size=512

2: Slot=DIMM2/J13
2: Speed=PC3200U-30330
2: Summary=0
2: ConfigurationType=2
2: Type=DDR SDRAM
2: Size=512


E:\UserData\Dennis Lee Bieber\My Documents>

You'll note that using the intermediate dictionary allows me to
collect mixed-order data, but without a separate sort operation the
retrieval is in whatever order Python gives it.
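If a predictable order is wanted, that separate sort is short. A hedged Python 3 sketch of the same nested-dictionary approach; the four-row input is illustrative, and the bad-data check is omitted for brevity.

```python
# Illustrative reordered input, in the same shape as inData above.
inData = [['Memory', '1', 'Size', '512'],
          ['Memory', '0', 'Summary', '0'],
          ['Memory', '1', 'Summary', '0'],
          ['Memory', '0', 'Size', '512']]

tDict = {}
for ln in inData:
    tDict.setdefault(ln[1], {})[ln[2]] = ln[3]

output = []
# Sort the outer keys numerically (as text, '10' would sort before '2')
# and the inner keys alphabetically, so the order no longer depends on
# how the dictionary happens to store them.
for k in sorted(tDict, key=int):
    for sk in sorted(tDict[k]):
        output.append("%5s:\t%s=%s" % (k, sk, tDict[k][sk]))

print("\n".join(output))
```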


supercomputer

Thanks for all the help, I'm not sure what approach I'm going to try
but I think I'll try all of your suggestions and see which one fits
best.

The variable "i" held the following array:

[['Memory', '0', 'Summary', '0'], ['Memory', '0', 'Speed',
'PC3200U-30330'], ['Memory', '0', 'Type', 'DDR SDRAM'], ['Memory', '0',
'Size', '512'], ['Memory', '0', 'Slot', 'DIMM0/J11'], ['Memory', '0',
'ConfigurationType', '2'], ['Memory', '1', 'Summary', '0'], ['Memory',
'1', 'Speed', 'PC3200U-30330'], ['Memory', '1', 'Type', 'DDR SDRAM'],
['Memory', '1', 'Size', '512'], ['Memory', '1', 'Slot', 'DIMM1/J12'],
['Memory', '1', 'ConfigurationType', '2'], ['Memory', '2', 'Summary',
'0'], ['Memory', '2', 'Speed', 'PC3200U-30330'], ['Memory', '2',
'Type', 'DDR SDRAM'], ['Memory', '2', 'Size', '512'], ['Memory', '2',
'Slot', 'DIMM2/J13'],
> Where is the fetch object defined? And what is it supposed to be
> returning?

Fetch is declared a few lines up in the program with this:
fetch=iter(ed). It just goes through the array and returns the next part
of it.
>> query[count]=qval+i[2]+"="+i[3]+", "
> Impossible to know what this does since we don't know what i is. Hint: it
> is easier to read and parse expressions by adding a small amount of
> whitespace:

I am trying to assign each new memory slot to a new part in the array,
so when the memory number is 0, 1, 2 or 3 it will be assigned to query[0],
query[1], query[2] or query[3].
 

Dennis Lee Bieber

> Thanks for all the help, I'm not sure what approach I'm going to try
> but I think I'll try all of your suggestions and see which one fits
> best.

> The variable "i" held the following array:
Did it? Later on you are using "i" to hold one element of the data
array.
> [['Memory', '0', 'Summary', '0'], ['Memory', '0', 'Speed',
> 'PC3200U-30330'], ['Memory', '0', 'Type', 'DDR SDRAM'], ['Memory', '0',
> 'Slot', 'DIMM2/J13'],
> Fetch is declared a few lines up in the program with this
> fetch=iter(ed) it just goes through the array and returns the next part
> of it.
Which is already a native capability of lists and tuples.
>>> query[count]=qval+i[2]+"="+i[3]+", "
>> Impossible to know what this does since we don't know what i is. Hint: it
>> is easier to read and parse expressions by adding a small amount of
>> whitespace:

> I am trying to assign each new memory slot to a new part in the array.
> So when memory is either 0,1,2,3 it will assign it to query[0],
> query[1], query[2], query[3]

Unfortunately there are two concerns, to start with...

One, although Python lists grow dynamically, one can not assign to q[3]
without first having created q[0]..q[2]; assigning past the end raises
an IndexError. This is one reason so many of the attempted solutions use
dictionaries: their keys need not be contiguous (at the cost of being
unordered).
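A minimal Python 3 illustration of that difference: assigning past the end of a list fails until the list has been padded, while a dict accepts any key straight away. The padding with `None` mirrors what the list-based version below does for skipped numbers.

```python
q = []
try:
    q[3] = "Summary=0"          # index 3 does not exist yet
except IndexError as exc:
    print("IndexError:", exc)

# Padding the list first makes the assignment legal.
q.extend([None] * (3 + 1 - len(q)))
q[3] = "Summary=0"
print(q)                        # [None, None, None, 'Summary=0']

# A dict accepts key 3 with no need for keys 0..2 to exist.
d = {}
d[3] = "Summary=0"
print(d)                        # {3: 'Summary=0'}
```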

Two, the code won't behave properly if the input data is in mixed order.
You are using count as an incrementing index and (attempting) to
increment it every time the "memory" number in the data changes, with no
test that the new number is actually the next in sequence, rather than
just using the "memory" number itself as the index.

Here's a version using lists in place of the dictionary. I haven't fully
tested it: it handles my simple reordered data, but that data doesn't
contain a skipped number:

-=-=-=-=-=-=-=-=-


# I've deliberately re-ordered some of the items
inData = [ ['Memory', '0', 'Summary', '0'],
['Memory', '0', 'Speed', 'PC3200U-30330'],
['Memory', '0', 'Type', 'DDR SDRAM'],
['Memory', '0', 'Size', '512'],
['Memory', '0', 'Slot', 'DIMM0/J11'],
['Memory', '0', 'ConfigurationType', '2'],
['Memory', '1', 'Size', '512'],
['Memory', '1', 'Slot', 'DIMM1/J12'],
['Memory', '1', 'ConfigurationType', '2'],
['Memory', '2', 'Summary', '0'],
['Memory', '2', 'Speed', 'PC3200U-30330'],
['Memory', '2', 'Type', 'DDR SDRAM'],
['Memory', '2', 'Size', '512'],
['Memory', '1', 'Summary', '0'],
['Memory', '1', 'Speed', 'PC3200U-30330'],
['Memory', '1', 'Type', 'DDR SDRAM'],
['Memory', '2', 'Slot', 'DIMM2/J13'],
['Memory', '2', 'ConfigurationType', '2'],
['Memory', '3', 'Summary', '0'],
['Memory', '3', 'Speed', 'PC3200U-30330'],
['Memory', '3', 'Type', 'DDR SDRAM'],
['Memory', '3', 'Size', '512'],
['Memory', '3', 'Slot', 'DIMM3/J14'],
['Memory', '3', 'ConfigurationType', '2'] ]

### Since I scrambled the order, let's build a dictionary to recollect stuff
##tDict = {}
##
##for ln in inData:
##    if ln[0] != "Memory":
##        print "Bad data entry"
##        print ln
##    else:
##        # add a dictionary entry for memory ln[1], with key ln[2] and value ln[3]
##        tDict.setdefault(ln[1], {})[ln[2]] = ln[3]
##
### input data has been reformatted, process each subdictionary for output
##for k, v in tDict.items():
##    for sk, sv in v.items():
##        print "%5s:\t%s=%s" % (k, sk, sv)
##    print

# This time, let's try to manage a list of concatenated strings
tList = []
for ln in inData:
    if ln[0] != "Memory":
        print "Bad data entry"
        print ln
    else:
        ID = int(ln[1])     # convert string number to integer
        item = "%s=%s" % (ln[2], ln[3])

        if ID < len(tList):
            # list element already allocated, but could be empty
            if tList[ID]:
                tList[ID] = tList[ID] + ", " + item
            else:
                tList[ID] = item

        elif ID == len(tList):
            # ID is next element to be added to list
            tList.append(item)

        else:
            # ID skips some unallocated elements, so create them
            tList = tList + ([None] * (ID - len(tList)))
            tList.append(item)


# output the reformatted list
for i in range(len(tList)):
    print "Memory %5s:\t%s" % (i, tList[i])

-=-=-=-=-=-=-=-=-=-

Since I put everything into a concatenated string, the lines are
too long and are wrapped by my client here...

E:\UserData\Dennis Lee Bieber\My Documents>script1.py
Memory 0: Summary=0, Speed=PC3200U-30330, Type=DDR SDRAM,
Size=512, Slot=DIMM0/J11, ConfigurationType=2
Memory 1: Size=512, Slot=DIMM1/J12, ConfigurationType=2,
Summary=0, Speed=PC3200U-30330, Type=DDR SDRAM
Memory 2: Summary=0, Speed=PC3200U-30330, Type=DDR SDRAM,
Size=512, Slot=DIMM2/J13, ConfigurationType=2
Memory 3: Summary=0, Speed=PC3200U-30330, Type=DDR SDRAM,
Size=512, Slot=DIMM3/J14, ConfigurationType=2

E:\UserData\Dennis Lee Bieber\My Documents>

supercomputer

I ended up using this code to solve my problem:

for a, b, c, d in s:
    if not query.has_key((a,b)): query[(a,b)] = []
    query[(a,b)].append("%s=%s" % (c, d))
for (a,b), v in query.items():
    print a, b, ", ".join(v)

I'm relatively new to Python and to programming in general. I usually
write in PHP and have produced this website and application
(http://my-pbs.sf.net). One of the things that makes PHP easy to program
in is the documentation provided at php.net; it is extremely easy to
find the correct functions to use. Are there any sites you could
recommend that discuss loop structures and strings? I have an
O'Reilly book called "Programming Python", but it focuses too much on
showing examples rather than on structure and methods.

Thanks for the help.
 
