Parsing String, Dictionary Lookups, Writing to Database Table

R

Rich Shepard

I need to learn how to process a byte stream from a form reader where each
pair of bytes has meaning according to lookup dictionaries, then use the
values to build an array of rows inserted into a sqlite3 database table.

Here's the context: The OMR card reader sends a stream of 69 bytes over
the serial line; the last byte is a carriage return ('\r') indicating the
end of record. Three pairs (in specific positions at the beginning of the
stream) represent blanks (no value); two other pairs represent character
strings; the values are determined from two dictionaries. The remaining 28
pairs represent values from a third dictionary.

What I'm doing: I think that I have the first part correct; that is, the
three blanks and two string values. The first three dictionaries are:

DATA_MAP_BLANK = {
chr(32)+chr(32): 0 # SP + SP
}
DATA_MAP_2 = {
chr(32)+chr(36): 'nat', # SP + $
chr(96)+chr(32): 'eco', # + SP
chr(36)+chr(32): 'soc' # $ + SP
}
DATA_MAP_5 = {
chr(32)+chr(36): 'pro', # SP + $
chr(96)+chr(32): 'neu', # + SP
chr(36)+chr(32): 'con' # $ + SP
}

I read the data into a string and split that into byte tokens, then start
building the row to be inserted into a database table:

line = ser.readline()
split_line = line.split()

# then pre-pend the record number to the front of the row
row.join(', ', vote_id)

# extract category choice (row 2, bytes 2 and 3); look up value in dictionary
cat = split_line(2:4)
row.join(', ', DATA_MAP_2(cat) # I'm not sure this is a correct lookup

# extract position (row 5, bytes 8 and 9P; look up value in dictionary
pos = split_line(8:10)
row.join(', ', DATA_MAP_5(pos))

Is the above the most 'correct' way of extracting specific byte pairs,
using them as dictionary keys to get values, then build a string of
comma-and-quote values for insertion in the database table?

Then, I've no idea how to get the rest of the data parsed for use as keys
in the last data mapping dictionary. I do not see a skip value for slicing,
other than in Numeric Python, and I'm not yet building an array of data.

Here's the last dictionary:

DATA_MAP_7 = {
chr(32)+chr(16): 1.000, # SP + DLE
chr(32)+chr(8): 2.000, # SP + BS
chr(32)+chr(4): 3.000, # SP + EOT
chr(32)+chr(2): 4.000, # SP + STX
chr(32)+chr(1): 5.000, # SP + SOH
chr(64)+chr(32): 6.000, # @ + SP
chr(16)+chr(32): 7.000, # DLE + SP
chr(8)+chr(32): 8.000, # BS + SP
chr(4)+chr(32): 9.000, # EOT + SP
chr(34)+chr(8): 0.500, # " + BS
chr(34)+chr(4): 0.333, # " + EOT
chr(34)+chr(2): 0.025, # " + STX
chr(34)+chr(1): 0.200, # " + SOH
chr(66)+chr(32): 0.167, # B + SP
chr(18)+chr(32): 0.143, # DC2 + SP
chr(10)+chr(32): 0.125, # LF + SP
chr(6)+chr(32): 0.111 # ACK + SP
}

I know how I'd do all this in C, but since I'm learning python I have not
found how best to accomplish this despite the books and online references
I've read.

TIA,

Rich
 
G

George Sakkis

Rich said:
I know how I'd do all this in C, but since I'm learning python I have not
found how best to accomplish this despite the books and online references
I've read.

Can you post one or more examples of expected input-output pairs ? From
your description it's not really clear what's going on. Also, please
post working code; the snippets you posted were out of context (what is
row?) and not always correct syntactically (split_line(2:4)).

George
 
M

mensanator

Rich said:
I need to learn how to process a byte stream from a form reader where each
pair of bytes has meaning according to lookup dictionaries, then use the
values to build an array of rows inserted into a sqlite3 database table.

Here's the context: The OMR card reader sends a stream of 69 bytes over
the serial line; the last byte is a carriage return ('\r') indicating the
end of record. Three pairs (in specific positions at the beginning of the
stream) represent blanks (no value); two other pairs represent character
strings; the values are determined from two dictionaries. The remaining 28
pairs represent values from a third dictionary.

What I'm doing: I think that I have the first part correct; that is, the
three blanks and two string values. The first three dictionaries are:

DATA_MAP_BLANK = {
chr(32)+chr(32): 0 # SP + SP
}
DATA_MAP_2 = {
chr(32)+chr(36): 'nat', # SP + $
chr(96)+chr(32): 'eco', # + SP
chr(36)+chr(32): 'soc' # $ + SP
}
DATA_MAP_5 = {
chr(32)+chr(36): 'pro', # SP + $
chr(96)+chr(32): 'neu', # + SP
chr(36)+chr(32): 'con' # $ + SP
}

I read the data into a string and split that into byte tokens, then start
building the row to be inserted into a database table:

line = ser.readline()
split_line = line.split()

# then pre-pend the record number to the front of the row
row.join(', ', vote_id)

# extract category choice (row 2, bytes 2 and 3); look up value in dictionary
cat = split_line(2:4)
row.join(', ', DATA_MAP_2(cat) # I'm not sure this is a correct lookup

# extract position (row 5, bytes 8 and 9P; look up value in dictionary
pos = split_line(8:10)
row.join(', ', DATA_MAP_5(pos))

Is the above the most 'correct' way of extracting specific byte pairs,
using them as dictionary keys to get values, then build a string of
comma-and-quote values for insertion in the database table?

No, if you're going to insert into sqlite3, you don't want a
csv string, you want a list of values (see example code
below).
Then, I've no idea how to get the rest of the data parsed for use as keys
in the last data mapping dictionary. I do not see a skip value for slicing,
other than in Numeric Python, and I'm not yet building an array of data.

Here's the last dictionary:

DATA_MAP_7 = {
chr(32)+chr(16): 1.000, # SP + DLE
chr(32)+chr(8): 2.000, # SP + BS
chr(32)+chr(4): 3.000, # SP + EOT
chr(32)+chr(2): 4.000, # SP + STX
chr(32)+chr(1): 5.000, # SP + SOH
chr(64)+chr(32): 6.000, # @ + SP
chr(16)+chr(32): 7.000, # DLE + SP
chr(8)+chr(32): 8.000, # BS + SP
chr(4)+chr(32): 9.000, # EOT + SP
chr(34)+chr(8): 0.500, # " + BS
chr(34)+chr(4): 0.333, # " + EOT
chr(34)+chr(2): 0.025, # " + STX
chr(34)+chr(1): 0.200, # " + SOH
chr(66)+chr(32): 0.167, # B + SP
chr(18)+chr(32): 0.143, # DC2 + SP
chr(10)+chr(32): 0.125, # LF + SP
chr(6)+chr(32): 0.111 # ACK + SP
}

I know how I'd do all this in C, but since I'm learning python I have not
found how best to accomplish this despite the books and online references
I've read.

As I'm now an expert in sqlite3, here's a sample program that
may do what you need. Note, I had to create some test data.
Make sure it's correct with regards to how your input stream
works.

Also, if vote_id is autoincrement, that's one less thing your
program has to do.

import sqlite3
import random

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("""
CREATE TABLE test (vote_id integer primary key autoincrement,
category text,
position text,
f01 real,
f02 real,
f03 real,
f04 real,
f05 real,
f06 real,
f07 real,
f08 real,
f09 real,
f10 real,
f11 real,
f12 real,
f13 real,
f14 real,
f15 real,
f16 real,
f17 real,
f18 real,
f19 real,
f20 real,
f21 real,
f22 real,
f23 real,
f24 real,
f25 real,
f26 real,
f27 real,
f28 real,
f29 real)
""")

DATA_MAP_BLANK = {chr(32)+chr(32):0}

DATA_MAP_2 = {chr(32)+chr(36): 'nat', \
chr(96)+chr(32): 'eco', \
chr(36)+chr(32): 'soc'}

DATA_MAP_5 = {chr(32)+chr(36): 'pro', \
chr(96)+chr(32): 'neu', \
chr(36)+chr(32): 'con'}

DATA_MAP_7 = {chr(32)+chr(16): 1.000, \
chr(32)+chr(8): 2.000, \
chr(32)+chr(4): 3.000, \
chr(32)+chr(2): 4.000, \
chr(32)+chr(1): 5.000, \
chr(64)+chr(32): 6.000, \
chr(16)+chr(32): 7.000, \
chr(8)+chr(32): 8.000, \
chr(4)+chr(32): 9.000, \
chr(34)+chr(8): 0.500, \
chr(34)+chr(4): 0.333, \
chr(34)+chr(2): 0.025, \
chr(34)+chr(1): 0.200, \
chr(66)+chr(32): 0.167, \
chr(18)+chr(32): 0.143, \
chr(10)+chr(32): 0.125, \
chr(6)+chr(32): 0.111}

# create test data
DM2_key = DATA_MAP_2.keys()
DM5_key = DATA_MAP_5.keys()
DM7_key = DATA_MAP_7.keys()
DM7_key.extend(DM7_key)

for i in xrange(10):
random.shuffle(DM2_key)
random.shuffle(DM5_key)
random.shuffle(DM7_key)

test = ' ' + DM2_key[0] + ' ' + DM5_key[0]
for i in xrange(29):
test = test + DM7_key

values = [DATA_MAP_2[test[2:4]],DATA_MAP_5[test[8:10]]]
for i in xrange(10,68,2):
values.append(DATA_MAP_7[test[i:i+2]])

cur.execute("""
INSERT INTO test ( category,
position,
f01,
f02,
f03,
f04,
f05,
f06,
f07,
f08,
f09,
f10,
f11,
f12,
f13,
f14,
f15,
f16,
f17,
f18,
f19,
f20,
f21,
f22,
f23,
f24,
f25,
f26,
f27,
f28,
f29)
VALUES
(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"""
,values)


# read back what was inserted to verify
cur.execute("""SELECT * FROM test""")
d = cur.fetchall()

for i in d:
print
print
print "Record: %d Category: %s Position: %s" % (i[0],i[1],i[2])
print '-'*40,
count = 0
for j in i[3:31]:
if (count % 6)==0: print '\n',
print '%0.3f ' % (j),
count += 1

## Sample run
##
## Record: 1 Category: soc Position: con
## ----------------------------------------
## 0.111 0.125 0.200 3.000 3.000 6.000
## 9.000 5.000 8.000 7.000 0.333 0.500
## 7.000 0.025 0.167 5.000 8.000 4.000
## 2.000 1.000 0.200 0.143 0.025 9.000
## 1.000 0.143 0.111 6.000
##
## Record: 2 Category: soc Position: neu
## ----------------------------------------
## 0.500 0.333 5.000 1.000 0.143 2.000
## 0.500 6.000 0.200 1.000 2.000 0.333
## 8.000 0.125 0.143 0.111 0.200 9.000
## 4.000 7.000 3.000 0.167 9.000 8.000
## 7.000 0.167 0.111 4.000
##
## Record: 3 Category: soc Position: pro
## ----------------------------------------
## 0.025 0.333 0.167 7.000 0.333 1.000
## 0.200 3.000 0.111 1.000 6.000 5.000
## 0.111 9.000 0.143 0.025 7.000 8.000
## 0.143 9.000 0.500 8.000 0.125 2.000
## 4.000 3.000 5.000 0.200
##
## Record: 4 Category: eco Position: neu
## ----------------------------------------
## 2.000 8.000 3.000 4.000 0.125 0.167
## 0.125 6.000 0.143 0.025 0.025 0.500
## 7.000 2.000 0.111 3.000 4.000 9.000
## 0.167 0.500 0.200 8.000 0.333 1.000
## 9.000 0.111 7.000 5.000
##
## Record: 5 Category: nat Position: con
## ----------------------------------------
## 9.000 0.333 1.000 7.000 9.000 0.111
## 0.200 0.333 5.000 0.500 0.500 5.000
## 2.000 8.000 2.000 0.125 4.000 4.000
## 0.025 0.200 0.125 0.111 3.000 6.000
## 7.000 8.000 0.167 0.167
##
## Record: 6 Category: eco Position: pro
## ----------------------------------------
## 2.000 0.200 0.167 5.000 3.000 4.000
## 0.333 8.000 0.200 0.025 5.000 0.143
## 9.000 0.167 0.125 0.111 7.000 0.025
## 0.111 6.000 0.333 1.000 0.500 0.500
## 0.125 0.143 1.000 3.000
##
## Record: 7 Category: soc Position: neu
## ----------------------------------------
## 0.333 1.000 0.500 0.167 0.111 5.000
## 9.000 0.143 6.000 0.125 4.000 6.000
## 3.000 0.143 0.125 7.000 0.200 4.000
## 2.000 0.025 0.167 7.000 8.000 3.000
## 9.000 0.025 5.000 0.111
##
## Record: 8 Category: eco Position: pro
## ----------------------------------------
## 6.000 2.000 0.125 3.000 4.000 0.025
## 0.167 5.000 0.025 0.143 0.111 0.200
## 2.000 1.000 4.000 0.167 7.000 0.111
## 5.000 0.500 6.000 8.000 1.000 0.333
## 0.125 9.000 0.200 9.000
##
## Record: 9 Category: soc Position: pro
## ----------------------------------------
## 3.000 0.500 0.143 7.000 5.000 6.000
## 4.000 0.333 0.025 1.000 0.200 3.000
## 9.000 8.000 6.000 0.500 0.111 1.000
## 0.333 0.167 0.025 5.000 8.000 9.000
## 0.125 0.111 0.200 2.000
##
## Record: 10 Category: soc Position: con
## ----------------------------------------
## 4.000 2.000 0.025 7.000 0.500 0.333
## 5.000 6.000 0.125 0.143 0.500 0.200
## 0.167 0.167 4.000 9.000 9.000 6.000
## 0.111 0.200 0.143 0.333 1.000 2.000
## 0.125 3.000 8.000 3.000
 
R

Rich Shepard

No, if you're going to insert into sqlite3, you don't want a csv string,
you want a list of values (see example code below).

Thank you very much. It makes solid sense and I can see the differences
and where I was not on track. I greatly appreciate the time and effort you
put into helping me.

Rich
 
D

Dennis Lee Bieber

Here's the context: The OMR card reader sends a stream of 69 bytes over
the serial line; the last byte is a carriage return ('\r') indicating the

That might be tricky... I don't know if the serial module considers
a \r as end of line; Python, itself, uses \n internally.
end of record. Three pairs (in specific positions at the beginning of the
stream) represent blanks (no value); two other pairs represent character
strings; the values are determined from two dictionaries. The remaining 28
pairs represent values from a third dictionary.

What I'm doing: I think that I have the first part correct; that is, the
three blanks and two string values. The first three dictionaries are:

DATA_MAP_BLANK = {
chr(32)+chr(32): 0 # SP + SP
}

If that is the ONLY data that is going to be in this dictionary, it
is probably more explicit to handle this as a special case and just do a
comparison:

if whatever == " ":
DATA_MAP_2 = {
chr(32)+chr(36): 'nat', # SP + $
chr(96)+chr(32): 'eco', # + SP
chr(36)+chr(32): 'soc' # $ + SP
}

Why all the chr() calls?

" $" :
"\x60 " : #actually ` seems to be valid on my system
"$ " :
DATA_MAP_5 = {
chr(32)+chr(36): 'pro', # SP + $
chr(96)+chr(32): 'neu', # + SP
chr(36)+chr(32): 'con' # $ + SP
}

#consolidated mapping
DATA_MAP_x = { 0x2024 : ("nat", "pro", None),
0x6020 : ("eco", "neu", None),
0x2420 : ("soc", "con", None),
0x2010 : (None, None, 1.0),
0x2008 : (None, None, 2.0),
0x2004 : (None, None, 3.0),
0x2002 : (None, None, 4.0),
0x2001 : (None, None, 5.0),
0x4020 : (None, None, 6.0),
0x1020 : (None, None, 7.0),
0x0820 : (None, None, 8.0),
0x0420 : (None, None, 9.0),
0x2208 : (None, None, 0.5),
0x2204 : (None, None, 0.333),
0x2202 : (None, None, 0.025), #not 0.25?
0x2201 : (None, None, 0.200),
0x4220 : (None, None, 0.167),
0x1220 : (None, None, 0.143),
0x1020 : (None, None, 0.125),
0x0620 : (None, None, 0.111) }

I read the data into a string and split that into byte tokens, then start
building the row to be inserted into a database table:

line = ser.readline()
split_line = line.split()

This is going to split the line on spaces -- that means you won't
HAVE any spaces in the parts of "split_line" on which to match your
dictionary.
# then pre-pend the record number to the front of the row
row.join(', ', vote_id)

# extract category choice (row 2, bytes 2 and 3); look up value in dictionary
cat = split_line(2:4)

After the above line.split(), split_line will be a list of
"words"... You need to specify which word to extract from...
row.join(', ', DATA_MAP_2(cat) # I'm not sure this is a correct lookup

# extract position (row 5, bytes 8 and 9P; look up value in dictionary
pos = split_line(8:10)
row.join(', ', DATA_MAP_5(pos))
The struct module is probably going to be more useful...

fields = struct.unpack(
"2x h 4x h 28h x",
line)
#skip two bytes, 2byte short, skip 4 bytes, 2byte short, 28 shorts,
#skip trailing \r

catg = DATA_MAP_x[fields[0]] [0]
post = DATA_MAP_x[fields[1]] [1]
vals = [ DATA_MAP_x[v] [3] for v in fields[2:] ]

row = [vote_id]
row.append(catg)
row.append(post)
row.extend(vals)
Is the above the most 'correct' way of extracting specific byte pairs,
using them as dictionary keys to get values, then build a string of
comma-and-quote values for insertion in the database table?
You don't use CSV to insert into a database... CSV is an external
format that needs to be parsed before inserting into the data base...
and we've just spent time parsing your data...

--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,050
Latest member
AngelS122

Latest Threads

Top