How to link foreign keys & primary keys using python?

H

Harry George

sonal said:
Hi all,
I hv started with python just recently... and have been assigned to
make an utility which would be used for data validations... ,snip]
plz help me... till get my hands on python... :)

1. This sure looks like a school assignment.

2. "till get my..." Do you actually not have python installed yet? It
is hopeless to tackle this problem if you haven't done a few simple
"hello, world" tasks.

3. Do you have a data model? In this case, you need to think
carefully about what a RDBMS, PK, and FK are. In otherwords, a
metamodel. Generally, once you understand the data structures you
should implement them pretty much verbatim in a "model" module.
Then do reader/writer modules so you can load that model from your
data sources and dump out to your data sinks.

4. Do you have a testsuite and test harness? Put together your test
harness, then develop for the simplest case, then add complexity.
E.g., no FK, FK with 1 attr, FK with multiple attrs, FKs with
shared attrs.
 
S

sonal

Hi all,
I hv started with python just recently... and have been assigned to
make an utility which would be used for data validations...
In short we take up various comma separated data files
for eg: area.txt, school.txt, students.txt.... and so on (ok?!?)
now,
1. area code used in the school.txt must be defined in the area.txt
(Primary key in area => area_code defined in area.txt
& Foreign key on school => area_code defined in school.txt)

i hv created primary key using the following piece of code:

# primary key for area.txt (index created on the column AREACODE as per
the
# schema defined earlier )
area_pk = PartitionedPK( name = 'Area PK ',
save_to =
'../Index/area_code.idx',
fields = ['A_AREACODE'] )
# col name in area schema

# foreign key is defined as follows...
school_fk = HashedFK( name = ' School code FK',
load_from = '../Index/area_code.idx',
fields = ['S_AREACODE'] )
# col name in school schema

Description for abv code:
1. An index {area_code.idx } is formed on the field AREACODE in the
area.txt
(i.e.,A_AREACODE)
2. The data values in the S_AREACODE field in the school.txt are
checked in the index {area_code.idx}
If the area code given in school.txt is not present in the area
code, then the record is not validated(as foreign key constraint
fails.)

Now if the Primary key is on mutiple columns...the foreign key, which
is also definedon the same no. of columns works..
for eg..

# primary key for school.txt (index created on the columns AREACODE &
SCHOOLCODE as per the
# schema defined earlier )
school_pk = PartitionedPK( name = 'School PK ',
save_to = '../Index/school.idx',
fields =
['S_AREACODE','S_SCHOOLCODE'] ) # col names in school schema

# foreign key for students is defined as follows...
student_fk = HashedFK( name = ' STUDENT code FK',
load_from = '../Index/student.idx',
fields =
['STUD_AREACODE','STUD_SCHOOLCODE'] ) # col name in
student schema

Now if I hv to define foreign key on student.txt but with only one
field, school code, so as to make sure that the school name given in
the student.txt exists in the school.txt whatever be the area code...
I am unable to do this...

Say, if the AREACODE field is not present in the student.txt file....
then???
I tried using the foll code

student_fk = HashedFK( name = ' Student FK',
load_from = '../Index/school.idx',
fields = ['STUD_SCHOOLCODE'] )

Its showing an AttributeError : 'list' object has no attribute
'has_key'

I also tried making another index with only SCHOOLCODE field (another
PK for school.txt) and the foreign key on student.txt to be loaded from
this index...
it still shows the same error
That's may be becoz... one SCHOOLCODE may repeat often, with a
different AREACODE... so may be we need to index on distinct
SCHOOLCODES in school.txt...
how do i do that??

I hope i am not confusing the genious' ... plz help me...
till get my hands on python... :)
 
M

MTD

Your post is confusing. Here is my advice: investigate the use of
dictionaries. Dictionaries can allow you to define data in the form {
key:data }, e.g.

{ area_code : area_data }

{ (area_code,school_code) : school_data }

{ (school_code,student_code) : student_data }
 
S

sonal.patankar

Hi Mr. George,

Let me try it again...

I am not using any relational database to store the required tables
with primary keys & foreign keys....

When I say PRIMARY KEY =>
1. It means an index is created on the specified fields
(Out of various fields given in the comma separated txt file)
FileFormat: CODE, FIRST_NAME, last_name, area_of_expertise, country
Eg: A1,Harry,George, python, XYZCOUNTRY--------(1st record)

2. The index can be formed on a single field or on multiple fields
Eg: a. 'CODE' (single field ) {pk_code}
b. 'CODE' & 'NAME' (multiple fields ) {pk_code_fname}

Now when I say FOREIGN KEY =>
1. If the foreign Key is formed on the field 'CODE' in another text
file
Format: subsriber_code,CODE,first_name, no_of_posts,active(Y/N)
Eg: SUB_001, A1, Harry, 50, Y

This means the CODE (A1) given here is checked in the index formed
above
with primary key: pk_code...

2. If the foreign Key is formed on the fields 'CODE' & 'FIRST_NAME'
Format: subsriber_code,CODE,FIRST_NAME, no_of_posts,active(Y/N)
Eg: SUB_001, A1, Harry, 50, Y

This means the CODE (A1) & FIRST_NAME (Harry) given here
are checked in the index formed above with primary key:
pk_code_fname...

I am done till here.....

The problem starts if I have defined an index on multiple fields
(composite PK)
say: CODE & FIRST_NAME (pk_code_fname)
and if I need to define a FK on a single field out of these
say: CODE

I am unable to do that...
Mr. George, I thought i must explain the code i am dealin with,
for better understanding.., but i am sorry i confused you all the more
(incase, u want to view the code please refer to the code snippets in
my first query posted)

I hope you atleast get an idea of what i am hunting for.... :(
 
S

sonal.patankar

MTD said:
Your post is confusing. Here is my advice: investigate the use of
dictionaries. Dictionaries can allow you to define data in the form {
key:data }, e.g.

{ area_code : area_data }

{ (area_code,school_code) : school_data }

{ (school_code,student_code) : student_data }

Thanx Mr. Marc...
I am surely investigating the dictionaries... but the problem is that
I have to use the existing code...
and thas what is creating problems for me... :(
 
S

sonal

Hi Mr. George,
Sorry for confusing u so much...
Let me try it again...

I am not using any relational database to store the required tables
with primary keys & foreign keys....

When I say PRIMARY KEY =>
1. It means an index is created on the specified fields
(Out of various fields given in the comma separated txt file)
FileFormat: CODE, FIRST_NAME, last_name, area_of_expertise, country
Eg: A1,Harry,George, python, XYZCOUNTRY--------(1st record)

2. The index can be formed on a single field or on multiple fields
Eg: a. 'CODE' (single field ) {pk_code}
b. 'CODE' & 'NAME' (multiple fields ) {pk_code_fname}

Now when I say FOREIGN KEY =>
1. If the foreign Key is formed on the field 'CODE' in another text
file
Format: subsriber_code,CODE,first_name, no_of_posts,active(Y/N)
Eg: SUB_001, A1, Harry, 50, Y

This means the CODE (A1) given here is checked in the index formed
above
with primary key: pk_code...

2. If the foreign Key is formed on the fields 'CODE' & 'FIRST_NAME'
Format: subsriber_code,CODE,FIRST_NAME, no_of_posts,active(Y/N)
Eg: SUB_001, A1, Harry, 50, Y

This means the CODE (A1) & FIRST_NAME (Harry) given here
are checked in the index formed above with primary key:
pk_code_fname...

I am done till here.....

The problem starts if I have defined an index on multiple fields
(composite PK)
say: CODE & FIRST_NAME (pk_code_fname)
and if I need to define a FK on a single field out of these
say: CODE

I am unable to do that...
Mr. George, I thought i must explain the code i am dealin with,
for better understanding.., but i am sorry i confused you all the more
(incase, u want to view the code please refer to the code snippets in
my first query posted)

Thanks & regards,
sonal
 
S

sonal

MTD said:
Your post is confusing. Here is my advice: investigate the use of
dictionaries. Dictionaries can allow you to define data in the form {
key:data }, e.g.

{ area_code : area_data }

{ (area_code,school_code) : school_data }

{ (school_code,student_code) : student_data }

Thanx Mr. Marc...
I am surely investigating the dictionaries... but the problem is that
I have to use the existing code...
and thats what is creating problems for me... :(
 
S

Steve Holden

Hi Mr. George,

Let me try it again...

I am not using any relational database to store the required tables
with primary keys & foreign keys....
None the less, you are using relational database terminology. If you
want people to understand you then you should use it correctly or
explain yourself in other ways ...
When I say PRIMARY KEY =>
1. It means an index is created on the specified fields
(Out of various fields given in the comma separated txt file)
FileFormat: CODE, FIRST_NAME, last_name, area_of_expertise, country
Eg: A1,Harry,George, python, XYZCOUNTRY--------(1st record)

2. The index can be formed on a single field or on multiple fields
Eg: a. 'CODE' (single field ) {pk_code}
b. 'CODE' & 'NAME' (multiple fields ) {pk_code_fname}

Now when I say FOREIGN KEY =>
1. If the foreign Key is formed on the field 'CODE' in another text
file
Format: subsriber_code,CODE,first_name, no_of_posts,active(Y/N)
Eg: SUB_001, A1, Harry, 50, Y

This means the CODE (A1) given here is checked in the index formed
above
with primary key: pk_code...

2. If the foreign Key is formed on the fields 'CODE' & 'FIRST_NAME'
Format: subsriber_code,CODE,FIRST_NAME, no_of_posts,active(Y/N)
Eg: SUB_001, A1, Harry, 50, Y

This means the CODE (A1) & FIRST_NAME (Harry) given here
are checked in the index formed above with primary key:
pk_code_fname...

I am done till here.....

The problem starts if I have defined an index on multiple fields
(composite PK)
say: CODE & FIRST_NAME (pk_code_fname)
and if I need to define a FK on a single field out of these
say: CODE

I am unable to do that...
Mr. George, I thought i must explain the code i am dealin with,
for better understanding.., but i am sorry i confused you all the more
(incase, u want to view the code please refer to the code snippets in
my first query posted)

I hope you atleast get an idea of what i am hunting for.... :(
You say you have to use existing code, which is a pity because it seems
that code is organised in such a way as to make your problem difficult.

The only ways to locate information by a *portion* of the primary key,
given that you appear to be using tuples as primary key values, is
either to maintain a separate index on the partial key (in other words
have a lookup table that holds the partial key values).

Note, also, that you run a risk here: there is no guarantee that only
one record will have any given partial key value.

The whole point of foreign keys in databases is that they are *complete*
primary key values, and can therefore refer at at most one occurrence of
the referenced entity. It is not correct to refer to a reference to a
partial primary key as a "foreign key", precisely because all guarantees
of uniqueness are lost.

It seems that you are not a native English speaker, so I am sorry if
this is leading to further problems.

The more simply you can explain yourself the easier it will be to help!

regards
Steve
 
S

Steve Holden

sonal said:
Hi Mr. George,
Sorry for confusing u so much...
Let me try it again...

I am not using any relational database to store the required tables
with primary keys & foreign keys....

When I say PRIMARY KEY =>
1. It means an index is created on the specified fields
(Out of various fields given in the comma separated txt file)
FileFormat: CODE, FIRST_NAME, last_name, area_of_expertise, country
Eg: A1,Harry,George, python, XYZCOUNTRY--------(1st record)

2. The index can be formed on a single field or on multiple fields
Eg: a. 'CODE' (single field ) {pk_code}
b. 'CODE' & 'NAME' (multiple fields ) {pk_code_fname}
What, in Python, *are* these indexes - lists, dictionaries, tuples or
something else?
Now when I say FOREIGN KEY =>
1. If the foreign Key is formed on the field 'CODE' in another text
file
Format: subsriber_code,CODE,first_name, no_of_posts,active(Y/N)
Eg: SUB_001, A1, Harry, 50, Y

This means the CODE (A1) given here is checked in the index formed
above
with primary key: pk_code...

2. If the foreign Key is formed on the fields 'CODE' & 'FIRST_NAME'
Format: subsriber_code,CODE,FIRST_NAME, no_of_posts,active(Y/N)
Eg: SUB_001, A1, Harry, 50, Y

This means the CODE (A1) & FIRST_NAME (Harry) given here
are checked in the index formed above with primary key:
pk_code_fname...

I am done till here.....

The problem starts if I have defined an index on multiple fields
(composite PK)
say: CODE & FIRST_NAME (pk_code_fname)
and if I need to define a FK on a single field out of these
say: CODE

I am unable to do that...
Mr. George, I thought i must explain the code i am dealin with,
for better understanding.., but i am sorry i confused you all the more
(incase, u want to view the code please refer to the code snippets in
my first query posted)

Thanks & regards,
sonal
A lot depends on the data structure that the code uses to represent the
"indexes". Perhaps you could explain how things are looked up in them?

regards
Steve
 
S

sonal

Hi Mr. Steve,
The *indexes* i am using are lists...
The code for creation of the PartionedPK is given below...
*******************************************************************************
class PartitionedPK(object):
def __init__(self, name, save_to, fields):
self.name = name
self.idx_name = save_to
self.part_name = save_to + PARTITION_SUFFIX
self.fields = fields
self.digester = sha.new

def setup(self, schema):
self.partitions = [[] for i in range(256)]
self.flush_pos = [[] for i in range(256)]
self.flush_count = 0
self.index = {}
self.offsets = field_offsets(self.fields, schema)
if not self.offsets:
raise ValueError('One or more index field names are
invalid')

self.idx_file = open(self.idx_name, 'wb+')
self.part_file = open(self.part_name, 'wb+')

def save(self):
pickle.dump(self.flush_count, self.part_file, -1)
pickle.dump(self.flush_pos, self.part_file, -1)
self.idx_file.close()
self.part_file.close()

def flush(self):
self.flush_count += 1
for i in range(256):
self.flush_pos.append(self.idx_file.tell())
pickle.dump(self.partitions, self.idx_file, -1)
self.partitions = []

def valid(self, record, data):
key = self.digester(''.join( [data for i in self.offsets]
)).digest()
self.partitions[ ord(key[0]) ].append( (key, record) )
# defer checking till later
return True

def finalize(self):
self.flush()
errors = []
for bin in range(256):
#show('Checking %s, bin %d/256 ... ' % (self.name, bin))
seen = {}
has = seen.has_key
for flush in range(self.flush_count):
self.idx_file.seek( self.flush_pos[bin][flush] )
records = pickle.load(self.idx_file)
for key, value in records:
if has(key):
errors.append(value)
else:
seen[key] = value
return errors
*******************************************************************************
the PK definition is as follows:
vol_pk = PartitionedPK( name = 'VOL_PK',
save_to = '../Index/vol.idx',
fields = ['ID','Type','Curr_Code','Tenor'])

The code for the Foreign Key declaration (referencing to the indexes)
is as given below...
*******************************************************************************
class HashedFK(object):
def __init__(self, name, load_from, fields):
self.name = name
self.filename = load_from
self.fields = fields
self.digester = sha.new

def setup(self, schema):
self.index = {}
self.offsets = field_offsets(self.fields, schema)

if not self.offsets:
raise ValueError('One or more index field names are
invalid')

file = open(self.filename, 'rb+')
self.index = pickle.load(file)
file.close()

def valid(self, record, fields):
key = self.digester(''.join( [fields for i in self.offsets]
)).digest()

return self.index.has_key(key)

def flush(self):
pass

def finalize(self):
return None
*******************************************************************************
the FK definition is as follows:
vol_fk = HashedFK( name = ' VOL_FK,
load_from = '../Index/vol.idx',
fields= ['ID','Type','Curr_Code','Tenor'])

The code is working fine when the foreign key is referenced to the
complete primary key

But if the FK were to be on only 'ID'
the FK defn would have been like =>

vol_fk = HashedFK( name = ' VOL_FK,
load_from = '../Index/vol.idx',
fields = ['ID'] )
This is were the problem lies...
it shows AttributeError: 'list' object has no attribute 'has_key'

I have also tried defining another PK with a single field as follows =>
Tvol_pk = PartitionedPK( name = 'TVOL_PK',
save_to = '../Index/tvol.idx',
fields = ['ID'] )
The index ''tvol.idx'' is being created at the given location(path
specified)
but referencing to this index(i.e., tvol.idx) with the vol_fk given
above also gives
the same error.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,578
Members
45,052
Latest member
LucyCarper

Latest Threads

Top