Form saving, db question ?

V

vlado

I have a form like this

Param type1(outputText) : value1(inpuText)
Param type2 (outputText) : value2(inputText)
.. .
.. .

and so on. And I want to save it in a database. Is it better to have one
table which will have columns like this :

ID, PARAM_TYPE, PARAM_VALUE
1 FORD_MUSTANG 10
2 AMG_CLK 50

and so on.....


, or

table that will have columns like this :

ID, PARAM_TYPE_FK, PARAM_VALUE,
1 1 10
2 2 50

and so on....




where PARAM_TYPE_FK is FK to look_up table cointaing types like
(FORD_MUSTANG, MERCEDES_AMG...and so on) , or to have these types in
let's say enum in web controller and write just (1,2) in PARAM_TYPE_FK
column without look up table.


Thanks in advance.
 
L

Lew

I have a form like this

Param type1(outputText) : value1(inpuText)
Param type2 (outputText) : value2(inputText)
.                        .
.                        .

and so on. And I want to save it in a database. Is it better to have one
table which will have columns like this :

ID, PARAM_TYPE, PARAM_VALUE
1             FORD_MUSTANG    10
2             AMG_CLK                 50

Wow, that didn't line up well.
and so on.....

, or

table that will have columns like this :

ID, PARAM_TYPE_FK, PARAM_VALUE,
1                    1                        10
2                    2                        50

and so on....

where PARAM_TYPE_FK is FK to look_up table cointaing types like
(FORD_MUSTANG, MERCEDES_AMG...and so on) , or to have these types in
let's say enum in web controller and write just (1,2) in PARAM_TYPE_FK
column without look up table.

That depends.

You will need a search engine and/or Wikipedia for the following.

If you are using a relational database, it is best to keep it at least
at third normal form (3NF), unless it's a data warehouse, in which
case the time dimension changes the idea of what constitutes a normal
form.

Your question seems to focus on the matter of natural keys vs.
sequenced surrogate keys. What is the business-domain meaning of the
"ID" column in your layout?

When you do a relational database design, at least at the so-called
"logical" level, you need to know what constitute the keys in your
problem domain. I venture to gamble that "ID" is not part of the
problem domain and therefore no part of your logical design. If you
do use such a surrogate key, it will be an optimization detail of the
physical implementation that has no place in the domain model.

For your particular problem, I'd probably have a lookup table
representing the car types to keep them consistent, and copy the
actual value into the table that references them.

table PARAMS:
PARAM_TYPE PARAM_VALUE
------------ -----------
Ford Mustang 10
Chevy Camaro 17

table PARAM_TYPES
PARAM_TYPE
------------
Ford Mustang
Chevy Camaro

Yes, there'd be a FK relationship between PARAM_TYPE in both tables.

This is not a Java question, but Java programmers do need to know
about database design.
 
V

vlado

I'm using a relation database, and ID in these tables is used
as a primary key. What do you mean by you will have a lookup table with
car types to keep them consistent ? Is't lookup table bes used when
let's say I hava a drop down list in
my form with data that which will be rearly changed or updated ?


Thanks for the answer.
 
A

Arved Sandstrom

vlado said:
I'm using a relation database, and ID in these tables is used as
a primary key. What do you mean by you will have a lookup table with car
types to keep them consistent ? Is't lookup table bes used when let's
say I hava a drop down list in
my form with data that which will be rearly changed or updated ?

Thanks for the answer.

My question would still be, in *your* problem is the car type (something
like FORD_MUSTANG) part of the identity? Lew was trying to get at this,
because it's important.

If it's not part of the identity then that field is more of a free-form
descriptive string; if you wish to pre-define the available values that
then is still more of a user-input presentation-layer constraint, and
you could use an enum mapping to a varchar field to store the actual
values. In this case your ID primary key is a surrogate key and you may
have no natural key at all.

If OTOH the car type is part of the identity (it's all or some of a/the
natural key) then you may very well still be using the ID column as a
surrogate and primary key, but it's now also in your interests to ensure
that the car type column is constrained (a unique index) on the
data/RDBMS side. In which case the table that enforces this constraint
isn't going to be the table that stores user input; hence foreign keys.
In this case, as Lew said, the actual param value is in the user input
table, that uses a foreign key relationship to point back at the table
containing the car type natural keys.

"Rarely changed or updated" isn't typically your driver in deciding data
relationships, what the natural keys are, or how you'll enforce data
integrity in your tables. It's definitely a factor - for example, I'd
take it into account when fine-tuning entity caching in JPA - but not at
this stage.

AHS
 
V

vlado

My question would still be, in *your* problem is the car type
(something like FORD_MUSTANG) part of the identity? Lew was trying to
get at this, because it's important.

If it's not part of the identity then that field is more of a
free-form descriptive string; if you wish to pre-define the available
values that then is still more of a user-input presentation-layer
constraint, and you could use an enum mapping to a varchar field to
store the actual values. In this case your ID primary key is a
surrogate key and you may have no natural key at all.

If OTOH the car type is part of the identity (it's all or some of
a/the natural key) then you may very well still be using the ID column
as a surrogate and primary key, but it's now also in your interests to
ensure that the car type column is constrained (a unique index) on the
data/RDBMS side. In which case the table that enforces this constraint
isn't going to be the table that stores user input; hence foreign
keys. In this case, as Lew said, the actual param value is in the user
input table, that uses a foreign key relationship to point back at the
table containing the car type natural keys.

"Rarely changed or updated" isn't typically your driver in deciding
data relationships, what the natural keys are, or how you'll enforce
data integrity in your tables. It's definitely a factor - for example,
I'd take it into account when fine-tuning entity caching in JPA - but
not at this stage.

AHS
1.What do you mean by "part of identity" ?

2.Other thing is that on every project we have practise to have lookup
tables , like country that have for example ID, NAME columns, so I can't
use table without primary key (ID).
 
V

vlado

Here is a concrete example in attachemt how the form looks like.
Input forms are prices and name of the car garage. So can you now
suggest me how would the database table(s) look like, and how will the
entites look like ?
 
V

vlado

CAR GARAGE NAME: CITY GARAGE (inputText)

------------------------------------------------------
| 2005 | 2006 | 2007 | 2008 |
-------------------------------------------------------
AMG | 200.000 | | | |
--------------------------------- ----------------------
OPEL | 15.000 | 20.000 | | |
-------------------------------------------------------


AND SO ON.....





Where Car garage name is inputText, and so are prices that are enetered
beneth years.
So can you please give me suggestion how the datatable(s) should look
like to save this form, and how
should entites look like ?


Thanks is advance.
 
L

Lew

Actually, I was trying to get at why he was using ID specifically.
The answer "it's the primary key" is flat-out silly - that's circular
reasoning. It's not the primary key because it's the primary key.

WHY is it the primary key? What does it *mean*?

If the answer is "nothing in terms of the problem domain" (vlado - you
need to know what "problem domain" means) then the logical analysis
should eliminate ID and look at the *real* primary key - the one that
identifies the thing described *in terms of the domain of discourse*.

A database table is a representation of the problem domain - it
doesn't drive the definitions, it reflects them.

vlado - that point is the key.
1.What do you mean by "part of identity" ?

Once again, read about third normal form (3NF). I didn't mention it
for its irrelevance, but its relevance.

You are going to have extremely poor results if you don't familiarize
yourself with the basics of database design.

"Identity" is the attribute or collection of attributes that uniquely
identifies an entity. Think of it as the object attributes that
contribute to 'equals()' evaluation, if you like.

Answer the question I asked above, which you have for some reason so
far refused to do: What is the *meaning* of ID in the problem domain?

It isn't that "it's the primary key". That's backwards thinking.
What does it represent in terms of the entities that make it the
primary key.

Again, I'm willing to gamble that it means nothing.

Please answer this question.
2.Other thing is that on every project we have practise to have lookup
tables , like country that have for example ID, NAME columns, so I can't
use table without primary key (ID).

ID and "primary key" are not synonyms.

What is the *real* key?
 
L

Lew

       CAR GARAGE NAME: CITY GARAGE (inputText)

------------------------------------------------------
                    |  2005      |  2006     |  2007    | 2008    |
-------------------------------------------------------
AMG          | 200.000   |              |               |             |
--------------------------------- ----------------------
OPEL         | 15.000     |  20.000   |            |              |
-------------------------------------------------------

AND SO ON.....

Where Car garage name is inputText, and so are prices that are enetered
beneth years.
So can you please give me suggestion how the datatable(s) should look
like to save this form, and how
should entites look like ?

I already gave one answer to that question, but you didn't answer
mine.

-- PostgreSQL
CREATE TABLE garage
(
garage VARCHAR(32) PRIMARY KEY,
address TEXT,
info TEXT
);

CREATE TABE make
(
make VARCHAR(32) PRIMARY KEY
);

CREATE TABLE car
(
car VARCHAR(32) PRIMARY KEY,
make VARCHAR(32) NOT NULL FOREIGN KEY REFERENCES make (make)
garage VARCHAR(32) FOREIGN KEY REFERENCES garage (garage),
whenin DATE,
info TEXT
);

-- no surrogate key ID needed or wanted
 
L

Lew

I already gave one answer to that question, but you didn't answer
mine.

You need to answer the questions we ask, and research the things we
suggest, else why do you bother to ask us for help?
-- PostgreSQL
CREATE TABLE garage
(
  garage  VARCHAR(32) PRIMARY KEY,
  address TEXT,
  info    TEXT
);

CREATE TABE make
(
  make    VARCHAR(32) PRIMARY KEY
);

CREATE TABLE car
(
  car    VARCHAR(32) PRIMARY KEY,
  make   VARCHAR(32) NOT NULL FOREIGN KEY REFERENCES make (make)
  garage VARCHAR(32) FOREIGN KEY REFERENCES garage (garage),
  whenin DATE,

-- oops, I forgot
price NUMERIC (10, 2),
 
A

Arne Vajhøj

Actually, I was trying to get at why he was using ID specifically.
The answer "it's the primary key" is flat-out silly - that's circular
reasoning. It's not the primary key because it's the primary key.

WHY is it the primary key? What does it *mean*?

If the answer is "nothing in terms of the problem domain" (vlado - you
need to know what "problem domain" means) then the logical analysis
should eliminate ID and look at the *real* primary key - the one that
identifies the thing described *in terms of the domain of discourse*.

A database table is a representation of the problem domain - it
doesn't drive the definitions, it reflects them.

That is what about half the DB people thinks.

The other half thinks exactly the opposite.

Arne
 
L

Lew

That is what about half the DB people thinks.

The other half thinks exactly the opposite.

That's not at all accurate, though I'd be interested in the statistical basis
for your observation.

The only way I can take the meaning of "exactly the opposite" is that you are
suggesting "half" of "database people" think that the problem domain should
represent the database. I have never in my decades of reading about database
design encountered anyone who suggested the opposite of having the database
model the problem domain, or even deprecated the importance of the unique
attribute collection that defines an entity in the business-domain model.

I'm familiar with the controversy over sequenced surrogate keys vs. not using
them, but no one I've read in that debate has ever suggested that the tail
should wag the dog. /Au contraire/ the proponents of surrogate keys I've read
suggested that surrogate keys help to model the problem domain, and protect
against volatility in the "natural" key. (See the Developerworks article
mentioned /infra/.)

Furthermore, if one does not analyze the problem domain for natural keys, one
cannot properly assign sequenced surrogate keys to the rows, and will risk
update anomalies.

<http://en.wikipedia.org/wiki/Third_normal_form
manages to explain 3NF quite clearly without once resorting to sequenced
surrogate keys.

The ACM article at
<http://www.bkent.net/Doc/simple5.htm>
goes through five normal forms without once resorting to sequenced surrogate keys.

Same with
<http://support.microsoft.com/kb/283878>

The Developerworks article at
<http://www.ibm.com/developerworks/web/library/wa-dbdsgn1.html>
discusses how to pick keys and the motivations to use surrogate keys
responsibly. They provide a balanced viewpoint, but are careful to point out,
"As part of the plumbing, the surrogate key has no need to ever be visible
outside the DB. In particular, it should never be revealed to the user. ... If
a business need arises for providing the user with a unique identifier to a
particular dataset, this identifier should be considered real /business/ data
and kept separate from the plumbing." [emph. original]
 
A

Arne Vajhøj

That's not at all accurate, though I'd be interested in the statistical
basis for your observation.

That seems to be what is observed when DB people discuss.
The only way I can take the meaning of "exactly the opposite" is that
you are suggesting "half" of "database people" think that the problem
domain should represent the database.

You usually have some terms to describe when someone post something
that is not particular relevant.

Is this a strawman or a non sequitur (or both)?
I'm familiar with the controversy over sequenced surrogate keys vs. not
using them,

Then why do you then post:

# If the answer is "nothing in terms of the problem domain" (vlado - you
# need to know what "problem domain" means) then the logical analysis
# should eliminate ID and look at the *real* primary key - the one that
# identifies the thing described *in terms of the domain of discourse*.

as if it is the one and only truth??
Furthermore, if one does not analyze the problem domain for natural
keys, one cannot properly assign sequenced surrogate keys to the rows,
and will risk update anomalies.

Any source for that?
<http://en.wikipedia.org/wiki/Third_normal_form
manages to explain 3NF quite clearly without once resorting to sequenced
surrogate keys.

The ACM article at
<http://www.bkent.net/Doc/simple5.htm>
goes through five normal forms without once resorting to sequenced
surrogate keys.

Same with
<http://support.microsoft.com/kb/283878>

No one has claimed that it is not possible.
The Developerworks article at
<http://www.ibm.com/developerworks/web/library/wa-dbdsgn1.html>
discusses how to pick keys and the motivations to use surrogate keys
responsibly. They provide a balanced viewpoint, but are careful to point
out, "As part of the plumbing, the surrogate key has no need to ever be
visible outside the DB. In particular, it should never be revealed to
the user. ... If a business need arises for providing the user with a
unique identifier to a particular dataset, this identifier should be
considered real /business/ data and kept separate from the plumbing."
[emph. original]

That is one of the main arguments for surrogate keys.

The problem with real keys is if it need to change.

Surrogate keys that are not exposed outside the database
never change.

Arne
 
L

Lew

Then why do you then post:

# If the answer is "nothing in terms of the problem domain" (vlado - you
# need to know what "problem domain" means) then the logical analysis
# should eliminate ID and look at the *real* primary key - the one that
# identifies the thing described *in terms of the domain of discourse*.

as if it is the one and only truth??

Because there I'm talking about the logical analysis, as I clearly stated, and
not the physical implementation. You cannot appropriately choose a sequenced
surrogate key unless you know what the natural key is, i.e., what the identity
is that you're modeling.
 
L

Lew

Lew said:
The Developerworks article at
<http://www.ibm.com/developerworks/web/library/wa-dbdsgn1.html>
discusses how to pick keys and the motivations to use surrogate keys
responsibly. They provide a balanced viewpoint, but are careful to point
out, "As part of the plumbing, the surrogate key has no need to ever be
visible outside the DB. In particular, it should never be revealed to
the user. ... If a business need arises for providing the user with a
unique identifier to a particular dataset, this identifier should be
considered real /business/ data and kept separate from the plumbing."
[emph. original]
That is one of the main arguments for surrogate keys.

The problem with real keys is if it need to change.

Surrogate keys that are not exposed outside the database
never change.

I have not spoken against surrogate keys. You appear to think that I have.
What I said was that they are not part of the logical analysis but of the
physical implementation. Surrogate keys properly are applied as described in
the cited Developerworks article, as a *surrogate* for the natural key to
handle certain implementation issues and to optimize access.

The problem with not understanding the natural keys is that surrogate keys are
misapplied and screw up normalization. I've seen examples in the field where
misunderstanding surrogate key usage led to requiring 3- or more-way joins
where a two-way join would have sufficed with proper key analysis. I see it
misapplied in the OP's case, where he has no clue as to the identity that he's
modeling with his cargo-cult "ID" columns. You have to understand the
identity being modeled in the business domain to understand for what the key
is a surrogate.

There are other techniques to deal with slowly-changing natural keys than
internal surrogate keys. Like most things, there are tradeoffs involved in
the choices.

Another mistake with surrogate keys is that they get applied to many-to-many
linking tables where they are a problem. Not every table needs them.

I would appreciate you providing a reference that states "the exact opposite"
of my points. You yourself haven't even formulated a statement in this
discussion that is the "opposite" of anything I've said, other than to vaguely
claim that somehow I'm wrong. I'm not able so far to find a reference
claiming the opposite of my points here, nor have I seen one in nearly thirty
years of studying, implementing and using databases. Perhaps I missed it, in
which case your citation will enlighten me.
 
V

vlado

You need to answer the questions we ask, and research the things we
suggest, else why do you bother to ask us for help?


-- oops, I forgot
price NUMERIC (10, 2),
And I wan't to answer to your questions, but with lot's of replies
I can't find what the questions are, so can I write them down please ?

Thanks
 
L

Lew

vlado said:
And I wan't to answer to your questions, but with lot's of replies I
can't find what the questions are, so can I write them down please ?

I asked, "What does 'ID' represent in the problem domain?"

To rephrase and perhaps make the concept clearer, what is it in the problem
domain that makes one entity distinguishable from another?

If there is an "ID" in the problem domain, where does it come from?

If there is not an "ID" in the problem domain, for what entity attribute or
collection of attributes is it a surrogate?

One more question: Have you studied or even read yet any of the links
regarding key selection and normalization that have been referenced in this
discussion?
 
V

vlado

I asked, "What does 'ID' represent in the problem domain?"

To rephrase and perhaps make the concept clearer, what is it in the
problem domain that makes one entity distinguishable from another?

If there is an "ID" in the problem domain, where does it come from?

If there is not an "ID" in the problem domain, for what entity
attribute or collection of attributes is it a surrogate?

One more question: Have you studied or even read yet any of the links
regarding key selection and normalization that have been referenced in
this discussion?

To be sensire ID in the problem domain represents nothing. It's
just and indentifier to make one entity unique form another. When I
persist entity in DB I'm fetching it by his database generated(surogate)
ID. The problem is that my current DBA is poorly designing
our database beacause we have over 500 tables and every table has
ID(LONG-surogate), and niether of them doesn't use some natural primary
key like you set for Car table that it will be car field, and I don't
know why. That's the rason why
I would use ID(surogate) in every new table that I need beacuse it is
the common practise for my DBA, but it seems that he isn't doing it
right :-(
 
L

Lew

vlado said:
To be sensire ID in the problem domain represents nothing. It's just and
indentifier to make one entity unique form another. When I persist
entity in DB I'm fetching it by his database generated(surogate) ID. The
problem is that my current DBA is poorly designing
our database beacause we have over 500 tables and every table has
ID(LONG-surogate), and niether of them doesn't use some natural primary
key like you set for Car table that it will be car field, and I don't
know why. That's the rason why
I would use ID(surogate) in every new table that I need beacuse it is
the common practise for my DBA, but it seems that he isn't doing it
right :-(

As Arne pointed out earlier, there's a good argument for LONG id fields to
serve as the primary key for entity tables.

That doesn't mean every table should have one.

Certain tables represent entities, like CAR in your example. (I took the
liberty of picking my own table names here.) I'll use LONG ID columns for
entities here though I won't always choose to do that every time.

CREATE TABLE CAR
(
ID SERIAL NOT NULL PRIMARY KEY,
VIN VARCHAR (17) UNIQUE,
MAKE VARCHAR (60),
MODEL VARCHAR (60),
YEAR INTEGER
);

"MAKE" and "MODEL" can be foreign keys to like-named tables. Those referenced
tables are usually lookups for valid values rather than entities in their own
right. They can have ID fields, but the short ones (and most are short) can
also have values directly in the columns, denorming the value into the
referencing table. This saves on JOIN overhead in SELECTS that use that column.

Obviously VIN is the natural key. The ID surrogate key allows changes to the
natural key with less stress to inter-table relationships, if done correctly.

Some SQL dialects support an enum type similar to the Java enum. Then it's a
choice of how you enforce the values and whether you need extensibility.

Point being that a lookup table can have a serial ID but probably doesn't need
one.

A linking table should not have a surrogate key. A linking table enforces
structure - many-to-many relantionships between tables. It rarely needs or
has more than the foreign key columns that tie other tables together. The
primary key is the multi=column concatenation of the foreign key columns. The
table's purpose is to support JOIN clauses, and extra ID columns interfere
with that.

ID columns in entity tables keep join-table keys relatively compact:

CREATE TABLE cardriver -- many cars to many drivers
(
car INTEGER FOREIGN KEY REFERENCES car (id),
driver INTEGER FOREIGN KEY REFERENCES driver (id),
PRIMARY KEY (car, driver)
);

Multi-way JOIN SELECTs that go from, say, car to residence via driver but
displaying columns only from car and residence can skip the driver entity
table altogether and use just linking tables in the middle.

SELECT car.*, res.* FROM car
JOIN cardriver ON car.id = cardriver.car
JOIN driverresidence dres ON cardriver.driver = dres.driver
JOIN residence res on dres.residence = res.id
WHERE ...
 
V

vlado

As Arne pointed out earlier, there's a good argument for LONG id
fields to serve as the primary key for entity tables.

That doesn't mean every table should have one.

Certain tables represent entities, like CAR in your example. (I took
the liberty of picking my own table names here.) I'll use LONG ID
columns for entities here though I won't always choose to do that
every time.

CREATE TABLE CAR
(
ID SERIAL NOT NULL PRIMARY KEY,
VIN VARCHAR (17) UNIQUE,
MAKE VARCHAR (60),
MODEL VARCHAR (60),
YEAR INTEGER
);

"MAKE" and "MODEL" can be foreign keys to like-named tables. Those
referenced tables are usually lookups for valid values rather than
entities in their own right. They can have ID fields, but the short
ones (and most are short) can also have values directly in the
columns, denorming the value into the referencing table. This saves
on JOIN overhead in SELECTS that use that column.

Obviously VIN is the natural key. The ID surrogate key allows changes
to the natural key with less stress to inter-table relationships, if
done correctly.

Some SQL dialects support an enum type similar to the Java enum. Then
it's a choice of how you enforce the values and whether you need
extensibility.

Point being that a lookup table can have a serial ID but probably
doesn't need one.

A linking table should not have a surrogate key. A linking table
enforces structure - many-to-many relantionships between tables. It
rarely needs or has more than the foreign key columns that tie other
tables together. The primary key is the multi=column concatenation of
the foreign key columns. The table's purpose is to support JOIN
clauses, and extra ID columns interfere with that.

ID columns in entity tables keep join-table keys relatively compact:

CREATE TABLE cardriver -- many cars to many drivers
(
car INTEGER FOREIGN KEY REFERENCES car (id),
driver INTEGER FOREIGN KEY REFERENCES driver (id),
PRIMARY KEY (car, driver)
);

Multi-way JOIN SELECTs that go from, say, car to residence via driver
but displaying columns only from car and residence can skip the driver
entity table altogether and use just linking tables in the middle.

SELECT car.*, res.* FROM car
JOIN cardriver ON car.id = cardriver.car
JOIN driverresidence dres ON cardriver.driver = dres.driver
JOIN residence res on dres.residence = res.id
WHERE ...
Regarding to the table :

CAR
(
ID SERIAL NOT NULL PRIMARY KEY,
VIN VARCHAR (17) UNIQUE,
MAKE VARCHAR (60),
MODEL VARCHAR (60),
YEAR INTEGER
); , if I make MAKE, MODEL and YEAR FK to the separate tables MAKE,
MODEL, YEAR, you are telling me that these tables doesn't have to have
surrogate keys, yes ?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,431
Messages
2,571,679
Members
48,796
Latest member
Greg L.

Latest Threads

Top