Modified Single Table Inheritance

R

Ryan Glover

Hello,

I am developing an application that will have 100's of model classes
each derived from a single source class. Each model will have 4 similar
attributes but then have about a dozen unique ones each. I looked into
using STI but balked at creating tables with a 1000 columns. I then
looked at using inherits_from and creating an new table for each class
that holds the new class columns. Problem is, I'll end up with 100's of
tables, which also does not appeal to me.

My third solution, the one I am asking advice on, is this.

I create a single table with the 4 common columns and a dozen columns
with names such as float1, float2, float3, ... , string1, string2, ..
etc. Creating enough columns of each type to cover my largest subclass.
Then I use STI and for each class I map the generic columns to the nicer
names inside the subclasses. So in one class float1 may be miles/hour
while in another class it might be turkeys/hectare.

Does this sound like a reasonable approach?

Also, how would I map the generic names to nice names in my subclasses?
(I am a RoR noob)

And of course, does anyone see a much better way to do what I have
proposed?

Thank you,
Ryan Glover
 
E

Eric Hodel

I am developing an application that will have 100's of model classes
each derived from a single source class. Each model will have 4
similar
attributes but then have about a dozen unique ones each. I looked
into
using STI but balked at creating tables with a 1000 columns. I then
looked at using inherits_from and creating an new table for each class
that holds the new class columns. Problem is, I'll end up with
100's of
tables, which also does not appeal to me.

I think you want the Rails mailing list.
 
M

Mat Schaffer

My third solution, the one I am asking advice on, is this.

I create a single table with the 4 common columns and a dozen columns
with names such as float1, float2, float3, ... , string1, string2, ..
etc. Creating enough columns of each type to cover my largest
subclass.
Then I use STI and for each class I map the generic columns to the
nicer
names inside the subclasses. So in one class float1 may be miles/hour
while in another class it might be turkeys/hectare.

Does this sound like a reasonable approach?

Also, how would I map the generic names to nice names in my
subclasses?
(I am a RoR noob)

And of course, does anyone see a much better way to do what I have
proposed?

Even though we're not technically on the RoR list, I thought this was
still an interesting question for ORM, STI, OOP and probably some
other acronyms too :)

I'd be curious to hear just what it is that you're doing that needs
100s of classes, first. Sounds like you might be taking OOP a bit
too far for your purposes.

But to answer your question, I like the concept but it brings up fear
of bugs. Dropping meaningful column names will wreak havoc on any
cases where you have to write a query by hand. But everything is a
trade off. Personally, I'd try the 100s of tables route first and
see how it panned out.

Best of luck. Sounds like a tricky one.
-Mat
 
D

David Vallner

--------------enig769AF9E99DE70C00A818BF51
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Ryan said:
Hello,
=20
I am developing an application that will have 100's of model classes
each derived from a single source class. Each model will have 4 similar=
attributes but then have about a dozen unique ones each.

Why are they deriving from a single class anyway?

If it's only for a few identical attributes without significant
functionality to go along, I'd say it's a bad idea to use inheritance.
And with latent typing in Ruby, you Rarely If Ever need inheritance
anyway. Saves you the bother that is inheritance mapping at any rate.
Problem is, I'll end up with 100's of
tables, which also does not appeal to me.
=20

I can't understand this. It's rather common that an ORM database schema
has a number of tables in the same order of magnitude as there are model
classes. My biased opinion is that data should be modelled on the
database level where you have some theorethical foundations about what
are "good shapes" for the data. It also doesn't hurt to at least make
the DB schema somewhat document the data model - the data an application
gathers / stores is the most likely part of it to persist in the long
term. Of course, you're probably in a better position to judge this.
My third solution, the one I am asking advice on, is this.
=20
I create a single table with the 4 common columns and a dozen columns
with names such as float1, float2, float3, ... , string1, string2, ..
etc. Creating enough columns of each type to cover my largest subclass= =2E
Then I use STI and for each class I map the generic columns to the nice= r
names inside the subclasses. So in one class float1 may be miles/hour
while in another class it might be turkeys/hectare.
=20
Does this sound like a reasonable approach?
=20

Database schemas like this tend to make it into The Daily WTF with some
regularity.

(Unfortunately, they also make it into production systems.)

If there's a chance someday, someone, somewhy might want to access your
data at a DB level (using a reporting tool maybe), there'd be no end to
the grief.

Or, if a bug in your code gets the database into an inconsistent state,
and you have to drop down to SQL to patch things up.

Avoid.
And of course, does anyone see a much better way to do what I have
proposed?
=20

Use migrations to create the schema from scratch if you just want to
avoid writing a lot of SQL up front?

David Vallner


--------------enig769AF9E99DE70C00A818BF51
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (MingW32)

iD8DBQFFkxP8y6MhrS8astoRArEBAJ9HDubMhGUbZHdfCV+nscweMInkjACffcfQ
vBgA0l4e+GbXt3GUNlmTtBE=
=07NB
-----END PGP SIGNATURE-----

--------------enig769AF9E99DE70C00A818BF51--
 
R

Ryan Glover

Hello David,
Why are they deriving from a single class anyway?

All the objects are the same thing, so to speak. As a silly example of
what I am trying to do, let's say that each user an array named fruit.
There are many types of fruit and each have their own unique attributes
(# of grapes, fuzziness index of peach, banana radius) but they are all
fruit. My program is just like this, except I have hundreds of fruit.
Perhaps even a thousand. Ideally I would like to be able to add more
fruit as the program evolves just by adding a new fruit subclass and a
new fruit view.
I can't understand this. It's rather common that an ORM database schema
has a number of tables in the same order of magnitude as there are model
classes. My biased opinion is that data should be modelled on the
database level where you have some theorethical foundations about what
are "good shapes" for the data. It also doesn't hurt to at least make
the DB schema somewhat document the data model - the data an application
gathers / stores is the most likely part of it to persist in the long
term. Of course, you're probably in a better position to judge this.

I guess my problem with 100's perhaps a 1000 tables is db performance.
I don't have oracle, only mysql or postgres. Performance wise, wouldnt
it be better to have a few thousand rows in one table as opposed to a
few dozen spread across 100's of tables?

Database schemas like this tend to make it into The Daily WTF with some
regularity.

(Unfortunately, they also make it into production systems.)

If there's a chance someday, someone, somewhy might want to access your
data at a DB level (using a reporting tool maybe), there'd be no end to
the grief.

Or, if a bug in your code gets the database into an inconsistent state,
and you have to drop down to SQL to patch things up.

Avoid.

I appreciate all these warnings and I understand what you are saying.
In this case I am client for the project so I'll take my lumps as they
come.

Thanks for your response,
Ryan Glover
 
D

David Vallner

--------------enig2D215BDA840148A7A161469D
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Ryan said:
=20
I guess my problem with 100's perhaps a 1000 tables is db performance. =
I don't have oracle, only mysql or postgres. Performance wise, wouldnt= =20
it be better to have a few thousand rows in one table as opposed to a=20
few dozen spread across 100's of tables?
=20

The history of computing knows only one answer to that question:
benchmark. (And not even that's much of an answer.)

However, I have very, very few doubts that separate tables would be
faster in a read-mostly scenario. The records for that model are
filtered at insert time instead of query time, and deciding between
tables is probably way faster than on any ad hoc type index column.

Generally, a well-designed schema will probably buy you more performance
in the long run than any sort of peephole hand-optimizations. You want
to minimize the number of round-trip hits on the database and the volume
of the datasets retrieved for that (often conflicting goals), and a
clear schema helps do just that.

David Vallner


--------------enig2D215BDA840148A7A161469D
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (MingW32)

iD8DBQFFkyHHy6MhrS8astoRAuvFAJsHncs5yrEvFhi87+HgFGWuile6mgCeI/kw
PqBjbS9atsfF2C6nhUUTLsg=
=vwRq
-----END PGP SIGNATURE-----

--------------enig2D215BDA840148A7A161469D--
 
M

Mat Schaffer

All the objects are the same thing, so to speak. As a silly
example of
what I am trying to do, let's say that each user an array named fruit.
There are many types of fruit and each have their own unique
attributes
(# of grapes, fuzziness index of peach, banana radius) but they are
all
fruit. My program is just like this, except I have hundreds of fruit.
Perhaps even a thousand. Ideally I would like to be able to add more
fruit as the program evolves just by adding a new fruit subclass and a
new fruit view.

Well, I'm guessing fruit is just an example. But I think you could
still group the fruits into subcategories or something to that effect
and end up with reasonable inheritance. For example, berries
(strawberry, raspberry) , pitted fruits (peach, concord grape), not
sure what you'd call a banana but I'd bet an encyclopedia would have
some info. But I think you get the idea. To get the specific kind
of fruit, you could just use a type field and get single table
inheritance on the subcategory.

It'll depend on your data model. But I'm inclined to agree with the
rest of group that there are either relationships or duck-typing
advantages that you're not utilizing.

And David has a point of schemas that include things like float1,
etc... do often show up on the daily WTF. That's might be reason
enough to stay away.

Thinking another way, what about having a fruit table and a trait
table. The trait table could reference a large number of
trait_types. The trait table would reference a fruit, a trait_type
and a value. You could probably get away with just 3 or 4 value
columns in the trait table. Just a thought.
-Mat
 
S

Stephen Richards

Ryan said:
Hello David,


All the objects are the same thing, so to speak. As a silly example of
what I am trying to do, let's say that each user an array named fruit.
There are many types of fruit and each have their own unique attributes
(# of grapes, fuzziness index of peach, banana radius) but they are all
fruit. My program is just like this, except I have hundreds of fruit.
Perhaps even a thousand. Ideally I would like to be able to add more
fruit as the program evolves just by adding a new fruit subclass and a
new fruit view.

Deriving a subclass for each type of different fruit isn't a scalable or
viable solution.

I presume your app will use a database, so why not use it to do all the
work.

A table for fruit containing all the attributes that remain constant
accross all fruit, (name, origin perhaps).

A table of attributes: diameter, colour, fuzziness index, length,
quantity_in_bunch. There could also be a field on this table to
determine the type: numeric, string, boolean, etc.

A join table to determine which fruit get which attribute.

then when you get an order, the app generates an ordered fruit attribute
for each attribute for that particular fruit and prompts the user to
fill in the data.

Any new fruit that come along in the future just get added to the
database with the correct attributes, and there is no extra work to do
in the app - it's all there.


Does that sort of architecture meet the requirement?

sr
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,534
Members
45,008
Latest member
Rahul737

Latest Threads

Top