Q: what database would you suggest?

D

Diego Virasoro

Hi,
I am doing some scientific work in Ruby. So far the data is stored
into a massive file (just Marshalled or YAMLed) and then once the run
is over it is loaded back into memory for some data analysis.

This (not very elegant method) has worked so far but as I will need to
raise the scale to several GBs the time lost in loading and slicing
the data in memory is getting painful. Hence I thought I could take
advantage of the many DB that exist for Ruby. Alas... they are so
many, and I've got no database experience, so I am not sure what would
be best.

The data can be thought of as hierarchical tables: i.e. tables in the
SQL sense with *some* entries corresponding to entire new tables, and
the others either strings or numbers (integers and float). I have only
access to one computer, though with multiple processors, so I don't
have any need for distributing the data. Finally I'd like speed, but
given the database size I can't keep all the info in memory.

The kind of operations I need to do involve getting slices of tables
(all the rows but a subset of columns), often in all the "subtables"
I've got. I also need to add a few rows to each table when the
simulation is running (but not for analysis) however the rate is
rather low so I don't expect this to be much of a constraint.

Could people with more DB experience give a few suggestions? I've read
of MongoDB, CouchDB, MySQL and PostgreSQL... and to be honest I am
pretty lost, so I am sending this in the hope some fellow Rubyist can
help a poor DB newbie. ;)

Thank you to all in advance.
Diego Virasoro
 
P

Phillip Gawlowski

The data can be thought of as hierarchical tables: i.e. tables in the
SQL sense with *some* entries corresponding to entire new tables, and
the others either strings or numbers (integers and float). I have only
access to one computer, though with multiple processors, so I don't
have any need for distributing the data. Finally I'd like speed, but
given the database size I can't keep all the info in memory.

The kind of operations I need to do involve getting slices of tables
(all the rows but a subset of columns), often in all the "subtables"
I've got. I also need to add a few rows to each table when the
simulation is running (but not for analysis) however the rate is
rather low so I don't expect this to be much of a constraint.

Could people with more DB experience give a few suggestions? I've read
of MongoDB, CouchDB, MySQL and PostgreSQL... and to be honest I am
pretty lost, so I am sending this in the hope some fellow Rubyist can
help a poor DB newbie. ;)

SQLite[0] might be worth a look, especially if you can split your data
across several instances of SQLite.

The key difference of SQLite to things like MySQL, PostgreSQL, and
others is that SQLite is not a server, in that it doesn't offer remote
access, which you say you don't need, so that is not an issue.

Additionally, you have no overhead of administering (or at least
setting up) a database server and its users, and SQLite is lighter on
your resources. ;)

If you have data that doesn't lend itself well to storing them in the
rather rigid confines of SQL, take a look at the NoSQL DBs like
CouchDB or MongoDB. All of those (and the SQL engines) have good to
great Ruby bindings, so you can use them with relative ease.

[0] http://sqlite.org/

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
 
R

Robert Klemme

SQLite sounds like a good start as Phillip Gawlowski says. It's widely
used, simple and straightforward. I doubt you'll have a problem with
size - see this:
http://www.sqlite.org/limits.html

I don't have benchmarks handy but for really large volumes I'd turn to
PostgreSQL. Especially if the ability is needed to manually manipulate
the data and do backup, parallel accesses and the like.

Kind regards

robert
 
D

Diego Virasoro

SQLite[0] might be worth a look, especially if you can split your data
across several instances of SQLite.

Thanks, I'll start with this then.

Any suggestion on the ruby wrapper? It seems that DataMapper is pretty
good and "hides" some of the complexities of SQL. But I've heard of
Sequel as well, and some closer to the SQL code such as ruby-sqlite.

Which one would you suggest to start?

Thank you again
Diego
 
P

Phillip Gawlowski

SQLite[0] might be worth a look, especially if you can split your data
across several instances of SQLite.

Thanks, I'll start with this then.

Any suggestion on the ruby wrapper? It seems that DataMapper is pretty
good and "hides" some of the complexities of SQL. But I've heard of
Sequel as well, and some closer to the SQL code such as ruby-sqlite.

Which one would you suggest to start?

Without knowing your situation: I'd use sqlite3-ruby, or DBI (still
allows you to access different DB engines without performing too much
magic behind the scenes). From what you wrote, it doesn't look like an
ORM (like DataMapper, or ActiveRecord) would be a good fit. And an ORM
might make things you want to do (like analysis of your data) more
difficult that it is necessary.

If you are really unsure: Grab what you find interesting, and do a
little prototyping with a reduced, somewhat simplified dataset that
you'll be working with. Once you have something that you are happy
with, stick to that, instead of having to discard half your work
because you start hating the way your chosen tool behaves. :)

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
 
R

Rick DeNatale

SQLite[0] might be worth a look, especially if you can split your data
across several instances of SQLite.

Thanks, I'll start with this then.

Any suggestion on the ruby wrapper? It seems that DataMapper is pretty
good and "hides" some of the complexities of SQL. But I've heard of
Sequel as well, and some closer to the SQL code such as ruby-sqlite.

Which one would you suggest to start?

IMHO you should definitely have a look at the latest Rails 3 version
of active record which uses arel https://github.com/rails/arel

http://railscasts.com/episodes/202-active-record-queries-in-rails-3
http://railscasts.com/episodes/239-activerecord-relation-walkthrough


Active record can be used separately from Rails, it always could but
Rails 3 makes it easier to pick particular parts of Rails then ever.


--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Github: http://github.com/rubyredrick
Twitter: @RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale
 
D

David Masover

SQLite[0] might be worth a look, especially if you can split your data
across several instances of SQLite.

Thanks, I'll start with this then.

Any suggestion on the ruby wrapper? It seems that DataMapper is pretty
good and "hides" some of the complexities of SQL. But I've heard of
Sequel as well, and some closer to the SQL code such as ruby-sqlite.

Which one would you suggest to start?

Whichever you like -- the important thing is to start.

Personally, I'd suggest DataMapper, mostly because it's likely to make your
application least bound to SQL at all (let alone SQLite) if you decide to
change in the future.

One thing to be aware of with SQLite is that it essentially locks the entire
database for any operations. This may affect you if your code is sufficiently
multithreaded (and you use a Ruby which supports this, like JRuby) or if you
intend to launch multiple worker processes. In either case, you may want to
swap SQLite out for something else eventually.

But especially if you're building on something like DM, you can easily swap
SQLite out for something else in the future.
 
R

Reid Thompson

Could people with more DB experience give a few suggestions? I've read
of MongoDB, CouchDB, MySQL and PostgreSQL... and to be honest I am
pretty lost, so I am sending this in the hope some fellow Rubyist can
help a poor DB newbie. ;)
my recommendation would be postgresql. If you don't want to deal with postgresql initially, then sqlite3.
Migrating from sqlite3 to postgresql should be pretty straightforward if you desire/need to do so later. If
you go with an ORM, make sure that it supports the ability for YOU to write sql queries if you want to --
ORM's may not build the most appropriate SQL query for YOUR database schema, this may be the difference
between an acceptable/unacceptable query response time.
 
J

James Edward Gray II

One thing to be aware of with SQLite is that it essentially locks the =
entire database for any operations.

That's not very accurate. SQLite has a very sophisticated system of =
staged locks and it's definitely possible to have multiple clients =
reading data at once.

James Edward Gray II
 
A

Ammar Ali

One thing to be aware of with SQLite is that it essentially locks the entire
database for any operations. This may affect you if your code is sufficiently
multithreaded (and you use a Ruby which supports this, like JRuby) or if you
intend to launch multiple worker processes. In either case, you may want to
swap SQLite out for something else eventually.

I learned that the hard way. I innocently tried to fork a process that
used sqlite and the whole thing fell apart in not so obvious ways.
Many hours were wasted recovering from that one. So yes, safe sqlite
equals single process.

Regards,
Ammar
 
P

Petite Abeille

D

David Masover

"Prepare Now For Possible Future Head Transplant"
http://database-programmer.blogspot.com/2010/11/prepare-now-for-possible-fu
ture-head.html

I don't have time to do a point-by-point rebuttal, so for now:

"So what about that weird title involving head transplants? Obviously a head
transplant is impossible, making it also very unlikely, besides being silly
and non-sensical. It came to mind as a kind of aggregrate of all of the
bizarre and unrealistic ideas about abstracting data designs that I have heard
over the years."

Huh?

Having something that's at least abstract enough to deal with multiple
databases is something I have actually used. That is, as a matter of
convenience, I've actually developed applications using SQLite and deployed
them to MySQL.

And it's not always just SQL.

I've also migrated models from one application to another -- that is, I took a
model which used to be internal to both applications A and B, along with code
that made assumptions about that model, and made it live in application A,
with application B accessing it via REST.

So, if you're keeping track, that's the same model, used for both REST and
SQL. Far from impossible, silly, or unlikely, the fact that we were using
abstractions which allowed for this possibility increased the likelihood that
we'd actually do it.

It's also possible the article misses one of the main points of an ORM. Even
if I were writing a SQL-only application, I'd have to be insane to use raw SQL
to deal with it when I can have 90% of the work done for me by the ORM -- that
is, 90% of what any application does is trivial CRUD, and there's no reason I
should have to rewrite that every time.

Yeah, reading the 3.0 summary confirms all writes are an exclusive lock over
the entire database. My bad, that is different than locking for _all_ writes,
but it's still something you want to be aware of during the design phase,
especially if someone follows your advice and doesn't abstract sufficiently
that they can easily migrate away later.
 
F

flebber

oh, i didn't knew that they implemented SQL API.

For my two cents I am fairly database illiterate the only thing I have
read a decent amount about is normalisation of data.

I am using sequel and Mysql, I chose Mysql because it was popular wide
range of documnetation and community support but mostly it has the
MySQL workbench which will allow me to see graphically what i am doing
which being a beginner and only having used Access 2000 & 2007 prior
was a plus.You can turn networking off in MySQL setup to setup a
single developer PC easily.

There is also a very helpful sequel group if your having any issues
http://groups.google.com/group/sequel-talk and again enough docs for a
beginner here http://sequel.rubyforge.org/

I am a database no nothing and sequel and mysql work for me. And
because its fairly easy I am learning more..
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,066
Latest member
VytoKetoReviews

Latest Threads

Top