Which non-SQL database?

Discussion in 'Python' started by Jorge Biquez, Dec 4, 2010.

  1. Jorge Biquez

    Jorge Biquez Guest

    Hello all.

    Newbie question. Sorry.

    As part of my process of learning Python I am working on two personal
    applications. Both would do fine with a simple data structure stored
    in files. I know there are a lot of databases around that I could use,
    but I would like your advice on what other options you would consider
    for the job (it is training, so there is no pressure on performance).
    One application will run as a desktop application under Windows, Linux
    and Macintosh, updating data: not much, not complex, not many records.
    The second application, running behind web pages, will do the same,
    I mean, process simple data, updating and showing it. Not much info,
    not complex. As an exercise I guess it is more than enough, and it will
    let me learn what I need for now.
    Talking with a friend about what he would do (he uses C only), he
    suggested taking a look at the dBase file format, since it is a stable
    format, fast, and its index structure would be fine, or maybe going
    with the Berkeley DB database file format (I hope I understood this
    one correctly). Plain files are not an option, since I would like to
    be able to do rapid searches.

    What would you suggest I take a look at? If possible, something
    available on all 3 platforms.

    Thanks in advance for your comments.

    Jorge Biquez
    Jorge Biquez, Dec 4, 2010
    #1

  2. Jorge Biquez <> writes:

    > Hello all.
    >
    > Newbie question. Sorry.
    >
    > As part of my process of learning Python I am working on two personal
    > applications. Both would do fine with a simple data structure stored
    > in files. I know there are a lot of databases around that I could use,
    > but I would like your advice on what other options you would consider
    > for the job (it is training, so there is no pressure on performance).
    > One application will run as a desktop application under Windows, Linux
    > and Macintosh, updating data: not much, not complex, not many records.
    > The second application, running behind web pages, will do the same,
    > I mean, process simple data, updating and showing it. Not much info,
    > not complex. As an exercise I guess it is more than enough, and it will
    > let me learn what I need for now.
    > Talking with a friend about what he would do (he uses C only), he
    > suggested taking a look at the dBase file format, since it is a stable
    > format, fast, and its index structure would be fine, or maybe going
    > with the Berkeley DB database file format (I hope I understood this
    > one correctly). Plain files are not an option, since I would like to
    > be able to do rapid searches.
    >
    > What would you suggest I take a look at? If possible, something
    > available on all 3 platforms.


    Have you considered sqlite3? It is part of the Python standard library.
    It'll work under GNU/Linux, Windows and Mac OSX. For more details see:

    http://docs.python.org/library/sqlite3.html
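A minimal sketch of what the sqlite3 suggestion looks like in practice, assuming a hypothetical `contacts` table: the whole database lives in a single file (in memory here, so the example is self-contained), an index covers the column you search on, and `?` placeholders keep the queries safe.

```python
import sqlite3

# ":memory:" keeps the example self-contained; pass a file name such as
# "contacts.db" (hypothetical) to get a single on-disk database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
# An index on the column you search by is what makes lookups fast.
conn.execute("CREATE INDEX idx_contacts_city ON contacts (city)")

rows = [("Ana", "Guadalajara"), ("Ben", "Boston"), ("Carla", "Guadalajara")]
with conn:  # commits on success, rolls back if an exception is raised
    conn.executemany("INSERT INTO contacts (name, city) VALUES (?, ?)", rows)

# Use ? placeholders rather than building SQL strings by hand.
found = conn.execute(
    "SELECT name FROM contacts WHERE city = ? ORDER BY name", ("Guadalajara",)
).fetchall()
print(found)  # [('Ana',), ('Carla',)]
conn.close()
```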

    --
    Arnaud
    Arnaud Delobelle, Dec 4, 2010
    #2

  3. Jorge Biquez

    Gnarlodious Guest

    I use sqlite3, it is fairly simple, fast and not too strict.

    -- Gnarlie
    Gnarlodious, Dec 4, 2010
    #3
  4. Jorge Biquez

    Jorge Biquez Guest

    At 05:02 p.m. 04/12/2010, you wrote:
    >Jorge Biquez <> writes:
    >
    > > Hello all.
    > >
    > > Newbie question. Sorry.
    > >
    > > As part of my process of learning Python I am working on two personal
    > > applications. Both would do fine with a simple data structure stored
    > > in files. I know there are a lot of databases around that I could use,
    > > but I would like your advice on what other options you would consider
    > > for the job (it is training, so there is no pressure on performance).
    > > One application will run as a desktop application under Windows, Linux
    > > and Macintosh, updating data: not much, not complex, not many records.
    > > The second application, running behind web pages, will do the same,
    > > I mean, process simple data, updating and showing it. Not much info,
    > > not complex. As an exercise I guess it is more than enough, and it will
    > > let me learn what I need for now.
    > > Talking with a friend about what he would do (he uses C only), he
    > > suggested taking a look at the dBase file format, since it is a stable
    > > format, fast, and its index structure would be fine, or maybe going
    > > with the Berkeley DB database file format (I hope I understood this
    > > one correctly). Plain files are not an option, since I would like to
    > > be able to do rapid searches.
    > >
    > > What would you suggest I take a look at? If possible, something
    > > available on all 3 platforms.
    >
    >Have you considered sqlite3? It is part of the Python standard library.
    >It'll work under GNU/Linux, Windows and Mac OSX. For more details see:
    >
    > http://docs.python.org/library/sqlite3.html
    >
    >--
    >Arnaud
    >--


    Hello all.

    OK. sqlite3 seems to be the best option, since it is already part of
    Python... you are right, of course.

    I do not see a good reason not to use SQLite3, BUT if for some reason
    it were not an option... what plain file layout would you use? I am
    sorry to insist. I do not know much about the size that using SQLite
    adds to the application, but the idea is that the application (single
    user on the desktop, yes) will be as small as it can be, since the
    idea is to distribute only the executable. (All this is to recreate a
    prototype I have done in another language.)

    For the web part, yes, of course there would be multiple users.

    Thanks to all.

    Jorge Biquez
    Jorge Biquez, Dec 4, 2010
    #4
  5. Jorge Biquez

    CM Guest

    On Dec 4, 6:43 pm, Jorge Biquez <> wrote:
    > At 05:02 p.m. 04/12/2010, you wrote:
    >
    > >Jorge Biquez <> writes:
    > >
    > > > Hello all.
    > > >
    > > > Newbie question. Sorry.
    > > >
    > > > As part of my process of learning Python I am working on two personal
    > > > applications. Both would do fine with a simple data structure stored
    > > > in files. I know there are a lot of databases around that I could use,
    > > > but I would like your advice on what other options you would consider
    > > > for the job (it is training, so there is no pressure on performance).
    > > > One application will run as a desktop application under Windows, Linux
    > > > and Macintosh, updating data: not much, not complex, not many records.
    > > > The second application, running behind web pages, will do the same,
    > > > I mean, process simple data, updating and showing it. Not much info,
    > > > not complex. As an exercise I guess it is more than enough, and it will
    > > > let me learn what I need for now.
    > > > Talking with a friend about what he would do (he uses C only), he
    > > > suggested taking a look at the dBase file format, since it is a stable
    > > > format, fast, and its index structure would be fine, or maybe going
    > > > with the Berkeley DB database file format (I hope I understood this
    > > > one correctly). Plain files are not an option, since I would like to
    > > > be able to do rapid searches.
    > > >
    > > > What would you suggest I take a look at? If possible, something
    > > > available on all 3 platforms.
    > >
    > >Have you considered sqlite3? It is part of the Python standard library.
    > >It'll work under GNU/Linux, Windows and Mac OSX. For more details see:
    > >
    > >    http://docs.python.org/library/sqlite3.html
    > >
    > >--
    > >Arnaud
    > >--
    >
    > Hello all.
    >
    > OK. sqlite3 seems to be the best option, since it is already part of
    > Python... you are right, of course.
    >
    > I do not see a good reason not to use SQLite3, BUT if for some reason
    > it were not an option... what plain file layout would you use? I am
    > sorry to insist. I do not know much about the size that using SQLite
    > adds to the application, but the idea is that the application (single
    > user on the desktop, yes) will be as small as it can be, since the
    > idea is to distribute only the executable. (All this is to recreate a
    > prototype I have done in another language.)


    SQLite itself is around 300 kilobytes. That's negligible. It is also
    already in Python, so you'd have to purposefully exclude it in
    creating your executable to save those 300 kB and thus the 1/13th of a
    second additional time it would take average (3.9 MB/s) users to
    download your app if it were included.
    CM, Dec 4, 2010
    #5
  6. Jorge Biquez

    Jorge Biquez Guest

    Hello all.

    Understood perfectly.

    I will forget the other alternatives. SQLite3 is the best option.
    Thanks for the explanation and your time.

    SQLite for the single-user application and PostgreSQL for the web
    application will be the choice.

    Thanks all.

    Take care

    Jorge Biquez

    At 06:01 p.m. 04/12/2010, you wrote:
    >Jorge Biquez <> writes:
    >
    > > I do not see a good reason not to use SQLite3, BUT if for some
    > > reason it were not an option... what plain file layout would you
    > > use? I am sorry to insist.
    >
    >SQLite stores the entire database in a single file. Does that answer the
    >question? I'm not sure I understand.
    >
    >Preferably, check SQLite's own site <URL:http://www.sqlite.org/> for
    >answers, since it seems your concerns are not specific to Python. If you
    >have Python-specific concerns about SQLite you'll need to make them more
    >explicit for us to answer them.
    >
    > > I do not know much about the size that using SQLite adds to the
    > > application
    >
    >As you noted, SQLite is already in the Python standard library.
    >
    > > For the web part, yes, of course there would be multiple users.
    >
    >Systems like Berkeley DB, SQLite, dBase, et cetera achieve their
    >simplicity at the expense of concurrent access to the database.
    >
    >If you want concurrent access to the database by many connections,
    >that's where you need to look at a more sophisticated solution. For
    >efficient concurrent access, a DBMS such as PostgreSQL is the best
    >choice.
    >
    >--
    > \ “When I was little, my grandfather used to make me stand in a |
    > `\ closet for five minutes without moving. He said it was elevator |
    >_o__) practice.” —Steven Wright |
    >Ben Finney
    >--
    >http://mail.python.org/mailman/listinfo/python-list
    Jorge Biquez, Dec 5, 2010
    #6
  7. On 12/4/10 3:43 PM, Jorge Biquez wrote:

    > I do not see a good reason not to use SQLite3, BUT if for some reason
    > it were not an option... what plain file layout would you use?


    Would shelve work?
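A minimal sketch of `shelve` used as a persistent dictionary; the file name and record layout are hypothetical. Keys must be strings, and values can be anything `pickle` can handle.

```python
import os
import shelve
import tempfile

# Hypothetical location; shelve may create one or several files from this base name.
path = os.path.join(tempfile.mkdtemp(), "records")

with shelve.open(path) as db:  # default flag "c": create if missing
    db["ana"] = {"city": "Guadalajara", "visits": 3}
    db["ben"] = {"city": "Boston", "visits": 1}

# Reopen later (e.g. on the next program run) and read the data back.
with shelve.open(path) as db:
    ana = db["ana"]
    names = sorted(db.keys())

print(ana["city"], names)  # Guadalajara ['ana', 'ben']
```

One design caveat: mutating a stored value in place (`db["ana"]["visits"] += 1`) is not written back unless you reassign the key or open the shelf with `writeback=True`.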
    Monte Milanuk, Dec 5, 2010
    #7
  9. Jorge Biquez

    John Nagle Guest

    On 12/4/2010 8:44 PM, Monte Milanuk wrote:
    > On 12/4/10 3:43 PM, Jorge Biquez wrote:
    >
    >> I do not see a good reason not to use SQLite3, BUT if for some reason
    >> it were not an option... what plain file layout would you use?
    >
    > Would shelve work?


    There are some systems for storing key-value pairs in files.

    Underneath "shelve" is some primitive database, dbm, gdbm or bsddb.
    "bsddb" is deprecated and was removed from Python 3.x. "dbm" has
    some classic problems. "gdbm" is an improved version of "dbm".
    None of these handle access from multiple processes, or crash
    recovery. We're looking at 1979 technology here.
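To see the layer underneath `shelve`, here is a sketch of the standard `dbm` package as it exists in Python 3 (where `bsddb` is gone); the file name is hypothetical. Keys and values are plain bytes, and, as noted above, there is no multi-process coordination or crash recovery.

```python
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "kv")  # hypothetical file name

# "c" creates the database if needed; dbm.open picks whichever backend is
# available (dbm.gnu, dbm.ndbm, or the pure-Python dbm.dumb fallback).
with dbm.open(path, "c") as db:
    db["alpha"] = "1"  # str keys and values are encoded to bytes
    db[b"beta"] = b"2"

with dbm.open(path, "r") as db:
    value = db["alpha"]       # values always come back as bytes
    keys = sorted(db.keys())  # so do keys

print(value, keys)  # b'1' [b'alpha', b'beta']
```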

    SQLite works right when accessed from multiple processes. SQLite
    is the entry-level database technology for Python today. It handles
    the hard cases, like undoing transactions after a crash and
    locking against multiple accesses. Lookup performance is good;
    simultaneous update by multiple processes, though, is not so
    good. When you have a web site that has many processes hitting
    the same database, it's time to move up to MySQL or Postgres.
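A small sketch of the locking behaviour described above, assuming SQLite's default rollback-journal mode and a hypothetical `hits` table. While one connection holds a write transaction, readers can still read the last committed state, but a second writer is refused (with `timeout=0` it fails immediately instead of waiting).

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.db")  # hypothetical file name

# isolation_level=None: autocommit mode, so we issue BEGIN/COMMIT ourselves
# and control exactly when locks are taken.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE hits (n INTEGER)")
# timeout=0: fail immediately with "database is locked" instead of waiting.
reader = sqlite3.connect(path, isolation_level=None, timeout=0)
other_writer = sqlite3.connect(path, isolation_level=None, timeout=0)

def count_rows(conn):
    cur = conn.execute("SELECT COUNT(*) FROM hits")
    rows = cur.fetchall()
    cur.close()  # finish the statement so its read lock is released
    return rows[0][0]

writer.execute("BEGIN IMMEDIATE")  # reserve the right to write
writer.execute("INSERT INTO hits VALUES (1)")

count_during = count_rows(reader)  # readers still see the last committed state

try:
    other_writer.execute("BEGIN IMMEDIATE")  # a second writer must wait its turn
    blocked = False
except sqlite3.OperationalError:  # "database is locked"
    blocked = True

writer.execute("COMMIT")
count_after = count_rows(reader)
print(count_during, blocked, count_after)  # 0 True 1

for conn in (other_writer, reader, writer):
    conn.close()
```

This is why "many processes updating at once" is where SQLite starts to hurt: writers are serialized on a single database-wide lock.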

    There's a lot of interest in "non-SQL" databases for very
    large distributed systems. You worry about this if you're Facebook
    or Google, or are running a big game server farm.

    John Nagle
    John Nagle, Dec 5, 2010
    #9
  10. Jorge Biquez

    Lie Ryan Guest

    On 12/05/10 10:43, Jorge Biquez wrote:
    > I do not see a good reason not to use SQLite3, BUT if for some reason
    > it were not an option... what plain file layout would you use?


    Assuming you don't want SQL, you can use a filesystem-based database. Most
    people don't realize that a filesystem is essentially a database (of
    files) and a file explorer is a (sort of) DBMS. It is relatively easy to
    create a robust (as robust as the filesystem) and fast (as fast as the
    filesystem) database system by using folders and files (and optionally
    hard and symbolic links) to store data in a hierarchical topology.
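A minimal sketch of the filesystem-as-database idea, assuming a hypothetical layout of one JSON file per record under `<root>/<collection>/<key>.json`. A path lookup plays the role of a key lookup, and a directory listing plays the role of a table scan. Keys are assumed to be safe file names.

```python
import json
import os
import tempfile

ROOT = tempfile.mkdtemp()  # hypothetical database root directory

def put(collection, key, record):
    """Store one record as ROOT/<collection>/<key>.json."""
    folder = os.path.join(ROOT, collection)
    os.makedirs(folder, exist_ok=True)
    final = os.path.join(folder, key + ".json")
    tmp = final + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(record, f)
    os.replace(tmp, final)  # rename over the old file (atomic on POSIX)

def get(collection, key):
    with open(os.path.join(ROOT, collection, key + ".json"), encoding="utf-8") as f:
        return json.load(f)

def keys(collection):
    folder = os.path.join(ROOT, collection)
    return sorted(n[: -len(".json")] for n in os.listdir(folder) if n.endswith(".json"))

put("contacts", "ana", {"city": "Guadalajara"})
put("contacts", "ben", {"city": "Boston"})
print(get("contacts", "ana"), keys("contacts"))
```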
    Lie Ryan, Dec 5, 2010
    #10
  11. Jorge Biquez

    Roy Smith Guest

    In article <4cfb8021$>,
    Lie Ryan <> wrote:

    > On 12/05/10 10:43, Jorge Biquez wrote:
    > > I do not see a good reason not to use SQLite3, BUT if for some reason
    > > it were not an option... what plain file layout would you use?
    >
    > Assuming you don't want SQL, you can use a filesystem-based database. Most
    > people don't realize that a filesystem is essentially a database (of
    > files) and a file explorer is a (sort of) DBMS. It is relatively easy to
    > create a robust (as robust as the filesystem) and fast (as fast as the
    > filesystem) database system by using folders and files (and optionally
    > hard and symbolic links) to store data in a hierarchical topology.


    Another possibility is one of the new breed of non-relational databases.
    We've been using MongoDB (http://www.mongodb.org/) and so far are happy
    with it. You can find a bunch of other alternatives in Wikipedia's
    NoSQL article.
    Roy Smith, Dec 5, 2010
    #11
  12. On Sun, Dec 5, 2010 at 12:01 AM, John Nagle <> wrote:
    > On 12/4/2010 8:44 PM, Monte Milanuk wrote:
    >>
    >> On 12/4/10 3:43 PM, Jorge Biquez wrote:
    >>
    >>> I do not see a good reason not to use SQLite3, BUT if for some reason
    >>> it were not an option... what plain file layout would you use?
    >>
    >> Would shelve work?
    >
    >    There are some systems for storing key-value pairs in files.
    >
    >    Underneath "shelve" is some primitive database, dbm, gdbm or bsddb.
    > "bsddb" is deprecated and was removed from Python 3.x.  "dbm" has
    > some classic problems.  "gdbm" is an improved version of "dbm".
    > None of these handle access from multiple processes, or crash
    > recovery.  We're looking at 1979 technology here.
    >
    >   SQLite works right when accessed from multiple processes.  SQLite
    > is the entry-level database technology for Python today.  It handles
    > the hard cases, like undoing transactions after a crash and
    > locking against multiple accesses.  Lookup performance is good;
    > simultaneous update by multiple processes, though, is not so
    > good.  When you have a web site that has many processes hitting
    > the same database, it's time to move up to MySQL or Postgres.
    >
    >   There's a lot of interest in "non-SQL" databases for very
    > large distributed systems.  You worry about this if you're Facebook
    > or Google, or are running a big game server farm.


    SQLite isn't exactly no SQL.

    I've used the bsddb and gdbm modules quite a bit. I've found that
    bsddb tables tend to get corrupted (whether used from CPython or C),
    e.g. when a filesystem fills up. I quite like the gdbm module, though,
    and have been using it in my current project.

    If you find that converting your database keys and values to/from
    strings is expensive, you could check out
    http://stromberg.dnsalias.org/~dstromberg/cachedb.html which is a
    caching wrapper around gdbm and other single-table database interfaces
    supporting the same API.

    As far as multiple processes go, IINM (if I'm not mistaken), gdbm
    supports a single writer and multiple readers.
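The idea of caching decoded values in front of a dbm file can be sketched as below. `CachedStore` is a hypothetical illustration, not the API of the cachedb module linked above, and it assumes a single writing process (another process writing to the same file would leave the cache stale).

```python
import dbm
import os
import pickle
import tempfile

class CachedStore:
    """Hypothetical sketch: an in-memory dict cache in front of a dbm file.

    Values are pickled on write and unpickled only on a cache miss, so
    repeated reads of the same key skip the conversion cost.
    """

    def __init__(self, path):
        self._db = dbm.open(path, "c")
        self._cache = {}
        self.misses = 0

    def __setitem__(self, key, value):
        self._db[key] = pickle.dumps(value)
        self._cache[key] = value

    def __getitem__(self, key):
        try:
            return self._cache[key]
        except KeyError:
            self.misses += 1
            value = pickle.loads(self._db[key])
            self._cache[key] = value
            return value

    def close(self):
        self._db.close()
        self._cache.clear()

path = os.path.join(tempfile.mkdtemp(), "cached")  # hypothetical file name
store = CachedStore(path)
store["ana"] = {"visits": 3}
store.close()

store = CachedStore(path)  # fresh cache: the first read must go to the file
first = store["ana"]
second = store["ana"]      # served from the in-memory cache
print(first, store.misses)  # {'visits': 3} 1
store.close()
```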
    Dan Stromberg, Dec 5, 2010
    #12
  13. Jorge Biquez

    Steve Holden Guest

    On 12/5/2010 12:59 AM, CM wrote:
    > SQLite itself is around 300 kilobytes. That's negligible. It is also
    > already in Python, so you'd have to purposefully exclude it in
    > creating your executable to save those 300 kB and thus the 1/13th of a
    > second additional time it would take average (3.9 MB/s) users to
    > download your app if it were included.


    Just as a matter of interest where do you get the information that the
    average user has a 3.9 MB/s path to the Internet?

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    PyCon 2011 Atlanta March 9-17 http://us.pycon.org/
    See Python Video! http://python.mirocommunity.org/
    Holden Web LLC http://www.holdenweb.com/
    Steve Holden, Dec 6, 2010
    #13
  14. Jorge Biquez

    drygal Guest

    On Dec 4, 10:42 pm, Jorge Biquez <> wrote:
    > Hello all.
    >
    > Newbie question. Sorry.
    >
    > As part of my process of learning Python I am working on two personal
    > applications. Both would do fine with a simple data structure stored
    > in files. I know there are a lot of databases around that I could use,
    > but I would like your advice on what other options you would consider
    > for the job (it is training, so there is no pressure on performance).
    > One application will run as a desktop application under Windows, Linux
    > and Macintosh, updating data: not much, not complex, not many records.
    > The second application, running behind web pages, will do the same,
    > I mean, process simple data, updating and showing it. Not much info,
    > not complex. As an exercise I guess it is more than enough, and it will
    > let me learn what I need for now.
    > Talking with a friend about what he would do (he uses C only), he
    > suggested taking a look at the dBase file format, since it is a stable
    > format, fast, and its index structure would be fine, or maybe going
    > with the Berkeley DB database file format (I hope I understood this
    > one correctly). Plain files are not an option, since I would like to
    > be able to do rapid searches.
    >
    > What would you suggest I take a look at? If possible, something
    > available on all 3 platforms.
    >
    > Thanks in advance for your comments.
    >
    > Jorge Biquez


    Pickle a dictionary object, perhaps?
    http://docs.python.org/library/pickle.html
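A minimal sketch of the pickled-dictionary approach, with a hypothetical file name. The whole dict is loaded into memory and written back in one go; writing to a temporary file and swapping it in with `os.replace` means a crash mid-write does not clobber the previous copy.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.pickle")  # hypothetical file name

def save(data, path):
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp, path)  # swap in the new file only after a complete write

def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)

records = {"ana": {"city": "Guadalajara"}, "ben": {"city": "Boston"}}
save(records, path)
restored = load(path)
print(restored == records)  # True
```

This suits small data sets only, since every save rewrites everything. Only unpickle files you created yourself: a crafted pickle file can execute arbitrary code when loaded.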
    drygal, Dec 6, 2010
    #14
  15. Jorge Biquez

    CM Guest

    On Dec 6, 2:17 am, Steve Holden <> wrote:
    > On 12/5/2010 12:59 AM, CM wrote:
    >
    > > SQLite itself is around 300 kilobytes. That's negligible. It is also
    > > already in Python, so you'd have to purposefully exclude it in
    > > creating your executable to save those 300 kB and thus the 1/13th of a
    > > second additional time it would take average (3.9 MB/s) users to
    > > download your app if it were included.
    >
    > Just as a matter of interest where do you get the information that the
    > average user has a 3.9 MB/s path to the Internet?


    First, I should have specified the average *U.S.* user, and in so
    doing realized how dumbly "own-nation-centric" that was. My
    apologies. But I got it from this:
    http://arstechnica.com/telecom/news/2010/01/us-broadband-still-lagging-in-speed-and-penetration.ars

    No idea how accurate that is, and I didn't break that into download
    vs. upload speeds. It puts the U.S. 18th in the world rankings of
    average connection speed.

    Also, for what it's worth, I noticed in making an .exe yesterday that
    sqlite3.dll is listed as 843 kB. That may be because it includes
    pysqlite as well; I don't know. In any case, that is appreciably
    higher than the 300 kB I listed above, though not too bad. If someone
    is trying to squeeze out every last bit of savings in making their
    .exe small, then sqlite3.dll is something they could consider
    excluding, if they can. But in most cases, no user will care about
    the extra size.

    Che

    > regards
    >  Steve
    > --
    > Steve Holden           +1 571 484 6266   +1 800 494 3119
    > PyCon 2011 Atlanta March 9-17      http://us.pycon.org/
    > See Python Video!      http://python.mirocommunity.org/
    > Holden Web LLC                http://www.holdenweb.com/
    CM, Dec 6, 2010
    #15
  16. Jorge Biquez

    MRAB Guest

    On 06/12/2010 16:35, CM wrote:
    > On Dec 6, 2:17 am, Steve Holden<> wrote:
    >> On 12/5/2010 12:59 AM, CM wrote:
    >>
    >>> SQLite itself is around 300 kilobytes. That's negligible. It is also
    >>> already in Python, so you'd have to purposefully exclude it in
    >>> creating your executable to save those 300 kB and thus the 1/13th of a
    >>> second additional time it would take average (3.9 MB/s) users to
    >>> download your app if it were included.
    >>
    >> Just as a matter of interest where do you get the information that the
    >> average user has a 3.9 MB/s path to the Internet?

    >
    > First, I should have specified the average *U.S.* user, and in so
    > doing realized how dumbly "own-nation-centric" that was. My
    > apologies. But I got it from this:
    > http://arstechnica.com/telecom/news/2010/01/us-broadband-still-lagging-in-speed-and-penetration.ars
    >

    [snip]
    You're also confusing MB/s (bytes) with Mb/s (bits).
    MRAB, Dec 6, 2010
    #16
  17. Jorge Biquez

    Dave Angel Guest

    CM wrote:
    > On Dec 6, 2:17 am, Steve Holden<> wrote:

    <snip>
    >> Just as a matter of interest where do you get the information that the
    >> average user has a 3.9 MB/s path to the Internet?

    >
    > First, I should have specified the average *U.S.* user, and in so
    > doing realized how dumbly "own-nation-centric" that was. My
    > apologies. But I got it from this:
    > http://arstechnica.com/telecom/news/2010/01/us-broadband-still-lagging-in-speed-and-penetration.ars
    ><snip>


    You're confusing megabits with megabytes. The article says the average
    in the US is 3.9 Mb/s, which is only about 0.49 MB/s. Still seems high
    to me, but I have no statistics.
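The arithmetic behind the correction, as a quick check: 300 kB at 3.9 MB/s gives exactly the 1/13 of a second quoted earlier, whereas at 3.9 Mb/s (divided by 8 bits per byte) the same download takes roughly 0.6 s. This assumes 1 kB = 1000 bytes and ignores protocol overhead.

```python
size_bytes = 300 * 1000      # "around 300 kilobytes"
link_bits_per_s = 3.9e6      # 3.9 Mb/s, megabits per second

link_bytes_per_s = link_bits_per_s / 8  # 8 bits per byte
seconds = size_bytes / link_bytes_per_s  # correct reading: about 0.62 s
misread = size_bytes / 3.9e6             # treating 3.9 as MB/s: exactly 1/13 s

print(link_bytes_per_s, round(seconds, 2), round(misread, 3))  # 487500.0 0.62 0.077
```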

    DaveA
    Dave Angel, Dec 6, 2010
    #17
  18. Jorge Biquez

    CM Guest


    > You're also confusing MB/s (bytes) with Mb/s (bits).


    Yes I am. I think I will now recuse myself from the proceedings...
    CM, Dec 6, 2010
    #18
  19. Jorge Biquez

    Deadly Dirk Guest

    On Sat, 04 Dec 2010 16:42:36 -0600, Jorge Biquez wrote:

    > Hello all.
    >
    > Newbie question. Sorry.
    >
    > As part of my process of learning Python I am working on two personal
    > applications. Both would do fine with a simple data structure stored
    > in files. I know there are a lot of databases around that I could use,
    > but I would like your advice on what other options you would consider
    > for the job (it is training, so there is no pressure on performance).
    > One application will run as a desktop application under Windows, Linux
    > and Macintosh, updating data: not much, not complex, not many records.
    > The second application, running behind web pages, will do the same,
    > I mean, process simple data, updating and showing it. Not much info,
    > not complex. As an exercise I guess it is more than enough, and it will
    > let me learn what I need for now.
    > Talking with a friend about what he would do (he uses C only), he
    > suggested taking a look at the dBase file format, since it is a stable
    > format, fast, and its index structure would be fine, or maybe going
    > with the Berkeley DB database file format (I hope I understood this
    > one correctly). Plain files are not an option, since I would like to
    > be able to do rapid searches.
    >
    > What would you suggest I take a look at? If possible, something
    > available on all 3 platforms.
    >
    > Thanks in advance for your comments.
    >
    > Jorge Biquez


    Well, two NoSQL databases that I have some experience with are MongoDB
    and CouchDB. The choice between them depends on your application. CouchDB
    is extremely simple to set up, and it is all about the web interface; as a
    matter of fact, it communicates with the outside world using the HTTP
    protocol, returning JSON objects. You can configure it using curl. It is
    also extremely fast, but it doesn't allow you to run ad hoc queries. You
    have to create something called a "view". This is more akin to what
    people in the RDBMS world call a "materialized view". Views are created
    by running a JavaScript function on every document in the database.
    Results are stored in a B-tree index and then modified as documents are
    inserted, updated or deleted. It is completely schema free; there are no
    tables, collections or "shards". The primary language for programming
    Couch is JavaScript.
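The view mechanism described above can be emulated in a few lines of Python to show the idea: a map function runs over every document, the emitted (key, value) rows are kept sorted by key, and lookups become range searches on that sorted index. This is a conceptual sketch with hypothetical documents, not CouchDB's API (real views are JavaScript functions stored in design documents and queried over HTTP), and it rebuilds the index from scratch rather than updating it incrementally.

```python
from bisect import bisect_left, bisect_right

# Hypothetical documents; in CouchDB each would be a JSON document with an _id.
docs = {
    "d1": {"type": "contact", "city": "Boston", "name": "Ben"},
    "d2": {"type": "contact", "city": "Guadalajara", "name": "Ana"},
    "d3": {"type": "note", "text": "call Ana"},
    "d4": {"type": "contact", "city": "Guadalajara", "name": "Carla"},
}

def by_city(doc):
    """Map function: emit (key, value) pairs for contact documents only."""
    if doc.get("type") == "contact":
        yield doc["city"], doc["name"]

def build_view(docs, map_fn):
    """Run map_fn over every document and keep rows sorted by (key, doc_id)."""
    rows = sorted(
        (key, doc_id, value)
        for doc_id, doc in docs.items()
        for key, value in map_fn(doc)
    )
    keys = [row[0] for row in rows]  # parallel list so bisect can search by key
    return keys, rows

def query(view, key):
    """Return the values emitted under exactly this key."""
    keys, rows = view
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [value for _, _, value in rows[lo:hi]]

view = build_view(docs, by_city)
print(query(view, "Guadalajara"))  # ['Ana', 'Carla']
```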
    The same thing applies to MongoDB, which is equally fast but does allow ad
    hoc queries and has quite a few options for doing them. It allows you to
    do the same kind of querying as RDBMS software, with the exception of
    joins. No joins. It also allows map/reduce queries using JavaScript and
    is not completely schema free. Databases have sub-objects called
    "collections" which can be indexed or partitioned across several machines
    ("sharding"), which is an excellent thing for building shared-nothing
    clusters. Collections can also be aggregated using JavaScript and
    Google's map/reduce. Scripting languages like Python are very well
    supported and linked against MongoDB, which tends to be faster than
    communicating using HTTP. I find MongoDB well suited for what is
    traditionally known as data warehousing.
    Of course, traditional RDBMS specimens like MySQL, PostgreSQL, Firebird,
    Oracle, MS SQL Server or DB2 still rule supreme, and most of the MVC tools
    like Django or TurboGears are made for RDBMS schemas and can read things
    like primary or foreign keys and incorporate them into the application.
    In short, there is no universal answer to your question. If price is a
    consideration, Couch, Mongo, MySQL, PostgreSQL, Firebird and SQLite 3
    all cost about the same: $0. You will have to learn significantly less
    to get started with a NoSQL database, but if you need to create a serious
    application fast, RDBMS is still the right answer. You may want to look
    at this YouTube clip entitled "MongoDB is web scale":

    http://www.youtube.com/watch?v=b2F-DItXtZs



    --
    I don't think, therefore I am not.
    Deadly Dirk, Jan 23, 2011
    #19
  20. Jorge Biquez

    Roy Smith Guest

    In article <>,
    Deadly Dirk <> wrote:

    > The same thing applies to MongoDB which is equally fast but does allow ad
    > hoc queries and has quite a few options for doing them. It allows you to
    > do the same kind of querying as RDBMS software, with the exception of
    > joins. No joins.


    Well, sort of. You can use forEach() to get some join-like
    functionality. You don't get the full join optimization that SQL gives
    you, but at least you get to do some processing on the server side so
    you don't have to ship 40 gazillion records over the network to pick the
    three you wanted.

    > It also allows map/reduce queries using JavaScript and
    > is not completely schema free.


    What do you mean by "not completely schema free"?

    > Databases have sub-objects called "collections" which can be indexed
    > or partitioned across several machines ("sharding"), which is an
    > excellent thing for building shared-nothing clusters.


    We've been running Mongo 1.6.x for a few months. Based on our
    experiences, I'd say sharding is definitely not ready for prime time.
    There are two issues: stability and architecture.

    First, stability. We see mongos (the sharding proxy) crash a couple of
    times a week. We finally got the site stabilized by rigging upstart to
    monitor and automatically restart mongos when it crashes. Fortunately,
    mongos crashing doesn't cause any data loss (at least not that we've
    noticed). Hopefully this is something the 10gen folks will sort out in
    the 1.8 release.

    The architectural issues are more complex. Mongo can enforce uniqueness
    on a field, but only on non-sharded collections. Security (i.e. password
    authentication) does not work in a sharded environment. If I understand
    the release notes correctly, that's something which may get fixed in
    some future release.

    > Scripting languages like Python are
    > very well supported and linked against MongoDB


    The Python interface is very nice. In some ways, the JS interface is
    nicer, only because you can get away with less quoting, i.e.

    JS: find({inquisition: {$ne: 'spanish'}})
    Py: find({'inquisition': {'$ne': 'spanish'}})

    The PHP interface is (like everything in PHP) sucky:

    PHP: find(array('inquisition' => array('$ne' => 'spanish')))

    The common thread here is that unlike SQL, you're not feeding the
    database a string which it parses, you're feeding it a data structure.
    You're stuck with whatever data structure syntax the host language
    supports. Well, actually, that's not true. If you wanted to, you could
    write a front end which lets you execute:

    "find where inquisition != spanish"

    and have code to parse that and turn it into the required data
    structure. The odds of anybody doing that are pretty low, however. It
    would just feel wrong. In much the same way that SQLAlchemy's
    functional approach to building a SQL query just feels wrong to somebody
    who knows SQL.
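For the curious, a sketch of the kind of front end described above: a hypothetical mini-language `find where <field> <op> <value>` parsed with a regular expression into the same dictionary shape as the Py example. Values are kept as strings, and no database driver or server is assumed; the result is just the data structure you would hand to a driver's find().

```python
import re

# Hypothetical mini-language: "find where <field> <op> <value>".
_OPS = {"=": None, "!=": "$ne", ">": "$gt", "<": "$lt"}
_PATTERN = re.compile(r"find where (\w+)\s*(!=|=|>|<)\s*(\w+)")

def parse(text):
    """Turn e.g. 'find where inquisition != spanish' into a query dict."""
    m = _PATTERN.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"cannot parse: {text!r}")
    field, op, value = m.groups()
    if _OPS[op] is None:
        return {field: value}           # equality is written as a bare value
    return {field: {_OPS[op]: value}}   # other operators become $-keys

print(parse("find where inquisition != spanish"))
# {'inquisition': {'$ne': 'spanish'}}
```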

    > I find MongoDB well suited for what is
    > traditionally known as data warehousing.


    I'll go along with that. It's a way to build a fast (possibly
    distributed, if they get sharding to work right) network datastore with
    some basic query capability. Compared to SQL, you end up doing a lot
    more work on the application side, and take on a lot more of the
    responsibility to enforce data integrity yourself.

    > You may want to look
    > at this Youtube clip entitled "MongoDB is web scale":
    >
    > http://www.youtube.com/watch?v=b2F-DItXtZs


    That's the funniest thing I've seen in a long time. The only sad part
    is that it's all true.

    There are some nice things about NoSQL databases (particularly the
    schema-free part). A while ago, we discovered that about 200 of the
    300,000 documents in one of our collections were effectively duplicates
    of other documents ("document" in mongo-speak means "record" or perhaps
    "row" in SQL-speak). It was trivial to add "is_dup_of" fields to just
    those 200 records, and a little bit of code in our application to check
    the retrieved documents for that field and retrieve the pointed-to
    document. In SQL, that would have meant adding another column, or
    perhaps another table. Either way would have been far more painful than
    the fix we were able to do in mongo.
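The application-side fix described above can be sketched as follows, using a hypothetical in-memory dict in place of a real collection. Only the duplicate documents carry an `is_dup_of` field, and the lookup follows that pointer to the canonical document.

```python
# Hypothetical stand-in for a collection: _id -> document.
collection = {
    1: {"_id": 1, "title": "Original"},
    2: {"_id": 2, "title": "Copy", "is_dup_of": 1},  # only duplicates get the field
    3: {"_id": 3, "title": "Unrelated"},
}

def fetch(doc_id, max_hops=10):
    """Return the document, following is_dup_of pointers to the canonical one."""
    doc = collection[doc_id]
    for _ in range(max_hops):  # guard against accidental pointer cycles
        target = doc.get("is_dup_of")
        if target is None:
            return doc
        doc = collection[target]
    raise RuntimeError(f"is_dup_of chain too long starting at {doc_id}")

print(fetch(2)["title"], fetch(3)["title"])  # Original Unrelated
```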
    Roy Smith, Jan 23, 2011
    #20