[ANN] KirbyBase 2.2

Discussion in 'Ruby' started by Jamey Cribbs, May 3, 2005.

  1. Jamey Cribbs

    Jamey Cribbs Guest

    I would like to announce version 2.2 of KirbyBase, a simple, pure-Ruby
    database management system that stores its data in plain-text files.

    You can download the new version here:

    Windows: http://www.netpromi.com/files/KirbyBase_Ruby_2.2.zip
    Linux/Unix: http://www.netpromi.com/files/KirbyBase_Ruby_2.2.tar.gz

    You can find out more about KirbyBase at:

    http://www.netpromi.com/kirbybase_ruby.html

    I would like to thank Hugh Sasse for his bug fixes and code enhancements
    and I would like to thank Emiel van de Larr for his bug fixes.


    List of changes:

    * By far the biggest change in this version is that I have completely
    redesigned the internal structure of the database code. Because the
    KirbyBase and KBTable classes were too tightly coupled, I have created
    a KBEngine class and moved all low-level I/O logic and locking logic
    to this class. This allowed me to restructure the KirbyBase class to
    remove all of the methods that should have been private, but couldn't be
    because of the coupling to KBTable. In addition, it has allowed me to
    take all of the low-level code that should not have been in the KBTable
    class and put it where it belongs, as part of the underlying engine. I
    feel that the design of KirbyBase is much cleaner now. No changes were
    made to the class interfaces, so you should not have to change any of
    your code.

    * Changed str_to_date and str_to_datetime to use Date#parse method.

    * Changed #pack method so that it no longer reads the whole file into
    memory while packing it.

    * Changed code so that special character sequences like &linefeed; can be
    part of input data without KirbyBase interpreting them as special
    characters.
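    As an illustration of the #pack change above, here is a minimal sketch
    of packing a file one line at a time instead of loading it all into
    memory. It is not KirbyBase's actual code: the pack_file helper and the
    assumption that deleted records are left behind as blank lines are
    hypothetical, chosen only to show the streaming technique.

```ruby
require 'tempfile'
require 'fileutils'

# Hypothetical helper: copy every non-blank line of the file at +path+ to a
# temporary file, then copy the result back over the original.
# Assumption: deleted records are blank lines (illustrative layout only).
# Returns the number of blank lines that were dropped.
def pack_file(path)
  removed = 0
  Tempfile.create('pack', File.dirname(path)) do |tmp|
    # File.foreach yields one line at a time, so memory use stays flat
    # regardless of file size.
    File.foreach(path) do |line|
      if line.strip.empty?
        removed += 1
      else
        tmp.write(line)
      end
    end
    tmp.flush
    FileUtils.cp(tmp.path, path)
  end
  removed
end
```

    Calling pack_file('plane.tbl') (a made-up file name) would return the
    number of blank lines removed. A real implementation would also need
    the locking that the announcement says now lives in KBEngine.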

    Enjoy!

    Jamey Cribbs
     
    Jamey Cribbs, May 3, 2005
    #1

  2. Oliver Cromm

    Oliver Cromm Guest

    * Jamey Cribbs wrote:

    > I would like to announce version 2.2 of KirbyBase, a simple, pure-Ruby
    > database management system that stores it's data in plain-text files.


    The idea of plain text files appealed to me a lot (I had been pondering
    something similar myself, but couldn't have implemented it in such a
    general fashion), so I decided to try it in my Usenet news statistics
    script, on which I'm learning lots of Ruby techniques.

    So for a start, I plugged KirbyBase in just as a cache - where before, I
    was reading header data from a news server each time, in the new version
    I save the raw data to a KirbyBase, add only recent messages, then read
    the part of the data I want (by date) from the KirbyBase.

    Unfortunately, it turned out to be no faster. I wonder if I'm doing
    anything wrong. What I save in time waiting for the server, KirbyBase
    seems to eat away in processing time (disk access is hardly worth
    mentioning with my 6000 rows, 10KB of data). Is it true that you need a
    lot of processing power to use it, and my PIII-500 (Win-2K/Cygwin) is
    just not up to the task?

    You said:
    | Right now, it performs pretty well on small databases

    and even

    | It is fairly fast, comparing favorably to SQLite

    Well, one reason to try it was that I had installation problems with
    SQLite, so I can't compare directly, but now I wonder how it could ever
    compete. One select for string equality on my 6000 rows takes half a
    second or so, so I gave up on that completely.
    --
    Oliver C.
    45n31, 73w34
    Temperatur: 6.9°C (13 May 2005 10:00 AM EDT)
     
    Oliver Cromm, May 17, 2005
    #2

  3. Jamey Cribbs

    Jamey Cribbs Guest

    Oliver Cromm wrote:

    >So for a start, I plugged KirbyBase in just as a cache - where before, I
    >was reading header data from a news server each time, in the new version
    >I save the raw data to a KirbyBase, add only recent messages, then read
    >the part of the data I want (by date) from the KirbyBase.
    >
    >
    >

    This might be the source of the slowness. Is the field that you are
    reading by date defined as a Date field in the KirbyBase table? If it
    is, this is probably the problem. As I note in the manual, Ruby's
    Date/DateTime libraries are S-L-O-W! They really need to be rewritten as
    C libraries. Every time KirbyBase does a select on a Date field, it has
    to read in each record from the table's physical file and do a Date.new
    on the data. Like I said, this is slow!

    Here is an alternative to try: define this field in the table as a
    String field instead of a Date field. Selects will still work pretty
    much the same way because, for example:

    '2005-05-25' > '2005-05-24'

    and

    Date.new(2005,05,25) > Date.new(2005,05,24)

    are both true. In other words, Strings formatted as zero-padded
    year-month-day values compare the same way the Dates themselves do.
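    A quick way to check this claim is to compare a few pairs both ways.
    The sketch below uses only Ruby's standard library; the sample dates
    are made up. It also shows the caveat: the trick relies on zero-padded
    year-month-day strings.

```ruby
require 'date'

# Sample pairs (made up for illustration), including a year boundary.
pairs = [
  ['2005-05-25', '2005-05-24'],
  ['2005-10-01', '2005-09-30'],
  ['2004-12-31', '2005-01-01']
]

# For every pair, string comparison and Date comparison should agree.
agree = pairs.all? do |a, b|
  (a > b) == (Date.parse(a) > Date.parse(b))
end
puts agree

# Counter-example: without zero padding the string order is wrong.
# May 9 precedes May 10, yet the unpadded string sorts after it.
unpadded_sorts_later = '2005-5-9' > '2005-05-10'
puts unpadded_sorts_later
```

    So a String field is a safe stand-in for a Date field in comparisons
    only as long as every value is written in the same padded format.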

    Give this a try and see if you see a speed improvement. I have tried it
    and have seen dramatic improvements.

    Let me know how it goes.

    Jamey
     
    Jamey Cribbs, May 18, 2005
    #3
  4. gabriele renzi

    Jamey Cribbs wrote:

    > Here is an alternative to try: define this field in the table as a
    > String field instead of a Date field. Select's will still work pretty
    > much the same way because, for example:
    >
    > 2005-05-25 > 2005-05-24
    >
    > and
    >
    > Date.new(2005,05,25) > Date.new(2005,05,24)
    >
    > are both true. In other words, Strings formatted similarly to the way
    > Date's look compare the same way.
    >
    > Give this a try and see if you see a speed improvement. I have tried it
    > and have seen dramatic improvements.
    >
    > Let me know how it goes.


    Why not use a Time object?
     
    gabriele renzi, May 18, 2005
    #4
  5. Oliver Cromm

    Oliver Cromm Guest

    Jamey Cribbs wrote:

    > Oliver Cromm wrote:
    >
    >>So for a start, I plugged KirbyBase in just as a cache - where before, I
    >>was reading header data from a news server each time, in the new version
    >>I save the raw data to a KirbyBase, add only recent messages, then read
    >>the part of the data I want (by date) from the KirbyBase.
    >>

    > This might be the source of the slowness. Is this field that you are
    > reading by date defined as a Date field in the KirbyBase table? [...]
    >
    > Here is an alternative to try: define this field in the table as a
    > String field instead of a Date field. Select's will still work pretty
    > much the same way because, for example:
    >
    > 2005-05-25 > 2005-05-24


    I left the Date field as a string in the format in which I originally
    receive it, e.g. "Wed, 18 May 2005 10:29:44 +0900". Then, for each
    message, I use ParseDate. This is overhead for sure, but the point is
    that it is the same thing I do for the non-caching version (receive a
    specified number of Dates and decide which are within my limits).

    But I'll go ahead and try a version where I parse at read-in time and
    store the result, which would be a number (or two numbers, as I'd want
    to keep the time zone separate).
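    The parse-once idea could look roughly like the sketch below. It swaps
    in Time.rfc2822 from Ruby's standard 'time' library rather than
    ParseDate, and the header_to_numbers helper is a hypothetical name
    introduced here, not part of the script discussed in this thread.

```ruby
require 'time'

# Hypothetical helper: turn an RFC 2822 Date header into two numbers,
# seconds since the Unix epoch and the sender's UTC offset in seconds.
# Both can be stored in Integer fields and compared without reparsing.
def header_to_numbers(header)
  t = Time.rfc2822(header)
  [t.to_i, t.utc_offset]
end

epoch, offset = header_to_numbers('Wed, 18 May 2005 10:29:44 +0900')
puts epoch          # seconds since 1970-01-01 00:00:00 UTC
puts offset / 3600  # whole hours east of UTC
```

    Storing the offset separately keeps the sender's local time
    recoverable, while the epoch value alone is enough for range selects.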
    --
    WinErr 008: Erroneous error. Nothing is wrong.
     
    Oliver Cromm, May 18, 2005
    #5
  6. Jamey Cribbs

    Jamey Cribbs Guest

    gabriele renzi wrote:

    > why don't use a Time object?
    >

    I chose to have Date/DateTime be field types in KirbyBase, rather than
    Time, because Time can only store dates back to 1970.

    Jamey
     
    Jamey Cribbs, May 18, 2005
    #6
  7. Jamey Cribbs

    Jamey Cribbs Guest

    Christian Neukirchen wrote:

    >Jamey Cribbs writes:
    >
    >>gabriele renzi wrote:
    >>
    >>
    >>
    >>>why don't use a Time object?
    >>>
    >>>
    >>>

    >>I chose to have Date/DateTime be field types in KirbyBase, rather than
    >>Time, because Time can only store dates back to 1970.
    >>
    >>

    >
    >ruby 1.8.2 (2004-12-25) [powerpc-darwin7.7.0]
    >
    >irb(main):006:0> Time.at -1600000000
    >=> Sun Apr 20 12:33:20 CET 1919
    >
    >


    When I tried this on my WindowsXP machine I got the following error:

    irb(main):001:0> Time.at -1600000000
    ArgumentError: time must be positive
    from (irb):1:in `at'
    from (irb):1
    irb(main):002:0>


    So, it does not let you use negative Times on XP. That's why I had to
    use Date/DateTime.
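    To illustrate the difference, here is a small sketch. Date is built on
    day counts rather than the platform's time functions, so it handles
    dates before 1970; whether Time.at accepts a negative value depends on
    the platform and Ruby build, as the two irb sessions above show, so the
    call is guarded.

```ruby
require 'date'

# A date well before 1970, matching the day in the irb example above.
old = Date.new(1919, 4, 20)
puts old < Date.new(1970, 1, 1)

# Time.at with a negative argument may raise on some platforms,
# so rescue the error instead of assuming it works.
begin
  puts Time.at(-1_600_000_000).utc.year
rescue ArgumentError => e
  puts "Time.at rejected a negative value: #{e.message}"
end
```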

    Jamey


     
    Jamey Cribbs, May 24, 2005
    #7
  8. Oliver Cromm

    Oliver Cromm Guest

    * Oliver Cromm wrote:

    > Jamey Cribbs wrote:
    >
    >> Oliver Cromm wrote:
    >>
    >>>So for a start, I plugged KirbyBase in just as a cache - where before, I
    >>>was reading header data from a news server each time, in the new version
    >>>I save the raw data to a KirbyBase, add only recent messages, then read
    >>>the part of the data I want (by date) from the KirbyBase.
    >>>

    >> This might be the source of the slowness. Is this field that you are
    >> reading by date defined as a Date field in the KirbyBase table? [...]
    >>
    >> Here is an alternative to try: define this field in the table as a
    >> String field instead of a Date field. Select's will still work pretty
    >> much the same way because, for example:
    >>
    >> 2005-05-25 > 2005-05-24

    >
    > I left the Date field as a string in the format I originally receive
    > them, e.g. "Wed, 18 May 2005 10:29:44 +0900". Then, for each message, I
    > use ParseDate. This is overhead for sure, but the point is that it is
    > the same thing I do for the non-caching version (receive a specified
    > number of Dates and decide which are within my limits).
    >
    > But I'll go ahead and try a version where I parse at read-in time and
    > store the result, which would be a number (or two numbers, as I'd want
    > to keep the time zone separate).


    I found some time now for further experiments, and stored time as an
    integer. And yes, it is significantly faster this way, even slightly
    faster than my first attempt to do the same with SQLite.

    Times from some tests with similar, though not exactly equal, tasks, so
    take them with a few grains of salt:
    - reading data fresh from News server: 50s
    - reading from KirbyBase with original format (rfc2822) Date field: 45s
    - reading from KirbyBase with Date as Integer: 12s
    - reading from SQLite with Date as Integer: 16s

    I have to do quite a number of calculations on that field; for every
    record selected (and in my simple experiments, that is nearly all of
    them), I need to extract at least the day of the week and the day
    number. But apparently, that doesn't take nearly as much time as a
    KirbyBase "select" based on ParseDate(aField). I'm not quite clear about
    what is going on with the select, but I know how to circumvent the
    problem.
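    The effect can be reproduced in isolation with a rough benchmark like
    the one below. It is only a sketch on synthetic data: the 6000 headers
    are generated, Time.rfc2822 stands in for ParseDate, and no KirbyBase
    code is involved, so the absolute numbers will not match the timings
    listed above.

```ruby
require 'benchmark'
require 'time'

# Synthetic data: 6000 made-up Date headers, one minute apart.
base    = Time.utc(2005, 5, 1)
headers = Array.new(6000) { |i| (base + i * 60).rfc2822 }
epochs  = headers.map { |h| Time.rfc2822(h).to_i } # parsed once, up front
cutoff  = (base + 3000 * 60).to_i

by_parsing = by_integer = nil

# Variant 1: parse the header string again for every row in the filter.
parse_time = Benchmark.realtime do
  by_parsing = headers.count { |h| Time.rfc2822(h).to_i >= cutoff }
end

# Variant 2: compare the integers that were stored ahead of time.
int_time = Benchmark.realtime do
  by_integer = epochs.count { |e| e >= cutoff }
end

puts format('parse per row: %.4fs, stored integer: %.4fs', parse_time, int_time)
puts by_parsing == by_integer
```

    Both variants select the same rows; only the per-row cost differs,
    which is consistent with the speedup from storing the date as an
    Integer.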
    --
    Oliver C.
    45n31, 73w34
    Temperatur: 14.9°C (25 May 2005 11:00 AM EDT)
     
    Oliver Cromm, May 25, 2005
    #8
  9. Jamey Cribbs

    Jamey Cribbs Guest

    Oliver Cromm wrote:

    >I found some time now for further experiments, and stored time as an
    >integer. And yes, it is significantly faster this way, even slightly
    >faster than my first attempt to do the same with SQLite.
    >
    >Times from some test with similar, not exactly equal tasks, so read with
    >spoons of salt:
    >- reading data fresh from News server: 50s
    >- reading from KirbyBase with original format (rfc2822) Date field: 45s
    >- reading from KirbyBase with Date as Integer: 12s
    >- reading from SQLite with Date as Integer: 16s
    >
    >I have to do quite a number of calculations on that field; for every
    >record selected (and in my simple experiments, that is nearly all of
    >them), I need to extract at least the day of the week and the day
    >number. But apparently, that doesn't take nearly as much time as a
    >KirbyBase "select" based on ParseDate(aField). I'm not quite clear about
    >what is going on with the select, but I know how to circumvent the
    >problem.
    >
    >

    If I remember correctly the experiments I ran when I first ported
    KirbyBase from Python to Ruby and noticed the significant speed
    difference when using Date/DateTime, my guess was that nothing in
    #select itself is causing the slowness. It is just that, in Ruby,
    creating a new Date/DateTime object is relatively slow, compared to
    Python. My further guess as to why this was is that, in Python, the
    datetime library is written in C, while in Ruby, the Date/DateTime
    library is written in Ruby. How's that for exhaustive scientific
    analysis? :)

    I could be totally wrong about this, but I am guessing that if the
    Date/DateTime library were rewritten in C, it would be significantly
    faster and you would likewise notice a marked speed improvement while
    using Date/DateTime fields in KirbyBase. Unfortunately, since I am not
    a C programmer, I can't actually do this to test my theory. Hence, my
    workaround is to usually define any date fields I need as String
    fields. It speeds things up and, for comparison purposes, things pretty
    much work the same way.

    Jamey

     
    Jamey Cribbs, May 26, 2005
    #9