[ANN] KirbyBase 2.2


Jamey Cribbs

I would like to announce version 2.2 of KirbyBase, a simple, pure-Ruby
database management system that stores its data in plain-text files.

You can download the new version here:

Windows: http://www.netpromi.com/files/KirbyBase_Ruby_2.2.zip
Linux/Unix: http://www.netpromi.com/files/KirbyBase_Ruby_2.2.tar.gz

You can find out more about KirbyBase at:

http://www.netpromi.com/kirbybase_ruby.html

I would like to thank Hugh Sasse for his bug fixes and code enhancements
and I would like to thank Emiel van de Larr for his bug fixes.


List of changes:

* By far the biggest change in this version is that I have completely
redesigned the internal structure of the database code. Because the
KirbyBase and KBTable classes were too tightly coupled, I have created
a KBEngine class and moved all low-level I/O logic and locking logic
to this class. This allowed me to restructure the KirbyBase class to
remove all of the methods that should have been private, but couldn't be
because of the coupling to KBTable. In addition, it has allowed me to
take all of the low-level code that should not have been in the KBTable
class and put it where it belongs, as part of the underlying engine. I
feel that the design of KirbyBase is much cleaner now. No changes were
made to the class interfaces, so you should not have to change any of
your code.

* Changed str_to_date and str_to_datetime to use the Date.parse method.

* Changed #pack method so that it no longer reads the whole file into
memory while packing it.

* Changed code so that special character sequences like &linefeed; can be
part of input data without KirbyBase interpreting them as special
characters.
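
To illustrate the last item, here is a minimal sketch of one way to keep a literal sequence such as &linefeed; intact in stored data. It is not KirbyBase's actual code: the helper names encode_field and decode_field, the ESCAPES table, and the &amp; and &pipe; tokens are assumptions made for this example. The idea is to escape the ampersand before anything else, so that user data containing the text "&linefeed;" can never be mistaken for an encoded newline.

```ruby
# Hypothetical escape table; the ampersand must come first.
ESCAPES = [["&", "&amp;"], ["\n", "&linefeed;"], ["|", "&pipe;"]].freeze

# Encode a field so it fits on one line of a delimited text file.
def encode_field(str)
  ESCAPES.inject(str) { |s, (raw, token)| s.gsub(raw, token) }
end

# Decode in reverse order so that "&amp;" is restored last.
def decode_field(str)
  ESCAPES.reverse.inject(str) { |s, (raw, token)| s.gsub(token, raw) }
end

original = "literal &linefeed; and a real\nnewline"
stored   = encode_field(original)
puts stored
puts decode_field(stored) == original
```

Because the ampersand is escaped first, the literal text is stored as "&amp;linefeed;" and survives the round trip unchanged.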

Enjoy!

Jamey Cribbs
(e-mail address removed)
 

Oliver Cromm

* Jamey Cribbs said:
> I would like to announce version 2.2 of KirbyBase, a simple, pure-Ruby
> database management system that stores its data in plain-text files.

The idea of plain text files appealed to me a lot (I had been pondering
something similar myself, but couldn't have implemented it in such a
general fashion), so I decided to try it in my Usenet news statistics
script, on which I'm learning lots of Ruby techniques.

So for a start, I plugged KirbyBase in just as a cache - where before, I
was reading header data from a news server each time, in the new version
I save the raw data to a KirbyBase, add only recent messages, then read
the part of the data I want (by date) from the KirbyBase.

Unfortunately, it turned out to be no faster. I wonder if I'm doing
anything wrong. What I save in time waiting for the server, KirbyBase
seems to eat away in processing time (disk access is hardly worth
mentioning with my 6000 rows, 10KB of data). Is it true that you need a
lot of processing power to use it, and my PIII-500 (Win-2K/Cygwin) is
just not up to the task?

You said:
| Right now, it performs pretty well on small databases

and even

| It is fairly fast, comparing favorably to SQLite

Well, one reason to try it was that I had installation problems with
SQLite, so I can't compare directly, but now I wonder how it could ever
compete. One select for string equality on my 6000 rows takes half a
second or so, so I gave up on that completely.
 

Jamey Cribbs

Oliver said:
> So for a start, I plugged KirbyBase in just as a cache - where before, I
> was reading header data from a news server each time, in the new version
> I save the raw data to a KirbyBase, add only recent messages, then read
> the part of the data I want (by date) from the KirbyBase.

This might be the source of the slowness. Is the field that you are
reading by date defined as a Date field in the KirbyBase table? If it
is, this is probably the problem. As I note in the manual, Ruby's
Date/DateTime libraries are S-L-O-W! They really need to be rewritten as
C libraries. Every time KirbyBase does a select on a Date field, it has
to read in each record from the table's physical file and do a Date.new
on the data. Like I said, this is slow!

Here is an alternative to try: define this field in the table as a
String field instead of a Date field. Selects will still work pretty
much the same way because, for example:

'2005-05-25' > '2005-05-24'

and

Date.new(2005,05,25) > Date.new(2005,05,24)

are both true. In other words, Strings formatted the way Dates look
(year-month-day, zero-padded) compare the same way.

Give this a try and see if you see a speed improvement. I have tried it
and have seen dramatic improvements.
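
A quick way to check the claim, as a sketch: year-month-day strings sort in the same order as the Date objects they represent, but only when month and day are zero-padded. The sample values and variable names below are made up for illustration.

```ruby
require 'date'

# Sample ISO-formatted (YYYY-MM-DD) values, invented for this example.
stamps = ["2005-05-25", "2004-12-31", "2005-05-24", "2005-01-09"]

sorted_as_strings = stamps.sort
sorted_as_dates   = stamps.sort_by { |s| Date.parse(s) }

# Same order either way when the strings are zero-padded.
puts sorted_as_strings == sorted_as_dates

# Caveat: without zero padding the string order no longer matches.
unpadded_says_later = "2005-5-9" > "2005-05-24"   # plain string comparison
really_later        = Date.new(2005, 5, 9) > Date.new(2005, 5, 24)
puts unpadded_says_later == really_later
```

The first check agrees and the second does not, so the String workaround depends on storing dates in a fixed, zero-padded format.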

Let me know how it goes.

Jamey
 

gabriele renzi

Jamey Cribbs wrote:
> Here is an alternative to try: define this field in the table as a
> String field instead of a Date field. Selects will still work pretty
> much the same way because, for example:
>
> '2005-05-25' > '2005-05-24'
>
> and
>
> Date.new(2005,05,25) > Date.new(2005,05,24)
>
> are both true. In other words, Strings formatted similarly to the way
> Dates look compare the same way.
>
> Give this a try and see if you see a speed improvement. I have tried it
> and have seen dramatic improvements.
>
> Let me know how it goes.

Why not use a Time object?
 

Oliver Cromm

Jamey said:
> Oliver said:
> > So for a start, I plugged KirbyBase in just as a cache - where before, I
> > was reading header data from a news server each time, in the new version
> > I save the raw data to a KirbyBase, add only recent messages, then read
> > the part of the data I want (by date) from the KirbyBase.
> This might be the source of the slowness. Is this field that you are
> reading by date defined as a Date field in the KirbyBase table? [...]
>
> Here is an alternative to try: define this field in the table as a
> String field instead of a Date field. Selects will still work pretty
> much the same way because, for example:
>
> '2005-05-25' > '2005-05-24'

I left the Date field as a string in the format in which I originally
receive it, e.g. "Wed, 18 May 2005 10:29:44 +0900". Then, for each
message, I use ParseDate. This is overhead for sure, but the point is
that it is the same thing I do for the non-caching version (receive a
specified number of Dates and decide which are within my limits).

But I'll go ahead and try a version where I parse at read-in time and
store the result, which would be a number (or two numbers, as I'd want
to keep the time zone separate).
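
A minimal sketch of that idea, assuming the headers are valid RFC 2822 dates: parse once at read-in time with Time.rfc2822 from Ruby's standard time library, then store the epoch seconds and the zone offset as two integers. The helper name split_date is hypothetical, and the offset handling assumes a current Ruby version, where the parsed Time keeps the sender's offset.

```ruby
require 'time'

# Hypothetical helper: turn an RFC 2822 Date header into two integers,
# [seconds since the Unix epoch, sender's UTC offset in seconds].
def split_date(header)
  t = Time.rfc2822(header)
  [t.to_i, t.utc_offset]
end

epoch, offset = split_date("Wed, 18 May 2005 10:29:44 +0900")
puts epoch
puts offset
```

Selecting by date then becomes an integer comparison, with no parsing per row.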
 

Jamey Cribbs

gabriele said:
> Why not use a Time object?

I chose to have Date/DateTime be field types in KirbyBase, rather than
Time, because Time can only store dates back to 1970.

Jamey
 

Jamey Cribbs

Christian said:
> gabriele renzi wrote:
>
> > I chose to have Date/DateTime be field types in KirbyBase, rather than
> > Time, because Time can only store dates back to 1970.
>
> ruby 1.8.2 (2004-12-25) [powerpc-darwin7.7.0]
>
> irb(main):006:0> Time.at -1600000000
> => Sun Apr 20 12:33:20 CET 1919

When I tried this on my Windows XP machine I got the following error:

irb(main):001:0> Time.at -1600000000
ArgumentError: time must be positive
from (irb):1:in `at'
from (irb):1
irb(main):002:0>


So, it does not let you use negative Times on XP. That's why I had to
use Date/DateTime.
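
One way to sidestep the platform difference, sketched here with a hypothetical helper epoch_to_date: convert epoch seconds to a calendar day using Date arithmetic alone. This never calls Time.at, so the "time must be positive" restriction does not come into play. It yields the UTC calendar day only, not the time of day.

```ruby
require 'date'

EPOCH_DAY = Date.new(1970, 1, 1)

# Hypothetical helper: UTC calendar day for a (possibly negative) number
# of seconds since the Unix epoch.
def epoch_to_date(seconds)
  # Integer#div rounds toward negative infinity, so pre-1970 values
  # land on the correct day.
  EPOCH_DAY + seconds.div(86_400)
end

puts epoch_to_date(-1_600_000_000)
```

For the value used in the irb session above, this gives 20 April 1919, matching the result reported on the Mac.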

Jamey


 

Oliver Cromm

* Oliver Cromm said:
> Jamey said:
> > Oliver said:
> > > So for a start, I plugged KirbyBase in just as a cache - where before, I
> > > was reading header data from a news server each time, in the new version
> > > I save the raw data to a KirbyBase, add only recent messages, then read
> > > the part of the data I want (by date) from the KirbyBase.
> > This might be the source of the slowness. Is this field that you are
> > reading by date defined as a Date field in the KirbyBase table? [...]
> >
> > Here is an alternative to try: define this field in the table as a
> > String field instead of a Date field. Selects will still work pretty
> > much the same way because, for example:
> >
> > '2005-05-25' > '2005-05-24'
>
> I left the Date field as a string in the format in which I originally
> receive it, e.g. "Wed, 18 May 2005 10:29:44 +0900". Then, for each
> message, I use ParseDate. This is overhead for sure, but the point is
> that it is the same thing I do for the non-caching version (receive a
> specified number of Dates and decide which are within my limits).
>
> But I'll go ahead and try a version where I parse at read-in time and
> store the result, which would be a number (or two numbers, as I'd want
> to keep the time zone separate).

I found some time now for further experiments, and stored time as an
integer. And yes, it is significantly faster this way, even slightly
faster than my first attempt to do the same with SQLite.

Times from some tests with similar, but not exactly equal, tasks, so
take them with spoonfuls of salt:
- reading data fresh from News server: 50s
- reading from KirbyBase with original format (rfc2822) Date field: 45s
- reading from KirbyBase with Date as Integer: 12s
- reading from SQLite with Date as Integer: 16s

I have to do quite a number of calculations on that field; for every
record selected (and in my simple experiments, that is nearly all of
them), I need to extract at least the day of the week and the day
number. But apparently, that doesn't take nearly as much time as a
KirbyBase "select" based on ParseDate(aField). I'm not quite clear about
what is going on with the select, but I know how to circumvent the
problem.
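
For what it is worth, pulling those values out of a stored integer needs no parsing at all. A sketch, assuming the time is stored as epoch seconds plus a separate UTC offset in seconds (the helper name day_fields is hypothetical):

```ruby
# Hypothetical helper: weekday (0 = Sunday) and day of the year for a
# timestamp stored as epoch seconds plus the sender's UTC offset.
def day_fields(epoch, utc_offset)
  # Shift into the sender's wall-clock time while staying in UTC, so the
  # local zone of the machine running the script does not matter.
  local = Time.at(epoch).utc + utc_offset
  [local.wday, local.yday]
end

epoch = Time.utc(2005, 5, 18, 1, 29, 44).to_i  # 10:29:44 +0900
p day_fields(epoch, 9 * 3600)                  # prints [3, 138]
```

Weekday 3 is Wednesday, which agrees with the "Wed, 18 May 2005" header used as the example earlier in the thread.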
 

Jamey Cribbs

Oliver said:
> I found some time now for further experiments, and stored time as an
> integer. And yes, it is significantly faster this way, even slightly
> faster than my first attempt to do the same with SQLite.
>
> Times from some tests with similar, but not exactly equal, tasks, so
> take them with spoonfuls of salt:
> - reading data fresh from News server: 50s
> - reading from KirbyBase with original format (rfc2822) Date field: 45s
> - reading from KirbyBase with Date as Integer: 12s
> - reading from SQLite with Date as Integer: 16s
>
> I have to do quite a number of calculations on that field; for every
> record selected (and in my simple experiments, that is nearly all of
> them), I need to extract at least the day of the week and the day
> number. But apparently, that doesn't take nearly as much time as a
> KirbyBase "select" based on ParseDate(aField). I'm not quite clear about
> what is going on with the select, but I know how to circumvent the
> problem.

If I remember my experiments correctly from when I first ported KirbyBase
from Python to Ruby and noticed the significant speed difference when
using Date/DateTime, my guess was that there isn't anything going on in
#select that is causing the slowness. It is just that, in Ruby,
creating a new Date/DateTime object is relatively slow, compared to
Python. My further guess as to why this was is that, in Python, the
datetime library is written in C, while in Ruby, the Date/DateTime
library is written in Ruby. How's that for exhaustive scientific
analysis? :)

I could be totally wrong about this, but I am guessing that if the
Date/DateTime library were rewritten in C, it would be significantly
faster and you would likewise notice a marked speed improvement while
using Date/DateTime fields in KirbyBase. Unfortunately, since I am not
a C programmer, I can't actually do this to test my theory. Hence, my
workaround is to usually define any date fields I need as String
fields. It speeds things up and, for comparison purposes, things pretty
much work the same way.
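
For anyone who wants to measure this rather than guess, here is a rough sketch using Ruby's standard Benchmark module. It does not touch KirbyBase at all: it only compares filtering 6000 in-memory rows by String comparison against parsing each row into a Date first. The row data and variable names are invented, and the numbers will vary by machine and Ruby version (newer Ruby versions, from 1.9.3 on, ship a Date library implemented in C, so the gap may be smaller than described in this thread).

```ruby
require 'benchmark'
require 'date'

# 6000 invented rows holding zero-padded ISO date strings.
rows   = Array.new(6000) { |i| (Date.new(2005, 1, 1) + i % 365).to_s }
cutoff = "2005-05-24"
cutoff_date = Date.parse(cutoff)

as_strings = as_dates = nil
Benchmark.bm(8) do |bm|
  # Compare the stored strings directly.
  bm.report("String") { as_strings = rows.select { |s| s > cutoff } }
  # Build a Date object for every row before comparing.
  bm.report("Date")   { as_dates = rows.select { |s| Date.parse(s) > cutoff_date } }
end

puts as_strings == as_dates
```

Both filters select the same rows; only the time taken to get there differs.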

Jamey

 
