Finding out what row DBI chokes on?


Francis Hwang

Sometimes I deal with a legacy table with screwy datetime info, so I get fields
that in MySQL are '0000-00-00' or other kinds of invalid values ... When I pull
these out with DBI I get an ArgumentError that simply says "argument out of
range", which I believe is caused by DBI calling Time.gm( 0, 0, 0, 0, 0, 0 ) or
something like that.

Now, it makes sense to me that DBI doesn't want to deal with this, but
in some cases it might be convenient for me to get a message that tells
me specifically what row has the bad value. Does anybody know how I
might do this, short of hacking DBI directly?
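
For what it's worth, reproducing the underlying error directly in irb is easy,
assuming my guess about Time.gm is right (the exact arguments DBI passes may
differ):

# '0000-00-00 00:00:00' presumably gets split into all zeroes, and
# Time.gm rejects a month (or day) of 0.
begin
  Time.gm( 0, 0, 0, 0, 0, 0 )
rescue ArgumentError => e
  puts e.message    # "argument out of range" under Ruby 1.8
end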

Francis Hwang
http://fhwang.net/
 

Shashank Date

Hi Francis,

--- Francis Hwang said:
Sometimes I deal with a legacy table with screwy datetime info, so I get fields
that in MySQL are '0000-00-00' or other kinds of invalid values ... When I pull
these out with DBI I get an ArgumentError that simply says "argument out of
range", which I believe is caused by DBI calling Time.gm( 0, 0, 0, 0, 0, 0 ) or
something like that.

Is it possible to do validation at the SQL level itself? What does the SQL
that generates this result look like?

-- shanko





 

Francis Hwang

Hi Francis,



Is it possible to do validation at the SQL level itself? What does the SQL
that generates this result look like?

-- shanko

Perhaps I didn't explain my problem very clearly. None of my current
code is generating bad datetimes, but at work I deal with a database
with legacy data going back 7 years, and from time to time I come
across bad datetimes that were inserted years before. DBI can't handle
these at all. Here's test code to show it:

1) First I create a table and insert a bogus datetime:

mysql> create table test ( dt datetime );
Query OK, 0 rows affected (0.26 sec)

mysql> insert into test( dt ) values( '2005-01-01' );
Query OK, 1 row affected (0.13 sec)

mysql> insert into test( dt ) values( '2004-01-01' );
Query OK, 1 row affected (0.37 sec)

mysql> insert into test( dt ) values( '0000-00-00' );
Query OK, 1 row affected (0.21 sec)

mysql> select * from test;
+---------------------+
| dt                  |
+---------------------+
| 2005-01-01 00:00:00 |
| 2004-01-01 00:00:00 |
| 0000-00-00 00:00:00 |
+---------------------+
3 rows in set (0.24 sec)

2) Then I try to query this through DBI:

irb(main):001:0> require 'dbi'
=> true
irb(main):002:0> dbh = DBI.connect( 'dbi:Mysql:test:localhost',
'francis', 'xxxxxxx' )
=> #<DBI::DatabaseHandle:0x5e76a4 @trace_mode=2,
@handle=#<DBI::DBD::Mysql::Database:0x5e6ec0 @handle=#<Mysql>,
@mutex=#<Mutex:0x5df4b8 @waiting=[], @locked=false>,
@have_transactions=true, @attr={"AutoCommit"=>true}>,
@trace_output=#<IO:0x1d3a5c>>
irb(main):003:0> dbh.select_all( 'select * from test' )
ArgumentError: argument out of range
from /usr/local/lib/ruby/site_ruby/1.8/dbi/sql.rb:59:in `gm'
from /usr/local/lib/ruby/site_ruby/1.8/dbi/sql.rb:59:in `as_timestamp'
from /usr/local/lib/ruby/site_ruby/1.8/dbi/sql.rb:79:in `send'
from /usr/local/lib/ruby/site_ruby/1.8/dbi/sql.rb:79:in `coerce'
from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:418:in `fill_array'
from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:415:in `each_with_index'
from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:415:in `each'
from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:415:in `each_with_index'
from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:415:in `fill_array'
from /usr/local/lib/ruby/site_ruby/1.8/DBD/Mysql/Mysql.rb:425:in `fetch'
from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:1145:in `fetch_all'
from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:1144:in `loop'
from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:1144:in `fetch_all'
from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:909:in `fetch_all'
from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:666:in `select_all'
from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:662:in `execute'
from /usr/local/lib/ruby/site_ruby/1.8/dbi/dbi.rb:662:in `select_all'
from (irb):3
irb(main):004:0>
irb(main):005:0*

Now, it isn't surprising or necessarily a bug that DBI won't even
sensibly process this value, and catching the exception is easy enough.
But it slows me down a little that

1) I don't know exactly what the bad value is. You'd be surprised how
many ways MySQL lets you insert bad datetime strings.
2) I don't know which row the bad value is in if I've done a select
that should return more than 1 row.

So I'm wondering if, short of patching DBI, there is a way for a DBI
client (Lafcadio, in my case) to get more info to return a more
informative error message. I'm big into informative errors.
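
One workaround I can imagine, short of patching DBI, and only a sketch against
the test table above: pull the column back as a string so DBI's timestamp
coercion never runs, then attempt the conversion by hand and report exactly
which row and value fail. The CAST( ... AS CHAR ) assumes MySQL, and the
parsing only mimics what I believe DBI does internally:

require 'dbi'

dbh = DBI.connect( 'dbi:Mysql:test:localhost', 'francis', 'xxxxxxx' )

# Fetch the datetime as a plain string so DBI never tries to coerce it,
# then run the same sort of Time.gm conversion by hand, row by row.
rows = dbh.select_all( "select cast( dt as char ) from test" )
rows.each_with_index do |row, i|
  raw = row[0]
  next if raw.nil?
  year, mon, day, hour, min, sec = raw.scan( /\d+/ ).map { |s| s.to_i }
  begin
    Time.gm( year, mon, day, hour, min, sec )
  rescue ArgumentError
    puts "row #{i}: bad datetime value #{raw.inspect}"
  end
end

It's an extra round trip, though, which is part of why a more informative error
from DBI itself would be nicer.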

Francis Hwang
http://fhwang.net/
 

Shashank Date

Hi Francis,

Francis said:
Perhaps I didn't explain my problem very clearly. None of my current
code is generating bad datetimes, but at work I deal with a database
with legacy data going back 7 years, and from time to time I come across
bad datetimes that were inserted years before.

This much I had guessed from your earlier email, so I guess you had
explained the problem clearly ;-)

Apologies if my answers did not make sense to you.
DBI can't handle these at all. Here's test code to show it:

Now, it isn't surprising or necessarily a bug that DBI won't even
sensibly process this value, and catching the exception is easy enough.
But it slows me down a little that

1) I don't know exactly what the bad value is. You'd be surprised how
many ways MySQL lets you insert bad datetime strings.

I deal with such data almost on a routine basis (although not using
MySQL), and use standard "scrubbing techniques":

1. Try to limit the data in the SELECT, borrowing from your example:

SELECT * FROM TEST WHERE DT > '1997-01-01';

or


SELECT DISTINCT DT FROM TEST;

to see what values show up.

2. Use the datetime / string conversion functions to try to identify the
bad data.

3. If the data set is small (< 100,000 rows), use CURSOR logic (usually
frowned upon) to identify the bad rows.

4. Export the table to a CSV, run it through text filtering tools, and then
import it back.

Again, not knowing exactly what your situation is, I do not know if all of
these are even relevant. What I am trying to say is: as far as possible, try
to handle it at the backend BEFORE it reaches DBI.
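
Not being a MySQL user I am only guessing at the exact functions here, but
something along these lines (a sketch against Francis's test table, going
through Ruby/DBI) is the kind of thing I mean by item 2, assuming MySQL's
MONTH() and DAYOFMONTH() return 0 for zero month or day parts:

require 'dbi'

dbh = DBI.connect( 'dbi:Mysql:test:localhost', 'francis', 'xxxxxxx' )

# MONTH() and DAYOFMONTH() should return 0 when the month or day part is
# zero, which is exactly what Time.gm refuses to build. Casting to CHAR
# keeps DBI from trying (and failing) to coerce the value on the way out.
sql = "select cast( dt as char ) from test " +
      "where month( dt ) = 0 or dayofmonth( dt ) = 0"
dbh.select_all( sql ).each do |row|
  puts "suspect datetime: #{row[0]}"
end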

2) I don't know which row the bad value is in if I've done a select that
should return more than 1 row.
So I'm wondering if, short of patching DBI, there is a way for a DBI
client (Lafcadio, in my case) to get more info to return a more
informative error message. I'm big into informative errors.

I am afraid there may be no way to do that. But I am by no means a DBI
expert.

HTH,

-- shanko
 

Francis Hwang

I deal with such data almost on a routine basis (although not using
MySQL), and use standard "scrubbing techniques":

1. Try to limit the data in the SELECT, borrowing from your example:

SELECT * FROM TEST WHERE DT > '1997-01-01';

or


SELECT DISTINCT DT FROM TEST;

to see what values show up.

2. Use the datetime / string conversion functions to try to identify
the bad data.

3. If the data set is small (< 100,000 rows), use CURSOR logic (usually
frowned upon) to identify the bad rows.

4. Export the table to a CSV, run it through text filtering tools, and then
import it back.

Again, not knowing exactly what your situation is, I do not know if all of
these are even relevant. What I am trying to say is: as far as possible, try
to handle it at the backend BEFORE it reaches DBI.

These scrubbing techniques do help, thanks.

Francis Hwang
http://fhwang.net/
 
