DBD::Oracle, Unicode, non-UTF8-non-ASCII strings

Discussion in 'Perl Misc' started by Jochen Lehmeier, Jul 23, 2009.

  1. Hello,

    I am using DBD::Oracle with a client and database characterset of AL32UTF8.

    My problem is that DBD::Oracle assumes utf8-encoded strings in prepare().
    If the input strings are not utf8, and contain non-ASCII characters, then
    they are *not* automatically converted to utf8 by DBD::Oracle. They are
    offered to Oracle as if they were utf8, and Oracle processes them even if
    they are not valid, maybe even returning them to Perl as such; Perl is
    then left with invalid utf8-flagged strings simply actually latin1

    To be very clear: this is *not* an issue of wrong character encodings
    during any time before the calls to DBI/DBD::Oracle (though it is,

    I seem to remember, although I cannot find it now, that this behaviour
    also was well documented in DBD::Oracle or at some other place, so I am
    not so much complaining about DBD::Oracle (though I'm a bit miffed - this
    would seem to be trivially fixable in there...).

    Here is an example:

    my $sql="select 'a \xe4 a' from dual"; # \xe4 is the german "a
    diag(DBI::data_string_desc($sql)); # UTF8 off, non-ASCII, 24
    characters 24 bytes
    my $sth=$dbh->prepare($sql);
    my ($rc)=$sth->fetchrow();
    diag("is_utf8: ".utf8::is_utf8($rc)." valid:
    '".utf8::valid($rc)."'"); # is_utf8: 1 valid: ''
    diag(DBI::data_string_desc($rc)); # UTF8 on but INVALID encoding,
    ASCII, 3 characters 5 bytes

    The output:

    # UTF8 off, non-ASCII, 24 characters 24 bytes
    # is_utf8: 1 valid: ''
    Malformed UTF-8 character (unexpected non-continuation byte 0x20,
    immediately after start byte 0xe4) in pattern match (m//) at
    /usr/lib/perl5/site_perl/5.8.3/x86_64-linux-thread-multi/DBI.pm line 1117.
    # UTF8 on but INVALID encoding, ASCII, 3 characters 5 bytes

    This also highlights one nastiness of the situation: no error or warning
    whatsoever about the situation is displayed by the usual
    prepare/execute/fetch calls. The "Malformed ..."-line above is from inside
    DBI::data_string_desc, where the broken string is used in a regexp
    (regexp's seem to be one place where this is noticed). So it's purely by
    chance that there is a notice about this problem in this test script.

    What to do? I have dozens, maybe 100 scripts that would be perfectly
    utf8-capable except for this issue. They use proper :encoding() on their
    file handles etc. . Is there some flag I can activate in DBD::Oracle or
    DBI which automatically recodes "UTF8 off, non-ASCII" strings to utf8? A
    simple Encode::decode("utf8",$sql) fixes the issue of course, but I am not
    really looking forward to changing all those scripts, maybe missing lots
    of stuff along the way or introducing errors.

    We have a module that handles low level DB stuff like opening the DB
    handle. So if there was a fix which could take place *once* after opening
    the DB, that would be great.


    $DBD::Oracle::VERSION = '1.23';
    $DBI::VERSION = "1.52";
    Oracle 10

    Thanks in advance!
    Jochen Lehmeier, Jul 23, 2009
    1. Advertisements

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. gry
    Alf P. Steinbach
    Mar 13, 2012
  2. Feyruz
    Sherm Pendley
    Oct 14, 2005
  3. Replies:
  4. Jim Cochrane
    Jim Cochrane
    Aug 25, 2007
  5. bruce
    Mark Lawrence
    Nov 1, 2013

Share This Page