Moving binary data from one RDBMS to another

Discussion in 'Perl Misc' started by Perl Lover, Feb 6, 2008.

  1. Perl Lover

    Hi,

    I am currently writing a Perl DBI script to copy data from one RDBMS to
    another. One of the columns contains binary data. When I copy the data
    directly (reading it from the source and feeding it to the target), it works
    fine. However, when I save the source data to a file first and then use that
    file to load the target, it messes up the binary data.

    The source data is read via the DBI call fetchrow_array, and the array is
    joined into a string which is then written to the file with print FILE.
    Am I doing something wrong here? Is there a better way to store the data
    without using print, one that does not corrupt it?

    thanks.
     
    Perl Lover, Feb 6, 2008
    #1

  2. I bet you are using Windows and are seeing LF -> CRLF conversion. Try
    opening your files in binary mode. See perldoc -f open and perldoc -f
    binmode.
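    A minimal sketch of that round trip (the variable and file names are made
    up): write the value with binmode on the handle, read it back the same way,
    and check it survived.

    ```perl
    # Hypothetical sketch: round-trip a binary value through a file with
    # binmode() so no newline/layer translation touches the bytes.
    my $value = pack('C*', 0x00, 0x0A, 0x0D, 0xFF);   # stand-in for the DB column

    open(my $out, '>', 'dump.bin') or die "open for write: $!";
    binmode($out);                       # raw bytes, no CRLF translation
    print $out $value;
    close($out) or die "close: $!";

    open(my $in, '<', 'dump.bin') or die "open for read: $!";
    binmode($in);
    my $back = do { local $/; <$in> };   # slurp the whole file
    close($in);

    print "round trip ", ($back eq $value ? "ok" : "CORRUPTED"), "\n";
    ```

    Note the value deliberately contains 0x0A and 0x0D, the bytes a text-mode
    handle on Windows would rewrite.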

    HTH,
    M4
     
    Martijn Lievaart, Feb 6, 2008
    #2

  3. Perl Lover

    No. The files are created on Solaris and are used locally.
     
    Perl Lover, Feb 6, 2008
    #3
  4. In that case, try to post a small but complete example that exhibits the
    problem.

    M4
     
    Martijn Lievaart, Feb 6, 2008
    #4
  5. Perl Lover

    First of all thanks for your effort to help.

    What exactly do you want: code or data?

    The binary data contains
    non-ASCII, non-printable characters.
    The source database stores it in a CHAR field of length 512.
    The target database (Oracle) stores it in a BLOB field, because that is
    the only field type there that can store binary data.
     
    Perl Lover, Feb 7, 2008
    #5
  6. A small example may be difficult, but you could fill a string with the
    data as a literal, write it out, read it back in, and compare whether the
    two are equal.

    But try this first.

    First, try od -x on the file and see if it contains exactly what it
    should. If it doesn't, there's a problem getting the data out of the
    database.

    Now read the data back and dump that (Data::Dumper will probably do;
    otherwise convert to hex with a small loop). Still the same? Then your
    problem is in storing the data.

    A possible problem with reading from the database is the character set. A
    VARCHAR is not meant to hold binary data, and you may run into character-set
    conversions here, or something else.
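    The hex comparison can also be done from Perl itself with unpack 'H*'; a
    sketch, where the two values are stand-ins for the database row and the
    value read back from the file:

    ```perl
    # Compare two binary values by their hex representations, so the
    # difference is visible even for non-printable bytes.
    my $from_db   = pack('C*', 0x01, 0x00, 0xFE);   # stand-in: value from the DB
    my $from_file = pack('C*', 0x01, 0x00, 0xFE);   # stand-in: value read back

    my $hex_db   = unpack('H*', $from_db);    # "0100fe"
    my $hex_file = unpack('H*', $from_file);

    if ($hex_db ne $hex_file) {
        print "differ:\n  db:   $hex_db\n  file: $hex_file\n";
    } else {
        print "identical: $hex_db\n";
    }
    ```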

    HTH,
    M4
     
    Martijn Lievaart, Feb 7, 2008
    #6
  7. Perl Lover

    I am looking at the possibility of Perl incorrectly reporting two
    identical binary strings as different.

    Can two variables holding binary data be compared
    with the standard eq operator, as in
    if ( $a eq $b )

    Does it do the equivalent of the C function memcmp?
    Is there a foolproof way of comparing these two variables?

    Thanks.
     
    Perl Lover, Feb 8, 2008
    #7
  8.  
    Peter J. Holzer, Feb 8, 2008
    #8
  9. Perl Lover

    Not quite. memcmp compares bytes irrespective of type. The eq operator
    Hmm, this might be a clue to the problem we are facing.

    I am reading data from a DB2 database where the field type is CHAR(128)
    and it contains binary data (DB2 allows it). The reason binary
    data is used is that the data is copied via memcpy to an array
    of integers and then processed. Apparently this has given
    a huge performance boost. It works great. Oracle is the database
    it is written to, and Oracle does not allow binary data in a CHAR field.
    So we are using RAW, LONG: anything that lets us store the binary data
    as-is.

    I have a Perl script to compare them after the data is written to Oracle.
    So I read both the DB2 and Oracle rows, use the eq operator,
    and it says "not equal". When we use the BLOB data type to store the
    value in Oracle, it shows them as equal. I know Oracle stores
    BLOB data as-is with no character-set conversion.
    So to me, that proves the problem is
    in Oracle storing the data as RAW/LONG,
    but others are skeptical about Perl. Hence my post.

    If you can recommend a better way of comparing this data, that would
    be great.

    thanks all.
     
    Perl Lover, Feb 8, 2008
    #9
  10. John Bokma

    Have you already considered actually printing the data (in hex format, for
    example) and visually comparing it, or using a loop to compare byte by byte
    and warn when a difference occurs?

    Did you check the length of each string?
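    A sketch of that loop (illustrative names only): check the lengths first,
    then warn at every offset where the bytes differ.

    ```perl
    # Compare two strings byte by byte and warn on every difference.
    sub diff_bytes {
        my ($s, $t) = @_;
        warn sprintf("lengths differ: %d vs %d\n", length($s), length($t))
            if length($s) != length($t);
        my $n = length($s) < length($t) ? length($s) : length($t);
        for my $i (0 .. $n - 1) {
            my ($x, $y) = (substr($s, $i, 1), substr($t, $i, 1));
            warn sprintf("byte %d differs: %02x vs %02x\n", $i, ord($x), ord($y))
                if $x ne $y;
        }
    }

    diff_bytes("\x00\x41\xff", "\x00\x42\xff");   # warns about byte 1
    ```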
     
    John Bokma, Feb 8, 2008
    #10
  11. LONG is a character type and doesn't allow binary data (or rather, it
    allows it in the same way as VARCHAR2 and CLOB: You may get away with it
    if you are lucky). RAW and LONG RAW are intended for storing binary
    data, as is BLOB.

    The LONG types are deprecated: Instead of LONG you should use CLOB, and
    instead of LONG RAW you should use BLOB.

    Since you only need to store 128 bytes, RAW is probably the best type.
    But I've never used RAW and it is possible that DBD::Oracle doesn't
    handle it correctly in some cases.
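    If RAW binds do turn out to be flaky, one possible workaround (a sketch;
    nothing here is actually executed against Oracle, and the table name is
    made up) is to ship the value as hex text and let Oracle's HEXTORAW do the
    conversion. Only the Perl-side encoding is shown and checked here.

    ```perl
    # Encode the binary value as hex on the Perl side; an INSERT could then
    # apply HEXTORAW(?) on the Oracle side (statement shown as text only).
    my $binary = pack('C*', 0x00, 0x1B, 0xFF);
    my $hex    = uc unpack('H*', $binary);                        # "001BFF"
    my $sql    = 'INSERT INTO t (payload) VALUES (HEXTORAW(?))';  # hypothetical

    # Sanity check: the hex encoding round-trips losslessly in Perl.
    my $decoded = pack('H*', $hex);
    print "hex=$hex round-trip ", ($decoded eq $binary ? "ok" : "BAD"), "\n";
    ```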

    You should also print the differences between the strings. Often the
    problem is apparent when you see what the difference is.

    The DBI package contains some useful utility functions.
    That's new. I understood from your previous posting that it doesn't work
    with BLOBs if the data is written to a file and read back, which suggested
    some problem related to file handling.
    eq is the correct way to compare strings for equality in Perl. There is
    no better way. However, you may also want to know whether the strings
    are character or byte strings: see the utf8::is_utf8 function. And you
    almost always want to know *how* the strings are different.
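    A sketch of both checks, with made-up sample strings: report the utf8 flag
    of each string, then the offset of the first differing byte.

    ```perl
    # Check whether each string carries the utf8 flag, then find the first
    # byte at which the two differ.
    my $s = "\x61\xc3\xa9";   # sample byte strings
    my $t = "\x61\xc3\xa8";

    printf "utf8 flags: %d %d\n", (utf8::is_utf8($s) ? 1 : 0),
                                  (utf8::is_utf8($t) ? 1 : 0);

    my $i = 0;
    $i++ while $i < length($s) && $i < length($t)
            && substr($s, $i, 1) eq substr($t, $i, 1);

    if ($i < length($s) || $i < length($t)) {
        printf "first difference at byte %d (%02x vs %02x)\n",
               $i, ord(substr($s, $i, 1)), ord(substr($t, $i, 1));
    }
    ```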

    hp
     
    Peter J. Holzer, Feb 8, 2008
    #11
  12. Perl Lover

    This is true. It works only with a direct feed, not via a file.

    This is what I wrote originally:

    "I am currently writing a perl dbi script to copy data from one rdbms to
    another.
    One of the columns contain binary data. When I copy the data directly (reading
    it from source, feeding it to target), it works fine. However when I save the
    source data first in a file and then use that file to load it to the target it
    messes up the binary data."

    I use print FILEHANDLE to write it to the file.
     
    Perl Lover, Feb 8, 2008
    #12
  13. John Bokma

    [ binary data ]
    perldoc -f binmode (note the 3rd paragraph)

    If that's not the issue, use od -x file (assuming *nix) to get a hex dump
    of the contents, and compare it with a hex dump of the string before you
    print it to the file.
     
    John Bokma, Feb 8, 2008
    #13
  14. Dr.Ruud

    Perl Lover wrote:
    Maybe you are looking for:

    {
        use bytes;

        if ($a eq $b) {
            ...
        }
    }

    See `perldoc bytes`.
     
    Dr.Ruud, Feb 9, 2008
    #14
  15. How do you "use that file to load it to the target"? Feed it to sqlldr,
    or read the data with another Perl script and then insert it via DBI?
    In the first case you have to produce the format sqlldr expects.

    I don't think this tells us anything we didn't already know.

    hp
     
    Peter J. Holzer, Feb 9, 2008
    #15