Moving binary data from one RDBMS to another


Perl Lover

Hi,

I am currently writing a perl DBI script to copy data from one RDBMS to another.
One of the columns contains binary data. When I copy the data directly (reading
it from the source, feeding it to the target), it works fine. However, when I save the
source data first to a file and then use that file to load it into the target, the
binary data gets messed up.

The source data is read via the DBI call fetchrow_array and the array is joined into
a string, which is then written to the file with print FILE.
Am I doing something wrong here? Is there a better way to store the data that does
not use print and does not corrupt it?

thanks.
 

Martijn Lievaart

Hi,

I am currently writing a perl DBI script to copy data from one RDBMS to
another. One of the columns contains binary data. When I copy the data
directly (reading it from the source, feeding it to the target), it works fine.
However, when I save the source data first to a file and then use that
file to load it into the target, the binary data gets messed up.

The source data is read via the DBI call fetchrow_array and the array is
joined into a string, which is then written to the file with print FILE.
Am I doing something wrong here? Is there a better way to store the data
that does not use print and does not corrupt it?

I bet you are using Windows and are seeing LF -> CRLF conversion. Try
opening your files as binary files. See perldoc -f open and perldoc -f
binmode.
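
For example, a minimal sketch (the variable names and the filename here are
just placeholders):

my $row_as_string = join '', @fields;  # however you build the value (placeholder)
open my $fh, '>', 'data.out' or die "open: $!";
binmode $fh;                           # raw bytes: no LF -> CRLF, no encoding layer
print {$fh} $row_as_string;
close $fh or die "close: $!";

And call binmode on the handle you use to read the file back in, too.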

HTH,
M4
 

Perl Lover

I bet you are using Windows and are seeing LF -> CRLF conversion. Try
opening your files as binary files. See perldoc -f open and perldoc -f
binmode.

No. The files are created on Solaris and are used locally.
 

Martijn Lievaart

No. The files are created on Solaris and are used locally.

In that case, try to post a small but complete example that exhibits the
problem.

M4
 

Perl Lover

In that case, try to post a small but complete example that exhibits the
problem.

First of all, thanks for your effort to help.

What exactly do you want? Code or data?

The binary data contains non-ASCII, non-printable characters.
The source database stores it in a char field of length 512.
The target database (Oracle) stores it as a BLOB field because that is
the only field type that can store binary data.
 

Martijn Lievaart

First of all, thanks for your effort to help.

What exactly do you want? Code or data?

The binary data contains non-ASCII, non-printable characters.
The source database stores it in a char field of length 512. The target
database (Oracle) stores it as a BLOB field because that is the only
field type that can store binary data.

A small example may be difficult, but you could fill a string with the
data as a literal string, write it out, read it back in, and check whether
it is still equal.

But try this first.

First try to od -x the file and see if it contains exactly what it
should. If it doesn't there's a problem getting the data from the
database.

Now read the data back and dump that (Data::Dumper will probably do,
otherwise convert to hex with a small loop). Still the same? Then your
problem is in storing the data.
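
Something along these lines, just as a sketch (the literal data and the
filename are made up):

use strict;
use warnings;

my $data = "\x00\x01\x02\xfe\xff" x 25;    # made-up binary literal

open my $out, '>', 'roundtrip.bin' or die "write: $!";
binmode $out;
print {$out} $data;
close $out or die "close: $!";

open my $in, '<', 'roundtrip.bin' or die "read: $!";
binmode $in;
local $/;                                  # slurp the whole file
my $back = <$in>;
close $in;

print $back eq $data ? "round trip ok\n" : "round trip differs\n";
printf "wrote: %s\n", unpack 'H*', $data;  # compare this against the od output
printf "read:  %s\n", unpack 'H*', $back;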

A possible problem with reading from the database is the character set. A
VARCHAR is not meant to hold binary data and you may run into character set
conversions here, or something else.

HTH,
M4
 

Perl Lover

First try to od -x the file and see if it contains exactly what it
should. If it doesn't there's a problem getting the data from the
database.

Now read the data back and dump that (Data::Dumper will probably do,
otherwise convert to hex with a small loop). Still the same? Then your
problem is in storing the data.

A possible problem with reading from the database is the character set. A
VARCHAR is not meant to hold binary data and you may run into character set
conversions here, or something else.

I am looking at the possibility of perl incorrectly reporting two
identical binary strings as different.

Can two variables storing binary data be compared
with the standard eq operator, as in
if ( $a eq $b )?

Does it do the equivalent of the C function memcmp?
Is there a foolproof way of comparing these two variables?

Thanks.
 

Peter J. Holzer

I am looking at the possibility of perl incorrectly reporting two
identical binary strings as different.

Can two variables storing binary data be compared
with the standard eq operator, as in
if ( $a eq $b )?
Yes.

Does it do the equivalent of the C function memcmp?

Not quite. memcmp compares bytes irrespective of type. The eq operator
makes sure that both operands are of the same type before comparing
them. If one of the operands isn't a string it is converted to a
string. If one of the strings is a byte string, and the other is a
character string, the byte string is converted to a character string.

So "eq" tests if the operands have the same string value, not the same
representation.
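
A small illustration of that distinction (just a sketch; utf8::upgrade changes
only the internal representation, not the string's value):

use strict;
use warnings;

my $bytes = "\xE9";        # one byte, UTF8 flag off
my $chars = "\xE9";
utf8::upgrade($chars);     # same value, now stored internally as UTF-8

print $bytes eq $chars ? "eq: equal\n" : "eq: not equal\n";  # prints "eq: equal"
printf "utf8 flags: %d vs %d\n",
    utf8::is_utf8($bytes) ? 1 : 0,
    utf8::is_utf8($chars) ? 1 : 0;                           # prints "utf8 flags: 0 vs 1"

Two strings can therefore compare equal with eq while their byte-level
representations differ.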

hp
 

Perl Lover

Not quite. memcmp compares bytes irrespective of type. The eq operator
makes sure that both operands are of the same type before comparing
them. If one of the operands isn't a string it is converted to a
string. If one of the strings is a byte string, and the other is a
character string, the byte string is converted to a character string.

So "eq" tests if the operands have the same string value, not the same
representation.

Hmm, this might be a clue to the problem we are facing.

I am reading data from a DB2 database where the field type is char(128)
and it contains binary data (DB2 allows it). The reason binary
data is used is that the data is copied via memcpy into an array
of integers and then processed. Apparently this has given
a huge performance boost. It works great. Oracle is the database
it is written to, and it does not allow binary data in a char field.
So we are using RAW, LONG, anything that allows us to store binary data
as it is.

I have a script in perl to compare them after the data is written to Oracle.
I read both the DB2 row and the Oracle row and use the eq operator,
and it says "not equal". When we use BLOB data type to store the
value in Oracle, it shows them as equal. I know Oracle stores
BLOB data as it is, with no character set conversion.
So to me, it proves that the problem is
in Oracle storing the data in raw/long mode,
but others are skeptical about perl. Hence my post.

If you can recommend a better way of comparing this data, that would
be great.

thanks all.
 

John Bokma

Perl Lover said:
Hmm, this might be a clue to the problem we are facing.

I am reading data from a DB2 database where the field type is char(128)
and it contains binary data (DB2 allows it). The reason binary
data is used is that the data is copied via memcpy into an array
of integers and then processed. Apparently this has given
a huge performance boost. It works great. Oracle is the database
it is written to, and it does not allow binary data in a char field.
So we are using RAW, LONG, anything that allows us to store binary data
as it is.

I have a script in perl to compare them after the data is written to Oracle.
I read both the DB2 row and the Oracle row and use the eq operator,
and it says "not equal". When we use BLOB data type to store the
value in Oracle, it shows them as equal. I know Oracle stores
BLOB data as it is, with no character set conversion.
So to me, it proves that the problem is
in Oracle storing the data in raw/long mode,
but others are skeptical about perl. Hence my post.

If you can recommend a better way of comparing this data, that would
be great.

Have you already considered actually printing the data (in hex format, for
example) and visually comparing it, or using a loop to compare byte by byte
and warn when a difference occurs?

Did you check the length of each string?
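
For instance, a rough sketch of such a comparison (the two values here are
placeholders for whatever you fetch from DB2 and Oracle):

use strict;
use warnings;

my ($db2_val, $ora_val) = ("\x01\x02\x03", "\x01\x02\x83");  # placeholders

printf "lengths: %d vs %d\n", length $db2_val, length $ora_val;
printf "db2:     %s\n", unpack 'H*', $db2_val;
printf "oracle:  %s\n", unpack 'H*', $ora_val;

my $min = length($db2_val) < length($ora_val) ? length($db2_val) : length($ora_val);
for my $i (0 .. $min - 1) {
    my ($x, $y) = (substr($db2_val, $i, 1), substr($ora_val, $i, 1));
    warn sprintf "byte %d differs: %02x vs %02x\n", $i, ord $x, ord $y
        if $x ne $y;
}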
 

Peter J. Holzer

Hmm, this might be a clue to the problem we are facing.

I am reading data from a DB2 database where the field type is char(128)
and it contains binary data (DB2 allows it). The reason binary
data is used is that the data is copied via memcpy into an array
of integers and then processed. Apparently this has given
a huge performance boost. It works great. Oracle is the database
it is written to, and it does not allow binary data in a char field.
So we are using RAW, LONG, anything that allows us to store binary data
as it is.

LONG is a character type and doesn't allow binary data (or rather, it
allows it in the same way as VARCHAR2 and CLOB: You may get away with it
if you are lucky). RAW and LONG RAW are intended for storing binary
data, as is BLOB.

The LONG types are deprecated: Instead of LONG you should use CLOB, and
instead of LONG RAW you should use BLOB.

Since you only need to store 128 bytes, RAW is probably the best type.
But I've never used RAW and it is possible that DBD::Oracle doesn't
handle it correctly in some cases.
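
If you do try RAW, the usual approach (as far as I know; the connection,
table, and column names below are made up) is to tell DBD::Oracle explicitly
that the bound value is raw bytes:

use strict;
use warnings;
use DBI;
use DBD::Oracle qw(:ora_types);

my $binary_data = "\x00\x01\xfe\xff" x 32;   # placeholder value

# Hypothetical connection and table, just to show the bind_param attribute.
my $dbh = DBI->connect('dbi:Oracle:MYDB', 'user', 'pass',
                       { RaiseError => 1, AutoCommit => 0 });

my $sth = $dbh->prepare('INSERT INTO copy_target (id, payload) VALUES (?, ?)');
$sth->bind_param(1, 42);
$sth->bind_param(2, $binary_data, { ora_type => ORA_RAW });  # mark it as raw bytes
$sth->execute;
$dbh->commit;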

I have a script in perl to compare them after the data is written to Oracle.
I read both the DB2 row and the Oracle row and use the eq operator,
and it says "not equal".

You should also print the differences between the strings. Often the
problem is apparent when you see what the difference is.

The DBI package contains some useful utility functions.
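One of them is DBI::data_diff, which (in reasonably recent DBI versions)
describes both the logical and the internal differences between two strings.
A quick sketch with made-up values:

use strict;
use warnings;
use DBI;

my $from_db2    = "\x00\x01\xfe\xff";            # placeholder values
my $from_oracle = "\x00\x01\xc3\xbe\xc3\xbf";    # e.g. what a charset conversion might produce

my $diff = DBI::data_diff($from_db2, $from_oracle);
print $diff ? $diff : "strings are identical\n";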
When we use BLOB data type to store the
value in Oracle, it shows them as equal.

That's new. I understood from your previous posting that it doesn't work with
BLOBs if the data is written to a file and read back, which suggested
some problem related to file handling.
If you can recommend a better way of comparing this data, that would
be great.

eq is the correct way to compare strings for equality in perl. There is
no better way. However, you may also want to know whether the strings
are character or byte strings: See the utf8::is_utf8 function. And you
almost always want to know *how* the strings are different.

hp
 

Perl Lover

That's new. I understood from your previous posting that it doesn't work with
BLOBs if the data is written to a file and read back, which suggested
some problem related to file handling.

This is true. It works only with direct feed and not via file.

This is what I wrote originally:

"I am currently writing a perl dbi script to copy data from one rdbms to
another.
One of the columns contain binary data. When I copy the data directly (reading
it from source, feeding it to target), it works fine. However when I save the
source data first in a file and then use that file to load it to the target it
messes up the binary data."

I use print FILEHANDLE to write it to the file.
 

John Bokma

[ binary data ]
I use print FILEHANDLE to write it to the file.

perldoc -f binmode (note the 3rd paragraph)

If that's not the issue, use od -x file (assuming *nix) to get a hex dump
of the contents. Compare it with a hex dump of the string before you print
it to file.
 

Dr.Ruud

Perl Lover wrote:
I am looking at the possibility of perl incorrectly reporting
two identical binary strings as different.

Maybe you are looking for:

{
    use bytes;

    if ($a eq $b) {
        ...
    }
}

See `perldoc bytes`.
 

Peter J. Holzer

This is true. It works only with direct feed and not via file.

This is what I wrote originally:

"I am currently writing a perl dbi script to copy data from one rdbms to
another.
One of the columns contain binary data. When I copy the data directly (reading
it from source, feeding it to target), it works fine. However when I save the
source data first in a file and then use that file to load it to the target it
messes up the binary data."

How do you "use that file to load it to the target"? Feed it to sqlldr?
Read the data with another perl script and then insert it via DBI?
In the first case you have to produce the format sqlldr expects.

I use print FILEHANDLE to write it to the file.

I don't think this tells us anything we didn't already know.

hp
 
