file reading


ramyakrishnakumar

Hi All,
I am having a problem with a basic file operation.

I have an XML file that looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<x:recording>

<udf3>Gélin</udf3>

</x:recording>

My code reads this file into a string and then calls a database stored
procedure, which parses the XML and stores it in some tables.

For example:
FILE * file = fopen("testFile.xml","r+b");

struct _stat buffer;

int result1 = _stat( "testFile.xml", &buffer );

int size = buffer.st_size;
char *temp = new char [(sizeof(char))*(size+1)];
fread(temp,sizeof(char),size,file);

I then pass this temp to ADO for SP execution.



Problem:

As you can see, the XML file contains one high-order ASCII character, 'é'.

It reaches the SP as the wrong characters, 'Ã©'.

While debugging the code I can also see that temp holds this wrong
value.

I am reading in binary mode, so why is this still happening?

Can you please help me resolve this?
 

Ian Collins

> eg:
> FILE * file = fopen("testFile.xml","r+b");
>
> struct _stat buffer;
>
> int result1 = _stat( "testFile.xml", &buffer );
>
> int size = buffer.st_size;
> char *temp = new char [(sizeof(char))*(size+1)];

As we are in the C world, that should be malloc - and sizeof(char) is by
definition, 1.

> fread(temp,sizeof(char),size,file);
>
> pass this temp to Ado for SP execution.

What is Ado and SP? Without knowing what is called, it's difficult to
answer the question.

> Problem:
>
> you can see the xml file has one higherorderASCII character 'é'
>
> this going to the SP as wrong character 'Ã©'

What happens if you use unsigned char? Does the function you are
calling expect ASCII or UTF8, char, unsigned char or something else?
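
To make the malloc remark concrete, here is a minimal sketch of reading a whole file into a heap buffer in plain C. The helper name read_whole_file is made up for illustration and is not from the original post; the sketch assumes the caller wants a NUL-terminated buffer plus the byte count. It reads with fread in a loop instead of relying on _stat, so it uses only standard C. Note that the bytes come back exactly as they are stored in the file: a UTF-8 'é' arrives as the two bytes 0xC3 0xA9.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file into a malloc'd,
   NUL-terminated buffer. Returns NULL on failure; on success the
   byte count is stored in *out_len and the caller frees the result. */
static char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");       /* read-only, binary mode */
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap + 1);        /* +1 leaves room for '\0' */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t n;
    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {               /* buffer full: double it */
            char *bigger = realloc(buf, cap * 2 + 1);
            if (bigger == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }

    if (ferror(fp)) {                   /* distinguish error from EOF */
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    buf[len] = '\0';                    /* terminate before passing it on */
    *out_len = len;
    return buf;
}
```

A caller would write something like `char *xml = read_whole_file("testFile.xml", &len);`, check the result for NULL, and free(xml) when done.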
 

Richard Bos

> I have one xml file looks like
> <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
                               ^^^^^^^
There's your problem.

> you can see the xml file has one higherorderASCII character 'é'

No, it doesn't.

> this going to the SP as wrong character 'Ã©'

This is what is actually in the file.

Read up on UTF-8. It's a way of encoding Unicode, including characters
_above_ 0xFF (such as Devanagari and other Indian scripts, which may be
one reason why the person who supplied your file uses it), in sequences
of 8-bit bytes. This does mean that all characters over 0x7F must be
encoded in two or more bytes. Either just pass on the UTF-8, or decode
it by hand; it's not hard. The greatest problem is going to be deciding
what to do when (not if!) you do get a Unicode character that won't fit
in your C char.

Richard
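
As an illustration of what decoding by hand can look like, here is a minimal sketch of a function that decodes one UTF-8 sequence into a Unicode code point. The name utf8_decode and its interface are assumptions made for this example, not part of any standard library. It rejects malformed input (stray continuation bytes, truncated or overlong sequences, surrogates) by returning 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s, with
   n bytes available. Stores the code point in *cp and returns the
   number of bytes consumed, or 0 if the bytes are not valid UTF-8. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t c, min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                      /* 0xxxxxxx: plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {     /* 110xxxxx: 2 bytes */
        len = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 1110xxxx: 3 bytes */
        len = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {     /* 11110xxx: 4 bytes */
        len = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                           /* continuation or invalid lead */
    }

    if (n < len)
        return 0;                           /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* expected 10xxxxxx */
        c = (c << 6) | (s[i] & 0x3F);       /* append 6 payload bits */
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return len;
}
```

Fed the two bytes 0xC3 0xA9 from the file, this yields the code point 0xE9, which is 'é'. What to do with a code point that does not fit the target character set (replace it, reject it, or keep the text as UTF-8) is left to the caller.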
 

ramyakrishnakumar

The file is written by another routine, where all the characters are
written using fwrite.
In that routine the header is hard-coded as "<?xml version="1.0"
encoding="UTF-8" standalone="no" ?>".
I think this conversion of characters happens after they are written
into the file, right?

Can we change anything while writing the XML file (for example, using
some other XML format) so that these characters are stored without
conversion?

In the reading code, how will it know that these two characters belong
to one character? Or is there some other decoding mechanism?

I am not very familiar with XML.

I also tried reading with Unicode wide chars, but the file was not read
properly.
 

Richard Bos

> File is getting written by another routine , where all the characters
> are written using fwrite.
> In that header is been hard coded as "<?xml version="1.0"
> encoding="UTF-8" standalone="no" ?> "
> I think this conversion of characters is happening after written into
> the file right?

How the blazes should _I_ know? _You_ have access to (and have possibly
even written) this "routine", whether that means a function or whatever;
I do not.

> In reading code, how will it come to know these two characters are
> belongs to one character. or is there any other decoding mechanism.

My dear boy, if you won't do your own research, you'll never amount to a
programmer. Information on UTF-8 is extremely easy to come by.

Richard
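
On the question of how reading code can tell that two bytes belong to one character: in UTF-8 the first byte of each sequence states its own length, and every following byte has the bit pattern 10xxxxxx. The sketch below shows this under the assumption of well-formed input. The names utf8_seq_len and utf8_count are invented for the example.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes in the UTF-8 sequence that
   starts with lead byte b, or 0 if b cannot start a sequence. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 0;                           /* 10xxxxxx or invalid byte */
}

/* Hypothetical helper: count characters (code points) in a
   NUL-terminated UTF-8 string by skipping continuation bytes.
   Assumes the input is well-formed UTF-8. */
static size_t utf8_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;                    /* not a continuation byte */
    }
    return count;
}
```

For the name in the file, the lead byte 0xC3 announces a two-byte sequence and 0xA9 is its continuation byte, so "Gélin" occupies six bytes but counts as five characters.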
 
