file reading

ramyakrishnakumar · Apr 12, 2007

Hi all,
I am facing a problem with a basic file operation.

I have an XML file that looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<x:recording>

<udf3>Gélin</udf3>

</x:recording>

My code reads this file into a string and then calls a database stored
procedure, which parses the XML and stores it in some tables.

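For example:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

FILE * file = fopen("testFile.xml","rb");        /* "rb": read-only, binary */

struct _stat buffer;

int result1 = _stat( "testFile.xml", &buffer );  /* 0 on success */
/* (check that file != NULL and result1 == 0 before going on) */

int size = buffer.st_size;
char *temp = new char [(sizeof(char))*(size+1)];
size_t nread = fread(temp,sizeof(char),size,file);
temp[nread] = '\0';                              /* NUL-terminate the string */
fclose(file);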

I then pass temp to Ado for SP execution.

Problem:

As you can see, the XML file has one higher-order ASCII character, 'é'.

It goes to the SP as the wrong characters, 'Ã©'.

While debugging the code, I can also see that temp already holds this
wrong value.

I am reading in binary mode, so why is this problem still happening?

Can you please help me resolve it?

Ian Collins · Apr 12, 2007

> char *temp = new char [(sizeof(char))*(size+1)];

As we are in the C world, that should be malloc, and sizeof(char) is, by
definition, 1.
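In standard C, those two points lead to something like the following minimal sketch. The helper read_whole_file() is hypothetical (not from the thread): it opens the file in binary mode, grows a malloc'd buffer with realloc until fread reports end of file (so it does not need the Windows-specific _stat to learn the size), and NUL-terminates the result. The 4096-byte starting capacity is an arbitrary choice.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole file at `path` into a malloc'd,
   NUL-terminated buffer, byte for byte. On success returns the buffer
   and stores the byte count in *out_len (if non-NULL); on failure
   returns NULL. The caller must free() the result. */
char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");          /* binary: no newline translation */
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);               /* no sizeof(char): it is always 1 */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t n;
    /* always leave one byte free for the terminating '\0' */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len == cap - 1) {              /* buffer full: double it */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }

    int failed = ferror(fp);               /* distinguish read error from EOF */
    fclose(fp);
    if (failed) {
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Reading this way copies the bytes unchanged, so the é in the sample file still arrives as the two bytes 0xC3 0xA9; the buffer handling is a separate issue from the character problem.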

> I then pass temp to Ado for SP execution.

What are Ado and SP? Without knowing what is being called, it's difficult
to answer the question.

> As you can see, the XML file has one higher-order ASCII character, 'é'.
>
> It goes to the SP as the wrong characters, 'Ã©'.

What happens if you use unsigned char? Does the function you are
calling expect ASCII or UTF-8; char, unsigned char, or something else?
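The unsigned char question matters as soon as you inspect the buffer. As a sketch, the hypothetical hex_dump() below formats each byte through an unsigned char pointer, which keeps bytes at or above 0x80 in the range 0x80 to 0xFF even where plain char is signed. Run over the sample text, it shows what temp really holds.

```c
#include <stdio.h>

/* Hypothetical helper: format buf[0..len) as hex, two digits and a space
   per byte, into out, which must have room for 3 * len + 1 characters.
   Each byte is read through an unsigned char pointer, so 0xC3 comes out
   as "C3" even where plain char is signed (printing a negative char with
   %X would typically show a sign-extended value such as FFFFFFC3). */
void hex_dump(const char *buf, size_t len, char *out)
{
    const unsigned char *p = (const unsigned char *)buf;
    for (size_t i = 0; i < len; i++)
        sprintf(out + 3 * i, "%02X ", (unsigned)p[i]);
    out[3 * len] = '\0';   /* also covers len == 0 */
}
```

For "Gélin" stored as UTF-8, the dump is 47 C3 A9 6C 69 6E: the é occupies two bytes.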

Richard Bos · Apr 12, 2007

> I have an XML file that looks like this:
> <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
                               ^^^^^^^
There's your problem.

> As you can see, the XML file has one higher-order ASCII character, 'é'.

No, it doesn't.

> It goes to the SP as the wrong characters, 'Ã©'.

This is what is actually in the file.
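The arithmetic behind this is short. é is U+00E9, binary 11101001. Code points from 0x80 to 0x7FF use the two-byte UTF-8 form 110xxxxx 10xxxxxx, which puts the top five of eleven bits in the first byte and the low six in the second: 110_00011 10_101001, i.e. 0xC3 0xA9. A viewer that treats each byte as an ISO-8859-1 (Latin-1) character shows 0xC3 as 'Ã' and 0xA9 as '©', which matches the 'Ã©' seen in the debugger. The hypothetical utf8_encode() below is a minimal sketch of the full encoding rule, not code from the thread.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into out
   (room for 4 bytes). Returns the number of bytes written, or 0 for
   values that are not Unicode scalar values (surrogates, > 0x10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                     /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                    /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* UTF-16 surrogates: not encodable */
        return 0;
    if (cp <= 0xFFFF) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, utf8_encode(0xE9, buf) returns 2 with buf holding C3 A9, and the Devanagari letter KA (U+0915) needs three bytes, E0 A4 95.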

Read up on UTF-8. It's a way of encoding Unicode, including characters
_above_ 0xFF (such as Devanagari and other Indian scripts, which may be
one reason why the person who supplied your file uses it), in sequences
of 8-bit bytes. This does mean that every character above 0x7F must be
encoded in two or more bytes. Either just pass on the UTF-8, or decode it
by hand; it's not hard. The greatest problem is going to be deciding what
to do when (not if!) you do get a Unicode character that won't fit in
your C char.

Richard
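Decoding by hand is indeed short. The sketch below assumes the target is ISO-8859-1 (Latin-1), where every code point up to 0xFF is a single byte, and it handles the "won't fit in your C char" case by substituting a caller-chosen replacement byte and counting how often that happened. The name utf8_to_latin1() and the replacement policy are assumptions for illustration; whether the stored procedure actually wants Latin-1 rather than the UTF-8 as-is is not established in the thread.

```c
/* Hypothetical helper: convert the NUL-terminated UTF-8 string `in` to
   ISO-8859-1 (Latin-1). `out` needs room for strlen(in) + 1 bytes; the
   result is never longer than the input. Code points above 0xFF have no
   Latin-1 form and are replaced by `repl`. Returns the number of
   replacements, or -1 if `in` is not well-formed UTF-8 (the contents of
   `out` are then unusable). */
long utf8_to_latin1(const char *in, char *out, char repl)
{
    const unsigned char *s = (const unsigned char *)in;
    unsigned char *o = (unsigned char *)out;
    long replaced = 0;

    while (*s != '\0') {
        unsigned long cp;   /* code point being assembled */
        int extra;          /* number of continuation bytes expected */

        if (*s < 0x80)      { cp = *s;        extra = 0; }   /* 0xxxxxxx */
        else if (*s < 0xC2) return -1;   /* continuation byte, or overlong C0/C1 */
        else if (*s < 0xE0) { cp = *s & 0x1F; extra = 1; }   /* 110xxxxx */
        else if (*s < 0xF0) { cp = *s & 0x0F; extra = 2; }   /* 1110xxxx */
        else if (*s < 0xF5) { cp = *s & 0x07; extra = 3; }   /* 11110xxx */
        else                return -1;   /* 0xF5..0xFF never occur in UTF-8 */
        s++;

        for (int i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)     /* must be 10xxxxxx; also stops at NUL */
                return -1;
            cp = (cp << 6) | (*s & 0x3F);
        }

        /* reject overlong 3- and 4-byte forms, surrogates, and > U+10FFFF */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;

        if (cp <= 0xFF) {
            *o++ = (unsigned char)cp;    /* fits in Latin-1: one byte */
        } else {
            *o++ = (unsigned char)repl;  /* the "won't fit" case */
            replaced++;
        }
    }
    *o = '\0';
    return replaced;
}
```

Given the bytes of "Gélin", it produces the five Latin-1 bytes 47 E9 6C 69 6E and returns 0; a Devanagari letter would come out as the replacement byte instead. Rejecting malformed input outright (the -1 return) is one policy among several; substituting the replacement byte for bad sequences is another.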

ramyakrishnakumar · Apr 12, 2007

The file is written by another routine, which writes all the characters
using fwrite. In that routine the header is hard-coded as
"<?xml version="1.0" encoding="UTF-8" standalone="no" ?>".
I think the characters are being converted when they are written to the
file, right?

Can we change anything while writing the XML file (for example, use some
other XML format) so that these characters are stored without conversion?

In the reading code, how will it know that these two characters belong
to one character? Or is there some other decoding mechanism?

I am not very familiar with XML.

I also tried reading with Unicode wide chars, but that did not read
properly either.

Richard Bos · Apr 13, 2007

> The file is written by another routine, which writes all the characters
> using fwrite. In that routine the header is hard-coded as
> "<?xml version="1.0" encoding="UTF-8" standalone="no" ?>".
> I think the characters are being converted when they are written to the
> file, right?

How the blazes should _I_ know? _You_ have access to (and may even have
written) this "routine", whether that means a function or whatever; I do
not.

> In the reading code, how will it know that these two characters belong
> to one character? Or is there some other decoding mechanism?

My dear boy, if you won't do your own research, you'll never amount to a
programmer. Information on UTF-8 is extremely easy to come by.

Richard
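The question of how a reader knows that two bytes belong to one character is answered by the bit patterns. In UTF-8, every continuation byte has the form 10xxxxxx, and a lead byte's high bits announce the length of its sequence (110xxxxx for two bytes, 1110xxxx for three, 11110xxx for four). The two hypothetical helpers below are a sketch of that rule: one counts characters by skipping continuation bytes, the other reads the sequence length from a lead byte. Neither validates its input.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters (code points) in a
   NUL-terminated UTF-8 string. Every byte of the form 10xxxxxx continues
   the character started by an earlier lead byte, so only the other bytes
   are counted. Assumes well-formed input; it does not validate. */
size_t utf8_count_chars(const char *str)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)str; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)   /* not a continuation byte */
            count++;
    return count;
}

/* Hypothetical helper: sequence length announced by lead byte b, judged
   from its high bits only, or 0 if b cannot start a sequence. It does not
   reject the never-valid leads 0xC0, 0xC1 and 0xF5..0xF7. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* 10xxxxxx or 11111xxx */
}
```

On "Gélin" the first counts 5 characters in 6 bytes, and the second reports 2 for the lead byte 0xC3 and 0 for the continuation byte 0xA9.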

Reading text file contents to a character buffer	29	Aug 2, 2010
Windows LLDP Driver Responds With No Data	0	Mar 17, 2023
Reading large files	2	Sep 19, 2003
Reading a Binary File....	14	Nov 19, 2007
Reading little-endian data from a file in a portable manner	46	Jul 16, 2010
A question about reading an UTF-8 text file	8	Mar 18, 2006
Reading a file...	21	Mar 22, 2006
Issue with memory allocation and file reading	4	May 6, 2008
