Reading a UTF-8 string from a file with the read() function

Sergei

Hi,
I need to read a string from a UTF-8 encoded text file.
I know the byte position at which the string starts and its length (also
in bytes).
The problem is that the read(FILEHANDLE, SCALAR, LENGTH) function takes
LENGTH in characters, not in bytes, when the file is opened as UTF-8.
I've tried opening the file in binary mode instead of UTF-8 so that I can
read the correct length, but then I can't process the string with
regular expressions correctly, because Perl treats it as binary data,
not UTF-8.
I've also tried reading the string with the getc() function, but it is
unacceptably slow.
Is there any solution?
Thanks a lot,
--Sergei
 
Brian McCauley

Sergei said:
I need to read a string from a UTF-8 encoded text file.
I know the byte position at which the string starts and its length (also
in bytes).
The problem is that the read(FILEHANDLE, SCALAR, LENGTH) function takes
LENGTH in characters, not in bytes, when the file is opened as UTF-8.
I've tried opening the file in binary mode instead of UTF-8 so that I can
read the correct length, but then I can't process the string with
regular expressions correctly, because Perl treats it as binary data,
not UTF-8.
Is there any solution?

Read the string from file as binary and then utf8::decode() it.
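
A minimal sketch of that approach, assuming the byte offset and byte length are already known. The helper name read_utf8_span and its arguments are made up for illustration; only open(), seek(), read() and utf8::decode() are real built-ins.

```perl
use strict;
use warnings;

# Hypothetical helper: read $length bytes starting at byte $offset
# from $path and return them decoded as a character string.
sub read_utf8_span {
    my ($path, $offset, $length) = @_;

    # ':raw' keeps the handle in byte mode, so read() counts bytes.
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    seek($fh, $offset, 0)        or die "Cannot seek in $path: $!";

    my $got = read($fh, my $buf, $length);
    die "Read failed on $path: $!" unless defined $got;
    close($fh);

    # Convert the bytes to characters in place; false means invalid UTF-8.
    utf8::decode($buf)
        or die "Bytes at offset $offset are not well-formed UTF-8";
    return $buf;
}
```

The returned string is a character string, so length() and regular expressions should then work per character rather than per byte.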
 
Sergei

Brian McCauley said:
...
Read the string from file as binary and then utf8::decode() it.

Brian,
You are right. I did:
use Encode 'decode_utf8';
$Unicode = decode_utf8($bytes);
And it works!
Thanks a lot!
--Sergei
 
nobull

Sergei said:
use Encode 'decode_utf8';
$Unicode = decode_utf8($bytes);
And it works!

Yes, you can use Encode::decode_utf8() instead of the built-in
utf8::decode() if you like. Note: when called with a single argument,
Encode::decode_utf8() is simply a wrapper for utf8::decode().
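
For comparison, a small sketch of the two calls side by side. The sample bytes are made up for illustration. The visible difference in usage is that decode_utf8() returns a new string, while utf8::decode() changes its argument in place and returns a success flag.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Five raw bytes: "na", then U+00EF encoded as 0xC3 0xAF, then "f".
my $bytes = "na\xC3\xAFf";

# Encode::decode_utf8() returns the decoded character string.
my $chars = decode_utf8($bytes);

# utf8::decode() decodes a copy in place and returns true on success.
my $in_place = $bytes;
utf8::decode($in_place) or die "not well-formed UTF-8";

printf "%d bytes, %d characters\n", length($bytes), length($chars);

# After decoding, '.' matches the whole two-byte character.
print "regex sees one character\n" if $chars =~ /\Ana.f\z/;
```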
 
Sergei

nobull said:
Yes, you can use Encode::decode_utf8() instead of the built-in
utf8::decode() if you like. Note: when called with a single argument,
Encode::decode_utf8() is simply a wrapper for utf8::decode().

I didn't know I could use it like this.
This way it's even better.
Thanks!
(Others' messages were useful too. What an excellent thing this
newsgroup is! Thanks a lot, everybody!)
 
