Internationalisation support and dictionaries

Broke

Hello,

I am a beginner, so please be indulgent.
I wanted to build a kind of dictionary from a French text file,
so I wrote the following script.
Everything works, but the sorted list comes out in US-ASCII encoding.
How can I make it work for accented letters?
Any help will be appreciated.
Here is my humble script.
=======
#!/usr/bin/perl -w
use warnings;
local $/;
use locale;
use utf8;
$file = '/Users/Broke/Desktop/data.txt';
open (IN, $file) or die "$file not found\n : $!\n";
@data = ();
%seen = ();
while (<IN>) {
foreach $word (m/(\b.+?\b)/gi) {
unless ($seen{$word}) {
$seen{$word} = 1;
push(@data, $word);
}
}
}
close (IN) or die "Can't close $file : $!\n";
@data = sort(@data);
@data = map $_ . "\n", @data;
open (OUT, ">/Users/Broke/Desktop/out.txt") or die "Can't create\n : $!\n";
select (OUT);
print @data;
close (OUT);
========
B.
 
anno4000

Broke said:
Hello,

I am a beginner, so please be indulgent.
I wanted to build a kind of dictionary from a French text file,
so I wrote the following script.
Everything works, but the sorted list comes out in US-ASCII encoding.
How can I make it work for accented letters?

Use the locale pragma. See perldoc perllocale for a general description
and perldoc locale for specifics.
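As a rough illustration of what the locale pragma does for sorting, here is a minimal sketch. It assumes a French UTF-8 locale named fr_FR.UTF-8 is installed (the name varies between systems, and the word list is made up for the example). How well accented characters collate under a UTF-8 locale also depends on the Perl version, so treat the resulting order as something to check on your own machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                            # the word list below contains literal accented characters
use locale;                          # make sort/cmp honour LC_COLLATE
use POSIX qw(setlocale LC_COLLATE);

# Assumption: this locale name exists on the system; adjust as needed.
my $wanted = 'fr_FR.UTF-8';
my $got    = setlocale(LC_COLLATE, $wanted);
warn "Locale $wanted not available; keeping the current collation\n"
    unless defined $got;

# Hypothetical sample words
my @words  = ('zèbre', 'été', 'arbre', 'école');

# With no comparison block, sort uses cmp, which follows the locale here
my @sorted = sort @words;

binmode(STDOUT, ':encoding(UTF-8)');
print "$_\n" for @sorted;
```

If setlocale returns undef, the script still runs but sorts with whatever collation was already in effect.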

Anno
 
Broke

Many thanks to you, Michele, for your help.
Thank you also for pointing out that the
dot will also capture spaces. That's true.
It's better written with \w+ or
[[:alnum:]]+
instead of the dot.
Thank you also for the other hints.
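To make the point about the dot concrete, here is a small sketch comparing the pattern from the original script with \w+. The sample string is invented for the example, and `use utf8` is needed only because that string is written directly in the source.

```perl
use strict;
use warnings;
use utf8;   # the sample string below contains literal accented characters

my $text = 'le café été';   # hypothetical sample text

# Pattern from the original script: the dot also matches the spaces between words
my @with_dot  = $text =~ /(\b.+?\b)/g;

# Suggested replacement: only runs of word characters are captured
my @with_word = $text =~ /(\w+)/g;

binmode(STDOUT, ':encoding(UTF-8)');
print 'dot:  [', join('|', @with_dot),  "]\n";
print 'word: [', join('|', @with_word), "]\n";
```

With the dot, the single spaces show up as separate "words" in the list; with \w+ only the three words remain.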

Please don't forget that my problem is that
I want to extract French words with
diacritics, and that I only get the
words without diacritics among all the
words that I would like to extract.

As Anno points out, this is a matter
of the locale pragma.

I will reinstall the operating system.
It seems I forgot that I chose
US English as my default language when
installing the operating system.
Very fortunately, with "Apple" I am not
forced to reformat.

Many thanks again and have a nice day!
 
Broke

Many thanks to you, Anno.
You were right.
I will investigate this problem.
Thanks again!
-
B.
 
Mumia W.

Broke said:
Hello,

I am a beginner, so please be indulgent.
I wanted to build a kind of dictionary from a French text file,
so I wrote the following script.
Everything works, but the sorted list comes out in US-ASCII encoding.
How can I make it work for accented letters?
Any help will be appreciated.
Here is my humble script.
=======
#!/usr/bin/perl -w
use warnings;
local $/;
use locale;
use utf8;
$file = '/Users/Broke/Desktop/data.txt';
open (IN, $file) or die "$file not found\n : $!\n";
[...]

You can set an encoding for the 'open' command:

open (IN, '<:utf8', $file) or die (...

Read about the 'open' command and Perl:

Start->Run->"perldoc -f open"
Start->Run->"perldoc perl"
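For what it's worth, here is a self-contained sketch of the same idea applied to the whole task. It uses the stricter ':encoding(UTF-8)' layer rather than ':utf8', lexical filehandles, and a temporary file with made-up sample text in place of data.txt, so none of these names come from the original script.

```perl
use strict;
use warnings;
use utf8;                       # the sample text below contains literal accented characters
use File::Temp qw(tempfile);

# Hypothetical input: write a small UTF-8 file to read back
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh, ':encoding(UTF-8)');
print {$tmp_fh} "Été à la mer, été à la montagne.\n";
close($tmp_fh) or die "Can't close $tmp_name: $!\n";

# Decode the bytes as UTF-8 while reading, so \w sees whole characters
open(my $in, '<:encoding(UTF-8)', $tmp_name)
    or die "Can't open $tmp_name: $!\n";

my %seen;
my @data;
while (my $line = <$in>) {
    for my $word ($line =~ /(\w+)/g) {
        push @data, $word unless $seen{$word}++;   # keep the first occurrence only
    }
}
close($in) or die "Can't close $tmp_name: $!\n";

# Encode again on output; plain sort orders by code point unless a locale is in effect
binmode(STDOUT, ':encoding(UTF-8)');
print "$_\n" for sort @data;
```

An output file would need the matching layer too, for example '>:encoding(UTF-8)' in its open call.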
 
Broke

SUPER !!!!
It is exactly this that I needed !!
My friend I am extremely glad !
It works !!!
All my problems are solved thanks to YOU !!!

:^-)

Many Many Many Many Many Many thanks to you !!
--
B.
