Transforming German characters

steve_f

I want to transform special German characters to obtain the following
variations:

groß bräu
gross bräu
gross braeu

there are two sets -

set one:
ß = ss = \xDF

set two:
Ä = Ae = \xC4
Ö = Oe = \xD6
Ü = Ue = \xDC
ä = ae = \xE4
ö = oe = \xF6
ü = ue = \xFC

Basically, the rules are: transform ß independently,
and with set two, they are either all on or all off together.

I wrote the following, which works well, but I think it looks
pretty bad. So again, this is a style question:
can anyone suggest a cleaner approach? TIA

sub transform_characters {
    my @input = @_;
    my @output;
    for my $string (@input) {
        push @output, $string;
        if ($string =~ /\xDF/) {
            $string =~ s/\xDF/ss/g;
            push @output, $string;
            if (test_for_character($string)) {
                $string = swap_all($string);
                push @output, $string;
            }
            next;
        }
        if (test_for_character($string)) {
            $string = swap_all($string);
            push @output, $string;
        }
    }
    return @output;
}

sub test_for_character {
    my $string = shift;
    if ($string =~ /\xC4/ ||
        $string =~ /\xD6/ ||
        $string =~ /\xDC/ ||
        $string =~ /\xE4/ ||
        $string =~ /\xF6/ ||
        $string =~ /\xFC/) {
        return 1
    } else {
        return 0
    }
}

sub swap_all {
    my $string = shift;
    $string =~ s/\xC4/Ae/g;
    $string =~ s/\xD6/Oe/g;
    $string =~ s/\xDC/Ue/g;
    $string =~ s/\xE4/ae/g;
    $string =~ s/\xF6/oe/g;
    $string =~ s/\xFC/ue/g;
    return $string;
}
 
John W. Krahn

steve_f said:
I want to transform special German characters to obtain the following
variations:

groß bräu
gross bräu
gross braeu

there are two sets -

set one:
ß = ss = \xDF

set two:
Ä = Ae = \xC4
Ö = Oe = \xD6
Ü = Ue = \xDC
ä = ae = \xE4
ö = oe = \xF6
ü = ue = \xFC

basically, the rules are transform ß independently
and with set two, they are either all on or off together.

I wrote the following, which works well, but looks
pretty bad I think.

It doesn't look too bad, I've seen worse. :)

so again this is a style question...
can anyone suggest a cleaner approach? TIA

The usual idiom is to use a hash for the search and replace tables.

sub transform_characters {
    my @input = @_;
    my @output;
    for my $string (@input) {
        push @output, $string;
        if ($string =~ /\xDF/) {
            $string =~ s/\xDF/ss/g;

Using a match followed by a substitution is a common beginner mistake.
You only need the substitution: s///g returns the number of replacements
it made, which is false when nothing matched.

if ( $string =~ s/\xDF/ss/g ) {
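
To illustrate the point, here is a minimal sketch of using the result of s///g as the condition instead of matching first. The two sample strings and the variable names are made up for the example and are not from the original code.

```perl
use strict;
use warnings;

my @output;
for my $string ( "gro\xDF", "gross" ) {
    # Work on a copy so the loop does not try to modify the literal values.
    my $copy = $string;

    # s///g returns the number of replacements, or a false value if none
    # were made, so no separate match is needed beforehand.
    my $count = ( $copy =~ s/\xDF/ss/g );

    push @output, $copy if $count;
}
```

Only the first sample contains \xDF, so only one converted string ends up in @output.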

            push @output, $string;
            if (test_for_character($string)) {
                $string = swap_all($string);
                push @output, $string;
            }
            next;
        }
        if (test_for_character($string)) {
            $string = swap_all($string);
            push @output, $string;
        }
    }
    return @output;
}

[snip code]

Using a hash you could write that as:

my %set1 = (
    "\xDF" => 'ss',
);
# Use a character class because all keys are single characters
# If keys are multiple characters use alternation instead
my $key1 = '[' . join( '', keys %set1 ) . ']';

my %set2 = (
    "\xC4" => 'Ae',
    "\xD6" => 'Oe',
    "\xDC" => 'Ue',
    "\xE4" => 'ae',
    "\xF6" => 'oe',
    "\xFC" => 'ue',
);
my $key2 = '[' . join( '', keys %set2 ) . ']';

sub transform_characters {
    my @input = @_;
    my @output;
    for my $string ( @input ) {
        push @output, $string;
        if ( $string =~ s/($key1)/$set1{$1}/og ) {
            push @output, $string;
            if ( $string =~ s/($key2)/$set2{$1}/og ) {
                push @output, $string;
            }
            next;
        }
        if ( $string =~ s/($key2)/$set2{$1}/og ) {
            push @output, $string;
        }
    }
    return @output;
}
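
One possible way to drop the duplicated set-two branch is to treat the two tables as an ordered list of passes and push a new variant whenever a pass changes the string. This is only a sketch under a few assumptions: the helper class_for and the @passes list are hypothetical names, and qr// is swapped in for the /o modifier. It should give the same variants as the version above.

```perl
use strict;
use warnings;

my %set1 = ( "\xDF" => 'ss' );
my %set2 = (
    "\xC4" => 'Ae', "\xD6" => 'Oe', "\xDC" => 'Ue',
    "\xE4" => 'ae', "\xF6" => 'oe', "\xFC" => 'ue',
);

# Hypothetical helper: compile a capturing character class from a table
# whose keys are all single characters.
sub class_for {
    my ($table) = @_;
    my $chars = join '', sort keys %$table;
    return qr/([$chars])/;
}

# Passes run in order: set one first, then set two.
my @passes = (
    [ class_for( \%set1 ), \%set1 ],
    [ class_for( \%set2 ), \%set2 ],
);

sub transform_characters {
    my @output;
    for my $string (@_) {
        my $copy = $string;    # leave the caller's data untouched
        push @output, $copy;   # the original always comes first
        for my $pass (@passes) {
            my ( $re, $table ) = @$pass;
            # Push a new variant only if this pass changed something.
            push @output, $copy if $copy =~ s/$re/$table->{$1}/g;
        }
    }
    return @output;
}

# Three variants: the original, sharp s replaced, sharp s and umlauts replaced.
my @variants = transform_characters("gro\xDF br\xE4u");
```

Adding a third table would then only need one more entry in @passes rather than another nested if.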



John
 
Gunnar Hjalmarsson

steve_f said:
I want to transform special German characters to obtain the
following variations:

groß bräu
gross bräu
gross braeu

there are two sets -

set one:
ß = ss = \xDF

set two:
Ä = Ae = \xC4
Ö = Oe = \xD6
Ü = Ue = \xDC
ä = ae = \xE4
ö = oe = \xF6
ü = ue = \xFC

basically, the rules are transform ß independently
and with set two, they are either all on or off together.

As John said, there is no reason to look for the characters with
separate regexes, and accordingly there is no reason to distinguish
between two sets.

    for my $string (@input) {
        push @output, $string;

Here you copy the whole original text to @output ...

        if ($string =~ /\xDF/) {
            $string =~ s/\xDF/ss/g;
            push @output, $string;

... and here you *add* the converted string. In the suggestion below,
I'm assuming that was a mistake.

sub transform_characters {
    my @text = @_;

    my %replace = (
        "\xDF" => 'ss',
        "\xC4" => 'Ae',
        "\xD6" => 'Oe',
        "\xDC" => 'Ue',
        "\xE4" => 'ae',
        "\xF6" => 'oe',
        "\xFC" => 'ue',
    );

    for (@text) {
        s/(\xDF|\xC4|\xD6|\xDC|\xE4|\xF6|\xFC)/$replace{$1}/g;
    }

    @text
}

my @output = transform_characters(@input);
 
steve_f

Thanks Gunnar, some great stuff here... I can use simple
statements to just brute force things, but I know there is
a more elegant way.

As John said, there is no reason to look for the characters with
separate regexes, and accordingly there is no reason to distinguish
between two sets.

The ß can either be on or off independent of the others so
you can get:

groß bräu
gross bräu
gross braeu

I should have stated the problem more directly:

if set one      - set one on  & set two on
                - set one off & set two on
                - set one off & set two off

if only set two - set two all on
                - set two all off

Here you copy the whole original text to @output ...


... and here you *add* the converted string. In the suggestion below,
I'm assuming that was a mistake.

sub transform_characters {
    my @text = @_;

    my %replace = (
        "\xDF" => 'ss',
        "\xC4" => 'Ae',
        "\xD6" => 'Oe',
        "\xDC" => 'Ue',
        "\xE4" => 'ae',
        "\xF6" => 'oe',
        "\xFC" => 'ue',
    );

I really like the idea of the hash. Yes, I have heard you are not
thinking in Perl if you are not using hashes.
 
steve_f

Thank you John, this is really useful. Just to start, I must always remind
myself to generalize if I am doing something too many times.

John W. Krahn wrote:

[ snip - my statement of problem ]

It doesn't look too bad, I've seen worse. :)

I was able to brute force my way through it ;-)

The usual idiom is to use a hash for the search and replace tables.

Yes, I see, and it is very good... it changes the whole approach.

Using a match followed by a substitution is a usual beginner mistake.
You only need the substitution.

if ( $string =~ s/\xDF/ss/g ) {

ahh...ok...that's good to learn

[ snip code ]
Using a hash you could write that as:

my %set1 = (
"\xDF" => 'ss',
);
# Use a character class because all keys are single characters
# If keys are multiple characters use alternation instead

Can you explain this a bit further? I'm not quite sure what you mean
by alternation, but I really only looked up the escaped values for
this particular problem.

my $key1 = '[' . join( '', keys %set1 ) . ']';

Also, here I start to get really lost... OK, you are loading the keys
into a scalar as one long string, joining them with no space between,
and wrapping them in two brackets, so

$key1 = [\xDF]
$key2 = [\xC4\xD6\xDC\xE4\xF6\xFC]

correct?

I see you use it down below in this substitution but it is a bit hard
for me to understand:

if ( $string =~ s/($key1)/$set1{$1}/og )

Well, if you have the time, please give me a bit more clarification
on this because I haven't seen it before.

Thanks again John.

Steve
 
Joe Smith

Gunnar said:
Here you copy the whole original text to @output ...



... and here you *add* the converted string. In the suggestion below,
I'm assuming that was a mistake.

As I read it, steve_f wants to output three separate lines for each
line of input that has both sets of characters:

line 1 = original string
line 2 = string after doing just the ss substitution
line 3 = string after doing ss and all the other substitutions

If so, adding the converted string with a second and third push is correct.

-Joe
 
John W. Krahn

steve_f said:
can you explain this a bit further? I'm not quite sure what you mean
by alternation, but I really only looked up the escaped values for
this particular problem.

Gunnar's example uses alternation.

my $key1 = '[' . join( '', keys %set1 ) . ']';

Changing this to use alternation would look something like:

my $key1 = '(?:' . join( '|', keys %set1 ) . ')';
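
As a rough illustration of when alternation is needed, here is a sketch with a table that mixes single-character and multi-character keys. The 'str.' entry, the %replace name and the sample text are made up for this example. Passing each key through quotemeta and sorting longer keys first are precautions I am adding here, not something from the code above.

```perl
use strict;
use warnings;

# Hypothetical table: 'str.' is a made-up multi-character key.
my %replace = (
    "\xDF" => 'ss',
    "\xFC" => 'ue',
    'str.' => 'strasse',
);

# Longest keys first, so a longer key is tried before a shorter one that
# shares its start; quotemeta stops '.' from matching any character.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) or $a cmp $b }
    keys %replace;
my $pattern = qr/($alternation)/;

my $text = "Hauptstr. 7, gro\xDF";
( my $plain = $text ) =~ s/$pattern/$replace{$1}/g;
# $plain is now "Hauptstrasse 7, gross"
```

A character class could not express the 'str.' key, because a class only ever matches one character at a time.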

also here I start to get really lost....ok, you are loading into a scalar
the keys as one long string...joining them with no space between...
with two brackets so

$key1 = [\xDF]
$key2 = [\xC4\xD6\xDC\xE4\xF6\xFC]
correct?
Yes.


I see you use it down below in this substitution but it is a bit hard
for me to understand:

if ( $string =~ s/($key1)/$set1{$1}/og )

well, if you have the time please give me a bit more clarification
on this because I haven't seen it before.

The substitution and match operators interpolate variables in the pattern
as in a double-quoted string, and the replacement $set1{$1} is looked up
for each match. Since set one has only one key, the effect is the same as:

if ( $string =~ s/([\xDF])/ss/g )
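
To see both halves of such a substitution at work, here is a small sketch using a cut-down set two. The sample string and the @seen array are assumptions for illustration. The pattern is built from the interpolated $key2, while the replacement is looked up again for every match. The /e modifier is used here only to make that per-match lookup visible.

```perl
use strict;
use warnings;

my %set2 = ( "\xE4" => 'ae', "\xF6" => 'oe', "\xFC" => 'ue' );
my $key2 = '[' . join( '', sort keys %set2 ) . ']';

my $string = "sch\xF6n gr\xFCn";
my @seen;

# With /e the replacement is evaluated as code on every match: record the
# captured character, then return its hash value as the replacement text.
my $count = $string =~ s{($key2)}{
    push @seen, $1;
    $set2{$1};
}ge;

# $string is now "schoen gruen", $count is 2, and @seen holds the two
# characters that were captured along the way.
```

Without /e, writing $set2{$1} in the replacement does the same lookup, just without the extra bookkeeping.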


John
 
steve_f

The substitution and match operators interpolate variables like double
quoted strings so after interpolation the substitution operator sees:

if ( $string =~ s/([\xDF])/ss/g )

Ahhh... all very fancy stuff, but I got it! Thanks for
showing me this ;-)
 
