Question about language setting

Discussion in 'Perl Misc' started by Dave Saville, Dec 23, 2013.

  1. Dave Saville

    Dave Saville Guest

    Hi Ben
    Well it was actually setting LC_MONETARY due to the locale.h mistake.
    I am not that surprised. An app would need to be a) compiled with the
    faulty libc, b) run in a locale where the separator was not a period
    and c) actually try and change LC_NUMERIC/LC_MONETARY - Very low
    probability I would say. With respect, I would say that treating a
    version number as anything other than a string was not a very good
    idea. A quick split on not a digit?
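
A minimal sketch of the "split on not a digit" idea, assuming plain dotted version strings such as "5.16.0". The names version_parts and compare_versions are made up for illustration; this is not how perl's own version handling works, only one way to compare versions without ever formatting them as floating-point numbers.

```perl
use strict;
use warnings;

# Split a version string on runs of non-digits and drop empty pieces,
# so "v5.16.0" and "5.16.0" both give (5, 16, 0).
sub version_parts {
    my ($version) = @_;
    return grep { length } split /\D+/, $version;
}

# Compare two version strings component by component.
# Missing components count as 0, so "1.0" equals "1".
sub compare_versions {
    my @left  = version_parts($_[0]);
    my @right = version_parts($_[1]);
    while (@left || @right) {
        my $l = @left  ? shift @left  : 0;
        my $r = @right ? shift @right : 0;
        my $cmp = $l <=> $r;
        return $cmp if $cmp;
    }
    return 0;
}

print compare_versions('5.16.0', '5.8.2'), "\n";   # prints 1
```

Because only integers are compared, the decimal separator of the current locale never enters into it.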
     
    Dave Saville, Dec 31, 2013
    #21

  2. Neither had I[*]. But

    setlocale(LC_NUMERIC, "de_DE");

    is supposed to switch to 'Germanly formatted numerals' (with apologies
    to people who care about grammer :) and if it can't because the
    necessary information is not available, it didn't work as it was
    supposed to.

    [*] After a short and happy intermezzo in 1998/99, I've grudgingly come
    to accept that there are two kinds of people on this planet:

    - those who write using Letters which is exactly everything
    available on a US-QWERTY keyboard

    - weird aborigines painting bizarre ideograms they attach some
    uninteresting meaning to we have to reproduce on computer
    displays to avoid alienating potential customers

    and have henceforth dutifully restricted myself to ASCII in writing.
     
    Rainer Weikusat, Jan 1, 2014
    #22
  4. According to what standard? ISO C only defines the "C" locale; others
    are implementation-defined. POSIX adds "POSIX" as a synonym for "C".

    [...]
     
    Keith Thompson, Jan 5, 2014
    #24
  5. The Open Group Base Specification Issue 7; IEEE Std 1003.1, 2013 Edition:

    | [XSI] [Option Start]
    | If the locale value has the form:
    |
    | language[_territory][.codeset]
    |
    | it refers to an implementation-provided locale, where settings of
    | language, territory, and codeset are implementation-defined.
    |
    | LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME
    | are defined to accept an additional field @ modifier, which allows the
    | user to select a specific instance of localization data within a single
    | category (for example, for selecting the dictionary as opposed to the
    | character ordering of data). The syntax for these environment variables
    | is thus defined as:
    |
    | [language[_territory][.codeset][@modifier]]
    |
    | For example, if a user wanted to interact with the system in French, but
    | required to sort German text files, LANG and LC_COLLATE could be defined
    | as:
    |
    | LANG=Fr_FR
    | LC_COLLATE=De_DE
    |
    | This could be extended to select dictionary collation (say) by use of
    | the @ modifier field; for example:
    |
    | LC_COLLATE=De_DE@dict
    | [Option End]

    So it's an optional extension to POSIX.

    The format for the language, territory and codeset specifiers doesn't
    seem to be specified, but the examples suggest ISO 639-1 for languages
    and ISO-3166 for territories, and I think pretty much all current
    unix-like systems follow these examples.
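
As an illustration of the quoted syntax, here is a small sketch that splits a locale name of the form language[_territory][.codeset][@modifier] into its fields. The function name parse_locale_name is hypothetical, and the sketch only looks at the shape of the string; it does not check that the language or territory codes are valid.

```perl
use strict;
use warnings;

# Split "language[_territory][.codeset][@modifier]" into its fields.
# Returns a hash reference, or nothing if the string has a different shape.
sub parse_locale_name {
    my ($name) = @_;
    return unless $name =~ m{
        \A ([^_.\@]+)          # language
        (?: _  ([^.\@]+) )?    # optional territory
        (?: \. ([^\@]+)  )?    # optional codeset
        (?: \@ (.+)      )?    # optional modifier
        \z
    }x;
    return {
        language  => $1,
        territory => $2,
        codeset   => $3,
        modifier  => $4,
    };
}

my $parsed = parse_locale_name('De_DE@dict');
print "$parsed->{language} $parsed->{territory} $parsed->{modifier}\n";   # prints "De DE dict"
```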

    hp

    [1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08
     
    Peter J. Holzer, Jan 5, 2014
    #25
  6. No, but it does say that if it exists, it must be a locale suitable for
    the language "de" and the territory "DE". It doesn't say that "de" means
    German and "DE" means Germany, but I already wrote that.
    Well, "de" could mean "Dublin English", and then it would probably refer to a
    locale where the decimal separator is ".". But that is clearly being
    facetious: While some systems may have alternate names for the languages
    and territories (HP-UX 9.x used full names instead of abbreviations, and
    Debian still has "deutsch" and "german" as aliases for de_DE.iso88591),
    it is not reasonable to assume that the language code "de" refers to any
    other language than German and that the territory code "DE" refers to
    any other country than Germany. And it is not reasonable to assume that
    a German locale for Germany in 2014[1] could prescribe any other decimal
    separator than ",".

    So I agree with Rainer here. On a modern POSIX system,
    setlocale(LC_NUMERIC, "de_DE") must either set the decimal separator to
    "," or fail.

    hp

    [1] Historically the decimal separator hasn't been that uniform: The
    "WIFO-Monatsberichte" of the Austrian Institute of Economic Research
    have used at least 3 different separators between 1927 and now. My
    mother still uses a dot instead of a comma (I do, too, but for a
    different reason).
     
    Peter J. Holzer, Jan 5, 2014
    #26
  7. Actually, perl is not admitting the possibility of the 'or fail' part
    because the code in question doesn't check the return value of
    setlocale. But that's somewhat of a useless discussion because de_DE is
    the locale-code for German for all real systems which happened to figure in
    this thread (Debian, Illumos and OS/2) and hence, if
    'setlocale(LC_NUMERIC, "de_DE")' does not switch to a German locale with , instead
    of a decimal point, "it didn't do what it was supposed to do". There may
    be various reasons for that, 'German locale information unavailable'
    being among them, but since debugging a problem which occurs on a system
    with this information when it is being used is not really possible
    without it, I took the liberty of assuming that it would be available.
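
For what it is worth, the 'or fail' case can be observed from Perl code. The following is only a sketch, assuming a perl whose POSIX module loads correctly; decimal_point_for is a made-up helper name. It checks the return value of setlocale and reports the decimal separator the C library claims for the locale.

```perl
use strict;
use warnings;
use POSIX qw(setlocale localeconv LC_NUMERIC);

# Try to switch LC_NUMERIC to the named locale and report its decimal
# separator. Returns undef if setlocale() refuses the name.
sub decimal_point_for {
    my ($name) = @_;
    my $old = setlocale(LC_NUMERIC);          # query only, no change
    my $set = setlocale(LC_NUMERIC, $name);   # undef on failure
    return undef unless defined $set;
    my $point = localeconv()->{decimal_point};
    setlocale(LC_NUMERIC, $old);              # put things back
    return $point;
}

for my $name ('C', 'de_DE') {
    my $point = decimal_point_for($name);
    print defined $point
        ? "$name: decimal point is '$point'\n"
        : "$name: setlocale failed\n";
}
```

Whether "de_DE" is accepted depends on which locales are installed; some systems only provide names with a codeset suffix.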
     
    Rainer Weikusat, Jan 6, 2014
    #27
  8. Dave Saville

    Dave Saville Guest

    Well we have found the problem, conflicting .h file definitions, and
    rebuilt 5.163 which appears to be OK. But I would like to program
    round it if possible.

    The problem occurs when the decimal separator is a comma. So, going on
    your previous suggestion,

    use strict;
    use warnings;
    BEGIN:
    {
        if ( sprintf("%f", 2.5) =~ m{\,} )
        {
            print "oh dear\n";
            $ENV{LANG} = 'C';
        }
    }
    use Encode;
    print "Hello world\n";

    But that fails too :-(

    What is needed is not to process the use Encode - which triggers the
    error, before I have a chance to fix it by setting to C. Or is that
    not possible?
     
    Dave Saville, Jan 8, 2014
    #28
  9. Dave Saville

    Dave Saville Guest

    Thanks Ben, Not used BEGIN before and I guess I automatically typed a
    colon after a "label" :)
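
A small sketch of why the colon matters: a word followed by a colon is an ordinary statement label, so the block after it is a bare block that runs at run time, while a real BEGIN block runs as soon as it has been compiled. The label name SETUP is arbitrary and only used for illustration.

```perl
use strict;
use warnings;

our @order;

SETUP: {                 # label plus bare block: executed at run time
    push @order, 'labelled block';
}

BEGIN {                  # real BEGIN block: executed during compilation
    push @order, 'BEGIN block';
}

print join(', ', @order), "\n";   # prints "BEGIN block, labelled block"
```

Any use line is processed during compilation as well, so it is handled before a labelled block gets a chance to run, wherever that block appears in the file.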

    I think it will now - a quick test works here. But I have a problem
    setting a test case environment to match that of the guy who first hit
    the problem so I am mailing him test scripts to try.
    It's not.
    Last resort :)
     
    Dave Saville, Jan 8, 2014
    #29
  10. [...]
    Considering

    It is exactly equivalent to

    BEGIN { require Module; Module->import( LIST ); }
    [perldoc -f use]

    making that

    BEGIN {
        if ( sprintf("%f", 2.5) =~ m{\,} )
        {
            print "It's an invasion!\n";
            $ENV{LANG} = 'C';
        }

        require Encode;
        Encode->import();
    }

    might make sense. Or possibly (untested)

    BEGIN {
        local $ENV{LANG} = 'C' if sprintf("%f", 2.5) =~ m{\,};

        require Encode;
        Encode->import();
    }

    as this would restrict the modified environment to this block.
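
The scoping part of that claim can be checked in isolation. This sketch uses a made-up environment variable, DEMO_LANG, so that it does not depend on the real locale setup; it only shows that a value set with local inside a BEGIN block is gone once the block ends. It says nothing about whether the C library notices the change.

```perl
use strict;
use warnings;

our ($inside, $outside);

BEGIN { $ENV{DEMO_LANG} = 'nl_NL' }      # DEMO_LANG is a hypothetical variable

BEGIN {
    local $ENV{DEMO_LANG} = 'C';         # override limited to this block
    $inside = $ENV{DEMO_LANG};
}

BEGIN { $outside = $ENV{DEMO_LANG} }     # the override has been undone here

print "$inside $outside\n";              # prints "C nl_NL"
```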
    This could itself be put into a module, eg

    package SafeEncode;

    BEGIN {
     
    Rainer Weikusat, Jan 8, 2014
    #30
  11. Semantically, yes. But in this case, all the code which logically
    belongs together is contained in the begin block.
    It's the purpose of the locale setting to affect numerical
    formatting. Hence, if it has to be disabled/ overridden somewhere in
    order to avoid a bug, this override should affect the codepath
    triggering the bug, not any other, perfectly harmless one which happens
    to format a number (or do something else which is influenced by the
    locale).
    This is a perfectly normal and documented way to invoke a subroutine
    after some other processing has been performed without the subroutine
    being able to notice that an intermediate subroutine ran, cf

    The "goto-&NAME" form is quite different from the other forms of
    "goto". In fact, it isn't a goto in the normal sense at all,
    and doesn't have the stigma associated with other gotos.
    Instead, it exits the current subroutine (losing any changes set
    by local()) and immediately calls in its place the named
    subroutine using the current value of @_. This is used by
    "AUTOLOAD" subroutines that wish to load another subroutine and
    then pretend that the other subroutine had been called in the
    first place (except that any modifications to @_ in the current
    subroutine are propagated to the other subroutine.)

    But this will kill the local (I didn't think about that), hence, it
    won't work in this case. Apart from that, you're absolutely free to
    cultivate a philosophical dislike for any particular Perl feature (and
    to argue against it) and everyone else is as perfectly free to
    consider your opinion misguided and the arguments in favor of it
    unconvincing.
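
The effect of goto &NAME on local can be shown with a short sketch (all names here are made up for the demonstration): a normal call still sees the localized value, whereas goto &NAME leaves the current subroutine first, which undoes the local.

```perl
use strict;
use warnings;

our $setting = 'outer';

sub report { return $setting }

sub via_call {
    local $setting = 'inner';
    return report();         # local is still in effect during the call
}

sub via_goto {
    local $setting = 'inner';
    goto &report;            # exits via_goto first, which undoes the local
}

print via_call(), "\n";      # prints "inner"
print via_goto(), "\n";      # prints "outer"
```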
     
    Rainer Weikusat, Jan 8, 2014
    #31
  12. Tim McDaniel

    Tim McDaniel Guest

    I just ran

    $ perl -w -e 'use strict;BEGIN {print "hi\n";} print "real\n"; BEGIN();print "end\n"'
    hi
    real
    end

    "sub" before "BEGIN" does not change the behavior. The same happens
    for CHECK, INIT, and UNITCHECK. For END, the END() call similarly
    does nothing, so it's real, end, hi.

    So it appears to me that they're far from real subs:
    - they do allow the "sub" keyword
    - they have code blocks, but that's not unique to subs
    - they are invoked automatically
    - you can define them multiple times without "Subroutine ___
    redefined", but unlike subs, the code blocks are concatenated rather
    than replaced
    - calling them neither causes "Undefined subroutine" nor causes code
    to run
    - you can stringize \&BEGIN and get "CODE(0xbb9455c0)" or whatever,
    but if you try to call any of them, you get "Undefined subroutine
    &main::BEGIN called" vel sim.
     
    Tim McDaniel, Jan 8, 2014
    #32
  13. Tim McDaniel

    Tim McDaniel Guest

    To clarify,

    I meant "calling them directly like 'BEGIN();'".
    I meant "calling them via a reference like 'my $x = \&BEGIN; $x->();'".
     
    Tim McDaniel, Jan 8, 2014
    #33
  14. Tim McDaniel

    Tim McDaniel Guest

    #! /usr/bin/perl
    use warnings;
    use strict;

    BEGIN {
        print "in begin\n";
        *BEGIN = sub { print "in subbegin\n"; };
    }
    exit 0;

    $ perl local/test/108.pl
    in begin
    $

    Yeah, not a BEGIN block. As you stated, but still, harrumph.

    ("&BEGIN();" just before exit outputs "subbegin", as I expected.)
     
    Tim McDaniel, Jan 9, 2014
    #34
  15. Dave Saville

    Dave Saville Guest

    <snip>

    It would appear that you can't trap this. :-(

    I have tried with 5.8.2 and 5.16.0 and it would appear that in the
    former case perl sets up its locale stuff *before* it ever gets around
    to BEGIN and in either case setting any environmentals in BEGIN has no
    effect.

    use strict;
    use warnings;
    BEGIN
    {
        if ( 2.5 ne "2.5" )
        {
            printf STDERR "oh dear %f\n", 2.5;
        }
        printf STDERR "BEGIN\t%s\n", $ENV{LANG};
        $ENV{LANG} = 'en_GB';
    }
    use Encode;
    printf STDERR "MAIN\t%s \n", $ENV{LANG};


    5.8.2

    [T:\tmp]set lang=nl_NL

    [T:\tmp]try.pl
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
    LC_ALL = (unset),
    LANG = "nl_NL"
    are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    BEGIN nl_NL 2.500000
    MAIN en_GB 2.500000


    5.16.0

    oh dear 2,500000
    BEGIN nl_NL 2,500000
    Invalid version format (non-numeric data) at
    u:/perl5/lib/5.16.0/constant.pm line 2.
    BEGIN failed--compilation aborted at u:/perl5/lib/5.16.0/constant.pm
    line 2.
    Compilation failed in require at u:/perl5/lib/5.16.0/os2/Encode.pm
    line 8.
    BEGIN failed--compilation aborted at u:/perl5/lib/5.16.0/os2/Encode.pm
    line 8.
    Compilation failed in require at try.pl line 15.
    BEGIN failed--compilation aborted at try.pl line 15.

    So I think this cannot be fixed from *inside* a perl script. Although
    setlocale() would fix the problem, using it trips the problem in the
    first place. :-(

    Thanks for all the help and discussion.
     
    Dave Saville, Jan 10, 2014
    #35
  16. Not in this way, at least, as the locale-information from the
    environment is applied to an actual process via

    setlocale(LC_ALL, "");
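
This can be checked with a small sketch, assuming a perl where the POSIX module loads: changing the environment variables by itself leaves the locale of the running process alone, because nothing re-reads them until setlocale is called again with an empty string.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

my $before = setlocale(LC_ALL);      # one argument: query, no change

my $during;
{
    local $ENV{LANG}   = 'C';
    local $ENV{LC_ALL} = 'C';
    $during = setlocale(LC_ALL);     # still whatever was chosen at startup
}

print "environment change alone had no effect\n" if $before eq $during;
```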

    But have you tried to load the Encode module with LC_ALL temporarily
    reset to "C locale", ie somewhat like this[*]:

    ------------
    use POSIX qw(locale_h);
    use locale;

    BEGIN {
        setlocale(LC_ALL, 'C');
        require mod;
        setlocale(LC_ALL, '');
    }
    ------------

    [*] In case you're unconditionally overwriting the user's locale,
    anyway, documenting this "I18N is not supported and using something
    other than the "C" locale may or may not work" might be a more
    honest way of dealing with this.

    On UNIX(*) etc, you could also do something like this:

    -------------
    #!/usr/bin/perl
    use POSIX qw(locale_h);
    use locale;

    BEGIN {
        setlocale(LC_ALL, '');
        if (sprintf('%f', 2.5) =~ /,/) {
            print STDERR ("You won't spoil my precious bodily fluids!\n");

            $ENV{LANG} = 'C';
            exec($0, @ARGV);
        }
    }

    printf("%f\n", 2.5);
    -------------
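
One caveat with re-executing: if LC_ALL or LC_NUMERIC is set in the environment, changing only LANG does not help and the script would restart itself forever. A guarded variant might look like the sketch below. LOCALE_REEXEC_DONE is a made-up marker variable, should_reexec and formatted_sample are hypothetical helper names, and for brevity the check runs at run time here; to protect a later 'use' it would have to sit in a BEGIN block as in the example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

# Decide whether to restart: only if a comma shows up in the formatted
# number and we have not already restarted once.
sub should_reexec {
    my ($formatted, $env) = @_;
    return 0 if $env->{LOCALE_REEXEC_DONE};   # hypothetical marker variable
    return $formatted =~ /,/ ? 1 : 0;
}

# Format a number under the locale taken from the environment.
sub formatted_sample {
    use locale;
    setlocale(LC_ALL, '');
    return sprintf('%f', 2.5);
}

if (should_reexec(formatted_sample(), \%ENV)) {
    # LC_ALL and LC_NUMERIC take precedence over LANG, so set all of them.
    @ENV{qw(LC_ALL LC_NUMERIC LANG LOCALE_REEXEC_DONE)} = ('C', 'C', 'C', 1);
    exec($^X, $0, @ARGV) or die "exec failed: $!\n";
}

printf("%f\n", 2.5);
```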
     
    Rainer Weikusat, Jan 10, 2014
    #36
  17. The setlocale C library function could also be made available via XS[*] or
    Inline::C.

    [*] It is actually not really difficult to combine XS/C code and Perl
    code without jumping through the hoop of creating a full-fledged
    extension module, ie, I have a module here which can be used like this:

    use MAD::xso_loader '/path/to/shared_object.so';

    which creates an AUTOLOAD subroutine in the package using it which tries
    to locate an otherwise undefined function in the shared object or
    objects.
     
    Rainer Weikusat, Jan 10, 2014
    #37
  18. Dave Saville

    Dave Saville Guest

    Fails - use POSIX falls down the same bear trap :-(

    [T:\tmp]try.pl
    Invalid version format (non-numeric data) at
    u:/perl5/lib/5.16.0/Exporter.pm line 3.
    Compilation failed in require at u:/perl5/lib/5.16.0/os2/Fcntl.pm line
    61.
    Compilation failed in require at u:/perl5/lib/5.16.0/os2/POSIX.pm line
    17.
    BEGIN failed--compilation aborted at u:/perl5/lib/5.16.0/os2/POSIX.pm
    line 17.
    Compilation failed in require at try.pl line 3.
    BEGIN failed--compilation aborted at try.pl line 3.
     
    Dave Saville, Jan 10, 2014
    #38
  19. "Perfection is the enemy of the good": I'm fine with using XS and all I
    really want is 'implement a function (or some functions)' in XS/C and
    link that together with an existing Perl program without either going through
    'all of the h2xs stuff' or 'relying on transparent runtime
    compilation/ re-compilation' (and autogenerated XS code), eg (actual
    example), in some application, I need to decompose an IPv4 address range
    into proper networks. There's an efficient algorithm for that (I
    invented, although I likely wasn't the first one to do so) but it needs
    access to 'fast' bit scanning operations usually available as machine
    instruction and provided as 'gcc builtins' in a somewhat more portable
    way. Enter

    ----------
    /*
    provide access to __builtin_clz
    and ffs routines

    $Id: bit_scan.xs,v 1.3 2011-12-08 21:38:49 rw Exp $
    */

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    #include <strings.h>

    MODULE = lib

    int
    do_ffs(v)
        unsigned v
    CODE:
        RETVAL = ffs(v);
    OUTPUT:
        RETVAL

    int do_fls(v)
        unsigned v
    CODE:
        RETVAL = 32 - __builtin_clz(v);
    OUTPUT:
        RETVAL
    -----------
    (this code is owned by my employer and quoted for educational purposes)
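
For readers wondering where bit scanning comes in, here is a pure-Perl sketch of the textbook way to decompose an IPv4 range into networks. It is not the algorithm mentioned above, which is not shown in this thread; ffs_pp and fls_pp are slow stand-ins written for this example in place of the XS functions, and the other names are made up as well. A perl with 64-bit integers is assumed.

```perl
use strict;
use warnings;

sub ip2int { my @o = split /\./, $_[0]; return ($o[0] << 24) | ($o[1] << 16) | ($o[2] << 8) | $o[3] }
sub int2ip { my $n = $_[0]; return join '.', map { ($n >> $_) & 0xff } 24, 16, 8, 0 }

# 1-based position of the lowest set bit (0 if no bit is set), like ffs().
sub ffs_pp { my $v = $_[0]; return 0 unless $v; my $i = 1; until ($v & 1) { $v >>= 1; $i++ } return $i }

# 1-based position of the highest set bit (0 if no bit is set).
sub fls_pp { my $v = $_[0]; my $i = 0; while ($v) { $v >>= 1; $i++ } return $i }

# Repeatedly emit the largest network that starts at $first, is aligned
# on its own size and does not reach past $last.
sub range_to_cidrs {
    my ($first, $last) = map { ip2int($_) } @_;
    my @nets;
    while ($first <= $last) {
        my $align_bits = $first ? ffs_pp($first) - 1 : 32;   # alignment of the start address
        my $fit_bits   = fls_pp($last - $first + 1) - 1;     # largest power of two that fits
        my $bits = $align_bits < $fit_bits ? $align_bits : $fit_bits;
        push @nets, int2ip($first) . '/' . (32 - $bits);
        $first += 2 ** $bits;
    }
    return @nets;
}

print join(' ', range_to_cidrs('10.0.0.1', '10.0.0.6')), "\n";
# prints "10.0.0.1/32 10.0.0.2/31 10.0.0.4/31 10.0.0.6/32"
```

Each step needs one 'lowest set bit' and one 'highest set bit' query, which is where fast machine instructions pay off.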
    I've also turned the racoon parser into a shared library so that I could
    make an extension module out of that and I considered doing this in
    order to provide automatic access to the various racoon
    structures. After reading through the DWARF specification, however, I
    quickly abandoned this idea in favour of writing functions creating 'Perl
    data structures' from selected parts of the racoon ones by hand ...
     
    Rainer Weikusat, Jan 10, 2014
    #39
  20. Dave Saville

    Dave Saville Guest

    Hi Ben

    Argument "LC_ALL" isn't numeric in subroutine entry at
    d:/usr/lib/perl/lib/5.16.0/OS2/POSIX.pm line 2.

    I could of course hard code the value from the .h file :)

    Finding the correct path on OS/2 is almost certainly *not* going to be
    a problem. Because perl for OS/2 is a binary and because OS/2 uses
    drive letters the chances of anyone installing in the same location as
    the guy who built perl are very small. Therefore we make use of
    PERLLIB_PREFIX to find everything. However, it occurs to me that there
    would be no way to build that # line on the fly. Not only to get the
    path but also the perl version correct.
     
    Dave Saville, Jan 11, 2014
    #40