POSIX enhancements to printf

James Kuyper

On 02/27/2014 09:05 AM, BartC wrote:
....
I was writing software for use in Europe twenty years ago (predating a lot
of this regional/locale stuff too).

Twenty years ago is not long enough to predate locales; <locale.h> was
part of the very first C standard in 1989, with much of its current
functionality. I'm sure it long pre-dates that standard, though I've not
been able to locate a precise starting date.
Perhaps it's the same now (does a printed floating point value use something
other than a period as a decimal point in certain locales?)

Yes, in many European countries it's conventional to use the comma as
the decimal point, and to use a period as the thousands separator. I
first learned about that alternate convention while reading an advanced
math book more than 40 years ago, long before I even dreamed of
programming computers.
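
A program can ask the C library which separators the current locale uses instead of guessing. The following is only a minimal sketch: `localeconv()`, `setlocale()` and the `decimal_point` and `thousands_sep` members are standard C, while the helper names `current_decimal_point` and `show_numeric_conventions` are made up for this illustration.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the decimal point of the current LC_NUMERIC
   locale into buf. Returns 0 on success, -1 if buf is too small. */
int current_decimal_point(char *buf, size_t size)
{
    assert(buf != NULL);
    const struct lconv *lc = localeconv();
    if (strlen(lc->decimal_point) + 1 > size)
        return -1;
    strcpy(buf, lc->decimal_point);
    return 0;
}

/* Hypothetical helper: switch LC_NUMERIC to the named locale and print its
   separators. setlocale() returns NULL when the locale is not installed. */
void show_numeric_conventions(const char *name)
{
    if (setlocale(LC_NUMERIC, name) == NULL) {
        printf("%-12s (not installed)\n", name);
        return;
    }
    const struct lconv *lc = localeconv();
    printf("%-12s decimal_point=\"%s\" thousands_sep=\"%s\"\n",
           name, lc->decimal_point, lc->thousands_sep);
}
```

In the "C" locale the decimal point is "." and `thousands_sep` is the empty string; what any other locale name reports depends on which locales are installed on the system.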
 
Keith Thompson

BartC said:
However I'm asking whether C will directly print, for example, floating
point values with a comma and whatever other local conventions demand.

Yes. Here's a program I just ran on my system (Linux Mint 14):

#include <stdio.h>
#include <locale.h>
int main(void) {
    const char *const locales[] = {
        "C", "C.UTF-8", "POSIX", "en_AG", "en_AG.utf8", "en_AU.utf8",
        "en_BW.utf8", "en_CA.utf8", "en_DK.utf8", "en_GB.utf8",
        "en_HK.utf8", "en_IE.utf8", "en_IN", "en_IN.utf8", "en_NG",
        "en_NG.utf8", "en_NZ.utf8", "en_PH.utf8", "en_SG.utf8",
        "en_US.utf8", "en_ZA.utf8", "en_ZM", "en_ZM.utf8",
        "en_ZW.utf8", "zh_CN.utf8", "zh_SG.utf8"
    };

    /* Select each locale's numeric conventions in turn, then print the
       same value using the POSIX ' (grouping) flag. */
    for (size_t i = 0; i < sizeof locales / sizeof *locales; i ++) {
        setlocale(LC_NUMERIC, locales[i]);
        printf("%-10s %'.2f\n", locales[i], 123456.78);
    }
}

The list of locales is from the output of `locale -a`. Here's the
output I get:

C          123456.78
C.UTF-8    123456.78
POSIX      123456.78
en_AG      123,456.78
en_AG.utf8 123,456.78
en_AU.utf8 123,456.78
en_BW.utf8 123 456,78
en_CA.utf8 123,456.78
en_DK.utf8 123.456,78
en_GB.utf8 123,456.78
en_HK.utf8 123,456.78
en_IE.utf8 123,456.78
en_IN      1,23,456.78
en_IN.utf8 1,23,456.78
en_NG      123,456.78
en_NG.utf8 123,456.78
en_NZ.utf8 123,456.78
en_PH.utf8 123,456.78
en_SG.utf8 123,456.78
en_US.utf8 123,456.78
en_ZA.utf8 123 456,78
en_ZM      123,456.78
en_ZM.utf8 123,456.78
en_ZW.utf8 123 456,78
zh_CN.utf8 123,456.78
zh_SG.utf8 123,456.78

None of the locales uses '.' as a separator, but several of them do use
"," as a decimal point.
 
BartC

Keith Thompson said:
Yes. Here's a program I just ran on my system (Linux Mint 14): .....
The list of locales is from the output of `locale -a`. Here's the
output I get:
en_BW.utf8 123 456,78
.....

OK, that's interesting. Although none of the 4 or 5 C compilers I tried
under Windows 7 generated anything like that. Most didn't like the %',
except lccwin32 which used it to add comma separators to all, but kept the
normal decimal point too.

Under Ubuntu however, I got similar results to you.

Still, I don't think I would rely too much on the capabilities of C's printf
yet if the output has to be just right.
 
Kaz Kylheku

....

OK, that's interesting. Although none of the 4 or 5 C compilers I tried
under Windows 7 generated anything like that. Most didn't like the %',
except lccwin32 which used it to add comma separators to all, but kept the
normal decimal point too.

Under Ubuntu however, I got similar results to you.

Still, I don't think I would rely too much on the capabilities of C's printf
yet if the output has to be just right.

If you want positional parameters, it's better to use C++.

In C++ you can overload the comma operator, making it possible to do
something like this:


format("format string %1 %2"), 42, "foo";

The overloads of the , operator handle the type of each argument properly,
so you don't need stupidities like %d versus %s: you just say where
you want the argument to go.

Detecting too many or too few format parameters is 100% reliable also;
no undefined behaviors.

I developed a sophisticated version of this once where the formatting options
were handled as inline functions that combined attributes represented as
efficient bitmasks. It may have looked something like this:

format("format string %1 %2"), width(5) | precision(3) | 42.0, "foo";
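
For comparison, POSIX printf itself accepts numbered arguments of the form %n$, another POSIX extension that ISO C does not define. The sketch below assumes a POSIX C library; the function name `format_swapped` is hypothetical. Unlike the C++ approach described above, the conversion letters are still required and a mismatch between format and arguments is still undefined behavior.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the arguments are passed as (count, name) but the
   format string prints the name first by referring to them as %2$ and %1$.
   Returns what snprintf returns: the length of the formatted text. */
int format_swapped(char *buf, size_t size, int count, const char *name)
{
    assert(buf != NULL && name != NULL);
    return snprintf(buf, size, "%2$s has %1$d items", count, name);
}
```

Calling `format_swapped(buf, sizeof buf, 42, "foo")` on such a system stores "foo has 42 items" in buf.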
 
Mark Storkamp

Keith Thompson said:
None of the locales uses '.' as a separator, but several of them do use
"," as a decimal point.

On my system quite a few do, such as
el_GR 123.456,78
fi_FI 123.456,78

Then there's
et_EE 123 456,78

The strangest has to be
eu_ES 123456'78
it's the only one I saw that uses something other than . or , for the
decimal point.

But none of them use a space for a separator with a . decimal point,
which I believe the OP may have been looking for.

Looking at the contents of the LC_NUMERIC files, it would be simple
enough to make any custom format that one wanted, then include it with
the distribution.
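
An alternative that needs no custom locale files is to do the grouping in the program itself. This is a rough sketch under stated assumptions, not a replacement for a real locale: the name `format_grouped` is invented here, the value is rounded to two decimals, groups are always three digits, and the magnitude is limited so the intermediate integer cannot overflow.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write value with two decimals into out, grouping the
   integer part in threes with group_sep and using decimal_char as the
   decimal point. Returns 0 on success, -1 if the value is out of range or
   out is too small. */
int format_grouped(char *out, size_t size, double value,
                   char group_sep, char decimal_char)
{
    assert(out != NULL);
    if (value > 9e15 || value < -9e15)
        return -1;

    /* Round to hundredths using integer arithmetic, so the result does not
       depend on the decimal point of the current locale. */
    long long cents = (long long)(value * 100.0 + (value < 0 ? -0.5 : 0.5));
    int negative = cents < 0;
    unsigned long long mag =
        negative ? -(unsigned long long)cents : (unsigned long long)cents;

    char digits[32];
    int ndigits = snprintf(digits, sizeof digits, "%llu", mag / 100);
    size_t groups = (size_t)(ndigits - 1) / 3;

    /* sign + digits + separators + decimal point + 2 decimals + NUL */
    size_t needed = (size_t)negative + (size_t)ndigits + groups + 1 + 2 + 1;
    if (needed > size)
        return -1;

    char *p = out;
    if (negative)
        *p++ = '-';
    for (int i = 0; i < ndigits; i++) {
        /* Insert a separator whenever a multiple of three digits remains. */
        if (i > 0 && (ndigits - i) % 3 == 0)
            *p++ = group_sep;
        *p++ = digits[i];
    }
    snprintf(p, 4, "%c%02llu", decimal_char, mag % 100);
    return 0;
}
```

For example, `format_grouped(buf, sizeof buf, 123456.78, ' ', '.')` produces "123 456.78", the space-separated form with a period that none of the listed locales offered.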
 
Kaz Kylheku

On my system quite a few do, such as
el_GR 123.456,78
fi_FI 123.456,78

Then there's
et_EE 123 456,78

The strangest has to be
eu_ES 123456'78
it's the only one I saw that uses something other than . or , for the
decimal point.

It's pretty silly for all these countries to be in the EU, but
not agree on how to write numbers.
 
Keith Thompson

BartC said:
....

OK, that's interesting. Although none of the 4 or 5 C compilers I tried
under Windows 7 generated anything like that. Most didn't like the %',
except lccwin32 which used it to add comma separators to all, but kept the
normal decimal point too.

The ' (apostrophe) flag is a POSIX extension, not mentioned by ISO C, so
I'm not surprised a Windows implementation wouldn't support it.
 
Kaz Kylheku

What? Do you expect them all to speak your language too?

No, and, notably, I do not expect anyone to read 123.45 out loud in the same
language, either.

You do know that 0 to 9 are international symbols, right, used world over,
even in countries where they have their own "native" symbols for numbers?

You're not expressing your culture by using international symbols, while
disagreeing on details like dots and commas.

The dot should be used for separating the decimal fraction. Why? Because the
comma is already used in mathematics for separating function arguments,
or elements in a set and so on.

When these Euro-fucktards substitute concrete figures into a formula, it looks
like they are increasing the number of arguments.

z = f(x, y)

Substitute x = 2,3 and y = 4,9:

z = f(2,3, 4,9)

It all hinges on that little piece of whitespace now. Do we parenthesize
or what?

They still write like this { 1, 2, 3, 4 } which is just whitespace differences
away from being ambiguous {1,2,3,4}. Is that {1.2,3.4}?
 
Ben Bacarisse

Ian Collins said:
jacob navia wrote:

One of the reasons I try and avoid it :)

*Nothing* works and that's only *one* reason to avoid it! How thorough
of you to have found others! :)

Reminds me of a conversation supposedly overheard by Alan Bennett in a
restaurant:

"The food's awful."
"Yes, and such small portions."
 
Stephen Sprunk

The list of locales is from the output of `locale -a`. Here's the
output I get:

...
en_DK.utf8 123.456,78
...

None of the locales uses '.' as a separator,

Are you sure about that? Look closer.

S
 
Ian Collins

BartC said:
(BTW, does anyone in continental Europe actually use periods (".") to
separate thousands, and commas as decimal points?)

Out of the 132 locales installed on my Solaris 11.1 system, 48 do:

Locale name = ar_AE.UTF-8 12.345.678,900000
Locale name = ar_BH.UTF-8 12.345.678,900000
Locale name = ar_DZ.UTF-8 12.345.678,900000
Locale name = ar_EG.UTF-8 12.345.678,900000
Locale name = ar_IQ.UTF-8 12.345.678,900000
Locale name = ar_JO.UTF-8 12.345.678,900000
Locale name = ar_KW.UTF-8 12.345.678,900000
Locale name = ar_LY.UTF-8 12.345.678,900000
Locale name = ar_MA.UTF-8 12.345.678,900000
Locale name = ar_OM.UTF-8 12.345.678,900000
Locale name = az_AZ.UTF-8 12.345.678,900000
Locale name = bs_BA.UTF-8 12.345.678,900000
Locale name = ca_ES.UTF-8 12.345.678,900000
Locale name = da_DK.UTF-8 12.345.678,900000
Locale name = de_AT.UTF-8 12.345.678,900000
Locale name = de_BE.UTF-8 12.345.678,900000
Locale name = de_DE.UTF-8 12.345.678,900000
Locale name = de_LU.UTF-8 12.345.678,900000
Locale name = el_CY.UTF-8 12.345.678,900000
Locale name = el_GR.UTF-8 12.345.678,900000
Locale name = es_AR.UTF-8 12.345.678,900000
Locale name = es_BO.UTF-8 12.345.678,900000
Locale name = es_CL.UTF-8 12.345.678,900000
Locale name = es_CO.UTF-8 12.345.678,900000
Locale name = es_CR.UTF-8 12.345.678,900000
Locale name = es_EC.UTF-8 12.345.678,900000
Locale name = es_ES.UTF-8 12.345.678,900000
Locale name = es_PY.UTF-8 12.345.678,900000
Locale name = es_UY.UTF-8 12.345.678,900000
Locale name = es_VE.UTF-8 12.345.678,900000
Locale name = fr_BE.UTF-8 12.345.678,900000
Locale name = fr_LU.UTF-8 12.345.678,900000
Locale name = hr_HR.UTF-8 12.345.678,900000
Locale name = id_ID.UTF-8 12.345.678,900000
Locale name = is_IS.UTF-8 12.345.678,900000
Locale name = iso_8859_1 12.345.678,900000
Locale name = it_IT.UTF-8 12.345.678,900000
Locale name = ka_GE.UTF-8 12.345.678,900000
Locale name = lt_LT.UTF-8 12.345.678,900000
Locale name = mk_MK.UTF-8 12.345.678,900000
Locale name = nl_BE.UTF-8 12.345.678,900000
Locale name = nl_NL.UTF-8 12.345.678,900000
Locale name = pt_BR.UTF-8 12.345.678,900000
Locale name = ro_RO.UTF-8 12.345.678,900000
Locale name = sl_SI.UTF-8 12.345.678,900000
Locale name = sq_AL.UTF-8 12.345.678,900000
Locale name = tr_TR.UTF-8 12.345.678,900000
Locale name = vi_VN.UTF-8 12.345.678,900000
 
Kenny McCormack

Out of the 132 locales installed on my Solaris 11.1 system, 48 do:

Locale name = ar_AE.UTF-8 12.345.678,900000

BTW, how does this weird way of doing things affect function calling?
(Note: This was alluded to earlier by another poster).

I.e., if I live in one of those countries, am I not entitled to write:

float foo(float bar) { ... }

And then call it like this:

foo(12,34);

and expect foo to receive the value of twelve and 34 one hundredths?

And if not, how could I possibly know otherwise???
 
BartC

Kenny McCormack said:
(Why would one particular machine have so many locales in it? Does the OS switch
between 132 different languages too? It seems remarkably wasteful if so.)
BTW, how does this weird way of doing things affect function calling?
(Note: This was alluded to earlier by another poster).

I.e., if I live in one of those countries, am I not entitled to write:

float foo(float bar) { ... }

And then call it like this:

foo(12,34);

and expect foo to receive the value of twelve and 34 one hundredths?

And if not, how could I possibly know otherwise???

Language source code seems to be immune from these local conventions.
Programmers are expected to program in English, with English keywords and
conventions regarding numeric constants.

If we allowed full Unicode character sets for source code (identifiers,
comments, string constants), the various ways of denoting numeric constants,
and several styles of punctuation, so that strings can be quoted as:

"abc"
«abc»
--abc

then reading source code is going to get very interesting (and confusing,
what with the glyph for 'A' for example being represented by a dozen code
points, all distinct to a compiler.)

I'm fairly certain that users in continental Europe at least can cope easily
with English conventions if they have to. (One Italian site I saw yesterday
used both "." and "," as a decimal point on the same page!) And what
happened in the past when they ran programs in a language or an OS or
runtime that didn't support commas as a decimal point? In fact, what happens
now when they run a C program under Windows?
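
The split between source code and run-time text can be shown in a few lines. In C source the decimal point of a constant is always ".", and the comma in foo(12,34) is the argument separator, so with the one-parameter foo above the compiler rejects the call for having too many arguments. Text converted at run time is different: strtod() follows the LC_NUMERIC locale. The sketch below uses a hypothetical helper `parse_number` that reports how much of the input strtod() accepted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert text to a double under the current
   LC_NUMERIC locale and store the number of characters consumed. */
double parse_number(const char *text, size_t *consumed)
{
    char *end;
    assert(text != NULL && consumed != NULL);
    double value = strtod(text, &end);
    *consumed = (size_t)(end - text);
    return value;
}
```

A program that never calls setlocale() runs in the "C" locale, so `parse_number("12,34", &n)` returns 12 with n set to 2 and leaves ",34" unread, while "12.34" is consumed in full. After a successful setlocale() to a locale whose decimal point is a comma, the roles are reversed.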
 
Ian Collins

BartC said:
Kenny McCormack said:
(Why would one particular machine have so many locales in it? Does the OS switch
between 132 different languages too? It seems remarkably wasteful if so.)

The machine's root pool has lived through many hardware and software
upgrades over the years, I guess I got carried away one time!
 
Kenny McCormack

Kenny McCormack said:
(Why would one particular machine have so many locales in it? Does the OS switch
between 132 different languages too? It seems remarkably wasteful if so.)

On this Mac running OSX:

$ locale -a|wc
203 203 2442
$

Linux systems tend, OTOH, to be minimalistic. The ones I looked at only
had, at most, 5 or 6 locales installed.

--
"The God of the Old Testament is arguably the most unpleasant character
in all fiction: jealous and proud of it; a petty, unjust, unforgiving
control-freak; a vindictive, bloodthirsty ethnic cleanser; a misogynistic,
homophobic, racist, infanticidal, genocidal, filicidal, pestilential,
megalomaniacal, sadomasochistic, capriciously malevolent bully."

- Richard Dawkins, The God Delusion -
 
James Kuyper

Kenny McCormack said:
(Why would one particular machine have so many locales in it? Does the OS switch
between 132 different languages too? It seems remarkably wasteful if so.)

The locale files that encode things like numeric formatting information
are tiny - the cost of storing 132 locales is quite negligible. Storing
different message files for 132 different locales is more expensive, and
most software doesn't attempt that. If you install software with
multi-lingual support, you're often given the option of choosing
which language support files to install. I never install more than the five
languages that I or my wife can personally read. Most of the Linux
software I have installed supports Chinese, so setting the LANG and
LANGUAGE environment variables to zh_TW.utf8 is sufficient to make my
wife's account on the machine very comfortable for her.

It's entirely unremarkable waste, by modern standards. Modern computers
usually come with huge amounts of stuff pre-installed that most users
will never use; just to make sure that those who will want to use it
will be able to find it.
 
Mark Storkamp

(Why would one particular machine have so many locales in it? Does the OS switch
between 132 different languages too? It seems remarkably wasteful if so.)

Mine has 235 locales installed, and uses a whopping 2,260,145 bytes.
Seems easier to include them all than having an extra step in
installation to choose which ones you want. What percentage of people
installing an OS do you suspect even know which ones they should choose?
Can you imagine the call volume to the help lines? Here the KISS
principle applies.
 
Keith Thompson

BartC said:
(Why would one particular machine have so many locales in it? Does the OS
switch between 132 different languages too? It seems remarkably
wasteful if so.)
[...]

Wasteful of what? I wouldn't expect each locale to take up much space.
 
