Problem with gcc

cognacc · Nov 17, 2009

C portability is still hard due to environmental differences. I can
port dh from Linux to BSD without too much work, but beyond that, say to
Solaris, the going gets tough.

Is that an environmental issue? (No pun intended.)
On Unix, the file-locking problem should be "no problem": keep to POSIX.
Use fcntl, for example, or lockf (see man lockf; it is available on Linux,
NetBSD, and Solaris).
I also think you might get some thread safety for free (if you need
that).

See this link, for example; maybe it fits your needs:
http://dbaspot.com/forums/solaris/362139-file-locking-solaris.html
(it discusses the fcntl and lockf solutions, which are found on Linux, BSD,
and Solaris).
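For readers who want to see what the fcntl route looks like, here is a minimal sketch, assuming a POSIX system. The helper names lock_whole_file and unlock_whole_file are hypothetical. It takes an exclusive lock on the whole file with F_SETLKW (which waits until the lock is available) and releases it with F_UNLCK.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>   /* SEEK_SET */
#include <unistd.h>

/* Hypothetical helper: take an exclusive (write) lock on the whole
   file behind fd, waiting until it is available.
   The descriptor must be open for writing. Returns 0 or -1. */
int lock_whole_file(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;    /* exclusive lock */
    fl.l_whence = SEEK_SET; /* offsets relative to start of file */
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "through end of file, however large" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical helper: release this process's lock on the whole file. */
int unlock_whole_file(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl);
}
```

lockf(fd, F_LOCK, 0) is the other interface mentioned; it locks from the current file offset to end of file. POSIX leaves the interaction between fcntl and lockf locks unspecified, so a program should pick one and stick to it.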

Oops, I think I'm off topic.

Sorry.

mic

Seebs · Nov 17, 2009

> Is that an environmental issue? (No pun intended.)
> Unix should be "no problem" for the file lock problem.
> Keep to POSIX.
> Use fcntl, for example, or lockf (man lockf - seen on Linux, NetBSD,
> Solaris).
> I also think you might get some thread safety for free (if you need
> that).

A serious discussion of the locking semantics, and their interaction with
the process model, is of course purely UNIX-specific and has nothing to do
with C. Suffice it to say that neither of those will work in this
case. (At least, not remotely portably.)
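Seebs does not spell out his objection, so the following shows only one well-known wrinkle of the process model, not necessarily the one he means. POSIX fcntl record locks are owned by the process, not by a thread or a descriptor: a second lock request from the same process on a region it already holds simply succeeds, so these locks give no exclusion between threads of one process (which undercuts the "thread safety for free" hope above), and closing any descriptor for the file drops all of that process's locks on it. A minimal sketch, assuming a POSIX system; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>   /* SEEK_SET */
#include <unistd.h>

/* Try to take a non-blocking write lock on the whole file behind fd.
   Returns 0 if granted, -1 if refused or on error. */
static int try_write_lock(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical demo: returns 1 if a second descriptor in the SAME
   process can write-lock a file the first descriptor already has
   write-locked. With POSIX record locks this is expected to be 1:
   the second request replaces the process's existing lock instead of
   conflicting with it. */
int same_process_relock_succeeds(const char *path)
{
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = open(path, O_RDWR);
    int ok = fd1 >= 0 && fd2 >= 0
             && try_write_lock(fd1) == 0
             && try_write_lock(fd2) == 0;
    /* Closing ANY descriptor for the file releases all of this
       process's locks on it, including the one taken through fd1. */
    if (fd2 >= 0) close(fd2);
    if (fd1 >= 0) close(fd1);
    return ok;
}
```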

-s

Ben Bacarisse · Nov 17, 2009

Keith Thompson said:
Is plain char signed or unsigned in lcc-win?

Signed.
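Whether plain char is signed is implementation-defined, and a program can simply ask. A minimal sketch using only <limits.h>; the function name is hypothetical. CHAR_MIN is 0 exactly when plain char is unsigned.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, else 0. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With gcc, the -fsigned-char and -funsigned-char options flip the answer for a given target, so the same source can be built both ways.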

Ben Bacarisse · Nov 17, 2009

jacob navia said:
Countless bugs that have no reason to exist. Characters should
be unsigned.

As an implementor you are in a unique position. You could make a
version with unsigned chars to see if there is any performance hit, at
least for the hardware your compiler supports. I would not have
thought so, but it is a long time since I knew the timings of any
machine ops.

Dik T. Winter · Nov 17, 2009

> char values containing character representations which are negative, get
> unexpectedly sign-extended when used in mixed arithmetic. Usually this is
> undesirable, and unexpected if you are unaware of the signedness of your
> char type.

In general, I do not do arithmetic on character representations.
Doing so invites misunderstandings like
'I' + 1 == 'J'
which is not true on all systems. Moreover, it leads to horrible code
like:
if (islower(c)) c -= 32;
which is *not* equivalent to
c = toupper(c);
and to horrible things like an advertisement I once saw here in
Amsterdam, which was written in Turkish on purpose but equated the dotted i
and the dotless i, making it nonsense.
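A sketch of the portable alternative to c -= 32, with hypothetical function names. Two details matter: toupper honours the current locale, whereas subtracting 32 assumes ASCII (in EBCDIC, lower and upper case letters differ by 64), and the argument must be converted to unsigned char first, because the <ctype.h> functions accept only EOF or a value representable as unsigned char.

```c
#include <ctype.h>

/* Portable upper-casing of one character. The cast matters: a plain
   char holding a byte such as 0xE9 is negative where char is signed,
   and passing a negative value (other than EOF) to toupper is
   undefined behaviour. */
int to_upper_char(char c)
{
    return toupper((unsigned char)c);
}

/* The ASCII-only shortcut, for comparison (cast added so that the
   islower call itself is well defined). It agrees with toupper only
   for ASCII letters in the "C" locale. */
int to_upper_ascii_hack(char c)
{
    return islower((unsigned char)c) ? c - 32 : c;
}
```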

Dik T. Winter · Nov 17, 2009

> Treating '\0' as data in a NUL terminated string seems unnatural to me,
> despite what the standard says. I know it's data in the sense of taking
> up storage, but I think of it as metadata, a pseudo length specifier.

Try the following:

#include <stdio.h>

int main(void) {
    printf("%lu\n", (unsigned long)sizeof("123"));
    return 0;
}

Dik T. Winter · Nov 17, 2009

> Ok, so how do I assign a character code to c that happens to be the code
> 130, and that happens to have a different encoding from the one C
> understands?

How do you know that 130 represents a character that makes sense? When you
want a specific character that is in the source character set, represent it
with the actual character surrounded by single quotes. If it is not in the
source character set but is in the execution character set, represent it
with an escape sequence surrounded by single quotes: either '\202' or '\x82'.
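A small check of the two escape spellings, with hypothetical function names: '\202' (octal) and '\x82' (hex) both denote the code 130. The same escape can be placed in a string literal, and reading the stored byte back through unsigned char yields the code itself, assuming 8-bit char.

```c
/* Octal and hex escapes for the same code compare equal. */
int octal_and_hex_agree(void)
{
    return '\202' == '\x82';
}

/* The escape inside a string literal stores that code; the cast to
   unsigned char recovers 130 whether plain char is signed or not. */
int stored_byte_code(void)
{
    const char s[] = "ab\x82";
    return (unsigned char)s[2];
}
```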

John Kelly · Nov 17, 2009

try the following:

#include <stdio.h>

int main(void) {
printf("%lu", (unsigned long)sizeof("123"));
return 0;
}

4, right.

Like I said, I know the NUL takes a byte of storage. But what's inside
the quotes? 123 is the data. Saying that 123\0 is your data may be the
standard's definition, but it's not natural. I agree that it can be a
nice trick for performance-minded programmers.
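The storage question can be checked directly. This sketch (function names hypothetical) contrasts sizeof, which counts every byte of the array a literal creates, with strlen, which counts bytes up to the first '\0'. An embedded "\0" shows that the terminator is stored like any other byte; the literal is split in two so that the following 3 is not absorbed into the octal escape.

```c
#include <stddef.h>
#include <string.h>

size_t literal_size(void)   { return sizeof "123"; }   /* 4 */
size_t literal_length(void) { return strlen("123"); }  /* 3 */

/* "12\0" "3" is one array: '1', '2', '\0', '3', '\0' (5 bytes),
   but strlen stops at the first '\0' and reports 2. */
size_t embedded_size(void)   { return sizeof "12\0" "3"; }
size_t embedded_length(void) { return strlen("12\0" "3"); }
```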

Dik T. Winter · Nov 17, 2009

> OK. But then you have this little anomaly:
>
> int C = '\x82';
> int D = 0x82;
>
> You might expect C==D, but that isn't the case. Just something else to
> explain that probably wouldn't need explaining if chars were not signed.

Why do you expect that? Why do you assign a 'character constant' to an
integer?
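The quoted "anomaly" can be reproduced in a few lines (function names hypothetical). C == D holds only where plain char is unsigned; where it is signed, '\x82' has the value of a char object holding that code converted to int, typically -126 with an 8-bit two's-complement char. Going through unsigned char gives the same answer on both kinds of implementation, assuming 8-bit char.

```c
/* 1 only where plain char is unsigned. */
int constant_matches_hex(void)
{
    int C = '\x82';
    int D = 0x82;
    return C == D;
}

/* 1 on both signed-char and unsigned-char implementations. */
int constant_matches_hex_portably(void)
{
    int C = (unsigned char)'\x82';
    int D = 0x82;
    return C == D;
}
```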

Dik T. Winter · Nov 17, 2009

> C99 6.4.4.4p6 specifies the meaning of a hexadecimal escape sequence:
>
> The hexadecimal digits that follow the backslash and the letter
> x in a hexadecimal escape sequence are taken to be part of the
> construction of a single character for an integer character
> constant or of a single wide character for a wide character
> constant. The numerical value of the hexadecimal integer so
> formed specifies the value of the desired character or wide
> character.
>
> Note that it says "character"; it doesn't refer to the type (plain)
> char.
>
> And, of course, the constraint I already quoted says that the value of
> the hexadecimal escape sequence must be in the range of type unsigned
> char. If '\x82' has the value -126, then it violates the constraint,
> which I don't think is the intent.

But consider the semantics in paragraph 10:
If an integer character constant contains a single character or
escape sequence, its value is the one that results when an object
with type char whose value is that of the single character or escape
sequence is converted to type int.
I think there is a contradiction here.

Dik T. Winter · Nov 17, 2009

>
> Looking at the wrong way again.

When I was replying I was either sleeping or not alert enough. On the
systems I use e-acute is not 130. On my system at work it is 201, on
my system at home it is 142. And I have also worked on a system where it
is 208. More interesting, in the article as posted by Jacob it also is not
130, but 201 (it explicitly states "charset=ISO-8859-1").

>
> And what about 129? Is it e-grave or c-hacek? And why should it be one
> and not the other?

Make that 200...

Ben Bacarisse · Nov 17, 2009

Ben Bacarisse said:
As an implementor you are in a unique position. You could make a
version with unsigned chars to see if there is any performance hit, at
least for the hardware your compiler supports. I would not have
thought so, but it is a long time since I knew the timings of any
machine ops.

I just spotted that gcc can be told to use one or the other
(-fsigned-char or -funsigned-char). A simple test seemed to be to compile
gawk with signed and with unsigned char. I could not detect a difference
(on a Core2 laptop) in the run time of the resulting binaries.

Of course, I am sure there are machines on which it does matter, but
my laptop is not one of them.

Keith Thompson · Nov 17, 2009

Dik T. Winter said:
But consider the semantics under 10:
If an integer character constant contains a single character or
escape sequence, its value is the one that results when an object
with type char whose value is that of the single character or escape
sequence is converted to type int.
I think there is something contradictory here.

I thought so too, but I've decided otherwise.

The "numerical value of the hexadecimal integer" is +130. The value
*of the character constant* is -126. The value +130 *specifies* the
value -126.

I think it could be worded more clearly, but I'm not entirely sure
how.

Nick · Nov 17, 2009

Dik T. Winter said:
When I was replying I was either sleeping or not alert enough. On the
systems I use e-acute is not 130. On my system at work it is 201, on
my system at home it is 142. And I have also worked on a system where it
is 208. More interesting, in the article as posted by Jacob it also is not
130, but 201 (it explicitly states "charset=ISO-8859-1").

As I said, it's two bytes on my system.

Flash Gordon · Nov 17, 2009

bartc said:
jacob navia said:

Eric Sosman wrote:

bartc wrote:

The letter 'é' is 130. Why should I have it as -126???

Looking at the wrong way again.

Unless you can tell us the reason for widening e-grave, c-hacek or e-acute
and so on, this makes no sense.

int data[256]={0};

data['ú'] += 1;

int data[1+UCHAR_MAX] = { 0 };
data['ú' - CHAR_MIN] += 1;

Or you could use the `int *datap = data - CHAR_MIN;' trick
if desired.


GREAT!!!

But why should I be forced to remember to subtract CHAR_MIN ???


And whether it's UCHAR_MAX or CHAR_MIN (or is it CHAR_MAX or UCHAR_MIN),
and whether they are to be added or subtracted.

And I thought widening and arithmetic on characters were meaningless
(according to most of the posters in this thread), yet those lines seem
to do just that.

People (or at least I) have been saying they can't see a good reason to
care whether the value is positive or negative. So one instance has been
found where you need to do a little arithmetic; but having put that in,
you *still* don't need to care whether a value is positive or negative.
It will work with signed or unsigned char, and it is still a rare case.

Oh, and in languages which handle this better, you would have to use an
explicit conversion from the character type to an integral type, and
that is all that is being done here. It's just that you have to
remember to do it yourself.
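An alternative to the CHAR_MIN offset, sketched here with a hypothetical function name, is to index through unsigned char. Every byte then gets an index in 0..UCHAR_MAX whether plain char is signed or not, so there is nothing to remember about adding or subtracting limits, and it avoids the data - CHAR_MIN pointer, whose formation is strictly undefined when CHAR_MIN is negative. Note that the slot a given character lands in differs from the offset version on signed-char systems; each scheme is only self-consistent.

```c
#include <limits.h>

/* Count byte values in a NUL-terminated string. */
void count_bytes(const char *s, long counts[UCHAR_MAX + 1])
{
    for (; *s != '\0'; s++)
        counts[(unsigned char)*s] += 1;
}
```

The literal "\xfa\xfa" "a" is split so that the letter a is not read as a further hex digit of the escape.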

bartc · Nov 17, 2009

Dik T. Winter said:
Why do you expect that? Why do you assign a 'character constant' to an
integer?

Why is everyone adopting such a bullyish attitude?

'\x82' is a convenient way of embedding a 0x82 code in the middle of a
string literal. Why would anyone expect it to have a value other than hex 82
when assigned to a single char or an int?

If a char glyph was inside the quotes, you don't know exactly what code you
will end up with. But this is an absolute value.

Hallvard B Furuseth · Nov 17, 2009

bartc said:
'\x82' is a convenient way of embedding a 0x82 code in the middle of a
string literal. Why would anyone expect it to have a value other than
hex 82 when assigned to a single char or an int?

Because they knew the C language, so they knew that the signedness
of char is one of its warts, and that '\x82' has the value that c would
have after char c = '\x82'; (promoted to int).

I don't understand this discussion. C has its share of warts, as do
other languages. If we are going to use a language, we learn about
those and deal with them. Or we can go looking for a language which
suits us better. Or an implementation which suits us better, like only
supporting those where char is unsigned, if that's such a big deal.

Sure, languages also evolve, with lots of discussion and disagreement about
which changes are feasible and which are not, and whether a change would
be an improvement or not. But that's an entirely different issue from
_using_ the language. This discussion keeps mixing these two together,
sometimes with people insisting on using C-as-it-should-have-been.

Dik T. Winter · Nov 18, 2009


> > Looking at the wrong way again.
> >
> > Unless you can tell us the reason for widening e-grave, c-hacek or e-acute
> > and so on, this makes no sense.


>
> int data[256]={0};
>
> data['ú'] += 1;

I would not expect a line like that in code; I would expect something
more like:
data[c] += 1;
where c is the return value of getchar().
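A sketch of that pattern, using getc on an arbitrary stream instead of getchar so it can be exercised on a temporary file; the function name is hypothetical. The variable c is an int: getc returns each byte as an unsigned char value converted to int, or EOF, so data[c] is always in range once EOF has been ruled out. Storing the result in a char first would make bytes above CHAR_MAX negative where char is signed, and would make it impossible to tell EOF reliably apart from a real byte.

```c
#include <limits.h>
#include <stdio.h>

/* Tally every byte read from fp into data[0..UCHAR_MAX]. */
void tally_stream(FILE *fp, long data[UCHAR_MAX + 1])
{
    int c;                        /* int, not char */
    while ((c = getc(fp)) != EOF)
        data[c] += 1;             /* c is in 0..UCHAR_MAX here */
}
```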

Dik T. Winter · Nov 18, 2009

>
> Why is everyone adopting such a bullyish attitude?
>
> '\x82' is a convenient way of embedding a 0x82 code in the middle of a
> string literal. Why would anyone expect it to have a value other than hex 82
> when assigned to a single char or an int?

Putting '\x82' in a string literal makes assumptions about the character
set in use.

> If a char glyph was inside the quotes, you don't know exactly what code you
> will end up with. But this is an absolute value.

If there is a char glyph there I know the code I will end up with is the
code for that specific glyph. If I put there '\x82' I have no idea what
glyph that would be, if any.

bartc · Nov 18, 2009

Dik T. Winter said:
The letter 'é' is 130. Why should I have it as -126???

Looking at the wrong way again.


Unless you can tell us the reason for widening e-grave, c-hacek or e-acute
and so on, this makes no sense.


int data[256]={0};

data['ú'] += 1;


I would not expect a line like that in code, more something like:
data[c] += 1;
where c is the return value of getchar().

Yet another source of confusion: getchar returns codes 128 to 255 as
positive values, but put such a value into a char and it becomes negative
(where char is signed). Given char c;, data[getchar()] works, but
data[c=getchar()] doesn't.

And why can't someone write: char text[100]; data[text] ?

Making arbitrary rules about what can or can't be coded is not really
helpful; why not just admit that negative characters are a bad idea, as Eric
Sosman did in this thread?
