help a beginner with a basic function that should return a char

Ralf Damaschke

Ben said:
Did you notice that pete (who, quite reasonably, wants to write
portable C90 code) has had to change the specification of the function
so that now it takes an unsigned int? Do you know why? I am pretty
sure it is simply so that indexing the array with u % 2 is always 0 or
1. If u were a signed int, the result on some systems can be -1. Of
course he could have written:

char *a[] = {"even", "odd"};
return a[abs(i % 2)]; /* i now signed int again */

or used a three-element array indexed with i % 2 + 1, but the original
is neater. This kind of subtlety -- the need to use unsigned types --
is the kind of detail that merits a comment.

Given a signed int i I would use "(unsigned)i % 2" or even
"(unsigned)i & 1" as array index to the above.
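
To make the indexing options in this exchange concrete, here is a minimal sketch that puts them side by side. The names parity_abs, parity_cast and parity_three are made up for illustration and do not come from the thread. The cast version assumes UINT_MAX is odd (of the form 2**N-1), so that conversion to unsigned preserves parity.

```c
#include <stdlib.h> /* abs */

/* Hypothetical names, for illustration only. */
static const char *const parity_names[] = {"even", "odd"};

/* i % 2 may be -1 for negative i, so fold it to 0 or 1 with abs(). */
const char *parity_abs(int i)
{
    return parity_names[abs(i % 2)];
}

/* Conversion to unsigned reduces modulo UINT_MAX + 1, which keeps
   parity as long as UINT_MAX is odd (an assumption, see above). */
const char *parity_cast(int i)
{
    return parity_names[(unsigned)i % 2];
}

/* Three-element table: i % 2 + 1 is always 0, 1 or 2. */
const char *parity_three(int i)
{
    static const char *const names[] = {"odd", "even", "odd"};
    return names[i % 2 + 1];
}
```

All three avoid a negative array index; which one reads best is the matter of taste being debated here.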

[...]
Finally, I'd prefer that the return type of the function be 'const
char *' and that it have a better name, although neither may be
permitted if the specification is fixed.

const char *parity(int i)
{
return i % 2 ? "odd" : "even";
}

While I agree that getvalue() is not a perfect name for that function
I'm pretty sure that parity() isn't either. I too cannot imagine a
better name for the function, given its signature and semantics, than
say oddity().

-- Ralf
 
Beej Jorgensen

Ben Pfaff said:
For what it's worth, it's important to be familiar with the C FAQ
if you're posting here. This issue is discussed in the very
first section:

Sure--I was merely speculating why it was a tricky topic for people.
One of my favorite pastimes is trying to figure out why certain things
are difficult to learn. :)

Even in this entry:

cfaq> Anywhere else, it turns into an unnamed, static array of
cfaq> characters

the word "static" is written in passing, and people might gloss over it.
And even if they did consciously recognize the word, they might mistake
it for the English word "static", and not the keyword "static". A
newbie might not even know of the keyword. [Of course, the FAQ is an
excellent resource, and this is not meant to disparage it. If it were
/my/ FAQ, however, I'd add an explicit note about exactly this
frequently discussed issue.]

Plus people tend to know that once a function returns, you can't use any
of the local stuff in the function any more. Maybe they even know about
using "static" in a function. But if they're not clear that the pointer
is pointing to something persistent outside the function (which isn't
immediately obvious since the string is written inside the function and
there's no explicit "static"), they might be reluctant to return the
pointer for use elsewhere.
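
A small sketch of the distinction being described, with function names invented for illustration. It relies only on the fact that a string literal has static storage duration, whereas an automatic array initialized from a literal does not.

```c
/* Fine: a string literal has static storage duration, so the pointer
   stays valid after the function returns. */
const char *name_from_literal(int odd)
{
    return odd ? "odd" : "even";
}

/* Also fine, and explicit: the array is declared static. */
const char *name_from_static(void)
{
    static const char name[] = "even";
    return name;
}

/* NOT fine (left commented out): buf is an automatic array that is
   only initialized from the literal, and it is gone once the function
   returns, so the caller would receive a dangling pointer.

   const char *name_from_local(void)
   {
       char buf[] = "even";
       return buf;
   }
*/
```

The first and third functions look almost identical on the page, which may be part of why the topic trips people up.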

Finally, if there's any confusion about how pointers work (in C? surely
you jest, Beej) all bets are off.

-Beej
 
Ben Bacarisse

Ralf Damaschke said:
Ben said:
Did you notice that pete (who, quite reasonably, wants to write
portable C90 code) has had to change the specification of the function
so that now it takes an unsigned int? Do you know why? I am pretty
sure it is simply so that indexing the array with u % 2 is always 0 or
1. If u were a signed int, the result on some systems can be -1. Of
course he could have written:

char *a[] = {"even", "odd"};
return a[abs(i % 2)]; /* i now signed int again */

or used a three-element array indexed with i % 2 + 1, but the original
is neater. This kind of subtlety -- the need to use unsigned types --
is the kind of detail that merits a comment.

Given a signed int i I would use "(unsigned)i % 2" or even
"(unsigned)i & 1" as array index to the above.

I could live with that, but I think abs() is more obvious.

The conversion to unsigned raises, once again, the topic of whether
UINT_MAX must be odd. Since I was on the side of the UINT_MAX must be
2**N-1 for some integer N camp last time this related topic came up, I
obviously accept that it must be. There are, however, arguments to
the contrary that I felt could not be dismissed except by appeals to
"the intent of the standard" or by considering practical matters
(there has never been an exception and it is hard to see why there
might be one in the future).
[...]
Finally, I'd prefer that the return type of the function be 'const
char *' and that it have a better name, although neither may be
permitted if the specification is fixed.

const char *parity(int i)
{
return i % 2 ? "odd" : "even";
}

While I agree that getvalue() is not a perfect name for that function
I'm pretty sure that parity() isn't either. I too cannot imagine a
better name for the function, given its signature and semantics, than
say oddity().

Agreed. The function is an oddity (pun intended) and therefore very
hard to name. As a general rule, if you find it hard to name a
function it often indicates that the function is not doing what it
should.
 
Tim Harig

Plus people tend to know that once a function returns, you can't use any
of the local stuff in the function any more. Maybe they even know about
using "static" in a function. But if they're not clear that the pointer
is pointing to something persistent outside the function (which isn't
immediately obvious since the string is written inside the function and
there's no explicit "static"), they might be reluctant to return the
pointer for use elsewhere.

1. Yes, I think that reaches the heart of the matter and it is quite
possible to write safe code if one always assumes that everything
created locally is only local. No I didn't think about it and yes
I would have if it had been explicitly declared static. My
mistake.

2. There is at least one scenario where this fails. Somebody pointed out
that static variables can be destroyed when using dlopen() or the
Windows equivalent, after the library is unloaded with dlclose().

3. I still hold that this kind of scenario is rare in practical code
anyway. It is in general bad form to have literals of any kind
in the code. For string literals it is much better to have them
stored in some kind of resource file or use gettext() to, at
least, allow localized strings.
 
Nobody

2. There is at least one scenario where this fails. Somebody pointed out
that static variables can be destroyed when using dlopen() or the
Windows equivalent, after the library is unloaded with dlclose().

But that's true for anything which is defined in the library, whether it's
a "static" local variable, a global variable, or a string literal.

3. I still hold that this kind of scenario is rare in practical code
anyway. It is in general bad form to have literals of any kind
in the code. For string literals it is much better to have them
stored in some kind of resource file or use gettext() to, at
least, allow localized strings.

That means you have a separate copy for each instance of the program,
which is a waste of resources (although it's not as if that ever stopped
anyone). Literal data belongs in the .rodata segment, or in a file which
is mmap()d read-only, not on the heap.
 
Tim Harig

But that's true for anything which is defined in the library, whether it's
a "static" local variable, a global variable, or a string literal.

Absolutely, with the exception of static variables, it works remarkably
like what happens when local variables go out of scope; and, this is just
one example that I have come up with where depending on static behavior may
fail. For all I know, there may be others.

There are many things that are possible to do with C standards but which
are not necessarily great programming practice. I don't think depending
on the duality of the behavior of static variables and those in local scope is
such a great idea. I never declare anything static and I don't depend on
the fact that literal strings will be. I always make the simplified
assumption that everything local will be gone when returning and my code
works. If that is not what is happening behind the scenes then I have still
lost very little; but, I have potentially avoided problems like dlopen()
without explicitly accounting for the possibility.
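
The "assume everything local is gone" policy amounts to handing the caller its own copy of the string. A minimal sketch, assuming a hypothetical helper parity_copy() that does not appear in the thread; the caller owns the result and must free() it.

```c
#include <stdlib.h> /* malloc */
#include <string.h> /* strlen, memcpy */

/* Hypothetical helper: returns a heap copy of "odd" or "even" that the
   caller owns and must free(). Returns NULL if allocation fails. */
char *parity_copy(int i)
{
    const char *src = (i % 2 != 0) ? "odd" : "even";
    size_t len = strlen(src) + 1; /* include the terminating '\0' */
    char *dst = malloc(len);

    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}
```

Such a copy does not depend on the lifetime of whatever module holds the literal, at the cost of one allocation per call and a private copy per process.
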
That means you have a separate copy for each instance of the program,
which is a waste of resources (although it's not as if that ever stopped

I don't often have resource problems on modern machines. When in conflict,
I will choose better design and more flexible code over fast code. I can
always, easily and relatively cheaply, add more RAM, more storage, or more
processing power.
anyone). Literal data belongs in the .rodata segment, or in a file which
is mmap()d read-only, not on the heap.

I use gettext() as black box. How it implements itself is its own
business. I have never had resource problems because of it.
 
Nobody

Absolutely, with the exception of static variables, it works remarkably
like what happens when local variables go out of scope; and, this is just
one example that I have come up with where depending on static behavior may
fail. For all I know, there may be others.

There are many things that are possible to do with C standards but which
are not necessarily great programming practice. I don't think depending
on the duality of the behavior of static variables and those in local scope is
such a great idea. I never declare anything static and I don't depend on
the fact that literal strings will be. I always make the simplified
assumption that everything local will be gone when returning and my code
works. If that is not what is happening behind the scenes then I have still
lost very little; but, I have potentially avoided problems like dlopen()
without explicitly accounting for the possibility.

This means that you have to duplicate everything. You can't return a
pointer to anything in your library, including global variables, or pass
such a pointer to any function outside of your library if there's a chance
that function might retain a copy.

The usual solution to dlclose() issues is simply not to call dlclose().
I don't often have resource problems on modern machines. When in conflict,
I will choose better design and more flexible code over fast code. I can
always, easily and relatively cheaply, add more RAM, more storage, or more
processing power.


I use gettext() as black box. How it implements itself is its own
business. I have never had resource problems because of it.

This approach is one of the biggest hurdles to the use of application
servers. Applications tend to be written as if each user has the resources
of a desktop PC to themselves. Much of the efficiency which could be
obtained with a shared server is wasted because applications allocate
large amounts of unshared memory in spite of the fact that much of the
data will be common to all instances of a program.
 
Tim Harig

The usual solution to dlclose() issues is simply not to call dlclose().

Which rather defeats part of the idea of runtime loadable modules. Do
you think it is less a waste of resources to have blocks of code segments
that are not being used in memory when you could instead offload what you
are not using?
This approach is one of the biggest hurdles to the use of application
servers. Applications tend to be written as if each user has the resources
of a desktop PC to themselves. Much of the efficiency which could be
obtained with a shared server is wasted because applications allocate
large amounts of unshared memory in spite of the fact that much of the
data will be common to all instances of a program.

Funny, I have written servers before which had no performance problems and
the world hasn't ended since I didn't pass any literals from functions.
In fact, my programs are known internally for being extremely small,
simple, and stable.

There are always a few who get off by shaving that extra millisecond.
They spend more time writing their code and the result is usually more
complex leading to worse bugs. Their performance gains are insignificant
compared to the extra salary and benefits costs in the time that they
waste.

I suppose that it is macho to squeeze out those extra resources; but, I
have never been given to attempting to be macho. I prefer simple, solid
code that is good enough performancewise to get the job done in spec.
 
Stephen Sprunk

Tim said:
Which rather defeats part of the idea of runtime loadable modules. Do
you think it is less a waste of resources to have blocks of code segments
that are not being used in memory when you could instead offload what you
are not using?

Any modern OS which is likely to have dlopen() and dlclose() will page
the code out if it's not being used, so that is a moot point.

S
 
Tim Harig

Any modern OS which is likely to have dlopen() and dlclose() will page
the code out if it's not being used, so that is a moot point.

1. The same thing can also happen for those duplicate strings that Nobody
is so afraid of.

2. The program has little control over what the OS swaps out; however, by
using dlclose(), the program has the explicit ability to unload a
module that it knows it will not be needing. The operating system
does not of course have this foreknowledge of how the program will
be working in the future.

3. Using a resource file or gettext(), strings may only be loaded when
they are needed which may mean less memory usage overall rather
than having them all loaded into memory when the program begins or
dynamically loads a library. With a little optimization, there
may be no duplication because once a string is loaded into the
heap, it can be referenced multiple times.

4. Using operationally defined (non-hard coded) strings is more flexible
as it makes it possible for multiple languages etc. It further
adds flexibility for development as it allows those focusing on
interfaces to change wording in the po files rather than the
source. That means much of the interface may be ignored by
programmers and written by interface design specialists who may not
be programmers.

The bottom line is that it is not inherently terrible to avoid using
embedded literals of any type; and, it may go a long way towards improving
the overall codebase which I still contend is a much better goal than pure
performance.
 
Willem

Tim Harig wrote:
)> The usual solution to dlclose() issues is simply not to call dlclose().
)
) Which rather defeats part of the idea of runtime loadable modules. Do
) you think it is less a waste of resources to have blocks of code segments
) that are not being used in memory when you could instead offload what you
) are not using?

The block of code segments will only be in memory once for all of the
processes that are using it. That's the big difference.

If I have 20 server apps dlopen()ing the same lib, then if you copy all
the strings before you return them you will have 20 copies of each string
in memory. If you return string literals, you have only one.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Tim Harig

Tim Harig wrote:
)> The usual solution to dlclose() issues is simply not to call dlclose().
) Which rather defeats part of the idea of runtime loadable modules. Do
) you think it is less a waste of resources to have blocks of code segments
) that are not being used in memory when you could instead offload what you
) are not using?
The block of code segments will only be in memory once for all of the
processes that are using it. That's the big difference.

Absolutely. Although, in the case of a forking daemon, it would be
possible to make use of shared memory.
If I have 20 server apps dlopen()ing the same lib, then if you copy all
^^^
In any given situation, it might not be necessary to copy *all* of the
strings. It may be possible to only copy those strings which are used.
the strings before you return them you will have 20 copies of each string
in memory. If you return string literals, you have only one.

In so doing, you leave the possibility that you have a dangling pointer that
wasn't re-initialized and which if referenced will likely cause a segfault.

I don't really doubt that passing string literals *might* save some
resources. I do question whether those lost resources are significant
enough to justify any errors that may arise or the inflexibility of hard
coded values.

Back on the original topic of the OP's function, I don't think returning a
string is as generally useful, or as stylistically sound as returning
an integer; or, at my recommendation, an enum (bool or parity).
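
A sketch of what the enum-returning variant might look like. The identifiers PARITY_EVEN, PARITY_ODD, parity_of and parity_name are assumptions made up for this example, not names proposed in the thread; the string lookup is kept separate so that callers who only need the value never touch a string.

```c
/* Hypothetical names, for illustration only. */
enum parity { PARITY_EVEN, PARITY_ODD };

/* A nonzero remainder means odd, including for negative i. */
enum parity parity_of(int i)
{
    return (i % 2 != 0) ? PARITY_ODD : PARITY_EVEN;
}

/* Optional presentation layer, kept apart from the computation. */
const char *parity_name(enum parity p)
{
    return (p == PARITY_ODD) ? "odd" : "even";
}
```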
 
Nobody

1. The same thing can also happen for those duplicate strings that Nobody
is so afraid of.

But each duplicate has to be swapped out individually; and if they're
swapped in, each one gets a different page of memory even though they're
all the same.
2. The program has little control over what the OS swaps out;

And doesn't generally need that control.
however, by
using dlclose(), the program has the explicit ability to unload a
module that it knows it will not be needing.

Provided that the module was written in such a way that no references to
it linger. This isn't entirely within the author's control; the module
may need to use libraries which retain references.
3. Using a resource file or gettext(), strings may only be loaded when
they are needed which may mean less memory usage overall rather
than having them all loaded into memory when the program begins or
dynamically loads a library.

Executables and libraries are paged in as needed. If the data is never
accessed, it won't get paged in. If it's read-only data, even if it
does get paged in, there will never be more than one copy in RAM.
With a little optimization, there
may be no duplication because once a string is loaded into the
heap, it can be referenced multiple times.

Only by the same process. If you have multiple instances of the program,
each one will have their own copy of the data. If the program uses that data
regularly, each instance will have its own copy in RAM, because the kernel
can't tell that they're all the same (and in practice, they may all be
slightly different, i.e. memory locations will differ).
4. Using operationally defined (non-hard coded) strings is more flexible
as it makes it possible for multiple languages etc. It further adds
flexibility for development as it allows those focusing on interfaces
to change wording in the po files rather than the source. That means
much of the interface may be ignored by programmers and written by
interface design specialists who may not be programmers.

You can achieve the same results using files which are mmap()d read-only.
These can either be shared libraries, or a custom format.
 
Tim Harig

But each duplicate has to be swapped out individually; and if they're
swapped in, each one gets a different page of memory even though they're
all the same.

Yep. Watch out, the sky is falling. It's amazing that anybody can get
anything to run without string literals unless they have a supercomputer at
their disposal.

Note that you ignored the main question as to whether all of this memory
usage is really significant enough to justify the costs.
And doesn't generally need that control.

You are one who is so concerned about efficiency and resource usage. Given
that, you should see the advantage that the application has in optimizing
its own memory usage based on the knowledge of its own usage. Personally,
I have not found a situation where it matters enough to be significant.
Provided that the module was written in such a way that no references to
it linger. This isn't entirely within the author's control; the module
may need to use libraries which retain references.

Presumably, any module that you would load dynamically would need to be
self contained enough to be unloaded; otherwise, this discussion is rather
moot. The problem would not deal with the subject at hand.
Executables and libraries are paged in as needed. If the data is never
accessed, it won't get paged in. If it's read-only data, even if it
does get paged in, there will never be more than one copy in RAM.

Memory is paged in blocks (usually 4k for i386). If I only need 255
characters in a given block, the memory paged in will still be 4k
regardless of the fact that I am only using 255 bytes.
Only by the same process. If you have multiple instances of the program,
each one will have their own copy of the data. If the program uses that data
regularly, each instance will have its own copy in RAM, because the kernel
can't tell that they're all the same (and in practice, they may all be
slightly different, i.e. memory locations will differ).

If you are so worried about it, for the case of forking daemons, you can
use shared memory. I have not come across a situation where the
duplication was significant enough to cause problems.
You can achieve the same results using files which are mmap()d read-only.
These can either be shared libraries, or a custom format.

For all I know, that may be how gettext works under the hood; and, I
wouldn't treat mmaped segments as any more stable than localized variables
or dynamically loaded libraries -- they too can be munmap()ed.

So yes, I concede that Nobody is the god of resource usage; and, it's
amazing that I have been able to get along all these years without passing
string literals or a cluster of supercomputers for my programs to run on.
He is duly entitled to wank off for every spare byte that he saves.

Meanwhile, I will continue to strive to create solid, safer, code even
if it means that I use a few more resources (which can be mitigated
if necessary). I'm sure that I am going down a bad road where the only
possible conclusion is insufficient memory. I guess you should pity me
and my demise ... or something.
 
luserXtrog

Back on the original topic of the OP's function, I don't think returning a
string is as generally useful, or as stylistically sound as returning
an integer; or, at my recommendation, an enum (bool or parity).

Totally.
Any way you slice it, the function is returning a single integer.
A char * is a memory address, a number. Returning a small integer
like 0 or 1 is less sneaky than returning 0xABCDEF0 or 0xABCDEF5.

I'd probably do something equally and oppositely misguided, such as:

int oddp (int x) { return x & 1; }
....
printf("x is %s\n", (char *[]){"even","odd"}[oddp(x)]);

;%

Would this function benefit from the "pure" attribute in gcc?
 
Nobody

In so doing, you leave the possibility that you have a dangling pointer that
wasn't re-initialized and which if referenced will likely cause a segfault.

I don't really doubt that passing string literals *might* save some
resources. I do question whether those lost resources are significant
enough to justify any errors that may arise or the inflexibility of hard
coded values.

It isn't just string literals. Some programs can consist almost entirely
of data which is read from a (fixed) file, stored on the heap, and never
modified.

Most interpreted languages behave like this. A rare exception is that some
Lisp interpreters can use a read-only frozen state file, using
copy-on-write for modifications. Needless to say, this doesn't work with
destructive updates, but it does mean that you (mostly) avoid duplicating
the code for every instance.
 
Keith Thompson

luserXtrog said:
Totally.
Any way you slice it, the function is returning a single integer.
A char * is a memory address, a number. Returning a small integer
like 0 or 1 is less sneaky than returning 0xABCDEF0 or 0xABCDEF5.

Addresses are not numbers. They may be represented that way, as any C
object's representation is a sequence of bits, but conceptually
they're very different things.
I'd probably do something equally and oppositely misguided, such as:

int oddp (int x) { return x & 1; }

Is that guaranteed to work for negative values?
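
On the question of negative values: x & 1 inspects the low bit of the representation. That matches parity on two's-complement machines, but not on ones'-complement ones, where -1 has a low bit of 0. A sketch of a version that depends only on the value, using the hypothetical name is_odd():

```c
/* A nonzero remainder means odd, whatever its sign and whatever the
   representation of negative integers. */
int is_odd(int x)
{
    return x % 2 != 0;
}
```

The result is always 0 or 1, so it can also serve directly as an index into a two-element array.
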
...
printf("x is %s\n", (char *[]){"even","odd"}[oddp(x)]);

;%

I'd just write:

printf("x is %s\n", oddp(x) ? "odd" : "even");

In any case, the fact that this particular function would more
usefully return an integer (or perhaps an enum) than a string doesn't
imply that functions returning strings in general are not useful.
(Of course a function can't actually return a string, but it can
return a result that gives the caller access to a specified string.)

Returning (a pointer to) a string literal can be problematic if
you're concerned about localization, but not all programs need to
deal with that.
Would this function benefit from the "pure" attribute in gcc?

I don't know.
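
For what it is worth, GCC documents two relevant function attributes: pure (no side effects, though the result may depend on global memory) and const (the result depends only on the argument values). A function like oddp() fits the stricter const. A hedged sketch follows; the macro ATTR_CONST and the function name oddp_const are made up here, and whether the attribute brings any measurable benefit for such a tiny function is not something this thread establishes.

```c
/* ATTR_CONST is a hypothetical helper macro: it expands to the GCC
   attribute where available and to nothing elsewhere. */
#if defined(__GNUC__)
#define ATTR_CONST __attribute__((const))
#else
#define ATTR_CONST
#endif

/* The attribute is attached to the declaration. */
int oddp_const(int x) ATTR_CONST;

int oddp_const(int x)
{
    return x % 2 != 0;
}
```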
 
luserXtrog

Addresses are not numbers.  They may be represented that way, as any C
object's representation is a sequence of bits, but conceptually
they're very different things.

Thanks, Korzybski.
But there is also a basic similarity between addresses and integers,
particularly where the integer is employed to index an array.
A pointer is an index to any location in the process memory space.
Using an index to a smaller portion of that memory can improve
locality.
Is that guaranteed to work for negative values?

Oh. I hadn't thought of that. [poof]
...
printf("x is %s\n", (char *[]){"even","odd"}[oddp(x)]);

I'd just write:

    printf("x is %s\n", oddp(x) ? "odd" : "even");

In any case, the fact that this particular function would more
usefully return an integer (or perhaps an enum) than a string doesn't
imply that functions returning strings in general are not useful.
(Of course a function can't actually return a string, but it can
return a result that gives the caller access to a specified string.)

Returning (a pointer to) a string literal can be problematic if
you're concerned about localization, but not all programs need to
deal with that.

Or if you're trying to get lint's blessing.

I don't know.

Oh, well. :)
 
Keith Thompson

luserXtrog said:
Thanks, Korzybski.
But there is also a basic similarity between addresses and integers,
particularly where the integer is employed to index an array.
A pointer is an index to any location in the process memory space.
Using an index to a smaller portion of that memory can improve
locality.

You can probably find a basic similarity between just about any two
concepts.

Your model of a pointer as "an index to any location in the process
memory space" assumes a monolithic linear addressing space, where all
of memory can be treated as a single array of bytes. C does not
require such a model, and there are machines that don't use it.

You can add or subtract an integer to a pointer and get another
pointer, you can subtract one pointer from another to get an integer,
and you can compare two pointers using <, <=, >, or >=. But all
these operations are defined only within a single object.
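
A short sketch of the operations just listed, restricted to a single array so that each one is well defined. The helper names element_distance and comes_before are invented for illustration.

```c
#include <stddef.h> /* ptrdiff_t */

/* Hypothetical helper: both pointers must point into (or one past the
   end of) the same array, otherwise the subtraction is undefined. */
ptrdiff_t element_distance(const int *first, const int *second)
{
    return second - first;
}

/* Relational comparison is likewise only defined within one object. */
int comes_before(const int *a, const int *b)
{
    return a < b;
}
```

Given int arr[5], element_distance(&arr[1], &arr[4]) is 3 and comes_before(arr, arr + 2) is true; passing pointers into two unrelated arrays to either function would be undefined behavior.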

[...]
 
