User implementable standard library functions

Keith Thompson

Kenneth Brody said:
cr88192 said:
Keith Thompson said:
[...]
How do you portably guarantee proper alignment?

if grabbed from malloc, it is aligned to whatever malloc's alignment [...]
ifdef powers, ifdef fixes everything...

Hardly. If mmap isn't supported, you still have to find some
alternative.

on windows, I use malloc.
[...]

But wasn't the original question along the lines of "how can one
write a replacement for (not a wrapper around) malloc()"?

(Okay, it was actually more along the lines of "which standard
library functions could be implemented by an end user, rather
than the implementor".)

Yes. Here's the original post, by (e-mail address removed):
| One interesting thing to come out of the recent "Alignment" thread is
| that it is impossible to write a portable replacement for malloc in
| "user space" (not sure what the right term is - I mean an ordinary
| programmer, not an implementor) - even a naive method using a large
| array isn't guaranteed to work if there's no way of having a variable
| of strictest alignment. Oh, for the sake of the pedants, let's
| discount
| void *my_malloc(size_t size) { return 0; }
| :)
|
| Of course, for most standard library functions, say something like
| strlen, it's perfectly possible to provide a completely equivalent
| implementation yourself.
|
| So as an academic exercise, which other standard library functions
| share the same property as malloc, that the ordinary programmer is
| powerless to write an equivalent function without dipping into non-
| Standard implementation details?
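
For reference, the "naive method using a large array" mentioned in the quoted post might look roughly like the sketch below. The names `my_align`, `my_pool` and `MY_POOL_UNITS` are made up for illustration, and the union is only a best-effort guess at "a variable of strictest alignment": C90 and C99 give no portable way to name such a type, which is exactly the objection raised in the post. (C11 later added `max_align_t` to `<stddef.h>` for this purpose.)

```c
#include <stddef.h>

/* Hypothetical stand-in for "a variable of strictest alignment".
   Nothing in C90/C99 guarantees this list of members is sufficient. */
union my_align {
    long l;
    double d;
    long double ld;
    void *p;
    void (*fp)(void);
};

#define MY_POOL_UNITS 1024

static union my_align my_pool[MY_POOL_UNITS];
static size_t my_next_unit = 0;

/* Bump allocator: hands out whole my_align units and never reuses them. */
void *my_malloc(size_t size)
{
    size_t unit = sizeof(union my_align);
    size_t units;
    void *p;

    if (size == 0)
        return NULL;
    /* Round up without risking overflow in size + unit - 1. */
    units = size / unit + (size % unit != 0);
    if (units > MY_POOL_UNITS - my_next_unit)
        return NULL;
    p = &my_pool[my_next_unit];
    my_next_unit += units;
    return p;
}
```

There is deliberately no `my_free` here; the point is only to show where the alignment assumption enters.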

Obviously using mmap and using malloc (!) are not responsive to the
original question.

Thread topics do drift, and that's ok, but we shouldn't let them drift
away from *both* the topic of the newsgroup and relevance to the
original question.
 
Harald van Dĳk

pete said:
fgetc and fputc are much easier to implement than that:

Fair point, fgetc/fputc can also be implemented portably by calling
getc/putc. And getc/putc can be implemented portably by calling fgetc/fputc
or fread/fwrite or fscanf/fprintf or vfscanf/vfprintf, but whichever way
you choose, you have one of fread/fgetc/getc/fscanf/vfscanf and
fwrite/fputc/putc/fprintf/vfprintf that cannot be implemented portably. I
hope I didn't accidentally overlook any other input or output functions
that would form yet another alternative base I/O function. :)
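
A minimal sketch of the layering described above, assuming `getc`/`putc` are taken as the base I/O functions. The `my_` names are hypothetical; they avoid the reserved standard names.

```c
#include <stdio.h>

/* fgetc in terms of getc. getc may be a macro that evaluates its
   argument more than once, which is harmless for a plain variable. */
int my_fgetc(FILE *stream)
{
    return getc(stream);
}

/* fputc in terms of putc, with the same caveat. */
int my_fputc(int c, FILE *stream)
{
    return putc(c, stream);
}
```

Whichever pair is picked as the base, that pair itself still has to come from the implementation.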
 
pete

Harald said:
pete wrote:

Fair point, fgetc/fputc can also be implemented portably by calling
getc/putc.

But the definitions that I showed,
would actually be efficient in cases where getc and putc
were also implemented as macros, which is not uncommon.
 
cr88192

Kenneth Brody said:
cr88192 said:
Keith Thompson said:
[...]
How do you portably guarantee proper alignment?

if grabbed from malloc, it is aligned to whatever malloc's alignment [...]
ifdef powers, ifdef fixes everything...

Hardly. If mmap isn't supported, you still have to find some
alternative.

on windows, I use malloc.
[...]

But wasn't the original question along the lines of "how can one
write a replacement for (not a wrapper around) malloc()"?

well, often they can be interrelated, in particular, in that malloc (by
itself) typically has a few major limitations:
it is inefficient for a large number of small objects (say, doing several
million 8 byte allocations);
it does not do garbage collection;
it is not possible to store or retrieve type information from objects;
....

doing a wrapper, one can address these goals, and others...


how does it classify?:
one uses malloc simply to gain large chunks of memory (in much the same way
malloc itself gains memory through means such as mmap or sbrk).


and, if we detect a favorable situation (running on a certain known OS), we
can bypass this and gain the same memory in much the same way as the
original implementation (via mmap, or maybe sbrk). in all practical sense it
is a replacement.

if one didn't even want this much library dependency, they could bypass
functions like mmap, and generate the interrupts themselves, thus directly
accessing the OS kernel.

the exact means employed may not matter too much.


one does their own custom allocator over this new memory (with whatever
algos they feel appropriate).

it works...
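
As an illustration of the kind of wrapper described here (not the poster's actual code), a fixed-size pool for small objects layered on large `malloc` chunks could be sketched like this. The cell size of 8 bytes echoes the "8 byte allocations" example; `my_cell`, `MY_CELLS_PER_CHUNK` and the function names are assumptions.

```c
#include <stdlib.h>

/* Hypothetical cell: holds an 8-byte payload while in use,
   or a free-list link while free. */
union my_cell {
    union my_cell *next;
    unsigned char payload[8];
};

#define MY_CELLS_PER_CHUNK 4096

static union my_cell *my_free_list = NULL;

/* Grab one large chunk from malloc and thread its cells onto the free list. */
static int my_refill(void)
{
    union my_cell *chunk = malloc(MY_CELLS_PER_CHUNK * sizeof *chunk);
    size_t i;

    if (chunk == NULL)
        return -1;
    for (i = 0; i < MY_CELLS_PER_CHUNK; i++) {
        chunk[i].next = my_free_list;
        my_free_list = &chunk[i];
    }
    return 0;
}

/* Pop a cell from the free list, refilling from malloc when it is empty. */
void *my_small_alloc(void)
{
    union my_cell *cell;

    if (my_free_list == NULL && my_refill() != 0)
        return NULL;
    cell = my_free_list;
    my_free_list = cell->next;
    return cell;
}

/* Push a cell back; chunks are never returned to malloc in this sketch. */
void my_small_free(void *p)
{
    union my_cell *cell = p;

    if (cell == NULL)
        return;
    cell->next = my_free_list;
    my_free_list = cell;
}
```

Alignment comes for free only because the chunks come from `malloc`, so this remains a wrapper around the library allocator rather than a replacement for it.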

(Okay, it was actually more along the lines of "which standard
library functions could be implemented by an end user, rather
than the implementor".)

ok.

in a strict sense, none can, since they are, after all, already implemented
by the standard library.
and in a non-strict sense, all can...
 
Flash Gordon

cr88192 wrote, On 24/08/07 07:02:
ok.

in a strict sense, none can, since they are, after all, already implemented
by the standard library.
and in a non-strict sense, all can...

In a strict sense a lot can. memset, memcpy...
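
For example, byte-at-a-time versions of these two need nothing beyond the language itself. This is a minimal sketch with hypothetical `my_` names, since the standard names are reserved; a real library would normally use something faster.

```c
#include <stddef.h>

/* Copy n bytes from src to dst; the regions must not overlap. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

/* Fill n bytes at dst with c converted to unsigned char. */
void *my_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;

    while (n-- > 0)
        *d++ = (unsigned char)c;
    return dst;
}
```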
 
Richard Heathfield

Flash Gordon said:
cr88192 wrote, On 24/08/07 07:02:

In a strict sense a lot can. memset, memcpy...

<pedant>
Nope, none can - because all the standard library functions have names
that are reserved for use by the implementation.
</pedant>
 
Richard Bos

cr88192 said:
Keith Thompson said:
cr88192 said:
[...]
How do you portably guarantee proper alignment?

if grabbed from malloc, it is aligned to whatever malloc's alignment
is (on x86 and friends, usually at least 8 or 16 I think...).

and, if from mmap, it is aligned to a page boundary.

mmap is not portable.

it exists on linux...
good enough for if running on linux, just have to detect and use.

That's irrelevant here in comp.lang.c.

but is relevant for actual coding, where most non-trivial code ends up
having to deal with the underlying implementation in one non-standard way or
another...

unless you mean to say: c.l.c is only for theory, not for actual practice...

No, ISO C (and therefore, c.l.c) _is_ for practice. That's _why_ we
don't assume that All The World Is A Bitty-Box. In particular in the
present thread, which was about _portably_ implementing standard library
functions. Of course you can implement every single library function on
any system if you're allowed to use extensions. Since most systems have
at least one implementation which allows you to use some (but wildly
varying) kind of assembler from within C, C-plus-any-extension-you-
desire is trivially able to do anything which assembler can, including
implementing the Standard Library. But that wasn't the point of this
thread.

on windows, I use malloc.

To implement malloc() itself? Nice. For the next trick, please lift
yourself clear off the ground.

Richard
 
CBFalconer

Richard said:
.... snip ...


To implement malloc() itself? Nice. For the next trick, please
lift yourself clear off the ground.

That is definitely a bootstrap :) (or hydrogen)
 
Kenneth Brody

Richard said:
Flash Gordon said:


<pedant>
Nope, none can - because all the standard library functions have names
that are reserved for use by the implementation.
</pedant>

Well, you could call them my_memset() and my_memcpy(). But, could
one write a portable my_malloc() which properly handles alignment?

Actually, I suppose the answer is "yes, as long as you edit a header
file to properly #define my_ALIGNMENT as needed". But, can you do
it in such a way that the system self-determines the value? (Even
if that "self determination" involves running another program which
generates the #define for you.)
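
One way such a generator might work is the `offsetof` probe sketched below: put each candidate type after a single `char` in a struct and see how much padding the compiler inserts. The list of probed types and all the names are assumptions, and the result is only as good as that list, since standard C (before C11) offers no way to enumerate every type or to ask for the strictest alignment directly.

```c
#include <stddef.h>
#include <stdio.h>

/* One probe struct per candidate type. The offset of x is, in practice,
   the alignment the compiler chose for that type. */
struct probe_long    { char c; long x; };
struct probe_double  { char c; double x; };
struct probe_ldouble { char c; long double x; };
struct probe_voidp   { char c; void *x; };
struct probe_funcp   { char c; void (*x)(void); };

static size_t max_size(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* Largest alignment seen among the probed types (a guess, not a guarantee). */
size_t my_guess_alignment(void)
{
    size_t a = 1;

    a = max_size(a, offsetof(struct probe_long, x));
    a = max_size(a, offsetof(struct probe_double, x));
    a = max_size(a, offsetof(struct probe_ldouble, x));
    a = max_size(a, offsetof(struct probe_voidp, x));
    a = max_size(a, offsetof(struct probe_funcp, x));
    return a;
}

/* Emit the #define for inclusion in a header; returns 0 on success. */
int my_write_alignment_header(FILE *out)
{
    unsigned long a = (unsigned long)my_guess_alignment();

    return fprintf(out, "#define my_ALIGNMENT %lu\n", a) < 0 ? -1 : 0;
}
```

A build step could call `my_write_alignment_header` from a small `main` and redirect the output into the header that `my_malloc()` includes.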

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
 
Peter J. Holzer

pete said:
santosh said:
So as an academic exercise, which other standard library functions
share the same property as malloc, that the ordinary programmer is
powerless to write an equivalent function without dipping into non-
Standard implementation details?
[...]
isdigit can be implemented.
It is not locale dependent.

The same applies to isxdigit, and (without looking it up) probably
isspace - and perhaps one or two others in the same vein.

Only if you stick to the basic execution character set. There may be
other digits, xdigits, whitespace characters, etc.

OTOH, all the isx... and other locale-dependent functions can be
implemented in portable C by moving the knowledge about the locales to
files which are read at runtime. The filenames could be passed
via the environment to avoid needing knowledge about the filesystem
layout.
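
A rough sketch of that idea for a single classification function. The file format (one flag byte per character code), the environment variable name `MY_CTYPE_SPACE`, and the function names are all invented for illustration.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One flag byte per unsigned char value; all zero until a table is loaded. */
static unsigned char my_space_table[UCHAR_MAX + 1];

/* Read a complete table from an open binary stream; returns 0 on success.
   The live table is only replaced if the whole file was read. */
int my_load_space_table(FILE *in)
{
    unsigned char tmp[UCHAR_MAX + 1];

    if (fread(tmp, 1, sizeof tmp, in) != sizeof tmp)
        return -1;
    memcpy(my_space_table, tmp, sizeof tmp);
    return 0;
}

/* Locate the table through the environment (hypothetical variable name),
   so no knowledge of the filesystem layout is compiled in. */
int my_load_space_table_from_env(void)
{
    const char *name = getenv("MY_CTYPE_SPACE");
    FILE *in;
    int rc;

    if (name == NULL)
        return -1;
    in = fopen(name, "rb");
    if (in == NULL)
        return -1;
    rc = my_load_space_table(in);
    fclose(in);
    return rc;
}

/* Table lookup; EOF and out-of-range values are never whitespace. */
int my_isspace(int c)
{
    if (c < 0 || c > UCHAR_MAX)
        return 0;
    return my_space_table[c];
}
```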

hp
 
Richard Heathfield

Peter J. Holzer said:
Only if you stick to the basic execution character set. There may be
other digits, xdigits, whitespace characters, etc.

I just looked up isspace, and it is indeed locale-dependent. But isdigit
and isxdigit are not. The isdigit function is required to return true
(non-zero) for *only* the ten decimal digit characters '0' through '9',
regardless of which character set is in use. For any other character,
it must return 0, regardless of the locale or the character set in use.
The isxdigit function is required to return true (non-zero) for the ten
decimal digit characters, and for 'a', 'b', 'c', 'd', 'e', 'f', 'A',
'B', 'C', 'D', 'E', and 'F'. For any other character, it must return 0,
regardless of the locale or the character set in use.
<snip>
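
Given those requirements, both functions can be written in portable C. The sketch below uses hypothetical `my_` names. The range test for digits relies on the standard's guarantee that '0' through '9' have consecutive values; no such guarantee exists for letters, hence the `switch`.

```c
/* True only for the ten decimal digits, in any locale or character set. */
int my_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

/* True for decimal digits and for a-f / A-F, listed explicitly because
   letter codes are not required to be contiguous. */
int my_isxdigit(int c)
{
    switch (c) {
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
        return 1;
    default:
        return my_isdigit(c);
    }
}
```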
 
Peter J. Holzer

Richard Heathfield said:
I just looked up isspace, and it is indeed locale-dependent. But isdigit
and isxdigit are not. The isdigit function is required to return true
(non-zero) for *only* the ten decimal digit characters '0' through '9',
regardless of which character set is in use. For any other character,
it must return 0, regardless of the locale or the character set in use.
The isxdigit function is required to return true (non-zero) for the ten
decimal digit characters, and for 'a', 'b', 'c', 'd', 'e', 'f', 'A',
'B', 'C', 'D', 'E', and 'F'. For any other character, it must return 0,
regardless of the locale or the character set in use.

You are right. That means that a hypothetical C implementation with a 16
bit char and the Unicode BMP as the execution character set would have
the problem that isdigit doesn't map to the unicode Nd (Numeric,
decimal) property.

I'm also not sure if the wording for iswdigit "The iswdigit function
tests for any wide character that corresponds to a decimal-digit
character (as defined in 5.2.1)." allows more than 10 digits (i.e.,
several wide characters could correspond to each decimal digit). glibc
doesn't seem to think so.

hp
 
