Increasing efficiency in C

A

Arthur J. O'Dwyer

Mike Wahler said:
jacob navia said:
"Mike Wahler" <[email protected]> a écrit...

I think you're proposing C++. Rather than try to 'reinvent' it,
I just use it.

Well I can't use it Mike. [...] Just too complex.
I use the parts I find useful, discard the rest.
The crux of the matter is knowing when to stop. When a feature
becomes a nuisance, and doesn't simplify the task it is better
to drop it.

Or ignore it. Simple, huh?

The problem is you just can't ignore them, because on a project of
any size others will start to use them.

Solution 1: Don't care what the other guy is doing; that's *his*
part of the project, and you only need to know the interfaces. Make
sure the interfaces are all written using standard C types and
passing mechanisms, so that modules can talk to each other with
some sort of reliability.
Solution 2: Tell the other guy up front that he shouldn't use
the complicated parts of C++. Or better, get your boss to tell
him.
Solution 3: Tell the other guy that he can't use C++, period.
Then *you* use C++, and compile it so that it links with C code.
Link it with the other guy's module (written in nice readable C).
Do you think it's practical to use a subset of C++ for anything
outside of your own personal code, or maybe code developed by a couple
of people. Maybe in an organisation with very strict procedures.

Eh, probably not. ;-) But I think it's useless to say that
C++ is less practical than C; it's going to get used anyway,
because it really does make some things easier. std::string and
std::map are my friends, and I *do* use them when it's appropriate.
The main reason I don't use C++ for everything is that the STL
methods have such weird names; I have to keep a reference open
on my desktop whenever I'm doing anything with std::map!

-Arthur,
heretic
 
J

jacob navia

----- Original Message -----
From: "Arthur J. O'Dwyer" <[email protected]>
Newsgroups: comp.lang.c
Sent: Thursday, March 04, 2004 9:31 PM
Subject: Re: Increasing efficiency in C

The main reason I don't use C++ for everything is that the STL
methods have such weird names; I have to keep a reference open
on my desktop whenever I'm doing anything with std::map!

-Arthur,
heretic

I am writing map_string. Will take a function and return a string built with
the results of applying the function to each character.

I would like to see your specs. It *can* be ritten in C!

jacob
 
R

Roc

Dan said:
As I recall, Turbo Pascal stored the size of the string in the first
character slot, thus limiting the max string length to 255 (on the
DOS-based platform I used).

Not very handy if you needed longer strings than that.


Brian Rodenborn

As I recall, that was more convention for arrays than it was stipulated by
the language, wasn't it?
 
A

Alan Balmer

----- Original Message -----
From: "Arthur J. O'Dwyer" <[email protected]>
Newsgroups: comp.lang.c
Sent: Thursday, March 04, 2004 9:31 PM
Subject: Re: Increasing efficiency in C



I am writing map_string. Will take a function and return a string built with
the results of applying the function to each character.

I would like to see your specs. It *can* be ritten in C!
IIRC, that's an example in Koenig's "pitfalls" book.
 
M

Mike Wahler

Peter Ammon said:
jacob said:
As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.

This is extremely inefficient because at each query of the
length of the string, the computer starts an unbounded
memory scan searching for a zero that ends the string.

A more efficient representation is:

struct string {
size_t length;
char data[];
};

The length operation becomes just a memory read.
This would considerably speed the programs. The basic
idea is to use a string type that is length prefixed and
allows run-time checking against UB: undefined
behavior.

Comparing strings is speeded up also because when
testing for equality, the first length comparison tells
maybe the whole story with just a couple of
memory reads.

A string like the one described above is not able to
resize itself. Any pointers to it would cease to be valid
when it is resized if the memory allocator is forced to
move memory around. The block where that string was
allocated is bounded by another blocks in memory, and
it is not possible to resize it.

A pointer ( an indirect representation) costs a sizeof(void *)
but allows to resize strings without invalidating the pointers
to them.

struct string {
size_t length;
char *data;
};

There is no compelling reason to choose one or the other.
It depends on the application. In any case, the standard
library could be complemented by
Strcmp
Strcpy
etc., all using length prefixed strings.

Syntactic sugar.

I have added some sugar to this coffee. I always liked coffee
with a bit of sugar. I feel that is too acid without it.

Current strings are used using the [ ] notation. This strings
could have the same privilege isn't it?

The language extension I propose is that the user has the right to
define the operation [ ] for any data type he/she wishes.

Not a big deal for today's compilers.

Length checked strings can then use:

String s;
...
s[2] = 'a';

I think I am proposing the obvious.

Do you agree?

jacob

I don't understand why everyone is comparing this to C++. The obvious
parallel is to Pascal, which used exactly this sort of string
representation.

I believe I was the first in this thread to mention C++.
I did so 1) because I'm familiar with it, 2) because
Jacob seems to be clamoring for the 'safety' and
'intelligence' which is built into C++'s 'std::string'
type.

As for his remarks about C, it seems he wants to put
training wheels on a Harley-Davidson racing motorcycle.
No thanks. I'd certainly wear a helmet (take precautions),
but *I* will decide how far I should lean into the turns.

I could have easily said BASIC instead. I used C++ as
an example, not necessarily as a 'cure-all' (I actually
use C far most often than other languages in my production
work, for a variety of reasons).


-Mike
 
J

jacob navia

Mike Wahler said:
As for his remarks about C, it seems he wants to put
training wheels on a Harley-Davidson racing motorcycle.

Yes. You got that 100%.

As you may know, even Harley-Davidson drivers weren't born
knowing how to drive those beasts.

They have to learn as anybody else.

At one time we were all beginners isn't it?

Training wheels are very useful since they allow to train
yourself using the machine without doing any
harm.

Undefined behavior, passing red lights without stopping
and all kinds of bad driving are to be actively eliminted.

This requires more training.
No thanks. I'd certainly wear a helmet (take precautions),
but *I* will decide how far I should lean into the turns.

Yes. But when you lean too far the machine should have
a safety net isn't it?

I know a computer crash is harmless compared to
a Harley Davidson crash, at least, nothing serious
happens to you even if you crash at full C speed.
:)
I could have easily said BASIC instead. I used C++ as
an example, not necessarily as a 'cure-all' (I actually
use C far most often than other languages in my production
work, for a variety of reasons).

Well, I think that when driving a computer the
machine should have a safe environment. You can
drive even a Harley Davidson safely.

Specially with a fast machine is easy to lean too far,
as you know.

I prefer safer environments. Risk taking is boring at the
end. Why keep bugs around for years?

Above all:

Why can't be C conceived as an evolving language like
any other?

Are we stuck with those strings forever or what?

jacob
 
J

jacob navia

Alan Balmer said:
IIRC, that's an example in Koenig's "pitfalls" book.

C is a pitfall then.

I know this opinion is widespread among some people.
Specially in C++ circles :)

Why can't be map be written in C?

I wrote my first map for a lisp interpreter, I wrote in
C around 1990.

The idea that map can't be written in C is absurd. You pass
to a function each element in a sequence. You can obtain (in
one of the possible versions) a similar list or vector, that is
a map of the function applied to each char.

Of course THAT map has not all the bells and whistles
of C++ and that is precisely the point. It can be written
in C, not in C++.

Of course C can't write C++. That is precisely what
makes C interesting.

A mapping function is no longer that complicated.
Apply a function to a container in sequence.

Very simple.

jacob
 
M

Martin Ambuhl

Default said:
Dan Pop wrote:




As I recall, Turbo Pascal stored the size of the string in the first
character slot, thus limiting the max string length to 255 (on the
DOS-based platform I used).

Not very handy if you needed longer strings than that.

This was the scheme implemented earlier in a number of languages. For
example, the MU-BASIC provided with RT-11 (for working scientists to
write real-time programs on the PDP-11 with a minimum of learning costs)
had string variables with element 0 containing the size and the indices
of the characters being 1-255.
MU-BASIC also provided virtual arrays, wherein an array's elements might
be stored on the disk rather than in memory and arrays that _had_ to be
in memory could be kept there.
 
A

Arthur J. O'Dwyer

C is a pitfall then.

I know this opinion is widespread among some people.
Specially in C++ circles :)

Why can't be map be written in C?

:) You missed the point. Your pitfall was not in your assumption
that "X can always be written in C," but rather in your assumption
that "because A does not use C for X, he must think that X *cannot*
be done in C."
Here's an example of a situation in which I have used C++, even
though it *could* be done in C. Note that this is a throw-away
program, not a big application:

Given a phone number composed of decimal digits, and a dictionary
of English words such as /usr/dict, produce a list of plausible
mnemonics for the number according to a standard telephone keypad.
E.g., given the input number "278487," the program would produce a
list including "Arthur," "2-rug-up," and "2-suits."

This IMHO is much easier to hack together when given the building
blocks of std::string and std::map, than it would be in C.

The idea that map can't be written in C is absurd. You pass
to a function each element in a sequence. You can obtain (in
one of the possible versions) a similar list or vector, that is
a map of the function applied to each char.

That's not std::map. std::map is a container that MAPS things
onto other things. The functional-programming function 'map'
is something else entirely, and is probably duplicated somewhere
in C++'s <algorithm> or <functional> headers, I dunno and I duncare.

-Arthur
 
A

Arthur J. O'Dwyer

I am writing map_string. Will take a function and return a string
built with the results of applying the function to each character.

You *do* realize this is a one-liner, right?

void mapstr(char *d, char *s, int(*f)(int))
{
while (*s)
*d++ = f(*s++);
*d = '\0';
return;
}

Possible enhancements: Allow 'mapstr(s,t,0)' as a synonym
for 'strcpy(s,t)'. Control for the possibility that f(k)
equals zero for some k!=0. If d is NULL, allocate and return
space for the resulting string using 'malloc' or a static buffer.
<OT>
Implement 'foldl' and/or 'foldr' over strings (although 'foldr'
would probably be silly, and both really require template
programming to be useful, which C doesn't have).
</OT>

-Arthur
 
J

jacob navia

Dan Pop said:
In <[email protected]> "jacob navia"

You don't know where the pointer will end pointing to. Your wording
simply didn't make any sense to anyone but you.

The representation of a string in C is the sequence of characters, up to
and including the null terminator. No kind of pointer is involved in the
representation of a C string.

Wow Dan, this is news for me. No kind of pointer?

Not even a char * as it seems?

Strange. Are all those prototypes in string.h wrong?

I would fill a defect report.

Just do not exaggerate Dan. Let's keep cool ok?

I am speaking about a naked char * that points to the
start of a sequence o bytes that should end with a
terminating zero.

By definition of the data structure, its length is
not known, and the same scan must be repeated
each time we access the length.

More serious, the failure modes are quite horrible.

In writing mode a wild pointer is like a loaded
machine gun, ready to start shooting around
at random. Pieces of the program, essential data
like the return address are wiped by the gun,
without any way for the system to stop it.

The program is in an undeterminable state,
depending on the direction the machine gun was
shooting.

Ahh. How nice. We are fearful. We risk that but
it works you see?

*I* do not do any mistake, you say.

Well Dan, just keep cool.

I have no fear to recognize that I do make mistakes.
I am not a star programmer. I am a run of
the mill brain, that gets bored taking always this
new dangerous turn. Damm it. Can't the machine
do it for me?

You say:
Are you reading impaired or what? Which of them qualifies as popular?

Well, Microsoft proposed one recently. And there are several.
I can't tell you which are "popular" since I am not doing
that kind of research. But they are surely used.
No other scheme proved to by better in a GENERAL PURPOSE context.
As you admit yourself, the alternate libraries are designed for well
defined goals, rather than as universal replacements for the C strings.

Safety was one of the more widespread goals. I am trying to
build checked strings into lcc-win32. I think that a more
debuggable environment is easier to work with.
And the very existence of these libraries proves that the C language DOES
support alternate schemes. So, your point is moot.

The language doesn't support it.

I repeat that length prefixed strings should be easy to
use: name[2] should do what is supposed to.

My whole point is that data structure development should
be opened up to the C user that should be able to
specify data types that follow special rules he/she defines.

For instance you could add a "flags" field to the standard
length prefixed string, and implement read only
strings, or time stamp based data, or whatever.

The language should allow people defining programs
that handle the data structures in a way it suits them the
best.

C is not object oriented but we all use lists, stacks, hash tables
in our everyday programming.
1. This is not a performance issue.

No, this is a human performance issue. People get bored of
details. Computers do not.
People use computers to make repetitive work. Why can't
we use the computer to check for mistakes?

Your answer is:
2. This is a *general* problem of C: most C features are error prone in
the hands of the incompetent.

Your are competent Dan. Surely more than me.

I belong to the other ones.
The ones that make mistakes. I am not afraid of saying this,
maybe because I think knowing this is the start of
knowledge.

When you realize your mistakes you can start learning.
Only then.

:)

Of course not Dan. Sure. I believe you that 100%
Dynamic memory allocation has exactly the same problems: write beyond
a dynamically allocated memory bolck (in either direction) and memory
corruption will (most likely) bite you, sooner or later. What is your
better replacement for malloc and friends?

The garbage collector. I wrote one for my Lisp interpreter
in the 90ties, and I have adapted Mt Boehm's work to lcc-win32.

The GC is much better than malloc/free. But I know, that's
another discussion ...
C is a sharp tool *by design*. People who can't use sharp tools or are
afraid of them, should not use C. There are plenty of other programming
languages designed for them so there is no need to turn C into a less
sharp tool (and, therefore less effective in the hands of the competent
programmers) and annoy C's *intended* user base.

I want it to be sharper Dan. C is not sharp enough
with all those bugs that creep the programs.
You can't be sure of a tool if it is not designed to
be sharp and safe.

You take the knife not at the edge?

A knife is a sharp tool by its very nature.

But it can only be used because
you do not touch at the edge isn't it?

That blunt side, that provides safety for your hand
makes for a usable knife. Without it, using a knife
is cutting yourself in the fingers :)
There are many ways in which C needs to be extended, but adding more
string formats is not one of them. You're wasting your time trying to
fix something that isn't broken.

That is the start. A better string library would be an achievement.

Nothing spectacular, and very simple.
 
A

Alan Balmer

C is a pitfall then.

I know this opinion is widespread among some people.
Specially in C++ circles :)

Why can't be map be written in C?

You misunderstand. Of course it can be written in C, as illustrated in
Koenig's book. Who said otherwise?

As for the book title, "Traps and Pitfalls" books and articles have
been written for a number of languages and end-user programs. It does
not mean that C itself is a pitfall, just that there are pitfalls that
programmers can fall into, particularly novice programmers. This is
true of any language.
 
M

Malcolm

Dan Pop said:
The representation of a string in C is the sequence of characters, up > to
and including the null terminator. No kind of pointer is involved
in the representation of a C string.
This leads to the array pointer equivalence issue. Enough to say that
strings are passed around internally in C as naked char *s.
[ better libraries ]
Are you reading impaired or what? Which of them qualifies as
popular?
This is the crux of the issue. Every man and his dog writes a new C string
library. Part of the reason is that it's something a newbie programmer can
do, part is that it is so obvious that efficiency can be improved by passing
a length along with a pointer. However you need a standards body to make the
library stick, plus an end to the C convention that
foo("bar");
passes a char * to function foo.
There are many ways in which C needs to be extended, but adding > more
string formats is not one of them. You're wasting your time
trying to fix something that isn't broken.
I dunno. For my purposes C strings are perfectly adequate because all I
usually need to do is a trivial bit of text programming, maybe to get a
filename from a user or to print out a high score table. However if I was
writing a performance-critical webpage server that spits out html pages at a
great rate, then I might well want something better. A standard library of
high-perfomance string functions would then be nice.
 
J

jacob navia

Arthur J. O'Dwyer said:
You *do* realize this is a one-liner, right?

void mapstr(char *d, char *s, int(*f)(int))
{
while (*s)
*d++ = f(*s++);
*d = '\0';
return;
}

Possible enhancements: Allow 'mapstr(s,t,0)' as a synonym
for 'strcpy(s,t)'.

Interesting idea. The function is the identity function
by default. In the case of a copy, the function is

int fn(int c) { return c; } // identity function.

The call is optimized away of course, but it is still consistent.
Control for the possibility that f(k)
equals zero for some k!=0.

Using length prefixed strings this is not a problem.
Actually embedded zeroes can appear in a length
delimited string anywhere. They do not have any
meaning any more. Zeros are that: zeroes. There is
no "special" character.
If d is NULL, allocate and return
space for the resulting string using 'malloc' or a static buffer.

With length prefixed strings we know that the
length of the result is fixed, and allocation doesn't
imply a memory scan.
 
J

jacob navia

Arthur J. O'Dwyer said:
Given a phone number composed of decimal digits, and a dictionary
of English words such as /usr/dict, produce a list of plausible
mnemonics for the number according to a standard telephone keypad.
E.g., given the input number "278487," the program would produce a
list including "Arthur," "2-rug-up," and "2-suits."

Sorry but i do not get it.

What algorithm you use for mapping 278487 to "Arthur" ??

It is not that evident. I mean not the code in C++ but
an english description of a way of getting "Arthur"
from 278487 and not "Dan" or "whatever" :)

P.S. Please note that the programmer doesn't know your
phone number of course :)
 
N

Nick Landsberg

jacob navia wrote:

[ much snippage ]
Safety was one of the more widespread goals. I am trying to
build checked strings into lcc-win32. I think that a more
debuggable environment is easier to work with.




The language doesn't support it.

At the risk of being in over my depth, the language
supports any string representation you may dream up.
The "standard libraries" support one particular
reprsentation. This implementation was, I guess,
chosen by the standards committee in order to
be backwards compatible with the large number
of C-programs which were in existence prior to 89.

One is free to re-create all of the standard routines using
a different convention for representing strings
(e.g. char-count in s[0], some "magic cookie" means
to flash the character on an IE browser, etc., etc.)
and put this into an implementation-specific library
using different names than the standard specifies for
the "standard" library functions. That is currently
supported by the language, is it not?

[ more snipped ]
My whole point is that data structure development should
be opened up to the C user that should be able to
specify data types that follow special rules he/she defines.

It's already there. However, by convention, almost all people
use the routines in the standard. As Dan has said, there
have been attempts to do this in the past, but not
accepted by the vast majority of C programmers,
nor ported from one implementation to the next.

[ much snipped ]
That is the start. A better string library would be an achievement.

Nothing spectacular, and very simple.

But would it be backward compatible with tons of
code already out there? This may be an
intractible problem.
 
D

Dik T. Winter

>
> Meaning original Pascal programs had to be written for fixed-length
> strings, and each different fixed length would need a different set of
> functions. Ditto for arrays of other types. And verily, it suckéd
> mightily, and there was much wailing and gnashing of teeth.

The much wailing and gnashing of teeth was largely at the fixed array
length of other types than characters (the length was part of the type).
I have seen the implementing done of a numerical library in Pascal.
(Luckily I was only partly involved.)
> Forwhich
> conformant array schemas were invented. And verily, it did still suck
> mightily, but not nearly as mightily as before.

The version of Pascal from Minnesota University was much better than the
original, and also much better than the successor... But of course,
Pascal was written at a time when string handling was still minimal.

I have used only one language where an array was really a first-class
citizen: Algol 68, with all kinds of nifty possibilities. Like with:
'mode' 'complex' = 'struct'('real' re, im);
'complex' a[1:n, 1:m];
you could select:
a[,2].re
which would be an array reference to the real parts of the second
column of a, and you could use it anyplace where an array of reals
was needed. But I digress.
 
A

Arthur J. O'Dwyer

Sorry but i do not get it.

What algorithm you use for mapping 278487 to "Arthur" ??

Hmm. I would have expected my explanation to be a little opaque
to a Russian native, perhaps, but I honestly expected the French
telephone to follow the American mold [see diagram]. Do your telephones
have any letters on the keys? Which ones?


The standard American telephone keypad:

ABC DEF
1 2 3

GHI JKL MNO
4 5 6

PRS TUV WXY
7 8 9

OPER
* 0 #

-Arthur
 
R

Richard Bos

Ben Pfaff said:
sizeof(size_t) == sizeof('\0') is very common. For example, it
is true on most "32-bit" systems.

Maybe you meant sizeof(size_t) == sizeof(char).

Er, yes, of course I did <smacks self>.

Richard
 
R

Richard Bos

jacob navia said:
Yes. You got that 100%.

As you may know, even Harley-Davidson drivers weren't born
knowing how to drive those beasts.

They have to learn as anybody else.

Yeah. But they don't do so on a full-powered Harley, they do so on a
250cc Toyota. If you really tried to put training wheels on a Harley,
you'd be found dead the day after. In four different dumpsters. You just
be grateful you're dealing with programmers, not with Hell's Angels -
the worst you can expect here is ridicule, and the realisation that you
don't know what the blazes you're on about.

Richard
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,046
Latest member
Gavizuho

Latest Threads

Top