Return the start of a substring in a string in C


Eric Sosman

Malcolm McLean wrote On 07/17/07 17:06,:
[...] However it is obvious that C has too many integer types; short, int, long, long long, in signed and unsigned, size_t and ptrdiff_t.

Drat! My obviousness detector must be on the fritz
again. Probably forgot to pay my platitude purveyor.
That's ten standards for
representing an integer.

... to accompany three looping constructs, two ways
to exit, and a partridge in a pear tree. What's wrong
with an expressive vocabulary? Are you an aficionado of
Newspeak, perchance?
We also know that standardisation tends to work.

Hence the QWERTY keyboard and the NTSC signal. Not
to mention VHS, the way English is spel{led,t}, Daylight
Saving Time, and foot binding. By definition, "standard"
is never "superior."
You can't just argue by analogy, of course, but don't ignore the lessons of
history.

The lessons of history *are* analogy.
 

Malcolm McLean

Eric Sosman said:
Malcolm McLean wrote On 07/17/07 17:06,:
[...] However it is obvious that C has too many integer types; short, int, long, long long, in signed and unsigned, size_t and ptrdiff_t.

Drat! My obviousness detector must be on the fritz
again. Probably forgot to pay my platitude purveyor.
That's ten standards for
representing an integer.

... to accompany three looping constructs, two ways
to exit, and a partridge in a pear tree. What's wrong
with an expressive vocabulary? Are you an aficionado of
Newspeak, perchance?
The difference is that the integer representation is the way that functions
talk to each other.
No one minds ten different kinds of kettles in the shop. However when you've
got ten different standards for plugs it becomes a real problem.
 

Ian Collins

Malcolm said:
Eric Sosman said:
Malcolm McLean wrote On 07/17/07 17:06,:
[...] However it is obvious that C has too many integer types; short, int, long, long long, in signed and unsigned, size_t and ptrdiff_t.

Drat! My obviousness detector must be on the fritz
again. Probably forgot to pay my platitude purveyor.
That's ten standards for
representing an integer.

... to accompany three looping constructs, two ways
to exit, and a partridge in a pear tree. What's wrong
with an expressive vocabulary? Are you an aficionado of
Newspeak, perchance?
The difference is that the integer representation is the way that
functions talk to each other.
No one minds ten different kinds of kettles in the shop. However when
you've got ten different standards for plugs it becomes a real problem.
Unless you have 10 standards for sockets.

Try controlling real hardware with only one integer type.
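A memory-mapped status register, for example, is a fixed 8 or 16 bits wide no matter what the CPU's word size is. A sketch, with the address and bit layout invented:

#include <stdint.h>

/* hypothetical 8-bit UART status register at a made-up address */
#define UART_STATUS (*(volatile uint8_t *)0x40001000u)
#define TX_READY    0x01u

void wait_for_tx(void)
{
    while (!(UART_STATUS & TX_READY))
        ;   /* each poll must be exactly an 8-bit read; a 64-bit-only
               read would drag in the neighbouring registers */
}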
 

Malcolm McLean

Ian Collins said:
Malcolm said:
Eric Sosman said:
Malcolm McLean wrote On 07/17/07 17:06,:
[...] However it is obvious that C has too many integer types; short, int, long, long long, in signed and unsigned, size_t and ptrdiff_t.

Drat! My obviousness detector must be on the fritz
again. Probably forgot to pay my platitude purveyor.

That's ten standards for
representing an integer.

... to accompany three looping constructs, two ways
to exit, and a partridge in a pear tree. What's wrong
with an expressive vocabulary? Are you an aficionado of
Newspeak, perchance?
The difference is that the integer representation is the way that
functions talk to each other.
No one minds ten different kinds of kettles in the shop. However when
you've got ten different standards for plugs it becomes a real problem.
Unless you have 10 standards for sockets.
So we've got ten different plugs, and ten different matching sockets. If
your kitchen socket doesn't match the plug of the kettle you want, you can
buy an adapter, but there are ninety of them.
Try controlling real hardware with only one integer type.
That's partly why I insist on separating IO from logic. It is a lot easier
said than done.
But code that interacts with hardware devices using bits is inherently
non-portable anyway. It won't become any less portable if you use a
platform-specific extension.
 

Flash Gordon

Malcolm McLean wrote, On 18/07/07 07:40:
Ian Collins said:
Malcolm said:
Malcolm McLean wrote On 07/17/07 17:06,:
[...] However it is obvious that C has too many integer types; short, int, long, long long, in signed and unsigned, size_t and ptrdiff_t.

Drat! My obviousness detector must be on the fritz
again. Probably forgot to pay my platitude purveyor.

That's ten standards for
representing an integer.

... to accompany three looping constructs, two ways
to exit, and a partridge in a pear tree. What's wrong
with an expressive vocabulary? Are you an aficionado of
Newspeak, perchance?

The difference is that the integer representation is the way that
functions talk to each other.
No one minds ten different kinds of kettles in the shop. However when
you've got ten different standards for plugs it becomes a real problem.
Unless you have 10 standards for sockets.
So we've got ten different plugs, and ten different matching sockets. If
your kitchen socket doesn't match the plug of the kettle you want, you
can buy an adapter, but there are ninety of them.

Actually, having different types of plugs and sockets for different purposes is very useful. It prevents you from plugging the wrong plug into the
wrong socket. This is also why in some countries they use different
plugs for different power ratings to ensure you do not overload the
current (we even do it in the UK still for certain specific purposes,
mainly industrial, which is appropriate since so much C is written
commercially).
That's partly why I insist on separating IO from logic. It is a lot
easier said than done.
But code that interacts with hardware devices using bits is inherently
non-portable anyway.

Where did Ian mention bits? A vast amount of I/O is integer information
of an appropriate width.
It won't become any less portable if you use a platform-specific extension.

Actually, it does. A number of problems have a finite limit on the width
of data it will ever be worth using, so you use the same unsigned
integer type throughout the calculation, which gives you more range than can ever be useful because of physics, and suddenly all of your code is
non-portable instead of just the I/O code. Or you simply have a
processor that cannot efficiently process a larger integer type, and you
cannot change to a bigger processor because of power and/or thermal
restrictions (or you care about global warming and so simply do not want
to use something that uses more power than required and takes more
energy to manufacture than required).

Also there is the problem of I/O bandwidth on, for example, some
database applications which are I/O bound, so you want your data type in
the database to be the smallest type large enough to avoid slowing
things down. Or you are dealing with a SQL database which provides a 32 bit integer type (e.g. MS SQL, Oracle, MySQL, PostgreSQL...). Or you are
dealing with interfacing to another language that has a 32 bit integer type.

I've only worked in a few industries in over 20 years, but in most of
those for most of the time using an integer type larger than 32 bits (or
often 16 bits, and sometimes 8 bits) would be a waste of resources (time/money/power/make-the-project-fail); only occasionally have I needed a 64 bit integer type.

Maybe having multiple types is something some programmers cannot cope
with, but for those programmers there are typeless languages.
 

Ben Bacarisse

Malcolm McLean said:
Engineering is psychological as well as physical. When you are dealing
with human social behaviour, you don't get exactly the same situation
twice.
For instance the Dutch tulip mania and the dot com bubble had some
similarities,

More analogies. I'd rather see a C program that is hard to write
unless "int is 64 bits on 64 bit processors" (if that remains your now
rather narrow claim). A study showing what effect your proposed changes might have on existing code bases (e.g. Linux, FreeBSD, OpenSSL...) on 64 bit machines would also do more good than an appeal
to Whitworth thread sizing.
However it is obvious that C has too many integer types;

Ah. If that means it is not debatable, then I have nothing more to
add.
 

Eric Sosman

Malcolm McLean wrote On 07/17/07 18:19,:
Malcolm McLean wrote On 07/17/07 17:06,:
[...] However it is obvious that C has too many integer types; short, int, long, long long, in signed and unsigned, size_t and ptrdiff_t.

Drat! My obviousness detector must be on the fritz
again. Probably forgot to pay my platitude purveyor.

That's ten standards for
representing an integer.

... to accompany three looping constructs, two ways
to exit, and a partridge in a pear tree. What's wrong
with an expressive vocabulary? Are you an aficionado of
Newspeak, perchance?

The difference is that the integer representation is the way that functions
talk to each other.

First: Functions do not "talk to each other" through
integers alone. (If you doubt this, please explain how
strstr() communicates with its caller.) Second: Even if
integers were the official Esperanto of functiondom, why
should the language have only one word?
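strstr(), which incidentally answers the question in this thread's subject line, replies to its caller with a pointer, not an integer. A minimal illustration:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *hay = "return the start of a substring";
    char *hit = strstr(hay, "start");  /* pointer into hay, or NULL */
    if (hit != NULL)
        printf("found at offset %td\n", hit - hay);  /* %td: C99's ptrdiff_t */
    return 0;
}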
No one minds ten different kinds of kettles in the shop. However when you've
got ten different standards for plugs it becomes a real problem.

Most people think it a Good Thing, not "a real problem,"
that Ethernet cables won't plug into the power mains.
 

Malcolm McLean

Eric Sosman said:
First: Functions do not "talk to each other" through
integers alone. (If you doubt this, please explain how
strstr() communicates with its caller.) Second: Even if
integers were the official Esperanto of functiondom, why
should the language have only one word?
Take this:
void getcursor(unsigned char *x, unsigned short *y)

Here we are specifying that caller and callee shall communicate with each
other using a certain standard for specifying integer, namely the C short
representation. If caller passes the address of an int the compiler ought to
complain. More subtly, if he passes the address of a sint16 then he is also
committing an error.

There is no reason for using different representations of integers other
than machine efficiency. They are all just whole numbers. However on a 64
bit machine, 64 bit integers are efficient.
 

Eric Sosman

Malcolm McLean wrote On 07/18/07 17:02,:
Take this:

Why not the explanation of how strstr() and its caller
"talk to" each other via integers? Do you find the answer
an Inconvenient Truth?
void getcursor(unsigned char *x, unsigned short *y)

Here we are specifying that caller and callee shall communicate with each
other using a certain standard for specifying integer, namely the C short
representation. If caller passes the address of an int the compiler ought to
complain. More subtly, if he passes the address of a sint16 then he is also
committing an error.

The compiler *will* complain, in both cases, assuming
a prototype is in scope at the point of the call.
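For instance, a deliberately wrong call:

void getcursor(unsigned char *x, unsigned short *y);

void caller(void)
{
    int cx;
    unsigned short cy;
    getcursor(&cx, &cy);   /* int * where unsigned char * is expected:
                              the compiler must issue a diagnostic */
}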
There is no reason for using different representations of integers other
than machine efficiency. They are all just whole numbers. However on a 64
bit machine, 64 bit integers are efficient.

One of the biggest problems in modern machine design
is dealing with the fact that the CPU reads and writes data
much faster than the memory can deliver and absorb it. Even
with multiple levels of very expensive cache to buffer the
speed mismatch, a CPU running at "100% utilization" is often
spending more than half its cycles in wait state, just idling
while the memory system trudges slowly along.

Using 16-bit shorts instead of your 64-bit whatevers, I
can get four times as many numbers to and from memory per
cache transaction. Between memory stalls, my CPU executes
four times as many instructions as yours -- or to turn it
around, my CPU can get through a bazillion computations with
one-quarter the number of memory stalls yours will incur.

... and you claim your CPU is running "efficiently?"
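Put numbers on it, assuming a typical 64-byte cache line:

#include <stdint.h>

int16_t samples16[32];  /* 32 values per 64-byte cache line */
int64_t samples64[8];   /*  8 values in the same 64 bytes   */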

Real-world example: I'm transcribing some of my ancient
LP vinyl records onto CD via my home computer's sound card.
The CD recording format specifies 16-bit samples, and that's
what the sound card delivers, at 44100 samples per second
for each of two stereo channels. One side of an LP runs
about forty minutes, so my sound card produces about 400MB
per side. I've got 1.5GB of RAM, so even with Microbloat
and lots of other goo in the system there's still plenty of
room to soak up the data.

But with your "efficiency" ideas, I'd have 1600MB to
deal with instead. Not too terrible, because the data rate
of 40MB/s is probably something my disk can handle -- it'd
be nicer if I only had to worry about 10MB/s, but Malcolm
says it's better to spend the extra 30MB/s to make sure I've
got plenty of copies of the sign bits.

... and you claim this is "efficient?"
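The arithmetic, for anyone who wants to check it:

#include <stdio.h>

int main(void)
{
    long rate = 44100, channels = 2, seconds = 40 * 60;
    long mb16 = rate * channels * 2 * seconds / (1024L * 1024);
    long mb64 = rate * channels * 8 * seconds / (1024L * 1024);
    printf("16-bit samples: about %ld MB per LP side\n", mb16);  /* ~403  */
    printf("64-bit samples: about %ld MB per LP side\n", mb64);  /* ~1614 */
    return 0;
}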

Then comes the editing, cleanup, track separation, and
so on, where the recorded data gets read in and massaged by
software. Instead of a sound file that fits entirely in
memory, I've got a file that cannot be processed without
going back and forth to the page device all the time. Now
instead of waiting (say) 200ns to get data from RAM, my
CPU is waiting (say) 10ms to get it from disk -- a small
matter of fifty thousand times longer.

... and you claim this is "efficient?"

Dogma has overtaken reason.
 

Flash Gordon

Eric Sosman wrote, On 18/07/07 22:50:
Malcolm McLean wrote On 07/18/07 17:02,:

Why not the explanation of how strstr() and its caller
"talk to" each other via integers? Do you find the answer
an Inconvenient Truth?

I thought it was a good point as well. Shame we've not got an answer.
The compiler *will* complain, in both cases, assuming
a prototype is in scope at the point of the call.

You forgot that a lot of types exist to indicate different usage. This is
more useful in a more strongly typed language, but it still documents
usage and allows for type change if appropriate. Although in this case I
would expect to see either
void getcursor(struct coord *cursor)
or
struct coord getcursor(void)

No integer in sight.
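Fleshed out, with invented names:

struct coord {
    int x;
    int y;
};

struct coord getcursor(void)
{
    struct coord c;
    c.x = 0;   /* filled in from wherever the cursor state lives */
    c.y = 0;
    return c;
}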
One of the biggest problems in modern machine design
is dealing with the fact that the CPU reads and writes data
much faster than the memory can deliver and absorb it. Even

... and you claim this is "efficient?"

Dogma has overtaken reason.

Indeed it has with Malcolm. It's not even a very good dogma in the first
place.
 

Richard

Malcolm McLean said:
Take this:
void getcursor(unsigned char *x, unsigned short *y)

Here we are specifying that caller and callee shall communicate with
each other using a certain standard for specifying integer, namely the
C short representation. If caller passes the address of an int the
compiler ought to complain. More subtly, if he passes the address of a
sint16 then he is also committing an error.

There is no reason for using different representations of integers
other than machine efficiency. They are all just whole
numbers. However on a 64 bit machine, 64 bit integers are efficient.

Different representations are used, if for nothing else, to aid
readability and mimic stronger typing.

In addition, when you have a certain "type" represented by "int" then to
give it its own "define" or typedef greatly simplifies things later if
and when you wish to change that particular type (e.g. move to a long or
even a struct).

There is even the issue of context help. What IS this variable? One hot key later, the "type" (e.g. GTK_SCREENCOORD) takes you to a detailed
explanation of the variable usage you are likely to encounter.
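For example (the typedef name is invented, in the spirit of the above):

/* one line to touch if screen coordinates ever outgrow int */
typedef int screencoord;

void getcursor(screencoord *x, screencoord *y);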
 

Richard Bos

Malcolm McLean said:
Take this:
void getcursor(unsigned char *x, unsigned short *y)

I'd rather not. That is a wrongly designed interface. No wonder you're
scared of types, if this is how you habitually abuse them.
Here we are specifying that caller and callee shall communicate with each
other using a certain standard for specifying integer, namely the C short
representation. If caller passes the address of an int the compiler ought to
complain. More subtly, if he passes the address of a sint16 then he is also
committing an error.

There is no such thing as a sint16, so yes, passing that would be wrong.
There is no reason for using different representations of integers other
than machine efficiency.

Wrong _again_.
They are all just whole numbers. However on a 64
bit machine, 64 bit integers are efficient.

And we all have 64 bit machines, right? Guess what: wrong.

As long as you keep approaching C as if it were badly spelled BASIC, you
will never cure yourself of your polyintegrophobia.

Richard
 

Malcolm McLean

Richard Bos said:
And we all have 64 bit machines, right? Guess what: wrong.
The campaign for 64 bit ints campaigns for 64 bit ints on 64 bit machines,
maintaining the convention that int is the natural integer type for the
platform. On 32 bit machines it is accepted that int should be 32 bits, just
as it should be 16 bits on small machines with 16 bit registers.

The idea is that an integer can be used to index any array. (It is not quite
true because an array of chars that fills 2^63 + 1 bytes cannot be so
indexed, but that exception we will tolerate).
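In code, it is simply the familiar idiom:

/* plain int indexes the array, whatever the element type */
double total(const double *a, int n)
{
    double s = 0.0;
    int i;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}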
 

Malcolm McLean

Eric Sosman said:
I've got 1.5GB of RAM, so even with Microbloat
and lots of other goo in the system there's still plenty of
room to soak up the data.
Because you've got a 32 bit machine, or one that is just transitioning. 64 bit machines can have more than 2GB of memory; memory only costs about $100 a gigabyte, and the price has been falling for the past thirty years.
But with your "efficiency" ideas, I'd have 1600MB to
deal with instead. Not too terrible, because the data rate
of 40MB/s is probably something my disk can handle -- it'd
be nicer if I only had to worry about 10MB/s, but Malcolm
says it's better to spend the extra 30MB/s to make sure I've
got plenty of copies of the sign bits.

... and you claim this is "efficient?"
There is always a marginal case where getting an order of two improvement
actually makes the program twice as good, or allows it to work where
otherwise it would run out of resources. However it is not a good
programming strategy to stress a general purpose computer like that, if you
can possibly avoid it. If the user wants to run two copies of your program
at once, he's stuck. Ditto if he wants to run a video telling him how to use the system. Then the program won't remain marginal for long; soon he'll upgrade and all your micro-optimisation will then be so much wasted effort.
Sometimes of course it really can't be avoided, as with a games console
where you must fill memory and run the polygon engine to within a few
percentage points of its capacity. If we did ban all other integer types
you'd have to fake up 16 bit integers with logical ops, but that isn't the
proposal for now. They will remain for exceptional use, such as storing raw
audio samples.
 

santosh

Malcolm McLean said:
The campaign for 64 bit ints campaigns for 64 bit ints on 64 bit
machines, maintaining the convention that int is the natural
integer type for the platform. On 32 bit machines it is accepted
that int should be 32 bits, just as it should be 16 bits on small
machines with 16 bit registers.

If you don't like abstract types, why not program in assembler? C is
what it is, and it's not going to change anytime soon.

I don't understand what exactly you are "campaigning" for: int to be
the only integer type in C, or for 64 bit platforms to become
mainstream?

Both are quite narrow-minded wishes. Different types exist because
they serve a felt need, as do different processors.
 

Flash Gordon

Malcolm McLean wrote, On 19/07/07 21:27:
The campaign for 64 bit ints campaigns for 64 bit ints on 64 bit
machines, maintaining the convention that int is the natural integer
type for the platform. On 32 bit machines it is accepted that int should
be 32 bits, just as it should be 16 bits on small machines with 16 bit
registers.

Well, a small chip manufacturer called Intel seems to have a different opinion: http://download.intel.com/design/Itanium/Downloads/24535803.pdf
I'm a bit more inclined to trust their opinion than yours, since I
suspect they know a bit more about how to get the best out of processors
than you do.
The idea is that an integer can be used to index any array. (It is not
quite true because an array of chars that fills 2^63 + 1 bytes cannot be
so indexed, but that exception we will tolerate).

You've got an integer type for that already in C, it is called size_t.
Personally, I don't find 6 character names to be too long.
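It already does exactly the job being asked for:

#include <stddef.h>

/* size_t is guaranteed wide enough to index any object */
double total(const double *a, size_t n)
{
    double s = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}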
 

Eric Sosman

Malcolm McLean wrote On 07/19/07 16:38,:

(... still ducking the question about how strstr()
and its caller "talk to each other" with integers ...)
Because you've got a 32 bit machine, or one that is just transitioning. 64
bit machines can have more than 2GB of memory, which only costs about $100 a
gigabyte and has been falling for the past thirty years.

Not an issue, because even within the 32-bit scheme
there's still room to add 167% more memory to the machine
I've already got. Of course, it would cost me: the DIMM
slots are fully populated (2x512 and 2x128), so to get to
4GB I'd actually need to buy 4GB, not just 2.5GB. That'll
cost me -- checks an on-line store -- less than you thought,
only about $240!

Except that there are things I'd rather do with that
$240 than sacrifice it to your notions of purity. It takes
me -- well, "more than ten minutes" to earn $240, and I'd
prefer to spend it on something more worthwhile.
There is always a marginal case where getting an order of two improvement
actually makes the program twice as good, or allows it to work where
otherwise it would run out of resources.

Since you're saying factors of two can be ignored, I
guess you're justified (in your own mind) in saying "factor
of two" for a factor of four. Applied recursively, this
strategy allows us to ignore *all* inefficiencies:

"It's a thousand times slower!"

"Well, since factors of two are small enough to ignore,
we may as well call it five hundred times slower. And we
may as well call *that* two hundred fifty times, and that's
really indistinguishable from one twenty-five, which in turn
is the same as sixty (rounding down a little), and sixty is
really equivalent to thirty, which isn't noticeably different
from twelve, which is virtually the same as six, which might
as well be three, which is pretty close to two, which we've
already agreed is negligible. So there's really no speed
difference at all!"
However it is not a good
programming strategy to stress a general purpose computer like that, if you
can possibly avoid it.

My computer is *not* stressed as things stand now, without
your beloved fourfold bloat. I can browse the Net while I'm
recording from an LP, I can read and write E-mail, I can even
search for nonsense on Usenet. (The search is a short one.)
If the user wants to run two copies of your program
at once, he's stuck. Ditto if he wants to run a video telling him how to use the system. Then the program won't remain marginal for long; soon he'll upgrade and all your micro-optimisation will then be so much wasted effort.

A fourfold speedup is a "micro"-optimization? I guess it
follows that your fourfold slowdown is a micro-pessimization.
If I'd known that factors of four don't bother you, I'd have
sent my old 300MHz machine to you instead of to the recyclers.
Sometimes of course it really can't be avoided, as with a games console
where you must fill memory and run the polygon engine to within a few
percentage points of its capacity. If we did ban all other integer types
you'd have to fake up 16 bit integers with logical ops, but that isn't the
proposal for now. They will remain for exceptional use, such as storing raw
audio samples.

And what language do you propose to use to manipulate
those samples? Not 64-bit-only C, that's for sure. Here's
my way of reducing the volume of one sample:

uint16_t *sample = ...;
*sample -= *sample / 10;   /* knock 10% off the amplitude */

... and here's what you want me to do instead:

/* 8-bit (?) */ unsigned char *samplebytes = ...;
/* 64-bit */ unsigned int sample;
sample = samplebytes[0] + (samplebytes[1] << 8);
sample -= sample / 10;
samplebytes[0] = sample;
samplebytes[1] = sample >> 8;

It won't wash, Malcolm. It wouldn't wash even if Herakles
ran a couple rivers over it.
 

Malcolm McLean

santosh said:
I don't understand what exactly you are "campaigning" for: int to be
the only integer type in C, or for 64 bit platforms to become
mainstream?
The campaign is for int to be 64 bits on 64 bit machines.
There are reasons for this, one of which is that, at some future date, it
will enable us to simplify the language, particularly by removing size_t.

The campaign for 64 bits is important because 64 bit architectures are going
to come into common use in the near future. Whilst I welcome the end of the
2GB limit on memory, I am not trying to advocate use of 64 bit processors
where 32 bit processors would be adequate.
 

Ben Pfaff

Malcolm McLean said:
The campaign for 64 bits is important because 64 bit architectures are
going to come into common use in the near future. Whilst I welcome the
end of the 2GB limit on memory, I am not trying to advocate use of 64
bit processors where 32 bit processors would be adequate.

Sounds like it should be named the "Campaign for 64 bits where 64
bits is due".
 
