substring finding problem!

spinoza1111 · Feb 17, 2010

spinoza1111 wrote:

As usual, my point has gone way over your head.

You cite me as someone who says you are not a troll, in support of an
argument that you are not a troll. In so doing, you are appealing to
people's trust in my good judgement, whether or not you realise it.

For those people who do trust my good judgement, the argument is a
powerful one, but the argument that you are an idiot is equally
powerful. And for those people who do not trust my good judgement, the
argument carries no conviction.

As for your opinion of me, I ascribe it no value whatsoever, so it has
no bearing on this discussion.

ObTopic: finding substrings is easy with strstr(). If you need to find a
substring, use strstr() until and unless profiling demonstrates that
it's a significant bottleneck.

THOU SHALT NOT says Richard NOT USE STRING.H, even for shits and
giggles.

Because thou are virtuous, shall there be no more cakes and ale?

Ike Naar · Feb 17, 2010

I also added one if statement to check if adding step to str would result in
it going beyond str+lstr. that is :-

if (((orig_str + orig_lstr) - str) <= step) break;

Not sure if this is your problem, but you should be aware that
you're doing a comparison between a signed number (the difference
between the pointers (orig_str+orig_lstr) and str), and an unsigned
number (step).
This will give unexpected results if the left-hand side of the comparison
is a small negative value; the value will be converted to unsigned
before it is compared to the right-hand side, and becomes a huge positive
value. As a result, the comparison yields false.

You can work around the problem by rewriting the comparison as

(orig_str + orig_lstr <= str + step)

where both sides of the comparison are pointers.
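To make the conversion concrete, here is a minimal sketch. The function names are made up for illustration and are not taken from the posted program; the explicit cast in the first function only spells out what the compiler does implicitly on the usual platforms where size_t is at least as wide as ptrdiff_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the original test. The cast makes the
 * implicit conversion in "(end - str) <= step" visible: a negative
 * difference becomes a huge unsigned value. */
int past_end_mixed(const char *end, const char *str, size_t step)
{
    assert(end != NULL && str != NULL);
    return (size_t)(end - str) <= step;
}

/* Hypothetical alternative: deal with the negative case while the value
 * is still signed, then compare two unsigned quantities. */
int past_end_signed(const char *end, const char *str, size_t step)
{
    assert(end != NULL && str != NULL);
    ptrdiff_t left = end - str;
    return left < 0 || (size_t)left <= step;
}
```

With str two bytes past end and step equal to 3, the first version reports "not past the end" while the second reports "past the end", which is the surprise described above.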

fedora · Feb 17, 2010

Ike said:
Not sure if this is your problem, but you should be aware that
you're doing a comparison between a signed number (the difference
between the pointers (orig_str+orig_lstr) and str), and an unsigned
number (step).
This will give unexpected results if the left-hand side of the comparison
is a small negative value; the value will be converted to unsigned
before it is compared to the right-hand side, and becomes a huge positive
value. As a result, the comparison yields false.

You can work around the problem by rewriting the comparison as

(orig_str + orig_lstr <= str + step)

where both sides of the comparison are pointers.

Hi Ike!

Gcc warned me of this too, but i ignored it since i couldn't think of a
better way.

Initially i thought of writing it exactly like yours but then when str is <
step bytes before its end, adding step to it would be undefined behaviour
so i coded like i did. Maybe i can cast step to signed long and then compare
with (orig_str+orig_lstr) - str? is this ok?

thanks

fedora · Feb 17, 2010

Ben said:
Why do you want to do what spinoza1111 says? I am curious about how
you decided it was something you wanted to do.

I wanted to write my own functions partly for learning as i'm just beginning
and also because if i used string.h, i couldn't have submitted to spinoza's
challenge...

but mostly it was just for learning. as we see, i was correct since i'm
having so much difficulty.

His initial programs all used string.h and some versions erroneously
called strlen without including string.h. I don't know why he decided
to change his mind, but a skilled C programmer would use the standard
library and only do something more complex if there was a compelling
reason.

i'm not a skilled programmer

You do it like this:

#define STRLEN my_strlen

and you write STRLEN, STRSTR etc in your code.

Okay i see. Thanks.

Your problems below come from this + 1, I think. It looks wrong.
Sorry I did not spot this first time round.

I added the one because when both string and sub string are equal length the
diff will give zero, so to make the while loop below work properly, i add
one.

i cant compare remaining_len >= 0 since that'll always be true for unsigned.

If you run off an array, all kinds of strange things can happen. It
is often not worthwhile trying to work out exactly why (at least I've
stopped trying -- I just fix the problem).

i hate it when i cant get a mental image of how a function will behave for
all combinations of legal input values. boundary values complicate
everything and unsigned seems to be more trouble than i thought.
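One way to keep the boundary cases manageable with unsigned lengths is to do the "does it fit at all" check before subtracting, so nothing can wrap. This is only a sketch under assumed names (start_positions and find_index are hypothetical, not fedora's functions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of positions in a string of length lstr at
 * which a substring of length lsub could start. */
size_t start_positions(size_t lstr, size_t lsub)
{
    if (lsub > lstr)
        return 0;              /* checked first, so the subtraction cannot wrap */
    return lstr - lsub + 1;    /* equal lengths give exactly one position */
}

/* Index of the first occurrence of sub in str, or (size_t)-1 if absent. */
size_t find_index(const char *str, size_t lstr, const char *sub, size_t lsub)
{
    assert(str != NULL && sub != NULL);
    size_t n = start_positions(lstr, lsub);
    for (size_t pos = 0; pos < n; ++pos) {
        size_t k = 0;
        while (k < lsub && str[pos + k] == sub[k])
            ++k;
        if (k == lsub)
            return pos;
    }
    return (size_t)-1;
}
```

The loop counts upward from zero against a precomputed bound, so there is no "remaining >= 0" test that an unsigned type would make always true.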

compiled with gcc -Wall -Wextra -std=c99 -pedantic -o replace replace.c -ggdb3

# gdb ./replace
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show
copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...

(gdb) run "sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"
Starting program: /home/fedora/c/replace "sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400648 in strFirstCh (str=0x7fff1da36feb "rc/c/replace",
ch=115 's', lstr=18446744073709551546) at replace.c:17
17 if (str[current] == ch) {

how can str be "rc/c/replace"?? It should be some part of the "sdak..."
string as i see...
ch is with right value. but lstr is wrong. how did it get that value? i
cant see any thing in my code that is wrong but i'm not intel.

big thanks to any one who can say why str is "rc/c/replace" and lstr is
wrong...

See above.

am thinking if string programing in C is naturally so difficult or i'm
just stupid

No, it is hard to get the details right but you are not helping
yourself by not breaking your program up into helpful simple
functions. You have some nice functions to help find the strings, but
you stopped there. I'd have some functions to help build the copy
with the replacements.

Yes i'm not happy with the replace routine. It's too big. i'll code versions
of strcpy and make it shorter and try again. maybe the bug is in replace...

*Actually*, I'd use (and did use) memcpy and strcpy, but if I had
decided to drink the "no string.h" cool aid, I'd write these myself.
Neither is more than a line or two and they simplify the replace
function a lot.
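As a rough illustration of how small such helpers can be, here is a sketch under assumed names (copy_bytes and splice are hypothetical and not Ben's code). The copy helper returns the position just past what it wrote, which makes appending pieces of the result straightforward:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for memcpy: copies n bytes and returns a pointer
 * just past the last byte written. */
char *copy_bytes(char *dest, const char *src, size_t n)
{
    assert(dest != NULL && src != NULL);
    while (n-- > 0)
        *dest++ = *src++;
    return dest;
}

/* Builds "prefix + replacement + suffix" in out. The caller must provide
 * room for at least lpre + lrepl + lsuf + 1 bytes. */
void splice(char *out, const char *pre, size_t lpre,
            const char *repl, size_t lrepl,
            const char *suf, size_t lsuf)
{
    out = copy_bytes(out, pre, lpre);
    out = copy_bytes(out, repl, lrepl);
    out = copy_bytes(out, suf, lsuf);
    *out = '\0';
}
```

A replace routine built on helpers like these only has to decide where each piece starts and how long it is; the byte shuffling lives in one place.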

For my own amusement, I've studied what slows up this replace function
and I have written a reasonably fast version that avoids strstr because,
as I've described elsewhere, it forces the program to re-scan strings
unnecessarily.

by this replace function do you mean mine above?

Ike Naar · Feb 17, 2010

Ike said:
Ike said:

if (((orig_str + orig_lstr) - str) <= step) break;

[snip]

(orig_str + orig_lstr <= str + step)

[snip]

Initially i thought of writing it exactly like yours but then when str is <
step bytes before its end, adding step to it would be undefined behaviour
so i coded like i did.

Whoops you're right I did not think about that.

Maybe i can cast step to signed long and then compare
with (orig_str+orig_lstr) - str? is this ok?

That is a possibility. If your compiler supports ptrdiff_t then perhaps
it's better to cast step to that type, so that the types on both sides
of the comparison operator are the same.

What about (orig_str + orig_lstr - step <= str) ?

Kenny McCormack · Feb 17, 2010

spinoza1111 said:
But if nonconforming to normalized deviance and Eunuch programming is
to be a troll, a crank and an idiot in your book, hey, so I am.

Now you've gone and done it! From here on in, they will gleefully refer
to you as a "self-confessed troll".

It happened to me about 5 years ago - and they still use it in their
diatribes. The fact, of course, being that they define things so that
anyone with even a glimmer of intelligence becomes defined as "troll".

fedora · Feb 17, 2010

Ike said:
Ike said:

if (((orig_str + orig_lstr) - str) <= step) break;

[snip]

(orig_str + orig_lstr <= str + step)

[snip]

Initially i thought of writing it exactly like yours but then when str is
< step bytes before its end, adding step to it would be undefined
behaviour so i coded like i did.

Whoops you're right I did not think about that.

Maybe i can cast step to signed long and then compare
with (orig_str+orig_lstr) - str? is this ok?

That is a possibility. If your compiler supports ptrdiff_t then perhaps
it's better to cast step to that type, so that the types on both sides
of the comparison operator are the same.

What about (orig_str + orig_lstr - step <= str) ?

This is perfect thanks!! After changing that line to the comparison above,
i've stopped getting the strange seg faults as far as i can see.

so it seems this failed test allowed lstr to wrap around to very big values
and access foreign memory locations. now it works!

thanks again Ike
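For reference, another way to phrase the same test is to work with the number of bytes remaining rather than with a pointer that has been moved by step. This is a sketch with a hypothetical helper name; it assumes str always stays within the buffer, and it also avoids forming a pointer before the start of the buffer when step is larger than the whole string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when str can be advanced by step and
 * still point inside [base, base + len), 0 otherwise.
 * Assumes base <= str <= base + len. */
int can_advance(const char *base, size_t len, const char *str, size_t step)
{
    assert(base <= str && str <= base + len);
    size_t remaining = (size_t)((base + len) - str);  /* never negative here */
    return step < remaining;
}
```

In the loop this would read `if (!can_advance(orig_str, orig_lstr, str, step)) break;`, which breaks in the same situations as the original `remaining <= step` test was meant to.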

spinoza1111 · Feb 17, 2010

...

Now you've gone and done it! From here on in, they will gleefully refer
to you as a "self-confessed troll".

It happened to me about 5 years ago - and they still use it in their
diatribes. The fact, of course, being that they define things so that
anyone with even a glimmer of intelligence becomes defined as "troll".

There's no point, Kenny, in evading their stupid, childish labels. But
I predict that if I continue posting great code and incisive writing,
things will change here.

Hasta la victoria siempre!

spinoza1111 · Feb 17, 2010

Clearly then, you are not a troll.

They think they can define the world,
Their poison they have hurled,
Because they can't stand freedom:
They recreate their corporate kingdom.
They actually do this for fun
Thinking by lies they have won,
But when you look at their code,
You see quite a load,
Of the bugs they condemn in others.
Seebach can't get string length right
He's off by one in plain sight,
And they dare to call you a "troll"
When you can write above the level of O,
When you code above the level of Joe.

They've taken structured programming
And made it a dog's dinner:
They think it is a ban on thinking.
Dijkstra died at seventy-two
In part from life long depression
At the nonsense and voodoo
That passed for good computing.
Now they invoke his name
To speak it they should feel shame.

Seebs · Feb 17, 2010

Thanks Ben! I didn't use stdlib functions because i wanted to see how
easy/difficult it would be to write my own. also spinoza didn't accept
program that called function in string.h.

Unless you're expecting to spend most of your programming career working
for people who have major obsessive problems with using technology in ways
that generally work, because of personal grudges never adequately explained,
I would suggest that perhaps what Nilges accepts or doesn't accept should
not be a component of any technical decision, ever.

About plugging in my own versions, do you mean writing routines with the
same name so that at linking the linker finds my lib first and plug it in?
but i read somewhere that using ansi c's namespace and relying on linker is
all undefined and gcc can replace calls to stdlib functions with inline code
so bypassing my code too... but i'll try it.

The obvious way to do it would be:

size_t
callstrlen(const char *s) {
#ifdef ME
    /* your code here */
#else
    return strlen(s);
#endif
}

-s
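Filled in, the wrapper might look like the sketch below. The hand-written branch is just one plausible body for the "your code here" placeholder, not anything posted in the thread; ME is the switch from the post, which could be defined on the command line with -DME.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Define ME to select the hand-written version; otherwise the call is
 * forwarded to the standard library. */
size_t callstrlen(const char *s)
{
    assert(s != NULL);
#ifdef ME
    const char *p = s;
    while (*p)              /* walk to the terminating '\0' */
        ++p;
    return (size_t)(p - s);
#else
    return strlen(s);
#endif
}
```

Because the wrapper has its own name, it stays out of the standard library's namespace, so the concerns about the linker or about gcc inlining strlen do not apply to the hand-written branch.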

Seebs · Feb 17, 2010

am thinking if string programing in C is naturally so difficult or i'm just
stupid

It takes quite a while to get used to it, at the very least. When you're
doing multiple string operations, you have a lot of things to keep track
of. Consider a simple strcpy() replacement:

char *cpystr(char *dest, char *src) {
    char *start = dest;
    while ((*(dest++) = *(src++)) != '\0')
        ;
    return start;
}

This isn't actually all that simple; I write a lot of code which is
much harder for me to figure out, but which would be easier to figure
out without years of experience with strings. It can be a bit easier
to read done the other way:

char *cpystr(char *dest, char *src) {
    int i;
    for (i = 0; src[i]; ++i) {
        dest[i] = src[i];
    }
    dest[i] = '\0';
    return dest;
}

This one is often easier for people to read because the pointers stay
pointing to the same things. So for some cases, you may find array
notation simpler. What I have found is usually that array notation is
simpler as long as I only have one index to use. If I have multiple indexes,
it can be easier for me to think about the problem if I write it in
terms of pointers, but not necessarily tersely:

while (*src) {
    *dest = *src;
    ++dest;
    ++src;
}
*dest = '\0';

That's certainly simpler to understand than the original version. What this
does is trade the simplicity of having the pointers stay fixed -- they always
point to the same things -- for the simplicity of having the semantic content
of the pointers stay fixed -- they always point to the next character you
need to copy.

Which of those is better is a bit arbitrary, and may vary from one person to
another, or one algorithm to another. Once you start getting into stuff like
strstr() implementations, it's often important to pick descriptive names.
I don't recommend names like thePointerToTheThingIWasGoingToCopyInto. I tend
to prefer short nickname length things; something that's easy to recognize,
visually distinct, and tells me what I'm doing.

So...

char *findstr(char *needle, char *haystack) {
    while (*haystack) {
        if (*haystack == *needle) {
            char *h = haystack, *n = needle, *first = NULL;
            while (*n && *n == *h) {
                if (!first && *h == *needle) {
                    first = h;
                }
                ++n;
                ++h;
            }
            if (!*n) {
                return haystack;
            } else {
                haystack = first - 1;
            }
        }
        ++haystack;
    }
    return NULL;
}

I don't know whether this will work, but it's approximately a standard cheap
strstr(), with the one optimization being an attempt to not rescan a chunk
of the haystack looking for the first character of the needle when it's not
needed. The names aren't very long, and the inner loop uses short names
which clearly refer back to their sources.

You're invited to look for errors in this, as there may well be some, not
the least of which is that I have no idea whether or not it will compile.

-s
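Taking up the invitation to look for errors: as far as I can tell, first is set on the very first pass through the inner loop (h is still equal to haystack there), so after a failed partial match "haystack = first - 1" followed by "++haystack" leaves haystack where it was, and a call such as findstr("ab", "ac") never terminates. The sketch below is one possible repair, not Seebs's code: it records only a later occurrence of the needle's first character, and falls back to the mismatch position when there is none. The name findstr_sketch and the empty-needle convention (borrowed from strstr) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

char *findstr_sketch(const char *needle, const char *haystack)
{
    assert(needle != NULL && haystack != NULL);
    if (*needle == '\0')
        return (char *)haystack;       /* same convention as strstr() */

    while (*haystack) {
        if (*haystack != *needle) {
            ++haystack;
            continue;
        }
        const char *h = haystack, *n = needle, *first = NULL;
        while (*n && *n == *h) {
            /* Remember the next place the needle's first character shows
             * up, but not the position this attempt started from. */
            if (!first && h != haystack && *h == *needle)
                first = h;
            ++n;
            ++h;
        }
        if (!*n)
            return (char *)haystack;   /* whole needle matched */
        /* Resume at the remembered candidate, or at the mismatch position
         * when the matched prefix held no other candidate. */
        haystack = first ? first : h;
    }
    return NULL;
}
```

Every position skipped this way was already compared against the needle's first character inside the inner loop, so no possible match start is passed over.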

Kenny McCormack · Feb 17, 2010

It happened to me about 5 years ago - and they still use it in their
diatribes. The fact, of course, being that they define things so that
anyone with even a glimmer of intelligence becomes defined as "troll".

Clearly then, you are not a troll.

Don't quit your day job.

Malcolm McLean · Feb 17, 2010

Unless you're expecting to spend most of your programming career working
for people who have major obsessive problems with using technology in ways
that generally work, because of personal grudges never adequately explained,
I would suggest that perhaps what Nilges accepts or doesn't accept should
not be a component of any technical decision, ever.

That's the ad hominem fallacy. It's not a pretentious term for
"insult" but a common fallacy, which is to suppose that an argument is
wrong because of the person who is making it.

In fact there are good reasons for deprecating string.h. chars
effectively have to be octets, whilst often programs need to accept
non-Latin strings. Then the functions are all very old, with certain
weaknesses (no protection from buffer overrun in strcpy, an O(N)
performance for strcat and strlen, an inconvenient interface for
strcat, const inconsistencies with strchr, very poor functionality
with strfind and const inconsistencies here too, very serious buffer
problems with sprintf, an overly difficult interface and buffer
problems with sscanf, thread problems with strtok and a non-intuitive
interface).

Seebs · Feb 17, 2010

That's the ad hominem fallacy. It's not a pretentious term for
"insult" but a common fallacy, which is to suppose that an argument is
wrong because of the person who is making it.

No, it's not an ad hominem fallacy. It's the very well supported view
that what Nilges accepts or doesn't accept should not be a component of any
technical decision. I'm not saying that an argument is wrong because of
the person who is making it. I'm saying that a conclusion should be ignored
(neither accepted nor rejected) based on the person who has offered it.

Which is to say, if you know someone is a clown, knowing his position on an
issue tells you nothing for or against the issue. Now, if he had advanced
an argument, it could be worth discussing that argument, but as long as
we're just talking about his conclusion, it's not an ad hominem fallacy to
suggest disregarding the conclusions reached by someone who is demonstrably
very bad at the topic in question.

In fact there are good reasons for deprecating string.h.

For some purposes, yes.

For manipulation of sequences of non-NUL chars, terminated by a NUL char, not so
much.

chars
effectively have to be octets, whilst often programs need to accept
non-Latin strings.

True. This is addressed in no small part by the multibyte stuff, which you
would presumably use for multibyte strings.

Then the functions are all very old, with certain
weaknesses (no protection from buffer overrun in strcpy, an O(N)
performance for strcat and strlen, an inconvenient interface for
strcat, const inconsistencies with strchr, very poor functionality
with strfind and const inconsistencies here too, very serious buffer
problems with sprintf, an overly difficult interface and buffer
problems with sscanf, thread problems with strtok and a non-intuitive
interface).

I'm not aware of "strfind".

While the various interfaces are certainly flawed, consider that the Nilges
alternative is to duplicate the flaws of the interface without even the
benefit of already having been debugged. Or, worse, to just not even come
close.

There are certainly cases where the <string.h> functions are not the right
tool. However, that Nilges argues against it is not an argument either way
in that. His arguments might be an argument either way; his conclusion is
not.

-s

Ben Bacarisse · Feb 17, 2010

Malcolm McLean said:
In fact there are good reasons for deprecating string.h. chars
effectively have to be octets, whilst often programs need to accept
non-Latin strings.

It's easy to switch to wide character versions if you used the
equivalent str* versions. A few macros and you can build versions for
either character type very simply.
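A minimal sketch of what such macros might look like follows. The names (USE_WIDE, tchar, T, TSTRLEN, TSTRSTR, count_occurrences) are all made up for illustration; only strlen, strstr, wcslen and wcsstr are real library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

#ifdef USE_WIDE                 /* hypothetical build switch */
typedef wchar_t tchar;
#define T(lit)  L##lit
#define TSTRLEN wcslen
#define TSTRSTR wcsstr
#else
typedef char tchar;
#define T(lit)  lit
#define TSTRLEN strlen
#define TSTRSTR strstr
#endif

/* Counts non-overlapping occurrences of sub in s; the same source builds
 * for either character type. */
size_t count_occurrences(const tchar *s, const tchar *sub)
{
    assert(s != NULL && sub != NULL);
    size_t lsub = TSTRLEN(sub), count = 0;
    if (lsub == 0)
        return 0;
    for (const tchar *p = TSTRSTR(s, sub); p != NULL;
         p = TSTRSTR(p + lsub, sub))
        ++count;
    return count;
}
```

Code written against the macro names compiles unchanged with or without -DUSE_WIDE, provided literals are wrapped in T().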

Then the functions are all very old, with certain
weaknesses (no protection from buffer overrun in strcpy, an O(N)
performance for strcat and strlen, an inconvenient interface for
strcat, const inconsistencies with strchr, very poor functionality
with strfind and const inconsistencies here too, very serious buffer
problems with sprintf, an overly difficult interface and buffer
problems with sscanf, thread problems with strtok and a non-intuitive
interface).

Those are arguments for using something better, not arguments for not
using C's string functions. If the "challenge" had been: "use this
improved string library to write replace" or "design a string library
so that replace is easy to write" I for one would have no objection.

The problem is that rejecting what is already there (rather than using
something better) leads to a /more/ complex and buggy solution.

Kaz Kylheku · Feb 17, 2010

It's easy to switch to wide character versions if you used the
equivalent str* versions.

Just watch out for C99 braindamage!

swprintf has an argument interface similar to the wide character
equivalent of snprintf, but you may be bitten by the gratuitously
different return value convention.
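The difference shows up when the output does not fit. In C99, snprintf returns the length the complete output would have had, while swprintf returns a negative value if n or more wide characters were requested. A small sketch with hypothetical wrapper names:

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Returns 1 if text was truncated when formatted into buf of size n. */
int narrow_truncated(char *buf, size_t n, const char *text)
{
    assert(buf != NULL && n > 0);
    int r = snprintf(buf, n, "%s", text);
    /* snprintf reports the length the full output would have had. */
    return r < 0 || (size_t)r >= n;
}

/* Same question for the wide version, answered differently. */
int wide_truncated(wchar_t *buf, size_t n, const wchar_t *text)
{
    assert(buf != NULL && n > 0);
    int r = swprintf(buf, n, L"%ls", text);
    /* swprintf reports a negative value when the output did not fit, so
     * the required length cannot be recovered from the return value. */
    return r < 0;
}
```

So the common snprintf idiom of calling once to learn the needed size and then allocating does not carry over to swprintf.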

Walter Banks · Feb 17, 2010

Ben said:
Those are arguments for using something better, not arguments for not
using C's string functions. If the "challenge" had been: "use this
improved string library to write replace" or "design a string library
so that replace is easy to write" I for one would have no objection.

The problem is that rejecting what is already there (rather than using
something better) leads to a /more/ complex and buggy solution.

This whole project thread has been filled with examples of how not to
engineer software: application code specifically avoiding libraries,
Not Invented Here, random design with moving target specifications or no
specifications, ad hoc testing with a dose of 20+ year old unresolved
office battles, interpersonal rivalry and off-topic rants.

As several have stated, this is not the environment that we are used to.

Regards

w..

Seebs · Feb 17, 2010

This whole project thread has been filled with examples of how not to
engineer software: application code specifically avoiding libraries,
Not Invented Here, random design with moving target specifications or no
specifications, ad hoc testing with a dose of 20+ year old unresolved
office battles, interpersonal rivalry and off-topic rants.

As several have stated, this is not the environment that we are used to.

Yes, but it's important to be prepared to program in some of the many
environments which real programmers often end up having to work in.

To be fair, I've never had a coworker in the same class as Nilges. Not
even particularly close. But I have had to work with arbitrary or
bad specifications, specifications which change repeatedly during
implementation, old office battles, and vehement opposition to things which
were Not Invented Here.

At one point, I was asked to develop a linked list implementation. The
proposed design looked like this:

struct list_node {
    struct list_node *next;
    void *data;
};

struct list {
    struct list_node *head;
    struct list_node *tail;
};

The specification was much as you'd expect. Except for one TINY detail.
Which was that the formal specification was that
(struct list *) (x->tail->next) == x
whenever tail was not null.

That is to say, if the list contained any members, the "next" pointer for
the last member of the list was a pointer (suitably converted) to the list
object.

So iteration would look roughly like:
for (l = x->head; l->next != x; l = l->next) {
    /* ... */
}

It took a day or so of effort for me to round up enough senior developers
to all sit on the guy and tell him that:

1. He was wrong.
2. He was micro-managing, which is presumptively wrong.

before we were allowed to use a more conventional design.
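The design that was eventually used is not shown here. Purely as an illustration, and assuming "more conventional" means the last node's next pointer is NULL, the same two structs could be used like this (list_append and list_length are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
    struct list_node *next;   /* NULL on the last node */
    void *data;
};

struct list {
    struct list_node *head;
    struct list_node *tail;
};

/* Appends an already-allocated node at the tail. */
void list_append(struct list *x, struct list_node *node)
{
    assert(x != NULL && node != NULL);
    node->next = NULL;
    if (x->tail)
        x->tail->next = node;
    else
        x->head = node;
    x->tail = node;
}

/* Walks the list; an empty list (head == NULL) needs no special case. */
size_t list_length(const struct list *x)
{
    assert(x != NULL);
    size_t n = 0;
    for (const struct list_node *l = x->head; l != NULL; l = l->next)
        ++n;
    return n;
}
```

Iteration needs no cast between struct list and struct list_node, and the loop condition does not depend on which list a node belongs to.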

Having had to deal with things like a database in which the formal schema
description begins with "all fields are VARCHAR for simplicity", I found the
Nilges String Replace Challenge to be a surprisingly good approximation of
what programming work is often like in the real world.

(Disclaimer: All the above memories are faded with age. My current
environment is pretty good about this kind of stuff. I have no clue about
the office politics, as our management put a great deal of time and effort
into ensuring that they are Not Our Problem.)

-s

Chris M. Thomasson · Feb 17, 2010

[...]

Yes, strcat and strlen are O(N) - so, where it matters, you remember the
string length, having found it out the first time.

Bingo! :^)
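A sketch of what "remember the string length" can look like in practice. The struct and function names are hypothetical; the caller supplies the storage, and cap is assumed to be at least 1 with len < cap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer type that carries its length along. */
struct strbuf {
    char  *data;    /* caller-provided storage */
    size_t len;     /* characters currently stored, excluding the '\0' */
    size_t cap;     /* total size of data in bytes */
};

/* Appends s without rescanning what is already stored.
 * Returns 1 on success, 0 if s would not fit. */
int strbuf_append(struct strbuf *b, const char *s)
{
    assert(b != NULL && s != NULL);
    size_t n = strlen(s);
    if (n >= b->cap - b->len)
        return 0;
    memcpy(b->data + b->len, s, n + 1);   /* copies the terminator too */
    b->len += n;
    return 1;
}
```

Each append costs time proportional to the piece being added, whereas repeated strcat calls walk the whole destination every time.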

Walter Banks · Feb 17, 2010

Seebs said:
Yes, but it's important to be prepared to program in some of the many
environments which real programmers often end up having to work in.

To be fair, I've never had a coworker in the same class as Nilges. Not
even particularly close. But I have had to work with arbitrary or
bad specifications, specifications which change repeatedly during
implementation, old office battles, and vehement opposition to things which
were Not Invented Here.

I saw this 20 years ago and it became nothing but a memory as better
more effective approaches prevailed.

(Disclaimer: All the above memories are faded with age. My current
environment is pretty good about this kind of stuff. I have no clue about
the office politics, as our management put a great deal of time and effort
into ensuring that they are Not Our Problem.)

Software development practices have improved a lot as applications
have become more complex and requirements better defined.

w..
