Tryiing just to read/understand this code

S

smnoff

Below is a section from string.c at this
linkhttp://cvs.opensolaris.org/source/xref/on/usr/src/common/util/string.cthat
I am trying to fully understand.I don't fully understand LINE 514; not to
mention that entire inner while loop and what it'strying to accomplish. I
figured that if I can at least understand each line of this strstr methodand
why it's written the ways it written, as well as in regards to perfomance or
code simplicity, it will help me learn c.Thanks. 497 char *
498 strstr(const char *as1, const char *as2)
499 {
500 const char *s1, *s2;
501 const char *tptr;
502 char c;
503
504 s1 = as1;
505 s2 = as2;
506
507 if (s2 == NULL || *s2 == '\0')
508 return ((char *)s1);
509 c = *s2;
510
511 while (*s1)
512 if (*s1++ == c) {
513 tptr = s1;
514 while ((c = *++s2) == *s1++ && c)
515 ;
516 if (c == 0)
517 return ((char *)tptr - 1);
518 s1 = tptr;
519 s2 = as2;
520 c = *s2;
521 }
522
523 return (NULL);
524 }
 
B

bwaichu

smnoff said:
Below is a section from string.c at this
linkhttp://cvs.opensolaris.org/source/xref/on/usr/src/common/util/string.cthat
I am trying to fully understand.I don't fully understand LINE 514; not to
mention that entire inner while loop and what it'strying to accomplish. I
figured that if I can at least understand each line of this strstr methodand
why it's written the ways it written, as well as in regards to perfomance or
code simplicity, it will help me learn c.Thanks. 497 char *
498 strstr(const char *as1, const char *as2)
499 {
500 const char *s1, *s2;
501 const char *tptr;
502 char c;
503
504 s1 = as1;
505 s2 = as2;
506
507 if (s2 == NULL || *s2 == '\0')
508 return ((char *)s1);
509 c = *s2;
510
511 while (*s1)
512 if (*s1++ == c) {
513 tptr = s1;
514 while ((c = *++s2) == *s1++ && c)
515 ;
516 if (c == 0)
517 return ((char *)tptr - 1);
518 s1 = tptr;
519 s2 = as2;
520 c = *s2;
521 }
522
523 return (NULL);
524 }

Wrap the function above in a main then just view the locals in a
debugger. That's the best way to learn what is happening. Also, it's
just basic pointer math, so you might to write some small programs to
practice your pointer math.

Brian
 
B

Ben C

Below is a section from string.c at this
linkhttp://cvs.opensolaris.org/source/xref/on/usr/src/common/util/string.cthat
I am trying to fully understand.I don't fully understand LINE 514; not to
mention that entire inner while loop and what it'strying to accomplish. I
figured that if I can at least understand each line of this strstr methodand
why it's written the ways it written, as well as in regards to perfomance or
code simplicity, it will help me learn c.Thanks. 497 char *
498 strstr(const char *as1, const char *as2)

We're looking for a sequence of characters in the string as1 that match
as2.
499 {
500 const char *s1, *s2;
501 const char *tptr;
502 char c;
503
504 s1 = as1;
505 s2 = as2;
506
507 if (s2 == NULL || *s2 == '\0')
508 return ((char *)s1);
509 c = *s2;
510
511 while (*s1)
512 if (*s1++ == c) {
513 tptr = s1;

At this point *(s1 - 1) and *s2 both contain copies of the same
character. So this is a candidate for a match, but we need to check if
the two pointers carry on pointing to the same characters if we
increment them each one byte at a time until the last
non-null-terminating character in s2.

strstr("catenation", "nat") should match, but strstr("catentation",
"natj") should return NULL. At this point we've got s1-1 pointing to the
n of "catenation" and s2 pointing to the n of "nat". We need to walk
through both strings, to match up the 'a' and the 't' and find our way
to the 'j'.

We could write the next bit perhaps more readably like this:

const char *p = s1 - 1;
const char *q = s2;
size_t i;

for (i = 0; q != '\0'; i++)
if (p != q)
return NULL; /* partial match, but not good enough */

/* If we get here then as2 is a substring of as1 ... */

514 while ((c = *++s2) == *s1++ && c)
515 ;

So, here we're doing the same thing: we increment s2 and s1 for as long
as they continue to point to equal bytes, and s2 hasn't got to a null
terminator. Lines 514 and 515 mean basically this in pseudocode:

forever
increment s2
let c = *s2

if c != s1
increment s1
break
endif

increment s1
if c == '\0' break
endfor
 
S

smnoff

In your pseudocode, why is the

if c != s1

use the != as opposed to the == like iin the actual code?
 
B

Ben C

In your pseudocode, why is the

if c != s1

use the != as opposed to the == like iin the actual code?

It should be c != *s1. My mistake.

But != is right, because in the pseudocode this is the condition for
breaking _out_ of the while loop, but in the actual code, ==, because
it's a condition for staying _in_ the loop.

 
S

smnoff

Any reason why the original programmer of this code in line 514 and 515 put
all the
increments all in the same line?

And for that matter "jammed" everything in a single line? and especially the
assignment of

c = *++s2

in the while loop statement?

In my opinion, it makes it extremely hard to read what's going on (or even
debug for that matter.)

Was this all written in one line of code for performance reasons?
 
R

Richard Heathfield

smnoff said:
Any reason why the original programmer of this code in line 514 and 515
put all the
increments all in the same line?

And for that matter "jammed" everything in a single line? and especially
the assignment of

c = *++s2

in the while loop statement?

It's relatively idiomatic. I wouldn't feel any particular guilt in writing
code like that. It just means "bump the pointer to grab the value of the
next object along".
In my opinion, it makes it extremely hard to read what's going on (or even
debug for that matter.)

In my opinion, it's not a huge deal.
Was this all written in one line of code for performance reasons?

No. It could be split without any particular performance hit. It would mean
re-coding the loop, of course.
 
G

gbostock

smnoff said:
Do you think it's written like that for "job security"?

No, it's written like that because any decent C programmer shouldn't
have a problem with it and because it follows the philosophy of tight
code. Tight code has less wriggle room for introducing errors by
maintenance coders. Creating more lines of code and more variables
creates more opportunity for screwing it up. If you are going to modify
that code, you'd better know what you're doing and not just start
hacking at it as so many do. Since this is a low-level function, the
chances of it needing maintenance are slim.

There's a difference between tight code and obfuscated code and this
code is tight, not obfuscated.
 
F

Flash Gordon

smnoff wrote:

Please don't top post. You reply belongs under the portion of the post
you are replying to, not above. See 90% of the posts in this group for
how to do it properly.
> Do you think it's written like that for "job security"?

I don't and I doubt Richard does. c = *++s2 is such a common idiom in C
that most people who have significant experience in C will understand it
without thinking about it. I would do something like that simply because
it is the "normal" way to write it.

Please don't quote peoples sigs, i.e. the bit above you quoted from
Richard's post, unless you are commenting on them.
 
S

smnoff

Wow, for a C language newgroup that is supposedly idiomatic and can handle
concise syntax (i.e. a top post), they can't seem to figure out where the
replies to postings are and where they belong!!! Maybe the posting was TOO
concise...then again can't newsgroup posting be idiomatic just like the C
language?
 
D

Default User

smnoff said:
Wow, for a C language newgroup that is supposedly idiomatic and can
handle concise syntax (i.e. a top post), they can't seem to figure
out where the replies to postings are and where they belong!!! Maybe
the posting was TOO concise...then again can't newsgroup posting be
idiomatic just like the C language?

*plonk*




Brian
 
R

Richard Heathfield

smnoff said:
Wow, for a C language newgroup that is supposedly idiomatic

No, the /expression under discussion/ was idiomatic. The newsgroup itself is
not idiomatic. It's a newsgroup.
and can handle concise syntax (i.e. a top post),

There's nothing concise about top-posting.
they can't seem to figure out where the
replies to postings are and where they belong

Sounds like you're spoiling for a fight. Sorry to disappoint you. If others
want to play that game, that's up to them. But if you can't be bothered to
work out how to post to comp.lang.c in a way that minimises wasted time for
your readers, then I can't be bothered to read your stuff.
 
K

Keith Thompson

smnoff said:
Wow, for a C language newgroup that is supposedly idiomatic and can handle
concise syntax (i.e. a top post), they can't seem to figure out where the
replies to postings are and where they belong!!! Maybe the posting was TOO
concise...then again can't newsgroup posting be idiomatic just like the C
language?

You misunderstand.

There is nothing "concise" about top-posting, and it's inconsistent
with the conventions that have evolved in this newsgroup over the last
few decades. By ignoring those conventions, you make your postings
more difficult to read, and many of us won't bother.

Please read <http://www.caliburn.nl/topposting.html> for more
information.
 
B

Ben C

[snip]

Any reason why the original programmer of this code in line 514 and
515 put all the increments all in the same line?
And for that matter "jammed" everything in a single line? and
especially the assignment of
c = *++s2
in the while loop statement?
In my opinion, it makes it extremely hard to read what's going on (or
even debug for that matter.)

You get used to it, I don't think it's hard to read. I think people are
inspired to write C like this when they read the demonstration of
writing strcpy in K&R, which starts with a fairly verbose version, and
ends up saying well you might as well just write:

while (*s2++ = *s1++);

This is easy to read surely? You often get a compiler warning like
"suggest parentheses around assignment used as truth value" for this
kind of thing, however.

The opensolaris strchr was a bit hard to read, IMHO, because of the way
s1 and s2 were out-of-step so you got a pre-increment on one of them and
a post-increment on the other. Then again, strchr's always going to be a
bit fiddly.
Was this all written in one line of code for performance reasons?

Unlikely, although it's possible, because performance is a high priority
for library functions like strchr. It may be that this particular way of
writing it produced better code on the compiler/machine this started
life on.
 
R

Richard Bos

smnoff said:
Wow, for a C language newgroup that is supposedly idiomatic and can handle
concise syntax (i.e. a top post),

If you luserish, top-posting Outhouse monkeys were at all concerned with
conciseness, you'd learn to bleeding well snip.

Have fun in my bozo bin, Bubba.

Richard
 
R

Richard

smnoff said:
Any reason why the original programmer of this code in line 514 and 515 put
all the
increments all in the same line?

And for that matter "jammed" everything in a single line? and especially the
assignment of

c = *++s2

in the while loop statement?

This would be perfectly readable, bog standard C. Nothing special or
gimmicky. I would be surprised to see someone break it into two
statements such as

s2++;
c=*s2;

to be honest. Use the features of the language. Dont try and program a
language like another language.

e.g do use

x=(f?a:b);

rather than

if(f)
x=a;
else
x=b;

Such things can and do seem a little strange at first, but like

while(*d++=*s++);

it is perfectly clear and recognisable to any C programmer worth his
salt.
 
S

smnoff

I am trying to understand why line 516:

516 if (c == 0)

is using 0 as the indicator to know when the match is successfull as opposed
to something like '\0'

Can anyone please explain?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,565
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top