Subtracting strings

Krumble Bunk

Hi all,

I have an elementary question, one I never grasped, and I'd like
to finally put it to rest.

I have the following code which, although it makes use of the string
library and could therefore be labelled as OT, raises a question that
actually relates to the code around it. This is the highly abridged
version:

const char *fname;
const char *execname="/usr/bin/foo";
char *pathname;
char *str;

....
strcpy(pathname,execname);   /* on my platform I plan to use strlcpy() :) */
if((str=strrchr(pathname,'/')) != NULL)
{
    *++str='\0';
    fname=execname+(str - pathname);   /* somehow, fname ends up being "foo" as intended */
} else {
    fname=execname;
    *pathname='\0';
}

I have checked the ascii values, for the above, and turning the
resulting ascii codes (which seem to be concat'd together), I do not
produce foo.

Would appreciate any help on this, as it's frying my peonic brain!

thanks

KB.
 
Chris Dollin

Krumble said:
I have an elementary question, of which I never grasped, and i'd like
to finally put it to rest.

(You never actually pose a question, but I assume it to be
"why does this work?")
I have the following code, which although makes use of the string
library, and could therefore be labelled as OT, the actual question I
have relates to the code around it. This is the highly abridged
version:

const char *fname;
const char *execname="/usr/bin/foo";
char *pathname;
char *str;

...
strcpy(pathname,execname); <- on my platform
I plan to use strlcpy() :)
if((str=strrchr(pathname,'/')) != NULL)
{
*++str='\0';
fname=execname+(str - pathname); <- somehow, fname
ends up being "foo" as intended

`str` and `pathname` are pointers. Subtraction of pointers
(that point into the same object) produces the difference
between them -- how much to add to the second to get the first.
It isn't "string subtraction".
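
A minimal sketch of that point, not taken from the original post: the helper name `offset_of_last_slash` is hypothetical. It only shows that subtracting two pointers into the same string yields an offset (of type `ptrdiff_t`), which can then be added to any pointer to a string with the same layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the last '/' in s,
   or -1 if s contains no '/'. */
ptrdiff_t offset_of_last_slash(const char *s)
{
    const char *p;

    assert(s != NULL);
    p = strrchr(s, '/');
    if (p == NULL)
        return -1;
    /* p and s point into the same string, so the subtraction is valid. */
    return p - s;
}
```

For "/usr/bin/foo" the last '/' sits at offset 8, so the character after it (the 'f') is at offset 9.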

So after the test, if it succeeds, `str` points to the last `/`
in `pathname`. Then it gets incremented (and a 0 written over
the `/`, so that `pathname` points to the string up-to-but-not-
including the `/`, ie, "/usr/bin"). Now the difference between
`str` and `pathname` -- notice that `str` points into the same
object as `pathname` does -- is the number of characters before
the "foo" in the string. Adding that to `execname` gets a pointer
to the "foo" in execname's "/usr/bin/foo".
} else {
fname=execname;
*pathname='\0';
}

I have checked the ascii values, for the above,

Why? The code works whatever the character set is. (It might not
be /useful/ on a machine with a different character set, but that's
another matter.)
and turning the
resulting ascii codes (which seem to be concat'd together), I do not
produce foo.

Would appreciate any help on this, as it's frying my peonic brain!

Ah, well, that's the problem -- you should have asked for a
/positronic/ brain, not a /peonic/ one.
 
Krumble Bunk

Chris - many thanks for your help! Answers my question nicely.

cheers

KB.
 
Jens Thoms Toerring

Krumble Bunk said:
I have the following code, which although makes use of the string
library, and could therefore be labelled as OT, the actual question I
have relates to the code around it.

I have checked the ascii values, for the above, and turning the
resulting ascii codes (which seem to be concat'd together), I do not
produce foo.

Looking at ASCII values won't explain the mystery ;-) You're not
"subtracting strings" (whatever this might mean), you're subtracting
pointers. This is allowed as long as the pointers involved point into
the same object (as they do here since they both point somewhere
into the string pointed to by 'pathname'), and the result is the
number of elements between the two pointers.
This is the highly abridged version:
const char *fname;
const char *execname="/usr/bin/foo";
char *pathname;
char *str;
...
strcpy(pathname,execname); <- on my platform
I plan to use strlcpy() :)

Whatever you use (I don't know what strlcpy() is supposed to do),
I hope in the left-out code you allocate enough memory for 'pathname'
to point to.
if((str=strrchr(pathname,'/')) != NULL)
{
*++str='\0';

I guess you meant here to have

*str++ = '\0';

since I guess you intended to overwrite the '/' and not the 'f'
following it.

Given that what 'execname' points to has successfully been copied to
the memory that 'pathname' points to, 'str' now points to the "foo"
part in the memory assigned to 'pathname', which (with the '*str++'
form suggested above) contains

 |------ pathname points here
 v
'/', 'u', 's', 'r', '/', 'b', 'i', 'n', '\0', 'f', 'o', 'o', '\0'
                                               ^
str points here -------------------------------|

'str - pathname' is then the number of characters in between (9).
fname=execname+(str - pathname); <- somehow, fname

'fname' points now to 'execname + 9', i.e. the 10th character in the
memory pointed to by 'execname'. And, probably not surprising anymore,
at this position starts the "foo" part of that string.
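
The walk-through can be tried out with a sketch like the following. The wrapper name `split_exec_name` and its buffer-size parameter are assumptions made for illustration; the body is the code as originally posted (with `*++str`), so the copy keeps its trailing slash while the offset is still 9.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around the posted logic. The caller supplies a
   writable buffer `pathname` of `size` bytes. Returns a pointer into
   `execname` at the file name, or NULL if the buffer is too small. */
const char *split_exec_name(const char *execname, char *pathname, size_t size)
{
    char *str;

    assert(execname != NULL && pathname != NULL);
    if (strlen(execname) + 1 > size)
        return NULL;
    strcpy(pathname, execname);

    str = strrchr(pathname, '/');
    if (str != NULL) {
        *++str = '\0';                       /* cut the copy just after the '/' */
        return execname + (str - pathname);  /* same offset, in the original */
    }
    *pathname = '\0';                        /* no directory part at all */
    return execname;
}
```

Called with "/usr/bin/foo", the returned pointer is `execname + 9` and the buffer is left holding "/usr/bin/".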
ends up being "foo" as intended
} else {
fname=execname;
*pathname='\0';
}
Regards, Jens
 
Krumble Bunk

Excellent!! Thanks a lot Jens, loved the ascii art - that helped
hammer it in for me. Thanks to both of you!

KayBee.
 
fred.l.kleinschmidt

Hi all,

I have an elementary question, of which I never grasped, and i'd like
to finally put it to rest.

I have the following code, which although makes use of the string
library, and could therefore be labelled as OT, the actual question I
have relates to the code around it.  This is the highly abridged
version:

const char *fname;
const char *execname="/usr/bin/foo";
char *pathname;
char *str;

...
strcpy(pathname,execname);

One hopes that pathname has been set to point to some
legitimate storage long enough to hold the number of
characters in execname.
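
One way the omitted code might provide that storage is sketched below. The helper `copy_string` is hypothetical and is not what the original poster wrote; note the extra byte for the terminating '\0'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd writable copy of src, or NULL
   on allocation failure. The caller must free() the result. */
char *copy_string(const char *src)
{
    size_t len;
    char *dst;

    assert(src != NULL);
    len = strlen(src);
    dst = malloc(len + 1);          /* +1 for the terminating '\0' */
    if (dst != NULL)
        memcpy(dst, src, len + 1);  /* copies the terminator as well */
    return dst;
}
```

With this, `pathname = copy_string(execname);` followed by a NULL check would replace the bare strcpy().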
if((str=strrchr(pathname,'/')) != NULL)
    {
        *++str='\0';

This places a NUL character in the position *AFTER* the
last slash (Chris Dollin's comment notwithstanding)
        fname=execname+(str - pathname);
    } else {
        fname=execname;
        *pathname='\0';
    }

So the final result is that fname points to the first
character after the last slash in execname
(i.e., the file name stripped of the directory component),
and pathname points to the directory portion,
INCLUDING the last slash, if there is one.

All that stuff with subtracting pointers is unnecessary.
All you need is:

strcpy(pathname,execname);
str = strrchr( pathname, '/');
if ( str ) {
    *str = '\0';      /* overwrite the last slash */
    fname = str + 1;  /* first character after it, inside pathname's buffer */
}
else {
    fname = execname;
    *pathname = '\0';
}

Note that here pathname ends up pointing to a string
that is the directory portion WITHOUT the trailing slash.
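
As a sketch of this in-place approach (the function name `split_in_place` is hypothetical): the file name pointer ends up inside pathname's own buffer, so it is only valid while that buffer lives. Unlike the else-branch above, this version leaves a path without any slash untouched and returns it as the name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: splits a writable path in place. Afterwards
   `path` holds the directory part without the trailing '/', and the
   returned pointer (into the same buffer) is the file name. */
char *split_in_place(char *path)
{
    char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    if (slash == NULL)
        return path;      /* no directory part: the whole string is the name */
    *slash = '\0';        /* overwrite the '/' itself */
    return slash + 1;     /* first character after the old '/' */
}
```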

Under POSIX, one would use basename() and dirname().
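
A rough sketch of the POSIX route, assuming a system that provides <libgen.h>. The wrapper `split_posix`, its parameters, and the 256-byte scratch buffers are made up for illustration. Both functions may modify their argument and may return pointers to static storage, so the sketch works on scratch copies and copies each result out immediately.

```c
#include <assert.h>
#include <libgen.h>   /* POSIX basename() and dirname() */
#include <string.h>

/* Hypothetical helper: copies the directory and file-name parts of
   `path` into caller-supplied buffers. Returns 0 on success, -1 if
   something does not fit. */
int split_posix(const char *path, char *dir, size_t dir_size,
                char *name, size_t name_size)
{
    char tmp_dir[256];    /* scratch copies: both calls may modify input */
    char tmp_name[256];
    const char *d;
    const char *n;

    assert(path != NULL && dir != NULL && name != NULL);
    if (strlen(path) >= sizeof tmp_dir)
        return -1;
    strcpy(tmp_dir, path);
    strcpy(tmp_name, path);

    d = dirname(tmp_dir);
    if (strlen(d) >= dir_size)
        return -1;
    strcpy(dir, d);       /* copy out before any further library call */

    n = basename(tmp_name);
    if (strlen(n) >= name_size)
        return -1;
    strcpy(name, n);
    return 0;
}
```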
 
Keith Thompson

Krumble Bunk said:
I have an elementary question, of which I never grasped, and i'd like
to finally put it to rest.

I have the following code, which although makes use of the string
library, and could therefore be labelled as OT, the actual question I
have relates to the code around it. This is the highly abridged
version:
[snip]

Others have answered your (implicit) question; I'll just mention a
couple of points about posting here.

Code that uses the C standard library is absolutely topical here.

The FAQ for this newsgroup is an excellent resource; you can find it
at <http://www.c-faq.com/>. (I'm not suggesting that the answer to
your question could have been found there, just that it's worth
knowing about).

When you post a followup, please trim the quoted text down to what's
necessary for your response to make sense. It's rarely necessary to
quote the entire article. Don't quote signatures unless you're
actually commenting on them.
 
Kenneth Brody

Krumble Bunk wrote:
[...]
const char *fname;
const char *execname="/usr/bin/foo";
char *pathname;
char *str;

...
strcpy(pathname,execname); <- on my platform
I plan to use strlcpy() :)

I assume you have allocated space for pathname somewhere?
if((str=strrchr(pathname,'/')) != NULL)
{
*++str='\0';
fname=execname+(str - pathname); <- somehow, fname
ends up being "foo" as intended
} else {
fname=execname;
*pathname='\0';
}

I have checked the ascii values, for the above, and turning the
resulting ascii codes (which seem to be concat'd together), I do not
produce foo.

Would appreciate any help on this, as it's frying my peonic brain!

Because str points somewhere within pathname, "str-pathname" gives
you the offset within pathname. In this case, it points to the
character after the last '/', which is the 'f' in "foo".

When added to execname, it points to the character in execname at
the same offset. Because pathname was a copy of execname, it, too,
points to the 'f' in "foo", but within execname.

Perhaps it would make more sense to you with something like this:

ptrdiff_t OffsetToFname;   /* <stddef.h>; the type of a pointer difference */
...
*++str = '\0';
OffsetToFname = str - pathname;
fname = execname + OffsetToFname;

Of course, unless you are going to free(pathname), I'm not sure why
you don't simply point fname to the return from strrchr().

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
 
lawrence.jones

Krumble Bunk said:
const char *fname;
const char *execname="/usr/bin/foo";
char *pathname;
char *str;

...
strcpy(pathname,execname); <- on my platform I plan to use strlcpy() :)

Hopefully, some of the omitted code allocates space and sets pathname to
point to it. Otherwise, you're using an uninitialized variable.
if((str=strrchr(pathname,'/')) != NULL)
{
*++str='\0';
fname=execname+(str - pathname); <- somehow, fname ends up being "foo" as intended

At this point, str is pointing to the character just past the last "/"
in pathname, which is the beginning of "foo". (str - pathname) is the
number of characters between pathname and str (9 in this case), so
adding that to execname causes fname to point 9 characters past
execname, which is the beginning of "foo" in the original string.

For pointer subtraction to be valid, both pointers must point to
elements of the same array and the result of the subtraction is the
number of elements between them (e.g., for pointers to int, you get the
number of ints).
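
To make the "number of elements" point concrete, here is a small sketch with int pointers. The helper name `element_distance` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of int elements from `first` up to `last`.
   Both must point into (or one past the end of) the same array. */
ptrdiff_t element_distance(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    return last - first;   /* counted in ints, not in bytes */
}
```

For an array `int a[10]`, `element_distance(&a[2], &a[7])` is 5, whatever sizeof(int) happens to be on the platform.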

-Larry Jones

Something COULD happen today. And if anything DOES,
by golly, I'm going to be ready for it! -- Calvin
 
Chris Dollin

CBFalconer said:
^^^^^^^^
This is a pointer to memory. It doesn't point to any memory yet.

You're making unwarranted assumptions about the contents of `...`.

--
"It was starting to end, after what seemed most of /Nine Princes in Amber/
eternity to me."

Hewlett-Packard Limited registered office: Cain Road, Bracknell,
registered no: 690597 England Berks RG12 1HN
 
Chris Dollin

This places a NUL character in the position *AFTER* the
last slash (Chris Dollin's comment notwithstanding)

Oops. Good catch, thanks.

--
"There's something about this place," said Peter presently, /Gaudy Night/
"that alters one's values."

 
Topi Linkala

Kenneth said:
Of course, unless you are going to free(pathname), I'm not sure why
you don't simply point fname to the return from strrchr().

See his code. After the code has completed pathname has the following
characters: '/', 'u', 's', 'r', '/', 'b', 'i', 'n', '/', '\0', 'o', 'o',
'\0'

So if he would just set fname to str then fname would point to an empty
string.

Why he wants to retain the last slash is a question. Normally one would
have pathname as "/usr/bin" and fname as "foo" but maybe he wants to
have them so that he can later concatenate them without adding the '/'.
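
The buffer contents described here can be checked with a sketch along these lines. The helper `cut_after_last_slash` is a hypothetical wrapper around the `*++str = '\0'` step from the original post.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: applies the posted `*++str = '\0'` step to a
   writable path and returns the resulting `str` (NULL if no '/'). */
char *cut_after_last_slash(char *pathname)
{
    char *str;

    assert(pathname != NULL);
    str = strrchr(pathname, '/');
    if (str != NULL)
        *++str = '\0';   /* the '\0' replaces the character after the '/' */
    return str;
}
```

For "/usr/bin/foo" the returned pointer addresses the new '\0' at offset 9, so it is an empty string, while the buffer itself now reads "/usr/bin/" with the old "oo" still sitting behind the terminator.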

Topi
 
