repeated calls to strrchr... to find second to last occurrence


Sean Berry

I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt

I thought I would be able to do it like this:

----------------------------------------------------
char *cd = (char *)NULL;

if (strstr(short_database, "http") != (char *)NULL) {
cd = strrchr(short_database, '.');
cd = strchr(cd, '/');
strcpy(short_database, cd);
}
---------------------------------------------------

But, since there is a "." in ".txt", this will not work.
So I need to repeat the call to find the second to
last instance of ".".

Can anyone help? Thanks in advance, and sorry about
the seemingly easy question... I am not a good C programmer,
yet!
 

Joona I Palaste

Sean Berry said:
I need to find the second to last occurrence of a "." in a string.
and want to extract /path/to/file.txt
I thought I would be able to do it like this:
if (strstr(short_database, "http") != (char *)NULL) {
cd = strrchr(short_database, '.');
cd = strchr(cd, '/');
strcpy(short_database, cd);
}
---------------------------------------------------
But, since there is a "." in ".txt", this will not work.
So I need to repeat the call to find the second to
last instance of ".".
Can anyone help? Thanks in advance, and sorry about
the seemingly easy question... I am not a good C programmer,
yet!

strrchr() returns a pointer to the last match, or NULL if there was
no match. So, if it found a match, you need to investigate the part
of the string that comes before the last match.
First check if the pointer is the same as your original string
pointer. If it is, strrchr() found a match at the exact start of your
string, and there can't possibly be anything before it. So in that
case, exit: there isn't a second-to-last match.
Otherwise, change the character in the position strrchr() reported to
'\0', chopping off the string from the match onwards. Then call
strrchr() again. If it found a match, that's your second-to-last
match. Otherwise exit: there isn't a second-to-last match.
If your string isn't modifiable, copy it into a modifiable string.
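
The steps above could be sketched roughly as follows. This is only an illustration: second_to_last is a hypothetical helper name, it works on a malloc'd scratch copy so that the caller's string is never modified, and it assumes ch is not '\0'.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a pointer into s at the second-to-last
   occurrence of ch, or NULL if there are fewer than two occurrences
   (or if the scratch copy cannot be allocated). Assumes ch != '\0'. */
const char *second_to_last(const char *s, int ch)
{
    const char *result = NULL;
    size_t len = strlen(s);
    char *copy = malloc(len + 1);       /* modifiable scratch copy */
    char *last, *prev;

    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len + 1);

    last = strrchr(copy, ch);           /* last match, if any */
    if (last != NULL && last != copy) { /* a match at the very start has nothing before it */
        *last = '\0';                   /* chop the string at the last match */
        prev = strrchr(copy, ch);       /* last match in what remains */
        if (prev != NULL)
            result = s + (prev - copy); /* same offset in the original string */
    }
    free(copy);
    return result;
}
```

For the URL in the question, second_to_last(url, '.') would point at ".com/path/to/file.txt", so a further strchr() for '/' is still needed to reach the path.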
 

Darrell Grainger

I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt

It seems like you are asking the wrong question. If you want to extract
everything after the domain name then looking for the second last '.'
character will not always work. What if I had the URL:

http://some.domain.com/path.with/a.period/in/it/file.txt

Don't you want everything after, and including, the third '/'?
I thought I would be able to do it like this:

----------------------------------------------------
char *cd = (char *)NULL;

if (strstr(short_database, "http") != (char *)NULL) {
cd = strrchr(short_database, '.');
cd = strchr(cd, '/');
strcpy(short_database, cd);
}
---------------------------------------------------

But, since there is a "." in ".txt", this will not work.
So I need to repeat the call to find the second to
last instance of ".".

Can anyone help? Thanks in advance, and sorry about
the seemingly easy question... I am not a good C programmer,
yet!

Your C code doesn't seem to be the problem. You might want to pop over to
comp.programming and validate your algorithm before you attempt to
implement it.
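
The "third '/'" idea from this reply might look something like the sketch below. The helper name nth_char is made up for illustration, and the code assumes ch is not '\0' and that the URL has the plain "http://host/path" shape discussed here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the n-th occurrence of ch
   in s (counting from 1), or NULL if there are fewer than n occurrences
   or n is not positive. Assumes ch != '\0'. */
const char *nth_char(const char *s, int ch, int n)
{
    const char *p = NULL;
    int i;

    for (i = 0; i < n; ++i) {
        /* first search starts at s, later ones just past the previous hit */
        p = strchr(i == 0 ? s : p + 1, ch);
        if (p == NULL)
            return NULL;
    }
    return p;
}
```

With the URL from this reply, nth_char(url, '/', 3) would point at "/path.with/a.period/in/it/file.txt"; a NULL result means the URL has no path part.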
 

Arthur J. O'Dwyer

I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt

Don't use 'strrchr'. As you have discovered, that won't work.
The solution is not to hack around the problem, but rather to solve
it a different way. What part of the string do you want to extract?
Answer: the part following "mydomin.com". In general, the part which
comes immediately after the domain name, which is a sequence of
alphanumerics and dots, which itself follows the string "http://".
So look for "http://", followed by a sequence of alphanumerics and
dots, followed by a slash; and then extract everything from the slash
onwards.

BTW, unnecessary casts are evil.

char *cd;
if (strncmp(short_database, "http://", (sizeof "http://" - 1)) != 0)
do_error("URL does not begin with 'http://'!");
cd = short_database + (sizeof "http://" - 1);
while (!strchr("/", *cd))
++cd;
strcpy(short_database, cd);

(Note the use of 'strchr("/",...)' instead of '(... == '/')'. This
is an idiom that I've found very useful; it catches the end-of-string
null character as well as the slash for which we're really looking.
Be sure to preserve this behavior in your code; you don't want to
segfault if the user enters "http://www.google.com"!)

HTH,
-Arthur
 

Jens.Toerring

Don't use 'strrchr'. As you have discovered, that won't work.
The solution is not to hack around the problem, but rather to solve
it a different way. What part of the string do you want to extract?
Answer: the part following "mydomin.com". In general, the part which
comes immediately after the domain name, which is a sequence of
alphanumerics and dots, which itself follows the string "http://".
So look for "http://", followed by a sequence of alphanumerics and
dots, followed by a slash; and then extract everything from the slash
onwards.
BTW, unnecessary casts are evil.
char *cd;
if (strncmp(short_database, "http://", (sizeof "http://" - 1)) != 0)

Why ( sizeof "http://" - 1 )? Shouldn't that be sizeof "http://"?
do_error("URL does not begin with 'http://'!");
cd = short_database + (sizeof "http://" - 1);

I would think that the "-1" part does not look right here.
while (!strchr("/", *cd))

Using strchr() seems to be a bit of overkill when just comparing
characters. And it might be useful to guard against URLs like
"http://xx.yy.zz" by checking for '\0' while iterating over the
string:

while ( *cd && *cd != '/' )

if ( ! *cd )
do_error("URL without a path!");
strcpy(short_database, cd);

Don't use strcpy() when the strings may overlap, use memmove() instead:

memmove( short_database, cd, strlen( cd ) );

Regards, Jens
 

Irrwahn Grausewitz

Why ( sizeof "http://" - 1 )? Shouldn't that be sizeof "http://"?

Because otherwise strncmp would return non-zero for each and every
valid URL; hint: sizeof "http://" evaluates to 8.
I would think that the "-1" part does not look right here.

But it is correct. See above.
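
A small sketch of the sizeof point, for anyone who wants to check it. The function name has_http_prefix is hypothetical; the strncmp call is the one from Arthur's code.

```c
#include <string.h>

/* Hypothetical helper: nonzero if url starts with the literal "http://". */
int has_http_prefix(const char *url)
{
    /* sizeof "http://" is 8: seven characters plus the terminating '\0'.
       Only the seven real characters should be compared, hence the "- 1".
       Comparing all 8 would also compare the literal's '\0' against the
       first character of the host name, which never matches. */
    return strncmp(url, "http://", sizeof "http://" - 1) == 0;
}
```

Without the "- 1", the comparison would only succeed for a string that is exactly "http://" and nothing more.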
Using strchr() seems to be a bit of overkill when just comparing
characters. And it might be useful to guard against URLs like
"http://xx.yy.zz" by checking for '\0' while iterating over the
string:

And that's exactly what the strchr does; as Arthur already pointed
out, strchr considers the terminating null character to be part of
the string.
while ( *cd && *cd != '/' )

That's an equivalent solution.

FWIW, I'd let my code additionally check if the sequence between
"http://" and the next '/' or '\0' only consists of alphanumeric
characters plus dash plus dot.

Don't use strcpy() when the strings may overlap, use memmove() instead:

memmove( short_database, cd, strlen( cd ) );

Now, that's good advice, provided the obvious error is fixed:
terminating the string looks like a Good Idea to me.

Regards
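
Putting the points from this reply together (the host-character check and a memmove() that also copies the terminator) might give something like the following. It is a sketch only: strip_to_path and its return convention are invented for illustration, and the error handling replaces the do_error() calls used earlier in the thread.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: replaces "http://host/path" in url with "/path",
   in place. Returns 0 on success, -1 if the prefix is missing, the host
   contains a character other than alphanumerics, '-' and '.', or there
   is no path. On failure url is left unchanged. */
int strip_to_path(char *url)
{
    static const char prefix[] = "http://";
    const size_t prefix_len = sizeof prefix - 1;   /* 7, not 8 */
    char *p;

    if (strncmp(url, prefix, prefix_len) != 0)
        return -1;

    p = url + prefix_len;
    while (*p != '\0' && *p != '/') {
        if (!isalnum((unsigned char)*p) && *p != '-' && *p != '.')
            return -1;                  /* unexpected character in host */
        ++p;
    }
    if (*p == '\0')
        return -1;                      /* URL without a path */

    /* source and destination overlap, so memmove rather than strcpy;
       the + 1 copies the terminating '\0' as well */
    memmove(url, p, strlen(p) + 1);
    return 0;
}
```

Note that this sketch does not reject an empty host ("http:///x"); whether that matters depends on where the URLs come from.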
 

Jens.Toerring

Because otherwise strncmp would return non-zero for each and every
valid URL; hint: sizeof "http://" evaluates to 8.
But it is correct. See above.

Grrr.... Perhaps hitting myself on my head will help me to remember

sizeof != strlen
And that's exactly what the strchr does; as Arthur already pointed
out, strchr considers the terminating null character to be part of
the string.

Ah, that was too clever a solution for me;-(
That's an equivalent solution.
FWIW, I'd let my code additionally check if the sequence between
"http://" and the next '/' or '\0' only consists of alphanumeric
characters plus dash plus dot.
Now, that's good advice, provided the obvious error is fixed:
terminating the string looks like a Good Idea to me.

Yes, another off by 1 error. Mustn't have been a good day...

Regards, Jens
 

Dan Pop

Sean Berry said:
I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt

This is an ideal job for sscanf: its pattern matching capabilities are
just enough for your needs. Assuming that the string starts with
"http://" (otherwise use strstr to locate it first):

char str[] = "http://this.is.mydomain.com/path/to/file.txt";
char path[sizeof str];
int rc = sscanf(str, "http://%*[^/]%[^\n]", path);
if (rc == 1) puts(path);

If there are some other characters that you don't want to allow into the
path, include them in the last conversion specification.

Dan
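
The sscanf approach can be wrapped in a small function, sketched below. The name path_from_url is hypothetical, and the caller is assumed to supply a buffer of at least strlen(url) + 1 bytes, as the path[sizeof str] array does in the snippet above.

```c
#include <stdio.h>

/* Hypothetical helper: copies the path part of an "http://" URL into path.
   path must have room for at least strlen(url) + 1 bytes (assumption:
   the caller guarantees this, since %[ has no width limit here).
   Returns 1 if a path was extracted, 0 otherwise. */
int path_from_url(const char *url, char *path)
{
    /* "http://" must match literally, %*[^/] skips the host without
       storing it, and %[^\n] stores everything from the first '/' on. */
    return sscanf(url, "http://%*[^/]%[^\n]", path) == 1;
}
```

Checking for a result of exactly 1 also covers a URL with no path such as "http://www.google.com", where nothing is stored into path.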
 

Peter Ammon

Why ( sizeof "http://" - 1 )? Shouldn't that be sizeof "http://"?

I don't see why. sizeof "http://" is 8 (7 real characters + the null
character) but we only want to use seven characters.

I normally prefer to use strlen() in this circumstance. It makes the
code more obvious, since (as we just demonstrated) not everyone is
familiar with the semantics of sizeof with string literals, and gcc will
optimize calls to strlen() with string literals so there is no
performance win using sizeof. YMMV.
I would think that the "-1" part does not look right here.

See above.
Using strchr() seems to be a bit of overkill when just comparing
characters. And it might be useful to guard against URLs like
"http://xx.yy.zz" by checking for '\0' while iterating over the
string:

while ( *cd && *cd != '/' )



if ( ! *cd )
do_error("URL without a path!");




Don't use strcpy() when the strings may overlap, use memmove() instead:

memmove( short_database, cd, strlen( cd ) );

Regards, Jens

Both good points, but your memmove() fails to copy the null character.
Add one to the strlen().

-Peter
 

Peter Ammon

Sean said:
I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt

I thought I would be able to do it like this:

----------------------------------------------------
char *cd = (char *)NULL;

if (strstr(short_database, "http") != (char *)NULL) {
cd = strrchr(short_database, '.');
cd = strchr(cd, '/');
strcpy(short_database, cd);
}
---------------------------------------------------

But, since there is a "." in ".txt", this will not work.
So I need to repeat the call to find the second to
last instance of ".".

Can anyone help? Thanks in advance, and sorry about
the seemingly easy question... I am not a good C programmer,
yet!

Here's one way to do it.

char short_database[] = "http://this.is.mydomin.com/path/to/file.txt";
char *reader = short_database, *writer = short_database;
int slash_count = 3;
/* skip up to and including the third '/', then copy the rest down */
while (*reader) {
    if (slash_count <= 0) *writer++ = *reader++;
    else if (*reader++ == '/') slash_count--;
}
*writer = '\0'; /* terminate the shortened string */

will put "path/to/file.txt" (without the leading slash) into short_database.
 

kal

Sean Berry said:
I need to find the second to last occurrence of a "." in a string.

Basically I am taking a URL like
http://this.is.mydomin.com/path/to/file.txt

and want to extract /path/to/file.txt

As has been pointed out by others, this approach has some
drawbacks.

If faced with a similar problem, I would first look around
for available functions that manipulate URL-like
strings. The more general the function, the better.

For example, a function that extracts the "path" part even if one
or more of the other parts are missing. And maybe another
function to check whether a URL string is well formed.
 
