Parsing a path name

drhowarddrfine

I'm working with a server that will provide me the pathname to a file,
among many paths. So from getenv I may get /home/myweb/page1 but, of
course, there will be many variations of that. I'm unsure of the best
way to go about following the path. Should I read one char at a time
or use scanf? The problem could occur with something like
/home/mypage/page1/page1/page2/page2, for example.

I have not been programming in a few years so I could use a few hints,
tips, links and C functions. tia
 
Richard Heathfield

drhowarddrfine said:
I'm working with a server that will provide me the pathname to a file,
among many paths. So from getenv I may get /home/myweb/page1 but, of
course, there will be many variations of that. I'm unsure of the best
way to go about following the path. Should I read one char at a time
or use scanf? The problem could occur with something like
/home/mypage/page1/page1/page2/page2, for example.

Presumably you define "path" as "series of short steps, each of which begins
with a / character, and is followed either by another such step, or by the
end of the string".

So parse it in steps:

#include <stdio.h>

#define MAX_STEP_LEN 256 /* in your example, 8 would have done! */

/* Copy the next step from *cur into out (max bytes, including the
   terminator) and advance *cur. Returns 0 at the end of the string. */
int get_step(char *out, char **cur, char sep, size_t max)
{
    int rc = 0;
    if(max > 1 && **cur != '\0')
    {
        *out++ = **cur;
        ++*cur;
        --max;

        /* stop at max == 1 to leave room for the terminating '\0' */
        while(max > 1 && **cur != '\0' && **cur != sep)
        {
            *out++ = **cur;
            ++*cur;
            --max;
        }
        *out = '\0';
        rc = 1;
    }
    return rc;
}

int main(void)
{
    char path[] = "/home/mypage/page1/page1/page2/page2";
    char step[MAX_STEP_LEN] = {0};

    char *curstep = path;
    while(get_step(step, &curstep, '/', MAX_STEP_LEN))
    {
        printf("[%s]\n", step);
    }

    return 0;
}

Output:

[/home]
[/mypage]
[/page1]
[/page1]
[/page2]
[/page2]

If that does what you want, great. If not, it should at least get you
started.
 
Fred Kleinschmidt

drhowarddrfine said:
I'm working with a server that will provide me the pathname to a file,
among many paths. So from getenv I may get /home/myweb/page1 but, of
course, there will be many variations of that. I'm unsure of the best
way to go about following the path. Should I read one char at a time
or use scanf? The problem could occur with something like
/home/mypage/page1/page1/page2/page2, for example.

What *problem* are you talking about?
What is it that you are actually trying to do?
 
Paul Connolly

it is difficult to understand what you want to do - you might want to use
strtok to split up your string, delimited by slashes
 
drhowarddrfine

Richard may have provided the answer. As the title said, I'm trying to
parse the pathname. The web pages on this server are all dynamically
created using C but I need to find the path to the final web page so I
know what page to create. I thought of reading each char until I hit a
slash then doing a strcmp or some such to find where to go next to
continue down the path.
 
Christopher Layne

Paul said:
it is difficult to understand what you want to do - you might want to use
strtok to split up your string, delimited by slashes

strtok() is a POS; do not use this function. Roll your own or use a
pre-existing external one. But don't use strtok().
 
Richard Heathfield

Christopher Layne said:
strtok() is a POS; do not use this function. Roll your own or use a
pre-existing external one. But don't use strtok().

I disagree. Whilst it is true that the circumstances in which one might find
strtok() appropriate for use are somewhat limited, they do nevertheless
exist; it is no gets() clone, I assure you. I certainly have no qualms
about using strtok() - except when it is not appropriate.
 
Richard Bos

kondal said:
What is POS?

Point Of Sale?

But no, strtok() is rarely the function to use, but not never. In this
case, it _might_ work, but it probably won't be the best choice.

Richard
 
Simon Biber

Richard said:
I disagree. Whilst it is true that the circumstances in which one might find
strtok() appropriate for use are somewhat limited, they do nevertheless
exist; it is no gets() clone, I assure you. I certainly have no qualms
about using strtok() - except when it is not appropriate.

I wrote some code that parses commands (received over a network socket,
but that's irrelevant here).

The commands' names and functions that implement them are stored in an
array of structs:

struct Command
{
    const char *name;
    void (*function)(void);
} commands[] = {
    {"help", cmd_help},
    {"quit", cmd_quit},
    {"foo", cmd_foo},
    {"bar", cmd_bar}
};
size_t num_commands = sizeof commands / sizeof *commands;

One function starts off the strtok process, extracting the first token
to determine the command name and pass off the rest of the processing to
a specific function.

The command choice code looks like:

/* Start parsing it. Pointer q now contains the first token,
   the name of the command (or NULL if the line held no token) */
char *q = strtok(p, " ");

/* Iterate through the commands, comparing names, until
   we find one that matches */
size_t i = num_commands;
if(q != NULL)
{
    for(i = 0; i < num_commands; i++)
    {
        if(!strcmp(q, commands[i].name))
        {
            /* call the function that implements this command */
            commands[i].function();
            break;
        }
    }
}

if(i == num_commands)
{
    send_message("error no_such_command\n");
}

Whereupon the individual command's implementation function continues the
parsing process:

static void cmd_foo(void)
{
    char *arg1 = strtok(NULL, " ");
    if(arg1)
    {
        send_message("success foo\n");
    }
    else
    {
        send_message("error foo no_argument\n");
    }
}

Different commands can take a different number of arguments.
 
drhowarddrfine

Richard said:
I disagree. Whilst it is true that the circumstances in which one might find
strtok() appropriate for use are somewhat limited, they do nevertheless
exist; it is no gets() clone, I assure you. I certainly have no qualms
about using strtok() - except when it is not appropriate.

So what's wrong with strtok?
 
Robert Latest

On 4 Oct 2006 07:59:05 -0700, drhowarddrfine said:
So what's wrong with strtok?

Nothing at all as long as you know what it does and doesn't do and use
it appropriately.

Which can be said about damn near anything, even gets().

robert
 
Default User

drhowarddrfine wrote:

So what's wrong with strtok?


The main problem is that it modifies the string it acts on. In the
particular case, you're getting it from getenv(). The standard says
about that function:

Returns

[#4] The getenv function returns a pointer to a string
associated with the matched list member. The string pointed
to shall not be modified by the program, but may be
overwritten by a subsequent call to the getenv function. If
the specified name cannot be found, a null pointer is
returned.



So to use strtok() in a defined manner, you would have to make a copy
of the string first, which might be an unnecessary step. Without more
details on what you mean by "parse" and what you will do with the
resultant parsed information, it's hard to say what will be most useful.




Brian
 
Keith Thompson

Default User said:
The main problem is that it modifies the string it acts on.

The other problem is that it's not reentrant; it depends on static
internal state, so you can't process a string with strtok() in the
middle of processing another string with strtok().

Some implementations provide a non-standard strtok_r() function
(defined by POSIX) that avoids the reentrancy problem, but it still
modifies the string.
 
Default User

Keith said:
The other problem is that it's not reentrant; it depends on static
internal state, so you can't process a string with strtok() in the
middle of processing another string with strtok().

Some implementations provide a non-standard strtok_r() function
(defined by POSIX) that avoids the reentrancy problem, but it still
modifies the string.


Yes. It's not all that difficult to write a safer tokenizer, provided
you don't want to use one of the fine examples already out there. I
believe some have been posted here in the past; otherwise they can be
found on the web.




Brian
 
Andrew Poelstra

kondal said:
What is POS?

Point Of Sale. :)

You can use Google and the Urban Dictionary to get the right definition
in this context.

I had to ask my dad... it's really been that long since I was on
MSN. :-}
 
CBFalconer

Keith said:
The other problem is that it's not reentrant; it depends on static
internal state, so you can't process a string with strtok() in the
middle of processing another string with strtok().

Some implementations provide a non-standard strtok_r() function
(defined by POSIX) that avoids the reentrancy problem, but it still
modifies the string.

A third problem is that it doesn't detect null tokens, as signified
by two consecutive occurrences of the token-delimiting char. If
these things matter, you can use the following routine:

/* ------- file toksplit.c ----------*/
#include "toksplit.h"

/* copy over the next token from an input string, after
skipping leading blanks (or other whitespace?). The
token is terminated by the first appearance of tokchar,
or by the end of the source string.

The caller must supply sufficient space in token to
receive any token; otherwise tokens will be truncated.

Returns: a pointer past the terminating tokchar.

This will happily return an infinity of empty tokens if
called with src pointing to the end of a string. Tokens
will never include a copy of tokchar.

A better name would be "strtkn", except that is reserved
for the system namespace. Change to that at your risk.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
Revised 2006-06-13
*/

const char *toksplit(const char *src, /* Source of tokens */
                     char tokchar,    /* token delimiting char */
                     char *token,     /* receiver of parsed token */
                     size_t lgh)      /* length token can receive */
                                      /* not including final '\0' */
{
    if (src) {
        while (' ' == *src) src++;

        while (*src && (tokchar != *src)) {
            if (lgh) {
                *token++ = *src;
                --lgh;
            }
            src++;
        }
        if (*src && (tokchar == *src)) src++;
    }
    *token = '\0';
    return src;
} /* toksplit */

#ifdef TESTING
#include <stdio.h>

#define ABRsize 6 /* length of acceptable token abbreviations */

/* ---------------- */

static void showtoken(int i, char *tok)
{
    putchar(i + '1'); putchar(':');
    puts(tok);
} /* showtoken */

/* ---------------- */

int main(void)
{
    char teststring[] = "This is a test, ,, abbrev, more";

    const char *t, *s = teststring;
    int i;
    char token[ABRsize + 1];

    puts(teststring);
    t = s;
    for (i = 0; i < 4; i++) {
        t = toksplit(t, ',', token, ABRsize);
        showtoken(i, token);
    }

    puts("\nHow to detect 'no more tokens' while truncating");
    t = s; i = 0;
    while (*t) {
        t = toksplit(t, ',', token, 3);
        showtoken(i, token);
        i++;
    }

    puts("\nUsing blanks as token delimiters");
    t = s; i = 0;
    while (*t) {
        t = toksplit(t, ' ', token, ABRsize);
        showtoken(i, token);
        i++;
    }
    return 0;
} /* main */

#endif
/* ------- end file toksplit.c ----------*/

--
Some informative links:
<http://www.geocities.com/nnqweb/>
<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/>
 
Paul Connolly

Default User said:
drhowarddrfine wrote:

So what's wrong with strtok?


The main problem is that it modifies the string it acts on. In the
particular case, you're getting it from getenv(). The standard says
about that function:

Returns

[#4] The getenv function returns a pointer to a string
associated with the matched list member. The string pointed
to shall not be modified by the program, but may be
overwritten by a subsequent call to the getenv function. If
the specified name cannot be found, a null pointer is
returned.



So to use strtok() in a defined manner, you would have to make a copy
of the string first, which might be an unnecessary step. Without more
details on what you mean by "parse" and what you will do with the
resultant parsed information, it's hard to say what will be most useful.




Brian

why doesn't getenv return a const char *?
 
