One of my first C functions - Please comment

M

Mark Hobley

I am writing a C function that will be utilized by a pathwalker. I am fairly
new to C programming. The purpose of the function is to look at a path variable
and determine the longest element within it. This is for a Linux based system
and the function supports paths with a colon prefixed with an backslash escape
character. (Presumably I need to support this on Linux, because directory
names can contain a colon).

The code currently contains a rather ugly "goto treatasdefault" within the
switch constuct, and I am wondering if this can be rearranged to be more
presentable.

If I am in one branch of a switch, is there any mechanism that will allow me to
conditionally jump to the default branch, or is goto the only way? I know
that I can fallthrough, by eliminating break, but this will just fallthrough
to the next switch branch. I really want to jump to default.

The code is not yet complete and has not yet been tested.

sconst.h:

#ifndef SCONST_H
#define SCONST_H

#define DIRSEP '/'
#define PATHSEP ':'
#define ESCSEQ '\'

#endif

lonp.c

/* ***************************************************************************

Filename: lonp.c Version: 0.0.1 Date: 96/03/2010

Details: This function will return the length of the longest value within
a path variable. This is typically used within path walking applications.

Written by Mark Hobley.

(c) Copyright 2010 Mark Hobley.

This file can be redistributed under the terms of version 2 of the
GNU General Public Licence as published by the Free Software Foundation.

It is prohibited to include any part of this software into any other
software product or other work, not covered by version 2 of the
GNU General Public Licence.

*************************************************************************** */

/* ************************** Standard Includes *************************** */

#include <stddef.h> /* Required for NULL */

/* **************************** Other Includes **************************** */

#include "lonp.h"

/* ******************************* Functions ****************************** */

int longestpath(char *pathdata) {
char *element = pathdata;
char c;
int eoe = 0; /* False */
int lastescseq = 0; /* False */
int len = 0, longest = 0;

while (!eoe) {
c = element
switch (c) {
case NULL:
eoe = 1; /* True */
if (len > longest) {
longest = len;
}
break;
case ESCSEQ:
if (lastescseq != 0) {
lastescseq = 0
goto treatasdetault;
}
case PATHSEP:
if (lastescseq != 0) {
goto treatasdefault;
}
if (len > longest) {
longest = len;
}
len = 0; /* Reset Length */
break;
default:
treatasdefault:
++len;
break;
}
++element;
}
return(longest);
}
 
K

Keith Thompson

The code is not yet complete and has not yet been tested.

Apparently it hasn't even been compiled.
sconst.h:

#ifndef SCONST_H
#define SCONST_H

#define DIRSEP '/'
#define PATHSEP ':'
#define ESCSEQ '\'

This is a syntax error (or rather, any attempt to use this macro is a
syntax error). The character constant for a backslash is '\\'.
#endif

lonp.c
[snip]
(c) Copyright 2010 Mark Hobley.

This file can be redistributed under the terms of version 2 of the
GNU General Public Licence as published by the Free Software Foundation.

It is prohibited to include any part of this software into any other
software product or other work, not covered by version 2 of the
GNU General Public Licence.

<OT>
I'm not sure this prohibition is consistent with the GPL. This isn't
the place to go into details, but a good start would be to carefully
read the GPL itself.
</OT>

[...]
#include "lonp.h"

Is it lonp.h or sconst.h?
/* ******************************* Functions ****************************** */

int longestpath(char *pathdata) {
char *element = pathdata;
char c;
int eoe = 0; /* False */
int lastescseq = 0; /* False */
int len = 0, longest = 0;

while (!eoe) {
c = element

c is of type char; element is of type char*. Your compiler should
have complained about this -- and about the missing semicolon.
switch (c) {
case NULL:

NULL is a null pointer constant, not a character constant. The null
character is '\0'. (Your compiler might not have complained about
this if your said:
eoe = 1; /* True */
if (len > longest) {
longest = len;
}
break;
case ESCSEQ:
if (lastescseq != 0) {
lastescseq = 0

Another missing semicolon.
goto treatasdetault;
}
case PATHSEP:
if (lastescseq != 0) {
goto treatasdefault;
}
if (len > longest) {
longest = len;
}
len = 0; /* Reset Length */
break;
default:
treatasdefault:
++len;
break;
}
++element;
}
return(longest);
}

If you want help with your program logic, please correct any
compilation errors before posting your code.
 
M

Mark Hobley

Keith Thompson said:
Apparently it hasn't even been compiled.

Yeah. Sorry Keith. I hadn't actually finished coding. I never thought about
compiling as is. It was a good idea though.
This is a syntax error (or rather, any attempt to use this macro is a
syntax error). The character constant for a backslash is '\\'.

Ok. I fixed this.

#define ESCSEQ '\\'
It is prohibited to include any part of this software into any other
software product or other work, not covered by version 2 of the
GNU General Public Licence.

Ok. I have read GPL. I am not sure what you think is wrong here. I will ask
about that in a software licencing related group.

#include "sconst.h"
Is it lonp.h or sconst.h?

Oops! It's both
c is of type char; element is of type char*. Your compiler should
have complained about this -- and about the missing semicolon.

Missing semicolon fixed. I was expecting a derefence here. I expected c to
take the value pointed to by *element. Right, I will do some more reading
to find out how to get the derefence to work.
NULL is a null pointer constant, not a character constant.

Ok. That wasn't clear at all (from documentation). I have now added a line to
lonp.h:

#define NULLCHR '\0'

and changed NULL to NULLCHR
Another missing semicolon.

Ok. Fixed. Cheers.
If you want help with your program logic, please correct any
compilation errors before posting your code.

Right. I have still got to look at that derefence and I will do a preliminary
compile.

Thanks for your help so far.

Mark.
 
M

Mark Hobley

Keith Thompson said:
I'm not sure this prohibition is consistent with the GPL. This isn't
the place to go into details, but a good start would be to carefully
read the GPL itself.

Ok. I have now asked about this in comp.software.licensing

Regards,

Mark.
 
M

Malcolm McLean

I am writing a C function that will be utilized by a pathwalker. I am fairly
new to C programming. The purpose of the function is to look at a path variable
and determine the longest element within it. This is for a Linux based system
and the function supports paths with a colon prefixed with an backslash escape
character. (Presumably I need to support this on Linux, because directory
names can contain a colon).

The code currently contains a rather ugly "goto treatasdefault" within the
switch constuct, and I am wondering if this can be rearranged to be more
presentable.

If I am in one branch of a switch, is there any mechanism that will allow me to
conditionally jump to the default branch, or is goto the only way? I know
that I can fallthrough, by eliminating break, but this will just fallthrough
to the next switch branch. I really want to jump to default.

The code is not yet complete and has not yet been tested.

sconst.h:

#ifndef SCONST_H
#define SCONST_H

#define DIRSEP '/'
#define PATHSEP ':'
#define ESCSEQ '\'

#endif

lonp.c

/* ***************************************************************************

  Filename: lonp.c          Version: 0.0.1      Date: 96/03/2010

  Details: This function will return the length of the longest value within
  a path variable. This is typically used within path walking applications.

  Written by Mark Hobley.

  (c) Copyright 2010 Mark Hobley.

  This file can be redistributed under the terms of version 2 of the
  GNU General Public Licence as published by the Free Software Foundation.

  It is prohibited to include any part of this software into any other
  software product or other work, not covered by version 2 of the
  GNU General Public Licence.

*************************************************************************** */

/* ************************** Standard Includes *************************** */

#include <stddef.h>  /* Required for NULL */

/* **************************** Other Includes **************************** */

#include "lonp.h"

/* ******************************* Functions ****************************** */

int longestpath(char *pathdata) {
  char *element = pathdata;
  char c;
  int eoe = 0;                /* False */
  int lastescseq = 0;         /* False */
  int len = 0, longest = 0;

  while (!eoe) {
    c = element
    switch (c) {
      case NULL:
        eoe = 1;    /* True */
        if (len > longest) {
          longest = len;
        }
        break;
      case ESCSEQ:
        if (lastescseq != 0) {
          lastescseq = 0
          goto treatasdetault;
        }
      case PATHSEP:
        if (lastescseq != 0) {
          goto treatasdefault;
        }
        if (len > longest) {
          longest = len;
        }
        len = 0;    /* Reset Length */
        break;
      default:
treatasdefault:
        ++len;
        break;
    }
    ++element;
  }
  return(longest);

}


It's knock-up code. The trick in making in neat is to use the standard
library functions to do some of the messiness for you. Here's an
untested version.

int longestpath(char *pathname, char sep)
{
char *ptr = pathname;
int answer = 0;
int dist;
char *sepptr;

while(ptr)
{
sepptr = strchr(ptr, sep);
/* this is a nasty bit, exclude escapes */
while(sepptr > ptr && *(sepptr-1) == ESCSEQ))
septpr = strrchr(sepptr+1, sep);

if(sepptr)
dist = sepptr - ptr -1;
else
dist = strlen(ptr);
if(dist > answer)
answer = dist;
ptr = sepptr;
if(ptr)
ptr = ptr + 1;
}
}

answer will give length, including anyescape characters.

A question for mediumbies: why do we say ptr = ptr +1 rather than ptr+
+?
 
N

Nick

Malcolm McLean said:
It's knock-up code. The trick in making in neat is to use the standard
library functions to do some of the messiness for you. Here's an
untested version.

int longestpath(char *pathname, char sep)
{
char *ptr = pathname;
int answer = 0;
int dist;
char *sepptr;

while(ptr)
{
sepptr = strchr(ptr, sep);
/* this is a nasty bit, exclude escapes */
while(sepptr > ptr && *(sepptr-1) == ESCSEQ))
septpr = strrchr(sepptr+1, sep);

if(sepptr)
dist = sepptr - ptr -1;
else
dist = strlen(ptr);
if(dist > answer)
answer = dist;
ptr = sepptr;
if(ptr)
ptr = ptr + 1;
}
}

answer will give length, including anyescape characters.

A question for mediumbies: why do we say ptr = ptr +1 rather than ptr+
+?

I haven't a clue. Any why not ptr += 1?

Of course, you could do:
[snip]
if(sepptr) {
dist = sepptr - ptr -1;
sepptr += 1;
}
else
dist = strlen(ptr);
if(dist > answer)
answer = dist;
ptr = sepptr;
}

Yes, you could avoid adding a line by
dist = sepptr++ - ptr -1;
but that seems too confusing.
 
E

Eric Sosman

Is it "Spot the UB Week" again already? Relational
operators are not defined for null pointer operands (6.5.8p5).
 
B

Ben Bacarisse

Eric Sosman said:
typo: var should be 'sepptr'

Is it "Spot the UB Week" again already? Relational
operators are not defined for null pointer operands (6.5.8p5).

So is falling off the end of the end of non-void function. There's
also an off-by-one error in the logic and escapes can't be escaped.
(I don't know if that last part is in the design but I think it should
be.) I know this was just Malcolm showing the overall idea but I
think that that very idea deserves some discussion.

There seems to be some interaction between C programmers and character
twiddling code that promotes monolithic functions with tricky logic
and, as a result, buggy code. I don't draw this conclusion from the
postings in this thread but from my own sorry experience. I have,
rather late in the day, started to apply normal programming design
criteria to these sorts of string fiddling problems. With the right
helper functions the code is often cleaner.

Here is how I'd go about it these days. It is another example where
strchr is less than ideal in that it would help if strchr returned a
pointer to the terminating null when no matching character is found so
I'll will write a function to do that.

Checking for escapes can be done inefficiently -- I only look when I
find a separator and multiple consecutive \s will be rare so I won't
bother to keep track of the "escape state" the way I often used to in
code like this.

I think this is one of those relatively rare cases where there is a
natural fit to do ... while loop rather than a while, so that's the
way I've gone:

const char *find_chr_or_eos(const char *s, char chr)
{
while (*s && *s != chr) ++s;
return s;
}

int is_escaped(const char *cp, const char *start, char esc)
{
int escaped = 0;
if (cp > start)
while (--cp >= start && *cp == esc)
escaped = !escaped;
return escaped;
}

const char *find_unescaped_chr_or_eos(const char *s, char chr, char esc)
{
const char *cp = s;
while (*(cp = find_chr_or_eos(cp, chr)) && is_escaped(cp, s, esc))
++cp;
return cp;
}

size_t longestpath(const char *path, char sep)
{
size_t max_len = 0;
const char *end;
do {
end = find_unescaped_chr_or_eos(path, sep, '\\');
if (end - path > max_len)
max_len = end - path;
path = end + 1;
} while (*end);
return max_len;
}

I am sure this is old hat to most people here, but as a convert away
from the sort of loop the OP presented I though it worth posting. It
will server me right if there are half a dozen bug in this but I
claim that, written this way, they will at least be easier to find!
 
P

Peter Nilsson

Ben Bacarisse said:
So is falling off the end of the end of non-void function.

That in itself is not UB, unless the calling function attempts
to use the return value. [6.9.1p12]
 
B

Ben Bacarisse

Peter Nilsson said:
Ben Bacarisse said:
So is falling off the end of the end of non-void function.

That in itself is not UB, unless the calling function attempts
to use the return value. [6.9.1p12]

Touche[1]! Not much chance of that for this sort of function though,
but a good catch none the less.

[1] I'll resist using UTF-8.
 
M

Mark Hobley

Mark Hobley said:
while (!eoe) {
c = *element;
switch (c) {
case NULLCHR:
eoe = 1; /* True */
if (len > longest) {
longest = len;
}
break;
}
++element; <----
} |

I have another query. If *element at the NUL character at the end of the
string, the final increment will take this pointer beyond the end of the
string. This does not matter to me, because the loop is ending, and
*element no longer has any relevance. However, is it actually legal to
increment beyond the end of the string, or do I need a trap here?
eg:

if (eoe == 0) {
++element;
}

The program runs fine without the trap AFAIK, but am I "breaking the rules"
here?

(In other words is the act of incrementing the pointer beyond the string
boundary against the rules, or is it only breaking the rules to attempt to
subsequently dereference the now invalid pointer?)

Mark.
 
B

Ben Bacarisse

I have another query. If *element at the NUL character at the end of the
string, the final increment will take this pointer beyond the end of the
string. This does not matter to me, because the loop is ending, and
*element no longer has any relevance. However, is it actually legal to
increment beyond the end of the string, or do I need a trap here?

No, it's safe.

(In other words is the act of incrementing the pointer beyond the string
boundary against the rules, or is it only breaking the rules to attempt to
subsequently dereference the now invalid pointer?)

It might be safe to do even that, since the null character may not be
at the end of the object, but that is a special case. You are right
that it is only de-referencing one of these one-past-the-end pointers
that is prohibited. This applies to none array objects as well:

int i, *ip = &i;
if (++ip > &i) puts("safe and always true");

<snip>
 
P

Peter Nilsson

[email protected] (Mark Hobley) said:
(... is the act of incrementing the pointer beyond the string
boundary against the rules, or is it only breaking the rules
to attempt to subsequently dereference the now invalid pointer?)

You can reference one byte beyond an array, but not dereference
it.
 
N

Nick Keighley

([...] is the act of incrementing the pointer beyond the string
boundary against the rules, or is it only breaking the rules to attempt to
subsequently dereference the now invalid pointer?)

its a special case allowed by the standard

int main (void)
{
int a [10];
int *pi;
int i;

pi = &a[9]; /* last entry ok */
i = *pi; /* ok */
pi = &a[10]; /* one past- ok */
i = *pi; /* BAD */

pi = &a[-1]; /* BAD */

return 0;
}

its invalid to even form the address of something outside the range
0..10
 
B

BruceS

([...] is the act of incrementing the pointer beyond the string
boundary against the rules, or is it only breaking the rules to attempt to
subsequently dereference the now invalid pointer?)

its a special case allowed by the standard

int main (void)
{
    int a [10];
    int *pi;
    int i;

    pi = &a[9];   /* last entry ok */
    i = *pi;      /* ok */
    pi = &a[10];  /* one past- ok */
    i = *pi;      /* BAD */

    pi = &a[-1];  /* BAD */

    return 0;

}

its invalid to even form the address of something outside the range
0..10

C&V please? I didn't know this one, and I can't seem to find it, at
least in C99. My search skills are sometimes suboptimal :<
 
E

Eric Sosman

[...]
int a [10];
[...]
its invalid to even form the address of something outside the range
0..10

C&V please? I didn't know this one, and I can't seem to find it, at
least in C99. My search skills are sometimes suboptimal :<

6.5.6p8 describes the situations where adding pointers and
integers (including negative integers) are valid. Situations
which would lead to a subscript less than [0] or greater than [10]
are not in the list, hence undefined by omission. (4.1p2 says
that this is just as undefined as if there were a specific ban.)
 
B

BruceS

[...]
     int a [10];
[...]
its invalid to even form the address of something outside the range
0..10
C&V please?  I didn't know this one, and I can't seem to find it, at
least in C99.  My search skills are sometimes suboptimal :<

     6.5.6p8 describes the situations where adding pointers and
integers (including negative integers) are valid.  Situations
which would lead to a subscript less than [0] or greater than [10]
are not in the list, hence undefined by omission.  (4.1p2 says
that this is just as undefined as if there were a specific ban.)

Thanks, Eric. I read that part of the Standard (after you gave the
ref), and thought I *still* wasn't getting it (had started more
questions on it) when I noticed that the stated range was 0..10. I
blame the time change.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,434
Messages
2,571,691
Members
48,796
Latest member
Greg L.

Latest Threads

Top