susbtring function

H

HSeganfredo

Hello again,

I´ve made a small substring() function that shoudl give back the a
susbtring from a string, given its start and end positions.

Is there any chance of the code below give an output containing data
from another memory area that is completely unrelated to the input
string? This is happening and the start/end parameters are acceptable
(within bounds) for the input string...any idea?


char *substring(char *string, int start, int end){
printf("Input %s:", string);

char *result = (char *)malloc((end - start + 1)*sizeof(char));
if(result == NULL){
return(NULL);
}

string = string + start;
while(start < end){
*result = *string;
result++;
string++;
start++;
}
printf("Output %s:", result);
return(result);
}
 
I

Ian Collins

Hello again,

I´ve made a small substring() function that shoudl give back the a
susbtring from a string, given its start and end positions.

Is there any chance of the code below give an output containing data
from another memory area that is completely unrelated to the input
string? This is happening and the start/end parameters are acceptable
(within bounds) for the input string...any idea?
A few issues:

char *substring(char *string, int start, int end){

You don't change the input string, so make this

char *substring(const char *string, int start, int end);
printf("Input %s:", string);

You forgot the '\n':

printf("Input %s:", string);
char *result = (char *)malloc((end - start + 1)*sizeof(char));

Don't cast the return of malloc and sizeof(char) == 1 by definition.

char *result = malloc(end - start + 1);
if(result == NULL){
return(NULL);

return isn't a function

return NULL;
You must preserve the start of the result in order to return it:

char* temp = result;
string = string + start;
while(start < end){
*result = *string;

More idiomatic:

*result++ = *string++;
start++;
}

You should add the closing '\0' to the result.
printf("Output %s:", result);
return(result);

printf("Output %s:\n", temp);
return temp;
 
C

CBFalconer

I´ve made a small substring() function that shoudl give back the a
susbtring from a string, given its start and end positions.

Is there any chance of the code below give an output containing data
from another memory area that is completely unrelated to the input
string? This is happening and the start/end parameters are acceptable
(within bounds) for the input string...any idea?

char *substring(char *string, int start, int end){

start and end are indices into *string, so should be type size_t
printf("Input %s:", string);

char *result = (char *)malloc((end - start + 1)*sizeof(char));

You can't put this after code. Declare result before the printf
statement. Delete the cast. Eliminate "*sizeof(char)", because
that is 1 by definition.
if(result == NULL){
return(NULL);
}

string = string + start;

Where have you checked that start is a valid index for this
string? Maybe "if (start < strlen(string)) ... " would do.
while(start < end){

Where have you checked that end is a valid index for this string?
Where have you checked that end is larger than start? See above.
*result = *string;

Maybe this expression should include start?
result++;

Maybe this action should disappear?
string++;
start++;

Indenting code controlled by a conditional is considered cool.
}
printf("Output %s:", result);
return(result);

This isn't the value malloc returned. You have a problem when you
free, or you have a memory leak.

Enjoy.
 
K

Keith Thompson

Ian Collins said:
(e-mail address removed) wrote: [...]
printf("Input %s:", string);

You forgot the '\n':

printf("Input %s:", string);

So did you. Just as a matter of esthetics, putting the ':' at the end
seems odd. I'd write this as:

printf("Input: \"%s\"\n", string);

[...]
return isn't a function

return NULL;

To expand on that a bit: the syntax of a return statement is
return ;
or
return <expression> ;

The parentheses are not necessary, but they are permitted. If you use

return(NULL);

then the parentheses are part of the expression, not part of the
syntax of the return statement itself.

I prefer *not* to use extraneous parentheses; a return statement isn't
a function call, so it shouldn't look like one.

But using parentheses isn't wrong. Even the examples in K&R1 use
parentheses on return statements (though K&R2 changed this).

[...]
 
C

Charlie Gordon

Ian Collins said:
You don't change the input string, so make this

char *substring(const char *string, int start, int end);


You forgot the '\n':

printf("Input %s:", string);


Don't cast the return of malloc and sizeof(char) == 1 by definition.

char *result = malloc(end - start + 1);

return isn't a function

return NULL;
You must preserve the start of the result in order to return it:

char* temp = result;


More idiomatic:

*result++ = *string++;


You should add the closing '\0' to the result.


printf("Output %s:\n", temp);
return temp;

All these corrections are fine and dandy... to complete the answer for the
OP, there are 2 reasons you can get data in the output unrelated to in
input:
- you forgot the final '\0' after the copy loop. Insert *result = '\0';
after the }.
- you don't check that start and end fall within the boundaries of the
string. If not, you are attempting to read from beyond the end of the
string or before its beginning or indeed from no particular location... This
invokes undefined behaviour and does not necessarily cause your program to
crash.
 
I

Ian Collins

Charlie said:
All these corrections are fine and dandy...

I hope the were more than that.
to complete the answer for the
OP, there are 2 reasons you can get data in the output unrelated to in
input:
- you forgot the final '\0' after the copy loop. Insert *result = '\0';
after the }.
- you don't check that start and end fall within the boundaries of the
string. If not, you are attempting to read from beyond the end of the
string or before its beginning or indeed from no particular location... This
invokes undefined behaviour and does not necessarily cause your program to
crash.
There were at least three, as I pointed out, he was returning one past
the end of the result.
 
K

Keith Thompson

CBFalconer said:
(e-mail address removed) wrote: [...]
printf("Input %s:", string);

char *result = (char *)malloc((end - start + 1)*sizeof(char));

You can't put this after code. Declare result before the printf
statement.
[...]

To be precise, C90 does not allow declarations to follow statements
within a block, but C99 does. Most compilers do not (yet?) fully
implement C99, so if you want maximal portability for your code, all
declarations within a block should precede all statements within that
same block. If you don't mind limiting the portability of your code
to compilers than support C99, or at least that support that
particular feature, then you're ok.

[...]
that is 1 by definition.


Where have you checked that start is a valid index for this
string? Maybe "if (start < strlen(string)) ... " would do.


Where have you checked that end is a valid index for this string?
Where have you checked that end is larger than start? See above.

Possibly that check needs to be done by the caller. It's ok for a
function's behavior to be undefined if you give it invalid arguments;
many functions in the standard library work this way.

If you do want to do the check in the function, you need to decide
what to do if it fails. For a substring function, if the 'start' and
'end' functions are outside the bounds of the string, it's probably
reasonable to return an empty string; you might not even consider this
to be an error.

But the most important thing is to *document the requirements*. If I
want to use your function, I should be able to look at the function's
declaration (not body) and any commentary, an for any possible
arguments either determine exactly what the function will do, or be
warned that the behavior is undefined and I shouldn't call it with
those arguments.

[...]
 
M

Michal Nazarewicz

I´ve made a small substring() function that shoudl give back the a
susbtring from a string, given its start and end positions.

Is there any chance of the code below give an output containing data
from another memory area that is completely unrelated to the input
string?

As others described it is. :)
This is happening and the start/end parameters are acceptable
(within bounds) for the input string...any idea?

char *substring(char *string, int start, int end){
printf("Input %s:", string);

char *result = (char *)malloc((end - start + 1)*sizeof(char));
if(result == NULL){
return(NULL);
}

string = string + start;
while(start < end){
*result = *string;
result++;
string++;
start++;
}
printf("Output %s:", result);
return(result);
}

You may use strncpy() but remember to put 0 byte at the end of the
string:

#v+
char *substring(const char *string, size_t start, size_t end) {
return strcnpy(calloc(end-start+1), string+start, end-start);
}
#v-

Exercise: make the code readable, check if memory allocation succeed,
check if start is valid string's index and [start, end] is
non-empty set.
 
C

Charlie Gordon

Michal Nazarewicz said:
As others described it is. :)


You may use strncpy() but remember to put 0 byte at the end of the
string:

Please refrein from giving the OP ill advice.
strncpy is not appropriate as an alternative for the above code, memcpy is.
The semantics of strncpy are cumbersome, inefficient and error prone. It
should be deprecated and no longer used. Viable alternatives exist such as
BSD's strlcpy.
#v+
char *substring(const char *string, size_t start, size_t end) {
return strcnpy(calloc(end-start+1), string+start, end-start);
}
#v-

Assuming strcncpy was a typo for the infamous strncpy, this solution is
inefficient: the destination array is written twice, and each byte copied
from the source string must be checked for '\0'. bytes beyond an eventual
'\0' are not copied, unlike the previous code, but instead are set to '\0'
in the destination buffer, unlike popular belief.

A much simpler and more efficient solution:

char *substring(const char *string, size_t start, size_t end) {
char *dest = malloc(end - start + 1);
if (dest) {
memcpy(dest, string + start, end - start);
dest[end - start] = '\0';
}
return dest;
}
Exercise: make the code readable, check if memory allocation succeed,
check if start is valid string's index and [start, end] is
non-empty set.

Of course we agree on these important considerations.
 
M

Michal Nazarewicz

Charlie Gordon said:
Please refrein from giving the OP ill advice.
strncpy is not appropriate as an alternative for the above code, memcpy is.
The semantics of strncpy are cumbersome, inefficient and error prone. It
should be deprecated and no longer used. Viable alternatives exist such as
BSD's strlcpy.

It seems OP is learning C and therefore it's IMO important to try out
different solutions. True however, that I should have said that the
code is inefficient.
Assuming strcncpy was a typo for the infamous strncpy,

Ah yes, sorry...
this solution is inefficient: the destination array is written twice,
and each byte copied from the source string must be checked for '\0'.

No matter if one uses strncpy or something else source string should be
checked for terminating NUL character.
A much simpler and more efficient solution:

char *substring(const char *string, size_t start, size_t end) {
char *dest = malloc(end - start + 1);
if (dest) {
memcpy(dest, string + start, end - start);
dest[end - start] = '\0';
}
return dest;
}

This adds another point to exercise list below: check if end is valid
string's index.
Exercise: make the code readable, check if memory allocation succeed,
check if start is valid string's index and [start, end] is
non-empty set.

Of course we agree on these important considerations.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top