Finding a word inside a string

A

arnuld

WANTED: want to find whether a words exists inside a string or not and
convert it into int (its a number).

GOT: I got what I wanted

As usual, any suggestion to improve the program is welcomed :)



#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>


enum HELPER_VALUES {
VAL_SUCC = 0,
VAL_ERR = -1
};

enum BOOLEAN_VALUES {
VAL_TRUE = 1,
VAL_FALSE = 0
};

enum SIZES {
SIZE_VALUE = 10
};


enum {
RECV_TEXT_NO_MATCH = 10,
ERR_CONVERSION_CHAR_TO_INT = -10
};


int get_int_value_using_char_key(const char* t, const char* k);
void get_value_using_key(const char* t, const char* k, char* f);


int main(void)
{
int v_int = VAL_ERR;
char v_str[SIZE_VALUE+1] = {0};

const char* desc = "This is the value of Content: 123\n and this is
Size: 98";
char* k1 = "Content";
char* k2 = "Size";

get_value_using_key(desc, k1, v_str);
v_int = get_int_value_using_char_key(desc, k2);
printf("v_str = \"%s\"\n", v_str);
printf("v_int = %d\n", v_int);

return 0;
}



/* returns the integer value corresponding to the key mentioned as const
char* k */
int get_int_value_using_char_key(const char* t, const char* k)
{
int v = 0;
char value_str[SIZE_VALUE+1] = {0};

get_value_using_key(t, k, value_str);

if(value_str[0])
{
v = strtol(value_str, (char**) NULL, 10);
if(0 == v && ERANGE == errno) return ERR_CONVERSION_CHAR_TO_INT;
}

return v;
}



void get_value_using_key(const char* t, const char* k, char* fill_value)
{
int f_idx;
const int fill_size = SIZE_VALUE; /* can not use strlen(f) here as
fill_value is full of null chaacters */
char* p = NULL;
size_t len_of_key = strlen(k);

while(VAL_TRUE)
{
p = strstr(t, k);

if(NULL == p) return;

p += len_of_key;

if(*p == ':' && *(p+1) == ' ' )
{
p += 2; /* Removing the colon and next space after colon */
for(f_idx = 0; (f_idx != fill_size) && (*p != '\n'); ++p, +
+f_idx)
{
fill_value[f_idx] = *p;
}

break;
}
else
{
t = p;
}
}
}
======================= OUTPUT =====================
[arnuld@dune programs]$ gcc -ansi -pedantic -Wall -Wextra get-value-from-
key.c
[arnuld@dune programs]$ ./a.out
v_str = "123"
v_int = 98
[arnuld@dune programs]$
 
A

arnuld

WANTED: want to find whether a words exists inside a string or not and
convert it into int (its a number).

GOT: I got what I wanted

Eh... it should read:

WANTED: want to find whether a key (word) exists inside a string or not
and and taking its value (a number) convert it into int

:-/
 
I

Ike Naar

WANTED: want to find whether a words exists inside a string or not and
convert it into int (its a number).

GOT: I got what I wanted

Do you get what you want if you search for "ant" in
"Size of elephant: huge\nSize of ant: tiny" ?
 
N

Nick Keighley

Eh... it should read:

WANTED: want to find whether a key (word) exists inside a string or not
and and taking its value (a number) convert it into int

is strstr() any use to you?
 
A

arnuld

Do you get what you want if you search for "ant" in "Size of elephant:
huge\nSize of ant: tiny" ?


With this I get:

[arnuld@dune programs]$ ./a.out
v_str = ""
v_int = 0
[arnuld@dune programs]$

There are no integer values here in this input. Hence any returned values
are error conditions.

But this program returns 0 (zero) in error conditions, value 0 is
acceptable for v_int. These v_int values are connected with some
variable. It means this program will always overwrite my variables in
error conditions.

How about using isdigit() on input before deciding to extract the values ?
(NOTE: Input in my real program is very large, almost 1500 characters
every 2 seconds)
 
I

Ike Naar

Do you get what you want if you search for "ant" in "Size of elephant:
huge\nSize of ant: tiny" ?

With this I get:

[arnuld@dune programs]$ ./a.out
v_str = ""
v_int = 0

Okay, let's be more specific.

When using ``get_value_using_key'', do you get what you want if you
search for "ant" in "Size of elephant: huge\nSize of ant: tiny" ?

Alternatively, when using ``get_int_value_using_char_key'',
do you get what you want if you search for "ant" in
"Size of elephant: 9999999\nSize of ant: 1" ?

The point is that the method of using ``strstr'' to find the key in
the string table is rather crude, and you may get unexpected matches
if the key appears as part of another word.
 
B

Barry Schwarz

WANTED: want to find whether a words exists inside a string or not and
convert it into int (its a number).

GOT: I got what I wanted

As usual, any suggestion to improve the program is welcomed :)

-Eliminate the undefined behavior.
-Delete the useless cast.
-Eliminate the wasted vertical space.
-Don't use tabs for indenting.
-Indent consistently.
-Align your braces. Some of your } are under the corresponding {;
others are not.
-Delete unused enum values.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>


enum HELPER_VALUES {
VAL_SUCC = 0,
VAL_ERR = -1
};

enum BOOLEAN_VALUES {
VAL_TRUE = 1,
VAL_FALSE = 0
};

enum SIZES {
SIZE_VALUE = 10
};

In addition to defining some constants, you have created three new
object types. And without even looking I am sure there will never be
an object of type enum SIZES. Using enum is fine but what is the
point of creating multiple types? One tag-less type would do quite
well.
enum {
RECV_TEXT_NO_MATCH = 10,
ERR_CONVERSION_CHAR_TO_INT = -10
};


int get_int_value_using_char_key(const char* t, const char* k);
void get_value_using_key(const char* t, const char* k, char* f);


int main(void)
{
int v_int = VAL_ERR;
char v_str[SIZE_VALUE+1] = {0};

const char* desc = "This is the value of Content: 123\n and this is
Size: 98";
char* k1 = "Content";
char* k2 = "Size";

get_value_using_key(desc, k1, v_str);
v_int = get_int_value_using_char_key(desc, k2);
printf("v_str = \"%s\"\n", v_str);
printf("v_int = %d\n", v_int);

return 0;
}



/* returns the integer value corresponding to the key mentioned as const
char* k */
int get_int_value_using_char_key(const char* t, const char* k)
{
int v = 0;
char value_str[SIZE_VALUE+1] = {0};

get_value_using_key(t, k, value_str);

t contains the address of the string literal "This is ...".
k contains the address of the string literal "Size".
value_str evaluates to the address of the first byte in your local
array.
if(value_str[0])
{
v = strtol(value_str, (char**) NULL, 10);

What purpose do you think the cast serves? What would be different if
you deleted it?
if(0 == v && ERANGE == errno) return ERR_CONVERSION_CHAR_TO_INT;
}

So -10 will never be valid input to your code?

If strtol sets errno to ERANGE, you have already invoked undefined
behavior in the previous assignment to v (unless INT_MAX happens to
equal LONG_MAX). Regardless, when errno is set to ERANGE, v will
never be zero so the if can never evaluate to true.

If v happens to be zero, you do not know whether that is because your
text evaluated to 0 or because strtol could not perform any
conversion. Using NULL as the second argument to strtol eliminated
your best hope of doing error analysis.
return v;
}



void get_value_using_key(const char* t, const char* k, char* fill_value)
{

t contains the address of the string literal "This is ...".
k contains the address of the string literal "Size".
fill value contains the address of the array in the caller.
int f_idx;
const int fill_size = SIZE_VALUE; /* can not use strlen(f) here as
fill_value is full of null chaacters */

(You could pass the size of the array fill_value points to so that
this function could have general applicability.)
char* p = NULL;
size_t len_of_key = strlen(k);

len_of_key contains 4

(It is ok if this value is zero?)
while(VAL_TRUE)
{
p = strstr(t, k);

p contains the address of the 'S' in the long string literal.
if(NULL == p) return;

p += len_of_key;

p contains the address of the ':'.
if(*p == ':' && *(p+1) == ' ' )
True

{
p += 2; /* Removing the colon and next space after colon */

p contains the address of the '9'.
for(f_idx = 0; (f_idx != fill_size) && (*p != '\n'); ++p, +
+f_idx)

(Since ++ is a single token, how could your system break the line
here? I have never seen a newsreader break anywhere but a blank.)
{
fill_value[f_idx] = *p;
}

Iteration 1:
f_idx is 0.
p points to '9'.
fill_value[0] is '9'
Iteration 2:
f_idx is 1.
p points to '8'.
fill_value is '8'.
Iteration 3:
f_idx is 2.
p points to '\0'.
fill_value[2] is '\0' (unchanged).
Iteration 4:
f_idx is 3.
p points to ?????????
UNDEFINED BEHAVIOR starts here.
....
Iteration 10:
f_idx is 9.
p points to (a different) ?????????
Iteration 11:
f_idx is 10; loop terminates.

It is your misfortune that fill_value probably now contains
'9', '8', '\0', ?, ?, ?, ?, ?, ?, ?
and that strtol stopped when it tried to process the '\0'. Doing what
is expected is the most pernicious form of undefined behavior.
break;
}
else
{
t = p;
}
}
}
======================= OUTPUT =====================
[arnuld@dune programs]$ gcc -ansi -pedantic -Wall -Wextra get-value-from-
key.c
[arnuld@dune programs]$ ./a.out
v_str = "123"
v_int = 98
[arnuld@dune programs]$
 
S

Seebs

As usual, any suggestion to improve the program is welcomed :)

I suspect it is beyond my capacity to help much, but:
enum HELPER_VALUES {
VAL_SUCC = 0,
VAL_ERR = -1
};

These have no reason to exist.
enum BOOLEAN_VALUES {
VAL_TRUE = 1,
VAL_FALSE = 0
};

These have even more no reason to exist.

Completely eradicate both of these, they're useless to you.
enum SIZES {
SIZE_VALUE = 10
};

This makes no sense at all.
enum {
RECV_TEXT_NO_MATCH = 10,
ERR_CONVERSION_CHAR_TO_INT = -10
};

This, too, makes no sense at all.
int main(void)
{
int v_int = VAL_ERR;
char v_str[SIZE_VALUE+1] = {0};

This doesn't help any, because you have never given any reason for
SIZE_VALUE to hold the value it does.
const char* desc = "This is the value of Content: 123\n and this is
Size: 98";

This is a great example of a case in which it would make a lot more
sense for you to use the argc/argv parameters instead of hardcoding
a string.
/* returns the integer value corresponding to the key mentioned as const
char* k */

This is a good start on a comment, but "mentioned" is the wrong
word. I'd just say "passed as k". Don't duplicate the type information,
just use the parameter name.
v = strtol(value_str, (char**) NULL, 10);

This cast is unneeded.
if(0 == v && ERANGE == errno) return ERR_CONVERSION_CHAR_TO_INT;

This is a really awkward way to write this expression.
return v;

And what happens if the number someone uses has the same value as
your magic value here? Then you lose.

Here's the thing: If you want to return something with a reasonable
range, you MUST come up with separate channels for the data and the
return code. Any other "solution" is doomed to failure.
const int fill_size = SIZE_VALUE; /* can not use strlen(f) here as
fill_value is full of null chaacters */

Here you create a problem: You should be passing the size explicitly,
rather than relying on every part of your program remembering to us ethe
same enum.
for(f_idx = 0; (f_idx != fill_size) && (*p != '\n'); ++p, +
+f_idx)

Use <, not !=, for loop indexes -- some day you'll screw up and manage
to skip over the loop index or something, and you'll be screwed if you
used != instead of < or >.

I think you should probably rethink this, because as is, you're stuck
with exactly searching until '\n' no matter what. What about spaces
or other delimiters? I'd probably take a pointer to a function with
rules similar to isspace().

I guess... It's nice that you're trying to generalize, but you're
generalizing very poorly at this point. If I were going to try to do
this, it would have none of those enums.

My strategy would be:

* The routine to get a value introduced by a keyword would return a
pointer to the identified value, or a null pointer if nothing could
be found. This eliminates completely "SIZE_VALUE" and everything like
it. (It does add an allocation, but allocation is comparatively cheap
these days.)
* The routine to extract a number given a keyword would handle the
free internally, and would probably either take a pointer in which to
store the obtained value, or a pointer in which to store an error code.
* If I were feeling fancy, I might do something like
struct { int success, int value; } get_value...();
but I think on the whole that's more error-prone and confusing.

-s
 
A

August Karlstrom

-Don't use tabs for indenting.

You probably mean that we shouldn't mix tabs and blanks in indentation.
Unfortunately this is the default setting in GNU Emacs. I prefer to use
tabs only.

/August
 
I

Ike Naar

Use <, not !=, for loop indexes -- some day you'll screw up and manage
to skip over the loop index or something, and you'll be screwed if you
used != instead of < or >.

Using ``!='' gives you a stronger postcondition:

for (i = 0; i != N; ++i) { /* ... */ }
/* Postcondition: i == N */

vs.

for (i = 0; i < N; ++i) { /* ... */ }
/* Postcondition: i >= N */

And using ``!='' is probably safer:
If you use ``!='' the effect of screwing up the loop control might be
that the loop runs forever; at least, you notice that the code is now broken.
If you use ``<'' or ``>'' instead, the broken loop still terminates,
perhaps after performing the wrong number of iterations, giving slightly wrong
results; the brokenness of the code may go unnoticed for a long time.
 
S

Seebs

Using ``!='' gives you a stronger postcondition:
True!

for (i = 0; i != N; ++i) { /* ... */ }
/* Postcondition: i == N */

for (i = 0; i < N; ++i) { /* ... */ }
/* Postcondition: i >= N */
And using ``!='' is probably safer:
If you use ``!='' the effect of screwing up the loop control might be
that the loop runs forever; at least, you notice that the code is now broken.
If you use ``<'' or ``>'' instead, the broken loop still terminates,
perhaps after performing the wrong number of iterations, giving slightly wrong
results; the brokenness of the code may go unnoticed for a long time.

You have a point here. I guess the question is which failure mode is
more likely. This runs into the same issues as the neverending Assert
Debate; what kind of way of failing is best?

It is quite possible for a != N loop which goes wrong to corrupt a little
data before something else catches it -- especially if it, say, uses break;
statements or anything like that.

In short, It Depends. But you make some good points here, which I will think
about more.

-s
 
K

Keith Thompson

August Karlstrom said:
You probably mean that we shouldn't mix tabs and blanks in indentation.
Unfortunately this is the default setting in GNU Emacs. I prefer to use
tabs only.

I can't speak for Barry, but I'm sure he meant exactly what he said.

My own preference is to use only spaces for indenting (except in
contexts, such as Makefiles, that actually require tabs).

I understand the arguments on both sides, and I suggest that this
is not the place to re-hash them. My only point was that, though
you might disagree with Barry (and with me), it's probably not a
good idea to assume that he actually agrees with you but didn't
state it correctly.
 
A

August Karlstrom

I can't speak for Barry, but I'm sure he meant exactly what he said.

My own preference is to use only spaces for indenting (except in
contexts, such as Makefiles, that actually require tabs).

I understand the arguments on both sides, and I suggest that this
is not the place to re-hash them. My only point was that, though
you might disagree with Barry (and with me), it's probably not a
good idea to assume that he actually agrees with you but didn't
state it correctly.

I was only trying to provide some insight into what the real problem is
with the indentation of the OP's code; it's not that it's using tabs but
a combination of tabs and blank space.

/August
 
A

August Karlstrom

Using ``!='' gives you a stronger postcondition:

for (i = 0; i != N; ++i) { /* ... */ }
/* Postcondition: i == N */

vs.

for (i = 0; i< N; ++i) { /* ... */ }
/* Postcondition: i>= N */

Assuming N is non-negative we have the same postcondition in the latter
for loop as the loop invariant is i <= N which together with the while
condition i < N gives the postcondition (i >= N) && (i <= N) which is
equivalent to i == N.

/August
 
B

Barry Schwarz

I was only trying to provide some insight into what the real problem is
with the indentation of the OP's code; it's not that it's using tabs but
a combination of tabs and blank space.

The problem with tabs is that when the code is posted you have no idea
how far the recipient's system will indent. A reasonable 4 or 5
spaces will probably render the code readable. An excessive 8 or 10
will probably not.
 
C

Chad

 if(value_str[0])
What purpose do you think the cast serves?  What would be different if
you deleted it?


So -10 will never be valid input to your code?

If strtol sets errno to ERANGE, you have already invoked undefined
behavior in the previous assignment to v (unless INT_MAX happens to
equal LONG_MAX).  Regardless, when errno is set to ERANGE, v will
never be zero so the if can never evaluate to true.

If v happens to be zero, you do not know whether that is because your
text evaluated to 0 or because strtol could not perform any
conversion.  Using NULL as the second argument to strtol eliminated
your best hope of doing error analysis.

Would a better method be something like the following

errno = 0;

if (v == 0) {
if (errno == ERANGE) {
return errno;
}
return ERR_CONVERSION_CHAR_TO_INT;
}


Chad
 
B

Barry Schwarz

 if(value_str[0])
   {
     v = strtol(value_str, (char**) NULL, 10);  

What purpose do you think the cast serves?  What would be different if
you deleted it?
     if(0 == v && ERANGE == errno)  return ERR_CONVERSION_CHAR_TO_INT;
   }

So -10 will never be valid input to your code?

If strtol sets errno to ERANGE, you have already invoked undefined
behavior in the previous assignment to v (unless INT_MAX happens to
equal LONG_MAX).  Regardless, when errno is set to ERANGE, v will
never be zero so the if can never evaluate to true.

If v happens to be zero, you do not know whether that is because your
text evaluated to 0 or because strtol could not perform any
conversion.  Using NULL as the second argument to strtol eliminated
your best hope of doing error analysis.

Would a better method be something like the following

errno = 0;

if (v == 0) {
if (errno == ERANGE) {
return errno;
}
return ERR_CONVERSION_CHAR_TO_INT;
}

How could it be? If v is zero, the errno cannot be ERANGE. It makes
no sense to test a condition when the answer is already known.

On the other hand, if errno is ERANGE then the code invokes undefined
behavior.
 
S

Seebs

Would a better method be something like the following

errno = 0;

if (v == 0) {
if (errno == ERANGE) {
return errno;
}
return ERR_CONVERSION_CHAR_TO_INT;
}

On one of my systems, ERANGE is 34.

Wouldn't it suck to have a program blow up claiming I hadn't
provided a valid number every time I tried to enter "34"?

-s
 
A

August Karlstrom

The problem with tabs is that when the code is posted you have no idea
how far the recipient's system will indent. A reasonable 4 or 5
spaces will probably render the code readable. An excessive 8 or 10
will probably not.

What you see as a problem I see as an advantage: Everyone can choose
their preferred amount of indentation by setting the tab with in their
favourite editor. And at least Linus Torvalds thinks that indentation
steps corresponding to eight blanks is readable. I prefer three myself.

/August
 
M

Marcin Grzegorczyk

Ike said:
Using ``!='' gives you a stronger postcondition:

for (i = 0; i != N; ++i) { /* ... */ }
/* Postcondition: i == N */

vs.

for (i = 0; i < N; ++i) { /* ... */ }
/* Postcondition: i >= N */

And using ``!='' is probably safer:

IMHO it's safest to use `<` and follow the loop body with

assert(i == N);

if the code actually depends on that. That might not be the case if `N`
is allowed to be negative (in that case, `i != N` as the loop condition
will obviously screw up).
If you use ``!='' the effect of screwing up the loop control might be
that the loop runs forever; at least, you notice that the code is now broken.

Or it might not; `i` may eventually reach `N`, possibly after a
wraparound. If the width of `i` is small (say, 16 bits), you may not
even notice the delay.

IOW, no silver bullet. As Seebs wrote, It Depends.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top