I need help

janus

Hello All,

In a language like Python I could do the following:

foo = ["New York", "New Jersey", "New London"]
" ".join(foo)
That would join them.

In C, how am I going to reproduce the same?
Let me try first.
My feeling is to go this way:


char newyork[] = "New York";

char newyork[] = "New London";

char newyork[] = "New Jersey";

int accnum = strlen(newyork) + strlen(newlondon) + strlen(newjersey);
char mystring[accnum];
strncat(mystring, newyork , sizeof(mystring)-strlen(newyork)-1);
strcat(mystring, " ");
strncat(mystring, newlondon , sizeof(mystring)-strlen(newlondon)-1);
strcat(mystring, " ");
strncat(mystring, newjersey , sizeof(mystring)-strlen(newjersey)-1)

Janus
 
Seebs

In C, how am I going to reproduce the same?

There are many ways to do this.

My usual idiom is roughly something to the effect of:
size_t len = enough_space;
char *buf = malloc(len);
char *s = buf;

for (i = 0; i < items; ++i)
    s += snprintf(s, len - (s - buf), "%s ", array[i]);
if (s > buf)
    s[-1] = '\0';

This is probably buggy. Note that figuring out "enough_space" is still
tricky; I'm just showing how I tend to handle stuff like this. You don't
want to do a lot of stuff like strncat(); that's expensive, because every
call has to rescan the destination to find its end.

-s
 
Heinrich Wolf

janus said:
Hello All,

In a language like Python I could do the following:

foo = ["New York", "New Jersey", "New London"]
" ".join(foo)
That would join them.

In C, how am I going to reproduce the same?
Let me try first.
My feeling is to go this way:


char newyork[] = "New York";

char newyork[] = "New London";
char newlondon[] ...
char newyork[] = "New Jersey";
char newjersey[] ...
int accnum = strlen(newyork) + strlen(newlondon) + strlen(newjersey);
char mystring[accnum];
That might not compile everywhere, since accnum is not a constant
expression (it makes mystring a variable-length array, which needs C99 or
later). And you forgot the 2 spaces and the terminating '\0' character. You
also forgot to initialize mystring with an empty string. That might lead to
memory corruption. Use this instead:
char mystring[sizeof(newyork) - 1 + sizeof(newlondon) - 1 +
              sizeof(newjersey) - 1 + 2 + 1] = "";
strncat(mystring, newyork , sizeof(mystring)-strlen(newyork)-1);
strcat(mystring, " ");
strncat(mystring, newlondon , sizeof(mystring)-strlen(newlondon)-1);
strcat(mystring, " ");
strncat(mystring, newjersey , sizeof(mystring)-strlen(newjersey)-1)
If you declare mystring as an automatic array of sufficient size, as above
with sizeof(), then you can avoid the overhead of strlen() by using strcat
in place of strncat.

But even strcat needs to loop through mystring in order to find its current
end. You can reduce this overhead with:
strcpy(mystring, newyork);
mystring[sizeof(newyork) - 1] = ' ';
strcpy(mystring, sizeof(newyork) - 1 + 1, newlondon);
mystring[sizeof(newyork) - 1 + 1 + sizeof(newlondon) - 1] = ' ';
strcpy(mystring, sizeof(newyork) - 1 + 1 + sizeof(newlondon) - 1 + 1,
newjersey);
 
Heinrich Wolf

Correcting typos:

....
But even strcat needs to loop through mystring in order to find its
current end. You can reduce this overhead with:
strcpy(mystring, newyork);
mystring[sizeof(newyork) - 1] = ' ';
strcpy(mystring + sizeof(newyork) - 1 + 1, newlondon);
mystring[sizeof(newyork) - 1 + 1 + sizeof(newlondon) - 1] = ' ';
strcpy(mystring + sizeof(newyork) - 1 + 1 + sizeof(newlondon) - 1 + 1,
       newjersey);
 
Keith Thompson

janus said:
Hello All,

In a language like Python I could do the following:

foo = ["New York", "New Jersey", "New London"]
" ".join(foo)
That would join them.

In C, how am I going to reproduce the same?
[snip]

Please choose a meaningful subject line when you post a question.
Something like "Joining strings" or "String concatenation" would
have been good.

And when you post a followup, please quote enough of the parent
article to provide some context; readers can't always easily see
the article to which you're replying. Google Groups can make this
difficult; last time I checked, you can correct it by going back
to the older version of the interface (or, of course, by using
something other than Google Groups).
 
Shao Miller

Hello All,

In a language like Python I could do the following:

foo = ["New York", "New Jersey", "New London"]
" ".join(foo)
That would join them.

In C, how am I going to reproduce the same?

Sometimes it can be nice to track the length of a string, or a buffer.
I was a bit bored, so here's something you might or might not enjoy.
With any luck, it doesn't have any off-by-one errors. :)

Also available at the following link, which has some nice colours:

http://codepad.org/CKfqOSG2

/* For printf() */
#include <stdio.h>
/* For malloc(), free(), EXIT_SUCCESS, EXIT_FAILURE */
#include <stdlib.h>
/* For memcpy() */
#include <string.h>

struct string {
    /*
     * Count of characters in the string,
     * excluding any null terminator, in bytes
     */
    size_t len;
    /* Size of the buffer itself, in bytes */
    size_t max_len;
    /* Points to the buffer */
    char * buf;
    /* For allocation convenience, the buffer can co-incide with this */
    char appended_buf[1];
};
/* You can use this macro for [wordy] convenience, as an initializer */
#define STRING_INIT_FROM_STRING_LITERAL(string) \
    {sizeof string - 1, sizeof string, string, 0}

/* Join non-empty strings with a specified, non-empty delimiter string */
int join_strings_with_delim(
    struct string * output_str,
    const struct string * const * input_strings,
    const unsigned int input_string_count,
    const struct string * delim,
    int allocation
) {
    unsigned int i;
    size_t output_size;
    char * buf;

    /* Check output string */
    if (!output_str)
        return EXIT_FAILURE;
    /* Check delimiter string */
    if (!delim || !delim->buf || !delim->len)
        return EXIT_FAILURE;
    /* Check the strings to be joined */
    if (!input_string_count || input_string_count < 2 || !input_strings)
        return EXIT_FAILURE;
    for (i = 0, output_size = 0; i < input_string_count; ++i) {
        /* Check each input string */
        if (!input_strings[i] || !input_strings[i]->buf ||
            !input_strings[i]->len)
            return EXIT_FAILURE;
        /* Compute the output size */
        output_size += input_strings[i]->len;
    }
    /* Include the delimiter string size in the computation */
    output_size += delim->len * (input_string_count - 1);
    /* Include the null terminator */
    ++output_size;
    /* Should we allocate? */
    if (allocation) {
        buf = malloc(output_size);
        if (!buf)
            return EXIT_FAILURE;
        output_str->buf = buf;
        output_str->max_len = output_size;
        /* Exclude the null terminator from the string length */
        output_str->len = output_size - 1;
    } else {
        /* We don't allocate, so the caller ought to have */
        if (!output_str->buf || output_size > output_str->max_len)
            return EXIT_FAILURE;
        buf = output_str->buf;
    }
    /* Perform the join */
    i = 0;
    /* Skip the delimiter the first time around */
    goto skip_delim;
    do {
        size_t slen;
        slen = delim->len;
        memcpy(buf, delim->buf, slen);
        buf += slen;
        skip_delim:
        slen = input_strings[i]->len;
        memcpy(buf, input_strings[i]->buf, slen);
        buf += slen;
        ++i;
    } while (i < input_string_count);
    /* Terminate the string */
    *buf = 0;
    return EXIT_SUCCESS;
}

void free_string_buf(struct string * str) {
    if (!str) {
        /* Programmer error! */
        return;
    }
    free(str->buf);
    str->buf = NULL;
    str->len = str->max_len = 0;
    return;
}

int main(void) {
    /* These strings don't change, so we use 'static const' */
    static const struct string foo[] = {
        STRING_INIT_FROM_STRING_LITERAL("New York"),
        STRING_INIT_FROM_STRING_LITERAL("New Jersey"),
        STRING_INIT_FROM_STRING_LITERAL("New London"),
    };
    /* An array of pointers to these strings. It oughtn't to change */
    static const struct string * const foo_ptrs[] =
        {foo, foo + 1, foo + 2};
    /* An output string for the joined strings */
    struct string foo_output;

    /* Or without the macro, we could have used the more redundant: */
    static const struct string bar[] = {
        {sizeof "New York" - 1, sizeof "New York", "New York", 0},
        {sizeof "New Jersey" - 1, sizeof "New Jersey", "New Jersey", 0},
        {sizeof "New London" - 1, sizeof "New London", "New London", 0},
    };
    /* An array of pointers to these strings. It oughtn't to change */
    static const struct string * const bar_ptrs[] =
        {bar, bar + 1, bar + 2};
    /* An output string for the joined strings */
    struct string bar_output;

    /* The delimiter string. It does not change */
    static const struct string delim_str =
        STRING_INIT_FROM_STRING_LITERAL(" ");

    /* Track any errors */
    int status;

    /*
     * Join the strings, with the specified delimiter,
     * into the output buffer. The number of strings to process
     * can be calculated with 'sizeof array / sizeof *array'.
     * We happen to know that it's 3, but code can change over time
     */
    status = join_strings_with_delim(
        &foo_output,
        foo_ptrs,
        sizeof foo_ptrs / sizeof *foo_ptrs,
        &delim_str,
        1 /* Allocate the output buffer */
    );
    if (status == EXIT_FAILURE) {
        printf("Allocating foo_output failed!\n");
        return EXIT_FAILURE;
    }
    printf("foo_output: \"%s\"\n", foo_output.buf);
    /* Free the output buffer */
    free_string_buf(&foo_output);

    /* Same for 'bar' */
    status = join_strings_with_delim(
        &bar_output,
        bar_ptrs,
        sizeof bar_ptrs / sizeof *bar_ptrs,
        &delim_str,
        1 /* Allocate the output buffer */
    );
    if (status == EXIT_FAILURE) {
        printf("Allocating bar_output failed!\n");
        return EXIT_FAILURE;
    }
    printf("bar_output: \"%s\"\n", bar_output.buf);
    /* Free the output buffer */
    free_string_buf(&bar_output);

    return EXIT_SUCCESS;
}
 
Shao Miller

/* We don't allocate, so the caller ought to have */
if (!output_str->buf || output_size > output_str->max_len)
return EXIT_FAILURE;
buf = output_str->buf;
}

Oops. I forgot to do:

output_str->len = output_size - 1;

before that closing brace.

"How embarrassing." - Yoda, to the Younglings
 
Seebs

I am confused... Could you explain the following:
const struct string * const * input_strings,
# Having two consts and the pointer symbols staggered

cdecl> explain const int * const * foo;
declare foo as pointer to const pointer to const int

What this is doing is saying that the struct strings pointed to
can't be modified, nor can the pointers to them, and input_strings
is an array (in effect) of these unmodifiable pointers.

FWIW, I did length-tracking strings ages ago.

http://www.seebs.net/c/sz.html

I've used them moderately heavily, and not had significant trouble. There's
an obviously unsafe assumption made in them, but In Practice It Works.
It would be pretty trivial to remove it, though it'd impose more bookkeeping.

(Basically, the magic that lets you guess whether something is really one
of these, or just a normal C string, is unreliable.)

-s
 
Shao Miller

[snip]

Pretty sure it'll be sizeof(char *), not sizeof(char[2]). :)

Woah.

That I could be wrong about this for twenty years without noticing sort
of suggests that it's esoteric, but it's still pretty basic stuff. *sigh*


You might've seen it, had your news-reader filtering been different.

On 6/24/2011 5:07 AM, in thread "Re: Web programming in C lang with
... I've never seen C code which
used a new string type. I know how it could be done; a struct containing
a size member and a flexible array member is one obvious approach.
However, it hasn't been done in any code I've ever had to deal with.

You might've seen it, had your news-reader filtering been different.

In thread "Re: pointer = &membuff[-2];":
Here's an example program:

...

Here's some feed-back, for whatever it's worth:

#define _POSIX_SOURCE 1
#include <stdlib.h>
#include <stdio.h>

signed int main(signed int argc, char ** argv) {
    /* It can be nice to put constant values here */
    enum cv {
        buff_elem_count = 10,
        test_num = 54,
        zero = 0
    };
    unsigned int n1 = 0;
    unsigned int n2 = 2;
    unsigned int n3;
    unsigned int * membuff;
    /*
     * 'struct dummy' can have an unpredictable
     * alignment requirement which might
     * not be the same as 'unsigned int',
     * even though it must be divisible by
     * the alignment requirement of
     * 'unsigned int' due to the first member
     */
    struct dummy {
        unsigned int d1;
        unsigned int d2;
        unsigned int d3;
    } * sp;

    /* Modified size computation */
    membuff = malloc(buff_elem_count * sizeof *membuff);
    /* Check for null pointer value */
    if (!membuff) {
        puts("Out of memory.");
        return EXIT_FAILURE;
    }

    /* Modified to use testing value */
    membuff[0] = test_num;

    /*
     * If the pointer arithmetic result below does
     * not point to an 'unsigned int', the behaviour
     * is undefined.
     * If the alignment requirement of
     * 'struct dummy' is not satisfied by the pointer
     * arithmetic result below, the behaviour is
     * undefined.
     */
    sp = (struct dummy *) (membuff + (signed int)n1 - n2);
    /*
     * If 'sp' does not point to a contiguous range of
     * accessible memory with size 'sizeof (struct dummy)'
     * (if there's a hole), then the behaviour below is
     * undefined.
     * If there is padding between members of
     * 'struct dummy', '&sp->d3' is not necessarily the
     * same as '(unsigned int (*)[3])sp + 2'.
     */
    n3 = sp->d3;

    /* Newline added */
    printf("Result = %u\n", n3);

    return EXIT_SUCCESS;
}
Here's an example program:

...

membuff = malloc(10*sizeof(unsigned int));

Better:
membuff = malloc(10 * sizeof *membuff);
membuff[0] = 54;

sp = (struct dummy *)&membuff[n1 - n2];

Yes, n1-n2 is UINT_MAX-1, not -2, as you suggest above. This gives this
code undefined behavior. Note, however, that the behavior would be
undefined, for essentially the same reason, if n1 and n2 were signed, so
that n1-n2 is -2. The only values you can safely put in that subscript
are 0 through 10. 10 is safe only because of the '&' - any other use of
membuff[10] would have undefined behavior.

You don't have to dereference the resulting pointer to have problems,
and dereferencing it only to access sp->d3 doesn't avoid the problems.
The expression &membuff[-2] is equivalent to membuff-2, and it has
undefined behavior all by itself, before its value is even converted to
(struct dummy *). On many real-world implementations, membuff-2 could be
an invalid pointer value, which means that even storing it in an address
register (as might be done somewhere in that expression) could cause
your program to abort or a signal to be raised (among many other
possibilities).

As a further, minor quibble, it's possible (though rather unlikely) that
the alignment requirements of struct dummy are stricter than those of
unsigned int, so it's not guaranteed that you could, for instance, even
safely cast (struct dummy*)&membuff[2].

...

In thread "Re: Reference counting":
Is one goal of news-reader filtering to save time?

But neither of you will read this, of course. Oh well.
 
