parsing commands and parameters

  • Thread starter meATprivacyDOTnet
  • Start date
M

meATprivacyDOTnet

Hi,

To learn some C, I am writing a sample POP server in standard C to
handle connections from a standard telnet client.

The client will send to the server one of the following commands in a
single line over the network:

- SEND recipient "body" "subject"
e.g. SEND "joe" "this is a test message" "test message"

- VIEW x
e.g. VIEW 1 (view the 1st message)

- QUIT
(no params, just close the connection)

Using the recv() function, I stored the whole line (main command and
arguments) from the client into a string in the server.

What is the best/easiest way to parse the string to get out of it the
command and the arguments in the server?

Thanks in advance.
 
C

Christopher Benson-Manica

meATprivacyDOTnet said:
To learn some C, I am writing a sample POP server in standard C to
handle connections from a standard telnet client.

Just FYI, a lot of the details of your project won't be topical here:

http://www.ungerhu.com/jxh/clc.welcome.txt
http://www.eskimo.com/~scs/C-faq/top.html
http://benpfaff.org/writings/clc/off-topic.html
What is the best/easiest way to parse the string to get out of it the
command and the arguments in the server?

I am not a guru; add grains of salt as needed and watch to see if the
gurus correct me :)

strtok() is effective, but it's not particularly easy to use
correctly. Assuming POP commands are delimited by spaces (I don't
know), you could start with something simple:

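#include <string.h> /* strchr */

#define MAX_ARGS 10
char cmd[1<<10]; /* the size is up to you */
char *args[MAX_ARGS];
unsigned int arg_count=0;
char *cp=cmd; /* start scanning at the beginning of the command buffer */

/* Use system-specific means to fill cmd with the response from a POP
 * command, taking care not to overflow the cmd array */

while( cp && arg_count < MAX_ARGS ) {
    args[arg_count++]=cp;      /* record the start of this argument */
    cp=strchr( cp, ' ' );      /* find the next space, if any */
    if( cp ) {
        *cp++=0;               /* terminate the argument, step past the space */
    }
}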

/* args now contains arg_count arguments, with which you can do
* whatever you want */
 
M

meATprivacyDOTnet

strtok() is effective, but it's not particularly easy to use
correctly. Assuming POP commands are delimited by spaces (I don't
know), you could start with something simple:

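#include <string.h> /* strchr */

#define MAX_ARGS 10
char cmd[1<<10]; /* the size is up to you */
char *args[MAX_ARGS];
unsigned int arg_count=0;
char *cp=cmd; /* start scanning at the beginning of the command buffer */

/* Use system-specific means to fill cmd with the response from a POP
 * command, taking care not to overflow the cmd array */

while( cp && arg_count < MAX_ARGS ) {
    args[arg_count++]=cp;      /* record the start of this argument */
    cp=strchr( cp, ' ' );      /* find the next space, if any */
    if( cp ) {
        *cp++=0;               /* terminate the argument, step past the space */
    }
}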

/* args now contains arg_count arguments, with which you can do
* whatever you want */

I think this will work if commands and arguments are simply delimited by
spaces, but it won't work for the following line, for which some of the
arguments are delimited by double quotes and contain spaces:

SEND recipient "body" "subject"
e.g.: SEND joe "this is a test message" "test message"

Any idea on how to treat that?

Thanks a lot.
 
B

Ben Pfaff

Nitin said:
Use strtok .. that's the best thing available in C for such things

strtok() has at least these problems:

* It merges adjacent delimiters. If you use a comma as
your delimiter, then "a,,b,c" is three tokens, not
four. This is often the wrong thing to do. In fact,
it is only the right thing to do, in my experience,
when the delimiter set is limited to white space.

* The identity of the delimiter is lost, because it is
changed to a null terminator.

* It modifies the string that it tokenizes. This is bad
because it forces you to make a copy of the string if
you want to use it later. It also means that you can't
tokenize a string literal with it; this is not
necessarily something you'd want to do all the time but
it is surprising.

* It can only be used once at a time. If a sequence of
strtok() calls is ongoing and another one is started,
the state of the first one is lost. This isn't a
problem for small programs but it is easy to lose track
of such things in hierarchies of nested functions in
large programs. In other words, strtok() breaks
encapsulation.
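
The first and third points in the list above can be seen in a few lines. This is only a minimal sketch: `count_tokens_strtok` and `count_fields` are hypothetical helper names invented for the illustration, not anything from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() finds in buf.
 * buf must be a modifiable array, because strtok() overwrites
 * delimiters with null characters. */
size_t count_tokens_strtok(char *buf, const char *delims)
{
    size_t n = 0;
    char *tok;

    assert(buf != NULL && delims != NULL);
    for (tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}

/* Hypothetical helper: counts fields by treating every delimiter as
 * a field boundary, so empty fields are counted. s is not modified. */
size_t count_fields(const char *s, char delim)
{
    size_t n = 1;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == delim)
            n++;
    return n;
}
```

For the string "a,,b,c", `count_fields` reports four fields while `count_tokens_strtok` reports three, and after the strtok() pass the buffer reads as just "a" because the first comma has been overwritten.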
 
M

meATprivacyDOTnet

Use strtok .. that's the best thing available in C for such things

How do I handle the following command with the strtok() function?

- SEND recipient "body" "subject"
e.g. SEND "joe" "this is a test message" "test message"

Thanks.
 
C

Christopher Benson-Manica

meATprivacyDOTnet said:
I think this will work if commands and arguments are simply delimited by
spaces, but it won't work for the following line, for which some of the
arguments are delimited by double quotes and contain spaces:
SEND recipient "body" "subject"
e.g.: SEND joe "this is a test message" "test message"
Any idea on how to treat that?

You could step through the string one character at a time, changing
spaces to NUL characters and adding to the arguments array, and keep
track of whether you've found a double quote and ignore all spaces
until you find another one.
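
A minimal sketch of that idea follows. The function name `split_args`, its return convention, and the decision to drop the quotes are assumptions made for the illustration. It recognizes no escape sequences, and it does not strip the trailing CR/LF that a telnet client would normally send.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place into at most max_args
 * arguments. Spaces separate arguments unless they appear between
 * double quotes; the quotes themselves are dropped. Returns the
 * argument count, or -1 for an unterminated quote or too many
 * arguments. */
int split_args(char *line, char *args[], int max_args)
{
    int count = 0;
    char *p = line;

    assert(line != NULL && args != NULL);
    while (*p != '\0') {
        while (*p == ' ')            /* skip separators between arguments */
            p++;
        if (*p == '\0')
            break;
        if (count == max_args)
            return -1;               /* no room for another argument */
        if (*p == '"') {             /* quoted argument: spaces are data */
            args[count++] = ++p;
            p = strchr(p, '"');
            if (p == NULL)
                return -1;           /* missing closing quote */
            *p++ = '\0';
        } else {                     /* bare word: ends at the next space */
            args[count++] = p;
            p = strchr(p, ' ');
            if (p == NULL)
                break;               /* last argument runs to end of line */
            *p++ = '\0';
        }
    }
    return count;
}
```

Given the example SEND line from the original post, this yields four arguments: the command, the recipient, the body, and the subject.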

There's probably an elegant solution using sscanf, but I for one am
not qualified to provide it.
 
D

Daniel Haude

On Mon, 04 Apr 2005 08:54:43 +0200,
in Msg. said:
I think this will work if commands and arguments are simply delimited by
spaces, but it won't work for the following line, for which some of the
arguments are delimited by double quotes and contain spaces:

SEND recipient "body" "subject"
e.g.: SEND joe "this is a test message" "test message"

Any idea on how to treat that?

Sure. Just "walk" through the string with a couple of pointers, keeping in
mind what you're looking for, and provide treatment of errors (like
missing arguments or quotes). It's a fun and educational exercise writing
little parsers like that -- and it's amazing how much can go wrong.

For more complex tasks you'd use state machines or a dedicated regex
library, but this problem is still well-suited to "walking pointers".

--Daniel
 
T

Thomas Cameron

Christopher said:
meATprivacyDOTnet spoke thus:

You could step through the string one character at a time, changing
spaces to NUL characters and adding to the arguments array, and keep
track of whether you've found a double quote and ignore all spaces
until you find another one.

There's probably an elegant solution using sscanf, but I for one am
not qualified to provide it.

The issue with iterating through the string using the double quote is that
you can not have one in your message...now we get into having to escape
characters...and this is going to get fun real soon. :)
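
One common convention, assumed here and not specified anywhere in the thread, is a backslash escape: \" stands for a literal quote and \\ for a literal backslash. A minimal sketch of reading one quoted argument under that assumption, with `read_quoted` as a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: src points at an opening double quote. The
 * quoted text is copied into dst (capacity dst_size) with each
 * backslash removed and the character after it taken literally.
 * Returns a pointer just past the closing quote, or NULL if the
 * quote is unterminated, a backslash ends the input, or dst is
 * too small. */
const char *read_quoted(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL && dst_size > 0 && *src == '"');
    for (src++; *src != '"'; src++) {
        if (*src == '\\')
            src++;                   /* take the next character literally */
        if (*src == '\0' || n + 1 >= dst_size)
            return NULL;             /* ran off the input or out of room */
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return src + 1;
}
```

Copying into a separate buffer keeps the original line intact, at the cost of needing a size limit for each argument.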
 
C

Christopher Benson-Manica

Thomas Cameron said:
The issue with iterating through the string using the double quote is that
you can not have one in your message...now we get into having to escape
characters...and this is going to get fun real soon. :)

Handling all that is left as an exercise for the reader :)
 
D

Daniel Haude

On Mon, 04 Apr 2005 11:29:57 -0400,
in Msg. said:
The issue with iterating through the string using the double quote is that
you can not have one in your message...now we get into having to escape
characters...and this is going to get fun real soon. :)

Things like this quickly evolve into full-blown state machines. A fun
exercise which, if getting marginally more complex, is better left to a
regex library or lex.

--Daniel
 
M

Michael Wojcik

How do I handle the following command with the strtok() function?

- SEND recipient "body" "subject"
e.g. SEND "joe" "this is a test message" "test message"

You don't. strtok is the wrong tool for this job. (It's the wrong
tool for most jobs, for the reasons Ben listed.)

The best way to do this - particularly since you should be handling
things like embedded quotation marks - is to use a proper lexical
analyzer (aka "lexer") and parser. That gives you a single
consistent, reliable mechanism for handling all well-formed input,
detecting malformed input, and so on.

The lexer divides the input stream into tokens, where a token
might be "send-command" (for "SEND") or "string" (for any of its
three arguments) or special values like "EOL" for end-of-line.
The parser recognizes a sequence of tokens and acts on it, eg by
calling the "send message" function with parameters for recipient,
body, and subject.

You could use a toolset like lex and yacc (or their descendants,
such as Flex and Bison) to generate a lexer and parser, or write
them yourself. For something this simple I'd probably just write
them myself, and I'd probably use a single-level, non-reducing
parser (that is, all the productions would be from the starting
state to a list of terminal symbols, and the parser would just
iterate through them until it found a match).

Lexers are typically written in C as state machines. One popular
approach is to create an enumeration that's a list of the states
you could be in while you're tokenizing, create a switch statement
for each of those states, and wrap that in a while loop that
iterates over all the input. (This assumes that you can tokenize
all of your input first, and then parse it. In some cases you
need to be able to alternate between tokenizing and parsing, in
which case the lexer will generally run until it has identified
the next token, which it then returns to the caller.)

The cases in the switch statement look at the next character and
decide whether it changes the state. For example, if it's a quote
character, and you're not currently in a quoted string, then it means
you'll be transitioning to the quoted-string state. If it's an
ordinary character with no special meaning, it just gets appended to
the current token. And so on.
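
To make the description above concrete, here is a minimal sketch of such a state machine. The state names, the `tok` structure, and `lex_line` are all hypothetical, and the sketch tokenizes in place and ignores escape sequences to stay short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum lex_state { BETWEEN, IN_WORD, IN_QUOTED };   /* lexer states */
enum tok_kind { TOK_WORD, TOK_STRING };           /* hypothetical token kinds */

struct tok {
    enum tok_kind kind;
    char *text;              /* points into the caller's line buffer */
};

/* Hypothetical helper: tokenizes line in place, storing at most max
 * tokens in out. Returns the token count, or -1 on an unterminated
 * quote or too many tokens. */
int lex_line(char *line, struct tok out[], int max)
{
    enum lex_state state = BETWEEN;
    int count = 0;
    char *p;

    assert(line != NULL && out != NULL);
    for (p = line; *p != '\0'; p++) {
        switch (state) {
        case BETWEEN:
            if (*p == ' ')
                break;                       /* still between tokens */
            if (count == max)
                return -1;
            if (*p == '"') {
                out[count].kind = TOK_STRING;
                out[count].text = p + 1;     /* content starts after the quote */
                state = IN_QUOTED;
            } else {
                out[count].kind = TOK_WORD;
                out[count].text = p;
                state = IN_WORD;
            }
            count++;
            break;
        case IN_WORD:
            if (*p == ' ') {                 /* a space ends a bare word */
                *p = '\0';
                state = BETWEEN;
            }
            break;
        case IN_QUOTED:
            if (*p == '"') {                 /* only a quote ends a string */
                *p = '\0';
                state = BETWEEN;
            }
            break;
        }
    }
    return state == IN_QUOTED ? -1 : count;
}
```

Adding an escape character would mean one more state (say, "just saw a backslash inside a quoted string") rather than another flag variable, which is the main attraction of the approach.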

In many cases the lexer also recognizes keywords (like "SEND" when
it's the first word on a line, in your example) and indicates them
to the parser. The lexer builds a list of tokens (or just a single
token, if it's a token-at-a-time lexer), which are probably
structures with a "type" field and a "content" field. Depending on
how your parser works and how your language is specified, you might
use an array (and limit the number of tokens to its size); or you
might put the tokens in a linked list, or you might dynamically
allocate an array of tokens. But let's say it might look something
like this:

enum token_type {
    END,          /* indicates end of production for parser */
    SEND_CMD,     /* the SEND command keyword */
    STRING,       /* string data, not a keyword */
    EOL,          /* end of line */
    END_OF_INPUT  /* end of input; not named EOF, which <stdio.h> defines as a macro */
};

struct token {
    enum token_type type;   /* type of this token */
    char *content;          /* optional token content */
};

struct token_node {
    struct token tok;         /* this token */
    struct token_node *next;  /* next token */
};

Then the lexer might return a linked list of token structures (which
it would allocate as it tokenized the stream) with:

{ SEND_CMD, NULL } ->
{ STRING, "joe" } ->
{ STRING, "this is a test message" } ->
{ STRING, "test message" } ->
{ EOL, NULL } -> NULL

And the parser might have an array of the token sequences
("productions") it recognizes:

#define MAX_TOKENS_IN_PRODUCTION 6
/* only the first array dimension may be left unspecified in C */
enum token_type cmds[][MAX_TOKENS_IN_PRODUCTION] = {
    {SEND_CMD, STRING, STRING, STRING, EOL, END}
};

(Note that this definition limits the number of tokens in a
production to six; you'd have to set that to the longest sequence of
tokens in any phrase you want the parser to recognize.)

The parser would iterate through the items in the cmds array, looking
to see if the sequence of tokens returned by the lexer matched one.
If so, then it hands that sequence of tokens off to the function that
processes it, and moves on to the next sequence.
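
A minimal sketch of that matching step, assuming the lexer has already reduced the line to an array of token types. `VIEW_CMD`, `QUIT_CMD`, and `match_production` are additions made for the illustration, and every production is assumed to end with END.

```c
#include <assert.h>
#include <stddef.h>

/* END is 0, so unused trailing slots in a production are END too. */
enum token_type { END, SEND_CMD, VIEW_CMD, QUIT_CMD, STRING, EOL };

#define MAX_TOKENS_IN_PRODUCTION 6

static const enum token_type cmds[][MAX_TOKENS_IN_PRODUCTION] = {
    { SEND_CMD, STRING, STRING, STRING, EOL, END },
    { VIEW_CMD, STRING, EOL, END },
    { QUIT_CMD, EOL, END }
};

/* Hypothetical helper: returns the index of the production whose
 * token types equal the ntokens entries of seq, or -1 if none does. */
int match_production(const enum token_type *seq, size_t ntokens)
{
    size_t nprods = sizeof cmds / sizeof cmds[0];
    size_t i, j;

    assert(seq != NULL);
    for (i = 0; i < nprods; i++) {
        /* walk both sequences until they differ or one of them ends */
        for (j = 0; j < ntokens && j < MAX_TOKENS_IN_PRODUCTION; j++) {
            if (cmds[i][j] == END || cmds[i][j] != seq[j])
                break;
        }
        /* a match needs the input used up and the production at END */
        if (j == ntokens && j < MAX_TOKENS_IN_PRODUCTION && cmds[i][j] == END)
            return (int)i;
    }
    return -1;
}
```

The returned index can then select the handler function, so malformed lines (a SEND with only one string, for instance) are rejected in one place instead of inside each command handler.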

There are lots of other approaches, but this is a simple one that
should be capable of handling SMTP.

I'm sure there are parsing tutorials on the web, and there are
numerous texts that discuss the subject in detail. One classic is
the "Dragon Book", Aho, Sethi, and Ullman's _Compilers: Principles,
Techniques, and Tools_.

--
Michael Wojcik (e-mail address removed)

Art is our chief means of breaking bread with the dead ... but the social
and political history of Europe would be exactly the same if Dante and
Shakespeare and Mozart had never lived. -- W. H. Auden
 
J

Joe Estock

Michael said:
How do I handle the following command with the strtok() function?

- SEND recipient "body" "subject"
e.g. SEND "joe" "this is a test message" "test message"


You don't. strtok is the wrong tool for this job. (It's the wrong
tool for most jobs, for the reasons Ben listed.)

The best way to do this - particularly since you should be handling
things like embedded quotation marks - is to use a proper lexical
analyzer (aka "lexer") and parser. That gives you a single
consistent, reliable mechanism for handling all well-formed input,
detecting malformed input, and so on.

The lexer divides the input stream into tokens, where a token
might be "send-command" (for "SEND") or "string" (for any of its
three arguments) or special values like "EOL" for end-of-line.
The parser recognizes a sequence of tokens and acts on it, eg by
calling the "send message" function with parameters for recipient,
body, and subject.

You could use a toolset like lex and yacc (or their descendants,
such as Flex and Bison) to generate a lexer and parser, or write
them yourself. For something this simple I'd probably just write
them myself, and I'd probably use a single-level, non-reducing
parser (that is, all the productions would be from the starting
state to a list of terminal symbols, and the parser would just
iterate through them until it found a match).

Lexers are typically written in C as state machines. One popular
approach is to create an enumeration that's a list of the states
you could be in while you're tokenizing, create a switch statement
for each of those states, and wrap that in a while loop that
iterates over all the input. (This assumes that you can tokenize
all of your input first, and then parse it. In some cases you
need to be able to alternate between tokenizing and parsing, in
which case the lexer will generally run until it has identified
the next token, which it then returns to the caller.)

The cases in the switch statement look at the next character and
decide whether it changes the state. For example, if it's a quote
character, and you're not currently in a quoted string, then it means
you'll be transitioning to the quoted-string state. If it's an
ordinary character with no special meaning, it just gets appended to
the current token. And so on.

In many cases the lexer also recognizes keywords (like "SEND" when
it's the first word on a line, in your example) and indicates them
to the parser. The lexer builds a list of tokens (or just a single
token, if it's a token-at-a-time lexer), which are probably
structures with a "type" field and a "content" field. Depending on
how your parser works and how your language is specified, you might
use an array (and limit the number of tokens to its size); or you
might put the tokens in a linked list, or you might dynamically
allocate an array of tokens. But let's say it might look something
like this:

enum token_type {
    END,          /* indicates end of production for parser */
    SEND_CMD,     /* the SEND command keyword */
    STRING,       /* string data, not a keyword */
    EOL,          /* end of line */
    END_OF_INPUT  /* end of input; not named EOF, which <stdio.h> defines as a macro */
};

struct token {
    enum token_type type;   /* type of this token */
    char *content;          /* optional token content */
};

struct token_node {
    struct token tok;         /* this token */
    struct token_node *next;  /* next token */
};

Then the lexer might return a linked list of token structures (which
it would allocate as it tokenized the stream) with:

{ SEND_CMD, NULL } ->
{ STRING, "joe" } ->
{ STRING, "this is a test message" } ->
{ STRING, "test message" } ->
{ EOL, NULL } -> NULL

And the parser might have an array of the token sequences
("productions") it recognizes:

#define MAX_TOKENS_IN_PRODUCTION 6
/* only the first array dimension may be left unspecified in C */
enum token_type cmds[][MAX_TOKENS_IN_PRODUCTION] = {
    {SEND_CMD, STRING, STRING, STRING, EOL, END}
};

(Note that this definition limits the number of tokens in a
production to six; you'd have to set that to the longest sequence of
tokens in any phrase you want the parser to recognize.)

The parser would iterate through the items in the cmds array, looking
to see if the sequence of tokens returned by the lexer matched one.
If so, then it hands that sequence of tokens off to the function that
processes it, and moves on to the next sequence.

There are lots of other approaches, but this is a simple one that
should be capable of handling SMTP.

I'm sure there are parsing tutorials on the web, and there are
numerous texts that discuss the subject in detail. One classic is
the "Dragon Book", Aho, Sethi, and Ullman's _Compilers: Principles,
Techniques, and Tools_.
Not necessarily. Lexers are sometimes quite complex, and for something
as simplistic as this a simplistic solution would work just the same.
Consider the following:

SEND recipient "body" "subject"

Obviously with the above, SEND is the command so we don't need that. We
know we are sending a message so let's discard that for the moment. Now
we are left with the following:

recipient "body" "subject"

This too is easy. Our subject would be the easiest to parse, however at
the same time it will be somewhat difficult due to reasons mentioned by
others in this thread. This is where we will manipulate strtok() to our
needs. We have three "parameters" in the above line and we need to
separate them. To keep things simple (we all like KISS) we will now
remove yet another parameter to make our task a little less cumbersome.
Depending on the implementation, recipient will either be in the form
joeuser or (e-mail address removed). Since there will never be any spaces in a
recipient, we can remove that part as well. Now we are left with the
following:

"body" "subject"

Much easier to handle now. The subject will be shorter than the body in
most cases, so let's take it out of the equation as well. Presuming that
all double quotes are escaped, this will be somewhat easy. Once we are
done with that we simply take off the first " and the last " and we have
our body. Below is some sample code to help illustrate this.

I know that using fixed sizes on the strings is ugly - I would never do
this in production code, but for illustrative purposes:

void send_message(char *data)
{
char *tokenptr; /* strtok() */
unsigned int loc; /* placeholder/counter */
char recipient[512];
char subject[512];

/* trim off the command "SEND" */
for(loc = 0; data[loc] != ' '; loc++) /* get to our destination */;
loc++; /* move to the end of the space */
memmove(data, &data[loc], strlen(&data[loc]) + 1);

/* now we have: recipient "body" "subject" */
memset(recipient, '\0', 512);
for(loc = 0; data[loc] != ' '; loc++) /* get to our destination */;
loc++;
strncpy(recipient, data, loc);
memmove(data, &data[loc], strlen(&data[loc]) + 1);

/* now we have: "body" "subject" */
for(loc = strlen(data) - 1; data[loc] != '"' && data[loc - 1] != '\\';
loc--) /* get to the first unescaped quote in the subject */ ;

memset(subject, '\0', 512);
strncpy(subject, &data[loc], strlen(&data[loc]) - 1);
data[loc] = '\0';

/* all we have left now is the body. remove the first and last " */
data[strlen(data) - 1] = '\0';
memmove(data, &data[1], strlen(&data[1]) + 1);

/* subject now contains the subject, recipient now contains the
recipient, and data now contains the body. pretty simple. */
}

The code above is certainly not perfect, but it certainly works
(untested, but should work - in theory). It eliminates the need to use
strtok() and it should be just as fast (maybe a little slower, but not
enough to matter for something of this size) as strtok(). The only
problem is that it modifies the original buffer, which in most cases
shouldn't be a problem.

Joe Estock
 
M

Michael Wojcik

Michael said:
[snip 115 lines quoted without comment]

Learn to trim your posts, please.
Not necessarily.

*What* isn't necessarily *what*? Part of providing context is making
sure it's specific.
lexers are sometimes quite complex

And sometimes quite simple.
and for something
as simplistic as this a simplistic solution would work just the same.

Obviously it wouldn't "work just the same" unless it was the same. A
simplistic solution might well work, though since we don't have a good
description of the extent of the OP's actual needs we can't determine
how simple a correct solution might be.

Using an existing implementation would also work. So would writing the
program using a language with high-level constructs for parsing. The
fact that a solution would work is no recommendation.
[snip example of brute-force parsing]

This too is easy. Our subject would be the easiest to parse, however at
the same time it will be somewhat difficult due to reasons mentioned by
others in this thread. This is where we will manipulate strtok() to our
needs. We have three "parameters" in the above line and we need to
separate them. To keep things simple (we all like KISS) we will now

I prefer correct, robust, and maintainable over simple, thank you anyway.
That's why I recommended a proper solution rather than an ad hoc brute
force one.
remove yet another parameter to make our task a little less cumbersome.
Depending on the implementation, recipient will either be in the form
joeuser or (e-mail address removed).

That was not specified in the OP's request. It might be implied by,
say, the SMTP RFC, but we have no way of knowing whether that's
applicable.
Since there will never be any spaces in a recipient,

This is precisely the sort of assumption that makes for a fragile
implementation.
we can remove that part as well. Now we are left with the
following:

"body" "subject"

Much easier to handle now. The subject will be shorter than the body in
most cases, so let's take it out of the equation as well. Presuming that
all double quotes are escaped, this will be somewhat easy.

Not with strtok. You claim you're going to "manipulate strtok() to
[your] needs", but in your sample code you don't use it - because, as
I noted, it's the wrong approach.
Once we are
done with that we simply take off the first " and the last " and we have
our body. Below is some sample code to help illustrate this.

Congratulations. You've just created an awkward, domain-specific,
inflexible, fragile parser.
I know that using fixed sizes on the strings is ugly - I would never do
this in production code, but for illustrative purposes:

void send_message(char *data)

If you've already determined that the command is SEND, why don't you
know where its arguments start? Why are you passing in a pointer to
the command at all?
{
char *tokenptr; /* strtok() */

Which you never call...
unsigned int loc; /* placeholder/counter */
char recipient[512];
char subject[512];

/* trim off the command "SEND" */
for(loc = 0; data[loc] != ' '; loc++) /* get to our destination */;

If there is no space, you've just invoked UB. Sometimes input isn't
well-formed, and one job of a parser is to handle that.
loc++; /* move to the end of the space */
memmove(data, &data[loc], strlen(&data[loc]) + 1);

Why bother shifting the data? Keeping a pointer or offset to the
arguments suffices.
/* now we have: recipient "body" "subject" */
memset(recipient, '\0', 512);

Whatever for? You're going to overwrite those bytes in a moment
anyway. And if you must clear it, what's wrong with just assigning
{0} to it when you define it? And if you must use memset, why are
you passing in the length as a magic number?

Yes, yes, this is "example" code. Examples have a way of migrating
into production; those who take the time to write them well have
fewer problems to solve down the road.

(I also don't see much point in using '\0' rather than 0 for the
second argument, but that's a matter of style.)
for(loc = 0; data[loc] != ' '; loc++) /* get to our destination */;

As above, plus you've now reified your assumption about no spaces in
the recipient argument. If that were part of the grammar, you could
either support a quoted spacy string for recipient, or reject it as
malformed data, for free - it'd be handled automatically by the lexer
and parser.

That's the difference between production-quality code and ad hoc
messes.
loc++;
strncpy(recipient, data, loc);

Ugh. strncpy is rarely the right solution, either; it's nearly as
broken as strtok.

First problem: if loc > sizeof recipient, you've just invoked UB.

Second problem: if loc == sizeof recipient, you now have an
unterminated string in recipient. (You only have a terminated one
in there if loc < sizeof recipient because of the wasteful earlier
memset; better to just do "recipient[loc] = 0;" after copying the
data in.)

Third problem: if your input is well-formed, then there is no null
character in the first loc bytes of data, so strncpy buys you
nothing. It's just memcpy with a pointless extra test.

In general, strncpy is the Wrong Thing anyway. Heathfield's
Observation applies: most programs need to know if they're truncating
the source, so they need to know whether it was larger than the
destination anyway, so they might as well call strlen and then memcpy
the appropriate length. And strncpy's behavior when source is smaller
than destination is infelicitous in most circumstances, as it wastes
cycles and potentially spoils locality of reference and otherwise
messes with good memory management.
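
The strlen-then-memcpy approach described above might look like the following minimal sketch. `copy_checked` and its return convention are hypothetical, chosen only for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dst_size) only
 * if the whole string fits, terminator included. Returns 0 on
 * success, or -1 if src is too long, in which case dst is left
 * untouched. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    if (len >= dst_size)
        return -1;               /* caller decides how to handle truncation */
    memcpy(dst, src, len + 1);   /* len + 1 copies the terminator as well */
    return 0;
}
```

Unlike strncpy, this never leaves an unterminated string in dst, never pads the rest of the buffer, and tells the caller when the source did not fit.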
memmove(data, &data[loc], strlen(&data[loc]) + 1);

Now we've shifted the entire body twice. No doubt we could find a
less efficient approach, but it'd take some thought.
/* now we have: "body" "subject" */
for(loc = strlen(data) - 1; data[loc] != '"' && data[loc - 1] != '\\';
loc--) /* get to the first unescaped quote in the subject */ ;

Now your assumption about how quotation marks are escaped is reified
in an obscure line of code in the middle of a function. Changing or
extending it is a maintenance nightmare. And if you're supporting
multiple commands that have to handle this syntax, you have to make
those changes in multiple functions, because you've pushed your parsing
down into the handling of each specific function - duplicating effort
and errors.
memset(subject, '\0', 512);
strncpy(subject, &data[loc], strlen(&data[loc]) - 1);
data[loc] = '\0';

Here the memset is completely superfluous - it's not even serving as
an inefficient way of terminating the string.
/* all we have left now is the body. remove the first and last " */
data[strlen(data) - 1] = '\0';
memmove(data, &data[1], strlen(&data[1]) + 1);

Ouch! The body is moved for a third time. For this one simple task
you've managed to do three potentially very large buffer copies. (I
would hate to see this try to process a bunch of the huge MIME
multipart email messages people like to throw around these days, with
whopping binary attachments in Base64 or uuencode.)

And a couple of unnecessary calls to strlen, while we're at it. The
function should already know how long the body is - it should have
known it as soon as it parsed it out. There's never any reason to
determine it again except sheer laziness.
/* subject now contains the subject, recipient now contains the
recipient, and data now contains the body. pretty simple. */

Pretty ghastly is what it is.
The code above is certainly not perfect, but it certainly works
(untested, but should work - in theory).
Motto!

It eliminates the need to use
strtok() and it should be just as fast (maybe a little slower, but not
enough to matter for something of this size) as strtok().

Since strtok() isn't applicable to this problem, that's a meaningless
comparison. And considering the number of buffer copies and scans
this code does, I wouldn't be entering it in any speed contests if I
were you.

Ad hoc brute-force parsing has its place, I suppose, in trivial q&d
one-offs and the like. I took this to be a serious question, deserving
a serious answer, and the serious answer is to write a real parser.
Anything else is just a disaster waiting to happen - and by the time it
satisfies all the requirements, very likely more work than doing it
correctly in the first place.
 
J

Joe Estock

Michael said:
Michael said:
[snip 115 lines quoted without comment]


Learn to trim your posts, please.
Some news clients do not deliver the full conversation, hence why I did
not trim anything.
Obviously it wouldn't "work just the same" unless it was the same. A
simplistic solution might well work, though since we don't have a good
description of the extent of the OP's actual needs we can't determine
how simple a correct solution might be.
This I completely agree with.

[ trim other needless comments, code, and remarks ]
Again, it was an example of ONE way to accomplish the problem, not THE way.

memset() as used in my example is always a good idea. Yes you can
accomplish the same during initialization, however you do not know what
might be in the buffer after the first location. malloc() returns you a
pointer to memory containing the space you requested. In many cases
(especially on a busy system) this memory is dirty.

I agree about the example not being the clear cut end all solution,
however it is one way of doing things. The intent was to show a general
method of solving a simple problem, not an attempt at starting a flame war.
 
M

Mark McIntyre

Michael said:
Michael Wojcik wrote:

[snip 115 lines quoted without comment]


Learn to trim your posts, please.
Some news clients do not deliver the full conversation, hence why I did
not trim anything.

I guess the point is, you need to trim enough to keep the post
succinct, but not too much to remove all context. Leaving 115 lines of
virgin text is too much. If you find you need to leave that much, you
should be posting intra-text, not all at the end.
 
J

Joe Estock

Mark said:
Michael said:
Michael Wojcik wrote:


[snip 115 lines quoted without comment]


Learn to trim your posts, please.

Some news clients do not deliver the full conversation, hence why I did
not trim anything.


I guess the point is, you need to trim enough to keep the post
succinct, but not too much to remove all context. Leaving 115 lines of
virgin text is too much. If you find you need to leave that much, you
should be posting intra-text, not all at the end.
Noted and thank you for the information.
 
C

Chris Croughton

Noted and thank you for the information.

I usually trim to around two levels of quoting (the person I'm quoting
and the one they quote) as a maximum, unless the context is really
necessary. One level (just the person to whom I'm replying) if that
makes sense (in the current post, for instance, just quoting what you
said wouldn't make sense at all).

Another way of doing it, and the way recommended by 'netiquette'
documents, is to summarise the points to which you are replying, but
that's more risky ("I didn't say that!").

Chris C
 
M

Michael Wojcik

Some news clients do not deliver the full conversation, hence why I did
not trim anything.

Well, I'm glad to see you had a reason for not trimming (it's far more
often the product of laziness on the poster's part). The convention
observed by most of the regulars here, though, is to quote only enough
to provide context, and to intersperse new text with quoted material,
as you did in this post. While it's true that some people may not see
every message in a thread, or may see them out of order,[1] that's
usually the best compromise between concision and intelligibility.
memset() as used in my example is always a good idea. Yes you can
accomplish the same during initialization, however you do not know what
might be in the buffer after the first location.

Actually you do, in any conforming C implementation. Initializing an
object of complete aggregate type initializes all member objects in
that object. If any member objects have no explicit value provided
in the initializer, they're initialized as if 0 had been specified:
integral types to 0, floating types to 0.0, pointer types to null.

Any object of a complete object type in C can be initialized with the
initializer "{0}", which will set all of its member objects to the
appropriate value from the list above.
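
A small illustration of that rule, with a made-up `struct message` and a hypothetical `zero_init_ok` function that inspects each member after initializing the object with {0}:

```c
#include <assert.h>
#include <stddef.h>

struct message {             /* hypothetical structure for the example */
    char recipient[64];
    char *body;
    double score;
    int id;
};

/* Hypothetical helper: returns 1 if every member of a struct message
 * initialized with {0} holds its zero value, 0 otherwise. */
int zero_init_ok(void)
{
    struct message m = {0};  /* only the first element is given explicitly */
    size_t i;

    for (i = 0; i < sizeof m.recipient; i++)
        if (m.recipient[i] != '\0')
            return 0;
    /* the pointer is compared with NULL, not with all-bits-zero */
    return m.body == NULL && m.score == 0.0 && m.id == 0;
}
```

On a conforming implementation `zero_init_ok()` returns 1 without any call to memset.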
malloc() returns you a
pointer to memory containing the space you requested. In many cases
(especially on a busy system) this memory is dirty.

True. However, there's no guarantee that memset will do the right
thing for non-integral object types; the implementation need not use
all-bits-zero for null pointers or for floating-point zero. My
preference with dynamically allocated memory is either to have the
program logic explicitly track which elements of it have valid values
(this often falls out trivially from a requirement to track some
other piece of information, such as "buffered data length" or "active
objects in array"), or if I'm allocating a structure, to use a
predefined initializer object and structure copy:

static const struct foo foo0 = {0}; /* initializer */
struct foo *newfoo;

newfoo = malloc(sizeof *newfoo);
if (! newfoo) ...;
*newfoo = foo0;
I agree about the example not being the clear cut end all solution,
however it is one way of doing things. The intent was to show a general
method of solving a simple problem, not an attempt at starting a flame war.

Fair enough. In my experience, however, ad hoc approaches are almost
never worth the (generally small) savings in up-front effort. I just
finished a TN3270E implementation, for example, which replaced one
with a lot of ad hoc parsing. I wrote a new one from the ground up
because the old one was nearly unmaintainable; it was full of
cascading conditions and flag variables for holding arbitrary pieces
of state information.[2] Took about a week (what with the usual
distractions), and well worth it; the new implementation is a lot
more robust, has significant additional features, and is easy to
understand and extend.

I think one of the drawbacks of C's flexibility is that it makes it
far too easy to slap together ad hoc implementations (particularly
for things like string parsing) that work now, for well-formed input,
but are terribly fragile and difficult to maintain. The urge to get
something working leads to code that attacks the problem with
whatever the programmer thought of first, often in a brute-force
stepwise manner with no evidence of overarching design.

It's the programming equivalent of using a chisel as a screwdriver.
That's not the fault of the toolbox (C) for including the chisel;
it's the fault of the craftsman for using it instead of looking for
the right tool, and for digging into the workpiece without
considering how best to solve the whole problem.


1. This is generally an effect of the news server, by the way, rather
than the user agent ("client") - the user agent can only retrieve
those messages that have arrived at, and not been discarded by, the
server. The distributed, best-effort nature of Usenet makes article
propagation a chancy business.

2. I'd like to defend the author of the original implementation by
noting that it was produced under terrible conditions - a much-too-tight
deadline, just before he was leaving for an extended vacation,
with a paucity of specifications and constantly-changing requirements.
I had the luxury of seeing what had gone wrong the first
time and the leisure to consider how best to approach the problem.
 
