Request for comment on my tiny learning project: dfighterdb (mytree)

D

dfighter

Flash said:
dfighter wrote:

Thank you Richard for you observations. Since you told me it's ok to
post larger codebases too, I'm posting it now.

In addition to Keith's comments (which I agree with)...

#define RESET 1
#define SEARCH 0

If, as I suspect, these are two possible values of a single item an enum
would make sense.

static char *ui_menu_main_szitem[] = {
"Table",
"Quit",
NULL
};

Adding a const qualifier on this and the other similar arrays I've
snipped might help trap errors at compile time.

static char *header = "dfighterdb UI";
static char *separator = "-------------";

Again, a const might be useful.
static void ui_trim(char *s){
int i = 0;
for(i = 0; s != '\0'; i++)


You are setting i to 0 on two consecutive lines. Whilst legal, it does
look wrong to me and is likely (I think) to make people stop and wonder.

static int ui_getmenuchoice(void){
char line[MAXINPUTLINELENGTH];
fgets(line,MAXINPUTLINELENGTH,stdin);

return line[0];

This is rather odd. If you want to throw away all characters after the
first on the line it won't work if the line is too long (someone
accidentally put something down on the keyboard). It would be better to
make sure you have read up to EOF or a newline in my opinion.
}

static int ui_getINT(int *error){
char line[MAXINPUTLINELENGTH];
int value = 0;
fgets(line,MAXINPUTLINELENGTH,stdin);
if(!isdigit(line[0]))
*error = -1;
else value = (int)atof(line);

atof is not a good function to use for several reasons. How will you do
error checking with it? Why use a function to decode a floating point
number when what you want is an integer value? strtol and checking the
result is in range of an int (make value a long) and checking for errors
would be better. Even without checking for errors (just range) it would
still be better! Also the cast is not needed and does not help.
return value;

}

static double ui_getDOUBLE(int *error){
char line[MAXINPUTLINELENGTH];
double value = 0.0;
fgets(line,MAXINPUTLINELENGTH,stdin);

if(!isdigit(line[0]) && line[0] != '+' && line[0] != '-')
*error = -1;
else value = atof(line);

strtod would be better. You could then check for errors.

strncpy(fieldnames,name,MAXFIELDNAMELENGTH);


strncpy is rarely the right function for the job. It can leave you with
an array of chars which is NOT terminated by a '\0', so if you then
treat it as a string, which you do below, it will then go reading off in
to the wild blue yonder!
}

for(i = 0; i < numfields; i++){
error = 0;
do{
printf("type of \"%s\":",fieldnames);


void ui_tablemenu_query_new(void){
unsigned char numfields = table_numfields();
int i = 0;
void **v = NULL;
unsigned char type = 0;
char *fieldname = NULL;
int error = 0;
char cval = 0;
int ival = 0;
double dval = 0;

v = malloc(numfields*sizeof(void*));

You could use sizeof *v instead of sizeof(void*). It saves having to get
the type right or check further up that it is right.
if(v == NULL){
printf("ERROR: not enough memory\n");
return;
}

for(i = 0; i < numfields; i++){
type = table_fieldtype(i);
v = NULL;

switch(type){
case CHAR: v = malloc(sizeof(char)); break;
case INT: v = malloc(sizeof(int)); break;
case DOUBLE: v = malloc(sizeof(double)); break;
}


This is going to be very memory (and probably time) inefficient on most
systems. You might want to investigate using a union of char, int &
double to avoid this extra block of mallocs. If you also need pointers
to strings (which I think you do) then add a char* to the union as well.

<snip>


Well, that's as far as I've read. However, it does give you a few more
things to look.

Thank you Flash Gordon, you did raise some good points too. I'm going to
consider them and make changes accordingly.
Setting a variable to 0 on to consecutive lines was due to my habit of
declaring and initializing the variables at the same time, it may be a
bit confusing, but I think that I will not change that part :)

dfighter
 
D

dfighter

Barry said:
Dear everyone!
dfighterdb (I started it as mytree), is a tiny and simple console based
"database" application I'm writing as a learning project.
It's purpose is to teach me some generic dynamic data structures, and
general design of "larger" projects (Basically a binary tree and a
linear list with 256 generic fields).
It's not complete yet, however it is complete enough to present an
insight into my "coding style", to show how well (or badly) I can use
the language, how portable the code is, general design of the project, etc.
So I'd like to ask you all if you have the time to review it and comment
on it.

zipped source code + win32 binary:
http://www.freeweb.hu/dfighter/dfighterdb.zip

In tree.c, you have a instances of code of the form
ptr = NULL;
ptr = malloc(...);
-There is no point in assigning a variable one value and in the very
next statement assigning it a new value.

You have many identifiers that start with a double underscore. These
are reserved for the implementation. You shouldn't use them.

In common.c you have code of the form
case CHAR:{
if(*(char*)val1 > *(char*)val2)
retval = 1;
else
if(*(char*)val1 < *(char*)val2)
retval = -1;
}break;
-In most of your code I looked at, your closing braces line up with
the keyword to which the opening brace is attached. This makes it
easy to find the end of a block. Here the closing brace is well
indented beyond the case keyword and much too easy to miss, partly
because of the attached break.
-Furthermore, the break is not part of the if or the else or even the
initial if. It should not be on the same line as the brace and
aligned with the initial if since it is at the same "top" level within
the case.
-Your code uses tabs for indentation. If you ever have to post a
single function to Usenet asking for help, the tabs will play havoc
with your message. It is better(tm) to get used to using consistent
spacing for indenting your code.

Member field of treenode_t uses a dynamically allocated array of
void*. Every time you need to reference the object pointed to by one
of the void*, you cast the pointer to the appropriate type. In
addition to the intended effects, casts also tell the compiler it
should ignore certain problems which is almost always undesirable. I
wonder if a union of three pointer types would be easier on the typing
and reading. Something along the lines of
typedef union mypointers{
char* mychar;
int* myint;
double* mydouble;
}mypointer_t;
and in treenode_t
mypointer_t *field
Then in tree_dump the body of your switch would look like
case CHAR: printf("%c ",*root->field.mychar); break;
case INT: printf("%d ",*root->field.myint); break;
case DOUBLE: printf("%lf ",*root->field.mydouble); break;

In ui_mainmenu, you print out a line that does not terminate with a
'\n'. On some systems, this line may not appear in a timely fashion.
You should use
fflush(stdout);
to insure that it does.

If you are going to include several functions in a single translation
unit (.c file), consider putting them in alphabetical order to make it
easier to find. I recommend that functions larger than one screen
display should be in their own files.

In ui_mainmenu, you call ui_getmenuchoice to decide what to do. You
then test the returned value against EOF. ui_getmenuchoice cannot
possibly return the value EOF. Furthermore, ui_getmenuchoice does not
test the return from fgets for errors.

Why did you decide to use atof in ui_getINT when atoi seems more
appropriate? The cast is unnecessary. I recommend using the strto_
functions in place of atof simply for the better error detection.

Why does ui_getDOUBLE tolerate negative values while ui_getINT does
not?

I understand why ui_trim strips off trailing blanks but why does it
strip off trailing punctuation?

Why does ui_getSTRING refuse to recognize a string that starts with
something other than a letter?

In ui_tablemenu_create, once ui_getINT sets error to 1, it never gets
reset back to 0. You have the same problem when calling ui_getSTRING.

In ui_tablemenu_query_new, shy do you cast the return from
ui_getmenuchoice to unsigned char only to assign the result to cval
which is of type char? And the same problem with error.

Hi Barry!
First of all thanks for your review.
I'm aware of the double underscore problem now from earlier posts to
this thread.
I don't do the indentation myself. I am currently using M$ Visual Studio
2008's editor to edit source code, so it is done automatically, while it
is very convenient as you mentioned the tabs might cause problems. I
will see if I can change the editor's behavior, or in worst case
scenario I will write a small program to change the tabs to spaces.
The rest of your thoughts are valid and good points as well I will
consider them and make changes accordingly.

dfighter
 
D

dfighter

static void ui_trim(char *s){
int i = 0;
for(i = 0; s != '\0'; i++)
;


could this be replaced with
i = strlen(s);
while(i > 0 && !isalnum(s))
i--;

s[i+1] = '\0';
}


if you trim a string with no alphanums in it yeilds a string of length
1.
ui_trim("***") => "*"
is that ok?
[the string literal is notational convenience]

It will step off the end of the string if you pass it the empty string

Hi Nick!
Yes it's ok, that was the plan. I should probably rename the
ui_getSTRING function to something like ui_getNAME, because I get the
fieldnames with that. I'm using this trimming function on a string that
is expected to be strictly alphanumeric and that is starting with a
letter, otherwise it's rejected. So if it's a 1 character long string
with '\0' only it will get rejected.
I suppose the naming convention I used suggests that these functions
have a more generic purpose. I apologize for misleading you.
That's one of the reasons I think it was a good idea to post the sources
here. I already learned a lot :)

dfighter
 
C

CBFalconer

dfighter said:
Flash Gordon wrote:
.... roughly 150 useless lines snipped ...
Thank you Flash Gordon, you did raise some good points too. I'm
going to consider them and make changes accordingly. Setting a
variable to 0 on to consecutive lines was due to my habit of
declaring and initializing the variables at the same time, it
may be a bit confusing, but I think that I will not change that
part :)

Please get in the habit of snipping quoted material that is not
germane to your reply.
 
B

Barry Schwarz

Barry Schwarz wrote:
snip

snip

Hi Barry!
First of all thanks for your review.
I'm aware of the double underscore problem now from earlier posts to
this thread.
I don't do the indentation myself. I am currently using M$ Visual Studio
2008's editor to edit source code, so it is done automatically, while it
is very convenient as you mentioned the tabs might cause problems. I
will see if I can change the editor's behavior, or in worst case
scenario I will write a small program to change the tabs to spaces.
The rest of your thoughts are valid and good points as well I will
consider them and make changes accordingly.

I use an earlier version of VS and it has the option of using tabs or
spaces. It may not be present in your version but checking should be
easy enough.
 
F

Franken Sense

In Dread Ink, the Grave Hand of Phil Carmody Did Inscribe:
Actually, using that logic, we can conclude 1 == 0, as your presumptions
are clearly false, and therefore anything can be derived from them.
__MYTREE_H__ and __MYTREE_H may well be defined, as names with such
forms are reserved for the implementation to do with what they wish.

Phil

Leading or trailing underscores make poor identifiers, unless you intend
them to be there. I think zax_fuuq is nice and unique.

I don't think uniqueness is augmented by underscores, in particular,
before, because windows might just have one for you as _zax_fuuq, and is
that something that this code anticipates?

The answer could be yes, but I doubt it.
--
Frank

When you encounter seemingly good advice that contradicts other seemingly
good advice, ignore them both.
~~ Al Franken,
 
P

Phil Carmody

Franken Sense said:
In Dread Ink, the Grave Hand of Phil Carmody Did Inscribe:

Leading or trailing underscores make poor identifiers,

Parts of tokens don't make identifiers at all. Only whole tokens
can make identifiers.
unless you intend them to be there.

I think the reason he put them there is because he intended them
to be there.
I think zax_fuuq is nice and unique.

I don't think uniqueness is augmented by underscores,

If you use tokens with that type of framing for one use
(such as multiple inclusion protection), and tokens with
different, or the absense of, framing for other uses
(such as function-like macros, or integer constants), then
I do think that the likelyhood of wanting to use two identical
names in two different contexts is diminished. Effectively
you're creating your own name-spaces for such tokens.
So in your terms, uniqueness is somewhat augmented by underscores.

You've not mentioned any property of underscores that isn't
also a property of any other alpha characer, and therefore
your argument applies equally to all such characters.

However, as has been pointed out already by Keith, I think, the
implementation already has reserved for it a huge namespace
matching /^_[_A-Z]/. So it's not waste that makes leading __
a bad thing, it's the fact that it's already reserved for
someone else.
in particular,
before, because windows might just have one for you as _zax_fuuq, and is
that something that this code anticipates?

The answer could be yes, but I doubt it.

I couldn't sensibly parse the question, so can't give a sensible
answer.

Phil
 
J

James Kuyper

Franken said:
In Dread Ink, the Grave Hand of Phil Carmody Did Inscribe:
....
Leading or trailing underscores make poor identifiers, unless you intend
them to be there. I think zax_fuuq is nice and unique.

And what makes you think he didn't intend them to be there? Do you think
he typed them by accident? I think he did what many people have done:
seen header guards used in standard header files, noticed that they
always started with two underscores, failed to realize that this is
because such names are reserved to the implementation, and therefore
concluded that it was because such names are part of the naming
convention for header guards. If so, it was a very deliberate choice,
though misguided.
I don't think uniqueness is augmented by underscores, in particular,

The C standard disagrees with you. Two identifiers are the same only if
(among other things) any underscores present in either of them are
present in both of them, in exactly the same places - at least within
the first 32 characters of an external identifier, or the first 64
characters of an internal identifier.
before, because windows might just have one for you as _zax_fuuq, and is
that something that this code anticipates?

No, but that's precisely because of the real problem here, and has
nothing to do with underscores being useless. _zax_fuuq is clearly
distinct from zax_fuuq, and one of the key things the standard has to
say about it is that _zax_fuuq is reserved to the implementation "for
use as identifiers with file scope in both the ordinary and tag name
spaces." (7.1.3p1). This means that windows is free to use such an
identifier; they would not be free to use zax_fuuq, because that
identifier is reserved to the user for such purposes. The reverse is
true for user code: it can use zax_fuuq, and cannot use _zax_fuuq, for
such purposes.

That's what's wrong with using underscores at the beginning of an
identifier: it is not because they are USEless, but because not only are
they USEful, but also have been USEd, by the standard, to set aside a
range of names for the exclusive USE of the implementation.
 
D

dfighter

James said:
And what makes you think he didn't intend them to be there? Do you think
he typed them by accident? I think he did what many people have done:
seen header guards used in standard header files, noticed that they
always started with two underscores, failed to realize that this is
because such names are reserved to the implementation, and therefore
concluded that it was because such names are part of the naming
convention for header guards. If so, it was a very deliberate choice,
though misguided.
That's exactly what happened. I was aware from K&R2 that 1 leading
underscore is reserved, however for some stupid reason I failed to
deduct that then 2 leading underscores would be reserved too (since 2
leading underscores are basically 1 + 1 underscores). In addition I've
seen identifiers and include guards in code that was supposed to teach
me. Fortunately I've learnt the lessons thanks to this newsgroup :)

dfighter
 
J

jameskuyper

dfighter wrote:
....
... I was aware from K&R2 that 1 leading
underscore is reserved, however for some stupid reason I failed to
deduct that then 2 leading underscores would be reserved too (since 2
leading underscores are basically 1 + 1 underscores). ...

That's actually covered by two separate rules in 7.1.3p1:

"— All identifiers that begin with an underscore and either an
uppercase letter or another
underscore are always reserved for any use.
— All identifiers that begin with an underscore are always reserved
for use as identifiers
with file scope in both the ordinary and tag name spaces."

Note that if the first rule were dropped, names that begin with two
underscores would still be reserved because of the second rule.
However, the first rule establishes a stronger degree of reservation
than the second one. The identifier _second can be use for identifiers
with block scope, but the identifiers __first and _First can't safely
be used anywhere in your code.
 
D

dfighter

Franken said:
In Dread Ink, the Grave Hand of dfighter Did Inscribe:


What is the purpose of the lines that begin with #?
They are preprocessor directives. In this example they act as include
guards, they keep the header from being included more than once in a module.
 
F

Franken Sense

In Dread Ink, the Grave Hand of Richard Heathfield Did Inscribe:
Leading underscores make for *excellent* identifiers - for the
implementation. Trailing underscores make for *excellent*
identifiers - when you aren't planning to use those specific
identifiers very much. (For example, I use trailing underscores on
struct tags and inclusion guards.)

For example?
--
Frank

In many ways I'm still a Hubert Humphrey Democrat -- someone who believes
in afflicting the comfortable and comforting the afflicted. A society is
judged by how it treats the elderly, the sick, the impoverished. To me it's
a matter of ethics and compassion.
~~ Al Franken, Playboy interview
 
F

Franken Sense

In Dread Ink, the Grave Hand of dfighter Did Inscribe:
list.h:

#ifndef __LIST_H__
#define __LIST_H__

typedef struct list{
struct list* next;
unsigned char dummy;
void **field;
}list_t;

What is the purpose of the lines that begin with #?
--
Frank

When you encounter seemingly good advice that contradicts other seemingly
good advice, ignore them both.
~~ Al Franken,
 
C

CBFalconer

dfighter said:
They are preprocessor directives. In this example they act as
include guards, they keep the header from being included more
than once in a module.

As I think has been pointed out, they are faulty list guards. The
leading double '_' is reserved for the implementation.
 
J

James Kuyper

Franken said:
In Dread Ink, the Grave Hand of dfighter Did Inscribe:


What is the purpose of the lines that begin with #?

You left our the third such line, which is essential:

#endif

I'm not sure whether your question means that you are unfamiliar with
preprocessing directives such as #ifndef, #define, and #endif. If that
is the case, say so, and we can provide an explanation. I'm going to
assume your question was about the use of those directives, rather than
about their meaning.

The purpose of such code is to protect against double inclusion. Though
this is apparently in dispute by some people, there is a wide-spread
consensus that it's not only acceptable, but a very good idea, for a
header file to #include any other header files that it needs to, in
order to set up definitions and declarations it uses, so it can stand on
it's own as a translation unit.

When you follow this prescription, as many people do, you'll frequently
end up indirectly #including the same file multiple times from other
header files. However, without include guards, that would produce a
separate typedef of list_t each time, and it's an error to typedef the
same identifier multiple times, even if the typedefs are identical.

The way this code prevents that from happening is as follows:

The first time the file is #included, the author assumed (unsafely,
because of the leading double underscores) that __LIST_H__ would not be
#defined. If that is indeed the case, then it gets #defined, and list_t
gets typedefed.

The second time this file is #included, since __LIST_H__ has already
been #defined, then everything from the #ifndef to the matching #endif
is simply skipped, and not processed, so there never is a second typedef
of list_t.

This approach is what's called an internal include guard.
 
F

Franken Sense

In Dread Ink, the Grave Hand of James Kuyper Did Inscribe:
You left our the third such line, which is essential:

#endif

I'm not sure whether your question means that you are unfamiliar with
preprocessing directives such as #ifndef, #define, and #endif. If that
is the case, say so, and we can provide an explanation. I'm going to
assume your question was about the use of those directives, rather than
about their meaning.

The purpose of such code is to protect against double inclusion. Though
this is apparently in dispute by some people, there is a wide-spread
consensus that it's not only acceptable, but a very good idea, for a
header file to #include any other header files that it needs to, in
order to set up definitions and declarations it uses, so it can stand on
it's own as a translation unit.

When you follow this prescription, as many people do, you'll frequently
end up indirectly #including the same file multiple times from other
header files. However, without include guards, that would produce a
separate typedef of list_t each time, and it's an error to typedef the
same identifier multiple times, even if the typedefs are identical.

The way this code prevents that from happening is as follows:

The first time the file is #included, the author assumed (unsafely,
because of the leading double underscores) that __LIST_H__ would not be
#defined. If that is indeed the case, then it gets #defined, and list_t
gets typedefed.

The second time this file is #included, since __LIST_H__ has already
been #defined, then everything from the #ifndef to the matching #endif
is simply skipped, and not processed, so there never is a second typedef
of list_t.

This approach is what's called an internal include guard.

I see.
--
Frank

If [Stuart Saves His Family] had actually been successful it would have
been a lot better teacher for me than the failure that it was, because it
would have given me the opportunity to do more movies,
~~ Al Franken, CNN interview
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,583
Members
45,075
Latest member
MakersCBDBloodSupport

Latest Threads

Top