Boost process and C


Flash Gordon

Any program that reads words from any language dictionary. Like a
spell checker, or a word puzzle solver/creator, or a spam filter. For
dictionaries the size of the English language dictionary, these kinds
of applications can typically push the L2 cache of your CPU pretty
hard.

I never said they didn't exist. However, a typical dictionary plus its
structures is not going to fit in my L2 cache anyway. On the other hand, the
subset of it that is likely to actually be in use is probably an order
of magnitude smaller, and so could easily fit even with the extra overhead.
Alternatively, one could switch to conventional C strings and have a better
chance of it fitting, since they only have a 1-byte overhead (the terminating
'\0') compared to probably an 8-byte overhead (a 4-byte int for the length,
a 4-byte int for the memory block size) that it sounds like your library has.
Even if your library only had a 4-byte overhead, it would still be larger!
I think you miss the point. If the string length is negative then it
is erroneous. That's the point of it. But I use a negative amount of
allocated memory to indicate that the memory is not legally modifiable
at the moment, and 0 to mean that it's not modifiable ever. The point
being that the library blocks erroneous actions caused by intentionally
or unintentionally bad header values in the same test. So it reduces
overhead, while increasing safety and functionality at the same time.

If you are trying to detect corruption then you should also be checking
that the length is not longer than the memory block, so you should be
doing more than one comparison anyway. Then you can easily check if any
unused flag bits are non-0.
You know, you can actually read the explanation of all this in the
documentation if you care to do so.

Probably true.

It may well be that the performance gain is worth it for the
applications people use your library for. If so then fine, but the
limitation means it is not worth me migrating to it.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc
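The header checks argued over in this exchange can be sketched in C. This is a minimal sketch under stated assumptions, not Bstrlib's code: the `counted_str` struct and the `cs_readable`/`cs_writable` helpers are hypothetical names that follow the scheme as described (a negative length is always an error; an allocated size of 0 means "never modifiable" and a negative one "not modifiable at the moment"). A single `mlen > 0` test rejects both read-only states, and the extra `slen <= mlen` comparison Flash Gordon suggests catches a length that overruns its block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counted-string header (not Bstrlib's actual definition). */
struct counted_str {
    int mlen;            /* allocated size: >0 writable, 0 never writable,
                            <0 temporarily not writable */
    int slen;            /* current length: <0 is always an error */
    unsigned char *data; /* the characters */
};

/* Safe to read: non-NULL data and a non-negative length. */
static bool cs_readable(const struct counted_str *s)
{
    return s != NULL && s->data != NULL && s->slen >= 0;
}

/* Safe to modify: readable, mlen > 0 (one test rejects both read-only
   states), and the length does not exceed the allocated block. */
static bool cs_writable(const struct counted_str *s)
{
    return cs_readable(s) && s->mlen > 0 && s->slen <= s->mlen;
}
```

Whether the extra comparison is worth its cost is exactly what the two posters disagree about; the sketch only shows what each check catches.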


Flash Gordon

jacob said:
Jordan Abel wrote:
It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!


The addition operator on dates would work _exactly_ the same way as
the addition operator on pointers - you can subtract two of them, or
add one to a number (representing an interval)

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems
where it's not trivial, could call localtime, modify tm_sec, and then
call mktime]

Yes adding a number to a date makes sense, but I was speaking about
adding two dates!

What will the date be in 4 years, 2 months and 5 days from today?

Adding something other than a number can make a lot of sense. Adding two
real dates doesn't, I agree.
 

Ben Hinkle

In "higher-level" languages
I agree. You need brakes.

Having said that I'm all for trying these things out in projects like
lcc-win32.

I've been playing around with extending C to be more "high-level" using the
TinyCC compiler. The license of TinyCC is GPL and it runs on win32 and linux
so I think it makes a better base for experiments that one wants to
distribute. TinyCC is located at http://www.tinycc.org and, for those
curious, my experiments are at http://www.tinycx.org.

-Ben Hinkle
 

Jordan Abel

jacob said:
Jordan Abel wrote:
It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!


The addition operator on dates would work _exactly_ the same way as
the addition operator on pointers - you can subtract two of them, or
add one to a number (representing an interval)

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems
where it's not trivial, could call localtime, modify tm_sec, and then
call mktime]

Yes adding a number to a date makes sense, but I was speaking about
adding two dates!

What will the date be in 4 years, 2 months and 5 days from today?

Adding something other than a number can make a lot of sense. Adding two
real dates doesn't, I agree.

My first thought was "represent it as a number of seconds", but then
I realized - how many seconds in two months? Or in any number of years
not a multiple of four?

Should there be another type to represent intervals of time in such
a way? Or should we use struct tm? A 0 in tm_year would then mean either
no years or 1900, depending on the context.
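The "or use struct tm?" idea maps onto something standard C already supports: `mktime()` normalises out-of-range `struct tm` fields. A sketch, assuming that "add N months" should simply bump `tm_mon` and let overflowing days roll forward; `add_calendar_interval` is a hypothetical helper name. It also shows why a fixed number of seconds cannot represent "one month": one month after 31 January 2001 comes out as 3 March.

```c
#include <time.h>

/* Add a calendar interval to a date by adjusting the broken-down fields
   and letting mktime() normalise them (e.g. "31 February" rolls over
   into March). Returns 0 on success, -1 if mktime() fails. */
static int add_calendar_interval(struct tm *date, int years, int months, int days)
{
    date->tm_year += years;
    date->tm_mon  += months;
    date->tm_mday += days;
    date->tm_isdst = -1;   /* let mktime() decide whether DST applies */
    return mktime(date) == (time_t)-1 ? -1 : 0;
}
```

For example, starting from 25 September 1981 (`tm_year = 81`, `tm_mon = 8`, `tm_mday = 25`), adding 4 years, 2 months and 5 days gives 30 November 1985. Setting `tm_hour` to noon before the call keeps a DST change from shifting the date.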
 

Chris Torek

Ben C wrote:
[much snippage]
Or, comparably (but considerably more complicated):

String a = ((b + c) - "zog") * " ";

where string "addition" means "concatenate" (the usual definition
for string addition), "subtraction" means "remove the first copy
of the target string, if there is one", and string "multiplication"
means "repeatedly insert this string". Hence if b and c hold "xyz"
and "ogle" respectively, the "sum" is "xyzogle", subtracting "zog"
yields "xyle", and multiplying by " " yields "x y l e " (including
the final space).

Indeed. Suppose the String data structure is much like Paul Hsieh's
favorite, but perhaps with a few more bells and whistles (I have not
looked at his implementation):

#include <stddef.h>   /* size_t */

struct StringBuffer;

struct String {
    char *bytes;              /* the bytes (if any) in the string */
    size_t slen;              /* the length of the string */
    struct StringBuffer *buf; /* the underlying buffer (may be shared) */
    struct String *next;      /* linked list in case of shared references */
};

struct StringBuffer {
    char *base;               /* base address of buffer */
    size_t bufsize;           /* size of buffer */
    size_t refcnt;            /* number of references to this buffer */
    struct String *firstref;  /* head of reference chain */
};

This gives us functions that, in C, might look like:

/* "Copy" a string: return a new reference to an existing string */
struct String *String_copy(struct String *old) {
    struct String *new = xmalloc(sizeof *new);
    /* xmalloc is just malloc plus panic-if-out-of-space (not shown) */

    /* copy the underlying string's info */
    new->bytes = old->bytes;
    new->slen = old->slen;
    new->buf = old->buf;

    /* insert at the head of the buffer's reference chain */
    new->next = new->buf->firstref;
    new->buf->firstref = new;
    new->buf->refcnt++;
    return new;
}

In this case, making a second copy of a very long string is
quite cheap. So is making a sub-string out of an existing
string:

/*
 * Shrink a string by removing "frontoff" characters from the
 * front, and "backoff" characters from the back. The frontoff
 * may be negative to extend the string back to its original length
 * although typically exactly one will be zero (remove head or
 * tail part of string). The backoff must be nonnegative
 * (because tail parts of buffers are not necessarily valid).
 */
void String_shrink(struct String *s, int frontoff, int backoff) {
    if (frontoff) {
        if (frontoff < 0) {
            /* how far bytes can move back before reaching the buffer base */
            size_t maxshrink = s->bytes - s->buf->base;

            /* NB: this can be optimized to fall into the "else" */
            frontoff = -frontoff;
            if ((size_t)frontoff > maxshrink)
                frontoff = (int)maxshrink;
            s->slen += frontoff;
            s->bytes -= frontoff;
        } else {
            if (s->slen < (size_t)frontoff)
                frontoff = (int)s->slen;
            s->slen -= frontoff;
            s->bytes += frontoff;
        }
    }
    if (backoff) {
        if (backoff < 0)
            panic("bad call to String_shrink");
        if (s->slen < (size_t)backoff)
            backoff = (int)s->slen;
        s->slen -= backoff;
    }
}

Now, of course, in order to *modify* the *contents* of a string,
we have to check whether the string is shared, and if so, "break"
the sharing:

/* String_private_copy() (not shown) makes an unshared copy of s. */
/* inline */ struct String *String_preptomod(struct String *s) {
    return (s->buf->refcnt == 1) ? s : String_private_copy(s);
}

[without complicated C++ style mechanisms,]
I still don't get your point.

OK: so write the "operator" functions for +, -, and * above and
tell us what happens to any intermediate copies of the String
structures that are created by each addition, subtraction, and
multiply.

Show us the code, and "we" (Ben C and I, perhaps) will show you
where you have re-invented the (detailed and hairy) C++ mechanisms
(or, contrariwise, have assumed that your underlying language has
garbage collection, so that temporary objects can be created and
then thrown away without calling "constructor" and "destructor"
functions on references, reference-copies, etc.; if you do have
constructors and destructors, you also have to decide whether such
functions can or must be "virtual" or not, and so on).
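To make the challenge concrete, here is a minimal sketch of the three string "operators" as plain C functions on NUL-terminated heap strings. It does not use the `struct String` above; `str_add`, `str_sub`, `str_mul` and `torek_example` are hypothetical names. Spelled out this way, every intermediate result of `((b + c) - "zog") * " "` is a separate allocation that somebody has to free, which is the bookkeeping an operator syntax would have to hide.

```c
#include <stdlib.h>
#include <string.h>

/* "a + b": concatenation into a newly allocated string. */
static char *str_add(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *r = malloc(la + lb + 1);
    if (r == NULL)
        return NULL;
    memcpy(r, a, la);
    memcpy(r + la, b, lb + 1);            /* includes b's terminator */
    return r;
}

/* "a - t": copy of a with the first occurrence of t removed, if any. */
static char *str_sub(const char *a, const char *t)
{
    size_t la = strlen(a), lt = strlen(t);
    const char *hit = lt ? strstr(a, t) : NULL;
    char *r = malloc(la + 1);
    if (r == NULL)
        return NULL;
    if (hit == NULL) {
        memcpy(r, a, la + 1);
    } else {
        size_t pre = (size_t)(hit - a);
        memcpy(r, a, pre);                            /* part before t */
        memcpy(r + pre, hit + lt, la - pre - lt + 1); /* part after t, plus '\0' */
    }
    return r;
}

/* "a * sep": copy of a with sep inserted after every character. */
static char *str_mul(const char *a, const char *sep)
{
    size_t la = strlen(a), ls = strlen(sep);
    char *r = malloc(la * (ls + 1) + 1);
    char *p;
    if (r == NULL)
        return NULL;
    p = r;
    for (size_t i = 0; i < la; i++) {
        *p++ = a[i];
        memcpy(p, sep, ls);
        p += ls;
    }
    *p = '\0';
    return r;
}

/* String a = ((b + c) - "zog") * " "; written out by hand. Each
   intermediate is its own heap object, and the caller must free it. */
static char *torek_example(const char *b, const char *c)
{
    char *t1 = str_add(b, c);                    /* b + c          */
    char *t2 = t1 ? str_sub(t1, "zog") : NULL;   /* ... - "zog"    */
    char *a  = t2 ? str_mul(t2, " ") : NULL;     /* ... * " "      */
    free(t1);                                    /* discard temporaries */
    free(t2);
    return a;
}
```

Called as `torek_example("xyz", "ogle")`, this returns `"x y l e "`, the result described in the post. Note also that `str_add("Hello", "World")` and `str_add("World", "Hello")` differ, the non-commutativity objected to elsewhere in the thread.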
 

Bill Pursell

jacob said:
Ben C wrote (regarding operator overloading):

Why not?

Suppose Matrix A,B,C;

C = A+B;

Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

Do you also propose to avoid using '*' to represent
matrix multiplication on the basis that matrix multiplication is not
commutative?
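The non-commutativity Bill mentions is easy to check numerically. A minimal C sketch (`mat2_mul` and `mat2_equal` are illustrative names): with A = [[1,1],[0,1]] and B = [[1,0],[1,1]], AB = [[2,1],[1,1]] while BA = [[1,1],[1,2]], so the product depends on operand order.

```c
#include <stdbool.h>

/* r = a * b for 2x2 integer matrices (r must not alias a or b). */
static void mat2_mul(int a[2][2], int b[2][2], int r[2][2])
{
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            r[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
}

/* Element-wise comparison of two 2x2 matrices. */
static bool mat2_equal(int a[2][2], int b[2][2])
{
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            if (a[i][j] != b[i][j])
                return false;
    return true;
}
```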
 

jacob navia

Bill Pursell wrote:
Do you also propose to avoid using '*' to represent
matrix multiplication on the basis that matrix multiplication is not
commutative?

Matrix multiplication is a multiplication. Granted, not commutative, but
a multiplication. In other languages, for instance APL, which has an
operator for matrix multiplication, there are TWO signs: one ('*') for
normal multiplication, and another (a box enclosing another sign) to
denote matrix multiplication, to clearly distinguish both operations.

Of course this is more a matter of taste, but in the case of strings
there isn't any mathematical operation performed on those strings. Not
even a set operation. Take for instance

"Hello World" - "World"

Is the result "Hello " ???

Are we adding or subtracting things?

Surely not.

I want to introduce operator overloading into C, but I am not in favor of
just ANY application of operator overloading. It has been pointed out that
overloading could lead to excessive construction of temporaries, which would
be far more efficiently handled in normal C syntax with careful code.

This is not a problem for small structures, but it could be a show
stopper for large structures like matrices, for instance, where
efficiency would be more important than syntactic sugar.


Another problem that bothers me (and is so far unsolved) is the problem
of taking the address of an operator function. What should be the syntax
in that case?

For instance:

int128 operator+(int128 a,int128 b);

typedef int128 (*i128add)(int128 a, int128 b);

i128add = operator+(i128 a,i128 b); /// This ?

jacob
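In today's C, without operator overloading, the question has a conventional answer: give the operation an ordinary name and take its address like any other function. The sketch below assumes a hypothetical `int128` built from two `uint64_t` halves (C has no standard 128-bit type) and a hypothetical `int128_add` with wrap-around (modulo 2^128) semantics; it is not how lcc-win32 implements `operator+`, only the plain-C spelling the proposed syntax would replace.

```c
#include <stdint.h>

/* Hypothetical 128-bit integer: two 64-bit halves, arithmetic modulo 2^128. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} int128;

/* What "operator+" could be spelled as in plain C: a named function. */
static int128 int128_add(int128 a, int128 b)
{
    int128 r;
    r.lo = a.lo + b.lo;                 /* wraps modulo 2^64 */
    r.hi = a.hi + b.hi + (r.lo < a.lo); /* add the carry out of the low half */
    return r;
}

/* The typedef from the post, used here to declare a variable. */
typedef int128 (*i128add)(int128 a, int128 b);

static i128add add_fn = int128_add;     /* ordinary C: take the function's address */
```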
 

Ed Jensen

CBFalconer said:
And, if you write the library in truly portable C, without any
silly extensions and/or entanglements, you just compile the library
module. All the compiler vendor need to do is meet the
specifications of the C standard.

Simple, huh?

That all depends on the license under which the source code was
released. Linking a bunch of C libraries under various licenses can
involve non-trivial amounts of legal hassle to ensure compliance.

Also, there's something to be said for having features built into the
standard library. Besides making things easier from a legal point of
view, it means you can spend that much less time evaluating multiple
solutions, since most of the time, you'll just use the implementation
already available in the standard library.

I know it's unpopular around these parts to utter such heresy, but I,
for one, would love it if the standard C library included support for
smarter strings, hash tables, and linked lists.

Then again, I'm certainly NOT advocating these things should be added
to the standard C library. I recognize C for what it is, and use it
where it's appropriate. There are other languages that offer those
features. But that doesn't stop me from wanting those features in C.
 

Keith Thompson

jacob navia said:
Besides, I think that using the addition operator to "add" strings is
an ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE.

<OT>
Assuming the existence of operator overloading (which, once again,
standard C, the topic of this newsgroup, does not have), using "+" for
string concatenation makes at least as much sense as using "<<" and
">>" for I/O. (I know you haven't advocated that either, but it's
established practice in C++.) And I really don't have much problem
with the idea of a "+" operator being non-commutative -- just as a "*"
operator for matrices would be non-commutative. If you don't like it,
by all means don't use it -- but if you provide operator overloading
in your compiler, users *will* use it in ways that you don't like.

The point of operator overloading is to provide a notational shorthand
for something that could be expressed equivalently but more verbosely
using function calls. It isn't to provide something that absolutely
must follow the rules of mathematics. What would a mathematician
unfamiliar with computer programming think of "x = x + 1"?
</OT>

<WAY_OT>
Ada has a separate operator, "&", for array concatenation.
</WAY_OT>
 

jacob navia

Chris Torek wrote:
[horrible string "math" snipped]
OK: so write the "operator" functions for +, -, and * above and
tell us what happens to any intermediate copies of the String
structures that are created by each addition, subtraction, and
multiply.

Show us the code, and "we" (Ben C and I, perhaps) will show you
where you have re-invented the (detailed and hairy) C++ mechanisms
(or, contrariwise, have assumed that your underlying language has
garbage collection, so that temporary objects can be created and
then thrown away without calling "constructor" and "destructor"
functions on references, reference-copies, etc.; if you do have
constructors and destructors, you also have to decide whether such
functions can or must be "virtual" or not, and so on).

Chris:

1) Operator overloading is NOT a good application for strings, as I have
argued in another thread in this same discussion. lcc-win32 does NOT
support addition of strings nor any math operation with them.

a+b != b+a
"Hello" + "World" != "World" + "Hello"

2) Operator overloading does NOT need any constructors, destructors,
or GC if we use small objects:

int128 a,b,c,d;

a = (b+c)/(b-d);

This will be translated by lcc-win32 to

tmp1 = operator+(b,c);
tmp2 = operator-(b,d);
tmp3 = operator/(tmp1,tmp2);
a = tmp3;

The temporary values are automatically allocated on the stack.
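For small value types, the translation described above can be written by hand in plain C: each operator becomes a function returning a struct by value, and each temporary is an ordinary automatic variable. A sketch, assuming a hypothetical `cplx` type and hypothetical `cplx_add`/`cplx_sub`/`cplx_div` functions standing in for the overloaded operators (a complex type is used instead of `int128` only because its division is short to write):

```c
/* Hypothetical small value type standing in for int128. */
typedef struct {
    double re, im;
} cplx;

static cplx cplx_add(cplx x, cplx y) { cplx r = {x.re + y.re, x.im + y.im}; return r; }
static cplx cplx_sub(cplx x, cplx y) { cplx r = {x.re - y.re, x.im - y.im}; return r; }

/* (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c^2+d^2); no zero-divisor check. */
static cplx cplx_div(cplx x, cplx y)
{
    double den = y.re * y.re + y.im * y.im;
    cplx r = {(x.re * y.re + x.im * y.im) / den,
              (x.im * y.re - x.re * y.im) / den};
    return r;
}

/* a = (b+c)/(b-d), lowered the way the post describes. */
static cplx lowered_expr(cplx b, cplx c, cplx d)
{
    cplx tmp1 = cplx_add(b, c);       /* tmp1 = operator+(b,c)       */
    cplx tmp2 = cplx_sub(b, d);       /* tmp2 = operator-(b,d)       */
    cplx tmp3 = cplx_div(tmp1, tmp2); /* tmp3 = operator/(tmp1,tmp2) */
    return tmp3;                      /* a = tmp3; temporaries vanish with the frame */
}
```

No constructor, destructor or collector is involved, because nothing inside `cplx` points to separately allocated storage.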

Of course, if you have interior pointers, those intermediate structures
must be registered so that the storage can be freed. This is solved, as
you say, with a GC. lcc-win32 offers a GC in the standard distribution,
and allows you to have the best of both worlds: the ease of C++ destructors
that take care of memory management WITHOUT PAYING THE PRICE of C++
complexity.

If you do not want the GC, just make a linked list with all the
allocations you make in the "constructor" (say in the new_string()
function) and periodically clean them up.
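The linked-list alternative might look like this minimal sketch. `tracked_malloc`, `free_all_tracked` and the `new_string` shown here are hypothetical (the post names a `new_string()` function but does not show it). The list is global and not thread-safe, and every pointer handed out becomes invalid after a clean-up pass, so the "periodically" has to be chosen with care.

```c
#include <stdlib.h>
#include <string.h>

/* One node per block handed out by a "constructor". */
struct alloc_node {
    void *ptr;
    struct alloc_node *next;
};

static struct alloc_node *alloc_list = NULL;   /* global, not thread-safe */

/* malloc() that also records the block on alloc_list. */
static void *tracked_malloc(size_t n)
{
    struct alloc_node *node = malloc(sizeof *node);
    void *p;
    if (node == NULL)
        return NULL;
    p = malloc(n);
    if (p == NULL) {
        free(node);
        return NULL;
    }
    node->ptr = p;
    node->next = alloc_list;
    alloc_list = node;
    return p;
}

/* Periodic clean-up: free every recorded block and return how many
   were freed. All pointers from tracked_malloc() are invalid afterwards. */
static size_t free_all_tracked(void)
{
    size_t count = 0;
    while (alloc_list != NULL) {
        struct alloc_node *next = alloc_list->next;
        free(alloc_list->ptr);
        free(alloc_list);
        alloc_list = next;
        count++;
    }
    return count;
}

/* Hypothetical "constructor" in the spirit of new_string(). */
static char *new_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = tracked_malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}
```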
 

Ian Collins

jacob said:
The crucial point in this is to know when to stop. There are NO
constructors/destructors in C, and none of the proposed extensions
proposes that.

Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE.

If C were to have a string type and operator overloading and it didn't
have '+' for strings, the first thing people would do is write one! It
may be syntactic sugar, but it's very convenient sugar.
 

jacob navia

Richard Tobin wrote:
mid_date = (start_date + end_date) / 2;

-- Richard

Excuse me but what does it mean

Sep-25-1981 + Dec-22-2000

If you figure out what THAT means then please explain.

You obviously meant:


mid_date = (end_date - start_date)/2

The *subtraction* of two dates yields a time interval
 

Richard Tobin

jacob navia said:
It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates.

mid_date = (start_date + end_date) / 2;

-- Richard
 

jacob navia

Ian Collins wrote:
If C were to have a string type and operator overloading and it didn't
have '+' for strings, the first thing people would do is write one! It
may be syntactic sugar, but it's very convenient sugar.

Well, in this same thread Chris Torek posted this:

String a = ((b + c) - "zog") * " ";

where string "addition" means "concatenate" (the usual definition
for string addition), "subtraction" means "remove the first copy
of the target string, if there is one", and string "multiplication"
means "repeatedly insert this string". Hence if b and c hold "xyz"
and "ogle" respectively, the "sum" is "xyzogle", subtracting "zog"
yields "xyle", and multiplying by " " yields "x y l e " (including
the final space).

:)
 

websnarf

Flash said:
Flash said:
(e-mail address removed) wrote:
Ben C wrote:
CBFalconer wrote:
(e-mail address removed) wrote:
CBFalconer wrote:
... snip ...
The last time I took an (admittedly cursory) look at Bstrlib, I
found it cursed with non-portabilities
You perhaps would like to name one?
I took another 2 minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits reliably
available string length to 32767.
[snip]

[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.
You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add a ssize_t type, and why TR 24731
has added rsize_t to their extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs). There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill). Using ints is the best choice
because at worst it's giving up things (super-long strings) that nobody
cares about,
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.
OK, so you can name a single application of such a thing, right?
Handling an RTF document that you will be writing to a variable length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document in to the database so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.

So now name the platform where it's *possible* to deal with this, but
where Bstrlib fails to be able to deal with them due to its design
choices.

If the DOS port hadn't been dropped then depending on the compiler we
might have hit this. A significant portion of the SW I'm thinking of
originated on DOS, so it could have hit it.

Oh ... I think of DOS as exactly the case where this *can't* happen.
Single objects in 16bit DOS have a size limit of 64K (size_t is just
unsigned which is 16 bits), so these huge RTF files you are talking
about *have* to be streamed, or split over multiple allocations
anyways.
Strangely enough, when a previous developer on the code I'm dealing with
thought he could limit size to a "valid" range and assert if it was out
of range, we found that the asserts kept getting triggered. However, they
were always triggered incorrectly because the size was actually valid!

And how is this connected with Bstrlib? The library comes with a test
that, if you run in a 16 bit environment, will exercise length
overflowing. So you have some reasonable assurance that Bstrlib does
not make obvious mistakes with size computations.
[...] So I'll stick to not artificially limiting sizes.

And how do you deal with the fact that the language limits your sizes
anyways?
[...] If the administrator of a
server the SW is installed on wants then s/he can use system specific
means to limit the size of a process.

What? You think the administrator is in charge of how the compiler
works?
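The "length overflowing" that such a test exercises comes down to checks like the following. This is a sketch, not Bstrlib's code: `concat_len` is a hypothetical helper that adds two `int` lengths for a concatenation and refuses, rather than wraps, when the result would exceed `INT_MAX`. Because signed overflow is undefined behaviour in C, the test has to happen before the addition.

```c
#include <limits.h>
#include <stdbool.h>

/* Length of a concatenation of two strings whose lengths are ints.
   Returns false instead of overflowing; *out is set only on success. */
static bool concat_len(int a, int b, int *out)
{
    if (a < 0 || b < 0)
        return false;          /* negative length: corrupt header */
    if (a > INT_MAX - b)
        return false;          /* a + b would exceed INT_MAX */
    *out = a + b;
    return true;
}
```

On a 16-bit target, where `INT_MAX` is 32767, the second check is the one that fires for lengths near the limit discussed earlier in the thread.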
 

Richard Tobin

mid_date = (start_date + end_date) / 2;
Excuse me but what does it mean

Sep-25-1981 + Dec-22-2000

Just because the sum of two dates is not a date doesn't mean that
it doesn't mean anything.
You obviously meant:

mid_date = (end_date - start_date)/2

No I didn't. That is something completely different.
The *subtraction* of two dates yields a time interval

True, and (end_date - start_date) / 2 would give me half the interval
between the dates, but that is not what I wanted. I wanted the
average of the dates, which is a date.

(Sep-25-1981 + Dec-22-2000) / 2 would be the date mid-way between
Sep-25-1981 and Dec-22-2000, just as (45 + 78) / 2 is the integer
mid-way between 45 and 78.

-- Richard
 

jacob navia

Richard Tobin wrote:
Excuse me but what does it mean

Sep-25-1981 + Dec-22-2000


Just because the sum of two dates is not a date doesn't mean that
it doesn't mean anything.

You obviously meant:

mid_date = (end_date - start_date)/2


No I didn't. That is something completely different.

The *subtraction* of two dates yields a time interval


True, and (end_date - start_date) / 2 would give me half the interval
between the dates, but that is not what I wanted. I wanted the
average of the dates, which is a date.

(Sep-25-1981 + Dec-22-2000) / 2 would be the date mid-way between
Sep-25-1981 and Dec-22-2000, just as (45 + 78) / 2 is the integer
mid-way between 45 and 78.

-- Richard

Ahh ok, you mean then

mid_date = start_date + (end_date-start_date)/2

A date + a time interval is a date later than the start date.
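Both expressions in this exchange can be checked with dates represented as day numbers. The sketch below adapts Howard Hinnant's published `days_from_civil`/`civil_from_days` algorithms to C (proleptic Gregorian calendar, days counted from 1970-01-01); `midpoint_sum` and `midpoint_diff` are hypothetical helpers. For post-1970 dates with start no later than end, Richard's `(start + end) / 2` and `start + (end - start) / 2` pick the same day (both round down); the second form is the one that stays expressible when only date-minus-date and date-plus-interval are defined.

```c
/* Days since 1970-01-01 for a proleptic Gregorian date (after Hinnant). */
static long days_from_civil(long y, unsigned m, unsigned d)
{
    y -= m <= 2;                                   /* year starts in March */
    long era = (y >= 0 ? y : y - 399) / 400;
    unsigned yoe = (unsigned)(y - era * 400);      /* [0, 399] */
    unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; /* [0, 365] */
    unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            /* [0, 146096] */
    return era * 146097 + (long)doe - 719468;
}

/* Inverse of days_from_civil(). */
static void civil_from_days(long z, long *y, unsigned *m, unsigned *d)
{
    z += 719468;
    long era = (z >= 0 ? z : z - 146096) / 146097;
    unsigned doe = (unsigned)(z - era * 146097);
    unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    unsigned mp = (5 * doy + 2) / 153;
    *d = doy - (153 * mp + 2) / 5 + 1;
    *m = mp < 10 ? mp + 3 : mp - 9;
    *y = (long)yoe + era * 400 + (*m <= 2);
}

/* Richard's form: the average of two dates. */
static long midpoint_sum(long start, long end) { return (start + end) / 2; }

/* jacob's form: a date plus half an interval. */
static long midpoint_diff(long start, long end) { return start + (end - start) / 2; }
```

With these functions Sep-25-1981 is day 4285 and Dec-22-2000 is day 11313; both forms give day 7799, which converts back to 10 May 1991.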
 

websnarf

Flash said:
I never said they didn't exist.

I think the point is that there are *many* such applications. In fact I
would be suspicious of anyone who claimed to be an experienced
programmer who hasn't *written* one of these.
[...] However, a typical dictionary plus its structures is not going to
fit in my L2 cache anyway. On the other hand, the subset of it that is
likely to actually be in use is probably an order of magnitude smaller,
and so could easily fit even with the extra overhead.

It's more *likely* if the data is compacted. Another way of saying
this is that any overflowing data set with a locality bias will
perform monotonically better the better it fits in the cache. I.e.,
everything you save improves performance by some percentage.
Alternatively, one could switch to conventional C strings and have a better
chance of it fitting, since they only have a 1-byte overhead (the terminating
'\0') compared to probably an 8-byte overhead (a 4-byte int for the length,
a 4-byte int for the memory block size) that it sounds like your library has.
Even if your library only had a 4-byte overhead, it would still be larger!

Yes, but you eat a huge additional O(strlen) penalty for very *many*
typical operations. So Bstrlib makes the trade-off where the more
common scenarios are faster.
If you are trying to detect corruption then you should also be checking
that the length is not longer than the memory block, so you should be
doing more than one comparison anyway.

Yes, it does that as well. So you really are talking out of your ass.
This is in the first couple pages of the documentation, and strewn
throughout the source code.
[...] Then you can easily check if any unused flag bits are non-0.

Yes, this is an alternative -- but it's less safe and slower, so why
would I do it this way?
Probably true.

It may well be that the performance gain is worth it for the
applications people use your library for. If so then fine, but the
limitation means it is not worth me migrating to it.

Probably not true. But you won't look at it anyways, so I won't waste
my breath.
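The O(strlen) penalty is easiest to see when appending repeatedly. A minimal sketch (the helper `append_counted` is hypothetical, not Bstrlib's API): with a stored length, the end of the destination is known immediately, whereas `strcat()` must rescan the whole destination on every call, so a loop of appends becomes quadratic in the final length.

```c
#include <string.h>

/* Append src to dst, whose current length is kept in *len and whose
   capacity (including the terminator) is cap; requires *len < cap.
   Returns 0 on success, -1 if src does not fit. Only src is scanned. */
static int append_counted(char *dst, size_t *len, size_t cap, const char *src)
{
    size_t n = strlen(src);
    if (n >= cap - *len)          /* need n bytes plus the terminator */
        return -1;
    memcpy(dst + *len, src, n + 1);
    *len += n;
    return 0;
}
```

The price is the extra length field per string, which is the overhead Flash Gordon was weighing against cache footprint.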
 

Richard Tobin

Ahh ok, you mean then

mid_date = start_date + (end_date-start_date)/2

Your attitude is baffling. You deny that adding dates makes sense,
and when I post an example where adding dates makes perfect sense, you
respond by asserting that I mean some other expression that achieves
that same effect. The mere fact that you were able to post another
expression with the same meaning refutes your original claim.

-- Richard
 

Richard Tobin

Just because the sum of two dates is not a date doesn't mean that
it doesn't mean anything.

Just in case anyone has not noticed, this is really just a re-run of
pointer addition with dates instead of pointers.

The reason for not allowing (date|pointer) addition is not that it
doesn't make sense, but that the gain isn't worth the mechanism
required.

-- Richard
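The pointer half of this analogy is visible in standard C itself: pointer + pointer is a constraint violation, so the midpoint of two pointers has to be written as pointer + (difference / 2), just as the date average was rewritten above. A minimal binary-search sketch (`find_sorted` is a hypothetical name):

```c
#include <stddef.h>

/* Binary search for key in the sorted range [lo, hi). Returns a pointer
   to a matching element, or NULL if there is none. */
static const int *find_sorted(const int *lo, const int *hi, int key)
{
    while (lo < hi) {
        /* const int *mid = (lo + hi) / 2;  is rejected by the compiler:
           C defines no pointer + pointer */
        const int *mid = lo + (hi - lo) / 2; /* pointer + interval */
        if (*mid < key)
            lo = mid + 1;
        else if (key < *mid)
            hi = mid;
        else
            return mid;
    }
    return NULL;
}
```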
 
