Efficency and the standard library

R

Richard Tobin

isn't strLength in the implementaion namespace?
[/QUOTE]
Yes and no!

str followed by a lower case letter is in the implementation namespace.
On those grounds, strLength doesn't qualify (yet). But now consider a
C90 implementation with a case-insensitive linker.

str* is not reserved for the implementation, but for future
standardisation. Since C90 is distinctly not a future implementation,
it should not use it.

(I can see counter-arguments looking.)

-- Richard
 
S

spinoza1111

And I certainly hope I never reach a point where a ten-minute project
takes two hours to write and maybe 8-10 hours thereafter to produce an
elaborate, complicated, inefficient, version which we still aren't totally
sure works.

Well, dear boy...

We're pretty sure that:

(1) Your off by one strlen didn't work. I guess I should say that it
is a minor saving grace of yours that the code was so simple that the
bug was easy to find, and I mean this in all sincerity.

It may be that my long and Meaningful names may in certain cases
conceal bugs from a raw statistical perspective. The fewer men, the
greater share of honor, said Henry V at Agincourt: the fewer
characters, the fewer places for bugs to hide. Based on this I may
reduce, somewhat, the length of the names I used.

(2) Your %s replacer didn't work. Here, another *mitzvot*, another
grace note, was that you told us about the bug. But in cases (1) and
(2) you proved that under the standards you subject others including
Schildt, you are not very competent. In (1), you made a basic mistake,
not a mistake of style or standards. In (2) you refused to fix the bug
you reported.

(2) You've taken a whole week as of yesterday by my clock not to even
try to solve the problem, which was to do replace() without string.h,
therefore the minimum amount of time you apparently need is w+x where
w=one week to not do anything and x is completely unknown. It's too
late to redeem yourself on this problem, since we can't be sure you
won't plagiarize me.

Now, as to assurance that my program works, as best we can tell.

We can see from a good English or other natural-language description
of the algorithm used that the algorithm works. Here is a good
description of the algorithm in English:

replace(master, target, replacement)

assert that target is not of zero length [there is no need to check
for NULL masters, targets or replacements in this step]
clear a linked list (set its header to null). Each node shall contain
the starting index and length of a "prefix" in the master, where a
"prefix" is the material to the left of a target occurrence or the end
of the string, along of course with the link. Note that a "prefix" can
be of zero length when two targets are adjacent.

from left to right, find all non-overlapping occurrences of the
target. When each occurrence is found, add the starting index and the
length of its "prefix" to the linked list UNLESS (1) there is at least
one member of the linked list, AND (2) this member records the
occurrence of one or more prefixes of zero length between adjacent
targets using a method to be chosen by the mere coder, as long as that
method records the number of prefixes of zero length. If (1) and (2)
obtain, increment the count of zero length prefixes. While scanning
left to right, sum-total the lengths of all prefixes, and each
occurrence of the target, in X.

in the above, add a linked list member for the "prefix" which appears
between the rightmost target and the end of the string only when this
prefix is not null.

in the above keep the index used to match the target character by
character for you shall need its value below.

allocate X bytes of storage [since we assume we shall use C]

concatenate all members of the linked list to the mallocated storage
by walking it: for each member append the prefix in it and then append
the replacement string if, and only if you are either not at the last
member or the last match of the target succeeded, indicating that the
master ends in a target. Note that all you need to do is have the last
value of the index used in the target to match: if this points at NUL
then the master ends with a target.


Now, from this algorithm statement a junior programmer (such as you,
Peter) can verify that the C code works in a half an hour, IF he can
parse reasonably complex English sentences.

Which brings up in fact a very, very interesting consideration.

It is that the common programmer reaction to The Code of the Other,
one of Shock and Awe, may in fact be the same psychological event as
the common reaction of many numerate but aliterate people to English
prose of a certain level of complexity.

The above algorithm description may be as "hard to read" as my code,
and it is certainly almost impossible for 90% percent of actual
programmers to write...despite the fact that interacting with texts of
this nature is an excellent way of writing software. This is something
Dijkstra knew, although he would strongly disagree with my belief that
the "informal" use of the above species of English could replace
mathematical formalism in practice.

But the "unreadability" of the above algorithm is not due to
verbosity. It is a social phenomenon in US and other "developed"
societies because (as many English professors will tell you) the
ability to write a sentence above a small upper bound of nesting and
complexity has been in a frighteningly rapid decline, tracking the
ability to read a coherent sentence above a small UB.

We see evidence for this here in the charges of "verbosity" when they
mean "I have a Master's degree,"

"I have a Master's degree" - Ask Mister Science, Duck's Breath Mystery
Theater

"...but I don't understand this".

Recall the passage from Dijkstra that I quoted earlier:

"Oh yes. In Europe a much larger fraction of the population CAN WRITE.
People in America have really suffered from the combination of TV,
which makes reading superfluous, and the telephone. A few years ago I
was at CalTech and that is a hiqh quality place and everybody
recommends it because the students are so bright. A graduate confessed
to me—no he did not confess, he just stated it, that he did not care
about his writing, since he was preparing himself for an industrial
career. Poor industry!" - EWD 1985, cf
http://www.cs.utexas.edu/users/EWD/misc/vanVlissingenInterview.html

That is, the fact that many people self-select or are tracked towards
programming careers because of difficulties with reading & writing
such as dyslexia may explain their difficulties with maintenance
programming (the need to connect with the Code, that is the Text, of
the Other) and in coding above small lower bounds of complexity. They
may in fact not be able to formulate what it is they are doing in
English.

But: this model doesn't explain why Peter Seebach in fact seems to
write rather well, and has published a programming book. It may be
that there is a writing facility that "flies under the radar" in the
sense that the individual in question has learned how to write
effectively in a SIMPLE style, at the cost of ignoring complexity.

But this "flat" writing style tends to run out of oxygen at higher
levels, as witness the real rage of formally well-educated and
financially successful people with post-baccalaureate degrees at the
texts they were expected to master in graduate school. I've met more
than one English MA working in industry in various capacities who've
mentioned that they specialized in Milton, but in response to my
questions about Paradise Lost, have responded with some ill-concealed
hostility that they really, really HATE fucking Milton, having had to
analyze Milton's famously complex grammar. They were in my opinion
unprepared by more basic English classes where the Little Darlings of
post-Sixties schools were spared the sentence diagrams we had to do,
and which I teach today.

Note that to adequately describe the replace algorithm above, I have
to fly above the upper bound of complexity found in most texts
(newspaper articles, emails, and popular novels) today OF NECESSITY.

What this means is that "complex code" is NOT the problem. My coding
tics, such as my pre-Szymonyi Hungarian, can be rapidly modified using
a good editor, and the underlying code will STILL present to many here
as "too complex", not because I'm a monstrum horrendum or a terrible
programmer, but because industrial society parks people in little
boxes labeled "numbers person" or "words person".

One more remark about the above natural language algorithm. In terms
of the pornographic and sado-masochistic division of labor
("specifications" versus "coding") it is neither fish nor fowl, and
this is as it should be. This is because a good programmer is a good
specifier and vice-versa, and a minimally acceptable programmer can
WRITE NATURAL LANGUAGE. Extreme Programming hasta la victoria siempre!
 
S

spinoza1111

Using these trivial example, there is no reason why the code could be
relatively clear. A lot of code can not be made so. Never write C for
the lowest denominator.

In more than one sense of the word, fella:

"Give not that which is holy unto the dogs, neither cast ye your
pearls before swine, lest they trample them under their feet, and turn
again and rend you." - Matthew 7.6

I'm not a Christian, but Jesus sure knew what he was talking about in
the Sermon on the Mount. Here, I am putting messages into bottles for
lurkers and the occasional person who can understand me. Oh, sure, I'm
trying to defeat Seebach, but fair and square; unlike Christ and like
Krishna, I believe we must fight the battle of life. As to Heathfield,
in this discussion, the guy's been nothing more than comic relief.
 
J

Julienne Walker

However, for easy modification, I don't think a doubly linked list
representation of strings with a length code could be beat.

Indeed. Too bad linked data structures have a tendency toward poor
cache performance. And depending on how the list is implemented, many
programmers (even in C#) would likely balk at the extra overhead of
storing and following links for a "mere" string.
copies the actual values into the list. This would have
blown me to Kingdom Come.

Can you elaborate on this? I'm not sure I fully understand what you
mean.
 
M

Malcolm McLean

Indeed. Too bad linked data structures have a tendency toward poor
cache performance. And depending on how the list is implemented, many
programmers (even in C#) would likely balk at the extra overhead of
storing and following links for a "mere" string.
The main concern is, is this library easy to use? Only when the
program hits the treacle do ypu stary worrying about how efficient the
code is behind those nice interfaces.
For the sorts of programs I write, string handling is never the issue.
However that might change - I've got a new job looking at DNA
sequences.
 
S

Seebs

The main concern is, is this library easy to use? Only when the
program hits the treacle do ypu stary worrying about how efficient the
code is behind those nice interfaces.

Yeah. My string library (like most C programmers, I wrote one at one point)
actually does have, under some circumstances, linked lists in it. It never
seems to have become an issue.

They're used to provide familiar semantics. Consider:
char s[256] = "hello, world!";
char *t;

t = strstr(s, "world");
strcpy(t, "sailor!");

You would expect this to leave s = "hello, sailor!" (note that it's a [], not
a *, and sized to provide enough room).

If you are doing structures, rather than raw pointers to characters, you
need some way to indicate that the same change must be made in both t and s.
My solution was to have s contain a linked list of strings derived from it,
and t contain a pointer to the string it is derived from. When you modify
t, it observes that it is a substring of another string, so in fact, the
modification is passed back up to s. Since t and s share storage, this
works.

-s
 
J

Julienne Walker

The main concern is, is this library easy to use?

The two concerns aren't mutually exclusive. You can design an easy to
use interface *and* consider basic efficiency in the implementation
without skimping on either.
Only when the
program hits the treacle do ypu stary worrying about how efficient the
code is behind those nice interfaces.

It doesn't matter how nice an interface is if it's too bloated for
anyone to want to use it. ;-) Efficiency is especially important with
libraries as well as ease of use, because with libraries you're less
aware of how they'll be used.
For the sorts of programs I write, string handling is never the issue.
However that might change - I've got a new job looking at DNA
sequences.

I tend not to use C for any serious string handling. The language and
standard library support is too weak, IMO.
 
C

Chris M. Thomasson

Right, but then you are mixing line length delimiting with C strings.
And you are basically solving the problem from scratch rather than
using the semantics implied by the C library.

Isn't programming in C all about solving problems from scratch? lol.

;^)



Of course the Better String Library has what you suggest above as an
obviously available technique since lengths are always available.

Nice. Thank you for creating it.



Here's a hunk of another one:
[...]

Both are using C strings and suffer from the stupidity of not being
able to use lengths to accelerate the performance of any portion of
this algorithm.

Humm... Of course one can use C strings in conjunction with a length.
Perhaps something like this crap:
_________________________________________________________________ [...]
_________________________________________________________________

BTW, how are you handling this in your Better String Library? I see that
you
have a function called `cstr2bstr()', but AFAICT it uses dynamic memory
allocation.

Look up the bsStatic() and bsStaticBlkParms() macros to see how I
handle that.

Ahh, thank you:
___________________________________________________________________
/* Static constant string initialization macro */
#define bsStaticMlen(q,m) {(m), (int) sizeof(q)-1, (unsigned char *) ("" q
"")}
#if defined(_MSC_VER)
# define bsStatic(q) bsStaticMlen(q,-32)
#endif
#ifndef bsStatic
# define bsStatic(q) bsStaticMlen(q,-__LINE__)
#endif

/* Static constant block parameter pair */
#define bsStaticBlkParms(q) ((void *)("" q "")), ((int) sizeof(q)-1)
___________________________________________________________________


I have not examined the code in any detail yet but a negative value for
`tagbstring::mlen' seems to denote that the memory is immutable right?
However, I don't quite understand the `-__LINE__' bit. Interesting.



(Note that if you pass non-string literal values to your macros you
will get strange results.)

Well, I did call that chunk of code "crap"... :^)


I wanted a way to construct a counted string instance that pointed to a
mutable string without resorting to dynamic memory allocation. How can I do
that using your library?
 
C

Chris M. Thomasson

[...]
It was hard to get my doubly linked list working, but I did. It would
have been nice to have a separate tool for linked lists. Richard
Heathfield appears to have published such a tool in 2000, in C
Unleashed, but as far as I can determine (see elsethread), the
Dickiemaster copies the actual values into the list. This would have
blown me to Kingdom Come. I THINK the Heathman wanted to avoid using
pointer to void (he is welcome to comment here).

I don't know why Richard implemented a simple linked-list that way. I take
it he is not that fond of intrusive linked lists.



However, preprocessor macro implementation of linked list would avoid
both copying entries and pointer to void. But it would be at that
point in which I would ask myself, like Ross Perot's running mate VAdm
Jim Stockdale in 1992, "Who am I? Why am I here? And why am I not
coding in C Sharp, or getting laid?"

What about intrusive data-structures that do not require any copying and/or
void pointers? I use them all the time.
 
C

Chris M. Thomasson

[...]
We can see from a good English or other natural-language description
of the algorithm used that the algorithm works. Here is a good
description of the algorithm in English:
replace(master, target, replacement)
assert that target is not of zero length [there is no need to check
for NULL masters, targets or replacements in this step]

Why exactly do you need to assert that `target' is not a zero length string?
I would assume that:


replace("", "", "Hello World!");


will spit out a string equal to "Hello World!". I would also assume that:


replace("Hello World!", "", "XXX");


would produce a string equal to "Hello World!".



This is the way I implemented my `replace()' function here:

http://clc.pastebin.com/f62504e4c



I personally do not see anything wrong with that behavior.




[...]
 
J

James

Malcolm, hi. I don't agree with you at all. "Find all strings AND
replace them" is as we've seen two problems, and the specifications
may only seem precise. Remember the gotchas I mentioned? Right to left
as opposed to left to right? Overlapping strings?

If a specification of the `replace()' function mentioned that it only
operates on non-overlapping comparands, then the overlapping string gotcha
is eliminated.


If a specification of the `replace()' function mentioned that it scans for
the comparand in a strict left-to-right fashion, then the
left-to-right/right-to-left gotcha is eliminated.


If a specification of the `replace()' function mentioned both of the above
claims, then both of the "gotchas" are eliminated.
 
W

wolfgang.riedel

!Google!!: from wabsnarf
!Google!!: from spinoza
Is the Better String Library based on a linked list representation of
strings?


I don't know, but AFAIK the IBM OCL (C++) String had three
buffers - this made realloc rarer, but complicated things
like (char*) <String object> or o.cstr().

Length counted strings are not 'the Better String Library':

on the one hand, they are very old: Pascal, Rexx, C++,... all have
them in one way or another!

on the other hand, I think, it was a (time- and hw-)based design
decision to use '\0'-terminated strings - that had - and has -
advantages and disadvantages - for the benefit,
look up sentinels in any algorithm book.

On the third hand: Nobody hinders you (or?) to create and use a
structured string type (with similar interfaces to the std clib).

I'm sure, almost everybody here has implemented and used one.
(Mine was StringBuffer - don't remember, where the name came from)

The problems/complexitity come naturally:
to avoid often realloc,
you have to maintain length & capacity;
to handle input char*, char[] and lib-alloced memory,
you have to remember, how to destroy them;
to make it more flexible for the user, you may choose
to let them define 'Class-' and 'Object-' constr & destr
etc.
And with complexity/ease of use, you have to choose your
timing test cases more carefully.

I'm not saying, it's not worthwile: there are a lot of places,
where an automagically expanding string is nice (f.e. parsing
MIME-headers with their comment and line continuation complexity).

But, that's C: use it's language and lib features, where it's
appropriate and s.th. else elsewhere.

And (to all whom it may concern): end this polemic!
The C lang and lib have some historic ballast,
live with it or leave it.

Wolfgang

(This, of course doesn't mean, I won't use a better sort than
qsort (which in itself in most libs isn't single-alg any more)
and in this ng, there are lot of interesting libs offered
(Surprise: I'm more interested in trees, hash, crypt, networking
than in something, I've done myself).
 
W

Willem

Chris M. Thomasson wrote:
) Why exactly do you need to assert that `target' is not a zero length string?
) I would assume that:
)
)
) replace("", "", "Hello World!");
)
) will spit out a string equal to "Hello World!". I would also assume that:

Why would you assume that ?
Why not 'Hello World' twice ? Why not the empty string ?

) replace("Hello World!", "", "XXX");
)
)
) would produce a string equal to "Hello World!".

This is counter to your assumption above.

) This is the way I implemented my `replace()' function here:
)
) http://clc.pastebin.com/f62504e4c
)
)
)
) I personally do not see anything wrong with that behavior.

I do.
The two mentioned behaviours above are not consistent with each other.
Why do you make a special case of the first two strings both being empty ?

For all values of <anything>:
replace("", "<anything>", "<something>") = ""
except when <anything> is <nothing>, because then suddenly:
replace("", "", "<something>") = "<something>"

And also:

For all values of <anything>:
replace("<anything>", "", "<something>") = "<anything>"
except when <anything> is <nothing, because then suddenly:
replace("", "", <something>") = "<something>"

That's illogical.

To me, even
replace("<anything>", "", "<something>") = "<anything>"
is an arbitrary choice. And not a very logical one either.

It's like saying division by zero yields zero.

Looking at the limit case, it should be:
replace("<anything>", "", "<something>") = "<something> repeated to infinity"

But that would cause unwanted behaviour.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
C

Chris M. Thomasson

Willem said:
Chris M. Thomasson wrote:
) Why exactly do you need to assert that `target' is not a zero length
string?
) I would assume that:
)
)
) replace("", "", "Hello World!");
)
) will spit out a string equal to "Hello World!". I would also assume
that:

Why would you assume that ?
Why not 'Hello World' twice ? Why not the empty string ?

Well, I was thinking that if the source string is empty, and the comparand
is empty, then there is a match, therefore, the exchange string should
replace the empty source string.



) replace("Hello World!", "", "XXX");
)
)
) would produce a string equal to "Hello World!".

This is counter to your assumption above.

I am thinking that the source is not empty, and the comparand is empty so
there cannot possibly be a match.



) This is the way I implemented my `replace()' function here:
)
) http://clc.pastebin.com/f62504e4c
)
)
)
) I personally do not see anything wrong with that behavior.

I do.
The two mentioned behaviours above are not consistent with each other.
Why do you make a special case of the first two strings both being empty ?

Because an empty source and an empty comparand match each other?


Humm... Perhaps my line of thinking on this matter is just way off somewhere
deep in the heart of moron land. Thanks for challenging me on it Willem! I
will think this over.



[...]
 
W

wolfgang.riedel

Perhaps we really need a string finder and a replacer which would
encapsulate left to right, right to left, and what to do on overlap.

However, the traditional way to do this can be slow, since it involves
creating a "table" of occurrences.

But two independent processes, a source and a sink, would handle the
problems nicely.

How so?
Your first cite is a design question -
which your second doesn't even scratches!

(I know, you're with me on this:
Rexx's changestr(needle, hay, newneedle) is an implementation -
of one choice)

Wolfgang
 
W

Willem

Chris M. Thomasson wrote:
) Because an empty source and an empty comparand match each other?
)
)
) Humm... Perhaps my line of thinking on this matter is just way off somewhere
) deep in the heart of moron land. Thanks for challenging me on it Willem! I
) will think this over.

Well, the thing is that the comparand is matched against substrings of the
source. And the question then becomes: Is the empty string a substring of
all possible strings ?

I can see where you're coming from, but mathematically speaking, it's an
anomaly. You probably had to special-case it in your code, and that's
usually a good indication that it is also a special case in the
specification.

PS: If you don't account for an empty comparand, then the result is the
replacement string repeated indefinitely. Sort of like how division by
zero is often thought to yield infinity.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
R

Richard Tobin

Chris M. Thomasson said:
Why exactly do you need to assert that `target' is not a zero length string?
....

I would also assume that:

replace("Hello World!", "", "XXX");

would produce a string equal to "Hello World!".

There are an arbitrary number of instances of the empty string before
the first character, between each pair of characters, and after the
last character. Even if it does only one replacement at each position
it will still produce

XXXHXXXeXXXlXXXlXXXoXXX XXXWXXXoXXXrXXXlXXXdXXX!XXX

(I didn't type that in, I type "Hello World!" and used the editor's
replace function with the arguments "" and "XXX".)
I would assume that:

replace("", "", "Hello World!");

will spit out a string equal to "Hello World!".

That's reasonable, since it replaces lots of empty strings with
empty strings.

-- Richard
 
S

spinoza1111

I@
"spinoza1111" <[email protected]> ha scritto nel messaggio
@Perhaps i don't understand all the written text i see ( i think it is 50%)
@but it seems to me not justly you speak against peaceful and
@what show very educated and helpful people: Keith

Because they assault people who criticise them, making unfounded
claims about competence in a public place. They don't know how to
conduct themselves collegially.

I agree that to detect their malice, one needs to read English at a
somewhat higher level. I regret we are not using a truly global
language. However, their malice is real and familiar to me from my own
experience in bad corporations.
That said, I do like the idiom
size_t lenstr(char *string) {
char *end = string;
while (*end++)
;
return end - string;
}
Not that it's likely to make a noticeable difference. Or that it's likely
worth it when obviously the standard library one will be fine.
Note that ``end - string'' yields a value of type ptrdiff_t, a signed
type which very likely can't represent all values of type size_t.
This is unlikely to be an issue in practice. On typical 32-bit
systems, problems show up only for strings longer than 2 gigabytes,
and even then the properties of 2's-complement arithmetic are likely
to result in the correct answer anyway.
But all this analysis, resulting in the conclusion that it will
almost certainly work properly, becomes unnecessary if you just
use strlen().
--
Keith Thompson (The_Other_Keith) (e-mail address removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Kenny McCormack, thou should'st be here at this hour:
For Kiki is in finest Form, posting pompously and sour:
Exhibiting ersatz Learning, and utter Pedantry:
He concludes, after "all this analysis" that "almost certainly"
That "it" (Seebie's shit) will yeah right..."work properly".
But for "123" Seebie's code grinds out to the amazement of the Crowd
The number...the number...FOUR, to dismay that is loud.
Whilst we the putative Trolls do dance amidst burning Tyres
Laughing our hairy Asses off at the doom of Pedantry's pretentious
spires!
 
S

spinoza1111

How so?
Your first cite is a design question -
which your second doesn't even scratches!

(I know, you're with me on this:
 Rexx's changestr(needle, hay, newneedle) is an implementation -
of one choice)

Sorry, haven't used rexx recently. Can you explain?
 
S

spinoza1111

If a specification of the `replace()' function mentioned that it only
operates on non-overlapping comparands, then the overlapping string gotcha
is eliminated.

If a specification of the `replace()' function mentioned that it scans for
the comparand in a strict left-to-right fashion, then the
left-to-right/right-to-left gotcha is eliminated.

If a specification of the `replace()' function mentioned both of the above
claims, then both of the "gotchas" are eliminated.

I guess it did, but I never wrote a spec. Probably should have. Of
course, in the so-called real world, one can specify left to right and
no overlap, the user can sign off, and the code can still be wrong.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top