Challenge: tightest code to find-replace a string

Keith Thompson · Jun 9, 2014

Malcolm McLean said:
It's slightly easier to write a greedy matcher. So probably that's what
they did, and wrote the specifications afterwards.

Right, they couldn't possibly have thought about the design before they
implemented it. (That was sarcasm.)

No it's not. If you call a pattern on an input with overlapping
sequences, almost certainly you haven't thought through the output
you actually want. The fix is usually to extend the pattern.

Perhaps you could try not making assumptions about what others have or
haven't thought through.

Malcolm McLean · Jun 10, 2014

Dennis Ritchie, Donald E. Knuth, were or are they not all
masters of the English language, too?

I've found that some of the best [Software ]developers
of all are English majors. They'll often graduate with
no programming experience at all, and certainly without
a clue about the difference between DRAM and EPROM.

But they can write. That's the art of conveying
information concisely and clearly. Software development
and writing are both the art of knowing what you're going
to do, and then lucidly expressing your ideas.

I am what Americans would call an English major. My first degree
was English Language and Literature at Oxford.
But I don't apply a mathematical mindset to natural language.
Computers can't talk. There's currently a claim that a program
has passed the Turing test (successfully deceive judges into
thinking it is human), but it's bound to be flawed. Similarly
humans can't easily communicate mathematical expressions. But
that doesn't make a natural language specification meaningless.

Chris M. Thomasson · Jun 10, 2014

From: Stefan Ram

Sent: Monday, June 09, 2014 2:40 PM
Newsgroups: comp.lang.c Subject:
Re: Challenge: tightest code to find-replace a string

In C, one can write: »while(1);«.

Not having an iteration threshold in C means that the
language is simple and fast, albeit not without risks.

I was thinking more along the lines of:
______________________________
unsigned i;

for (i = 0; i < n; i += 1)
{
/*...*/
}
______________________________

;^)

So, wrt "01" and a transformation of "01" =>
"0101", then this would be 2^n number of
replacements:

01 = seed
0101 = 2^0 transformations
01010101 = 2^1 transformations
0101010101010101 = 2^2 transformations
.... = 2^n transformations

?

Think of iterating an L-system. You generally
want to stop at some point?

Stefan Ram · Jun 10, 2014

Malcolm McLean said:
humans can't easily communicate mathematical expressions.

The other animals are even worse at it! (Except maybe Alex,
who invented the number 0 by himself, while human culture
needed a long time for this.)

http://en.wikipedia.org/wiki/Alex_(parrot)

Stefan Ram · Jun 10, 2014

Chris M. Thomasson said:
Think of iterating an L-system. You generally
want to stop at some point?

The client should have a way to terminate a loop,
but should be free to choose not to terminate it,
too. Just as in C.

James Kuyper · Jun 10, 2014

"Chris M. Thomasson" wrote in message

Can I get the most recent requirements as a summary?
AFAICT, you want a one-to-many transformation.
A simple example:
s = "abcdefgabcdefghijklmnop"
A possible transformation could be:
"abcdefg" => "aaaAAaa"
So:
s_t = "aaaAAaaaaaAAaahijklmnop"
Right?

Click to expand...

WRT "infinitely" overlapping search string in the to be transformed string:

s = "01"

Transform:

"01" => "0101"

s[0] = "01"
s[1] = "0101"
s[2] = "01010101"
s[3] = "0101010101010101"
[...]

You would have to set an iteration threshold to help the
program avoid an infinite loop, and an ever expanding
buffer.

It expands like a L-system fractal.

In general, I don't like arbitrary limits like that . I think that for
the kind of utility program being discussed here, the most appropriate
way to deal with that issue is to define the transformations that it's
capable of implementing in such a way that the loop cannot be infinite.

I think that the simplest way to arrange this is to exclude the
replacement text from previous find-replace operations from all
following "find" steps.

Alternatively, one relatively minimal way to it is to require than each
found pattern must include at least one character that was not the
result of a previous replace. That would ensure that your loop would
never have more iterations than the number of characters in the input.

Tim Rentsch · Jun 10, 2014

DFS said:
DFS said:

* reads an existing file
* writes changes to new file
* counts replacements made by line
* counts total replacements made
* no fancy usage of sed!

Click to expand...

It reports rather than counts these matches. I would never write a
function with this spec. because it destroys its usefulness in
other contexts. A function should do one thing well.

I'd write a string match/replace function that returns the number
of matches. If I needed the counts reported by line, I'd write a
wrapper that adds those.

int replace_string(const char *match, const char *repl, int stopper,
FILE *fi, FILE *fo)
{
int nmatches = 0, c;
const char *mp = match;
while ((c = fgetc(fi)) != EOF && c != stopper)
if (c == *mp) {
if (!*++mp) {
++nmatches;
fputs(repl, fo);
}
}
else {
mp = match;
fputc(c, fo);
}
return nmatches;
}

Called with stopper == EOF it processes a whole file. Note how
removing the line buffer actually simplifies the code, whilst also
removing an unnecessary restriction. It's not uncommon for this to
happen (there was a recent thread about this).

[snip]

Click to expand...

I definitely like [...] and the replace-string() code, but there
might be bugs: [snip]

int
copy_file_replacing( FILE *in, FILE *out, char *this, char *that, int end ){
char *t = this;
int matches = 0;
int c;

while( c = fgetc( in ), c != end && c != EOF ){
if( t > this && *t != c ){
char *p = this, *u = t;
do {
fputc( *p, out ), p++, t--;
} while( p < u && (*t != c || strncmp( this, p, u-p )) );
}

if( c == *t ){
if( * ++t == 0 ) fputs( that, out ), matches++, t = this;
} else {
fputc( c, out );
}
}

if( c != EOF && c == 0[t] && 1[t] == 0 ){
fputs( that, out );
return matches + 1;
}

fprintf( out, "%.*s", (int){t-this}, this );
if( c != EOF ) fputc( c, out );
return matches;
}

Richard Bos · Jun 10, 2014

You are responding to:

|... called on abcabcabc returns xabc or even abcabcabc.

No, what I'm responding to is still up there in the quoted post.

Malcolm wrote:

|... called on abcabcabc returns xabc xx or even abcabcabc.
Quite.

Malcolm failed to clearly indicate the boundaries of the string.
Still, taking » xx« to be a part of the string, is the only way
to parse the English sentence, as long as there is no comma after
the »xabc«.

I presumed that Malcolm was merely being sloppy, and had omitted a comma
he intended to be there. I often disagree with him, but I don't think
he's daft enough to believe that "replace 'abcabc' 'x'" on "abcabcabc"
could result in "xabc xx". Even by his standards, that would be an
unacceptably excentric bug.

Richard

Richard Bos · Jun 10, 2014

The other animals are even worse at it! (Except maybe Alex,
who invented the number 0 by himself, while human culture
needed a long time for this.)

http://en.wikipedia.org/wiki/Alex_(parrot)

Well. So Pepperberg claims. Her claims are highly suspect, though.

Richard

DFS · Jun 10, 2014

Can I get the most recent requirements as a summary?

No changes on my side, though others have jumped on them as insufficient.

I would like for common sense to be a requirement of the developer.

AFAICT, you want a one-to-many transformation.

A simple example:

s = "abcdefgabcdefghijklmnop"

A possible transformation could be:

"abcdefg" => "aaaAAaa"

So:

s_t = "aaaAAaaaaaAAaahijklmnop"

Right?

Yep!

FWIW, this can be easily rendered into a case-sensitive function.

That would be nice. Most find-replace apps have the 'match case' option.

I'll try and add it to my own code as well.

James Kuyper · Jun 10, 2014

No changes on my side, though others have jumped on them as insufficient.

I would like for common sense to be a requirement of the developer.

"Common sense" is just a fiction invented for the sake of justifying the
assumption that everyone else will think the same way you do.

Richard Bos · Jun 11, 2014

DFS said:
No changes on my side, though others have jumped on them as insufficient.

I would like for common sense to be a requirement of the developer.

I'm all for it, but common sense in the client would be even more
welcome.

Richard

Keith Thompson · Jun 11, 2014

DFS said:
No changes on my side, though others have jumped on them as insufficient.

I would like for common sense to be a requirement of the developer.

[...]

My common sense as a developer tells me that I need an unambiguous
specification. If I can't get one, I'll probably write one myself
and verify that it's consistent with what's actually needed.

Malcolm McLean · Jun 11, 2014

My common sense as a developer tells me that I need an unambiguous
specification. If I can't get one, I'll probably write one myself
and verify that it's consistent with what's actually needed.

If someone instructed you to go through a text, and replace every
instance of "her" with "him", how would you respond?

Stefan Ram · Jun 11, 2014

Malcolm McLean said:
If someone instructed you to go through a text, and replace every
instance of "her" with "him", how would you respond?

Well, the obvious call backs would be: »also in "inheritance"?«,
»also "Her"? (by "Him" or by "him"?)«, »also, when the glyph "h"
is encoded using Unicode code point 1211 (decimal)«?

BartC · Jun 11, 2014

Malcolm McLean said:
If someone instructed you to go through a text, and replace every
instance of "her" with "him", how would you respond?

I would probably change "I told her so" to "I told him so".

I wouldn't change "There are several herds" to "Thime are several himds".

I would have to think a little about changing "Here was her coat" to "Here
was him coat" (but would not be tempted to write "Hime was him coat".

(I would probably also change whole words of "her", "Her" and "HER" to
"him", "Him" and "HIM" respectively.)

So yes, when presenting a challenge, you do have to spell these things out.
(And also why when I re-stated this problem, I talked about byte-sequences
rather than text.)

Ike Naar · Jun 11, 2014

If someone instructed you to go through a text, and replace every
instance of "her" with "him", how would you respond?

"Whime you crazy?"

Keith Thompson · Jun 11, 2014

Malcolm McLean said:
If someone instructed you to go through a text, and replace every
instance of "her" with "him", how would you respond?

I'd ask whethim thime are any bothimsome or othimwise incohiment special
cases to consider.

Possibly the context of the request would clarify whether
occurrences of "her" within a word (and what exactly is a
"word"?) should or shouldn't be replaced, and whether "Her" and
"HER" should be replaced by "Him" and "HIM", respectively. If the
intent is to replace feminine pronouns by masculine, I'd have to do
more analysis to determine whether "her" is possesive (and should
be replaced by "his") or not (and should be replaced by "him").
Given the complexity of English grammar, I woudln't be surprised
if there are cases that are inherently ambiguous.

Your point?

Keith Thompson · Jun 11, 2014

BartC said:
I would probably change "I told her so" to "I told him so".

I wouldn't change "There are several herds" to "Thime are several himds".

I would have to think a little about changing "Here was her coat" to "Here
was him coat" (but would not be tempted to write "Hime was him coat".

(I would probably also change whole words of "her", "Her" and "HER" to
"him", "Him" and "HIM" respectively.)

So yes, when presenting a challenge, you do have to spell these things out.
(And also why when I re-stated this problem, I talked about byte-sequences
rather than text.)

And if the actual intent is to implement C code that performs the
equivalent of

sed 's/him/her/g'

then most of these assumptions would be wrong. This is why we need
context. Even with perfectly precise requirements, knowing the actual
goal can help both in understanding the requirements and in judging
whether they're written correctly.

Malcolm McLean · Jun 12, 2014

I'd ask whethim thime are any bothimsome or othimwise incohiment special
cases to consider.

You can't do that and keep a secretarial job. Well actually you maybe can,
it's such an unusual response that a secretary who said that would probably
be treated as a company treasure. But usually, a businessman expects a
reasonable instruction like that to be executed.

Possibly the context of the request would clarify whether
occurrences of "her" within a word (and what exactly is a
"word"?) should or shouldn't be replaced, and whether "Her" and
"HER" should be replaced by "Him" and "HIM", respectively. If the
intent is to replace feminine pronouns by masculine, I'd have to do
more analysis to determine whether "her" is possesive (and should
be replaced by "his") or not (and should be replaced by "him").

Given the complexity of English grammar, I woudln't be surprised
if there are cases that are inherently ambiguous.

Your point?

You need to read the text. It's entirely possible that a blind search/replace
of the sequence "her" by "him" will produce an output everyone would
agree is the one desired. It's quite likely that a word replacement will
be right.
But it's also likely that no simple program can do the job, but any human
can, for example, "himself" and "herself", or "her" and "his". It's also
likely that there are other gendered words, there comes a point where
you've got to query if, say, "Freda" should be replaced by "Fred". That's
pretty clearly going beyond the instruction. It's hard to say where that
point is.

User requirements are often inherently ambiguous in computer terms.
There is no algorithm that can fulfil the businessman's request.

Fibonacci	0	May 13, 2023
Adding adressing of IPv6 to program	1	Feb 16, 2023
How can I view / open / render / display a pdf file with c code?	0	Sep 23, 2023
string search and replace	5	Oct 11, 2006
How to accept text and put each letter into a 2d matrix?	0	Jun 3, 2022
WIN32 - Update Text in a Window in order to show its size in Pixels and coordinates	0	Oct 4, 2023
How to try a range of hex values in C# code ?	0	Nov 19, 2022
Boomer trying to learn coding in C and C++	6	Dec 16, 2022

Challenge: tightest code to find-replace a string

Keith Thompson

Malcolm McLean

Chris M. Thomasson

Stefan Ram

Stefan Ram

James Kuyper

Tim Rentsch

Richard Bos

Richard Bos

DFS

James Kuyper

Richard Bos

Keith Thompson

Malcolm McLean

Stefan Ram

BartC

Ike Naar

Keith Thompson

Keith Thompson

Malcolm McLean

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads