K&R Exercise 1-21: entab

D

dgoodmaniii

+AMDG

I've been working hard on the entab program in K&R Exercise
1-21, and I'm happy to say I've found a solution. (I'm
pretty sure I have, anyway.) However, in the course of
doing so I compiled and ran the first two programs listed at
the clc-wiki site:

http://clc-wiki.net/wiki/K&R2_solutions:CHapter_1:Exercise_21

I'm not sure if I'm running the programs wrong or if I'm
misunderstanding the point of the exercise, but it seems to
me that neither of these really answer to the exercise.

What these two programs do is merely replace strings of
eight blanks with tabs, but no other strings of blanks. In
other words, they do the following; blanks indicate tabs,
while underscores indicate spaces.

TABS: | | | | | |
Input: |No_________is___the__________time____for
Output: |No _is___the __time____for

Sometimes this solution corresponds to that requested by
Exercise 1-21, but more often it doesn't.

As I'm reading it, Exercise 1-21 asks that *any* sequence of
blanks be replaced by the minimum number of tabs and blanks
to achieve the same spacing. In other words, they want
this:

TABS: | | | | | |
Input: |No_________is___the__________time____for
Output: |No ____is __the _time __for

So am I missing something?

I'm quite confident that the solution I've come up with is
ugly, but it does work. How do you all think it might be
improved? Thanks.

/* +AMDG */
/*
* A program that replaces a sequence of blanks by the
* minimum number of tabs and blanks necessary to achieve
* the same spacing. Gives preference to tabs when either a
* tab or a single blank would suffice to reach a tab stop.
*
* This program is released under the GNU General Public
* License, version 3.
*
*/

#include<stdio.h>
#include<string.h>

#define MAXLINE 10000
#define TABSTOP 8

int getline(char s[], int lim);
int tabreplace(char s[], int tabspot, int index);

int main(void)
{
int i; /* keep track of string index */
int j; /* keep track of tab stops */
char string[MAXLINE];

while(getline(string, MAXLINE) > 0) {
for(i=0,j=0; string != '\0'; ++i,++j)
i -= tabreplace(string,j,i);
printf("%s",string);
}
return 0;
}

int tabreplace(char s[], int tabspot, int index)
{
int i,j;
int numspaces;

if (((tabspot % TABSTOP) == 0) && (s[index] == '_')) {
for (i = index; (s == '_') && (i > (index-TABSTOP)); --i);
numspaces = (i>(index-TABSTOP)) ? index-i-1 : index-i;
if (numspaces > 0 && s[index-numspaces] == '_')
s[index-numspaces] = '\t';
else if (numspaces > 0 && s[index-numspaces] != '_')
s[index-numspaces+1] = '\t';
for(j=index-numspaces+1; s[j] != '\0'; ++j)
s[j] = s[j+numspaces-1];
s[++j] = '\0';
return numspaces-1;
}
return 0;
}

int getline(char s[], int lim)
{
int c, i;

for (i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s = c;
if (c == '\n')
s[i++] = c;
s = '\0';
return i;
}

Even now I see that I don't need string.h (at one point the
program used strlen(), but no more), and that the "j" index
in tabreplace() could just as easily reuse "i". Anything
else?
 
S

Seebs

What these two programs do is merely replace strings of
eight blanks with tabs, but no other strings of blanks. In
other words, they do the following; blanks indicate tabs,
while underscores indicate spaces.

Very good observation!
So am I missing something?

Sounds to me like you have it.
int tabreplace(char s[], int tabspot, int index);
Hmm.

int i; /* keep track of string index */
int j; /* keep track of tab stops */
char string[MAXLINE];
while(getline(string, MAXLINE) > 0) {
for(i=0,j=0; string != '\0'; ++i,++j)
i -= tabreplace(string,j,i);
printf("%s",string);
}


This fascinates me.

It appears that you're doing this all in place. I would not have
attempted that; I'd have copied strings.
if (((tabspot % TABSTOP) == 0) && (s[index] == '_')) {
for (i = index; (s == '_') && (i > (index-TABSTOP)); --i);
numspaces = (i>(index-TABSTOP)) ? index-i-1 : index-i;
if (numspaces > 0 && s[index-numspaces] == '_')
s[index-numspaces] = '\t';
else if (numspaces > 0 && s[index-numspaces] != '_')
s[index-numspaces+1] = '\t';
for(j=index-numspaces+1; s[j] != '\0'; ++j)
s[j] = s[j+numspaces-1];
s[++j] = '\0';
return numspaces-1;
}


This is interesting. My intuitive thought about how to do this would have
been:

Upon finding a space or tab, find the logical length of the next
sequence of spaces-and-tabs, then figure out whether one or more
tabs could be used for the initial segment.

Re-reading the problem description, though, it seems as though the intent
is to handle ONLY strings of blanks.

At this point, you do indeed seem to have spotted something: The only time
you can have any work to do is when you see a blank at a tab stop, because
then that blank, plus any preceeding blanks up to the previous tab stop,
can be replaced with a tab. Obviously, you don't have to look past the
previous tab stop, because you would have gotten it when you reached it
previously.

That said, this feels sort of busy and complicated. There's a lot of +1 and
-1 offsets, and it isn't totally clear to me that there would never be
a case in which the assignment of the '\0' could break something.

What happens if you have a single space which is right at the end of
a tab stop? It seems to me that you'll end up writing a tab followed by
a '\0' there -- and that the '\0' will end up overwriting the character
AFTER that space.

.... But it's been a long day, so don't rely too heavily on me. :)

-s
 
B

Ben Bacarisse

+AMDG

I've been working hard on the entab program in K&R Exercise
1-21, and I'm happy to say I've found a solution. (I'm
pretty sure I have, anyway.) However, in the course of
doing so I compiled and ran the first two programs listed at
the clc-wiki site:

http://clc-wiki.net/wiki/K&R2_solutions:CHapter_1:Exercise_21

Should be:
http://clc-wiki.net/wiki/K&R2_solutions:Chapter_1:Exercise_21

I was surprised to find a blank page!
I'm not sure if I'm running the programs wrong or if I'm
misunderstanding the point of the exercise, but it seems to
me that neither of these really answer to the exercise.

What these two programs do is merely replace strings of
eight blanks with tabs, but no other strings of blanks.

That's wikis for you. When you are done post yours.
In
other words, they do the following; blanks indicate tabs,
while underscores indicate spaces.

TABS: | | | | | |
Input: |No_________is___the__________time____for
Output: |No _is___the __time____for

Sometimes this solution corresponds to that requested by
Exercise 1-21, but more often it doesn't.

As I'm reading it, Exercise 1-21 asks that *any* sequence of
blanks be replaced by the minimum number of tabs and blanks
to achieve the same spacing. In other words, they want
this:

TABS: | | | | | |
Input: |No_________is___the__________time____for
Output: |No ____is __the _time __for

So am I missing something?

No, I am sure that yo have understood what is wanted. The exercise is
silent on what should be done with tabs that are already there. It
makes the problem harder, so I think you are right to ignore this case
although a warning from the program would be simple and useful to users.
I'm quite confident that the solution I've come up with is
ugly, but it does work. How do you all think it might be
improved? Thanks.

I think you have some bugs. What were your test cases? Quite a few
things I tried failed:

$ ./entab
_abc
_____$ ./entab
abc_____def
abc_____def
abc______def
abc _def
$

You see that in some cases I don't even get all the non-space text
back.

I think this can be done without any line length restriction. I.e. it
is can be written as pure byte filter. I don't know if that makes it
harder, but often the discipline that comes from not having the full line
to look at help in the design. (I've not done this exercise so I
could have missed some key point that makes it hard or impossible to
do like this.)

<snip>
 
W

Willem

Ben Bacarisse wrote:
) I think this can be done without any line length restriction. I.e. it
) is can be written as pure byte filter. I don't know if that makes it
) harder, but often the discipline that comes from not having the full line
) to look at help in the design. (I've not done this exercise so I
) could have missed some key point that makes it hard or impossible to
) do like this.)

Well, let's do it then.

Basically, you count spaces instead of outputting them directly.
Whenever you encounter a tab or newline, you can forget about the spaces.
And whenever you're counting spaces and you cross a tabstop, you output a
tab and reset the space count.

int c, lp, sp;
lp = 0;
sp = 0;
while ((c = getchar()) != EOF) {
lp = (lp + 1) % 8;
if (lp == 0 && sp > 0) {
putchar('\t');
sp = 0;
}
if (c == ' ') {
sp++;
} else {
while (sp-- > 0) {
putchar(' ');
}
if (c == '\t' || c == '\n') {
lp = 0;
sp = 0;
}
putchar(c);
}
}


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
D

Don

Should be:http://clc-wiki.net/wiki/K&R2_solutions:Chapter_1:Exercise_21
I was surprised to find a blank page!

D'oh! Yes, thank you.
No, I am sure that yo have understood what is wanted.  The exercise is
silent on what should be done with tabs that are already there.  It
makes the problem harder, so I think you are right to ignore this case
although a warning from the program would be simple and useful to users.

I figured that tabs already present would already do what is needed.
I suppose once I get the rest of it right I can have it throw a
warning, though.
I think you have some bugs.  What were your test cases?  Quite a few
things I tried failed:

$ ./entab
_abc
_____$ ./entab
abc_____def
abc_____def
abc______def
abc     _def
$

Well, I didn't have a body of all-expansive tests; I just pounded in
things and watched them work, then quit after a few minutes when I
didn't spot any errors. (That is, when the resultant spacing was
always the same as the input spacing.)

The first example is hard for me to read. Did you put a newline into
the input? I'm breaking input into lines based on newlines, so I
think it's behaving properly in that case. You didn't get a tab there
because there weren't enough trailing spaces to warrant one. That
said, I didn't see what would happen when I typed in spaces at the
beginning of a line, so I can't say with any confidence that it would
work.

The other errors you list there---hmm. I think I follow you, though
you do seem to always get non-space text back (I'm not sure where
you're saying you lose it; I don't see it in the output you've posted
here). You've got "abc", which is three characters, plus five
underscores, making eight. The tab doesn't come in until you get to
nine characters, at which point the spacing is correct (you get a tab
and one underscore). The last example's spacing is correct, but the
first should have a tab. That shouldn't be hard to fix; I'll get on
it.
I think this can be done without any line length restriction.  I.e. it
is can be written as pure byte filter.  I don't know if that makes it
harder, but often the discipline that comes from not having the full line
to look at help in the design.  (I've not done this exercise so I
could have missed some key point that makes it hard or impossible to
do like this.)

Well, that's beyond my competence at this time. Maybe, once I've
learned what a byte filter is, I'll come back and do it.

As for the '\0' cutting things off, I don't see how that could
happen. I place it *after* I copy the string in place, so all the
characters in the line are already where they belong, and then I put
the '\0' once character beyond that. Am I missing something about
that?
 
B

Ben Bacarisse

Willem said:
Ben Bacarisse wrote:
) I think this can be done without any line length restriction. I.e. it
) is can be written as pure byte filter. I don't know if that makes it
) harder, but often the discipline that comes from not having the full line
) to look at help in the design. (I've not done this exercise so I
) could have missed some key point that makes it hard or impossible to
) do like this.)

Well, let's do it then.

Basically, you count spaces instead of outputting them directly.
Whenever you encounter a tab or newline, you can forget about the spaces.
And whenever you're counting spaces and you cross a tabstop, you output a
tab and reset the space count.

int c, lp, sp;
lp = 0;
sp = 0;
while ((c = getchar()) != EOF) {
lp = (lp + 1) % 8;
if (lp == 0 && sp > 0) {
putchar('\t');
sp = 0;
}
if (c == ' ') {
sp++;
} else {
while (sp-- > 0) {
putchar(' ');
}
if (c == '\t' || c == '\n') {
lp = 0;
sp = 0;
}
putchar(c);
}
}

That's the kind of thing I had in mind but the details are little off:

$ cat -A t
1234567 b$
123456 b$
12345 b$
1234 b$
123 b$
$ ./entab <t | cat -A
1234567b$
123456b$
12345b$
1234b$
123^I b$

Also, I'd add

while (sp-- > 0)
putchar(' ');

after the main loop to handle the case where spaces occur at the end
of the input.
 
B

Ben Bacarisse

[You should try to keep the attribution lines so people can tell who
wrote what.]


Well, I didn't have a body of all-expansive tests; I just pounded in
things and watched them work, then quit after a few minutes when I
didn't spot any errors. (That is, when the resultant spacing was
always the same as the input spacing.)

The first example is hard for me to read.

And it is going to get worse! Google groups does funny things to
spaces that it thinks are important (specifically it can change them
to a no-break space). The code you posted seemed to have '_' in the
code. I took that to be deliberate since ' ' is hard to read and _ is
just a good for testing your code. My tests had no spaces at all
since your program used _s.
Did you put a newline into
the input?
Yes.

I'm breaking input into lines based on newlines, so I
think it's behaving properly in that case.

I don't think so. Try <space>abc<newline> as the sole input (or, in
You didn't get a tab there
because there weren't enough trailing spaces to warrant one. That
said, I didn't see what would happen when I typed in spaces at the
beginning of a line, so I can't say with any confidence that it would
work.

The first character gives rise to a call similar to:

tabreplace("_abc\n", 0, 0);

let's see what that does:

int tabreplace(char s[], int tabspot, int index)
{
int i,j;
int numspaces;

if (((tabspot % TABSTOP) == 0) && (s[index] == '_')) {
for (i = index; (s == '_') && (i > (index-TABSTOP)); --i);

we get to loop and set i = 0. 0 > 0-8 is true, so we do --i. The
next test is then outside of the array (i.e. s[-1] is undefined).
Anything can happen from here on.
The other errors you list there---hmm. I think I follow you, though
you do seem to always get non-space text back (I'm not sure where
you're saying you lose it; I don't see it in the output you've posted
here). You've got "abc", which is three characters, plus five
underscores, making eight. The tab doesn't come in until you get to
nine characters, at which point the spacing is correct (you get a tab
and one underscore). The last example's spacing is correct, but the
first should have a tab. That shouldn't be hard to fix; I'll get on
it.

Well, don't worry about my tests. Do your own but be thorough about
it.
Well, that's beyond my competence at this time. Maybe, once I've
learned what a byte filter is, I'll come back and do it.

I think it's simpler, though the details are fiddly. Mind you, the
details of your version are fiddley too.
As for the '\0' cutting things off, I don't see how that could
happen. I place it *after* I copy the string in place, so all the
characters in the line are already where they belong, and then I put
the '\0' once character beyond that. Am I missing something about
that?

I don't think I said anything about that. In fact, once your loop has
set i < 0, all bets are off, so you don't know where the '\0' is
going, but I don't think I said anything about that before.
 
W

Willem

Ben Bacarisse wrote:
)
)> Ben Bacarisse wrote:
)> ) I think this can be done without any line length restriction. I.e. it
)> ) is can be written as pure byte filter. I don't know if that makes it
)> ) harder, but often the discipline that comes from not having the full line
)> ) to look at help in the design. (I've not done this exercise so I
)> ) could have missed some key point that makes it hard or impossible to
)> ) do like this.)
)>
)> Well, let's do it then.
)>
)> Basically, you count spaces instead of outputting them directly.
)> Whenever you encounter a tab or newline, you can forget about the spaces.
)> And whenever you're counting spaces and you cross a tabstop, you output a
)> tab and reset the space count.
)>
)> int c, lp, sp;
)> lp = 0;
)> sp = 0;
)> while ((c = getchar()) != EOF) {
)> lp = (lp + 1) % 8;
)> if (lp == 0 && sp > 0) {
)> putchar('\t');
)> sp = 0;
)> }
)> if (c == ' ') {
)> sp++;
)> } else {
)> while (sp-- > 0) {
)> putchar(' ');
)> }
)> if (c == '\t' || c == '\n') {
)> lp = 0;
)> sp = 0;
)> }
)> putchar(c);
)> }
)> }
)
) That's the kind of thing I had in mind but the details are little off:
)
) $ cat -A t
) 1234567 b$
) 123456 b$
) 12345 b$
) 1234 b$
) 123 b$
) $ ./entab <t | cat -A
) 1234567b$
) 123456b$
) 12345b$
) 1234b$
) 123^I b$
)
) Also, I'd add
)
) while (sp-- > 0)
) putchar(' ');
)
) after the main loop to handle the case where spaces occur at the end
) of the input.

I had intended to drop spaces at end of lines.
Swap around the putchar(' ') loop and the \n\t recognition.

Perhaps this is one of those cases where loop invariants and stuff
like that would come in really handy to get a correct program.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
D

dgoodmaniii

Ben Bacarisse said:
[You should try to keep the attribution lines so people can tell who
wrote what.]

Yes, sorry. All my responses here are to you.
And it is going to get worse! Google groups does funny things to
spaces that it thinks are important (specifically it can change them
to a no-break space). The code you posted seemed to have '_' in the
code. I took that to be deliberate since ' ' is hard to read and _ is
just a good for testing your code. My tests had no spaces at all
since your program used _s.

It was deliberate. Now that I'm at home and can use a real
newsreader (tin, for me), I won't have that problem.
I don't think so. Try <space>abc<newline> as the sole input (or, in
the version you posted, _abc<newline>) and I get a line consisting of
nothing but spaces (_s in using the posted code).

You're right. Thank you for pointing it out.
if (((tabspot % TABSTOP) == 0) && (s[index] == '_')) {
for (i = index; (s == '_') && (i > (index-TABSTOP)); --i);

we get to loop and set i = 0. 0 > 0-8 is true, so we do --i. The
next test is then outside of the array (i.e. s[-1] is undefined).
Anything can happen from here on.


Gah! Stupid. I changed the loop indices in main() to start
at 1, which seems to have solved that problem. (At least, I
get the result that I expect.)

There still remains the bug about it not working there's no
space *after* the tab (your test, "abc_____def", doesn't
work, while adding one more space works precisely as
expected). I don't have time for that right now, but I'll
get to it later.

Thanks again for all your help; this really is making me a
better programmer (I think; it's helping me think more
clearly about things, anyway).
 
H

Hallvard B Furuseth

Ben said:
No, I am sure that yo have understood what is wanted. The exercise is
silent on what should be done with tabs that are already there.

Tabs affect spacing, so to "achieve the same spacing" you must consider
them. E.g. track consecutive spaces and positions left to next tab stop:

void
tabify(char *s, int tab_width)
{
char *dest = s, ch, prev = '\0';
int spaces = 0, to_tab = tab_width;
do {
ch = *s++;
if (--to_tab && ch != '\t') {
spaces = ch == ' ' ? spaces + 1 : 0;
} else {
if (spaces && (ch == ' ' || ch == '\t')) {
dest -= spaces; /* multiple space/tab -> tab */
ch = '\t';
}
if (ch == '\t' && prev == ' ')
dest[-1] = '\t'; /* never emit " \t", use "\t\t" */
spaces = 0;
to_tab = tab_width;
}
} while ((*dest++ = prev = ch) != '\0');
}
 
D

dgoodmaniii

+AMDG

Thanks for all your help, everyone. I've got what appears
to be a (mostly) working version; it's still throwing some
strange bugs with very long lines, which I can't reproduce
with any reliability. However, tabs now work as expected in
all other circumstances. I suppose I can stop worrying
about long lines if I pass the input to the entab program
through my linefold program from exercise 1-22 first.

I still don't take any account of tabs, mostly because I'm
extremely tired right now. I may add that later. But I
wanted to let you all know that, thanks to your help, I've
resolved the bugs you've identified.

The new text is below. Since it's still throwing bugs now
and again, I'm not going to post it on the wiki until I've
figured that out; I post it here so you can see how I solved
things. Thanks again.

/*
* A program that replaces a sequence of blanks by the
* minimum number of tabs and blanks necessary to achieve
* the same spacing. Gives preference to tabs when either a
* tab or a single blank would suffice to reach a tab stop.
*
* This program is released under the GNU General Public
* License, version 3.
*
*/

#include<stdio.h>
#include<string.h>

#define MAXLINE 10000
#define TABSTOP 8

int getline(char s[], int lim);
int tabreplace(char s[], int tabspot, int index);

int main(void)
{
int i; /* keep track of string index */
int j; /* keep track of tab stops */
char string[MAXLINE];

while(getline(string, MAXLINE) > 0) {
for(i=1,j=1; string != '\0'; ++i,++j)
i -= tabreplace(string,j,i);
printf("%s",string);
}
return 0;
}

int tabreplace(char s[], int tabspot, int index)
{
int i;
int numspaces;

if (((tabspot % TABSTOP) == 0) && (s[index-1] == '_')) {
for (i = index-1; (s == '_') && (i > (index-TABSTOP)); --i);
numspaces = (i>(index-TABSTOP)) ? index-i-1 : index-i;
if (numspaces > 0)
s[index-numspaces] = '\t';
for(i=index-numspaces+1; s != '\0'; ++i)
s = s[i+numspaces-1];
s[++i] = '\0';
return numspaces-1;
}
return 0;
}

int getline(char s[], int lim)
{
int c, i;

for (i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s = c;
if (c == '\n')
s[i++] = c;
s = '\0';
return i;
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

Fibonacci 0
K&R exercise 1-18 10
K$R xrcise 1-13 (histogram) 4
K&R2, exercise 4-2 12
Beginner at c 0
K&R2, exercise 1-19 16
K&R exercise 4-1 4
K&R2, exercise 1-23 12

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top