How to get the total row number of a text file

Alex

CBFalconer said:
Charles M. Reinke said:
ThunderBird said:
it mentioned "You could count the numbers of linebreaks. ".
My question is how to count the numbers of linebreaks?

This works for me...

cmreinke@hologram>cat lines.c
/* lines.c */
#include <stdio.h>

int main(int argc, char **argv) {
    long lines=0;
    FILE *fp;

    if(argc>1) {
        if((fp=fopen(argv[1], "r"))) {
            while(fscanf(fp, "%*[^\n]\n")!=EOF) lines++;
            printf("Number of lines in file \"%s\": %ld\n", argv[1], lines);
        } /* if */
        else printf("ERROR: could not open file \"%s\"\n", argv[1]);
    } /* if */
    else printf("ERROR: no input file specified\n");

    return 0;
} /* main */
/* lines.c */


Why so complex? Use stdin. :)

[1] c:\c\wc>cat lc.c
#include <stdio.h>

int main(void)
{
    unsigned long chars, lines;
    int ch;

    chars = lines = 0;
    while (EOF != (ch = getchar())) {
        chars++;
        if ('\n' == ch) lines++;
    }
    printf("%lu chars in %lu lines\n", chars, lines);
    return 0;
} /* main lines */

[1] c:\c\wc>cc lc.c -o lc.exe

[1] c:\c\wc>.\lc < lc.c
285 chars in 15 lines

WOW! Although not the original poster, I tried that. It works sweet. Thnx.
 
Walter Roberson

Do we have any way to tell what the newline character is on our OS? If it
is not '\n', then Charles' fscanf statement has to be changed.
I mean, if I have a text file that came from another OS in binary
mode, and I want a general program to deal with it, what should I do?

In such a case, you would need to probe the newline conventions of
the -other- OS, not of your own OS ;-)

Actually it's worse than that: you need to know how the file
gets transformed during the transfer. For example, the original
text file might happen to be stored in a format that is not
stream-oriented, such as fixed record length, or counted-length, or
using lineprinter encoding (e.g., one particular character in
the first column might indicate overstrike on the same line,
whereas a different one might indicate end-of-page with an implication
that all remaining lines on the page "exist" but read as empty).
Or the file might be stored in a semi-database format with a series
of descriptors of the individual lines, followed by the data -- but
the data in binary format will not necessarily be in-order compared
to the line sequence imposed by the header (e.g., because the
file was edited or 'patched'.)

One system's internal format might not be easily representable as
a binary stream during file transfer -- some of the crucial parameters
might have been stored as meta-data for example... Hence one must know
how the binary transfer process reshaped the data. What got to the
other side might not be the same thing as the way the same file would
have read out in "binary mode" on the original OS. C doesn't promise
that "binary" files will contain only exactly the bytes one writes out:
C's promise is that if one writes in binary mode, that the byte stream
read back in will exactly match what was written out. There could
be an intermediate representation layer [as long as that layer supports
seeking and writing at arbitrary positions.]
 
akarl

CBFalconer said:
Charles M. Reinke said:
ThunderBird said:
it mentioned "You could count the numbers of linebreaks. ".
My question is how to count the numbers of linebreaks?

This works for me...

cmreinke@hologram>cat lines.c
/* lines.c */
#include <stdio.h>

int main(int argc, char **argv) {
long lines=0;
FILE *fp;

if(argc>1) {
if((fp=fopen(argv[1], "r"))) {
while(fscanf(fp, "%*[^\n]\n")!=EOF) lines++;
printf("Number of lines in file \"%s\": %ld\n", argv[1], lines);
} /* if */
else printf("ERROR: could not open file \"%s\"\n", argv[1]);
} /* if */
else printf("ERROR: no input file specified\n");

return 0;
} /* main */
/* lines.c */


Why so complex? Use stdin. :)

[1] c:\c\wc>cat lc.c
#include <stdio.h>

int main(void)
{
unsigned long chars, lines;
int ch;

chars = lines = 0;
while (EOF != (ch = getchar())) {
chars++;
if ('\n' == ch) lines++;
}
printf("%lu chars in %lu lines\n", chars, lines);
return 0;
} /* main lines */

Well, we can still make it terser ;-)

#include <stdio.h>

int main(void)
{
    int count = 0, c;

    while (((c = getchar()) != EOF) && ((count += (c == '\n')), 1));
    printf("%i\n", count);
    return 0;
}


August
 
Charles M. Reinke

akarl said:
Well, we can still make it terser ;-)

#include <stdio.h>

int main(void)
{
int count = 0, c;

while (((c = getchar()) != EOF) && ((count += (c == '\n')), 1));
printf("%i\n", count);
return 0;
}


August

Wow, not bad at all...thatz kinda cute.

Thus, it would appear that I've initiated some sort of contest: write a
program to count the number of lines in a file and output the result to
stdout, using the least number of characters for the source code. Of
course, the usual caveats apply, i.e. must use portable Standard C, must
compile without errors and not dump core during *normal* operation, etc.
Any challengers? :p

-Charles
 
Suman

akarl said:
CBFalconer said:
Charles M. Reinke said:
ThunderBird said:

it mentioned "You could count the numbers of linebreaks. ".
My question is how to count the numbers of linebreaks?

This works for me...

cmreinke@hologram>cat lines.c
/* lines.c */
#include <stdio.h>

int main(int argc, char **argv) {
long lines=0;
FILE *fp;

if(argc>1) {
if((fp=fopen(argv[1], "r"))) {
while(fscanf(fp, "%*[^\n]\n")!=EOF) lines++;
printf("Number of lines in file \"%s\": %ld\n", argv[1], lines);
} /* if */
else printf("ERROR: could not open file \"%s\"\n", argv[1]);
} /* if */
else printf("ERROR: no input file specified\n");

return 0;
} /* main */
/* lines.c */


Why so complex? Use stdin. :)

[1] c:\c\wc>cat lc.c
#include <stdio.h>

int main(void)
{
unsigned long chars, lines;
int ch;

chars = lines = 0;
while (EOF != (ch = getchar())) {
chars++;
if ('\n' == ch) lines++;
}
printf("%lu chars in %lu lines\n", chars, lines);
return 0;
} /* main lines */

Well, we can still make it terser ;-)

#include <stdio.h>

int main(void)
{
int count = 0, c;

while (((c = getchar()) != EOF) && ((count += (c == '\n')), 1));

I'd prefer:
while ((c = getchar()) != EOF) count += (c =='\n');
saves a few precious chars, makes the intent clear etc etc...
 
akarl

Charles said:
Wow, not bad at all...thatz kinda cute.

Thus, it would appear that I've initiated some sort of contest: write a
program to count the number of lines in a file and output the result to
stdout, using the least number of characters for the source code. Of
course, the usual caveats apply, i.e. must use portable Standard C, must
compile without errors and not dump core during *normal* operation, etc.

My code sure does.

(In practice we should strive for clarity and maintainability rather
than write terse and obscure code such as what I presented.
Still, C makes you want to do the opposite ;-)


August
 
akarl

Suman said:
I'd prefer:
while ((c = getchar()) != EOF) count += (c =='\n');
saves a few precious chars, makes the intent clear etc etc...

Yes, of course. That's shorter *and* clearer. (Stupid me)

August
 
Anonymous 7843

Thus, it would appear that I've initiated some sort of contest: write a
program to count the number of lines in a file and output the result to
stdout, using the least number of characters for the source code. Of
course, the usual caveats apply, i.e. must use portable Standard C, must
compile without errors and not dump core during *normal* operation, etc.
Any challengers? :p

Well, here's the impertinent solution...

#include <stdlib.h>
int main(void)
{
    system("wc -l");
    exit(0);
}
 
Anonymous 7843

Well, here's the impertinent solution...

#include <stdlib.h>
int main(void)
{
system("wc -l");
exit(0);
}

I might as well make a two-liner out of that.
Leave out the #include and it could be a one-liner if you don't
mind a few warnings from the compiler.

#include <stdlib.h>
int main(void) { exit(system("wc -l")); }
 
Walter Roberson

I might as well make a two-liner out of that.
Leave out the #include and it could be a one-liner if you don't
mind a few warnings from the compiler.
#include <stdlib.h>
int main(void) { exit(system("wc -l")); }

int system(const char*); int main(void){return system("wc -l");}

You can probably remove the 'const'. I would need to dig through
the standards to determine whether it is safe to use

int system(); main(void){return system("wc -l");}

K&R C pretty much had to guarantee that char* was promoted to long
safely. If I recall correctly, K&R2 promises that a pointer
can be converted to long and back again safely, but that C89's
wording is different enough to make this technically unsafe.
 
Anonymous 7843

int system(const char*); int main(void){return system("wc -l");}

You can probably remove the 'const'. I would need to dig through
the standards to determine whether it is safe to use

int system(); main(void){return system("wc -l");}

K&R C pretty much had to guarantee that char* was promoted to long
safely. If I recall correctly, K&R2 promises that a pointer
can be converted to long and back again safely, but that C89's
wording is different enough to make this technically unsafe.

I was a bit surprised that "gcc -ansi" (3.3) quietly compiles this:

main(){return system("wc -l");}
 
Flash Gordon

As I understand it, pointers don't undergo any conversion when passed as
parameters to a function without a prototype, so as long as you are
passing the correct type of pointer (or a compatible one) then it is
safe. Not something I would ever choose to do, but safe.

I was a bit surprised that "gcc -ansi" (3.3) quietly compiles this:

main(){return system("wc -l");}

I'm not. It is perfectly valid on a machine where passing "wc -l" to
system is valid.
 
Suman

Anonymous said:
I might as well make a two-liner out of that.
Leave out the #include and it could be a one-liner if you don't
mind a few warnings from the compiler.

#include <stdlib.h>
int main(void) { exit(system("wc -l")); }

1. Relying on the system function is not portable at all, as I found
out the hard way. Its behaviour is implementation-defined, and I can
give you an example right now -- CodeWarrior on Macintosh systems,
where, although the `wc' command is available, system() does nothing.

2. `wc' itself is not portable.

3. And int main() will do just as fine as int main(void),
when you don't mind not including libraries ;-)
 
Peter Pichler

Flash said:
I'm not. It is perfectly valid on a machine where passing "wc -l" to
system is valid.

It is perfectly valid (for the purpose of compilation) even on a machine
where it is not. I believe that Anonymous' surprise originated from gcc
not objecting to a missing prototype for system().

Peter
 
Walter Roberson

It is perfectly valid (for the purpose of compilation) even on a machine
where it is not. I believe that Anonymous' surprise originated from gcc
not objecting to a missing prototype for system().

That part is well defined by C89: if a previously-undeclared
identifier is found in a function-call position, then the identifier
is implicitly declared as an unprototyped function returning int.

The part that I'm pondering is why the main() did not generate a
warning or diagnostic. C89 says that the implementation does not
define any prototype for main, but it appears to list only two forms
as being valid, with the simpler of the two having (void) .

I have not been able to find anything in C89 that would explicitly
allow the prototypeless main(), so it seems to me that using
the prototypeless version would be a system extension
(just like the 3-parameter version that includes an environment
pointer), and I would have expected -pedantic to give a murmur
about it.
 
Chris Torek

cmreinke@hologram>cat lines.c
/* lines.c */
#include <stdio.h>

int main(int argc, char **argv) {
    long lines=0;
    FILE *fp;

    if(argc>1) {
        if((fp=fopen(argv[1], "r"))) {
            while(fscanf(fp, "%*[^\n]\n")!=EOF) lines++;
            printf("Number of lines in file \"%s\": %ld\n", argv[1], lines);
        } /* if */
        else printf("ERROR: could not open file \"%s\"\n", argv[1]);
    } /* if */
    else printf("ERROR: no input file specified\n");

    return 0;
} /* main */
/* lines.c */

This program has what might be a flaw (given that, as far as I
know, no one has ever defined "total row number" in this thread,
so it is hard to say how to count "number of rows").

The program *does* compile and run, but:

% cat in
line 1


line 4
line 5
% wc -l in
5 in
% ./lines in
Number of lines in file "in": 3
%

The exercise for the reader is to figure out why these numbers
are different. (I know why, so you do not need to tell me.)

Note that the program gives the same result if I change the
fscanf() call to use "%*[^\n] " or even "%*[^\n]\f".
 
Dave Thompson

fscanf is for use with text mode streams, so the C library should do any
conversion required for files correct for that OS. But...

Yes and no. All of the stdio routines (*get/putc/char, *printf/scanf,
fread/fwrite) are defined to work on both text and binary streams.

It is rare but does sometimes happen that actual binary files contain
data that can usefully be processed by *printf or even *scanf. There
are also situations where it makes sense to access as binary an actual
text file, or mostly text file (e.g. a tar or ar of texts).

... you are now talking about files from a foreign OS. If the file has
been imported as a binary copy, the C library on your machine has no
idea how to handle it. For the most common ones these days, you can
fairly easily write your own code to read a line, though. Mine uses an
algorithm something like: <snip>
That copes with lines ending with CR, LF, CRLF and LFCR, as long as
there are no stray CR or LF characters which are supposed to be part of
the data (which is silly but occasionally happens).

This is certainly most cases. Historically there have been systems (or
formats) where FF and maybe VT was also treated as line terminator,
sometimes with CR and sometimes not (as you have for LF). And I've
heard rumors someone actually used RS (and maybe GS) as intended.

Of course, your foreign file might have come from a system where each
line is represented as a 2 byte count followed by that many characters
in the line, or where all files are kept in a compressed form, or text
files have a header stating the line width and all lines are constant
width, or something more strange, and there is no way that you can
automatically detect all of the possibilities...

Or a 4-byte count (or two of them) as in OS/360 et seq V. I don't
know if there is an FTP for those systems that will transmit that
format in binary, but it's certainly possible to find on tape or disk.

Or, as you say, even more strange. I use one that does 2K blocks, in
arbitrary order with an index, each containing a 2-byte header then a
series of lines each with a 5-byte header and compressed data, and
_usually_ a 1-byte block trailer but not if the block is exactly full.
And I wrote a utility to fix files that were mistakenly transferred
binary and couldn't easily be retransferred correctly.

- David.Thompson1 at worldnet.att.net
 
