How do I find out the number of lines in a file?


Vasilis Serghi

Presently I define the number of lines to be expected in a file when
defining the array size and initialising this array. This works
fine for now, but I'm sure that in the future this could change. So rather
than explicitly defining the expected number, I want to read the file and
determine the number that way.

So I have a csv file that I can read in, how can I work out the number of
lines in the file? Is there a function that can do this, or do I need to
code this myself?

Vas.
 

Stefano Ghirlanda

Vasilis Serghi said:
So I have a csv file that I can read in, how can I work out the
number of lines in the file? Is there a function that can do this,
or do I need to code this myself?

Just count the number of \n characters in the file (or whatever
character combination the system uses for "end of line").

From what you say it seems you already know the maximum length of a
line. If not, you can add a counter to keep track of how long the
longest line is, to be sure to allocate enough memory.
 

Flash Gordon

Presently I define the number of lines to be expected in a file when
defining the array size and initialising this array. This
works fine for now, but I'm sure that in the future this could change.

If you were sure that it would not change, you could be certain some
abstrad would change it :)
So rather
than explicitly defining the expected number, I want to read the file and
determine the number that way.

So I have a csv file that I can read in, how can I work out the number
of lines in the file? Is there a function that can do this, or do I
need to code this myself?

You will have to work it out yourself by counting the lines as you read
them in. If you are trying to load all of the data into memory, then you
will need to malloc the space and then realloc as necessary.
 

Christopher Benson-Manica

Stefano Ghirlanda said:
From what you say it seems you already know the maximum length of a
line. If not, you can add a counter to keep track of how long the
longest line is, to be sure to allocate enough memory.

Who needs to allocate memory? fgetc() can handle this job quite well.
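A minimal sketch of that fgetc() approach, assuming the stream was opened in text mode ("r"). The function name count_newlines is hypothetical. No line buffer is needed, so line length never matters; note that it counts newline characters, so a final line without a trailing '\n' is not included.

```c
#include <stdio.h>

/* Count '\n' characters from the current position of fp to end of file.
   Returns the count, or -1 if a read error occurred. */
long count_newlines(FILE *fp)
{
    long count = 0;
    int ch;                 /* int, not char, so EOF can be distinguished */

    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            ++count;
    }
    return ferror(fp) ? -1 : count;
}
```

Call rewind(fp) afterwards if you want to read the data itself from the start of the file.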
 

Lewis Bowers

Vasilis said:
Presently I define the number of lines to be expected in a file when
defining the array size and initialising this array. This works
fine for now, but I'm sure that in the future this could change. So rather
than explicitly defining the expected number, I want to read the file and
determine the number that way.

So I have a csv file that I can read in, how can I work out the number of
lines in the file? Is there a function that can do this, or do I need to
code this myself?

You would code it yourself using some functions from the Standard C library.

You could experiment with dynamically allocated storage for the array as you
read the data file. As you do this, keep track of the number of elements
allocated. How this is done depends on the array data type and the file format.

You could put this all in a struct data type.
For example, if the data was type int:

#include <stdlib.h> /* for realloc */

typedef struct DATA
{
    int *array;   /* holds the data */
    size_t size;  /* keep track of the number of elements */
}DATA;

Write a function to add data to the array.

int *AddDATA(DATA *p, int data)
{
    int *tmp;
    size_t cnt = p->size;

    if((tmp = realloc(p->array, (cnt+1)*sizeof *tmp)) == NULL) return NULL;
    p->array = tmp;
    p->array[p->size++] = data;
    return &p->array[cnt];
}

In main you would declare the struct:
DATA myData = {NULL};

Then open the file and, as you read each item of data, call the function to
store the data dynamically in the array.

Beware that even dynamic allocations are not limitless. Available memory
may be exhausted to the point where no more storage can be made available
for the array. A NULL return from the function indicates that the
allocation failed.
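A sketch of how AddDATA might be driven while reading the file, assuming the CSV holds only integers separated by commas and/or whitespace. The helper load_ints is hypothetical; the DATA type and AddDATA are Lewis's, repeated so the block stands alone.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct DATA {
    int *array;    /* holds the data */
    size_t size;   /* number of elements stored */
} DATA;

/* Lewis Bowers' AddDATA, unchanged apart from layout. */
int *AddDATA(DATA *p, int data)
{
    int *tmp;
    size_t cnt = p->size;

    if ((tmp = realloc(p->array, (cnt + 1) * sizeof *tmp)) == NULL)
        return NULL;       /* p->array is still valid and still owned */
    p->array = tmp;
    p->array[p->size++] = data;
    return &p->array[cnt];
}

/* Hypothetical helper: append every integer in fp to d.
   Returns 0 on success, -1 if memory ran out. */
int load_ints(FILE *fp, DATA *d)
{
    int value;

    /* "%*[,]" swallows an optional comma; when no comma follows, fscanf
       still returns 1 because the %d conversion already succeeded. */
    while (fscanf(fp, "%d%*[,]", &value) == 1) {
        if (AddDATA(d, value) == NULL)
            return -1;
    }
    return 0;
}
```

When finished, d.size holds the element count and free(d.array) releases the storage. Growing by one element per call is simple but copies a lot on large files; doubling the capacity, as in Derk's snippet, is a common alternative.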
 

Dan Pop

Stefano Ghirlanda said:
Just count the number of \n characters in the file (or whatever
character combination the system uses for "end of line").

If he opens the file in text mode, as he is supposed to do, it's '\n'
*regardless* of the local system conventions. It's the job of the C
runtime support to transparently handle the conversions between '\n'
and whatever the local system uses, on text streams (but not on binary
streams).

Dan
 

Stefano Ghirlanda

If he opens the file in text mode, as he is supposed to do, it's
'\n' *regardless* of the local system conventions. It's the job of
the C runtime support to transparently handle the conversions
between '\n' and whatever the local system uses, on text streams
(but not on binary streams).

Even better. I was uncertain whether this was the case, so I threw in
a warning.
 

Rouben Rostamian

Just count the number of \n characters in the file (or whatever
character combination the system uses for "end of line").

But beware that the number of lines may be 1 more than the number
of \n characters if the file does not end in \n.

The C standard assumes that a text stream is terminated by a
newline. Your CSV file, however, may have been created by
a utility which may not have heard of the C standard.
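One way to honour that caveat is to remember the last character read and count a trailing partial line. This is a minimal sketch assuming you do want such trailing data to count as a line; the name count_lines_lenient is hypothetical.

```c
#include <stdio.h>

/* Count lines, treating any characters after the last '\n' as one more
   line. Returns the count, or -1 on a read error. */
long count_lines_lenient(FILE *fp)
{
    long lines = 0;
    int ch, prev = '\n';   /* pretend we start just after a newline */

    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            ++lines;
        prev = ch;
    }
    if (ferror(fp))
        return -1;
    if (prev != '\n')      /* file ended mid-line: count the partial line */
        ++lines;
    return lines;
}
```

An empty file gives 0, and "a\nb" and "a\nb\n" both give 2.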
 

Default User

Lewis said:
You could experiment with dynamically allocated storage for the array as you
read the data file. As you do this, keep track of the number of elements
allocated. How this is done depends on the array data type and the file format.


I would also strongly consider a linked list for storing the strings. If
your access to the strings once loaded is essentially sequential, then
that's a viable solution.


Brian Rodenborn
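A minimal sketch of the linked-list idea: one node per line, appended at the tail, with a running count, so nothing needs to be known up front. All names here (line_node, line_list, list_append, list_free, read_lines) are hypothetical, and read_lines assumes no line exceeds 254 characters.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct line_node {
    char *text;
    struct line_node *next;
};

struct line_list {
    struct line_node *head, *tail;
    size_t count;              /* number of lines stored */
};

/* Append a copy of s. Returns 0 on success, -1 if out of memory. */
static int list_append(struct line_list *l, const char *s)
{
    struct line_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->text = malloc(strlen(s) + 1);
    if (n->text == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->text, s);
    n->next = NULL;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;           /* first node */
    l->tail = n;
    l->count++;
    return 0;
}

static void list_free(struct line_list *l)
{
    struct line_node *n = l->head, *next;
    while (n != NULL) {
        next = n->next;
        free(n->text);
        free(n);
        n = next;
    }
    l->head = l->tail = NULL;
    l->count = 0;
}

/* Store every line of fp without its '\n'. Returns 0 or -1 on error. */
int read_lines(FILE *fp, struct line_list *l)
{
    char buf[256];
    while (fgets(buf, sizeof buf, fp) != NULL) {
        buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if any */
        if (list_append(l, buf) != 0)
            return -1;
    }
    return ferror(fp) ? -1 : 0;
}
```

Walking the list from head to tail gives the lines in file order, which suits the sequential access Brian mentions; random access by index would favour the array approach instead.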
 

Dan Pop

Rouben Rostamian said:
But beware that the number of lines may be 1 more than the number
of \n characters if the file does not end in \n.

Does that extra bunch of characters after the last newline qualify as a
line?

Dan
 

Derk Gwen

# Presently I define the number of lines to be expected in a file when
# defining the array size and initialising this array. This works
# fine for now, but I'm sure that in the future this could change. So rather
# than explicitly defining the expected number, I want to read the file and
# determine the number that way.
#
# So I have a csv file that I can read in, how can I work out the number of
# lines in the file? Is there a function that can do this, or do I need to
# code this myself?

Reallocate the array of lines as needed.

char *a,**line = 0; int m = 0, n= 0;

a = next line of input;
if (n+1>=m) {m = 2*(n+1); line = realloc(line,sizeof(char*)*m);}
line[n++] = a;
 

nrk

Vasilis said:
Presently I define the number of lines to be expected in a file when
defining the array size and initialising this array. This works
fine for now, but I'm sure that in the future this could change. So rather
than explicitly defining the expected number, I want to read the file and
determine the number that way.

So I have a csv file that I can read in, how can I work out the number of
lines in the file? Is there a function that can do this, or do I need to
code this myself?

You'll have to code this yourself. However, if you don't mind traversing
the file twice, things are considerably simpler. Let's for the moment
assume that you have a fixed maximum length for your lines (this artificial
restriction can be removed as well). So, here's what you can do:

#include <stdio.h>
#include <assert.h>

int count_lines(FILE *fp, int *nlptr, int *mlptr) {
    int nlines = 0, maxlen = 0;
    int ch, len = 0, rval;
    fpos_t pos;

    rval = fgetpos(fp, &pos);
    if ( rval )
        return 1; /* didn't read fp at all */

    maxlen = 0;
    while ( (ch = fgetc(fp)) != EOF ) {
        if ( ch == '\n' ) {
            ++nlines;
            maxlen = (len > maxlen) ? len : maxlen;
            len = 0;
        }
        else ++len;
    }

    if ( ferror(fp) )
        return 2; /* read something */

    if ( len ) {
        ++nlines;
        maxlen = (len > maxlen) ? len : maxlen;
    }

    *nlptr = nlines;
    *mlptr = maxlen;

    rval = fsetpos(fp, &pos);
    if ( rval )
        return 3; /* read everything, but failed to return back */
    return 0;
}

That should give you both the line count and the length of the longest
line (assuming you didn't encounter any errors in the meantime), and it
restores the file position indicator for the stream to where it was before
the call was made. You can use the maximum length for allocation purposes
(wasteful maybe, but that really depends on your input) or to check whether
the file meets your expectations for the maximum line length.

-nrk.
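A sketch of the second pass nrk describes: count first, then allocate exactly nlines pointers and a single read buffer of maxlen + 2 bytes (room for '\n' and '\0'). count_lines is a condensed copy of nrk's function above (same logic) so the block stands alone; load_lines is a hypothetical helper, and lengths are assumed to fit in an int.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Condensed copy of nrk's count_lines: 0 on success, 1..3 on error. */
int count_lines(FILE *fp, int *nlptr, int *mlptr)
{
    int nlines = 0, maxlen = 0, ch, len = 0;
    fpos_t pos;

    if (fgetpos(fp, &pos))
        return 1;
    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n') {
            ++nlines;
            if (len > maxlen) maxlen = len;
            len = 0;
        } else {
            ++len;
        }
    }
    if (ferror(fp))
        return 2;
    if (len) {                          /* unterminated last line */
        ++nlines;
        if (len > maxlen) maxlen = len;
    }
    *nlptr = nlines;
    *mlptr = maxlen;
    return fsetpos(fp, &pos) ? 3 : 0;
}

/* Hypothetical: return an array of malloc'd lines (without '\n') and
   store in *count how many were copied. Caller frees each line and the
   array. Returns NULL if counting or the initial allocation fails. */
char **load_lines(FILE *fp, int *count)
{
    int nlines, maxlen, i = 0;
    char **lines, *buf;

    if (count_lines(fp, &nlines, &maxlen) != 0)
        return NULL;
    lines = malloc((nlines ? nlines : 1) * sizeof *lines);
    buf = malloc((size_t)maxlen + 2);   /* longest line + '\n' + '\0' */
    if (lines == NULL || buf == NULL) {
        free(lines);
        free(buf);
        return NULL;
    }
    while (i < nlines && fgets(buf, maxlen + 2, fp) != NULL) {
        buf[strcspn(buf, "\n")] = '\0';
        lines[i] = malloc(strlen(buf) + 1);
        if (lines[i] == NULL)
            break;                      /* *count reports what was copied */
        strcpy(lines[i], buf);
        ++i;
    }
    free(buf);
    *count = i;
    return lines;
}
```

Reading the file twice trades I/O for simpler memory management: no realloc is needed because both sizes are known before the second pass.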
 

Rouben Rostamian

Does that extra bunch of characters after the last newline qualify as a
line?

That depends on the file's creator's concept of a "line".
If the file was derived from a spreadsheet, that last
pseudo-line may contain the balance of your bank account.
You don't want to be too cavalier in discarding it.
 

CBFalconer

Rouben said:
That depends on the file's creator's concept of a "line".
If the file was derived from a spreadsheet, that last
pseudo-line may contain the balance of your bank account.
You don't want to be too cavalier in discarding it.

The simple answer is to use an input mechanism that defines the
action (as far as is feasible) on that possibly orphan last line:

#include <stdio.h>
#include <stdlib.h>
#include "ggets.h"

#define useup free

int main(void)
{
    char *ln;
    size_t lines;

    lines = 0;
    while (0 == ggets(&ln)) {
        lines++;
        useup(ln);
    }
    printf("%lu lines\n", (unsigned long)lines);
    return 0;
} /* untested */

and you can get ggets.c and ggets.h (in standard C) at:

<http://cbfalconer.home.att.net/download/ggets.zip>
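ggets itself is in the download above. To illustrate the same idea in plain standard C, here is a hypothetical read_line (not ggets's code or interface) that reads a line of any length into a malloc'd string and still returns a final line that lacks '\n'. It doubles its buffer as needed, starting from an assumed 64 bytes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length from fp, without its '\n'.
   Returns a malloc'd string the caller must free, or NULL at end of
   file (or if memory runs out). A last line lacking '\n' is returned. */
char *read_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap), *tmp;
    int ch;

    if (buf == NULL)
        return NULL;
    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {               /* keep room for '\0' */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {            /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Counting then mirrors CBFalconer's loop: while ((ln = read_line(fp)) != NULL) { lines++; free(ln); }. Unlike ggets, a NULL here does not distinguish end of file from an allocation failure; check feof(fp) if that matters.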
 

Dan Pop

Rouben Rostamian said:
That depends on the file's creator's concept of a "line".
If the file was derived from a spreadsheet, that last
pseudo-line may contain the balance of your bank account.
You don't want to be too cavalier in discarding it.

The implementation may discard it for you:

Whether the last line requires a terminating new-line character is
implementation-defined.

Dan
 

Rouben Rostamian

The implementation may discard it for you:

Whether the last line requires a terminating new-line character is
implementation-defined.

You are right, the implementation may discard it for you.

I would consider an implementation that discards incoming data to be
of poor quality, if not dangerous to use.
 

Kelsey Bjarnason

Derk said:
Reallocate the array of lines as needed.

char *a,**line = 0; int m = 0, n= 0;

a = next line of input;
if (n+1>=m) {m = 2*(n+1); line = realloc(line,sizeof(char*)*m);}
line[n++] = a;

Bad.

If realloc returns NULL, it means it couldn't allocate enough space for
the new copy of the buffer... but it does not mean it's freed the
original; that's still sitting there, pointed to by line. Except you
just lost that pointer.

Net result: a block of allocated memory you can't access, can't free,
can't do anything with. I'll assume that the lack of a check for NULL
was simply "condensing for posting", rather than simply bad habits. ;)
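A sketch of the fix Kelsey describes: assign realloc's result to a temporary and overwrite the original pointer only on success, so a failure leaves the existing array intact and still freeable. push_line and its parameters are hypothetical; the capacity doubling follows Derk's snippet, with an assumed starting capacity of 16.

```c
#include <stdlib.h>

/* Append line to *arr, which holds *n entries with room for *cap.
   Doubles the capacity when full. On failure *arr is left untouched
   and still owned by the caller. Returns 0 on success, -1 on failure. */
int push_line(char ***arr, size_t *n, size_t *cap, char *line)
{
    if (*n == *cap) {
        size_t newcap = *cap ? 2 * *cap : 16;
        char **tmp = realloc(*arr, newcap * sizeof *tmp);
        if (tmp == NULL)
            return -1;          /* old block still reachable: no leak */
        *arr = tmp;
        *cap = newcap;
    }
    (*arr)[(*n)++] = line;
    return 0;
}
```

On a -1 return the caller can still free the lines stored so far and then the array itself.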
 

Vasilis Serghi

Wow, thanks for the responses. This is what I did in the end. Don't flame me
if this is completely wrong but it seems to work ok.

while (!feof(errorFile))
{
    /* Read each line of text and increment the counter. Provided the
       last line is not a line feed */
    if (fgets(lineText,100,errorFile) && (lineText[0] != '\r') &&
        (lineText[1] != '\n'))
    {
        numOfRows++;
    }
}

I noticed that after the last line, I saw a "\r\n", hence the check for
these characters. Stefano gave me the original idea. The csv file is
generated by me in Excel. What this won't work with is multiple \r\n after
the last line, but that is something I can deal with before the file is
read.
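For the stated goal of not counting the blank "\r\n" lines Excel leaves at the end, a minimal sketch is to skip any line made up only of '\r' and '\n' characters. The name count_nonblank_lines is hypothetical, and it assumes no line exceeds 254 characters. Opened in text mode on Windows the '\r' never appears, but treating it as blank also covers a Windows file read on a system that does not translate "\r\n".

```c
#include <stdio.h>
#include <string.h>

/* Count lines containing something other than '\r' or '\n'.
   Returns the count, or -1 on a read error. */
long count_nonblank_lines(FILE *fp)
{
    char buf[256];   /* assumes no line is longer than 254 characters */
    long count = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        /* strspn skips leading '\r'/'\n'; anything left is content */
        if (buf[strspn(buf, "\r\n")] != '\0')
            ++count;
    }
    return ferror(fp) ? -1 : count;
}
```

This also handles any number of trailing blank lines, not just one.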
 

Joe Wright

Vasilis said:
Wow, thanks for the responses. This is what I did in the end. Don't flame me
if this is completely wrong but it seems to work ok.

while (!feof(errorFile))
{
    /* Read each line of text and increment the counter. Provided the
       last line is not a line feed */
    if (fgets(lineText,100,errorFile) && (lineText[0] != '\r') &&
        (lineText[1] != '\n'))
    {
        numOfRows++;
    }
}

I noticed that after the last line, I saw a "\r\n", hence the check for
these characters. Stefano gave me the original idea. The csv file is
generated by me in Excel. What this won't work with is multiple \r\n after
the last line, but that is something I can deal with before the file is
read.

You're doing it wrong. You want to read lines from a text file. You must
open the text file in text mode ("r"). In text mode you can't see a '\r'
character. The sequence '\r\n' that Windows makes is reduced to '\n' by
your C implementation in text mode. I've repeated 'text mode' here
several times to aid your eventual recollections. Also fgets() works on
text streams.

A line is a stream of zero or more characters ending with and including
'\n'. Whether the last line of a stream requires '\n' is 'implementation
defined'.

It is always(?) wrong to test feof() before doing something which may
cause it. Your loop above is more correctly written like this:

while (fgets(lineText, 100, errorFile)) ++ numOfRows;

That loop will run and count lines as long as there are any. It will
also count the last line that doesn't have a '\n'. If you still want to
know why fgets() stopped, now is the time for feof(), ferror() or
whatever.
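Joe's loop counts fgets() calls, so a line longer than 99 characters would be counted more than once. A sketch that keeps his structure but counts a chunk only when it ends in '\n', then asks ferror()/feof() why fgets() stopped, as he suggests. The function name count_lines_fgets is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Count lines with fgets, counting a line longer than the buffer once.
   Returns the count, or -1 if fgets stopped because of a read error. */
long count_lines_fgets(FILE *fp)
{
    char lineText[100];
    long numOfRows = 0;
    int mid_line = 0;      /* did the last chunk stop short of '\n'? */

    while (fgets(lineText, sizeof lineText, fp) != NULL) {
        size_t len = strlen(lineText);
        mid_line = !(len > 0 && lineText[len - 1] == '\n');
        if (!mid_line)
            ++numOfRows;
    }
    /* Only now is it meaningful to ask why fgets stopped. */
    if (ferror(fp))
        return -1;
    if (mid_line)          /* at EOF; the last line had no '\n' */
        ++numOfRows;
    return numOfRows;
}
```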
 
