storing 1,000,000 records

lector

Do you think it would be better to use a file here instead of a
linked list or array? Can the file be organized as an array so that
one can index into it?
 
jacob navia

lector said:
Do you think it would be better to use a file here instead of a
linked list or array?

If you have enough RAM, use it: it is thousands of times faster
than a disk.

Can the file be organized as an array so that
one can index into it?

Yes.

If record size (on disk) is X, the 5876th record is at offset
X*5875 bytes.
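As a sketch (the record layout here is made up; any fixed-size struct works the same way), indexing into the file looks like this:

```c
#include <stdio.h>

/* Hypothetical fixed-size record; any struct with a fixed
   on-disk size works the same way. */
typedef struct {
    char name[40];
    long id;
} record;

/* Read record number n (0-based), treating the file as an array:
   record n starts at byte offset n * sizeof(record). */
int read_record(FILE *fp, long n, record *out)
{
    if (fseek(fp, n * (long)sizeof(record), SEEK_SET) != 0)
        return -1;                       /* seek failed */
    if (fread(out, sizeof(record), 1, fp) != 1)
        return -1;                       /* short read or EOF */
    return 0;
}
```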
 
lector

Is it this function

int fseek ( FILE * stream, long int offset, int origin );

I don't know if it is sufficient to have the offset as a long int.
 
jacob navia

lector said:
Is it this function

int fseek ( FILE * stream, long int offset, int origin );

Yes.

I don't know if it is sufficient to have the offset as a long int.

If your file is smaller than 2 GB, yes; I am assuming a 32-bit
system.

Since 2 GB is 2,147,483,648 bytes, you can store 1 million records
of up to 2147 bytes each, which is not that bad actually.

If not, just use a compiler/OS that provides 64 bit access.
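One common route to 64-bit offsets is a POSIX sketch like the following (not ISO C; on Windows the analogous calls are _fseeki64/_ftelli64):

```c
/* Feature-test macros must come before any #include: they make
   off_t 64 bits even on many 32-bit builds, and expose fseeko. */
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <sys/types.h>

/* Seek to record n of size recsize using a 64-bit offset. */
int seek_record(FILE *fp, long long n, size_t recsize)
{
    off_t pos = (off_t)n * (off_t)recsize;
    return fseeko(fp, pos, SEEK_SET);    /* 0 on success */
}
```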
 

CBFalconer

jacob said:
If you have enough RAM use it since it is thousands of times
faster than a disk.

If you have a suitable OS it will buffer the file, and the
performance difference from an internal buffer will be negligible.
It will also probably be smart enough not to buffer portions of the
file that you do not use.
 

osmium

CBFalconer said:
If you have a suitable OS it will buffer the file, and the
performance difference from an internal buffer will be negligible.
It will also probably be smart enough not to buffer portions of the
file that you do not use.

That post makes no sense at all with the usual meanings applied to suitable,
buffer and negligible. It assumes there is a way for an OS to predict the
future addressing patterns of a program that uses random access.
 

Richard

osmium said:
That post makes no sense at all with the usual meanings applied to suitable,
buffer and negligible. It assumes there is a way for an OS to predict the
future addressing patterns of a program that uses random access.

And you think that an OS does not use guesswork in any caching
algorithms it uses?

(PS: Falconer had a double negative in his last sentence that he
probably did not intend.)
 

jacob navia

CBFalconer said:
If you have a suitable OS it will buffer the file, and the
performance difference from an internal buffer will be negligible.
It will also probably be smart enough not to buffer portions of the
file that you do not use.

Ahh. Yes of course.

So, that OS will not access the disk when you write
1 million records to a file?

Perfect.

Just tell me one example of such a suitable OS...


In any case you are just *confirming* what I said:

RAM is much faster than disk; you are just relying on the OS to
exploit that. I recommended not relying on the OS and doing it
yourself.
 
Nick Keighley

Do you think it would be better to use a file here instead of a
linked list or array? Can the file be organized as an array so that
one can index into it?

consider using a database
 

CBFalconer

osmium said:
That post makes no sense at all with the usual meanings applied to
suitable, buffer and negligible. It assumes there is a way for an
OS to predict the future addressing patterns of a program that uses
random access.

It's extremely simple. If the program doesn't need to access an
area of the file, don't read it. Buffer everything you do read.
Set up rules for buffer destruction, such as least recently
accessed.
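That policy can be sketched in a few lines of C (a toy illustration, not a real OS cache; the slot count, page size, and miss_count instrumentation are all made up):

```c
#include <stdio.h>
#include <string.h>

#define NSLOTS    4
#define PAGESIZE  4096

struct slot {
    long page;                  /* page number held, -1 if empty */
    unsigned long last_use;     /* logical clock of last access */
    char data[PAGESIZE];
};

static struct slot cache[NSLOTS] = {
    {-1, 0, {0}}, {-1, 0, {0}}, {-1, 0, {0}}, {-1, 0, {0}}
};
static unsigned long clock_tick;
static unsigned long miss_count;        /* instrumentation only */

/* Return a buffer holding the given page of the file, touching the
   file only on a miss; on a miss, evict the least recently
   accessed slot. */
char *get_page(FILE *fp, long page)
{
    int victim = 0;
    for (int i = 0; i < NSLOTS; i++) {
        if (cache[i].page == page) {            /* hit: no file access */
            cache[i].last_use = ++clock_tick;
            return cache[i].data;
        }
        if (cache[i].last_use < cache[victim].last_use)
            victim = i;                         /* track the LRU slot */
    }
    miss_count++;                               /* miss: evict and read */
    fseek(fp, page * (long)PAGESIZE, SEEK_SET);
    memset(cache[victim].data, 0, PAGESIZE);
    size_t got = fread(cache[victim].data, 1, PAGESIZE, fp);
    (void)got;                  /* short read past EOF leaves zeros */
    cache[victim].page = page;
    cache[victim].last_use = ++clock_tick;
    return cache[victim].data;
}
```

Pages the program never asks for are never read, and repeated accesses to a hot page cost no I/O at all.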
 
user923005

Do you think it would be better to use a file here instead of a
linked list or array? Can the file be organized as an array so that
one can index into it?

I guess that you will find the easiest success if you use a database.
 
lector

Do you think it will make things even more efficient if I read and
write the data in binary and in chunks of bytes? I'm doing this using
the fread and fwrite functions, e.g. something like the code below.

/*---------- WRITES EMPLOYEE RECORDS TO A BINARY FILE ----------*/
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp;
    char another = 'Y';
    typedef struct emp_struct
    {
        char name[40];
        int age;
        float bs;
    } emp;

    emp e;

    fp = fopen("EMP.DAT", "wb");

    if (fp == NULL)
    {
        puts("Cannot open file");
        exit(EXIT_FAILURE);
    }
    while (another == 'Y')
    {
        printf("\nEnter name, age and basic salary\n");
        scanf("%s %d %f", e.name, &e.age, &e.bs);
        fwrite(&e, sizeof(e), 1, fp);

        printf("Add another record (Y/N)");
        fflush(stdin);
        another = getchar();
    }

    fclose(fp);
    return 0;
}
 
Ian Collins

lector said:
Do you think it will make things even more efficient if I read and
write the data in binary and in chunks of bytes? I'm doing this using
the fread and fwrite functions, e.g. something like the code below.

If you want efficiency at the expense of portability, use whatever your
OS provides to map files and simply map the file and assign the result
to a record pointer and off you go.

Your OS can take care of mapping the required data into memory. Your
friendly neighbourhood platform specific group can help you with the
gory details.
 
user923005

Do you think it will make things even more efficient if I read and
write the data in binary and in chunks of bytes? I'm doing this using
the fread and fwrite functions, e.g. something like the code below.

/*---------- WRITES EMPLOYEE RECORDS TO A BINARY FILE ----------*/
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp;
    char another = 'Y';
    typedef struct emp_struct
    {
        char name[40];
        int age;
        float bs;
    } emp;

    emp e;

    fp = fopen("EMP.DAT", "wb");

    if (fp == NULL)
    {
        puts("Cannot open file");
        exit(EXIT_FAILURE);
    }
    while (another == 'Y')
    {
        printf("\nEnter name, age and basic salary\n");
        scanf("%s %d %f", e.name, &e.age, &e.bs);
        fwrite(&e, sizeof(e), 1, fp);

        printf("Add another record (Y/N)");
        fflush(stdin);
        another = getchar();
    }

    fclose(fp);
    return 0;
}

Yes, binary is faster though not portable.

Even better would be a system that allows you to create hashed or B-
tree indexes on your table.

And can you imagine how nice it would be to have arbitrary search
features that can find things like "name BETWEEN 'Johnson' AND
'Johnson'" or "bs > 29575.00".

Let's let our imagination run wild and suppose that we could even do
things like collecting average age or sum of basic salary.

I guess we're just dreaming now. Too bad there is nothing like that
on the planet.
;-)
 
user923005

Ian Collins said:
If you want efficiency at the expense of portability, use whatever your
OS provides to map files and simply map the file and assign the result
to a record pointer and off you go.

Your OS can take care of mapping the required data into memory.  Your
friendly neighbourhood platform specific group can help you with the
gory details.

If all fits into memory maybe even:
http://www.garret.ru/~knizhnik/fastdb.html

And if not:
http://www.garret.ru/~knizhnik/gigabase.html

Just a thought.
 
lector

On Apr 7, 10:25 pm, user923005 wrote:
Yes, binary is faster though not portable.

Even better would be a system that allows you to create hashed or B-
tree indexes on your table.

And can you imagine how nice it would be to have arbitrary search
features that can find things like "name BETWEEN 'Johnson' AND
'Johnson'" or "bs > 29575.00".

Let's let our imagination run wild and suppose that we could even do
things like collecting average age or sum of basic salary.

I guess we're just dreaming now. Too bad there is nothing like that
on the planet.
;-)

Yes, but then there might be an issue with choosing a hash function.
 
Barry Schwarz

Do you think it will make things even more efficient if I read and
write the data in binary and in chunks of bytes? I'm doing this using
the fread and fwrite functions, e.g. something like the code below.

Writing one large chunk as opposed to several small chunks usually
means fewer calls to the I/O functions, which usually means less
overhead for those calls. This has nothing to do with binary vs. text.
If you built a large string containing the text equivalent of your
structure members, you would achieve the same efficiency with regard
to calling I/O functions, without the problems introduced by binary
noted below.

/*---------- WRITES EMPLOYEE RECORDS TO A BINARY FILE ----------*/
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp;
    char another = 'Y';
    typedef struct emp_struct
    {
        char name[40];
        int age;
        float bs;
    } emp;

    emp e;

    fp = fopen("EMP.DAT", "wb");

    if (fp == NULL)
    {
        puts("Cannot open file");
        exit(EXIT_FAILURE);
    }
    while (another == 'Y')
    {
        printf("\nEnter name, age and basic salary\n");
        scanf("%s %d %f", e.name, &e.age, &e.bs);

This opens up the possibility of the user entering more than 39
characters into name. It will not support a name which contains an
embedded blank. You really should check that scanf returns 3.

        fwrite(&e, sizeof(e), 1, fp);

If you change compilers, or possibly even compiler options, the file
may be difficult to process because of different padding in the
structure. If you transport the file to a different system, the int
and the float may have problems due to endianness or representation.

        printf("Add another record (Y/N)");
        fflush(stdin);

fflush is not defined for input streams.

        another = getchar();

What will you do if the user enters 'y'?

On most interactive systems, the user will need to press Enter after
typing the 'Y'. This will leave a '\n' in the buffer. When you go
back to the scanf, this character will be processed immediately and
the user will never be able to enter the three values.

    }

    fclose(fp);
    return 0;
}


Remove del for email
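A sketch of one way to apply those fixes (the helper names parse_emp and write_records are mine, not from the thread): reading whole lines with fgets sidesteps both fflush(stdin) and the leftover '\n', and a scanf field width plus a return-value check closes the overflow hole. Embedded blanks in names are still not supported.

```c
#include <stdio.h>

typedef struct emp_struct {
    char name[40];
    int age;
    float bs;
} emp;

/* Parse "name age salary" from one line. The %39s width keeps the
   name from overflowing, and returning the field count lets the
   caller reject malformed lines. */
static int parse_emp(const char *line, emp *e)
{
    return sscanf(line, "%39s %d %f", e->name, &e->age, &e->bs) == 3;
}

/* Read one "name age salary" line per record from in until EOF or a
   bad line, appending fixed-size records to out; returns the number
   written. An interactive Y/N prompt can read whole lines the same
   way, accepting either 'Y' or 'y'. */
static long write_records(FILE *in, FILE *out)
{
    char line[128];
    emp e;
    long n = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (!parse_emp(line, &e))
            break;                  /* malformed line: stop */
        if (fwrite(&e, sizeof e, 1, out) != 1)
            break;                  /* write error */
        n++;
    }
    return n;
}
```

The struct is still written with its native padding and endianness, so the file remains non-portable, as noted above.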
 
