how to store 1 million integers?

  • Thread starter nicholas.entropy

nicholas.entropy

I have troubles allocating enough memory to read an input file
containing 1 million data (the file size is 19MB), I tried to use
malloc(sizeof(buffer_size)*1000000) to allocate continuous memory to
keep that data, but it still gives me segmentation fault. in such case
what should I do?
regards,
nick
 

Ian Collins

I have troubles allocating enough memory to read an input file
containing 1 million data (the file size is 19MB), I tried to use
malloc(sizeof(buffer_size)*1000000) to allocate continuous memory to
keep that data, but it still gives me segmentation fault. in such case
what should I do?

Fit more memory?

How big is sizeof(buffer_size)?

19MB is relatively small, so what exactly did you do? Post enough code to
explain your problem.
 

nicholas.entropy

#include <stdlib.h>
#include <stdio.h>

#define BUFFSIZE 32;


size_t buffer_size=BUFFSIZE;

int main(int argc, char* argv[])
{
char *line=NULL;

int *a;

int i=0;
FILE *fp;
fp=fopen(argv[1],"r");
line=(char *) malloc(sizeof(buffer_size)*1000000);
if(fp!=NULL){
while(!feof(fp)){

fgets(line,buffer_size,fp);
a=atoi( line) ;
printf("the %ld element is %ld,its address is %ld . \n",i,a,&a
);
i++;
}
}
fclose(fp);


return 0;
}

my purpose is to use multithreads to sort this 1 million data by using
quicksort.
I also read an article like this: http://www.ibm.com/developerworks/linux/library/wa-memmng/
Thank you for your help.
 

Spiros Bousbouras

> #include <stdlib.h>
> #include <stdio.h>
>
> #define BUFFSIZE 32;
>
> size_t buffer_size=BUFFSIZE;
>
> int main(int argc, char* argv[])
> {
> char *line=NULL;
>
> int *a;
>
> int i=0;
> FILE *fp;
> fp=fopen(argv[1],"r");
> line=(char *) malloc(sizeof(buffer_size)*1000000);
> if(fp!=NULL){
> while(!feof(fp)){
>
> fgets(line,buffer_size,fp);
> a=atoi( line) ;

You have not initialised a.

> printf("the %ld element is %ld,its address is %ld . \n",i,a,&a
> );
> i++;
> }
> }
> fclose(fp);
>
> return 0;
>
> }
>
> my purpose is to use multithreads to sort this 1 million data by using
> quicksort.
> I also read an article like this: http://www.ibm.com/developerworks/linux/library/wa-memmng/
> Thank you for your help.
 

nicholas.entropy

even I put int *a=NULL;
it still gives me segmentation fault. Is it related to OS memory
management? maybe there is nothing to do with C implementation?

Spiros said:
#include <stdlib.h>
#include <stdio.h>

#define BUFFSIZE 32;

size_t buffer_size=BUFFSIZE;

int main(int argc, char* argv[])
{
char *line=NULL;

int *a;

int i=0;
FILE *fp;
fp=fopen(argv[1],"r");
line=(char *) malloc(sizeof(buffer_size)*1000000);
if(fp!=NULL){
while(!feof(fp)){

fgets(line,buffer_size,fp);
a=atoi( line) ;


You have not initialised a.
printf("the %ld element is %ld,its address is %ld . \n",i,a,&a
);
i++;
}
}
fclose(fp);

return 0;

}

my purpose is to use multithreads to sort this 1 million data by using
quicksort.
I also read an article like this: http://www.ibm.com/developerworks/linux/library/wa-memmng/
Thank you for your help.

 

Ian Collins

> #include <stdlib.h>
> #include <stdio.h>
>
> #define BUFFSIZE 32;
>
> size_t buffer_size=BUFFSIZE;
>
> int main(int argc, char* argv[])
> {
> char *line=NULL;
>
> int *a;
>
> int i=0;
> FILE *fp;
> fp=fopen(argv[1],"r");
> line=(char *) malloc(sizeof(buffer_size)*1000000);

That is equivalent to

line = malloc(sizeof(size_t)*1000000);

Is that what you intended?

> if(fp!=NULL){
> while(!feof(fp)){
>
> fgets(line,buffer_size,fp);
> a=atoi( line) ;

As already noted, a isn't initialised, which is probably the cause of
your crash.

It's good practice to avoid atoi() in favour of strtol(), which
reports error conditions.
 

Ian Collins

(e-mail address removed) wrote:

[please don't top-post]
Spiros said:
#include <stdlib.h>
#include <stdio.h>

#define BUFFSIZE 32;

size_t buffer_size=BUFFSIZE;

int main(int argc, char* argv[])
{
char *line=NULL;

int *a;

int i=0;
FILE *fp;
fp=fopen(argv[1],"r");
line=(char *) malloc(sizeof(buffer_size)*1000000);
if(fp!=NULL){
while(!feof(fp)){

fgets(line,buffer_size,fp);
a=atoi( line) ;

You have not initialised a.

> even I put int *a=NULL;
> it still gives me segmentation fault. Is it related to OS memory
> management? maybe there is nothing to do with C implementation?

So where did you expect a to be? You probably ended up indexing from
address 0.
 

Spiros Bousbouras

even I put int *a=NULL;
it still gives me segmentation fault. Is it related to OS memory
management? maybe there is nothing to do with C implementation?

You're trolling, yes?
Spiros said:
#include <stdlib.h>
#include <stdio.h>
#define BUFFSIZE 32;
size_t buffer_size=BUFFSIZE;
int main(int argc, char* argv[])
{
char *line=NULL;
int *a;
int i=0;
FILE *fp;
fp=fopen(argv[1],"r");
line=(char *) malloc(sizeof(buffer_size)*1000000);
if(fp!=NULL){
while(!feof(fp)){
fgets(line,buffer_size,fp);
a=atoi( line) ;

You have not initialised a.
 

Vincent Cheung

even I put int *a=NULL;
it still gives me segmentation fault. Is it related to OS memory
management? maybe there is nothing to do with C implementation?

How can you initialize it with NULL and then use it?
Unless you give a a valid memory address to point to, you can NOT use it.
 

nicholas.entropy

(e-mail address removed) wrote:

[please don't top-post]


Spiros said:
On 9 May, 06:11, (e-mail address removed) wrote:
#include <stdlib.h>
#include <stdio.h>
#define BUFFSIZE 32;
size_t buffer_size=BUFFSIZE;
int main(int argc, char* argv[])
{
  char *line=NULL;
  int *a;
  int i=0;
  FILE *fp;
  fp=fopen(argv[1],"r");
  line=(char *) malloc(sizeof(buffer_size)*1000000);
  if(fp!=NULL){
    while(!feof(fp)){
      fgets(line,buffer_size,fp);
      a=atoi( line) ;
You have not initialised a.


 > even I put int *a=NULL;
 > it still gives me segmentation fault. Is it related to OS memory
 > management? maybe there is nothing to do with C implementation?

So where did you expect a to be?  You probably ended up indexing from
address 0.


First I use fgets to read from input file as char* format, then
convert it to integer and store in an array.
Currently it seems I can get avoid segmentation fault by doing:
long *a;
a=(long *)malloc(sizeof(long)*2000000);

I think I am bit confused about array and pointer in C, especially
regrading integer array.
lets say
int a[3];
a[0]=10;
a[1]=20;
a[2]=30;

how does memory store this integer array? is it like below:
______________________________
|1|0|2|0|3|0|.......
________________________________
1 2 3 4 5 6 7 8 .....

where 1-8 are array index.
Its bit odd because if is true, then a[2]=0,a[3]=2,a[4]=0,a[5]=3,a[6]
=0.....well it seems not work in the same way as char array.
 

Ian Collins

So where did you expect a to be? You probably ended up indexing from
address 0.


First I use fgets to read from input file as char* format, then
convert it to integer and store in an array.
Currently it seems I can get avoid segmentation fault by doing:
long *a;
a=(long *)malloc(sizeof(long)*2000000);


The cast is superfluous,

long *a = malloc(sizeof(long)*2000000);

will do.

So now you have allocated some memory for a.
I think I am bit confused about array and pointer in C, especially
regrading integer array.
lets say
int a[3];
a[0]=10;
a[1]=20;
a[2]=30;

how does memory store this integer array?

As a contiguous block of 3 ints.
 

miloody

(e-mail address removed) wrote:
[please don't top-post]
Spiros Bousbouras wrote:
On 9 May, 06:11, (e-mail address removed) wrote:
#include <stdlib.h>
#include <stdio.h>
#define BUFFSIZE 32;
size_t buffer_size=BUFFSIZE;
int main(int argc, char* argv[])
{
  char *line=NULL;
  int *a;
  int i=0;
  FILE *fp;
  fp=fopen(argv[1],"r");
  line=(char *) malloc(sizeof(buffer_size)*1000000);
  if(fp!=NULL){
    while(!feof(fp)){
      fgets(line,buffer_size,fp);
      a=atoi( line) ;
You have not initialised a.

 > even I put int *a=NULL;
 > it still gives me segmentation fault. Is it related to OS memory
 > management? maybe there is nothing to do with C implementation?
So where did you expect a to be?  You probably ended up indexing from
address 0.


First I use fgets to read from input file as char* format, then
convert it to integer and store in an array.
Currently it seems I can get avoid segmentation fault by doing:
long *a;
a=(long *)malloc(sizeof(long)*2000000);

I think I am bit confused about array and pointer in C, especially
regrading integer array.
lets say
int a[3];
a[0]=10;
a[1]=20;
a[2]=30;

how does memory store this integer array? is it like below:
______________________________
|1|0|2|0|3|0|.......
________________________________
 1 2 3 4 5 6 7 8 .....

where 1-8 are array index.
Its bit odd because if is true, then a[2]=0,a[3]=2,a[4]=0,a[5]=3,a[6]
=0.....well it seems not work in the same way as char array.
Hi,
in my experience, the byte-level memory layout depends on your machine
(little endian or big endian).
BTW, the elements of your array a are ints, and the number of bytes each
one occupies depends on your compiler.
HTH,
miloody
 

jfbode1029

#include <stdlib.h>
#include <stdio.h>

#define BUFFSIZE 32;

size_t buffer_size=BUFFSIZE;

int main(int argc, char* argv[])
{
  char *line=NULL;

  int *a;

  int i=0;
  FILE *fp;
  fp=fopen(argv[1],"r");
  line=(char *) malloc(sizeof(buffer_size)*1000000);

Whoa whoa whoa whoa whoa. Are you saying that *each line of data* in
the file is 32 million characters long? I suspect that's not true.
  if(fp!=NULL){
    while(!feof(fp)){

      fgets(line,buffer_size,fp);
      a=atoi( line) ;


a is an uninitialized pointer; you have not allocated any memory to it
yet. This is where your segfault is coming from. Do this first:

  int *a = malloc (sizeof *a * 1000000);
  ...
  if (!a)
  {
    fprintf(stderr, "Could not allocate memory for a\n");
    return EXIT_FAILURE;
  }

You don't need to allocate an input buffer; if your input is well-
formed, you should use fscanf() instead of fgets()*:

  size_t i = 0;
  int tmp;
  ...
  /* fscanf returns EOF (non-zero) at end of file, so compare with 1 */
  while (i < 1000000 && fscanf(fp, "%d", &tmp) == 1)
  {
    a[i++] = tmp;
  }

Note that i is size_t, not int.

You don't want to make feof(fp) your test condition for the loop,
because it won't return true until *after* you try to read past the
end of file; you'll wind up looping once too often. Instead, test the
return value of fscanf(); it returns the number of successful
conversions (in this case, that should be 1).

* You'll see several hundred articles (including some of mine)
claiming that using fgets() is better than using scanf(), but that's
for *interactive* input, not reading from a data file. If your data
file is well-formed (you don't have any junk characters among your
numerical inputs), scanf() is the better (well, easier to use)
choice.

      printf("the %ld element is %ld,its address is %ld . \n",i,a,&a
);
      i++;
    }
  }
    fclose(fp);

  return 0;

}

my purpose is to use multithreads to sort this 1 million data by using
quicksort.
I also read an article like this:  http://www.ibm.com/developerworks/linux/library/wa-memmng/
Thank you for your help.



 

Old Wolf

(e-mail address removed) wrote:
[please don't top-post]
Spiros Bousbouras wrote:
On 9 May, 06:11, (e-mail address removed) wrote:
#include <stdlib.h>
#include <stdio.h>
#define BUFFSIZE 32;
size_t buffer_size=BUFFSIZE;
int main(int argc, char* argv[])
{
  char *line=NULL;
  int *a;
  int i=0;
  FILE *fp;
  fp=fopen(argv[1],"r");
  line=(char *) malloc(sizeof(buffer_size)*1000000);
  if(fp!=NULL){
    while(!feof(fp)){
      fgets(line,buffer_size,fp);
      a=atoi( line) ;
You have not initialised a.

 > even I put int *a=NULL;
 > it still gives me segmentation fault. Is it related to OS memory
 > management? maybe there is nothing to do with C implementation?
So where did you expect a to be?  You probably ended up indexing from
address 0.


First I use fgets to read from input file as char* format, then
convert it to integer and store in an array.
Currently it seems I can get avoid segmentation fault by doing:
long *a;
a=(long *)malloc(sizeof(long)*2000000);

I think I am bit confused about array and pointer in C, especially
regrading integer array.
lets say
int a[3];
a[0]=10;
a[1]=20;
a[2]=30;

how does memory store this integer array? is it like below:
______________________________
|1|0|2|0|3|0|.......
________________________________
 1 2 3 4 5 6 7 8 .....


you seem to be a bit confused, it's like this:

Index |  0 |  1 |  2 |
------+----+----+----+
Value | 10 | 20 | 30 |

Indices start from 0 (not 1). Ints can hold
any value between -32768 and 32767 (and maybe
more), they do not get broken up into decimal
digits or something.

BTW you make several elementary errors in your
original program on this thread, I recommend
you start with some simpler tasks (the series
of examples in K&R would be one such place to start)
 

Thad Smith

Spiros said:
You're trolling, yes?

This is the type of response that can turn off potential posters. I
thought it was obvious that the OP is a beginner looking for help. When
I'm not sure, I give the benefit of the doubt to the poster, rather than
potentially ridiculing someone earnestly looking for help.


To the OP (Nicholas):

When a variable is declared directly, such as
int x;
memory is allocated for the variable by the C implementation.

When a pointer is declared, such as
int *p;
space is allocated for the pointer, but the pointer value hasn't been
set. This needs to be done before the pointer can be used.

NULL can be an initial value, but doesn't specify a memory location, so
int *p = NULL;
*p = 2;
results in undefined behavior, since no real memory was allocated, which
can, if you are lucky, result in a segmentation or other immediate fault.

int a[10];
int *p = a;

These declarations store the address of array a into p, so that p can be
used: p[0] = 1; for example.

malloc and other allocation routines can also be used to supply a valid
address for a pointer.
 

Guest

the OP is trying to read, store and eventually sort 1M integers.

It might be worth trying a much smaller number like 10 and
seeing if you can get it to work. If not post your code.
The code you have now should be pretty different from the code
in your original post.


> a = malloc (sizeof(long) * 2000000);

that looks like 2M longs, not 1M ints

> I think I am bit confused about array and pointer in C, especially
> regrading integer array.

"regrading"?

> lets say
> int a[3];
> a[0]=10;
> a[1]=20;
> a[2]=30;
> how does memory store this integer array?

why do you need to know? It stores the integers contiguously in memory

> is it like below:

Most implementations use binary, not decimal. If you use longs
they will be 4 bytes (on most 32-bit implementations), not 2.
The byte order will vary from implementation to implementation.


I don't understand this

> where 1-8 are array index.
> Its bit odd because if is true, then a[2]=0,a[3]=2,a[4]=0,a[5]=3,a[6]

you are mistaken
a[2] == 30
a[3], a[4] etc. are undefined behaviour

you are confusing the array of ints with the underlying memory model.
You don't need to worry about the memory model at this point.

> in my experience, the memory layout quite depended on your machine.

but you hardly ever have to know this...

> (little endian or big endian)
> BTW, your array a is pointer of integer and the size of bytes it
> points to is dependent on your compiler.
> HTH,

probably not, but that isn't your fault.
 

Chris M. Thomasson

I have troubles allocating enough memory to read an input file
containing 1 million data (the file size is 19MB), I tried to use
malloc(sizeof(buffer_size)*1000000) to allocate continuous memory to
keep that data, but it still gives me segmentation fault. in such case
what should I do?
regards,


Here is a VERY CRUDE little program that you can take a look at:
_______________________________________________________________________
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>


#define BUFSZ 4096
#define DEPTH 1024
#define DELIM ","


/* qsort comparator: orders longs ascending without risking overflow */
int compare(void const* a1, void const* a2) {
  long n1 = *((long const*)a1);
  long n2 = *((long const*)a2);

  if (n1 < n2) return -1;
  if (n1 > n2) return 1;

  return 0;
}


int main(void) {
  FILE* file = fopen("data.txt", "r");

  if (file) {
    long* numbers = malloc(sizeof(*numbers) * DEPTH);
    size_t count = 0;

    if (numbers) {
      char data[BUFSZ + 1] = { '\0' };

      /* only the first line of the file is read */
      if (fgets(data, BUFSZ, file)) {

        size_t delimsz = strlen(DELIM);
        char* current = data;
        char* stopped = NULL;
        char* end = data + strlen(data);

        do {
          errno = 0; /* strtol only sets errno on failure */
          numbers[count] = strtol(current, &stopped, 10);

          if (errno != ERANGE) {
            count++;

          } else {
            fprintf(stderr, "range error!");
          }

          /* skip the delimiter that follows the number */
          current = stopped + delimsz;
        } while (count < DEPTH && current < end);

      } else {
        fprintf(stderr, "error reading from the file!");
      }

    } else {
      fprintf(stderr, "error allocating memory!");
    }

    if (fclose(file)) {
      fprintf(stderr, "error closing the file!");
    }

    if (numbers && count) {
      size_t i;

      qsort(numbers, count, sizeof(long), compare);

      for (i = 0; i < count; ++i) {
        printf("%ld\n", numbers[i]);
      }

      free(numbers);
    }

  } else {
    fprintf(stderr, "error opening the file!");
  }

  return 0;
}
_______________________________________________________________________




It's totally static in nature, such that it will only read 1024 numbers. Also,
it only reads the first line of the file, up to 4095 characters, into the
`data' buffer. Also, it currently expects the file to be set up as follows:




-5,-411,-343,-255,-1,0,1,2,3,4543,6554765,56875363,23435252,6456546,345232332




Take note of the `DELIM' macro. It currently expects each number to be
separated with a comma. One more thing, it expects the data file name to be
`data.txt'; sorry about that... It sorts the numbers in increasing order.




Of course all of those caveats can be fixed, but I leave that as an exercise
for you to chew on. Please, try and understand this tiny little program
BEFORE you mess around with multi-threading!!!!


YIKES!!!


:^o
 

Chris M. Thomasson

Chris M. Thomasson said:
I have troubles allocating enough memory to read an input file
containing 1 million data (the file size is 19MB), I tried to use
malloc(sizeof(buffer_size)*1000000) to allocate continuous memory to
keep that data, but it still gives me segmentation fault. in such case
what should I do?
regards,


Here is a VERY CRUDE little program that you can take a look at:
_______________________________________________________________________
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>


#define BUFSZ 4096
#define DEPTH 1024
#define DELIM ","


int compare(void const* a1, void const* a2) {
long n1 = *((long const*)a1);
long n2 = *((long const*)a2);

if (n1 < n2) return -1;
if (n1 > n2) return 1;

return 0;
}


int main(void) {
FILE* file = fopen("data.txt", "r");

if (file) {
long* numbers = malloc(sizeof(*numbers) * DEPTH);
size_t count = 0;

if (numbers) {
char data[BUFSZ + 1] = { '\0' };

if (fgets(data, BUFSZ, file)) {

size_t delimsz = strlen(DELIM);
char* current = data;
char* stopped = NULL;
char* end = data + strlen(data);

do {
numbers[count] = strtol(current, &stopped, 10);

if (errno != ERANGE) {
count++;

} else {
fprintf(stderr, "range error!");
}

current = stopped + delimsz;
} while (count < DEPTH && current < end);

} else {
fprintf(stderr, "error reading from the file!");
}

} else {
fprintf(stderr, "error allocating memory!");
}

if (fclose(file)) {
fprintf(stderr, "error closing the file!");
}





if (numbers && count) {
size_t i;

qsort(numbers, count, sizeof(long), compare);

for (i = 0; i < count; ++i) {
printf("%ld\n", numbers[i]);
}

free(numbers);
}



I should move the call to free outside the `if' block above... Also, I
should remove the NULL test on `numbers' variable:



if (count) {
size_t i;

qsort(numbers, count, sizeof(long), compare);

for (i = 0; i < count; ++i) {
printf("%ld\n", numbers[i]);
}
}

free(numbers);



This would remove any possibility of a memory leak when `numbers' is
non-NULL and `count' is zero. Sorry about that nonsense!


;^(...
 
