Problems reading from files

Discussion in 'C Programming' started by lancer6238@yahoo.com, Aug 25, 2007.

  1. Guest

    Hi all,
    I'm having problems reading from files.

    I have a text file "files.txt" that contains the names of the files to
    be opened, i.e. the contents of files.txt are

    Homo_sapiens.fa
    Rattus_norvegicus.fa

    (They are FA files that can be opened in any text editor.)

    Each of the FA files contains a number in the first line and a string
    of characters (A,T,G or C). For example, the Homo_sapiens.fa file
    would contain

    16571
    GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTT
    CGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTC
    GCAGTATCTGTCTTTGATTCCTGCCTCATTCTATTATTTATCGCACCTACGTTCAATATT
    ACAGGCGAACATACCTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATA

    and so on, with 16571 A,T,G or Cs.

    Below is my code:

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_FILE 100 // maximum length of file name
    #define MAX_SEQ 20000 // maximum length of sequence
    #define N 2 // total number of sequences

    int main(void)
    {
        FILE *fin, *fin1, *fout;
        char input[MAX_FILE+1], seq[N][MAX_SEQ+1], c;
        int size[N], i = 0, j = 0;

        fin = fopen("files.txt", "r");
        fout = fopen("output.txt", "w");
        while (fscanf(fin, "%s", input) != EOF)
        {
            fin1 = fopen(input, "r");
            printf("%s\n", input);
            fscanf(fin1, "%d ", &size[i]);
            printf("%d\n", size[i]);
            while ((c = fgetc(fin1)) != EOF)
            {
                fprintf(fout, "%c", c);
                if (c != '\n')
                    seq[i][j] = c;
                j++;
                if (j % 100 == 0)
                    printf("%c", seq[i][j]);
            }
            fprintf(fout, "\n\n");
            j = 0;
            i++;
        }

        fclose(fin);
        fclose(fin1);
        fclose(fout);
        return 0;
    }

    The printf statements are there for me to check my code.

    When I try to open 2 files, the first file is read in fine, but the
    second file is incomplete. Over 600 characters are not read, and the
    program hangs.

    I get the output (due to the checking printf statements)

    Homo_sapiens.fa
    16571
    Rattus_norvegicus.fa
    16300
    <program hangs>

    Notice that the statements
    if (j % 100 == 0)
    printf("%c", seq[i][j]);
    are not executed, but if I just print the character seq[0][100], it
    comes out correctly.

    If I try to open 3 files, the same problem happens, i.e. the first
    file is read correctly, but the second file is incomplete and the
    third file is not read at all. I get the output

    Homo_sapiens.fa
    16571
    Rattus_norvegicus.fa
    16300
    Homo_sapiens.fa
    16571
    Segmentation fault

    I tried my program with 2 much smaller files (one has 13 characters
    and the other 14), and the program works. Are the 2 files too big and
    the program ran out of memory? How do I get around this problem, as I
    have to read files even bigger than these 2 later?

    Thank you.

    Regards,
    Rayne
     
    , Aug 25, 2007
    #1

  2. <> wrote in message
    news:...
    > Hi all,
    > I'm having problems reading from files.
    >
    > I have a text file "files.txt" that contains the names of the files to
    > be opened, i.e. the contents of files.txt are
    >
    > Homo_sapiens.fa
    > Rattus_norvegicus.fa
    >
    > (They are FA files that can be opened in any text editor.)
    >
    > Each of the FA files contains a number in the first line and a string
    > of characters (A,T,G or C). For example, the Homo_sapiens.fa file
    > would contain
    >
    > 16571
    > GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTT
    > CGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTC
    > GCAGTATCTGTCTTTGATTCCTGCCTCATTCTATTATTTATCGCACCTACGTTCAATATT
    > ACAGGCGAACATACCTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATA
    >
    > and so on, with 16571 A,T,G or Cs.
    >
    > Below is my code:
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    >
    > #define MAX_FILE 100 // maximum length of file name
    > #define MAX_SEQ 20000 // maximum length of sequence
    > #define N 2 // total number of sequences
    >
    > int main(void)
    > {
    > FILE *fin, *fin1, *fout;
    > char input[MAX_FILE+1], seq[N][MAX_SEQ+1], c;
    >

    This line could cause problems: seq is too big to go safely on the stack.
    Make it static.
    >
    > int size[N], i = 0, j = 0;
    >
    > fin = fopen("files.txt", "r");
    > fout = fopen("output.txt", "w");
    >

    Check here:
    if(!fin) /* haven't opened fin */
    if(!fout) /* haven't opened fout */
    >
    > while (fscanf(fin, "%s", input) != EOF)
    > {
    > fin1 = fopen(input, "r");
    >

    Check here:
    if (!fin1) /* can't open fin1 */
    >
    > printf("%s\n", input);
    >

    Is this diagnostic doing what you expect? I suspect you don't want fscanf();
    you want fgets() to read a whole line, then chop off the trailing newline.
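
    A minimal sketch of that suggestion, assuming one file name per line.
    read_name is a hypothetical helper name, not something from the original
    program; it reads a line with fgets() and drops the line ending.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from fp into buf (capacity cap, including the null),
 * then chop off the trailing "\n" or "\r\n" if present.
 * Returns 1 on success, 0 on end of file or read error.
 * read_name is a hypothetical helper, not part of the original code. */
int read_name(FILE *fp, char *buf, int cap)
{
    if (fgets(buf, cap, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';
    return 1;
}
```

    Unlike "%s", this keeps names that contain spaces intact, and fgets()
    never writes more than cap characters into buf. A name longer than the
    buffer is split across successive calls, so the caller may still want to
    detect that case.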
    >
    > fscanf(fin1, "%d ", &size);
    > printf("%d\n", size);
    > while ((c = fgetc(fin1)) != EOF)
    > {
    > fprintf(fout, "%c", c);
    > if (c != '\n')
    > seq[j] = c;
    > j++;
    > if (j % 100 == 0)
    > printf("%c", seq[j]);
    >

    Check here if(j >= MAX_SEQ -1) /* j too big, out of space */
    Put a null on the end for convenience, hence the minus 1.
    >
    > }
    > fprintf(fout, "\n\n");
    > j = 0;
    > i++;
    >

    What happens when i gets greater than 1? You will do an illegal memory
    access. You need to check if (i >= N) /* can't continue, out of space */
    >
    > }
    >
    > fclose(fin);
    > fclose(fin1);
    > fclose(fout);
    > return 0;
    > }
    >
    > The printf statements are there for me to check my code.
    >
    > When I try to open 2 files, the first file is read in fine, but the
    > second file is incomplete. Over 600 characters are not read, and the
    > program hangs.
    >
    > I get the output (due to the checking printf statements)
    >
    > Homo_sapiens.fa
    > 16571
    > Rattus_norvegicus.fa
    > 16300
    > <program hangs>
    >
    > Notice that the statements
    > if (j % 100 == 0)
    > printf("%c", seq[j]);
    > are not executed, but if I just print the character seq[0][100], it
    > comes out correctly.
    >
    > If I try to open 3 files, the same problem happens, i.e. the first
    > file is read correctly, but the second file is incomplete and the
    > third file is not read at all. I get the output
    >
    > Homo_sapiens.fa
    > 16571
    > Rattus_norvegicus.fa
    > 16300
    > Homo_sapiens.fa
    > 16571
    > Segmentation fault
    >
    > I tried my program with 2 much smaller files (one has 13 characters
    > and the other 14), and the program works. Are the 2 files too big and
    > the program ran out of memory? How do I get around this problem, as I
    > have to read files even bigger than these 2 later?
    >
    > Thank you.
    >
    > Regards,
    > Rayne
    >
     
    Malcolm McLean, Aug 25, 2007
    #2

  3. Army1987 Guest

    On Sat, 25 Aug 2007 02:35:35 -0700, wrote:

    > Hi all,
    > I'm having problems reading from files.
    >
    > I have a text file "files.txt" that contains the names of the files to
    > be opened, i.e. the contents of files.txt are
    >
    > Homo_sapiens.fa
    > Rattus_norvegicus.fa
    >
    > (They are FA files that can be opened in any text editor.)
    > Each of the FA files contains a number in the first line and a string
    > of characters (A,T,G or C). For example, the Homo_sapiens.fa file
    > would contain
    >
    > 16571
    > GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTT
    > CGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTC
    > GCAGTATCTGTCTTTGATTCCTGCCTCATTCTATTATTTATCGCACCTACGTTCAATATT
    > ACAGGCGAACATACCTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATA
    >
    > and so on, with 16571 A,T,G or Cs.
    >
    > Below is my code:
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    >
    > #define MAX_FILE 100 // maximum length of file name

    stdio.h contains a macro FILENAME_MAX for that purpose.
    It already includes room for the terminating null.
    > #define MAX_SEQ 20000 // maximum length of sequence
    > #define N 2 // total number of sequences
    >
    > int main(void)
    > {
    > FILE *fin, *fin1, *fout;
    > char input[MAX_FILE+1], seq[N][MAX_SEQ+1], c;

    Try making them static, 40 KB of auto variables could be too much.
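
    As a rough sketch of two ways to do that: give the array static storage
    duration, or size each buffer from the length on the first line of the
    file and allocate it with malloc(). The names sequence_slot and
    alloc_sequence are made up for illustration.

```c
#include <stdlib.h>

#define MAX_SEQ 20000   /* maximum length of sequence */
#define N 2             /* total number of sequences */

/* Static storage duration: this is not an automatic variable, so it does
 * not count against whatever limit the implementation puts on those. */
static char seq[N][MAX_SEQ + 1];

/* Return the buffer for sequence i, or NULL if i is out of range.
 * (hypothetical helper) */
char *sequence_slot(int i)
{
    if (i < 0 || i >= N)
        return NULL;
    return seq[i];
}

/* Alternative for bigger inputs: allocate room for `length` characters
 * plus the terminating null. Returns NULL if length is negative or
 * malloc() fails; the caller must free() the result.
 * (hypothetical helper) */
char *alloc_sequence(long length)
{
    char *buf;

    if (length < 0)
        return NULL;
    buf = malloc((size_t)length + 1);
    if (buf != NULL)
        buf[0] = '\0';
    return buf;
}
```

    The malloc() version also removes the need to guess MAX_SEQ in advance,
    which may matter for the larger files mentioned in the question.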
    > int size[N], i = 0, j = 0;
    >
    > fin = fopen("files.txt", "r");
    > fout = fopen("output.txt", "w");

    You should check whether those work, and cope with that otherwise.
    > while (fscanf(fin, "%s", input) != EOF)

    %s will stop on any whitespace character, not just newlines. Is
    that ok? (BTW, what happens if files.txt contains a name which is
    too long?)
    > {
    > fin1 = fopen(input, "r");
    > printf("%s\n", input);
    > fscanf(fin1, "%d ", &size);
    > printf("%d\n", size);
    > while ((c = fgetc(fin1)) != EOF)

    c is declared as a char. If it is unsigned it will never equal
    EOF. If it is signed, some valid character (though none of 'ACGT')
    could be mistaken as EOF. fgetc returns an int. See www.c-faq.com,
    section 12, question 1.
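
    For illustration, a small loop with c declared as int, so that EOF stays
    distinct from every character value. count_chars is a hypothetical name.

```c
#include <stdio.h>

/* Count the characters remaining in fp.
 * c is an int so that every unsigned char value and EOF remain
 * distinguishable. count_chars is a hypothetical helper. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c;                          /* int, not char */

    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}
```

    With this declaration a byte such as 0xFF in the input is counted like
    any other character instead of ending the loop early.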
    > {
    > fprintf(fout, "%c", c);
    > if (c != '\n')
    > seq[j] = c;
    > j++;

    Note that j will be incremented even if c is '\n', in
    this case there will be a gap in the sequence. Add braces where
    needed.
    > if (j % 100 == 0)
    > printf("%c", seq[j]);

    You've already incremented j, so seq[j] will be uninitialized
    at this time. For example, if at the beginning of the loop body
    j were 99 and c were 'T' you would write c into seq[99],
    increment j to 100, and print seq[100].
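
    A sketch of the inner loop with braces added, a range check on j, and the
    progress print moved to the character just stored. read_bases is a
    hypothetical helper that works on one sequence buffer at a time; the
    return convention is an assumption, not something from the original.

```c
#include <stdio.h>

/* Read bases from fp into seq (capacity cap, including the terminating
 * null), skipping line breaks. Returns the number of bases stored, or -1
 * if the sequence does not fit (seq is then not null-terminated).
 * read_bases is a hypothetical helper. */
long read_bases(FILE *fp, char *seq, size_t cap)
{
    size_t j = 0;
    int c;

    if (cap == 0)
        return -1;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n' || c == '\r')
            continue;               /* neither store nor count line breaks */
        if (j >= cap - 1)
            return -1;              /* out of space: keep room for '\0' */
        seq[j] = (char)c;
        j++;                        /* j now indexes the next free slot */
        if (j % 100 == 0)
            putchar(seq[j - 1]);    /* the character that was just stored */
    }
    seq[j] = '\0';
    return (long)j;
}
```

    Because j only advances when a base is stored, there are no gaps in the
    sequence, and the value returned can be compared with the length given
    on the first line of the file.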
    > }
    > fprintf(fout, "\n\n");
    > j = 0;
    > i++;
    > }
    >
    > fclose(fin);
    > fclose(fin1);
    > fclose(fout);

    Ideally you should check whether the fclose() worked without
    problems.
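
    One possible way to do that check, as a sketch. close_checked is a
    hypothetical wrapper; it relies only on fclose() returning 0 on success
    and EOF on failure.

```c
#include <stdio.h>

/* Close fp and report a failure on stderr. Returns 0 on success and EOF
 * on failure, the same values fclose() returns. `label` is only used in
 * the error message. close_checked is a hypothetical helper. */
int close_checked(FILE *fp, const char *label)
{
    int rc = fclose(fp);

    if (rc == EOF)
        perror(label);
    return rc;
}
```

    This matters most for output.txt, since buffered data may only be
    written out when the stream is closed.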
    > return 0;
    > }


    --
    Army1987 (Replace "NOSPAM" with "email")
    No-one ever won a game by resigning. -- S. Tartakower
     
    Army1987, Aug 25, 2007
    #3
  4. CBFalconer Guest

    "" wrote:
    >
    > I'm having problems reading from files.
    >
    > I have a text file "files.txt" that contains the names of the files to
    > be opened, i.e. the contents of files.txt are
    >
    > Homo_sapiens.fa
    > Rattus_norvegicus.fa
    >
    > (They are FA files that can be opened in any text editor.)
    >
    > Each of the FA files contains a number in the first line and a string
    > of characters (A,T,G or C). For example, the Homo_sapiens.fa file
    > would contain
    >
    > 16571
    > GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTT
    > CGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTC
    > GCAGTATCTGTCTTTGATTCCTGCCTCATTCTATTATTTATCGCACCTACGTTCAATATT
    > ACAGGCGAACATACCTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATA
    >
    > and so on, with 16571 A,T,G or Cs.
    >
    > Below is my code:
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    >
    > #define MAX_FILE 100 // maximum length of file name
    > #define MAX_SEQ 20000 // maximum length of sequence
    > #define N 2 // total number of sequences
    >
    > int main(void)
    > {
    > FILE *fin, *fin1, *fout;
    > char input[MAX_FILE+1], seq[N][MAX_SEQ+1], c;
    > int size[N], i = 0, j = 0;
    >
    > fin = fopen("files.txt", "r");
    > fout = fopen("output.txt", "w");


    You fail to check for success of the fopen calls.
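
    A sketch of such a check. open_checked is a hypothetical wrapper; the
    caller still has to decide what to do when it returns NULL.

```c
#include <stdio.h>

/* Open path with the given mode. On failure, print a message naming the
 * file and return NULL. Standard C does not require fopen() to set errno,
 * though POSIX systems do, so the perror() text may vary.
 * open_checked is a hypothetical helper. */
FILE *open_checked(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        perror(path);
    return fp;
}
```

    In the original program this would replace each bare fopen() call, with
    the loop skipping (or the program exiting) when NULL comes back.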

    > while (fscanf(fin, "%s", input) != EOF) {
    > fin1 = fopen(input, "r");
    > printf("%s\n", input);
    > fscanf(fin1, "%d ", &size);


    You fail to check for success of the fscanf call.
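
    For example, the length could be read like this. read_length is a
    hypothetical helper, and using long instead of int is a choice made here
    so that lengths above 32767 are safe on any implementation.

```c
#include <stdio.h>

/* Read the sequence length from the first line of fp into *length.
 * Returns 1 if a non-negative number was read, 0 otherwise.
 * read_length is a hypothetical helper. */
int read_length(FILE *fp, long *length)
{
    if (fscanf(fp, "%ld", length) != 1)
        return 0;                   /* no number at the start of the file */
    return *length >= 0;
}
```

    fscanf() returns the number of items converted, so anything other than 1
    here means the file did not start with a number.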

    > printf("%d\n", size);
    > while ((c = fgetc(fin1)) != EOF) {


    c can never be EOF, because you have erroneously declared it a
    char. It should be an int.

    > fprintf(fout, "%c", c);
    > if (c != '\n')
    > seq[j] = c;
    > j++;
    > if (j % 100 == 0)
    > printf("%c", seq[j]);
    > }
    > fprintf(fout, "\n\n");
    > j = 0;
    > i++;


    You fail to close fin1 before attempting to attach it to another
    file.
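
    A sketch of the outer loop with each file closed inside the iteration
    that opened it. visit_files and its parameters are hypothetical; the
    original reads the names from files.txt instead of an array.

```c
#include <stdio.h>

/* Open each named file in turn and close it before moving on to the
 * next one. Returns how many files were opened successfully.
 * visit_files is a hypothetical helper. */
int visit_files(const char *names[], int count)
{
    int opened = 0;
    int i;

    for (i = 0; i < count; i++) {
        FILE *fp = fopen(names[i], "r");

        if (fp == NULL) {
            perror(names[i]);
            continue;               /* nothing to close for this name */
        }
        opened++;
        /* ... read the length and the sequence from fp here ... */
        fclose(fp);                 /* close before the next fopen() */
    }
    return opened;
}
```

    With the fclose() inside the loop, the single fclose(fin1) after the loop
    in the original is no longer needed, and it can no longer be reached with
    an unopened stream.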

    > }
    >
    > fclose(fin);
    > fclose(fin1);
    > fclose(fout);
    > return 0;
    > }
    >
    > The printf statements are there for me to check my code.
    >
    > When I try to open 2 files, the first file is read in fine, but the
    > second file is incomplete. Over 600 characters are not read, and the
    > program hangs.


    The amount of loss (after causing undefined behaviour) leads me to
    suspect that your system has INT_MAX set at 32767. If so, you will
    need to use long to ensure 32 bit ability.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>


     
    CBFalconer, Aug 25, 2007
    #4
  5. "" <> writes:

    > Hi all,
    > I'm having problems reading from files.

    <snip>
    > Below is my code:


    The hang is almost certainly because 'c' should be an int. fgetc
    returns int so it can signal EOF. See the FAQ (http://c-faq.com/).

    I will note a couple of other things, but I think most have now been
    covered.

    > #include <stdio.h>
    > #include <stdlib.h>
    >
    > #define MAX_FILE 100 // maximum length of file name
    > #define MAX_SEQ 20000 // maximum length of sequence
    > #define N 2 // total number of sequences
    >
    > int main(void)
    > {
    > FILE *fin, *fin1, *fout;
    > char input[MAX_FILE+1], seq[N][MAX_SEQ+1], c;


    int c; and use FILENAME_MAX.

    > int size[N], i = 0, j = 0;
    >
    > fin = fopen("files.txt", "r");
    > fout = fopen("output.txt", "w");


    Check these!

    > while (fscanf(fin, "%s", input) != EOF)


    Danger! Danger! There are pre-processor tricks you can use to get the
    correct size into a scanf %s format, but it is probably better to use fgets.
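
    One common form of that pre-processor trick, as a sketch. The STR and
    STR_ macro names and the scan_name helper are made up here. It only works
    when MAX_FILE expands to a plain decimal literal, which is one reason
    fgets is often the simpler choice.

```c
#include <stdio.h>

#define MAX_FILE 100        /* maximum length of file name */
#define STR_(x) #x          /* turn the argument into a string literal */
#define STR(x)  STR_(x)     /* expand the macro first, then stringify */

/* Read one whitespace-delimited name of at most MAX_FILE characters.
 * The format string becomes "%100s", so fscanf() cannot overrun buf.
 * Returns 1 on success, 0 on end of file or error.
 * scan_name is a hypothetical helper. */
int scan_name(FILE *fp, char buf[MAX_FILE + 1])
{
    return fscanf(fp, "%" STR(MAX_FILE) "s", buf) == 1;
}
```

    A name longer than MAX_FILE is not rejected; the rest of it is simply
    returned by the next call, so an over-long name still needs handling.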

    > {
    > fin1 = fopen(input, "r");
    > printf("%s\n", input);
    > fscanf(fin1, "%d ", &size);
    > printf("%d\n", size);
    > while ((c = fgetc(fin1)) != EOF)
    > {
    > fprintf(fout, "%c", c);
    > if (c != '\n')
    > seq[j] = c;


    It is always best (unless you know it is safe) to check that your
    indexes are in range.

    > j++;
    > if (j % 100 == 0)
    > printf("%c", seq[j]);
    > }
    > fprintf(fout, "\n\n");
    > j = 0;
    > i++;
    > }
    >
    > fclose(fin);
    > fclose(fin1);
    > fclose(fout);
    > return 0;
    > }
    >
    > The printf statements are there for me to check my code.
    >
    > When I try to open 2 files, the first file is read in fine, but the
    > second file is incomplete. Over 600 characters are not read, and the
    > program hangs.


    see above!

    --
    Ben.
     
    Ben Bacarisse, Aug 25, 2007
    #5
  6. CBFalconer <> writes:
    > "" wrote:

    [...]
    >> while ((c = fgetc(fin1)) != EOF) {

    >
    > c can never be EOF, because you have erroneously declared it a
    > char. It should be an int.

    [...]

    c can compare equal to EOF if plain char happens to be signed. In
    that case, the code will *probably* work "correctly"; fgetc() will
    eventually return EOF, and the test will work as intended.

    It can fail badly if plain char is unsigned, and it can terminate
    early if plain char is signed, and the file happens to contain a
    character whose value matches EOF (typically EOF is -1 and char is 8
    bits, so a character '\xff' in the input file would trigger this).

    But rather than spending any time considering how the code can fail,
    the OP should fix the bug by declaring c as int. If the program
    continues to misbehave in the same way, he'll have narrowed down the
    problem to the rest of the program; if not, he'll have fixed one bug.
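
    To make the two failure modes concrete, here is a small sketch. The
    helper names are hypothetical, and the comments assume the typical case
    described above (EOF is -1 and char is 8 bits).

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Returns 1 if `byte`, once stored in a plain char as the original loop
 * does, compares equal to EOF. The conversion of an out-of-range value to
 * a signed char is implementation-defined.
 * looks_like_eof is a hypothetical helper. */
int looks_like_eof(unsigned char byte)
{
    char c = (char)byte;

    return c == EOF;
}
```

    Where plain char is signed, looks_like_eof(0xFF) is typically 1, so a
    0xFF byte in the file would end the loop early. Where plain char is
    unsigned it is 0 for every byte, and so is the comparison against a real
    EOF, which means the loop never ends.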

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Aug 25, 2007
    #6
  7. Guest

    Thank you all, I've revised the code and it works now.
     
    , Aug 26, 2007
    #7
  8. [snips]

    On Sat, 25 Aug 2007 10:50:40 +0100, Malcolm McLean wrote:

    >> #define MAX_FILE 100 // maximum length of file name
    >> #define MAX_SEQ 20000 // maximum length of sequence
    >> #define N 2 // total number of sequences
    >>
    >> int main(void)
    >> {
    >> FILE *fin, *fin1, *fout;
    >> char input[MAX_FILE+1], seq[N][MAX_SEQ+1], c;
    >>

    > This line could cause problems: seq is too big to go safely on the
    > stack.


    What stack? Could you kindly show the part of the C standard which
    defines "stack" or requires auto variables to be created on the stack?
     
    Kelsey Bjarnason, Aug 27, 2007
    #8
  9. "Kelsey Bjarnason" <> wrote in message
    news:...
    >> This line could cause problems: seq is too big to go safely on the
    >> stack.

    >
    > What stack? Could you kindly show the part of the C standard which
    > defines "stack" or requires auto variables to be created on the stack?
    >

    Oh deary me. There's useful pedantry, and then there's the sort that just
    tries to be clever.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
     
    Malcolm McLean, Aug 27, 2007
    #9
  10. On Mon, 27 Aug 2007 19:42:34 +0100, Malcolm McLean wrote:

    > "Kelsey Bjarnason" <> wrote in message
    > news:...
    >>> This line could cause problems: seq is too big to go safely on the
    >>> stack.

    >>
    >> What stack? Could you kindly show the part of the C standard which
    >> defines "stack" or requires auto variables to be created on the stack?
    >>

    > Oh deary me. There's useful pedantry, and then there's the sort that just
    > tries to be clever.


    Indeed. Useful pedantry says that since you're using C, and C has no
    concept of a stack, that to discuss "the stack" is meaningless at best in
    the context.

    So, since you seem to think there's something wrong with this, I ask
    again, could you kindly show the part of the C standard which defines
    "stack" or requires auto variables to be created on the stack?

    Or perhaps you weren't aware there are actually machines which don't use
    stacks? There are - which is probably why C doesn't require stacks.
     
    Kelsey Bjarnason, Aug 29, 2007
    #10
  11. Old Wolf Guest

    On Aug 30, 5:07 am, Kelsey Bjarnason <> wrote:
    > Or perhaps you weren't aware there are actually machines which don't use
    > stacks? There are - which is probably why C doesn't require stacks.


    I don't see how you can implement calling functions
    and returning, without a stack of some sort. Any
    structure that achieves the effect of pushing values
    in and popping them off again could reasonably be
    called a stack.
     
    Old Wolf, Aug 30, 2007
    #11
  12. Old Wolf <> writes:
    > On Aug 30, 5:07 am, Kelsey Bjarnason <> wrote:
    >> Or perhaps you weren't aware there are actually machines which don't use
    >> stacks? There are - which is probably why C doesn't require stacks.

    >
    > I don't see how you can implement calling functions
    > and returning, without a stack of some sort. Any
    > structure that achieves the effect of pushing values
    > in and popping them off again could reasonably be
    > called a stack.


    Yes, you need a stack *of some sort* (though a program needn't
    actually use a stack if the compiler can prove that none of its
    functions are called recursively).

    But the term "stack" is commonly used in two distinct senses, as I
    discussed at some length elsethread.

    I understand that there are real world implementations in which the
    memory required for a function's local automatically allocated objects
    (plus bookkeeping information) is allocated as if by calling malloc()
    when the function is called; the ordering in memory of what you might
    call "stack frames" is unspecified. (I initially wrote that they're
    allocated from the "heap", but that's equally ambiguous.) This is
    certainly a "stack" in the sense of a fundamental data structure; it's
    not a "stack" in the commonly used sense of a CPU-supported region of
    memory addressed by a dedicated "stack pointer" register.

    Of course no portable C program can tell the difference.

    And this whole discussion would have been unnecessary if we'd been
    clear about the distinction between the two meanings of "stack", or if
    we'd avoided using the term in the first place.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Aug 30, 2007
    #12