read same lines of two different files

Discussion in 'Perl Misc' started by garhone, Nov 30, 2009.

  1. garhone

    garhone Guest

    Hi,
    I have 2 large files and I need to compare each line in one file, say
    line x, with line x in the second file. So compare line 1 in one file
    with line 1 in another file, line 2 with line 2, etc.
    The program will stop when it encounters the first difference.

    After searching the web, I've found recommendations of reading at
    least one of these files into memory, into an array and looping
    through the array and reading the second file.

    Is there another way of doing this without reading into memory? As
    each file is extremely large?
    Is it possible to go directly to a specific line number in a file, and
    read just that line?

    Thanks in advance,
    C
    garhone, Nov 30, 2009
    #1

  2. garhone <> wrote:
    >I have 2 large files and I need to compare each line in one file, say
    >line x, with line x in the second file. So compare line 1 in one file
    >with line 1 in another file, line 2 with line 2, etc.
    >The program will stop when it encounters the first difference.


    Sketch of the key logic (untested):

    while (<F1>) {
        next if $_ eq <F2>;
        print "Difference found in line $.\n";
        last;   # stop at the first difference
    }

    You will have to add some additional logic if the two files to compare
    can have different numbers of lines (you didn't say).
    In that case add, e.g., a test for eof(F2) before the 'next' statement to
    catch an F2 that is missing its end, and another test after the while
    loop to catch additional lines in F2.
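
    The length handling described above could look roughly like the sketch
    below. It is untested against your data and makes a few assumptions that
    are not in the original sketch: lexical filehandles instead of F1/F2, a
    hypothetical helper name first_difference, and a return value of 0
    meaning "no difference".

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based number of the first line at which
# the two files differ, or 0 if they are identical. Only one line of each
# file is held in memory at a time.
sub first_difference {
    my ($path1, $path2) = @_;
    open my $fh1, '<', $path1 or die "Cannot open $path1: $!";
    open my $fh2, '<', $path2 or die "Cannot open $path2: $!";

    my $line_no = 0;
    while (1) {
        my $l1 = <$fh1>;
        my $l2 = <$fh2>;
        $line_no++;
        # Both files ended on the same line: no difference found.
        return 0 if !defined($l1) && !defined($l2);
        # Exactly one file ended: the other has extra lines from here on.
        return $line_no if !defined($l1) || !defined($l2);
        return $line_no if $l1 ne $l2;
    }
}

# Optional command-line use: perl script.pl file1 file2
if (@ARGV == 2) {
    my $n   = first_difference(@ARGV);
    my $msg = $n ? "First difference at line $n\n" : "Files are identical\n";
    print $msg;
}
```

    Keeping an explicit counter avoids relying on $., which always refers to
    the filehandle that was read most recently.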

    Another approach:

    while (1) {
        exit 0 if eof(F1) and eof(F2);
        if (eof(F1) xor eof(F2)) {
            print "Files have different length\n";
            exit 1;
        }
        next if <F1> eq <F2>;
        print "Files are different in line $.\n";
        exit 1;
    }

    >After searching the web, I've found recommendations of reading at
    >least one of these files into memory, into an array and looping
    >through the array and reading the second file.


    That is useful when you want to know whether all lines from one file are
    contained in the other file without knowing the sequence of the lines.
    If you don't cache the lines of one file in RAM, you have to re-read
    that whole file for every line of the other file, which is obviously a
    poor design.
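
    For that unordered case, a minimal sketch of the caching idea follows.
    The helper name lines_missing_from and the use of a hash (rather than an
    array) are my assumptions, not something taken from the recommendations
    you found.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of $path2 that do not occur
# anywhere in $path1, regardless of line order.
sub lines_missing_from {
    my ($path1, $path2) = @_;

    # Cache every line of the first file as a hash key (this is the RAM cost).
    my %seen;
    open my $fh1, '<', $path1 or die "Cannot open $path1: $!";
    while (my $line = <$fh1>) {
        chomp $line;
        $seen{$line} = 1;
    }
    close $fh1;

    # Stream the second file once; each check is a hash lookup, not a re-read.
    my @missing;
    open my $fh2, '<', $path2 or die "Cannot open $path2: $!";
    while (my $line = <$fh2>) {
        chomp $line;
        push @missing, $line unless exists $seen{$line};
    }
    close $fh2;
    return @missing;
}
```

    Each file is read exactly once, at the price of holding the distinct
    lines of the first file in memory.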

    >Is there another way of doing this without reading into memory? As
    >each file is extremely large?


    See above for two suggestions.

    >Is it possible to go directly to a specific line number in a file, and
    >read just that line?


    No, unless you are talking about fixed-format files, e.g. like old
    punchcards where each line was exactly 80 characters long. In that case
    a specific line number in that file would equal a specific position and
    you could use seek() to jump to that position.
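
    A rough sketch of the seek() idea for such fixed-format files. The
    helper name read_fixed_record is hypothetical, and the sketch assumes
    that every record has exactly the same byte length, newline included.

```perl
use strict;
use warnings;

# Hypothetical helper: returns record number $n (1-based) from a file in which
# every record occupies exactly $record_len bytes, including the newline.
# Returns undef if the file has no complete record at that position.
sub read_fixed_record {
    my ($path, $n, $record_len) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    # Jump straight to the byte offset of record $n; whence 0 = from file start.
    seek($fh, ($n - 1) * $record_len, 0) or die "Cannot seek in $path: $!";

    my $bytes_read = read($fh, my $record, $record_len);
    die "Cannot read $path: $!" unless defined $bytes_read;
    close $fh;
    return $bytes_read == $record_len ? $record : undef;
}
```

    For the punchcard example, line 1000 would be
    read_fixed_record($path, 1000, 81) if each line is 80 characters plus
    a single newline byte.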

    jue
    Jürgen Exner, Nov 30, 2009
    #2

  3. garhone

    ccc31807 Guest

    On Nov 30, 1:13 pm, garhone <> wrote:
    > Hi,
    > I have 2 large files and I need to compare each line in one file, say
    > line x, with line x in the second file.


    Is there any reason you don't want to use diff? If you are on Windows
    there is WinDiff.

    CC
    ccc31807, Nov 30, 2009
    #3
  4. "darkon" <> wrote:
    >"Jürgen Exner" <> wrote:
    >> Another approach:
    >>
    >> while (1) {
    >> exit 0 if eof(F1) and eof(F2);
    >> print "Files have different length\n"
    >> if eof(F1) xor eof(F2);
    >> next if <F1> eq <F2>;
    >> print "Files are different in line $.\n";
    >> exit 1;
    >> }

    >
    >Why use xor? It doesn't seem necessary to me, but I could be missing
    >something.


    Technically speaking you are right, because the 'and' case is already
    covered in the preceding line. However, I like to be explicit in each
    individual condition, and the two files have different sizes if and only
    if exactly one is at EOF and the other is not. That's why I used xor.

    Call it part of robust programming not to rely too much on the
    execution sequence.

    jue
    Jürgen Exner, Dec 1, 2009
    #4
  5. garhone

    Justin C Guest

    On 2009-11-30, garhone <> wrote:
    > Hi,
    > I have 2 large files and I need to compare each line in one file, say
    > line x, with line x in the second file. So compare line 1 in one file
    > with line 1 in another file, line 2 with line 2, etc.
    > The program will stop when it encounters the first difference.
    >
    > After searching the web, I've found recommendations of reading at
    > least one of these files into memory, into an array and looping
    > through the array and reading the second file.
    >
    > Is there another way of doing this without reading into memory? As
    > each file is extremely large?


    Won't the command-line tool 'diff' do what you want?

    Justin.


    --
    Justin Catterall www.masonsmusic.co.uk
    Director T: +44 (0)1424 427562
    Masons Music Ltd F: +44 (0)1424 434362
    For full company details see our web site
    Justin C, Dec 1, 2009
    #5
