read same lines of two different files

garhone

Hi,
I have 2 large files and I need to compare each line in one file, say
line x, with line x in the second file. So compare line 1 in one file
with line 1 in another file, line 2 with line 2, etc.
The program will stop when it encounters the first difference.

After searching the web, I've found recommendations of reading at
least one of these files into memory, into an array and looping
through the array and reading the second file.

Is there another way of doing this without reading a whole file into
memory, since each file is extremely large?
Is it possible to go directly to a specific line number in a file, and
read just that line?

Thanks in advance,
C
 
Jürgen Exner

garhone said:
I have 2 large files and I need to compare each line in one file, say
line x, with line x in the second file. So compare line 1 in one file
with line 1 in another file, line 2 with line 2, etc.
The program will stop when it encounters the first difference.

Sketch of the key logic (untested):

while (<F1>) {
    next if $_ eq <F2>;
    print "Difference found in line $.\n";
    last;    # stop at the first difference
}

You will have to add some additional logic if the two files to compare
can have different numbers of lines (you didn't say).
In that case add e.g. a test for eof(F2) before the 'next' statement to
catch an F2 that ends early, and another test after the while loop to
catch additional lines in F2.

Another approach:

while (1) {
    exit 0 if eof(F1) and eof(F2);
    if (eof(F1) xor eof(F2)) {
        print "Files have different length\n";
        exit 1;
    }
    next if <F1> eq <F2>;
    print "Files are different in line $.\n";
    exit 1;
}
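
For reference, here is a self-contained variant of the same idea, as a minimal sketch rather than a definitive implementation. The sub name first_difference and the use of lexical filehandles are assumptions for illustration and were not part of the posts above. Instead of calling eof, it treats an undefined read as end of file, and it reports a line that exists in only one file as a difference.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 0 if the files match line for line,
# otherwise the 1-based number of the first line that differs
# (including the first line that exists in only one of the files).
sub first_difference {
    my ($path_a, $path_b) = @_;
    open my $fh_a, '<', $path_a or die "Cannot open $path_a: $!";
    open my $fh_b, '<', $path_b or die "Cannot open $path_b: $!";

    my $line_no = 0;
    while (1) {
        my $line_a = <$fh_a>;    # undef once file A is exhausted
        my $line_b = <$fh_b>;    # undef once file B is exhausted
        $line_no++;

        # Both files ended at the same point: no difference found.
        return 0 if !defined $line_a && !defined $line_b;

        # One file ended early, or the two lines differ.
        return $line_no
            if !defined $line_a || !defined $line_b || $line_a ne $line_b;
    }
}

# Run as a script only when two file names are given.
if (@ARGV == 2) {
    my $n = first_difference(@ARGV);
    print $n ? "First difference in line $n\n" : "Files are identical\n";
    exit($n ? 1 : 0);
}
```

Only one line from each file is held in memory at a time, so the size of the files does not matter.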

garhone said:
After searching the web, I've found recommendations of reading at
least one of these files into memory, into an array and looping
through the array and reading the second file.

That is useful when you want to know if all lines from one file are
contained in the other files without knowing the sequence of the lines.
If you don't cache the lines of one file in RAM you would have to
re-read the whole file over and over again while looping over the other
file which obviously is a rather suboptimal design.
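
A rough sketch of that order-independent case, under stated assumptions: the sub name lines_missing_from is made up for illustration, and it caches the lines in a hash rather than the array mentioned in the question, so that each lookup does not have to scan the whole cache.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of $path_b that do not occur
# anywhere in $path_a, regardless of line order.
sub lines_missing_from {
    my ($path_a, $path_b) = @_;

    # Cache every line of file A as a hash key; this is the part that costs RAM.
    open my $fh_a, '<', $path_a or die "Cannot open $path_a: $!";
    my %seen;
    while (my $line = <$fh_a>) {
        chomp $line;
        $seen{$line} = 1;
    }
    close $fh_a;

    # Stream file B once; each check is a hash lookup, not a re-read of A.
    open my $fh_b, '<', $path_b or die "Cannot open $path_b: $!";
    my @missing;
    while (my $line = <$fh_b>) {
        chomp $line;
        push @missing, $line unless exists $seen{$line};
    }
    close $fh_b;

    return @missing;
}
```

This is not what a line-by-line comparison needs, which is why the caching advice does not apply to the original question.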

garhone said:
Is there another way of doing this without reading a whole file into
memory, since each file is extremely large?

See above for two suggestions.

garhone said:
Is it possible to go directly to a specific line number in a file, and
read just that line?

No, unless you are talking about fixed-format files, e.g. like old
punchcards where each line was exactly 80 characters long. In that case
a specific line number in that file would equal a specific position and
you could use seek() to jump to that position.
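
As an illustration of the fixed-format case only, here is a minimal sketch. The sub name read_fixed_record and its parameters are hypothetical, and it assumes every record occupies exactly the same number of bytes, line terminator included.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch record number $n (1-based) from a file in which
# every record occupies exactly $record_len bytes, line terminator included.
sub read_fixed_record {
    my ($path, $n, $record_len) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    # Record $n starts at byte offset ($n - 1) * $record_len.
    # The third argument 0 means "offset from the start of the file".
    seek($fh, ($n - 1) * $record_len, 0)
        or die "Cannot seek in $path: $!";

    my $got = read($fh, my $record, $record_len);
    die "Cannot read from $path: $!" unless defined $got;
    close $fh;

    # A short read means the file has fewer than $n complete records.
    return $got == $record_len ? $record : undef;
}
```

With ordinary variable-length lines there is no such shortcut: the only way to find line x is to read and count the lines before it.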

jue
 
ccc31807

garhone said:
Hi,
I have 2 large files and I need to compare each line in one file, say
line x, with line x in the second file.

Is there any reason you don't want to use diff? If you are on Windows
there is WinDiff.

CC
 
Jürgen Exner

darkon said:
Why use xor? It doesn't seem necessary to me, but I could be missing
something.

Technically speaking you are right, because the 'and' case is already
covered in the preceding line. However, I like to be explicit in each
individual condition, and the two files have different sizes if and only
if exactly one is at EOF and the other is not. That's why I used xor.

Call it part of robust programming not to rely too much on the
execution sequence.

jue
 
Justin C

garhone said:
Hi,
I have 2 large files and I need to compare each line in one file, say
line x, with line x in the second file. So compare line 1 in one file
with line 1 in another file, line 2 with line 2, etc.
The program will stop when it encounters the first difference.

After searching the web, I've found recommendations of reading at
least one of these files into memory, into an array and looping
through the array and reading the second file.

Is there another way of doing this without reading a whole file into
memory, since each file is extremely large?

Won't the command-line tool 'diff' do what you want?

Justin.
 
