How should I compare two txt files, one from Windows/DOS and one from Linux/Unix?

higer

I just want to compare two files, one from Windows and the other from
Unix, but I do not want to compare them by reading them line by
line. I found the filecmp module, which is used for file and
directory comparisons. However, when I test its cmp function with two
files that have the same content (one from Unix, one from Windows),
filecmp.cmp returns False.

Later, I found that Windows uses '\r\n' as its newline sequence while Unix
uses '\n', so filecmp.cmp considers the files different and returns False.
So, can anyone tell me whether there is an option like IgnoreNewline
that ignores the difference in newline sequences between
platforms? If not, I think filecmp may not be a good file comparison
module.


Thanks,
higer
 
Chris Rebert

Nope, there's no such flag. You could run the files through either
`dos2unix` or `unix2dos` beforehand though, which would solve the
problem.
Or you could write the trivial line comparison code yourself and just
make sure to open the files in Universal Newline mode (add 'U' to the
`mode` argument to `open()`).
You could also file a bug (a patch to add newline insensitivity would
probably be welcome).

Cheers,
Chris
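
A minimal sketch of the line comparison suggested above, assuming Python 3, where text mode already translates newlines by default (the 'U' flag was deprecated and then removed in Python 3.11). The function name `same_text` and the `encoding` parameter are hypothetical, chosen only for illustration.

```python
from itertools import zip_longest


def same_text(path_a, path_b, encoding="utf-8"):
    """Return True if both files hold the same lines, ignoring newline style."""
    # newline=None (the default) translates '\r\n' and '\r' to '\n' on read.
    with open(path_a, encoding=encoding, newline=None) as fa, \
         open(path_b, encoding=encoding, newline=None) as fb:
        # zip_longest pads the shorter file with None, so files of
        # different length are reported as different.
        return all(a == b for a, b in zip_longest(fa, fb))
```

The files are read lazily, one line at a time, so the comparison stops at the first mismatch.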
 
John Machin

Or popen diff ...

A /very/ /small/ part of the diff --help output:

  -E  --ignore-tab-expansion          Ignore changes due to tab expansion.
  -b  --ignore-space-change           Ignore changes in the amount of white space.
  -w  --ignore-all-space              Ignore all white space.
  -B  --ignore-blank-lines            Ignore changes whose lines are all blank.
  -I RE  --ignore-matching-lines=RE   Ignore changes whose lines all match RE.
  --strip-trailing-cr                 Strip trailing carriage return on input.

Cheers,
John
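
A rough sketch of the "popen diff" idea, using `subprocess` rather than `os.popen`. It assumes a GNU-compatible `diff` is on the PATH (both `-q` and `--strip-trailing-cr` are GNU options); the function name `same_per_diff` is hypothetical.

```python
import subprocess


def same_per_diff(path_a, path_b):
    """Return True if `diff --strip-trailing-cr` reports no differences."""
    # -q suppresses the listing; only the exit status is needed here.
    result = subprocess.run(
        ["diff", "-q", "--strip-trailing-cr", path_a, path_b],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    # diff exits with 0 (no differences), 1 (differences) or 2 (trouble).
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"diff failed: {result.stderr.strip()}")
```

If `diff` is not installed, `subprocess.run` raises `FileNotFoundError`, so this approach is only as portable as the external tool.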
 
higer

A tool can certainly be used to compare two files, but I just want to
compare them using Python code.
 
higer

Thank you very much. Adding the 'U' argument works perfectly, and I
think it is definitely worth reporting this as a bug to Python.org, as
you suggest.

Cheers,
higer
 
Piet van Oostrum

h> Thank you very much. Adding the 'U' argument works perfectly, and I
h> think it is definitely worth reporting this as a bug to Python.org, as
h> you suggest.

filecmp does a binary comparison, not a text comparison. It starts by
comparing the sizes of the files; if they differ, the files must be
different. If the sizes are equal, it compares the bytes by reading large
blocks. Comparing text files would be quite different, especially when
ignoring line separators. Maybe comparing text files should be added as
a new feature.
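
A small demonstration of the point above, assuming Python 3. The file names and contents are made up for illustration; the two files hold the same text but differ in line endings, so they differ in size and `filecmp.cmp` reports them as unequal.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    unix_path = os.path.join(tmp, "unix.txt")
    dos_path = os.path.join(tmp, "dos.txt")
    with open(unix_path, "wb") as f:
        f.write(b"alpha\nbeta\n")
    with open(dos_path, "wb") as f:
        f.write(b"alpha\r\nbeta\r\n")

    # The CRLF file is two bytes longer, so the byte-level check fails.
    size_differs = os.path.getsize(unix_path) != os.path.getsize(dos_path)
    byte_equal = filecmp.cmp(unix_path, dos_path, shallow=False)

print(size_differs, byte_equal)  # prints: True False
```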
 
Emile van Sebille

On 6/11/2009 12:09 AM higer said...
A tool can certainly be used to compare two files, but I just want to
compare them using Python code.

difflib?

Emile
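
One way the difflib suggestion might look, as a sketch assuming Python 3 text mode (which normalizes line endings on read). The function name `text_diff` and the `encoding` parameter are hypothetical. An empty result means no differences were found.

```python
import difflib


def text_diff(path_a, path_b, encoding="utf-8"):
    """Return unified-diff lines for two text files, ignoring newline style."""
    # Text mode translates '\r\n' and '\r' to '\n', so only real
    # content differences reach difflib.
    with open(path_a, encoding=encoding) as fa:
        lines_a = fa.readlines()
    with open(path_b, encoding=encoding) as fb:
        lines_b = fb.readlines()
    return list(
        difflib.unified_diff(lines_a, lines_b, fromfile=path_a, tofile=path_b)
    )
```

Unlike a plain equality check, this reads both files fully into memory and computes the actual differences.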
 
Piet van Oostrum

Emile van Sebille said:
EvS> On 6/11/2009 12:09 AM higer said...
EvS> difflib?

If I understand correctly, the OP just wanted to know whether two files
were equal, not what the differences would be. In that case difflib is
overkill.

On the other hand, 'equal' has many meanings. Ignoring line endings is
one option; ignoring trailing whitespace could be another. Yet
another could be normalizing the character encoding, etc.
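
To illustrate how those different notions of 'equal' could be made explicit, here is a hedged sketch assuming Python 3. All names (`normalized_lines`, `equal_text`, `enc_a`, `enc_b`, `strip_trailing_ws`) are hypothetical and not part of any standard module.

```python
def normalized_lines(path, encoding, strip_trailing_ws=False):
    """Yield the lines of a text file in a normalized form."""
    # Decoding with an explicit encoding lets files stored in different
    # encodings compare equal; text mode maps all newline styles to '\n'.
    with open(path, encoding=encoding) as f:
        for line in f:
            # Dropping the terminator also ignores a missing final newline.
            line = line.rstrip("\n")
            if strip_trailing_ws:
                line = line.rstrip()
            yield line


def equal_text(path_a, path_b, enc_a="utf-8", enc_b="utf-8",
               strip_trailing_ws=False):
    """Compare two text files after the chosen normalizations."""
    lines_a = list(normalized_lines(path_a, enc_a, strip_trailing_ws))
    lines_b = list(normalized_lines(path_b, enc_b, strip_trailing_ws))
    return lines_a == lines_b
```

Each keyword argument switches on one kind of leniency, so the caller decides what 'equal' should mean for a given comparison.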
 
