File Compare with difflib.context_diff

JohnV

I have a txt file that gets appended with data over a time event. The
data comes from an RFID reader and is dumped to the file by the RFID
software. I want to poll that file several times over the time period
of the event to capture the current data in the RFID reader.

When I read the data I want to be able to compare the current data to
the data from the last time I read it and only process the data
appended since the last time it was read.

The first time I read the data, it might look like this:
AU08JEDD011485H14472402210
AU08JEDD020163C14472502210
AU08JEDD005029C14480102210
AU08JEDD004923H14482002210
AU08AWOL000799H14483902210

The next time it might look like this (with data appended to it)
AU08JEDD011485H14472402210
AU08JEDD020163C14472502210
AU08JEDD005029C14480102210
AU08JEDD004923H14482002210
AU08AWOL000799H14483902210
AU08AWOL000120H14495902210
AU08ARPU050241H14511702210
IF08DRTO008074H14520202210
IF08DRTO008089H14521102210
IF08DRTO008077H14553602210
IF08CHES000023H14594902210

What I want to do is compare the old data (let's say it is saved to a
file called 'lastdata.txt') with the new data (let's say it is saved to
a file called 'currentdata.txt') and save the newly appended data to a
variable which I HTTP POST to a website where I process the data for
display to interested parties. In the example below I am trying to
save the newly appended data to a file called "out.txt".

I have looked at difflib.context_diff but I cannot get the syntax
correct. This is what I have adapted from an example on this page
(http://docs.python.org/library/difflib.html). One thing I do not
understand is what to do with fromfile='before.py' and
tofile='after.py' in the example code.

**********

import sys
import difflib

sys.stdout = open("out.txt","w")


f1 = open(r'C:\Users\Owner\Desktop\lastdata.txt', 'r')
read_data1 = f1.read()
f1.close()

f2 = open(r'C:\Users\Owner\Desktop\currentdata.txt', 'r')
read_data2 = f2.read()
f2.close()

for line in context_diff(read_data1, read_data2, fromfile='before.py',
                         tofile='after.py'):
    sys.stdout.write(line)


***************

The line that causes the syntax error is:

for line in context_diff(read_data1, read_data2, fromfile='before.py',
                         tofile='after.py'):

I would hope that, once the script works, "out.txt" would contain the
appended data. I would then copy currentdata.txt to lastdata.txt. There
is no need to clear out the data in currentdata.txt, as the next dump
will overwrite it.

Any help or insights appreciated, thanks...
 
Chris Rebert

Completely untested:

from difflib import context_diff

OLD_PATH = r'C:\Users\Owner\Desktop\lastdata.txt'
NEW_PATH = r'C:\Users\Owner\Desktop\currentdata.txt'

out = open("out.txt", 'w')

old = open(OLD_PATH, 'r')
old_lines = list(old)
old.close()

new = open(NEW_PATH, 'r')
new_lines = list(new)
new.close()

for line in context_diff(old_lines, new_lines, fromfile=OLD_PATH,
                         tofile=NEW_PATH):
    out.write(line)

out.close()


Cheers,
Chris
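
A note on what context_diff produces, since the question asked about fromfile and tofile: they are only labels printed in the two header lines of the diff, not files that get opened. The output is a full context diff (headers, range markers, unchanged context lines), so it holds more than the appended records. A minimal sketch, assuming the only change is lines appended at the end; the helper name appended_lines is made up for illustration:

```python
from difflib import context_diff

def appended_lines(old_lines, new_lines):
    """Return the lines that context_diff marks as added ('+ ' prefix)."""
    diff = context_diff(old_lines, new_lines,
                        fromfile='lastdata.txt', tofile='currentdata.txt')
    # Header lines start with '*** ' or '--- ' and context lines with '  ',
    # so only inserted lines start with '+ '. Drop that two-character marker.
    return [line[2:] for line in diff if line.startswith('+ ')]

old = ['AU08JEDD004923H14482002210\n', 'AU08AWOL000799H14483902210\n']
new = old + ['AU08AWOL000120H14495902210\n', 'AU08ARPU050241H14511702210\n']

new_records = appended_lines(old, new)
```

If lines can also be changed or removed in the middle of the file, they show up with '! ' or '- ' markers and this filter would silently skip them.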
 
JohnV

Maybe something like this will work, though I am not sure of my quotes
or what to import:

import shutil

f = open(r'C:\Users\Owner\Desktop\mydata.txt', 'r')
read_data1 = f.read()
f.close()

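# Raw strings keep the backslashes literal; unescaped, '\n' in '\newdata.txt' is a newline.
shutil.copy(r'C:\Users\Owner\Desktop\newdata.txt',
            r'C:\Users\Owner\Desktop\out.txt')
# 'text' was never defined; read the new data before stripping the old data from it.
f = open(r'C:\Users\Owner\Desktop\newdata.txt', 'r')
text = f.read()
f.close()
file = open(r'C:\Users\Owner\Desktop\out.txt', 'w')
file.write(text.replace(read_data1, ""))
file.close()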
 
Emile van Sebille

JohnV said:
> What I want to do is compare the old data (let's say it is saved to a
> file called 'lastdata.txt') with the new data (let's say it is saved to
> a file called 'currentdata.txt') and save the new appended data to a
> variable

You may get away with something like: (untested)

newdata=open('currentdata.txt').read()[len(open('lastdata.txt').read()):]

HTH,

Emile
 
Gabriel Genellina

Emile van Sebille said:
> You may get away with something like: (untested)
>
> newdata=open('currentdata.txt').read()[len(open('lastdata.txt').read()):]

The same idea, but without reading unneeded bits:

import os

oldsize = os.stat('lastdata.txt').st_size
with open('currentdata.txt', 'rb') as curf:
    curf.seek(oldsize)  # skip the bytes that were already processed
    newdata = curf.read()

This assumes the 'currentdata.txt' file *never* shrinks nor is overwritten
- not very realistic, so you'll need some additional checks.
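
One way to add the checks mentioned above, sketched under assumptions: the helper name read_appended and the policy of re-reading the whole file when it has shrunk are illustrative choices, not something from this thread. Comparing sizes cannot detect a file that was rewritten with the same or a larger size.

```python
import os

def read_appended(path, old_size):
    """Return (new_bytes, new_size) for data added to path since old_size."""
    size = os.stat(path).st_size
    if size < old_size:
        # The file shrank, so it was truncated or replaced: start over.
        old_size = 0
    with open(path, 'rb') as f:
        f.seek(old_size)
        data = f.read()
    return data, old_size + len(data)
```

The caller keeps the returned size and passes it back in on the next poll, which avoids keeping a second copy of the old file just to measure it.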
 
JohnV

The code below does the trick, with one small problem left to solve:

import shutil
import string

currentdata_file = r"C:\Users\Owner\Desktop\newdata.txt"  # the current download from the clock
lastdata_file = r"C:\Users\Owner\Desktop\mydata.txt"  # the prior download from the clock
output_file = r"C:\Users\Owner\Desktop\out.txt"  # will hold delta clock data

shutil.copy(currentdata_file, output_file)

f = open(lastdata_file, 'r')
read_data1 = f.read()
f.close()

f = open(currentdata_file, 'r')
read_data2 = f.read()
f.close()

replaceText = ''

file = open(output_file, 'w')
file.write(read_data2.replace(read_data1, replaceText))  # str method; string.replace() is Python 2 only
file.close()

Contents of lastdata_file:
AU08JEDD011485H14472402210
AU08JEDD020163C14472502210
AU08JEDD005029C14480102210
AU08JEDD004923H14482002210
AU08AWOL000799H14483902210

Contents of currentdata_file:
AU08JEDD011485H14472402210
AU08JEDD020163C14472502210
AU08JEDD005029C14480102210
AU08JEDD004923H14482002210
AU08AWOL000799H14483902210
AU08AWOL000120H14495902210
AU08ARPU050241H14511702210
IF08DRTO008074H14520202210
IF08DRTO008089H14521102210
IF08DRTO008077H14553602210
IF08CHES000023H14594902210

Contents of output_file:

AU08AWOL000120H14495902210
AU08ARPU050241H14511702210
IF08DRTO008074H14520202210
IF08DRTO008089H14521102210
IF08DRTO008077H14553602210
IF08CHES000023H14594902210

output_file has a blank line at the top because of replaceText = ''
Is there something besides '' that I can use to not end up with a
blank line at the top of output_file, or what is the code to delete
the first line of that file?

Thanks
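
A likely cause of the blank line, offered as a guess since the files themselves are not shown: if the old file does not end with a newline, the newline that separates its last record from the first new record is not part of the old text, so it survives the replace. The empty replaceText is not what produces it. A small sketch with in-memory strings (the variable names are illustrative) that checks the prefix and drops that leftover line break:

```python
old_text = "AU08JEDD004923H14482002210\nAU08AWOL000799H14483902210"
new_text = old_text + "\nAU08AWOL000120H14495902210\nAU08ARPU050241H14511702210\n"

if new_text.startswith(old_text):
    # Slice off the part already seen, then drop the leftover line break.
    delta = new_text[len(old_text):].lstrip("\r\n")
else:
    # The file was replaced rather than appended to; treat it all as new.
    delta = new_text
```

Slicing after a startswith check also avoids a side effect of replace, which would remove the old text wherever it happens to occur, not only at the start.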
 
JohnV

Here is the latest version of the code:

currentdata_file = r"C:\Users\Owner\Desktop\newdata.txt"  # the latest download from the clock
lastdata_file = r"C:\Users\Owner\Desktop\mydata.txt"  # the prior download from the clock
output_file = r"C:\Users\Owner\Desktop\out.txt"  # will hold delta clock data

newdata = open(currentdata_file).read()[len(open(lastdata_file).read()):]
newdata2 = newdata.strip()


file = open(output_file, 'w')
file.write(newdata2)
file.close()


Do I need to close currentdata_file and lastdata_file ?

Have not gotten the os.stat example to work yet...
thanks for the help.
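
On the question of closing: the one-liner never closes the two files explicitly. CPython will normally close them once the file objects are garbage-collected, but that is an implementation detail rather than a guarantee. A with block closes each file as soon as the block ends. A sketch, assuming the made-up function name appended_text and that the current file only ever grows by appending:

```python
def appended_text(last_path, current_path):
    """Return the text in current_path that follows the contents of last_path."""
    with open(last_path) as last:
        old = last.read()
    with open(current_path) as current:
        new = current.read()
    # Both files are already closed at this point.
    return new[len(old):].strip()
```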
 
JanC

JohnV said:
> I have a txt file that gets appended with data over a time event. The
> data comes from an RFID reader and is dumped to the file by the RFID
> software. I want to poll that file several times over the time period
> of the event to capture the current data in the RFID reader.
>
> When I read the data I want to be able to compare the current data to
> the data from the last time I read it and only process the data
> appended since the last time it was read.

If you are on a POSIX system, you could maybe use the output of the
'since' command-line tool, or at least steal some ideas from it.

<http://welz.org.za/projects/since>
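
The idea that matters for this thread is to remember how far a file was read last time and return only what came after. A rough Python sketch of that idea follows. It is not how since itself is implemented, and the state file, its JSON format, and the function name read_since are all assumptions made for illustration.

```python
import json
import os

def read_since(path, state_path):
    """Return bytes appended to path since the previous call.

    The read offset is remembered in state_path (a small JSON file).
    """
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as state:
            offset = json.load(state).get(path, 0)
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or replaced; read it from the start
    with open(path, 'rb') as f:
        f.seek(offset)
        data = f.read()
    with open(state_path, 'w') as state:
        json.dump({path: offset + len(data)}, state)
    return data
```

Because the offset lives in a separate file, there is no need to keep a copy of the previous dump around between polls.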
 
