Finding duplicated photos

TheSaint

Hello,

I came across the problem that Gwenview moves the photos from the camera
memory by renaming them, but later I forgot which ones were moved.
Then I thought about a small script in Python, but I stumbled on my
ignorance of how to do it.

Can PIL find similar pictures? I was thinking of reducing the photos to
grayscale and resizing them to the same size; what algorithm should be used?
Is PIL able to compare two images?
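The grayscale-and-resize idea can be turned into a direct comparison by measuring how far apart the two sets of pixels are. This is a minimal sketch, assuming both pictures have already been converted with Pillow (for example `Image.open(path).convert("L").resize((256, 256))`) and flattened with `list(img.getdata())`; the function name `mean_abs_difference` is hypothetical. If you prefer to stay inside PIL, `ImageChops.difference(image1, image2)` returns the per-pixel difference as an image.

```python
from typing import Sequence


def mean_abs_difference(a: Sequence[int], b: Sequence[int]) -> float:
    """Average per-pixel difference of two equally sized grayscale images.

    0.0 means identical pixels; 255.0 is the maximum for 8-bit grayscale.
    """
    if len(a) != len(b) or not a:
        raise ValueError("both images must be non-empty and resized to the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Tiny hand-made "images" of four pixels each.
print(mean_abs_difference([0, 10, 20, 30], [0, 10, 20, 30]))  # 0.0
print(mean_abs_difference([0, 0, 0, 0], [10, 20, 30, 40]))  # 25.0
```

A small threshold on this number flags likely duplicates, but it is sensitive to brightness changes, crops and rotations.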

Billy Mays

I recently wrote a program using the DCT method proposed in this article:
http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
It worked surprisingly well, even with just the 64-bit hash it produces.

TheSaint

Billy said:
It worked surprisingly well even
with just the 64bit hash it produces.
I'd say that comparing two images reduced to 32x32 pixels seems too little
to tell whether one of two portraits is smiling compared with the other.
I think your suggestion and mine are similar, but I'd like to scale the
pictures to no less than 256x256 pixels.
I'd also like to cover the wider case where the comparison involves a
rotated image.

Billy Mays

Originally I thought the same thing. It turns out that doing a DCT on
an image typically moves the more important data to the top left corner
of the output. This means that most of the other output data can be
thrown away, since it doesn't significantly affect the image.
The 32x32 size is arbitrary; you can make it any square block that
you want.
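A minimal pure-Python sketch of a DCT-based hash along these lines: it computes only the top-left 8x8 block of the DCT of a square grayscale picture and turns those 64 coefficients into 64 bits. It assumes the picture has already been reduced, for example with Pillow's `img = Image.open(path).convert("L").resize((32, 32))`, and passed in as rows of pixel values (`data = list(img.getdata())`, then `[data[i * 32:(i + 1) * 32] for i in range(32)]`). The names `dct_1d` and `dct_hash` are hypothetical, and thresholding against the mean of the block (leaving out the DC term) is one common choice, not necessarily what Billy's program does.

```python
import math
from typing import List, Sequence


def dct_1d(values: Sequence[float], count: int) -> List[float]:
    """First `count` coefficients of an unnormalised DCT-II of `values`."""
    n = len(values)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(values))
        for k in range(count)
    ]


def dct_hash(pixels: Sequence[Sequence[float]], keep: int = 8) -> int:
    """keep*keep-bit hash from the low-frequency DCT block of a square grayscale image."""
    size = len(pixels)
    if size < keep or any(len(row) != size for row in pixels):
        raise ValueError("expected a square block of at least keep x keep pixels")
    # Separable 2-D DCT: transform each row, then each of the first `keep` columns.
    # Only the top-left keep x keep coefficients are computed; the rest is discarded.
    row_coeffs = [dct_1d(row, keep) for row in pixels]
    col_coeffs = [
        dct_1d([row_coeffs[r][c] for r in range(size)], keep) for c in range(keep)
    ]
    # col_coeffs[c][u] is coefficient (u, c); read the block row by row.
    block = [col_coeffs[c][u] for u in range(keep) for c in range(keep)]
    # Threshold against the mean of the block, leaving out the DC term block[0],
    # which mostly reflects overall brightness and would skew the mean.
    mean = sum(block[1:]) / (len(block) - 1)
    bits = 0
    for value in block:
        bits = (bits << 1) | int(value > mean)
    return bits
```

Because only the `keep` x `keep` corner is kept, the input size can be raised (to 256x256, say) without changing the hash length. This naive transform costs roughly `keep` times the number of pixels, so at larger sizes a library routine such as SciPy's `scipy.fft.dctn` would be the practical choice.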

Rotation is harder to handle. You can always take a brute-force approach
by simply rotating the image a couple of times and running the
algorithm on each of the rotated pictures. Image matching is a difficult
problem.
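The brute-force idea can be sketched for the common case of photos turned by a multiple of 90 degrees (portrait versus landscape). This is a minimal sketch on small grayscale grids; the names are hypothetical, and `distance` can be any comparison function, such as a pixel difference or the Hamming distance between hashes computed on each rotated copy. Arbitrary angles would need many more trial rotations.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[int]]


def rotate_90(grid: Sequence[Sequence[int]]) -> Grid:
    """Rotate a square pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def grid_distance(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> int:
    """Sum of absolute pixel differences between two equally sized grids."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def best_rotation(
    reference: Sequence[Sequence[int]],
    candidate: Sequence[Sequence[int]],
    distance: Callable[[Sequence[Sequence[int]], Sequence[Sequence[int]]], float] = grid_distance,
) -> Tuple[int, float]:
    """Try the candidate at 0, 90, 180 and 270 degrees clockwise; return the best angle and score."""
    best_angle, best_score = 0, distance(reference, candidate)
    rotated = [list(row) for row in candidate]
    for angle in (90, 180, 270):
        rotated = rotate_90(rotated)
        score = distance(reference, rotated)
        if score < best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score


print(best_rotation([[1, 2], [3, 4]], [[3, 1], [4, 2]]))  # (270, 0)
```

On full images, Pillow's `Image.rotate(angle, expand=True)` can produce the rotated copies instead; note that it rotates counter-clockwise.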

Thomas Jollans

If Gwenview simply moves/renames the images, is it not enough to compare
the actual files, byte by byte?
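If the files really were only renamed, a byte-for-byte check is enough, and hashing the contents avoids comparing every pair of files. This is a minimal sketch using only the standard library (`hashlib`, `os.walk`); the function names are hypothetical. Grouping by size first means only files of equal length are read and hashed. For a single pair of files, the standard library's `filecmp.cmp(a, b, shallow=False)` does a direct byte-by-byte comparison.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(roots: Iterable[str]) -> List[List[str]]:
    """Return groups of paths (two or more each) whose contents are byte-identical."""
    by_size: Dict[int, List[str]] = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)
    groups: List[List[str]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest: Dict[str, List[str]] = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```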

Dave Angel

If your real problem is identifying a renamed file amongst thousands of
others, why not just compare the metadata? It'll be much faster.

For example, if you only have one camera, the timestamp stored in the
EXIF data would come pretty close to identifying each picture. Some
cameras also store their "shutter release number" in the metadata, which
would be even better.
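Matching on the EXIF timestamp can be sketched as follows. EXIF stores DateTime values as text in the form `YYYY:MM:DD HH:MM:SS`; with a recent Pillow, `Image.open(path).getexif().get(306)` should return the DateTime tag (0x0132) as such a string. The sketch assumes those strings have already been collected into dicts keyed by file name; the function names and the one-second tolerance are hypothetical. The camera-specific shutter release number lives in maker notes whose layout varies by manufacturer, so it is not covered here.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

EXIF_TIME_FORMAT = "%Y:%m:%d %H:%M:%S"  # how EXIF writes DateTime values


def parse_exif_time(value: str) -> datetime:
    """Parse an EXIF DateTime string, ignoring trailing NULs or spaces."""
    return datetime.strptime(value.strip("\x00 "), EXIF_TIME_FORMAT)


def match_by_timestamp(
    camera: Dict[str, str], library: Dict[str, str], tolerance_s: float = 1.0
) -> List[Tuple[str, str]]:
    """Pair camera files with library files whose EXIF times are within tolerance_s seconds."""
    library_times = [(name, parse_exif_time(value)) for name, value in library.items()]
    tolerance = timedelta(seconds=tolerance_s)
    pairs = []
    for camera_name, value in camera.items():
        taken = parse_exif_time(value)
        pairs.extend(
            (camera_name, library_name)
            for library_name, library_taken in library_times
            if abs(library_taken - taken) <= tolerance
        )
    return pairs


camera = {"IMG_0001.JPG": "2010:07:01 12:00:00", "IMG_0002.JPG": "2010:07:01 12:05:30"}
library = {"holiday_01.jpg": "2010:07:01 12:00:00", "beach.jpg": "2010:07:02 09:30:00"}
print(match_by_timestamp(camera, library))  # [('IMG_0001.JPG', 'holiday_01.jpg')]
```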

One concern is whether Gwenview or any of your other utilities discards
the metadata. That would be a big mistake.

Also, if Gwenview has no other features you're counting on, perhaps you
should write your own "move the files from camera to computer" utility.
That's what I did, and it renames and reorganises the files as it goes,
according to my conventions, not someone else's. One reason for the
renaming is that my cameras only use 4 digit numbers, and these recycle
every 10000 images.

DaveA
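A utility of the kind Dave describes might look like the following. It is a hypothetical sketch, not Dave's program: the naming convention (capture time prefixed to the camera's own file name, one folder per day) is an assumption, and the file's modification time stands in for the capture time, which a real tool would rather read from EXIF. Prefixing the time keeps names unique even after a 4-digit camera counter wraps around.

```python
import os
import shutil
from datetime import datetime
from typing import List


def target_name(original: str, taken: datetime) -> str:
    """Prefix the camera's file name with the capture time, e.g. 20100701_120005_IMG_0001.JPG."""
    return taken.strftime("%Y%m%d_%H%M%S_") + original


def import_photos(src_dir: str, dst_dir: str) -> List[str]:
    """Copy every file from src_dir into dst_dir/YYYY-MM-DD/ under a time-prefixed name."""
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue
        # Stand-in for the capture time; the EXIF DateTime tag would be more reliable.
        taken = datetime.fromtimestamp(os.path.getmtime(src))
        day_dir = os.path.join(dst_dir, taken.strftime("%Y-%m-%d"))
        os.makedirs(day_dir, exist_ok=True)
        dst = os.path.join(day_dir, target_name(name, taken))
        if os.path.exists(dst):
            raise FileExistsError(dst)  # never overwrite an earlier import
        shutil.copy2(src, dst)  # copy (not move) and keep the file timestamps
        copied.append(dst)
    return copied
```

Copying rather than moving leaves the camera card untouched until you have checked the result.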

Fulvio

Thomas said:
If Gwenview simply moves/renames the images, is it not enough to compare
the actual files, byte by byte?

For the job at hand I found that Geeqie does it right. On the other hand,
learning some PIL functions is one of my interests.

Fulvio

Kevin said:
If anyone's interested, please check out the source code in the attachment;
any advice is welcome.

I found that it isn't Python 3 code :(

Then the code would have to go into some other program that allows
actions on the pictures that match each other. Am I right?
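Once each picture has a hash, grouping the ones that match is a small step, and the surrounding program can then act on each group (list, move or delete). This is a minimal sketch, assuming the hashes are already computed and stored in a dict keyed by file name; the function name and the 5-bit threshold are hypothetical. It compares every pair, which is fine for a modest collection, and merges matches with a simple union-find so that chains of similar pictures end up in one group.

```python
from itertools import combinations
from typing import Dict, List


def group_similar(hashes: Dict[str, int], max_bits: int = 5) -> List[List[str]]:
    """Group file names whose hashes differ in at most max_bits bits."""
    parent = {name: name for name in hashes}

    def find(name: str) -> str:
        # Follow parent links to the group's representative, shortening the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(hashes, 2):
        if bin(hashes[a] ^ hashes[b]).count("1") <= max_bits:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in hashes:
        groups.setdefault(find(name), []).append(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]


hashes = {"a.jpg": 0b1111_0000, "b.jpg": 0b1111_0001, "c.jpg": 0b0000_1111}
print(group_similar(hashes))  # [['a.jpg', 'b.jpg']]
```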

Fulvio

Dave said:
If your real problem is identifying a renamed file amongst thousands of
others, why not just compare the metadata? it'll be much faster.
That was the original situation; then, to get my hands dirty, I thought
of something more sophisticated.
Some years back there was a program that was brilliant and fast at finding
similar pictures among several thousand of them.
Now I can't recall the program's name, and it would be very interesting
for some experiments of my own.