Finding duplicated photos

Discussion in 'Python' started by TheSaint, Jul 8, 2011.

  1. TheSaint

    TheSaint Guest

    Hello,

    I came across the problem that Gwenview moves the photos from the camera
    memory and renames them, but later I forgot which ones were moved.
    Then I thought about a small Python script, but I stumbled on my
    ignorance of how to do that.

    Can PIL find similar pictures? I was thinking of reducing the photos to
    grayscale and resizing them to the same size; what algorithm should I use
    then? Is PIL able to compare two images?
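    A minimal sketch of the grayscale-and-resize idea, assuming Pillow/PIL for loading (Image.open, convert("L"), resize and getdata are standard PIL calls). The similarity score, a mean absolute per-pixel difference, is an assumption chosen for illustration rather than anything proposed in the thread, and the function names are hypothetical. PIL also ships ImageChops.difference() for building a pixel-wise difference image.

```python
from typing import List, Sequence

Pixels = List[List[int]]


def load_gray_thumbnail(path: str, size: int = 32) -> Pixels:
    """Open an image with PIL/Pillow, convert it to 8-bit grayscale ("L") and
    resize it to size x size. The aspect ratio is deliberately ignored."""
    from PIL import Image  # imported here so the pure-Python part runs without PIL

    with Image.open(path) as img:
        data = list(img.convert("L").resize((size, size)).getdata())
    return [data[row * size:(row + 1) * size] for row in range(size)]


def mean_abs_difference(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> float:
    """Average absolute per-pixel difference of two equally sized grayscale
    thumbnails: 0.0 means identical, 255.0 is the maximum possible."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("thumbnails must have the same dimensions")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count
```

    For example, mean_abs_difference(load_gray_thumbnail("a.jpg"), load_gray_thumbnail("b.jpg")) is 0.0 for identical thumbnails; where to draw the line between "same photo" and "different photo" has to be found by experiment. A plain pixel difference is sensitive to shifts, crops and rotation.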

    --
    goto /dev/null
     
    TheSaint, Jul 8, 2011
    #1

  2. TheSaint

    Billy Mays Guest

    On 07/08/2011 07:29 AM, TheSaint wrote:
    > PIL can find similar pictures. I was thinking to reduce the foto into gray
    > scale and resize them to same size, what algorithm should take place?
    > Is PIL able to compare 2 images?


    I recently wrote a program after reading an article
    (http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html)
    and using the DCT method it proposes. It worked surprisingly well, even
    with just the 64-bit hash it produces.
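    A minimal sketch of a DCT hash along the lines of the linked article, assuming the steps are: shrink to a 32x32 grayscale image, take the 2-D DCT, keep the top-left 8x8 block of coefficients, and set one bit per coefficient according to whether it lies above the block's average (computed without the first, DC, coefficient). That yields a 64-bit hash, and two hashes are compared by counting differing bits (Hamming distance). The function names are hypothetical, the DCT is written out in plain Python rather than taken from a library, and load_gray() assumes Pillow/PIL is installed and squashes the image to a square.

```python
import math
from typing import List, Sequence


def dct_low_block(pixels: Sequence[Sequence[float]], keep: int = 8) -> List[List[float]]:
    """Top-left keep x keep block of the (unnormalised) 2-D DCT-II of a square image."""
    n = len(pixels)
    # cos_table[u][x] = cos((2x + 1) * u * pi / (2n)), only for the frequencies we keep.
    cos_table = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
                 for u in range(keep)]
    # 1-D DCT along each row (horizontal frequencies u).
    rows = [[sum(row[x] * cos_table[u][x] for x in range(n)) for u in range(keep)]
            for row in pixels]
    # 1-D DCT down each column of the row results (vertical frequencies v).
    return [[sum(rows[y][u] * cos_table[v][y] for y in range(n)) for u in range(keep)]
            for v in range(keep)]


def dct_hash(pixels: Sequence[Sequence[float]], keep: int = 8) -> int:
    """keep*keep-bit hash: one bit per low-frequency coefficient, set if above the average."""
    coeffs = [c for row in dct_low_block(pixels, keep) for c in row]
    # Average over the block, excluding coeffs[0] (the DC term, i.e. overall brightness).
    mean = sum(coeffs[1:]) / (len(coeffs) - 1)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | int(c > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests visually similar images."""
    return bin(a ^ b).count("1")


def load_gray(path: str, size: int = 32) -> List[List[int]]:
    """Grayscale size x size pixel matrix via PIL/Pillow (assumed to be installed)."""
    from PIL import Image

    with Image.open(path) as img:
        data = list(img.convert("L").resize((size, size)).getdata())
    return [data[r * size:(r + 1) * size] for r in range(size)]
```

    Usage would be something like hamming(dct_hash(load_gray("a.jpg")), dct_hash(load_gray("b.jpg"))); a smaller distance means more similar images, and the cut-off has to be tuned on real photos. Only the 8x8 low-frequency coefficients are computed, since the rest would be discarded anyway.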

    --
    Bill
     
    Billy Mays, Jul 8, 2011
    #2

  3. TheSaint

    TheSaint Guest

    Billy Mays wrote:

    > It worked surprisingly well even
    > with just the 64bit hash it produces.
    >

    I'd say that comparing two images reduced to 32x32 pixels is too little
    to tell whether one of two portraits has a smile that the other lacks.
    I think my idea and your suggestion are similar, but I'd like to scale
    pictures to no less than 256x256 pixels.
    I'd also like to cover the wider case where the comparison involves a
    rotated image.

    --
    goto /dev/null
     
    TheSaint, Jul 8, 2011
    #3
  4. TheSaint

    Billy Mays Guest

    On 07/08/2011 10:14 AM, TheSaint wrote:
    > Billy Mays wrote:
    >
    >> It worked surprisingly well even
    >> with just the 64bit hash it produces.
    >>

    > I'd say that comparing 2 images reduced upto 32x32 bit seems too little to
    > find if one of the 2 portrait has a smile referred to the other.
    > I think it's about that mine and your suggestion are similar, but I'd like
    > to scale pictures not less than 256x256 pixel.
    > Also to take a wider case which the comparison involve a rotated image.
    >


    Originally I thought the same thing. It turns out that doing a DCT on
    an image typically moves the most important (low-frequency) data to the
    top-left corner of the output. This means that most of the other data in
    the output can be thrown away, since most of it doesn't significantly
    affect the image. The 32x32 size is arbitrary; you can make it any square
    block you want.
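    To see the "important data moves to the top-left corner" effect numerically, the sketch below computes an orthonormal 2-D DCT (so the total energy of the pixels is preserved) and reports what fraction of that energy lands in the top-left keep x keep block. The function names are hypothetical; the smooth gradient in the usage note is only a stand-in for a real photo.

```python
import math
from typing import List, Sequence


def dct_matrix(n: int) -> List[List[float]]:
    """Orthonormal DCT-II basis: row u is the u-th cosine basis vector of length n."""
    mat = []
    for u in range(n):
        scale = math.sqrt((1 if u == 0 else 2) / n)
        mat.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)])
    return mat


def dct_2d(pixels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Orthonormal 2-D DCT-II of a square block, i.e. C @ P @ C^T."""
    n = len(pixels)
    c = dct_matrix(n)
    # Rows first: tmp[y][u] = sum_x pixels[y][x] * c[u][x]
    tmp = [[sum(row[x] * c[u][x] for x in range(n)) for u in range(n)] for row in pixels]
    # Then columns: out[v][u] = sum_y c[v][y] * tmp[y][u]
    return [[sum(c[v][y] * tmp[y][u] for y in range(n)) for u in range(n)] for v in range(n)]


def low_freq_energy_fraction(pixels: Sequence[Sequence[float]], keep: int = 8) -> float:
    """Share of the total energy held by the top-left keep x keep DCT coefficients."""
    coeffs = dct_2d(pixels)
    total = sum(c * c for row in coeffs for c in row)
    low = sum(coeffs[v][u] ** 2 for v in range(keep) for u in range(keep))
    return low / total
```

    For a smooth horizontal gradient, low_freq_energy_fraction([[x for x in range(32)] for _ in range(32)]) is above 0.95, while keep=32 returns 1.0 up to rounding. Photos are less tidy than a gradient, but natural images usually concentrate much of their energy at low frequencies too (the same property JPEG compression relies on), which is why discarding the high frequencies costs little.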

    Rotation is harder to detect. You can always take a brute-force approach:
    rotate the image a few times and run the algorithm on each of the rotated
    copies. Image matching is a difficult problem.
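    A brute-force rotation check as described above, sketched under two assumptions: the only rotations of interest are quarter turns (as with portrait/landscape photos), and the images are already reduced to square grayscale matrices. A simple average hash (one bit per pixel, set when the pixel is brighter than the mean) stands in for the DCT hash; any function returning an integer hash can be passed as hash_fn. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Pixels = List[List[int]]


def rotate90(pixels: Sequence[Sequence[int]]) -> Pixels:
    """Rotate a square pixel matrix 90 degrees clockwise."""
    return [list(row) for row in zip(*reversed(pixels))]


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Stand-in hash: one bit per pixel, set when the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def best_rotation_match(
    a: Sequence[Sequence[int]],
    b: Sequence[Sequence[int]],
    hash_fn: Callable[[Sequence[Sequence[int]]], int] = average_hash,
) -> Tuple[int, int]:
    """Compare a with b rotated by 0, 90, 180 and 270 degrees clockwise and
    return (smallest Hamming distance, rotation of b in degrees)."""
    ha = hash_fn(a)
    rotated = [list(row) for row in b]
    best = (hamming(ha, hash_fn(rotated)), 0)
    for quarter_turns in range(1, 4):
        rotated = rotate90(rotated)
        distance = hamming(ha, hash_fn(rotated))
        if distance < best[0]:
            best = (distance, quarter_turns * 90)
    return best
```

    best_rotation_match(a, b) returns the smallest distance together with the clockwise rotation applied to b to get it. Arbitrary angles would need many more trial rotations, which is what makes rotation the hard part.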

    --
    Bill
     
    Billy Mays, Jul 8, 2011
    #4
  5. On 07/08/2011 01:29 PM, TheSaint wrote:
    > I came across the problem that Gwenview moves the photo from the camera
    > memory by renaming them, but later I forgot which where moved.


    If Gwenview simply moves/renames the images, is it not enough to compare
    the actual files, byte by byte?
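    If the files were only renamed, their bytes are unchanged, so a content hash finds them without looking at pixels at all. A sketch using only the standard library: group files by size first (cheap), then by SHA-256 of their contents. The function names are hypothetical; for a literal byte-by-byte confirmation of a single pair, filecmp.cmp(a, b, shallow=False) can be used.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(*roots: str) -> List[List[str]]:
    """Groups of paths (under any of the roots) whose contents are identical."""
    by_size: Dict[int, List[str]] = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    groups: List[List[str]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have an exact duplicate
        by_digest: Dict[str, List[str]] = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        groups.extend(sorted(p) for p in by_digest.values() if len(p) > 1)
    return groups
```

    Calling find_exact_duplicates("/path/to/camera", "/path/to/pictures") returns lists of paths with identical contents. Any tool that rewrites a file (rotating, re-saving, editing metadata) changes its bytes, and then only a perceptual comparison will match it.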
     
    Thomas Jollans, Jul 8, 2011
    #5
  6. TheSaint

    Dave Angel Guest

    TheSaint wrote:
    > I came across the problem that Gwenview moves the photo from the camera
    > memory by renaming them, but later I forgot which where moved.

    If your real problem is identifying a renamed file amongst thousands of
    others, why not just compare the metadata? It'll be much faster.

    For example, if you only have one camera, the timestamp stored in the
    EXIF data would be pretty close. Some cameras also store their "shutter
    release number" in the metadata, which would be even better.
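    A sketch of matching by metadata instead of pixels. It assumes a Pillow version that provides Image.getexif(), and that the camera writes EXIF tag 306 (DateTime) in the main IFD; some cameras record the capture time only as DateTimeOriginal, which sits in a separate EXIF sub-IFD, so the key function may need adjusting. The function names are hypothetical, and key_fn can be swapped for any other reader, such as one for a shutter count.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional


def exif_datetime(path: str) -> Optional[str]:
    """EXIF DateTime (tag 306, main IFD) as stored by the camera,
    e.g. "2011:07:08 07:29:00". Assumes Pillow with Image.getexif()."""
    from PIL import Image  # imported lazily so the matching logic runs without Pillow

    with Image.open(path) as img:
        return img.getexif().get(306)


def match_by_metadata(
    originals: Iterable[str],
    candidates: Iterable[str],
    key_fn: Callable[[str], Optional[Hashable]] = exif_datetime,
) -> Dict[str, List[str]]:
    """For each original file, list the candidate files with an equal metadata key."""
    index: Dict[Hashable, List[str]] = {}
    for path in candidates:
        key = key_fn(path)
        if key is not None:  # files without the tag can't be matched this way
            index.setdefault(key, []).append(path)

    result: Dict[str, List[str]] = {}
    for path in originals:
        key = key_fn(path)
        result[path] = list(index.get(key, [])) if key is not None else []
    return result
```

    Several shots taken within the same second (bursts) share a timestamp, so a match is a list rather than a single file; a shutter count, where the camera records one, would make a better key.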

    One concern is whether Gwenview or any of your other utilities discards
    the metadata. That would be a big mistake.

    Also, if Gwenview has no other features you're counting on, perhaps you
    should write your own "move the files from camera to computer" utility.
    That's what I did, and it renames and reorganises the files as it goes,
    according to my conventions, not someone else's. One reason for the
    renaming is that my cameras only use 4-digit numbers, and these recycle
    every 10000 images.
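    A sketch of such an import utility, with a naming convention that is purely hypothetical (Dave's actual conventions aren't given): each file is copied into the destination with its capture time prefixed to the camera's own name, so a recycled counter (IMG_0001 after IMG_9999) can't collide with an earlier file. taken_fn is assumed to return the capture time, for example parsed from EXIF; when it returns None, the file's modification time is used instead. The function names are hypothetical.

```python
import os
import shutil
from datetime import datetime
from typing import Callable, List, Optional


def target_name(original_name: str, taken: datetime) -> str:
    """Hypothetical convention: "YYYY-MM-DD_HHMMSS_" + the camera's file name."""
    return taken.strftime("%Y-%m-%d_%H%M%S_") + original_name


def import_photos(
    src_dir: str,
    dest_dir: str,
    taken_fn: Callable[[str], Optional[datetime]],
) -> List[str]:
    """Copy every file in src_dir to dest_dir under its new name; return the new paths."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue  # skip sub-directories on the card
        # Prefer the capture time; fall back to the file's modification time.
        taken = taken_fn(src) or datetime.fromtimestamp(os.path.getmtime(src))
        dest = os.path.join(dest_dir, target_name(name, taken))
        if os.path.exists(dest):
            continue  # already imported on an earlier run; never overwrite
        shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
        copied.append(dest)
    return copied
```

    The sketch copies rather than moves and never overwrites, so running it twice is harmless and the card can be cleared once the copies are checked. EXIF data lives inside the JPEG file, so it travels with the copy.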

    DaveA
     
    Dave Angel, Jul 8, 2011
    #6
  7. TheSaint

    Fulvio Guest

    Thomas Jollans wrote:

    > If Gwenview simply moves/renames the images, is it not enough to compare
    > the actual files, byte by byte?


    For the job at hand I found that Geeqie does it right. On the other hand,
    learning some PIL functions is one of my interests.
     
    Fulvio, Jul 11, 2011
    #7
  8. TheSaint

    Fulvio Guest

    Kevin Zhang wrote:

    > If anyone's interested, pleas checkout the source code in the attachment
    > and welcome any advise.


    I found that it isn't Python 3 code :(

    Then the code would have to go into some other program that allows
    actions on the pictures that match each other. Am I right?
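    One way to prepare results for "some other program" is to turn pairwise matches into groups. A sketch, assuming every picture already has an integer hash (for instance from a DCT hash) and that a hypothetical threshold max_distance on the Hamming distance decides what counts as matching. Pairs are merged with a small union-find, so if A matches B and B matches C, all three end up in one group. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def group_similar(hashes: Dict[str, int], max_distance: int) -> List[List[str]]:
    """Group file names whose hashes are within max_distance bits of each other
    (directly or through a chain of matches). Singletons are left out."""
    parent = {name: name for name in hashes}

    def find(name: str) -> str:
        # Follow parent links to the group's representative, halving the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(hashes, 2):
        if bin(hashes[a] ^ hashes[b]).count("1") <= max_distance:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for name in hashes:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

    Comparing every pair is quadratic, which is workable for a few thousand pictures but gets slow beyond that. The returned groups (lists of file names) can then be handed to whatever acts on them: review, move or delete.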
     
    Fulvio, Jul 11, 2011
    #8
  9. TheSaint

    Fulvio Guest

    Dave Angel wrote:

    > If your real problem is identifying a renamed file amongst thousands of
    > others, why not just compare the metadata? it'll be much faster.
    >

    That was the initial situation; then, to get my hands dirty, I thought of
    something more sophisticated.
    Some years back there was a program that was brilliant and fast at finding
    similar pictures among several thousand of them.
    Now I can't recall the program's name, and it would be very interesting
    for some experiments of my own.
     
    Fulvio, Jul 11, 2011
    #9

Similar Threads
  1. Tom
    Replies:
    6
    Views:
    399
  2. RC
    Replies:
    2
    Views:
    505
    John M Deal
    Nov 24, 2004
  3. Replies:
    0
    Views:
    438
  4. Isabel Puigdevall

    Duplicated records in a report from two tables

    Isabel Puigdevall, Apr 4, 2005, in forum: ASP .Net
    Replies:
    1
    Views:
    373
    Isabel Puigdevall
    Apr 6, 2005
  5. Andre
    Replies:
    0
    Views:
    404
    Andre
    Oct 27, 2005