Multimedia

Luc The Perverse

I have used applications that search through image files, look at the
pictures, cross-reference them, and generate a list of images they consider
similar. Sometimes they would be way off, but for the most part there
seemed to be far more false positives than false negatives.

I was wondering if there is open source technology to do this (or similar
operations for audio and video).

It was trivial to write a client that searches for files of identical
size and then runs a checksum on them to see if they match.
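
For reference, the size-then-checksum approach could be sketched roughly as follows. This is only an illustration, not the client mentioned above: the function names, the choice of SHA-256, and the chunk size are all assumptions.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under root that share both size and checksum."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a byte-identical twin
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Grouping by size first means a checksum is only computed for files that could possibly match. This finds byte-identical files only; it says nothing about images that merely look alike.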
 
Roedy Green

Luc said:
I have used applications that search through image files, look at the
pictures, cross-reference them, and generate a list of images they consider
similar. Sometimes they would be way off, but for the most part there
seemed to be far more false positives than false negatives.

I have often wanted a search engine that could find pictures "similar"
to a given one.

Here is an idea for a reasonably simple though slow algorithm.

You take, say, an 8x8 grid square and overlay it on the image. You then
look for the most complicated square. I define "complicated" as the
square with the most distinct colours. For tie breaking, you sum the
contrast between all adjacent pixel pairs.

You then shift the grid one pixel right and repeat. Then you repeat,
shifting the grid down, until you have covered all possible grids over
the image. Eventually you will discover the most complicated grid
square. This square, considered as a binary number, is what you index
images by. Feel free to optimise the algorithm.
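
A minimal sketch of the idea, under some assumptions the description leaves open: the image is a list of rows of (r, g, b) tuples, "contrast" is taken to be the sum of absolute channel differences between horizontally and vertically adjacent pixels, and ties on both measures go to the first square found. The function names are made up for illustration.

```python
def complexity(image, top, left, size):
    """Return (distinct colours, summed adjacent contrast) for one square."""
    colours = set()
    contrast = 0
    for y in range(top, top + size):
        for x in range(left, left + size):
            p = image[y][x]
            colours.add(p)
            if x + 1 < left + size:  # right-hand neighbour inside the square
                contrast += sum(abs(a - b) for a, b in zip(p, image[y][x + 1]))
            if y + 1 < top + size:  # lower neighbour inside the square
                contrast += sum(abs(a - b) for a, b in zip(p, image[y + 1][x]))
    return len(colours), contrast


def most_complicated_square(image, size=8):
    """Try every offset; return (top, left) of the highest-scoring square."""
    height, width = len(image), len(image[0])
    best_score, best_pos = None, None
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            # Tuples compare colour count first, then contrast (the tie-break).
            score = complexity(image, top, left, size)
            if best_score is None or score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos


def square_key(image, top, left, size=8):
    """Flatten the square's pixels into bytes, usable as an index key."""
    return bytes(c for y in range(top, top + size)
                 for x in range(left, left + size)
                 for c in image[y][x])
```

Trying every offset is what makes the key survive cropping: as long as the most complicated square is still inside the cropped image, both versions yield the same bytes. The brute-force scan is slow, roughly width x height x size² pixel visits.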

This is mainly to help you find duplicate images that were cropped, for
example to detect copyright violations. It won't find similar images with
doctored contrast, colours, or scaling, ditto images that have been changed
from jpg to png etc.

And unfortunately, it won't help you to find pictures of blue spotted
tree frogs.
 
Luc The Perverse

Roedy Green said:
I have often wanted a search engine that could find pictures "similar"
to a given one.

Here is an idea for a reasonably simple though slow algorithm.

You take, say, an 8x8 grid square and overlay it on the image. You then
look for the most complicated square. I define "complicated" as the
square with the most distinct colours. For tie breaking, you sum the
contrast between all adjacent pixel pairs.

You then shift the grid one pixel right and repeat. Then you repeat,
shifting the grid down, until you have covered all possible grids over
the image. Eventually you will discover the most complicated grid
square. This square, considered as a binary number, is what you index
images by. Feel free to optimise the algorithm.

This is mainly to help you find duplicate images that were cropped, for
example to detect copyright violations. It won't find similar images with
doctored contrast, colours, or scaling, ditto images that have been changed
from jpg to png etc.

And unfortunately, it won't help you to find pictures of blue spotted
tree frogs.


You lost me with the blue spotted tree frogs part.

You've got me thinking though. I think edge detection may be the key.

If there isn't something out there that does this already (which I don't
believe) then there should be!
 
Andrew Thompson

Luc said:
You lost me with the blue spotted tree frogs part.

Such 'pixel comparison' methods cannot determine high level information.
- 'Blue'(ish/predominantly) - maybe.
- 'Spotted' - much harder.
- Tree frogs - "I've cracked machine vision! Where's my Nobel prize?"
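
To illustrate why the first item rates a "maybe": a crude colour test is easy to write, though it knows nothing about spots or frogs. The sketch below assumes pixels arrive as (r, g, b) tuples; the function name and the 1.2 margin are arbitrary choices made for illustration.

```python
def is_predominantly_blue(pixels, margin=1.2):
    """Guess whether the average colour leans blue.

    pixels: iterable of (r, g, b) tuples; margin is an arbitrary threshold.
    """
    total_r = total_g = total_b = count = 0
    for r, g, b in pixels:
        total_r += r
        total_g += g
        total_b += b
        count += 1
    if count == 0:
        return False
    # Comparing channel sums is equivalent to comparing channel means.
    return total_b > margin * total_r and total_b > margin * total_g
```

A blue sky, a swimming pool, and a blue frog would all pass, which is the point: low-level statistics do not say what is in the picture.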
 
Roedy Green

Andrew said:
Such 'pixel comparison' methods cannot determine high level information.
- 'Blue'(ish/predominantly) - maybe.
- 'Spotted' - much harder.
- Tree frogs - "I've cracked machine vision! Where's my Nobel prize?"

Does there exist some standard for encoding picture content inside the
image in a way that Google, for example, could find photos of George
Bush with Harriet Miers in the 1970s in Albania, or
installing a xxxx cartridge in a yyyy printer?
 
Andrew Thompson

Roedy said:
Does there exist some standard for encoding picture content inside the
image in a way that Google for example could find photos of George
Bush with Harriet Miers in the 1970s in Albania.

No. JPGs (as well as a variety of other image formats) have
the capacity to store extra information in images (mostly related
to the specifics of the 'shot' - F-Stop, timing..), and some can also
store the type of meaningful information you are referring to.

Unfortunately, it seems that there is little standardisation or
commonality in the format of this information even
for a single image type, let alone across image types in general.
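
As a rough illustration of the point for JPG: a JPEG file is a sequence of marked segments, and several unrelated metadata containers (Exif, XMP, the Photoshop block that often carries IPTC captions, plain comments) can sit side by side in the same file. The sketch below lists them using only the standard library. The function name and the prefix table are choices made for this example, the table is not exhaustive, and the parser ignores padding bytes and other corner cases of the format.

```python
import struct

# Well-known identifiers that prefix metadata payloads (not exhaustive).
KNOWN_PREFIXES = {
    b"JFIF\x00": "JFIF header",
    b"Exif\x00\x00": "Exif (camera settings)",
    b"http://ns.adobe.com/xap/1.0/\x00": "XMP (descriptive metadata)",
    b"Photoshop 3.0\x00": "Photoshop block (often carries IPTC captions)",
}


def list_metadata_segments(data):
    """Return (marker name, description) for APPn/COM segments before SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # malformed stream; this sketch simply stops
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xFE:
            found.append(("COM", "comment: " + payload.decode("latin-1")))
        elif 0xE0 <= marker <= 0xEF:
            name = "APP%d" % (marker - 0xE0)
            for prefix, description in KNOWN_PREFIXES.items():
                if payload.startswith(prefix):
                    found.append((name, description))
                    break
            else:
                found.append((name, "unrecognised payload"))
        pos += 2 + length
    return found
```

Usage would be along the lines of `list_metadata_segments(open("photo.jpg", "rb").read())`, where `photo.jpg` is a placeholder name. Each container has its own internal layout, so finding the segment is only the first step towards reading a caption out of it.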

I was just thinking of the process that Google uses to
pull up images before I saw your post, actually, and was
about to point out the problem becomes a lot simpler with
meaningful file names like ..

'blue_spotted_tree_frog.jpg'

;-)
...or
Installing a xxxx cartridge in a yyyy printer?

.....huh? Are we still talking about images?
 
Roedy Green

Andrew said:
Such 'pixel comparison' methods cannot determine high level information.
- 'Blue'(ish/predominantly) - maybe.
- 'Spotted' - much harder.
- Tree frogs - "I've cracked machine vision! Where's my Nobel prize?"

I really enjoyed that post. It is a joy to see someone pack so much
into so few words.
 
Andrew Thompson

Roedy said:
I really enjoyed that post. It is a joy to see someone pack so much
into so few words.

I was thinking much the same of your original statement!

[ ..and as an added bonus, I like frogs. :) ]
 
