Texture/image similarity with ruby? (computer vision)

Discussion in 'Ruby' started by Casimir, Jun 19, 2008.

  1. Casimir

    Casimir Guest

    Still no luck so far, still looking. Anyone?
    Casimir, Jun 19, 2008
    #1

  2. Axel Etzold

    Axel Etzold Guest

    Hi --

    -------- Original Message --------
    > Date: Thu, 19 Jun 2008 20:33:43 +0900
    > From: Casimir <>
    > To:
    > Subject: Texture/image similarity with ruby? (computer vision)


    > Still no luck so far, still looking. Anyone?


    well, what does similarity of two images/textures mean for you ?

    Given a (hopefully large) set of images, you could divide it into a
    training set and a set of images to be classified into sets of mutually
    similar images. One way to perform both the training and the
    classification is to use Support Vector Machines. I found these Ruby
    bindings to a library providing them via a Google search; I haven't
    used them yet:

    http://sourceforge.net/projects/rubysvm/
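
    Just to illustrate the train/classify split (this is not rubysvm's
    API, which I haven't used either; it's a much simpler nearest-mean
    classifier in plain Ruby, with made-up feature vectors -- a real SVM
    would instead learn a maximum-margin separating hyperplane):

      # Nearest-mean classifier: a crude stand-in for the SVM workflow.
      # train() learns one mean vector per label; classify() picks the
      # label whose mean is closest (squared Euclidean distance).
      def train(examples)
        examples.each_with_object({}) do |(label, vecs), means|
          dim = vecs.first.size
          means[label] = (0...dim).map { |i| vecs.sum { |v| v[i] } / vecs.size.to_f }
        end
      end

      def classify(means, vec)
        means.min_by { |_, m| m.zip(vec).sum { |a, b| (a - b)**2 } }.first
      end

      training = {
        bright: [[0.9, 0.8], [0.8, 0.9]],   # hypothetical feature vectors
        dark:   [[0.1, 0.2], [0.2, 0.1]]
      }
      means = train(training)
      puts classify(means, [0.85, 0.75])    # => bright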

    Most probably, you'll need to read out image information at the pixel
    level at some point. ImageMagick is a very powerful library for this,
    and Tim Hunter provides a wonderfully rich Ruby binding to it: RMagick.
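
    For example, reading pixel data with RMagick looks roughly like this
    (a minimal sketch; 'photo.jpg' is a placeholder):

      require 'rmagick'   # on older installs: require 'RMagick'

      img = Magick::Image.read('photo.jpg').first
      # get_pixels returns Magick::Pixel objects for the given rectangle
      pixels = img.get_pixels(0, 0, img.columns, img.rows)
      reds = pixels.map(&:red)   # one channel's intensities
      puts reds.take(10).inspect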

    Psychologists who have conducted image similarity studies compare many
    different measures and filtering methods, e.g., here:

    @misc{karl-perception,
      author = "Dirk Neumann and Karl Gegenfurtner",
      title = "Perception Based Image Retrieval",
      url = "citeseer.ist.psu.edu/635475.html"
    }

    At some point, most of these methods use a (discrete) Fourier transform
    of some of the image information and compare the results of the
    transforms of two images to assess their similarity.
    You could use Ruby-GSL, or NArray with fftw3, to perform that.
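
    A minimal sketch with NArray and the fftw3 gem; I'm going from memory
    on the FFTW3.fft signature (second argument -1 = forward transform),
    so check the gem's documentation:

      require 'narray'
      require 'fftw3'

      a = NArray.float(16, 16).indgen!        # toy 16x16 "image"
      spectrum = FFTW3.fft(a, -1)             # forward 2-D FFT (complex NArray)
      power = (spectrum * spectrum.conj).real # power spectrum
      puts power[0, 0]                        # DC component

    Comparing power spectra rather than raw pixels also sidesteps small
    translations, since the power spectrum discards phase.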

    Best regards,

    Axel

    Axel Etzold, Jun 19, 2008
    #2

  3. M. Edward (Ed) Borasky

    M. Edward (Ed) Borasky Guest

    Casimir wrote:
    > Still no luck so far, still looking. Anyone?
    >
    >


    First step -- define the image processing algorithms that do what you want.

    Second step -- find a library or package that implements them.

    Third step -- if it's ImageMagick, use RMagick. If not, you may have to
    write a wrapper to use it from Ruby.
    M. Edward (Ed) Borasky, Jun 19, 2008
    #3
  4. Ron Fox

    Ron Fox Guest

    Uh, well, this is a >HARD< research project. I would not expect an
    answer here ... unless there are some people in computer vision
    floating around here. If you have a local college with a _good_
    computer science dept. that has someone who does computer vision,
    make an appointment with them.
    Be prepared to describe exactly what you want to achieve. Be
    prepared for it not to be easy.

    Ron.

    Casimir wrote:
    > Still no luck so far, still looking. Anyone?



    --
    Ron Fox
    NSCL
    Michigan State University
    East Lansing, MI 48824-1321
    Ron Fox, Jun 20, 2008
    #4
  5. Casimir

    Casimir Guest

    Axel Etzold wrote on Thu, 19 Jun 2008 20:33:43 +0900

    > well, what does similarity of two images/textures mean for you ?


    Perceptual similarity as a human subject would experience it. Let me
    expand (=ramble) on this:

    At the moment I am focusing on the following problem: given any single
    photograph and a random set of photos (20-100), which of the random set
    is most similar, perceptually, to the target photo?

    I have made some simple tests that divide an image into color-channel
    components and a luminosity channel, downsample the channels into 16x16
    arrays, and calculate the difference between the target photo and each
    of the random ones. Difference hashing, I think it's called?
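
    A sketch of this kind of test with RMagick (luminosity channel only,
    naive sum of absolute differences; file names are placeholders):

      require 'rmagick'

      # Downsample to 16x16 and export one intensity value per pixel.
      def thumb_values(path)
        img = Magick::Image.read(path).first
        img.resize(16, 16).export_pixels(0, 0, 16, 16, 'I')
      end

      # Lower score = more similar under this (flawed) measure.
      def difference(a_path, b_path)
        a = thumb_values(a_path)
        b = thumb_values(b_path)
        a.zip(b).sum { |x, y| (x - y).abs }
      end

      # difference('target.jpg', 'candidate.jpg')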

    The results are rather confusing. Most of the time perceived similarity
    (as I experience it) does not exist, even if statistically the images
    might be similar.

    IOW, a random b/w noise image gets rather good similarity scores whilst
    having zero morphological resemblance to the target photo.

    But anyone who has studied this subject even superficially already knows
    this - hashing won't produce the same kind of similarity we experience
    with our senses.

    > Given a (hopefully large) set of images, you could divide it into a
    > training set and a set of images to be classified into sets of mutually
    > similar images. One way to perform both the training and the classification
    > is using Support Vector Machines. I found these Ruby bindings to a library
    > providing these by a Google search, haven't used them yet :
    >
    > http://sourceforge.net/projects/rubysvm/


    This is one of the possible avenues. Gabor feature sets used this kind
    of approach, I believe. Thank you for that link.

    But I don't see this as the most interesting approach. The particular
    problem I wrestle with has rather small sets, and training would have
    to be performed for every photo. Not really practical or efficient.
    Maybe training a NN is the only way to really do this.

    > Most probably, you'll need to read out image information at the pixel level at some point.
    > Imagemagick is a very powerful library to do this, and Tim Hunter provides
    > a wonderfully rich Ruby binding to it: RMagick.


    I have been using RMagick, AND have been looking at HornetsEye, which
    might be useful at some point, but at the moment doesn't cater to this
    specific case.

    > Psychologists who have conducted image similarity studies compare many different
    > measures and filtering methods,e.g., here:
    >
    > @misc{karl-perception,
    >   author = "Dirk Neumann and Karl Gegenfurtner",
    >   title = "Perception Based Image Retrieval",
    >   url = "citeseer.ist.psu.edu/635475.html"
    > }


    Too bad that service isn't available, as I have not read that article.

    > At some point, most of these methods use a (discrete) Fourier transform of some of the
    > image information and compare the results of the transforms of two images to assess their
    > similarity.
    > You could use Ruby-GSL, or Narray with fftw3 to perform that.


    NArray is what I was planning to use once there's a working model for
    the similarity comparison in place.

    E. Borasky - Yes, I have the tools down, but clearly don't have a
    suitable perceptual image comparison algo yet.

    Thanks also to Ron Fox for pointing out it's not going to be easy. :)

    So, I guess I could use the rest of my lunch break to elaborate on the
    question.

    What kind of computational algorithm would provide a perceptual
    similarity score, rating or hash of some kind between two or more
    images that would best match the way humans perceive similarity?

    I guess one would need two distinct classifications: similarity of
    morphological appearance (features, shapes, ? in image) and similarity
    of the colors (of the areas).

    I assume there is no answer in existence to this question - at least
    none that I know of.
    Casimir, Jun 23, 2008
    #5
  6. Kyle Schmitt

    Kyle Schmitt Guest

    On Mon, Jun 23, 2008 at 6:08 AM, Casimir <> wrote:
    > Axel Etzold wrote on Thu, 19 Jun 2008 20:33:43 +0900
    >
    >> well, what does similarity of two images/textures mean for you ?

    >
    > Perceptual similarity as a human subject would experience it. Let me expand
    > (=ramble) on this:
    >
    > At the moment I am focusing on the following problem: given any single
    > photograph and a random set of photos (20-100), which of the random set is
    > most similar, perceptually, to the target photo?
    >
    > I have made some simple tests that divide an image into color-channel
    > components and a luminosity channel, downsample the channels into 16x16
    > arrays, and calculate the difference between the target photo and each of
    > the random ones. Difference hashing, I think it's called?
    >
    > The results are rather confusing. Most of the time perceived similarity (as
    > I experience it) does not exist, even if statistically the images might be
    > similar.

    ....
    > What kind of computational algorithm would provide a perceptual similarity
    > score, rating or hash of some kind between two or more images that would
    > best match the way humans perceive similarity?


    If I'm recalling properly, a lot of research has had success with
    histogram comparison. More or less, it judges images to be similar
    based on similar color composition. That sounds silly, but in reality
    it's quite effective for a first go at it.
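
    A rough cut at that in Ruby with RMagick might look like the sketch
    below: bucket each pixel's RGB into a small fixed set of bins, then
    score two normalised histograms by intersection (1.0 = identical
    color composition). The bucket count is an arbitrary choice.

      require 'rmagick'

      # Normalised color histogram over buckets^3 fixed RGB bins.
      def histogram(path, buckets = 4)
        img = Magick::Image.read(path).first
        total = (img.columns * img.rows).to_f
        h = Hash.new(0)
        max = Magick::QuantumRange + 1
        img.each_pixel do |px, _x, _y|
          key = [px.red, px.green, px.blue].map { |v| v * buckets / max }
          h[key] += 1 / total
        end
        h
      end

      # Histogram intersection: sum of the smaller bin value per bin.
      def intersection(h1, h2)
        (h1.keys | h2.keys).sum { |k| [h1[k], h2[k]].min }
      end

      # intersection(histogram('a.jpg'), histogram('b.jpg'))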

    You mention a training library, and since a good deal of computer
    vision is still crossing over from AI, an important question becomes:
    what type of system will be doing the learning?



    --Kyle
    Kyle Schmitt, Jun 23, 2008
    #6
  7. Axel Etzold

    Axel Etzold Guest

    -------- Original Message --------
    > Date: Mon, 23 Jun 2008 20:08:13 +0900
    > From: Casimir <>
    > To:
    > Subject: Re: Texture/image similarity with ruby? (computer vision)


    Dear Casimir,


    As scientific folklore has it, if you haven't got the answer, it's
    because you did not understand the question ...

    > Axel Etzold wrote on Thu, 19 Jun 2008 20:33:43 +0900
    >
    > > well, what does similarity of two images/textures mean for you ?

    >
    > Perceptual similarity as a human subject would experience it.


    I think this statement is not (yet) useful, since there is no such
    thing as a unique human perception of an image. A subset of images in
    your collection might be similar because they all show postcard
    destinations (the skyline of Manhattan, Big Ben in London, a camel in
    front of the Pyramids in Cairo, a koala in Australia, a Tyrolean
    yodeller), but some of the same images show living beings whereas
    others show architecture, and are therefore dissimilar to a
    psychologist or to a civil engineer ...

    I know of many neuroscientists who train monkeys, usually for at least
    six months, more often for a year, on some very specific task to test
    a particular hypothesis about early vision. (It takes this long
    because you can't explain to the monkey in English what to do; they'll
    tell you.) These monkeys then become experts at some very weird tasks:
    theoretical computer science holds some nice, albeit non-constructive
    (in the sense of their mathematical proof style), theorems about
    computability using neural networks that can be confirmed
    experimentally this way.

    So there is a description problem that will possibly lead you down
    very different paths on the same particular set of data, depending on
    whether you are a psychologist or a civil engineer or a tourist
    manager in the example above.
    On the other hand, there is the anecdote of a journalist questioning a
    judge at the Supreme Court of the United States, asking him to define
    obscenity (right after a decision the journalist didn't like):

    Journalist: How do you define obscenity?
    Judge: I recognise it when I see it.

    So even if, given the preliminary information at breakfast that we
    need to book the summer holidays, we might all classify the data from
    the example above the way the tourist manager would as we sit in the
    travel agent's shop, there is still the question of how to do that.

    >
    > At the moment I am focusing on the following problem: given any single
    > photograph and a random set of photos (20-100), which of the random set
    > is most similar, perceptually, to the target photo?
    >
    > I have made some simple tests that divide an image into color-channel
    > components and a luminosity channel, downsample the channels into
    > 16x16 arrays, and calculate the difference between the target photo
    > and each of the random ones. Difference hashing, I think it's called?
    >
    > The results are rather confusing. Most of the time perceived similarity
    > (as I experience it) does not exist, even if statistically the images
    > might be similar.


    This shows that apparently, the human visual system does not use this kind
    of downsampling for similarity measures....


    > ... [Support Vector Machines] is one of the possible avenues. Gabor feature sets used this kind
    > of approach, I believe.


    Gabor feature sets (or wavelets in general), support vector machines
    and, I'd claim, any feasible and plausible approach to modelling
    vision will need some kind of data reduction: reconstruct an image,
    with loss, using basis functions; use only a few, but still so many
    that the loss is not (annoyingly) perceptible. The number of
    parameters you need to do that will then suffice for the
    classification as well.
    Interpersonal mileage may vary ... there are some individuals who
    cannot distinguish Bordeaux from Bourgogne when tasting either, but
    then they don't write wine connoisseur guides.


    > But, I don't see this as the most interesting approach. The particular
    > problem I wrestle with has rather small sets, and training would have to
    > be performed for every photo.


    That's in contradiction to your statement above ... if there is such a
    thing as human similarity perception, it should hold for all cases
    brought in front of you.
    In other words, if you are a judge at a supreme court, you should not
    confuse and irritate people by making decisions that seem totally
    contradictory all the time ... even those who do not like the
    decisions will still appreciate internal consistency ...

    > Maybe training a NN is the only way to really do this.


    Neural networks can be seen as hardware for support vector machines or
    for Gabor patches etc. (even if this is a Gedankenexperiment most of
    the time).

    > E. Borasky - Yes, I have the tools down, but clearly don't have a
    > suitable perceptual image comparison algo yet.


    I would take an iterative approach: classify the pictures that look
    similar to you by hand, and then plot the classes as point clusters,
    in different colours, in the space of variables which you think seem
    useful (colour, hue, ...).

    Can you easily draw a line to separate the dots of different classes?
    Then an SVM in that particular space delivers what you want.
    Otherwise, introducing a new dimension might give you the opportunity
    to introduce a separating plane, or you might still need another one.
    Finally, there's the Hahn-Banach theorem from functional analysis,
    which shows how to separate disjoint convex sets of points using a
    hyperplane (i.e., something linear and therefore computationally
    well-behaved).
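
    Concretely, the plotting step can be as simple as dumping a couple of
    features per hand-classified image to CSV and eyeballing the scatter
    plot. The two features below (mean and standard deviation of
    intensity) are arbitrary examples; substitute whatever seems plausible:

      require 'rmagick'

      # Two crude features per image: mean intensity and its spread.
      def features(path)
        img  = Magick::Image.read(path).first
        vals = img.export_pixels(0, 0, img.columns, img.rows, 'I')
        mean = vals.sum.to_f / vals.size
        var  = vals.sum { |v| (v - mean)**2 } / vals.size
        [mean, Math.sqrt(var)]
      end

      # Hand-made classes; file names are placeholders.
      classes = { 'sunset' => ['s1.jpg', 's2.jpg'], 'forest' => ['f1.jpg', 'f2.jpg'] }
      classes.each do |label, paths|
        paths.each do |p|
          row = [label] + features(p)
          puts row.join(',')   # plot this, and look for separable clusters
        end
      end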



    >
    > Thanks also to Ron Fox for pointing out it's not going to be easy. :)
    >
    > So, I guess I could use the rest of my lunch break to elaborate on the
    > question.
    >
    > What kind of computational algorithm would provide a perceptual
    > similarity score, rating or hash of some kind between two or more
    > images that would best match the way humans perceive similarity?
    >
    > I guess one would need two distinct classifications: similarity of
    > morphological appearance (features, shapes, ? in image) and similarity
    > of the colors (of the areas).


    A good idea would be to have a look at a good introduction to wavelets ...

    http://www.gvsu.edu/math/wavelets/tutorials.htm

    You can use wavelets as basis functions, fit parameters for your data and then try to find
    separable parameter sets for your different classes.
    Think of something plausible when choosing ... as Ron Fox said, it's not easy ...
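
    To make that concrete: one level of the 1-D Haar transform is just
    pairwise averages and differences; the averages carry the coarse
    signal, the differences the detail. In plain Ruby (assuming an
    even-length input):

      # One level of the (unnormalised) 1-D Haar wavelet transform.
      # For a 2-D image, apply it to the rows and then to the columns;
      # keeping only the largest coefficients gives the data reduction
      # discussed above.
      def haar_step(signal)
        averages = []
        details  = []
        signal.each_slice(2) do |a, b|
          averages << (a + b) / 2.0
          details  << (a - b) / 2.0
        end
        [averages, details]
      end

      coarse, detail = haar_step([9.0, 7.0, 3.0, 5.0])
      p coarse   # => [8.0, 4.0]
      p detail   # => [1.0, -1.0]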

    Best regards,

    Axel
    Axel Etzold, Jun 23, 2008
    #7
  8. Chris Hulan

    Chris Hulan Guest

    Chris Hulan, Jun 23, 2008
    #8
  9. ara.t.howard

    ara.t.howard Guest

    On Jun 19, 2008, at 5:33 AM, Casimir wrote:

    > Still no luck so far, still looking. Anyone?


    a good toolkit:

    http://kogs-www.informatik.uni-hamburg.de/~koethe/vigra/

    and another

    http://www.itk.org/HTML/WatershedSegmentationExample.html



    i've used both a lot, shelling out from ruby mostly; unfortunately
    both require hacking in c++. insight has ruby bindings though (via
    swig).


    i think you are going to want to do some filtering in the frequency
    domain focusing on edge images (shapes) and colours, possibly warping
    to achieve a best fit. whatever you do is going to be entirely custom
    - as everything in computer vision tends to be, despite the plethora
    of off-the-shelf tools to start from.

    kind regards.

    ps.

    i nearly always end up stuffing pixels into narrays and doing some
    manipulation there - if nothing else to refine algorithms in ruby
    before laboriously re-coding in c++ or c. for this it is useful to
    have representative images that are small and as close to raw as
    possible.
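
    e.g., something like this gets pixels into an narray (a sketch;
    'small.png' is a placeholder):

      require 'rmagick'
      require 'narray'

      img  = Magick::Image.read('small.png').first
      # export_pixels returns a flat ruby array of intensities;
      # NArray.to_na wraps it and reshape! recovers the 2-d layout
      flat = img.export_pixels(0, 0, img.columns, img.rows, 'I')
      na   = NArray.to_na(flat).reshape!(img.columns, img.rows)
      p na.shape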



    a @ http://codeforpeople.com/
    --
    we can deny everything, except that we have the possibility of being
    better. simply reflect on that.
    h.h. the 14th dalai lama
    ara.t.howard, Jun 23, 2008
    #9

Similar Threads
  1. Computer Vision (Luca Paganelli, May 21, 2004, in forum: Java) - Replies: 0, Views: 351
  2. (untitled) (wallge, Dec 8, 2006) - Replies: 0, Views: 401
  3. ImSim: Image Similarity (n00m, Mar 5, 2011, in forum: Python) - Replies: 24, Views: 871
  4. Low Vision Stylesheet for Ruby-Lang (John W. Long, Dec 6, 2006, in forum: Ruby) - Replies: 2, Views: 83