addendum Re: working with images (PIL ?)

Poppy

I've put together some code to demonstrate my goal. It works, but because it
loops pixel by pixel it is rather slow.

import Image

def check_whitespace():
    im = Image.open("\\\\server\\vol\\temp\\image.jpg")

    size = im.size

    whitePixCount = 0
    # for loops replace "while i in range(...)", which rebuilt the range list on every pass
    for i in range(size[1]):
        for j in range(size[0]):
            p1 = im.getpixel((j,i))
            if p1 == (255, 255, 255):
                whitePixCount = whitePixCount + 1
                if whitePixCount >= 492804: ## 25% of a 1404 x 1404 image: (1404 * 1404) / 4
                    return "image no good"

    print whitePixCount

    return "image is good"

print check_whitespace()
 
Ivan Illarionov

Poppy said:
I need to write a program to examine images (JPG) and determine how much
of their area is whitespace. We need to throw a returned image out of the
dataset we're working with if too much of it is whitespace. I've been
examining the Python Imaging Library and cannot determine whether it offers
the needed functionality. Does anyone have suggestions of other image
libraries I should be looking at, or know whether PIL can do what I need?

PIL will do this: use the histogram() method of Image objects.

-- Ivan
 
Ken Starks

As others have said, PIL has the 'histogram' method to do most of the
work. However, as histogram works on each band separately, you have
a bit of preliminary programming first to combine them.

The ImageChops darker method is one easy-to-understand way (done twice),
but there are lots of alternatives, I am sure.


# ------------------------------------

import Image
import ImageChops

Im = Image.open("\\\\server\\vol\\temp\\image.jpg")
R,G,B = Im.split()

Result=ImageChops.darker(R,G)
Result=ImageChops.darker(Result,B)

WhiteArea=Result.histogram()[0]
TotalArea=Im.size[0] * Im.size[1]
PercentageWhite = (WhiteArea * 100.0)/TotalArea




Ken Starks

Oops. I meant:

WhiteArea=Result.histogram()[255]

of course, not

WhiteArea=Result.histogram()[0]
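
For anyone on a current install, the corrected approach might be sketched as below. This is only a sketch under stated assumptions, not Ken's exact code: it assumes Python 3 with Pillow (so `from PIL import Image, ImageChops` instead of the old `import Image`), and the function name `white_fraction` and its `threshold` parameter are made up for illustration.

```python
from PIL import Image, ImageChops


def white_fraction(im, threshold=255):
    """Return the fraction of pixels whose R, G and B are all >= threshold.

    white_fraction and threshold are hypothetical names for this sketch.
    """
    r, g, b = im.convert("RGB").split()
    # darker() keeps the per-pixel minimum, so after two passes each pixel
    # holds min(R, G, B); it is 255 only where the pixel is pure white.
    darkest = ImageChops.darker(ImageChops.darker(r, g), b)
    hist = darkest.histogram()  # 256 bins for a single-band image
    white = sum(hist[threshold:])
    return white / float(im.size[0] * im.size[1])


if __name__ == "__main__":
    # Synthetic 4 x 4 white image with one yellow pixel, instead of a real file.
    demo = Image.new("RGB", (4, 4), (255, 255, 255))
    demo.putpixel((0, 0), (255, 255, 0))
    print(white_fraction(demo))
```

Since JPEG compression tends to turn pure white into near-white, a threshold slightly below 255 may be more useful in practice; that is a guess about the data, not something established in this thread.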

Poppy

Thanks. Since posting, I figured out how to interpret the histogram results,
which seems to be the consensus in the responses. I wrote a check-image
program and have been periodically running it against a folder where I keep
a copy of the images we use for production. My method right now checks what
we send for errors, but it is not preventive.

I also determined that whitespace is not the only issue: any color that
dominates is a problem. I'm considering rewriting the code below to set up
bins, so that if the combined count of neighboring colors exceeds the
threshold, the image is rejected. I have examples where half the image
appears black but actually varies throughout.

Since my image is RGB, I'm looping through a 768-element list.

Zach-

import Image, os


def check_image(file):

    try:
        im = Image.open(file)
    except:
        return "Can't open file %s " % file

    imData = im.histogram()
    i = 0
    for ea in imData:
        if ea > ((im.size[0] * im.size[1]) / 4): ## 25% of image size
            return "bad image %s - %s element num is %s " % (file, ea, str(i))
        i = i + 1

    return "good image %s, image size is %s" % (file, im.size)


def main(dir):
    data = ""
    try:
        files = os.listdir(dir)
        for ea in files:
            data = data + str(check_image(os.path.join(dir,ea))) + "\n"
    except:
        return "Can't get files in %s" % dir
    return data

print main("\\\\host\\path\\to\\image_folder\\")


Ken Starks

I would still be concerned that you are checking against the percentage
of the 768 bins returned by the histogram method. Two pixels of
widely different colour end up in the same bin, so long as just ONE
of the Red, Green, or Blue components is equal.

So, for example, colours (2, 27, 200) and (200, 27, 2) are both in
the bin for G=27. But they are very different colours.

There are actually 256 * 256 * 256 colours, but I don't suppose
you want that many bins!

What you want is a much smaller number of bins, with pixels
of 'close' colours (whatever that means) put into the same bin.

What 'close' means for colours, is quite a difficult thing, and
the consensus is that using the three RGB coordinates is not
as good as certain other colour spaces.

You could use the ImageOps.posterize method to reduce the number of
colours in the image, but whether 'close' colours end up together,
I don't know.

You might try the PIL special interest group (SIG) 'image-sig'

http://mail.python.org/mailman/listinfo/image-sig

(If you want to know exactly how many unique colours an image actually
has, load the image into the 'GIMP' assuming you have it,
and go to :

Menubar --> Filters --> Colours --> Colourcube analysis...

)
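
One way to get a small number of joint colour bins might be sketched as below. It is only a sketch: it assumes Python 3 with Pillow, it uses posterize followed by `getcolors()` so that each bin is a whole (R, G, B) triple rather than a single band value, and the name `dominant_colour_fraction` is made up for illustration. It does not settle whether 'close' colours in a perceptual sense end up together; it only groups colours that share the same top bits in every channel.

```python
from PIL import Image, ImageOps


def dominant_colour_fraction(im, bits=2):
    """Return (fraction, colour) for the most common posterized colour.

    dominant_colour_fraction is a hypothetical name for this sketch.
    """
    rgb = im.convert("RGB")
    # Keep only the top `bits` bits of each channel, which leaves at most
    # (2 ** bits) ** 3 distinct colours (64 when bits=2).
    poster = ImageOps.posterize(rgb, bits)
    # getcolors() returns a list of (count, colour) pairs, or None if the
    # image has more than maxcolors colours, which cannot happen here.
    counts = poster.getcolors(maxcolors=(2 ** bits) ** 3)
    biggest, colour = max(counts, key=lambda pair: pair[0])
    return biggest / float(rgb.size[0] * rgb.size[1]), colour


if __name__ == "__main__":
    # 10 pixels of (2, 27, 200) and 6 pixels of (200, 27, 2).
    demo = Image.new("RGB", (4, 4), (2, 27, 200))
    demo.paste((200, 27, 2), (0, 0, 3, 2))
    print(dominant_colour_fraction(demo))
```

With `bits=2` the two example colours (2, 27, 200) and (200, 27, 2) fall into different bins, (0, 0, 192) and (192, 0, 0), even though they share G=27.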
 
Ivan Illarionov

I suggest:
1. convert to greyscale
2. posterize
3. check the max(im.histogram())

-- Ivan
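
The three steps above could be sketched roughly like this. It is a sketch, not Ivan's code: it assumes Python 3 with Pillow, and the function name `dominant_grey_fraction`, the default of 3 bits, and the 25% cut-off in the demo are illustrative choices only.

```python
from PIL import Image, ImageOps


def dominant_grey_fraction(im, bits=3):
    """Return the share of pixels in the fullest posterized grey level.

    dominant_grey_fraction is a hypothetical name for this sketch.
    """
    grey = im.convert("L")                    # 1. convert to greyscale
    poster = ImageOps.posterize(grey, bits)   # 2. posterize to 2 ** bits levels
    hist = poster.histogram()                 # 256 bins, most of them empty
    total = grey.size[0] * grey.size[1]
    return max(hist) / float(total)           # 3. check max(im.histogram())


if __name__ == "__main__":
    # Right half white, left half "black" that actually varies from 0 to 31.
    demo = Image.new("RGB", (8, 8), (255, 255, 255))
    for x in range(4):
        for y in range(8):
            v = x * 8 + y
            demo.putpixel((x, y), (v, v, v))
    share = dominant_grey_fraction(demo)
    print("bad image" if share > 0.25 else "good image", share)
```

Posterizing first means the varying near-black pixels in the demo all collapse into one level, so the check catches an image that only appears to be half black.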
 
Poppy

Thank you. You and the other responders have given me something to consider.
I understand the concept of the posterize idea and will be experimenting
with that.

I wanted to respond to the statement below, which is true; however, I believe
the histogram sums the values, so both colors would be in bin 229. I say
that because all white is in histogram element 767, while black is in
element 0. Does anyone on this list know how to interpret the histogram
list? Your point is still valid regardless of my interpretation.

Ken Starks said:
So, for example, colours (2, 27, 200) and (200, 27, 2) are both in
the bin for G=27. But they are very different colours.

I will be checking out the SIG for PIL; thanks for that pointer.
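
On how to read the list: for an RGB image, histogram() returns the three per-band histograms concatenated, so elements 0-255 count red values, 256-511 green values and 512-767 blue values. Nothing is summed across bands. A small check, assuming Python 3 with Pillow and a synthetic single-colour image instead of a real file, might look like this:

```python
from PIL import Image

# A 4 x 4 image filled with the single colour (2, 27, 200): 16 pixels.
im = Image.new("RGB", (4, 4), (2, 27, 200))
hist = im.histogram()

# Slice the 768-element list into its three 256-bin bands.
red, green, blue = hist[0:256], hist[256:512], hist[512:768]

print(len(hist))   # 768
print(red[2])      # 16, i.e. hist[2]
print(green[27])   # 16, i.e. hist[256 + 27]
print(blue[200])   # 16, i.e. hist[512 + 200]
```

Read this way, a pure white pixel adds one count each to elements 255, 511 and 767, and a pure black pixel adds one each to elements 0, 256 and 512, which is consistent with white showing up at element 767 and black at element 0.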


