Excavations in tech land: Reverse engineering the Ising model

I was trying to solve a problem in image segmentation. The key idea in segmentation is to group pixels which seem spatially similar, owing to properties such as color, region and body. There is no single correct segmentation; it is a matter of perception. Perception of images varies by individual: what I perceive in an image, you may not. It's like looking at a Monet or a Chagall; my idea of a masterpiece may be lost on you.

So, starting from this, how does one end up with the Ising model? One starts by applying a Markov random field to the image and running a max-flow on the image graph to get the segmentation. Standard fare for combinatorial optimization problems, some say.

My introduction to Markov processes came from an NLP standpoint, and pretty informally at that (understanding the process rather than reading a whole bunch of literature). I understood that the conditional inference scheme depends only on the previous state(s) and not on all the past states. With this understanding I set out to solve this problem (maybe I needed better accoutrements, but this too shall pass).

How does one group anything? The general consensus is to put all similar things in a heap and call it a group. The question then arises: "how do you define similarity in images?" This led me down the path of wave-particle duality, although we only see images as composed of particles (aka pixels). It took me a little while to understand that what I was solving was not a computer science question but a physics question. I imagined there should be a force applied to the pixels to make them conform to some shape, in essence changing their basic property (read: color intensity value). Think of a child grouping his toys into Jedi and dark forces for an epic battle: he pushes all the dark ones into a group. The act of "pushing" is a force, and the same concept applies to the image. Aha, but there is a catch: you cannot move the pixels spatially the way the little boy moves his toys. A little confusing, I agree.

So then, the story thus far looks good, but we still don't have a solution. Now I can use the Markov knowledge from a previous class. Imagine a bunch of patches or markers hanging in the air above the ground, and your image is painted for you by Monet. For simplicity, let's consider 2 patches, white and black. If we have a background and a foreground in an image, let's say the white patches are attracted to the background and the black patches to the foreground. Now if we can define an attraction force between the pixel and the patch, we are home scot-free. Herein comes the spatial arrangement problem: pixels close to each other should have the same patch. (I agree with this notion in a general sense, but not completely; occlusion and a few other factors like spatial incoherence, which can be seen in illusions, make me a little uncomfortable with this generalization.)

Steaming ahead with this idea, we now have 2 forces: one between the patch (in the air) and the pixel in the image on the ground, and another between the pixels on the ground, making them stick close together. (I also think of a force between the patches and regions, making up a complete field, but that too shall pass for now.)

With this scheme of things, we can define a function to either minimize or maximize over all the variables we have. The norm is to choose an energy function here, and the general idea is to minimize or maximize that energy:

E(x) = Ed + Es.

Here Ed is the data term and Es is the smoothing term. The data term contributes energy based only on the pixel data, with no neighborhood function. The smoothness term uses the neighborhood function and encourages the same labeling for pixels within the same region.
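To make the two terms concrete for myself, here is a minimal sketch of how E(x) could be computed for a two-label image. The squared-difference data cost, the constant penalty for disagreeing neighbors, the function names and the toy 3x3 image are all my own assumptions for illustration; the tutorial linked in this post may define the terms differently.

```python
# Toy energy E(x) = Ed + Es for a two-label segmentation.
# All names and cost choices here are illustrative assumptions.

def data_energy(image, labels, means):
    """Ed: squared distance between each pixel and the intensity assumed for its label."""
    total = 0.0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            total += (value - means[labels[r][c]]) ** 2
    return total

def smoothness_energy(labels, weight=1.0):
    """Es: pay `weight` for every 4-connected neighbor pair with different labels."""
    rows, cols = len(labels), len(labels[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                total += weight  # right neighbor disagrees
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                total += weight  # lower neighbor disagrees
    return total

def total_energy(image, labels, means, weight=1.0):
    return data_energy(image, labels, means) + smoothness_energy(labels, weight)

# Dark background on the left, bright foreground on the right.
image = [[0.1, 0.2, 0.9],
         [0.0, 0.8, 1.0],
         [0.1, 0.9, 0.9]]
means = {0: 0.0, 1: 1.0}                     # assumed intensity per label
split = [[0, 0, 1], [0, 1, 1], [0, 1, 1]]    # follows the intensities
flat = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]     # everything is background

print(total_energy(image, split, means, weight=0.5))
print(total_energy(image, flat, means, weight=0.5))
```

The labeling that follows the intensities pays a little smoothness cost along the boundary but far less data cost, so its total energy comes out lower than the all-background labeling.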

You can read more about this scheme here - http://www.cis.upenn.edu/~jshi/GraphTutorial/

So now that we have that down, we need to equate the force to energy. If we assume a gravitational force, it will be an attraction: we get a higher force for a correct labeling, and we can maximize the energy function to get a stable labeling of the pixels. On the other hand, if we consider a magnetic field, where like particles repel and unlike ones do not, we need to minimize the function to get a correct labeling of the image, which is the accepted norm in vision.
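As a rough illustration of the minimization view, here is a sketch that lowers such an energy greedily, one pixel at a time. This is iterated conditional modes (ICM), a simple stand-in I picked for brevity, not the max-flow approach mentioned earlier, and it can get stuck in local minima. The cost functions, names and toy image are again assumptions of mine.

```python
# Iterated conditional modes (ICM): greedy, pixel-by-pixel energy minimization.
# A simple stand-in for max-flow; names and costs are illustrative assumptions.

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected grid

def local_cost(image, labels, r, c, label, means, weight):
    """Energy contributed by pixel (r, c) if it took `label`."""
    cost = (image[r][c] - means[label]) ** 2            # data term
    for dr, dc in NEIGHBORS:
        nr, nc = r + dr, c + dc
        inside = 0 <= nr < len(image) and 0 <= nc < len(image[0])
        if inside and labels[nr][nc] != label:
            cost += weight                              # smoothness term
    return cost

def icm_segment(image, means, weight=1.0, max_sweeps=10):
    # Start from the labeling that minimizes the data term alone.
    labels = [[min(means, key=lambda k: (v - means[k]) ** 2) for v in row]
              for row in image]
    for _ in range(max_sweeps):
        changed = False
        for r in range(len(image)):
            for c in range(len(image[0])):
                current = labels[r][c]
                costs = {k: local_cost(image, labels, r, c, k, means, weight)
                         for k in means}
                best = min(costs, key=costs.get)
                # Switch only on a strict improvement, so the energy never rises.
                if costs[best] < costs[current]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # no pixel wants to change: a local minimum
    return labels

# A dark patch with one noisy, brighter pixel in the middle.
noisy = [[0.1, 0.1, 0.1],
         [0.1, 0.6, 0.1],
         [0.1, 0.1, 0.1]]
means = {0: 0.0, 1: 1.0}

print(icm_segment(noisy, means, weight=0.5))  # neighbors pull the center to label 0
print(icm_segment(noisy, means, weight=0.0))  # data term only: center keeps label 1
```

With the smoothness weight switched on, the four neighbors outvote the noisy center pixel; with the weight at zero, each pixel only listens to its own intensity.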

The concepts here are grossly oversimplified to help me understand the workings and make intuitive sense of the process underlying the Markov field. As with everything, the disclaimer: I do not know everything and I can get things wrong.

Saturday, March 31, 2012
