Sunday, February 5, 2012

Intro to vision - Images and representation

I am dabbling in computer vision and it helps if I can gather my thoughts. (This field is not as simple as I imagined it to be.) There are a lot of background and information which goes missing , and I hate reading the unabridged versions of text. So, here goes:

Vision is primarily concerned with images, video can be treated as a series of images at a certain rate or frequency. There is the 2-D and 3-D representation of images. I shall not talk about 3-D since my knowledge on that is limited at best.

What does 2-D representation mean? Backtracking a little, every image is represented by a series of pixels, these pixels store some information. In a 2-D representation, the pixels are spread across 2 dimensions, lets say X and Y (for convenience). This is generally represented in matrix form, where the top-left pixel of the image is the top-left value in the matrix.

What are these pixel thingies? These "pixels" can hold different types of information. One common type in vision is the RGB space, where we represent the color space of the image with the composites of red, blue and green. Each of these are called the channels of the color space. Ideally, we use 8 bits for each color, thereby 2^8 = 255 possible values for each component. You can also represent the image using 18-bit color, 32-bit color and 48-bit color. These various color spaces correspond to different color depths. TrueColor is the 24-bit color, with each channel RGB getting 8 bits.

There are also CMYK, sRGB, scRGB color spaces which are used in HDR (photo enthusiasts alert), but vision folks generally deal with RGB or BGR(reverse RGB representation) color spaces.

For more on the color representation look here and dig up more - http://en.wikipedia.org/wiki/Color_depth

Although this color representation is great it does not give us an intuitive idea of color. If for example I want orange, its R=255,G=142,B=13. If I want a darker orange its R=210,G=65,B=0. So, it becomes hard to play with these colors and we use the HSI (Hue, Saturation and Intensity model). This is seen in displays and color pickers across the digital image applications. We can easily get a darker or lighter color by varying the saturation making our lives much easier.

It is possible to convert from one space to another, and most libraries have api's for easy conversion.

So, if we see a image, we can imagine it as a matrix containing 24bits in each matrix cell to represent a pixel on the image.

No comments:

Post a Comment