Excavations in tech land: 2012

Wednesday, December 5, 2012

Python threading

There are two modules to python threading, and many caveats:

Python has two modules, thread and threading - thread is lower level, while threading is higher level. This means threading uses threads or sort thereof. Threading provides a lot of sychronization, event and timing primitives which will help with a multi threaded program.

The important thing to note however is the GLOBAL INTERPRETER LOCK, which essentially means there can be only one interpreter thread running at any given time. Any thread you create is going to be run in time slices. This limits the functionality of threads in that they are no longer suitable for parallel/concurrent execution. The reason for python to do this is to provide for safety in multi-threaded systems. Many structures can be assumed to be thread safe and run inside threads without many problems. Communication between threads also become easy. The problem arises when you need a very real time concurrent execution to make use of the immense power of your multicore hardware architecture. For this python has the multiprocessing module. Of course, now you may ask; much like me, what is the multithreading module good for then? It is good if you have multiple slow methods which wait on process such as I/O. Instead of busy waiting you can yield control during the time slice to another thread which will continue crunching the numbers for you, while the slow I/O is trudging behind on its work. This is all the threading module is designed to achieve in my view. Of course you could use it as a flow control mechanism, but I believe that would be overkill.

The caveats are many, mostly with the thread module (straight from python docs).

Threads interact strangely with interrupts: the KeyboardInterrupt exception will be received by an arbitrary thread. (When the signal module is available, interrupts always go to the main thread.)
Calling sys.exit() or raising the SystemExit exception is equivalent to calling thread.exit().
Not all built-in functions that may block waiting for I/O allow other threads to run. (The most popular ones (time.sleep(), file.read(), select.select()) work as expected.)
It is not possible to interrupt the acquire() method on a lock — the KeyboardInterrupt exception will happen after the lock has been acquired.
When the main thread exits, it is system defined whether the other threads survive. On SGI IRIX using the native thread implementation, they survive. On most other systems, they are killed without executingtry ... finally clauses or executing object destructors.
When the main thread exits, it does not do any of its usual cleanup (except that try ... finally clauses are honored), and the standard I/O files are not flushed.

Of course, the most important to remember are if you capturing keyboard interrupt, you have no idea which thread receives the signal. Some block waiting calls don't allow other threads to run (I learnt this the hard way).

With this in mind, thread away your python.

Thursday, October 25, 2012

thou shall ask but not receive

Dynamic frequency scaling or CPU throttling is the buzzword for chipset geeks. It is implemented in most hardware processors and supported by most modern OS's. The intel version of this is called speedstep and amd calls it powernow. The idea is simple dynamically increase or decrease the cpu frequency thereby reducing the power usage and heat requirements leading to power savings.

Of course, in linux we have the complete control over all aspects of computing and you can control your cpu speed with cpufrequtils.

Run the following to install cpufrequtils:

$sudo apt-get install cpufrequtils

The next step is to learn about cpufrequtils, it has two programs: cpufreq-info and cpufreq-set. Just run cpufreq-info to get all the info you need on the cpu frequency settings. The important things to note are the governor values and the available step frequencies. Then you can just run the cpufreq-set to set it at the desired frequency. The painful thing is you have to set it for a given cpu at a time.
This should set the cpus to a powersave mode:

$sudo cpufreq-set -c 0 -g powersave
$sudo cpufreq-set -c 1 -g powersave
$sudo cpufreq-set -c 2 -g powersave
$sudo cpufreq-set -c 3 -g powersave

In case you have more cores as specified by cpufreq-info set all of them. If you want to get more performance, say because you are running a flash player. (I know they just suck the battery life!!). If you want performance just set the governor option -g to performance. You can also look at the scaling percentages of cpus with the cpufreq-info, in case you see your cpu underperforming. Happy hacking!!!

Sunday, September 30, 2012

All these numbers and what they mean

There are innumerable facts which are indecipherable and paradoxes which make up facets unexplored in terriotories unscaped. I recently happened to talk about the two envelope exchange paradox and how we define the expected return value. The sample space defined can make up many numbers giving me a higher expected return to always make a change in the envelope. This problem is a paradox, because once you made the change and if you were offered the chance again, we would make the switch back. Of course, the problem is innane if we consider only the two envelopes.

To make things interesting, lets say you chose the other envelope, without actually knowing the values quantitaively. Now instead of on the next trial, being offered the same two envelopes, you are offered a new envelope and posed the same argument, would this change your behaviour. The fact is that it is still the same problem, but you somehow think you can deduce the "objective payoff" better.

So what do the numbers mean, the numbers don't exactly mean what they are. It is embedded in a deep sense in the way you see it and understand it. Dabbling in the science of data and visualizations, I have realized there are many ways to look at the same numbers to infer different mechanims and parameters. The objective function bias is very subjective and is inherently fixed in the representation of the problem.

Perhaps, in the future, my bias will be neutral and I will be able to see through the indecipherable facts and paradoxes.

Saturday, September 22, 2012

Deciphering the stats

So, I have been busy the past few months with an experiment in measuring attentional drift. The majority of the time was spent in data analysis. Coming from a computer science, more specifically a c++/python background, I had to get used to the R way of doing things.

I figured I need to learn R and decided to undertake the due process of wrecking my own mind with the absurdities of yet another language. I know I am not an expert at c++ or python, but I feel I can use it to my advantage and organize my coda (pieces of code). I am of course borrowing the phrase from the musical theory, which is definitely more artful than coding. Although the underpinning complexities and structures you may find are similar and the expressive nature of the coda is at hands of an able artist.

After many goof ups and fall downs I now feel I am in a place to talk about R and build an understanding of how it works and why it works. The primary draw of R is the innumerable number of packages you have to accomplish tasks statistical in nature. R is primarily a statistical language and should be used for such purposes.

To install R:

sudo echo "deb http://ftp.ussg.iu.edu/CRAN/bin/linux/ubuntu precise/" >> /etc/apt/sources.list
sudo apt-get udpate
sudo apt-get install r-base r-base-dev

As a first step to the introduction to R I will establish a very nice protocol, I use emacs and ESS. These two together have made my life a lot easier. I also split my sources and write all code in functions. I have to start using objects, hopefully sooner to organize my code.

I shall describe how to install ESS and the basic run in ESS:

To intall ESS just run

$ sudo apt-get intall ess

Once you have installed ESS, go to your emacs and type

M-x R

This should bring up a prompt which asks you for the starting data directory, you can enter the path to the directory you want to work from or just work from the current directory.

You can of course always change the working directory using setwd(path). To see all the files in a directory just say list.files(path). You can include a R source file into the shell by using source(filename).

I will write about installing and using libraries in my next post. Happy coding !!

Saturday, July 21, 2012

Memoize in python and html5

Memoization is a concept from algorithms which finds heavy use in dynamic programming. It is an optimization technique which gains time at the cost of space. So yes, this is an useful technique to know and implement in your algorithms.

The key idea is to have a data store which stores values computed and reuses them to compute the current iteration. The equation is particularly of the form

T(n) = T(n-k1) operator T(n-k2),

In case of the fibonacci series, f(n) = f(n-1) + f(n-2). If you see a computation of this form, you can readily apply a DP technique. Memoization provides a good infrastructure for writing such computations.

The implementation in python is really simple. Apply a decorator, hash the input parameter and if the function is called on the same input just return the already computed value. This is of course assuming all else is stationary and there are no side effects in the program.

Here is a small example from python which is from http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize

import collections
import functools

class memoized(object):
   '''Decorator. Caches a function's return value each time it is called.
   If called later with the same arguments, the cached value is returned
   (not reevaluated).
   '''
   def __init__(self, func):
      self.func = func
      self.cache = {}
   def __call__(self, *args):
      if not isinstance(args, collections.Hashable):
         # uncacheable. a list, for instance.
         # better to not cache than blow up.
         return self.func(*args)
      if args in self.cache:
         return self.cache[args]
      else:
         value = self.func(*args)
         self.cache[args] = value
         return value
   def __repr__(self):
      '''Return the function's docstring.'''
      return self.func.__doc__
   def __get__(self, obj, objtype):
      '''Support instance methods.'''
      return functools.partial(self.__call__, obj)

@memoized
def fibonacci(n):
   "Return the nth fibonacci number."
   if n in (0, 1):
      return n
   return fibonacci(n-1) + fibonacci(n-2)

print fibonacci(12)

This is pretty neat if you ask me, saves you a lot of time when dealing with computations.

Now the fun part, HTML5 provides localStorage which is basically a dictionary into which you can write values and read back across page refreshes, AKA semi-persistence. This is sought to be a sort of replacement to cookies and most of the computation is done client side. So you could fetch data from the server and store it here, so you are paying for the network latency everytime.

The access usage goes somewhat like this

if(localStorage){
    if(localStorage['myvalue1'])
        return localStorage['myvalue1']
    else
        localStorage['myvalue1'] = value1

This is readily accessible structure, and should dramatically improve your performance. But as with anything web, there are security concerns. This is visible to anyone or anything with machine access, so

1. Encrypt your keys.
2. Don't store anything extra sensitive here.
3. Ask the user before doing this. Its only fair.

Happy memoization.

Sunday, July 15, 2012

Dependency Injection

Figuring out dependency injection manually is hard, I mean it really really hard. linkedIn has a library for depenecy injection https://github.com/Jakobo/inject. This is really great, but considering I don't care much about doing a require for others to load its not my cake.

underscore.js has unique features _bind to

Thursday, July 5, 2012

Understanding currying

As functional paradigms are increasingly finding ways into my non-functional programs, I felt it was time to upgrade my skills.

I have been dabbling with haskell lately and will update some functional concepts.

Currying: What is it?

Currying is parameter application to functions. In c++, we can have arguments passed to a function.

int foo(type par1, type par2, ...){

    //perform operations on the par's

}

This is really great, but in functional programming paradigm; every function takes only a single argument. Oh no! what are we to do? Well here is where currying comes to the rescue. If you pass a function multiple parameters in functional programming languages, such as scheme and haskell, they perform currying of the function.

How it works

The function takes the first parameter, then another function is applied to the 2nd parameter.

>let x3 x y z = x*y*z 

>x3 2 3 4

24

The above function takes 3 parameters and multiplies them together. This is just repeated application of the * function, right? I could have said (x*y)*z.

This is exactly what currying is doing. Internally I look at it like this

func1 = x3 (x)

func2 = func1(y)

func3 = func2(z)

The function is applied to one argument, the applied function is then called on the 2nd argument and so on.

This can be used with partial functions to create simple and intuitive data flow mechanisms.

Wednesday, July 4, 2012

haskell primitives

Haskell the lazy purely functional programming

1. The basic primitives

This is the greatest

Matlab Parfor

In the parallelization sphere of things, matlab is pretty up-to the mark. There are a lot of ways to parallelize in matlab.

The easiest is to create a matlabpool of workers and run you for loops in parallel using parfor. (the upper limit is 12 on a local machine) This is great if you using this on a local machine to prototype some application, but if you do want to deploy your application (compile and make an executable); you run into problems.

Friday, June 29, 2012

Matlab Profiling

Matlab comes built in with a profiler. It is much akin to the profiler in valgrind, only that in true matlab fashion it generates a html report of the profiler.

Profiling can be used to look at specfic runs of functions and the call tree, it generates statistics like the number of times a function was called and how much time a function takes to execute.

This kind of information can provide key insight into blockages, especially on the IO front. For example, in C++ it is better to use "\n" than endl to mark end of lines.endl flushes the buffer and adds consdierable overhead. In matlab, you don't figure out why some functions are slow, you can use profiling to pinpoint the exact line causing the bottleneck. This can be crucial especially if you deal with big datasets and/or heavy computation.

So, if you now have code which you want to profile and say save the stats and measure, simply run the following commands and it will be done.

Saturday, March 31, 2012

Reverse engineering the ising model

I was trying to solve this problem in image segmentation. The key idea in segmentation is to group pixels which seem spatially similar owing to properties of color, region and body. There is no one single segmentation which is correct, it is a matter of perception. Perception of images varies by individual; what I perceive in an image, you may not. Its like looking at monet or chagall, my opinions of masterpiece may be lost on you.

So, starting with this how does end up with the ising model. One starts by applying a markov random field to the image and running a max-flow on the image graph to get the segmentation. Standard fare combinatorial optimization problems some say.

My introduction to markov process is from a NLP standpoint, pretty informally again (understanding the process, and not reading a whole bunch of literature). I understood the conditional inference scheme depended only on the previous state(s) and not all the past states. With this understanding I embarked to solve this problem (maybe I needed better accoutrements, but this too shall pass).

How does one group anything? A general consensus is to put all things similar in a heap and call it a group. The question then arises, "how do you define similarity in images?". This led me down the path of wave particle duality, although we only see images as composed of particles (aka pixels). It took me a little while to understand what I was solving was not a computer science question but a physics question. I imagined there should be a force applied on the pixels to conform them to some shape, in essence changing their basic property (read color intensity value). If you think of a child grouping his toys into jedi and dark forces for a epic battle, he pushes all the dark ones into a group. The act of "pushing" is a force, which is the same concept for the image. Aha, but there is catch, you cannot move the pixels spatially like the little boy. A little confusing I agree.

So then, the story thus far looks good but we still don't have a solution. Now I can use the markov knowledge from a previous class. Imagine a bunch of patches or markers hanging in the air above the ground, and you image is painted for you by monet . For simplicity let's consider 2 patches, white and black. If we have a background and foreground in an image, lets say the white patches are attracted to the background and black patches to the foreground. Now if we can define an attraction force between the pixel and the patch we are home scott free. Herein comes the spatial arrangement problem, pixels close to each other should have the same patches. (I agree with this notion in a general sense, but not completely, occlusion and a few other factors like spatial incoherence, which can be seen in illusions make me a little uncomfortable with this generalization).

Steaming ahead with this idea, we now have 2 forces, one force between the patch (in the air) and the pixel in the image on the ground and another force between the pixels on the ground making them stick close together (I also think of a force between the patches and regions, making up a complete field but that too shall pass for now).

With this scheme of things, we can define a function to to either minimize or maximize over all the variables we have. The norm is to choose a energy function here and the general idea is to have a energy minimization or maximization,

E(x) = Ed + Es.

Here Ed is the data term and Es the smoothing term. The data term makes makes up some energy but only the pixel data and no neighborhood function. The smoothness term uses the neighborhood function and ensures we get the same labeling for pixels within the same region.

You can read more about this scheme here - http://www.cis.upenn.edu/~jshi/GraphTutorial/

So now that we have that down, we need equate the force to energy. If we assume a gravitational force, it will be attraction and we will have higher force for correct labelling and we can maximize the energy function to get a stable labeling of the pixels. On the other hand, if we consider a magnetic field, where like particles repel and unlike don't. We need to minimize the function to get a correct labeling of the images, which is the accepted norm in vision.

The concepts here are grossly oversimplified to help me understand the workings and make a intuitive sense of the process underlying the markov field. As with everything, the disclaimer, I do not know everything and I can get things wrong.

Sunday, February 5, 2012

Intro to vision - Images and representation

I am dabbling in computer vision and it helps if I can gather my thoughts. (This field is not as simple as I imagined it to be.) There are a lot of background and information which goes missing , and I hate reading the unabridged versions of text. So, here goes:

Vision is primarily concerned with images, video can be treated as a series of images at a certain rate or frequency. There is the 2-D and 3-D representation of images. I shall not talk about 3-D since my knowledge on that is limited at best.

What does 2-D representation mean? Backtracking a little, every image is represented by a series of pixels, these pixels store some information. In a 2-D representation, the pixels are spread across 2 dimensions, lets say X and Y (for convenience). This is generally represented in matrix form, where the top-left pixel of the image is the top-left value in the matrix.

What are these pixel thingies? These "pixels" can hold different types of information. One common type in vision is the RGB space, where we represent the color space of the image with the composites of red, blue and green. Each of these are called the channels of the color space. Ideally, we use 8 bits for each color, thereby 2^8 = 255 possible values for each component. You can also represent the image using 18-bit color, 32-bit color and 48-bit color. These various color spaces correspond to different color depths. TrueColor is the 24-bit color, with each channel RGB getting 8 bits.

There are also CMYK, sRGB, scRGB color spaces which are used in HDR (photo enthusiasts alert), but vision folks generally deal with RGB or BGR(reverse RGB representation) color spaces.

For more on the color representation look here and dig up more - http://en.wikipedia.org/wiki/Color_depth

Although this color representation is great it does not give us an intuitive idea of color. If for example I want orange, its R=255,G=142,B=13. If I want a darker orange its R=210,G=65,B=0. So, it becomes hard to play with these colors and we use the HSI (Hue, Saturation and Intensity model). This is seen in displays and color pickers across the digital image applications. We can easily get a darker or lighter color by varying the saturation making our lives much easier.

It is possible to convert from one space to another, and most libraries have api's for easy conversion.

So, if we see a image, we can imagine it as a matrix containing 24bits in each matrix cell to represent a pixel on the image.

Saturday, January 21, 2012

Mobius Strip

The mobius strip is a wonderful geometric conundrum. It perplexes you since it is a closed 3-D object with only one edge (lo and behold!). It is really easy to make one of these, take a piece of paper and twist it clockwise on one edge, counter clockwise on the other edge join the two edges and you have this quizzical geometry.

This image describes it correctly:

Why am I putting this here? Well, I have this annoying habit of setting wires and straps on bags straight. I recently ran into this mobius strip on a bag, seeing that I could not make it straight made me wince. I could not come up with a name to attach and put my frustration on. Now I can. :)

Excavations in tech land