Excavations in tech land

Sunday, January 20, 2013

My love hate relationship with ubuntu

I have been a long time user of ubuntu. It all started roughly some 10 years ago, one of my friends ordered the free CD from ubuntu to bangalore. The CD in our possession we went about installing "ubuntu" on the system we had. It was not as simple as it is now and I enjoyed pulling the partition apart from windows. I remember we had the ext2 system and we had to set the swap and boot space manually according to the RAM I had on my system (twice the RAM). Ubuntu was the lesser known OS and I was happy twiddling with its knobs while using fedora for most of my work. (I also played with suse, debian and mandrake. Gentoo was the uber cool linux then, and having a successful gentoo gave you bragging rights)

Since that eventful day I had been using ubuntu and fedora in conjunction. Over the last few years ubuntu's proliferation into the mainstream has led me to use only ubuntu, devoid of any flavors. (Also the fact that I have lesser free time and have been losing my patience with OS installs) Over the last year canonical has gone evil. It started with ports which only support unity (no more gnome by default), I was surviving that with some ease (I am a fairly adaptable creature) but then they went around and removed GDM. Now unity just randomly shuts down on me. Its annoying as shit to setsid unity everyone couple of weeks.

I was perfectly alright with that, knowing the dev gods would fix and patch it soon. Now, its almost been 6 months and I'm agonized with this bug - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1063354. My first thought was an issue with the hdd (bad sectors) but a smart scan revealed nothing. I tried playing with some power settings and started shutting down my PC at night. I had even thought of replacing my 3 year old battered hard disk. It took me a while to figure out its not my hardware or my usage. It is an issue with the new kernel and ext4. This is also seen in other flavors of linux to my knowledge (needs support). This makes me think, maybe I should have left windows on for the rainy days. Although its a shitty software, windows lets me work in peace (apart from annoying pop-ups, random restarts and viruses) and I'm not wasting time and money trying to find fixes and replacing my hard disk.

Why ubuntu, why, have I not loved you enough?

Wednesday, December 5, 2012

Python threading

There are two modules to python threading, and many caveats:

Python has two modules, thread and threading - thread is lower level, while threading is higher level. This means threading uses threads or sort thereof. Threading provides a lot of sychronization, event and timing primitives which will help with a multi threaded program.

The important thing to note however is the GLOBAL INTERPRETER LOCK, which essentially means there can be only one interpreter thread running at any given time. Any thread you create is going to be run in time slices. This limits the functionality of threads in that they are no longer suitable for parallel/concurrent execution. The reason for python to do this is to provide for safety in multi-threaded systems. Many structures can be assumed to be thread safe and run inside threads without many problems. Communication between threads also become easy. The problem arises when you need a very real time concurrent execution to make use of the immense power of your multicore hardware architecture. For this python has the multiprocessing module. Of course, now you may ask; much like me, what is the multithreading module good for then? It is good if you have multiple slow methods which wait on process such as I/O. Instead of busy waiting you can yield control during the time slice to another thread which will continue crunching the numbers for you, while the slow I/O is trudging behind on its work. This is all the threading module is designed to achieve in my view. Of course you could use it as a flow control mechanism, but I believe that would be overkill.

The caveats are many, mostly with the thread module (straight from python docs).

Threads interact strangely with interrupts: the KeyboardInterrupt exception will be received by an arbitrary thread. (When the signal module is available, interrupts always go to the main thread.)
Calling sys.exit() or raising the SystemExit exception is equivalent to calling thread.exit().
Not all built-in functions that may block waiting for I/O allow other threads to run. (The most popular ones (time.sleep(), file.read(), select.select()) work as expected.)
It is not possible to interrupt the acquire() method on a lock — the KeyboardInterrupt exception will happen after the lock has been acquired.
When the main thread exits, it is system defined whether the other threads survive. On SGI IRIX using the native thread implementation, they survive. On most other systems, they are killed without executingtry ... finally clauses or executing object destructors.
When the main thread exits, it does not do any of its usual cleanup (except that try ... finally clauses are honored), and the standard I/O files are not flushed.

Of course, the most important to remember are if you capturing keyboard interrupt, you have no idea which thread receives the signal. Some block waiting calls don't allow other threads to run (I learnt this the hard way).

With this in mind, thread away your python.

Thursday, October 25, 2012

thou shall ask but not receive

Dynamic frequency scaling or CPU throttling is the buzzword for chipset geeks. It is implemented in most hardware processors and supported by most modern OS's. The intel version of this is called speedstep and amd calls it powernow. The idea is simple dynamically increase or decrease the cpu frequency thereby reducing the power usage and heat requirements leading to power savings.

Of course, in linux we have the complete control over all aspects of computing and you can control your cpu speed with cpufrequtils.

Run the following to install cpufrequtils:

$sudo apt-get install cpufrequtils

The next step is to learn about cpufrequtils, it has two programs: cpufreq-info and cpufreq-set. Just run cpufreq-info to get all the info you need on the cpu frequency settings. The important things to note are the governor values and the available step frequencies. Then you can just run the cpufreq-set to set it at the desired frequency. The painful thing is you have to set it for a given cpu at a time.
This should set the cpus to a powersave mode:

$sudo cpufreq-set -c 0 -g powersave
$sudo cpufreq-set -c 1 -g powersave
$sudo cpufreq-set -c 2 -g powersave
$sudo cpufreq-set -c 3 -g powersave

In case you have more cores as specified by cpufreq-info set all of them. If you want to get more performance, say because you are running a flash player. (I know they just suck the battery life!!). If you want performance just set the governor option -g to performance. You can also look at the scaling percentages of cpus with the cpufreq-info, in case you see your cpu underperforming. Happy hacking!!!

Sunday, September 30, 2012

All these numbers and what they mean

There are innumerable facts which are indecipherable and paradoxes which make up facets unexplored in terriotories unscaped. I recently happened to talk about the two envelope exchange paradox and how we define the expected return value. The sample space defined can make up many numbers giving me a higher expected return to always make a change in the envelope. This problem is a paradox, because once you made the change and if you were offered the chance again, we would make the switch back. Of course, the problem is innane if we consider only the two envelopes.

To make things interesting, lets say you chose the other envelope, without actually knowing the values quantitaively. Now instead of on the next trial, being offered the same two envelopes, you are offered a new envelope and posed the same argument, would this change your behaviour. The fact is that it is still the same problem, but you somehow think you can deduce the "objective payoff" better.

So what do the numbers mean, the numbers don't exactly mean what they are. It is embedded in a deep sense in the way you see it and understand it. Dabbling in the science of data and visualizations, I have realized there are many ways to look at the same numbers to infer different mechanims and parameters. The objective function bias is very subjective and is inherently fixed in the representation of the problem.

Perhaps, in the future, my bias will be neutral and I will be able to see through the indecipherable facts and paradoxes.

Saturday, September 22, 2012

Deciphering the stats

So, I have been busy the past few months with an experiment in measuring attentional drift. The majority of the time was spent in data analysis. Coming from a computer science, more specifically a c++/python background, I had to get used to the R way of doing things.

I figured I need to learn R and decided to undertake the due process of wrecking my own mind with the absurdities of yet another language. I know I am not an expert at c++ or python, but I feel I can use it to my advantage and organize my coda (pieces of code). I am of course borrowing the phrase from the musical theory, which is definitely more artful than coding. Although the underpinning complexities and structures you may find are similar and the expressive nature of the coda is at hands of an able artist.

After many goof ups and fall downs I now feel I am in a place to talk about R and build an understanding of how it works and why it works. The primary draw of R is the innumerable number of packages you have to accomplish tasks statistical in nature. R is primarily a statistical language and should be used for such purposes.

To install R:

sudo echo "deb http://ftp.ussg.iu.edu/CRAN/bin/linux/ubuntu precise/" >> /etc/apt/sources.list
sudo apt-get udpate
sudo apt-get install r-base r-base-dev

As a first step to the introduction to R I will establish a very nice protocol, I use emacs and ESS. These two together have made my life a lot easier. I also split my sources and write all code in functions. I have to start using objects, hopefully sooner to organize my code.

I shall describe how to install ESS and the basic run in ESS:

To intall ESS just run

$ sudo apt-get intall ess

Once you have installed ESS, go to your emacs and type

M-x R

This should bring up a prompt which asks you for the starting data directory, you can enter the path to the directory you want to work from or just work from the current directory.

You can of course always change the working directory using setwd(path). To see all the files in a directory just say list.files(path). You can include a R source file into the shell by using source(filename).

I will write about installing and using libraries in my next post. Happy coding !!

Saturday, July 21, 2012

Memoize in python and html5

Memoization is a concept from algorithms which finds heavy use in dynamic programming. It is an optimization technique which gains time at the cost of space. So yes, this is an useful technique to know and implement in your algorithms.

The key idea is to have a data store which stores values computed and reuses them to compute the current iteration. The equation is particularly of the form

T(n) = T(n-k1) operator T(n-k2),

In case of the fibonacci series, f(n) = f(n-1) + f(n-2). If you see a computation of this form, you can readily apply a DP technique. Memoization provides a good infrastructure for writing such computations.

The implementation in python is really simple. Apply a decorator, hash the input parameter and if the function is called on the same input just return the already computed value. This is of course assuming all else is stationary and there are no side effects in the program.

Here is a small example from python which is from http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize

import collections
import functools

class memoized(object):
   '''Decorator. Caches a function's return value each time it is called.
   If called later with the same arguments, the cached value is returned
   (not reevaluated).
   '''
   def __init__(self, func):
      self.func = func
      self.cache = {}
   def __call__(self, *args):
      if not isinstance(args, collections.Hashable):
         # uncacheable. a list, for instance.
         # better to not cache than blow up.
         return self.func(*args)
      if args in self.cache:
         return self.cache[args]
      else:
         value = self.func(*args)
         self.cache[args] = value
         return value
   def __repr__(self):
      '''Return the function's docstring.'''
      return self.func.__doc__
   def __get__(self, obj, objtype):
      '''Support instance methods.'''
      return functools.partial(self.__call__, obj)

@memoized
def fibonacci(n):
   "Return the nth fibonacci number."
   if n in (0, 1):
      return n
   return fibonacci(n-1) + fibonacci(n-2)

print fibonacci(12)

This is pretty neat if you ask me, saves you a lot of time when dealing with computations.

Now the fun part, HTML5 provides localStorage which is basically a dictionary into which you can write values and read back across page refreshes, AKA semi-persistence. This is sought to be a sort of replacement to cookies and most of the computation is done client side. So you could fetch data from the server and store it here, so you are paying for the network latency everytime.

The access usage goes somewhat like this

if(localStorage){
    if(localStorage['myvalue1'])
        return localStorage['myvalue1']
    else
        localStorage['myvalue1'] = value1

This is readily accessible structure, and should dramatically improve your performance. But as with anything web, there are security concerns. This is visible to anyone or anything with machine access, so

1. Encrypt your keys.
2. Don't store anything extra sensitive here.
3. Ask the user before doing this. Its only fair.

Happy memoization.

Sunday, July 15, 2012

Dependency Injection

Figuring out dependency injection manually is hard, I mean it really really hard. linkedIn has a library for depenecy injection https://github.com/Jakobo/inject. This is really great, but considering I don't care much about doing a require for others to load its not my cake.

underscore.js has unique features _bind to

Thursday, July 5, 2012

Understanding currying

As functional paradigms are increasingly finding ways into my non-functional programs, I felt it was time to upgrade my skills.

I have been dabbling with haskell lately and will update some functional concepts.

Currying: What is it?

Currying is parameter application to functions. In c++, we can have arguments passed to a function.

int foo(type par1, type par2, ...){

    //perform operations on the par's

}

This is really great, but in functional programming paradigm; every function takes only a single argument. Oh no! what are we to do? Well here is where currying comes to the rescue. If you pass a function multiple parameters in functional programming languages, such as scheme and haskell, they perform currying of the function.

How it works

The function takes the first parameter, then another function is applied to the 2nd parameter.

>let x3 x y z = x*y*z 

>x3 2 3 4

24

The above function takes 3 parameters and multiplies them together. This is just repeated application of the * function, right? I could have said (x*y)*z.

This is exactly what currying is doing. Internally I look at it like this

func1 = x3 (x)

func2 = func1(y)

func3 = func2(z)

The function is applied to one argument, the applied function is then called on the 2nd argument and so on.

This can be used with partial functions to create simple and intuitive data flow mechanisms.

Wednesday, July 4, 2012

haskell primitives

Haskell the lazy purely functional programming

1. The basic primitives

This is the greatest

Matlab Parfor

In the parallelization sphere of things, matlab is pretty up-to the mark. There are a lot of ways to parallelize in matlab.

The easiest is to create a matlabpool of workers and run you for loops in parallel using parfor. (the upper limit is 12 on a local machine) This is great if you using this on a local machine to prototype some application, but if you do want to deploy your application (compile and make an executable); you run into problems.

Friday, June 29, 2012

Matlab Profiling

Matlab comes built in with a profiler. It is much akin to the profiler in valgrind, only that in true matlab fashion it generates a html report of the profiler.

Profiling can be used to look at specfic runs of functions and the call tree, it generates statistics like the number of times a function was called and how much time a function takes to execute.

This kind of information can provide key insight into blockages, especially on the IO front. For example, in C++ it is better to use "\n" than endl to mark end of lines.endl flushes the buffer and adds consdierable overhead. In matlab, you don't figure out why some functions are slow, you can use profiling to pinpoint the exact line causing the bottleneck. This can be crucial especially if you deal with big datasets and/or heavy computation.

So, if you now have code which you want to profile and say save the stats and measure, simply run the following commands and it will be done.

Saturday, March 31, 2012

Reverse engineering the ising model

I was trying to solve this problem in image segmentation. The key idea in segmentation is to group pixels which seem spatially similar owing to properties of color, region and body. There is no one single segmentation which is correct, it is a matter of perception. Perception of images varies by individual; what I perceive in an image, you may not. Its like looking at monet or chagall, my opinions of masterpiece may be lost on you.

So, starting with this how does end up with the ising model. One starts by applying a markov random field to the image and running a max-flow on the image graph to get the segmentation. Standard fare combinatorial optimization problems some say.

My introduction to markov process is from a NLP standpoint, pretty informally again (understanding the process, and not reading a whole bunch of literature). I understood the conditional inference scheme depended only on the previous state(s) and not all the past states. With this understanding I embarked to solve this problem (maybe I needed better accoutrements, but this too shall pass).

How does one group anything? A general consensus is to put all things similar in a heap and call it a group. The question then arises, "how do you define similarity in images?". This led me down the path of wave particle duality, although we only see images as composed of particles (aka pixels). It took me a little while to understand what I was solving was not a computer science question but a physics question. I imagined there should be a force applied on the pixels to conform them to some shape, in essence changing their basic property (read color intensity value). If you think of a child grouping his toys into jedi and dark forces for a epic battle, he pushes all the dark ones into a group. The act of "pushing" is a force, which is the same concept for the image. Aha, but there is catch, you cannot move the pixels spatially like the little boy. A little confusing I agree.

So then, the story thus far looks good but we still don't have a solution. Now I can use the markov knowledge from a previous class. Imagine a bunch of patches or markers hanging in the air above the ground, and you image is painted for you by monet . For simplicity let's consider 2 patches, white and black. If we have a background and foreground in an image, lets say the white patches are attracted to the background and black patches to the foreground. Now if we can define an attraction force between the pixel and the patch we are home scott free. Herein comes the spatial arrangement problem, pixels close to each other should have the same patches. (I agree with this notion in a general sense, but not completely, occlusion and a few other factors like spatial incoherence, which can be seen in illusions make me a little uncomfortable with this generalization).

Steaming ahead with this idea, we now have 2 forces, one force between the patch (in the air) and the pixel in the image on the ground and another force between the pixels on the ground making them stick close together (I also think of a force between the patches and regions, making up a complete field but that too shall pass for now).

With this scheme of things, we can define a function to to either minimize or maximize over all the variables we have. The norm is to choose a energy function here and the general idea is to have a energy minimization or maximization,

E(x) = Ed + Es.

Here Ed is the data term and Es the smoothing term. The data term makes makes up some energy but only the pixel data and no neighborhood function. The smoothness term uses the neighborhood function and ensures we get the same labeling for pixels within the same region.

You can read more about this scheme here - http://www.cis.upenn.edu/~jshi/GraphTutorial/

So now that we have that down, we need equate the force to energy. If we assume a gravitational force, it will be attraction and we will have higher force for correct labelling and we can maximize the energy function to get a stable labeling of the pixels. On the other hand, if we consider a magnetic field, where like particles repel and unlike don't. We need to minimize the function to get a correct labeling of the images, which is the accepted norm in vision.

The concepts here are grossly oversimplified to help me understand the workings and make a intuitive sense of the process underlying the markov field. As with everything, the disclaimer, I do not know everything and I can get things wrong.

Sunday, February 5, 2012

Intro to vision - Images and representation

I am dabbling in computer vision and it helps if I can gather my thoughts. (This field is not as simple as I imagined it to be.) There are a lot of background and information which goes missing , and I hate reading the unabridged versions of text. So, here goes:

Vision is primarily concerned with images, video can be treated as a series of images at a certain rate or frequency. There is the 2-D and 3-D representation of images. I shall not talk about 3-D since my knowledge on that is limited at best.

What does 2-D representation mean? Backtracking a little, every image is represented by a series of pixels, these pixels store some information. In a 2-D representation, the pixels are spread across 2 dimensions, lets say X and Y (for convenience). This is generally represented in matrix form, where the top-left pixel of the image is the top-left value in the matrix.

What are these pixel thingies? These "pixels" can hold different types of information. One common type in vision is the RGB space, where we represent the color space of the image with the composites of red, blue and green. Each of these are called the channels of the color space. Ideally, we use 8 bits for each color, thereby 2^8 = 255 possible values for each component. You can also represent the image using 18-bit color, 32-bit color and 48-bit color. These various color spaces correspond to different color depths. TrueColor is the 24-bit color, with each channel RGB getting 8 bits.

There are also CMYK, sRGB, scRGB color spaces which are used in HDR (photo enthusiasts alert), but vision folks generally deal with RGB or BGR(reverse RGB representation) color spaces.

For more on the color representation look here and dig up more - http://en.wikipedia.org/wiki/Color_depth

Although this color representation is great it does not give us an intuitive idea of color. If for example I want orange, its R=255,G=142,B=13. If I want a darker orange its R=210,G=65,B=0. So, it becomes hard to play with these colors and we use the HSI (Hue, Saturation and Intensity model). This is seen in displays and color pickers across the digital image applications. We can easily get a darker or lighter color by varying the saturation making our lives much easier.

It is possible to convert from one space to another, and most libraries have api's for easy conversion.

So, if we see a image, we can imagine it as a matrix containing 24bits in each matrix cell to represent a pixel on the image.

Saturday, January 21, 2012

Mobius Strip

The mobius strip is a wonderful geometric conundrum. It perplexes you since it is a closed 3-D object with only one edge (lo and behold!). It is really easy to make one of these, take a piece of paper and twist it clockwise on one edge, counter clockwise on the other edge join the two edges and you have this quizzical geometry.

This image describes it correctly:

Why am I putting this here? Well, I have this annoying habit of setting wires and straps on bags straight. I recently ran into this mobius strip on a bag, seeing that I could not make it straight made me wince. I could not come up with a name to attach and put my frustration on. Now I can. :)

Monday, September 26, 2011

matplotlib thou art not matlab

I am a little taken aback by the matplotlib apis, their examples don't seem that clear on data visualization. Being a former matlab user, I switched to matplotlib as it gives me access to both nltk, opencv, numpy/scipy. But in hindsight, I could just use the insane matlab toolboxes for these purposes and be done with it.

Saturday, June 25, 2011

Shogun toolkit

A fellow researcher (read arun) in semantic parsing and sentiment analysis directed me towards the shogun toolbox. Amazing features; it implements a range of weighted kernels, SVM and HMMs. It has interfaces to python, matlab and R (in my order of preference). It also supports ascii, Json and xml data formats. The best part is its got about 600 examples (albeit not in pydoc and for dna data!!!); a few I read took me about 5 mins to understand. (5 mins, now to get someone more brilliant to explain everything else I need, R2-D2, scooby-doo)

So now if I am rapid prototyping some algorithm I can do a instant analysis for many SVM methods and also HMM classification in very little time. Now maybe I can prove my algorithms are seriously inefficient and need lots of work or they can kick ass with the best of methods. (I am rooting for the latter)

I know PyML is slow and trying to catch up, but if only they just used everything form shogun and wrote new wrappers in PyML form and their fantastic doc. (I do wish a lot!!)

Time to fire the infinite improb drive and head to my bed.

Tuesday, June 14, 2011

Woes in data representation

I have been dabbling with numpy for a while, creating serialized objects and memory mapped files. The numpy package is great because it give s you the flexibility of using python, it is definitely not as fast as implementing the same in C, but you so have the alternative to use Cython (well if you do care).

I have written a converter for the date object, I was trying to look at time series analysis, but turns out if I am using a converter I need to return in float or string and cannot store in date object format. What a bummer!!

I don't have an idea how to use serialized objects or the parallel in matlab, the next quest learn the magical techniques in matlab.

Wednesday, December 1, 2010

Discovering order in chaos and learning to control it

A course I have taken this semester has lent my imagination a new kick-start. When you learn about Karl Sims' creatures, Hod Lipson's data extraction, Turing's reaction diffusion models and Neumann's self replicator, you are dumb founded, inspired, amazed.

To this day I believe in my skills as an observer and experimenter, but when I read about Neumann and his penchant to visualize processes, trace phenomena correctly without any knowledge I am left speechless. Turing's quirkiness was particularly well known, but his thought about finding patterns through morphogenesis and obviously the Turing machine. The present day researchers are not too far behind, Lipson's work seems interesting and very promising.

Now that I have all that out of my system, let me talk about these few guys from the max planck institute who have aroused my curiosity and beckoned my imagination to sit and ponder of the what-ifs and could-bes. This bunch of people who have been featured in nature - http://www.nature.com/nphys/journal/v6/n3/full/nphys1508.html have observed unique neural behaviour in insects and captured it in a simple neuron model whose ground state is chaotic but is controlled by sensory inputs to generate behaviours. These discoveries in the neural behaviour and contruction could shed light on the actual neural processes in humans leading to potentially simple and effective models of behaviour. In combination with techniques like subsumption architectures(doomed as a failure or atleast not so successful), we could look at simple and evolvable solutions to various problems in locomotion and path planning.

Going further, if we could apply a chaotic organization mechanism to the reasoning process in AI, we would have made great advances. The reasoning scheme in AI has huge problems with representation and these problems will continue to haunt us, but some progress towards such self organizing behavior would definitely go a long way.

Thursday, November 4, 2010

Player - stage Installation

Installing the player/stage I ran into a few problems and did not find good solutions, I am posting what worked for me-

1. Download the player project
2. Extract the player folder
3. sudo apt-get install python-gdal netcdfg-dev libpq-dev libhdf4g-dev libgeos-dev libatk1.0-0 lib3ds-dev freeglut3-dev
4. Run cmake (if you want create a directory for build like the install guide specfies)
5. Run make
6. Run sudo make install
7. add the following to your .bashrc file

export PATH=$PATH:"/usr/local/lib64"
export PLAYERPATH="/usr/local/lib":"/usr/local/lib64"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"/usr/local/lib":"/usr/local/lib64"

This should get you to set up.

This link http://www.control.aau.dk/~tb/wiki/index.php/Installing_Player_and_Stage_in_Ubuntu has a lot of information for installing and configuring your player information

Sunday, October 3, 2010

The amazing fractals

Fractals are a intricate design pattern generally self repetitive and aesthetically pleasing. They are things you see in everyday life, snowflakes, coast lines, fruits. Nature likes to deal with fractals and why not considering there is such symmetry.

Being introduced to fractals in my course and learning about the pioneering fractals has been pleasurable. Learning that Turing was really interested in reaction diffusions for morphogenesis and made some pioneering contributions has only increased my admiration of the queer genius.

Some of the images do look like the Rorschach test replications, they seem nice, generally colourful in a hue of blue, yellow or red. Turing was trying to set up a mathematical or state model explanation for the generation of patterns in nature. How does a cheetah get its dots, how does a zebra get its stripes and the general things common people never ponder over.

It is amazing how a small set of rules and parameters generate the most intricate patterns. You can have a look at them here : http://cgjennings.ca/toybox/turingmorph/. There is a nice applet which animates the whole process.