Sunday, January 20, 2013

My love-hate relationship with Ubuntu

I have been a long-time user of Ubuntu. It all started roughly ten years ago, when one of my friends ordered the free Ubuntu CD to Bangalore. The CD in our possession, we went about installing Ubuntu on the system we had. It was not as simple as it is now, and I enjoyed pulling the partition apart from Windows. I remember we used the ext2 filesystem and had to set up the swap and boot partitions manually, with swap sized according to the RAM on my system (twice the RAM). Ubuntu was the lesser-known OS and I was happy twiddling with its knobs while using Fedora for most of my work. (I also played with SUSE, Debian and Mandrake. Gentoo was the uber-cool Linux then, and a successful Gentoo install gave you bragging rights.)

Since that eventful day I had been using Ubuntu and Fedora in conjunction. Over the last few years Ubuntu's proliferation into the mainstream has led me to use only Ubuntu, devoid of any other flavors. (Also the fact that I have less free time and have been losing my patience with OS installs.) Over the last year Canonical has gone evil. It started with releases that ship only Unity (no more GNOME by default); I was surviving that with some ease (I am a fairly adaptable creature), but then they went around and removed GDM. Now Unity just randomly shuts down on me. It's annoying as shit to setsid unity every couple of weeks.

I was perfectly alright with that, knowing the dev gods would fix and patch it soon. Now it's been almost 6 months and I'm still agonizing over this bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1063354. My first thought was an issue with the HDD (bad sectors), but a SMART scan revealed nothing. I tried playing with some power settings and started shutting down my PC at night. I had even thought of replacing my 3-year-old battered hard disk. It took me a while to figure out that it's not my hardware or my usage: it is an issue with the new kernel and ext4. To my knowledge this is also seen in other flavors of Linux (citation needed). This makes me think maybe I should have left Windows on for the rainy days. Although it's shitty software, Windows lets me work in peace (apart from annoying pop-ups, random restarts and viruses), and I wouldn't be wasting time and money trying to find fixes and replacing my hard disk.

Why, Ubuntu, why? Have I not loved you enough?

Wednesday, December 5, 2012

Python threading

There are two modules for threading in Python, and many caveats:

Python has two modules, thread and threading: thread is the lower-level one, while threading is the higher-level one built on top of it. threading provides a lot of synchronization, event and timing primitives which help with writing a multi-threaded program.

The important thing to note, however, is the GLOBAL INTERPRETER LOCK, which essentially means there can be only one interpreter thread running at any given time. Any thread you create is going to be run in time slices. This limits the usefulness of threads in that they are no longer suitable for truly parallel execution. The reason Python does this is to provide safety in multi-threaded programs: many structures can be assumed to be thread-safe and used inside threads without many problems, and communication between threads also becomes easy. The problem arises when you need genuinely concurrent execution to make use of the immense power of your multicore hardware; for that, Python has the multiprocessing module. Of course, you may now ask, much like me: what is the threading module good for, then? It is good if you have slow operations which wait on something external, such as I/O. Instead of busy-waiting, a thread can yield its time slice to another thread, which continues crunching the numbers for you while the slow I/O trudges along. This is all the threading module is designed to achieve, in my view. Of course you could use it as a flow-control mechanism, but I believe that would be overkill.
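
Here is a minimal sketch of the I/O-bound case (the URLs are just placeholders, and this is Python 2 to match the code later on this blog): each download releases the GIL while the socket blocks, so the threads overlap their waiting instead of running one after the other.

import threading
import urllib2

def fetch(url):
    # The GIL is released while urlopen() blocks on the network,
    # so the other threads get to run during the wait.
    data = urllib2.urlopen(url).read()
    print url, len(data)

urls = ["http://python.org", "http://launchpad.net"]
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()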

The caveats are many, mostly with the thread module (straight from the Python docs).


  • Threads interact strangely with interrupts: the KeyboardInterrupt exception will be received by an arbitrary thread. (When the signal module is available, interrupts always go to the main thread.)
  • Calling sys.exit() or raising the SystemExit exception is equivalent to calling thread.exit().
  • Not all built-in functions that may block waiting for I/O allow other threads to run. (The most popular ones (time.sleep(), file.read(), select.select()) work as expected.)
  • It is not possible to interrupt the acquire() method on a lock — the KeyboardInterrupt exception will happen after the lock has been acquired.
  • When the main thread exits, it is system defined whether the other threads survive. On SGI IRIX using the native thread implementation, they survive. On most other systems, they are killed without executing try ... finally clauses or executing object destructors.
  • When the main thread exits, it does not do any of its usual cleanup (except that try ... finally clauses are honored), and the standard I/O files are not flushed.
Of course, the most important ones to remember are these: if you are capturing a keyboard interrupt, you have no idea which thread receives the signal, and some blocking calls don't allow other threads to run (I learnt this the hard way).

With this in mind, thread away your python.

Thursday, October 25, 2012

thou shalt ask but not receive

Dynamic frequency scaling, or CPU throttling, is the buzzword for chipset geeks. It is implemented in most hardware processors and supported by most modern OSes. The Intel version of this is called SpeedStep, and AMD calls it PowerNow!. The idea is simple: dynamically increase or decrease the CPU frequency, thereby reducing power usage and heat and leading to power savings.

Of course, in Linux we have complete control over all aspects of computing, and you can control your CPU speed with cpufrequtils.

Run the following to install cpufrequtils:

$ sudo apt-get install cpufrequtils

The next step is to learn about cpufrequtils. It has two programs: cpufreq-info and cpufreq-set. Just run cpufreq-info to get all the info you need on the CPU frequency settings; the important things to note are the governor values and the available frequency steps. Then you can just run cpufreq-set to set the desired governor or frequency. The painful thing is you have to set it for one CPU at a time.
This should set the CPUs to powersave mode:
$ sudo cpufreq-set -c 0 -g powersave
$ sudo cpufreq-set -c 1 -g powersave
$ sudo cpufreq-set -c 2 -g powersave
$ sudo cpufreq-set -c 3 -g powersave

In case cpufreq-info reports more cores, set all of them (a small loop for that is sketched below). If you want more performance, say because you are running a Flash player (I know, they just suck the battery dry!!), set the governor option -g to performance instead. You can also look at the per-CPU scaling statistics with cpufreq-info, in case you see your CPU underperforming. Happy hacking!!!
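
Typing one cpufreq-set per core gets old, so here is a small Python sketch of the loop I mean. The sysfs glob for counting cores is my own assumption, not part of cpufrequtils, and this needs to run as root:

import glob
import subprocess

# Issue one cpufreq-set call per core listed under sysfs.
for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*"):
    cpu = path.rsplit("cpu", 1)[1]
    subprocess.check_call(["cpufreq-set", "-c", cpu, "-g", "powersave"])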

Sunday, September 30, 2012

All these numbers and what they mean

There are innumerable facts which are indecipherable, and paradoxes which make up facets unexplored in territories uncharted. I recently happened to talk about the two-envelope exchange paradox and how we define the expected return value. The way the sample space is defined, the naive computation gives a higher expected return for switching: if my envelope holds x, the other holds 2x or x/2 with equal probability, so its expected value is (1/2)(2x) + (1/2)(x/2) = 1.25x, which says always switch. This problem is a paradox because once you have made the change, if you were offered the chance again, you would make the switch back. Of course, the problem is inane if we consider only the two envelopes.

To make things interesting, let's say you chose the other envelope, without actually knowing the values quantitatively. Now, instead of being offered the same two envelopes on the next trial, you are offered a new envelope and posed the same argument. Would this change your behaviour? The fact is that it is still the same problem, but you somehow think you can deduce the "objective payoff" better.

So what do the numbers mean? The numbers don't exactly mean what they say; the meaning is embedded, in a deep sense, in the way you see and understand them. Dabbling in the science of data and visualizations, I have realized there are many ways to look at the same numbers and infer different mechanisms and parameters. The objective-function bias is very subjective and is inherently fixed in the representation of the problem.

Perhaps, in the future, my bias will be neutral and I will be able to see through the indecipherable facts and paradoxes.

Saturday, September 22, 2012

Deciphering the stats

So, I have been busy the past few months with an experiment measuring attentional drift. The majority of the time was spent in data analysis, and coming from a computer science, more specifically a C++/Python, background, I had to get used to the R way of doing things.

I figured I needed to learn R and decided to undertake the due process of wrecking my own mind with the absurdities of yet another language. I know I am not an expert at C++ or Python, but I feel I can use them to my advantage and organize my coda (pieces of code). I am of course borrowing the phrase from music theory, which is definitely more artful than coding, although the underpinning complexities and structures you find are similar, and the expressive nature of the coda is in the hands of an able artist.

After many goof-ups and falls I now feel I am in a place to talk about R and build an understanding of how it works and why it works. The primary draw of R is the innumerable packages you have for accomplishing tasks statistical in nature. R is primarily a statistical language and should be used for such purposes.

To install R:

sudo echo "deb http://ftp.ussg.iu.edu/CRAN/bin/linux/ubuntu precise/" >> /etc/apt/sources.list
sudo apt-get udpate
sudo apt-get install r-base r-base-dev

As a first step in this introduction to R, I will establish a protocol that has served me well: I use Emacs and ESS. These two together have made my life a lot easier. I also split my sources and write all code as functions. I have to start using objects, hopefully soon, to organize my code.

I shall describe how to install ESS and do a basic run in ESS:

To install ESS just run
$ sudo apt-get install ess
Once you have installed ESS, go to your emacs and type
M-x R

This should bring up a prompt which asks you for the starting data directory; you can enter the path to the directory you want to work from, or just work from the current directory.

You can of course always change the working directory using setwd(path). To see all the files in a directory, just say list.files(path). You can load an R source file into the session by using source(filename), as in the snippet below.
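
A tiny sketch of those three calls together (the directory and filename here are made up for illustration):

# point the session at a project directory (hypothetical path)
setwd("~/experiments/attention")
# see what is there
list.files(".")
# load a source file into the session (hypothetical filename)
source("analysis.R")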

I will write about installing and using libraries in my next post. Happy coding!!

Saturday, July 21, 2012

Memoize in Python and HTML5

Memoization is a concept from algorithms which finds heavy use in dynamic programming. It is an optimization technique which gains time at the cost of space. So yes, this is a useful technique to know and implement in your algorithms.

The key idea is to have a data store which holds already computed values and reuses them to compute the current iteration. The recurrence is typically of the form

T(n) = T(n-k1) operator T(n-k2)

In the case of the Fibonacci series, f(n) = f(n-1) + f(n-2). If you see a computation of this form, you can readily apply a DP technique. Memoization provides a good infrastructure for writing such computations.

The implementation in Python is really simple. Apply a decorator, hash the input parameters, and if the function is called on the same input, just return the already computed value. This of course assumes all else is stationary and there are no side effects in the program.

Here is a small example in Python, taken from http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize:

import collections
import functools

class memoized(object):
    '''Decorator. Caches a function's return value each time it is called.
    If called later with the same arguments, the cached value is returned
    (not reevaluated).
    '''
    def __init__(self, func):
        self.func = func
        self.cache = {}

    def __call__(self, *args):
        if not isinstance(args, collections.Hashable):
            # uncacheable. a list, for instance.
            # better to not cache than blow up.
            return self.func(*args)
        if args in self.cache:
            return self.cache[args]
        else:
            value = self.func(*args)
            self.cache[args] = value
            return value

    def __repr__(self):
        '''Return the function's docstring.'''
        return self.func.__doc__

    def __get__(self, obj, objtype):
        '''Support instance methods.'''
        return functools.partial(self.__call__, obj)

@memoized
def fibonacci(n):
    "Return the nth fibonacci number."
    if n in (0, 1):
        return n
    return fibonacci(n-1) + fibonacci(n-2)

print fibonacci(12)

This is pretty neat if you ask me; it saves you a lot of time when dealing with heavy computations.

Now the fun part: HTML5 provides localStorage, which is basically a dictionary into which you can write values and read them back across page refreshes, a.k.a. semi-persistence. It is sought as a sort of replacement for cookies, with most of the computation done client side. So you can fetch data from the server once and store it here, and you are not paying for the network latency every time.

The access pattern goes somewhat like this:

// wrapped in a function so the early returns are legal (the name is illustrative)
function getMyValue(value1) {
    if (window.localStorage) {
        if (localStorage['myvalue1'])
            return localStorage['myvalue1'];
        localStorage['myvalue1'] = value1;
        return value1;
    }
}

This is a readily accessible structure and should dramatically improve your performance. But as with anything web, there are security concerns: localStorage is visible to anyone or anything with access to the machine, so

1. Encrypt your keys.
2. Don't store anything extra sensitive here.
3. Ask the user before doing this. It's only fair.

Happy memoization.

Sunday, July 15, 2012

Dependency Injection

Figuring out dependency injection manually is hard, I mean really, really hard. LinkedIn has a library for dependency injection: https://github.com/Jakobo/inject. This is really great, but considering I don't care much about doing a require for others to load, it's not my cake.

underscore.js has unique features _bind to