Monday, September 26, 2011

matplotlib thou art not matlab

I am a little taken aback by the matplotlib APIs; their examples don't seem that clear when it comes to data visualization. Being a former matlab user, I switched to matplotlib because working in python gives me access to nltk, opencv and numpy/scipy all in one place. But in hindsight, I could just use the insane matlab toolboxes for these purposes and be done with it.

Saturday, June 25, 2011

Shogun toolkit

A fellow researcher (read: arun) in semantic parsing and sentiment analysis directed me towards the shogun toolbox. Amazing features: it implements a range of weighted kernels, SVMs and HMMs. It has interfaces to python, matlab and R (in my order of preference). It also supports ASCII, JSON and XML data formats. The best part is it's got about 600 examples (albeit not in pydoc, and for DNA data!!!); the few I read took me about 5 mins to understand. (5 mins; now to get someone more brilliant to explain everything else I need: R2-D2, scooby-doo.)

So now if I am rapid prototyping some algorithm, I can do an instant analysis for many SVM methods and also HMM classification in very little time. Now maybe I can prove my algorithms are seriously inefficient and need lots of work, or that they can kick ass with the best of methods. (I am rooting for the latter.)

I know PyML is slow and trying to catch up, but if only they just used everything from shogun and wrote new wrappers in PyML form, along with their fantastic docs. (I do wish for a lot!!)

Time to fire the infinite improb drive and head to my bed.

Tuesday, June 14, 2011

Woes in data representation

I have been dabbling with numpy for a while, creating serialized objects and memory mapped files. The numpy package is great because it gives you the flexibility of using python. It is definitely not as fast as implementing the same thing in C, but you do have the alternative of using Cython (well, if you do care).
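For the record, here is a minimal sketch of the kind of thing I mean: saving an array in numpy's own format, pickling a python object that holds it, and reading the same data back through a memory mapped file. The file names and the toy array are made up for illustration, and this is just one way of doing it.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical scratch directory and toy data, purely for illustration.
workdir = tempfile.mkdtemp()
samples = np.arange(12, dtype=np.float64).reshape(3, 4)

# 1. Serialize the array in numpy's own .npy format and load it back.
npy_path = os.path.join(workdir, "samples.npy")
np.save(npy_path, samples)
restored = np.load(npy_path)

# 2. Serialize an arbitrary python object (here a dict holding the array).
pkl_path = os.path.join(workdir, "bundle.pkl")
with open(pkl_path, "wb") as fh:
    pickle.dump({"samples": samples, "label": "demo"}, fh)
with open(pkl_path, "rb") as fh:
    bundle = pickle.load(fh)

# 3. Write the data to a raw memory mapped file. A raw memmap stores no
#    header, so dtype and shape must be supplied again when reopening it.
mm_path = os.path.join(workdir, "samples.dat")
writer = np.memmap(mm_path, dtype=np.float64, mode="w+", shape=samples.shape)
writer[:] = samples
writer.flush()
del writer  # drop the write handle before reopening

# Reopen read-only; only the pages that are actually touched get read.
reader = np.memmap(mm_path, dtype=np.float64, mode="r", shape=samples.shape)
row_sum = float(reader[1].sum())
```

The memory mapped route only pays off when the array is too big to load comfortably; for small stuff `np.save` and `np.load` are less fuss.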

I have written a converter for the date object, as I was trying to look at time series analysis. But it turns out that if I am using a converter, it needs to return a float or a string; I cannot store the values in date object format. What a bummer!!
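A rough sketch of the workaround, assuming the loader is `np.loadtxt` with its `converters` argument (the sample rows, the date format and the `date_to_ordinal` helper are all made up here): turn each date into a float on the way in, then rebuild the date objects afterwards when they are actually needed.

```python
import io
from datetime import datetime

import numpy as np


def date_to_ordinal(field):
    """Hypothetical converter: 'YYYY-MM-DD' text -> proleptic ordinal as float."""
    # Depending on the numpy version, the converter may be handed bytes or str.
    if isinstance(field, bytes):
        field = field.decode("ascii")
    return float(datetime.strptime(field.strip(), "%Y-%m-%d").toordinal())


# Made-up two-column data: a date and a measurement.
raw = io.StringIO(
    "2011-06-14,1.5\n"
    "2011-06-25,2.0\n"
    "2011-09-26,3.25\n"
)

# Column 0 goes through the converter, so the result is a plain float array.
data = np.loadtxt(raw, delimiter=",", converters={0: date_to_ordinal})

# Recover real date objects from the float column after loading.
dates = [datetime.fromordinal(int(d)).date() for d in data[:, 0]]

# Gaps between observations, in days, straight from the float column.
gaps = np.diff(data[:, 0])
```

Ordinals are whole days, so this loses any time-of-day information; for finer resolution the converter would need to return something like seconds since an epoch instead.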

I have no idea how to use serialized objects (or their parallel) in matlab, so the next quest is to learn the magical techniques in matlab.