A fellow researcher (read: Arun) in semantic parsing and sentiment analysis pointed me towards the Shogun toolbox. Amazing features: it implements a range of weighted kernels, SVMs and HMMs. It has interfaces to Python, MATLAB and R (in my order of preference). It also supports ASCII, JSON and XML data formats. The best part is it's got about 600 examples (albeit not in pydoc, and for DNA data!!!); the few I read took me about 5 mins each to understand. (5 mins; now to get someone more brilliant to explain everything else I need: R2-D2, Scooby-Doo.)
So now if I am rapid-prototyping some algorithm, I can do an instant analysis with many SVM methods and also HMM classification in very little time. Now maybe I can prove my algorithms are seriously inefficient and need lots of work, or that they can kick ass with the best of methods. (I am rooting for the latter.)
I know PyML is slow and trying to catch up, but if only they used everything from Shogun and wrote new wrappers in PyML form, with their fantastic docs. (I do wish a lot!!)
Time to fire the infinite improb drive and head to my bed.
ruminations of the many wonders of programming languages (to be read as limitations, pitfalls and hindrances), mixed with the allure of data analysis, applied heavily to data extracted in many forms
Saturday, June 25, 2011
Tuesday, June 14, 2011
Woes in data representation
I have been dabbling with numpy for a while, creating serialized objects and memory-mapped files. The numpy package is great because it gives you the flexibility of Python. It is definitely not as fast as implementing the same thing in C, but you do have the alternative of using Cython (well, if you do care).
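For anyone curious what that dabbling looks like, here is a minimal sketch of both ideas: serializing an array with `np.save` and reading data through `np.memmap`. The file names and the toy array are made up for illustration and are not from my actual experiments.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch directory and file names, used only for this sketch.
workdir = tempfile.mkdtemp()
npy_path = os.path.join(workdir, "samples.npy")
raw_path = os.path.join(workdir, "samples.dat")

data = np.arange(12, dtype=np.float64).reshape(3, 4)

# Serialize the array to disk in numpy's own .npy format and load it back.
np.save(npy_path, data)
restored = np.load(npy_path)

# Create a memory-mapped file of the same shape and write through it.
mm = np.memmap(raw_path, dtype=np.float64, mode="w+", shape=data.shape)
mm[:] = data
mm.flush()
del mm  # drop the writable mapping before reopening

# Reopen read-only. A raw memmap file stores no metadata, so the dtype
# and shape have to be supplied again by hand.
view = np.memmap(raw_path, dtype=np.float64, mode="r", shape=(3, 4))
row_sum = float(view[1].sum())  # only the pages that are touched get read
```

The `.npy` file remembers its own dtype and shape, while the raw memmap file does not, which is worth keeping in mind when picking between the two.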
I have written a converter for the date object because I was trying to look at time series analysis, but it turns out that a converter has to return a float or a string, so I cannot store the values as date objects. What a bummer!!
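A rough sketch of the workaround, assuming the converter is the `converters` argument of `np.loadtxt` and that the dates look like `YYYY-MM-DD` (both are assumptions for this example; the helper name `date_to_ordinal` and the sample data are hypothetical). The date goes in as a float day count and is turned back into a date object only when needed.

```python
import io
from datetime import date, datetime

import numpy as np


def date_to_ordinal(text):
    """Converter: turn 'YYYY-MM-DD' into a float day count."""
    if isinstance(text, bytes):  # some numpy versions hand converters bytes
        text = text.decode("ascii")
    return float(datetime.strptime(text.strip(), "%Y-%m-%d").toordinal())


# Hypothetical two-column data: date, value.
raw = io.StringIO("2011-06-14,1.5\n2011-06-25,2.5\n")

# Column 0 is run through the converter, so the whole table is float.
table = np.loadtxt(raw, delimiter=",", converters={0: date_to_ordinal})

# Recover date objects from the float column when they are needed.
dates = [date.fromordinal(int(d)) for d in table[:, 0]]

# Differences between rows are plain day counts.
gap_days = int(table[1, 0] - table[0, 0])
```

Storing ordinals keeps the array homogeneous, and day arithmetic still works on the float column, at the cost of an extra conversion step whenever a real date object is wanted.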
I have no idea how to use serialized objects, or their parallel, in MATLAB. The next quest: learn the magical techniques in MATLAB.