2012-02-25

KNIME -- the Swiss Army knife of data workflows

I just discovered the following Swiss Army knife for handling data workflows. I have wanted something like this to exist for years, and was getting close to starting my own project with almost identical goals, but I guess I don't have to now:

KNIME lets you build a data analysis pipeline, complete with data normalization and filtering, inference/classification and visualization steps. Among other things:

- It caches data at each node in the workflow, so changes to the pipeline trigger only the minimum necessary recalculation (see the toy sketch below).
- It keeps track of which experimental variables produced which results.
- It intelligently makes use of multiple cores on your machine wherever possible.
- It incorporates the entire Weka machine learning framework.
- It lets you add your own visualizers for different data types.
- It cross-links the highlighting of data points between different tables and views, so that selecting a data point in one view selects it in all other views.
- It reads and writes a large number of data formats and can read from and write to a live database.
- You can call out to R at any point if you have existing R code you need to run on a piece of data.
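
The node-level caching is the bit I appreciate most, because it is what makes iterating on a pipeline cheap: edit one step and only that step and everything downstream of it gets rerun. Here is a toy Python sketch of that idea -- purely illustrative, with the Node class and three-step pipeline made up for the example, and nothing to do with how KNIME is actually implemented:

```python
# Toy sketch (not KNIME's API): cache each node's output in a workflow DAG so
# that changing one node only forces recomputation of that node and its
# descendants, never the untouched upstream steps.

class Node:
    def __init__(self, name, func, inputs=()):
        self.name = name            # e.g. "load", "normalize", "classify"
        self.func = func            # the computation this step performs
        self.inputs = list(inputs)  # upstream nodes feeding this one
        self.children = []          # downstream nodes, for invalidation
        self.cache = None           # cached output of the last run
        self.dirty = True           # needs (re)computation?
        for parent in self.inputs:
            parent.children.append(self)

    def invalidate(self):
        # A node was edited: mark it and everything downstream as stale.
        self.dirty = True
        for child in self.children:
            child.invalidate()

    def result(self):
        # Reuse the cache unless this node (or something upstream) changed.
        if self.dirty:
            self.cache = self.func(*(p.result() for p in self.inputs))
            self.dirty = False
            print(f"recomputed {self.name}")
        return self.cache


# Example pipeline: load -> normalize -> threshold
load = Node("load", lambda: [1.0, 4.0, 9.0, 16.0])
normalize = Node("normalize", lambda xs: [x / max(xs) for x in xs], [load])
threshold = Node("threshold", lambda xs: [x for x in xs if x > 0.5], [normalize])

threshold.result()      # recomputes all three nodes
threshold.result()      # fully cached, nothing recomputed
normalize.invalidate()  # tweak the normalize step...
threshold.result()      # ...only normalize and threshold rerun; load stays cached
```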

In other words, KNIME basically does everything that anybody who works with data does every day, and it keeps everything tied together in a nice workflow framework with built-in data visualization, smart caching, smart parallel job launching, and so on.