I’ve just returned from a trip to Predixion Software’s technical HQ in Redmond where I got a sneak preview of the newest release of their predictive analytics toolset – and it’s impressive!
The previous release was very much focused on scoring predictive models against huge datasets, with integration to Apache Mahout, the Hadoop-native machine learning library, to enable Predixion to ride the big-data wave that seems to be all the rage at the moment. But for me, Release 3.0 gets back to the fundamentals that differentiate Predixion from the competition – enabling ease of use in predictive analytics data modeling.
It is well known that the biggest challenge in data mining and predictive analytics is understanding the underlying data well enough to manipulate and model it, not the process of building the models itself. This is an iterative, time-consuming and often frustrating process, made worse by a lack of tooling. Predixion has tackled this step with some real innovation that will save countless hours of data preparation.
Release 3.0 has numerous new features, but the two standout items for me include:
- Data Exploration – the new release now enables you to profile source data in situ (e.g. SQL Server, Greenplum). So rather than having to bring your data into Excel or PowerPivot to explore and profile it, you can now connect to the underlying data source and analyze the data directly. This capability has been available in data quality management tools for a while, but this is the first time I’ve seen it integrated into a predictive analytics workbench.
- Data Preparation – the data preparation tools (and sampling tools) can also be connected to the source data, which is nice, but for me the key highlight was the ability to create a calculated or derived predictor. No longer will I have to go back to the source data and write a SQL statement that calculates a new predictor from two or more data fields – a significant time saving. Now, when new insights surface during the data preparation phase, you can rapidly continue with your train of thought rather than having to stop, go back to the source data, re-transform and re-extract. This should accelerate the data preparation phase and cut down on the number of coffee breaks I take!
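To make the idea concrete: a calculated predictor is simply a new column derived from existing fields – the kind of transformation that previously meant a round trip to the source SQL. A minimal sketch in plain Python, using hypothetical field names (`balance`, `credit_limit`) purely for illustration:

```python
def add_utilization(rows):
    """Add a derived predictor: credit utilization = balance / credit limit.

    Field names here are hypothetical examples, not Predixion APIs.
    """
    for row in rows:
        limit = row["credit_limit"]
        # Guard against divide-by-zero for customers with no credit limit.
        row["utilization"] = row["balance"] / limit if limit else 0.0
    return rows

customers = [
    {"balance": 500.0, "credit_limit": 2000.0},
    {"balance": 0.0, "credit_limit": 0.0},
]
add_utilization(customers)
# customers[0]["utilization"] → 0.25
```

The point of a tool-supported derived predictor is exactly this: the new column is defined once, inside the preparation workflow, instead of being hand-coded upstream for every extract.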
The exact release date for 3.0 has not yet been announced, but I’m looking forward to getting started.