The Challenge
Our client needed to identify which chemical concentrations were present at a given location and point in time. The data came from many different sources in widely varying formats, including structured data (databases), semi-structured data (spreadsheets), and unstructured data (email, PDF files, Word documents, etc.). The client had previously attempted a classic Enterprise Data Warehouse (EDW) approach, but the sheer variety of the data made an EDW virtually impossible to build. The client's technical team was consumed by EDW design and Extract, Transform, Load (ETL) implementation instead of focusing on analyzing the data. Under the EDW approach, loading each data source took anywhere from 3 weeks to 6 months.
The Solution
To handle the variety of data, our client needed a big data platform. They considered an open-source Hadoop implementation, but the knowledge and skills required to integrate and deploy the full set of tools were an obstacle. The client wanted a pre-built application to ingest, analyze, and visualize data without having to write a custom solution on unsupported open-source software. In short, they needed a big data platform that was ready to deploy.
The Data Tactics cloud-based Big Data Engine (BDE) provided a ready-to-use platform for data discovery and analysis, with pre-built integrations and system monitoring. The widely varying data was ingested by configuring the BDE data import MapReduce jobs. Out-of-the-box visualizations within BDE gave users immediate, useful tools for discovery.
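The BDE job configuration itself is not described here, so the following is only a minimal plain-Python sketch of the general map/shuffle/reduce idea behind normalizing heterogeneous sources. It assumes each source names the same attributes differently; the `FIELD_ALIASES` table, the function names, and the sample records (site `MW-1`, etc.) are all hypothetical and are not BDE's or Hadoop's API.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical aliases: each source labels the same attribute differently.
FIELD_ALIASES = {
    "site": ("site", "location_id", "Station"),
    "date": ("date", "sample_date", "Collected"),
    "chemical": ("chemical", "analyte", "Parameter"),
    "value": ("value", "concentration", "Result"),
}


def _pick(record: dict, canonical: str):
    """Return the value of the first alias of `canonical` found in the record."""
    for alias in FIELD_ALIASES[canonical]:
        if alias in record:
            return record[alias]
    raise KeyError(f"no field for {canonical!r} in {record}")


def map_record(record: dict) -> Iterator[tuple]:
    """Map phase: normalize one raw record and key it by (site, date)."""
    key = (_pick(record, "site"), _pick(record, "date"))
    yield key, (_pick(record, "chemical"), float(_pick(record, "value")))


def reduce_group(key, values) -> tuple:
    """Reduce phase: keep the highest concentration per chemical for one site/date."""
    result = {}
    for chemical, conc in values:
        result[chemical] = max(conc, result.get(chemical, conc))
    return key, result


def run_job(records: Iterable[dict]) -> dict:
    # Shuffle: group mapper output by key, as a MapReduce framework would.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_group(k, v) for k, v in groups.items())


# Three hypothetical records in three different source layouts.
records = [
    {"site": "MW-1", "date": "2015-03-02", "chemical": "benzene", "value": "4.2"},
    {"location_id": "MW-1", "sample_date": "2015-03-02", "analyte": "toluene", "concentration": 1.1},
    {"Station": "MW-1", "Collected": "2015-03-02", "Parameter": "benzene", "Result": "5.0"},
]
print(run_job(records))
```

Keying by (site, date) mirrors the stated goal of finding concentrations at a location and point in time; a real deployment would choose its own keys and aggregation.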
The engagement was completed within 45 days:
- Days 1-4 – Source data analysis
- Days 5-6 – Data ingest parser development
  - 9 custom parsers (including 1 MapReduce parser)
- Day 7 – Ingestion of 600,000,000 data elements from 20 data sources
- Days 8-35 – Analytics development
  - Custom SharePoint ingestion and integration to synchronize and index SharePoint data, enabling BDE to pick up modifications in near real time
  - Custom analytics for environmental monitoring, highlighting chemical concentrations over time and by location
  - Custom application to highlight the geolocation of chemical samples
- Days 36-45 – Data discovery and investigation
  - Guided client personnel through the solutions so that individuals could immediately and intuitively find all the data associated with an entity of interest
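The custom spatial and temporal analytics are not shown in detail, so here is a minimal sketch of the underlying lookup: which chemical concentrations were recorded near a point and around a moment in time. It assumes every sample carries latitude, longitude, a timestamp, and a concentration in consistent units; the `Sample` type, `concentrations_near`, the default 1 km radius and 1-day window, and the sample coordinates are hypothetical choices for illustration, not the client's implementation.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Sample:
    lat: float
    lon: float
    taken_at: datetime
    chemical: str
    concentration: float  # assumed to use one unit across all samples


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def concentrations_near(samples, lat, lon, when, radius_km=1.0, window=timedelta(days=1)):
    """Return {chemical: max concentration} for samples within `radius_km`
    of (lat, lon) and within +/- `window` of `when`."""
    result = {}
    for s in samples:
        if abs(s.taken_at - when) > window:
            continue  # outside the time window
        if haversine_km(lat, lon, s.lat, s.lon) > radius_km:
            continue  # outside the search radius
        result[s.chemical] = max(s.concentration, result.get(s.chemical, s.concentration))
    return result


# Hypothetical samples around one monitoring point.
samples = [
    Sample(29.7604, -95.3698, datetime(2015, 3, 2, 9), "benzene", 4.2),
    Sample(29.7610, -95.3700, datetime(2015, 3, 2, 15), "benzene", 5.0),  # roughly 70 m away
    Sample(29.7604, -95.3698, datetime(2015, 3, 10), "toluene", 1.1),     # outside time window
    Sample(30.2672, -97.7431, datetime(2015, 3, 2, 12), "xylene", 2.0),   # far outside radius
]
print(concentrations_near(samples, 29.7604, -95.3698, datetime(2015, 3, 2, 12)))
```

The linear scan keeps the sketch self-contained. At the scale of hundreds of millions of elements, this filter would more plausibly run against a spatially and temporally indexed store.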
The Result
Sullexis provisioned the platform, ingested and analyzed the data, built the analytics, and delivered actionable results to the client within 45 days. Because the EDW ingestion cycle of up to 6 months per data source was eliminated, users were able to discover new insights within weeks. This case shows that a big data platform can deliver value when the challenge is a wide variety of data rather than sheer volume.