Cost-Effective Data Archiving and Reporting using Big Data Tools and the Cloud

The Challenge

As the result of an effort to consolidate all existing ERP systems into one SAP solution, our client was under pressure to implement an archival strategy for the legacy data of 18 companies. This legacy data had to be accurately preserved and made readily available to satisfy compliance, regulatory and operational needs. The challenges did not end there:

* Once the SAP system was live, the client would need to shut down the legacy systems within weeks. Delays would lead to the renewal of a costly Transition Services Agreement (TSA) that the client was desperate to avoid.

* It was 2016, and the Oil and Gas sector was gripped by one of the worst economic downturns in its history. As at most organizations in the sector, funding had become scarce, so innovation on a tight budget was paramount.

* Finally, due to a reduction in workforce combined with the implementation of SAP, our client was keen to limit the amount of change forced onto its staff.

Given the pressure of a tight delivery timeframe, reduced budgets and a need for Big Data expertise, the client reached out to Sullexis to help deliver the rapid innovation required.

The Solution

Given our experience with the MapR platform, combined with our knowledge of the open source ecosystem around the platform, we implemented a legacy data archive system centered on the use of:

  • Sullexis’ custom Sqoop extract and load tools
  • Apache Drill, the schema-free SQL query engine for Hadoop, NoSQL and cloud storage
  • Elasticsearch

Sullexis’ custom Sqoop extract and load tools meant that:

  • Structured data objects could be migrated with data types preserved
  • Big tables (>200MM rows) and small tables (<100 rows) could be treated equally
  • Tables with active transactions could be extracted incrementally
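Incremental extraction of this kind is commonly built around a high-water mark: each run pulls only the rows whose check column exceeds the last value seen on the previous run. The sketch below illustrates that general idea using Python's built-in sqlite3 module. It is not Sullexis' tooling, and the table `orders`, the column `updated_at`, and the function `extract_incremental` are hypothetical names chosen for illustration.

```python
import sqlite3


def extract_incremental(conn, table, check_column, last_value):
    """Return rows newer than last_value, plus the new high-water mark."""
    # Identifiers are interpolated only because this sketch uses trusted,
    # hard-coded names; the watermark itself is bound as a parameter.
    sql = f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}"
    cur = conn.execute(sql, (last_value,))
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    idx = columns.index(check_column)
    # If nothing new arrived, the watermark stays where it was.
    new_mark = rows[-1][idx] if rows else last_value
    return rows, new_mark


def demo():
    """Simulate two extract runs against a table that is still receiving rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, updated_at INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 20)])
    first, mark = extract_incremental(conn, "orders", "updated_at", 0)
    # A new transaction lands between runs; only it is picked up next time.
    conn.execute("INSERT INTO orders VALUES (3, 30)")
    second, mark = extract_incremental(conn, "orders", "updated_at", mark)
    return first, second, mark
```

Persisting the returned watermark between runs is what allows a table with active transactions to be extracted repeatedly without re-reading rows that have already been loaded.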

Using Drill enabled our client to seamlessly access their legacy ERP data (previously on Oracle and SQL Server) using their existing SQL skillsets and BI tools, including Tableau, Lumira, MicroStrategy, Spotfire, SSRS, and Excel.

To keep costs and timescales to a minimum, we relied on Drill’s native support for SQL. Drill enabled reports to be rapidly converted and executed against the data residing in the MapR FS. This flexible, low-cost solution meant that the Oracle and SQL Server platforms could be shut down quickly with zero operational impact on end users.
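To give a sense of what running a report query against Drill can look like, here is a minimal sketch that builds a request for Drill's REST query endpoint (`/query.json`). The host, the `dfs` path, and the table and column names are assumptions made up for illustration; they do not describe the client's actual environment or reports.

```python
import json
import urllib.request


def build_drill_request(base_url, sql):
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical archived table stored as files under a dfs path.
EXAMPLE_SQL = (
    "SELECT vendor_id, SUM(amount) AS total "
    "FROM dfs.`/archive/legacy_erp/ap_invoices` "
    "GROUP BY vendor_id"
)

request = build_drill_request("http://localhost:8047", EXAMPLE_SQL)

# To run against a live Drill instance (not executed here):
#     with urllib.request.urlopen(request) as resp:
#         rows = json.load(resp)["rows"]
```

The SQL itself is ordinary ANSI-style SQL; the main difference from the legacy databases is that the FROM clause points at files in the archive rather than at a database table.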

To support the search and extract of the client’s unstructured data, Elasticsearch was implemented on the MapR cluster and used to index a range of TIFFs, PDFs, and text reports. These were tagged with metadata and indexed, enabling them to be searched for and extracted on an as-needed basis.
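As an illustration of tagging documents with metadata for indexing, the sketch below serializes a couple of document records into the newline-delimited JSON format accepted by the Elasticsearch bulk API. The index name, document IDs, and metadata fields are all hypothetical; the actual tagging scheme used on the project is not described here.

```python
import json


def build_bulk_body(index_name, documents):
    """Serialize (doc_id, metadata) pairs into a bulk-API request body."""
    lines = []
    for doc_id, metadata in documents:
        # Action line: tells the bulk API where to index the next document.
        # Note: Elasticsearch versions before 7 also expected a mapping type.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the metadata that makes the file searchable.
        lines.append(json.dumps(metadata))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical metadata for two archived files.
docs = [
    ("inv-0001", {"file_type": "tiff", "company": "Company A",
                  "doc_date": "2014-03-31", "path": "/archive/docs/inv-0001.tif"}),
    ("rpt-0002", {"file_type": "pdf", "company": "Company B",
                  "doc_date": "2015-06-30", "path": "/archive/docs/rpt-0002.pdf"}),
]

body = build_bulk_body("legacy-documents", docs)
```

Storing the file path alongside the metadata means a search hit can lead straight back to the original TIFF or PDF when it needs to be extracted.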

The Result

Sullexis’ rapid implementation and flexible approach provided cost-effective legacy data archiving and reporting on one unified platform that handles both structured and unstructured data. Here are some of the outcomes the client experienced:

  • During the extract and load cycle, over 1.2 billion rows of data from nearly 250,000 tables were migrated from various Oracle and SQL Server databases to the MapR FS running in Microsoft Azure
  • Over 100 operational reports were re-directed to run against the new data to support ongoing business needs
  • Several million TIFFs and PDFs were tagged and indexed with Elasticsearch
  • Several new data sources were uncovered using our discovery process, but instead of derailing the project, the Sullexis team directly incorporated these additions into the existing scope, taking just a few days to assess, extract and load successfully
  • With MapR being as much at home on Microsoft Azure as on AWS, our client’s concern regarding vendor lock-in was eliminated

Finally, with the MapR infrastructure implemented, a foundation for future Big Data use cases was in place. The client has already addressed an important operational issue by analyzing equipment sensor data. Encouraged by this recent experience, they are now looking to other use cases that support new lines of business and further reduce operational costs.