Matt Becker, Sullexis’ Data Strategy & Data Quality Managing Director, will be presenting at the “Dallas/Fort Worth Petroleum Data Workshop 2020” in Dallas, TX on February 25, 2020. Matt will present the topic “Art of the Data Possible”: data management technologies are a dime a dozen and are improving at an ever-increasing rate. How does one transform data technologies such as cloud storage, data streaming, NoSQL, ML and AI into an effective (and cost-saving) data management solution for upstream companies?
Sullexis helped a premier midstream company implement a real-time data streaming and analytics platform.
- Our client’s existing real-time data platform had evolved over many years, incorporating assets from acquisitions and point-to-point integrations set up to meet specific needs.
- The client’s 17 distinct midstream assets and associated SCADA systems are streaming data from over 2 million tags and generating over 3TB of data per year.
- Demand from both internal and external customers was increasing, database instances were multiplying and database licensing costs were exploding.
- This increased demand was straining system scalability and the ability of IT to support the increasing requests.
- Developing a purpose-built, real-time streaming data platform was seen as a way to manage the increase in demand and growing data volumes while holding IT support costs constant, and potentially driving them lower.
- We designed and implemented a state-of-the-art, real-time, streaming data platform built on Confluent’s Event Streaming Platform (Kafka), MongoDB, InfluxDB, with data visualization provided by PowerBI and Grafana, and reporting via Microsoft SSRS.
- The solution was implemented for 2 of the assets and a roadmap developed for implementation across the other 15 assets.
- The real-time data acquisition system was re-engineered to stream JSON data to Kafka topics, with the original data being persisted in MongoDB. Multiple data consumers (sinks) were created to meet specific business needs: data aggregation for business reporting (MongoDB, PowerBI and SSRS), data analytics (MongoDB and PowerBI), and real-time data visualization (InfluxDB and Grafana).
- The solution is currently receiving data from 2 assets and 130,000 tags, while the roadmap for implementation across the other 15 assets is executed.
- In addition, new data streams for specific business needs are being provisioned within a matter of days, enabling IT to meet the growing demands of end users. The solution has laid the foundation for advanced analytics and machine learning, opening up new opportunities for our client to increase the uptime and utilization of its assets.
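In outline, the tag-to-topic flow above can be sketched in Python. The topic-per-asset naming, field names and payload shape here are illustrative assumptions, not the client’s actual schema:

```python
import json
from datetime import datetime, timezone

def tag_reading_to_record(asset_id, tag, value, ts):
    """Serialize one SCADA tag reading as a (topic, payload) pair.
    Topic-per-asset naming ("scada.<asset_id>") and the field names
    are invented for illustration."""
    topic = f"scada.{asset_id}"
    payload = json.dumps({
        "asset": asset_id,
        "tag": tag,
        "value": value,
        "ts": ts.isoformat(),
    }).encode("utf-8")
    return topic, payload

# A Kafka producer (e.g. confluent-kafka's Producer) would then call
# producer.produce(topic, payload), with the MongoDB, InfluxDB and
# reporting sinks each consuming the same topics.
topic, payload = tag_reading_to_record(
    "permian-01", "PT-1001.PV", 512.7,
    datetime(2020, 2, 25, tzinfo=timezone.utc))
```

Keeping the payload as plain JSON is what lets several heterogeneous sinks subscribe to one topic without coordinating schemas up front.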
Sullexis helped an Oilfield Services company develop a leading-edge Enterprise Data Lake and analytics solution, providing Executive Dashboards and Operational Analytics built on Sullexis’ Big Data Analytics framework and the MapR Converged Data Platform.
- This leader in oilfield pressure pumping has experienced aggressive growth across all major basins in North America.
- The client has gone through multiple enterprise system transformation projects and established a core set of cloud-based applications to run the business.
- Executive management is pursuing a knowledge management and metrics-driven business strategy to perfect operational efficiency, reliability, and safety in oilfield services.
- To enable this metrics-driven strategy, the client needs interactive KPI dashboards and ad-hoc analytics capabilities based on aggregated data from key systems such as ERP, Opportunity-to-Cash Lifecycle, Work Management, EHS, HCM, and operational equipment.
- Initial attempts at creating these KPI dashboards and ad-hoc analytics capabilities had limited success due to the lack of data integration and the strain placed on the live transactional systems, which impacted end users in the business functions.
The client partnered with Sullexis for its relevant domain expertise; its ability to quickly lay down the technology foundation needed for enterprise analytics; its know-how in creating a fully functional set of Operations KPIs; and its experience defining a roadmap for advanced analytic needs.
The initial project included custom ingestion routines for aggregating data from the cloud-hosted ERP, Work Management and SharePoint collaboration systems; an integrated data model for the Data Lake, including materialized views; and utilization of the performance and computational power of a Big Data cluster.
Key components of the solution include:
- Sullexis Enterprise Data Engineering methodology, Enterprise Data Lake architecture and Big Data principles
- MapR platform with Distributed File System and NoSQL database
- Custom data ingestion application using OData, OAuth, SharePoint REST APIs and OJAI libraries for extracting, loading and staging data
- Microsoft Power BI for KPI dashboards and analytics visualization
- Microsoft Power BI On-premises Data Gateway for enabling cloud-hosted report data to refresh from the on-premises Data Lake
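The OData-based ingestion pattern can be illustrated by the URL construction alone. The endpoint, entity name and page size below are invented, and the OAuth bearer-token handling is omitted:

```python
def odata_page_urls(base_url, entity, page_size, pages):
    """Yield paged OData query URLs using $top/$skip.
    The endpoint and entity are placeholders; a real ingestion run
    also sends an OAuth bearer token and can follow @odata.nextLink
    instead of computing offsets itself."""
    for page in range(pages):
        yield (f"{base_url}/{entity}"
               f"?$top={page_size}&$skip={page * page_size}&$format=json")

urls = list(odata_page_urls("https://erp.example.com/odata", "WorkOrders", 500, 3))
```

Paging the extraction keeps individual requests small, so the load on the live source systems stays bounded — the strain on transactional systems was exactly what derailed the earlier in-house attempts.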
The solution delivered cross-system analytics and dashboard reporting to support the organization’s increased use of KPIs, scorecards and related metrics, which are driving departmental management decisions and shaping executive business strategy. It harnesses the power of Big Data: materialized views hold pre-calculated, aggregated results for specific metrics, measures and historical trend analytics, feeding lean, fast-loading, mobility-enabled reports and rich, interactive, cloud-based KPI dashboards that refresh automatically from the on-premises Data Lake.
As a result of an effort to consolidate all existing ERP systems into one SAP solution, our client was under pressure to implement an archival strategy for the legacy data of 18 companies. This legacy data had to be accurately preserved and made readily available to satisfy compliance, regulatory and operational needs. And the challenges did not end there:
- Once the SAP system was live, the client would need to shut down the legacy systems within weeks. Delays would lead to the renewal of a costly Transition Services Agreement (TSA) that the client was desperate to avoid.
- It was 2016, and the Oil and Gas sector was gripped by one of the worst economic downturns in its history. Like most organizations, the client found funding scarce, so innovation on a tight budget was paramount.
- Finally, due to a reduction in workforce combined with the implementation of SAP, our client was keen to limit the amount of change forced onto its staff.
Given the pressure of a tight delivery timeframe, reduced budgets and a need for Big Data expertise, the client reached out to Sullexis to help deliver the rapid innovation required.
Given our experience with the MapR platform, combined with our knowledge of the open source ecosystem around the platform, we implemented a legacy data archive system centered on the use of:
- Sullexis’ custom Sqoop extract and load tools
- Apache Drill, the schema-free SQL query engine for Hadoop and NoSQL
- Cloud Storage
- Elasticsearch
Sullexis’ custom Sqoop extract and load tools meant that:
- Structured data objects could be migrated with data types preserved
- Big tables (>200MM rows) and small tables (<100 rows) could be treated equally
- Tables with active transactions could be extracted incrementally
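The incremental-extract behavior above mirrors Sqoop’s `--incremental` import mode and can be pictured as a watermark-driven query builder. The table and column names are illustrative, and Sullexis’ actual tooling wraps Sqoop itself rather than generating SQL this way:

```python
def incremental_query(table, check_column, last_value=None):
    """Build the extract query the way Sqoop's incremental import
    does: a full load on the first run, then only rows past the
    stored watermark. Names here are invented for illustration."""
    base = f"SELECT * FROM {table}"
    if last_value is None:
        return base                      # first run: full extract
    return f"{base} WHERE {check_column} > '{last_value}'"
```

After each run the largest observed `check_column` value is stored as the next watermark, which is what lets tables with active transactions be extracted repeatedly without re-copying history.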
Using Drill enabled our client to seamlessly access their legacy ERP data (previously on Oracle and SQL Server) using their existing SQL skill sets and BI tools, including Tableau, Lumira, MicroStrategy, Spotfire, SSRS, and Excel.
Driven by a desire to keep costs and timescales to a minimum, Drill’s native support for SQL was utilized, enabling reports to be rapidly converted and executed against the data residing in the MapR FS. This flexible, low-cost solution meant that the Oracle and SQL Server platforms could be quickly shut down with zero operational impact to end users.
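One way to picture the report conversion is a mechanical rewrite of table references from the Oracle schema to Drill’s file-system workspaces. The `dfs.archive` workspace name and directory layout below are assumptions, not the client’s actual configuration:

```python
def to_drill_table(oracle_ref, workspace="dfs.archive"):
    """Map an Oracle SCHEMA.TABLE reference to a Drill table over the
    archived files in the MapR FS. The workspace name and directory
    layout are assumed for illustration."""
    schema, table = oracle_ref.lower().split(".")
    return f"{workspace}.`/{schema}/{table}`"

# SELECT * FROM FIN.GL_ENTRIES   becomes
# SELECT * FROM dfs.archive.`/fin/gl_entries`
```

Because Drill accepts ANSI SQL against these file paths, the rest of each report — joins, filters, aggregations — can usually stay untouched.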
To support search and extraction of the client’s unstructured data, Elasticsearch was implemented on the MapR cluster and used to index a range of TIFFs, PDFs, and text reports. These were tagged with metadata and indexed, enabling them to be searched for and extracted on an as-needed basis.
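The tag-and-search idea can be shown with a toy in-memory stand-in for the Elasticsearch index; the field names and paths are invented, and in the real solution the files stay on the MapR FS while only their metadata is indexed:

```python
def index_document(index, path, tags):
    """Add one archived file's metadata to a toy in-memory index; the
    real solution indexes these into Elasticsearch on the MapR cluster."""
    index.append({"path": path, **tags})

def search(index, **criteria):
    """Return paths of documents matching every tag criterion."""
    return [d["path"] for d in index
            if all(d.get(k) == v for k, v in criteria.items())]

archive_index = []
index_document(archive_index, "/archive/inv-001.pdf",
               {"type": "pdf", "company": "LegacyCo-07", "year": 2014})
index_document(archive_index, "/archive/scan-002.tif",
               {"type": "tiff", "company": "LegacyCo-03", "year": 2012})
```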
Sullexis’ rapid implementation and flexible approach provided cost-effective legacy data archiving and reporting on one unified platform that handles structured and unstructured data. Here are some of the outcomes the client experienced:
- During the extract and load cycle, over 1.2 billion rows of data from nearly 250,000 tables were migrated from various Oracle and SQL Server databases to the MapR FS running in Microsoft Azure
- Over 100 operational reports were re-directed to run against the new data to support ongoing business needs
- Several million TIFFs and PDFs were tagged and indexed with Elasticsearch
- Several new data sources were uncovered using our discovery process, but instead of derailing the project, the Sullexis team directly incorporated these additions into the existing scope, taking just a few days to assess, extract and load successfully
- With MapR equally at home on Microsoft Azure and AWS, our client’s concern regarding vendor lock-in was eliminated
Finally, with the MapR infrastructure implemented, a foundation for future Big Data use cases was enabled. The client has already addressed an important operational issue by analyzing equipment sensor data. Encouraged by this recent experience, they are now looking to other use cases that support new lines of business and reduce further operational costs.
Our client needed to identify chemical concentrations present at a given location and point in time. The data came from many different sources in widely varying formats, including structured data (databases), semi-structured data (spreadsheets), and unstructured data (email, PDF files, Word documents, etc.). The client had previously attempted a classic Enterprise Data Warehouse (EDW) approach, but the sheer variety of the data made it virtually impossible. The client’s technical team was battling the challenges of EDW design and Extract, Transform, Load (ETL) implementation rather than focusing on analysis of the data. With the EDW approach, each data source took anywhere from 3 weeks to 6 months to load.
To handle the variety of data, our client needed a big data platform. They considered an Open-source Hadoop implementation, but the knowledge and skills required to integrate and deploy the full set of tools was an obstacle. Our client wanted a pre-built application to ingest, analyze, and visualize data without having to write a custom solution built on unsupported Open-source software. They needed a big data platform that was ready to deploy.
The Data Tactics cloud-based Big Data Engine (BDE) was leveraged to provide a readily available platform for data discovery and analysis with pre-built integrations and system monitoring. The widely varying data was easily ingested by configuring the BDE data import MapReduce jobs. Out-of-the-box visualizations within BDE provided users with immediate and valuable discovery tools.
The engagement was complete within 45 days:
- Day 1-4 – Source data analysis
- Day 5-6 – Data ingest parser development
  - 9 custom parsers (1 MapReduce parser)
- Day 7 – Data ingestion of 600,000,000 data elements from 20 data sources
- Day 8-35 – Analytics development
  - Custom SharePoint ingestion and integration to synchronize and index SharePoint data and enable BDE to dynamically pick up modifications in near real time
  - Custom analytics for environmental monitoring, highlighting chemical concentrations temporally and spatially
  - Custom application to highlight the geolocation of chemical samples
- Day 36-45 – Data Discovery and Investigation
  - Guided client personnel through the solution so that individuals could immediately and intuitively find all the data associated with an entity of interest
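In spirit, the chemical-concentration analytic reduces sample records to averages keyed by place, chemical and period. This toy map/reduce pass over in-memory records stands in for the BDE MapReduce jobs; the record fields and site names are invented:

```python
from collections import defaultdict

def avg_concentration(samples):
    """Average concentration per (location, chemical, month), as a
    toy two-phase map/reduce pass; field names are illustrative."""
    sums = defaultdict(lambda: [0.0, 0])
    for s in samples:                           # "map": emit key -> value
        key = (s["loc"], s["chem"], s["ts"][:7])
        sums[key][0] += s["ppm"]
        sums[key][1] += 1
    return {k: t / n for k, (t, n) in sums.items()}   # "reduce": aggregate

samples = [
    {"loc": "Site-A", "chem": "benzene", "ts": "2014-03-02", "ppm": 2.0},
    {"loc": "Site-A", "chem": "benzene", "ts": "2014-03-18", "ppm": 4.0},
    {"loc": "Site-B", "chem": "toluene", "ts": "2014-03-05", "ppm": 1.5},
]
monthly = avg_concentration(samples)
```

The same keying scheme extends naturally to spatial bins (grid cells instead of site names), which is how temporal and spatial views can share one aggregation pass.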
Sullexis was able to provision, ingest, build, analyze, and provide actionable results to the client in 45 days. By eliminating the 6-month data ingestion process of the classic EDW, users were able to discover new insights within weeks. This is a case where big data delivered value for a wide variety of data without big volume.
Our client needed to optimize their logistics in order to better understand the routes taken by the 21,000 trucks in the company’s fleet. There was no way to interpret the GPS feeds coming in from trucks in the field, making it difficult to determine when a vehicle was stopped or en route. The large volume of data required heavy analytics.
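A minimal sketch of the stopped-versus-en-route classification, assuming a simple speed threshold between consecutive GPS fixes. The 2 km/h cutoff is an invented example value, and real telematics analytics would also smooth over GPS jitter:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def classify(fixes, stop_kmh=2.0):
    """Label each interval between consecutive (hours, lat, lon) fixes
    as 'stopped' or 'en_route' from the implied average speed."""
    labels = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        speed = haversine_km(la0, lo0, la1, lo1) / max(t1 - t0, 1e-9)
        labels.append("stopped" if speed < stop_kmh else "en_route")
    return labels

# An idling truck, then a leg of roughly 55 km in one hour:
fixes = [(0.0, 29.76, -95.36), (1.0, 29.76, -95.36), (2.0, 30.26, -95.36)]
```

Run per-truck as a map-side computation, this kind of classification parallelizes cleanly across a 21,000-vehicle fleet.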
Our client had attempted an Open-source Hadoop implementation, but the knowledge and skills required to integrate and deploy the full set of tools were proving difficult. Our client had also attempted a MapR implementation, but encountered many of the same challenges. They were battling the technology rather than the problem. Our client wanted a pre-built application to ingest, analyze, and visualize data without having to write a custom solution built on unsupported Open-source software. They needed a big data platform that was ready to deploy.
It was faster and more cost-effective for our client to buy the Big Data Engine (BDE) from Data Tactics. The Data Tactics cloud-based BDE was leveraged to provide a readily available platform for data discovery and analysis. The BDE data import tools were easily configured to ingest the required data using pre-built MapReduce jobs. The cloud-based hosting model provided ample storage and performance. Analytics were implemented using MapReduce jobs operating in near real-time, delivering visualizations that rendered in seconds.
Sullexis and Data Tactics, working with the client, were able to deploy a big data platform quickly and cost effectively that met all of the requirements to optimize and understand the client’s truck routes within a matter of weeks.
Our client had a repeated need to quickly evaluate initial production (IP) of oil, gas, and water from recently drilled horizontal wells in a number of different geographic areas. They had been struggling to do this manually by first downloading production data from IHS and the Texas Railroad Commission (TX RRC), merging it in Excel, and then building pivot tables and charts to undertake the analysis.
Assembling the data was challenging enough, but actually identifying all the wells in the specific geographic area was proving difficult and, on many occasions, inaccurate.
The solution leveraged the Data Tactics cloud based Big Data Engine (BDE) to build ingestion routines to IHS and TX RRC so that data could be directly imported on a monthly basis. With the data loaded into BDE, a custom application was written that displayed the Well Header data on the BDE Google Map plug-in. This application enabled the user to ‘rubber band’ an area of interest on the map, capturing all the Well Header data within the region selected. This data was then available for export to a CSV for further manipulation within Spotfire.
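The ‘rubber band’ selection reduces to a bounding-box filter over well-header coordinates. The field names below are invented stand-ins for the IHS/TX RRC well-header schema:

```python
def wells_in_box(wells, lat_min, lat_max, lon_min, lon_max):
    """Return well headers whose surface location falls inside the
    rubber-banded map rectangle; field names are illustrative."""
    return [w for w in wells
            if lat_min <= w["lat"] <= lat_max and lon_min <= w["lon"] <= lon_max]

wells = [
    {"api": "42-001-00001", "lat": 31.80, "lon": -102.10},
    {"api": "42-001-00002", "lat": 31.95, "lon": -102.30},
    {"api": "42-003-00003", "lat": 32.60, "lon": -101.50},
]
selected = wells_in_box(wells, 31.7, 32.0, -102.5, -102.0)
```

The selected headers can then be written straight to CSV for the downstream Spotfire analysis, replacing the error-prone manual well picking.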
The automated assembly of the data meant that the most up-to-date information was available for analysis. The ‘rubber banding’ directly on the map allowed the user to specifically identify the lease area of interest, eliminating any errors in selecting those wells manually. We were able to develop analytics and visualizations to help this client gain value from their data.
Our client had an existing Enterprise Data Warehouse (EDW) that was not meeting the business requirements, making it difficult to report on invoices, inventory levels, and purchase orders. The data in the EDW was inaccurate, inconsistent, and incomplete. The performance of the EDW was also a concern. Our client needed faster reports, an improvement in general performance, and required a reduction in downtime associated with refreshing the reporting Data Warehouse.
We developed a SQL Server Integration Services (SSIS) package to support the Extract, Transform, and Load (ETL) process that populates the reporting solution in the MicroStrategy tool. We incorporated data for customers, products, vendors, sales representatives, and purchase orders. The solution implemented routines to improve the accuracy and consistency of the data. The cleaned, formatted, validated, and reorganized information enabled improved reporting of invoices, inventory, and purchase orders. Our solution was architected for optimum performance, resulting in faster data refreshes and reports.
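The accuracy and consistency routines were built as SSIS transformations; as a rough illustration of the kind of cleansing involved, here is a Python sketch with invented field names and rules:

```python
def clean_vendor(row):
    """Standardize one vendor record before load: trim and normalize
    the key, collapse whitespace in the name, and coerce the active
    flag to a boolean. Rules and field names are invented to
    illustrate typical cleansing, not the actual SSIS logic."""
    return {
        "vendor_id": row["vendor_id"].strip().upper(),
        "name": " ".join(row["name"].split()).title(),
        "active": str(row.get("active", "Y")).upper() in ("Y", "YES", "1", "TRUE"),
    }

cleaned = clean_vendor({"vendor_id": " v-100 ", "name": "acme   supply", "active": "y"})
```

Normalizing keys this way is also what makes the later source-to-target matching and validation step reliable.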
The engagement was complete within 90 days and involved:
- Mapping data sources to target tables
- Developing SSIS packages
- Testing packages
- Matching and validating EDW data against sources to confirm accuracy
- Conducting user acceptance tests to gain approval from users
Within three months, raw data was transformed into useful insights for the company. The new EDW allowed the client to identify issues arising in day-to-day operations, focus on analysis of the information, and make better, data-driven business decisions.