Data Warehouse ETL vs Data Lake ETL
Intelligent Integration fundamentally changes the cost calculus between a Data Lake and the IDW. A Data Lake is loosely integrated data, typically placed in Hadoop. An IDW has tightly integrated data stored in a relational database, Hadoop, or both. Without Intelligent Integration, the data warehouse ETL level of effort (LOE) is prohibitively high but data analytics & reporting is easier, while a Data Lake has lower ETL costs but shifts LOE to analytics & reporting. It’s the “pay me now or pay me later” decision.
Intelligent Integration eliminates the rationale for a Data Lake by lowering integration costs below the higher analytics & reporting costs of a Data Lake. Any analyst or data scientist will gladly choose integrated data over non-integrated data. So how does Intelligent Integration reduce data warehouse ETL costs?
First, Intelligent Integration’s metadata-driven data integration is fundamentally much faster than manual ETL coding. Metadata creation can be automated in several ways: from the source schema, from the destination schema, or even from the schema of an incoming JSON document. These options are outlined in other documentation on this site.
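To make the idea of deriving metadata from an incoming JSON document concrete, here is a minimal Python sketch. It is not Intelligent Integration’s actual mechanism; the function name infer_schema, the type labels, and the sample order document are all assumptions made for illustration.

```python
import json


def infer_schema(doc, prefix=""):
    """Map each dotted field path in a JSON object to a simple type name.

    Hypothetical helper for illustration only; not a product API.
    """
    schema = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested objects contribute their own fields under a dotted path.
            schema.update(infer_schema(value, path))
        elif isinstance(value, list):
            schema[path] = "array"
        elif isinstance(value, bool):
            # bool must be tested before int: in Python, bool is a subclass of int.
            schema[path] = "boolean"
        elif isinstance(value, int):
            schema[path] = "integer"
        elif isinstance(value, float):
            schema[path] = "number"
        elif value is None:
            schema[path] = "null"
        else:
            schema[path] = "string"
    return schema


# Sample document (invented for this sketch).
order = json.loads(
    '{"id": 7, "customer": {"name": "Ann", "vip": true},'
    ' "total": 19.5, "items": [{"sku": "A1"}]}'
)
schema = infer_schema(order)
```

The resulting dictionary of field paths and types is the kind of metadata that could then drive the integration instead of hand-written ETL code.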
Semi-structured source data, even hierarchical data, can be normalized into tables without any coding or metadata creation. The tables can even be created in real time if required. How much simpler can we make it? Intelligent Integration fundamentally breaks down the wall between schema and schema-less data.
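As a rough picture of what normalizing a hierarchical document into tables means, the Python sketch below splits one nested record into a parent table and a child table linked by a surrogate key. It is an illustration under stated assumptions, not the product’s implementation: the normalize function, the _id and orders_id column names, and the sample order are hypothetical, and nested objects are only flattened one level deep.

```python
from collections import defaultdict


def normalize(doc, table, tables, parent=None):
    """Append doc as a row of `table`; lists of objects become child tables.

    Hypothetical helper for illustration only. `tables` must be a
    defaultdict(list) mapping table name to a list of row dicts.
    """
    row_id = len(tables[table]) + 1  # surrogate key, unique within the table
    row = {"_id": row_id}
    if parent is not None:
        parent_table, parent_id = parent
        row[f"{parent_table}_id"] = parent_id  # foreign key to the parent row
    tables[table].append(row)
    for key, value in doc.items():
        if isinstance(value, dict):
            # Assumption: flatten a nested object one level into prefixed columns.
            for sub_key, sub_value in value.items():
                row[f"{key}_{sub_key}"] = sub_value
        elif isinstance(value, list) and all(isinstance(v, dict) for v in value):
            # Each element becomes a row in a child table named after the field.
            for child in value:
                normalize(child, key, tables, parent=(table, row_id))
        else:
            row[key] = value
    return tables


# Sample hierarchical document (invented for this sketch).
order = {
    "order_no": 7,
    "customer": {"name": "Ann"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
}
tables = normalize(order, "orders", defaultdict(list))
```

Here tables["orders"] holds one row for the order and tables["items"] holds one row per line item, each carrying an orders_id column that points back to its parent row.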
Master Data Management (MDM) and Intelligent Integration fit perfectly together for a powerful data governance solution. A good MDM solution has typically raised the LOE of the IDW because the ETL programmer needs to manually integrate MDM processing, but with Intelligent Integration it’s a natural extension with minimal additional LOE.
Together, Intelligent Integration’s data integration solutions dramatically reduce ETL LOE for the IDW, well past the point where the poorly integrated Data Lake architecture is worth considering.