THE DATA LAKE

One solution to delays in data becoming available for business use is a data lake: a decentralised data repository where data is gathered and made available for immediate use by specific users.

Because a data lake stores data in its original form, there is a trade-off: the need for speed of response takes priority over resolving data quality issues.

A data lake can be used alongside a data warehouse: it provides data for immediate use, and some or all of that data is subsequently transformed and loaded into the data warehouse, where it can be used for several business purposes.

The following diagram shows, in simplified form, how data would flow through a data lake, be used for time-critical repayment collections activity, then be loaded in full into a quality data warehouse and ultimately be used in several business areas. In this Collections and Recoveries example, the need to remind the customer on the day that a repayment is missed overrides any other considerations.

How data flows through the data lake.
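The flow described above can also be sketched in code. The following is a minimal illustration only, not a real pipeline: the record fields, the function names (missed_today, load_to_warehouse) and the sample data are all hypothetical, and the "lake" and "warehouse" are plain in-memory lists. It shows the Collections team reading raw, untransformed records straight from the lake on the day a repayment is missed, with every record typed and cleaned for the warehouse afterwards.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RawRepayment:
    """A repayment record as it lands in the lake (hypothetical fields).

    Values are kept as text, exactly as the source system supplied them.
    """
    customer_id: str
    due_date: str       # e.g. "2024-03-01"
    amount_due: str
    amount_paid: str    # may be empty if nothing has been paid


def missed_today(lake, today):
    """Time-critical use: read the raw lake directly, no transformation step."""
    return [
        r.customer_id
        for r in lake
        if r.due_date == today.isoformat()
        and float(r.amount_paid or 0) < float(r.amount_due)
    ]


def load_to_warehouse(lake):
    """Later step: convert every raw record into typed rows for the warehouse."""
    return [
        {
            "customer_id": r.customer_id.strip(),
            "due_date": date.fromisoformat(r.due_date),
            "amount_due": round(float(r.amount_due), 2),
            "amount_paid": round(float(r.amount_paid or 0), 2),
        }
        for r in lake
    ]


# Hypothetical sample data: two repayments due on the same day.
lake = [
    RawRepayment("C001", "2024-03-01", "250.00", ""),
    RawRepayment("C002", "2024-03-01", "100.00", "100.00"),
]

reminders = missed_today(lake, date(2024, 3, 1))  # customers to remind today
warehouse = load_to_warehouse(lake)               # all data, transformed later
```

In this sketch the reminder list is built before any cleaning takes place, which mirrors the trade-off described earlier: the immediate use accepts the data in its original form, while the warehouse load applies the transformation that other business areas rely on.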

A core principle of the data lake is decentralisation of responsibility. Anyone can use the data, within the limits of access management, and it is possible to organise the data with the same table relationships that exist in the data warehouse.

Because computation and processing tools are decentralised, the different layers of data can be used by different users for different purposes, avoiding the bottlenecks that occur with a data warehouse.

Data Lakes – Advantages and Disadvantages

The major weakness of a data lake is the lack of data organisation, including the lack of a centralised metadata repository (although metadata management tools are increasingly available).

It may be very difficult to track whether processed data has changed because of error corrections or source system modifications. The validity or structure of the data cannot always be guaranteed.