How data virtualisation can add value to enterprise architecture
The modern enterprise data architecture is complex, with many components for storing, extracting, transforming and consuming data. Enterprises often ask why they need data virtualisation when they already have a properly functioning storage solution and data integration technology such as extract, transform, and load (ETL) in place. The answer is that data virtualisation and traditional data integration techniques like ETL offer very different capabilities for accessing, integrating, and delivering data.
ETL replicates data from a source system to a target system such as a data warehouse, whereas a data virtualisation platform creates logical views of the underlying data sources and makes the resulting data set available in real time. Business users involved in long-term planning based on historical data are often the end users best served by technology like ETL. Areas dealing with the operational side of the business, however, consume data in a very different way: there is little or no room for data latency. In such scenarios, data virtualisation is the right technology to adopt. Data virtualisation platforms make it easier to manage the complexity of different kinds of data held in multiple database management systems, both on-premises and in the cloud. It is therefore important to understand how data virtualisation adds value to an enterprise architecture that already includes data lakes, data warehouses and master data management (MDM) systems.
Data Virtualisation and Data Warehouses
Organisations usually spend a lot of money and time building an enterprise data warehouse (EDW). Recently, however, complex business scenarios have often led to projects that don’t fit the classic EDW use cases. Sometimes organisations force-fit complex use cases into the existing data warehouse, but that only complicates the design further. In these circumstances, a better solution is to extend and enhance the existing system's capabilities with data virtualisation.
Where a data warehouse already exists within the organisation but the business users need to add new data sources to enhance their reporting or analytics, a data virtualisation platform can be layered over the data warehouse and also connected to the new data source(s). The reporting tools then use the data virtualisation platform as their data source, effectively combining the existing data (in the data warehouse) with the data from the new data sources. Where there are multiple existing data warehouses within an organisation – for example, regional data warehouses – and the business needs a single view of the data, a data virtualisation platform can be used to provide this single view.
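The layering described above can be illustrated with a small, self-contained sketch. It is not how any particular data virtualisation product works; it simply uses SQLite's ability to attach several databases to one connection as a stand-in for a virtual layer. The schema names (`warehouse`, `new_source`), the tables and the view `sales_with_leads` are all hypothetical.

```python
import sqlite3


def build_virtual_layer() -> sqlite3.Connection:
    """Create a toy 'virtual layer' over two independent in-memory databases."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate database; they stand in for the
    # existing data warehouse and a newly added data source (assumed names).
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
    conn.execute("ATTACH DATABASE ':memory:' AS new_source")

    conn.execute("CREATE TABLE warehouse.sales (region TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO warehouse.sales VALUES (?, ?)",
        [("EMEA", 120.0), ("APAC", 80.0)],
    )
    conn.execute("CREATE TABLE new_source.web_leads (region TEXT, leads INTEGER)")
    conn.executemany(
        "INSERT INTO new_source.web_leads VALUES (?, ?)",
        [("EMEA", 15), ("APAC", 9)],
    )

    # The logical view stores no data of its own. A TEMP view is used because
    # SQLite only lets temporary views reference tables in attached databases.
    conn.execute(
        """
        CREATE TEMP VIEW sales_with_leads AS
        SELECT s.region, s.revenue, l.leads
        FROM warehouse.sales AS s
        JOIN new_source.web_leads AS l ON l.region = s.region
        """
    )
    return conn


def report(conn: sqlite3.Connection) -> list:
    """What a reporting tool would run: it only knows about the view."""
    return conn.execute(
        "SELECT region, revenue, leads FROM sales_with_leads ORDER BY region"
    ).fetchall()
```

Because the view holds no copy of the data, a row added to either source shows up the next time `report` is called, which is the contrast with an ETL-style copy that this section draws.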
Data Virtualisation and Data Lakes
A data lake is an extremely large storage repository for data. Data in the lake does not go through the heavy transformation and modelling that define a data warehouse; instead, it is usually replicated in its original format. However, the biggest risk to a data lake is a lack of governance and of consistent security and access control across the whole lake. Typically, data can be placed into the data lake with little thought given to its contents or to the company’s privacy and regulatory responsibilities.
Leading analyst firm Gartner has suggested that a lack of governance and the inability to understand the quality and lineage of data in the lake severely reduce a company’s ability to locate data of value and reuse it. Companies are turning to data virtualisation to complete their governed data lake architecture by providing a common access and security layer along with search and discovery capabilities. Data virtualisation also enables all types of users, whether BI users, business analysts or data scientists, to discover and access data across the lake. Perhaps the final and most important benefit of data virtualisation is its ability to publish the governed data lake’s data via a consistent security model. Whether the data is accessed via logical or physical models, or through any access tool such as a browser, an external SQL query client, BI tools, statistical analytics packages or even data preparation tools, consistent user- and role-based security privileges are applied to ensure that data from any repository in the governed data lake is seen only by those with the right credentials.
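To make the idea of a consistent security model concrete, here is a minimal sketch in which every access path goes through the same role check. The role names, the column policy in `ROLE_COLUMNS`, the sample rows and the `query_lake` function are all hypothetical and only illustrate the principle; a real platform would enforce this through its own policy engine.

```python
# Hypothetical policy: the columns each role is allowed to see.
ROLE_COLUMNS = {
    "data_scientist": {"customer_id", "country", "spend"},
    "bi_user": {"country", "spend"},
}

# Stand-in for rows held somewhere in the governed data lake (invented data).
LAKE_ROWS = [
    {"customer_id": "C001", "country": "UK", "spend": 250.0},
    {"customer_id": "C002", "country": "DE", "spend": 90.0},
]


def query_lake(role: str, access_tool: str) -> list:
    """Return lake rows filtered by role.

    access_tool is deliberately ignored: the same privileges apply whether
    the request comes from a browser, a SQL client or a BI tool.
    """
    allowed = ROLE_COLUMNS.get(role)
    if allowed is None:
        raise PermissionError(f"role {role!r} has no access to the lake")
    return [
        {col: val for col, val in row.items() if col in allowed}
        for row in LAKE_ROWS
    ]
```

In this sketch a `bi_user` receives the same restricted columns from `query_lake("bi_user", "browser")` as from `query_lake("bi_user", "sql_client")`, and an unknown role is rejected regardless of the tool used.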
Data Virtualisation and MDM
MDM projects are complex and costly, and many fail to deliver their expected value because they are too ambitious in scope and create too much change and uncertainty throughout the data infrastructure environment. Data virtualisation adds flexibility and shortens time-to-value for any MDM project, whether or not you are using an MDM tool. For projects not using an MDM tool, a data virtualisation layer allows you to create a virtual MDM repository by pulling the ‘master data’ from the relevant source systems – for example, to create a composite master view of a customer from multiple sources. Consuming applications can then use this virtual master data to get a single, consistent view of the data entity (e.g. customer). Alternatively, if the organisation already has an MDM solution in place, it can extend and enrich the data from that solution by using the data virtualisation layer to access other data sources, such as unstructured data from social media and the web.
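A composite master view of a customer can be sketched in a few lines. The example below assumes two hypothetical source systems (`CRM` and `BILLING`), invented sample records, and a simple rule that the first source in the priority list wins when both supply the same field. Real MDM matching and survivorship rules are considerably more involved.

```python
# Hypothetical source systems, each keyed by customer id (invented data).
CRM = {"C001": {"name": "Acme Ltd", "email": "ops@acme.example"}}
BILLING = {"C001": {"name": "ACME LIMITED", "vat_number": "GB000000000"}}

# Assumed survivorship rule: earlier sources take precedence on conflicts.
SOURCE_PRIORITY = [CRM, BILLING]


def master_customer(customer_id: str) -> dict:
    """Assemble a virtual master record on request; nothing is copied or stored."""
    master = {"customer_id": customer_id}
    for source in SOURCE_PRIORITY:
        record = source.get(customer_id, {})
        for field, value in record.items():
            # Keep the value from the highest-priority source that has the field.
            master.setdefault(field, value)
    return master
```

Calling `master_customer("C001")` takes the name and email from the CRM record and fills in the VAT number from billing, giving consuming applications one consistent record without a physical master repository.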
From a BI perspective, a data virtualisation layer creates a flexible ‘logical data access layer’, easy to modify and manage, on top of the existing data warehouse, databases and other data sources. From an application development and operational perspective, this data access layer can be seen as a very versatile ‘shared data services layer’ that decouples the physical infrastructure holding the data from the consuming applications and, in doing so, substantially simplifies data provisioning in organisations. Overall, data virtualisation can be used in many scenarios, and its broad applicability and benefits make it a critical component of any enterprise data architecture.