Data Virtualization

Data Virtualization: A Spectrum of Approaches – A GigaOm Market Landscape Report

As business users' appetite for ad-hoc access to both live and historical data rises, the demand stretches the limits of even the most robust analytics tools as they navigate a sprawling universe of datasets.

Satisfying this new order is often constrained by the laws of physics.

Analytics On-demand

Increasingly, the analytics-on-demand phenomenon, born of an intense focus on data-driven business decision-making, means there is time neither for traditional extract, transform, and load (ETL) processes nor for ingesting live data from its source repositories.

Time is not the only factor.

Volume & Speed

The sheer volume and speed with which data is generated are beyond the capacity and economic bounds of today’s typical enterprise infrastructures.

While breaking the laws of physics is obviously not in the domain of data professionals, a viable way to work around the physical limitations of querying data is to apply federated, virtual access.

Data Virtualization (DV) — A Solution

This approach, data virtualization (DV), is a solution a growing number of large organizations are exploring and many have implemented in recent years.

Data Virtualization (DV) — Benefit #1

The appeal of DV is straightforward: by creating a federated tier where information is abstracted, it can enable centralized access to data services.
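
To make the federation idea concrete, here is a minimal Python sketch, purely illustrative and not tied to any vendor's product: consumers request a logical dataset by name from a single access layer, and the layer routes the call to whichever registered source actually holds the data. All class and source names are invented.

```python
# Minimal sketch of the federated-tier idea behind DV: one access layer
# routes requests to whichever backing source holds the data, so
# consumers never address the physical sources directly.
# All names (FederatedLayer, "customers", "orders") are hypothetical.
from typing import Callable, Dict, Iterable, List


class FederatedLayer:
    """Registers named data sources and exposes a single query entry point."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[dict]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[dict]]) -> None:
        self._sources[name] = fetch

    def query(self, name: str) -> List[dict]:
        # The caller asks for a logical dataset; the layer decides which
        # physical source serves it.
        return list(self._sources[name]())


layer = FederatedLayer()
layer.register("customers", lambda: [{"id": 1, "name": "Acme"}])   # e.g. a CRM
layer.register("orders", lambda: [{"id": 10, "customer_id": 1}])   # e.g. an ERP
print(layer.query("customers"))
```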

Data Virtualization (DV) — Benefit #2

In addition, with some DV solutions, cached copies of the data are available, providing the performance of more direct access without the source data having to be rehomed.
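
The caching behavior can be sketched just as simply. In this hedged example, which assumes nothing about any specific product, the first request pulls rows from the remote source and later requests within the cache's lifetime are served from the local copy, so the source data never has to be relocated.

```python
# Illustrative caching wrapper: the first query hits the remote source,
# subsequent queries are answered from a cached copy until it expires.
# The source callable and TTL are invented for the example.
import time
from typing import Callable, Iterable, List, Optional


class CachedSource:
    def __init__(self, fetch: Callable[[], Iterable[dict]], ttl_seconds: float = 300.0) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._rows: Optional[List[dict]] = None
        self._fetched_at = 0.0

    def rows(self) -> List[dict]:
        # Refresh the cached copy only when it is missing or stale.
        if self._rows is None or time.time() - self._fetched_at > self._ttl:
            self._rows = list(self._fetch())
            self._fetched_at = time.time()
        return self._rows


inventory = CachedSource(lambda: [{"sku": "A-1", "qty": 4}], ttl_seconds=60)
print(inventory.rows())  # first call reaches the "remote" source
print(inventory.rows())  # second call is served from the cache
```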

Data Virtualization (DV) — Benefit #3

Implementing DV is also attractive because it bypasses the need for ETL, which can be time-consuming and unnecessary in certain scenarios.

Whether under the “data virtualization,” “data fabric,” or “data as a service” moniker, many vendors and customers see it as a core approach to creating a logical data warehouse.

Data Virtualization (DV) — Benefit #4

Data virtualization has been around for a while; nevertheless, we are seeing a new wave of DV solutions and architectures that promise to enhance its appeal and feasibility for solving the onslaught of new BI, reporting, and analysis requirements.

 

Data Virtualization – A Somewhat Nebulous Term

A handful of vendors offer platforms and services that are focused purely on enabling data virtualization and are delivered as such.

Others offer it as a feature in broader big data portfolios.

Regardless, enterprises that implement data virtualization gain a virtual layer over their structured and even unstructured datasets from relational and NoSQL databases, Big Data platforms, and even enterprise applications, which allows for the creation of logical data warehouses accessed with SQL, REST, and other data query methods.

This provides access to data from a broader set of distributed sources and storage formats.

Moreover, DV can do this without requiring users to know where the data resides.
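
As a rough illustration of those access methods, the snippet below submits a SQL statement over a REST endpoint exposed by a hypothetical DV layer. The endpoint, view names, and response shape are all assumptions made for the sketch; each real platform defines its own SQL/JDBC/ODBC and REST interfaces.

```python
# Hypothetical example of querying a logical data warehouse through a
# DV layer's REST API. The endpoint, the "logical_dw" views, and the
# response format are invented; real platforms differ.
import requests

DV_ENDPOINT = "https://dv.example.internal/api/v1/query"  # hypothetical

# The SQL targets logical views; the DV layer resolves which physical
# systems (CRM, ERP, data lake, ...) actually hold the joined data.
sql = """
SELECT c.region, SUM(o.amount) AS revenue
FROM logical_dw.customers AS c
JOIN logical_dw.orders    AS o ON o.customer_id = c.id
GROUP BY c.region
"""

resp = requests.post(DV_ENDPOINT, json={"sql": sql}, timeout=30)
resp.raise_for_status()
for row in resp.json()["rows"]:
    print(row)
```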

 

Various Factors Influencing the Need for DV

In addition to the growth of data, the increased accessibility of self-service Business Intelligence (BI) tools such as Microsoft’s Power BI, Tableau, and Qlik is creating more concurrent queries against both structured and unstructured data.

The notion that data is currency, while perhaps cliché, is increasingly and verifiably the case in the modern business world.

Accelerating the growth in data is the overall trend toward digitization, the pools of new machine data, and the ability of analytics tools and machine learning platforms to analyze streams of data from these and other sources, including social media.

Compounding this trend is the growing use and capabilities of cloud services and the evolution of Big Data solutions such as Apache Hadoop and Spark.

With ad-hoc reporting and self-service BI demands greater than ever, many enterprises now have data scientists whose job is to figure out how to make use of all this new data to make their organizations more competitive.

The emergence of cloud-native apps, enabled by Docker containers and Kubernetes, will only make analysis features more common throughout the enterprise technology stack.

Meanwhile, the traditional approach of moving and transforming data to meet these needs and power these analytic capabilities is becoming less feasible with each passing requirement.

In this report, we explore data virtualization products and technologies, and how they can help organizations that are experiencing this accelerated demand while simplifying the query process for end-users.

 

Key Findings:

  • Data virtualization is a relatively new option, with still-evolving capabilities, for querying or searching transactional and historical data in near real-time without having to know where the data resides.
  • DV is often optimized for remote access to a cached layer of data, eliminating the need to move the data or allocate storage resources for it.
  • DV is an alternative to the more common approach of moving data into warehouses or marts, by ingesting and transforming it using ETL and data prep tools.
  • In addition to providing better efficiency and faster access to data, data virtualization can offer a foundation for addressing data governance requirements, such as compliance with the European Union’s General Data Protection Regulation (GDPR), by ensuring data is managed and stored as required by such regulatory regimes.
  • Data virtualization often provides the underlying capability for logical data warehouses.
  • Many DV vendors are accelerating the capabilities of their solutions by offering the same massively parallel processing (MPP) query capabilities found on conventional data warehouse platforms.
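
To show what that last point means in miniature, here is a toy Python sketch of the MPP principle, not any vendor's implementation: the same aggregation runs against each data partition in parallel, and the partial results are then combined, rather than scanning everything serially.

```python
# Toy illustration of massively parallel processing (MPP): each worker
# aggregates one partition, and the partial results are merged.
# The partitions and schema are invented for the example.
from concurrent.futures import ProcessPoolExecutor

PARTITIONS = [  # e.g. one slice of a sales table per node or file
    [{"region": "EU", "amount": 120}, {"region": "US", "amount": 80}],
    [{"region": "EU", "amount": 50}, {"region": "APAC", "amount": 200}],
]


def partial_sum(rows):
    """Aggregate one partition locally, as an MPP worker would."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return totals


if __name__ == "__main__":
    combined = {}
    with ProcessPoolExecutor() as pool:
        for partial in pool.map(partial_sum, PARTITIONS):
            for region, amount in partial.items():
                combined[region] = combined.get(region, 0) + amount
    print(combined)  # {'EU': 170, 'US': 80, 'APAC': 200}
```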

Data Virtualization Products

Products are available across a variety of data virtualization approaches, including:

  • (1) core DV platforms;
  • (2) standalone SQL query engines that can connect to a variety of remote data sources and can query across them (see the sketch after this list);
  • (3) data source bridges from conventional database platforms that connect to files in cloud object storage, big data platforms, and other databases; and
  • (4) automated data warehouses.
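
The sketch below illustrates category (2): a standalone SQL engine answering one query that spans two remote systems. It assumes a Trino cluster reachable at the host shown, with postgresql and hive catalogs already configured; the host, user, and table names are invented for the example.

```python
# Sketch of a cross-source query through a standalone SQL engine.
# Assumes a configured Trino cluster (pip install trino); every
# identifier below is hypothetical.
import trino

conn = trino.dbapi.connect(host="trino.example.internal", port=8080, user="analyst")
cur = conn.cursor()

# One statement joins an operational database with a data-lake table;
# the engine, not the user, works out where each dataset lives.
cur.execute("""
    SELECT c.segment, COUNT(*) AS clicks
    FROM postgresql.public.customers AS c
    JOIN hive.web.clickstream        AS k ON k.customer_id = c.id
    GROUP BY c.segment
""")
for row in cur.fetchall():
    print(row)
```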

Source: gigaom.com

About the Author: Amel

I'm a Digital Marketing Strategist passionate about SEO and Digital Analytics. I also teach Digital Marketing and offer customized private coaching to entrepreneurs and in-house marketers to help them take their revenue or skills to the next level. Follow me on Twitter where I offer advice and share high quality content on marketing, tech and productivity.
