Category Archives for Big Data

The Modern Data Warehouse – Enterprise Data Curation for the Artificial Intelligence Future

This free 1-hour webinar from GigaOm Research brings together experts in AI and data analytics, featuring GigaOm analyst William McKnight and a special guest from Microsoft.

The discussion will focus on the promise AI holds for organizations of every size in every industry, and on how to overcome today’s challenges in preparing the organization for AI and planning AI applications.

The foundation for AI is data.

To build models, you must have enough data to analyze.

Your data determines the depth of AI you can achieve — for example, statistical modeling, machine learning, or deep learning — and its accuracy.

The increased availability of data is the single biggest contributor to the uptake of AI where it is thriving. Indeed, data’s highest use in the organization will soon be training algorithms.

AI is providing a powerful foundation for impending competitive advantage and business disruption.

In this 1-hour webinar, you will discover:

  • AI’s impending effect on the world
  • Data’s new highest use: training AI algorithms
  • Know & change behavior
  • Data collection
  • Corporate skill requirements

You’ll learn how organizations need to be thinking about AI and the data for AI.

Register now to join GigaOm and Microsoft for this free expert webinar.

Who Should Attend:

  • CIOs
  • CTOs
  • CDOs
  • Business Analysts
  • Data Analysts
  • Data Engineers
  • Data Scientists


Building a Solid Foundation for Advanced Analytics Maturity

As the best of big data analytics and business intelligence coalesce, organizations now have their sights set on AI for predictive and prescriptive analytics. But how can they get there?

Effective predictive and prescriptive analytics

Effective predictive and prescriptive analytics will put an organization at an advanced analytics maturity level, but a solid foundation at the more basic levels has to come first.

Earlier analytics maturity phases, including descriptive and diagnostic/root cause analytics, and intermediate levels, such as operational analytics, must be addressed robustly before further phases can be approached successfully.

Understanding your organization’s capacity to leverage analytics, and designing your analytics to match it, may be the best way forward; this requires introspection and planning.

Join us for this free 1-hour webinar from GigaOm Research to discuss how to assess, shore up, and advance your company’s analytics maturity.

The webinar features GigaOm analyst Andrew Brust and special guest Steve Mahoney, Product Director from Looker, a leading provider of modern analytics platform software.


In this webinar, you will learn:

  • What a rational analytics maturity model looks like and where your organization lies within it
  • How to strengthen the integrity of your current analytics maturity level
  • How to map out your path to advanced analytics maturity and approach practical adoption of AI and machine learning

Register now to join GigaOm Research and Looker for this free expert webinar.

Who Should Attend:

  • CIOs
  • CTOs
  • Chief Data Officers
  • Business Intelligence Architects
  • Business Analysts
  • Data Scientists


Two-Tier Data Storage: A GigaOm Market Landscape Report


Most organizations face a growing number of data storage challenges.

The number of applications and services that IT departments support is increasing, with users accessing data from everywhere, at any time, and from different devices.

The variety of workloads is increasing as well, with applications competing for resources on the same infrastructure.

The cost of traditional infrastructure is incompatible with the exponential growth of unstructured data, big data workloads, or Internet of Things (IoT) applications.

The traditional classifications of primary and secondary data, which correlate primary with structured data/databases and secondary with unstructured files, are no longer valid.

With data becoming one of the most important assets for organizations, structured and unstructured data are now equally important, and they should be protected and treated accordingly.

A new primary/secondary classification has emerged that is based on the value of data, as established through data indexing and classification.

Coupling this new classification with a two-tier storage infrastructure can help reduce costs and simplify the process, especially if the two tiers are integrated and data can move seamlessly between them.

Modern applications can now be divided into two families: latency-sensitive and capacity-driven.

The first group needs data as close as possible to the processing engines (e.g., CPUs and GPUs), while the latter usually requires easily accessible data spanning multiple devices across the network.
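The value-based classification and the latency/capacity split can be combined into a simple placement rule. The sketch below is a minimal illustration under stated assumptions, not the report's method: the `DataSet` fields, the `choose_tier` function, and the `HOT_READS_PER_DAY` threshold are all hypothetical, and access frequency stands in for whatever value score an indexing and classification tool would actually produce.

```python
from dataclasses import dataclass

# Hypothetical cut-off: data sets read at least this often are treated as
# high-value "primary" data. A real policy would come from data classification.
HOT_READS_PER_DAY = 100.0


@dataclass
class DataSet:
    name: str
    latency_sensitive: bool  # workload needs data close to CPUs/GPUs
    reads_per_day: float     # hypothetical stand-in for a data-value score


def choose_tier(ds: DataSet) -> str:
    """Place a data set on one of two integrated tiers."""
    if ds.latency_sensitive or ds.reads_per_day >= HOT_READS_PER_DAY:
        return "performance"  # e.g. flash close to the compute engines
    return "capacity"         # e.g. scale-out object storage or cloud


for ds in [
    DataSet("orders_db", latency_sensitive=True, reads_per_day=5000),
    DataSet("call_recordings_2016", latency_sensitive=False, reads_per_day=0.5),
    DataSet("daily_sales_parquet", latency_sensitive=False, reads_per_day=250),
]:
    print(f"{ds.name} -> {choose_tier(ds)}")
```

Re-evaluating the rule periodically and moving data sets whose tier changes is, in essence, what automated tiering does; a production policy would also weigh cost, retention, and security requirements.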

New infrastructure designs must take this division into account to cope quickly with new and ever-evolving business requirements.

In this report, we analyze several aspects of the two-tier storage strategy including:

  • Why adopt a two-tier storage strategy
  • Different types of tier integration
  • How to manage file-based storage in a two-tier storage strategy
  • Automated tiering and application-based profiling
  • Security considerations for a two-tier storage strategy
  • Key players
  • First steps for adopting a two-tier storage strategy and improving overall infrastructure TCO


Key findings include:

  • A two-tier storage policy is easy to adopt and saves money while optimizing infrastructure resources.
  • This approach enables improved ROI on infrastructure and makes future investments necessary only where and when they are required, as opposed to months or years in advance.
  • The infrastructure layout is highly simplified and optimized to take advantage of the cloud and seamlessly integrate it with the rest of the infrastructure. This also helps to simplify and redistribute budget resources from CAPEX to OPEX.


CIO Report: Enterprise Data Protection

The enterprise data protection market still largely takes a traditional approach to solving increasingly complicated enterprise requirements.

Solutions range from on-premises traditional software to specialized appliances to cloud-based solutions.

Some solutions support a hybrid cloud to meet a specific enterprise’s requirements.

Most of the solutions take a traditional approach to backup and recovery while a few are looking to leverage newer data sources and technologies.

Key findings include:

  • The data protection application is starting to disassociate from the underlying infrastructure.
  • This opens new options for different delivery methods including the cloud.
  • Cloud support exists to widely varying degrees. Solutions may support the cloud as a location to run the data protection application and/or as a source or target.
  • Primary data sources are still the focus for most enterprise data protection solutions with limited support for non-traditional data sources including cloud, containers, and endpoints.
  • Enterprise data protection solutions are starting to leverage machine learning and artificial intelligence to enhance automation capabilities. As enterprise requirements increase in complexity, so will the reliance on sophisticated automation.
  • Strong management tools are essential for any enterprise data protection solution. The increasing degree of complexity makes automation and the overall management tool a core requirement.
  • Support for regulatory and compliance requirements is in its infancy. Few products have tackled this challenge yet. This could be one of the biggest opportunities for enterprise data protection moving forward.
  • Simplicity is starting to make its way into enterprise data protection as solutions start to consolidate storage and data protection components into one system.
  • Purchase models are moving away from traditional, perpetually licensed software that runs on-premises toward software delivered as SaaS from the cloud.


When Worlds Collide: Blockchain and Master Data Management

Master Data Management (MDM), an approach to managing golden records, has been around for over a decade, but it has seen a growth spurt lately as some organizations exceed their pain thresholds in managing common data.

Blockchain has a slightly shorter history, having arrived with bitcoin, but it too is seeing a revolution these days as data is distributed far and wide and trust takes center stage in business relationships.

Volumes could be written about each on its own, and given that most organizations still have a way to go with each discipline, that might be appropriate.

However, good ideas wait for no one, and today’s idea is MDM on blockchain.

Thinking back over our MDM implementations over the years, it is easy to see the data distribution network becoming wider.

As a matter of fact, master data distribution is now usually the most time-intensive and unwieldy part of an MDM implementation.

The blockchain removes overhead, costs, and unreliability from authenticated peer-to-peer network partner transactions involving data exchange.

It can address one of MDM’s big challenges: governed, bi-directional synchronization of master data between the blockchain and enterprise MDM.

The single version of the truth

Another core MDM challenge is arriving at the “single version of the truth”.

It’s elusive even with MDM because everyone must tacitly agree to the process used to instantiate the data in the first place.

While many MDM practitioners go to great lengths to utilize the data rules from a data governance process, it is still a process subject to criticism.

The consensus that blockchain achieves can serve as a governance proxy for that elusive “single version of the truth”, providing group consensus for trust as well as a full lineage of data.

The major challenges in MDM

Blockchain enables the major components and tackles the major challenges in MDM.

Blockchain provides a distributed database, as opposed to a centralized hub, that can store certified data in perpetuity.

Because it stores timestamped blocks, each linked to the one before it, the blockchain is effectively unalterable and permanent.

Though not yet suited to low-latency transactions, transactions involving master data, such as financial settlements, are ideal for blockchain and can be sped up by an order of magnitude, since blockchain removes much of the friction in a normal process.

Blockchain uses pre-defined rules that act as gatekeepers of data quality and govern the way in which data is used.
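To make “timestamped and linked blocks” and “pre-defined rules as gatekeepers” concrete, here is a toy, single-node ledger. It is an illustrative sketch only: each block stores the SHA-256 hash of its predecessor, so editing an earlier record breaks verification, and a rule check can reject a record before it is appended. It has no networking, consensus, or permissioning, and it is not how Hyperledger Fabric or bitcoin is implemented; the `Ledger` class and the `has_required_fields` rule are hypothetical.

```python
import hashlib
import json
import time
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    timestamp: float
    data: dict
    prev_hash: str
    block_hash: str = ""

    def compute_hash(self) -> str:
        # Hash the block's contents, including the link to the previous block.
        payload = json.dumps(
            {"index": self.index, "timestamp": self.timestamp,
             "data": self.data, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def has_required_fields(record: dict) -> bool:
    # Hypothetical data-quality rule acting as a gatekeeper.
    return all(record.get(key) for key in ("customer_id", "name"))


RULES = [has_required_fields]


class Ledger:
    def __init__(self) -> None:
        genesis = Block(0, time.time(), {"genesis": True}, prev_hash="0" * 64)
        genesis.block_hash = genesis.compute_hash()
        self.chain = [genesis]

    def append(self, record: dict) -> bool:
        # Reject records that fail any rule before they reach the chain.
        if not all(rule(record) for rule in RULES):
            return False
        prev = self.chain[-1]
        block = Block(prev.index + 1, time.time(), record, prev.block_hash)
        block.block_hash = block.compute_hash()
        self.chain.append(block)
        return True

    def is_valid(self) -> bool:
        # Every stored hash must match its block's contents...
        if any(b.block_hash != b.compute_hash() for b in self.chain):
            return False
        # ...and every block must point at its predecessor's hash.
        return all(cur.prev_hash == prev.block_hash
                   for prev, cur in zip(self.chain, self.chain[1:]))


ledger = Ledger()
print(ledger.append({"customer_id": "C1", "name": "Acme Corp"}))  # True
print(ledger.append({"customer_id": "C2"}))                         # False: fails the rule
print(ledger.is_valid())                                            # True
ledger.chain[1].data["name"] = "Acme Corp (edited)"                 # tamper with history
print(ledger.is_valid())                                            # False
```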

Blockchains can be deployed publicly (like bitcoin) or internally (as an implementation of Hyperledger).

There could be a blockchain per subject area (like customer or product) in the implementation.

MDM will begin by utilizing these internal blockchain networks, also known as Distributed Ledger Technology, though utilization of public blockchains is inevitable.

A shared master data ledger beyond company boundaries can, for example, contain common and agreed master data including public company information and common contract clauses with only counterparties able to see the content and destination of the communication.

Hyperledger is quickly becoming the standard for open-source blockchain.

Hyperledger is hosted by The Linux Foundation. IBM, with the Hyperledger Fabric, is establishing the framework for blockchain in the enterprise.

Supporting master data management with a programmable interface for confidential transactions over a permissioned network is becoming a key inflection point for blockchain and Hyperledger.

Data management is about the right data at the right time and master data is fundamental to great data management, which is why centralized approaches like the discipline of master data management have taken center stage.

Characteristics of Blockchain

MDM can utilize the blockchain for distribution and governance and blockchain can clearly utilize the great master data produced by MDM.

Blockchain data needs data governance like any data. This data actually needs it more given its importance on the network.

MDM and blockchain are going to be intertwined now.

  • It enables the key components of establishing and distributing the single version of the truth of data. Blockchain enables trusted governed data.
  • It integrates this data across broad networks.
  • It prevents duplication and provides data lineage.
  • Its use in MDM will start in niches that demand these traits, such as financial, insurance, and government data.

You can get to know the customer better with native fuzzy search and matching in the blockchain.

You can track provenance, ownership, relationship, and lineage of assets, do trade/channel finance, and post-trade reconciliation/settlement.

Blockchain is now a disruption vector for MDM.

MDM vendors need to be at least blockchain-aware today, creating the ability for blockchain integration in the near future, such as what IBM InfoSphere Master Data Management is doing this year.

Others will lose ground.


The missing element of GDPR: Reciprocity

GDPR day has come and gone, and the world is still turning, just about. Some remarked that it was like the Y2K day we never had; whereas the latter’s impact was a somewhat damp squib, the former has caused more of a kerfuffle: however much the authorities might say, “It’s not about you,” it has turned out that it is about just about everyone in a job, for better or worse.

I like the thinking behind GDPR. The notion that your data was something that could be harvested, processed, bought and sold, without you having a say in the matter, was imbalanced, to say the least.

Data-monetisers have been good at following the letter of the law whilst ignoring its spirit, which is why it is GDPR’s newly expressed spirit, of non-ambiguous clarity and agreement, that is so powerful.

The principle of advertising

Meanwhile, I don’t really have a problem with the principle of advertising.

A cocktail menu in a bar could be seen as context-driven, targeted marketing, and rightly so as the chances are the people in the bar are going to be on the look-out for a cocktail.

The old adage that 50% of advertising is wasted (but nobody knows which 50%) helps no one, so, sure, let’s work together on improving its accuracy.


GDPR – A Regulatory Process

The challenge, however, comes from the nature of our regulatory processes. GDPR has been created across a long period of time, by a set of international committees with all of our best interests at heart.

The resulting process is not only slow but also, inevitably, a compromise based on past uses of technology. Note that even as the Cambridge Analytica scandal still looms, Facebook’s position remains that it acted within the law.

PII (personally identifiable information)

Even now, our beloved corporations are looking to how they can work within the law and yet continue to follow the prevailing mantra of the day, which is how to monetize data.

This notion has taken a bit of a hit, largely as now businesses need to be much clearer about what they are doing with it. “We will be selling your information” doesn’t quite have the same innocuous ring as “We share data with partners.”

To achieve this, most attention is on what GDPR doesn’t cover, notably around personally identifiable information (PII).

In layperson’s terms, if I cannot tell who the specific person is that I am marketing to, then I am in the clear.

I might still know that the ‘target’ is a left-leaning white male, aged 45-55, living in the UK, with a propensity for jazz, an iPhone 6 and a short political fuse, and all manner of other details.

But nope, no name and email address, no pack-drill.



Or indeed, I might be able to exchange obfuscated details about a person with another provider (such as Facebook again), which happen to match similarly obfuscated details — a mechanism known as hashing.

As long as I am not exchanging PII, again, I am not in breach of GDPR. This is all well and good apart from the fact that it just shows how advertisers don’t need to know who I am in order to personalize their promotions to me specifically.
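The hashing-based matching described here can be sketched in a few lines. This is an assumption about how such matching commonly works (normalize an identifier such as an email address, hash it with SHA-256, and compare digests), not a description of any particular platform’s process; the function and variable names are hypothetical.

```python
import hashlib


def hash_email(email: str) -> str:
    # Normalize first so "Jane.Doe@Example.com " and "jane.doe@example.com"
    # produce the same digest on both sides.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Each party hashes its own list; only digests are exchanged.
advertiser_digests = {hash_email(e): e for e in ["Jane.Doe@Example.com ", "sam@example.org"]}
platform_digests = {hash_email(e) for e in ["jane.doe@example.com", "lee@example.net"]}

overlap = [addr for digest, addr in advertiser_digests.items() if digest in platform_digests]
print(len(overlap))  # 1
```

Neither side sends a raw address, yet both learn which customers they share. Note that a plain, unsalted hash of an email address is easy to reverse for anyone holding a list of candidate addresses: they can hash the list and compare.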

As I say, I don’t really have a problem with advertising done right (I doubt many people do): indeed, the day on which sloppy retargeting can be consigned to the past (offering travel insurance once one has returned home, for example) cannot come too soon.

However, I do have a concern that the regulation we are all finding so onerous is not actually achieving one of its central goals.

What can be done about this? I think the answer lies in renewing the contractual relationship between supplier and consumer, not in terms of non-ambiguity over corporate use of data, but in terms of recognising the role of the consumer as a data supplier.

Essentially, if you want to market to me, then you can pay for it — and if you do, I’m prepared to help you focus on what I actually want.

We are already seeing these conversations start to emerge.

Consider the recent story about a man selling his Facebook data on eBay; meanwhile, at a recent startup event I attended, an organization was asked about how a customer could choose to reveal certain aspects of their lifestyle, to achieve lower insurance premiums.

And let’s not forget AI. I’d personally love to be represented by a bot that could assess my data privately, compare it to what was available publicly, then perhaps do some outreach on my behalf.

Remind me that I needed travel insurance, find the best deal and print off a contract without me having to fall on the goodwill of the corporate masses.

What all of this needs is the idea that individuals are not simply hapless pawns to be protected (from where comes the whole notion of privacy), but active participants in an increasingly algorithmic game.

Sure, we need legislation against the hucksters and tricksters, plus continued enforcement of the balance between provider and consumer which is still tipped strongly towards “network economy” companies.

But without a recognition that individuals are data creators, whose interests extend beyond simple privacy rights, regulation will only become more onerous for all sides, without necessarily delivering the benefits it was set up to achieve.

P.S. Cocktail, anyone? Mine’s a John Collins.

Follow @jonno on Twitter.


Data APIs: Gateway to Data-Driven Operation and Digital Transformation

Enterprises everywhere are on a quest to use their data efficiently and innovatively, and to maximum advantage, both in terms of operations and competitiveness.

The advantages of doing so are taken on authority, and reasonably so. Analyzing your data helps you better understand how your business actually runs.

Such insights can help you see where things can improve, and can help you make instantaneous decisions when required by emergent situations.


You can even use your data to build predictive models that help you forecast operations and revenue, and, when applied correctly, these models can be used to prescribe actions and strategies in advance.

That today’s technology allows businesses to do this is exciting and inspiring. Once such practice becomes widespread, we’ll have trouble believing that our planning and decision-making weren’t data-driven in the first place.

Bumps in the Road

But we need to be cautious here. Even though the technological breakthroughs we’ve had are impressive and truly transformative, there are some dependencies – prerequisites – that must be met in order for these analytics technologies to work properly.

If we get too far ahead of those requirements, we will not succeed in our initiatives to extract business insights from data.

The dependencies concern the collection, the cleanliness, and the thoughtful integration of the organization’s data with the analytics layer.

And, in an unfortunate irony, while the analytics software has become so powerful, the integration work that’s needed to exploit that power has become more difficult.


From Consolidated to Dispersed

The reason for this added difficulty is the fragmentation and distribution of an organization’s data. Enterprise software, for the most part, used to run on-premises and much of its functionality was consolidated into a relatively small stable of applications, many of which shared the same database platform.

Integrating the databases was a manageable process if proper time and resources were allocated.

But with so much enterprise software functionality now available through Software as a Service (SaaS) offerings in the cloud, bits and pieces of an enterprise’s data are now dispersed across different cloud environments on a variety of platforms.

Pulling all of this data together is a unique exercise for each of these cloud applications, multiplying the required integration work many times over.

Even on-premises, the world of data has become complex. The database world used to be dominated by three major relational database management system (RDBMS) products, but that’s no longer the case.

Now, in addition to the three commercial majors, two open-source RDBMSs have joined them in enterprise popularity and adoption.

And beyond the RDBMS world, various NoSQL databases and Big Data systems, like Hadoop and MongoDB, have joined the on-premises data fray.

A Way Forward

A major question emerges. As this data fragmentation is not merely an exception or temporary inconvenience, but rather the new normal, is there a way to approach it holistically?

Can enterprises that must solve the issue of data dispersal and fragmentation at least have a unified approach to connecting to, integrating, and querying that data?

While an ad hoc approach to integrating data one source at a time can eventually work, it’s a very expensive and slow way to go, and yields solutions that are very brittle.

In this report, we will explore the role of application programming interfaces (APIs) in pursuing the comprehensive data integration that is required to bring about a data-driven organization and culture.

We’ll discuss the history of conventional APIs and the web standards that most APIs use today. We’ll then explore how APIs and the metaphor of a database with tables, rows, and columns can be combined to create a new kind of API.

And we’ll see how this new type of API scales across an array of data sources and is more accessible than older API types to developers and analysts alike.
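The idea of combining an API with the table/row/column metaphor can be illustrated with a small sketch: each source, whatever system it lives in, is wrapped in an adapter that exposes rows, and one query function works across all of them. This is a hypothetical illustration, not the API standard the report discusses; `TableSource`, `InMemorySource`, and `select` are invented names.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Protocol

Row = Dict[str, object]


class TableSource(Protocol):
    """Anything that can yield rows as column-name -> value mappings."""

    def rows(self) -> Iterator[Row]: ...


class InMemorySource:
    """Hypothetical adapter; a real one would wrap a SaaS API, an RDBMS, or a NoSQL store."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)

    def rows(self) -> Iterator[Row]:
        return iter(self._rows)


def select(source: TableSource, columns: List[str],
           where: Callable[[Row], bool] = lambda row: True) -> List[Row]:
    # Same projection/filter semantics no matter which system backs the source.
    return [{col: row.get(col) for col in columns} for row in source.rows() if where(row)]


crm = InMemorySource([
    {"id": 1, "name": "Acme", "region": "EU"},
    {"id": 2, "name": "Globex", "region": "US"},
])
print(select(crm, ["id", "name"], lambda row: row["region"] == "EU"))
# [{'id': 1, 'name': 'Acme'}]
```

A production data API would push filters down to each source rather than scanning every row on the client, but the uniform tabular contract is what lets one approach cover many back ends.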


Using Your Whole Data Lake: How the Operational Facilitates the Predictive

Many companies have attempted to set up their first data lake. Maybe they bought a Hadoop distribution, and perhaps they spent significant time, money, and effort connecting their CRM, HR, ERP, and marketing systems to it.

And now that these companies have well-crafted, centralized data repositories, in many cases…they just sit there.

But maybe data lakes fall into disuse because they’re not being looked at for what they are. Most companies see data lakes as auxiliary data warehouses.

And, sure, you can use any number of query technologies against the data in your lake to gain business insights.

But consider that data lakes can – and should – also serve as the foundation for operational, real-time corporate applications that embed AI and predictive analytics.

These two uses of data lakes — for (a) operational applications as well as for (b) insights and predictive analysis — aren’t mutually exclusive, either. With the right architecture, one can dovetail gracefully into the other.

But what database technologies can query and analyze, build machine learning models, and power microservices and applications directly on the data lake?

Join us for this free 1-hour webinar from GigaOm Research. The webinar features GigaOm analyst Andrew Brust and Splice Machine CEO and co-founder Monte Zweben.

The discussion will explore how to leverage data lakes as the underpinning of application platforms, driving efficient operations, and predictive analytics that supports real-time decisions.

In this 1-hour webinar, you will discover:

  • Why data latency is the enemy and data currency is key to digital transformation success
  • Why operational database workloads, analytics, and construction of predictive models should not be segregated activities
  • How operational databases can support continually trained predictive models
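As a rough illustration of what “continually trained” can mean, the sketch below updates a one-feature linear model with stochastic gradient descent each time a new operational record arrives, rather than retraining in a separate batch job. It is a generic technique chosen for brevity, not Splice Machine’s implementation; the `OnlineLinearModel` class, the learning rate, and the synthetic stream are assumptions.

```python
class OnlineLinearModel:
    """Predicts y = w * x + b; learns from one observation at a time."""

    def __init__(self, learning_rate: float = 0.01) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        # One SGD step on the squared error 0.5 * (prediction - y) ** 2.
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error


# Synthetic stream standing in for rows landing in an operational table
# (hypothetical relationship: y = 2x + 1).
model = OnlineLinearModel()
for _ in range(2000):
    for x in range(10):
        model.update(float(x), 2.0 * x + 1.0)

print(round(model.w, 2), round(model.b, 2))  # approximately 2.0 1.0
```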

Register now to join GigaOm Research and Splice Machine for this free expert webinar.

Who Should Attend:

  • CIOs
  • CTOs
  • Chief Data Officers
  • Digital Transformation Facilitators
  • Application Developers
  • Business Analysts
  • Data Engineers
  • Data Scientists


Self-Service Master Data Management

Once data is under management in its best-fit leverageable platform in an organization, it is as prepared as it can be to serve its many callings. It is in a position to be used for purposes operationally and analytically and across the spectrum of need.

Ideas emerge from business areas no longer encumbered with the burden of managing data, which can be 60% – 70% of the effort to bring the idea to reality. Walls of distrust in data come down and the organization can truly excel with an important barrier to success removed.

An important goal of the information management function in an organization is to get all data under management by this definition and to keep it under management as systems come and go over time.

Master Data Management

Master Data Management (MDM) is one of these key leverageable platforms. It is an elegant place for data with widespread use in the organization. It becomes the system of record for the customer, product, store, material, reference, and all other non-transactional data.

MDM data can be accessed directly from the hub or, more commonly, mapped and distributed widely throughout the organization. This use of MDM data does not even account for the significant MDM benefit of efficiently creating and curating master data in the first place.

MDM benefits are many, including hierarchy management, data quality, data governance/workflow, data curation, and data distribution. One overlooked benefit is just having a database where trusted data can be accessed.

As with any data that users access, the visualization aspect is important. Because MDM data is highly associative, a graph representation works quite well.

Graph Technology

Graph traversals are a natural way of analyzing network patterns. Graphs can handle high degrees of separation with ease and facilitate visualization and exploration of networks and hierarchies.
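A breadth-first search is the standard way to measure degrees of separation in a graph. The sketch below applies it to a small, hypothetical master-data network (a contact, a customer, its parent company, a subsidiary, and a store); the entity names and the `degrees_of_separation` function are illustrative, and a graph database or an MDM product with embedded graph capabilities would run such traversals natively.

```python
from collections import deque
from typing import Dict, List, Optional


def degrees_of_separation(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Return the number of hops on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Undirected relationships between master data entities (hypothetical).
graph = {
    "Jane Smith": ["Acme Corp"],
    "Acme Corp": ["Jane Smith", "Acme Holdings"],
    "Acme Holdings": ["Acme Corp", "Acme EU"],
    "Acme EU": ["Acme Holdings", "Store 12"],
    "Store 12": ["Acme EU"],
}
print(degrees_of_separation(graph, "Jane Smith", "Store 12"))  # 4
```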

Graph databases themselves are no substitute for MDM as they provide only one of the many necessary functions that an MDM tool does.

However, when graph technology is embedded within MDM, such as what IBM is doing in InfoSphere MDM – similar to AI (link) and blockchain (link) – it is very powerful.

Graph technology is one of the many ways to facilitate self-service to MDM. Long a goal of business intelligence, self-service has significant applicability to MDM as well. Self-service is opportunity-oriented.

Users may want to validate a hypothesis, experiment, innovate, etc. Long development cycles or laborious processes between a user and the data can be frustrating.

Self Service BI

Historically, the burden for all MDM functions has fallen squarely on a centralized development function. It’s overloaded and, as with the self-service business intelligence movement, needs disintermediation.

IBM is fundamentally changing this dynamic with the next release of InfoSphere MDM. Its self-service data import, matching, and lightweight analytics allow business users to find, share, and get insight from both MDM and other data.

Then there’s Big Match.

Big Match can analyze structured and unstructured customer data together to gain deeper customer insights. It can enable fast, efficient linking of data from multiple sources to grow and curate customer information.
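Linking customer records across sources usually comes down to comparing names that are spelled or formatted differently. The sketch below uses Python’s standard-library `difflib.SequenceMatcher` as a simple similarity score; it is a generic stand-in, not how Big Match works internally, and the `link_records` function, the sample records, and the 0.85 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    # Lowercase, drop punctuation, and collapse whitespace.
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left: List[Dict[str, str]], right: List[Dict[str, str]],
                 threshold: float = 0.85) -> List[Tuple[str, str, float]]:
    """Pair each left record with its most similar right record above the threshold."""
    links = []
    for rec in left:
        best = max(right, key=lambda other: similarity(rec["name"], other["name"]))
        score = similarity(rec["name"], best["name"])
        if score >= threshold:
            links.append((rec["id"], best["id"], round(score, 2)))
    return links


crm = [{"id": "CRM-1", "name": "Jon A. Smith"}, {"id": "CRM-2", "name": "Maria Garcia"}]
billing = [{"id": "BIL-9", "name": "jon a smith"}, {"id": "BIL-7", "name": "M. Gracia"}]
print(link_records(crm, billing))  # [('CRM-1', 'BIL-9', 1.0)]
```

Comparing every pair does not scale to large volumes; matching engines typically group records into candidate blocks (for example by postcode or name prefix) before scoring.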

The majority of the information in your organization that is not under management is unstructured data. Unstructured data has always been a valuable asset to organizations, but it can be difficult to manage.

Emails, documents, medical records, contracts, design specifications, legal agreements, advertisements, delivery instructions, and other text-based sources of information do not fit neatly into tabular relational databases.

Most BI tools on MDM data offer the ability to drill down and roll up data in reports and dashboards, which is good. But what about the ability to “walk sideways” across data sources to discover how different parts of the business interrelate?

Unstructured Data

Using unstructured data for customer profiling allows organizations to unify diverse data from inside and outside the enterprise—even the “ugly” stuff; that is, dirty data that is incompatible with highly structured, fact-dimension data that would have been too costly to combine using traditional integration and ETL methods.

Finally, unstructured data management enables text analytics, so that organizations can gain insight into customer sentiment, competitive trends, current news trends, and other critical business information.

Text Analytics

In text analytics, everything is fair game for consideration, including customer complaints, product reviews from the web, call center transcripts, medical records, and comment/note fields in operational systems.

Combining unstructured data with artificial intelligence and natural language processing can extract new attributes and facts from text, such as the people and locations mentioned and the sentiment expressed, which can then be used to enrich the analytic experience.
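As a deliberately simple illustration of deriving a new attribute from text, the sketch below scores free-text notes against a tiny word list and adds the result to a record. Real text analytics relies on trained NLP models rather than a hand-written lexicon; the word lists, the `sentiment` and `enrich` functions, and the `notes` field name are all hypothetical.

```python
import re
from typing import Dict

# Hypothetical, tiny lexicons for illustration only.
POSITIVE = {"great", "fast", "helpful", "love", "easy"}
NEGATIVE = {"slow", "broken", "rude", "refund", "cancel"}


def sentiment(text: str) -> str:
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def enrich(record: Dict[str, str], text_field: str = "notes") -> Dict[str, str]:
    # Return a copy of the record with a derived "sentiment" attribute.
    return {**record, "sentiment": sentiment(record.get(text_field, ""))}


print(enrich({"customer_id": "C42", "notes": "Support was rude and the app is slow"}))
# {'customer_id': 'C42', 'notes': 'Support was rude and the app is slow', 'sentiment': 'negative'}
```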

All of these uses and capabilities are enhanced if they can be provided using a self-service interface that users can easily leverage to enrich data from within their apps and sources. This opens up a whole new world for discovery.

With graph technology, distribution of the publishing function and the integration of all data including unstructured data, MDM can truly have important data under management, empower the business user, be the cornerstone to digital transformation and truly be self-service.


Master Data Management Joins the Machine Learning Party

In a normal master data management (MDM) project, a current state business process flow is built, followed by a future state business process flow that incorporates master data management.

The current state is usually ugly as it has been built piecemeal over time and represents something so onerous that the company is finally willing to do something about it and inject master data management into the process.

Many obvious improvements to process come out of this exercise and the future state is usually quite streamlined, which is one of the benefits of MDM.

I contend that these future state processes are seldom as optimized as they could be.

Consider the following snippet, supposedly part of an optimized future state.

This leaves four people in the process to manually look at the product, do their (unspecified) thing, and (hopefully) pass it along, or possibly send it backwards to an upstream participant for no evident reason.

The challenge for MDM is to optimize the flow. I suggest that many of the “approval jails” in business process workflow are ripe for reengineering.

What criteria are used? They are probably based on data that will now be in MDM.

If training data for machine learning (ML) is available, not only can we recreate past decisions to automate future ones, but we can also look at the outcomes of those decisions, determine the decisions that should have been made, and make them automatically in the process, speeding up the flow and improving quality by an order of magnitude.

This concept of thinking ahead and automating decisions extends to other kinds of steps in a business flow that involve data entry, including survivorship determination.
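Survivorship decides which source’s value “survives” into the golden record when sources disagree. The sketch below implements one common rule-based approach (prefer the most trusted source that supplies a non-empty value, breaking ties by most recent update) as the kind of baseline an ML-assisted process could learn to refine; the source ranking, field names, and `golden_record` function are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical trust ranking of source systems (higher wins).
SOURCE_PRIORITY = {"ERP": 3, "CRM": 2, "WEB": 1}


def golden_record(candidates: List[Dict[str, Any]]) -> Dict[str, Any]:
    golden: Dict[str, Any] = {}
    fields = sorted({f for c in candidates for f in c["values"]})
    for field in fields:
        # Only sources that actually supply a value compete for this field.
        contenders = [c for c in candidates if c["values"].get(field) not in (None, "")]
        if not contenders:
            continue
        best = max(contenders,
                   key=lambda c: (SOURCE_PRIORITY.get(c["source"], 0), c["updated"]))
        golden[field] = best["values"][field]
    return golden


candidates = [
    {"source": "CRM", "updated": date(2018, 5, 1),
     "values": {"name": "Acme Corp", "phone": "555-0100", "email": ""}},
    {"source": "WEB", "updated": date(2018, 6, 1),
     "values": {"name": "ACME", "email": "info@acme.example"}},
    {"source": "ERP", "updated": date(2017, 1, 1),
     "values": {"name": "Acme Corporation", "phone": None}},
]
print(golden_record(candidates))
# {'email': 'info@acme.example', 'name': 'Acme Corporation', 'phone': '555-0100'}
```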

As with acceptance and rejection, data entry is also highly predictable, whether it is a selection from a drop-down or free-form entry. Again, with training data and backtesting, the probable entry at that step can be predicted and either entered automatically or provided as a default for approval.

The latter approach can be used while growing a comfort level.
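The approve/reject and default-for-approval ideas can be sketched with a very simple learner: count how past workflow decisions went for each combination of record features, decide automatically only when history is both plentiful and one-sided, and otherwise keep a steward in the loop with the likely answer pre-filled. A frequency table stands in here for a real ML model, and the feature tuples and the `min_support` and `auto_threshold` values are assumptions; in practice, thresholds would be set by backtesting against past decisions.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Optional, Tuple

Features = Tuple[Hashable, ...]


def train(history: Iterable[Tuple[Features, bool]]) -> Dict[Features, Counter]:
    """Count approvals/rejections per feature combination from past decisions."""
    counts: Dict[Features, Counter] = defaultdict(Counter)
    for features, approved in history:
        counts[features][approved] += 1
    return counts


def decide(model: Dict[Features, Counter], features: Features,
           auto_threshold: float = 0.95, min_support: int = 20) -> Tuple[str, Optional[float]]:
    seen = model.get(features)
    total = sum(seen.values()) if seen else 0
    if total < min_support:
        return ("route_to_steward", None)       # not enough history to trust
    p_approve = seen[True] / total
    if p_approve >= auto_threshold:
        return ("auto_approve", p_approve)
    if p_approve <= 1 - auto_threshold:
        return ("auto_reject", p_approve)
    return ("steward_with_default", p_approve)  # human decides; likely answer pre-filled


history = ([(("apparel", "has_gtin"), True)] * 48 + [(("apparel", "has_gtin"), False)] * 2
           + [(("apparel", "no_gtin"), True)] * 15 + [(("apparel", "no_gtin"), False)] * 15
           + [(("grocery", "has_gtin"), True)] * 5)
model = train(history)
for f in [("apparel", "has_gtin"), ("apparel", "no_gtin"), ("grocery", "has_gtin")]:
    print(f, decide(model, f))
```

Raising `auto_threshold` toward 1.0 keeps more items in the “default for approval” path, which matches the idea of growing a comfort level before automating fully.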

Manual, human-scale processes are ripe for the picking, and it’s really a dereliction of duty to “do” MDM without significantly streamlining processes, much of which is achieved by eliminating manual steps.

As data volumes mount, it is often the only way to keep process time from growing. At the least, prioritizing stewardship activities or routing them to specific stewards based on an ML interpretation of past results (quality, quantity) is required.

This approach is paramount to having timely, data-infused processes.

As a modular and scalable trusted analytics foundational element, the IBM Unified Governance & Integration platform incorporates advanced machine learning capabilities into MDM processes, simplifying the user experience and adding cognitive capabilities.

Machine learning can also discover master data by looking at actual usage patterns. ML can source, suggest or utilize external data that would aid in the goal of business processes.

Another important part of MDM is data quality (DQ). ML’s ability to recommend and/or apply DQ to data, in or out of MDM, is coming on strong.

Name-identity reconciliation is a specific example, but more generally, ML can look downstream of processes to see the chaos created by data lacking full DQ and start applying the rules to the data upstream.

IBM InfoSphere Master Data Management utilizes machine learning to speed the data discovery, mapping, quality and import processes.

In the last post (link), I postulated that blockchain would impact MDM tremendously. In this post, it’s machine learning affecting MDM. (Don’t get me started on graph technology).

Welcome to the new center of the data universe.

MDM is about to undergo a revolution.

Products will look much different in 5 years.

Make sure your vendor is committed to the MDM journey with machine learning.