Accurate master data is a critical foundation for running any targeted analytics around your customers, products, and other key entities.
Importance of MDM / Master Data in Today’s Big Data Analytics

What role does MDM play for organizations?

Most MDM initiatives serve an organizational business need: implementing a new business strategy or transforming an existing one. These needs arise from various scenarios, including:

a) preparing for your first product launch (e.g., selecting and optimizing the right sales and distribution channels, identifying the right customer segments based on behavioral, psychographic, demographic, geographic, and firmographic attributes, enabling better product access),
b) expanding your portfolio into new markets (e.g., introducing new customer channels and figuring out new relationships, processes, metrics, and KPIs),
c) transforming your customer-centric strategy (e.g., introducing new digital channels to create a personalized patient experience in life sciences through precision medicine and targeted therapies),
d) transforming quickly to accommodate new policies and regulations (e.g., industry-specific compliance such as HIPAA/PHI/PDRP in health care, regional laws such as GDPR),
e) integrating companies during mergers and acquisitions (e.g., reconciling cultures, processes, data, systems, and operations),

and many more variations of these scenarios.

Each of these initiatives aims to improve business operations or to support critical business decisions and actions that grow the business using strong business insights. Data management in these organizations is the discipline of getting the maximum value out of data by delivering relevant, timely, and accurate insights that are useful for business decision making. MDM is a critical component of organizational data management because it provides the ability to classify and segment your key business entities such as customers, products, and partners. Transactions or interactions alone cannot provide meaningful insights without drilling down into the details of the entity classification.
Some of the key business entities include Customers, Products, Suppliers, Vendors, Influencers, and others.

Why is MDM important in Big Data Analytics?

Most organizations are quickly building data lakes by pooling structured, semi-structured, and unstructured data from various internal and external data channels. Cloud platforms provide Hadoop-based, cloud-native solutions that make the data lake journey easier. Some of these solutions offer data cataloging and metadata management to facilitate data governance functions such as search, review, certification, collaboration, and data requests. Organizations follow data engineering practices such as data cleanup, standardization, and data imputation to prepare the data for various data analytics and data science use cases.

Is this data lake data ready for every type of analytics or prediction about your future customers? As you move towards customer personalization, it becomes more important to identify each customer uniquely across your channels before you run personalized and targeted analytics. Customer lifetime value (CLV) is one of the key metrics used in personalized and targeted analytics. Customer segmentation is a critical part of the marketing strategy. Knowledge graph models are the foundation for influence analytics around customer-to-customer interactions and networks. Many of these critical processes require you to identify your customers uniquely before you can get relevant and accurate insights about them and their influencers.

Some organizations integrate this process into the publishing layer of the structured data warehouse, which should be fine as long as there is governance around how analytics should pull the accurate data. Others defer this process to the analytics and data science pipelines as an ad hoc matching and de-duplication step.
This is neither efficient nor effective because i) multiple teams may need to duplicate the approach, ii) the approach may not be systematic or thorough, and iii) you cannot map de-duplicated data back to your relationships or graphs for further analytics around influence. Hence it is more efficient and justifiable to integrate MDM as a separate upstream capability, alongside the other data engineering processes around the data lake or data warehouse, depending on your architecture. This is one of the critical investments towards achieving data quality, especially when you treat data as an organizational asset.
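
To make the idea of a single upstream matching step concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the record layout, the sample data, the `M-0001` style identifiers, and the matching rule (exact match on a cleaned email address) are all hypothetical, and real MDM matching is usually fuzzier and covers more attributes.

```python
from collections import defaultdict

# Hypothetical source records: (source_system, source_id, name, email)
RECORDS = [
    ("crm", "C-1", "Jane Doe", "Jane.Doe@example.com"),
    ("web", "W-9", "jane doe", "jane.doe@example.com "),
    ("erp", "E-4", "John Roe", "john.roe@example.com"),
]


def match_key(email):
    """Assumed matching rule: exact match on the trimmed, lower-cased email."""
    return email.strip().lower()


def build_cross_reference(records):
    """Group source records by match key and assign one master ID per group."""
    groups = defaultdict(list)
    for source, source_id, _name, email in records:
        groups[match_key(email)].append((source, source_id))

    xref = {}
    # Sort the keys so the assigned master IDs are deterministic.
    for n, key in enumerate(sorted(groups), start=1):
        master_id = f"M-{n:04d}"
        for source_ref in groups[key]:
            xref[source_ref] = master_id
    return xref


# (source_system, source_id) -> master ID, built once and shared by all teams
XREF = build_cross_reference(RECORDS)
```

Because the mapping is produced once, upstream, every analytics team can reuse the same identifiers instead of repeating its own de-duplication.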

Given that MDM is an organizational data asset and a critical component of analytics success, you will need some form of governance suited to the size and structure of your analytics organization. This governance will help you connect your key functional groups and their analytical data requirements, prioritize their needs, build consistent definitions of business entities, processes, and metrics, manage your data catalogs, and consolidate your data inputs to a rational size, among other things. In start-ups it can begin as a voluntary effort by one team member and later expand into a central dedicated group that handles governance across your data management needs, including master data, transactions, and interactions.

The scope of MDM can vary depending on your initial needs: the key entities that need to be mastered, the amount of data you have, the level of cross references or IDs common between sources, the reference data you need to describe various types of customer segments, the analytical attributes you need to consolidate at the customer level, the relationships you need to manage between customer groups or other entities based on real-world interactions, predictions, etc., and your integration requirements for analytical applications and other customer-centric applications such as CRM. You could build a custom process if your requirements are simple, with a small set of data and cross references between sources, or you could adopt one of the several MDM solutions on the market that integrate within your cloud or on-premise architecture. You will also need some stewardship effort to address quality issues in customer data and to resolve potential duplicates that the automatic matching process could not judge.

The result of the MDM process, for example customer master data, includes i) consolidated, de-duplicated profiles including core demographics, behavioral, psychographic, geographic, and firmographic segments, analytical attributes, and other reference data, ii) reference data relating a customer to other customer types, such as influence and reporting hierarchy, and iii) cross reference or lineage data connecting each source system record to the common MDM identifier for a particular customer. This cross reference data plays a vital role in integrating all of the big data in your data lake or data warehouse to derive accurate and relevant insights around customers and other entities.
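
As a rough illustration of how cross reference data ties data lake records back to a mastered customer, the Python sketch below rolls transaction amounts up to the MDM identifier. Everything in it is an assumption made for the example: the source system names, the identifiers, the amounts, and the `spend_by_master` helper are hypothetical and do not come from any particular MDM product.

```python
from collections import defaultdict

# Hypothetical cross reference: (source_system, source_id) -> MDM identifier
CROSS_REFERENCE = {
    ("crm", "C-1"): "M-0001",
    ("web", "W-9"): "M-0001",
    ("erp", "E-4"): "M-0002",
}

# Hypothetical transactions from the data lake, keyed by source identifiers
TRANSACTIONS = [
    ("crm", "C-1", 120.0),
    ("web", "W-9", 30.0),
    ("erp", "E-4", 75.0),
    ("web", "W-404", 10.0),  # source record with no cross reference yet
]


def spend_by_master(transactions, cross_reference):
    """Sum amounts per MDM identifier and collect records that cannot be mapped."""
    totals = defaultdict(float)
    unmatched = []
    for source, source_id, amount in transactions:
        master_id = cross_reference.get((source, source_id))
        if master_id is None:
            # Left out of the totals; a candidate for data stewardship review.
            unmatched.append((source, source_id))
        else:
            totals[master_id] += amount
    return dict(totals), unmatched


TOTALS, UNMATCHED = spend_by_master(TRANSACTIONS, CROSS_REFERENCE)
```

Here the CRM and web records resolve to the same customer, so their spend is combined, while the unmapped record is set aside for review instead of being counted as a separate customer.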