Building Reliable Customer Data Tools

In the last few years, enterprises have added several tools to their IT data integration toolbox to improve the return on their CRM investments. But most of these companies have not achieved a simple goal: creating reliable, unified views of their customers — aggregated across data silos — and delivering these to all customer-facing applications in a timely fashion.

Recently, companies have turned to three common technologies to create solutions for customer data integration: data movement tools such as extract-transform-load (ETL), data query and aggregation tools such as enterprise information integration (EII), and data quality (DQ) tools.

What the tool vendors aren’t telling you, however, is that these tools are woefully inadequate for developing a reliable customer data integration (CDI) platform.

Customer Hubs Emerging

Industry market research firm Gartner defines CDI as “the combination of the technology, processes and services needed to create and maintain an accurate, timely and complete view of the customer across multiple channels, business lines and enterprises, where there are multiple sources of customer data in multiple application systems and databases.”

There are several implementation styles of CDI solutions, but the most effective are those in which an enterprise commits to building and managing a customer hub that serves as a central repository of customer data reconciled from multiple data sources.

This hub may contain some or all of the critical customer data needed to provide multiple customer views to downstream applications. While there are significant differences among the various customer hubs available, there is little doubt that a large, enterprise-class CDI solution needs a central customer hub.

Inefficient Tools

Many companies that tried over the past decade to build an in-house customer data integration hub using ETL, EII and DQ tools are now struggling with the aftermath of a custom solution. There are several reasons why CDI solutions built with these tools fail.

First, all three technologies originated for narrow purposes ill-suited for CDI: ETL to move large volumes of data in batch-mode; EII to run distributed queries across disparate sources in real-time; and DQ tools to “scrub” incorrect names and addresses in a single source at a time.

Each of these technologies effectively supports only a single data modality, either batch or real-time. Since customer data is inextricably tied to both operational and strategic business processes, such as the order-to-cash process or profitability segmentation analysis, it needs to be delivered in time for each business process.

Therefore, any customer data integration solution needs to support a range of data movement modalities: from a large-volume batch process that loads a new source into a customer hub, to scheduled intra-day batches, to a publish-subscribe model for immediate updates of critical data. Tools designed for a single modality can quickly hamper the reliability and scalability of a CDI solution.
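As a rough illustration of what supporting more than one modality means, the sketch below shows a toy hub that accepts both a bulk batch load and an immediate publish-subscribe update. The CustomerHub class, its method names and the sample records are all hypothetical and are not taken from any particular product.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


class CustomerHub:
    """Toy hub (hypothetical) that accepts data through more than one modality."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}
        self.subscribers: List[Callable[[Record], None]] = []

    def subscribe(self, callback: Callable[[Record], None]) -> None:
        # Downstream applications register to receive immediate updates.
        self.subscribers.append(callback)

    def load_batch(self, rows: Iterable[Record]) -> int:
        # Batch modality: load many rows at once, without notifying subscribers.
        count = 0
        for row in rows:
            self.records[row["customer_id"]] = dict(row)
            count += 1
        return count

    def publish_update(self, row: Record) -> None:
        # Publish-subscribe modality: apply one change and push it out right away.
        merged = {**self.records.get(row["customer_id"], {}), **row}
        self.records[row["customer_id"]] = merged
        for callback in self.subscribers:
            callback(merged)


hub = CustomerHub()
received: List[Record] = []
hub.subscribe(received.append)

loaded = hub.load_batch([
    {"customer_id": "C1", "name": "Ada Lovelace", "city": "London"},
    {"customer_id": "C2", "name": "Alan Turing", "city": "Wilmslow"},
])
hub.publish_update({"customer_id": "C1", "city": "Paris"})
```

In this sketch a scheduled intra-day batch would simply be load_batch invoked on a timer. The point is only that one store serves every modality, instead of each tool owning its own copy of the data.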

Different Data Types

To build a reliable CDI solution, it is imperative to treat different types of data separately: master reference data, relationship data and transaction data.

Master reference data is the foundational entity data (such as name and address) that is critical for uniquely identifying a customer across multiple systems and channels. Without a persistent and trustworthy hub of customer reference or profile data that serves as the “system of record,” other types of data cannot be aggregated reliably.

Ideally, such a master store should create and maintain the record for each customer culled from all relevant internal and external data sources along with the associated cross-reference keys. This store then becomes the best source for customer profile information for all downstream operational and analytical applications.
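A minimal sketch of such a master store, assuming a very simple in-memory design, might look like the following. The MasterStore and MasterRecord names, the source system names and the keys are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

SourceKey = Tuple[str, str]  # (source system name, key used in that system)


@dataclass
class MasterRecord:
    master_id: str
    name: str
    address: str


@dataclass
class MasterStore:
    """Hypothetical store: one master record per customer plus cross-reference keys."""

    records: Dict[str, MasterRecord] = field(default_factory=dict)
    xref: Dict[SourceKey, str] = field(default_factory=dict)

    def register(self, record: MasterRecord, *source_keys: SourceKey) -> None:
        # Keep the master record and remember every source key that points to it.
        self.records[record.master_id] = record
        for key in source_keys:
            self.xref[key] = record.master_id

    def lookup(self, system: str, key: str) -> Optional[MasterRecord]:
        # Resolve a source-system key to the single master record, if known.
        master_id = self.xref.get((system, key))
        return self.records.get(master_id) if master_id is not None else None


store = MasterStore()
store.register(
    MasterRecord("M-001", "Jane Doe", "12 Elm St, Springfield"),
    ("billing", "B-7781"),
    ("support", "S-0042"),
)

from_billing = store.lookup("billing", "B-7781")
from_support = store.lookup("support", "S-0042")
```

Both lookups return the same master record, which is what lets downstream applications treat the store as the best source of profile information regardless of which system they start from.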

The next type of data is relationship or hierarchy data. This type of data defines the relationships among various entities (such as individual to organization, organization to organization, or individuals within households). Relationship data can be managed reliably across different sources only after the underlying conflicts of master (entity) data have been resolved. Most of the custom solutions deployed have fixed relationships among entities embedded in the system’s data model, which makes it hard for IT to manage changes in customer relationships and affiliations.
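One way to avoid embedding fixed relationships in the data model is to hold each relationship as a row of data. The sketch below assumes that approach; the relationship type names and entity IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Each relationship is a row of data: (from_entity, relationship_type, to_entity).
Relationship = Tuple[str, str, str]

relationships: List[Relationship] = [
    ("M-001", "member_of_household", "H-10"),
    ("M-002", "member_of_household", "H-10"),
    ("M-001", "employee_of", "ORG-5"),
    ("ORG-5", "subsidiary_of", "ORG-1"),
]


def members(rels: List[Relationship], rel_type: str, target: str) -> Set[str]:
    # All entities that point at the target through the given relationship type.
    return {src for src, kind, dst in rels if kind == rel_type and dst == target}


def ancestors(rels: List[Relationship], rel_type: str, start: str) -> List[str]:
    # Walk up a hierarchy; this sketch assumes at most one parent per entity.
    parents: Dict[str, str] = {src: dst for src, kind, dst in rels if kind == rel_type}
    chain: List[str] = []
    seen = {start}
    current = start
    while current in parents and parents[current] not in seen:
        current = parents[current]
        seen.add(current)
        chain.append(current)
    return chain


household = members(relationships, "member_of_household", "H-10")
chain_before = ancestors(relationships, "subsidiary_of", "ORG-5")

# An acquisition changes the hierarchy by adding a row, not by altering a schema.
relationships.append(("ORG-1", "subsidiary_of", "ORG-0"))
chain_after = ancestors(relationships, "subsidiary_of", "ORG-5")
```

Because the hierarchy is ordinary data, a change in affiliation becomes a data update that does not require altering the data model.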

The third type of data is transaction or activity data (such as amount withdrawn from an account). Although there are significant challenges in managing large volumes of transaction data, there is usually little conflict in reconciling such data since there is an unambiguous system of record for each type of transaction.

The key issue lies in attaching these transactions correctly to the same customer across multiple CRM touchpoints and then aggregating them accurately for other applications to consume (such as the average account balance). Note that transactions can be aggregated for the right customer or household only after the ambiguities of the associated master and relationship data have been removed.
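The dependency described here can be sketched as follows, assuming cross-reference keys and household membership have already been resolved. All identifiers and amounts are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Transaction = Tuple[str, str, float]  # (source system, source key, amount)

# Assumed to exist already: resolved master data and relationship data.
xref: Dict[Tuple[str, str], str] = {
    ("branch", "ACC-1"): "M-001",
    ("web", "U-77"): "M-001",
    ("branch", "ACC-2"): "M-002",
}
household_of: Dict[str, str] = {"M-001": "H-10", "M-002": "H-10"}

transactions: List[Transaction] = [
    ("branch", "ACC-1", -200.0),
    ("web", "U-77", 50.0),
    ("branch", "ACC-2", 125.0),
    ("web", "U-99", 10.0),  # no cross-reference key yet
]


def aggregate(txns: List[Transaction]):
    by_customer: Dict[str, float] = defaultdict(float)
    by_household: Dict[str, float] = defaultdict(float)
    unmatched: List[Transaction] = []
    for system, key, amount in txns:
        master_id = xref.get((system, key))
        if master_id is None:
            # Without resolved master data the transaction cannot be attached.
            unmatched.append((system, key, amount))
            continue
        by_customer[master_id] += amount
        household = household_of.get(master_id)
        if household is not None:
            by_household[household] += amount
    return dict(by_customer), dict(by_household), unmatched


by_customer, by_household, unmatched = aggregate(transactions)
```

The unmatched list makes the ordering explicit: a transaction whose source key has no master record cannot be rolled up to any customer or household.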

Essentially, without treating different data types separately and establishing a reliable foundation of master data at the start, a trustworthy CDI platform cannot be built.

But none of these data tools maintains a separation of data types. For instance, ETL tools neither recognize master data nor treat it apart from other types of data. EII tools assume that all federated data results are clean and unambiguous. In fact, they rely on an external source to provide correct cross-reference keys and global IDs to join the results of a federated query. DQ tools provide ad hoc cleansing of a source but neither recognize data types nor offer ongoing management of data changes.

The Challenge of Data Models

One of the key reasons custom solutions are inextensible is that they instantiate a fixed data model in a physical database repository or data warehouse. The “packaged” CDI solutions offered by application vendors such as Siebel, Oracle and SAP share this fate.

In a large enterprise, rarely does a single vendor have access to all sources of customer data, external and internal. Therefore, standardizing on the application vendor data model means more, not less, work since every data source outside the vendor application has to be transformed to feed into the vendor’s customer data hub.

The best approach is to create a template-driven, logical data model specifically for each enterprise, reflecting all of the customer data sources that it needs to integrate. Ultimately, the solution provider has to deliver a data model and a solution framework cognizant of the needs of each major industry vertical. None of these data tools attempts to address the challenge of data models for the diverse set of data sources encountered in various verticals.

Needed: Meta-Data Driven Framework

The most fundamental shortcoming of the trio of data tools (ETL, EII, DQ) is that they do not offer a meta-data framework for managing the complete set of data management tasks required of a customer data integration solution. Each of these tools, along with the numerous enterprise application integration (EAI) technologies, solves only a narrow integration issue within the IT “stack”: integrating application to application, moving data to a single warehouse, cleansing a single source, and so on.

A comprehensive CDI framework must include the tools needed for all processes associated with managing different data types. For example, the framework should address the complete lifecycle of master reference data: model, cleanse, match, merge, share, extend and manage. The solution should allow customer and organization hierarchies across data sources to be leveraged instead of being tied to a fixed hierarchical view of an implementation. The solution should readily access all relevant customer activity data and accurately unify it with other data types for a complete view (through caching or aggregation).
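To make the cleanse, match and merge steps of that lifecycle concrete, here is a deliberately simplified sketch. It assumes exact matching on a normalized name and postcode and a "latest non-empty value wins" merge rule. Both are illustrative choices, not a description of how any real CDI product matches records.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def cleanse(record: Record) -> Record:
    # Collapse stray whitespace and standardize letter case.
    cleaned = dict(record)
    cleaned["name"] = " ".join(record["name"].split()).title()
    cleaned["postcode"] = record["postcode"].replace(" ", "").upper()
    return cleaned


def match_key(record: Record) -> Tuple[str, str]:
    # Assumed rule: records with the same name and postcode are the same customer.
    return (record["name"].lower(), record["postcode"])


def merge(records: List[Record]) -> Record:
    # Assumed survivorship rule: the most recently updated non-empty value wins.
    merged: Record = {}
    for record in sorted(records, key=lambda r: r["updated"]):
        for field_name, value in record.items():
            if value:
                merged[field_name] = value
    return merged


def build_masters(raw: List[Record]) -> List[Record]:
    groups: Dict[Tuple[str, str], List[Record]] = defaultdict(list)
    for record in raw:
        cleaned = cleanse(record)
        groups[match_key(cleaned)].append(cleaned)
    return [merge(group) for group in groups.values()]


raw_records = [
    {"name": "jane  DOE", "postcode": "ab1 2cd", "phone": "", "updated": "2005-01-10"},
    {"name": "Jane Doe", "postcode": "AB12CD", "phone": "555-0100", "updated": "2004-06-01"},
    {"name": "John Roe", "postcode": "ZZ9 9ZZ", "phone": "555-0199", "updated": "2005-02-01"},
]
masters = build_masters(raw_records)
```

The two Jane Doe rows collapse into one master record that keeps the phone number from the older row, because the newer row left that field empty. Production matching is typically fuzzier than this, but the order of the steps is the same.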

For the solution to manage data changes without software programming efforts, it must be driven by meta-data that captures the data syntax, semantics and business rules that are relevant to integrating customer data into unified views. It is important to maintain the distinction between managing meta-data through a generalized meta-data tool versus having a meta-data driven framework designed for a specific purpose (such as CDI).

A meta-data driven framework captures, stores and uses highly contextual meta-data tied to a business purpose (such as when and by whom a customer address was changed). By separating meta-data from its business context, a generalized meta-data tool often limits its business value.

The key advantage of using a meta-data driven CDI framework is that it renders the solution entirely configurable, so that business and IT changes can be implemented rapidly without writing code. Since the CDI framework is manageable by business analysts and data stewards as well as by IT, such a solution becomes the successful foundation for all unified customer views in an enterprise. Additional data sources are easy to add, without additional programming, as businesses evolve through mergers and acquisitions.
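The idea of adding a source through configuration alone can be sketched as follows. The mapping format, the transform names and the source column names are assumptions made for this example only.

```python
from typing import Callable, Dict, Tuple

# Reusable transforms that the meta-data refers to by name.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "strip": lambda value: value.strip(),
    "upper": lambda value: value.strip().upper(),
    "title": lambda value: value.strip().title(),
}

# Meta-data: for each source, hub field -> (source column, transform name).
SOURCE_MAPPINGS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "billing": {"name": ("CUST_NM", "title"), "postcode": ("ZIP", "upper")},
}


def to_hub_record(source: str, row: Dict[str, str]) -> Dict[str, str]:
    # Generic code path: its behavior is driven entirely by the mapping meta-data.
    mapping = SOURCE_MAPPINGS[source]
    return {
        hub_field: TRANSFORMS[transform](row[column])
        for hub_field, (column, transform) in mapping.items()
    }


# Onboarding a source gained through an acquisition is a meta-data change only.
SOURCE_MAPPINGS["acquired_crm"] = {
    "name": ("full_name", "title"),
    "postcode": ("post_code", "upper"),
}

from_billing = to_hub_record("billing", {"CUST_NM": " jane doe ", "ZIP": "ab12cd"})
from_acquired = to_hub_record("acquired_crm", {"full_name": "JOHN ROE", "post_code": " zz99zz"})
```

In a real framework the mappings would live in a managed repository that data stewards can edit, not in a Python dictionary. The principle is the same: the function that loads records never changes when a source is added.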

Because the custom CDI solutions built with ETL-EII-DQ tools are not meta-data driven, they are not manageable by data stewards, are hard to configure and are generally not extensible beyond a handful of sources.

Service-Oriented Architecture

Finally, if a customer data hub is to be the central repository of critical customer information for other systems, it needs to have critical capabilities to synchronize reliable data back to source systems.

In addition, such a CDI solution needs to support standards-based, service-oriented architecture (SOA) so that its underlying data services may be used by future service-oriented applications. Typically none of the hubs built by data tools offer these critical capabilities, ensuring their quick obsolescence.

Plumbers and Architects

Although they are necessary components of the data integration architecture, ETL, EII and DQ tools are neither designed for nor capable of building a trustworthy foundation for customer data integration. For the same reason you wouldn’t hire a plumber to build your house, organizations should not rely primarily on these technologies when developing a reliable customer data foundation.

Like plumbing in a house, the tools that push data through the pipes are not representative of the overarching blueprint needed for customer data integration architecture. The cornerstone of the architecture is the recognition that different types of data need to be treated separately.

Additionally, data reliability can only be maintained through a set of best practices that first put in place the bedrock of reliable customer master reference data. A solution that has a flexible data model supported by a meta-data driven, configurable framework is the best way to construct such a foundation. Once built, it should be easily manageable by data stewards and extensible to emerging service-oriented architecture standards and therefore to new business conditions.

Before hiring a customer data integration “plumber” to build your customer foundation, take the time to evaluate a data architecture expert who can build a solid foundation from which to achieve your customer data integration goals.


Anurag Wadehra is the Vice President of Marketing at Siperian, a leading customer data integration solution provider.
