Raising the Bar on Data Accuracy and Quality


As cross-device identity and the martech stack that services it continue to evolve, better approaches to data signals and graphing algorithms are emerging. Because more sophisticated approaches are becoming a reality, the bar is higher. Unfortunately, not everyone is aiming for a standard of excellence. A great deal of inaccuracy still goes unchecked, as many solution providers deliver data fraught with issues for which no one holds them accountable. The byproduct of such unchecked inaccuracy is a very blurry state of affairs for the marketer. The only way the solution-provider community will universally commit to quality is through accountability to a standard of accuracy and quality. By getting smart about what to expect and asking the right questions, an educated buy side can help drive that accountability and, in turn, clear the blur. As a starting point, it's worth looking at what a buyer can expect and which questions to ask when vetting solutions.

What the Buyer Should Expect

As a quick tip: when a potential solution provider talks about error rates, listen closely, because not all errors are created equal. A "false positive" (associating a device with the incorrect entity, whether consumer or household) is far worse than a "false negative" (creating a new entity for a device that should have been associated with an existing entity). Simply stated, if you assign my device to a 65-year-old woman living across the country, you significantly pollute any data or model tied to that group, whereas creating a new "consumer" for my device does far less damage.
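One way to make this asymmetry concrete is to score a provider's graph errors with unequal weights. The sketch below is illustrative only: the weights and the scoring function are assumptions, not any provider's actual methodology.

```python
# Hypothetical sketch: weighting graph errors asymmetrically.
# A false positive (device linked to the wrong entity) pollutes an
# existing profile, so it is penalized more heavily than a false
# negative (a needless new entity). The weights are illustrative.

FP_WEIGHT = 5.0  # wrong association corrupts existing data/models
FN_WEIGHT = 1.0  # an extra entity merely fragments the graph

def graph_error_cost(false_positives: int, false_negatives: int) -> float:
    """Combined error cost for a device graph; lower is better."""
    return FP_WEIGHT * false_positives + FN_WEIGHT * false_negatives

# Two providers with the same total error count are not equivalent:
provider_a = graph_error_cost(false_positives=10, false_negatives=90)  # 140.0
provider_b = graph_error_cost(false_positives=90, false_negatives=10)  # 460.0
```

Under this weighting, a provider skewed toward false negatives scores far better than one with the same raw error count skewed toward false positives, which is the point of the quick tip above.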

Trusty Trio: Occurrence, Concurrence and Persistence

The first rule of probabilistic data accuracy in our realm is that it comes down to three points: occurrence, concurrence and persistence. All of the critical questions you pose to your solution provider will relate to these points.

Data accuracy is such a hot topic right now that everyone wants to have an answer. As a result, we see a lot of overselling or overstating of accuracy, most commonly in the practice of promoting one's accuracy as a single number, say 95%. "We are 95% accurate!" Slow down. The first thing for marketers and agencies to understand is that accuracy cannot come down to one number fixed in time. This is where "occurrence" enters the picture. When you are presented with a single number, it's important to realize (or have someone honestly tell you) that the figure represents only a subset of your graph (the match rate: the rate at which the provider is able to map your IDs into its graph) and only a single moment in time. Algorithms are at play; time is a factor. Therefore, we must define and analyze timeframes.
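The match-rate caveat can be expressed in one line of arithmetic. This is a minimal sketch of the reasoning, not any provider's reporting formula: a headline accuracy figure applies only to the matched subset, so coverage of your full ID universe is bounded by the match rate.

```python
# Hypothetical sketch: a "95% accurate" headline usually describes only
# the matched subset of the graph at one point in time. The effective
# share of your full ID universe that is both matched and correctly
# associated is bounded by the match rate.

def effective_coverage(match_rate: float, accuracy_on_matched: float) -> float:
    """Fraction of your IDs that are both matched and correctly associated."""
    return match_rate * accuracy_on_matched

# "95% accurate" at a 60% match rate covers well under 95% of your IDs:
print(effective_coverage(0.60, 0.95))  # roughly 0.57
```

The missing piece, as the article notes, is time: both numbers drift between measurement windows, so any quoted figure should come with a timeframe attached.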

To get a read on accuracy you can count on, frequent occurrence is a critical requirement for any cross-screen platform that claims to analyze inputs, or group data, across screens. Without frequent occurrence, numbers that attempt to capture associations are unreliable because they cannot possibly be accurate. "Rare" signals are problematic. This is a key principle on which to engage your solution provider.
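A simple way to operationalize "rare signals are problematic" is a frequency threshold: discard any device pairing not seen often enough within the analysis timeframe. The threshold and data below are illustrative assumptions.

```python
# Hypothetical sketch: require frequent occurrence before trusting a
# signal. The minimum-sighting threshold is an assumed example value.

from collections import Counter

MIN_OCCURRENCES = 5  # assumed minimum sightings within the timeframe

def reliable_signals(observations):
    """Keep only device pairings seen often enough to be trusted."""
    counts = Counter(observations)
    return {pair for pair, n in counts.items() if n >= MIN_OCCURRENCES}

obs = [("dev_a", "dev_b")] * 8 + [("dev_a", "dev_c")] * 1  # one rare pairing
print(reliable_signals(obs))  # only the frequent ("dev_a", "dev_b") pairing survives
```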

Moving on to the next point to query, we have "concurrence." This is defined at the platform level and is something to assess when gauging your provider's potential accuracy. Simply put, associated devices must show up together within a small window. Concurrent sighting of devices within a narrowly defined scope is the only valid way to associate devices with confidence. In a badly run model, devices may be inappropriately added to a group even though they rarely appear alongside that group's other devices. As an example, I was at a party at my neighbor's two months ago and hopped on their WiFi for a few minutes; a one-off sighting like that should result in a low confidence score, not a lasting association.
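The neighbor's-WiFi case can be captured by counting only sightings that fall within a narrow time window of each other. The 30-minute window and the timestamps below are assumptions for illustration, not a real platform's parameters.

```python
# Hypothetical sketch: count two devices as "concurrent" only when
# they are sighted within a narrow window of each other. The window
# size and sample timestamps are illustrative assumptions.

from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)  # assumed concurrence window

def concurrent_sightings(times_a, times_b):
    """Count sighting pairs that fall within WINDOW of each other."""
    return sum(1 for ta in times_a for tb in times_b
               if abs(ta - tb) <= WINDOW)

# A household laptop and phone seen nightly, ten minutes apart:
laptop = [datetime(2017, 6, d, 20, 0) for d in range(1, 8)]
phone = [datetime(2017, 6, d, 20, 10) for d in range(1, 8)]
# A party guest who joined the WiFi once, months earlier:
guest = [datetime(2017, 4, 1, 22, 0)]

print(concurrent_sightings(laptop, phone))  # 7 concurrent sightings
print(concurrent_sightings(laptop, guest))  # 0: the one-off guest doesn't qualify
```

Repeated concurrence (the laptop and phone) builds confidence; the single stale sighting contributes nothing, which is the behavior the article argues for.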

And finally, we have "persistence," the ability to steadily glean and never lose track of a given identifier, representing a device, across the data ecosystem. All measures of accuracy must take relative ID persistency into consideration in order to be valid. As we know in the marketing ecosystem, not all IDs are created equal. Cookies behave differently than mobile identifiers, and identifiers behave differently across device types and browsers. Reliable and persistent identifiers are critical to creating a confident association between devices over a long period of time.
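One way to fold "not all IDs are created equal" into a graph is to discount each association by the persistence of its weakest identifier. The ID types and weights below are illustrative assumptions, not measured values.

```python
# Hypothetical sketch: discount association confidence by how
# persistent each identifier type tends to be. The ID types and
# weights here are assumed for illustration only.

PERSISTENCE = {
    "mobile_ad_id": 0.9,   # long-lived device identifier
    "cookie_chrome": 0.6,  # periodically cleared by users
    "cookie_safari": 0.2,  # short-lived under aggressive limits
}

def association_confidence(id_type_a: str, id_type_b: str,
                           raw_score: float) -> float:
    """Discount a raw match score by the weaker ID's persistence."""
    weight = min(PERSISTENCE[id_type_a], PERSISTENCE[id_type_b])
    return raw_score * weight

# The same raw score means much less when one side is a fragile cookie:
print(association_confidence("mobile_ad_id", "cookie_safari", 0.95))
print(association_confidence("mobile_ad_id", "cookie_chrome", 0.95))
```

Taking the minimum of the two weights reflects that an association is only as durable as its least persistent identifier.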

PS: The Truth about Your Truth Set

With all of the above pertaining to imputed or probabilistic data, there's one very important note to make about deterministic data, the data set that is often brought to the table to integrate into the model. Otherwise known as the "truth set," it includes any number of self-reported data points (log-ins, demographics and the like). It's important to apply scrutiny to this piece as well, striving for ever-greater accuracy, because we often measure against our deterministic truth set. It's the core that has to hold.

Day to day, there's clearly a need for cross-checking as we strive for a collective long-term commitment to accuracy in our industry. At a base level, with new, more powerful data sets and smarter varieties of device and location graphing emerging, it's time for players that rely on a handful of attributes like IP addresses and cookies to own up to the accuracy and quality problems these outdated methods can create. But, even more importantly, marketers and their agencies need to know how to engage with data solution providers, ask the right questions and get them answered. Only then can we reach the desired prevailing state: one of accuracy and quality.


Manish Ahuja is the Chief Product Officer at Qualia.
