In a previous post, I described how you can ensure that marketers minimize bias when using AI. When bias creeps in, it has a significant impact on efficiency and ROAS. That’s why it’s critical for marketers to develop concrete steps to ensure minimal bias in the algorithms we use, whether it’s your own AI or third-party AI solutions.
In this post, we’re going to take the next step and document the specific questions to ask an AI vendor to make sure they minimize bias. These questions can be part of a RFI (request for information) or RFP (request for proposal) and can serve as a structured approach to periodic assessments of AI suppliers.
Marketers’ relationships with AI suppliers can take many forms, varying in which building blocks of AI are internal or external. At one end of the spectrum, marketers often use AI that is completely out-of-the-box from a supplier. For example, marketers can campaign against an audience pre-built in their demand-side platform (DSP), and that audience can be the result of a look-alike model based on a seed set of vendor audience data.
At the other end of the spectrum, marketers can choose to use their own training dataset, do their own training and testing, and simply use a third-party technical platform to manage the process, or “BYOA” (“Bring Your Own Algorithm”, a growing trend) to a DSP. There are many flavors in between, such as providing first-party data from marketers to a vendor to build a custom model.
The list of questions below is for the scenario where a marketer uses a fully baked, off-the-shelf AI-powered product. That’s largely because these scenarios are most likely to be presented as a black box to a marketer and thus carry the most uncertainty and possibly the greatest risk of undiagnosed bias. Black boxes are also hard to tell apart from one another, which makes supplier comparisons very difficult.
But as you’ll see, all of these questions are relevant to any AI-based product, no matter where it’s built. So if parts of the AI building process are internal, the same questions are important to ask internally as part of that process.
Here are five questions to ask vendors to ensure they minimize AI bias:
1. How do you know if your training data is correct?
When it comes to AI, it’s garbage in, garbage out. Having excellent training data does not necessarily mean excellent AI. However, having bad training data guarantees bad AI.
There are several reasons why certain data can be bad for training, but the most obvious is if it is inaccurate. Most marketers don’t realize how much inaccuracy there is in the datasets they rely on. In fact, the Advertising Research Foundation (ARF) just published a rare survey of demographic accuracy across the industry, and the findings are eye-opening. Industry-wide data on “presence of children at home” is inaccurate 60% of the time, “marital status” is incorrect 76% of the time and “small business ownership” is incorrect 83% of the time! To be clear, these are not results from models predicting these consumer identifiers; rather, they are inaccuracies in the datasets supposedly used to train models!
Inaccurate training data confuses the algorithm development process. Suppose an algorithm optimizes dynamic creative elements for a travel campaign based on geographic location. If the training data is built on inaccurate location data (a common occurrence with location data), it may indicate that a consumer in the US Southwest responded to an ad about a road trip to a beach in Florida, or that a consumer in Seattle responded to an ad for a fishing trip in the Ozark Mountains. That will result in a very confused model of reality, and thus a sub-optimal algorithm.
Never assume that your information is correct. Consider the source, compare it with other sources, check for consistency and verify against sets of truth where possible.
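The “verify against sets of truth” step above can be sketched in code. This is a minimal, hypothetical audit: the column names, IDs, and values are illustrative, and the truth panel is assumed to be a small independently verified sample.

```python
import pandas as pd

def field_accuracy(data: pd.DataFrame, truth: pd.DataFrame,
                   key: str, field: str) -> float:
    """Share of records whose `field` value matches a verified truth set."""
    merged = data.merge(truth, on=key, suffixes=("", "_truth"))
    if merged.empty:
        raise ValueError("no overlapping records to audit")
    return (merged[field] == merged[f"{field}_truth"]).mean()

# Hypothetical example: vendor demographic records vs. a verified panel
vendor = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "has_children": [True, False, True, True],
})
panel = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "has_children": [True, True, False, True],
})
print(field_accuracy(vendor, panel, "user_id", "has_children"))  # 0.5
```

Even a small truth panel like this can surface the kinds of inaccuracy rates the ARF survey found, before that data ever reaches a training pipeline.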
2. How do you know that your training data is thorough and diverse?
Good training data also needs to be thorough, meaning you need lots of examples of every conceivable scenario and outcome you’re trying to achieve. The more thorough, the more confident you can be of patterns you find.
This is particularly relevant for AI models built to optimize rare outcomes. Freemium mobile game download campaigns are a good example of this. Games like this often rely on a small percentage of “whales”, users who make a lot of in-game purchases, while other users make little or no purchases. In order to train an algorithm to find whales, it is very important to ensure that a dataset contains many examples of whale consumer journeys so that the model can learn the pattern of who eventually becomes a whale. Left alone, a training dataset will inevitably be dominated by non-whales, simply because they are so much more common.
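The imbalance described above is easy to quantify before training. The sketch below (with made-up label counts) reports the class balance and computes simple inverse-frequency example weights, one common way to keep a rare class like whales from being drowned out; other techniques such as oversampling work similarly.

```python
from collections import Counter

def class_balance(labels):
    """Report the share of each outcome class in a training set."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

def inverse_frequency_weights(labels):
    """Per-example weights that upweight rare classes (e.g. whales)."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return [total / (n_classes * counts[y]) for y in labels]

# Hypothetical training labels: 2% whales, typical of freemium games
labels = ["non_whale"] * 98 + ["whale"] * 2
print(class_balance(labels))                   # {'non_whale': 0.98, 'whale': 0.02}
print(inverse_frequency_weights(labels)[-1])   # 25.0 for each whale example
```

A check like this is cheap to run on any vendor-supplied training set, and it makes the “do you have enough whales?” question concrete rather than rhetorical.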
Another angle to add to this is diversity. For example, if you’re using AI to market a new product, your training data is likely to be mostly early adopters, which can bias certain aspects in terms of HHI (family income), life cycle, age, and other factors. If you’re trying to “bridge the gap” to a more mainstream consumer audience with your product, it’s critical to ensure you have a diverse set of training data that includes not only early adopters, but a more representative audience for later users.
3. What tests have been done?
Many companies focus their AI testing on the overall success of algorithms, such as accuracy or precision. Sure, that’s important. But specifically for bias, testing can’t stop there. A good way to test for bias is to document specific subgroups that are critical to primary use cases for an algorithm. For example, if an algorithm is set up to optimize for conversion, we may want to run separate tests for large-ticket items versus small-ticket items, for new customers versus existing customers, or for different types of creatives. Once we have that list of subgroups, we need to track the same set of algorithm success metrics for each individual subgroup, to find out where the algorithm underperforms significantly relative to its overall performance.
The recent Interactive Advertising Bureau (IAB) report on AI Bias provides an in-depth infographic to guide marketers through a decision tree process for this subgroup testing methodology.
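The subgroup testing described above can be sketched simply: score the same metric per subgroup and compare each against the overall number. This is an illustrative example, not the IAB’s methodology; the column names and the “ticket size” subgroup are hypothetical, and accuracy stands in for whatever success metric a campaign actually uses.

```python
import pandas as pd

def subgroup_metrics(results: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Accuracy per subgroup, with each group's gap vs. overall accuracy."""
    overall = (results["predicted"] == results["actual"]).mean()
    rows = []
    for group, sub in results.groupby(group_col):
        acc = (sub["predicted"] == sub["actual"]).mean()
        rows.append({group_col: group,
                     "accuracy": acc,
                     "gap_vs_overall": acc - overall})
    return pd.DataFrame(rows)

# Hypothetical scored conversions, split by ticket size
results = pd.DataFrame({
    "ticket":    ["large", "large", "small", "small", "small", "small"],
    "predicted": [1, 0, 1, 1, 0, 0],
    "actual":    [0, 1, 1, 1, 0, 0],
})
print(subgroup_metrics(results, "ticket"))
```

A large negative `gap_vs_overall` for a subgroup is exactly the signal that an algorithm may be biased against that use case, even when the headline metric looks healthy.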
4. Can we do our own test?
If a marketer is using a vendor’s tool, it is highly recommended not to rely solely on that vendor’s testing, but to run your own, using a few key subgroups that are specifically critical to your business.
It is essential to monitor the performance of algorithms across those subgroups. It is unlikely that performance will be identical between them. If it isn’t, can you live with the different levels of performance? Should the algorithm be used only for certain subgroups or use cases?
5. Have you tested for bias on both sides?
When I think about potential implications of AI bias, I see risks to both input into an algorithm and output.
As for input, imagine using a conversion optimization algorithm for a high-consideration product and a low-consideration product. An algorithm can be much more successful in optimizing for low-consideration products, because all consumer decisions are made online and thus there is a more direct path to purchase. For a high-consideration product, consumers may research offline, visit a store, or talk to friends, so there is a much less direct digital path to purchase, and an algorithm may be less accurate for these types of campaigns.
In terms of output, imagine a mobile commerce campaign optimized for conversion. An AI engine is likely to generate much more training data from short-tail apps (such as ESPN or Words With Friends) than from long-tail apps. So it’s possible for an algorithm to drive a campaign toward more short-tail inventory because it has better data about these apps and thus is better able to find performance patterns. Over time, a marketer may find that his or her campaign is over-indexing with expensive short-tail inventory and potentially missing out on highly efficient long-tail inventory.
The bottom line
The above list of questions can help you develop or refine your AI efforts to minimize bias. In a world more diverse than ever, it’s imperative that your AI solution reflect that. Incomplete training data or insufficient testing leads to sub-optimal performance, and it is important to remember that bias testing must be repeated systematically for as long as an algorithm is in use.
Jake Moskowitz is Vice President of Data Strategy and Head of the Emodo Institute at Ericsson Emodo.