We’re all being manipulated by A/B testing all the time
Testing out solutions is a core part of the design process, and on the web that often happens in the form of A/B testing: designers show one group of users design A, show another group design B, and measure which gets closer to a desired outcome. And it’s not just layouts that get A/B tested; these experiments determine everything from the headlines we read to the colors we see.
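To make the mechanics concrete, here is a minimal sketch, in Python, of what an A/B test boils down to. Every name and number in it (the variants, the conversion rates, the visitor count) is invented for illustration; in practice, platforms like Optimizely handle the assignment and bookkeeping, which is what makes the technique accessible to non-technical teams.

```python
import random

# A minimal sketch of the core A/B-testing loop: each visitor is consistently
# assigned to a variant, their outcome (a click, a purchase, a sign-up) is
# recorded, and conversion rates are compared at the end. The variant names
# and conversion rates below are hypothetical.
VARIANTS = ["A", "B"]
TRUE_RATE = {"A": 0.05, "B": 0.07}  # invented underlying conversion rates

def assign_variant(user_id: int) -> str:
    # Deterministic bucketing: a returning visitor always sees the same variant.
    return VARIANTS[user_id % len(VARIANTS)]

results = {v: {"visitors": 0, "conversions": 0} for v in VARIANTS}

for user_id in range(10_000):  # simulate 10,000 visitors
    variant = assign_variant(user_id)
    results[variant]["visitors"] += 1
    if random.random() < TRUE_RATE[variant]:  # did this visitor convert?
        results[variant]["conversions"] += 1

for v in VARIANTS:
    r = results[v]
    print(f"Variant {v}: {r['conversions'] / r['visitors']:.1%} conversion rate")
```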
But as a new paper argues, A/B testing can raise serious ethical issues.
“If you don’t understand how the content is changing or how the layout is changing in ways that are designed to affect your behavior, it leaves you open to manipulation,” says Christo Wilson, an associate professor at Northeastern University who studies algorithmic auditing and a co-author of the paper. “This is the whole point of the A/B test. You’re trying to sell more products or get more clicks.”
In the paper, which was presented at the annual ACM Fairness, Accountability, and Transparency conference, Wilson and his co-authors analyzed 575 large websites that run A/B tests on the platform Optimizely, which makes it relatively easy for non-technical people to run such experiments in a variety of ways. The paper dives into three specific case studies: advertising, price discrimination, and news headlines. While he doesn’t single out any company for applying A/B testing unethically, Wilson does use the opportunity to elucidate how companies are using this little-examined technology to change what people see on the internet, whether prices, ads, headlines, or layout.
The problem is that, when it’s not done in a transparent, responsible way, A/B testing can exploit the worst impulses of human psychology to get you to click on something. With political content, for instance, that kind of sensationalizing contributes to political polarization.
Many of us probably assume we see the same things as we move across the web, but that’s not necessarily the case. To see the A/B tests running on any given website you visit, you can download a Chrome browser extension called Pessimizely that will reveal everything that Optimizely is facilitating, from advertisers experimenting with which ads to show you to the New York Times testing out different headlines.
Wilson’s analysis shows that the websites currently using A/B testing are among the biggest on the internet. But the practice will likely only become more ubiquitous: tools like Optimizely could help level the playing field for smaller companies. “I’d like to see smaller websites be able to compete with larger ones, so if A/B testing is a way to do that, that’s a good thing. I’m glad that people who are less technical have sophisticated capabilities; it shouldn’t just be the Googles and Facebooks,” he says. “On the other hand, I’m just concerned that you have more untrained people running experiments on large audiences.”
When it comes to advertising, the possibilities for unethical uses are particularly harrowing. For instance, say you post two ads for a high-paying job in a male-dominated industry: one plays to male gender stereotypes, the other is gender-neutral. An A/B test might show you that far more people click on the gendered ad, which convinces you to show that ad to more people. But nearly everyone who clicks on that ad might be male, leading to a skewed population applying for the job. And the people seeing these ads may have no idea what’s going on in the background.
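The feedback loop is easy to see in a toy simulation. The click rates below are entirely invented; the point is only that picking a “winner” by raw clicks can quietly skew who ends up seeing, and answering, the ad.

```python
import random

# Toy model of the job-ad scenario above. All numbers are invented: the
# gendered ad gets more clicks overall, but nearly all of them come from men,
# so choosing the "winner" by raw click rate alone skews the applicant pool.
CLICK_RATE = {
    # (ad, audience) -> probability of a click (hypothetical)
    ("gendered", "men"): 0.10, ("gendered", "women"): 0.01,
    ("neutral",  "men"): 0.05, ("neutral",  "women"): 0.05,
}

def run_ad(ad: str, impressions: int = 20_000) -> dict:
    clicks = {"men": 0, "women": 0}
    for _ in range(impressions):
        audience = random.choice(["men", "women"])  # a 50/50 audience
        if random.random() < CLICK_RATE[(ad, audience)]:
            clicks[audience] += 1
    return clicks

for ad in ("gendered", "neutral"):
    clicks = run_ad(ad)
    total = clicks["men"] + clicks["women"]
    print(f"{ad:>8}: {total} clicks, {clicks['men'] / total:.0%} from men")
```

Run it and the gendered ad “wins” on total clicks, even though its audience is almost entirely male, which is exactly the skew the raw metric hides.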
“We observe that Optimizely is used by many extremely popular websites,” write the researchers. “However, to our knowledge, visitors to these sites are never asked to explicitly consent to these experiments. Even the existence of experiments is rarely, if ever, disclosed.”
Price discrimination is another example of how what you see online might be different from what other people see: A/B testing might help a company optimize prices for people in different zip codes, for instance, but where people live is often intertwined with their race, potentially leading to race-based pricing.
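As a sketch of why that happens, consider a pricing rule keyed only to zip code. The zips, prices, and demographic labels below are entirely made up.

```python
# Entirely hypothetical sketch: a pricing rule that looks only at zip code
# can still produce race-correlated prices, because where people live is
# itself correlated with race. All zips, prices, and groups are invented.
ZIP_PRICE = {"zip_A": 24.99, "zip_B": 19.99}

def quote_price(zip_code: str) -> float:
    # The rule never consults race, only location...
    return ZIP_PRICE[zip_code]

# ...but if zip_A is predominantly one racial group and zip_B another,
# the quoted prices end up correlated with race in effect.
for zip_code, demographic in [("zip_A", "predominantly group X"),
                              ("zip_B", "predominantly group Y")]:
    print(f"{zip_code} ({demographic}): ${quote_price(zip_code):.2f}")
```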
Wilson believes there need to be stricter guidelines about how these experiments are conducted. “I don’t mean to portray all of this as evil,” he says. “But we have to acknowledge that these are human experiments. You’re trying to change people’s behavior. If I was doing that as a scientist at a university, there’s protocol. You can’t just experiment on people.”
There are some simple ways that companies and A/B testing platforms like Optimizely could be clearer about these testing protocols, which are standard practice for scientists. Wilson envisions an internal company review board that makes sure tests won’t inadvertently do harm, as well as an external service a company could contract with to review research plans. An industry trade group with published guidelines could also help, or even just more training from platforms like Optimizely. He imagines these guidelines would be similar to those in academia, with rules about respect for people, respect for autonomy, and beneficence: the obligation to act for the benefit of others.
“This is really important because these tools are very easy to use,” Wilson says. “But it has to come with a set of caveats. If you’re going to do something like price discrimination, that’s a serious thing potentially. Are you treating people in a way that’s just and fair, or are you just ignoring that and saying all I care about is driving clicks?”