Why you need continuous experiments

In a previous post, I talked about conditioned experiments: unless you’re dealing with really huge sample sizes, you need to assign treatments based on all known variation in the backgrounds of the experiment participants. When we’re dealing with humans, traditional approaches to conditioning fall far short of what we need. That’s why Aampe provides automated conditioned treatment assignment based on whatever data you’re able to bring to the table. Conditioned experiments give you results you can trust.

In another post, I talked about connected experiments: the results of any individual experiment matter far less than our ability to use those results to make better business decisions. That’s why Aampe creates per-participant indices that track each customer’s propensity to react to each treatment over time.

This post is about using treatment response indices from well-conditioned experiments to create a continuous cycle of experiments. The difficulty of integrating experimentation into a business has relatively little to do with the design and analysis of the experiments themselves. It’s true that if you run a poorly-designed experiment your results will be garbage, and that if you poorly analyze data from a well-designed experiment, your results will still be garbage, but even if you run a perfectly designed experiment and get results that you’d bet your life on...what do you do next? The hardest part of running a data-driven business is knowing what to do with results when you get them.

Experiment with the things that matter most

Back in 2009, Google's visual design lead Douglas Bowman quit over the company’s use of experimentation to make decisions. Bowman didn’t disagree with the use of experiments in general. He objected to the specific ways experiments were being applied. He wrote:

Yes, it’s true that a team at Google couldn’t decide between two blues, so they’re testing 41 shades between each blue to see which one performs better. I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can’t operate in an environment like that. I’ve grown tired of debating such minuscule design decisions. There are more exciting design problems in this world to tackle.

It’s easy to get caught up in running experiments on specific design elements. If you get results that show that a particular picture, text style, communication channel, or word choice performs better than alternatives, the next steps seem obvious: use the thing that worked and stop using the things that didn’t. In other words, it’s easy to think about next steps in terms of the next message. That’s not a bad thing. But it is, certainly, living below your means, for a few reasons:

Simple design and content changes are low-hanging fruit: they’re easy to change, but it’s rare to see them make a huge difference in performance. A blue button might get more clicks than a red button, but that only matters if you’re talking about a really huge number of clicks, where a marginal difference in performance still means a lot for your bottom line.

The impact of design and content changes decreases over time. If you have a subject line that really works well, you can’t just use that subject line in all future messages - people will get tired of it and start ignoring it. Design and content experiments work really well when you run an experiment on a few people in order to decide what everyone else is going to see. They are less useful when you need to decide what the same people involved in the first experiment are going to see next time.

Over time, you start to lose track of your lessons learned. It’s easy to look at the last half-dozen experiments and get a few ideas about what to try or avoid for your next message. If you’re doing experiments as often as you should, you’re not going to have a history of a half-dozen experiments. You’ll have hundreds. It becomes increasingly difficult to get long-term benefits from your experiments when you’re living message-to-message.

That doesn’t mean you shouldn’t experiment with your design and content. And it doesn’t mean you shouldn’t use the results of one message experiment to decide what the next message should look like. You should do both those things. But you can do many, many more things with those results.

Create a hypothesis library to organize your messages

As I said, the dominant approach in messaging experiments is to think of experiments in terms of a whole message: you have, say, two different emails - you run the experiment, and decide which email was better. But those emails have parts. There are subject lines and attention getters and calls-to-action. There are value propositions and incentives and social proof. And, yes, there are images and fonts and colors and word choice and tone of voice and things like that. All of these things combine to make the message, and it’s the combination that your customer sees.

This situation isn’t particular to messaging. In any experiment, there’s the thing you want to test and then there’s all the stuff that has to happen alongside the thing you want to test, because the experiment can’t happen without those other things happening. As far as I’ve seen, across all types of industries and markets and fields of research, it’s left up to the scientist to differentiate those pieces and only test a certain number of variations at once. That’s really hard to do when you don’t see your experiment as any more granular than “A” and “B”.

As long as we keep the parts of the message aggregated as entire messages, we can't automate much: someone has to look at each whole and decide what part to vary next. But if we can break our messages down into building blocks, you can provide the blocks and Aampe can put them together in unique combinations automatically.

That’s our hypothesis library.
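To make the idea concrete, here is a minimal sketch of a library of building blocks and two ways of combining them. The slot names, the example options, and the functions `all_combinations` and `assemble` are hypothetical, chosen only for illustration. This is not Aampe's actual data model or API.

```python
import itertools
import random

# Hypothetical library: each key is a kind of building block (a "slot"),
# each value is the list of alternatives we want to test for that slot.
library = {
    "value_proposition": ["saves time", "saves money"],
    "incentive": ["10% off", "free shipping", None],  # None = no incentive
    "call_to_action": ["Shop now", "See what's new"],
}

def all_combinations(library):
    """Enumerate every message the library can produce."""
    slots = list(library)
    for choice in itertools.product(*(library[s] for s in slots)):
        yield dict(zip(slots, choice))

def assemble(library, rng):
    """Pick one option per slot at random to form a single message."""
    return {slot: rng.choice(options) for slot, options in library.items()}

combos = list(all_combinations(library))
print(len(combos))  # 2 * 3 * 2 = 12 distinct messages from 7 blocks
print(assemble(library, random.Random(0)))
```

Seven blocks already produce twelve distinct messages, and each added block multiplies the count. That is why writing and comparing whole messages by hand stops scaling well before the library does.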

The building blocks of a message

To start off, let’s just recognize that crafting a message is much more complex than we might, at first, assume.

We don’t need to be scared of that complexity: Aampe is designed to let you handle it in manageable pieces, and you don’t even have to put the pieces together - we do that for you.

In the context of a business, a message has four aspects:

Content. This is information about your business. You encourage customers to consider specific offerings by referencing value propositions, extending incentives to buy, and putting forth evidence of the offerings’ desirability. This all culminates in a call to action - asking the customer to do something specific that will bring value to them and to your business.
Structure. This is the organization within which you communicate content. If you’re dealing with email, you have a subject line. You may have a greeting or some other way of getting the reader’s attention. You have the body of the message (in a more verbose channel - say an email rather than an SMS - that body might have multiple parts). Your message will also include a way for the reader to take action. You’ll then end with some form of signature and might also include a residual message (a post-script or “p.s.”).
Delivery. A message involves much more than the message itself: you have to consider all of the dynamics of how the message gets to the recipient. You might reach them via a particular channel. You have to consider the timing of the message as well. Often, there are recurrence considerations - what do you do if they don’t respond to the message? And, of course, you have to consider what language the recipient is most likely to respond to.
Action. Even after the message has been delivered, we have to consider the purpose of the message: the action we want the recipient to take. This might be as simple as a link click. But we may want them to fill out a form to give us some information about themselves or to enable further contact. We may want even more extensive action from them, in the form of an actual transaction - a purchase, a payment, or a trial run. We may chain actions together for a single message: ideally, the user might click a link to fill out a sign-up form which includes payment for an initial trial of our product.

Both structure and content can take on specific aesthetics. Essentially the same content can vary widely in presentation through changes in word choice, arrangement of words and sentences, and the tone you take with the customer. Likewise, the same structure can be presented in many different ways depending on choices of layout, color, font, and images.

I mention aesthetics as an afterthought because that’s what they should be. The copy is mostly just a means to an end, but it’s easy to get caught up in phrasing or displaying things just a little differently, because those things are easy to adjust. As I mentioned earlier, you’re not likely to get a huge return on investment from a font change.

The challenge - and opportunity - is to think about messages as more than just word choice, fonts, and images: what are you communicating about?

Which value propositions resonate most with your customers?
Which incentives most reliably elicit action?
Which products and features are the best faces for your business?
When is it best to reach out to customers?
What are your customers’ channel preferences?
Which actions can a message incentivise, and which do you need a different method to motivate?

Communication experiments offer the most value when you treat communication with your customers less as a thing in and of itself and more as a window into your business. Sure, vary where you put your value proposition and how you phrase your call to action, and try couching those variations in different tones and visual designs. But the important point is that the results from each experiment will not only tell you something about a particular message, but also yield information about each customer’s propensity to act when presented with each particular aspect of what you offer. That’s the value offered by Aampe’s connected experimentation capabilities, and you get more of that value the more you focus on experiments involving delivery, action, and the non-aesthetic aspects of content.

Modular experiments are inherently continuous

Aampe’s hypothesis library lets you constantly generate new data about your business and your customers. (Take a look at our post on connected experimentation to understand what that data looks like.) Instead of trying to run a single experiment that tests out all of the different days of the week and times of day you could contact a customer, you can embed different day/time options in multiple messages over time. Because Aampe can automatically construct those messages from your library, it doesn’t put any extra work on you to run a whole string of experiments that build out scheduling and channel preferences for each of your customers.

Aampe lets you encapsulate all the important parts of your business into building blocks. When you send a message, you combine those blocks in a particular way, but when the experiment is over and done with, Aampe separates the message back out into each block and keeps a score for each block over time. You can try the same value proposition in different voices and visual designs in different parts of your message, but Aampe can keep track of your hypothesis that, say, a particular value proposition resonates with your customers. You don’t need to recognize which messages have that value proposition. In fact, you don’t even have to make the decision about which messages to include it in. As long as it’s in your hypothesis library, Aampe will dynamically include it in many different experiments to build up a memory of how that building block performs.
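The bookkeeping described above can be sketched in a few lines. This is a deliberately simplified illustration, not Aampe's implementation: the `results` data is made up, `score_blocks` is a hypothetical helper, and a raw tally ignores both per-customer tracking and the conditioning discussed in the earlier posts.

```python
from collections import defaultdict

# Made-up results: the blocks each message used, and whether the recipient acted.
results = [
    ({"value_proposition": "saves time", "send_time": "morning"}, True),
    ({"value_proposition": "saves time", "send_time": "evening"}, False),
    ({"value_proposition": "saves money", "send_time": "morning"}, True),
    ({"value_proposition": "saves money", "send_time": "evening"}, True),
]

def score_blocks(results):
    """Split each message back into its blocks.

    Returns {(slot, option): (sends, actions)} accumulated over all messages.
    """
    tally = defaultdict(lambda: [0, 0])
    for blocks, acted in results:
        for slot, option in blocks.items():
            tally[(slot, option)][0] += 1           # one more send for this block
            tally[(slot, option)][1] += int(acted)  # one more action if they responded
    return {key: tuple(counts) for key, counts in tally.items()}

scores = score_blocks(results)
for (slot, option), (sends, actions) in sorted(scores.items()):
    print(f"{slot}={option}: {actions} actions from {sends} sends")
```

No message here was written to test send time specifically, yet every message contributes evidence about it. The score belongs to the block, not to the message that happened to carry it.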

If all of this sounds difficult: it is. That’s why Aampe does it for you. It requires a change, not so much in how you write messages, but in when you write them. Normally, we think about what we want to accomplish with a message, and then we think about what we should say, and how we should say it, and how we should present the whole thing in order to accomplish our purpose. In other words, we think through lots of different content and structure and aesthetics in our heads. A typical message is the result of our human brains implicitly doing what Aampe makes you do explicitly. It takes a little adjustment to get used to putting down your ideas before they are “finished”.

And that - putting out half-finished ideas - is the essence of experimentation.

When we try to write a complete message, we make a lot of assumptions and guesses and hopes about how the recipient will respond. The biggest assumption we make is that when we finally hit send, we’ve hit on a winning combination of building blocks. Aampe makes it so you don’t need to make that assumption. Instead of thinking through different options and selecting just the one or two you think will work best, select them all - let experiments find the real winners.

All of this allows Aampe to run truly continuous, massively-parallel experiments. You fill your library. We’ll keep running experiments to find out what wins, and over time we’ll run most often those variants that win the most. Add in new hypotheses to your library whenever you want: we’ll pick them up and start including them in the mix.
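One common way to "run most often those variants that win the most" while still giving new ideas a chance is Thompson sampling. The sketch below uses it purely as an illustration. It is an assumption on my part, not a description of the algorithm Aampe uses, and the variant names and counts in `stats` are made up.

```python
import random

def pick_variant(stats, rng):
    """Choose a variant by Thompson sampling.

    stats maps each variant to (wins, losses). For each variant we draw a
    plausible success rate from Beta(wins + 1, losses + 1) and pick the
    variant with the highest draw.
    """
    draws = {
        variant: rng.betavariate(wins + 1, losses + 1)
        for variant, (wins, losses) in stats.items()
    }
    return max(draws, key=draws.get)

# Made-up history: two established variants and one just added to the library.
stats = {
    "saves time": (30, 70),
    "saves money": (45, 55),
    "new idea": (0, 0),
}

rng = random.Random(42)
counts = {variant: 0 for variant in stats}
for _ in range(1000):
    counts[pick_variant(stats, rng)] += 1
print(counts)
```

The established winner is chosen far more often than the established loser. The new variant also gets chosen frequently, because nothing is known about it yet, and that share shrinks or grows as its own wins and losses accumulate.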

Aampe’s continuous experimentation stack lets you vary the extent to which you expose critical aspects of the business - products, features, value propositions, timing, channels - to experimental conditions. That allows you to measure the value of each of those parts of your business, rather than just looking at performance reports for individual messages. It lets you focus on what you do best: running your business and thinking about your customers.

Experimentation . Sep 28, 2022 . 8 MIN

Schaun Wheeler

Don't assume that you've hit on a winning combination.
