DIGITAL ANALYTICS MINIDEGREE / CXL — BLOG 5
In the last blog post, I talked about some advanced concepts and the importance of data cleaning in Google Analytics (Universal Analytics). In this post, we will explore A/B testing and its capabilities in detail, as covered in the fifth part of the CXL Digital Analytics Minidegree.
You can find the program on the CXL website.
What is A/B Testing?
A/B testing (also known as bucket testing or split-run testing) is a user experience research methodology. A/B tests consist of a randomized experiment with two variants, A and B. It includes application of statistical hypothesis testing or “two-sample hypothesis testing” as used in the field of statistics.
A/B testing is a way to compare two versions of a single variable, typically by testing a subject’s response to variant A against variant B, and determining which of the two variants is more effective.
I will share a few sentences from the course below.
- Testing is for validation and learning. It’s to validate business impact through measurement.
- Testing is a measurement methodology. It is not prescriptive about what you can or cannot test.
- The opposite of testing is “intuition”. The idea that someone magically knows what works and what doesn’t is stone age product development.
- If you don’t want dark patterns, then stop shipping dark patterns. A/B testing just measures what has been built. You might have a winning test, but you still don’t know why.
What to test?
You can think of testing as a way to solve problems. We need to understand the problem first.
The first question should be “Where is the problem?” and the second should be “What is the root cause?”. A/B testing is a long process, and the ideal duration for a single test is about 4 weeks. In general, the work is roughly 80% research and 20% experimentation.
After finding the problem, if you want to set up a new A/B test, you should ask yourself a few questions.
- What is the hypothesis for solving it?
- What’s the impact of this new feature?
- What should we test first?
Test Prioritization
Prioritization is a critical skill to master when building out a testing program. It’s about making smart choices and applying discipline to the decision-making process.
- “Even if you know what the problem is, you don’t know what the solution is.”
There are many methods for prioritizing tests and I will briefly discuss them below.
1. PIE Framework
The PIE Framework is made up of the three criteria you should consider to prioritize which pages to test and in which order: Potential, Importance, and Ease.
Potential: How much improvement can be made on this page? You should prioritize your worst performers. This should take into account your web analytics data, customer data, and expert heuristic analysis of user scenarios.
Importance: How valuable is the traffic to this page(s)? Your most important pages are those with the highest volume and the costliest traffic. You may have identified pages that perform terribly, but if they don’t have significant volume of costly traffic, they aren’t testing priorities.
Ease: How difficult will it be to implement a test on this page or template? The final consideration is the degree of difficulty of actually running a test on this page, which includes technical implementation, and organizational or political barriers.
The weakness of PIE is that the ratings are subjective. If we knew in advance how much potential an idea had, we wouldn’t need prioritization models. It is also hard to rate Importance and Ease objectively.
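As a minimal sketch of how PIE scoring could be applied, the snippet below averages the three ratings and ranks pages by the result. The page names and the 1 to 10 ratings are made up for illustration and do not come from the course. Averaging the three criteria is one common convention, not the only option.

```python
from statistics import mean

# Hypothetical pages with made-up 1-10 ratings for each PIE criterion.
pages = {
    "checkout": {"potential": 8, "importance": 9, "ease": 4},
    "homepage": {"potential": 5, "importance": 10, "ease": 5},
    "pricing": {"potential": 9, "importance": 6, "ease": 9},
}


def pie_score(ratings):
    """Average the Potential, Importance and Ease ratings."""
    return mean([ratings["potential"], ratings["importance"], ratings["ease"]])


def rank_pages(pages):
    """Return (page, score) pairs sorted from highest to lowest PIE score."""
    scored = [(name, pie_score(ratings)) for name, ratings in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_pages(pages)
for name, score in ranking:
    print(f"{name}: {score:.1f}")
```

The page at the top of the ranking is the one to test first. The same structure works for other scoring models if you swap the criteria.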
2. ICE Framework
ICE scores were originally intended to prioritize growth experiments, but the ICE model is now used for feature prioritization as well.
Impact: Demonstrates how much your idea will positively affect the key metric you’re trying to improve.
Confidence: Shows how sure you are about Impact. It is also about ease of implementation in some way.
Ease: An estimate of how much effort and resources are required to implement the idea.
ICE has a similar subjectivity problem to the PIE framework, and it adds another one: “how confident am I in this idea?” Again, how could we know this in advance?
3. Five Star Method
As the name suggests, each test idea is given star ratings across several categories, and the idea with the highest total score gets priority.
4. PXL Framework
This framework brings these 3 benefits:
- It makes any “potential” or “impact” rating more objective
- It helps to foster a data-informed culture
- It makes “ease of implementation” rating more objective
If you have lots of test ideas, you need a way to prioritize them. How you prioritize them matters, both for the quality of your tests and optimization and for organizational efficiency.
A/B Testing Statistics
If you’re conducting A/B tests, you need to understand some basic statistics to validate your tests and their results. Nobody wants to spend time, money and effort on something that turns out to be useless in the end. To use A/B testing efficiently and effectively, you must understand what it is and the statistics that surround it.
The Holy Trinity of an A/B Test
- A p-value of 0.05 (or less)
- Statistical power of 80%
- A large enough sample size
What does statistical significance (p-values) show?
The p-value is the probability of seeing a result at least as extreme as the one observed, given that the null hypothesis is true. It is not the probability that one variant is better than the other.
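To make the definition concrete, here is a minimal sketch that computes a two-sided p-value for the difference between two conversion rates. It assumes a pooled two-proportion z-test, which is one common choice and not necessarily the method used in the course or by any particular testing tool. The visitor and conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the shared conversion rate assumed under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    # Probability of a z-score at least this extreme in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical test: 10,000 visitors per variant.
p_value = two_proportion_p_value(conv_a=500, n_a=10_000, conv_b=570, n_b=10_000)
print(f"p-value: {p_value:.4f}")
```

If the printed value is below the chosen threshold (0.05 here), the difference is called statistically significant.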
There are two types of error:
A false positive (Type I error) is a result that indicates a given condition exists when it does not. For example, a pregnancy test that indicates a woman is pregnant when she is not, or the conviction of an innocent person.
A false negative (Type II error) is a result that wrongly indicates that a condition does not hold. For example, a pregnancy test that indicates a woman is not pregnant when she is, or the acquittal of a person who is guilty of a crime.
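A small simulation can show what a false positive looks like in testing. The sketch below runs many A/A tests, where both variants share the same true conversion rate, and counts how often the result still comes out significant. The conversion rate, traffic numbers, seed and the z-test itself are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, conv_b, n):
    """Two-sided z-test p-value for two variants with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    std_error = sqrt(pooled * (1 - pooled) * 2 / n)
    if std_error == 0:
        return 1.0  # no conversions (or only conversions) in both variants
    z = ((conv_b - conv_a) / n) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(true_rate=0.10, visitors=500, tests=1000, alpha=0.05, seed=42):
    """Share of A/A tests (no real difference) that still look significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(tests):
        # Both variants convert at the same true rate, so any "win" is noise.
        conv_a = sum(rng.random() < true_rate for _ in range(visitors))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors))
        if p_value(conv_a, conv_b, visitors) < alpha:
            false_positives += 1
    return false_positives / tests


rate = false_positive_rate()
print(f"False positive rate: {rate:.3f}")
```

The printed rate should land close to the significance threshold, because a threshold of 0.05 means accepting roughly a 5% chance of a false positive when there is no real difference.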
What does statistical power show?
Power is the probability of not making a Type II error, that is, of detecting an effect when one really exists. The higher the power, the fewer false negatives.
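Significance, power and sample size are tied together, and a rough sample size estimate shows how. The sketch below uses the standard normal-approximation formula for comparing two proportions. The baseline rate and the relative lift are hypothetical inputs, and dedicated sample size calculators may use slightly different formulas.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, lift, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a relative lift (two-sided test)."""
    p1 = baseline
    p2 = baseline * (1 + lift)
    p_bar = (p1 + p2) / 2
    # z-scores for the significance threshold and the desired power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical scenario: 5% baseline conversion rate, 10% relative lift.
n = sample_size_per_variant(baseline=0.05, lift=0.10)
print(f"Visitors needed per variant: {n}")
```

Asking for more power or trying to detect a smaller lift both push the required sample size up, which is why small improvements on low-traffic pages are hard to test.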
Testing Strategies
There are lots of things that can affect the test outcome. The history effect, the instrumentation effect and the selection effect are some of them.
See you next week with a new blog post containing details from the CXL Digital Analytics Minidegree program.
Thanks,
Mert Kolay