ICE Scoring for Website Experiments: A Practical Guide

You have a backlog of test ideas. Every team has one.

Marketing wants to test the hero headline. Sales wants the contact form redesigned. The CEO saw a competitor's site and wants to copy their layout. Product just shipped a new feature and wants the homepage to reflect it. A designer read a blog post about button colors.

You have time and capacity to run two tests this month. How do you decide which two?

Without a prioritization framework, you default to whoever argues loudest or has the most organizational power. That’s not a testing strategy. It’s politics.

ICE scoring is a simple, fast framework for prioritizing test ideas based on three factors: Impact, Confidence, and Ease. Each idea gets a score from 1 to 10 on each factor. The scores are averaged. You run the tests with the highest ICE scores first.

It’s not perfect. No prioritization framework is. But it makes the decision visible, defensible, and based on something other than who made the strongest argument in the last meeting.

The three dimensions of ICE

Impact: How much will this move the needle if it works?

Impact is an estimate of the outcome if the test succeeds. For website experiments, that usually means conversion rate, revenue, or pipeline. A test that could improve contact form conversion by 30% scores higher than one that might improve it by 3%.

When scoring impact, think about:

- How much traffic the page gets, and how much of it actually sees the element you're changing
- The page's baseline conversion rate and how much room it leaves to improve
- How directly the metric you'd move ties to revenue or pipeline

Be honest about the size of the potential effect. Most tests produce modest gains. Wildly optimistic impact scores lead to disappointment when results come in.

Confidence: How sure are you that this will work?

Confidence reflects how much evidence supports the hypothesis. A test backed by session recordings showing visitors repeatedly clicking a broken link has high confidence. A test based on someone’s intuition about button colors has low confidence.

When scoring confidence, think about:

- What evidence backs the hypothesis: heatmaps, session recordings, user feedback, past test results
- Whether that evidence is behavioral data or someone's opinion
- Whether similar changes have worked on this site before

Low confidence doesn’t mean you shouldn’t run the test. It means you should run higher-confidence tests first and return to speculative ideas once you’ve captured the higher-probability wins.

Ease: How hard is this to implement?

Ease measures the effort required to get the test live. A copy change that a marketer can implement in an hour scores 10. A full page redesign that requires developer time, design work, and QA scores 2.

When scoring ease, think about:

- Whether a marketer can ship the change alone or it needs developer time, design work, and QA
- How well your testing tool handles this kind of change
- The real process in your organization: tickets, sprints, reviews, and waiting

High ease doesn’t make a low-impact test worth running. But when impact and confidence are equal between two ideas, ease is often the right tiebreaker.

How to score and rank ideas

The ICE score for each idea is the average of its three component scores:

ICE Score = (Impact + Confidence + Ease) / 3

A test with Impact 8, Confidence 7, and Ease 9 has an ICE score of 8.0. A test with Impact 9, Confidence 4, and Ease 3 has an ICE score of 5.3.

Run the 8.0 first.

The math is simple. The value is in the discipline of making each factor explicit. You can’t argue for a test by pointing to impact alone when confidence and ease are both low. You have to defend the full picture.
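To make the arithmetic concrete, here is a minimal Python sketch of the scoring and ranking, using the two example tests above. The function and variable names are illustrative, not a prescribed implementation.

```python
# Minimal sketch of ICE scoring and ranking; names are illustrative.
def ice_score(impact: float, confidence: float, ease: float) -> float:
    """Average the three 1-10 component scores."""
    return (impact + confidence + ease) / 3

# The two example tests above, as (impact, confidence, ease) tuples.
backlog = {
    "Test A": (8, 7, 9),
    "Test B": (9, 4, 3),
}

# Highest ICE score runs first.
ranked = sorted(backlog.items(), key=lambda kv: ice_score(*kv[1]), reverse=True)
for name, scores in ranked:
    print(f"{name}: ICE {ice_score(*scores):.1f}")
# Test A: ICE 8.0
# Test B: ICE 5.3
```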

Applying ICE scoring to website experiments

Website experiments have some specific characteristics that affect how you apply ICE scoring.

Traffic is the constraint, not ideas. Most B2B websites have limited traffic to the pages that matter for conversion. A contact page with 500 visitors per month can run roughly one test at a time, and each test needs several weeks to reach statistical significance. That makes the prioritization decision unusually important. A wrong pick wastes a month.
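To put rough numbers on that, here is a back-of-the-envelope duration estimate, a minimal sketch using the standard two-proportion sample size formula (normal approximation). The baseline rate, target lift, and traffic figures are assumptions for illustration only; plug in your own.

```python
# Rough test-duration estimate via the two-proportion sample size formula
# (normal approximation). All inputs below are illustrative assumptions.
from math import ceil, sqrt
from statistics import NormalDist

def visitors_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect the given relative lift."""
    p1, p2 = baseline, baseline * (1 + relative_lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    n = (z_a * sqrt(2 * p_bar * (1 - p_bar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p2 - p1) ** 2
    return ceil(n)

weekly_visitors = 500 / 4.33      # 500 visitors per month, expressed per week
n = visitors_per_variant(baseline=0.30, relative_lift=0.30)
weeks = 2 * n / weekly_visitors   # two variants share the page's traffic
print(f"{n} visitors per variant, roughly {weeks:.0f} weeks at this traffic")
# 437 visitors per variant, roughly 8 weeks
```

Halve the baseline rate or the detectable lift and the required sample grows several-fold, which is exactly why a wrong pick on a low-traffic page is so costly.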

Pages matter as much as elements. When you’re building your backlog, identify which page each test applies to. The contact page, the demo page, and the homepage each have different traffic levels, different baseline conversion rates, and different test capacities. A test that would be high-impact on the contact page might be lower-impact on a lower-priority page.

Behavioral data dramatically raises confidence scores. For website experiments, the single best way to increase confidence is to start with data rather than opinions. If heatmaps show visitors clicking a link instead of filling out the form, a test that removes that link has high confidence. If someone just thinks the page could be better, the confidence score should be 3 or 4, not 8.

Ease scores should reflect your actual team and tools. If you have a developer who can implement changes in a few hours, your ease scores across the board are higher than if every implementation requires a ticket, a sprint, and a two-week wait. Score ease based on your real constraints, not an imagined frictionless process.

Building and maintaining a test backlog

ICE scoring works best when you’re choosing between real, documented test ideas rather than vague ambitions. Before you can prioritize, you need a backlog.

A test idea should be documented with:

- The page it applies to
- The hypothesis: what changes and why you expect it to move the metric
- The evidence behind the hypothesis
- Its Impact, Confidence, and Ease scores, and the resulting ICE score

A backlog maintained this way makes prioritization meetings take ten minutes instead of an hour. Everything is visible. The scores do most of the talking.
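In code, a backlog entry can be a simple record. Here is a minimal sketch using two of the ideas from the example below; the component scores and evidence strings are hypothetical, chosen only so the averages match the totals shown there.

```python
# Minimal backlog entry: one record per documented test idea.
from dataclasses import dataclass

@dataclass
class TestIdea:
    name: str
    page: str
    hypothesis: str
    evidence: str
    impact: int      # 1-10
    confidence: int  # 1-10
    ease: int        # 1-10

    @property
    def ice(self) -> float:
        return (self.impact + self.confidence + self.ease) / 3

# Component scores here are hypothetical illustrations.
backlog = [
    TestIdea("Remove two form fields", "contact page",
             "Fewer fields lowers friction", "heatmaps show field drop-off",
             impact=9, confidence=7, ease=9),
    TestIdea("Change submit button color", "contact page",
             "A new color draws attention", "someone's intuition",
             impact=2, confidence=3, ease=10),
]

backlog.sort(key=lambda idea: idea.ice, reverse=True)  # highest ICE first
for idea in backlog:
    print(f"{idea.name}: ICE {idea.ice:.1f}")
```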

A practical scoring example

Here’s how ICE scoring might look for a set of five test ideas on a B2B contact page.

Idea 1: Remove two form fields (company size and annual revenue). ICE score: 8.3

Idea 2: Rewrite the copy above the form. ICE score: 7.0

Idea 3: Remove navigation links from the contact page. ICE score: 7.3

Idea 4: Redesign the full page layout. ICE score: 5.0

Idea 5: Change the submit button color. ICE score: 5.0

Priority order: Idea 1 (8.3), Idea 3 (7.3), Idea 2 (7.0), Ideas 4 and 5 (tied at 5.0, deferred).

You run Idea 1 first. While it’s running, you prep Idea 3. You don’t touch the full page redesign until you’ve captured the smaller wins, and you skip the button color entirely.

Common mistakes with ICE scoring

Inflating impact scores for pet projects. Every team has a test idea someone is attached to. ICE scoring only works if impact is assessed honestly. If a test is actually likely to produce a 3% improvement, score it 4, not 8, regardless of how excited someone is about it.

Treating ICE scores as permanent. New data changes the scores. If a heatmap comes back and shows behavior you didn’t expect, update the confidence score. If implementation turns out to be harder than you thought, update ease. The backlog should reflect current knowledge.

Ignoring ease entirely. A test with high impact and high confidence that takes three months to implement isn’t more valuable than a slightly lower-impact test you can run next week. The opportunity cost of delaying other tests is real.

Over-engineering the framework. ICE scoring is useful because it’s simple. Adding sub-scores, weighting factors, and complex aggregation logic defeats the purpose. Keep it to three scores and one average.

Takeaway

ICE scoring doesn’t make prioritization decisions for you. It makes those decisions visible, consistent, and based on evidence rather than organizational dynamics.

For website experimentation specifically, the discipline of scoring Impact, Confidence, and Ease before every test slows down the “we should just try it” impulse and forces the question: what do we actually know, and what are we guessing?

The teams that compound testing gains over time are the ones that run the right tests in the right order. ICE scoring is how you get the order right.

If you want a structured way to identify which experiments on your B2B website are most likely to move pipeline, the Web Experience Audit includes a prioritized experiment backlog as part of every engagement.

Common questions

How is ICE scoring different from PIE scoring?

PIE scoring (Potential, Importance, Ease) is a similar framework developed by Chris Goward at WiderFunnel. The concepts are nearly identical: both try to balance expected impact, evidence quality, and implementation effort. ICE uses “Confidence” where PIE uses “Importance.” In practice, the differences are minor and either framework works. Pick one and use it consistently.

Who should assign ICE scores?

The scoring works best as a collaborative exercise with representatives from the teams involved: the person who will implement the test, someone who understands the data, and whoever owns the conversion metric. Doing it solo is fine if you’re a team of one, but be aware of your own biases. Getting a second opinion on impact and confidence scores reduces wishful thinking.

Should I use ICE scoring for every test idea, even simple ones?

Yes, even for quick tests. The discipline of asking “how confident am I in this hypothesis and why?” is valuable regardless of how fast the implementation is. It also builds a documented record of your reasoning, which is useful when you’re reviewing results and trying to understand what you’ve learned.

What’s a good ICE score threshold for running a test?

There’s no universal answer. It depends on how many ideas are in your backlog and how much testing capacity you have. If you have ten ideas and capacity to run two tests per month, you run the top two ICE scores each month. If all your ideas are below 5.0, that’s a signal to generate better hypotheses before you start running tests.

Can ICE scoring be used for things other than A/B tests?

Yes. ICE scoring is used across product management, marketing strategy, and growth teams for any prioritization decision. For website work specifically, it applies just as well to SEO projects, content decisions, and UX improvements as it does to A/B tests.
