Image quality and conversion: quantifying lift with event data

3 September 2025 by WarpDriven

If you’ve ever shipped “better images” and struggled to prove the business impact, this playbook is for you. In 2025, images still drive a disproportionate share of perceived quality, engagement, and speed—and they often dominate Largest Contentful Paint (LCP). The good news: with the right event schema and analysis workflow, you can quantify conversion lift from image quality and format changes with confidence.

What follows is a practitioner’s guide I’ve used across eCommerce and SaaS funnels—grounded in Core Web Vitals, privacy-aware measurement, and warehouse-grade analysis.

The business case, succinctly

  • LCP and responsiveness correlate with conversions. Google’s Core Web Vitals define “good” LCP (≤ 2.5 s) and INP (≤ 200 ms). On many pages the LCP element is an image, so moving these metrics can move revenue: the web.dev QuintoAndar INP case study (2023–2024) reports a ~80% INP reduction alongside a +36% lift in conversions, with broader guidance in the web.dev INP launch announcement (2024).
  • Next‑gen formats reduce bytes at comparable quality. WebP typically shrinks files by roughly 25–35% vs JPEG, while AVIF often beats WebP by ~30% and JPEG by up to ~60%, based on technical reviews from Cloudinary’s WebP format guide and web.dev’s AVIF learning module (2023). Smaller images generally improve LCP and bandwidth, especially on mobile.
  • Public, image‑only conversion studies are scarce. Treat third‑party claims carefully and run in‑house tests. The methodology below makes those tests trustworthy.

What to measure: exposure → engagement → conversion

You can’t attribute lift from an image change unless you know who actually saw which image, on which device, and under what performance conditions.

Track three layers:

  • Exposure: The image variant rendered or entered the viewport.
  • Engagement: Interactions like zoom, carousel next, or click‑through.
  • Conversion: Add to cart, signup, purchase—joined back to the exposure.

In 2025, design your pipeline to be privacy‑resilient and durable: rely on first‑party events, server‑side tagging, and warehouse joins.

Event schema that works (GA4 + server‑side)

GA4 doesn’t ship an “image test” schema; define custom events with parameters that matter for attribution and performance analysis. Align with GA4 limits and naming rules per the Google Analytics 4 custom events documentation and the GA4 event reference.

Recommended events

  • image_variant_view (exposure)
  • image_variant_engagement (optional)
  • image_variant_conversion (optional; usually link conversions in BI to avoid double counting)

Recommended parameters (configure as custom dimensions/metrics)

  • image_id: stable asset ID (string)
  • experiment_id: link to your A/B tool/test
  • variant: control, A, B, etc.
  • format: jpeg, webp, avif
  • bytes: file size
  • width, height: rendered dimensions
  • dpr: device pixel ratio
  • lcp_ms: page’s LCP in ms (if available)
  • device_category: desktop/mobile/tablet
  • country/region: for normalization
  • session_id or ga_session_id: join key

Client‑side vs server‑side

  • Client: fire events when the image is visible (IntersectionObserver). Debounce to avoid duplicates (see the sketch after this list).
  • Server: forward via Measurement Protocol through server‑side GTM (sGTM) to improve resilience against blockers, and enrich with geo/device where compliant. Simo Ahava’s patterns for sGTM and server‑side collection remain solid in 2024–2025; see the Simo Ahava guidance on server‑side GTM.
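
To make the client‑side bullet concrete, here is a minimal exposure‑tracking sketch in TypeScript. It assumes gtag.js is already loaded and that test images carry illustrative data-image-id, data-variant, and data-experiment-id attributes; adapt the selector and attribute names to your own markup.

// Minimal exposure tracking: fire image_variant_view once per image when it
// becomes visible. Attribute names and the visibility threshold are illustrative.
declare function gtag(...args: unknown[]): void;

const seen = new Set<string>(); // dedupe: one exposure per image per page view

function trackExposure(img: HTMLImageElement): void {
  const imageId = img.dataset.imageId ?? img.currentSrc;
  if (seen.has(imageId)) return;
  seen.add(imageId);

  // Bytes actually transferred, when a Resource Timing entry is available.
  const entries = performance.getEntriesByName(img.currentSrc);
  const entry = entries[entries.length - 1] as PerformanceResourceTiming | undefined;

  gtag('event', 'image_variant_view', {
    image_id: imageId,
    experiment_id: img.dataset.experimentId,
    variant: img.dataset.variant,
    format: img.currentSrc.split('.').pop()?.split('?')[0], // crude; verify server-side
    bytes: entry?.encodedBodySize,
    width: img.clientWidth,
    height: img.clientHeight,
    dpr: window.devicePixelRatio,
  });
}

// Count an exposure only once at least half of the image has entered the viewport.
const observer = new IntersectionObserver(
  (obsEntries) => {
    for (const e of obsEntries) {
      if (e.isIntersecting) {
        trackExposure(e.target as HTMLImageElement);
        observer.unobserve(e.target);
      }
    }
  },
  { threshold: 0.5 }
);

document
  .querySelectorAll<HTMLImageElement>('img[data-image-id]')
  .forEach((img) => observer.observe(img));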

Guardrails

  • Log the variant actually delivered (from CDN headers or URL), not just assigned. CDNs sometimes rewrite formats (e.g., auto‑WebP) and can break attribution if you don’t record what shipped (a verification sketch follows this list).
  • Capture consent state to respect privacy and interpret sample sizes correctly.
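
To implement the first guardrail on the client, one option is to read back the Content-Type the browser actually received for the image. A small sketch, assuming the asset is same-origin or the CDN permits CORS fetches; if neither holds, log the served format at the edge instead.

// Confirm what the CDN delivered, not what was assigned.
async function deliveredFormat(url: string): Promise<string | undefined> {
  try {
    // force-cache prefers the copy the browser already fetched for the <img>.
    const res = await fetch(url, { cache: 'force-cache' });
    return res.headers.get('content-type') ?? undefined; // e.g. "image/avif"
  } catch {
    return undefined; // CORS or network error: fall back to edge/CDN logs
  }
}

// Usage: attach the verified format to the exposure event before sending.
deliveredFormat(document.querySelector('img')!.currentSrc)
  .then((fmt) => console.log('served content-type:', fmt));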

Image formats, quality, and performance context

Practical take: When testing, treat “format” (JPEG/WebP/AVIF), “compression level,” and “dimensions/crop” as separate factors. Don’t bundle them unless your sample size supports multivariate designs.

A minimal, durable data model (GA4 BigQuery)

Export GA4 to BigQuery and compute lift from first principles. See the GA4 BigQuery export schema and BigQuery basic queries guide for structure.

Example: conversion rate by image variant (by user exposure)

WITH exposures AS (
  SELECT
    user_pseudo_id,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'variant') AS image_variant
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
    AND event_name = 'image_variant_view'
),
conversions AS (
  SELECT DISTINCT user_pseudo_id
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
    AND event_name = 'purchase'
)
SELECT
  e.image_variant,
  COUNT(DISTINCT e.user_pseudo_id) AS exposed_users,
  COUNT(DISTINCT IF(c.user_pseudo_id IS NOT NULL, e.user_pseudo_id, NULL)) AS converters,
  SAFE_DIVIDE(COUNT(DISTINCT IF(c.user_pseudo_id IS NOT NULL, e.user_pseudo_id, NULL)), COUNT(DISTINCT e.user_pseudo_id)) AS conversion_rate
FROM exposures e
LEFT JOIN conversions c USING (user_pseudo_id)
GROUP BY 1
ORDER BY conversion_rate DESC;

Relative lift A vs B

WITH cr AS (
  SELECT image_variant, conversion_rate FROM (/* use previous query */)
)
SELECT
  a.conversion_rate AS cr_a,
  b.conversion_rate AS cr_b,
  SAFE_DIVIDE(a.conversion_rate - b.conversion_rate, b.conversion_rate) AS relative_lift
FROM cr a
JOIN cr b ON a.image_variant = 'A' AND b.image_variant = 'B';

Segment by device

WITH exposures AS (
  SELECT
    user_pseudo_id,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'variant') AS image_variant,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'device_category') AS device
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
    AND event_name = 'image_variant_view'
)
-- Same join pattern; GROUP BY image_variant, device

Add session‑based attribution and a 7‑day window by joining on ga_session_id and filtering event_timestamp between exposure and conversion. For deeper patterns, see the GA4 user and content analysis examples.

Experiment design that avoids the usual traps

Determine power and duration before launch. Use industry‑standard guidance from the Optimizely sample size method and the CXL A/B testing guide:

  • Minimum detectable effect (MDE): Set by business value (e.g., 3% relative lift) and historical variance.
  • Sample size: Calculate per device segment if traffic is skewed (mobile vs desktop). Under‑powered tests are the #1 reason for inconclusive results (a back‑of‑envelope sketch follows this list).
  • Duration: Run at least two full business cycles; use a duration calculator (e.g., VWO duration calculator). Avoid peeking before the pre‑committed stop.
  • Randomization integrity: Ensure exposure assignment is truly random and sticky within a session/user.
  • Guardrails: Track LCP, INP, and error rates by variant; a “win” that worsens INP may backfire post‑launch. See web.dev’s business framing for CWV for why guardrails matter.
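
For a quick sanity check on sample size, here is a sketch using the common normal approximation for a two-proportion test. The function names and example numbers are illustrative; defer to your testing tool’s calculator for the official plan.

// Rough per-arm sample size for a two-proportion test (normal approximation).
function sampleSizePerArm(
  baselineRate: number,   // e.g. 0.03 for a 3% conversion rate
  relativeMde: number,    // e.g. 0.05 for a 5% relative lift
  zAlpha = 1.96,          // two-sided alpha = 0.05
  zBeta = 0.8416          // power = 0.80
): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeMde);
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p2 - p1) ** 2);
}

// Example: 3% baseline, 5% relative MDE -> about 208k users per arm.
console.log(sampleSizePerArm(0.03, 0.05));

// Duration: divide the per-arm requirement by eligible daily traffic per arm,
// then round up to whole business cycles (usually at least two weeks).
function durationDays(perArm: number, dailyUsersPerArm: number): number {
  return Math.ceil(perArm / dailyUsersPerArm);
}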

Frequentist vs Bayesian

  • Frequentist: Report p‑values and 95% confidence intervals; simple and widely understood. VWO’s resources include a clear significance spreadsheet.
  • Bayesian: Report probability‑to‑beat‑control and posterior intervals; helpful when stakeholders want intuitive probabilities. VWO provides a primer on Bayesian posteriors.

Recommendation: Use both. Share a frequentist read for auditability and a Bayesian probability for decision clarity.
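
Here is a sketch of what “both reads” can look like on the same exposure counts. It uses normal approximations for the z-test and for the probability-to-beat-control figure, which is reasonable at the sample sizes image tests usually require; the input numbers are illustrative.

// Standard normal CDF via an erf approximation (Abramowitz & Stegun 7.1.26).
function phi(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

interface Arm { conversions: number; users: number; }

function liftReport(control: Arm, variant: Arm) {
  const pA = control.conversions / control.users;
  const pB = variant.conversions / variant.users;

  // Frequentist read: pooled two-proportion z-test plus a 95% CI on the difference.
  const pooled = (control.conversions + variant.conversions) / (control.users + variant.users);
  const sePooled = Math.sqrt(pooled * (1 - pooled) * (1 / control.users + 1 / variant.users));
  const pValue = 2 * (1 - phi(Math.abs((pB - pA) / sePooled)));
  const seDiff = Math.sqrt((pA * (1 - pA)) / control.users + (pB * (1 - pB)) / variant.users);
  const ci95 = [pB - pA - 1.96 * seDiff, pB - pA + 1.96 * seDiff];

  // Bayesian-style read: P(variant > control) under normal-approximated posteriors.
  const probBeatControl = phi((pB - pA) / seDiff);

  return { pA, pB, relativeLift: (pB - pA) / pA, pValue, ci95, probBeatControl };
}

// Illustrative inputs: 2.95% vs 3.12% conversion on 100k users per arm.
console.log(liftReport({ conversions: 2950, users: 100000 }, { conversions: 3120, users: 100000 }));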

Executing image format and quality tests

Start simple, then go deeper.

  1. Single‑factor A/B (format only)
  • Hypothesis: AVIF reduces bytes by ~30% vs WebP at equal visual quality; mobile conversion improves due to faster LCP.
  • Treatment: Serve AVIF to 50% of eligible traffic with a strict accept‑header check and a control of WebP or JPEG. Ensure variant logging captures the delivered format (a serving‑rule sketch follows this list).
  • Metrics: Conversion rate by device, LCP at p75, bytes per image, error rate (fallback frequency).
  2. Compression level A/B (quality factor)
  • Hold format constant (e.g., WebP). Test “quality 80” vs “quality 60” with perceptual metrics like SSIM/Butteraugli to validate equivalence.
  • Monitor bounce, zoom engagement, and returns/complaints (if images affect expectations).
  3. Multivariate (format × crop or format × dimension)
  • Only if you have the traffic. Otherwise, sequence tests.
  4. Carousels and galleries
  • Fire exposure per frame; attribute conversion to the most recent or first exposure per your rule. Predefine this to avoid p‑hacking.
  5. PDP vs PLP
  • Test PDP hero images separately from PLP thumbnails. Thumbnails often dominate total image requests; optimizing them can yield outsized speed gains.
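
For test 1, the serving rule matters as much as the analysis. Below is a framework‑agnostic sketch of the accept‑header check and sticky assignment; the Request type is the standard Fetch API, and the URL scheme, bucket function, and variant labels are illustrative rather than any particular CDN’s API.

// Deterministic 50/50 bucket from a stable first-party ID (FNV-1a hash).
function bucket(id: string): 'avif' | 'control' {
  let h = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h % 2 === 0 ? 'avif' : 'control';
}

function chooseImage(req: Request, userId: string, baseUrl: string) {
  const acceptsAvif = (req.headers.get('accept') ?? '').includes('image/avif');
  if (!acceptsAvif) {
    // Ineligible browsers get the control format and are excluded from analysis.
    return { url: `${baseUrl}.webp`, variant: 'ineligible', format: 'image/webp' };
  }
  const variant = bucket(userId); // sticky per user, 50/50 among eligible traffic
  const ext = variant === 'avif' ? 'avif' : 'webp';
  // Log url/variant/format (e.g. via a response header) so exposure events can
  // be reconciled against what actually shipped.
  return { url: `${baseUrl}.${ext}`, variant, format: `image/${ext}` };
}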

Turning event data into trustworthy attribution

  • Cohort joins: Build exposure cohorts at the user or session level, then compute conversion within a defined window. This avoids over‑counting multiple exposures.
  • Dedup rules: One exposure per image_id per session is usually enough. If multiple images exist on a page, define a hierarchy (hero > gallery > thumbnails).
  • Attribution window: 24h for low‑consideration items; up to 7–14 days for high‑consideration purchases. Keep it consistent across tests unless there’s a business reason to change it (a minimal attribution sketch follows this list).
  • Device and geo: Always segment. Image wins often concentrate on mobile and on slower networks.
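
To make the cohort and window rules concrete, here is a small in‑memory attribution sketch. Field names and the 7‑day default are illustrative; in practice this logic normally lives in the warehouse queries shown earlier.

interface Exposure { userId: string; variant: string; ts: number; }  // ts in ms
interface Conversion { userId: string; ts: number; }

// Attribute each conversion to the user's most recent exposure within the window.
function attributeConversions(
  exposures: Exposure[],
  conversions: Conversion[],
  windowMs = 7 * 24 * 60 * 60 * 1000 // 7-day window
): Map<string, number> {
  const byUser = new Map<string, Exposure[]>();
  for (const e of exposures) {
    const list = byUser.get(e.userId) ?? [];
    list.push(e);
    byUser.set(e.userId, list);
  }

  const credited = new Map<string, number>(); // variant -> attributed conversions
  for (const c of conversions) {
    const winner = (byUser.get(c.userId) ?? [])
      .filter((e) => e.ts <= c.ts && c.ts - e.ts <= windowMs)
      .sort((a, b) => b.ts - a.ts)[0]; // most recent qualifying exposure wins
    if (winner) credited.set(winner.variant, (credited.get(winner.variant) ?? 0) + 1);
  }
  return credited;
}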

Reporting lift the way stakeholders buy into

  • Show both absolute and relative lift. Example: “Mobile conversion: 2.95% → 3.12% (+0.17 pp, +5.8% relative).”
  • Confidence and power: Add your CI and whether you hit pre‑planned sample size.
  • Guardrails: “p75 LCP improved 280 ms; INP unchanged; image errors −0.2 pp.”
  • Visuals: A simple bar chart with confidence whiskers per variant, plus a cohort breakdown (device × channel), does more than a dashboard of 30 charts.

Privacy‑first measurement in 2025

Third‑party signal keeps eroding in 2025: Chrome’s third‑party cookie plans have shifted repeatedly, Safari and Firefox already restrict cross‑site tracking, and blockers and consent requirements thin out client‑side data. Plan for durable, first‑party measurement.

  • First‑party events and IDs: Collect exposures and conversions on your domains; join via login or consented hashed email where applicable (a hashing sketch follows this list).
  • Server‑side collection: Use sGTM and Measurement Protocol to stabilize datasets; see Simo Ahava’s sGTM walkthroughs.
  • Privacy Sandbox: For paid media attribution and experimentation context, track ongoing changes. Google documents plans and testing guidance in the Privacy Sandbox phase‑out update (2024–2025) and Chrome testing docs. Some case studies show the Attribution Reporting API capturing a substantial share of conversions compared to cookies; see examples like MiQ’s ARA case study (2024).
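
Where users have consented and a durable identifier exists, a normalized hash can serve as the first‑party join key mentioned above. A minimal sketch using Web Crypto; treat the normalization rule and the storage of the key as decisions to clear with your privacy team.

// Consented, normalized, hashed email as a first-party join key.
async function hashedEmailKey(email: string): Promise<string> {
  const normalized = email.trim().toLowerCase();        // normalize before hashing
  const bytes = new TextEncoder().encode(normalized);
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');                                          // hex join key
}

// Attach the same key to exposure events and server-side conversion records so
// the warehouse join does not depend on third-party identifiers.
hashedEmailKey('User@Example.com').then((key) => console.log(key.slice(0, 12) + '…'));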

Implication: Your image lift tests should rely primarily on first‑party exposure→conversion joins. Use platform‑modeled conversions as a secondary read, not the source of truth for image variant attribution.

Common pitfalls and how to avoid them

  • CDN auto‑optimization masks variants: If your CDN auto‑converts JPEG→WebP, your “format test” may be unknowingly uniform. Log the served Content‑Type or file extension.
  • Visibility vs request: Requesting an image isn’t the same as exposure. Use viewport visibility; lazy‑loaded assets shouldn’t count unless shown.
  • Peeking and stopping early: Decide sample size and duration up front. Resist the urge to stop at p=0.049.
  • Low power on desktop: If your traffic skews mobile, you may never reach significance on desktop. Either pool traffic or run a separate desktop‑only test later.
  • Novelty and selection bias: Carousel interactions can bias who sees which image. Randomize the starting frame or attribute to first exposure consistently.
  • Data loss from blockers: Shore up with server‑side tagging and compare client vs server counts weekly.
  • Misattributing downstream effects: Better images can change user mix (e.g., more mobile search traffic due to better CWV). Control for channel when analyzing lift.

A quick failure story: We once shipped AVIF thumbnails site‑wide and saw no lift—until we realized the CDN was serving WebP to a big slice of “AVIF” traffic due to an accept‑header mismatch. After logging the actual served format and fixing headers, AVIF showed a +4–5% relative lift on mobile PDP conversion in our follow‑up test. The lesson: measure what actually shipped.

Putting it all together: a two‑week runbook

Week 1

  • Finalize hypothesis, MDE, and guardrails; pre‑register sample size and duration.
  • Implement image_variant_view with parameters: image_id, variant, format, bytes, device, lcp_ms, experiment_id, session_id.
  • Validate: confirm served Content‑Type and accept‑header behavior; ensure IntersectionObserver triggers.
  • Stand up sGTM forwarder; verify GA4 → BigQuery export is live.

Week 2

  • Launch to 50/50 traffic; sanity‑check exposure balance by device/geo after 24 hours.
  • Monitor guardrails daily (LCP, INP, image error rate); don’t peek at conversions for stopping decisions.
  • At end of duration, run the BigQuery lift queries; segment by device and channel.
  • Report absolute/relative lift with CIs, guardrails, and recommended rollout plan.


Final takeaway

Image quality work pays off when you measure what shipped, who saw it, and how it affected Core Web Vitals and conversions. Build a first‑party, event‑driven pipeline, analyze lift in your warehouse, and segment by device and channel. Do that, and you’ll stop arguing about opinions and start shipping image improvements that measurably move the business.
