
If you’ve ever shipped “better images” and struggled to prove the business impact, this playbook is for you. In 2025, images still drive a disproportionate share of perceived quality, engagement, and speed—and they often dominate Largest Contentful Paint (LCP). The good news: with the right event schema and analysis workflow, you can quantify conversion lift from image quality and format changes with confidence.
What follows is a practitioner’s guide I’ve used across eCommerce and SaaS funnels—grounded in Core Web Vitals, privacy-aware measurement, and warehouse-grade analysis.
The business case, succinctly
- LCP and responsiveness correlate with conversions. Google’s Core Web Vitals define “good” LCP (≤ 2.5 s) and INP (≤ 200 ms). On many pages the LCP element is an image, so moving these metrics can move revenue, as illustrated by the web.dev QuintoAndar INP case study (2023–2024), where an ~80% INP reduction yielded a reported +36% lift in conversions; see also the web.dev INP launch announcement (2024) for guidance.
- Next‑gen formats reduce bytes at comparable quality. WebP typically shrinks files by roughly 25–35% vs JPEG, while AVIF often beats WebP by ~30% and JPEG by up to ~60%, based on technical reviews from Cloudinary’s WebP format guide and web.dev’s AVIF learning module (2023). Smaller images generally improve LCP and bandwidth, especially on mobile.
- Public, image‑only conversion studies are scarce. Treat third‑party claims carefully and run in‑house tests. The methodology below makes those tests trustworthy.
What to measure: exposure → engagement → conversion
You can’t attribute lift from an image change unless you know who actually saw which image, on which device, and under what performance conditions.
Track three layers:
- Exposure: The image variant rendered or entered the viewport.
- Engagement: Interactions like zoom, carousel next, or click‑through.
- Conversion: Add to cart, signup, purchase—joined back to the exposure.
In 2025, design your pipeline to be privacy‑resilient and durable: rely on first‑party events, server‑side tagging, and warehouse joins.
Event schema that works (GA4 + server‑side)
GA4 doesn’t ship an “image test” schema; define custom events with parameters that matter for attribution and performance analysis. Align with GA4 limits and naming rules per the Google Analytics 4 custom events documentation and the GA4 event reference.
Recommended events
- image_variant_view (exposure)
- image_variant_engagement (optional)
- image_variant_conversion (optional; usually link conversions in BI to avoid double counting)
Recommended parameters (configure as custom dimensions/metrics)
- image_id: stable asset ID (string)
- experiment_id: link to your A/B tool/test
- variant: control, A, B, etc.
- format: jpeg, webp, avif
- bytes: file size
- width, height: rendered dimensions
- dpr: device pixel ratio
- lcp_ms: page’s LCP in ms (if available)
- device_category: desktop/mobile/tablet
- country/region: for normalization
- session_id or ga_session_id: join key
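Once these events land in the GA4 BigQuery export (covered below), each parameter surfaces under event_params. A minimal flattening sketch, assuming the parameter names above and that bytes, lcp_ms, and ga_session_id are sent as integers:
-- One row per exposure event, with the custom parameters pulled into columns.
SELECT
  user_pseudo_id,
  event_timestamp,
  (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'image_id') AS image_id,
  (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'experiment_id') AS experiment_id,
  (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'variant') AS variant,
  (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'format') AS format,
  (SELECT ep.value.int_value FROM UNNEST(event_params) ep WHERE ep.key = 'bytes') AS bytes,
  (SELECT ep.value.int_value FROM UNNEST(event_params) ep WHERE ep.key = 'lcp_ms') AS lcp_ms,
  (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'device_category') AS device_category,
  (SELECT ep.value.int_value FROM UNNEST(event_params) ep WHERE ep.key = 'ga_session_id') AS ga_session_id
FROM `project.dataset.events_*`
WHERE event_name = 'image_variant_view';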
Client‑side vs server‑side
- Client: fire events when the image is visible (IntersectionObserver). Debounce to avoid duplicates.
- Server: forward via Measurement Protocol through server‑side GTM (sGTM) to improve resilience against blockers, and enrich with geo/device where compliant. Simo Ahava’s patterns for sGTM and server‑side collection remain solid in 2024–2025; see his guidance on server‑side GTM.
Guardrails
- Log the variant actually delivered (from CDN headers or URL), not just assigned. CDNs sometimes rewrite formats (e.g., auto‑WebP) and can break attribution if you don’t record what shipped.
- Capture consent state to respect privacy and interpret sample sizes correctly.
Image formats, quality, and performance context
- Core Web Vitals thresholds and definitions are documented in the Google Developers Core Web Vitals overview and PageSpeed Insights metric details. As of March 12, 2024, INP replaced FID with “good” ≤ 200 ms per the web.dev INP update (2024).
- AVIF and WebP support is now broad across modern browsers. For pros/cons and quality‑to‑bytes trade‑offs, reference web.dev’s AVIF guide and Cloudinary’s WebP guide.
Practical take: When testing, treat “format” (JPEG/WebP/AVIF), “compression level,” and “dimensions/crop” as separate factors. Don’t bundle them unless your sample size supports multivariate designs.
A minimal, durable data model (GA4 + BigQuery)
Export GA4 to BigQuery and compute lift from first principles. See the GA4 BigQuery export schema and BigQuery basic queries guide for structure.
Example: conversion rate by image variant (by user exposure)
WITH exposures AS (
  SELECT
    user_pseudo_id,
    -- 'variant' matches the parameter name in the event schema above
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'variant') AS image_variant
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
    AND event_name = 'image_variant_view'
),
conversions AS (
  SELECT DISTINCT user_pseudo_id
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
    AND event_name = 'purchase'
)
SELECT
  e.image_variant,
  COUNT(DISTINCT e.user_pseudo_id) AS exposed_users,
  COUNT(DISTINCT IF(c.user_pseudo_id IS NOT NULL, e.user_pseudo_id, NULL)) AS converters,
  SAFE_DIVIDE(
    COUNT(DISTINCT IF(c.user_pseudo_id IS NOT NULL, e.user_pseudo_id, NULL)),
    COUNT(DISTINCT e.user_pseudo_id)
  ) AS conversion_rate
FROM exposures e
LEFT JOIN conversions c USING (user_pseudo_id)
GROUP BY 1
ORDER BY conversion_rate DESC;
Relative lift A vs B
WITH cr AS (
  SELECT image_variant, conversion_rate
  FROM (/* use previous query */)
)
SELECT
  a.conversion_rate AS cr_a,
  b.conversion_rate AS cr_b,
  SAFE_DIVIDE(a.conversion_rate - b.conversion_rate, b.conversion_rate) AS relative_lift
FROM (SELECT conversion_rate FROM cr WHERE image_variant = 'A') a
CROSS JOIN (SELECT conversion_rate FROM cr WHERE image_variant = 'B') b;
Segment by device
WITH exposures AS (
  SELECT
    user_pseudo_id,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'variant') AS image_variant,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'device_category') AS device
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
    AND event_name = 'image_variant_view'
)
-- Reuse the conversions CTE and join from the first query, then GROUP BY image_variant, device.
For session‑scoped attribution, extract ga_session_id as well and join on it; for a 7‑day window, keep event_timestamp on both sides and require the conversion to land within seven days of the exposure (a sketch follows). For deeper patterns, see the GA4 user and content analysis examples.
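A minimal sketch of the windowed version, under the same schema assumptions as above: take each user’s first exposure per variant, then count a conversion only if a purchase lands within 7 days of that exposure.
WITH raw AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'variant') AS image_variant
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
    AND event_name = 'image_variant_view'
),
exposures AS (
  SELECT user_pseudo_id, image_variant, MIN(event_timestamp) AS first_exposure_ts
  FROM raw
  GROUP BY 1, 2
),
purchases AS (
  SELECT user_pseudo_id, event_timestamp AS purchase_ts
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250207'  -- extend past the exposure window
    AND event_name = 'purchase'
)
SELECT
  e.image_variant,
  COUNT(DISTINCT e.user_pseudo_id) AS exposed_users,
  COUNT(DISTINCT IF(
    p.purchase_ts BETWEEN e.first_exposure_ts
      AND e.first_exposure_ts + 7 * 24 * 60 * 60 * 1000000,  -- event_timestamp is in microseconds
    e.user_pseudo_id, NULL)) AS converters_7d
FROM exposures e
LEFT JOIN purchases p USING (user_pseudo_id)
GROUP BY 1;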
Experiment design that avoids the usual traps
Determine power and duration before launch. Use industry‑standard guidance from the Optimizely sample size method and the CXL A/B testing guide:
- Minimum detectable effect (MDE): Set by business value (e.g., 3% relative lift) and historical variance.
- Sample size: Calculate per device segment if traffic is skewed (mobile vs desktop); a back‑of‑the‑envelope sketch follows this list. Under‑powered tests are the #1 reason for inconclusive results.
- Duration: Run at least two full business cycles; use a duration calculator (e.g., VWO duration calculator). Avoid peeking before the pre‑committed stop.
- Randomization integrity: Ensure exposure assignment is truly random and sticky within a session/user.
- Guardrails: Track LCP, INP, and error rates by variant; a “win” that worsens INP may backfire post‑launch. See web.dev’s business framing for CWV for why guardrails matter.
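To make the power conversation concrete, here is a back‑of‑the‑envelope sketch using the standard two‑proportion approximation at 95% confidence and 80% power; the baseline rate and the +3% relative MDE are placeholders to swap for your own numbers.
-- Required users per variant: n ≈ (z_alpha/2 + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
-- z_alpha/2 = 1.96 (95% two-sided), z_beta = 0.84 (80% power).
WITH params AS (
  SELECT
    0.03 AS p1,          -- baseline conversion rate (assumption)
    0.03 * 1.03 AS p2    -- baseline uplifted by a +3% relative MDE (assumption)
)
SELECT
  p1,
  p2,
  CAST(CEIL(
    POW(1.96 + 0.84, 2) * (p1 * (1 - p1) + p2 * (1 - p2)) / POW(p2 - p1, 2)
  ) AS INT64) AS users_per_variant
FROM params;
With a 3% baseline and a 3% relative MDE this lands on the order of 570k users per variant, which is exactly why under‑powered image tests are so common. A dedicated calculator may differ in the details (sequential methods, one‑ vs two‑sided), but the order of magnitude is the point.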
Frequentist vs Bayesian
- Frequentist: Report p‑values and 95% confidence intervals; simple and widely understood. VWO’s resources include a clear significance spreadsheet.
- Bayesian: Report probability‑to‑beat‑control and posterior intervals; helpful when stakeholders want intuitive probabilities. VWO provides a primer on Bayesian posteriors.
Recommendation: Use both. Share a frequentist read for auditability and a Bayesian probability for decision clarity.
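For the frequentist read, a minimal sketch that turns per‑variant exposure and converter counts into absolute lift, relative lift, a z statistic, and a 95% CI on the difference. The counts CTE is an illustrative stand‑in for the output of the earlier lift query; |z| > 1.96 corresponds to significance at the two‑sided 5% level.
-- Two-proportion z-test (pooled SE for z, unpooled SE for the CI).
WITH counts AS (
  SELECT 'A' AS variant, 50000 AS exposed, 1560 AS converters UNION ALL
  SELECT 'B', 50000, 1475  -- illustrative numbers, not real data
),
rates AS (
  SELECT
    MAX(IF(variant = 'A', converters / exposed, NULL)) AS cr_a,
    MAX(IF(variant = 'B', converters / exposed, NULL)) AS cr_b,
    MAX(IF(variant = 'A', exposed, NULL)) AS n_a,
    MAX(IF(variant = 'B', exposed, NULL)) AS n_b,
    SUM(converters) / SUM(exposed) AS cr_pooled
  FROM counts
)
SELECT
  cr_a,
  cr_b,
  cr_a - cr_b AS abs_lift,
  SAFE_DIVIDE(cr_a - cr_b, cr_b) AS relative_lift,
  (cr_a - cr_b)
    / SQRT(cr_pooled * (1 - cr_pooled) * (1 / n_a + 1 / n_b)) AS z_stat,
  (cr_a - cr_b) - 1.96 * SQRT(cr_a * (1 - cr_a) / n_a + cr_b * (1 - cr_b) / n_b) AS ci_low,
  (cr_a - cr_b) + 1.96 * SQRT(cr_a * (1 - cr_a) / n_a + cr_b * (1 - cr_b) / n_b) AS ci_high
FROM rates;
The Bayesian read (probability‑to‑beat‑control from Beta posteriors) is easier to produce in a notebook than in SQL; pair it with this output rather than replacing it.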
Executing image format and quality tests
Start simple, then go deeper.
- Single‑factor A/B (format only)
- Hypothesis: AVIF reduces bytes by ~30% vs WebP at equal visual quality; mobile conversion improves due to faster LCP.
- Treatment: Serve AVIF to 50% of eligible traffic with a strict accept‑header check and a control of WebP or JPEG. Ensure variant logging captures the delivered format.
- Metrics: Conversion rate by device, p75 LCP, bytes per image, error rate (fallback frequency); a warehouse sketch for these guardrail metrics follows this list.
- Compression level A/B (quality factor)
- Hold format constant (e.g., WebP). Test “quality 80” vs “quality 60” with perceptual metrics like SSIM/Butteraugli to validate equivalence.
- Monitor bounce, zoom engagement, and returns/complaints (if images affect expectations).
- Multivariate (format × crop or format × dimension)
- Only if you have traffic. Otherwise, sequence tests.
- Carousels and galleries
- Fire exposure per frame; attribute conversion to the most recent or first exposure per your rule. Predefine this to avoid p‑hacking.
- PDP vs PLP
- Test PDP hero images separately from PLP thumbnails. Thumbnails often dominate total image requests; their optimization can yield outsized speed gains.
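Because the exposure event carries format, bytes, and lcp_ms, the guardrail metrics above can come straight from the same table. A sketch under the schema assumptions from earlier (integer bytes and lcp_ms parameters); grouping by the delivered format also surfaces CDN rewrites that would otherwise contaminate a format test.
-- Exposure counts, p75 LCP, and median bytes by variant, delivered format, and device.
WITH exposures AS (
  SELECT
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'variant') AS variant,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'format') AS format,
    (SELECT ep.value.int_value FROM UNNEST(event_params) ep WHERE ep.key = 'bytes') AS bytes,
    (SELECT ep.value.int_value FROM UNNEST(event_params) ep WHERE ep.key = 'lcp_ms') AS lcp_ms,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'device_category') AS device
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
    AND event_name = 'image_variant_view'
)
SELECT
  variant,
  format,  -- the format actually delivered, not the one assigned
  device,
  COUNT(*) AS exposures,
  APPROX_QUANTILES(lcp_ms, 100)[OFFSET(75)] AS p75_lcp_ms,
  APPROX_QUANTILES(bytes, 100)[OFFSET(50)] AS median_bytes
FROM exposures
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;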
Turning event data into trustworthy attribution
- Cohort joins: Build exposure cohorts at the user or session level, then compute conversion within a defined window. This avoids over‑counting multiple exposures.
- Dedup rules: One exposure per image_id per session is usually enough (see the sketch after this list). If multiple images exist on a page, define a hierarchy (hero > gallery > thumbnails).
- Attribution window: 24h for low‑consideration items; up to 7–14 days for high‑consideration. Keep it consistent across tests unless there’s a business reason to change it.
- Device and geo: Always segment. Image wins often concentrate on mobile and on slower networks.
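A dedup sketch for the rule above, keeping the first exposure per user, session, and image_id; it assumes ga_session_id is sent as an integer event parameter as in the schema section.
-- One exposure per user x session x image_id; the earliest event wins.
WITH raw AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    (SELECT ep.value.int_value FROM UNNEST(event_params) ep WHERE ep.key = 'ga_session_id') AS ga_session_id,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'image_id') AS image_id,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'variant') AS image_variant
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
    AND event_name = 'image_variant_view'
)
SELECT *
FROM raw
WHERE image_id IS NOT NULL
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_pseudo_id, ga_session_id, image_id
  ORDER BY event_timestamp
) = 1;
The same pattern handles carousel attribution: drop image_id from the PARTITION BY and the first frame a user saw wins, which matches a pre‑registered first‑exposure rule.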
Reporting lift in a way stakeholders buy into
- Show both absolute and relative lift. Example: “Mobile conversion: 2.95% → 3.12% (+0.17 pp, +5.8% relative).”
- Confidence and power: Add your CI and whether you hit pre‑planned sample size.
- Guardrails: “p75 LCP improved 280 ms; INP unchanged; image errors −0.2 pp.”
- Visuals: A simple bar chart with confidence whiskers per variant, plus a cohort breakdown (device × channel), does more than a dashboard of 30 charts.
Privacy‑first measurement in 2025
Chrome’s third‑party cookie phase‑out has been repeatedly delayed and revised through 2024–2025, and cross‑site tracking fidelity continues to erode across browsers and regulations. Plan for durable, first‑party measurement.
- First‑party events and IDs: Collect exposures and conversions on your domains; join via login or consented hashed email where applicable.
- Server‑side collection: Use sGTM and Measurement Protocol to stabilize datasets; see Simo Ahava’s sGTM walkthroughs.
- Privacy Sandbox: For paid media attribution and experimentation context, track ongoing changes. Google documents plans and testing guidance in the Privacy Sandbox phase‑out update (2024–2025) and Chrome testing docs. Some case studies show the Attribution Reporting API capturing a substantial share of conversions compared to cookies; see examples like MiQ’s ARA case study (2024).
Implication: Your image lift tests should rely primarily on first‑party exposure→conversion joins. Use platform‑modeled conversions as a secondary read, not the source of truth for image variant attribution.
Common pitfalls and how to avoid them
- CDN auto‑optimization masks variants: If your CDN auto‑converts JPEG→WebP, your “format test” may be unknowingly uniform. Log the served Content‑Type or file extension.
- Visibility vs request: Requesting an image isn’t the same as exposure. Use viewport visibility; lazy‑loaded assets shouldn’t count unless shown.
- Peeking and stopping early: Decide sample size and duration up front. Resist the urge to stop at p=0.049.
- Low power on desktop: If your traffic skews mobile, you may never reach significance on desktop. Either pool traffic or run a separate desktop‑only test later.
- Novelty and selection bias: Carousel interactions can bias who sees which image. Randomize the starting frame or attribute to first exposure consistently.
- Data loss from blockers: Shore up with server‑side tagging and compare client vs server counts weekly.
- Misattributing downstream effects: Better images can change user mix (e.g., more mobile search traffic due to better CWV). Control for channel when analyzing lift.
A quick failure story: We once shipped AVIF thumbnails site‑wide and saw no lift—until we realized the CDN was serving WebP to a big slice of “AVIF” traffic due to an accept‑header mismatch. After logging the actual served format and fixing headers, AVIF showed a +4–5% relative lift on mobile PDP conversion in our follow‑up test. The lesson: measure what actually shipped.
Putting it all together: a two‑week runbook
Week 1
- Finalize hypothesis, MDE, and guardrails; pre‑register sample size and duration.
- Implement image_variant_view with parameters: image_id, variant, format, bytes, device_category, lcp_ms, experiment_id, ga_session_id.
- Validate: confirm served Content‑Type and accept‑header behavior; ensure IntersectionObserver triggers.
- Stand up sGTM forwarder; verify GA4 → BigQuery export is live.
Week 2
- Launch to 50/50 traffic; sanity‑check exposure balance by device/geo after 24 hours (see the balance‑check sketch after this runbook).
- Monitor guardrails daily (LCP, INP, image error rate); don’t peek at conversions for stopping decisions.
- At end of duration, run the BigQuery lift queries; segment by device and channel.
- Report absolute/relative lift with CIs, guardrails, and recommended rollout plan.
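A sketch of that day‑one balance check, counting exposed users per variant within each device segment; with a 50/50 split, a share that drifts well away from 0.5 (a sample‑ratio mismatch) usually signals an instrumentation or delivery bug rather than a real effect. The single‑day table name is illustrative.
-- Exposure balance by device and variant for one day (assumes a 50/50 assignment).
WITH exposures AS (
  SELECT
    user_pseudo_id,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'variant') AS variant,
    (SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'device_category') AS device
  FROM `project.dataset.events_20250115`  -- replace with the launch day's table
  WHERE event_name = 'image_variant_view'
)
SELECT
  device,
  variant,
  COUNT(DISTINCT user_pseudo_id) AS exposed_users,
  SAFE_DIVIDE(
    COUNT(DISTINCT user_pseudo_id),
    SUM(COUNT(DISTINCT user_pseudo_id)) OVER (PARTITION BY device)
  ) AS share_of_device
FROM exposures
GROUP BY 1, 2
ORDER BY 1, 2;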
Reference list for further implementation
- Core Web Vitals thresholds and definitions: Google Developers Core Web Vitals (2024–2025)
- INP replacing FID and business impact context: web.dev INP launch and guidance (2024); INP March 12 update (2024)
- Case study linking responsiveness and conversions: web.dev QuintoAndar INP case study (2023–2024)
- Image format guidance: web.dev AVIF learning path (2023); Cloudinary WebP format guide (ongoing)
- GA4 custom events and limits: GA4 custom events documentation (2024)
- GA4 BigQuery export schema and examples: GA4 BigQuery export (2024); BigQuery basic queries (2024); Content analysis patterns
- Server‑side tagging patterns: Simo Ahava on sGTM for GA4/Measurement Protocol
- Experiment design fundamentals: Optimizely on sample size (evergreen); CXL A/B testing guide (updated); VWO duration calculator and significance spreadsheet (2024); VWO Bayesian posterior overview
- Privacy‑aware measurement: Privacy Sandbox phase‑out update (2024–2025); Chrome testing guidance (2025); MiQ ARA case study (2024)
Final takeaway
Image quality work pays off when you measure what shipped, who saw it, and how it affected Core Web Vitals and conversions. Build a first‑party, event‑driven pipeline, analyze lift in your warehouse, and segment by device and channel. Do that, and you’ll stop arguing about opinions and start shipping image improvements that measurably move the business.