Lighthouse versus real user metrics

Synthetic tests and real user metrics measure different things. Both matter. Treating them as interchangeable leads to bad decisions. Here is how to use each correctly, and why HBM Rocket reports both side by side instead of picking one.

What synthetic measures

Lighthouse, PageSpeed Insights lab data, WebPageTest, and Pingdom Tools all do the same thing: open one fresh browser with one network profile and one CPU throttle, and run one page load. The output is a controlled snapshot of that single configuration. It is close to repeatable, though scores still vary a little from run to run.

Reproducible: same site, same conditions, very similar score
Controllable: you decide the network, device, and viewport
Diagnostic: detailed breakdown of what cost what
Synthetic: not what your users experience

What real user (RUM) measures

The sources here are the Chrome User Experience Report (CrUX), web-vitals.js beacons sent from real visitors, and tools like SpeedCurve LUX or Datadog RUM. They aggregate large numbers of real page loads from real users on real devices.

Real: actual conditions your visitors face
Distributional: 50th, 75th, 95th percentile, not a single number
Lagging: 28 day rolling window for CrUX; other tools receive beacons sooner but still need several days of traffic before percentiles settle
Coarse: less actionable, harder to attribute to a specific change
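
To make "distributional" concrete, here is a minimal sketch that summarizes a handful of LCP samples as percentiles. The sample values are invented for illustration, and the nearest-rank method is one common convention, not necessarily how CrUX or any specific RUM vendor computes its percentiles.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct percent of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical LCP samples in milliseconds from ten real page loads.
lcp_ms = [1200, 1400, 1500, 1700, 1900, 2100, 2600, 3200, 4800, 7500]

summary = {p: percentile(lcp_ms, p) for p in (50, 75, 95)}
print(summary)  # {50: 1900, 75: 3200, 95: 7500}
```

The median looks healthy at 1900 ms, while the 75th and 95th percentiles show the slow tail that a single lab run never sees.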

The classic mismatch

Your lab Lighthouse score is 92. CrUX shows 65 percent of users hit poor LCP. What gives?

Three usual culprits:

Geography. Your test runs from a US data center. Half your users are in Brazil on flaky 4G. Their experience is genuinely worse.
Device. Your test uses Lighthouse's mobile profile, which emulates a mid-range Android phone (a Moto G4 in older releases, a Moto G Power in recent ones). Many real users have older Androids with half the CPU power.
Cache state. The lab test always hits a warm cache or always hits a cold cache. Real users land on a mix. If, say, 25 percent of visits arrive with a cold cache, those visits see markedly worse LCP than the rest.
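
The arithmetic behind that mismatch can be sketched in a few lines. The traffic segments, their shares, and their typical LCP values below are invented to reproduce the 65 percent figure; only the LCP thresholds (good up to 2.5 seconds, poor above 4 seconds) are the published Web Vitals values.

```python
# Web Vitals LCP thresholds in milliseconds: good up to 2500, poor above 4000.
LCP_GOOD_MS = 2500
LCP_POOR_MS = 4000

def rate_lcp(lcp_ms):
    """Bucket one LCP value the way field reports do."""
    if lcp_ms <= LCP_GOOD_MS:
        return "good"
    if lcp_ms <= LCP_POOR_MS:
        return "needs improvement"
    return "poor"

# Hypothetical segments: (label, share of page loads, typical LCP in ms).
segments = [
    ("US broadband, warm cache", 0.35, 1600),
    ("older Android, cold cache", 0.15, 4600),
    ("Brazil, flaky 4G", 0.50, 5200),
]

def rating_shares(segments):
    """Sum the traffic share that falls into each rating bucket."""
    shares = {"good": 0.0, "needs improvement": 0.0, "poor": 0.0}
    for _label, share, lcp_ms in segments:
        shares[rate_lcp(lcp_ms)] += share
    return shares

shares = rating_shares(segments)
lab_rating = rate_lcp(segments[0][2])  # the lab run resembles only the first segment

print(lab_rating)                                   # good
print({k: round(v, 2) for k, v in shares.items()})  # {'good': 0.35, 'needs improvement': 0.0, 'poor': 0.65}
```

The lab run is not wrong. It simply describes the 35 percent of traffic that looks like the test machine.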

How to use them together

Lab for diagnosis, RUM for reality

When the dashboard shows a regression, you want lab data. The Lighthouse JSON tells you exactly which audit regressed and which resource is responsible. RUM cannot tell you "this specific image is slow"; it tells you "users feel slower."
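
As a rough illustration of reading the Lighthouse JSON, the sketch below lists the lowest scoring performance audits. It relies on the report's `categories.performance.auditRefs` and `audits` fields; the report itself is a heavily trimmed, made-up sample, and `weakest_audits` is a hypothetical helper, not part of Lighthouse or HBM Rocket.

```python
import json

# Trimmed, hypothetical Lighthouse report. Real reports contain many more audits and fields.
report_json = """
{
  "categories": {"performance": {"score": 0.71, "auditRefs": [
    {"id": "largest-contentful-paint", "weight": 25},
    {"id": "total-blocking-time", "weight": 30},
    {"id": "cumulative-layout-shift", "weight": 25},
    {"id": "render-blocking-resources", "weight": 0},
    {"id": "network-requests", "weight": 0}
  ]}},
  "audits": {
    "largest-contentful-paint": {"score": 0.42, "displayValue": "4.1 s"},
    "total-blocking-time": {"score": 0.88, "displayValue": "210 ms"},
    "cumulative-layout-shift": {"score": 1, "displayValue": "0.01"},
    "render-blocking-resources": {"score": 0.5, "displayValue": "900 ms"},
    "network-requests": {"score": null}
  }
}
"""

def weakest_audits(report, limit=3):
    """Return (score, audit id, display value) for the lowest scoring performance audits."""
    audits = report["audits"]
    scored = []
    for ref in report["categories"]["performance"]["auditRefs"]:
        audit = audits.get(ref["id"])
        if audit is None or audit.get("score") is None:
            continue  # informational audits carry no score
        scored.append((audit["score"], ref["id"], audit.get("displayValue", "")))
    return sorted(scored)[:limit]

report = json.loads(report_json)
worst = weakest_audits(report)
for score, audit_id, display in worst:
    print(f"{score:.2f}  {audit_id}  {display}")
```

Each audit in a real report also carries a details section naming the individual resources involved, which is where the "this specific image is slow" answer comes from.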

RUM for the goal, lab for the work

The metric that actually matters for SEO and user satisfaction is the field metric (CrUX 75th percentile). That is the number to optimize. But the way you optimize is by reading lab data, fixing what the lab flags, and watching the field metric move over the next 28 days.

What HBM Rocket reports

The site detail page shows two views, switchable with a toggle:

Lab. Latest Lighthouse score, mobile and desktop, with the trend chart over time. Updated nightly.
Field. 28 day CrUX rolling LCP, INP, CLS at 75th percentile. Updated every few days as the CrUX dataset refreshes.

We pull CrUX automatically once a Google API key is configured: set GOOGLE_PSI_API_KEY in the control plane env to enable it. For sites with too little traffic to appear in CrUX, the agent itself can also send web-vitals beacons from your live visitors.
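
For readers who want to see what a CrUX lookup involves, here is a minimal sketch against the public CrUX API `records:queryRecord` endpoint. It is not HBM Rocket's implementation. It assumes the key in GOOGLE_PSI_API_KEY is also enabled for the CrUX API, the `demo-key` fallback and the helper names are hypothetical, and the response is an invented sample trimmed to the fields used. The request is built but deliberately not sent.

```python
import json
import os
import urllib.request

CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"

def build_crux_request(origin, api_key, form_factor="PHONE"):
    """Build (but do not send) a CrUX API query for one origin."""
    body = {
        "origin": origin,
        "formFactor": form_factor,
        "metrics": [
            "largest_contentful_paint",
            "interaction_to_next_paint",
            "cumulative_layout_shift",
        ],
    }
    return urllib.request.Request(
        f"{CRUX_ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def p75_values(response):
    """Extract the 75th percentile of every metric in a CrUX response."""
    metrics = response["record"]["metrics"]
    # CrUX returns the CLS percentile as a string, so normalize everything to float.
    return {name: float(m["percentiles"]["p75"]) for name, m in metrics.items()}

# Hypothetical response, trimmed to the fields used here.
sample_response = {"record": {"metrics": {
    "largest_contentful_paint": {"percentiles": {"p75": 3100}},
    "interaction_to_next_paint": {"percentiles": {"p75": 180}},
    "cumulative_layout_shift": {"percentiles": {"p75": "0.07"}},
}}}

api_key = os.environ.get("GOOGLE_PSI_API_KEY", "demo-key")  # placeholder fallback
request = build_crux_request("https://example.com", api_key)
field = p75_values(sample_response)
print(field)
```

To run it for real, pass `request` to `urllib.request.urlopen` and decode the JSON body. An origin with too little traffic comes back as a not found error rather than an empty record, which is the case the beacon fallback is meant to cover.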

The reading order

When optimizing a site, we do this:

Look at field LCP, INP, CLS. Decide which is worst.
Open lab data for that metric. Find the specific audit dragging it down.
Fix the audit. Trigger a reoptimize.
Confirm lab moves the next morning.
Wait for field to follow within 14 to 28 days.
Repeat for the next worst metric.
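
The first step, deciding which field metric is worst, can be sketched as a small ranking. The thresholds are the published Web Vitals boundaries for LCP, INP, and CLS. Ranking by the ratio of the 75th percentile to the "good" threshold is our own simplification for illustration, and the field values are made up.

```python
# Web Vitals thresholds at the 75th percentile: (good up to, poor above).
THRESHOLDS = {
    "LCP": (2500, 4000),  # milliseconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless
}

def rate(metric, value):
    """Classify one p75 value as good, needs improvement, or poor."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    return "needs improvement" if value <= poor else "poor"

def worst_first(field_p75):
    """Order metrics by how far each p75 sits relative to its 'good' threshold."""
    return sorted(
        field_p75,
        key=lambda metric: field_p75[metric] / THRESHOLDS[metric][0],
        reverse=True,
    )

# Hypothetical field values for one site.
field_p75 = {"LCP": 3100, "INP": 180, "CLS": 0.07}

order = worst_first(field_p75)
for metric in order:
    print(metric, field_p75[metric], rate(metric, field_p75[metric]))
```

With these numbers LCP comes first, so that is the metric whose lab audits get opened in step two.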

What synthetic still cannot tell you

How many of your real users come from low bandwidth countries
What percentage of returning visitors hit your warm cache
The actual LCP element distribution across breakpoints (lab uses one breakpoint)
Whether interaction latency happens during scroll, click, or input

For these, RUM is the only source of truth. Bake it into your monitoring or you are flying blind on the metrics that determine SEO.