You can measure and observe, but reproducing the same execution path (same load, timing, interactions) is usually the hard part.
So debugging becomes narrowing hypotheses rather than directly verifying what happened.
It feels like this is where most debugging workflows still fall short.
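
To make the gap concrete, here is a rough sketch of one way to get closer to direct verification: record every nondeterministic input during the original run, then feed the same values back on replay. Everything in it is made up for illustration (`Recorder`, `Replayer`, `handle_request`, and the `env` parameter are hypothetical names, not from any real tool), and it only covers time and randomness in a single thread, not load or cross-service interactions.

```python
import random
import time


class Recorder:
    """Wraps nondeterministic sources and logs every value they return."""

    def __init__(self):
        self.log = []

    def now(self):
        value = time.monotonic()
        self.log.append(("now", value))
        return value

    def rand(self):
        value = random.random()
        self.log.append(("rand", value))
        return value


class Replayer:
    """Returns previously recorded values in the order they were logged."""

    def __init__(self, log):
        self._events = iter(log)

    def _next(self, kind):
        recorded_kind, value = next(self._events)
        if recorded_kind != kind:
            # The replayed run asked for a different input than the original did.
            raise RuntimeError(f"diverged: expected {recorded_kind}, got {kind}")
        return value

    def now(self):
        return self._next("now")

    def rand(self):
        return self._next("rand")


def handle_request(env):
    """Hypothetical workload whose path depends on timing and randomness."""
    start = env.now()
    path = []
    for _ in range(5):
        path.append("fast" if env.rand() < 0.5 else "slow")
    elapsed = env.now() - start
    return path, elapsed


recorder = Recorder()
original = handle_request(recorder)

# Replaying the log reproduces the same branch decisions and timing values.
replayed = handle_request(Replayer(recorder.log))
assert original == replayed
```

The catch is that this only works if every nondeterministic input goes through the wrapper, which is exactly what is hard to guarantee in a real system.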