I built Rubric, an open source Sentry for AI. Looking for beta testers

This is interesting. A lot of effort is going into improving visibility for AI systems (logs, traces, evaluations), but debugging them still feels fundamentally different from debugging traditional systems.

In many cases, even if you can see what happened, it’s hard to reproduce the exact execution, especially when prompts, external tools, or timing are involved.

So you end up analyzing behavior rather than actually stepping through the same failure.
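
To make "stepping through the same failure" concrete, here is a rough record/replay sketch. It logs every nondeterministic input (model output, clock read) on the live run, then feeds the log back on a second run. This is only an illustration under my own assumptions: `Recorder`, `run_agent`, and `fake_model` are hypothetical names, not part of Rubric or any real library.

```python
import json
import random
import time
from typing import Any, Callable, Optional


class Recorder:
    """Records results of nondeterministic calls, or replays them from a log."""

    def __init__(self, log: Optional[list] = None) -> None:
        self.replaying = log is not None
        self.log: list = list(log) if log is not None else []
        self._cursor = 0

    def call(self, name: str, fn: Callable[..., Any], *args: Any) -> Any:
        if self.replaying:
            # Return the recorded result instead of touching the outside world.
            entry = self.log[self._cursor]
            self._cursor += 1
            if entry["name"] != name or entry["args"] != list(args):
                raise RuntimeError(f"replay diverged at step {self._cursor - 1}")
            return entry["result"]
        result = fn(*args)
        self.log.append({"name": name, "args": list(args), "result": result})
        return result


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a nondeterministic LLM or tool call.
    return random.choice(["yes", "no", "unsure"])


def run_agent(rec: Recorder, question: str) -> str:
    # Every source of nondeterminism goes through the recorder.
    now = rec.call("clock", time.time)
    answer = rec.call("model", fake_model, question)
    return f"{answer} @ {now}"


# Live run: real calls happen and are logged.
live = Recorder()
first = run_agent(live, "is the cache warm?")
saved = json.dumps(live.log)  # could be written to disk alongside the trace

# Replay run: same code path, but inputs come from the log.
replay = Recorder(json.loads(saved))
second = run_agent(replay, "is the cache warm?")
assert first == second
```

The replay only holds if every nondeterministic input is routed through the recorder. Anything that bypasses it, such as a direct network call or a thread race, would still make the second run diverge, which is arguably the hard part in practice.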

Curious how you think about reproducibility vs observability for AI debugging?