Introducing Apollo: Benchmarking, Validation, and Calibrated Uncertainty · Posts on Ouro