Tiered theory and experiment screening pipeline as first test case for automatic reasoning calibration

Summary
This report presents a tiered theory and experiment screening pipeline, used as the first test case for automatic reasoning calibration.