2025
Chain-of-Thought prompting is a technique for getting better results from large language models by asking them to show their reasoning step by step. This post explains how we use it in our Diversity Tokenism research.
2025
Why LLMs ace easy sums but fail at 4-digit multiplication, and what that means for finance, audit, and tax teams using GenAI.
2025
43 carefully designed tests show GPT variants drifting in accuracy over time. Teams need continuous evaluation, not blind trust in a model label.
2025
Adding timestamps for audit trails drops accuracy by 10%. The compliance mechanism undermines the output being audited.