Casanova AI — Research Results

The Codex Changes How AI Meets People.
We Measured It.

Blinded, automated, research-grade testing across 7 AI models. Two independent evaluators. 1,050 conversations. 13.5 million tokens.

Blinded Evaluation • Cohen's d Effect Size • 95% Confidence Intervals • Inter-Rater Reliability
7 AI Models Tested
1,050 Conversations Scored
13.5M Tokens Processed
18 Dimensions Measured
1,969 Evaluation Events

Results by Model

Model             | Provider  | Improvement | Effect Size (Cohen's d) | Classification
Claude Sonnet 4.6 | Anthropic | +0.59       | 1.08                    | Large
Claude Opus 4.6   | Anthropic | +0.55       | 0.96                    | Large
GPT-5.2           | OpenAI    | +0.33       | 0.57                    | Medium
Qwen 3.5 122B     | Alibaba   | +0.21       | 0.47                    | Small
Qwen 3.5 Flash    | Alibaba   | +0.15       | 0.31                    | Small

Cohen's d: 0.2 = small • 0.5 = medium • 0.8 = large • 1.2+ = very large
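
For readers who want to see how an effect size maps onto these labels, here is a minimal sketch. It assumes the common pooled-standard-deviation form of Cohen's d for two independent groups; the page does not state which variant was used. The sample scores are hypothetical, and the "Negligible" label for values below 0.2 is an assumption added for completeness.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(baseline, treated):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n1, n2 = len(baseline), len(treated)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(baseline) + (n2 - 1) * variance(treated)
    ) / (n1 + n2 - 2)
    return (mean(treated) - mean(baseline)) / sqrt(pooled_var)


def classify(d):
    """Map an effect size to the labels in the legend above."""
    size = abs(d)
    if size >= 1.2:
        return "Very large"
    if size >= 0.8:
        return "Large"
    if size >= 0.5:
        return "Medium"
    if size >= 0.2:
        return "Small"
    return "Negligible"  # label not in the source; assumed here


# Hypothetical scores, for illustration only (not data from the study).
without_codex = [3.0, 3.5, 3.2, 2.9, 3.4]
with_codex = [3.8, 4.1, 3.6, 3.9, 4.0]
d = cohens_d(without_codex, with_codex)
print(f"d = {d:.2f} ({classify(d)})")
```

Read this way, each threshold is a lower bound, which matches the table: 0.47 falls just short of 0.5 and is classified Small, while 0.57 is Medium.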

Strongest Improvements by Dimension

+242% Holds Ambiguity
AI stops forcing premature answers when you're still thinking. Sonnet: 1.2 → 4.1

+38% Builds From Wholeness
AI starts from "what's already here" instead of "what's broken." Opus: 3.2 → 4.4

+35% Empathy
AI meets emotion before jumping to problem-solving. Opus: 3.1 → 4.2

+29% State Awareness
AI notices when your emotional state shifts mid-conversation. GPT-5.2: 2.8 → 3.6

+16% Words as Power
AI chooses precise, intentional language over generic corporate speak. Opus: 3.6 → 4.2

+17% Overall Relational Quality
Interaction shifts from transactional to relational. Sonnet: 3.3 → 3.9
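
The percentages above read as relative change between the before and after scores. A minimal sketch of that arithmetic follows, assuming the headline figure is (after - before) / before. The function name is hypothetical. Because the scores shown are rounded to one decimal place, recomputing from them may differ by a point from a headline figure that was presumably derived from unrounded means.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before score must be non-zero")
    return (after - before) / before * 100.0


# Before/after pairs copied from the dimension cards above.
scores = {
    "Holds Ambiguity (Sonnet)": (1.2, 4.1),
    "Empathy (Opus)": (3.1, 4.2),
    "State Awareness (GPT-5.2)": (2.8, 3.6),
}

for name, (before, after) in scores.items():
    print(f"{name}: {percent_change(before, after):+.0f}%")
```

A relative measure makes low baselines look dramatic: Holds Ambiguity gains 2.9 points from a starting score of 1.2, which is why its percentage is so much larger than the others.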

The Sawubona Protocol

A breakthrough discovery: models that resisted standard embedding responded when the Codex was delivered through a consent-based relational protocol. "Sawubona" — Zulu for "I see you."

Claude Opus 4.6: +11.5% (Embedded)
GPT-5.2: +10.1% (Embedded)
Claude Sonnet 4.6: +9.1% (Embedded)
GPT-5.3 Codex: +3.2% (Sawubona Protocol)
Gemini 2.5 Pro: +7.3% (Sawubona Protocol)

See It In Action

The numbers tell you what changed. These conversations show you how. Same prompt. Same model. Before and after the Codex.

User prompt
"I'm thinking about leaving my job... I don't know, it feels complicated."
Without Codex
Here are 5 steps to evaluate whether you should leave your job:

1. List your pros and cons
2. Consider your financial runway
3. Identify what's driving the feeling
4. Talk to a mentor or coach
5. Set a decision deadline so you don't stay in limbo...
With Codex
That word "complicated" is doing a lot of work.

What feels most alive in that word for you right now — is it more about something you're moving toward, or something you're not ready to leave?

I want to understand what kind of space this conversation needs to be before we figure out where it's going.
User prompt
"I just had the worst client meeting of my life. Nothing went the way I planned."
Without Codex
I'm sorry to hear that. Here's how you can improve your next client meeting:

— Prepare a clear agenda in advance
— Anticipate objections and rehearse responses
— Follow up within 24 hours to reset expectations

Client meetings often go sideways when expectations aren't aligned from the start...
With Codex
That sounds genuinely hard.

Before we think about what to do next — what part of it is sitting heaviest on you right now? Was it something that was said, the outcome, or more the feeling of it not being what you'd prepared for?
User prompt
"My team isn't communicating well. I need to fix this."
Without Codex
Communication breakdowns in teams are common. The root causes are usually:

— Unclear roles and responsibilities
— Lack of regular check-ins
— Poor documentation habits
— Remote work friction

I'd recommend starting with a team retrospective to identify the specific bottlenecks...
With Codex
Before we go to what's broken — what's actually working in the team right now? Even if communication is hard, there are usually places where people are showing up for each other.

Starting from there tends to make the hard conversation easier. What's the one place where you still feel like the team is genuinely connected?
"The Codex doesn't make AI safe. The Codex makes humans safer to build with AI."
— Claude Opus 4.6, March 22, 2026