
Multi-Model Comparison

Compare outputs from different Claude models side-by-side to find the best fit for your use case.

Difficulty: Beginner · Time: 5 minutes

What You'll Build

Use Sciorex's parallel chat feature to:

  • Run the same prompt across multiple models
  • Compare response quality, speed, and style
  • Make informed decisions about which model to use

*(Screenshot TBD: parallel chats comparing models side-by-side)*

Prerequisites

Method 1: Multi-Chat Launcher

The fastest way to compare models.

Step 1: Open Multi-Chat Launcher

  1. Go to Chat in the sidebar
  2. Click the Multi-Chat button (or press Ctrl/Cmd + Shift + N)

Step 2: Configure Variants

Create three chat variants:

| Variant | Model | Thinking |
| --- | --- | --- |
| 1 | Opus 4.5 | Think Hard |
| 2 | Sonnet 4.5 | Think |
| 3 | Haiku 4.5 | Off |

Step 3: Run the Comparison

  1. Enter your prompt in the shared input
  2. Click Send to All
  3. Watch responses stream in side-by-side

Step 4: Analyze Results

Compare:

  • Quality: Which answer is most accurate/complete?
  • Speed: How fast did each model respond?
  • Style: Which writing style fits your needs?
  • Cost: Consider the token usage difference
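Cost is the easiest of these to quantify. Here is a minimal TypeScript sketch of per-response cost estimation; the per-million-token prices are placeholders (check Anthropic's pricing page for current numbers), and the token counts are whatever your chat UI reports:

```ts
// Placeholder per-million-token prices in USD; substitute current
// values from Anthropic's pricing page before relying on these.
const PRICES: Record<string, { input: number; output: number }> = {
  "opus-4.5":   { input: 5, output: 25 },
  "sonnet-4.5": { input: 3, output: 15 },
  "haiku-4.5":  { input: 1, output: 5 },
};

// Cost of a single response, given the token counts the chat reports.
function responseCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// Example: the same 500-token prompt, with typical response lengths.
for (const [model, outputTokens] of [
  ["opus-4.5", 900],
  ["sonnet-4.5", 600],
  ["haiku-4.5", 250],
] as const) {
  console.log(model, `$${responseCost(model, 500, outputTokens).toFixed(4)}`);
}
```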

Method 2: Using Agents

Create a dedicated agent for each model, each with its own system prompt, and compare them side-by-side.

Create Model-Specific Agents

Opus Agent:

```yaml
name: Opus Evaluator
description: Uses Opus for maximum quality
model: claude-opus-4-5-20251101
thinkingLevel: think-hard
systemPrompt: |
  Provide thorough, well-reasoned responses.
  Take your time to analyze all aspects.
```

Sonnet Agent:

```yaml
name: Sonnet Evaluator
description: Uses Sonnet for balanced performance
model: claude-sonnet-4-5-20250929
thinkingLevel: think
systemPrompt: |
  Provide clear, practical responses.
  Balance depth with efficiency.
```

Haiku Agent:

```yaml
name: Haiku Evaluator
description: Uses Haiku for speed
model: claude-haiku-4-5-20251001
thinkingLevel: "off"  # quoted so YAML parsers don't read it as boolean false
systemPrompt: |
  Provide concise, direct responses.
  Prioritize speed and clarity.
```

Run Parallel Sessions

  1. Use Multi-Chat Launcher
  2. Select each agent for a different variant
  3. Compare outputs

Comparison Criteria

For Code Generation

| Criteria | What to Look For |
| --- | --- |
| Correctness | Does the code work? |
| Completeness | Are edge cases handled? |
| Style | Is it idiomatic? Well-structured? |
| Efficiency | Is the algorithm optimal? |
| Documentation | Are comments helpful? |

For Analysis Tasks

| Criteria | What to Look For |
| --- | --- |
| Depth | How thorough is the analysis? |
| Accuracy | Are facts correct? |
| Reasoning | Is the logic sound? |
| Actionability | Are recommendations practical? |

For Creative Tasks

| Criteria | What to Look For |
| --- | --- |
| Originality | Fresh ideas or generic? |
| Voice | Appropriate tone? |
| Coherence | Logical flow? |
| Engagement | Interesting to read? |
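To keep these judgments from drifting into gut feeling, it can help to score each response against the relevant rubric and tally the results. A small sketch, where the criteria names and 1-5 scores are invented purely for illustration:

```ts
type Scores = Record<string, number>; // criterion -> 1..5 rating

// Example ratings for each model's response (made up for illustration).
const results: Record<string, Scores> = {
  opus:   { correctness: 5, completeness: 5, style: 4, efficiency: 4, documentation: 5 },
  sonnet: { correctness: 5, completeness: 4, style: 4, efficiency: 4, documentation: 4 },
  haiku:  { correctness: 4, completeness: 3, style: 4, efficiency: 4, documentation: 3 },
};

// Rank models by total score across all criteria.
const ranked = Object.entries(results)
  .map(([model, scores]) =>
    [model, Object.values(scores).reduce((sum, s) => sum + s, 0)] as const)
  .sort((a, b) => b[1] - a[1]);

console.log(ranked); // e.g. [["opus", 23], ["sonnet", 21], ["haiku", 18]]
```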

Example: Code Review Comparison

Prompt:

Review this function for bugs and improvements:

```js
function calculateDiscount(price, discount) {
  return price - (price * discount);
}
```

Opus Response:

Detailed analysis with edge cases (negative prices, discount > 1), type safety concerns, precision issues with floating point, and suggested improvements with TypeScript...

Sonnet Response:

Solid review covering main issues: no input validation, potential NaN, suggests adding bounds checking...

Haiku Response:

Quick points: add validation, consider rounding, handle edge cases.

Conclusion: For thorough code reviews, Opus provides the most comprehensive analysis, Sonnet offers a good balance of depth and speed, and Haiku is suitable for quick sanity checks.
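For reference, the improvements all three reviews point toward might look something like this sketch (assuming price is in dollars and discount is a fraction between 0 and 1):

```ts
// Hardened version incorporating the reviews' suggestions:
// input validation, bounds checking, and rounding to cents.
function calculateDiscount(price: number, discount: number): number {
  if (!Number.isFinite(price) || price < 0) {
    throw new RangeError(`price must be a non-negative number, got ${price}`);
  }
  if (!Number.isFinite(discount) || discount < 0 || discount > 1) {
    throw new RangeError(`discount must be between 0 and 1, got ${discount}`);
  }
  // Round to cents to avoid floating-point artifacts like 0.30000000000000004.
  return Math.round(price * (1 - discount) * 100) / 100;
}
```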

When to Use Each Model

Based on your comparisons, build a mental model:

| Task Type | Recommended Model |
| --- | --- |
| Critical decisions | Opus |
| Daily coding | Sonnet |
| Quick lookups | Haiku |
| Complex debugging | Opus |
| Documentation | Sonnet |
| Formatting/refactoring | Haiku |

Saving Your Findings

Create a reference ticket documenting your findings:

```yaml
title: Model Selection Guidelines
type: documentation
description: |
  ## Code Review
  - Use Opus for security-critical code
  - Use Sonnet for regular PRs

  ## Research
  - Use Opus for deep analysis
  - Use Sonnet for summaries

  ## Quick Tasks
  - Use Haiku for formatting
  - Use Haiku for simple questions
```

Tips

  • Same prompt, different models: Always use identical prompts for fair comparison
  • Multiple trials: Run comparisons several times—results can vary
  • Consider cost: Per-token pricing differs substantially across the three tiers, with Opus the most expensive and Haiku the cheapest; check current pricing before standardizing on a model
  • Match task complexity: Don't use Opus for simple tasks
  • Document findings: Keep notes on which models work best for your use cases
