MODEL WARS

Model Skills

18 models rated across 6 dimensions: coding, reasoning, vision, writing, long context, and speed. Pick a skill to see who leads. Compare up to 4 models side by side.

Anthropic
Claude Sonnet 4.6
Coding 93 · Reasoning 88 · Vision 80 · Writing 97 · Long Context 95 · Speed 78

Street smart. Best for real-world agentic tasks. Highest writing quality of any frontier model.

Writing · Coding
OpenAI
GPT-5.4
Coding 91 · Reasoning 92 · Vision 90 · Writing 88 · Long Context 80 · Speed 82

Leads OSWorld computer use (75.3% vs. a 72.4% human baseline). Strong vision and native tool use.

Vision · Reasoning
Google
Gemini 3.1 Pro
Coding 85 · Reasoning 87 · Vision 92 · Writing 80 · Long Context 98 · Speed 75

Best-in-class context window. Excellent at processing huge documents. Google's current enterprise flagship.

Long Context · Vision
Anthropic
Claude Opus 4.6
Coding 90 · Reasoning 95 · Vision 82 · Writing 98 · Long Context 95 · Speed 55

Anthropic's most capable model. Overkill for most tasks but unbeatable on hard reasoning and nuanced writing.

Reasoning · Writing
OpenAI
GPT-4o
Coding 82 · Reasoning 84 · Vision 88 · Writing 82 · Long Context 78 · Speed 88

Benchmark workhorse. Fast, capable, strong vision. The default for most OpenAI integrations.

Vision · Speed
OpenAI
GPT-5.4 Pro
Coding 88 · Reasoning 96 · Vision 92 · Writing 85 · Long Context 80 · Speed 60

Thinking mode enabled. Extended reasoning for hard problems. Most expensive OpenAI model.

Reasoning · Vision
Zhipu AI
GLM-5
Coding 80 · Reasoning 82 · Vision 78 · Writing 79 · Long Context 88 · Speed 78

Zhipu AI's flagship. Competitive across all dimensions. Strongest on Chinese-language reasoning and structured tasks.

Chinese language · Reasoning
xAI
Grok 3
Coding 84 · Reasoning 90 · Vision 70 · Writing 83 · Long Context 75 · Speed 80

Truth-seeking first. Strong on factual accuracy, real-time X data access, and hard reasoning tasks.

Reasoning · Real-time info
Minimax
Minimax M2.5
Coding 78 · Reasoning 76 · Vision 72 · Writing 82 · Long Context 85 · Speed 85

Good balance of writing quality and long context. Strong on creative and structured output tasks.

Writing · Long Context
Moonshot
Kimi K2.5
Coding 82 · Reasoning 80 · Vision 65 · Writing 75 · Long Context 88 · Speed 82

Strong on very long contexts and Chinese-language tasks. Competitive mid-tier pricing.

Long Context · Chinese language
Minimax
Minimax M1
Coding 75 · Reasoning 78 · Vision 70 · Writing 80 · Long Context 98 · Speed 70

1M token context window. Designed for massive document processing pipelines.

Long Context · Document processing
Zhipu AI
GLM-4.7
Coding 74 · Reasoning 77 · Vision 75 · Writing 74 · Long Context 85 · Speed 84

Fast mid-tier option with strong Chinese language support and solid long-context handling.

Chinese language · Cost efficiency
OpenAI
GPT-4o mini
Coding 72 · Reasoning 72 · Vision 75 · Writing 70 · Long Context 70 · Speed 95

Fast and cheap. Ideal for high-volume, low-complexity tasks where cost matters more than quality.

Speed · Cost efficiency
DeepSeek
DeepSeek R1
Coding 95 · Reasoning 97 · Vision 40 · Writing 72 · Long Context 78 · Speed 65

The model that broke Twitter in Jan 2025. Matches o1 on reasoning at a fraction of the cost. No vision.

Reasoning · Coding
Mistral
Mistral Large
Coding 82 · Reasoning 83 · Vision 45 · Writing 78 · Long Context 72 · Speed 80

EU-based, GDPR-native. Strong for enterprises that can't send data to US providers.

European data residency · Coding
Alibaba
Qwen 3.5 122B
Coding 84 · Reasoning 82 · Vision 55 · Writing 76 · Long Context 70 · Speed 72

Best open-weight model for code. Rivals frontier closed models at zero API cost when self-hosted.

Open source · Coding
DeepSeek
DeepSeek V3
Coding 88 · Reasoning 85 · Vision 38 · Writing 70 · Long Context 70 · Speed 80

Best value coding model. Near-frontier coding at mid-tier cost. No vision capability.

Coding · Cost efficiency
Meta
Llama 3.3 70B
Coding 76 · Reasoning 78 · Vision 0 · Writing 72 · Long Context 65 · Speed 90

Best open-weight model for self-hosted deployments. Run locally, zero data exposure, no API costs.

Open source · Self-hosting

Scores based on published benchmarks (SWE-bench, MMLU, GPQA, OSWorld) and community consensus · Not all benchmarks are directly comparable · Updated 2026-03-10
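
For readers who want to reproduce the "pick a skill to see who leads" and "compare up to 4 models" views offline, here is a minimal Python sketch. It covers only five of the 18 models, with scores copied from the cards above. The names SCORES, MAX_COMPARE, leaders, and compare are hypothetical helpers made up for illustration; they are not part of any Model Wars API.

```python
# Scores copied from the cards above. Only a subset of the 18 models is
# included to keep the sketch short.
SCORES = {
    "Claude Sonnet 4.6": {"Coding": 93, "Reasoning": 88, "Vision": 80,
                          "Writing": 97, "Long Context": 95, "Speed": 78},
    "GPT-5.4": {"Coding": 91, "Reasoning": 92, "Vision": 90,
                "Writing": 88, "Long Context": 80, "Speed": 82},
    "Gemini 3.1 Pro": {"Coding": 85, "Reasoning": 87, "Vision": 92,
                       "Writing": 80, "Long Context": 98, "Speed": 75},
    "DeepSeek R1": {"Coding": 95, "Reasoning": 97, "Vision": 40,
                    "Writing": 72, "Long Context": 78, "Speed": 65},
    "GPT-4o mini": {"Coding": 72, "Reasoning": 72, "Vision": 75,
                    "Writing": 70, "Long Context": 70, "Speed": 95},
}

# The page allows comparing at most 4 models side by side.
MAX_COMPARE = 4


def leaders(skill, top_n=3):
    """Return the top_n (model, score) pairs for one skill, highest first."""
    ranked = sorted(SCORES.items(), key=lambda item: item[1][skill], reverse=True)
    return [(name, scores[skill]) for name, scores in ranked[:top_n]]


def compare(names):
    """Return {skill: {model: score}} for up to MAX_COMPARE models."""
    if not 1 <= len(names) <= MAX_COMPARE:
        raise ValueError(f"compare between 1 and {MAX_COMPARE} models")
    skills = list(SCORES[names[0]])
    return {skill: {name: SCORES[name][skill] for name in names} for skill in skills}


if __name__ == "__main__":
    print(leaders("Coding"))
    print(compare(["GPT-5.4", "Gemini 3.1 Pro"])["Vision"])
```

As the footnote says, not all of the underlying benchmarks are directly comparable, so a ranking like this is best read as a rough ordering rather than a precise measurement.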