What 180+ Engineers Revealed About Actually Using Claude Code
When engineering teams share their experiences with Claude Code, a divide emerges. Not about which features to use, but about what success actually means.
The Fundamental Split
Engineering teams fall into two camps:
Camp 1: Activity Optimizers
- Measure success by PRs shipped, features delivered, velocity gains
- “Shoot and forget” works great
- Judge by the final output
- Focus on what gets built
Camp 2: Outcome Protectors
- Measure success by bugs prevented, incidents avoided, maintainability preserved
- “Set and forget is crazy”
- Judge by what happens six months later
- Focus on what can be maintained
The outcome protectors warn: “You’re going to be overwhelmed with code, comments, markdown files that may or may not solve your problem. The chore then becomes PR reviews of tons of iffy code. If you merge too much of it, your codebase will be a bloated mess that even the craftiest prompts won’t untangle.”
AI-assisted code shows 45% higher vulnerability rates and 8x more technical debt than human-written code. The activity optimizers are measuring the wrong thing.
What Works at Scale
1. Constraint Architecture Over Documentation
Some teams maintain 13KB CLAUDE.md files with strict token budgets. Others report Claude ignores even simple instructions: “I have a single instruction that says any AI-generated script should be named <foo>.aigen.ts, and I can’t get Claude to follow something as simple as that.”
Documentation doesn’t control behavior. Constraint controls behavior.
One team’s approach: “We have a PreToolUse hook that wraps any Bash(git commit) command. It checks for a test-pass file. If missing, the hook blocks the commit, forcing Claude into a test-and-fix loop until the build is green.”
They don’t ask Claude to remember to test. They make it impossible to commit untested code.
Not: “Please follow our standards”
But: “The system enforces our standards”
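A minimal sketch of what such a gate can look like, assuming Claude Code's hook convention of passing the pending tool call as JSON on stdin and treating exit code 2 as "block this call and report stderr back to the agent"; the marker-file path and the test runner that creates it are illustrative:

```bash
#!/usr/bin/env bash
# Registered as a PreToolUse hook on the Bash tool in .claude/settings.json.
# Reads the pending tool call as JSON on stdin; if it is a `git commit` and no
# test-pass marker exists, exit 2 blocks the command and tells Claude why.
payload=$(cat)
cmd=$(echo "$payload" | jq -r '.tool_input.command // empty')

if [[ "$cmd" == *"git commit"* && ! -f .claude/test-pass ]]; then
  echo "Blocked: no .claude/test-pass marker found. Run the test suite, then retry the commit." >&2
  exit 2  # block the tool call; stderr is fed back to Claude
fi
exit 0
```

The companion test script deletes the marker at the start of every run and recreates it only on a green result, which is what forces the test-and-fix loop the team describes.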
2. Planning as Validation
Planning isn’t about getting Claude to think harder. It’s about forcing alignment before generation begins.
“For large tasks, I have Claude dump its plan and progress into a .md, /clear the state, then start a new session by telling it to read the .md and continue.”
Planning creates a reviewable artifact. A checkpoint to catch architectural mistakes before they become 10,000 lines of code.
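A rough sketch of that loop from the shell, assuming Claude Code's non-interactive -p mode (with file-write permissions configured) and a hypothetical PLAN.md as the checkpoint artifact:

```bash
# Step 1: have Claude produce the plan as a reviewable artifact, not code.
claude -p "Read TASK.md and write a step-by-step implementation plan to PLAN.md. Do not write application code yet."

# Step 2: a human reviews and edits PLAN.md. This is the checkpoint that catches
# architectural mistakes before they become 10,000 lines of diff.

# Step 3: a fresh session (fresh context) executes the approved plan.
claude -p "Read PLAN.md and implement it step by step, checking items off in the file as you go."
```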
Traditional ML approaches show 187% mean ROI with 67% success rates, while generative AI shows 95% pilot failure rates. The teams on the right side of those numbers treat generation as the last step, not the first: use Claude to think, validate the thinking, then let it build within constraints.
3. The GitHub Actions Pattern
Teams treating Claude Code as infrastructure, not a personal tool:
“Users can trigger a PR from Slack, Jira, or even a CloudWatch alert, and the GHA will fix the bug or add the feature and return a fully tested PR.”
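The wiring varies by team, but a rough sketch of the script inside such a job, assuming Claude Code's headless -p mode, the GitHub CLI, and tool permissions pre-configured for unattended runs (issue variables, branch names, and the test command are placeholders):

```bash
#!/usr/bin/env bash
set -euo pipefail
# Runs inside the GitHub Actions job after checkout; ISSUE_KEY, ISSUE_TITLE and
# ISSUE_BODY come from the triggering event (Slack, Jira, CloudWatch, etc.).
git checkout -b "claude/${ISSUE_KEY}"

# Headless agent run; the transcript becomes the audit log the ops review reads.
claude -p "Fix this issue and add tests. ${ISSUE_TITLE}: ${ISSUE_BODY}" \
  | tee "agent-logs/${ISSUE_KEY}.log"

# The PR only opens if the suite is green: the gate lives in CI, not in the prompt.
npm test
git add -A && git commit -m "claude: ${ISSUE_KEY}" || true  # no-op if the agent already committed
git push -u origin "claude/${ISSUE_KEY}"
gh pr create --fill --label ai-generated
```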
“The GHA logs are the full agent logs. We have an ops process to regularly review these logs at a company level for common mistakes, bash errors, or unaligned engineering practices.”
They’re building a data-driven quality improvement loop. Mistakes feed back into better constraints, better CLIs, better CLAUDE.md documentation.
One team runs: query-claude-gha-logs --since 5d | claude -p "see what the other claudes were getting stuck on and fix it"
They’re using Claude to debug Claude.
Anti-Patterns
Custom Subagents
“They Gatekeep Context. If I make a PythonTests subagent, I’ve now hidden all testing context from my main agent. It can no longer reason holistically about a change.”
Better: Give your main agent the full context, let it use Task() to spawn clones of itself. Master-Clone architecture beats Lead-Specialist architecture.
Delegation should be dynamic and contextual, not pre-defined and rigid.
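There is no special configuration for this; the Master-Clone pattern is a prompting convention over the built-in Task tool. A hypothetical example:

```bash
# Hypothetical prompt: the main agent keeps full repository context and decides
# at run time what to fan out, spawning clones of itself via Task() rather than
# routing work through fixed specialist subagents.
claude -p "Run the test suite. For each failing test file, spawn a Task() clone to fix that file. Keep the full context yourself and review every clone's diff before moving on."
```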
MCP Over-Engineering
Most teams abandoned MCP.
“All my stateless tools (like Jira, AWS, GitHub) have been migrated to simple CLIs.”
Claude is better at scripting against data than calling API abstractions.
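A sketch of what "simple CLI" means here, using the GitHub CLI as a stand-in; the wrapper name and subcommands are illustrative. The point is that the agent gets raw, pipeable JSON it can script against, while auth stays in the tool's own config rather than in the agent's context:

```bash
#!/usr/bin/env bash
# bugs-cli: a thin, stateless wrapper the agent can call and compose like any
# other shell tool.
case "${1:-}" in
  open) gh issue list --label bug --state open --json number,title,labels ;;
  show) gh issue view "$2" --json title,body,comments ;;
  *)    echo "usage: bugs-cli {open|show <number>}" >&2; exit 1 ;;
esac
```

From there Claude can pipe the output through jq, grep, or a throwaway script, which is exactly the kind of composition a heavier abstraction tends to hide.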
“MCP’s job isn’t to abstract reality for the agent; its job is to manage the auth, networking, and security boundaries and then get out of the way.”
MCP’s role should be narrow: authentication, network boundaries, security gates. Then get out of the way.
Auto-Compaction
Nearly universal: never trust auto-compaction. It’s “opaque, error-prone, and not well-optimized.”
The alternative: /clear + /catchup for simple reboots. “Document & Clear” for complex tasks—have Claude dump its plan and progress into a markdown file, clear state, then continue from the document.
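/catchup is not a built-in command; it is the sort of custom slash command teams define themselves. A sketch, assuming Claude Code's convention of project-level commands as markdown prompt files under .claude/commands/:

```bash
mkdir -p .claude/commands
cat > .claude/commands/catchup.md <<'EOF'
Re-orient yourself after a /clear:
1. Run `git status` and `git diff main...HEAD` to see the work in progress.
2. Read PLAN.md or NOTES.md in the repo root if present.
3. Summarize where the previous session left off and what remains to be done.
EOF
```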
Compaction hides context loss. Manual resets make context management explicit and controllable.
Quality Enforcement
Block at commit time, not at write time. Let the agent finish its plan, then validate the complete result against automated quality gates.
“We intentionally do not use block-at-write hooks. Blocking an agent mid-plan confuses or even ‘frustrates’ it. It’s more effective to let it finish its work and then check the final, completed result at the commit stage.”
You need automated enforcement, not prompting and hoping.
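The same commit-stage gate can live in plain git hooks or CI, independent of any Claude-specific machinery. A minimal illustration (the check commands are placeholders):

```bash
#!/usr/bin/env bash
# .git/hooks/pre-commit, marked executable. It applies to every commit, human or
# agent: the commit fails unless the checks pass, so the standard is enforced
# rather than requested in a prompt.
set -euo pipefail
npm test
npm run lint
```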
Treat AI Output as a Draft, Always
“For code I really don’t care about, I’m happy to ‘shoot and forget’, but if I ever need to understand that code I have an uphill battle. Reading intermediate diffs and treating the process like actual pair programming has worked well for me.”
The problem with “shoot and forget”: you’re trading the fun part (writing code) for the hard part (debugging code you didn’t write and don’t understand).
“If it were easy to gain a mental model of the code simply by reading, then debugging would be trivial. The whole point of debugging is that there are differences between your mental model of the code and what the code is actually doing.”
You can’t maintain what you don’t understand. You can’t debug what you didn’t reason through.
Developers are accepting AI output without the review intensity they’d apply to code from a junior engineer. This explains the vulnerability and technical debt numbers.
Theater vs. Infrastructure
Theater
- Impressive CLAUDE.md files that Claude ignores
- Complex subagent architectures that gatekeep context
- MCP servers that abstract away the information Claude needs
- “Shoot and forget” workflows that optimize for velocity over quality
Infrastructure
- Automated quality gates that enforce standards at commit time
- Planning mode to validate approach before generation
- Simple CLIs that give Claude raw data and scripting capability
- GitHub Actions patterns that create audit trails and improvement loops
- Master-Clone architectures that maintain full context
The teams getting value at scale recognized that AI coding tools are draft generators, not finished product generators.
They’ve built systems that:
- Let Claude work within constraints, not documentation
- Validate outputs against objective quality measures
- Create feedback loops for continuous improvement
- Treat AI contributions as always requiring human review
The Divide
The split between “activity optimizers” and “outcome protectors” isn’t about tools. It’s about what you measure.
If you measure PRs shipped, Claude Code works great. If you measure maintainability, security, and technical debt, Claude Code works great only within constraining infrastructure.
The 45% higher vulnerability rate and 8x technical debt multiplier aren’t Claude Code problems. They’re process problems. They’re what happens when you optimize for velocity without corresponding quality enforcement.
Teams getting reliable results stopped treating AI as a better developer. They treat it as a powerful but unreliable draft generator that requires systematic validation.
They’re not asking, “How do we make Claude better?”
They’re asking, “How do we make our development system robust enough to safely incorporate AI-generated code?”
The constraint architecture, the planning validation, the automated quality gates, the audit trails—these aren’t features of Claude Code. They’re features of engineering systems that have adapted to incorporate AI safely.
“Shoot and forget” is a workflow. “Build infrastructure that validates and constrains” is an architecture. Only one scales to production systems with consequences.
Need Elite Engineering Without the AI Debt?
Most platforms optimize for velocity. We optimize for outcomes. Gun.io provides proven senior-level execution that extends your engineering capacity without the vulnerability rates and technical debt that come with unreviewed AI-generated code.