Hyperdev: Releases

Claude Code 2.1.0 ships

Robert Matsuoka — Wed, 07 Jan 2026 23:57:08 GMT

Claude Code 2.1.0 launched January 7, 2026, marking Anthropic’s first feature release of the year with significant improvements to the skills system, terminal compatibility, and permission handling. The update introduces automatic skill hot-reloading, wildcard patterns for Bash permissions, and addresses a security vulnerability that could expose sensitive tokens in debug logs. While no pricing changes accompany this release, community sentiment around Claude Code remains mixed—developers praise Opus 4.5’s capabilities but express frustration with usage limits.

Skills system gets major quality-of-life upgrades

The headline feature in 2.1.0 is automatic skill hot-reload: skills created or modified in ~/.claude/skills or .claude/skills now activate immediately without restarting the session. This eliminates a significant friction point for developers iterating on custom workflows.

Additional skills improvements include:

Forked sub-agent context: Skills can now run in isolated sub-agent contexts using context: fork in frontmatter
Progress indicators: Skills display tool uses in real-time during execution
Improved suggestions: Recently and frequently used skills get prioritized in recommendations
Visibility controls: Skills from /skills/ directories appear in the slash command menu by default (opt-out via user-invocable: false)

The agent field in skill frontmatter now allows specifying which agent type executes the skill, enabling more granular control over autonomous operations.

Terminal and keyboard handling sees broad fixes

Version 2.1.0 addresses longstanding terminal compatibility issues. Shift+Enter now works out of the box in iTerm2, WezTerm, Ghostty, and Kitty without requiring terminal configuration changes. Word navigation (Alt+B/Alt+F) has been fixed across these terminals, and Cmd+V now supports image paste in iTerm2.

The release adds substantial vim motion support:

; and , for repeating f/F/t/T motions
Full yank/paste with y, yy/Y, p/P
Text objects: iw, aw, iW, aW, plus quote and bracket variants
Indent/dedent with >> and <<
Line joining with J

Wildcard permissions reduce approval fatigue

A practical improvement for power users: Bash tool permissions now support wildcard pattern matching using * at any position. Developers can configure rules like Bash(npm *), Bash(* install), or Bash(git * main) to pre-approve command families. Combined with the removal of permission prompts when entering plan mode, this significantly reduces interruptions during autonomous workflows.
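To make the matching rule concrete, here is a minimal Python sketch of wildcard matching with * allowed at any position. It is an illustration only: the helper name rule_matches is hypothetical, and Claude Code's real matcher may treat whitespace, quoting, or chained commands differently.

```python
import re

def rule_matches(pattern: str, command: str) -> bool:
    """Return True if `command` matches `pattern`, where `*` matches any run of characters.

    Hypothetical helper for illustration; not Claude Code's implementation.
    """
    # Escape each literal piece, then join the pieces with ".*" for every "*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, command) is not None

rules = ["npm *", "* install", "git * main"]
for cmd in ["npm run build", "pip install", "git push origin main", "rm -rf /"]:
    approved = any(rule_matches(rule, cmd) for rule in rules)
    print(f"{cmd!r}: {'pre-approved' if approved else 'needs a prompt'}")
```

Note how broad a leading wildcard is: a rule like * install approves any command that ends in " install", so patterns anchored on the command name are the safer default.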

The unified Ctrl+B backgrounding now handles both bash commands and agents simultaneously, streamlining the background task experience.

New configuration options expand customization

Four new settings address specific user requests:

Setting: Purpose
language: Configure Claude’s response language (e.g., language: "japanese")
respectGitignore: Per-project control over @-mention file picker behavior in settings.json
CLAUDE_CODE_HIDE_ACCOUNT_INFO: Hide email/organization from UI
CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS: Override default file read token limit

Security fix patches token exposure in debug logs

The release includes a critical security fix preventing OAuth tokens, API keys, and passwords from appearing in debug logs. Organizations using Claude Code in CI/CD pipelines or shared development environments should upgrade immediately.

Other notable fixes address:

Session resume failures from orphaned tool results during concurrent execution
OAuth token refresh race conditions with stale keychain cache
Memory leak in git diff parsing where sliced strings retained large parent strings
Files created via Write tool now respect system umask instead of hardcoded 0o600 permissions
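The umask change is easy to see in isolation. The sketch below assumes a POSIX system and compares a file created with a hardcoded 0o600 mode against one that requests 0o666 and lets the process umask trim it. The helper create_file is illustrative, and the 0o666 request is an assumption about what "respecting umask" means in practice, not a statement about Claude Code's internals.

```python
import os
import stat
import tempfile

def create_file(path: str, mode: int) -> int:
    """Create `path` requesting `mode`; return the permission bits actually set."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, mode)
    os.close(fd)
    return stat.S_IMODE(os.stat(path).st_mode)

old_umask = os.umask(0o022)  # a common default umask
try:
    with tempfile.TemporaryDirectory() as tmp:
        hardcoded = create_file(os.path.join(tmp, "a.txt"), 0o600)
        umask_based = create_file(os.path.join(tmp, "b.txt"), 0o666)
finally:
    os.umask(old_umask)  # restore the process umask

print(oct(hardcoded))    # owner-only, whatever the umask says
print(oct(umask_based))  # requested mode with the umask bits cleared
```

With a umask of 0o022 the second file ends up group- and world-readable (0o644), which is what most shared tooling expects; the first stays owner-only.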

MCP and hooks gain new capabilities

Model Context Protocol support expands with list_changed notifications, allowing MCP servers to dynamically update available tools, prompts, and resources without reconnection. The frontmatter allowed-tools field also accepts YAML-style list syntax, which simplifies skill declarations.
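As a rough sketch of what a client does with these notifications: the MCP specification defines the methods notifications/tools/list_changed, notifications/prompts/list_changed, and notifications/resources/list_changed, and a client typically reacts by re-requesting the matching list. The function name refresh_request and the id handling below are illustrative assumptions, not Claude Code's code.

```python
import json
from typing import Optional

# Notification method -> request the client sends to refresh its cached list.
REFRESH_FOR = {
    "notifications/tools/list_changed": "tools/list",
    "notifications/prompts/list_changed": "prompts/list",
    "notifications/resources/list_changed": "resources/list",
}

def refresh_request(raw_message: str, next_id: int) -> Optional[dict]:
    """Return the JSON-RPC request to send after a list_changed notification."""
    message = json.loads(raw_message)
    # Notifications carry no "id"; anything with an id is a request or response.
    if "id" in message:
        return None
    method = REFRESH_FOR.get(message.get("method"))
    if method is None:
        return None
    return {"jsonrpc": "2.0", "id": next_id, "method": method}

incoming = '{"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}'
print(refresh_request(incoming, next_id=7))
```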

Hooks receive several enhancements:

Support for prompt and agent hook types from plugins (previously limited to command hooks)
Agent frontmatter can now define PreToolUse, PostToolUse, and Stop hooks scoped to the agent’s lifecycle
New once: true config option for single-execution hooks

Breaking change requires zod 4.0+

The SDK’s minimum zod peer dependency has changed to ^4.0.0, potentially requiring updates for projects using older zod versions. The Atlassian MCP integration also switched to streamable HTTP as the default configuration.

Community reception reflects broader tensions

While 2.1.0 addresses many requested fixes, developer sentiment around Claude Code remains polarized. On Hacker News, users praise Anthropic’s shipping velocity (“It’s breathtaking how fast the Claude Code team ships”) and Opus 4.5’s code quality. Boris Cherny’s December tweet claiming 259 PRs and 40,000 lines written entirely by Claude Code garnered 4.4M views.

However, usage limits dominate complaint threads. Reddit and GitHub issues document developers “burning through the whole damn quota in one or two days” even on $200/month Max subscriptions. The expiration of Anthropic’s holiday bonus (doubled limits December 25-31) triggered accusations of “bait and switch” pricing. Some developers report quality inconsistencies during peak hours, though Anthropic officially denies throttling.

Competitive positioning remains strong but contested

Claude Code maintains approximately 70% market share among agentic coding tools according to Vibe Kanban data, though this dropped from 83% in September 2025. The tool excels at complex multi-file operations and autonomous refactoring—benchmarks show 77.2% accuracy on SWE-bench with the 200K context window.

Competitors have narrowed the gap: Cursor’s $20/month unlimited model attracts cost-conscious developers, while OpenAI Codex gains traction for structured, step-controlled workflows. GosuEvals benchmarks now rank Kiro, Windsurf, and Crush ahead of Claude Code, though margins are within 10%.

For most developers, the pragmatic approach combines Claude Code for complex architectural tasks with lighter tools for daily coding—a pattern that 2.1.0’s improved Bash permissions and skill system further enables.

Conclusion

Claude Code 2.1.0 delivers meaningful quality-of-life improvements rather than headline features. The skills hot-reload, wildcard permissions, and extensive vim motions address genuine workflow friction points. The security fix is essential for enterprise deployments. However, the release doesn’t address the community’s primary frustration—usage limits—which continues driving some developers toward alternatives. For teams already committed to Claude Code, the upgrade is straightforward (watch the zod dependency) and immediately beneficial.

MCP Vector Search: Semantic Search for Code

Robert Matsuoka — Thu, 11 Dec 2025 14:30:54 GMT

We spend a lot of time talking about token costs. API budgets. The $200/month subscription that gets exhausted in three days. But here’s something that flips that entire conversation: vector search costs you nothing after the initial index.

Zero. Once your codebase is embedded, searches are pure math. No inference. No API calls. No watching your Claude Max credits drain while you hunt for that authentication middleware you know exists somewhere.

I’ve been building mcp-vector-search for the past few months, and it’s become one of the most useful tools in my development workflow—not because it’s fancy, but because it’s always there. No rate limits. No “please wait” messages. No monthly bill anxiety.

Why Semantic Search Beats Grep for LLM Context

Here’s what traditional code search gives you: exact string matches. Grep finds authenticate_user. Ripgrep finds it faster. Neither one finds “the part of the code that handles login verification” when somebody named that function verify_credentials instead.

Semantic search understands meaning. Ask for “authentication middleware” and it returns code that does authentication—regardless of what the developer named things. Ask for “error handling patterns” and it finds try/catch blocks, custom error classes, logging calls, the whole ecosystem of how your codebase deals with failures.

But here’s the thing that really matters for LLM-assisted development: this makes coding agents dramatically more effective.

Augment Code figured this out early. Their context builder uses vector search to understand your codebase—it’s one of the reasons Auggie can answer questions about your project without chewing through every file. Claude Code doesn’t have this yet. It relies on scanning, grepping, reading directories. Works, but it’s slower and burns tokens doing reconnaissance.

mcp-vector-search gives Claude Code users a similar advantage. (I’m sure Augment has additional tricks up their sleeve, but the core capability is the same.) The agent asks “where does this project handle database connections?” and gets precise, ranked results in milliseconds. No scanning. No guessing. Just direct access to relevant code.

The Cheap LLM Layer

Vector search alone returns code chunks ranked by semantic similarity. Useful, but sometimes you want more than raw results.
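To show why a search costs nothing once the index exists, here is a toy sketch of similarity ranking over precomputed vectors. The three-dimensional "embeddings" and the chunk names are made up for illustration, and a real index would likely use a vector store with an approximate nearest-neighbor lookup rather than this brute-force loop. This is not mcp-vector-search's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec: Sequence[float], index: Dict[str, Sequence[float]],
           top_k: int = 2) -> List[Tuple[float, str]]:
    """Rank indexed chunks by cosine similarity to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    return sorted(scored, reverse=True)[:top_k]

# Toy vectors; real ones come from an embedding model at index time.
index = {
    "verify_credentials": [0.9, 0.1, 0.0],
    "render_invoice": [0.0, 0.2, 0.9],
    "login_middleware": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedded query "authentication middleware"
for score, name in search(query, index):
    print(f"{score:.3f}  {name}")
```

The query never mentions verify_credentials by name, yet that chunk ranks first because its vector points in nearly the same direction as the query's.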

So I added a thin LLM controller layer. Nothing expensive: Haiku or GPT-4o mini for quick query generation and result consolidation. The model helps translate natural language questions into effective search queries, then synthesizes the results into coherent answers.

The economics work because the LLM does minimal work. It’s not reading your entire codebase. It’s not generating code. It’s just:

Turning “how does auth work here?” into an optimized search query
Looking at the top 5-10 results
Summarizing what it found

Total cost per question? Fractions of a cent. Maybe a penny for complex queries.

For actual Q&A—”explain the main architecture” or “walk me through the payment flow”—I bump up to Sonnet or GPT-4o. These need reasoning, not just consolidation. But even then, the context is pre-filtered by vector search, so the models see exactly what they need instead of wading through irrelevant files.

The Killer Combo: Answerable Codebases

Give an LLM access to vector search tools and something interesting happens. Your codebase becomes answerable.

mcp-vector-search chat "how does the login flow work?"

The tool searches semantically, retrieves relevant code, feeds it to the LLM, and returns an explanation grounded in your actual implementation. Not generic advice about authentication patterns. Your code. Your architecture. Your specific implementation details.

This works through MCP integration too—Claude Desktop can query your indexed codebase directly during conversations. Ask about your code while you’re planning changes, and Claude pulls the relevant context without you having to copy-paste files into the chat.

What’s Available Now

Everything I just described ships today:

# Install
pipx install mcp-vector-search

# Index your codebase
cd your-project
mcp-vector-search init
mcp-vector-search index

# Search semantically
mcp-vector-search search "authentication middleware"

# Chat with your code
mcp-vector-search chat "explain the main architecture"

The chat command does dual-intent detection—it figures out if you’re asking a question (explain something) or searching (find something) and responds appropriately. Add --think for complex reasoning that uses the heavier models.

MCP server integration means Claude Desktop can use your indexed codebase as a tool:

mcp-vector-search setup --platform claude_desktop

Then Claude has access to search_code, search_similar, search_context, and other tools for querying your project during any conversation.

Coming Soon: Structural Analysis

Vector search tells you what code means. But it doesn’t tell you if that code is a mess.

I’m adding an analyze command that does structural analysis—the kind of metrics you’d get from SonarQube or similar static analysis tools:

Cognitive complexity: How hard is this function to understand?
Cyclomatic complexity: How many paths through the code?
Nesting depth: How many levels deep does the indentation go?
Coupling metrics: How tangled are these modules?

The goal: quality-aware search. Filter results by complexity (--max-complexity 15), exclude code smells (--no-smells), weight rankings by code health.
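For a sense of what the complexity metric and the threshold filter involve, here is a small sketch using Python's ast module. It approximates cyclomatic complexity as one plus the number of decision points, which is a common simplification; the function names and the set of counted nodes are my assumptions for illustration, not the planned analyze implementation.

```python
import ast

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            complexity += len(node.values) - 1
    return complexity

def passes_filter(source: str, max_complexity: int = 15) -> bool:
    """Mimic a --max-complexity style cutoff on a single code chunk."""
    return cyclomatic_complexity(source) <= max_complexity

sample = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
'''
print(cyclomatic_complexity(sample), passes_filter(sample))
```

The sample scores 5: the base path, two if statements, one for loop, and one extra path from the and operator.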

Just starting development on this now. Phase 1 targets core metrics, Phase 2 adds CI/CD integration with SARIF output, Phase 3 brings cross-file analysis for coupling and circular dependencies.

The TkDD Experiment

Here’s where it gets meta. I’m building this entire feature using Ticket-Driven Development with Claude-MPM.

The workflow: my agents created the tickets (using mcp-ticketer, another tool worth checking out), and they’ll build the project from those tickets, updating progress as they go.

If you’re interested in TkDD—how AI agents can orchestrate complex projects through structured ticket workflows—you can follow along in real time:

Project Board: https://github.com/users/bobmatnyc/projects/13
Milestones: https://github.com/bobmatnyc/mcp-vector-search/milestones

Watch the tickets move from Backlog → Ready → In Progress → Done. See the commits reference issues. Watch PRs close tickets with evidence. It’s TkDD in public, with working code you can actually try at each milestone.

The Bottom Line

Vector search isn’t new technology. But packaging it as a CLI tool that any developer can install in 30 seconds and point at their codebase? That changes things.

Augment Code users have had this kind of semantic understanding built in. Now Claude Code users can get something similar—without the subscription, without the vendor lock-in, and without per-query costs eating into your budget.

No inference costs for search. Cheap LLM layer for intelligence. MCP integration for Claude Desktop. Structural analysis coming soon. All built in public with TkDD you can follow.

The future of code understanding isn’t better grep. It’s semantic infrastructure that makes your codebase queryable, analyzable, and answerable—without burning your API budget doing it.

mcp-vector-search is open source, as is mcp-ticketer.

I’m Bob Matsuoka, writing about agentic coding and AI-powered development at HyperDev. For more on multi-agent orchestration, read my piece on Claude-MPM or my analysis of the orchestration landscape.

Claude MPM 5

Robert Matsuoka — Sun, 07 Dec 2025 20:42:39 GMT

The biggest friction point in multi-agent orchestration frameworks isn’t just the orchestration itself. It’s keeping agents and skills updated and enabling contributions. Claude MPM 5.0 solves this with a git-first architecture that treats agent and skill repositories as first-class citizens.

The Problem We Were Trying to Solve

Before 5.0, updating Claude MPM agents meant waiting for a package release. Contributing a new agent meant understanding the entire build system, submitting a PR to the main repository, and hoping someone reviewed it before your use case became irrelevant.

This created a painful bottleneck. Teams building custom agents had no clean way to share them. The official agent set updated slowly. And the cognitive overhead of contribution meant most improvements never made it upstream.

The fix: treat agents like any other code artifact. Put them in git repositories. Let organizations maintain their own collections. Let priority rules handle conflicts.

How It Works

Claude MPM now syncs agents and skills from configurable git repositories at startup. The default configuration pulls from the following sources:

Agents: bobmatnyc/claude-mpm-agents (47+ agents)
Skills: bobmatnyc/claude-mpm-skills (community) + anthropics/skills (official Anthropic skills)

Adding your own repository takes one command:

claude-mpm agent-source add https://github.com/yourorg/your-agents

The --test flag validates the repository before saving configuration—fail-fast behavior that prevents startup issues later:

claude-mpm agent-source add https://github.com/yourorg/your-agents --test

Priority-based resolution handles conflicts when multiple repositories provide the same agent. Lower numbers win:

# ~/.claude-mpm/config/agent_sources.yaml
repositories:
  - url: https://github.com/myteam/agents
    priority: 10    # Your custom agents take precedence
    
  - url: https://github.com/bobmatnyc/claude-mpm-agents
    priority: 100   # System defaults as fallback

This means organizations can override any system agent with their own version while still receiving updates for agents they haven’t customized.
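The lower-number-wins rule can be sketched in a few lines of Python. The agent names and the resolve_agents helper are hypothetical; only the two repository URLs and their priorities come from the config above, and the real resolver may track more than a name-to-source mapping.

```python
from typing import Dict, List

def resolve_agents(repositories: List[dict]) -> Dict[str, str]:
    """Map each agent name to the URL of the repository that wins it.

    Lower `priority` numbers win, matching the rule described above.
    """
    winners: Dict[str, str] = {}
    # Visit repositories from highest precedence (lowest number) to lowest.
    for repo in sorted(repositories, key=lambda r: r["priority"]):
        for agent in repo["agents"]:
            winners.setdefault(agent, repo["url"])  # first (best) source sticks
    return winners

repositories = [
    {"url": "https://github.com/bobmatnyc/claude-mpm-agents", "priority": 100,
     "agents": ["python-engineer", "qa", "research"]},  # hypothetical agent names
    {"url": "https://github.com/myteam/agents", "priority": 10,
     "agents": ["python-engineer"]},
]
for agent, url in sorted(resolve_agents(repositories).items()):
    print(f"{agent:16} <- {url}")
```

Here the team repository supplies python-engineer, while qa and research still come from the default repository and keep receiving upstream updates.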

The Hierarchical BASE-AGENT.md Pattern

One feature that emerged during development addresses a common maintenance headache: shared instructions across related agents.

Consider a team with Python specialists for FastAPI, Django, and Flask. All three need the same Python style guidelines, testing expectations, and code review standards. Before 5.0, you’d duplicate that content in each agent file and pray you remembered to update all copies.

The hierarchical BASE-AGENT.md pattern solves this:

your-agents/
  BASE-AGENT.md              # All agents inherit this
  engineering/
    BASE-AGENT.md            # Engineering agents inherit this too
    python/
      fastapi-engineer.md    # Inherits both BASE files
      django-engineer.md     # Inherits both BASE files
    rust/
      systems-engineer.md    # Inherits engineering + root BASE

Each agent inherits from all BASE-AGENT.md files in parent directories, cascading from root to leaf. Update the Python testing standards once, and every Python agent receives the change. Same inheritance pattern used in configuration management for decades—just applied to agent instructions.
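One way to picture the cascade is a function that walks from the repository root down to the agent's directory and collects every BASE-AGENT.md on the way. The sketch below assumes inherited content is simply concatenated ahead of the agent's own file; that merge strategy and the helper names are assumptions, since the actual composition logic lives in Claude MPM.

```python
import tempfile
from pathlib import Path
from typing import List

def base_files_for(agent_file: Path, root: Path) -> List[Path]:
    """List the BASE-AGENT.md files an agent inherits, ordered root to leaf."""
    rel = agent_file.parent.relative_to(root)  # raises ValueError if outside root
    dirs = [root]
    for part in rel.parts:
        dirs.append(dirs[-1] / part)
    candidates = [d / "BASE-AGENT.md" for d in dirs]
    return [p for p in candidates if p.is_file()]

def compose(agent_file: Path, root: Path) -> str:
    """Concatenate inherited BASE content, then the agent's own instructions."""
    parts = [p.read_text() for p in base_files_for(agent_file, root)]
    parts.append(agent_file.read_text())
    return "\n".join(parts)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    agent = root / "engineering" / "python" / "fastapi-engineer.md"
    agent.parent.mkdir(parents=True)
    (root / "BASE-AGENT.md").write_text("root rules")
    (root / "engineering" / "BASE-AGENT.md").write_text("engineering rules")
    agent.write_text("fastapi rules")
    composed = compose(agent, root)

print(composed)
```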

Performance: ETag Caching and Two-Phase Progress

Git operations on every startup would be intolerable. ETag-based HTTP caching reduces network traffic by 95%+ after the initial sync. The framework only pulls when content has actually changed.
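The mechanism behind that saving is the standard HTTP conditional request: the client sends the ETag it already holds in an If-None-Match header, and the server answers 304 Not Modified with no body when nothing changed. Here is a minimal sketch with the network call injected so it runs offline. The names cached_get, Fetch, and fake_server are hypothetical and do not reflect Claude MPM's code.

```python
from typing import Callable, Dict, Tuple

# fetch(url, headers) -> (status, response_headers, body); injected so the
# sketch runs without a network. A real client would wrap an HTTP library.
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]

def cached_get(url: str, cache: Dict[str, Tuple[str, bytes]], fetch: Fetch) -> bytes:
    """Return the body for `url`, re-downloading only when the ETag changed."""
    headers: Dict[str, str] = {}
    if url in cache:
        headers["If-None-Match"] = cache[url][0]  # send the ETag we hold
    status, resp_headers, body = fetch(url, headers)
    if status == 304 and url in cache:  # Not Modified: reuse the cached body
        return cache[url][1]
    etag = resp_headers.get("ETag")
    if etag:  # only cache responses that can be revalidated later
        cache[url] = (etag, body)
    return body

def fake_server(url: str, headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Stand-in server holding one resource with a fixed ETag."""
    if headers.get("If-None-Match") == '"v1"':
        return 304, {}, b""
    return 200, {"ETag": '"v1"'}, b"agent definition"

cache: Dict[str, Tuple[str, bytes]] = {}
url = "https://example.com/agent.md"
first = cached_get(url, cache, fake_server)   # full download
second = cached_get(url, cache, fake_server)  # 304, served from cache
print(first == second)
```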

Visibility into the sync/deploy process matters when troubleshooting. Two-phase progress bars now show distinct stages:

Sync phase: Repository cloning and updates
Deploy phase: File discovery and deployment to ~/.claude/agents/

Real counts, real-time updates. When something fails, you know exactly where.

The Contribution Workflow Is Now Just Git

The agent cache at ~/.claude-mpm/cache/remote-agents/ is a full git repository. To contribute:

Edit agents in the cache directory
Test locally with claude-mpm run
Commit and push

That’s it. No build system. No package release cycle. The contribution workflow matches what developers already do for any other code.

Previously, I’d encounter a situation where an agent needed tweaking—the Python engineer didn’t handle async context managers well, or the QA agent missed a testing pattern. The fix was obvious, maybe 10 lines. But the contribution overhead meant I’d make a local note and move on. The improvement never happened.

Now I edit the agent in the cache, test it works, commit with a conventional commit message, and push. The improvement exists in the shared repository within minutes. Other users get it on their next startup sync.

For organizations, this means maintaining internal agent repositories becomes trivial. Fork the community repository, add your customizations, configure priority, done. Your sales engineering team can have agents tuned for demo preparation. Your platform team can have agents that understand your infrastructure conventions. Each group maintains their own repository without coordination overhead.

The priority system means customizations don’t require abandoning upstream improvements. Set your custom repository at priority 10, keep the community repository at priority 100. Your overrides win for agents you’ve customized. Everything else updates automatically.

Nested Repository Support

Some skill repositories organize content in nested directories—category folders, framework folders, complexity levels. Claude Code requires a flat structure in ~/.claude/skills/.

Rather than force repository maintainers to flatten their organization, the framework handles this automatically. Nested SKILL.md files are discovered recursively and flattened during deployment. Original directory structure is preserved in metadata for reference. One less barrier to contributing skill collections with sensible organization.
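A rough sketch of that flattening step, under stated assumptions: the flat directory name is built by joining the nested path parts with a hyphen, and the original location is returned as a simple mapping. Both choices are hypothetical stand-ins for whatever naming and metadata format the framework actually uses.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Dict

def flatten_skills(source: Path, target: Path) -> Dict[str, str]:
    """Copy every nested SKILL.md into a flat `target/<name>/SKILL.md` layout.

    Returns {flat_name: original_relative_dir} so the nesting is not lost.
    """
    metadata: Dict[str, str] = {}
    for skill_file in sorted(source.rglob("SKILL.md")):
        rel_dir = skill_file.parent.relative_to(source)
        # Hypothetical naming scheme: join path parts with "-" to avoid clashes.
        flat_name = "-".join(rel_dir.parts)
        dest = target / flat_name
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy2(skill_file, dest / "SKILL.md")
        metadata[flat_name] = rel_dir.as_posix()
    return metadata

with tempfile.TemporaryDirectory() as tmp:
    source, target = Path(tmp, "repo"), Path(tmp, "skills")
    for nested in ["python/testing/pytest", "rust/cargo"]:
        (source / nested).mkdir(parents=True)
        (source / nested / "SKILL.md").write_text(f"# {nested}\n")
    metadata = flatten_skills(source, target)
    deployed = sorted(p.name for p in target.iterdir())

print(json.dumps(metadata, indent=2))
print(deployed)
```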

What’s Coming: DeepEval-Based Behavioral Reinforcement

The next major feature addresses a harder problem: ensuring agents actually follow their instructions.

Multi-agent systems have a hidden reliability problem. You write careful instructions telling the PM agent to delegate work. You define circuit breakers for when agents should stop. You require evidence for claims. Then you deploy and cross your fingers.

We’ve built a behavioral evaluation framework based on DeepEval that treats agent compliance as a testable property.

The framework tests across 51 scenarios in 6 categories:

Delegation patterns: Does the PM agent properly delegate instead of doing work directly?
Circuit breakers: Do agents stop when they should?
Tool usage: Are agents using the correct tools for each task?
Workflow compliance: Do agents follow defined workflows?
Evidence requirements: Do agents provide evidence for claims?
File tracking: Do agents properly track files they create?

Each scenario defines an input, expected behavior, and scoring criteria. The scoring system provides measurable feedback: 1.0 for exact match, 0.8 for acceptable fallback, 0.0 for failure. A MockPMAgent with intelligent agent selection logic validates responses against compliance rules.
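The three-level scoring rule is simple enough to sketch directly. This is plain Python, not DeepEval's API, and the Scenario fields, the scenario ID, and the behavior labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Scenario:
    scenario_id: str
    expected: str                             # the exact behavior we want
    acceptable: FrozenSet[str] = frozenset()  # fallbacks that still count

def score(scenario: Scenario, observed: str) -> float:
    """1.0 for an exact match, 0.8 for an acceptable fallback, 0.0 otherwise."""
    if observed == scenario.expected:
        return 1.0
    if observed in scenario.acceptable:
        return 0.8
    return 0.0

# Hypothetical scenario; the ID and labels are not from the real test suite.
scenario = Scenario("EXAMPLE-001", expected="delegate:ticketing",
                    acceptable=frozenset({"delegate:research"}))
for observed in ["delegate:ticketing", "delegate:research", "direct-action"]:
    print(observed, score(scenario, observed))
```

Averaging these scores per category gives a single compliance number that can be tracked across instruction changes.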

Delegation authority testing (DEL-011) presents a task that should be delegated to the ticketing agent. The test verifies the PM produces a delegation response, not a direct action. Eight sub-scenarios cover edge cases: ambiguous requests, multi-step workflows, fallback conditions.

The universal delegation meta-test (DEL-000) goes further. It synthesizes novel work types not explicitly covered in instructions, testing whether the PM generalizes delegation patterns correctly. This catches instruction gaps before users encounter them.

Early testing revealed some uncomfortable findings. The PM agent was delegating correctly in most scenarios but occasionally bypassed the ticketing agent for “simple” operations—a 12% violation rate on what should be absolute rules. The security agent provided recommendations without evidence 15% of the time. The research agent sometimes made claims without citing sources.

These patterns were invisible without systematic behavioral testing. Manual spot-checking missed them entirely. Only by running hundreds of scenarios did the failure modes become apparent.

The goal is tight feedback loops: modify instructions, run behavioral tests, see exactly what changed. The same principle that makes test-driven development work for code should work for agent instructions. CI/CD pipelines can catch behavioral regressions before they reach production.

We’re also exploring whether behavioral test results can feed back into agent instructions—automated identification of instruction gaps based on failure patterns. This is speculative, but the data from systematic testing opens possibilities that weren’t available when we were flying blind.

This isn’t ready for release yet—we’re validating the test scenarios across different Claude model versions and building the integration with existing CI pipelines. But it represents where agent orchestration frameworks need to go: from “hope the agents behave” to “verify the agents behave.”

Upgrade Path

For existing installations:

pipx install --upgrade claude-mpm

Existing configurations are preserved. The git-first architecture activates automatically with sane defaults. You’ll see 47+ agents and hundreds of skills appear without configuration changes.

If you’ve customized agents, they’re still in .claude-mpm/agents/ and take precedence over repository sources. Nothing breaks.

To verify the upgrade worked:

claude-mpm agent-source list
claude-mpm skill-source list
ls ~/.claude/agents/    # Should show 47+ agents
ls ~/.claude/skills/    # Should show dozens of skills

The Bigger Picture

Infrastructure optimization matters more than code generation quality in AI-assisted development. This has been my consistent finding over months of testing various AI coding tools.

Claude MPM 5.0 reflects this philosophy. The git-first architecture doesn’t make Claude smarter—it reduces friction in maintaining and distributing the instructions that make Claude effective. The behavioral testing framework doesn’t improve Claude’s capabilities—it provides visibility into whether Claude is following instructions at all.

These are the features that determine whether a multi-agent system remains useful at scale or gradually drifts into unreliable behavior.

The repositories are public. Contributions welcome.

Claude MPM is an open-source orchestration framework for Claude Code. Version 5 was released December 2025.