OpenAI quietly rolled out Codex Web last week, gradually enabling it for Pro and Team users. I spotted the new "Codex" option in my ChatGPT menu today and decided to put it through its paces on my personal website matsuoka.com (more on that project later). After several hours of testing, I've got some thoughts on what works, what doesn't, and where this fits in the increasingly crowded AI coding space.
Yet Another "Codex"
First, let's address the confusion. "Codex" now refers to at least three different products from OpenAI:
The original Codex model powering GitHub Copilot
A CLI-based coding tool I wrote about previously
This new web-based autonomous coder
The naming overlap is getting ridiculous, but I'll focus on OpenAI's new offering here. Worth noting that while Codex CLI is open source and lets you pick your own model, Codex Web uses whatever model OpenAI provides (likely o3), giving you less flexibility but more integration.
What Sets Codex Web Apart
Unlike IDE plugins (Copilot, Codeium), editor extensions (Augment), or CLI tools (Claude Code), Codex Web is a standalone web application focused on autonomous coding. The closest comparisons would be Cognition Labs' Devin or SweepAI - tools designed to autonomously implement entire features or fixes.
Setup requires:
Device 2FA authentication
GitHub repository access (granted per repo)
Manual entry of environment variables and CI/CD scripts
Once connected, Codex scans your codebase and suggests tasks it can perform, from explaining code structure to fixing bugs it identifies.
The Surprising Part: It Actually Found Real Bugs
Scanning my personal website repo, Codex Web immediately identified several legitimate issues, including:
A stray terminal string at the end of build.js
A useToast implementation attaching state listeners on every render
An import referencing a non-existent file (simple-mock-translations vs. the correct mock-translations.ts)
These weren't just linting issues but actual functional bugs that would impact performance or break builds. I was genuinely impressed by this discovery capability.
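To make the useToast finding concrete, here's a framework-free sketch of the listener-leak pattern (the hook and store names are hypothetical; the real implementation is React-based, where the fix belongs in a useEffect with a cleanup function):

```javascript
// A store that components subscribe to for toast updates.
const listeners = [];
const subscribe = (fn) => { listeners.push(fn); };

// Buggy shape: subscribes during every render, so each re-render
// stacks another listener that is never removed.
function useToastBuggy() {
  subscribe(() => {});
}

// Fixed shape: subscribe only once. (In React this would be a
// useEffect with an empty dependency array and a cleanup function.)
let subscribed = false;
function useToastFixed() {
  if (!subscribed) {
    subscribe(() => {});
    subscribed = true;
  }
}

// Simulate three re-renders of each version.
useToastBuggy(); useToastBuggy(); useToastBuggy();
console.log(listeners.length); // 3 leaked listeners from the buggy hook

useToastFixed(); useToastFixed(); useToastFixed();
console.log(listeners.length); // 4: only one more was added
```

The leak is invisible in a single render, which is exactly why a static pass over the code, like the one Codex performed, catches it more reliably than eyeballing the UI.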
The Execution Model: Both Powerful and Problematic
When Codex tackles a task, it spins up a virtual environment to actually execute your code. This approach has profound implications:
The Good:
It can verify changes work before committing them
Execution reveals runtime issues static analysis might miss
No worrying about breaking your local environment
The Bad:
No explicit workflow configuration or instruction support
Each task runs in its own separate thread, which makes checkpointing more complex. This is an intentional design choice, since checkpointing is tied directly to git branches. Long term this may prove the more natural model, but the choice should be made clearer to users.
Environment setup overhead for each major operation
The Ugly:
No built-in preview option
Build cycles are excruciatingly long and may be completely unworkable for large codebases
Modern web development relies on hot module reloading and incremental builds that take milliseconds. Codex's sandbox approach turns this into minutes. The productivity gains from autonomy are partially offset by this execution overhead.
The GitHub Disconnect
Perhaps most puzzling was Codex's inability to access GitHub issues despite having repository access. When I referenced issue #44, it told me it couldn't retrieve the issue, and I ended up copying and pasting the issue text manually. This reveals OpenAI's approach: Codex Web can only access what's explicitly enabled, even when it technically has the permissions. It feels like working with oven mitts on.
Where Codex Web Excels and Falls Short
After testing various scenarios, here's my assessment:
Strengths:
Code understanding and bug identification are genuinely impressive
Autonomous implementation of fixes works well
Clean interface with minimal setup friction
Weaknesses:
Build process is painfully inefficient
Limited workflow integration (issues, CI/CD)
No ability to leverage CLI tools or custom scripts
Context is limited to your code, not research or external docs
The Bigger Picture: Three Competing Visions
The AI coding landscape is splitting into three distinct approaches:
Full IDEs (either web-based or VS Code-style desktop replacements) - Bolt, v0, Cursor, Lovable, Windsurf
CLI/Plugin Agents (Augment, Cline, Claude Code) that enhance your existing tools
PR-style agents (Codex Web, Devin) that autonomously create pull requests
OpenAI seems to be testing different approaches to AI coding - providing models for Microsoft's GitHub Copilot while building their own autonomous agent with Codex Web. But the current implementation feels caught between worlds - not seamless enough for daily coding, yet not fully autonomous enough to handle end-to-end workflows.
The Verdict: Promising but Not There Yet
Codex Web feels like a tech demo of what's possible rather than a refined developer tool. Its bug identification capabilities are impressive, and the execution model has potential, but the performance limitations and workflow gaps make it impractical as a primary development tool.
I'll keep it in my arsenal for occasional code review and maintenance tasks, but it won't replace my existing workflow anytime soon. For now, it's most valuable as a glimpse into an autonomous coding future that's still several iterations away from practical reality.
The most interesting question is whether OpenAI will address these limitations or if they're fundamental to their approach. The sandbox model provides safety and verification but creates friction that undermines the productivity promise of AI coding.
For developers already comfortable with their tools, Codex Web isn't compelling enough to switch - yet. But this space is moving incredibly fast, and I wouldn't be surprised if these limitations are addressed sooner than we expect.
What's your experience with Codex Web or other AI coding tools? I'd love to hear from others who've tried it in different contexts.