Why AI-Built Skunkworks Apps Are Becoming the New Enterprise Risk
For many organizations, Claude Code and other AI coding tools are not a technology problem first. They are an execution and governance problem.
The excitement is easy to understand. Anthropic describes Claude Code as an agentic coding tool that can read a codebase, edit files, run commands, integrate with development tools, run tests, and deliver committed code. Anthropic also positions it as an entry point to software development for builders without an engineering background.
That is powerful. It also changes executive behavior.
A senior leader can now describe a scheduling tool, a billing exception tracker, a field service dashboard, or an inventory workflow and see something working quickly. What used to require a business case, architecture review, development resources, integration planning, testing, and change control can now start with a prompt.
The risk is not that executives are experimenting. The risk is that prototypes start becoming production systems before anyone has answered the hard questions: Who owns the data? What is the system of record? How is access controlled? How will it be tested? What happens when IFS, ERP, CRM, payroll, or the integration layer changes? Who supports it when the person who prompted it leaves?
We’ve Seen This Movie Before
The tools are new, but the pattern is not.
During the dot-com boom and the years that followed, organizations built fast. Departments created custom web apps, Access databases, Excel models, Lotus Notes workflows, and heavily customized enterprise applications to solve urgent business problems. Many of those tools were useful. Many also became undocumented, unsupported, and deeply embedded in daily operations.
Years later, companies discovered the real cost. The expensive part was not the original build. It was the maintenance, the hidden business logic, the orphaned integrations, the brittle data dependencies, and the fact that no one left in the organization fully understood how the process worked.
McKinsey has described this “shadow side” of enterprise application development as business-built solutions, often using tools such as Excel or low-code platforms, that lack IT governance and structured development. McKinsey also warns that these shadow applications can create “phantom couplings,” where a shadow app depends on enterprise data without IT knowing about the dependency.
AI coding raises the stakes because it can create more software, faster, with a more convincing user experience.
The Productivity Story Is Real but Incomplete
AI coding tools can absolutely improve productivity in the right setting. A Microsoft Research study found that developers using GitHub Copilot completed a controlled JavaScript coding task 55.8% faster than the control group.
But that does not mean every AI-generated application is enterprise-ready. In a 2025 randomized controlled trial by METR, experienced open-source developers working in their own mature repositories took 19% longer when using early-2025 AI tools, even though they believed the tools made them faster.
That gap matters for executives. A working prototype can create the feeling of speed. Production operations require much more than speed.
Enterprise applications are not just screens and workflows. They enforce contracts, entitlements, pricing, approvals, scheduling rules, inventory transactions, financial controls, audit trails, security, mobile offline behavior, reporting, and data integrity. In service and asset-intensive environments, a small workflow mistake can become a technician delay, a billing error, an SLA miss, a compliance issue, or a customer-impacting failure.
The Real Risk Is Shadow Operations
The new risk is not “shadow IT” in the old sense. It is shadow operations.
A leader builds a tool to solve a real problem. A team starts using it. Someone connects it to production data. Another person adds a workflow. A scheduler relies on it. A technician updates it. Finance asks for a report from it. Suddenly the business is running on an application that was never designed as part of the operating model.
This is where the cautionary examples matter.
In 2020, Public Health England reported that 15,841 COVID-19 cases were left out of daily reporting because of a technical issue in a data load process: some files exceeded the maximum size limit. The people affected received their test results, but reporting and contact tracing were disrupted.
In 2012, Knight Capital experienced a software failure in one of its primary order-routing systems. According to the SEC, Knight accumulated unintended multi-billion-dollar securities positions in about 45 minutes and lost more than $460 million.
These examples are not about Claude Code. They are about a broader lesson: when software touches real operations, governance, controls, testing, and support are not optional.
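One control of the kind these examples point to is a load step that fails loudly instead of silently dropping records. The following is a minimal Python sketch, not a description of any system named above. The names `load_rows`, `LoadError`, `LoadResult`, and the `max_rows` capacity limit are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


class LoadError(Exception):
    """Raised when a load would silently lose records."""


@dataclass
class LoadResult:
    rows_in: int      # rows received from the source
    rows_loaded: int  # rows actually written to the destination


def load_rows(rows, sink, max_rows):
    """Append rows to sink, refusing any batch the destination cannot hold in full.

    max_rows is a hypothetical capacity limit for the destination.
    """
    rows = list(rows)

    # Reject the whole batch up front rather than loading a truncated part of it.
    if len(sink) + len(rows) > max_rows:
        raise LoadError(
            f"batch of {len(rows)} rows would exceed the limit of {max_rows}; "
            "nothing loaded"
        )

    before = len(sink)
    sink.extend(rows)
    result = LoadResult(rows_in=len(rows), rows_loaded=len(sink) - before)

    # Reconcile source and destination counts. For an in-memory list this always
    # passes; with a real destination it is the check that catches dropped rows.
    if result.rows_loaded != result.rows_in:
        raise LoadError(
            f"expected {result.rows_in} rows, loaded {result.rows_loaded}"
        )
    return result
```

The design choice is to treat a capacity problem as an error that stops the process, so that someone has to look at it, instead of a condition the load quietly works around.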
What Practical Use Should Look Like
Use case 1: AI-assisted prototyping
The problem: Business leaders often struggle to explain what they need until they can see it.
The practical application: Use Claude Code or similar tools to create prototypes, clickable workflows, mock dashboards, or proof-of-concept automations. Then use those artifacts to improve requirements, validate user journeys, and accelerate design.
The value: Faster alignment without pretending the prototype is the production system.
Use case 2: Governed extensions around enterprise platforms
The problem: Teams often build side applications because the enterprise backlog is too slow.
The practical application: Decide whether the need belongs in IFS configuration, an approved extension, an integration, a report, or a temporary prototype. Define data ownership, APIs, security, testing, and support before production use.
The value: More agility without creating another unsupported system of record.
Use case 3: AI-generated code with enterprise controls
The problem: AI can generate more code than teams can properly review, test, and support.
The practical application: Treat AI-generated code like any other production software. Require architecture review, source control, test coverage, security review, release management, documentation, and named ownership.
The value: Speed with accountability.
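As an illustration, the controls listed for this use case can be expressed as a simple release gate. This is a minimal Python sketch under stated assumptions: the control names mirror the list above, while `ReleaseCandidate`, `missing_controls`, `ready_for_production`, and the example application name are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass, field

# Control names taken from the list above; the identifiers themselves are illustrative.
REQUIRED_CONTROLS = (
    "architecture_review",
    "source_control",
    "test_coverage",
    "security_review",
    "release_management",
    "documentation",
)


@dataclass
class ReleaseCandidate:
    name: str
    owner: str = ""  # named person or team accountable for support
    completed_controls: set = field(default_factory=set)


def missing_controls(candidate):
    """Return the controls still outstanding, in a stable order."""
    missing = [c for c in REQUIRED_CONTROLS if c not in candidate.completed_controls]
    if not candidate.owner.strip():
        missing.append("named_ownership")
    return missing


def ready_for_production(candidate):
    """A candidate is ready only when nothing is outstanding."""
    return not missing_controls(candidate)


# Hypothetical example: a prototype with source control and tests but no owner.
app = ReleaseCandidate(
    name="billing-exception-tracker",
    completed_controls={"source_control", "test_coverage"},
)
print(missing_controls(app))
# ['architecture_review', 'security_review', 'release_management',
#  'documentation', 'named_ownership']
print(ready_for_production(app))  # False
```

A gate like this does not replace the reviews themselves. It only makes the outstanding items visible before a prototype is promoted.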
This is becoming urgent. Tricentis’ 2026 Quality Transformation Report found that 6 in 10 organizations still report deploying untested code. Of those surveyed, 32% said leadership pressure to prioritize speed over quality is a driver, and 30% said the volume of AI-generated code has become too overwhelming to test fully.
What to Avoid
The mistake is not using AI coding tools. The mistake is treating generated code as “free software.”
Avoid building around the enterprise application because the backlog is inconvenient. Avoid connecting skunkworks apps directly to production data without ownership and controls. Avoid creating a new source of truth for customers, inventory, assets, contracts, pricing, schedules, or work orders. Avoid skipping testing because the demo looked good. Avoid assuming the person who prompted the application can support it six months later.
OWASP’s LLM application guidance highlights risks such as insecure output handling, sensitive information disclosure, excessive agency, and overreliance. Those risks become more serious when AI-generated applications are connected to operational systems and given authority to act.
How Gogh Helps
Gogh helps IFS customers use AI without losing operational control.
Our view is simple: AI should accelerate discovery, design, testing, knowledge capture, automation, and continuous improvement. It should not create a second, unsupported operating model beside IFS.
We help customers start with the workflow, not the tool. We clarify which processes belong in IFS, which should be handled through approved integrations, which can be supported through reporting or automation, and which ideas are still only prototypes. Then we focus on practical execution: configuration, service management, scheduling, mobile work execution, integrations, data ownership, testing, go-live readiness, adoption, and long-term optimization.
The goal is not to slow innovation down. The goal is to prevent today’s AI experiment from becoming tomorrow’s legacy system.
If your executives, operations teams, or business analysts are experimenting with Claude Code or other AI coding tools, the right question is not “Can we build this?”
The better question is: “Should this become part of how we run the business?”
Start there. Prototype quickly. Govern carefully. Scale what belongs in the enterprise platform. Keep the business moving without creating the next generation of unsupported applications. Contact us for more information.













