AITF M1.30-Art01 · v1.0 · Reviewed 2026-04-06 · Open Access

AI Code Generation: Quality and Security


Article 1 of 4

This article describes the principal risk categories AI code generation introduces, the governance and operational practices that mitigate them, and the cultural shifts engineering organisations must navigate as AI becomes a normal participant in code production.

The Productivity Case and Its Caveats

AI code generation reliably accelerates routine engineering work: boilerplate generation, syntax recall, test case scaffolding, and exploratory prototyping. Multiple studies, including the GitHub research at https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/, document material productivity improvements for these tasks.

The productivity benefit is uneven. Empirical studies including the METR randomised trial at https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ have shown that for experienced developers working on familiar codebases, AI assistance can actually reduce productivity, even though the developers themselves perceive the opposite. The reasons include time spent reviewing and integrating AI suggestions and the cognitive overhead of context-switching. Governance should not assume uniform productivity gain.

Beyond productivity variance, AI code generation introduces several risk categories that traditional engineering tools do not.

Risk Categories

Code Quality Risk

AI-generated code is sometimes plausible but wrong: subtle bugs, edge case failures, performance pathologies, or violations of project conventions. Code that compiles and passes the tests the AI generated for itself can fail in production scenarios.
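A minimal illustration of the pattern, in Python; the function and the tests are hypothetical, constructed to show how AI-written tests can validate only the cases the AI happened to consider:

```python
def median(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]  # subtle bug: wrong for even-length inputs

# AI-generated tests happen to use odd-length inputs, so they pass:
assert median([3, 1, 2]) == 2
assert median([5, 1, 4, 2, 3]) == 3

# Production data with an even-length list exposes the bug:
# median([1, 2, 3, 4]) returns 3, but the median is 2.5.
```

The code compiles, reads plausibly, and passes its own tests; only an input class the generated tests never exercised reveals the defect.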

Security Vulnerability Risk

AI-generated code can include security vulnerabilities: SQL injection, cross-site scripting, insecure deserialisation, weak cryptographic patterns. Studies including academic research on Copilot suggestions have documented elevated rates of certain vulnerability classes in AI-generated code. The OWASP Top 10 patterns at https://owasp.org/www-project-top-ten/ remain relevant — perhaps more so when developers accept AI suggestions less critically than handwritten code.
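A sketch of the most common injection pattern and its parameterised alternative, using Python's built-in sqlite3 module:

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Typical AI-suggested shape: string interpolation into SQL.
    # A username of "x' OR '1'='1" would return every row (SQL injection).
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # Parameterised query: the driver handles escaping of the value.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Both versions work for benign inputs, which is precisely why an uncritical review of the first one passes.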

Intellectual Property and Licence Risk

AI code models are trained on large corpora of public code, much of it under licences that impose conditions on derivative works. Generated code may resemble training-corpus code in ways that create licence-compatibility risk for the consuming codebase. The U.S. Copyright Office Report on Copyright and AI at https://www.copyright.gov/ai/ describes the unsettled legal landscape; practical risk management requires defence-in-depth.

Confidential Information Exposure

When developers query AI tools, the prompts may include proprietary code, internal architecture details, or business-confidential information. Vendor terms of service vary on whether queries can be used for model training; insufficient configuration creates exposure.
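A minimal sketch of a pre-send check, assuming a hypothetical gating point between the developer and the external tool; the patterns shown are illustrative, not exhaustive, and production DLP needs broader rules:

```python
import re

# Illustrative secret patterns; a real DLP control would use a
# maintained rule set and entropy-based detection as well.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                       # AWS access key ID shape
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),  # PEM private keys
    re.compile(r"(?i)(api[_-]?key|password)\s*[:=]\s*\S+"), # credential assignments
]

def prompt_is_safe(prompt: str) -> bool:
    """Return False if the prompt appears to contain a secret."""
    return not any(p.search(prompt) for p in SECRET_PATTERNS)
```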

Skill Atrophy

Long-term over-reliance on AI generation can erode developer skill in ways that surface only in the situations AI cannot handle well — debugging novel failures, designing new architectures, working with unfamiliar technologies. The cultural and capability dimension is harder to measure but worth attention.

Governance Patterns

AI Tool Approval

A formal approval process for AI coding tools, including security review, licence review of the tool itself, and configuration of organisational settings (code retention, training opt-out, model selection). Approved tools are documented; use of unapproved tools is a policy violation.

Acceptable Use Policy

A written policy covering what can be sent to AI tools (and what cannot), expectations for review of AI output, attribution and retention requirements for AI-generated code, and the consequences of policy violation. The policy should be specific enough to be actionable.

Configuration Standards

Approved tools deployed with organisational configuration: enterprise model variants where available, training opt-out enabled, code retention disabled where applicable, IP indemnification provisions activated.
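A sketch of a baseline-conformance check, assuming hypothetical setting names; the actual settings and how they are read depend on each vendor's administration interface:

```python
# Organisational baseline for approved-tool configuration (field names
# are assumptions for illustration, not a real vendor schema).
REQUIRED_SETTINGS = {
    "training_opt_out": True,
    "code_retention_enabled": False,
    "enterprise_model": True,
    "ip_indemnification": True,
}

def config_violations(actual: dict) -> list[str]:
    """Return the settings that deviate from the organisational baseline."""
    return [
        key for key, required in REQUIRED_SETTINGS.items()
        if actual.get(key) != required
    ]
```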

Code Review Discipline

AI-generated code reviewed at least as carefully as handwritten code, with explicit indication in the pull request that AI was used. The reviewer pattern of “did the human author understand this?” applies to AI-generated submissions even when the same human is the submitter.
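One way to make the indication enforceable rather than voluntary is a CI gate. A minimal sketch, assuming a hypothetical PR_BODY environment variable and marker text; adapt both to the organisation's CI system and pull request template:

```python
import os
import sys

# Accepted declarations in the PR description (illustrative convention).
MARKERS = ("AI-assisted: yes", "AI-assisted: no")

def main() -> int:
    body = os.environ.get("PR_BODY", "")
    if not any(marker in body for marker in MARKERS):
        print("PR must declare AI assistance (add 'AI-assisted: yes/no').")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```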

Dependency and Provenance Tracking

AI-generated code that introduces new dependencies, replicates patterns from external sources, or implements known algorithms should carry the same provenance documentation as code adapted directly from external sources.

Security Testing

AI-generated code subject to standard security testing pipelines: static analysis, dependency scanning, dynamic testing. The OWASP Application Security Verification Standard at https://owasp.org/www-project-application-security-verification-standard/ provides reference test categories.
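A minimal pipeline sketch, assuming the bandit static analyser and pip-audit dependency scanner are installed; substitute the organisation's approved tooling:

```python
import subprocess
import sys

# Run each scanner over the whole codebase, AI-generated or not.
CHECKS = [
    ["bandit", "-r", "src", "-q"],  # static analysis for Python source
    ["pip-audit"],                  # known-vulnerable dependency check
]

def main() -> int:
    failures = [cmd for cmd in CHECKS
                if subprocess.run(cmd).returncode != 0]
    for cmd in failures:
        print(f"security check failed: {' '.join(cmd)}", file=sys.stderr)
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```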

Operational Practices

Differentiated Risk Tiers

Different code categories warrant different oversight intensity. Code in critical systems (authentication, payment processing, safety-critical control) warrants more careful AI review than code in throwaway internal tooling. The risk tiering should be explicit.
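A sketch of one way to make the tiering explicit, mapping repository paths to oversight levels; the paths and tier rules are illustrative:

```python
from enum import Enum

class Tier(Enum):
    CRITICAL = "critical"  # e.g. second reviewer plus security sign-off
    STANDARD = "standard"  # normal review
    LOW = "low"            # lightweight review

# First matching prefix wins; anything unmatched gets the standard tier.
TIER_RULES = [
    ("src/auth/", Tier.CRITICAL),
    ("src/payments/", Tier.CRITICAL),
    ("tools/internal/", Tier.LOW),
]

def tier_for(path: str) -> Tier:
    for prefix, tier in TIER_RULES:
        if path.startswith(prefix):
            return tier
    return Tier.STANDARD
```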

Pair Review Pattern

Some organisations require that AI-generated code be reviewed by a developer who did not author it before merge. The pattern adds friction but catches issues that author review misses.

Test Generation Discipline

Tests generated by AI should not be the only validation of code generated by the same AI. The pattern of “AI wrote the code and AI wrote the tests” produces tests that pass for the wrong reasons. Independent test design — even if also AI-assisted — provides better validation.
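A sketch of independent, property-based validation using the hypothesis library; the median function (with the even-length case now handled) stands in for the implementation under test:

```python
from hypothesis import given, strategies as st

def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

@given(st.lists(st.integers(), min_size=1))
def test_median_is_bounded(values):
    # Property stated independently of the implementation:
    # the median must lie between the minimum and the maximum.
    assert min(values) <= median(values) <= max(values)
```

Because the property is stated without reference to how the code was written, it does not inherit the implementation's blind spots the way regenerated example tests do.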

Model Selection

Different AI models have different code quality and security profiles. Model selection should consider security testing results, language coverage, and integration with the organisation’s development environment. Re-evaluation should happen as models update.

Logging for Investigation

Logging of AI tool usage at sufficient granularity to support post-incident investigation: which developer used which tool, what context was provided, what suggestion was accepted. The logs need not capture every keystroke but should capture material patterns.
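An illustrative record structure for such logging; the field names are assumptions about what a post-incident investigation typically needs:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class AIUsageRecord:
    timestamp: datetime
    developer: str           # who used the tool
    tool: str                # which approved tool and version
    repository: str          # where the suggestion landed
    context_summary: str     # what context was provided (summarised, not raw)
    suggestion_accepted: bool
```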

Vendor Contract Provisions

Vendor contracts should address: training data exclusion of customer prompts, IP indemnification for generated code, audit rights, data residency, security standards, breach notification, and termination provisions.

The Indemnification Question

Several major AI coding tool vendors offer IP indemnification for code generated through their service. The terms vary materially. Some indemnify only when the customer enables specific filters; some exclude open-source dependencies; some have caps. Reading the indemnification terms carefully is essential. The U.S. Federal Trade Commission has signalled enforcement attention on AI claims at https://www.ftc.gov/business-guidance/blog including indemnification claims that prove illusory.

Even with indemnification, the operational consequences of an IP dispute (litigation discovery, codebase remediation, customer notification) typically far exceed any direct legal cost. Indemnification reduces but does not eliminate IP risk.

Cultural Considerations

AI code generation changes engineering culture in ways governance should anticipate.

Skill Development

Junior developers who learn programming with AI assistance may develop differently from those who learn without it. Onboarding and training programs should address this explicitly, including periods of AI-free practice for skill foundation.

Author Identity and Credit

Pull requests authored with substantial AI assistance raise questions about attribution, performance evaluation, and code ownership. Explicit norms should be established.

Code Review Workload

If AI generates more code, code reviewers handle more code per unit time. Without proportional adjustment to review capacity, review quality degrades.

Commit Practices

Cultural norms around what constitutes a unit of work, what is committable, and what crosses the line into “automated code generation that should be controlled differently” need to be established locally.

Common Failure Modes

The first is uncontrolled tool sprawl — developers using whatever AI tool they prefer, with no standard configuration or oversight. Counter with formal tool approval and configuration standards.

The second is security blind spot — AI-generated code receives less security review on the assumption that AI “knows what it’s doing.” Counter by treating AI-generated code as untrusted input requiring full security review.

The third is prompt confidentiality leakage — developers pasting proprietary code into AI tools without realising the implications. Counter with clear policy, technical controls (DLP, network segmentation), and training.

The fourth is test theatre — AI-generated tests that look thorough but exercise only the AI-generated implementation, missing the real edge cases. Counter with independent test design and code coverage analysis that goes beyond statement coverage.

Looking Forward

The next article in Module 1.30 turns to AI for software testing — a related but distinct discipline that shares some governance considerations with code generation and adds others specific to test strategy and quality assurance.


© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.