The Hidden Risk of Incomplete AI Answers: Why AEC Cannot Afford Half-Truths

10 January 2026 · 12 min read · Governance & Risk

The most dangerous moment in the deployment of artificial intelligence is not when systems fail catastrophically. It is when they deliver partial truths with complete confidence. In the architecture, engineering, and construction (AEC) sectors, this distinction matters profoundly. Incomplete information doesn’t just underperform. It kills.

The Mechanics of Incomplete Information

Current large language models and generative AI systems operate through statistical pattern matching rather than true comprehension or reasoning. When confronted with an incomplete dataset, a novel scenario, or a knowledge gap, these systems don’t acknowledge absence. They fabricate. This isn’t a bug that can be fixed by model scaling—it’s inherent to how probabilistic AI functions.

The Alarming Data

Research demonstrates this consistently:

OpenAI's own testing has documented hallucination rates of between 33% and 79% for ChatGPT-class models, depending on the task

In code generation, AI assistants achieve accuracy rates of 50-65%, with approximately 1 in 5 suggestions containing factual errors

The problem intensifies as task complexity increases. Medical AI systems and engineering-domain applications show similar degradation patterns.

The Real Problem: Confidence Without Accuracy

The critical failure isn’t the error rate itself. It’s the confidence with which incorrect answers are delivered.

These systems don’t signal uncertainty. They present conjectures as certainties because the model optimises for fluency and coherence rather than factual verification.

When an AI can’t determine whether a calculation is correct, it doesn’t say “I don’t know.” It states “The answer is X” with the same textual confidence as when the answer is genuinely X.

Why AEC Is Categorically Different

Construction and engineering workflows are not information-seeking conversations. They are deterministic domains where half-answers are half-failures.

The Stakes in AEC

In AEC, the consequences of incomplete information are absolute:

Incomplete structural load calculations → the structure fails

Omitted soil conditions in geotechnical assessments → foundations fail

Overlooked edge cases (seismic activity, thermal expansion, material degradation) → irreversible operational and safety consequences

Generative AI systems are architected for probabilistic inference. They generate varied outputs for identical inputs because they operate through sampling from probability distributions. This adaptive variability—celebrated in creative and exploratory applications—is a liability in deterministic domains.
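The sampling behaviour described above can be sketched in a few lines of Python. This is a toy illustration, not how any particular model is implemented: the candidate answers, their probabilities, and the names TOKEN_PROBS, sample_answer, and greedy_answer are all invented for the example.

```python
import random

# Hypothetical next-token distribution for one fixed prompt.
# The candidate values and probabilities are invented for illustration.
TOKEN_PROBS = {"150": 0.55, "200": 0.30, "250": 0.15}


def sample_answer(rng: random.Random) -> str:
    """Probabilistic decoding: draw one candidate in proportion to its probability."""
    candidates = list(TOKEN_PROBS)
    weights = list(TOKEN_PROBS.values())
    return rng.choices(candidates, weights=weights, k=1)[0]


def greedy_answer() -> str:
    """Deterministic decoding: always return the most probable candidate."""
    return max(TOKEN_PROBS, key=TOKEN_PROBS.get)


rng = random.Random(42)  # seeded only so the demonstration is repeatable
sampled = {sample_answer(rng) for _ in range(1000)}
greedy = {greedy_answer() for _ in range(1000)}

print("distinct sampled answers:", sorted(sampled))
print("distinct greedy answers:", sorted(greedy))
```

The same input yields several different sampled answers but only one greedy answer. Note that removing the variability does not make the answer correct: a repeatable wrong value is still wrong, which is why the later sections pair determinism with verification.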

Construction companies have recognised this risk. The industry has adopted a cautious “wait and see” approach precisely because AI lacks the reasoning capacity required for safety-critical applications. The phrase often used is revealing: “Stakes are too high when crafting structures where people will live or work, and the industry cannot afford to rely on AI outputs that may contain wildly incorrect information.”

The Liability Paradox

Here lies a structural problem that extends beyond technical capability: AI assumes no responsibility for its outputs. Responsibility remains with the operator, the deployer, and the engineering firm.

Under existing law in most jurisdictions, construction companies have a legal duty of care. If an AI system is deployed without rigorous testing, validation, and human oversight, and if that system fails to detect a structural hazard or provides an incomplete analysis of load-bearing capacity, negligence claims attach to the firm deploying the system—not to the system itself.

Product liability may shift to the software developer if a defect in the AI system directly causes failure. However, construction firms remain subject to strict liability if they deploy the system without conducting sufficient validation or adhering to usage guidelines.

This creates an accountability asymmetry: the firm bears unlimited liability whilst the AI system bears none. The system cannot be held negligent. It cannot defend its reasoning. It cannot provide testimony in court about what information it considered or rejected. The operator is entirely responsible.

Contracts with AI vendors rarely resolve this issue adequately. Most vendor agreements include clauses that disclaim liability, require indemnification from the deployer, and restrict the types of claims construction firms can bring. In practice, when an AI system fails catastrophically, the deploying organisation faces the legal, financial, and reputational consequences alone.

The Regulatory Reality

Emerging regulatory frameworks are beginning to acknowledge this danger, though implementation lags behind urgency.

The European Union AI Act imposes specific obligations on providers and deployers of “high-risk” AI systems—a classification that includes AI systems used as safety components in construction and built-environment systems. The regulatory requirements are explicit:

Risk management system. Providers must establish iterative risk management throughout the entire AI system lifecycle. This includes identifying known and reasonably foreseeable risks to safety and health; implementing appropriate, targeted measures to address those risks; and conducting post-market monitoring analyses to identify emerging risks.

Data governance. Training, validation, and testing datasets must be representative, sufficiently complete, and systematically assessed for bias and error. This is not optional verification; it is mandatory pre-deployment assurance.

Accuracy and robustness. High-risk systems must achieve an appropriate level of accuracy, robustness, and cybersecurity, with declared accuracy metrics provided in instructions for use. Systems must perform consistently throughout their operational lifecycle and remain resilient to errors, faults, and inconsistencies.

Explainability. The system must be transparent enough that humans can understand, validate, and justify its decision-making process. Explainability is not a post-deployment nice-to-have; it is a legal requirement for accountability.

Continuous monitoring. Systems that continue learning after deployment must eliminate or reduce the risk of biased outputs creating biased inputs for future operations. Feedback loops must be documented and mitigated.

These requirements exist because regulators understand the fundamental mismatch between probabilistic AI and deterministic requirements.

What Deterministic Demands

The alternative to probabilistic AI is a deterministic architecture: systems that produce identical outputs for identical inputs, operate according to explicitly defined rules, can be audited and traced, and explicitly acknowledge when they lack sufficient information to produce an answer.

Deterministic systems cannot perform creative generation, speculative design, or exploratory analysis. They excel at structured tasks that require consistency, such as security scanning, compliance checking, rule-based risk assessment, and automated verification against explicit standards. More critically, they provide traceability. An engineer can inspect every decision point, understand every inference, and document every assumption.

This is computationally simpler and intellectually less impressive than contemporary generative AI. But it is fundamentally more honest. When a deterministic system cannot complete a task, the operator knows why. When it produces an output, the operator can explain the reasoning and defend it before a regulator or in court.
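As a minimal sketch of these properties, the Python below implements a single rule-based check that returns the same result for the same inputs, records each decision point, and reports missing data instead of guessing. The function name check_utilisation, the input names, and the 0.9 utilisation limit are assumptions made for illustration and are not taken from any design code or from a specific product.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CheckResult:
    status: str              # "PASS", "FAIL", or "INSUFFICIENT_DATA"
    trace: Tuple[str, ...]   # ordered record of every decision point


def check_utilisation(applied_load_kn: Optional[float],
                      capacity_kn: Optional[float],
                      max_utilisation: float = 0.9) -> CheckResult:
    """Rule-based check; the 0.9 limit is a placeholder, not a design-code value."""
    trace = []
    inputs = {"applied_load_kn": applied_load_kn, "capacity_kn": capacity_kn}
    missing = [name for name, value in inputs.items() if value is None]
    if missing:
        # Acknowledge the gap explicitly rather than producing an answer.
        trace.append("missing inputs: " + ", ".join(missing))
        return CheckResult("INSUFFICIENT_DATA", tuple(trace))
    if capacity_kn <= 0:
        trace.append(f"capacity_kn={capacity_kn} is not a positive value")
        return CheckResult("INSUFFICIENT_DATA", tuple(trace))

    utilisation = applied_load_kn / capacity_kn
    trace.append(f"utilisation = {applied_load_kn} / {capacity_kn} = {utilisation:.3f}")
    status = "PASS" if utilisation <= max_utilisation else "FAIL"
    trace.append(f"rule: utilisation <= {max_utilisation} -> {status}")
    return CheckResult(status, tuple(trace))


result = check_utilisation(80.0, 100.0)
print(result.status)
for step in result.trace:
    print(" ", step)
print(check_utilisation(None, 100.0).status)
```

The trace is the part an engineer would inspect or hand to a reviewer. A real check would draw its limits and load combinations from the governing standard, which this sketch deliberately leaves out.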

For AEC applications, the distinction is not academic. Structural design optimisation, load-bearing verification, material selection validation, and safety assessment must operate in this deterministic mode. They can be augmented by generative AI for exploratory phases—suggesting design variants, identifying novel material combinations, and generating planning scenarios. But the moment the output enters the domain of safety-critical decision-making, it must transition to a deterministic, auditable, and human-reviewable space.

The Non-Negotiable Requirements

Before deploying any AI system in AEC contexts, organisations must establish explicit, verified, and enforceable safeguards:

1. Structured human-in-the-loop oversight. AI outputs must not feed directly into construction or engineering decisions. Instead, qualified domain experts must review, validate, and either approve or reject each material output. One study of human-in-the-loop validation for generative AI in medical settings found that structured oversight increased error detection from 18.9 per cent to 91.5 per cent. This is the difference between acceptable and unacceptable risk.

2. Explicit accuracy and completeness criteria. Before deployment, define what “complete” means for each use case. What datasets must be present? What edge cases must be addressed? What accuracy threshold is required for the system to be considered reliable? These criteria must be documented, measured, and verified before the system operates at scale.

3. Traceability and auditability throughout the lifecycle. From data ingestion through model training, validation, deployment, and ongoing operation, the entire system must be logged, documented, and auditable. This is not for regulatory compliance alone; it is an operational necessity. When the system produces unexpected output, the audit trail allows engineers to understand why and either correct the system or reject its conclusion.

4. Explainability tied to every material decision. If the system recommends a design modification, specifies a material, or flags a risk, the recommendation must be accompanied by a clear, technically defensible explanation of the reasoning. This explanation must be understandable by domain engineers without AI expertise. Techniques such as SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-Agnostic Explanations), and other feature attribution methods must be embedded in deployment, not applied retrospectively.

5. Continuous monitoring and post-deployment assurance. Deployment is not the end of verification. Live performance must be continuously monitored against expected behaviour. Deviations must trigger investigation, documentation, and corrective action. Systems that learn or adapt after deployment must include mechanisms to prevent feedback loops where biased initial outputs contaminate subsequent training data.

6. Clear contractual responsibility allocation. Contracts with AI vendors must explicitly define:

Who is responsible for system validation and testing

Who bears liability if the system fails to detect a hazard or provides an incomplete analysis

What warranties the vendor provides regarding accuracy, completeness, and suitability for the intended use

What indemnification protections the deploying organisation retains

What audit rights the deployer maintains over model training and data

Vendor-supplied liability disclaimers are not sufficient. They do not eliminate the deployer’s own legal and ethical obligations to exercise due diligence.
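Three of the safeguards above (completeness criteria, human approval, and an audit trail) can be combined in a small release gate. The Python below is a sketch under stated assumptions: the required fields, the decision labels, and the names AuditLog and release_output are hypothetical, and the hash chain is one simple way to make after-the-fact edits to a log detectable, not a mandated mechanism.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical completeness criteria for one use case.
REQUIRED_FIELDS = ("load_case", "soil_report_id", "seismic_zone")
GENESIS_HASH = "0" * 64


def _digest(prev_hash: str, payload: str) -> str:
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def append(self, event: dict) -> str:
        """Store a copy of the event, chained to the previous entry's hash."""
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        payload = json.dumps(event, sort_keys=True)
        entry_hash = _digest(prev_hash, payload)
        self.entries.append(
            {"event": json.loads(payload), "prev": prev_hash, "hash": entry_hash}
        )
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks every later hash."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, payload):
                return False
            prev_hash = entry["hash"]
        return True


def release_output(ai_output: dict, reviewer: Optional[str],
                   approved: bool, log: AuditLog) -> str:
    """Gate an AI output: completeness first, then a named human decision."""
    missing = [name for name in REQUIRED_FIELDS if ai_output.get(name) is None]
    if missing:
        decision = "REJECTED_INCOMPLETE"
    elif reviewer is None:
        decision = "PENDING_REVIEW"
    elif approved:
        decision = "APPROVED"
    else:
        decision = "REJECTED_BY_REVIEWER"
    log.append({"decision": decision, "missing": missing,
                "reviewer": reviewer, "output": ai_output})
    return decision


log = AuditLog()
draft = {"load_case": "LC1", "soil_report_id": None, "seismic_zone": 2}
print(release_output(draft, "engineer_a", True, log))
print(log.verify())
```

In this example the output is rejected as incomplete even though a reviewer approved it, because the soil report reference is missing. Every decision, including rejections, is written to the log, so the trail shows what was considered as well as what was released.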

The Accountability Imperative

The reason these requirements exist is not to impose a regulatory burden for its own sake. It is because the stakes are genuinely unforgiving.

When AI systems fail in low-consequence environments—generating suboptimal content, missing minor analysis points, creating inefficiency—the cost is inconvenience and rework. When AI systems fail in AEC environments, the consequences can include injury, death, or catastrophic infrastructure failure. This is not hyperbole. Structural failures kill. Geotechnical miscalculations displace communities. Safety assessment failures create conditions in which workers die unnecessarily.

AI will not be held responsible for these outcomes. The system will not appear in court. It will not face negligence claims. The operator will. The firm will. The engineers will. The individuals who deployed and supervised the system will face scrutiny, liability, and possibly criminal negligence charges if the failure was foreseeable and inadequately mitigated.

This is not a reason to reject AI in AEC. It is a reason to be uncompromising about how it is deployed, verified, and governed.

The Path Forward

The path is neither “embrace AI without reservation” nor “abandon AI entirely.” It is: deploy AI in AEC only within a governance framework that acknowledges the domain’s deterministic safety requirements and establishes verification practices that exceed regulatory minimums.

This means:

Using probabilistic generative AI exclusively for exploratory, non-binding analysis.

Transitioning to deterministic systems for all safety-critical decisions.

Implementing structured human-in-the-loop validation for every material output.

Maintaining complete audit trails and explainability mechanisms.

Continuously monitoring deployed systems and responding to deviations.

Allocating clear contractual and operational responsibility for failures.

Training teams to understand the difference between “the AI recommended this” and “this recommendation is complete and safe and has been independently validated.”

The last point deserves emphasis. The most dangerous phrase in AI-enabled engineering is “the system said so.” That is not professional engineering judgment. That is an abdication of responsibility. The required phrase is: “We have reviewed the system’s analysis, considered the underlying data and logic, conducted independent verification, and determined this conclusion is sound.”

This requires more work than blind reliance on AI output. It requires discipline. It requires organisations deploying AI to maintain genuine expertise in the domains in which they deploy it. It requires pushback against the narrative that AI replaces expert judgment.

But it is the only approach that makes sense when lives depend on getting the answer right, and when no one else will bear the responsibility if the answer is wrong.

The guard cannot be lowered. It must be installed, tested, documented, and continuously monitored. AI is a powerful tool. But in AEC, it is a tool that requires more, not less, expert human oversight.


References:

Talkspace (2025). The Dangers of ChatGPT Hallucinations. OpenAI testing documented hallucination rates of 33-79%.

Augmentcode (2025). Deterministic AI for Predictable Coding. AI coding assistants achieve 50-65% accuracy with 1 in 5 suggestions containing factual errors.

AI21 Labs (2025). Specific risks of AI hallucinations. LLMs optimised for fluent language rather than factual verification.

Kubiya AI (2025). Deterministic AI Architecture: Why It Matters. Deterministic systems produce identical outputs for identical inputs; probabilistic systems generate varied responses.

Graitec (2024). AI Meets Sustainability: Innovations in AEC and MFG. Construction firms adopting cautious “wait and see” approach due to unreliability risks.

Construction Legal Services (2025). AI and the Construction Industry. Liability question complex; firms need contractual protections and clear responsibility definitions.

Insulation.org (2025). Legal Risks of AI in Construction. Negligence claims possible if insufficient testing or blind reliance; strict liability applies if deployed without adequate validation.

Keysight Technologies (2026). Building Trust Into AI for Safety Critical Systems. EU AI Act and ISO PAS 8800 mandate transparency, traceability, and risk-based validation.

European Commission (2024). Classification Rules for High-Risk AI Systems. Risk management system mandatory throughout entire lifecycle.

EU AI Act (2024). Article 10. Data governance requirements: representative, error-free datasets with bias assessment.

EU AI Act (2024). Article 10-15. Accuracy, robustness, cybersecurity requirements with declared metrics and consistent performance throughout lifecycle.

Port.io (2024). What is Deterministic AI? Deterministic systems operate on predefined rules; identical inputs produce identical outputs.

eajournals (2025). Human-in-the-Loop Architectures for Validating GenAI. Structured HITL validation increased error detection from 18.9% to 91.5% in medical AI.

Journals MRI India (2025). Explainable AI for Critical Infrastructure Monitoring and Control. Feature attribution methods (SHAP, LIME, counterfactual explanations) essential for transparency.

EU AI Act (2024). Articles 15(4) and 72. Post-market monitoring and continuous feedback loop management mandatory for systems that learn after deployment.