Dialogue Coding as a Generative Methodology for Non-Coder Specification and Audit of Complex Systems
Jan 8
Uri Vainberg
Independent Researcher and Developer, BiblioCave (www.bibliocave.com)
Corresponding author: unifiedgongfu@gmail.com
Abstract
This paper analyzes Dialogue Coding - a structured form of AI-mediated software creation - as a formal methodology for building complex, multi-file software. Dialogue Coding treats the prompt interface as an iterative, instructional discussion between the human architect and the generative model. The case study is BiblioCave, a multi-file Python desktop application architected and audited by a person with a STEM background but no traditional programming skills. The central finding is that by shifting the human role from mechanical syntax writing to Prompt-Based Architectural Auditing, the methodology enables rapid, bottom-up construction of deep system architecture. The process leverages the AI's failed code executions as precise diagnostic feedback for refining the architect's core specification, demonstrating the viability of this methodology for complex software construction by non-coders.
1. Introduction: Decoupling Architecture from Execution
The traditional barrier to complex software creation is proficiency in mechanical coding. The “Dialogue Coding” methodology proposes a paradigm shift: decoupling architectural specification from coding execution through rigorous use of Large Language Models (LLMs) [1], [2].
This study documents the creation of a multi-file Python application built entirely by an architect whose strengths lie in outlining, systems thinking, and spotting flaws, rather than in computer science or coding. The objective is to validate that the architect's maximum strength (architectural logic, UI/UX, feature specification) can be utilized while the AI handles the mechanical complexity and syntax, thus empowering non-coders to create functional, complex software. This AI-mediated approach focuses the human effort on diagnosing and refining the architecture via prompt iterations, a form of iterative prototyping accelerated by LLM-driven error diagnosis that treats LLM execution failures as diagnostic feedback rather than mere bugs [3], [4].
2. Methodology: The Dialogue Coding Pipeline
BiblioCave development followed a strict bottom-up, error-driven process in which the human role was limited to high-level system specification and architectural audit, treating the AI as the Code Executor.
2.1 Initial Specification and UI/UX-First Construction
The process began with graphical sketching of the UI (wireframing) using manual tools (pen and paper). This graphical skeleton was used to produce a full, meticulous characterization of the system, defining all necessary entities, properties, database information, controls, hierarchy, and the intended UI/UX (top bar, ribbon, workspace divisions, etc.).
The development sequence was strictly UI/UX-First Construction: the architect built the visual and functional frame before the database, so that the database implementation had a concrete, testable UI/UX context to reference and the user-facing requirements dictated the underlying architectural needs. This approach contrasts with traditional database-first methodologies.
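To make the UI/UX-first sequence concrete, the following is a minimal sketch of a frame built before any data layer exists. It is not taken from the BiblioCave codebase; PySide6 is assumed as the Qt binding, and the widget names and layout are invented placeholders.

```python
# Hypothetical UI/UX-first skeleton: top bar, ribbon, and workspace divisions
# exist before any database code does. Not the actual BiblioCave layout.
import sys
from PySide6.QtCore import Qt
from PySide6.QtWidgets import (
    QApplication, QLabel, QMainWindow, QSplitter, QToolBar, QVBoxLayout, QWidget,
)


class MainFrame(QMainWindow):
    """Visual and functional frame built ahead of the data layer."""

    def __init__(self):
        super().__init__()
        self.setWindowTitle("Workspace frame (sketch)")

        # Top bar: stands in for the application menu / title area.
        self.menuBar().addMenu("&File")

        # Ribbon: a plain toolbar acting as a placeholder ribbon.
        ribbon = QToolBar("Ribbon")
        ribbon.addAction("New Entity")
        ribbon.addAction("Connect")
        self.addToolBar(ribbon)

        # Workspace divisions: left panel and main canvas, backed by dummy
        # widgets because the database does not exist yet.
        splitter = QSplitter(Qt.Orientation.Horizontal)
        splitter.addWidget(self._panel("Entity tree (placeholder)"))
        splitter.addWidget(self._panel("Canvas (placeholder)"))
        self.setCentralWidget(splitter)

    @staticmethod
    def _panel(text: str) -> QWidget:
        panel = QWidget()
        layout = QVBoxLayout(panel)
        layout.addWidget(QLabel(text))
        return panel


if __name__ == "__main__":
    app = QApplication(sys.argv)
    frame = MainFrame()
    frame.show()
    sys.exit(app.exec())
```

A frame like this gives every subsequent database prompt a concrete UI/UX target to reference.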
2.2 The Auditing and Error-Driven Design Loop
The architect utilized LLMs capable of reading the entire codebase, enabling direct, context-aware prompt-based auditing of the multi-file Python structure. This necessitated the use of tools that maintained complete system context. The process was highly iterative and relied on the AI's execution failures to guide architectural refinement.
The methodology adhered to the following human-led sequence:
1. Planning and Outline: Detailed conceptualization of the features and structural requirements.
2. Bottom-Up Feature Construction: Building the foundational data and core architectural components first, then layering features (e.g., building basic entities before complex relational links).
3. Limits Testing (The Vast Prompt): A deliberate, unsupervised prompt was occasionally given to the AI to observe the resultant code and uncover the generative model's internal limits and architectural misinterpretations. This provided critical insight into necessary prompt constraints.
4. Supervised Refinement: The feature prompt was then refined and broken down into smaller, sequential steps, ensuring controlled, supervised generation.
5. Step-by-Step Validation: Each generated code chunk and feature was immediately tested by running the Python software, ensuring execution matched the architectural specification. This is a form of code auditing achieved through prompt refinement rather than automated debugging.
Execution failures were interpreted as evidence of incomplete or ambiguous architectural specification in the prompt, not as code errors [3], [5]. This turned traditional debugging into architectural refinement.
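As an illustration of the bottom-up construction in step 2, a foundational entity table can be generated and validated first, with relational links layered on in a later, separate prompt. The sketch below uses Python's sqlite3 with invented table and column names; it is not the published BiblioCave schema.

```python
# Sketch of bottom-up feature construction: basic entities first, relational
# links in a later step. Names are illustrative, not the real schema.
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: foundational data - basic entities only, no relations yet.
conn.execute(
    """CREATE TABLE entities (
           entity_id INTEGER PRIMARY KEY,
           kind      TEXT NOT NULL,   -- e.g. 'Note', 'MiniFrame', 'Line'
           x         REAL NOT NULL,
           y         REAL NOT NULL
       )"""
)

# Step 2 (a later, supervised prompt): relational links layered on top of the
# already-validated entity layer.
conn.execute(
    """CREATE TABLE links (
           link_id       INTEGER PRIMARY KEY,
           start_node_id INTEGER NOT NULL REFERENCES entities(entity_id),
           end_node_id   INTEGER NOT NULL REFERENCES entities(entity_id)
       )"""
)

conn.execute("INSERT INTO entities (kind, x, y) VALUES ('Note', 10, 20)")
conn.execute("INSERT INTO entities (kind, x, y) VALUES ('Note', 300, 40)")
conn.execute("INSERT INTO links (start_node_id, end_node_id) VALUES (1, 2)")
print(conn.execute("SELECT COUNT(*) FROM links").fetchone()[0])  # -> 1
```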
Example of Dialogue Coding Specification (Entity Connectivity): The interactive and instructional nature of Dialogue Coding is exemplified by the prompt required for entity connectivity. The architect did not request raw code, but mandated specific architectural behaviors, often discussing both technical components and data structure:
"Implement connectivity logic for linear and non-linear entities. Crucially, the connection itself must be treated as a database entity (ConnectionID, StartNodeID, EndNodeID) to maintain stability during object movement. Linear items must have snapping points every 50px. Non-linear items (Notes, MiniFrames) must expose 9 edge points. Snapping occurs when a drag operation is within 25px of a point. The connection must be permanent until a user explicitly breaks it; no automated breaking. No backend algorithm to break or search or adjust. Use Qt's signal communication for interaction logic.
"
Example of Dialogue Coding Specification (Maps Module): To demonstrate the complexity of prompting a full module with integrated UI, data, and I/O requirements, the Maps module specification was provided in a single, cohesive prompt:
"Next, build the Maps Module. This requires a 4-layered graphical system (Topography, Elements, Locations, Borders) with a hexagonal grid overlay. The Map entity must utilize a mapstorage property to save its state both as a standard PNG file and a proprietary layered format. Double-clicking the Map MiniFrame must open the specialized window8 editor. The architect is specifying the UI/UX colors to match window3 defaults. This complex prompt integrates file I/O, graphical structure, data storage, and UI flow."
This is followed by iterative prompts, treating the AI's output as part of an ongoing, structural discussion.
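As a rough illustration of how the mapstorage requirement might decompose, the sketch below assumes PySide6 and invents a stand-in layered format (one PNG per layer plus a JSON manifest); the real proprietary format, class names, and file layout are not published.

```python
# Hypothetical sketch of a dual-save 'mapstorage' routine: a flattened PNG
# plus a stand-in layered format. Not the real BiblioCave format.
import json
from PySide6.QtCore import QSize, Qt
from PySide6.QtGui import QImage, QPainter

LAYER_NAMES = ("Topography", "Elements", "Locations", "Borders")


def save_map(layers: dict[str, QImage], base_path: str, size: QSize) -> None:
    """layers maps each layer name to its QImage, bottom to top.
    (In a real application a QGuiApplication would normally be running.)"""
    # 1. Standard PNG: composite the four layers in order onto one image.
    flat = QImage(size, QImage.Format.Format_ARGB32)
    flat.fill(Qt.GlobalColor.transparent)
    painter = QPainter(flat)
    for name in LAYER_NAMES:
        painter.drawImage(0, 0, layers[name])
    painter.end()
    flat.save(base_path + ".png", "PNG")

    # 2. "Layered format": here simply one PNG per layer plus a JSON manifest,
    #    standing in for whatever proprietary format the real app uses.
    manifest = {"layers": []}
    for name in LAYER_NAMES:
        layer_file = f"{base_path}.{name.lower()}.png"
        layers[name].save(layer_file, "PNG")
        manifest["layers"].append({"name": name, "file": layer_file})
    with open(base_path + ".mapstorage.json", "w") as fh:
        json.dump(manifest, fh, indent=2)
```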
2.3 Time Scaling and Empirical Velocity
The methodology demonstrates extreme compression of development time. The overall ten-month timeline included significant non-productive periods and initial architectural setbacks:
· Initial Planning: Two months dedicated to outlining and initial document writing (considered a vital part of the architectural net time).
· Idling/Malfunction Loss: Two months of idling due to AI malfunctioning (e.g., service instability or unavailability).
· Inefficiency Loss: An estimated total of about two months lost to using non-context-aware AI, leading to inefficiency and necessary architectural reboots.
The net, focused development time to build the complete, multi-file desktop application, from zero knowledge to a fully functioning product including site deployment and write protection, was approximately six months. This speed was achieved only once the architect began working with a context-aware LLM capable of "Prompt-Based Architectural Auditing." The six-month timeline validates the "rapid" nature of the methodology compared to the 6-12+ months typically required for a complex application built with traditional, manual coding, especially considering that the architect had no prior software design or engineering background and was working completely solo [6], [7].
2.4 Architectural Diagnosis via Failure
Architectural failures were the sole cause of several reboots. Such a failure was detected when the architect entered a bug-fixing loop: failing to fix one bug produced a different bug, and the repetition of this loop became the signature of a major architectural failure. These flaws forced a reset of the entire construction, or at the very least of everything beyond the basic UI frame.
One such failure occurred during the attempt to build interactive linear entities (lines, arrows) without a defined data layer. The AI-generated code could not maintain connection stability while objects were moved. This failure served as immediate diagnostic feedback that forced the architect to redefine the database schema so that connections were treated as data tied to entities. Another reset was caused by the need to introduce entity relations (parent-child relationships) so that complex features such as cascading operations could be built.
3. Discussion: The Cognitive Shift
The Dialogue Coding methodology redefines the human role in software creation, shifting it from syntax production to architectural auditing and focusing effort on abstract thought rather than mechanical task execution [5], [8].
3.1 Shift in Cognitive Load
The architect's cognitive load shifted entirely from debugging syntax errors (which the AI handles) to auditing architectural integrity—a task aligned with high-level system thinking. The human is focused on the correctness of the relational structure (e.g., designing parent-child relationships to facilitate cascading operations) rather than the mechanics of the code required to implement it. This shift from coding to auditing is a recognized change in the software development paradigm.
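For example, a parent-child relational rule is the kind of structure the architect specifies and audits, while the cascading behavior it enables is mechanical. Below is a minimal sqlite3 sketch with illustrative names, not the published BiblioCave schema.

```python
# Sketch of a parent-child relational rule that enables cascading operations.
# Table and column names are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite cascades only with this on

conn.execute(
    """CREATE TABLE entities (
           entity_id INTEGER PRIMARY KEY,
           parent_id INTEGER REFERENCES entities(entity_id) ON DELETE CASCADE,
           kind      TEXT NOT NULL
       )"""
)

# A parent MiniFrame with two child Notes.
conn.execute("INSERT INTO entities VALUES (1, NULL, 'MiniFrame')")
conn.execute("INSERT INTO entities VALUES (2, 1, 'Note')")
conn.execute("INSERT INTO entities VALUES (3, 1, 'Note')")

# Deleting the parent cascades to its children: the architect audits the
# relational rule, not the SQL that implements it.
conn.execute("DELETE FROM entities WHERE entity_id = 1")
print(conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0])  # -> 0
```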
3.2 Dialogue Coding as Formal Specification
The process proves that the generative prompt functions as the formal architectural specification. The difficulty of the Dialogue Coding process lies in the high skill required for prompt precision and troubleshooting. Errors in the generated code are not code bugs; they are flaws in the architectural specification encoded in the prompt. The process is thus not about programming, but about continuously refining system logic through an iterative dialogue.
3.3. Methodological Reflection and Philosophical Stand
Given his lack of programming experience, the architect had to improvise the working procedure "on the fly." With those lessons learned, he would now start the actual database structure in parallel with, or one step behind, the UI/UX planning (conceptual data was already planned within the data-informed UI/UX). Nevertheless, the UI/UX-driven planning proved to be the correct approach, and a purely database-driven approach still produces a cognitive dissonance: the database should serve the UI/UX, which in turn exists solely to deliver user-centered software. The UI/UX-first approach, followed by database construction, ultimately led to coherent, cohesive planning.
3.4 Velocity Trade-Off
Traditional programming skills would have accelerated polish and visual quality, but were demonstrably unnecessary for achieving a complete, shippable product [7].
4. Results
The methodology produced BiblioCave, a commercial-grade software product that also serves as the proof of concept for Dialogue Coding.
A free trial is available for Windows at www.bibliocave.com.
5. Conclusion
Dialogue Coding provides a validated, reproducible methodology enabling individuals without programming training to specify and audit complex software systems using only natural-language dialogue with LLMs and rigorous architectural thinking.
References
[1] L. Chen et al., "A Survey on Evaluating Large Language Models in Code Generation Tasks," arXiv preprint arXiv:2408.16498, 2024.
[2] S. Kang, J. Yoon, and S. Yoo, “Large language models are few-shot testers: Exploring LLM-based general bug reproduction,” in Proc. IEEE/ACM 45th Int. Conf. Softw. Eng. (ICSE), 2023, pp. 2312–2324.
[3] M. Jha, J. Wan, H. Zhang, and D. Chen, "A Reinforcement Learning Framework for Code Verification via LLM Prompt Repair," in Proc. Great Lakes Symp. VLSI (GLSVLSI), Jun. 2025, pp. 1–6, doi: 10.1145/3716368.3735300.
[4] P. Straubinger, M. Kreis, S. Lukasczyk, and G. Fraser, "Mutation Testing via Iterative Large Language Model-Driven Scientific Debugging," Empir. Softw. Eng., 2025, arXiv:2503.08182.
[5] J. Keim, A. Kaplan, A. Koziolek, and M. Mirakhorli, "Does BERT Understand Code? – An Exploratory Study on the Detection of Architectural Tactics in Code," in Software Architecture (ECSA 2020), Springer, Cham, 2020, pp. 220–228, doi: 10.1007/978-3-030-58923-3_15.
[6] S. Zhang, J. Wang, G. Dong, J. Sun, Y. Zhang, and G. Pu, "Experimenting a New Programming Practice with LLMs," arXiv preprint arXiv:2401.01062, 2024.
[7] F. Lin, D. J. Kim, and T.-H. (P.) Chen, "SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents," arXiv preprint arXiv:2403.15852, Mar. 2024.
[8] H. An, Y. Kim, W. Seo, J. Park, D. Kang, C. Oh, D. Kim, and S. Lee, "AIAP: A No-Code Workflow Builder for Non-Experts with Natural Language and Multi-Agent Collaboration," arXiv preprint arXiv:2508.02470, Aug. 2025, doi: 10.48550/arXiv.2508.02470.




