Background

What I wanted to see from a coding model was not whether it could produce a single UI fragment. I wanted to know whether it could maintain consistency across a multi-page, multi-file structure without losing coherence by the end. A task that asks only for index.html typically ends with a hero section and a few card components, which says little about cross-file consistency.

To test the coding ability of a Qwen3.5-class model, I gave it a concrete web production task: a static marketing website for a dental clinic. Not a single-page landing page, but a six-page build with independent HTML files per top-level navigation item. The tech stack was limited to HTML, Tailwind CSS, and Alpine.js, with no build step and no SPA approach allowed.

The constraints were clear: use only these three technologies, avoid a build step, support multilingual behavior at the UI level, and preserve maintainability and SEO. When the delivery window is short, the important decision is not how many tools to add, but how much complexity to avoid.

Objectives

  1. Evaluate whole-site design ability for a static multi-page build
  2. Check responsiveness to accessibility requirements (ARIA attributes, focus management, contrast)
  3. Examine implementation accuracy of lightweight front-end interactions (Alpine.js)
  4. Record agent-like behavior around error recovery and environment adaptation

Test Design

Technical Constraints

  • Plain HTML files only (no build step)
  • Tailwind CSS via CDN
  • Alpine.js via CDN (lightweight interactions only)
  • No external UI component libraries
  • No reuse of reference site text or imagery

Required Page Structure

File | Content | Key Interactions
index.html | Hero, service cards, doctor previews, news, CTA | None (static)
services.html | 12+ treatment cards, category filter | Alpine.js filter, FAQ accordion
doctors.html | 6+ profiles | Expandable detail UI
info.html | Pricing table, insurance, payment methods | Estimate modal
visit.html | First-visit flow, booking funnel | Validation states
access.html | Address, transit, map | Route-mode tabs (car/train/walk)

Shared Layout Requirements

  • Header: clinic name, primary nav, language-switch UI
  • Mobile nav: Alpine.js hamburger menu, focus management, ESC-to-close
  • Book Appointment: in-header on desktop, fixed bottom action on mobile
  • Footer: clinic info, privacy/terms links, copyright
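
The language-switch UI in the header can be kept at the UI level with a small string dictionary. The following is a minimal sketch, not the generated code: the `messages` object, its keys, the two locales, and the `languageSwitch` name are all assumptions for illustration. It returns a plain object in the shape Alpine.js accepts for x-data.

```javascript
// Hypothetical UI strings; the keys and the two locales are assumptions.
const messages = {
  en: { book: "Book Appointment", menu: "Menu" },
  ja: { book: "予約する" }, // "menu" is left out on purpose to show the fallback
};

// Factory intended for x-data="languageSwitch()" (name is hypothetical).
function languageSwitch(defaultLocale = "en") {
  return {
    locale: defaultLocale,
    // Switch only to locales that exist in the dictionary.
    setLocale(next) {
      if (Object.prototype.hasOwnProperty.call(messages, next)) {
        this.locale = next;
      }
    },
    // Look up a string: current locale, then default locale, then the key itself.
    t(key) {
      return messages[this.locale]?.[key] ?? messages[defaultLocale]?.[key] ?? key;
    },
  };
}
```

Falling back to the default locale keeps a partially translated page usable, which matters when only the UI chrome is translated and not the page content.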

The value of this prompt design is that page count is fixed, Alpine.js interaction types are numerous, and accessibility requirements are explicit. It makes it easy to compare “how completely, how coherently, and how cleanly” different models finish the same task.

Execution Results

Error Recovery Behavior

The model did not start cleanly. The first “Create directory site” attempt failed with the error “Path to create was outside the project.”

However, the recovery was straightforward:

  1. Called list_allowed_directories
  2. Confirmed / was listed as an allowed root
  3. Retried with explicit absolute paths: /Users/ksh3/Development/sandbox7/site and site/assets

It did not ignore the failure or rephrase it away. It checked the allowed scope and continued. As agent behavior under tool constraints, that is reasonable.
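
The recovery logic amounts to "resolve the path, check it against the allowed roots, then retry". Here is a rough sketch of that check under my own assumptions: the function names are hypothetical, paths are POSIX-style, and ".." segments are not handled. It is not how the tool itself validates paths, which I did not inspect.

```javascript
// True when `target` is the root itself or lies beneath it.
function isUnderRoot(target, root) {
  const base = root.endsWith("/") ? root : root + "/";
  return target === root || target.startsWith(base);
}

// Resolve a possibly relative path against a working directory.
// Assumption: POSIX separators, no ".." normalization.
function toAbsolute(path, cwd) {
  if (path.startsWith("/")) return path;
  return cwd.replace(/\/+$/, "") + "/" + path;
}

// Return the absolute path to retry with, or null if no allowed root covers it.
function planRetry(requested, cwd, allowedRoots) {
  const absolute = toAbsolute(requested, cwd);
  return allowedRoots.some((root) => isUnderRoot(absolute, root)) ? absolute : null;
}
```

With "/" as the only allowed root, any absolute path passes, which matches what the model observed before retrying.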

Generated Files

The model wrote each HTML file one at a time, added README.md, listed the directory to verify the structure, and then added assets/placeholder.svg. The processing order was:

  1. Create directory structure
  2. Generate each page (index -> services -> doctors -> info -> visit -> access)
  3. Write README
  4. List files to confirm existence
  5. Add supporting asset

Self-Reported Content

Page | Reported Content
index.html | Hero, 8 service cards, 6 Why Choose Us points, 4 doctor previews, 3 news items, hours/insurance summary, CTA band
services.html | 16 treatment cards, category filter, 6-item FAQ accordion
doctors.html | 6 doctor profiles with expandable details
info.html | 8-row pricing table, cost-breakdown modals, insurance and payment methods
visit.html | 6-step first-visit timeline, reservation methods, validation states
access.html | Contact details, hours, route-mode tabs

Page-Level Implementation Details

The six-page structure maps well to the content expectations of a dental clinic website. Splitting the site by role makes the information easier to maintain and easier to navigate.

Page | Main Elements | Alpine.js Interaction
Home (index.html) | Hero section, 8 service cards, 4 strengths, doctor preview, news, office hours | Mobile navigation and language-switching UI
Services (services.html) | 14 treatment cards, 4 categories, FAQ section | Client-side category filtering and FAQ accordion
Doctors (doctors.html) | 6 doctor profiles, specialties, languages, qualifications | Expand/collapse behavior for profile details
Info (info.html) | 8 pricing items, insurance support list, payment methods | Dynamic estimate modal showing cost breakdown
Visit (visit.html) | 6-step first-visit flow and booking guidance | Booking request form with validation
Access (access.html) | Contact info, map preview, nearby stations, parking info | Route guidance toggle for car, train, and walking

The Home page concentrates first-contact information: hero, service overview, strengths, doctor preview, news, and office hours. Putting mobile navigation and language-switching UI there makes sense because this is the highest-traffic entry point.

The Services page organizes treatment cards into four categories with an FAQ section. Client-side category filtering is a clean example of where Alpine.js adds real value without changing the overall architecture.
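
To make the filtering idea concrete, here is a minimal sketch of the state such a filter needs. It is my own illustration rather than the generated code: `serviceFilter`, the card shape ({ title, category }), and the "all" sentinel are assumptions. The object uses getters, which Alpine.js supports in x-data.

```javascript
// Factory intended for x-data="serviceFilter(cards)" (name is hypothetical).
// Each card is assumed to look like { title: string, category: string }.
function serviceFilter(cards) {
  return {
    cards,
    active: "all", // "all" is a sentinel meaning no filter is applied
    select(category) {
      this.active = category;
    },
    // Filter buttons: "all" plus each distinct category in first-seen order.
    get categories() {
      return ["all", ...new Set(this.cards.map((card) => card.category))];
    },
    // Cards to render for the current selection.
    get visible() {
      if (this.active === "all") return this.cards;
      return this.cards.filter((card) => card.category === this.active);
    },
  };
}
```

In markup, the buttons would call select(...) and the card grid would loop over visible. All cards stay in the HTML, so the page remains readable without JavaScript if the default state shows everything.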

The Doctors page lists profiles with specialties, languages, and qualifications. The expand/collapse interaction balances density and readability. For a clinic site, staff information strongly affects trust, so this section carries more weight than a generic profile listing would.
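
A sketch of the expand/collapse state, assuming only one profile is open at a time (the source does not say whether several can be open together). The `expandable` name and the `panel-` id prefix are hypothetical.

```javascript
// Factory intended for x-data="expandable()" (name is hypothetical).
function expandable() {
  return {
    openId: null, // id of the open profile, or null when all are collapsed
    toggle(id) {
      this.openId = this.openId === id ? null : id;
    },
    isOpen(id) {
      return this.openId === id;
    },
    // Attributes for the trigger button, so assistive tech hears the state.
    triggerAttrs(id) {
      return {
        "aria-expanded": String(this.isOpen(id)),
        "aria-controls": `panel-${id}`,
      };
    },
  };
}
```

Keeping aria-expanded derived from the same state that drives visibility avoids the two drifting apart.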

The Info page focuses on pricing, insurance, and payment methods, with a dynamic estimate modal for cost breakdowns. Even in a static-first site, that kind of targeted interaction helps answer sensitive questions without forcing a heavier frontend stack.
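
The estimate modal is essentially a list of line items plus a total. A minimal sketch under assumed names and data shape (`estimateModal`, and items shaped as { label, price, quantity } with prices in whole currency units); the actual pricing rules in the generated page are unknown to me.

```javascript
// Factory intended for x-data="estimateModal()" (name is hypothetical).
function estimateModal() {
  return {
    open: false,
    items: [], // assumed shape: { label: string, price: number, quantity: number }
    show(items) {
      this.items = items;
      this.open = true;
    },
    close() {
      this.open = false;
    },
    // Sum of price * quantity across the current line items.
    get total() {
      return this.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
    },
  };
}
```

Insurance coverage or tax would need extra fields; they are left out here because the source does not describe them.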

The Visit page outlines a first-visit flow and explains the booking process. A booking request form with validation turns it into an actual conversion path rather than a purely descriptive page.
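
For the validation states, a small pure function is enough to drive per-field error messages. This is a sketch with assumed fields (name, email, date as YYYY-MM-DD); the real form fields were not recorded, and the email pattern is deliberately loose.

```javascript
// Validate a booking request. Returns an object of field -> error message;
// an empty object means the form is valid. Field names are assumptions.
function validateBooking(form, today = new Date()) {
  const errors = {};

  if (!form.name || form.name.trim() === "") {
    errors.name = "Name is required.";
  }

  // Loose shape check only; real deliverability cannot be verified client-side.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(form.email ?? "")) {
    errors.email = "Enter a valid email address.";
  }

  // Dates are compared as ISO strings (YYYY-MM-DD); "today" is taken in UTC.
  const todayIso = today.toISOString().slice(0, 10);
  const date = form.date ?? "";
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    errors.date = "Choose a preferred date.";
  } else if (date < todayIso) {
    errors.date = "The preferred date cannot be in the past.";
  }

  return errors;
}
```

In an Alpine.js component the result could be stored in state and each field's message shown next to its input, with aria-invalid bound to the presence of an error.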

The Access page gathers contact information, a map preview, nearby stations, and parking details. The route toggle for car, train, and walking directly reduces friction for visitors planning a visit.
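
The route toggle maps naturally onto a tab pattern. A minimal sketch, assuming arrow-key navigation that wraps around and a roving tabindex; `routeTabs` and `tabAttrs` are hypothetical names, and I cannot confirm the generated page handles keys this way.

```javascript
// Factory intended for x-data="routeTabs()" (name is hypothetical).
function routeTabs(modes = ["car", "train", "walk"]) {
  return {
    modes,
    active: modes[0],
    select(mode) {
      if (this.modes.includes(mode)) this.active = mode;
    },
    // Move between tabs with the arrow keys, wrapping at both ends.
    onKeydown(key) {
      const index = this.modes.indexOf(this.active);
      const count = this.modes.length;
      if (key === "ArrowRight") {
        this.active = this.modes[(index + 1) % count];
      } else if (key === "ArrowLeft") {
        this.active = this.modes[(index - 1 + count) % count];
      }
    },
    // Attributes for each tab button; only the active tab is in the tab order.
    tabAttrs(mode) {
      const selected = this.active === mode;
      return {
        role: "tab",
        "aria-selected": String(selected),
        tabindex: selected ? 0 : -1,
      };
    },
  };
}
```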

Quality and Accessibility Validation

For responsive design, the booking button is fixed at the bottom of the screen on mobile and integrated into the header on desktop. That keeps the main action visible while adapting cleanly to different layouts.

For navigation, the Alpine.js hamburger menu is reported to support focus management and closing with the ESC key. Many interfaces stop at visual toggling and never address keyboard behavior.
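
What "focus management" means here can be shown in a few lines: move focus into the menu when it opens, and return it to the trigger when it closes. This is a sketch under my own assumptions, not the generated code; `mobileNav` is a hypothetical name, and the elements are passed in so the logic stays independent of the DOM.

```javascript
// Factory intended for x-data="mobileNav()" (name is hypothetical).
function mobileNav() {
  return {
    open: false,
    lastFocused: null, // element to restore focus to when the menu closes

    // `trigger` is the hamburger button; `firstLink` is the first focusable
    // item inside the menu. On a real page the menu may need to render first;
    // Alpine's $nextTick is one way to defer the focus call.
    openMenu(trigger, firstLink) {
      this.open = true;
      this.lastFocused = trigger;
      firstLink?.focus();
    },

    closeMenu() {
      if (!this.open) return;
      this.open = false;
      this.lastFocused?.focus();
      this.lastFocused = null;
    },

    // Close on Escape; other keys are ignored.
    onKeydown(event) {
      if (event.key === "Escape") this.closeMenu();
    },
  };
}
```

Returning focus to the trigger is the part most often missed, because the menu still looks correct without it.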

The markup is reported to use semantic header, nav, main, and footer elements, with headings in correct order from h1 through h3. On a content-heavy clinic site, those basics directly affect both accessibility and maintainability.
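
Heading order is simple to check mechanically once the heading levels of a page are extracted. A small sketch, assuming the levels arrive as an array of numbers in document order (the `headingOrderIssues` helper is hypothetical, and "start with h1, never skip a level going down" is the rule I am assuming):

```javascript
// Report heading-order problems for a page, given its heading levels in
// document order, e.g. [1, 2, 3, 2] for h1, h2, h3, h2.
function headingOrderIssues(levels) {
  const issues = [];
  if (levels.length === 0 || levels[0] !== 1) {
    issues.push("page should start with an h1");
  }
  for (let i = 1; i < levels.length; i++) {
    // Going deeper by more than one level skips a heading rank.
    if (levels[i] > levels[i - 1] + 1) {
      issues.push(`h${levels[i - 1]} is followed by h${levels[i]}`);
    }
  }
  return issues;
}
```

Moving back up by any number of levels (h3 to h2, or h3 to h1) is treated as fine, since that simply starts a new section.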

On performance, the approach avoids large external UI libraries, which helps keep load time low and removes one common source of layout shift. A buildless architecture only stays lean if dependency choices remain disciplined.

Execution Metrics

The full process took 18 minutes and 2 seconds. For a pass that includes design, implementation, and validation, that is a meaningful data point.

  • Total generated code covered HTML, CSS utility classes, and JavaScript logic for six pages
  • Diagnostics reported no errors or warnings across the HTML files
  • Asset management relied on local SVGs and inline icons to minimize outside dependencies

These results support the main observation: with a clear scope and a constrained stack, a practical delivery state is reachable quickly without relying on a larger frontend toolchain.

Limitations of These Results

What this note can confirm is that the model parsed the specification and reported completion of all required files. The actual generated HTML was not preserved in the source, so the following essential aspects require hands-on inspection:

  • Whether ARIA attributes are actually present
  • Whether focus management is implemented
  • Whether Tailwind class composition stays tidy
  • Whether Alpine.js state management is correct
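
A first pass over these items can be automated with a crude text scan before any manual review. The sketch below is only a presence check under my own assumptions: the `CHECKS` list and `reviewPage` helper are hypothetical, the patterns are plain regular expressions over the HTML source, and a match says nothing about whether the attribute is used correctly.

```javascript
// Hypothetical checklist: each entry pairs a label with a pattern to look for.
const CHECKS = [
  { label: "aria-expanded on toggles", pattern: /aria-expanded=/ },
  { label: "aria-controls on toggles", pattern: /aria-controls=/ },
  { label: "Escape key handler", pattern: /(@|x-on:)keydown\.escape/ },
  { label: "Alpine.js state (x-data)", pattern: /x-data/ },
];

// Return one { label, found } entry per check for a page's HTML source.
function reviewPage(html) {
  return CHECKS.map(({ label, pattern }) => ({ label, found: pattern.test(html) }));
}
```

A table of found/not-found per page gives a quick comparison across models, but keyboard behavior and focus order still need testing in a browser.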

Impressions

This prompt works not only as a Qwen3.5 evaluation record but also as a reusable cross-model benchmark for front-end coding. The fixed page count, variety of required interactions, and explicit accessibility requirements make fair model-to-model comparison straightforward.

To make this kind of evaluation note more useful in the future, I should also preserve representative excerpts from the generated files or a structured review checklist. Self-reported output alone is not enough for quality judgment. Actual file contents or screenshots would enable more concrete comparison.

Future Work

  • Run the same prompt on other models for side-by-side comparison
  • Preserve generated files and quantitatively evaluate accessibility, interactions, and class composition
  • Catalog failure modes: multi-page structure collapse, accessibility omissions, Alpine.js under-implementation, inconsistent page shells, thin README documentation
  • If expanding to translated content, document the translation boundary and update workflow separately