AI Products — Hui Wai Kit

01 — The Problem

Why dApp developers kept getting stuck

Blink's thesis was that the bottleneck in Web3 development was not creativity — it was the gap between a developer's idea and a deployable smart contract. The average developer spent hours navigating fragmented documentation, incompatible frameworks, and brittle boilerplate before writing a single line of meaningful code.

The product we built used an LLM pipeline to generate dApp code from natural language intent, grounded against a curated library of verified smart contract patterns. The challenge was not just getting the model to produce code — it was making the output trustworthy enough that a developer would actually deploy it.

That required a different kind of product thinking: one where the model's behaviour, its failure modes, and its latency were all first-class design concerns.

02 — The Work

Decisions made inside the stack

Most of the interesting product decisions at Blink happened at the intersection of model behaviour, developer trust, and UX. Here is how the key pieces came together.

01 ——

Model Selection

Benchmarked models on output accuracy, latency and token cost for code generation. Different tasks warranted different trade-offs.

02 ——

Prompt Engineering

Built multi-layer system prompts with explicit guardrails constraining outputs to valid smart contract patterns.

03 ——

RAG Grounding

Integrated retrieval from curated Web3 developer knowledge bases as a grounding layer to reduce hallucination.

04 ——

LLMOps

Defined tracking for prompt performance, generation quality, latency and token cost to enable structured iteration.

05 ——

UX for Latency

Designed loading states, streaming patterns and fallback handling so wait time did not collapse user trust.

Model Selection

Benchmarked models against a golden dataset of real developer prompts. Quality, latency and cost pulled in different directions, so routing logic matched model capability to task complexity.
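As an illustration of routing by task complexity, here is a minimal sketch. The model names, cost figures, complexity tiers and the keyword heuristic are all hypothetical and chosen for the example; they are not Blink's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str                  # hypothetical model identifier
    max_complexity: int        # highest task tier (1-3) the model is trusted with
    cost_per_1k_tokens: float  # illustrative cost, not a real price


# Hypothetical profiles, ordered from cheapest to most capable.
MODELS = [
    ModelProfile("small-fast", 1, 0.1),
    ModelProfile("mid-tier", 2, 0.5),
    ModelProfile("large-accurate", 3, 2.0),
]

# Assumed signals that a request needs the most capable model.
HARD_KEYWORDS = ("upgradeable", "proxy", "governance", "cross-chain")


def estimate_complexity(prompt: str) -> int:
    """Crude heuristic: keyword hits mean tier 3, long prompts tier 2, else tier 1."""
    text = prompt.lower()
    if any(keyword in text for keyword in HARD_KEYWORDS):
        return 3
    if len(text.split()) > 40:
        return 2
    return 1


def route(prompt: str, models=MODELS) -> ModelProfile:
    """Pick the cheapest model whose capability covers the estimated tier."""
    tier = estimate_complexity(prompt)
    capable = [m for m in models if m.max_complexity >= tier]
    return min(capable, key=lambda m: m.cost_per_1k_tokens)
```

In practice the complexity estimate would more likely come from benchmark results on the golden dataset than from keywords; the point of the sketch is only that cost is minimised subject to a capability floor.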

Prompt Engineering + RAG

Multi-layer system prompts with guardrails constrained output to valid contract patterns. Retrieval from curated Web3 knowledge bases served as the grounding layer, so the model generated against verified patterns rather than from scratch.
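A rough sketch of how a guardrail layer and retrieved patterns might be combined into one prompt follows. The pattern library, the guardrail wording and the keyword-overlap retrieval are stand-ins invented for this example; a production system would more plausibly use embedding search over a much larger knowledge base.

```python
import re

# Hypothetical library of verified patterns: id -> short description.
PATTERNS = {
    "erc20-basic": "ERC-20 fungible token with mint and transfer",
    "erc721-basic": "ERC-721 NFT collection with mint and metadata",
    "escrow-simple": "Escrow contract holding funds until both parties confirm",
}

# Illustrative guardrail layer; the wording is an assumption.
GUARDRAILS = (
    "Only use the verified patterns listed below. "
    "If none fits, say so instead of inventing code."
)


def tokenize(text: str) -> set:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(intent: str, patterns=PATTERNS, top_k: int = 2) -> list:
    """Rank patterns by word overlap with the intent; drop zero-overlap ones."""
    query = tokenize(intent)
    scored = [(len(query & tokenize(desc)), pid) for pid, desc in patterns.items()]
    scored = [(score, pid) for score, pid in scored if score > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in scored[:top_k]]


def build_prompt(intent: str) -> str:
    """Assemble guardrails, retrieved patterns and the developer intent."""
    ids = retrieve(intent)
    context = "\n".join(f"- {pid}: {PATTERNS[pid]}" for pid in ids)
    if not context:
        context = "- (no matching pattern)"
    return f"{GUARDRAILS}\n\nVerified patterns:\n{context}\n\nDeveloper intent: {intent}"
```

Keeping the guardrails, the retrieved context and the user intent as separate layers means each can be changed and measured independently.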

LLMOps

Defined tracking with engineering for prompt performance, generation quality, latency and token cost. Success metrics: task completion rate, prompt-to-deploy rate, user retry rate.
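To make the three success metrics concrete, here is one way they could be computed from session logs. The `Session` fields and the exact definitions (per-session completion and deploy rates, per-generation retry rate) are assumptions for illustration; the definitions used at Blink are not stated here.

```python
from dataclasses import dataclass


@dataclass
class Session:
    generations: int  # number of generation requests in the session
    retries: int      # how many of those were the user retrying
    completed: bool   # user reached a usable result
    deployed: bool    # user deployed the generated contract


def rate(numerator: int, denominator: int) -> float:
    """Safe ratio that returns 0.0 when there is no data."""
    return numerator / denominator if denominator else 0.0


def summarize(sessions: list) -> dict:
    """Aggregate the three metrics over a list of sessions."""
    total = len(sessions)
    generations = sum(s.generations for s in sessions)
    return {
        "task_completion_rate": rate(sum(s.completed for s in sessions), total),
        "prompt_to_deploy_rate": rate(sum(s.deployed for s in sessions), total),
        "user_retry_rate": rate(sum(s.retries for s in sessions), generations),
    }
```

Computing these per prompt version or per model makes it possible to compare changes against each other.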

UX for Latency

Chose progressive streaming over a single response drop. Designed explicit loading states and fallbacks so that generation failures did not surface as blank screens.
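The streaming-plus-fallback behaviour can be sketched as a small state machine over incoming chunks. The state names, the event shape and the fallback message are hypothetical; the sketch only shows the idea that a failure mid-stream yields an explicit state and keeps the partial output rather than leaving the UI empty.

```python
from typing import Iterable, Iterator

# Illustrative copy; the real fallback wording is an assumption.
FALLBACK_MESSAGE = "Generation was interrupted. Your prompt is saved; retry to continue."


def stream_with_fallback(chunks: Iterable[str]) -> Iterator[dict]:
    """Turn a chunk stream into UI events: loading, streaming, then done or failed."""
    yield {"state": "loading", "text": ""}
    received = ""
    try:
        for chunk in chunks:
            received += chunk
            yield {"state": "streaming", "text": received}
    except Exception:
        # Keep whatever arrived so the user never sees a blank screen.
        yield {"state": "failed", "text": received, "message": FALLBACK_MESSAGE}
        return
    yield {"state": "done", "text": received}


def flaky_source() -> Iterator[str]:
    """Simulated upstream that fails after the first chunk."""
    yield "contract Token {"
    raise TimeoutError("upstream model timed out")
```

Feeding `flaky_source()` into `stream_with_fallback` produces a `failed` event that still carries the first chunk, which is the behaviour the design aimed for.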

03 — Outcome

From prototype to proof

We took the product from concept through prototype to functional iterations across the full AI-driven code generation workflow. The validation came through a combination of developer research and live market exposure.

$2M

Seed funding secured

50%

Qualified conversion from live demos

400+

Developers researched

ETHDenver

Live demo validation — 2024

ETHDenver 2024 was the forcing function. Live demos with real developers under conference conditions gave us signal that benchmark-led evaluation could not: which parts of the UX held up, where developers hesitated, and what they actually needed from an AI coding tool.

LLM pipeline · RAG / retrieval grounding · Prompt engineering · Golden datasets · LLMOps instrumentation · AI UX design · Developer research · Smart contracts

04 — How I Think About AI Products

Principles from working inside the stack

Building with LLMs changed how I think about product work. The usual levers — spec the feature, ship the feature, measure the feature — do not transfer cleanly when the output is probabilistic and the failure mode is subtle rather than binary.

01 ——

Define success before you touch the model

What does a good output actually look like? If you cannot articulate that with examples, you cannot evaluate whether the model is getting better. Golden datasets are not a testing afterthought — they are the thing that lets you iterate with confidence.
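A golden dataset can be as simple as prompts paired with checks that a good output must satisfy. The sketch below is a deliberately small illustration: the cases, the substring checks and the stub generator are invented, and real evaluation of contract code would need stronger checks such as compilation and tests.

```python
from typing import Callable

# Hypothetical golden cases: each prompt lists strings a good output must contain.
GOLDEN_CASES = [
    {"prompt": "ERC-20 token with a fixed supply", "must_include": ["ERC20", "totalSupply"]},
    {"prompt": "NFT collection with minting", "must_include": ["ERC721", "mint"]},
]


def evaluate(generate: Callable[[str], str], cases=GOLDEN_CASES) -> float:
    """Return the fraction of golden cases whose output passes every check."""
    passed = 0
    for case in cases:
        output = generate(case["prompt"])
        if all(required in output for required in case["must_include"]):
            passed += 1
    return passed / len(cases) if cases else 0.0


def stub_generate(prompt: str) -> str:
    """Stand-in for a model call; only handles the ERC-20 case well."""
    if "ERC-20" in prompt:
        return "contract Token is ERC20 { function totalSupply() public {} }"
    return "contract Empty {}"
```

Running `evaluate` before and after a prompt or model change gives a comparable number, which is what makes iteration more than anecdote.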

02 ——

The prompt is the product

System prompts carry product logic — what the model will and will not do, how it handles edge cases, what it should admit it does not know. Writing them is product work, not engineering work. Changing them changes user experience.

03 ——

Instrument for iteration, not just monitoring

Tracking generation quality, latency and token cost is not just operational hygiene — it is what lets you make a structured argument for changing a prompt or swapping a model. Without it, improvement is anecdotal.

04 ——

Latency is a UX problem, not just an infrastructure one

LLMs are slow relative to user expectation. How you handle that wait — streaming, loading states, partial output, fallbacks — shapes whether users trust the product or abandon it. These are design decisions that belong in PM scope.

These principles came out of work at Blink, and have since been applied to other contexts — including a support automation project at Laguna, where the same questions around model behaviour, prompt design, evaluation and AI UX came up in a different product domain.

05 — Applied Context · Laguna

The same problems, different domain

At Laguna I scoped a support automation chatbot to reduce repetitive query volume. Different domain (conversational AI rather than code generation), same core work: prompt design, model evaluation, and UX decisions around how the AI handles uncertainty. A trust failure in support carries different costs from one in code generation, which sharpened how I think about calibrating model behaviour to context.

06 — What's Next

Where I'm building next

The next layer of interesting AI product problems sits at the edge of agentic behaviour — where the model is not just generating output but taking actions, coordinating across tools, and operating within real business workflows.

Agentic Commerce

AI that acts, not just answers

How do you design trust and control into products where the AI is making decisions, not just generating text? That is the product question I want to be working on.

MCP / Tool Use

Models connected to real systems

Model Context Protocol and similar patterns let agents operate within existing toolchains. The product surface is less about the model and more about the orchestration — what the agent is allowed to do and how.
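One way to picture "what the agent is allowed to do and how" is an explicit policy checked before every tool call. This is a generic sketch, not the Model Context Protocol API; the class, the tool names and the three outcomes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    allowed: set                                       # tools the agent may call at all
    needs_approval: set = field(default_factory=set)   # tools gated on user consent

    def decide(self, tool: str) -> str:
        """Return 'deny', 'ask_user' or 'allow' for a requested tool call."""
        if tool not in self.allowed:
            return "deny"
        if tool in self.needs_approval:
            return "ask_user"
        return "allow"


# Hypothetical policy for a commerce agent: browsing is free, paying needs consent.
policy = ToolPolicy(
    allowed={"search_catalog", "add_to_cart", "submit_payment"},
    needs_approval={"submit_payment"},
)
```

With this policy, `policy.decide("submit_payment")` returns `"ask_user"`, while a tool that was never listed is denied outright. Where the approval boundary sits is a product decision as much as a technical one.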