AI Product Work

Building with language models

Primary Work
Blink AI — dApp Code Generation
Period
2023–2024
Role
Product Lead
Outcome
$2M seed · 50% demo conversion
Genesis AI — dApp code generation interface

01 — The Problem

Why dApp developers kept getting stuck

Blink's thesis was that the bottleneck in Web3 development was not creativity — it was the gap between a developer's idea and a deployable smart contract. The average developer spent hours navigating fragmented documentation, incompatible frameworks, and brittle boilerplate before writing a single line of meaningful code.

The product we built used an LLM pipeline to generate dApp code from natural language intent, grounded against a curated library of verified smart contract patterns. The challenge was not just getting the model to produce code — it was making the output trustworthy enough that a developer would actually deploy it.

That required a different kind of product thinking: one where the model's behaviour, its failure modes, and its latency were all first-class design concerns.

02 — The Work

Decisions made inside the stack

Most of the interesting product decisions at Blink happened at the intersection of model behaviour, developer trust, and UX. Here is how the key pieces came together.

01 ——
Model Selection
Benchmarked models on output accuracy, latency and token cost for code generation. Different tasks warranted different trade-offs.
02 ——
Prompt Engineering
Built multi-layer system prompts with explicit guardrails constraining outputs to valid smart contract patterns.
03 ——
RAG Grounding
Integrated retrieval from curated Web3 developer knowledge bases as a grounding layer to reduce hallucination.
04 ——
LLMOps
Defined tracking for prompt performance, generation quality, latency and token cost to enable structured iteration.
05 ——
UX for Latency
Designed loading states, streaming patterns and fallback handling so wait time did not collapse user trust.
Model Selection
Benchmarked models against a golden dataset of real developer prompts. Quality, latency and cost pulled in different directions — routing logic matched model capability to task complexity.
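Routing by task complexity can be sketched roughly as below. This is a minimal illustration, not Blink's routing logic: the tier names, relative costs, keyword list and threshold are all hypothetical stand-ins for whatever signals a real benchmark would surface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    """Hypothetical model tier; names and costs are illustrative only."""
    name: str
    relative_cost: float


FAST = ModelTier("small-fast-model", 1.0)
STRONG = ModelTier("large-capable-model", 10.0)

# Assumed signals that a request needs the stronger model.
COMPLEX_KEYWORDS = ("upgradeable", "proxy", "governance", "cross-chain", "oracle")


def complexity_score(prompt: str) -> int:
    """Crude heuristic: risky keywords and long prompts score higher."""
    text = prompt.lower()
    score = sum(1 for keyword in COMPLEX_KEYWORDS if keyword in text)
    if len(text.split()) > 40:
        score += 1
    return score


def route(prompt: str, threshold: int = 1) -> ModelTier:
    """Send simple requests to the cheap tier, complex ones to the strong tier."""
    return STRONG if complexity_score(prompt) >= threshold else FAST
```

In practice the threshold would be tuned against the golden dataset, so that the cheaper tier only handles prompts where it already meets the quality bar.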
Prompt Engineering + RAG
Multi-layer system prompts with guardrails constraining output to valid contract patterns. Retrieval from curated Web3 knowledge bases served as the grounding layer, so the model generated against verified patterns rather than from scratch.
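The layering described here (guardrails, then retrieved patterns, then the user's request) might look something like the sketch below. Everything in it is an assumption for illustration: the pattern library is a three-entry in-memory dictionary, the guardrail wording is invented, and retrieval is naive word overlap standing in for a real search index.

```python
# Hypothetical in-memory library of verified patterns (stand-in for a real knowledge base).
PATTERNS = {
    "erc20-basic": "Standard fungible token with mint and transfer.",
    "erc721-basic": "Non-fungible token with mint and ownership tracking.",
    "multisig-wallet": "Wallet requiring multiple owner approvals per transaction.",
}

# Illustrative guardrail layer; the wording is an assumption.
GUARDRAILS = (
    "Only produce contracts that follow the reference patterns provided.",
    "If no reference pattern fits, say so instead of inventing one.",
)


def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank patterns by naive word overlap with the query."""
    words = set(query.lower().split())
    scored = []
    for pattern_id, description in PATTERNS.items():
        description_words = set(description.lower().replace(".", "").split())
        overlap = len(words & description_words)
        if overlap:
            scored.append((overlap, pattern_id))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pattern_id for _, pattern_id in scored[:k]]


def build_prompt(user_request: str) -> str:
    """Layer guardrails, retrieved patterns, and the user request into one prompt."""
    ids = retrieve(user_request)
    refs = "\n".join(f"- {pid}: {PATTERNS[pid]}" for pid in ids) or "- (none found)"
    rules = "\n".join(f"- {rule}" for rule in GUARDRAILS)
    return f"Rules:\n{rules}\n\nReference patterns:\n{refs}\n\nRequest:\n{user_request}"
```

The explicit "(none found)" branch matters: it gives the guardrail about admitting a missing pattern something concrete to act on.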
LLMOps
Defined tracking with engineering for prompt performance, generation quality, latency and token cost. Success metrics: task completion rate, prompt-to-deploy rate, user retry rate.
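One way the three success metrics could be computed from session records is sketched below. The `Session` fields and the exact metric definitions are assumptions made for the example; the real definitions would depend on how events were logged.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One prompt session; field names are hypothetical."""
    generations: int  # how many times output was requested (1 means no retry)
    completed: bool   # the user accepted a generated result
    deployed: bool    # the user went on to deploy the contract


def summarize(sessions: list[Session]) -> dict[str, float]:
    """Compute each metric as a share of all sessions (assumed definition)."""
    total = len(sessions)
    if total == 0:
        return {
            "task_completion_rate": 0.0,
            "prompt_to_deploy_rate": 0.0,
            "user_retry_rate": 0.0,
        }
    return {
        "task_completion_rate": sum(s.completed for s in sessions) / total,
        "prompt_to_deploy_rate": sum(s.deployed for s in sessions) / total,
        "user_retry_rate": sum(s.generations > 1 for s in sessions) / total,
    }
```

Computing all three from the same session record keeps them comparable when a prompt or model change moves one metric but not the others.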
UX for Latency
Chose progressive streaming over a single response drop. Designed explicit loading states and fallbacks so generation failures didn't surface as blank screens.
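The streaming-plus-fallback idea can be illustrated with a small generator wrapper. This is a sketch under stated assumptions: `flaky_model` is a hypothetical stand-in for a model stream that fails mid-response, and the fallback copy is invented for the example.

```python
from typing import Callable, Iterable, Iterator

# Illustrative fallback copy; the wording is an assumption.
FALLBACK_MESSAGE = "Generation failed. Your prompt is saved; please retry."


def stream_with_fallback(
    generate: Callable[[], Iterable[str]],
    fallback: str = FALLBACK_MESSAGE,
) -> Iterator[str]:
    """Yield chunks as they arrive; on failure, yield a visible fallback message."""
    try:
        for chunk in generate():
            yield chunk
    except Exception:
        # Chunks already yielded stay on screen; the user also sees why output stopped.
        yield "\n" + fallback


def flaky_model() -> Iterator[str]:
    """Hypothetical model stream that dies mid-response."""
    yield "pragma solidity ^0.8.0;"
    raise TimeoutError("upstream timed out")
```

Consuming `stream_with_fallback(flaky_model)` produces the partial code followed by the fallback message, so a failure never renders as an empty screen.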

03 — Outcome

From prototype to proof

We took the product from concept through prototype to functional iterations across the full AI-driven code generation workflow. The validation came through a combination of developer research and live market exposure.

$2M
Seed funding secured
50%
Qualified conversion from live demos
400+
Developers researched
ETHDenver
Live demo validation — 2024

ETHDenver 2024 was the forcing function. Live demos with real developers under conference conditions gave us signal that existing benchmark-led evaluation could not: which parts of the UX held up, where developers hesitated, and what they actually needed from an AI coding tool.

LLM pipeline · RAG / retrieval grounding · Prompt engineering · Golden datasets · LLMOps instrumentation · AI UX design · Developer research · Smart contracts

04 — How I Think About AI Products

Principles from working inside the stack

Building with LLMs changed how I think about product work. The usual levers — spec the feature, ship the feature, measure the feature — do not transfer cleanly when the output is probabilistic and the failure mode is subtle rather than binary.

01 ——
Define success before you touch the model
What does a good output actually look like? If you cannot articulate that with examples, you cannot evaluate whether the model is getting better. Golden datasets are not a testing afterthought — they are the thing that lets you iterate with confidence.
02 ——
The prompt is the product
System prompts carry product logic — what the model will and will not do, how it handles edge cases, what it should admit it does not know. Writing them is product work, not engineering work. Changing them changes user experience.
03 ——
Instrument for iteration, not just monitoring
Tracking generation quality, latency and token cost is not just operational hygiene — it is what lets you make a structured argument for changing a prompt or swapping a model. Without it, improvement is anecdotal.
04 ——
Latency is a UX problem, not just an infrastructure one
LLMs are slow relative to user expectation. How you handle that wait — streaming, loading states, partial output, fallbacks — shapes whether users trust the product or abandon it. These are design decisions that belong in PM scope.
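To make the first principle concrete, a golden dataset check can be as small as the sketch below. It assumes a deliberately simple notion of a "good output" (required substrings); the example cases and `stub_model` are hypothetical, and a real harness would use richer checks such as compiling the generated contract.

```python
from typing import Callable

# Hypothetical golden examples: a prompt plus fragments a good output must contain.
GOLDEN = [
    {"prompt": "basic token", "must_contain": ["contract", "transfer"]},
    {"prompt": "nft collection", "must_contain": ["contract", "ownerOf"]},
]


def passes(output: str, must_contain: list[str]) -> bool:
    """An output passes only if every required fragment appears in it."""
    return all(fragment in output for fragment in must_contain)


def pass_rate(generate: Callable[[str], str], golden: list[dict]) -> float:
    """Share of golden cases whose generated output passes its checks."""
    if not golden:
        return 0.0
    hits = sum(passes(generate(case["prompt"]), case["must_contain"]) for case in golden)
    return hits / len(golden)


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "contract Token { function transfer() public {} }"
```

Running the same `pass_rate` before and after a prompt or model change is what turns "it feels better" into a comparison that can be argued from.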

These principles came out of work at Blink, and have since been applied to other contexts — including a support automation project at Laguna, where the same questions around model behaviour, prompt design, evaluation and AI UX came up in a different product domain.

05 — Applied Context · Laguna

The same problems, different domain

At Laguna I scoped a support automation chatbot to reduce repetitive query volume. The domain was different (conversational AI rather than code generation), but the core work was the same: prompt design, model evaluation, and UX decisions around how the AI handles uncertainty. A trust failure in support carries different costs than one in code generation, which sharpened how I think about calibrating model behaviour to context.

06 — What's Next

Where I'm building next

The next layer of interesting AI product problems sits at the edge of agentic behaviour — where the model is not just generating output but taking actions, coordinating across tools, and operating within real business workflows.

Agentic Commerce
AI that acts, not just answers
How do you design trust and control into products where the AI is making decisions, not just generating text? That is the product question I want to be working on.
MCP / Tool Use
Models connected to real systems
Model Context Protocol and similar patterns let agents operate within existing toolchains. The product surface is less about the model and more about the orchestration — what the agent is allowed to do and how.