Natural-language commands for a map: parsing COMMAND:type:params out of a chat response

10/6/2025 • Data Science & AI • Geospatial Urban Analytics Platform • 263 views • 15 min read • by Kuray Karaaslan

User types "show me schools in the south-east". An LLM returns a single line: COMMAND:FILTER_LAYER:schools,south-east. The viewer reacts. This post is about the wire contract that turns a chat box into a map control.

The temptation, when you bolt an LLM onto a map app, is to let the model "do everything". You hand it the full map state, the layer catalogue, the geometry helpers — and you hope it returns something the renderer can use. Six prompt rewrites later you have a chat box that gives a confident answer and changes nothing on the map.

The alternative is small and slightly boring: pick a wire format the LLM is allowed to emit, parse it on receipt, dispatch to the viewer. One line in, one effect out. The chat-driven map control surface in the Next.js app this post is grounded in does exactly that, and the contract fits in a single sentence inside a system prompt: COMMAND:COMMAND_TYPE:PARAMS.

This is a framework post. Five steps, each tied to real code in the app/api/bitie/route.ts endpoint that wraps OpenAI's gpt-3.5-turbo and the chat-to-command pipeline that sits behind it. If your project is a data visualisation tool that's about to grow a natural-language layer, the shape below is one defensible starting point.

The decision this framework is for

You are adding chat to a thing that already has a UI. The UI has buttons. The buttons toggle layers, change styles, run filters, zoom to extents — verbs the product already understands. A user can do all of this today by clicking. The question is what role the LLM plays.

Two roads. The first is conversational: the model becomes the new front end, you stream prose, you let it call tools, and you accept that every interaction round-trips through token budgets and latency. The second is translational: the existing UI keeps shipping verbs to the viewer, and the LLM is just a different way to produce those verbs. The chat box becomes a typed shortcut for the buttons that already exist.

This framework is for the second road. Use it when the map has a closed verb set, when the verbs are cheap to enumerate, and when you would rather ship a thin natural-language layer this sprint than spend a quarter on tool-calling infrastructure. If you need the model to reason across multiple turns, hold state, or generate novel actions, this is the wrong shape — go look at OpenAI's tool calling or a structured-output schema and budget accordingly.

The framework: a five-step wire contract

The pipeline has five named pieces. Two live on the server, three on the client.

1. A closed verb set. Enumerate the actions the viewer can already perform from the existing button surface.
2. A single-line wire format. Pick a delimiter the model can produce reliably with low temperature, and pin it in the system prompt.
3. A model with a tight leash. Small model, low temperature, low max_tokens, system prompt that forbids prose when a command applies.
4. A receipt-side parser. Treat every reply as untrusted input. Reject anything that doesn't parse. Pass the rest as text.
5. A dispatcher into the existing action layer. The parsed command maps to the same handler the buttons call. No new code paths.

Steps one and two are design decisions you make once. Step three is a configuration you tune. Steps four and five are the code that earns the latency of a chat round-trip.

Each step with one paragraph of explanation

Step 1 — closed verb set. You cannot ask the model to do things the renderer cannot do. Before any prompt is written, list every map mutation the existing UI already supports — toggle this layer, set this style, filter to that polygon, zoom to that extent — and decide which ones are reachable from chat. The smaller the set, the lower the hallucination rate. Anything outside the set should fall through to plain text and stay on the screen as a chat message.

Step 2 — single-line wire format. The format is not negotiable mid-project, so pick one you can defend. Three constraints: it must be greppable from inside a longer reply (in case the model leaks prose), it must survive being trimmed and split, and the model must be able to produce it under low temperature without quote-escaping its own output. COMMAND:TYPE:PARAMS clears all three. JSON is more "correct" and harder to coerce out of a chat-tuned model reliably without response_format. Pick the boring one.
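The "greppable" constraint can be made concrete with a small sketch. The helper name extractCommandLine and the regular expression are assumptions for illustration, not code from the app; the v1 parser shown later uses a plain prefix check instead. The sketch assumes the command sits on its own line and that verb names are upper-case with underscores.

```typescript
// Hypothetical helper: find the first COMMAND:TYPE:PARAMS line in a reply
// that may also contain leaked prose. Not part of the original app.
const COMMAND_LINE = /^COMMAND:[A-Z_]+:.*$/m

function extractCommandLine(reply: string): string | null {
  // With the `m` flag, ^ and $ match at line boundaries, so a command
  // surrounded by other lines is still found.
  const match = reply.match(COMMAND_LINE)
  return match ? match[0].trim() : null
}
```

A reply with no line starting with COMMAND: yields null, which the caller can treat as plain chat text.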

Step 3 — model with a tight leash. A small model, kept honest. The endpoint behind this post uses gpt-3.5-turbo with temperature: 0.2 and max_tokens: 200. The system prompt ends with an explicit instruction that when a command applies the model returns only the command, no explanation. The two-knob loop — drop temperature, cap tokens — is what gets you from "interesting demo" to "predictable enough to ship".

Step 4 — receipt-side parser. This is where most teams cut corners. The reply is a string from an external system. You do not trust it. The parser checks the prefix, splits on the delimiter, validates TYPE against the closed set from step one, and only then hands PARAMS to a typed coercer. If parsing fails, the string is rendered as a chat bubble and the map does nothing. Silent failure beats wrong action.

Step 5 — dispatcher. The dispatcher is a switch over the closed verb set. Each case calls the same handler the corresponding UI button already calls. No business logic lives in the dispatcher itself. This is the rule that keeps the chat surface from becoming a second product: every action a chat command can take is also reachable from a button click, and they share a code path.

Walk the framework through a real artifact

The endpoint that drives all of this is a single Next.js Route Handler — about fifty lines, sitting at app/api/bitie/route.ts. It takes a { prompt, style } payload, calls OpenAI, and returns a { reply } string. Here it is, edited only to genericize the product names.

// app/api/<assistant>/route.ts
import { NextRequest, NextResponse } from 'next/server'

export async function POST(req: NextRequest) {
  const { prompt, style } = await req.json()

  const apiKey = process.env.OPENAI_API_KEY
  if (!apiKey) {
    return NextResponse.json({ error: 'OpenAI key not set' }, { status: 500 })
  }
  // ...
}

Two things to note before the prompt itself. The handler is a thin POST that reads OPENAI_API_KEY from the environment and short-circuits with a 500 if it's missing, so the build does not depend on the key existing, only the runtime. And the payload is two fields, prompt and style. The style is the current map state, JSON-stringified and appended to the user message, so the model can answer questions about what is actually rendered, not about a default map it has never seen.

const styleMsg = style
  ? `\nBelow is the current <product> map style in JSON format:\n${JSON.stringify(style)}\n`
  : ''

That single optional block is what turns the assistant from a stateless Wikipedia into a thing that can answer "which layers are visible right now". The trade-off, and there's always one, is token cost. A full map style JSON is not small. For a v1, dumping it into the prompt is fine. For scale, you'd hash it, cache it, or send only the parts that changed.
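One way to shrink that payload, sketched under assumptions: send only each layer's id, type, and visibility instead of the whole style. The summarizeStyle helper and the local types are hypothetical. The only thing taken from the MapLibre style format is that a style has a layers array and that a layer's layout.visibility is either 'visible' or 'none', defaulting to visible.

```typescript
// Minimal local types: only the fields this sketch reads from a style JSON.
interface StyleLayer {
  id: string
  type: string
  layout?: { visibility?: 'visible' | 'none' }
}
interface MapStyle {
  layers?: StyleLayer[]
}

// Hypothetical helper: reduce a full style to what the model needs in order
// to answer "which layers are visible right now".
function summarizeStyle(style: MapStyle): { id: string; type: string; visible: boolean }[] {
  return (style.layers ?? []).map(layer => ({
    id: layer.id,
    type: layer.type,
    // A layer with no layout.visibility is treated as visible (the default).
    visible: layer.layout?.visibility !== 'none',
  }))
}
```

The route handler could then stringify summarizeStyle(style) in place of the full style, at the cost of the model no longer seeing paint properties or sources.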

Now the model call. This is the load-bearing block.

const openaiRes = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-3.5-turbo',
    messages: [
      {
        role: 'system',
        content:
          'You are <name>, the smart city assistant for <product>. ' +
          'Always give concise, map- and data-focused answers for users about their city. ' +
          'If a map style is included, use it as context for your answer.\n ' +
          'if you gonna give commands never make explanations, just give the command ' +
          'in the format: COMMAND:COMMAND_TYPE:PARAMS',
      },
      { role: 'user', content: prompt + styleMsg },
    ],
    max_tokens: 200,
    temperature: 0.2,
  }),
})

Read the system prompt carefully. It does two jobs and only two. The opening sentences set the role and the answer style: concise, map-focused, anchored to the current style JSON. The last clause specifies the wire format and forbids commentary when a command applies. There is no list of legal COMMAND_TYPE values in this prompt. That is deliberate, and it is where this v1 is honest about its limits: the closed verb set lives in the parser, not the prompt, which means the model can hallucinate verbs the parser will reject. We'll come back to that.

The two numeric knobs matter. temperature: 0.2 is low enough that the same question produces the same command across most retries. max_tokens: 200 is a hard ceiling on the cost of a single round-trip — enough headroom for a short paragraph or a one-line command, not enough for the model to write an essay if the system prompt fails to constrain it. If you watch one number on the dashboard, watch this one.

const data = await openaiRes.json()
if (!openaiRes.ok) {
  return NextResponse.json({ error: data.error || 'AI Error' }, { status: 500 })
}
const reply =
  data.choices?.[0]?.message?.content?.trim() ||
  'Sorry, <name> could not generate a response right now.'
return NextResponse.json({ reply })

The response handling is three lines that are easy to skim past. The first checks the HTTP status from OpenAI and forwards the error envelope. The second pulls choices[0].message.content with optional chaining at every step — because the schema can drift, and a TypeError on an LLM endpoint is the worst kind of 500 to debug at 2 a.m. The third trims the string and applies a fallback message. The server does no parsing of the command itself. It returns the raw trimmed reply.

That last decision is the one I'd defend hardest. Parsing on the server feels tidier — you could return a typed envelope, { kind: 'command', type, params } versus { kind: 'message', text }. But the parser belongs in the place that knows about the map: the client, where the layer registry lives. A second team could write a different client against this same endpoint tomorrow, and the contract would still hold. Keep the route handler dumb.

The client-side parser, then, looks roughly like this. The shape below is the pattern the v1 client implements against the contract — your call signatures will differ, but the five gates do not.

// client-side parser, conceptual shape
type MapCommand =
  | { type: 'FILTER_LAYER'; params: string[] }
  | { type: 'TOGGLE_LAYER'; params: [layerId: string, on: boolean] }
  | { type: 'ZOOM_TO'; params: [lng: number, lat: number, zoom: number] }

const KNOWN_TYPES = new Set([
  'FILTER_LAYER',
  'TOGGLE_LAYER',
  'ZOOM_TO',
])

export function parseReply(reply: string): MapCommand | { type: 'TEXT'; text: string } {
  if (!reply.startsWith('COMMAND:')) {
    return { type: 'TEXT', text: reply }
  }
  const parts = reply.split(':')
  if (parts.length < 3) return { type: 'TEXT', text: reply }

  const [, type, ...rest] = parts
  if (!KNOWN_TYPES.has(type)) return { type: 'TEXT', text: reply }

  const params = rest.join(':').split(',').map(s => s.trim())
  // coerce to the typed variant for `type` here — failure falls through to TEXT
  return coerceCommand(type, params) ?? { type: 'TEXT', text: reply }
}

Five gates: prefix check, minimum part count, verb-set membership, parameter coercion, fallback to text. Anything that fails any gate becomes a chat bubble and the map does nothing. That is the silent-failure rule, and it is what stops a confused LLM from emptying the user's layer state.
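The parser above calls coerceCommand without showing it. A minimal sketch of what that fourth gate could look like follows. The parameter rules (TOGGLE_LAYER takes the literal strings true or false, ZOOM_TO takes exactly three finite numbers) are assumptions for illustration, not the app's actual rules.

```typescript
type MapCommand =
  | { type: 'FILTER_LAYER'; params: string[] }
  | { type: 'TOGGLE_LAYER'; params: [layerId: string, on: boolean] }
  | { type: 'ZOOM_TO'; params: [lng: number, lat: number, zoom: number] }

// Hypothetical coercer: returns a typed command, or null when the params do
// not fit the verb. The caller turns null into a TEXT fallback.
function coerceCommand(type: string, params: string[]): MapCommand | null {
  switch (type) {
    case 'FILTER_LAYER':
      // Assumed rule: at least one param, none of them empty.
      return params.length > 0 && params.every(p => p.length > 0)
        ? { type: 'FILTER_LAYER', params }
        : null
    case 'TOGGLE_LAYER': {
      if (params.length !== 2 || params[0] === '') return null
      const flag = params[1].toLowerCase()
      if (flag !== 'true' && flag !== 'false') return null
      return { type: 'TOGGLE_LAYER', params: [params[0], flag === 'true'] }
    }
    case 'ZOOM_TO': {
      // Number('') is 0, so empty strings are rejected before conversion.
      if (params.length !== 3 || params.some(p => p === '')) return null
      const [lng, lat, zoom] = params.map(p => Number(p))
      if (![lng, lat, zoom].every(n => Number.isFinite(n))) return null
      return { type: 'ZOOM_TO', params: [lng, lat, zoom] }
    }
    default:
      return null
  }
}
```

A stricter version would also range-check longitude, latitude, and zoom, and check the layer id against the client's layer registry.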

The dispatcher is the boring half. It is a switch on type that calls the existing action functions the UI buttons already call. No new code paths, no new validation, no shadow state machine. If TOGGLE_LAYER already has a handler that the layer panel uses, the dispatcher calls that. The chat surface is a typed alias for the button surface.
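As a sketch, the dispatcher can be this small. The MapActions interface and its method names are hypothetical stand-ins for whatever handlers the UI buttons already call, and reading FILTER_LAYER's first param as the layer id is an assumption based on the schools,south-east example.

```typescript
type MapCommand =
  | { type: 'FILTER_LAYER'; params: string[] }
  | { type: 'TOGGLE_LAYER'; params: [layerId: string, on: boolean] }
  | { type: 'ZOOM_TO'; params: [lng: number, lat: number, zoom: number] }

// Hypothetical stand-ins for the handlers the existing buttons use.
interface MapActions {
  filterLayer(layerId: string, filters: string[]): void
  setLayerVisibility(layerId: string, on: boolean): void
  flyTo(lng: number, lat: number, zoom: number): void
}

function dispatch(command: MapCommand, actions: MapActions): void {
  switch (command.type) {
    case 'FILTER_LAYER': {
      // Assumption: first param is the layer id, the rest are filter terms.
      const [layerId, ...filters] = command.params
      actions.filterLayer(layerId, filters)
      break
    }
    case 'TOGGLE_LAYER':
      actions.setLayerVisibility(...command.params)
      break
    case 'ZOOM_TO':
      actions.flyTo(...command.params)
      break
    default: {
      // Exhaustiveness check: a new verb in MapCommand without a case here
      // becomes a compile error instead of a silent no-op.
      const unhandled: never = command
      throw new Error(`Unhandled command: ${JSON.stringify(unhandled)}`)
    }
  }
}
```

Passing the actions in as an argument keeps the dispatcher free of business logic and makes it easy to test with recorded calls.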

Where the framework fails

Three places, in increasing severity.

Brittle delimiters. COMMAND:TYPE:PARAMS with comma-separated params breaks the moment a parameter contains a comma. A polygon WKT string, a layer name with punctuation, a free-text filter clause — any of these will tear the parser. You can fix it by escaping, by switching to a length-prefixed format, or by moving to JSON with response_format: { type: 'json_object' } and accepting the latency tax. For a closed verb set with short identifier-style params, the framework holds. Stretch the params and the wire format is the first thing that snaps.
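The escaping option might look like the sketch below. The splitParams helper and the backslash convention are assumptions, not something the app implements, and the system prompt would also have to tell the model to escape commas, which is its own reliability risk.

```typescript
// Hypothetical splitter: a bare comma separates params, "\," is a literal
// comma, and "\\" is a literal backslash.
function splitParams(raw: string): string[] {
  const params: string[] = []
  let current = ''
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i]
    if (ch === '\\' && i + 1 < raw.length) {
      // Take the escaped character literally and skip past it.
      current += raw[i + 1]
      i++
    } else if (ch === ',') {
      params.push(current.trim())
      current = ''
    } else {
      current += ch
    }
  }
  params.push(current.trim())
  return params
}
```

In the parser above, this would replace the plain split(',') call on the joined remainder.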

The closed verb set drifts. Step one assumes the set is small and stable. In practice it grows. Every product change adds a verb, and unless you regenerate the system prompt's hint or the client's KNOWN_TYPES registry from the same source of truth, the two drift. The version of this codebase you're reading does not enumerate the verbs in the system prompt at all — it leaves the model to infer them. That's defensible at v1 (the parser rejects unknowns) but it caps how reliably the model can pick the right verb when it has many to choose from. The mitigation is to generate both the system-prompt verb list and the client registry from a single TypeScript const at build time.
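A sketch of that mitigation, with the caveat that it derives both artefacts at module load rather than in a build step, and that the parameter hints are illustrative rather than the app's real signatures:

```typescript
// Single source of truth for the verb set. The hint strings are assumptions.
const VERBS = {
  FILTER_LAYER: 'layerId,filter',
  TOGGLE_LAYER: 'layerId,true|false',
  ZOOM_TO: 'lng,lat,zoom',
} as const

type Verb = keyof typeof VERBS

// Client registry, derived instead of hand-maintained.
const KNOWN_TYPES: ReadonlySet<string> = new Set(Object.keys(VERBS))

// System-prompt fragment listing the legal commands, from the same object.
function verbPromptLines(): string {
  return (Object.keys(VERBS) as Verb[])
    .map(verb => `COMMAND:${verb}:${VERBS[verb]}`)
    .join('\n')
}
```

Adding a verb then means editing one object. The prompt and the registry pick it up together, and the dispatcher's switch is the only other place that needs a new case.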

No multi-step reasoning. "Compare last year's flood zones to this year's school catchments and show me the overlap" is two commands and a join. This framework returns one line. The honest answer is that if your product needs multi-step, you need tool calling, and the wire-contract approach was the wrong starting point. Most map products do not need multi-step on day one. Plan to outgrow this, not to extend it.

CTA

The prompt that triggers the framework, when you're staring at a chat box and an empty viewer: "What verb did the user just type, and can the renderer already do it?" If both halves answer cleanly, you're in the framework's lane. If either answer is "I'm not sure", you're not — pick a different shape.

Trade-off

This recommendation trades expressiveness for reliability. A wire contract with a closed verb set will never produce the conversation a tool-calling agent can. It will never compose actions. It will never hold context across a long thread. What it will do is ship in a sprint and not regress when OpenAI rotates a model version, because the surface area is small enough to retest by hand. For a v1 natural-language layer on top of an existing UI, that's the trade I'd make. For a product whose core experience is conversation, it's the wrong trade.

Business impact

What does this look like to the business? It is the difference between a chat box that demos well and a chat box that ships. A wire contract bounds the failure mode — the worst case is a chat reply that does nothing to the map, never a chat reply that drops every layer the user just configured. That is the kind of failure mode you can show to a non-technical stakeholder without flinching. It is also the kind of code path that a junior on the team can extend in an afternoon by adding one verb to the registry and one case to the dispatcher, instead of refactoring a tool-calling agent. The cost is ceiling, not floor — you cap how clever the assistant gets in exchange for predictable behaviour at the price point of a gpt-3.5-turbo call and 200 tokens.

What to do next

If you have a map app and a chat box in the same product, open the route handler that calls the LLM and look at two lines: the temperature and the max_tokens. If either is missing, add them. If temperature is above 0.3, lower it. Then open the consumer of the reply and find the parser. If there isn't one — if the string is rendered straight into the chat panel and a separate code path watches for keywords — write the five-gate parser above, in whatever shape your registry takes. Those two changes are the difference between a chat box that occasionally toggles a layer by accident and a chat box that toggles a layer because the user asked for it.

This is the last post in a thirteen-part series on the engineering judgement behind a Next.js geospatial app. The series started in the renderer — coordinate systems, layer fetchers, MapLibre style state — and ends here, at the seam where natural language meets the same code path a button click already runs. If you've read the rest, you have the pattern. If this is the first one you've landed on, the prompt at the top of the CTA still works: name the verb, check the renderer, decide whether you're in the framework's lane.
