AI with BASIC
Local LLMs, ONNX, RAG — all in one binary
jdBasic ships with ONNX Runtime for any pretrained model, llama.cpp for local LLM chat, a built-in RAG engine, and a k-NN classifier. No Python, no Docker, no cloud — everything runs on your hardware out of a single self-contained EXE.
The AI stack at a glance
ONNX Runtime
AI.LOAD + AI.RUN: drop in any .onnx model. MNIST, ResNet, stable-diffusion encoders, you name it.
llama.cpp
AI.LOAD_LLM + AI.CHAT_STREAM: load any GGUF (Phi-3, Llama, Qwen, gpt-oss). CUDA / Vulkan / ROCm offload depending on the build.
RAG engine
AI.RAG_*: TF-IDF or dense embeddings, HNSW index, file/dir ingestion, streaming answers grounded in your own data.
Classifier
AI.CLASSIFIER_*: k-NN over embeddings, ideal for ticket routing, intent detection, anything where you have labelled examples but no training budget.
1) ONNX inference
AI.LOAD takes any .onnx file; AI.RUN feeds it the inputs as nested jdBasic arrays and returns the outputs the same way. Use it for image classifiers, embedding extractors, or, as in the example below, as a tiny SIMD compute backend for convolutions.
jdb/mini_onnx.jdb
' Load a tiny 3×3 convolution model and run it on a 5×5 input.
' Same pattern works for full image classifiers — swap the .onnx
' file and reshape the input.
DIM conv = AI.LOAD("bench/conv3x3.onnx")
' ONNX Conv expects NCHW input: [batch, channels, h, w]; the kernel is [out_ch, in_ch, kh, kw]. Edge-detect kernel below.
LET edge_kernel = [[[[-1.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, -1.0]]]]
LET image = [[[[0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]]]]
LET out = AI.RUN(conv, [image, edge_kernel])
PRINT "Edge-detected response:"
PRINT out[0][0]
See jdb/ai_demo.jdb for an interactive MNIST digit recogniser using the same two natives plus AI.SOFTMAX and AI.ARGMAX.
2) Local LLM chat
AI.LOAD_LLM takes a GGUF file plus a context size and a GPU-layer count. AI.CHAT_STREAM hands each new token to your callback as it's generated, which suits typewriter-style UIs or piping into a TUI.
jdb/mini_llm.jdb
DIM model = "models/Phi-3-mini-4k-instruct-q4.gguf" ' ~2.4 GB
DIM ctx_size = 2048
DIM gpu_layers = 99 ' all on GPU if the build has CUDA/Vulkan/ROCm
DIM llm = AI.LOAD_LLM(model, ctx_size, gpu_layers)
DIM rc1 = AI.SET(llm, "system", "You are a concise assistant. Answer in 2 sentences.")
DIM rc2 = AI.SET(llm, "temperature", 0.5)
DIM rc3 = AI.SET(llm, "max_tokens", 120)
' AI.CHAT_STREAM hands each new token to the callback as it's generated.
' Return TRUE to keep going, FALSE to stop early.
FUNC OnToken(t)
    PRINT t;
    RETURN TRUE
ENDFUNC
PRINT "Q: Why is BASIC a good language to teach with?"
PRINT "A: ";
DIM full_response = AI.CHAT_STREAM(llm, "Why is BASIC a good language to teach with?", OnToken@)
PRINT
DIM rc4 = AI.FREE_LLM(llm)
AI.SET also takes "top_p", "top_k", "repeat_penalty", "grammar" (GBNF), and JSON-mode toggles.
See jdb/ai_chat_demo.jdb for a full ImGui chat studio (history, streaming, tokenizer view).
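The additional sampling keys can be set with the same AI.SET call form used in jdb/mini_llm.jdb. The following is a minimal sketch, assuming these keys accept plain numeric values the way "temperature" does; the specific values are illustrative choices, not defaults documented by jdBasic.

```
' Sketch only: tighter sampling via the extra AI.SET keys.
' The values below are illustrative assumptions, not jdBasic defaults.
DIM llm = AI.LOAD_LLM("models/Phi-3-mini-4k-instruct-q4.gguf", 2048, 99)
DIM s1 = AI.SET(llm, "top_p", 0.9)            ' nucleus-sampling cutoff
DIM s2 = AI.SET(llm, "top_k", 40)             ' sample from the 40 most likely tokens
DIM s3 = AI.SET(llm, "repeat_penalty", 1.1)   ' discourage repeated tokens
DIM s4 = AI.FREE_LLM(llm)
```

The "grammar" key and the JSON-mode toggles are left out here because their value formats are not shown on this page.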
3) Retrieval-augmented Q&A
Bundle your own docs with a query interface in ~20 lines.
AI.RAG_CREATE in TF-IDF mode needs no model files at all; AI.RAG_SEARCH returns raw chunk + score hits, and AI.RAG_QUERY folds the retrieved context into an LLM prompt and streams an answer.
jdb/mini_rag.jdb
' TF-IDF mode (llm_id=0, no embed_id): zero model files required.
DIM rag = AI.RAG_CREATE(0, 400, 40)
DIM rc1 = AI.RAG_ADD(rag, "jdBasic is a modern BASIC dialect with APL-style array ops, ImGui graphics, and a built-in LLVM-18 native compiler.", "overview")
DIM rc2 = AI.RAG_ADD(rag, "AI features include AI.LOAD for ONNX models, AI.LOAD_LLM for GGUF local language models via llama.cpp, and AI.RAG_QUERY for retrieval-augmented generation.", "ai")
DIM rc3 = AI.RAG_ADD(rag, "The jdBasic MCP server lets Claude or Cursor pair-program on a persistent VM with STOP/RESUME and LLVM compile-and-ship.", "mcp")
DIM rc4 = AI.RAG_ADD(rag, "Graphics commands include SCREEN, LINE / RECT / CIRCLE, SPRITE.LOAD, and a Tiled-map loader.", "graphics")
DIM hits = AI.RAG_SEARCH(rag, "How can I pair-program with Claude?", 2)
DIM i AS INTEGER
FOR i = 0 TO LEN(hits) - 1
    PRINT "── hit ", i + 1, " — source=", hits[i]{"source"}, " score=", hits[i]{"score"}
    PRINT hits[i]{"text"}
NEXT i
Swap to dense embeddings by passing a 4th arg to AI.RAG_CREATE: the id of an embedding model loaded via AI.LOAD_EMBEDDINGS. Then AI.RAG_BUILD_INDEX builds an HNSW graph for sub-millisecond search over millions of chunks. Full example in jdb/rag_demo.jdb.
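As a rough illustration of the dense-embedding path, the sketch below reuses the calls from jdb/mini_rag.jdb and adds the embedding id as the 4th argument. It assumes that AI.RAG_BUILD_INDEX takes the RAG id (by analogy with AI.CLASSIFIER_BUILD_INDEX) and that a bge-m3 GGUF file is present at the path shown; treat jdb/rag_demo.jdb as the authoritative version.

```
' Sketch only: dense-embedding retrieval with an HNSW index.
' Assumptions: AI.RAG_BUILD_INDEX takes the RAG id (mirroring
' AI.CLASSIFIER_BUILD_INDEX), and the model path below exists on disk.
DIM emb = AI.LOAD_EMBEDDINGS("models/bge-m3-Q4_K_M.gguf", 2048, 99)
' llm_id=0 as in the TF-IDF example; the 4th arg switches to dense embeddings.
DIM rag = AI.RAG_CREATE(0, 400, 40, emb)
DIM a1 = AI.RAG_ADD(rag, "The jdBasic MCP server lets Claude or Cursor pair-program on a persistent VM.", "mcp")
DIM a2 = AI.RAG_ADD(rag, "Graphics commands include SCREEN, LINE / RECT / CIRCLE, and SPRITE.LOAD.", "graphics")
DIM built = AI.RAG_BUILD_INDEX(rag)
DIM hits = AI.RAG_SEARCH(rag, "How can I pair-program with Claude?", 1)
PRINT hits[0]{"source"}, " score=", hits[0]{"score"}
```

The search call and the hit fields are unchanged from the TF-IDF example, so switching modes should only affect how the store is created and indexed.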
4) k-NN classifier
For text classification (ticket routing, intent detection, sentiment, and so on) you usually don't need to fine-tune a model: just embed your labelled examples and look up the nearest neighbours. The AI.CLASSIFIER_* family does this with an HNSW index under the hood.
Sketch — see jdb/train_classifier.jdb for the full demo
DIM emb = AI.LOAD_EMBEDDINGS("models/bge-m3-Q4_K_M.gguf", 2048, 99)
DIM clf = AI.CLASSIFIER_CREATE(emb)
' Add labelled examples: one per call, or use AI.CLASSIFIER_ADD_BATCH for many.
DIM n1 = AI.CLASSIFIER_ADD(clf, "Scanner offline, red LED blinking", "hardware")
DIM n2 = AI.CLASSIFIER_ADD(clf, "Reset my password please", "account")
DIM n3 = AI.CLASSIFIER_ADD(clf, "Excel macro fails after the update", "software")
' ... more examples ...
DIM rc = AI.CLASSIFIER_BUILD_INDEX(clf)
DIM pred = AI.CLASSIFIER_PREDICT(clf, "My printer won't turn on", 5)
PRINT "Top label: ", pred{"label"}, " — confidence ", pred{"confidence"}
PRINT "Votes: ", pred{"votes"}
DIM rc2 = AI.CLASSIFIER_SAVE(clf, "tickets.clf") ' reload next time with AI.CLASSIFIER_LOAD
Performance & tips
GPU offload
The third arg to AI.LOAD_LLM and AI.LOAD_EMBEDDINGS is the number of model layers to push onto the GPU. 99 = all of them.
- Windows release builds ship with CUDA; pick a model small enough to fit in VRAM.
- Linux builds inside the AMD Strix-Halo distrobox offload via Vulkan / RADV.
- Tested Strix-Halo gpt-oss-20b baseline: ~74 tokens/sec on a Radeon 8060S iGPU.
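When a model does not fit entirely in VRAM, lowering the layer count is the obvious knob to try. A minimal sketch of a partial offload follows; the value 20 is an arbitrary illustration, and treating 0 as CPU-only is an assumption based on llama.cpp's own convention rather than something this page states.

```
' Sketch only: partial GPU offload via the third argument.
' 99 = all layers (per the text above); 20 is an arbitrary example value.
' Assumption: 0 would keep every layer on the CPU, as in llama.cpp itself.
DIM model = "models/Phi-3-mini-4k-instruct-q4.gguf"
DIM llm_partial = AI.LOAD_LLM(model, 2048, 20)
DIM rc = AI.FREE_LLM(llm_partial)
```

Layers that are not offloaded run on the CPU, so throughput generally drops as the count goes down.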
Where to grab models
jdBasic loads anything .gguf (LLMs) or .onnx (general inference).
- huggingface.co — the GGUF hub. Start with Phi-3-mini Q4 (2.4 GB) for a fast baseline.
- github.com/onnx/models — pretrained ONNX zoo.
- For embeddings: nomic-embed-text-v1.5 or bge-m3.
Dotted-native syntax
Call dotted natives in function form: DIM rc = AI.SET(id, "k", v), not AI.SET id, "k", v (which is parsed as method-on-value in some contexts).
Compile to a single EXE
jdbasic -c my_ai_script.jdb produces a standalone EXE via the built-in LLVM-18 backend. Pair with MCP-based pair coding and your AI co-builder can hand you a redistributable binary in the same turn.