AI with BASIC
Autodiff, Tensors, and Transformers
jdBasic isn’t only retro fun — it has a modern Tensor + Autodiff engine. Build neural networks, train them, and experiment with Transformer-style LLMs right in BASIC.
What you’ll build
Neural basics
Learn how neurons, layers, and activations work — using jdBasic arrays and matrix math.
Autodiff training
Train a small network to solve XOR using TENSOR.BACKWARD.
Transformer demo
See how attention blocks stack into a Transformer that learns next-character prediction.
1) The AI building blocks in jdBasic
Plain arrays: neuron + layer
jdBasic array math is great for understanding the “math inside” a network.
' --- A single neuron (array math) ---
INPUTS = [0.5, -1.2, 0.8]
WEIGHTS = [0.8, 0.1, -0.4]
BIAS = 0.5
WEIGHTED_SUM = SUM(INPUTS * WEIGHTS)
OUTPUT = WEIGHTED_SUM + BIAS
PRINT "Output:"; OUTPUT
Tensors + autodiff
When you call TENSOR.BACKWARD, jdBasic walks the computation graph backwards and fills in a .grad tensor for everything that contributed to the result.
' --- Convert arrays into tensors ---
X = TENSOR.FROM([[1, 2], [3, 4]])
W = TENSOR.FROM([[5, 6], [7, 8]])
Y = TENSOR.MATMUL(X, W)
' --- Backprop through the graph ---
TENSOR.BACKWARD Y
PRINT "Grad of X:"
PRINT FRMV$(TENSOR.TOARRAY(X.grad))
2) Train XOR
XOR is the “hello world” of neural-network training: it isn't linearly separable, so the network needs a hidden layer to solve it. In jdBasic, autodiff makes the training loop surprisingly clean.
Full XOR Training Loop
' ==========================================================
' == Autodiff Neural Network: Learn XOR in jdBasic
' ==========================================================
' 1) Training data
TRAINING_INPUT_DATA = [[0, 0], [0, 1], [1, 0], [1, 1]]
TRAINING_OUTPUT_DATA = [[0], [1], [1], [0]]
INPUTS = TENSOR.FROM(TRAINING_INPUT_DATA)
TARGETS = TENSOR.FROM(TRAINING_OUTPUT_DATA)
' 2) Model definition (2 → 3 → 1)
MODEL = {}
HIDDEN_LAYER = TENSOR.CREATE_LAYER("DENSE", {"input_size": 2, "units": 3})
OUTPUT_LAYER = TENSOR.CREATE_LAYER("DENSE", {"input_size": 3, "units": 1})
MODEL{"layers"} = [HIDDEN_LAYER, OUTPUT_LAYER]
' 3) Optimizer + training params
OPTIMIZER = TENSOR.CREATE_OPTIMIZER("SGD", {"learning_rate": 0.1})
EPOCHS = 15000
' 4) Forward pass (sigmoid activations)
FUNC MODEL_FORWARD(current_model, input_tensor)
    temp = input_tensor
    layers = current_model{"layers"}
    FOR i = 0 TO LEN(layers) - 1
        layer = layers[i]
        temp = MATMUL(temp, layer{"weights"}) + layer{"bias"}
        temp = TENSOR.SIGMOID(temp)
    NEXT i
    RETURN temp
ENDFUNC
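A note on the activations: TENSOR.SIGMOID after the hidden layer supplies the non-linearity without which a stack of matrix multiplies could never separate XOR, and on the output layer it squashes the prediction into (0, 1), matching the 0/1 targets.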
' 5) Loss function (MSE)
FUNC MSE_LOSS(predicted, actual)
    err = actual - predicted
    RETURN SUM(err ^ 2) / LEN(TENSOR.TOARRAY(err))[0]
ENDFUNC
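A quick sanity check on that loss: if the network always predicts 0.5 (pure guessing), every row contributes a squared error of 0.25, so the loss starts out around 0.25. A loss that never drops below that level usually means the gradients aren't flowing.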
' 6) Training loop
FOR epoch = 1 TO EPOCHS
    PRED = MODEL_FORWARD(MODEL, INPUTS)
    LOSS = MSE_LOSS(PRED, TARGETS)
    TENSOR.BACKWARD LOSS
    MODEL = TENSOR.UPDATE(MODEL, OPTIMIZER)
    IF epoch MOD 1000 = 0 THEN
        PRINT "Epoch:"; epoch; " Loss:"; TENSOR.TOARRAY(LOSS)
    ENDIF
NEXT epoch
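Once the loop finishes, print the model's outputs next to the targets. This final check only uses calls that already appear in the listing above:
' 7) Inspect the trained network
FINAL_PRED = MODEL_FORWARD(MODEL, INPUTS)
PRINT "Predictions (should be close to 0, 1, 1, 0):"
PRINT FRMV$(TENSOR.TOARRAY(FINAL_PRED))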
3) Debugging: gradients in 10 lines
When training "does nothing", run this check:
PRINT "--- Testing MATMUL + BACKWARD ---"
A = TENSOR.FROM([[1, 2], [3, 4]])
B = TENSOR.FROM([[5, 6], [7, 8]])
C = TENSOR.MATMUL(A, B)
PRINT "C:"
PRINT FRMV$(TENSOR.TOARRAY(C))
TENSOR.BACKWARD C
PRINT "Grad(A):"
PRINT FRMV$(TENSOR.TOARRAY(A.grad))
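For these inputs, C should print as [[19, 22], [43, 50]]. Assuming the engine seeds the output gradient with ones (the usual choice for a toy check like this), Grad(A) works out to ones times the transpose of B, i.e. [[11, 15], [11, 15]]. Zeros or an error here means the graph was broken somewhere before the BACKWARD call.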
4) A tiny Transformer LLM
This demo shows a character-level Transformer that learns next-character prediction. It uses real embeddings, positional encoding, and stacked attention blocks.
Core steps
- Build a character vocabulary
- Embed tokens into vectors
- Run stacked self-attention blocks (pre-LN)
- Train with cross-entropy loss
Note: Full demo files live in the repo scripts folder. Keep datasets small for browser use.
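Snippet: Building the Vocabulary
The first step in the list above is the vocabulary. Here is a minimal sketch that collects every distinct character of the training text into a string, so a character's token id is simply its position; it assumes the classic BASIC string functions LEN, MID$ and INSTR work as usual in jdBasic.
' --- Build a character vocabulary (sketch) ---
TEXT$ = "to be or not to be"
VOCAB$ = ""
FOR i = 1 TO LEN(TEXT$)
    CH$ = MID$(TEXT$, i, 1)
    IF INSTR(VOCAB$, CH$) = 0 THEN
        VOCAB$ = VOCAB$ + CH$
    ENDIF
NEXT i
PRINT "Vocab size:"; LEN(VOCAB$)
' Token id of a character = its position in VOCAB$ (via INSTR) minus 1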
Snippet: Stacking Blocks
' --- Stack Transformer blocks ---
MODEL{"layers"} = []
FOR i = 0 TO NUM_LAYERS - 1
    layer = {}
    layer{"attention"} = TENSOR.CREATE_LAYER("ATTENTION", {"embedding_dim": EMBEDDING_DIM})
    layer{"norm1"} = TENSOR.CREATE_LAYER("LAYER_NORM", {"dim": HIDDEN_DIM})
    layer{"ffn1"} = TENSOR.CREATE_LAYER("DENSE", {"input_size": HIDDEN_DIM, "units": HIDDEN_DIM * 2})
    layer{"norm2"} = TENSOR.CREATE_LAYER("LAYER_NORM", {"dim": HIDDEN_DIM})
    MODEL{"layers"} = APPEND(MODEL{"layers"}, layer)
NEXT i
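In a pre-LN block the norm layers sit in front of the sub-layers: each block is wired roughly as x = x + Attention(Norm1(x)) followed by x = x + FFN(Norm2(x)), and the residual additions keep gradients healthy as the stack grows. Note that ffn1 expands the hidden dimension by 2x, so the feed-forward path needs a second dense layer projecting back to HIDDEN_DIM before its residual add; see the full demo files in the repo for how the forward pass is wired up.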
Practical Tips
Keep it small
Browser execution is slower than native C++.
- HIDDEN_DIM = 32 or 64
- NUM_LAYERS = 1 to start
- Print loss every 50 steps, not every step
Save and reuse
Don't retrain every time: once the loss has converged, save the trained weights and load them back for later runs instead of starting from scratch.