Scientific Seed Round · Open

A new class of AI, backed by patents you can read.

A 15M-parameter model, trained on isiZulu only, matches or beats GPT-4 on zero-shot intent, NER, and sentiment across 40+ languages — at 1/10,000th the size, on a $50 phone, with signed audit trails.

Seven provisional patents at USPTO. Nine independent claims on the controllable-embedding API alone.

The undeniable numbers

Every claim on this page is reproducible.

Eval scripts and checkpoints are available under NDA, and every number links to the code that produced it.

61.7% / 56.5%
Korean / Japanese

Zero-shot intent from a 15M model that only saw Zulu. 60x above random.

73.2% vs 70.6%
Sozisi-15M vs GPT-4o

Swahili MASSIVE intent. Zero Swahili in our pretraining. GPT-4o saw billions of Swahili tokens.

15M / 24 MB
Parameters / footprint

Sub-50ms CPU inference. No GPU required at inference.

1-2 hours
Single-GPU training

Pretrained on 2M Zulu sentences. Full run on one consumer GPU.

The moat

Seven provisional filings. One hundred and three claims.

The controllable-embedding API is the crown jewel: nine independent claim fences covering the operator library, cross-lingual transfer, signed inference, and tiered access. Filing receipts and full claims under NDA.

Composable embeddings

Operator algebra on the representation space: named, reversible transforms applied at inference time.

9 independent claim fences
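What a "named, reversible transform" on an embedding space could look like, as a minimal sketch: an orthogonal linear map is exactly invertible by its transpose, so an operator applied at inference can be undone without loss. All names here (`Operator`, `negation`) are illustrative assumptions, not the actual Sozisi API.

```python
import numpy as np

class Operator:
    """A named, invertible linear transform on embedding vectors."""

    def __init__(self, name: str, dim: int, seed: int):
        self.name = name
        # QR decomposition of a random matrix yields an orthogonal Q,
        # so Q @ Q.T == I and the transform is exactly reversible.
        rng = np.random.default_rng(seed)
        q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
        self.q = q

    def apply(self, v: np.ndarray) -> np.ndarray:
        return self.q @ v

    def invert(self, v: np.ndarray) -> np.ndarray:
        return self.q.T @ v

negation = Operator("negation", dim=8, seed=0)
v = np.ones(8)
w = negation.apply(v)
assert np.allclose(negation.invert(w), v)  # round-trips exactly
```

Because each operator is a named object rather than a prompt string, a library of them can be composed, audited, and versioned like any other API surface.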

Cross-lingual structural transfer

One-language pretraining, many-language deployment. Zero-shot across ten families with no parallel data.

Small-model architecture

Composition primitives let a 15M-parameter model match or beat systems 28x larger on cross-lingual tasks.

Data-efficient training

Production capability from a single-GPU training run on 2M Zulu sentences. No data centers, no H100 clusters.

The bet

The next decade of AI is small, local, and composable.

Small. A 15M model ties GPT-4o on Swahili intent and beats InkubaLM-422M at 1/28th the size. Every edge device becomes an inference surface.

Local. 24 MB on a $50 phone. No cloud, no GPU, no data leaving the device. This matters for banks, insurers, health systems, ministries, militaries, and four billion people frontier labs barely serve.

Composable. Prompts are blunt instruments. Named operators for sentiment, negation, and intent form an algebra you can inspect, sign, and reverse. Every inference carries a cryptographic receipt.
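A cryptographic receipt for an inference could be as simple as a keyed hash over the input, the operators applied, and the output, as in this sketch. The field names and the choice of HMAC-SHA256 are assumptions for illustration, not the actual Sozisi audit format.

```python
import hashlib
import hmac
import json

KEY = b"device-provisioned-secret"  # illustrative key, not a real secret

def sign_inference(text: str, operators: list[str], label: str) -> dict:
    """Return the inference record with an HMAC receipt attached."""
    record = {"input": text, "operators": operators, "output": label}
    payload = json.dumps(record, sort_keys=True).encode()
    record["receipt"] = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify(record: dict) -> bool:
    """Recompute the HMAC over everything but the receipt and compare."""
    body = {k: v for k, v in record.items() if k != "receipt"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["receipt"])

r = sign_inference("ngiyabonga", ["sentiment"], "positive")
assert verify(r)  # untampered record verifies
```

Any party holding the key can confirm the record was not altered after the fact, which is what makes the audit trail signable rather than merely logged.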

Frontier labs spent five years proving scale works. We spent eighteen months proving structure was the higher ground all along.

Partner with us before the category is priced in.

Scientific seed round open. Data room, reproducibility bundle, and technical deep-dive under NDA.

Backed by

Techstars