Research
Sozisi: 15M Parameters, No Compromises
A 15M-parameter model. New SOTA on isiZulu, isiXhosa, and Sesotho intent detection (Injongo), beating AfroXLMR-76L. Competitive cross-family transfer against models 28× larger. All benchmarks verified, reproducible, and run with frozen backbone + MLP probe.
Model Overview
Most AI memorizes words. We model the scaffolding underneath them.
Every language is different. The structure underneath isn't: morphology, noun-class agreement, argument composition. Teach a model vocabulary and it knows one language. Teach it structure and it knows thousands.
Over 6,000 languages are underserved by today’s AI (Stanford HAI, 2024). We pretrained a 15M-parameter model on one morphologically-rich language (isiZulu) and it works zero-shot across 40+ more — because the scaffolding is what they share.
Every benchmark below follows the same rule: the language under test was not in our training data. We say so explicitly each time, because the normal assumption is "big multilingual model beats a specialist." Here the specialist is the 422M-parameter competitor that trained on the target language. We're the 15M-parameter model that didn't.
“Doesn’t GPT-4 already learn structure from web data?”
Incidentally, yes — for languages with abundant training data. English, Chinese, Spanish, French: a huge web corpus forces any sufficiently large model to absorb morphology and composition by statistical accident. For everything else, “structure” is whatever the model can infer from a sliver of the corpus. That’s why GPT-4o scores 71.2% on isiZulu intent and 70.6% on KiSwahili, even though both languages are in its pretraining corpus — competent, not native.
We take the opposite path. Structural features (morphological roles, noun-class agreement, argument composition) are explicit inputs during pretraining. The model isn't hoping to pick up grammar; it's taught it. That's why a 15M-parameter model pretrained on isiZulu alone matches or beats a 270M competitor that was fine-tuned in-language on isiZulu, isiXhosa, and Sesotho. Those results wouldn't be possible if everyone already modeled structure well enough.
On English, the giant models win. We never claimed to beat English on English. On the 6,000 languages they barely touch, the math changes.
MASSIVE Swahili — the commercial case
MASSIVE is the standard intent-classification benchmark (60 intents, Amazon-curated). Only one of the three models below had zero Swahili text in its pretraining — ours. GPT-4o and InkubaLM both ingested Swahili during their pretraining; their 'zero-shot' is task-level (no fine-tuning on MASSIVE) but language-level exposure is built in. Our result is language-level zero-shot: the model transferred from isiZulu alone.
We want to be precise about what "zero-shot" means. In LLM parlance, "zero-shot" usually means no task-specific fine-tuning, but the model has still read the target language during its web-scale pretraining. GPT-4o ingested billions of Swahili tokens; InkubaLM was explicitly pretrained on Swahili as one of its five African pretraining languages (alongside English and French). Our Sozisi result is a stricter claim: the pretraining corpus was isiZulu and nothing else. Swahili appeared for the first time at inference, through a small MLP probe on a frozen backbone.
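Concretely, the protocol looks like this. A minimal sketch in Python, where `encode` is a stand-in for the frozen Sozisi encoder returning a sentence embedding; the function name and probe size are illustrative, not a published API:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def probe_accuracy(encode, train_texts, train_labels, test_texts, test_labels):
    # The backbone stays frozen: `encode` only ever runs inference.
    X_train = np.stack([encode(t) for t in train_texts])
    X_test = np.stack([encode(t) for t in test_texts])
    probe = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200)
    probe.fit(X_train, train_labels)         # only the probe's weights are trained
    return probe.score(X_test, test_labels)  # accuracy on the held-out test split
```

The backbone never receives a gradient from the target language; everything the probe can exploit has to already be present in the frozen representation.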
So read the table honestly. InkubaLM is 28× larger than us, pretrained on Swahili plus general web text, and wins the head-to-head by six points. That is the expected result for a specialist model at 28× the scale. What isn't expected: a 15M model that has never seen Swahili at all lands within six points of it, and beats GPT-4o (1.8T parameters, with Swahili throughout its pretraining) by 2.6 points. That gap is a function of training budget, not an architectural ceiling. More data closes it.
| Model | Parameters | Score | Method |
|---|---|---|---|
| Sozisi (Bhala AI) | 15M | 73.2% | Language-level zero-shot · pretrained on isiZulu only |
| GPT-4o | ≈1.8T | 70.6% | Task-level zero-shot · Swahili in web pretraining corpus |
| InkubaLM | 422M | 79.2% | Pretrained on Swahili (one of five African pretraining languages) + web |
Injongo — 8 Bantu languages, head-to-head
We pretrained on one language (isiZulu). AfroXLMR-76L (270M params) was fine-tuned in-language on each target. We run a frozen backbone plus a light probe, the same recipe sketched above. On four of eight languages we match or beat a model 18× our size. Where AfroXLMR wins, it does so with that 18× parameter advantage plus target-language fine-tuning we do not do.
Sozisi vs published SOTA · head-to-head per language
Source: Adelani et al. 2025 (ACL Long Paper, arXiv:2502.09814). Baselines fine-tuned on Injongo training data per language; Sozisi uses frozen backbone + lightweight probe.
| Language | Sozisi | Public SOTA | SOTA Model | Δ | Status |
|---|---|---|---|---|---|
| isiXhosa | 98.3% | 97.3% | AfroXLMR | +1.0pp | SOTA |
| KiSwahili | 97.9% | 98.1% | AfroXLMR-76L | −0.2pp | Tied |
| Sesotho | 95.1% | 86.8% | AfroXLMR-76L | +8.3pp | SOTA |
| isiZulu | 93.1% | 89.8% | AfroXLMR-76L | +3.3pp | SOTA |
| ChiShona | 90.5% | 95.3% | AfroXLMR | −4.8pp | Behind |
| Lingala | 89.5% | 94.6% | AfroXLMR-76L | −5.1pp | Behind |
| Luganda | 81.7% | 91.3% | AfroXLMR-76L | −9.6pp | Behind |
| Kinyarwanda | 78.3% | 89.4% | AfroXLMR-76L | −11.1pp | Behind |
SOTA on 3 of 8 languages, plus a tie on KiSwahili · average across all 8 languages: 90.5%
Full isiZulu leaderboard (incl. LLMs)
| Model | Parameters | Accuracy | Method |
|---|---|---|---|
| Sozisi (Bhala AI) | 15M | 93.1% | Frozen + probe |
| AfroXLMR-76L | 270M | 89.8% | Fine-tuned in-language |
| AfroXLMR | 270M | 89.0% | Fine-tuned in-language |
| mT5-Large | 1.2B | 82.4% | Fine-tuned in-language |
| XLM-R | 270M | 74.7% | Fine-tuned in-language |
| GPT-4o | ≈1.8T | 71.2% | Zero-shot |
| Gemini 1.5 Pro | Undisclosed | 68.7% | Zero-shot |
Global Reach (Zero-Shot)
High accuracy on languages with zero language-specific training data. This is the core evidence for a Universal Morphological Manifold.
| Language | Family | MASSIVE Accuracy |
|---|---|---|
| Swahili | Bantu | 71.0% |
| Urdu | Indo-Aryan | 66.0% |
| Mongolian | Mongolic | 64.0% |
| Tagalog | Austronesian | 62.3% |
| Korean | Koreanic | 61.7% |
| Amharic | Semitic | 60.9% |
| Hindi | Indo-Aryan | 60.3% |
| Javanese | Austronesian | 58.6% |
| Japanese | Japonic | 56.5% |
| Tamil | Dravidian | 56.2% |
| Kannada | Dravidian | 53.6% |
| Telugu | Dravidian | 50.5% |
Sentiment Analysis (Cross-Language)
Our model understands emotion and sentiment in languages it was never trained on, landing near SOTA with a frozen backbone and only a lightweight probe. Scores are support-weighted F1 (wF1); a reference snippet follows the table.
| Language | Sozisi (15M, frozen) | SemEval SOTA (270M+, fine-tuned) | % of SOTA |
|---|---|---|---|
| Swahili | 53.7% wF1 | 60.5% wF1 | 89% |
| Xitsonga | 50.0% wF1 | 54.9% wF1 | 91% |
| Igbo | 65.7% wF1 | 80.8% wF1 | 81% |
| Yoruba | 57.2% wF1 | 68.0% wF1 | 84% |
| Hausa | 61.9% wF1 | 80.9% wF1 | 77% |
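For reference, wF1 is per-class F1 averaged with weights proportional to how often each class appears in the gold labels. A minimal way to compute it with scikit-learn, using toy labels for illustration:

```python
from sklearn.metrics import f1_score

# Support-weighted F1 (wF1): per-class F1, weighted by class frequency
# in the gold labels, so common classes count proportionally more.
y_true = ["pos", "neg", "neu", "pos", "neg"]
y_pred = ["pos", "neg", "pos", "pos", "neu"]
print(f1_score(y_true, y_pred, average="weighted"))
```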
Named Entity Recognition (MasakhaNER, isiZulu)
Frozen Sozisi backbone + lightweight CRF head on the standard MasakhaNER benchmark. Real entity extraction across people, places, organizations, and dates; a sketch of the head follows the table.
| Entity Type | Description | F1 | Support |
|---|---|---|---|
| PER | People | 83.2% | 888 |
| DATE | Dates | 80.8% | 318 |
| LOC | Locations | 75.8% | 337 |
| ORG | Organizations | 65.4% | 373 |
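A minimal sketch of that head, using the open-source pytorch-crf package. Here `token_embs` stands in for frozen Sozisi token representations of shape (batch, seq, dim); all names and sizes are illustrative, not our released code:

```python
import torch
import torch.nn as nn
from torchcrf import CRF

class CRFHead(nn.Module):
    """Linear emission layer + CRF over tag sequences; backbone stays frozen."""
    def __init__(self, dim: int, num_tags: int):
        super().__init__()
        self.emit = nn.Linear(dim, num_tags)        # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)  # learns tag transitions

    def loss(self, token_embs, tags, mask):
        emissions = self.emit(token_embs)
        # CRF forward returns log-likelihood; negate it to get a loss.
        return -self.crf(emissions, tags, mask=mask, reduction="mean")

    def decode(self, token_embs, mask):
        # Viterbi decoding of the best tag sequence per sentence.
        return self.crf.decode(self.emit(token_embs), mask=mask)
```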
Cross-Family MASSIVE: Sozisi vs InkubaLM
Apples-to-apples comparison: each model gets the same MLP probe, trained on the target language's training split and evaluated on the held-out test split (the recipe sketched earlier). InkubaLM (422M) is 28× larger and was pretrained on five African languages plus general web text. Sozisi (15M) was pretrained on isiZulu alone. We still win 2 of 17, and the gap shrinks as we scale.
| Language | Family | InkubaLM (422M) | Sozisi (15M) | Δ |
|---|---|---|---|---|
| Urdu | Indo-Aryan | 65.6% | 66.0% | +0.4 |
| Tamil | Dravidian | 54.3% | 56.2% | +1.9 |
| Mongolian | Mongolic | 64.7% | 64.0% | −0.7 |
| Hindi | Indo-Aryan | 65.2% | 60.3% | −4.9 |
| Swahili | Bantu | 79.2% | 71.0% | −8.2 |
| Kannada | Dravidian | 62.9% | 53.6% | −9.3 |
| Amharic | Semitic | 70.9% | 60.9% | −10.0 |
| Telugu | Dravidian | 62.2% | 50.5% | −11.7 |
| Korean | Koreanic | 73.6% | 61.7% | −11.9 |
| Tagalog | Austronesian | 78.0% | 62.3% | −15.7 |
| Japanese | Japonic | 72.8% | 56.5% | −16.3 |
| Javanese | Austronesian | 79.1% | 58.6% | −20.5 |
| Afrikaans | Germanic | 78.9% | 50.1% | −28.8 |
| Finnish | Uralic | 76.5% | 46.0% | −30.5 |
| Turkish | Turkic | 77.1% | 43.2% | −33.9 |
| Azerbaijani | Turkic | 77.8% | 33.7% | −44.1 |
| Hungarian | Uralic | 77.5% | 28.4% | −49.1 |
Sozisi wins Urdu and Tamil via our transliteration pipeline (consonant-vowel mapping; sketched below). On Swahili, InkubaLM was pretrained on the language itself. On the Uralic, Turkic, and Germanic rows, InkubaLM's web-scale pretraining dominates. Sozisi is still improving: more training data and per-language Sozisi caches should narrow these gaps.
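To make "CV mapping" concrete, here is a toy sketch of the idea for one script, Devanagari. The character tables and routing in our actual pipeline are not published; this illustrates only the mechanics, with a deliberately tiny grapheme table:

```python
# Illustrative Devanagari subset: consonants carry an inherent 'a',
# vowel signs (matras) replace it, and the virama deletes it.
CONSONANTS = {"न": "n", "म": "m", "स": "s", "त": "t", "क": "k", "र": "r"}
MATRAS = {"ा": "aa", "ि": "i", "ी": "ii", "ु": "u", "े": "e", "ो": "o"}
VIRAMA = "\u094d"  # marks a bare consonant (cluster member)

def to_cv(text: str) -> str:
    out = []
    for ch in text:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch] + "a")      # consonant + inherent vowel
        elif ch in MATRAS and out:
            out[-1] = out[-1][:-1] + MATRAS[ch]   # replace inherent vowel
        elif ch == VIRAMA and out:
            out[-1] = out[-1][:-1]                # strip inherent vowel
        else:
            out.append(ch)                        # pass through unknown chars
    return "".join(out)

print(to_cv("नमस्ते"))  # -> "namaste"
```

Once a script is rendered as CV sequences like this, the frozen backbone sees input with the same syllable shape it learned from isiZulu, which is why no retraining is needed.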
Cross-Family Next-Word Prediction (Zero-Shot)
Trained only on isiZulu, the model transfers strongly to its sister Nguni languages and to neighboring Bantu groups; the phylogenetic signal is preserved in the embedding space. The metric itself is simple, as sketched below.
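A minimal sketch of the measurement, assuming a hypothetical `model(ids)` call that returns next-token logits (the call signature is illustrative, not our API):

```python
import torch

def next_word_top1(model, ids: torch.Tensor) -> float:
    """Top-1 next-token accuracy over one tokenized sentence of shape (1, T)."""
    with torch.no_grad():
        logits = model(ids)              # (1, T, vocab): logits at each position
    preds = logits[:, :-1].argmax(-1)    # predicted token for positions 1..T-1
    return (preds == ids[:, 1:]).float().mean().item()
```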
Every Script, Every Language
From Arabic script to Hangul, the transliteration pipeline unlocks new writing systems without retraining and without new training data. "Before" is MASSIVE intent accuracy on the raw, unmapped script; "With Sozisi" is the same frozen-backbone setup after the script is routed through the pipeline.
| Language | Script | Before | With Sozisi | Improvement |
|---|---|---|---|---|
| Urdu | Arabic | 10.7% | 66.0% | +55.3pp |
| Mongolian | Cyrillic | 11.7% | 64.0% | +52.3pp |
| Korean | Hangul | 10.5% | 61.7% | +51.2pp |
| Amharic | Ethiopic | 10.1% | 60.9% | +50.8pp |
| Hindi | Devanagari | 11.1% | 60.3% | +49.2pp |
| Tamil | Tamil | 9.4% | 56.2% | +46.8pp |
Emergent Intelligence
Capabilities that emerged naturally from our structural approach, without being explicitly taught.
Thesis
Linguistic inductive bias substitutes for scale.
The dominant approach to AI assumes more is better: more parameters, more data, more compute. Our results suggest a different path. With the right structural priors, a 15M-parameter model captures language understanding that trillion-parameter systems still miss.
Sozisi is built on a phonotactic morphology-first tokenizer paired with architectural choices that exploit the compositional nature of language itself. Full methodology will appear in our forthcoming paper.
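For a flavor of what "phonotactic" means here, consider a toy consonant-vowel syllable splitter for an open-syllable Bantu language. This is not the Sozisi tokenizer (which stays unpublished until the paper); it only illustrates segmenting on syllable structure rather than on byte-pair frequency:

```python
import re

def cv_syllables(word: str) -> list[str]:
    # Greedy onset+nucleus split: any consonant run followed by one vowel,
    # plus a trailing consonant run if the word ends without a vowel.
    return re.findall(r"[^aeiou]*[aeiou]|[^aeiou]+$", word.lower())

print(cv_syllables("ngiyakuthanda"))  # ['ngi', 'ya', 'ku', 'tha', 'nda']
```

Segmenting this way keeps morphemes like the subject marker `ngi-` and the object marker `-ku-` intact, which is exactly the structural signal a frequency-based tokenizer tends to shred.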