Core Unit

Morpheme-Aware Tokenization

Inspectable tokens, not opaque subwords

Tokens carry linguistic meaning, so you can read them. This is the opposite of BPE (byte-pair encoding), whose tokens are statistical fragments with no human-auditable structure. Morpheme-aware tokenization makes downstream core units more sample-efficient and their outputs more explainable.
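To illustrate what "inspectable tokens" means, here is a minimal sketch. It assumes a tiny, hypothetical English morpheme inventory (`MORPHEMES`) and a simple greedy longest-match segmenter (`segment`). Neither is this core unit's actual vocabulary or algorithm, and production morphological segmentation is considerably more involved. The point is that each output token is a real morpheme tagged with its role, so a person can read the segmentation directly.

```python
# Hypothetical morpheme inventory for illustration only;
# this is NOT the core unit's real vocabulary.
MORPHEMES: dict[str, str] = {
    "un": "prefix",
    "re": "prefix",
    "read": "root",
    "believ": "root",
    "able": "suffix",
    "ly": "suffix",
}


def segment(word: str, inventory: dict[str, str]) -> list[tuple[str, str]]:
    """Split `word` into (morpheme, role) pairs by greedy longest match.

    An illustrative stand-in, not the product's segmentation method.
    Characters that match no known morpheme are emitted one at a time
    and tagged "unk".
    """
    max_len = max(len(m) for m in inventory)
    pieces: list[tuple[str, str]] = []
    i = 0
    while i < len(word):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(word) - i), 0, -1):
            candidate = word[i:i + length]
            if candidate in inventory:
                pieces.append((candidate, inventory[candidate]))
                i += length
                break
        else:
            # No morpheme matched at position i: fall back to one character.
            pieces.append((word[i], "unk"))
            i += 1
    return pieces


print(segment("unreadable", MORPHEMES))
# → [('un', 'prefix'), ('read', 'root'), ('able', 'suffix')]
```

A BPE tokenizer places its splits at frequency-driven boundaries that need not line up with morphemes. Greedy longest-match is used here only because it is short; it can mis-segment words when a longer but wrong match wins.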


Properties

What this core unit gives you

The guarantees this core unit provides when composed into a production system.

Human-readable tokens across all supported languages

Compact vocabulary: ~5K tokens cover 23 languages

Dramatically more sample-efficient than BPE

Enables downstream interpretability

Composes With

The other core units it pairs with

Core units are decoupled by design. These are the ones we've validated in production compositions, but you can also compose this core unit with your own stack.

Sozisi Manifold

Universal linguistic structure as a composable substrate

Edge-Class Backbone

Linear-time architecture for on-device inference

Used In

Where this core unit ships today

Use cases that include this core unit as part of the composition.

Governable Embeddings

Interpretable AI

Sovereign AI

Edge AI

Language Understanding

Compose Morpheme-Aware Tokenization into your stack.

Available via API, on-prem license, or as part of a composed packaged product. Talk to us about the right entry point.