
Glossary

Tokenization

Also known as: Tokens, BPE

Definition

In language AI, tokenization is the preprocessing step that breaks input text into tokens. Modern LLMs use subword schemes such as Byte-Pair Encoding (BPE) or SentencePiece, which handle out-of-vocabulary words gracefully by falling back to smaller subword units. In English, one token corresponds to roughly three to four characters; German and other morphologically rich or non-Latin languages often produce more tokens per word. Billing, context-window usage, and latency of LLM APIs are typically measured in tokens.
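To make the BPE idea concrete, here is a minimal sketch in plain Python (not the algorithm any particular LLM ships, and the function names are illustrative): start from individual characters, then repeatedly merge the most frequent adjacent pair into a new token.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Return the most common adjacent token pair."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def toy_bpe(text, num_merges):
    """Toy BPE: begin with characters, apply `num_merges` greedy merges."""
    tokens = list(text)
    for _ in range(num_merges):
        if len(tokens) < 2:
            break
        tokens = merge_pair(tokens, most_frequent_pair(tokens))
    return tokens

# One merge on "aaabdaaabac": the pair ("a", "a") is most frequent,
# so every "a a" becomes a single "aa" token.
print(toy_bpe("aaabdaaabac", 1))
```

Production tokenizers differ in important ways (they learn merges from a large corpus once, operate on bytes, and store a fixed merge table), but the greedy pair-merging loop above is the core mechanism, and it illustrates why token counts, not character counts, drive API billing and context-window limits.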

How Swiss Knowledge Hub uses this term

Swiss Knowledge Hub surfaces per-chat and per-workspace token usage in the admin UI so that cost transparency and quota management are possible — regardless of which LLM or BYOK configuration is in use.

Related terms

Sources

  1. Wikipedia: Byte pair encoding (https://en.wikipedia.org/wiki/Byte_pair_encoding)
  2. OpenAI: What are tokens and how to count them (https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them)

Last updated: April 22, 2026