What is TOON?
Token-Oriented Object Notation is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.

- Typically 30-60% fewer tokens on large uniform arrays vs formatted JSON
- Explicit lengths and fields enable validation
- Removes redundant punctuation (braces, brackets, most quotes)
- Like YAML, uses whitespace instead of braces
- Declare keys once, stream data as rows
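To illustrate how explicit lengths and fields enable validation, here is a minimal Python sketch of a reader for a single tabular block. It is not the official TOON library: the name `decode_table` and the header regular expression are hypothetical, quoting and nesting are ignored, and every cell is returned as a string.

```python
import re

# Assumed header shape for this sketch: key[count]{field1,field2,...}:
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")


def decode_table(text):
    """Parse one tabular block and check it against its declared header."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")
    match = HEADER.match(lines[0].strip())
    if match is None:
        raise ValueError("first line is not a tabular header")
    key = match.group(1)
    declared = int(match.group(2))
    fields = match.group(3).split(",")
    body = lines[1:]
    # The declared length lets us detect truncated or padded data.
    if len(body) != declared:
        raise ValueError(f"declared {declared} rows, found {len(body)}")
    rows = []
    for number, line in enumerate(body, start=1):
        cells = line.strip().split(",")
        # The declared field list lets us detect malformed rows.
        if len(cells) != len(fields):
            raise ValueError(f"row {number} has {len(cells)} cells, expected {len(fields)}")
        rows.append(dict(zip(fields, cells)))
    return key, rows
```

A row count or cell count that disagrees with the header raises `ValueError`, which is the kind of check plain JSON arrays do not give you for free.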
AI is becoming cheaper and more accessible, but larger context windows also invite larger data inputs. LLM tokens still cost money, and standard JSON is verbose and token-expensive.
JSON Example
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
TOON Equivalent
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
TOON conveys the same information with fewer tokens, making it ideal for LLM input where token costs can add up quickly with large datasets.
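The conversion from the JSON example to its TOON equivalent can be sketched in a few lines of Python. This is an illustrative encoder under stated assumptions, not the official implementation: `encode_table` is a hypothetical name, it handles only a non-empty uniform array of flat objects, and it rejects strings that would need quoting instead of implementing quoting rules.

```python
def _cell(value):
    """Render one primitive value; quoting is deliberately not implemented."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        if value == "" or any(ch in value for ch in ',:"\n'):
            raise ValueError(f"value {value!r} would need quoting")
        return value
    raise TypeError(f"non-primitive value: {value!r}")


def encode_table(key, rows):
    """Encode a uniform list of dicts as a header line plus one line per row."""
    if not rows:
        raise ValueError("this sketch only handles non-empty arrays")
    fields = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError("all rows must share the same fields in the same order")
    # Keys are declared once in the header, together with the row count.
    header = f"{key}[{len(rows)}]" + "{" + ",".join(fields) + "}:"
    lines = [header]
    for row in rows:
        lines.append("  " + ",".join(_cell(row[name]) for name in fields))
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(encode_table("users", users))
```

The field names appear once in the header rather than once per object, which is where most of the savings on uniform arrays come from.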
When to use TOON
- Uniform arrays of objects (same fields, primitive values)
- Large datasets with consistent structure
- Repeated, table-like structure
- LLM input where token costs matter
When TOON is not ideal
- Deeply nested or non-uniform structures
- Semi-uniform arrays (~40-60% tabular eligibility)
- Pure tabular data (CSV is more compact)
- API responses or storage (use JSON)
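One way to decide which side of these lists a dataset falls on is to estimate how much of an array could be written as rows. The Python sketch below uses an assumed definition, since the text above does not define "tabular eligibility": the share of items that are flat objects (primitive values only) with the most common set of keys. The function name `tabular_eligibility` is hypothetical.

```python
from collections import Counter

PRIMITIVES = (str, int, float, bool, type(None))


def _is_flat(item):
    """True for a dict whose values are all primitives (no lists or nested dicts)."""
    return isinstance(item, dict) and all(
        isinstance(value, PRIMITIVES) for value in item.values()
    )


def tabular_eligibility(items):
    """Fraction of items that are flat and share the most common key set."""
    flat = [item for item in items if _is_flat(item)]
    if not flat:
        return 0.0
    # Group flat items by their set of keys (assumes string keys).
    shapes = Counter(tuple(sorted(item)) for item in flat)
    _, most_common_count = shapes.most_common(1)[0]
    return most_common_count / len(items)
```

A result near 1.0 suggests a uniform array that suits TOON well, while a middling result corresponds to the semi-uniform case listed above.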
TOON achieves significant token savings while maintaining high retrieval accuracy in LLM applications. Benchmarks show:
Token Efficiency: 30-60% fewer tokens than JSON
Retrieval Accuracy: 73.9% vs 69.7% for JSON
Note: Token counts vary by tokenizer and model. Benchmarks use a GPT-style tokenizer (cl100k/o200k); actual savings will differ with other models.
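Because savings depend on the tokenizer, it is worth measuring with the one your model actually uses. The Python sketch below shows the shape of such a comparison with a pluggable counter. The default `rough_token_count` is a crude regex stand-in introduced here for illustration, not a real model tokenizer, so its numbers are not comparable to the benchmark figures above. Pass your own `count_tokens` callable for meaningful results.

```python
import re

JSON_TEXT = """{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}"""

TOON_TEXT = """users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user"""


def rough_token_count(text):
    """Crude stand-in: counts word runs and individual punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def token_savings(json_text, toon_text, count_tokens=rough_token_count):
    """Return the fraction of tokens saved by the TOON text relative to JSON."""
    json_tokens = count_tokens(json_text)
    toon_tokens = count_tokens(toon_text)
    if json_tokens == 0:
        raise ValueError("JSON text produced zero tokens")
    return 1 - toon_tokens / json_tokens


print(f"estimated savings: {token_savings(JSON_TEXT, TOON_TEXT):.0%}")
```

Swapping in a real tokenizer only requires a function that maps a string to an integer count.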
For the complete specification, examples, and more details, visit the official TOON format repository:
TOON Format on GitHub