What is TOON?

Token-Oriented Object Notation is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.

[Figure: TOON format overview showing token savings, workflow, and performance comparison]
  • 💸 Token-efficient: typically 30-60% fewer tokens than formatted JSON on large uniform arrays
  • 🤿 LLM-friendly guardrails: explicit lengths and field lists enable validation (illustrated below)
  • 🍱 Minimal syntax: removes redundant punctuation (braces, brackets, most quotes)
  • 📐 Indentation-based structure: like YAML, uses whitespace instead of braces
  • 🧺 Tabular arrays: declare keys once, stream data as rows
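
For instance, the explicit length and field list give both the model and a downstream parser something to check: if the declared count or fields don't match the rows that follow, the payload was truncated or malformed. A hypothetical inventory snippet (field names invented for illustration):

products[3]{sku,price,stock}:
  A100,9.99,12
  B200,4.50,0
  C300,19.95,7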

Why TOON?

AI is becoming cheaper and more accessible, but larger context windows also invite larger data inputs. LLM tokens still cost money, and standard JSON is verbose and token-expensive.

JSON Example

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

TOON Equivalent

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

TOON conveys the same information with fewer tokens, making it ideal for LLM input where token costs can add up quickly with large datasets.
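
To make the mechanics concrete, here is a minimal Python sketch of how a uniform array of flat objects can be rendered in TOON's tabular form. This is an illustration only, not the reference implementation (that lives in the repository linked below), and it assumes every row has the same keys and only primitive values that need no quoting or escaping:

def encode_tabular(key, rows):
    """Render a uniform list of flat dicts in TOON's tabular form.

    Sketch only: assumes identical keys per row and values that
    need no quoting or escaping.
    """
    fields = list(rows[0].keys())
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(encode_tabular("users", users))
# users[2]{id,name,role}:
#   1,Alice,admin
#   2,Bob,user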

Best For
  • Uniform arrays of objects (same fields, primitive values)
  • Large datasets with consistent structure
  • Repeated structures and tables
  • LLM input where token costs matter

Not Ideal For
  • Deeply nested or non-uniform structures
  • Semi-uniform arrays (only ~40–60% of elements are tabular-eligible)
  • Pure tabular data, where plain CSV is even more compact (see the comparison below)
  • API responses or storage (use JSON)
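
For example, the users table above needs only a plain header row in CSV, with no length marker or key braces:

TOON:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

CSV:

id,name,role
1,Alice,admin
2,Bob,user
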
Format Overview
TOON borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency.
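
Combining the two, a nested object can wrap a tabular array: indentation expresses the nesting, and rows carry the uniform data. (Illustrative snippet; the field names are invented, not taken from the spec.)

order:
  id: 7421
  customer: Alice
  items[2]{sku,qty}:
    A100,2
    B200,1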

Performance & Benchmarks

TOON achieves significant token savings while maintaining high retrieval accuracy in LLM applications. Benchmarks show:

  • Token efficiency: 30-60% fewer tokens than JSON
  • Retrieval accuracy: 73.9%, vs 69.7% for JSON

Note: Token counts vary by tokenizer and model. Benchmarks use a GPT-style tokenizer (cl100k/o200k); actual savings will differ with other models.
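
To check the savings on your own payloads, you can count tokens directly. The sketch below assumes OpenAI's tiktoken package and the o200k encoding mentioned above:

import json
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("o200k_base")  # GPT-style tokenizer

json_text = json.dumps(
    {"users": [
        {"id": 1, "name": "Alice", "role": "admin"},
        {"id": 2, "name": "Bob", "role": "user"},
    ]},
    indent=2,
)
toon_text = "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"

json_tokens = len(enc.encode(json_text))
toon_tokens = len(enc.encode(toon_text))
print(f"JSON: {json_tokens} tokens, TOON: {toon_tokens} tokens")
print(f"Savings: {1 - toon_tokens / json_tokens:.0%}")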

Learn More

For the complete specification, examples, and more details, visit the official TOON format repository:

TOON Format on GitHub →