// T2 · Structured Data & YAML

YAML is not a format.
It's a grammar.

You know the syntax. Now learn why it behaves the way it does — including the type-coercion quirks, multiline string variants, and production patterns that catch every intermediate developer eventually.

T2 Intermediate 4 sections + cheatsheet
TL;DR

YAML has three building blocks — scalars (single values), sequences (lists), and mappings (key-value pairs) — and supports two presentation styles (block and flow). The parser infers types from unquoted values, which means NO, yes, 2024-01-15, and 042 can all surprise you unless you understand the resolution rules.

The big gotchas: YAML 1.1 treats yes/no/on/off as booleans (the Norway Problem). Dates auto-parse. Tabs are illegal for indentation. Duplicate keys violate the spec, yet many parsers accept them and quietly keep one value. Multiline strings come in two forms: literal | (preserves newlines) and folded > (collapses newlines to spaces).

In production: Astro uses js-yaml (YAML 1.2) with optional Zod schema validation. Jekyll uses Ruby's Psych (YAML 1.1). Hugo defaults to TOML. When portability and explicitness matter more than terseness, reach for TOML.

Front matter isn't just Markdown decoration. It's the data layer that drives every modern static site generator, content pipeline, and AI workflow. Astro, Hugo, Jekyll, and Obsidian all parse it. GitHub Actions workflows, Docker Compose files, and Kubernetes manifests share its grammar. Understanding YAML at the parser level — not just the syntax level — means you stop guessing and start reasoning.

The edge cases in this guide aren't rare. They're the specific situations that show up when your content pipeline moves to production, when a country code becomes false, when a version number silently drops its leading zero. You've probably already hit at least one.

01 / How the parser sees your file

The Grammar of YAML

YAML is a serialization language with three fundamental node types and two presentation styles. Once you have this mental model, every quirky behavior has a logical explanation.

Three node types — that's the whole language

Scalar

A single value

A string, number, boolean, null, or date. Everything that isn't a container is a scalar.

Sequence

An ordered list

A list of nodes. Each item can itself be a scalar, sequence, or mapping.

Mapping

Key-value pairs

An unordered set of key-value pairs. Keys are almost always scalars. Values can be anything.

Block style vs flow style

Every YAML node can be written in one of two styles. Block style uses indentation and newlines — it's what you write in front matter. Flow style uses inline JSON-like syntax. Both are legal YAML; they represent the same data.

Block style — what you write
---
title: My Post
tags:
  - astro
  - yaml
author:
  name: Alice
  role: editor
---
Flow style — same data
---
title: My Post
tags: [astro, yaml]
author: {name: Alice, role: editor}
---


Mental model

YAML 1.2 is designed as a superset of JSON, so in practice any valid JSON document is also valid YAML. Flow style is essentially JSON without the mandatory quoting. This is why your front matter parser can usually handle both.
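Because flow style overlaps with JSON, one low-risk way to generate front matter from code is to write JSON between the delimiters. The sketch below assumes the consuming tool uses a YAML 1.2 parser that accepts a flow mapping as the whole document; `to_front_matter` is a hypothetical helper name, not part of any library.

```python
import json

def to_front_matter(data: dict) -> str:
    """Render a dict as front matter, using JSON as YAML flow style."""
    # ensure_ascii=False keeps non-ASCII text readable instead of \uXXXX escapes.
    body = json.dumps(data, ensure_ascii=False)
    return f"---\n{body}\n---\n"

post = {
    "title": "My Post",
    "tags": ["astro", "yaml"],
    "author": {"name": "Alice", "role": "editor"},
}
print(to_front_matter(post))
```

Every string comes out double-quoted, so none of the implicit typing rules get a chance to reinterpret a value.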

How the parser resolves types

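When the parser encounters an unquoted scalar, it matches it against a set of type patterns. A useful mental model is an ordered list in which the first matching pattern wins. The exact patterns depend on the parser's schema: timestamps, for example, come from YAML 1.1 and are not part of the YAML 1.2 core schema, although js-yaml's default schema still resolves them.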

implicit type resolution order
# The parser tries each of these in sequence:

null       ← matches: null, ~, or empty value
boolean    ← matches: true, false (YAML 1.2)
              also: yes, no, on, off (YAML 1.1 only)
timestamp  ← matches: 2024-01-15, 2024-01-15T10:00:00Z
integer    ← matches: 42, 0xFF, 0o77
float      ← matches: 3.14, .inf, -.inf, .nan
string     ← everything else falls through to string

This is exactly why quoting matters. A value that looks like a string to you might pattern-match a type higher in this list. The parser doesn't read your mind — it reads the value.
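To make the "first match wins" idea concrete, here is a minimal Python sketch of a resolver that follows the order listed above. The patterns are deliberately simplified and are an assumption for illustration; no real parser uses exactly these expressions, and `resolve_type` is a hypothetical name.

```python
import re

# Simplified patterns, tried in the order listed above.
CORE_PATTERNS = [
    ("null", re.compile(r"^(?:null|Null|NULL|~|)$")),
    ("bool", re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")),
    ("timestamp", re.compile(
        r"^\d{4}-\d{2}-\d{2}(?:[Tt ]\d{2}:\d{2}:\d{2}(?:Z|[-+]\d{2}:\d{2})?)?$")),
    ("int", re.compile(r"^(?:[-+]?\d+|0x[0-9a-fA-F]+|0o[0-7]+)$")),
    ("float", re.compile(
        r"^(?:[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"
        r"|[-+]?\.(?:inf|Inf|INF)|\.(?:nan|NaN|NAN))$")),
]

# Extra boolean spellings that YAML 1.1 parsers commonly accept.
YAML11_BOOL = re.compile(r"^(?:yes|Yes|YES|no|No|NO|on|On|ON|off|Off|OFF)$")

def resolve_type(text: str, yaml11: bool = False) -> str:
    """Return the name of the first type whose pattern matches text."""
    if yaml11 and YAML11_BOOL.match(text):
        return "bool"
    for name, pattern in CORE_PATTERNS:
        if pattern.match(text):
            return name
    return "str"  # nothing matched: the value stays a string

for sample in ["NO", "yes", "2024-01-15", "1.0", "1.0.0", "0xFF", ""]:
    print(repr(sample), resolve_type(sample), resolve_type(sample, yaml11=True))
```

Run it on the values you are unsure about: anything that does not come back as `str` is a value worth quoting.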

Watch out — how front matter is extracted

A front matter parser first strips the --- delimiters, then hands the content to a YAML parser. The YAML parser has no knowledge of Markdown. It sees a standalone document. This means the indentation-sensitive rules, type resolution, and all YAML restrictions apply in full — including the tab-indent prohibition.
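The extraction step is simple enough to sketch in a few lines of Python. This is an assumed, minimal version of what such a tool does, not the code of any particular library; `split_front_matter` is a hypothetical name, and real tools handle more cases (TOML delimiters, byte-order marks, and so on).

```python
def split_front_matter(text: str):
    """Split a Markdown file into (yaml_text, body).

    Returns (None, text) when the file has no front matter block.
    """
    lines = text.splitlines(keepends=True)
    # The opening delimiter must be the very first line of the file.
    if not lines or lines[0].rstrip() != "---":
        return None, text
    for index in range(1, len(lines)):
        if lines[index].rstrip() == "---":
            yaml_text = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return yaml_text, body
    # No closing delimiter: treat the whole file as body.
    return None, text

doc = "---\ntitle: My Post\ntags: [astro, yaml]\n---\n# Heading\n"
yaml_text, body = split_front_matter(doc)
print(repr(yaml_text))
print(repr(body))
```

Note that `yaml_text` carries no trace of the Markdown around it, which is why the YAML parser applies its rules without exception.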

02 / The cases everyone gets wrong

Data Types, in depth

The basic types — string, number, boolean, null — all behave intuitively most of the time. Then they don't. Here's exactly where the surprises live, and why they exist.

Strings: three quoting modes

Unquoted (risky)

Safe for simple prose. Dangerous when the value could match a type higher in the resolution list, or contains special characters (:, #, [, {).

title: Hello World   ← fine
country: NO          ← false in YAML 1.1!
version: 1.0         ← float, not string
Single-quoted (safe)

Forces string type. No escape sequences. To include a literal single quote, double it: ''. Use when you don't need escape sequences and want guaranteed string parsing.

country: 'NO'        ← string "NO"
note: 'It''s fine'   ← "It's fine"
path: 'C:\Users\me'  ← backslash literal
Double-quoted (expressive)

Forces string type and supports escape sequences: \n, \t, \\, \", \uXXXX for Unicode. The most powerful option.

desc: "Line 1\nLine 2"   ← real newline
icon: "\u2705"           ← ✅ emoji
msg: "She said \"hi\""   ← escaped quote
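If you generate front matter from code, the quoting decision can be automated. The sketch below is one conservative way to do it in Python: leave a value bare only when it is clearly safe, otherwise single-quote it and double any embedded quote. The character set and the "ambiguous" pattern are assumptions chosen for illustration, and `emit_scalar` is a hypothetical helper; it is meant for single-line values only.

```python
import re

# Characters that carry YAML syntax; a deliberately broad, assumed list.
SPECIAL = set(":#[]{}|>!&*,'\"\\")

# Spellings that some parser may resolve to a non-string type.
AMBIGUOUS = re.compile(
    r"^(?:|~|null|true|false|yes|no|on|off|y|n|[-+]?[\d.]+|\d{4}-\d{2}-\d{2}.*)$",
    re.IGNORECASE,
)

def emit_scalar(value: str) -> str:
    """Return value bare when clearly safe, otherwise single-quoted."""
    risky = (
        any(ch in SPECIAL for ch in value)
        or AMBIGUOUS.match(value) is not None
        or value != value.strip()  # leading/trailing spaces would be lost
    )
    if risky:
        # Inside single quotes the only escape is a doubled quote.
        return "'" + value.replace("'", "''") + "'"
    return value

for sample in ["Hello World", "NO", "1.0", "It's fine", "C:\\Users\\me"]:
    print(f"value: {emit_scalar(sample)}")
```

Over-quoting costs nothing at parse time, so erring on the side of quotes is the cheaper mistake.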

The Norway Problem

This is one of the most notorious YAML gotchas. In YAML 1.1, the following values all parse as booleans: yes, no, true, false, on, off, y, n, and their capitalised variants (some 1.1 parsers, PyYAML among them, skip the single-letter y and n). The ISO 3166-1 country code for Norway is NO. A config file with country: NO silently becomes country: false.

the norway problem
# YAML 1.1: used by Ruby's Psych (Jekyll) and PyYAML
country: NO          ← parses as false  (!!)
enabled: yes         ← parses as true
mode: on             ← parses as true
debug: off           ← parses as false

# YAML 1.2 — used by js-yaml (Astro), Go's yaml.v3
country: NO          ← parses as string "NO"  ✓
enabled: yes         ← parses as string "yes"  (not boolean!)
mode: true           ← parses as true  ✓

# Safe: always quote if the value could be ambiguous
country: 'NO'        ← string in any YAML version
Watch out — know your parser's YAML version

Jekyll uses Ruby's Psych, which implements YAML 1.1. Astro uses js-yaml (v4+), which implements YAML 1.2. The same front matter file can produce different values depending on which tool reads it. When content travels between tools — export from Jekyll, import to Astro — this bites.
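Before moving content between tools, it helps to find every value whose meaning depends on the YAML version. The following Python sketch scans front matter line by line for unquoted values that a YAML 1.1 parser may read as booleans but a 1.2 parser reads as strings. It is a rough, assumed heuristic (simple `key: value` lines only, case-insensitive matching that over-reports slightly), and `find_version_sensitive_booleans` is a hypothetical name.

```python
import re

# Spellings that differ between versions; true/false are excluded
# because both versions agree on them.
VERSION_SENSITIVE = {"yes", "no", "on", "off", "y", "n"}

KEY_VALUE = re.compile(r"^\s*(?:-\s+)?([\w-]+):\s+(\S.*?)\s*$")

def find_version_sensitive_booleans(front_matter: str):
    """Return (line_number, key, value) for each risky unquoted value."""
    hits = []
    for number, line in enumerate(front_matter.splitlines(), start=1):
        match = KEY_VALUE.match(line)
        if match is None:
            continue
        key, value = match.groups()
        value = value.split(" #", 1)[0].strip()  # drop a trailing comment
        if value.lower() in VERSION_SENSITIVE:
            hits.append((number, key, value))
    return hits

sample = "country: NO\nenabled: yes\ntitle: Hello\ncode: 'NO'\nmode: true\n"
for number, key, value in find_version_sensitive_booleans(sample):
    print(f"line {number}: {key}: {value} should be quoted")
```

Quoted values such as `'NO'` pass untouched, because the quotes are part of the matched text and never equal a bare boolean spelling.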

Multiline strings

When a description or body text needs line breaks, YAML offers two distinct block scalar styles with different semantics. Choosing the wrong one is a common source of subtle rendering bugs.

Literal block — |
description: |
  This is line one.
  This is line two.
  This is line three.

# Result (newlines preserved):
# "This is line one.\n
#  This is line two.\n
#  This is line three.\n"
Folded block — >
description: >
  This is line one.
  This is line two.
  This is line three.

# Result (newlines → spaces):
# "This is line one. 
#  This is line two. 
#  This is line three.\n"
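The difference between the two styles fits in a few lines of Python. This is a simplified model, assuming the indentation has already been stripped and ignoring YAML's special handling of more-indented lines; the function names are hypothetical.

```python
def literal_block(lines):
    """Model of '|': every line break is kept as written."""
    return "\n".join(lines) + "\n"

def folded_block(lines):
    """Model of '>': single line breaks become spaces.

    An empty line ends a paragraph and survives as a newline.
    """
    paragraphs, current = [], []
    for line in lines:
        if line == "":
            paragraphs.append(" ".join(current))
            current = []
        else:
            current.append(line)
    paragraphs.append(" ".join(current))
    return "\n".join(paragraphs) + "\n"

lines = ["This is line one.", "This is line two.", "This is line three."]
print(repr(literal_block(lines)))
print(repr(folded_block(lines)))
```

The blank-line rule is the part people forget: in a folded block, an empty line is how you get a real newline into the result.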

Both styles also support chomping indicators — characters that control the trailing newline behaviour:

chomping: clip (default), strip, keep
a: |    ← clip (default): single trailing newline
  text

b: |-   ← strip: no trailing newline
  text

c: |+   ← keep: all trailing newlines preserved
  text
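Chomping only ever touches the end of the string, which a short Python sketch makes plain. It assumes `raw` is the block's content exactly as written, including any trailing blank lines; `apply_chomping` is a hypothetical name.

```python
def apply_chomping(raw: str, indicator: str = "") -> str:
    """Apply a chomping indicator to the raw content of a block scalar."""
    body = raw.rstrip("\n")
    if indicator == "-":   # strip: no trailing newline at all
        return body
    if indicator == "+":   # keep: trailing newlines exactly as written
        return raw
    if indicator == "":    # clip (default): exactly one trailing newline
        return body + "\n" if body else ""
    raise ValueError(f"unknown chomping indicator: {indicator!r}")

raw = "text\n\n\n"  # the word "text" followed by two blank lines
print(repr(apply_chomping(raw)))
print(repr(apply_chomping(raw, "-")))
print(repr(apply_chomping(raw, "+")))
```

Strip is usually what you want for values that end up inside HTML attributes or get compared against other strings.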

Numbers, dates, and the other surprises

numbers and dates
# Integers
count:   42         ← integer
hex:     0xFF       ← 255
octal:   0o77       ← 63 (YAML 1.2 syntax)
old_oct: 077        ← 63 in YAML 1.1, integer 77 in 1.2!

# Floats
ratio:   3.14
big:     .inf       ← positive infinity
small:   -.inf      ← negative infinity
undef:   .nan       ← Not a Number

# Dates — YAML parses these automatically
published: 2024-01-15          ← Date object, not a string!
updated:   2024-01-15T10:30:00Z ← full ISO 8601 timestamp
safe_date: "2024-01-15"        ← force string with quotes
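The integer rules above can be modelled directly with Python's `int()`, which makes the octal change easy to see. This sketch assumes the text has already been recognised as an integer; `parse_yaml_int` is a hypothetical helper, not a function from any YAML library.

```python
def parse_yaml_int(text: str, yaml11: bool = False) -> int:
    """Convert integer-looking text under YAML 1.1 or 1.2 rules."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.startswith("0o"):
        if yaml11:
            # The 0o prefix is YAML 1.2 syntax; 1.1 does not treat it as a number.
            raise ValueError("0o prefix is not an integer in YAML 1.1")
        return int(text, 8)
    is_old_octal = len(text) > 1 and text[0] == "0" and set(text) <= set("01234567")
    if yaml11 and is_old_octal:
        return int(text, 8)   # YAML 1.1: a leading zero means octal
    return int(text, 10)      # YAML 1.2: a leading zero is just a digit

print(parse_yaml_int("077"))
print(parse_yaml_int("077", yaml11=True))
print(parse_yaml_int("0xFF"))
print(parse_yaml_int("0o77"))
```

The same three characters produce two different numbers depending on the flag, which is exactly what happens when a file moves between parsers.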
Pro tip — dates in Astro content collections

Astro's content collections accept pubDate as a z.date() in Zod schema, which means the automatic date parsing actually helps you. But if you're reading the front matter raw and expecting a string, you'll get a Date object instead. Define your Zod schema explicitly — then both the type and the behaviour are under your control.

03 / Structures inside structures

Nesting & Composition

Real-world front matter is rarely flat. Once you need to model relationships — an author with a name and a role, a list of links each with a URL and label — you need to understand how YAML's nesting rules actually work.

Indentation is semantics

Like Python, YAML uses indentation to express structure, and the rules are strictly defined: each nesting level must be indented further than its parent, and sibling entries must line up at the same column. Two spaces is the convention; four works too. The spec does not require the same step at every level, but pick one and keep it: misaligned siblings make the parser reject the file or, worse, attach a key to the wrong parent.

nested mappings and sequences
---
title: Deep Dive

# Nested mapping: author is an object
author:
  name: Alice Chen
  role: senior editor
  social:
    github: alicechen
    twitter: alicec

# Sequence of scalars: simple list
tags:
  - yaml
  - front-matter
  - intermediate

# Sequence of mappings: list of objects
links:
  - title: Documentation
    url: https://yaml.org
  - title: YAML Spec
    url: https://yaml.org/spec/1.2
---
Critical — tabs are illegal

YAML explicitly prohibits tab characters for indentation. The spec says so unambiguously, and every major parser enforces it. Editors that auto-convert tabs to spaces mask this. Editors that don't will leave you with parse errors that are hard to diagnose, because a tab looks just like spaces on screen. If your YAML parser errors with "mapping values are not allowed here" or similar, the first thing to check is tab characters.
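Since tabs are invisible in most editors, a mechanical check is the quickest way to rule them out. Here is a minimal Python sketch that reports every line whose leading whitespace contains a tab; `tab_indented_lines` is a hypothetical name.

```python
def tab_indented_lines(text: str):
    """Return the 1-based numbers of lines indented with a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Everything before the first non-whitespace character is the indent.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad

sample = "author:\n\tname: Alice\n  role: editor\n"
print(tab_indented_lines(sample))
```

Tabs inside a value are left alone on purpose: only indentation is restricted.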

Anchors, aliases, and the merge key

When the same data appears in multiple places, YAML gives you a way to define it once and reference it elsewhere. This is less common in front matter than in full YAML documents, but you'll see it in GitHub Actions workflows, Docker Compose files, and complex Hugo configurations.

anchors and aliases
# & defines an anchor — names this node for reuse
defaults: &post_defaults
  layout: post
  draft: false
  author: Alice

# * is an alias — inserts the anchored value here
published_post:
  <<: *post_defaults  ← merge key: inherits all defaults
  title: Overrides title, keeps everything else

# Result: published_post has layout, draft, author, AND title

# Without merge key — direct scalar alias
name: &the_name Alice
display_name: *the_name   ← also "Alice"
Where this matters

Docker Compose uses anchors and the merge key (<<) to share service configuration, and CI configuration uses anchors to share steps between jobs (GitLab CI documents the pattern; GitHub Actions rejected anchors for years, so check what your runner supports). If you're only writing Markdown front matter, you'll rarely need this. But when you move into CI/CD or Docker, understanding anchors means you can read and modify these files without guessing.
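If the merge key feels opaque, it maps closely onto merging dictionaries. The Python sketch below models the result a parser produces, under the merge key's usual semantics: keys written directly in the mapping win over merged ones, and when several mappings are merged, earlier ones win over later ones. `apply_merge` is a hypothetical helper for illustration.

```python
def apply_merge(local: dict, *merged: dict) -> dict:
    """Model of '<<': combine merged mappings, then apply local keys on top."""
    result = {}
    # Apply in reverse so that earlier merged mappings take precedence.
    for mapping in reversed(merged):
        result.update(mapping)
    result.update(local)  # keys written in the mapping itself always win
    return result

post_defaults = {"layout": "post", "draft": False, "author": "Alice"}

published_post = apply_merge(
    {"title": "Overrides title, keeps everything else"},
    post_defaults,
)
draft_post = apply_merge({"draft": True}, post_defaults)

print(published_post)
print(draft_post)
```

The anchored mapping itself is never modified: `post_defaults` still has `draft` set to false after both merges.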

Common nesting gotcha: duplicate keys

The YAML specification requires the keys of a mapping to be unique, so a duplicate key is technically an error. In practice parsers disagree: some raise an error, others silently keep the last value. Linters will flag it; parsers may not. Never rely on key ordering or overriding behaviour.

duplicate keys — undefined behaviour
---
title: First title
draft: true
title: Second title   ← which one wins?
---

# js-yaml (Astro): throws "duplicated mapping key" by default; last wins only with the json: true option
# Ruby Psych (Jekyll): "Second title" (last wins, silently)
# StrictYAML: throws an error (correct behaviour)
# yamllint: reports an error ✓
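When your parser stays quiet about duplicates, a pre-build check can speak up instead. This Python sketch looks only at unindented `key:` lines, so it catches duplicates at the top level of a front matter block and nothing deeper; it is an assumed heuristic, not a replacement for yamllint, and `duplicate_top_level_keys` is a hypothetical name.

```python
import re
from collections import defaultdict

# A key that starts in column 0, followed by a colon and a space or end of line.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*)\s*:(?:\s|$)")

def duplicate_top_level_keys(front_matter: str) -> dict:
    """Map each repeated top-level key to the line numbers where it appears."""
    seen = defaultdict(list)
    for number, line in enumerate(front_matter.splitlines(), start=1):
        match = TOP_LEVEL_KEY.match(line)
        if match:
            seen[match.group(1)].append(number)
    return {key: numbers for key, numbers in seen.items() if len(numbers) > 1}

sample = "title: First title\ndraft: true\ntitle: Second title\n"
print(duplicate_top_level_keys(sample))
```

Indented lines never match, so a nested `title:` under `links:` does not count against the top-level one.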
04 / Real parsers, real decisions

Front Matter in the Wild

The same front matter block can produce different results depending on the parser your tool uses. Knowing which parser each major tool uses — and what choices it makes — prevents a class of production bugs.

Tool | Parser | YAML version | Key behaviour
Astro | js-yaml v4 | 1.2 | Only true/false are booleans. Optional Zod schema coercion on top.
Jekyll | Ruby Psych | 1.1 | Yes/no/on/off are booleans. Dates auto-parse. Country code gotcha is live here.
Hugo | Go yaml.v3 | 1.2 | YAML supported but TOML (+++) is the preferred default in Hugo projects.
Eleventy | js-yaml | 1.2 | Same behaviour as Astro's YAML parsing layer.
GitHub Actions | Go yaml.v3 | 1.2 | Full YAML document (no front matter delimiters). Same grammar, different context.

Astro: YAML + Zod = type safety

Astro's content collections let you define a Zod schema that validates and coerces your front matter. This is the production-grade approach — instead of trusting implicit YAML type resolution, you declare exactly what types you expect:

src/content/config.ts
// Define the schema for your blog collection
import { defineCollection, z } from 'astro:content';

const blog = defineCollection({
  schema: z.object({
    title:     z.string(),
    pubDate:   z.date(),           // expects a Date (unquoted YAML dates already are); use z.coerce.date() to accept strings
    draft:     z.boolean().default(false),
    tags:      z.array(z.string()).optional(),
    author:    z.object({
      name:    z.string(),
      role:    z.string().optional()
    }).optional(),
  })
});

export const collections = { blog };
Why this matters at T2

With a Zod schema, Astro validates your front matter at build time and gives you TypeScript types in your page components. Instead of frontmatter.pubDate being string | Date | null | undefined, it's typed as Date. You shift from runtime guessing to compile-time certainty.

YAML vs TOML vs JSON — choosing deliberately

Format | Delimiter | Type system | Best for
YAML | --- | Implicit (inferred) | Human-authored content. Expressive, but edge cases require care.
TOML | +++ | Explicit (declared) | Config files where correctness matters more than terseness. No implicit typing.
JSON | {...} | Explicit (declared) | Machine-generated content. Strict spec, no comments, no ambiguity.
YAML front matter
---
title: My Post
date: 2024-01-15
draft: false
tags:
  - yaml
  - tutorial
---
TOML front matter (Hugo)
+++
title = "My Post"
date = 2024-01-15T00:00:00Z
draft = false
tags = ["yaml", "tutorial"]
+++

TOML's key difference: all strings must be quoted, all booleans are exactly true/false, and dates have a prescribed format. There is no implicit typing — what you write is unambiguously what you get. The tradeoff is verbosity. YAML wins on brevity; TOML wins on predictability.

05 / Take this with you

Reference Cheatsheet

A condensed reference of everything in this guide — data types, multiline string variants, gotchas, and tool behaviour at a glance.

Scalar types, block style
title: Hello World     ← unquoted string: works until it doesn't
flag: 'NO'             ← single-quoted string: safe, no escapes
msg: "Line\nBreak"     ← double-quoted: escape sequences active
count: 42              ← integer
ratio: 3.14            ← float
active: true           ← boolean: only true/false in YAML 1.2
empty: ~               ← null: also null, or nothing after the colon
date: 2024-01-15       ← Date object: quote to force a string

Multiline string variants
key: |                 ← literal block: newlines preserved, trailing newline added
key: |-                ← literal, strip: newlines preserved, no trailing newline
key: |+                ← literal, keep: newlines preserved, all trailing newlines kept
key: >                 ← folded block: newlines become spaces
key: >-                ← folded, strip: spaces only, no trailing newline

Anchors and aliases
key: &anchor value     ← define an anchor named "anchor"
other: *anchor         ← alias: insert the anchored value here
<<: *anchor            ← merge key: merge the anchored mapping's keys

Number edge cases
n: 0xFF                ← hex integer → 255
n: 0o77                ← octal (YAML 1.2) → 63
n: .inf                ← positive infinity
n: .nan                ← Not a Number
v: '1.0'               ← force version string, not a float

The gotcha list

The Norway Problem

In YAML 1.1 parsers (Ruby's Psych, PyYAML), yes, no, on, off and their case variants are booleans; the spec also lists y and n, though not every parser honours them. Country code NO becomes false. Fix: quote any value that could be misread.

Tabs in indentation

YAML explicitly forbids tabs for indentation. Only spaces are legal. This is enforced by all conforming parsers. Configure your editor to use spaces in YAML files.

Auto-parsed dates

An unquoted 2024-01-15 is a Date object in most parsers, not a string. If you're reading the value and expecting a string, you'll get a Date. Fix: quote it, or use a Zod schema to control coercion.

Duplicate keys

The spec requires unique keys, but parsers disagree: some silently keep the last value, others throw. Linters (yamllint) will catch this. Don't rely on key ordering or shadowing behaviour.

Octal number syntax changed between versions

YAML 1.1: 077 is octal 63. YAML 1.2: 077 is the integer 77. Use 0o77 when you mean octal under a YAML 1.2 parser (a YAML 1.1 parser does not recognise the 0o prefix), and quote the value when it is really an identifier, such as a zero-padded code.

Special characters trigger parsing

The characters :, #, [, ], {, }, |, >, !, &, * have special meaning in YAML. If any appear in your value at the start or after a space, quote the whole value.

Validation tools

yamllint: CLI linter, catches tabs, duplicate keys, trailing spaces, and more.
yaml.online-parser.appspot.com: paste YAML, see the parsed structure instantly.
Astro's built-in type checking: run astro check to validate front matter against your Zod schemas.
