Header menu logo XParsec

A Complete JSON Parser

One of the best ways to see the power of a parser combinator library is to build a parser for a well-known format. This guide will walk you through the creation of a complete, robust JSON parser using XParsec.

Our goal is to parse a JSON string into a structured F# object model. This parser was written following the official JSON specification. The inclusion of the "railroad diagrams" from the spec shows how XParsec's approach allows you to write a parser that is a near-direct transliteration of a formal grammar.

1. Defining the F# Model (AST)

First, we need to define the F# types that will represent a parsed JSON document. This is our Abstract Syntax Tree (AST). We'll use a recursive discriminated union for JsonValue and ImmutableArray for collections to ensure our model is efficient and immutable.

Note: JSON types are mutually recursive so we used type ... and ... to specify the type definitions.

open System.Collections.Immutable

[<RequireQualifiedAccess>]
type JsonValue =
    | String of string
    | Number of float
    | Object of JsonObject
    | Array of JsonArray
    | True
    | False
    | Null
and JsonMember = { Name: string; Value: JsonValue }
and JsonObject = ImmutableArray<JsonMember>
and JsonArray = ImmutableArray<JsonValue>

2. Structuring the Parsers: The static let Pattern

We want our parser to be able to parse any kind of character input, not just strings. While XParsec parsers are functions and can be generic, defining them as top-level or module-level let-bound values runs into F#'s "value restriction," which prevents simple values from being automatically generalized.

A clean solution is to define our parsers as static let members inside a generic type. This allows the parsers to be implicitly parameterized by the type's generics, enabling both reusability and mutual recursion without issue. At the point of use, a concrete version of the JsonParsers type will be created based on the input Reader.

For a deeper dive, see Microsoft's documentation on Value Restriction and Finer Points of F# Value Restriction.

Caution:

It's tempting to define a parser like let pThing reader = parser { ... } reader. But this is a performance trap. By giving pThing an explicit reader parameter it causes the parser { } builder to be re-evaluated each time the parser is run.

open XParsec
open XParsec.CharParsers

// Generic parameters must match the contrains of the XParsec.Reader type
type JsonParsers<'Input, 'InputSlice
    when 'Input :> IReadable<char, 'InputSlice> and 'InputSlice :> IReadable<char, 'InputSlice>> =

    // Parsers will be defined here as static let bindings...
    
    // We can expose the final parser as a static member
    static member Parser = pJson

This structure forms the backbone of our JSON parser. All subsequent parsers will be defined inside this JsonParsers type.

3. Parsing Primitives

Whitespace

Whitespace can be inserted between any pair of tokens.

JSON Whitespace

The diagram shows that whitespace is zero or more space, tab, or newline characters. The skipMany combinator is a perfect fit.

static let pWhitespace = skipMany (anyOf [ ' '; '\t'; '\n'; '\r' ])

String

A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string.

JSON String

A string is enclosed in quotes, containing a sequence of characters. These characters can be normal (pOtherChar) or escaped (pEscape).

static let pString =
    parser {
        let! _ = pitem '"'
        let! chars = manyChars (choiceL [ pEscape; pOtherChar ] "")
        let! _ = pitem '"'
        return chars
    }

// Any codepoint except " or \ or control characters
static let pOtherChar: Parser<_, _, _, _, _> =
    satisfyL
        (function
        | '"'
        | '\\' -> false
        | c -> not (Char.IsControl c))
        "Other Char"

Escape sequences are all preceded by \. Where there are many posibilities with the same prefix it can be more efficient to use XParsec in an imperative mode, using the PeekN and SkipN methods on Reader. This makes the parser more efficient in several ways:

Avoiding creating unnecessary error messages is often one of the best ways to speed up your parser. That is one reason why the parseL parser variants exist like satisfyL or choiceL, it allows XParsec to skip generating a dynamic error messages and use the provided string.

// Parse a hex digit to an integer (case insensitive)
static let pHexDigit =
    satisfyL Char.IsAsciiHexDigit ("Hex digit")
    |>> function
        | c when c >= '0' && c <= '9' -> int c - int '0'
        | c when c >= 'a' && c <= 'f' -> int c - int 'a' + 10
        | c when c >= 'A' && c <= 'F' -> int c - int 'A' + 10
        | _ -> failwith "Invalid hex digit" // Unreachable

// A unicode escape sequence is \u followed by 4 hex digits
static let pUnicodeEscape: Parser<_, _, _, _, _> =
    parser {
        let! _ = pstring "\\u"
        let! hex0 = pHexDigit
        let! hex1 = pHexDigit
        let! hex2 = pHexDigit
        let! hex3 = pHexDigit
        let hexValue = (hex0 <<< 12) + (hex1 <<< 8) + (hex2 <<< 4) + hex3
        return Convert.ToChar(hexValue)
    }

// An escape sequence is \ followed by one of [ '"'; '\\'; '/'; 'b'; 'f'; 'n'; 'r'; 't'; 'u' ]
static let pEscape =
    fun (reader: Reader<_, _, _, _>) ->
        let span = reader.PeekN 6

        if span.Length >= 2 && span[0] = '\\' then
            match span[1] with
            | '"'
            | '\\'
            | '/' as c ->
                reader.SkipN 2
                preturn c reader
            | 'b' ->
                reader.SkipN 2
                preturn '\b' reader
            | 'f' ->
                reader.SkipN 2
                preturn '\f' reader
            | 'n' ->
                reader.SkipN 2
                preturn '\n' reader
            | 'r' ->
                reader.SkipN 2
                preturn '\r' reader
            | 't' ->
                reader.SkipN 2
                preturn '\t' reader
            | 'u' ->
                if span.Length < 6 then
                    fail (Message "Unicode escape sequence must be followed by 4 hex digits") reader
                else
                    pUnicodeEscape reader
            | c -> fail (Unexpected c) reader
        else
            fail (Message "Escape char") reader

Number

A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used.

JSON Number

The number grammar can be broken down into an optional sign, an integer part, an optional fractional part, and an optional exponent part. We build parsers for each piece and combine them.

static let pDigit = satisfyL (fun c -> c >= '0' && c <= '9') "Char in range '0' - '9'"
static let pOneNine = satisfyL (fun c -> c >= '1' && c <= '9') "Char in range '1' - '9'"

// A fraction is a dot followed by one or more digits
static let pFraction =
    parser {
        let! dot = pitem '.'
        let! digits = many1Chars pDigit
        return digits
    }

// An exponent is 'e' or 'E', an optional sign, and one or more digits.
static let pExponent =
    parser {
        let! e = anyOf [ 'e'; 'E' ]
        let! sign = opt (anyOf [ '+'; '-' ])
        let! digits = many1Chars pDigit
        return struct (sign, digits)
    }

// The complete number parser.
static let pNumber =
    parser {
        // 1. Parse optional sign.
        let! sign = opt (pitem '-')

        // 2. Parse integer part (either "0" or "1-9" followed by digits).
        // Leading zeroes are not allowed.
        let! int = (pitem '0' >>% "0") <|> (many1Chars2 pOneNine pDigit)

        // 3. Parse optional fractional and exponent parts.
        let! fraction = opt pFraction
        let! exponent = opt pExponent

        // 4. Build the final number string and parse it to a float.
        let! number =
            let sb = StringBuilder(16)

            match sign with
            | ValueSome c -> sb.Append(c) |> ignore
            | _ -> ()

            sb.Append(int) |> ignore

            match fraction with
            | ValueSome f -> sb.Append('.').Append(f) |> ignore
            | _ -> ()

            match exponent with
            | ValueSome(sign, digits) ->
                sb.Append('e') |> ignore

                match sign with
                | ValueSome sign -> sb.Append(sign) |> ignore
                | _ -> ()

                sb.Append(digits) |> ignore
            | _ -> ()

            let number = sb.ToString()

            match Double.TryParse(number) with
            | true, result -> preturn result
            | _ -> failwithf "Failed to parse number: %s" number

        return JsonValue.Number number
    }

4. Recursive Parsers: Values, Objects and Arrays

Now we get to the core of the JSON structure: values, objects, and arrays. These parsers are mutually recursive. We define them using F#'s let rec ... and ... syntax.

Value

A value can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested.

JSON Value

This is the central dispatcher. It looks at the next character to decide which specific value parser to run. This is a common and efficient technique called "predictive parsing."

static let pTrue = pstring "true" >>% JsonValue.True
static let pFalse = pstring "false" >>% JsonValue.False
static let pNull = pstring "null" >>% JsonValue.Null
static let pStringValue = pString |>> JsonValue.String

static let rec pValue: Parser<JsonValue, char, _, _, _> =
    fun cursor ->
        match cursor.Peek() with
        | ValueSome '{' -> pObject cursor
        | ValueSome '[' -> pArray cursor
        | ValueSome '"' -> pStringValue cursor
        | ValueSome 't' -> pTrue cursor
        | ValueSome 'f' -> pFalse cursor
        | ValueSome 'n' -> pNull cursor
        | _ -> pNumber cursor

// A helper to parse any value, surrounded by optional whitespace
and pElement =
    parser {
        let! _ = pWhitespace
        let! value = pValue
        let! _ = pWhitespace
        return value
    }

Object

An object is an unordered set of name/value pairs. An object begins with { left brace and ends with } right brace. Each name is followed by : colon and the name/value pairs are separated by , comma.

JSON Object

// A key-value pair inside an object
and pMember =
    parser {
        let! _ = pWhitespace
        let! name = pString
        let! _ = pWhitespace
        let! _ = pitem ':'
        let! value = pElement
        return { Name = name; Value = value }
    }

// A full JSON object: '{' members '}'
and pObject =
    parser {
        let! _ = pitem '{'
        let! _ = pWhitespace
        // sepBy will not parse a trailing comma, consistent with the JSON spec
        let! members, _ = sepBy pMember (pitem ',')
        let! _ = pitem '}'
        return JsonValue.Object members
    }

Array

An array is an ordered collection of values. An array begins with [ left bracket and ends with ] right bracket. Values are separated by , comma.

JSON Array

// A full JSON array: '[' elements ']'
and pArray = parser {
    do! pchar '['
    let! values = sepBy pElement (pchar ',')
    do! pchar ']'
    return JsonValue.Array (values |> fst)
}

5. The Final Parser

A valid JSON document consists of a single element (a value, object or array) followed by the end of the input.

// inside JsonParsers type...
static let pJson: Parser<JsonValue, char, unit, 'Input, 'InputSlice> = 
    pElement .>> eof

// We expose it as a public member
static member Parser = pJson

With pJson defined, our JsonParsers.Parser static member is now complete and ready to use. This example shows how to build a complex, efficient, and readable parser by breaking the problem down into small, manageable pieces that mirror the official specification.

namespace System
namespace System.Collections
namespace System.Collections.Immutable
Multiple items
type RequireQualifiedAccessAttribute = inherit Attribute new: unit -> RequireQualifiedAccessAttribute

--------------------
new: unit -> RequireQualifiedAccessAttribute
type JsonValue = | String of string | Number of float | Object of JsonObject | Array of JsonArray | True | False | Null
module String from Microsoft.FSharp.Core
Multiple items
val string: value: 'T -> string

--------------------
type string = System.String
Multiple items
val float: value: 'T -> float (requires member op_Explicit)

--------------------
type float = System.Double

--------------------
type float<'Measure> = float
type JsonObject = ImmutableArray<JsonMember>
module Array from Microsoft.FSharp.Collections
type JsonArray = ImmutableArray<JsonValue>
type JsonMember = { Name: string Value: JsonValue }
Multiple items
type ImmutableArray = static member BinarySearch<'T> : array: ImmutableArray<'T> * index: int * length: int * value: 'T -> int + 3 overloads static member Create<'T> : unit -> ImmutableArray<'T> + 9 overloads static member CreateBuilder<'T> : unit -> Builder<'T> + 1 overload static member CreateRange<'T> : items: IEnumerable<'T> -> ImmutableArray<'T> + 4 overloads static member ToImmutableArray<'TSource> : items: IEnumerable<'TSource> -> ImmutableArray<'TSource> + 3 overloads
<summary>Provides methods for creating an array that is immutable; meaning it cannot be changed once it is created. NuGet package: System.Collections.Immutable (about immutable collections and how to install)</summary>

--------------------
[<Struct>] type ImmutableArray<'T> = member Add: item: 'T -> ImmutableArray<'T> member AddRange: items: IEnumerable<'T> -> ImmutableArray<'T> + 7 overloads member As<'TOther (requires reference type)> : unit -> ImmutableArray<'TOther> member AsMemory: unit -> ReadOnlyMemory<'T> member AsSpan: unit -> ReadOnlySpan<'T> + 2 overloads member CastArray<'TOther (requires reference type)> : unit -> ImmutableArray<'TOther> member Clear: unit -> ImmutableArray<'T> member Contains: item: 'T -> bool + 1 overload member CopyTo: sourceIndex: int * destination: 'T array * destinationIndex: int * length: int -> unit + 3 overloads member Equals: other: ImmutableArray<'T> -> bool + 1 overload ...
<summary>Represents an array that is immutable; meaning it cannot be changed once it is created. NuGet package: System.Collections.Immutable (about immutable collections and how to install)</summary>
<typeparam name="T">The type of element stored by the array.</typeparam>
Multiple items
val char: value: 'T -> char (requires member op_Explicit)

--------------------
type char = System.Char
Multiple items
val int: value: 'T -> int (requires member op_Explicit)

--------------------
type int = int32

--------------------
type int<'Measure> = int
val failwith: message: string -> 'T
val sign: value: 'T -> int (requires member Sign)
union case ValueOption.ValueSome: 'T -> ValueOption<'T>
val ignore: value: 'T -> unit
val failwithf: format: Printf.StringFormat<'T,'Result> -> 'T
union case JsonValue.Number: float -> JsonValue
union case JsonValue.True: JsonValue
union case JsonValue.False: JsonValue
union case JsonValue.Null: JsonValue
union case JsonValue.String: string -> JsonValue
union case JsonValue.Object: JsonObject -> JsonValue
union case JsonValue.Array: JsonArray -> JsonValue
val fst: tuple: ('T1 * 'T2) -> 'T1
type unit = Unit
'Input
'InputSlice

Type something to start searching.