Generics Guide

The center of the generic runtime is fy_generic.

fy_generic is the sum-type value used by libfyaml to represent YAML and JSON data in C. A fy_generic holds one value of one runtime type:

  • null

  • bool

  • int

  • float

  • string

  • sequence

  • mapping

  • YAML-specific wrappers such as anchors, tags, and aliases

The rest of the generic API is about working with fy_generic values: creating them, storing them, parsing them, querying them, transforming them, and emitting them again.

If you know Python dict / list / scalar workflows, or value-tree APIs such as serde_json::Value, this is the closest match in libfyaml.

fy_generic As The Core Type

Think of fy_generic as a tagged sum type and as the main application-facing generic object:

  • scalar fy_generic values stay typed

  • sequence fy_generic values are ordered collections

  • mapping fy_generic values are key/value collections

  • the same fy_generic type represents YAML or JSON input

  • fy_generic values are immutable, so operations return new values and leave the original value unchanged

The design goal is Python-like data handling in C with explicit allocation and schema control.

In practice, most generic code is built around a small set of questions:

  • how do I create a fy_generic value?

  • how long does this fy_generic need to live?

  • how do I read values out of it?

  • how do I produce a new fy_generic from an existing one?

fy_generic Layout

fy_generic is a single pointer-sized word. The low 3 bits carry the type tag. The remaining bits hold either:

  • the value itself, for small inline cases

  • an aligned pointer to out-of-place storage

That layout is what makes the sum-type model practical in regular C. A large part of the common scalar space fits directly in the word with no separate allocation.

On 64-bit builds

The 64-bit layout is the main target for the generic API:

  • signed inline integers use 61 bits

  • inline integer range is -1152921504606846976 to 1152921504606846975

  • inline strings use a 7-byte NUL-terminated slot in the word, which gives 6 bytes of payload plus the trailing \\0

  • 32-bit floats can be stored inline; larger double values are stored out of place

In practice this means that values such as 42, true, "abc", and a large number of short keys and enum-like strings fit directly in the fy_generic word.

On 32-bit builds

The same model applies on 32-bit builds with smaller inline ranges:

  • signed inline integers use 29 bits

  • inline integer range is -268435456 to 268435455

  • inline strings use a 3-byte NUL-terminated slot, which gives 2 bytes of payload plus the trailing \\0

  • floats are stored out of place

Special Inline Values

Some values are always represented directly in the word:

  • fy_null

  • fy_false

  • fy_true

  • fy_invalid

  • empty sequence

  • empty mapping

These are encoded as fixed tagged values, so they never need separate storage.

Out-Of-Place Values

When a value does not fit inline, the word stores an aligned pointer:

  • scalar out-of-place storage uses 8-byte alignment

  • sequence and mapping storage uses 16-byte alignment

  • the low tag bits remain available because the pointed-to storage is aligned

This is how larger integers, larger strings, full double values, sequences, mappings, and indirect YAML metadata wrappers are represented.

Public Constants

The public header exposes the constants that define this layout:

  • FYGT_GENERIC_BITS

  • FYGT_INT_INPLACE_BITS

  • FYGT_INT_INPLACE_MIN / FYGT_INT_INPLACE_MAX

  • FYGT_STRING_INPLACE_SIZE

Those constants are the authoritative description of the current fy_generic encoding.

Three Lifetime Modes

The API comes in three practical tiers for storing fy_generic values.

Stack-local literals

Use these when the fy_generic value only needs to live inside the current function.

fy_generic cfg = fy_mapping(
    "host", "localhost",
    "port", 8080,
    "tls", true);

This entry point maps directly to the Python binding model.

Builder-backed values

Use a struct fy_generic_builder when fy_generic values need to outlive a stack frame, or when operations may allocate larger composite values.

char storage[16384];
struct fy_generic_builder *gb;

gb = fy_generic_builder_create_in_place(
    FYGBCF_SCHEMA_AUTO | FYGBCF_SCOPE_LEADER,
    NULL, storage, sizeof(storage));

fy_generic cfg = fy_gb_mapping(
    gb,
    "host", "localhost",
    "port", 8080);

Builders fit persistent results, repeated transforms, and application-owned value stores.

Low-level storage control

The low-level helpers let you control the exact backing buffers used by fy_generic values. They are useful when you already have prebuilt item arrays or you are optimizing memory placement.

Schemas Matter

fy_generic is schema-less in shape, but scalar typing still depends on the parse or builder schema:

  • YAML 1.2 and JSON behave conservatively

  • YAML 1.1 enables older implicit forms

  • Python-compatible mode is useful when you want behavior that feels closer to the Python binding or PyYAML-style expectations

This matters for plain scalars such as yes, on, octal numbers, or sexagesimal forms.

The example examples/generic-roundtrip.c demonstrates the same input under different schema choices.

Constructing Values

The core construction macros are:

  • fy_value(...) for a single scalar or already-generic value

  • fy_sequence(...) for list-shaped data

  • fy_mapping(...) for dict-shaped data

Example:

fy_generic service = fy_mapping(
    "name", "api",
    "ports", fy_sequence(8080, 8443),
    "labels", fy_mapping("tier", "edge", "owner", "platform"));

This mirrors the Python binding closely:

service = {
    "name": "api",
    "ports": [8080, 8443],
    "labels": {"tier": "edge", "owner": "platform"},
}

Access and Conversion

Use fy_get() and casts to move between generic values and native C types.

fy_generic server = fy_get(doc, "server", fy_invalid);
const char *host = fy_get(server, "host", "");
long long port = fy_get(server, "port", 0LL);

For nested structures, prefer small intermediate values or fy_get_at_path() when it reads more naturally than repeated fy_get() calls.

Parsing and Emitting

The generic API has its own parse/emit helpers:

  • fy_parse() / fy_parse_file()

  • fy_emit() / fy_emit_file()

Example:

fy_generic doc = fy_parse(
    "items: [1, 2, 3]",
    FYOPPF_DISABLE_DIRECTORY | FYOPPF_INPUT_TYPE_STRING,
    NULL);

fy_generic yaml = fy_emit(
    doc,
    FYOPEF_DISABLE_DIRECTORY | FYOPEF_MODE_YAML_1_2 | FYOPEF_STYLE_BLOCK,
    NULL);

Emission returns a generic string value. This is convenient for testing, transform pipelines, and bindings.

Functional Operations

The generic runtime is value-oriented. Collection operations such as fy_filter(), fy_map(), and fy_reduce() support the same style used in Python, functional collections, and sum-type/value-tree libraries.

There are parallel variants too: fy_pfilter(), fy_pmap(), and fy_preduce(). These accept an explicit struct fy_thread_pool * so the application controls how worker threads are created and reused.

Lambda forms are available for both serial and parallel operations: fy_filter_lambda(), fy_map_lambda(), fy_reduce_lambda(), fy_pfilter_lambda(), fy_pmap_lambda(), and fy_preduce_lambda(). These are useful for short transforms where an inline expression reads more clearly than a separate callback function.

Compiler support follows the generic API implementation:

  • GCC uses nested-function lambdas

  • Clang uses Blocks when built with Blocks support

The build system checks for GCC heap trampoline support and enables -ftrampoline-impl=heap automatically when it is available. Newer GCC releases can therefore use the lambda APIs without the executable-stack warning seen on older toolchains.

Typical flow:

  • parse YAML into a generic

  • select a sub-value

  • transform a sequence

  • reduce to a summary value

  • install the new values back into a mapping

  • emit the final result

See examples/generic-transform.c for a complete builder-backed example.

Examples

The examples directory contains companion programs for this guide:

  • generic-literals.c for Python-like construction in C

  • generic-transform.c for value-oriented transforms

  • generic-lambda-capture.c for explicit lambda usage with captured local variables

  • generic-parallel-transform.c for serial and parallel lambda-based transforms over a larger sequence, with timing output and configurable item/thread counts

  • generic-roundtrip.c for parse/emit and schema behavior

  • generic-adoption-bridge.c for Python-binding-to-C mental mapping