package irmin-chunk

This package provides an Irmin backend that cuts raw contents into blocks of the same size, while preserving the keys used in the store. It can be used to optimize space usage when dealing with large files, or as an intermediate layer for a raw block device backend.

# Install

Use opam:

```shell
opam install irmin-chunk
```

# Use

```ocaml
(* Build an Irmin store, where blobs are cut into chunks of same size *)
module AO = Irmin_chunk.AO_stable(Irmin_mem.Link)(Irmin_mem.AO)
module Store = Irmin.Make(AO)(Irmin_mem.RW)
```

Managing Chunks.

This module exposes functors to store raw contents in append-only stores as chunks of the same size. It exposes the AO functor, which splits the raw contents into Data blocks, addressed by Node blocks. This is the usual rope-like representation of strings, except that chunk trees are always built perfectly balanced and blocks are addressed by their hash (or by the stable keys returned by the underlying store).
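The rope-like layout described above can be sketched with a small stand-alone model. This is an illustration only, not the irmin-chunk implementation: the names `split_string`, `group`, `build` and `contents`, the `fanout` parameter, and the in-memory `tree` type are all assumptions made for the example, and hashing of blocks is left out.

```ocaml
(* Toy model of a chunk tree; NOT the irmin-chunk implementation. *)

type tree = Data of string | Node of tree list

(* Split [s] into consecutive pieces of at most [size] bytes. *)
let split_string ~size s =
  if size <= 0 then invalid_arg "split_string: size must be positive";
  let len = String.length s in
  let rec go off acc =
    if off >= len then List.rev acc
    else
      let n = min size (len - off) in
      go (off + n) (String.sub s off n :: acc)
  in
  go 0 []

(* Group a list into consecutive sublists of at most [n] elements. *)
let group ~n l =
  let rec go acc cur k = function
    | [] -> List.rev (if cur = [] then acc else List.rev cur :: acc)
    | x :: tl ->
      if k = n then go (List.rev cur :: acc) [ x ] 1 tl
      else go acc (x :: cur) (k + 1) tl
  in
  go [] [] 0 l

(* Build the tree bottom-up: first the Data leaves, then layers of Node
   blocks with at most [fanout] children (hypothetical parameter), until a
   single root remains. Every leaf ends up at the same depth. *)
let build ~size ~fanout s =
  if fanout < 2 then invalid_arg "build: fanout must be at least 2";
  let rec up = function
    | [] -> Data ""
    | [ t ] -> t
    | ts -> up (List.map (fun g -> Node g) (group ~n:fanout ts))
  in
  up (List.map (fun d -> Data d) (split_string ~size s))

(* Reading the leaves left to right gives back the original contents. *)
let rec contents = function
  | Data d -> d
  | Node ts -> String.concat "" (List.map contents ts)
```

In this model, `contents (build ~size ~fanout s)` returns `s` for any input. A real store would keep each block under a key and have Node blocks hold the keys of their children rather than the children themselves.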

A chunk has the following structure:

     ---------------------------
     | uint8_t type            |
     ---------------------------
     | uint16_t length         |
     ---------------------------
     | byte data[length]       |
     ---------------------------

type is either Data (0) or Node (1). If the chunk contains data, length is the payload length. Otherwise it is the number of children that the node has.
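As a rough illustration of this layout, the sketch below encodes and decodes a chunk using only the OCaml standard library. It rests on several assumptions that the text above does not settle: the length field is written in big-endian order, no padding is added, and the payload of a Node is treated as an opaque string. The names `encode`, `encode_data` and `decode` are hypothetical and are not part of the irmin-chunk API.

```ocaml
(* Toy encoder/decoder for the chunk layout; NOT the library's code.
   Assumption: the uint16 length is stored big-endian. *)

type kind = Data | Node

(* 1 byte for the type + 2 bytes for the length. *)
let header_size = 3

(* Write the type byte, the 16-bit length, then the payload bytes. *)
let encode kind ~length payload =
  if length < 0 || length > 0xFFFF then
    invalid_arg "encode: length does not fit in a uint16";
  let buf = Bytes.create (header_size + String.length payload) in
  Bytes.set_uint8 buf 0 (match kind with Data -> 0 | Node -> 1);
  Bytes.set_uint16_be buf 1 length;
  Bytes.blit_string payload 0 buf header_size (String.length payload);
  Bytes.to_string buf

(* For a Data chunk, the length field is the payload length. *)
let encode_data payload =
  encode Data ~length:(String.length payload) payload

(* Read back the type, the length field, and the bytes after the header. *)
let decode s =
  if String.length s < header_size then invalid_arg "decode: truncated chunk";
  let kind =
    match Char.code s.[0] with
    | 0 -> Data
    | 1 -> Node
    | n -> invalid_arg (Printf.sprintf "decode: unknown chunk type %d" n)
  in
  let length = (Char.code s.[1] lsl 8) lor Char.code s.[2] in
  let payload = String.sub s header_size (String.length s - header_size) in
  (kind, length, payload)
```

For a Node chunk the caller passes the number of children as `length`, for example `encode Node ~length:2 keys` where `keys` is whatever serialization of the two child keys the store uses.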

It also exposes AO_stable which, like AO, stores raw contents in chunks of the same size. In addition, it preserves the nice property that values are addressed by their own hash, instead of by the hash of the root chunk node as is the case for AO.

val chunk_size : int Irmin.Private.Conf.key

chunk_size is the configuration key for the size of chunks. By default, it is set to 4666, so that payload and metadata can be stored in a 4K block.

val config : ?config:Irmin.config -> ?size:int -> ?min_size:int -> unit -> Irmin.config

config ?config ?size ?min_size () is the configuration value extending the optional config with bindings associating chunk_size to size.

Fails with Invalid_argument if size is smaller than min_size. By default, min_size is set to 4000 (to avoid hash collisions on smaller sizes), but it can be tweaked for testing purposes. Note: the smaller size is, the bigger the risk of hash collisions, so use reasonable values.
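The size check described above can be modelled in a few lines. The function `checked_size` below is a hypothetical stand-alone helper written for this example, not the library's `config`; it only mirrors the documented behaviour, using the default values stated in this section.

```ocaml
(* Stand-alone model of the documented size check; NOT the library's code. *)

let default_size = 4666     (* default chunk_size stated in this section *)
let default_min_size = 4000 (* default min_size stated in this section *)

(* Return the accepted size, or raise Invalid_argument when it is smaller
   than min_size. *)
let checked_size ?(size = default_size) ?(min_size = default_min_size) () =
  if size < min_size then
    invalid_arg
      (Printf.sprintf "chunk size %d is smaller than the minimum %d" size
         min_size)
  else size
```

Under this model, `checked_size ~size:100 ()` raises Invalid_argument, while `checked_size ~size:100 ~min_size:50 ()` is accepted, which corresponds to lowering min_size for testing purposes.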

AO(X) is an append-only store which cuts values into chunks and stores them in the underlying store X.

AO_stable(L)(X) is similar to AO(X), but it ensures that the returned keys are the same as if the values were stored directly in X, so that the fact that blobs are cut into chunks is an implementation detail.