package yuscii

Mapper of UTF-7 to Unicode

Sources

yuscii-v0.3.0.tbz
sha256=ef8d87ed575d14547326887930f9d8c0a638d35c40889d5aacec79c45d5074b1
sha512=d9747ddc01ce0d35be6ec95ff09cf9a0cf67fa53449ce0d700e5c4e31b83e6cf5c8985b0f2e154087bae8ddcce6cb1d4f33819e538570e4308bbba7e21cd669f

Description

A simple mapper from UTF-7 to Unicode according to RFC2152. Useful for translating UTF-7 input to Unicode.

Published: 14 Mar 2020

README

Yuscii

Yuscii is a little library to decode a UTF-7 input flow (as defined by RFC2152) to Unicode. This library does not implement an encoder because, eh guy, we are in 2018...

How to use it?

yuscii follows the same design as uutf and other libraries with the same purpose: translating something to Unicode. We need to be able to control memory consumption and to offer a non-blocking computation. Finally, an error should not stop the decoding process.

This is a little example with uutf to translate UTF-7 to UTF-8:

let trans ic oc =
  let decoder = Yuscii.decoder (`Channel ic) in
  let encoder = Uutf.encoder `UTF_8 (`Channel oc) in
  let rec go () = match Yuscii.decode decoder with
    | `Await -> assert false (* XXX(dinosaure): impossible when you use `String or `Channel as source. *)
    | `Uchar _ as uchar -> ignore @@ Uutf.encode encoder uchar ; go ()
    | `End -> ignore @@ Uutf.encode encoder `End (* signal the end of input and flush the encoder *)
    | `Malformed err -> failwith err in
  go ()

let () = trans stdin stdout
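
The example above raises on the first malformed sequence. As a minimal sketch of the "an error should not stop the decoding" idea, the loop below substitutes U+FFFD for each malformed sequence and keeps going. The names to_utf8 and replay are hypothetical and not part of yuscii. The decode type is an assumption that mirrors the cases matched in the example above, and replay is only a stand-in for a real decoder so that the block runs with the standard library alone (OCaml 4.06 or later, for Buffer.add_utf_8_uchar).

```ocaml
(* Assumed result type: mirrors the cases matched in the example above. *)
type decode = [ `Await | `End | `Uchar of Uchar.t | `Malformed of string ]

(* [to_utf8 decode] pulls results from [decode] until [`End]. It returns the
   UTF-8 text and the number of malformed sequences met on the way. Each
   malformed sequence is replaced by U+FFFD instead of raising. *)
let to_utf8 (decode : unit -> decode) : string * int =
  let buf = Buffer.create 64 in
  let rec go errors =
    match decode () with
    | `Uchar u -> Buffer.add_utf_8_uchar buf u ; go errors
    | `Malformed _ -> Buffer.add_utf_8_uchar buf Uchar.rep ; go (errors + 1)
    | `End -> (Buffer.contents buf, errors)
    | `Await -> invalid_arg "to_utf8: the source needs more input"
  in
  go 0

(* Hypothetical stand-in for a decoder: replays a fixed list of results,
   then answers [`End] forever. *)
let replay (items : decode list) : unit -> decode =
  let rest = ref items in
  fun () ->
    match !rest with
    | [] -> `End
    | x :: tl -> rest := tl ; x

let () =
  let decode =
    replay
      [ `Uchar (Uchar.of_char 'a'); `Malformed "bad"; `Uchar (Uchar.of_char 'b') ]
  in
  let text, errors = to_utf8 decode in
  Printf.printf "%S (%d malformed)\n" text errors
```

With yuscii installed, and assuming Yuscii.decode returns exactly these cases, the same loop could be driven by to_utf8 (fun () -> Yuscii.decode decoder).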

About UTF-7

For historical reasons, SMTP is not necessarily an 8-bit clean protocol. In other words, SMTP may only support 7-bit data, and an 8-bit message has a high chance of being garbled during transmission.

UTF-7 exists for this purpose and provides a way to encode a message under this limit. Compared with wrapping UTF-8 in the quoted-printable or base64 encodings (RFC2045), the advantage of UTF-7 is size: UTF-8 combined with quoted-printable in particular produces a very size-inefficient flow.
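
To put rough numbers on the size argument, here is a back-of-the-envelope sketch. It assumes a run of n non-ASCII characters from the Basic Multilingual Plane, ignores line folding and headers, and counts the closing '-' of the UTF-7 run. The function names are hypothetical and only serve the illustration.

```ocaml
(* Sizes in bytes for a run of [n] non-ASCII BMP characters, each taking
   [utf8_len] bytes in UTF-8 (2 or 3). Line breaks are ignored. *)

(* UTF-7: '+', then the 16-bit code units in unpadded base64, then '-'. *)
let utf7_size n = 1 + ((16 * n + 5) / 6) + 1

(* UTF-8 + quoted-printable: every non-ASCII byte becomes "=XX". *)
let qp_size ~utf8_len n = 3 * utf8_len * n

(* UTF-8 + base64: 4 output characters per group of 3 input bytes, padded. *)
let base64_size ~utf8_len n = 4 * ((utf8_len * n + 2) / 3)

let () =
  let n = 10 in
  (* 10 CJK characters, 3 bytes each in UTF-8 *)
  Printf.printf "UTF-7: %d, UTF-8+QP: %d, UTF-8+base64: %d\n"
    (utf7_size n) (qp_size ~utf8_len:3 n) (base64_size ~utf8_len:3 n)
(* prints "UTF-7: 29, UTF-8+QP: 90, UTF-8+base64: 40" *)
```

Under these assumptions the gap with quoted-printable is large, while base64 stays much closer to UTF-7, especially for characters that need only 2 bytes in UTF-8.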

Of course, nobody uses it...

About RFC2060

We rely only on RFC2152. IMAP has its own UTF-7 variant and this package does not want to handle both. In other words, if you want to decode an IMAP UTF-7 flow (mUTF-7), you should probably use something other than this library.
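
For readers wondering how the two differ at the syntax level: in IMAP's variant (RFC 3501, section 5.1.3) the shift character is '&' instead of '+', a literal '&' is written "&-", and ',' replaces '/' in the base64 alphabet. The sketch below only rewrites that syntax into RFC2152 form. It is not part of yuscii, utf7_of_mutf7 is a hypothetical name, and nothing is decoded or validated, so the output is not guaranteed to be well-formed RFC2152 (for instance, mUTF-7 allows a literal '~' or '\', which RFC2152 does not encode directly).

```ocaml
(* Rewrite IMAP modified UTF-7 syntax into RFC 2152 syntax.
   Purely syntactic: base64 content is copied, not decoded or checked. *)
let utf7_of_mutf7 (s : string) : string =
  let len = String.length s in
  let buf = Buffer.create len in
  (* [direct] handles text outside of a base64 run. *)
  let rec direct i =
    if i >= len then ()
    else
      match s.[i] with
      | '&' when i + 1 < len && s.[i + 1] = '-' ->
          (* "&-" is a literal '&' in mUTF-7 *)
          Buffer.add_char buf '&' ; direct (i + 2)
      | '&' ->
          (* '&' opens a base64 run; RFC 2152 uses '+' *)
          Buffer.add_char buf '+' ; shifted (i + 1)
      | '+' ->
          (* '+' is literal in mUTF-7 but must be written "+-" in RFC 2152 *)
          Buffer.add_string buf "+-" ; direct (i + 1)
      | c -> Buffer.add_char buf c ; direct (i + 1)
  (* [shifted] handles the inside of a base64 run. *)
  and shifted i =
    if i >= len then ()
    else
      match s.[i] with
      | '-' -> Buffer.add_char buf '-' ; direct (i + 1)
      | ',' -> Buffer.add_char buf '/' ; shifted (i + 1)
      | c -> Buffer.add_char buf c ; shifted (i + 1)
  in
  direct 0 ;
  Buffer.contents buf

let () = print_endline (utf7_of_mutf7 "A&-B+C")
(* prints "A&B+-C" *)
```

A dedicated mUTF-7 decoder remains the safer choice, as the paragraph above recommends.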

About encoding

As we said, nobody continues to use UTF-7 (0.002% according to w3techs) and this library is just an excuse to waste our time. So encoding is definitely not part of our plan, and if you really want to encode something to UTF-7, you are probably wrong.

A larger decoder

As part of the mrmime project, yuscii is used by rosetta, a higher-level decoder for a larger set of encodings. You probably want to use rosetta to decode everything.

Distribution

yuscii ships a little binary to translate a UTF-7 flow to UTF-8: yuscii.to_utf8. It is provided as an example of how to use yuscii with uutf.

Did you know?

YUSCII is a 7-bit character encoding used in Yugoslavia. That's all...

Dependencies (2)

  1. dune
  2. ocaml >= "4.03.0"

Dev Dependencies (3)

  1. alcotest with-test
  2. uutf with-test
  3. fmt with-test

Used by (1)

  1. rosetta

Conflicts

None
