package orsetto

Unicode character set properties.

Overview

This module provides an interface to the Unicode character set database.

Types
type 'a map = 'a Ucs_ucdgen_aux.map

An alias for the abstract type representing a map from every Unicode code point to the value of a property for that code point.

type 'a index

The property index type. The full Unicode character database is large, but only a small portion of it is required by the Orsetto Ucs library itself, so values of this type provide an abstraction of the portion of the database made available to the application.

type utyp = ..

The extensible universal property type.

type utyp +=
  | Typ_bool of bool map * bool index
  | Typ_int of int map * int index
  | Typ_string of string map * string index
  | Typ_uchars of Uchar.t list option map * Uchar.t list option index

The core population of the extensible universal property type.
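Because utyp is extensible, code that inspects a property value can only pattern-match on the constructors it knows about and must include a catch-all case for extensions declared elsewhere. The following self-contained sketch uses a simplified stand-in for utyp whose constructors carry plain lists instead of map and index values; the names demo_utyp, Typ_flag, Typ_count, Typ_other and describe are hypothetical and not part of this interface.

```ocaml
(* Hypothetical stand-in for the extensible universal property type. *)
type demo_utyp = ..

(* A "core population" declared alongside the type... *)
type demo_utyp +=
  | Typ_flag of bool list
  | Typ_count of int list

(* ...and an extension declared later, possibly in another module. *)
type demo_utyp += Typ_other of string list

(* A match on an extensible type is only exhaustive with a wildcard case,
   which handles constructors that are unknown at this point. *)
let describe = function
  | Typ_flag _ -> "boolean property"
  | Typ_count _ -> "integer property"
  | _ -> "other property"

let () =
  print_endline (describe (Typ_flag [ true ]));
  print_endline (describe (Typ_other [ "x" ]))
```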

Functions and Constants
val create_index : (string * 'a) list -> 'a index

Use create_index s to compose an index from the list s of (name, value) pairs.

val query_map : 'a map -> Uchar.t -> 'a

Use query_map m c to resolve the value of property m for character c.
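The representation of map is abstract (it is an alias for Ucs_ucdgen_aux.map), so the following is only a sketch of one common way to map every code point to a property value: a sorted array of disjoint, inclusive code point ranges searched by binary search, with a default value for code points outside every range. The names range_map, query_range_map and toy are hypothetical and are not part of this interface.

```ocaml
(* Hypothetical map: sorted, disjoint, inclusive ranges plus a default. *)
type 'a range_map = {
  default : 'a;                     (* value for code points in no range *)
  ranges : (int * int * 'a) array;  (* (first, last, value), sorted by first *)
}

(* Resolve the value for character [c] by binary search over the ranges. *)
let query_range_map m c =
  let cp = Uchar.to_int c in
  let rec search lo hi =
    if lo > hi then m.default
    else
      let mid = (lo + hi) / 2 in
      let first, last, v = m.ranges.(mid) in
      if cp < first then search lo (mid - 1)
      else if cp > last then search (mid + 1) hi
      else v
  in
  search 0 (Array.length m.ranges - 1)

(* Toy boolean property: ASCII digits and ASCII Latin letters. *)
let toy =
  { default = false;
    ranges = [| (0x30, 0x39, true); (0x41, 0x5A, true); (0x61, 0x7A, true) |] }
```

A table of ranges keeps the map compact because most properties are constant over long runs of consecutive code points.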

val search_index : 'a index -> string -> 'a option

Use search_index idx nym to query the index idx for the entry named by nym. Index keys are loosely matched.
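This interface does not define "loosely matched" precisely. Unicode Standard Annex #44 (rule UAX44-LM3) recommends ignoring case, whitespace, underscores and hyphens when comparing property names and values, so the following sketch assumes that rule (and omits the rule's treatment of an initial "is" prefix). A plain association list stands in for the abstract index type, and the names loose_key, create_demo_index, search_demo_index and idx are hypothetical.

```ocaml
(* Normalize a key: drop whitespace, '_' and '-', and lowercase the rest. *)
let loose_key s =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun ch ->
      match ch with
      | ' ' | '\t' | '\n' | '\r' | '_' | '-' -> ()
      | _ -> Buffer.add_char b (Char.lowercase_ascii ch))
    s;
  Buffer.contents b

(* Hypothetical index: an association list keyed by normalized names. *)
let create_demo_index pairs =
  List.map (fun (name, v) -> (loose_key name, v)) pairs

(* Normalize the query the same way, then look it up. *)
let search_demo_index idx nym = List.assoc_opt (loose_key nym) idx

let idx = create_demo_index [ ("General_Category", "gc"); ("Script", "sc") ]
```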

val search_property : utyp index -> string -> utyp option

Use search_property idx nym to query the property database index idx for the property named nym. Property names are loosely matched.

val require_property : utyp index -> string -> utyp

Use require_property idx nym to query the property database index idx for the property named nym. Property names are loosely matched. Raises Not_found if no property named nym is indexed.
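As documented, require_property differs from search_property only in that a missing entry raises Not_found instead of returning None. The following sketch illustrates that relationship generically over any option-returning lookup; it is not the library's implementation, and the names require_with, toy_search, toy_index and lookup_or_default are hypothetical.

```ocaml
(* Turn an option-returning lookup into one that raises Not_found. *)
let require_with search idx nym =
  match search idx nym with
  | Some v -> v
  | None -> raise Not_found

(* Toy lookup over an association list, standing in for search_property. *)
let toy_search idx nym = List.assoc_opt nym idx
let toy_index = [ ("Alphabetic", true) ]

(* Callers that cannot rule out a missing property can catch Not_found. *)
let lookup_or_default nym =
  try require_with toy_search toy_index nym with Not_found -> false
```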

type blk = [
  | `ASCII
  | `Adlam
  | `Aegean_Numbers
  | `Ahom
  | `Alchemical
  | `Alphabetic_PF
  | `Anatolian_Hieroglyphs
  | `Ancient_Greek_Music
  | `Ancient_Greek_Numbers
  | `Ancient_Symbols
  | `Arabic
  | `Arabic_Ext_A
  | `Arabic_Ext_B
  | `Arabic_Ext_C
  | `Arabic_Math
  | `Arabic_PF_A
  | `Arabic_PF_B
  | `Arabic_Sup
  | `Armenian
  | `Arrows
  | `Avestan
  | `Balinese
  | `Bamum
  | `Bamum_Sup
  | `Bassa_Vah
  | `Batak
  | `Bengali
  | `Bhaiksuki
  | `Block_Elements
  | `Bopomofo
  | `Bopomofo_Ext
  | `Box_Drawing
  | `Brahmi
  | `Braille
  | `Buginese
  | `Buhid
  | `Byzantine_Music
  | `CJK
  | `CJK_Compat
  | `CJK_Compat_Forms
  | `CJK_Compat_Ideographs
  | `CJK_Compat_Ideographs_Sup
  | `CJK_Ext_A
  | `CJK_Ext_B
  | `CJK_Ext_C
  | `CJK_Ext_D
  | `CJK_Ext_E
  | `CJK_Ext_F
  | `CJK_Ext_G
  | `CJK_Ext_H
  | `CJK_Radicals_Sup
  | `CJK_Strokes
  | `CJK_Symbols
  | `Carian
  | `Caucasian_Albanian
  | `Chakma
  | `Cham
  | `Cherokee
  | `Cherokee_Sup
  | `Chess_Symbols
  | `Chorasmian
  | `Compat_Jamo
  | `Control_Pictures
  | `Coptic
  | `Coptic_Epact_Numbers
  | `Counting_Rod
  | `Cuneiform
  | `Cuneiform_Numbers
  | `Currency_Symbols
  | `Cypriot_Syllabary
  | `Cypro_Minoan
  | `Cyrillic
  | `Cyrillic_Ext_A
  | `Cyrillic_Ext_B
  | `Cyrillic_Ext_C
  | `Cyrillic_Ext_D
  | `Cyrillic_Sup
  | `Deseret
  | `Devanagari
  | `Devanagari_Ext
  | `Devanagari_Ext_A
  | `Diacriticals
  | `Diacriticals_Ext
  | `Diacriticals_For_Symbols
  | `Diacriticals_Sup
  | `Dingbats
  | `Dives_Akuru
  | `Dogra
  | `Domino
  | `Duployan
  | `Early_Dynastic_Cuneiform
  | `Egyptian_Hieroglyphs
  | `Egyptian_Hieroglyph_Format_Controls
  | `Elbasan
  | `Elymaic
  | `Emoticons
  | `Enclosed_Alphanum
  | `Enclosed_Alphanum_Sup
  | `Enclosed_CJK
  | `Enclosed_Ideographic_Sup
  | `Ethiopic
  | `Ethiopic_Ext
  | `Ethiopic_Ext_A
  | `Ethiopic_Ext_B
  | `Ethiopic_Sup
  | `Geometric_Shapes
  | `Geometric_Shapes_Ext
  | `Georgian
  | `Georgian_Ext
  | `Georgian_Sup
  | `Glagolitic
  | `Glagolitic_Sup
  | `Gothic
  | `Grantha
  | `Greek
  | `Greek_Ext
  | `Gujarati
  | `Gunjala_Gondi
  | `Gurmukhi
  | `Half_And_Full_Forms
  | `Half_Marks
  | `Hangul
  | `Hanifi_Rohingya
  | `Hanunoo
  | `Hatran
  | `Hebrew
  | `High_PU_Surrogates
  | `High_Surrogates
  | `Hiragana
  | `IDC
  | `IPA_Ext
  | `Ideographic_Symbols
  | `Imperial_Aramaic
  | `Indic_Number_Forms
  | `Indic_Siyaq_Numbers
  | `Inscriptional_Pahlavi
  | `Inscriptional_Parthian
  | `Jamo
  | `Jamo_Ext_A
  | `Jamo_Ext_B
  | `Javanese
  | `Kaithi
  | `Kaktovik_Numerals
  | `Kana_Ext_A
  | `Kana_Ext_B
  | `Kana_Sup
  | `Kanbun
  | `Kangxi
  | `Kannada
  | `Katakana
  | `Katakana_Ext
  | `Kayah_Li
  | `Kawi
  | `Kharoshthi
  | `Khitan_Small_Script
  | `Khmer
  | `Khmer_Symbols
  | `Khojki
  | `Khudawadi
  | `Lao
  | `Latin_1_Sup
  | `Latin_Ext_A
  | `Latin_Ext_Additional
  | `Latin_Ext_B
  | `Latin_Ext_C
  | `Latin_Ext_D
  | `Latin_Ext_E
  | `Latin_Ext_F
  | `Latin_Ext_G
  | `Lepcha
  | `Letterlike_Symbols
  | `Limbu
  | `Linear_A
  | `Linear_B_Ideograms
  | `Linear_B_Syllabary
  | `Lisu
  | `Lisu_Sup
  | `Low_Surrogates
  | `Lycian
  | `Lydian
  | `Mahajani
  | `Mahjong
  | `Makasar
  | `Malayalam
  | `Mandaic
  | `Manichaean
  | `Marchen
  | `Masaram_Gondi
  | `Math_Alphanum
  | `Math_Operators
  | `Mayan_Numerals
  | `Medefaidrin
  | `Meetei_Mayek
  | `Meetei_Mayek_Ext
  | `Mende_Kikakui
  | `Meroitic_Cursive
  | `Meroitic_Hieroglyphs
  | `Miao
  | `Misc_Arrows
  | `Misc_Math_Symbols_A
  | `Misc_Math_Symbols_B
  | `Misc_Pictographs
  | `Misc_Symbols
  | `Misc_Technical
  | `Modi
  | `Modifier_Letters
  | `Modifier_Tone_Letters
  | `Mongolian
  | `Mongolian_Sup
  | `Mro
  | `Multani
  | `Music
  | `Myanmar
  | `Myanmar_Ext_A
  | `Myanmar_Ext_B
  | `NB
  | `NKo
  | `Nabataean
  | `Nag_Mundari
  | `Nandinagari
  | `New_Tai_Lue
  | `Newa
  | `No_Block_Assigned
  | `Number_Forms
  | `Nushu
  | `Nyiakeng_Puachue_Hmong
  | `OCR
  | `Ogham
  | `Ol_Chiki
  | `Old_Hungarian
  | `Old_Italic
  | `Old_North_Arabian
  | `Old_Permic
  | `Old_Persian
  | `Old_Sogdian
  | `Old_South_Arabian
  | `Old_Turkic
  | `Old_Uyghur
  | `Oriya
  | `Ornamental_Dingbats
  | `Osage
  | `Osmanya
  | `Ottoman_Siyaq_Numbers
  | `PUA
  | `Pahawh_Hmong
  | `Palmyrene
  | `Pau_Cin_Hau
  | `Phags_Pa
  | `Phaistos
  | `Phoenician
  | `Phonetic_Ext
  | `Phonetic_Ext_Sup
  | `Playing_Cards
  | `Psalter_Pahlavi
  | `Punctuation
  | `Rejang
  | `Rumi
  | `Runic
  | `Samaritan
  | `Saurashtra
  | `Sharada
  | `Shavian
  | `Shorthand_Format_Controls
  | `Siddham
  | `Sinhala
  | `Sinhala_Archaic_Numbers
  | `Small_Forms
  | `Small_Kana_Ext
  | `Sogdian
  | `Sora_Sompeng
  | `Soyombo
  | `Specials
  | `Sundanese
  | `Sundanese_Sup
  | `Sup_Arrows_A
  | `Sup_Arrows_B
  | `Sup_Arrows_C
  | `Sup_Math_Operators
  | `Sup_PUA_A
  | `Sup_PUA_B
  | `Sup_Punctuation
  | `Sup_Symbols_And_Pictographs
  | `Super_And_Sub
  | `Sutton_SignWriting
  | `Syloti_Nagri
  | `Symbols_And_Pictographs_Ext_A
  | `Symbols_For_Legacy_Computing
  | `Syriac
  | `Syriac_Sup
  | `Tagalog
  | `Tagbanwa
  | `Tags
  | `Tai_Le
  | `Tai_Tham
  | `Tai_Viet
  | `Tai_Xuan_Jing
  | `Takri
  | `Tamil
  | `Tamil_Sup
  | `Tangsa
  | `Tangut
  | `Tangut_Components
  | `Tangut_Sup
  | `Telugu
  | `Thaana
  | `Thai
  | `Tibetan
  | `Tifinagh
  | `Tirhuta
  | `Toto
  | `Transport_And_Map
  | `UCAS
  | `UCAS_Ext
  | `UCAS_Ext_A
  | `Ugaritic
  | `VS
  | `VS_Sup
  | `Vai
  | `Vedic_Ext
  | `Vertical_Forms
  | `Vithkuqi
  | `Wancho
  | `Warang_Citi
  | `Yezidi
  | `Yi_Radicals
  | `Yi_Syllables
  | `Yijing
  | `Zanabazar_Square
  | `Znamenny_Music
]

Unicode code block
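Block values can be produced and consumed with ordinary pattern matching. As an illustration, the sketch below maps a raw code point in the surrogate range to the corresponding blk tag, using the block ranges from the Unicode Blocks.txt data file (High Surrogates D800..DB7F, High Private Use Surrogates DB80..DBFF, Low Surrogates DC00..DFFF). It takes an int rather than a Uchar.t because OCaml's Uchar.t cannot represent surrogate code points. The function name surrogate_block is hypothetical.

```ocaml
(* Return the surrogate block containing code point [cp], if any.
   The result is an open polymorphic variant, so it unifies with any
   closed type (such as blk) that includes these three tags. *)
let surrogate_block cp =
  if cp >= 0xD800 && cp <= 0xDB7F then Some `High_Surrogates
  else if cp >= 0xDB80 && cp <= 0xDBFF then Some `High_PU_Surrogates
  else if cp >= 0xDC00 && cp <= 0xDFFF then Some `Low_Surrogates
  else None
```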

val equal_blk : blk -> blk -> bool

Equality

val show_blk : blk -> string

String representation

type utyp +=
  | Typ_block of blk map * blk index

Extend the universal type

type gc = [
  | `C
  | `Cc
  | `Cf
  | `Cs
  | `Co
  | `Cn
  | `L
  | `LC
  | `Lu
  | `Ll
  | `Lt
  | `Lm
  | `Lo
  | `M
  | `Mn
  | `Mc
  | `Me
  | `N
  | `Nd
  | `Nl
  | `No
  | `P
  | `Pc
  | `Pd
  | `Ps
  | `Pe
  | `Pi
  | `Pf
  | `Po
  | `S
  | `Sm
  | `Sc
  | `Sk
  | `So
  | `Z
  | `Zs
  | `Zl
  | `Zp
]

The general category property value type.
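Following Unicode Standard Annex #44, the two-letter general category values are grouped into major classes named by their first letter (`C, `L, `M, `N, `P, `S, `Z), and `LC groups the cased letters `Lu, `Ll and `Lt. The sketch below shows helpers that exploit this grouping; it repeats the gc type so that it compiles on its own, and the names major_class and is_cased_letter are hypothetical.

```ocaml
(* Local copy of the general category type documented above. *)
type gc = [ `C | `Cc | `Cf | `Cs | `Co | `Cn
          | `L | `LC | `Lu | `Ll | `Lt | `Lm | `Lo
          | `M | `Mn | `Mc | `Me
          | `N | `Nd | `Nl | `No
          | `P | `Pc | `Pd | `Ps | `Pe | `Pi | `Pf | `Po
          | `S | `Sm | `Sc | `Sk | `So
          | `Z | `Zs | `Zl | `Zp ]

(* Map any category, including a grouping, to its one-letter major class. *)
let major_class : gc -> gc = function
  | `C | `Cc | `Cf | `Cs | `Co | `Cn -> `C
  | `L | `LC | `Lu | `Ll | `Lt | `Lm | `Lo -> `L
  | `M | `Mn | `Mc | `Me -> `M
  | `N | `Nd | `Nl | `No -> `N
  | `P | `Pc | `Pd | `Ps | `Pe | `Pi | `Pf | `Po -> `P
  | `S | `Sm | `Sc | `Sk | `So -> `S
  | `Z | `Zs | `Zl | `Zp -> `Z

(* True for the members of the LC (cased letter) grouping. *)
let is_cased_letter : gc -> bool = function
  | `Lu | `Ll | `Lt -> true
  | _ -> false
```

Because gc is a closed polymorphic variant, the compiler checks that major_class covers every category.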

val equal_gc : gc -> gc -> bool

Equality

val show_gc : gc -> string

String representation

type utyp +=
  | Typ_general_category of gc map * gc index

type qc =
  | QC_yes
  | QC_no
  | QC_maybe

The normalization quick check property type.
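In the normalization quick-check algorithm of Unicode Standard Annex #15, per-character results combine so that any QC_no makes the whole string fail the check, and otherwise any QC_maybe leaves the answer undetermined. The sketch below shows only that combining step; the full algorithm also answers no when canonical combining classes are out of order, which is omitted here. It repeats the qc type so that it compiles on its own, and the name combine_qc is hypothetical.

```ocaml
(* Local copy of the quick check type documented above. *)
type qc = QC_yes | QC_no | QC_maybe

(* Fold per-character quick check values: QC_no short-circuits,
   otherwise any QC_maybe makes the overall result QC_maybe. *)
let combine_qc values =
  let rec go acc = function
    | [] -> acc
    | QC_no :: _ -> QC_no
    | QC_maybe :: rest -> go QC_maybe rest
    | QC_yes :: rest -> go acc rest
  in
  go QC_yes values
```

A QC_maybe result means the string must be fully normalized and compared to decide the question.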

val equal_qc : qc -> qc -> bool

Equality

val show_qc : qc -> string

String representation

type utyp +=
  | Typ_quick_check of qc map * qc index

Extension of the universal type

type script = [
  | `Adlm
  | `Aghb
  | `Ahom
  | `Arab
  | `Armi
  | `Armn
  | `Avst
  | `Bali
  | `Bamu
  | `Bass
  | `Batk
  | `Beng
  | `Bhks
  | `Bopo
  | `Brah
  | `Brai
  | `Bugi
  | `Buhd
  | `Cakm
  | `Cans
  | `Cari
  | `Cham
  | `Cher
  | `Chrs
  | `Copt
  | `Cpmn
  | `Cprt
  | `Cyrl
  | `Deva
  | `Diak
  | `Dogr
  | `Dsrt
  | `Dupl
  | `Egyp
  | `Elba
  | `Elym
  | `Ethi
  | `Geor
  | `Glag
  | `Gong
  | `Gonm
  | `Goth
  | `Gran
  | `Grek
  | `Gujr
  | `Guru
  | `Hang
  | `Hani
  | `Hano
  | `Hatr
  | `Hebr
  | `Hira
  | `Hluw
  | `Hmng
  | `Hmnp
  | `Hrkt
  | `Hung
  | `Ital
  | `Java
  | `Kali
  | `Kana
  | `Kawi
  | `Khar
  | `Khmr
  | `Khoj
  | `Kits
  | `Knda
  | `Kthi
  | `Lana
  | `Laoo
  | `Latn
  | `Lepc
  | `Limb
  | `Lina
  | `Linb
  | `Lisu
  | `Lyci
  | `Lydi
  | `Mahj
  | `Maka
  | `Mand
  | `Mani
  | `Marc
  | `Medf
  | `Mend
  | `Merc
  | `Mero
  | `Mlym
  | `Modi
  | `Mong
  | `Mroo
  | `Mtei
  | `Mult
  | `Mymr
  | `Nagm
  | `Nand
  | `Narb
  | `Nbat
  | `Newa
  | `Nkoo
  | `Nshu
  | `Ogam
  | `Olck
  | `Orkh
  | `Orya
  | `Osge
  | `Osma
  | `Ougr
  | `Palm
  | `Pauc
  | `Perm
  | `Phag
  | `Phli
  | `Phlp
  | `Phnx
  | `Plrd
  | `Prti
  | `Qaai
  | `Rjng
  | `Rohg
  | `Runr
  | `Samr
  | `Sarb
  | `Saur
  | `Sgnw
  | `Shaw
  | `Shrd
  | `Sidd
  | `Sind
  | `Sinh
  | `Sogd
  | `Sogo
  | `Sora
  | `Soyo
  | `Sund
  | `Sylo
  | `Syrc
  | `Tagb
  | `Takr
  | `Tale
  | `Talu
  | `Taml
  | `Tang
  | `Tavt
  | `Telu
  | `Tfng
  | `Tglg
  | `Thaa
  | `Thai
  | `Tibt
  | `Tirh
  | `Tnsa
  | `Toto
  | `Ugar
  | `Vaii
  | `Vith
  | `Wara
  | `Wcho
  | `Xpeo
  | `Xsux
  | `Yezi
  | `Yiii
  | `Zanb
  | `Zinh
  | `Zyyy
  | `Zzzz
]

Unicode script identifier
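The script tags are four-letter ISO 15924 codes, among which `Zyyy (Common), `Zinh (Inherited) and `Zzzz (Unknown) do not identify a specific writing system. The sketch below uses that distinction to test whether a sequence of script values, such as the results of querying a script map for each character of a string, agrees on a single specific script. The names is_specific_script and single_script are hypothetical helpers, not part of this interface.

```ocaml
(* Zyyy (Common), Zinh (Inherited) and Zzzz (Unknown) do not identify
   a specific writing system. *)
let is_specific_script = function
  | `Zyyy | `Zinh | `Zzzz -> false
  | _ -> true

(* [Some s] when every specific script in [xs] equals [s];
   [None] when [xs] has no specific script or mixes several. *)
let single_script xs =
  match List.filter is_specific_script xs with
  | [] -> None
  | s :: rest -> if List.for_all (fun t -> t = s) rest then Some s else None
```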

val equal_script : script -> script -> bool

Equality

val show_script : script -> string

String representation

type utyp +=
  | Typ_script of script map * script index

Extend the universal type.

module Quick : sig ... end

This module contains internal fast-path functions for property queries.