easy_xlsx
The repo contains three libraries for reading data from Microsoft's document
formats ("Office Open XML").
open_packaging
parses Office Open XML's "Open Packaging Conventions"
(the container format for all Microsoft Office documents)spreadsheetml
parses the XML data in SpreadsheetML (i.e. Excel's XLSX
format)easy_xlsx
reads XLSX documents, applies the formatting in the document,
and returns the result as a list of sheets (with a sheet name and then
astring list list
of data). The goal of this library is to give the
same results as if you exported the XLSX file to CSV and then read it with
a standard CSV reader.
open_packaging
and spreadsheetml
are relatively safe to use but incomplete
(it should be obvious what they're mising -- if a field doesn't exist, I
haven't got to it yet). Everything that does exist should be parsed properly.
easy_xlsx
is in very early stages. It should properly give read XLSX files
and output correct types, but the SpreadsheetML spec doesn't list all of the
built-in format strings, so some types may not be handled correctly. At the
moment, easy_xlsx
will bail out in any case where it can't understand the
formatting, although I'd be open to patches to make this optional.
Building
Install dependencies:
opam pin add -n easy_xlsx .
opam depext easy_xlsx
opam install --deps-only easy_xlsx
Then build:
make
You can run the tests if you want:
make test
The Makefile
is just a thin wrapper around jbuilder, so you can use
jbuilder commands too if you prefer.
Helping
If you want to help with this, create an issue with what you'd like to work
on, and mention it if you need me to help with anything (point you at the
relevant spec, give my opinion on approaches, etc.).
Some things I could use help with:
Implement more of the specs. See ECMA 376.
The editions only contain changes, so most of what's interesting is in
the 1st edition. Part 2 is the most interesting for the Open Packaging
Conventions and Part 4 has specifics for SpreadsheetML (or the other office
formats if you want to start a library for them).Add more tests. Right now there's a set of extremely high level tests
that we get the sound output as OpenOffice's CSV export, but being so high
level means a lot of things are completely untested until we're 100%
finished, which isn't a good situation to be in. It's easy to have a typo
when implementing this spec, so I'd like to aim for 100% test coverage.The cell formatting needs a lot of work. It's what I'm likely to work on
next, but if you think you can do it first I'd be happy to let someone
else do it (let me know if you start working on this though, so we don't
duplicate effort).Make CamlZip support a string or bigstring interface.
Right now we can only read actual files on the filesystem since that's the
interface CamlZip gives us. I'd like to be able to read data from memory.Make a pure-OCaml ZIP library. Then we won't need any system dependencies
and should work withjs_of_ocaml
too.
with-test
with-test
>= "1.0+beta18"
build & >= "1.3.0"
>= "4.04.2"