package molenc

Molecular encoder/featurizer using rdkit and OCaml

Sources

v0.0.1.tar.gz
md5=9de5d3bb892267d5d2fb913532cb1d59

Description

Chemical fingerprints are lossy encodings of molecules. molenc encodes molecules as unfolded, counted fingerprints, i.e. potentially very long but sparse integer vectors.
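As an illustration of what a sparse, counted, unfolded fingerprint can look like in OCaml, here is a minimal sketch. It is not molenc's actual data structure or API: the module name IntMap, the type fingerprint, the functions of_feature_indexes and to_assoc, and the feature indexes are all assumptions made for the example. Only features that occur in the molecule are stored, each with its occurrence count.

```ocaml
(* Hypothetical sketch: not molenc's actual types or API. *)
module IntMap = Map.Make (struct
  type t = int
  let compare = compare
end)

(* A counted, unfolded fingerprint: feature index -> occurrence count.
   Absent features are implicitly zero, which keeps the vector sparse. *)
type fingerprint = int IntMap.t

(* Count how many times each feature index occurs in the input list. *)
let of_feature_indexes (indexes : int list) : fingerprint =
  List.fold_left
    (fun acc i ->
      IntMap.update i
        (function None -> Some 1 | Some n -> Some (n + 1))
        acc)
    IntMap.empty indexes

(* Non-zero entries, in increasing feature index order. *)
let to_assoc (fp : fingerprint) : (int * int) list = IntMap.bindings fp

let () =
  (* Made-up feature indexes for one molecule. *)
  let fp = of_feature_indexes [ 12; 40_231; 12; 7; 12 ] in
  List.iter (fun (i, c) -> Printf.printf "%d:%d\n" i c) (to_assoc fp)
```

Because the fingerprint is unfolded, feature indexes are never reduced modulo a fixed length, so distinct features cannot collide; the price is an index space that can grow very large, which is why a sparse map is a natural fit.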

Currently, Faulon fingerprints are supported; atom pair fingerprints might be added in the future. Atom types are currently the quadruplet (#pi-electrons, element symbol, #HA neighbors, formal charge), where HA stands for heavy atom. Future versions might support pharmacophore features (a more abstract, fuzzier atom typing scheme) and might take into account the stereochemistry information that can be encoded in SMILES strings.
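The atom type quadruplet can be pictured as a small OCaml record. This is a sketch only: the record atom_type, its field names, the function string_of_atom_type and the textual format it produces are assumptions for illustration, not molenc's actual representation. The example values describe an aromatic carbon such as one in benzene.

```ocaml
(* Hypothetical record: field names and text format are assumptions,
   not molenc's actual representation. *)
type atom_type = {
  pi_electrons : int;     (* number of pi electrons *)
  element : string;       (* element symbol, e.g. "C" *)
  heavy_neighbors : int;  (* number of heavy atom (non-hydrogen) neighbors *)
  formal_charge : int;
}

(* Render the quadruplet in the order used in the description. *)
let string_of_atom_type (a : atom_type) : string =
  Printf.sprintf "(%d,%s,%d,%d)"
    a.pi_electrons a.element a.heavy_neighbors a.formal_charge

let () =
  (* Illustrative values for an aromatic carbon. *)
  let aromatic_c =
    { pi_electrons = 1; element = "C"; heavy_neighbors = 2; formal_charge = 0 }
  in
  print_endline (string_of_atom_type aromatic_c)
```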

Bibliography:

Carhart, R. E., Smith, D. H., & Venkataraghavan, R. (1985). Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Computer Sciences, 25(2), 64-73.

Kearsley, S. K., Sallamack, S., Fluder, E. M., Andose, J. D., Mosley, R. T., & Sheridan, R. P. (1996). Chemical similarity using physiochemical property descriptors. Journal of Chemical Information and Computer Sciences, 36(1), 118-127.

Faulon, J. L., Visco, D. P., & Pophale, R. S. (2003). The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. Journal of Chemical Information and Computer Sciences, 43(3), 707-720.

OpenSMILES specification. Craig A. James et al. v1.0 2016-05-15. http://opensmiles.org/opensmiles.html

Published: 04 May 2019

README

molenc

Molecular encoder using rdkit and OCaml.

OUTDATED DESCRIPTION: the implemented fingerprint is J.-L. Faulon's "Signature Molecular Descriptor", a counted, unfolded fingerprint of molecules.
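To give a rough feel for a counted fingerprint built from atom neighborhoods, the sketch below uses a deliberately simplified stand-in: for each atom, it records the sorted atom types found at each topological distance up to a given height, then counts how many atoms share each such environment. This is neither Faulon's canonical signature algorithm nor molenc's code. The type mol, the functions environment and count_environments, and the use of bare element symbols as atom types are all assumptions for the example.

```ocaml
(* Hypothetical sketch: a layered atom environment, a simplification of
   signature-style descriptors; not molenc's code. *)

(* A molecule as typed heavy atoms plus an adjacency list. *)
type mol = { types : string array; neighbors : int list array }

(* Atom types found at each topological distance 0..height from [root].
   Each layer is sorted so the result does not depend on atom numbering. *)
let environment (m : mol) (height : int) (root : int) : string list list =
  let seen = Array.make (Array.length m.types) false in
  seen.(root) <- true;
  let rec loop depth frontier acc =
    if frontier = [] || depth > height then List.rev acc
    else
      let layer =
        List.sort compare (List.map (fun i -> m.types.(i)) frontier)
      in
      (* Next frontier: not yet visited neighbors of the current layer. *)
      let next =
        List.fold_left
          (fun nxt i ->
            List.fold_left
              (fun nxt j ->
                if seen.(j) then nxt
                else begin
                  seen.(j) <- true;
                  j :: nxt
                end)
              nxt m.neighbors.(i))
          [] frontier
      in
      loop (depth + 1) next (layer :: acc)
  in
  loop 0 [ root ] []

(* Counted fingerprint: number of atoms sharing each environment. *)
let count_environments (m : mol) (height : int) =
  let tbl = Hashtbl.create 16 in
  Array.iteri
    (fun i _ ->
      let env = environment m height i in
      let c = try Hashtbl.find tbl env with Not_found -> 0 in
      Hashtbl.replace tbl env (c + 1))
    m.types;
  tbl

let () =
  (* Heavy-atom graph of propane: C-C-C. *)
  let propane =
    { types = [| "C"; "C"; "C" |]; neighbors = [| [ 1 ]; [ 0; 2 ]; [ 1 ] |] }
  in
  Hashtbl.iter
    (fun env count ->
      let layers = List.map (String.concat ",") env in
      Printf.printf "%s -> %d\n" (String.concat " | " layers) count)
    (count_environments propane 1)
```

In this toy example the two terminal carbons share one environment and the central carbon has another, so the fingerprint holds two features with counts 2 and 1.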

The fingerprint can be computed using atom types (#pi-electrons, element symbol, #HA neighbors, formal charge) or, if you want a fuzzier description of your molecules, rdkit pharmacophore features (Donor, Acceptor, PosIonizable, NegIonizable, Aromatic, Hydrophobe). Pharmacophore support is still a TODO.


Dependencies (7)

  1. ocaml >= "4.04.0" & < "5.0"
  2. conf-rdkit
  3. minicli
  4. parmap
  5. dolog < "4.0.0"
  6. batteries
  7. dune < "3.0"

Dev Dependencies

None

Used by (2)

  1. linwrap >= "9.0.3"
  2. rankers < "2.0.9"

Conflicts

None