Thursday, February 3, 2011

A portmanteau in cheminformatics: CANGEN combining CANonicalization and molecular graph GENeration

CANGEN is a two-stage algorithm in computational molecular graph theory that converts any valid SMILES notation of a chemical structure, however it was entered, into a unique one. The first stage is a canonicalization procedure that labels each atomic node of the molecular graph so that a canonical order for the nodes is derived. The second stage then generates the unique linear notation of the graph, starting with the lowest-labeled atomic node.
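To make the two stages concrete, here is a minimal Python sketch of the idea, not the published algorithm itself. It assumes a connected, acyclic graph of heavy atoms joined by single bonds, so ring closures, bond orders, charges and aromaticity are all left out. The function names (canonical_ranks, unique_smiles) are made up for this illustration. The initial atom invariant is simply (degree, element symbol), and the refinement step compares sorted tuples of neighbor ranks; as far as I understand the paper, the original works with products of corresponding primes for that step, so treat the tuple comparison as a stand-in.

```python
def _dense_ranks(values):
    """Map sortable values to dense ranks 1..k; equal values share a rank."""
    order = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [order[v] for v in values]


def canonical_ranks(atoms, bonds):
    """Stage 1 (simplified): give every atom a distinct canonical rank."""
    n = len(atoms)
    nbrs = [[] for _ in range(n)]
    for a, b in bonds:
        nbrs[a].append(b)
        nbrs[b].append(a)

    # Assumed initial invariant: (number of neighbors, element symbol).
    ranks = _dense_ranks([(len(nbrs[i]), atoms[i]) for i in range(n)])

    while True:
        # Refine ranks with neighbor information until nothing changes.
        while True:
            keys = [(ranks[i], tuple(sorted(ranks[j] for j in nbrs[i])))
                    for i in range(n)]
            refined = _dense_ranks(keys)
            if refined == ranks:
                break
            ranks = refined

        if len(set(ranks)) == n:
            return ranks, nbrs

        # Break one tie: double all ranks, lower one atom of the
        # lowest tied class by one, then refine again.
        tied = min(r for r in ranks if ranks.count(r) > 1)
        first = ranks.index(tied)
        ranks = [2 * r for r in ranks]
        ranks[first] -= 1
        ranks = _dense_ranks(ranks)


def unique_smiles(atoms, bonds):
    """Stage 2 (simplified): depth-first walk from the lowest-ranked atom."""
    if len(bonds) != len(atoms) - 1:
        raise ValueError("this sketch handles connected acyclic graphs only")
    ranks, nbrs = canonical_ranks(atoms, bonds)
    start = ranks.index(min(ranks))

    def walk(i, parent):
        children = sorted((j for j in nbrs[i] if j != parent),
                          key=lambda j: ranks[j])
        out = atoms[i]
        for k, j in enumerate(children):
            sub = walk(j, i)
            # Every branch except the last one goes in parentheses.
            out += sub if k == len(children) - 1 else "(" + sub + ")"
        return out

    return walk(start, None)


# Ethanol entered with two different atom orderings.
print(unique_smiles(["C", "C", "O"], [(0, 1), (1, 2)]))
print(unique_smiles(["O", "C", "C"], [(0, 1), (1, 2)]))
```

Both calls print the same string because the ranks depend only on the graph, not on the order in which the atoms were typed in. The tie-breaking step picks whichever tied atom comes first in the input, which is harmless when the tied atoms are symmetry-equivalent but is one more reason to read this as an illustration rather than a robust canonicalizer.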

A CANGEN-derived notation of a molecular structure is an efficient search key for locating information about the encoded structure in a database or via the Internet. Unlike an arbitrary identifier, the key itself contains and transmits the structural information.

Keywords: molecular graphs, molecular connectivity, molecular data exchange, disambiguation, semantic web

Reference
D. Weininger, A. Weininger and J. L. Weininger: SMILES. 2. Algorithm for Generation of Unique SMILES Notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97-101.
DOI: 10.1021/ci00062a008.

1 comment:

  1. Per Google Scholar, here's a source:

    http://www.iocd.unam.mx/organica/seminario/2-3.pdf

    Thanks! I've been looking to do this.
