Development of Chemical Markup Language (CML) as a System for Handling Complex Chemical Content - Supplemental Information (XML)
Peter Murray-Rust (2), Henry S. Rzepa (1), Michael Wright (1)
(1) Department of Chemistry, Imperial College of Science, Technology and Medicine, UK - (2) School of Pharmaceutical Sciences, University of Nottingham, UK - August 15, 2000
Additional Figures
JavaScript can call functions within the applet. Click a button to see a demonstration of this.
Warning: you need to be online, and the server-returned page will overwrite this one.
Appendix A - List of ChiMeraL Resources
These resources are available from the ChiMeraL website - http://www.ch.ic.ac.uk/chimeral/
Demonstrations
Schema
XSL Stylesheets
Perl converters
A large number of example CML files are also available; these contain properties, structures, spectra and reactions. Further archives of CML files will be made available as they are converted.
Appendix B - CML Syntax and Notes
The published CML 1.0 DTD [2] declares valid elements and attributes but puts few restrictions on how these elements and attributes are used. As a consequence, it is possible to mark up a CML object (e.g. a molecule) using a variety of different syntaxes. Whilst this gives great flexibility, it also makes it significantly more difficult to build CML applications. In particular, some of these syntaxes cannot be easily parsed using stylesheets. I have selected what I feel is the syntax best suited to XSL and small molecule markup. This, along with the syntax for chimeral:spectrum and reaction (still experimental), is as follows (comments on syntax are given as XML comments):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="document.xsl" ?>
<!-- Declares this document as XML and indicates the URL of its stylesheet -->
<document title="Lipids" id="cmldoc_karne_lipids" xmlns="http://www.w3.org/1999/xhtml" xmlns:chimeral="x-schema:http://www.ch.ic.ac.uk/chimeral/spectrum_schema_ie_01.xml">
  <!-- <document> isn't part of CML but represents the top element of any XML compliant document; this might contain CML, XHTML, MaML etc. - note the XHTML and chimeral namespaces -->
  <cml title="Cholesterol" id="cml_karne_cholesterol" xmlns="x-schema:http://www.ch.ic.ac.uk/chimeral/cml_schema_ie_02.xml">
    <!-- The CML namespace points to a copy of the schema; <cml> may contain any number of CML 'objects', e.g. <molecule>, <reaction> or <chimeral:spectrum> -->
    <molecule title="cholesterol" id="mol_cholesterol">
      <formula>C27 H46 O</formula>
      <!-- Information specific to this molecule is included here as <string>, <float> or <integer> - additional elements can be added as required but note the use of @title to label them. Alternate names are marked up as a <list> of <string>s -->
      <string title="CAS">57-88-5</string>
      <string title="ACX">I1001660</string>
      <string title="RTECS">FZ8400000</string>
      <float title="molecule weight">386.6598</float>
      <float title="melting point" units="degC">148.5</float>
      <float title="boiling point" units="degC">360</float>
      <float title="specific gravity">1.067</float>
      <list title="alternate names">
        <string title="name">Cholesterin</string>
        <string title="name">(3beta)-Cholest-5-en-3-ol</string>
        <string title="name">Cholest-5-en-3-ol (3beta)-</string>
        <string title="name">cholest-5-ene-3beta-ol</string>
        <string title="name">3beta-hydroxycholest-5-ene</string>
      </list>
      <!-- The following is used for small molecular structures - this format is much preferred but rather verbose. <integer builtin="atomId"> would normally be that used in the MDL .mol format but, in contrast, 'id' must be unique over (at least) this document. Additional strings are 'formalCharge' and 'hydrogenCount'. 2D structures will use builtin="x2 | y2" but are otherwise the same -->
      <list title="atoms">
        <!-- repeat -->
        <atom id="cholesterol_a_1">
          <integer builtin="atomId">1</integer>
          <float builtin="x3" units="A">-1.9901</float>
          <float builtin="y3" units="A">2.1889</float>
          <float builtin="z3" units="A">-1.8776</float>
          <string builtin="elementType">H</string>
        </atom>
        <!-- /repeat (74 atoms) -->
      </list>
      <!-- Large molecular structures - this format is terse but much harder to format/refer to in XSL. I have chosen not to use it -->
      <atomArray id="methanol">
        <stringArray title="label">a1 a2 a3 a4 a5 a6</stringArray>
        <stringArray builtin="elementType">C O H H H H</stringArray>
        <floatArray builtin="x3">-0.748 ..</floatArray>
        <floatArray builtin="y3">-0.015 ..</floatArray>
        <floatArray builtin="z3">0.024 ..</floatArray>
        <integerArray builtin="formalCharge"></integerArray>
      </atomArray>
      <!-- A <list> of <bond>s is used for small molecules - large ones will probably ignore bonds and calculate them directly -->
      <list title="bonds">
        <!-- repeat -->
        <bond id="cholesterol_b_1">
          <integer title="bondId">1</integer>
          <integer builtin="atomRef">2</integer>
          <integer builtin="atomRef">1</integer>
          <integer builtin="order" convention="MDL">1</integer>
        </bond>
        <!-- /repeat (77 bonds) -->
      </list>
    </molecule>
    <!-- Elements in spectra tend to match the LDRs (labelled data records) in the JCAMP format. The namespace chimeral: is very important as spectrum isn't found in CML 1.0 -->
    <chimeral:spectrum title="Cholesterol" id="spect_cholesterol_ms_1" convention="JCAMP-DX=4.24">
      <string title="datatype">MASS SPECTRUM</string>
      <string title="EPA">67286</string>
      <string title="origin">T.IIDA NIHON UNIVERSITY, KORIYAMA, FUKUSHIMA-KEN, JAPAN</string>
      <string title="owner">NIST Mass Spectrometry Data Center</string>
      <string title="spectrometer">LKB 9000</string>
      <!-- The following information is required for the rendering of spectra in Jspec -->
      <float title="xunits">M/Z</float>
      <float title="yunits">RELATIVE ABUNDANCE</float>
      <float title="firstx" convention="M/Z">18</float>
      <float title="lastx" convention="M/Z">387</float>
      <float title="deltax" convention="M/Z"></float>
      <float title="xfactor">1</float>
      <float title="firsty" convention="RELATIVE ABUNDANCE">700</float>
      <float title="miny" convention="RELATIVE ABUNDANCE">100</float>
      <float title="maxy" convention="RELATIVE ABUNDANCE">9999</float>
      <float title="yfactor">1</float>
      <float title="npoints">169</float>
      <!-- One of the following syntaxes is then used, depending on the type of spectrum -->
      <!-- A: simplest and preferred spectra format. Try and avoid (X++(Y..Y)) -->
      <list title="xypairs" convention="(XY..XY)">
        <!-- repeat -->
        <coordinate2 id="cholesterol_ms_c_1">18, 700</coordinate2>
        <!-- /repeat (169 data pairs) -->
      </list>
      <!-- B: alternate format for peak tables rather than data -->
      <list title="peak table" convention="(XY..XY)">
        <!-- repeat -->
        <coordinate2 id="mol_s_1">X, Y</coordinate2>
        <!-- /repeat -->
      </list>
      <!-- C: convention often used for NMR peak tables is (XYM) -->
      <list title="peak table" convention="(XYM)">
        <!-- repeat -->
        <coordinate3 id="mol_s_1">X, Y, M</coordinate3>
        <!-- /repeat -->
      </list>
    </chimeral:spectrum>
    <!-- Reactions are made up of a series of lists, each list containing a number of links @href to molecule @id -->
    <reaction title="Reactions" id="simple_rxn_1" convention="stepwise | linear | cycle.4 | ..">
      <!-- Linear markup is much preferred since it's the simplest; the others are provided for formatting purposes. The following three elements can be used either at the reaction level or at each reaction step -->
      <string title="description">Diels-Alder cycloaddition</string>
      <float title="yield" units="%">88</float>
      <string title="notes">example</string>
      <!-- Stepwise (x1 > y1, x2 > y2); use as many reactants, reagents and products as needed for each step; reaction steps will be displayed separately. Additional links to catalysts, intermediates, transition states etc. can be used as required -->
      <!-- repeat -->
      <list title="reactionStep" id="simple_s_1">
        <link title="reactant" href="mol_x"/>
        <link title="reagent" href="mol_r">
          <integer title="index">1</integer>
          <string title="solvent">Acetonitrile</string>
          <string title="temperature" units="degC">100</string>
          <string title="duration" units="hours">3</string>
          <string title="notes">reflux</string>
        </link>
        <link title="reagent">
          <integer title="index">2</integer>
          <string title="notes">workup</string>
        </link>
        <link title="product" href="mol_y"/>
      </list>
      <!-- /repeat -->
      <!-- Linear (x > y > z); reactant refers to the first reactants, product refers to the final product; intermediates use linearReactant and linearProduct -->
      <list title="linearstep" id="step_1">
        <link title="reactant" href="mol_1"/>
        <link title="reagent" href="mol_r1"> <!-- .. --> </link>
        <link title="linearProduct" href="mol_2"/>
      </list>
      <list title="linearstep" id="step_2">
        <link title="linearReactant" href="mol_2"/>
        <link title="reagent" href="mol_r2"> <!-- .. --> </link>
        <link title="linearProduct" href="mol_3"/>
      </list>
      <list title="linearstep" id="step_3">
        <link title="linearReactant" href="mol_3"/>
        <link title="reagent" href="mol_r3"> <!-- .. --> </link>
        <link title="product" href="mol_4"/>
      </list>
      <!-- Catalytic cycle - much more complex (.. > x > y > z > ..); reactant and product refer to substances 'in' and 'out' of the cycle in each step, use cycleReactant and cycleProduct for 'within' the cycle. Markup should be cyclic (final cycleProduct == first cycleReactant) -->
      <list title="reactionStep" id="step_1">
        <link title="cycleReactant" href="mol_1" id="cycle_lk_1"/>
        <link title="reactant" href="mol_3" id="cycle_lk_2"/>
        <link title="cycleProduct" href="mol_7" id="cycle_lk_3"/>
      </list>
      <list title="reactionStep" id="step_2">
        <link title="cycleReactant" href="mol_7" id="cycle_lk_4"/>
        <link title="cycleProduct" href="mol_6" id="cycle_lk_5"/>
        <link title="product" href="mol_8" id="cycle_lk_6"/>
      </list>
      <list title="reactionStep" id="step_3">
        <link title="cycleReactant" href="mol_6" id="cycle_lk_7"/>
        <link title="reactant" href="mol_5" id="cycle_lk_8"/>
        <link title="cycleProduct" href="mol_4" id="cycle_lk_9"/>
      </list>
      <list title="reactionStep" id="step_4">
        <link title="cycleReactant" href="mol_4" id="cycle_lk_10"/>
        <link title="cycleProduct" href="mol_1" id="cycle_lk_11"/>
        <link title="product" href="mol_2" id="cycle_lk_12"/>
      </list>
      <!-- list title="atomMap" could be placed within each step -->
    </reaction>
  </cml>
</document>
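The explicit per-atom and per-point markup above is chosen because each value can be addressed directly. As a rough illustration only (Python is used here instead of XSL, and the function names `builtin_values`, `read_atoms` and `read_xy_pairs` are hypothetical, not part of ChiMeraL), the following sketch reads atoms and spectrum data pairs from a cut-down fragment in the same syntax. The namespace URIs are copied from the example; the second data pair is a made-up value added so that the list has more than one entry.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the example document.
CML_NS = "x-schema:http://www.ch.ic.ac.uk/chimeral/cml_schema_ie_02.xml"
CHIMERAL_NS = "x-schema:http://www.ch.ic.ac.uk/chimeral/spectrum_schema_ie_01.xml"
NS = {"c": CML_NS, "ch": CHIMERAL_NS}

# Cut-down fragment: one atom and two data pairs (the second pair is invented).
SAMPLE = f"""
<cml xmlns="{CML_NS}" xmlns:chimeral="{CHIMERAL_NS}">
  <molecule title="cholesterol" id="mol_cholesterol">
    <list title="atoms">
      <atom id="cholesterol_a_1">
        <integer builtin="atomId">1</integer>
        <float builtin="x3" units="A">-1.9901</float>
        <float builtin="y3" units="A">2.1889</float>
        <float builtin="z3" units="A">-1.8776</float>
        <string builtin="elementType">H</string>
      </atom>
    </list>
  </molecule>
  <chimeral:spectrum title="Cholesterol" id="spect_cholesterol_ms_1">
    <list title="xypairs" convention="(XY..XY)">
      <coordinate2 id="cholesterol_ms_c_1">18, 700</coordinate2>
      <coordinate2 id="cholesterol_ms_c_2">19, 100</coordinate2>
    </list>
  </chimeral:spectrum>
</cml>
"""


def builtin_values(element):
    """Map each child's @builtin name to its text, e.g. {'x3': '-1.9901'}."""
    return {
        child.get("builtin"): (child.text or "").strip()
        for child in element
        if child.get("builtin")
    }


def read_atoms(molecule):
    """Collect id, element symbol and 3D coordinates for each <atom>."""
    atoms = []
    for atom in molecule.findall("c:list[@title='atoms']/c:atom", NS):
        values = builtin_values(atom)
        atoms.append({
            "id": atom.get("id"),
            "element": values["elementType"],
            "xyz": tuple(float(values[k]) for k in ("x3", "y3", "z3")),
        })
    return atoms


def read_xy_pairs(spectrum):
    """Split each comma-separated <coordinate2> into an (x, y) tuple."""
    pairs = []
    for point in spectrum.findall("c:list[@title='xypairs']/c:coordinate2", NS):
        x, y = (float(v) for v in point.text.split(","))
        pairs.append((x, y))
    return pairs


root = ET.fromstring(SAMPLE)
atoms = read_atoms(root.find("c:molecule", NS))
points = read_xy_pairs(root.find("ch:spectrum", NS))
print(atoms)
print(points)
```

The array-based syntax (atomArray) would instead require splitting several whitespace-separated strings and re-aligning them by position, which is the extra work the comments above refer to.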
Appendix C - CML Schema (IE 0.2) Notes
This version of the schema is based directly on the CML 1.0 DTD but includes datatype declarations for use in IE 5.x. These datatypes allow the use of @id and @href for intra-document linking. A platform-independent version, plus schemas for chimeral: and docml:, is available from the ChiMeraL site [1]. (Comments on syntax are given as XML comments.)
<?xml version="1.0"?>
<Schema name="cml_dev_karne" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <description>
    CML development - Version 0.2 - 7/4/00
    This document is the first draft of an XML Schema compatible with CML V1.0 published in JCICS...
    In converting the schema we have suggested some closed and some open content models. As CML develops it seems likely that there will be advantage in opening some of the models (e.g. angle), perhaps for annotations and ancillary information.
    Readers should note that the XML Schema activity is still at draft stage and that this schema may be revised in the future for compatibility.
    This schema is intended for use with IE5 - a platform independent version is available. XML documents can be validated against this schema by adding xmlns="x-schema:URL" within the cml element.
    Please see cml_schema_ie_02.html for further comments and explanations.
    Peter Murray Rust, Henry Rzepa, Michael Wright
    Comments to Michael Wright - karne@innocent.com
  </description>

  <!-- ********** Attribute Types ********** -->
  <!-- *** common *** -->
  <!-- It is expected that these attributes will be found on almost all elements. @title is used for display and general labelling and @id indicates a document unique identity string for that element. This identity can then be used to reference that element (and hence the object it represents) by the use of @href. Care must be taken that @id is unique. The use (for example) of <atom id="1"> will cause trouble in a document with more than one molecule and should be avoided. Note that @title on data elements (string/integer/float) can be used to mark up data for which there is no explicit CML element (e.g. <string title="CAS">58-08-2</string>) and @convention should be declared if this is not obvious. @builtin implies an element with a meaning predefined in the DTD -->
  <AttributeType name="title" required="no"/>
  <AttributeType name="id" required="no" dt:type="id"/>
  <AttributeType name="convention" required="no"/>
  <AttributeType name="builtin" required="no" dt:type="enumeration" dt:values="x2 y2 xy2 x3 y3 z3 xyz3 xFract yFract zFract xyzFract elementType atomId isotope occupancy hydrogenCount atomParity residueType residueId formalCharge atomRef atomRefs length order stereo acell bcell ccell alpha beta gamma z spacegroup" />

  <!-- *** linkers *** -->
  <!-- Designed to allow the linking of an element to another via @id. For example <link href="mol_543"> indicates a link to any element with @id="mol_543". @unitsRef and @dictRef are intended for future use -->
  <AttributeType name="href" required="no" dt:type="idrefs"/>
  <AttributeType name="dictRef" required="no" dt:type="idrefs"/>
  <AttributeType name="unitsRef" required="no" dt:type="idrefs"/>
  <AttributeType name="atomRef" required="no" dt:type="idref"/>
  <AttributeType name="atomRefs" required="no" dt:type="idrefs"/>

  <!-- *** quantifiers/constraints *** -->
  <!-- Various constraints on the values of data elements, only 'units' is in common use -->
  <AttributeType name="count" required="no"/>
  <AttributeType name="size" required="no"/>
  <AttributeType name="rows" required="no"/>
  <AttributeType name="columns" required="no"/>
  <AttributeType name="min" required="no"/>
  <AttributeType name="max" required="no"/>
  <AttributeType name="units" required="no"/>

  <!-- ********** Element Types ************ -->
  <!-- *** data *** -->
  <!-- These elements are intended to contain only text string and attributes - they may not contain other elements - and are hence 'closed' (no additional markup can be added beyond the schema). If additional data holders are required, they should be of the form <string title="CAS">58-08-2</string> where the title value is used to identify the element's contents. For full list of attributes, please see the schema -->
  <ElementType name="string" content="textOnly" model="closed">
    <attribute type="id"/>
    <attribute type="builtin"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="dictRef"/>
  </ElementType>
  <ElementType name="float" content="textOnly" model="closed">
    <attribute type="id"/>
    <attribute type="builtin"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="min"/>
    <attribute type="max"/>
    <attribute type="units"/>
    <attribute type="unitsRef"/>
    <attribute type="dictRef"/>
  </ElementType>
  <ElementType name="integer" content="textOnly" model="closed">
    <attribute type="id"/>
    <attribute type="builtin"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="min"/>
    <attribute type="max"/>
    <attribute type="units"/>
    <attribute type="unitsRef"/>
    <attribute type="dictRef"/>
  </ElementType>

  <!-- The arrays should be used only when large amounts of data need to be stored. In all cases explicit markup of each unit of data is preferred since this is much easier to manipulate with a stylesheet. In some cases this may not be possible (e.g. large molecules with 200+ atoms) and arrays can be used. These would probably need to be parsed by dedicated chemical tools. floatMatrix is intended for computational chemistry -->
  <ElementType name="stringArray" content="textOnly" model="closed">
    <attribute type="id"/>
    <attribute type="builtin"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="size"/>
    <attribute type="min"/>
    <attribute type="max"/>
    <attribute type="dictRef"/>
  </ElementType>
  <ElementType name="floatArray" content="textOnly" model="closed">
    <attribute type="id"/>
    <attribute type="builtin"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="size"/>
    <attribute type="min"/>
    <attribute type="max"/>
    <attribute type="units"/>
    <attribute type="unitsRef"/>
    <attribute type="dictRef"/>
  </ElementType>
  <ElementType name="integerArray" content="textOnly" model="closed">
    <attribute type="id"/>
    <attribute type="builtin"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="size"/>
    <attribute type="min"/>
    <attribute type="max"/>
    <attribute type="units"/>
    <attribute type="unitsRef"/>
    <attribute type="dictRef"/>
  </ElementType>
  <ElementType name="floatMatrix" content="textOnly" model="closed">
    <attribute type="id"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="rows"/>
    <attribute type="columns"/>
    <attribute type="min"/>
    <attribute type="max"/>
    <attribute type="units"/>
    <attribute type="unitsRef"/>
    <attribute type="dictRef"/>
  </ElementType>

  <!-- Coordinates consist of 2 or 3 comma separated numbers. They should be used for logically linked number groups - e.g. a data point in a spectrum -->
  <ElementType name="coordinate2" content="textOnly" model="closed">
    <!-- use for data pairs (e.g. spectra xy) -->
    <attribute type="id"/>
    <attribute type="builtin"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="units"/>
    <attribute type="unitsRef"/>
    <attribute type="dictRef"/>
  </ElementType>
  <ElementType name="coordinate3" content="textOnly" model="closed">
    <!-- use for data triplets (e.g. spectra x, y, m) -->
    <attribute type="id"/>
    <attribute type="builtin"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="units"/>
    <attribute type="unitsRef"/>
    <attribute type="dictRef"/>
  </ElementType>

  <!-- Angle and torsion haven't yet been developed, hence they have been left open -->
  <ElementType name="angle" content="textOnly" model="open">
    <attribute type="id"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="atomRefs"/>
    <attribute type="min"/>
    <attribute type="max"/>
    <attribute type="units" default="deg"/>
    <attribute type="unitsRef"/>
    <attribute type="dictRef"/>
  </ElementType>
  <ElementType name="torsion" content="textOnly" model="open">
    <attribute type="id"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="atomRefs"/>
    <attribute type="min"/>
    <attribute type="max"/>
    <attribute type="units" default="deg"/>
    <attribute type="unitsRef"/>
    <attribute type="dictRef"/>
  </ElementType>

  <!-- *** mixed *** -->
  <!-- Link is used as a 'holder' element - e.g. for 'href' or 'unitsRef' and indicates a logical link to another CML object -->
  <ElementType name="link" content="mixed" model="open">
    <attribute type="id"/>
    <attribute type="title"/>
    <attribute type="href"/>
  </ElementType>
  <ElementType name="formula" content="mixed" model="open">
    <attribute type="id"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="count"/>
    <attribute type="dictRef"/>
  </ElementType>

  <!-- *** structural *** -->
  <!-- These elements define the tree structure of the CML document and are not expected to contain text strings - only other elements. Since they are 'open', additional sub elements - whether CML or not - can be added with correct namespacing. Attributes and elements for these elements have been declared in the schema but these are only suggestions - the DTD allows anything. Many, like electron and the crystallographic markup, have yet to be developed -->
  <ElementType name="atom" content="eltOnly" model="open" order="many">
    <attribute type="id"/>
    <attribute type="title"/>
    <attribute type="convention" default="mol"/>
    <attribute type="count"/>
    <attribute type="dictRef"/>
    <element type="float" minOccurs="0" maxOccurs="*"/>
    <element type="string" minOccurs="0" maxOccurs="*"/>
    <element type="integer" minOccurs="0" maxOccurs="*"/>
    <element type="link" minOccurs="0" maxOccurs="*"/>
  </ElementType>
  <ElementType name="bond" content="eltOnly" model="open" order="many">
    <attribute type="id"/>
    <attribute type="convention" default="mol"/>
    <attribute type="atomRef"/>
    <attribute type="atomRefs"/>
    <!-- avoid using these and use a subelement with builtin="atomRef" -->
    <element type="integer" minOccurs="0" maxOccurs="*"/>
    <element type="float" minOccurs="0" maxOccurs="*"/>
    <element type="string" minOccurs="0" maxOccurs="*"/>
    <element type="link" minOccurs="0" maxOccurs="*"/>
    <element type="angle" minOccurs="0" maxOccurs="*"/>
    <element type="torsion" minOccurs="0" maxOccurs="*"/>
  </ElementType>
  <ElementType name="list" content="eltOnly" model="open" order="many">
    <attribute type="id"/>
    <attribute type="title"/>
    <!-- required: atoms/bonds/xypairs/peak table/atom map -->
    <attribute type="convention"/>
    <!-- should be included for spectra: (XY..XY), (XYM) -->
    <element type="string" minOccurs="0" maxOccurs="*"/>
    <element type="integer" minOccurs="0" maxOccurs="*"/>
    <element type="float" minOccurs="0" maxOccurs="*"/>
    <element type="floatArray" minOccurs="0" maxOccurs="*"/>
    <element type="stringArray" minOccurs="0" maxOccurs="*"/>
    <element type="integerArray" minOccurs="0" maxOccurs="*"/>
    <element type="floatMatrix" minOccurs="0" maxOccurs="*"/>
    <element type="angle" minOccurs="0" maxOccurs="*"/>
    <element type="torsion" minOccurs="0" maxOccurs="*"/>
    <element type="coordinate2" minOccurs="0" maxOccurs="*"/>
    <element type="coordinate3" minOccurs="0" maxOccurs="*"/>
    <element type="link" minOccurs="0" maxOccurs="*"/>
    <element type="formula" minOccurs="0" maxOccurs="*"/>
    <element type="atom" minOccurs="0" maxOccurs="*"/>
    <element type="bond" minOccurs="0" maxOccurs="*"/>
  </ElementType>
  <ElementType name="atomArray" content="eltOnly" model="open" order="many">
    <!-- only use for large molecules -->
    <attribute type="id"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="dictRef"/>
    <element type="stringArray" minOccurs="0" maxOccurs="*"/>
    <element type="floatArray" minOccurs="0" maxOccurs="*"/>
    <element type="integerArray" minOccurs="0" maxOccurs="*"/>
    <element type="link" minOccurs="0" maxOccurs="*"/>
  </ElementType>
  <ElementType name="bondArray" content="eltOnly" model="open" order="many">
    <!-- only use for large molecules -->
    <attribute type="id"/>
    <attribute type="convention"/>
    <attribute type="dictRef"/>
    <element type="stringArray" minOccurs="0" maxOccurs="*"/>
    <element type="floatArray" minOccurs="0" maxOccurs="*"/>
    <element type="integerArray" minOccurs="0" maxOccurs="*"/>
    <element type="link" minOccurs="0" maxOccurs="*"/>
  </ElementType>
  <ElementType name="electron" content="eltOnly" model="open">
    <attribute type="id"/>
    <attribute type="count"/>
    <attribute type="convention"/>
    <attribute type="dictRef"/>
  </ElementType>
  <ElementType name="crystal" content="eltOnly" model="open">
    <attribute type="id"/>
    <attribute type="title" default="crystal"/>
    <attribute type="convention"/>
    <attribute type="dictRef"/>
  </ElementType>
  <ElementType name="sequence" content="eltOnly" model="open">
    <attribute type="id"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="dictRef"/>
  </ElementType>
  <ElementType name="feature" content="eltOnly" model="open">
    <attribute type="id"/>
    <attribute type="title"/>
    <attribute type="convention"/>
    <attribute type="dictRef"/>
  </ElementType>
  <ElementType name="molecule" content="eltOnly" model="open" order="many">
    <attribute type="id"/>
    <attribute type="title" default="molecule"/>
    <attribute type="convention" default="mol"/>
    <attribute type="dictRef"/>
    <element type="formula" minOccurs="0" maxOccurs="*"/>
    <element type="list" minOccurs="0" maxOccurs="*"/>
    <element type="atomArray" minOccurs="0" maxOccurs="*"/>
    <element type="bondArray" minOccurs="0" maxOccurs="*"/>
    <element type="string" minOccurs="0" maxOccurs="*"/>
    <element type="float" minOccurs="0" maxOccurs="*"/>
    <element type="integer" minOccurs="0" maxOccurs="*"/>
    <element type="link" minOccurs="0" maxOccurs="*"/>
  </ElementType>
  <ElementType name="reaction" content="eltOnly" model="open" order="many">
    <attribute type="id"/>
    <attribute type="title" default="reaction"/>
    <attribute type="convention"/>
    <attribute type="dictRef"/>
    <element type="string" minOccurs="0" maxOccurs="*"/>
    <element type="float" minOccurs="0" maxOccurs="*"/>
    <element type="integer" minOccurs="0" maxOccurs="*"/>
    <element type="list" minOccurs="0" maxOccurs="*"/>
  </ElementType>

  <!-- ********** root ********** -->
  <!-- This is the general 'holder' for all CML documents. Normally this would then be embedded within an XML <document>. Namespaces/schema should be declared within this element or the document root -->
  <ElementType name="cml" content="mixed" model="open" order="many">
    <attribute type="id"/>
    <attribute type="title" default="cml document"/>
    <attribute type="convention"/>
    <attribute type="dictRef"/>
    <element type="molecule" minOccurs="0" maxOccurs="*"/>
    <element type="crystal" minOccurs="0" maxOccurs="*"/>
    <element type="reaction" minOccurs="0" maxOccurs="*"/>
  </ElementType>
</Schema>
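The schema comments stress that @id must be unique across the document and that @href (declared as idrefs) should point at an existing @id. Outside a validating parser, the same two conditions can be checked with a few lines of code. This is only a sketch under stated assumptions: it is written in Python, the function name `check_links` is hypothetical, and the fragment is invented to show the `<atom id="1">` problem mentioned in the schema comments together with an href that has no target.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_links(root):
    """Return (duplicate ids, hrefs that match no id) for an element tree."""
    ids = Counter(el.get("id") for el in root.iter() if el.get("id"))
    duplicates = sorted(i for i, n in ids.items() if n > 1)
    dangling = sorted({
        ref
        for el in root.iter()
        for ref in (el.get("href") or "").split()  # idrefs: space-separated
        if ref not in ids
    })
    return duplicates, dangling


# Invented fragment: both molecules reuse atom id "1", and one link
# points at "mol_3", which is not defined anywhere.
SAMPLE = """
<cml>
  <molecule id="mol_1"><list title="atoms"><atom id="1"/></list></molecule>
  <molecule id="mol_2"><list title="atoms"><atom id="1"/></list></molecule>
  <reaction id="rxn_1">
    <list title="reactionStep" id="step_1">
      <link title="reactant" href="mol_1"/>
      <link title="product" href="mol_3"/>
    </list>
  </reaction>
</cml>
"""

duplicates, dangling = check_links(ET.fromstring(SAMPLE))
print(duplicates)  # ['1']
print(dangling)    # ['mol_3']
```

Prefixing atom ids with the molecule name, as in cholesterol_a_1, avoids the first problem.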
Appendix D - Examples
This is a copy of the DTD for CML 1.0 published in JCICS by Peter Murray-Rust and Henry S. Rzepa [2].
<!-- Appendix A - CML DTD-1999-05-15 -->
<!-- Authors: P.Murray-Rust H.Rzepa This DTD is fully described in Journal of Chemical Information and Computer Science, Vol xxx, 1999, pp. xxx -->
<!-- =======================================================-->
<!-- PARAMETER ENTITIES -->
<!-- =======================================================-->
<!-- ======attributes found on almost all elements ==========-->
<!ENTITY % title 'title CDATA #IMPLIED'>
<!ENTITY % id 'id CDATA #IMPLIED'>
<!ENTITY % convention 'convention CDATA "CML"'>
<!ENTITY % dictRef 'dictRef CDATA #IMPLIED'>
<!-- ======linking information ==============================-->
<!ENTITY % simpleLink 'href CDATA #REQUIRED'>
<!-- ======quantifiers and constraints on some primitives ===-->
<!ENTITY % count 'count CDATA "1"'>
<!ENTITY % size 'size CDATA #IMPLIED'>
<!ENTITY % rows 'rows CDATA #REQUIRED'>
<!ENTITY % columns 'columns CDATA #REQUIRED'>
<!-- ======constraints on some numeric data primitives ===-->
<!ENTITY % min 'min CDATA #IMPLIED'>
<!ENTITY % max 'max CDATA #IMPLIED'>
<!ENTITY % units 'units CDATA #IMPLIED'>
<!ENTITY % angleUnits 'units (degrees | radians) "degrees"'>
<!ENTITY % unitsRef 'unitsRef CDATA #IMPLIED'>
<!-- values which may be useful in min and max attributes -->
<!ENTITY % integer.zero '0'>
<!ENTITY % integer.max '2147483647'>
<!ENTITY % integer.min '-2147483648'>
<!ENTITY % float.zero '0.0'>
<!ENTITY % float.max '8.98846567431158E307'>
<!ENTITY % float.min '4.9E-324'>
<!-- ======= builtin values for any element ================-->
<!ENTITY % builtinId 'id'>
<!-- ======= builtin values for atoms ======================-->
<!ENTITY % elementType 'elementType'>
<!ENTITY % atomId 'atomId'>
<!ENTITY % x2 'x2'>
<!ENTITY % y2 'y2'>
<!ENTITY % x3 'x3'>
<!ENTITY % y3 'y3'>
<!ENTITY % z3 'z3'>
<!ENTITY % xy2 'xy2'>
<!ENTITY % xyz3 'xyz3'>
<!ENTITY % xFract 'xFract'>
<!ENTITY % yFract 'yFract'>
<!ENTITY % zFract 'zFract'>
<!ENTITY % xyzFract 'xyzFract'>
<!ENTITY % occupancy 'occupancy'>
<!ENTITY % isotope 'isotope'>
<!ENTITY % formalCharge 'formalCharge'>
<!ENTITY % nonHydrogenCount 'nonHydrogenCount'>
<!ENTITY % hydrogenCount 'hydrogenCount'>
<!ENTITY % atomParity 'atomParity'>
<!ENTITY % residueType 'residueType'>
<!ENTITY % residueId 'residueId'>
<!ENTITY % atomStringBuiltin ' %elementType; | %atomId; | %residueType; | %residueId; ' >
<!ENTITY % atomFloatBuiltin ' %x2; | %y2; | %x3; | %y3; | %z3; | %xFract; | %yFract; | %zFract; | %occupancy; | %isotope; | %formalCharge; | %hydrogenCount; | %nonHydrogenCount; | %atomParity; ' >
<!ENTITY % atomIntegerBuiltin ' %isotope; | %formalCharge; | %hydrogenCount; | %nonHydrogenCount; | %atomParity; ' >
<!ENTITY % atomCoordinate2Builtin ' %xy2; ' >
<!ENTITY % atomCoordinate3Builtin ' %xyz3; | %xyzFract; ' >
<!-- ======= builtin values for bonds ======================-->
<!ENTITY % atomRef 'atomRef'>
<!ENTITY % builtinAtomRefs 'atomRefs'>
<!ENTITY % length 'length'>
<!ENTITY % order 'order'>
<!ENTITY % stereo 'stereo'>
<!ENTITY % atomRefs 'atomRefs CDATA #IMPLIED'>
<!ENTITY % bondStringBuiltin ' %atomRef; | %builtinAtomRefs; | %order; | %stereo; ' >
<!ENTITY % bondFloatBuiltin ' %length; ' >
<!ENTITY % bondIntegerBuiltin '' >
<!-- ======= builtin values for crystal ====================-->
<!ENTITY % acell 'acell'>
<!ENTITY % bcell 'bcell'>
<!ENTITY % ccell 'ccell'>
<!ENTITY % alpha 'alpha'>
<!ENTITY % beta 'beta'>
<!ENTITY % gamma 'gamma'>
<!ENTITY % z 'z'>
<!ENTITY % spacegroup 'spacegroup'>
<!ENTITY % crystalStringBuiltin ' %spacegroup; ' >
<!ENTITY % crystalFloatBuiltin ' %acell; | %bcell; | %ccell; | %alpha; | %beta; | %gamma; | %z; ' >
<!ENTITY % crystalIntegerBuiltin ' %z; ' >
<!-- =======================================================-->
<!ENTITY % stringBuiltin ' builtin ( %builtinId; | %atomStringBuiltin; | %bondStringBuiltin; | %crystalStringBuiltin; ) #IMPLIED ' >
<!ENTITY % floatBuiltin ' builtin ( %atomFloatBuiltin; | %bondFloatBuiltin; | %crystalFloatBuiltin; ) #IMPLIED ' >
<!ENTITY % integerBuiltin ' builtin ( %atomIntegerBuiltin; | %crystalIntegerBuiltin; ) #IMPLIED ' >
<!ENTITY % coordinate2Builtin ' builtin ( %atomCoordinate2Builtin; ) #IMPLIED ' >
<!ENTITY % coordinate3Builtin ' builtin ( %atomCoordinate3Builtin; ) #IMPLIED ' >
<!-- =======================================================-->
<!-- ELEMENTS for widely used data primitives -->
<!-- =======================================================-->
<!ELEMENT string (#PCDATA)>
<!ATTLIST string %title; %id; %stringBuiltin; %dictRef; %convention; >
<!ELEMENT link (#PCDATA)>
<!ATTLIST link %title; %id; %simpleLink; %convention; >
<!ELEMENT float (#PCDATA)>
<!ATTLIST float %title; %id; %floatBuiltin; %min; %max; %units; %unitsRef; %dictRef; %convention; >
<!ELEMENT integer (#PCDATA)>
<!ATTLIST integer %title; %id; %integerBuiltin; %min; %max; %units; %unitsRef; %dictRef; %convention; >
<!ELEMENT stringArray (#PCDATA)>
<!ATTLIST stringArray %title; %id; %stringBuiltin; %size; %min; %max; delimiter CDATA #IMPLIED %dictRef; %convention; >
<!ELEMENT floatArray (#PCDATA)>
<!ATTLIST floatArray %title; %id; %floatBuiltin; %size; %min; %max; %units; %unitsRef; %dictRef; %convention; >
<!ELEMENT integerArray (#PCDATA)>
<!ATTLIST integerArray %title; %id; %integerBuiltin; %size; %min; %max; %units; %unitsRef; %dictRef; %convention; >
<!ELEMENT floatMatrix (#PCDATA)>
<!ATTLIST floatMatrix %title; %id; %rows; %columns; %min; %max; %units; %unitsRef; %dictRef; %convention; >
<!ELEMENT coordinate2 (#PCDATA)>
<!ATTLIST coordinate2 %title; %id; %coordinate2Builtin; %unitsRef; %dictRef; %convention; >
<!ELEMENT coordinate3 (#PCDATA)>
<!ATTLIST coordinate3 %title; %id; %coordinate3Builtin; %unitsRef; %dictRef; %convention; >
<!ELEMENT angle (#PCDATA)>
<!ATTLIST angle %title; %id; %atomRefs; %angleUnits; %min; %max; %dictRef; %convention; >
<!ELEMENT torsion (#PCDATA)>
<!ATTLIST torsion %title; %id; %atomRefs; %angleUnits; %min; %max; %dictRef; %convention; >
<!ELEMENT list ANY>
<!ATTLIST list %title; %id; >
<!-- =======================================================-->
<!-- ELEMENTS for chemical and crystallographic concepts -->
<!-- =======================================================-->
<!-- NOTE for elements which have element-specific values for the builtin attribute, those values are already listed as entities -->
<!-- =======================================================-->
<!ELEMENT molecule ANY>
<!ATTLIST molecule %title; %id; %count; %dictRef; %dictRef; %convention; >
<!-- ========================================================-->
<!ELEMENT formula ANY>
<!ATTLIST formula %title; %id; %count; %dictRef; %convention; >
<!-- ========================================================-->
<!ELEMENT atom ANY>
<!ATTLIST atom %title; %id; %count; %dictRef; %convention; >
<!-- .......................................................-->
<!ELEMENT atomArray ANY>
<!ATTLIST atomArray %title; %id; %dictRef; %convention; >
<!-- ========================================================-->
<!ELEMENT bond ANY>
<!ATTLIST bond %id; %atomRefs; %dictRef; %convention; >
<!-- ========================================================-->
<!ELEMENT bondArray ANY>
<!ATTLIST bondArray %id; %dictRef; %convention; >
<!-- ========================================================-->
<!ELEMENT electron ANY>
<!ATTLIST electron %id; %count; %dictRef; %convention; >
<!-- ========================================================-->
<!ELEMENT reaction ANY>
<!ATTLIST reaction %id; %dictRef; %convention; >
<!-- ======================================================= -->
<!ELEMENT crystal ANY>
<!ATTLIST crystal %title; %id; %dictRef; %convention; >
<!-- ======================================================= -->
<!ELEMENT sequence ANY>
<!ATTLIST sequence %title; %id; %dictRef; %convention; >
<!-- ======================================================= -->
<!ELEMENT feature ANY>
<!ATTLIST feature %title; %id; %dictRef; %convention; >
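In this DTD the permitted values of @builtin depend on the element that carries the attribute (for example, atomId is listed among the string builtins and x3 among the float builtins). The sketch below transcribes those lists into a lookup table and reports any element whose @builtin value is not allowed for its type. It is an illustration in Python, not a substitute for a validating parser; the names `ALLOWED` and `builtin_violations` and the test fragment (including the formalCharge value) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Value lists transcribed from the parameter entities in the DTD.
ATOM_INTEGER = {"isotope", "formalCharge", "hydrogenCount",
                "nonHydrogenCount", "atomParity"}
STRING_BUILTIN = {"id", "elementType", "atomId", "residueType", "residueId",
                  "atomRef", "atomRefs", "order", "stereo", "spacegroup"}
FLOAT_BUILTIN = ({"x2", "y2", "x3", "y3", "z3", "xFract", "yFract", "zFract",
                  "occupancy", "length", "acell", "bcell", "ccell",
                  "alpha", "beta", "gamma", "z"} | ATOM_INTEGER)
INTEGER_BUILTIN = ATOM_INTEGER | {"z"}

ALLOWED = {
    "string": STRING_BUILTIN, "stringArray": STRING_BUILTIN,
    "float": FLOAT_BUILTIN, "floatArray": FLOAT_BUILTIN,
    "integer": INTEGER_BUILTIN, "integerArray": INTEGER_BUILTIN,
    "coordinate2": {"xy2"},
    "coordinate3": {"xyz3", "xyzFract"},
}


def builtin_violations(root):
    """List (element name, builtin value) pairs the DTD would not allow."""
    problems = []
    for el in root.iter():
        name = el.tag.rsplit("}", 1)[-1]  # drop any {namespace} prefix
        value = el.get("builtin")
        if value is not None and name in ALLOWED and value not in ALLOWED[name]:
            problems.append((name, value))
    return problems


# Illustrative fragment; atomId on <integer> is not in the integer list.
SAMPLE = """
<atom id="a_1">
  <integer builtin="atomId">1</integer>
  <float builtin="x3">-1.9901</float>
  <string builtin="elementType">H</string>
  <integer builtin="formalCharge">0</integer>
</atom>
"""

violations = builtin_violations(ET.fromstring(SAMPLE))
print(violations)  # [('integer', 'atomId')]
```

The IE schema in Appendix C declares a single enumeration for @builtin on all elements, so it accepts `<integer builtin="atomId">` (as used in Appendix B), whereas a check derived from this DTD flags it.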
Appendix E - Glossary
References