This is the old DTD documentation, not necessarily applicable to the Schema documentation.
Introduction
Scope of this document
These guidelines describe Version 3.7 of the RSC Primary Articles DTD.
Feedback and updates
We expect to learn a considerable amount about our developing XML application from the routine encoding of articles. Please let us know of any problems you encounter in using these instructions while trying to encode articles using the DTD provided. This will help us to improve both the application and its associated documentation.
We plan to issue updates to the DTD and documentation at regular, planned, intervals. You will be notified of these updates in advance, so that you can allocate resources to deal with any changes to data capture instructions or rendering software that might be required.
We intend to introduce the next version of the DTD in August 2000, with a preliminary version being available for comment and testing two weeks beforehand.
Format of this document
This document fulfils two functions. As well as containing instructions on the conventions to follow, it acts as an example of the results that are expected, being written to conform to the RSC Primary Articles DTD Version 3.7.
Since this document is fully XML-conformant, it can be browsed in Internet Explorer 5.0, or converted to HTML, using the XSLT style sheet provided.
Scope of the data capture work
The initial objective is to capture all the text within each article which can be encoded in SGML/XML (see next section). The DOCTYPE and document element will always be <article>. Within this, the <art-admin> (which holds the article's unique manuscript number), <published> (for articles which have already appeared in print), <art-front>, <art-body> and <art-back> element types will be routinely used, with an occasional <appmat>.
SGML/XML encoding
As far as possible, all the information in the articles presented should be encoded in SGML and included in the resulting document. Obvious exceptions are figures, which should be referenced as external entities in the standard manner (see Graphics below).
Both tables and equations are likely to be more difficult. If possible, these should be encoded in SGML, but we accept that there will be cases where this is not feasible (or even possible) because of the complexity of the data or inadequacies in the DTD as currently drafted. In these cases the relevant object should be treated as a graphic. A particular example is where a table contains graphics spanned across rows or columns: this would be impossible to render accurately from the SGML. See Tables and Equations below for specific guidelines.
Articles should conform to XML as well as SGML conventions. This means that:
an XML declaration must be provided at the start of the article
processing instructions must be terminated by "?>"
empty elements must be terminated by "/>" (i.e. for colspec, ugraphic, icgraphic)
end-tags should always be provided, except for empty elements
element and attribute names must be entered in lower case, as per their definitions in the DTD
attribute values should always be quoted
A variety of tools can (and should) be used to check that articles consist of valid SGML/XML. The nsgmls program will check for SGML conformance. There is a wide variety of free or inexpensive XML-aware software. For example, if you open an XML document in Internet Explorer 5, its built-in XML parser will check the document for validity and report any errors.
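As a rough illustration of the conventions listed above, the following Python sketch checks an article for an XML declaration, well-formedness and lower-case names. It is not a substitute for nsgmls or a validating parser: it does not read the DTD, so it cannot report validity errors. The function name check_article is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def check_article(xml_text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    # An XML declaration must be the very first thing in the file.
    if not xml_text.startswith("<?xml"):
        problems.append("missing XML declaration")
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Missing end-tags, unquoted attribute values and similar errors
        # all make the document not well-formed.
        return problems + [f"not well-formed: {exc}"]
    # Element and attribute names must be in lower case.
    for elem in root.iter():
        for name in [elem.tag, *elem.attrib]:
            if name != name.lower():
                problems.append(f"name not in lower case: {name}")
    return problems
```

For example, check_article('<article><P/></article>') reports both the missing declaration and the upper-case element name.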
File naming conventions
All manuscripts will have a unique identifier, assigned by RSC, e.g. a901234h. As well as being used to name the file containing the encoded article, this identifier will be encoded as the <ms-id> element within the article.
Graphic types: (from RSC)
RK: We might be better splitting this into data capture sections and final supply sections. At data capture we will also require maths captured as TeX, as e.g. b000114c-t1.tex
The following filename styles should be supplied to the RSC:
for ms-id use the form a908765g
for the SGML/XML files (and PDF in the future) use filenames in the form a908765g.xml, .pdf
for graphics generated at data capture/typesetting (maths, table images where required) use the form a908765g-t1.tif, and increment the numbers as t2, t3, t4, etc. through the document.
Lower-case should be used.
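The filename styles above could be generated along the following lines. This is a sketch only: the pattern used for manuscript identifiers (one letter, six digits, one letter) is an assumption inferred from the examples in this document, and the function names are invented for the illustration.

```python
import re

# Assumed shape of a manuscript identifier, inferred only from examples such
# as a901234h, a908765g and b000114c: letter, six digits, letter, lower case.
MS_ID = re.compile(r"[a-z][0-9]{6}[a-z]")


def article_filename(ms_id: str, ext: str = "xml") -> str:
    """Name of the SGML/XML (or, in future, PDF) file for a manuscript."""
    if not MS_ID.fullmatch(ms_id):
        raise ValueError(f"unexpected manuscript identifier: {ms_id!r}")
    return f"{ms_id}.{ext}"


def typeset_graphic_filenames(ms_id: str, count: int) -> list[str]:
    """Names for graphics generated at data capture: -t1.tif, -t2.tif, ..."""
    if not MS_ID.fullmatch(ms_id):
        raise ValueError(f"unexpected manuscript identifier: {ms_id!r}")
    return [f"{ms_id}-t{n}.tif" for n in range(1, count + 1)]
```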
File Delivery
We require, for each paper:
An SGML/XML file named as *.xml (file width: maximum 1000 characters)
Any graphics created for inline/displayed maths, as LZW-compressed TIFF files at 600 dpi. The SGML/XML file will require call-outs to these images. The images should be named as specified above (e.g. a901234h-t1.tif).
Should we say any more about graphics formats, for figures, colour, other resolutions etc.?
The images as supplied by the RSC (figures, schemes, etc)
"also TeX" [RK]
In other words, all relevant files should be supplied. Each document and its associated files should be delivered as a zip file, named as above (e.g. a901234h.zip).
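A delivery archive of this kind might be assembled as follows. This sketch assumes that every file belonging to a paper starts with its manuscript identifier, which holds for the examples given here but is not stated as a rule; package_article is a name invented for the illustration.

```python
import zipfile
from pathlib import Path


def package_article(ms_id: str, source_dir: str, dest_dir: str) -> Path:
    """Zip every file in source_dir whose name starts with ms_id."""
    src = Path(source_dir)
    archive = Path(dest_dir) / f"{ms_id}.zip"
    # Collect the file list before the archive itself is created.
    files = sorted(p for p in src.iterdir()
                   if p.is_file() and p.name.startswith(ms_id))
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store files flat, without directory components.
            zf.write(path, arcname=path.name)
    return archive
```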
Form of PUBLIC identifiers
PUBLIC identifiers should be used throughout.
In addition, each PUBLIC identifier should be followed by a SYSTEM identifier giving a URL that locates the resource in question. This 'belt and braces' strategy will allow articles to be treated as valid XML (XML requires a SYSTEM identifier), while offering us the flexibility of using SGML-aware software to interpret the PUBLIC identifiers in different ways, as necessary.
Thus the DOCTYPE declaration at the head of each article should always take the form:
<!DOCTYPE article PUBLIC "-//RSC//DTD RSC Primary Article DTD 3.7//EN" "http://www.rsc.org/dtds/rscart37.dtd">
PUBLIC identifiers should be constructed using the general format:
"RSC// [MS number] [object src]"
where the object src is the element type with number:
The names assigned within each article for the external entities it references should reflect the last component of the entity's PUBLIC identifier, e.g.
<!ENTITY eqn3 PUBLIC "RSC// a706828h eqn3" ...
Form of SYSTEM identifiers
The SYSTEM identifiers (i.e. filenames) assigned to each external entity should consist of the article's manuscript number followed by the entity's name, with a suitable suffix, e.g.:
<!ENTITY ugt3 PUBLIC "RSC// a706828h ugt3" "a706828h-t3.tif" NDATA tiff>
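The two identifier forms can be combined mechanically. The sketch below builds an entity declaration from a manuscript number and entity name. Because the examples in this document use a filename suffix (t3, u1) that differs from the entity name (ugt3, ugr1), the suffix is passed in separately rather than derived. The function names and the default notation and extension are assumptions for illustration.

```python
def public_id(ms_id: str, entity_name: str) -> str:
    """PUBLIC identifier in the form "RSC// [MS number] [object src]"."""
    return f"RSC// {ms_id} {entity_name}"


def entity_declaration(ms_id: str, entity_name: str, suffix: str,
                       notation: str = "tiff", ext: str = "tif") -> str:
    """External entity declaration with PUBLIC and SYSTEM identifiers."""
    system_id = f"{ms_id}-{suffix}.{ext}"
    return (f'<!ENTITY {entity_name} PUBLIC "{public_id(ms_id, entity_name)}" '
            f'"{system_id}" NDATA {notation}>')


print(entity_declaration("a706828h", "ugt3", "t3"))
```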
Documents relating to the RSC DTD
The DTD itself is in the file rscart36.dtd. A number of other files are required before documents will parse against the DTD. They should all be stored in the same directory as the DTD itself, apart from the entities files (*.ent), which should be stored in a subdirectory named entities. We use Internet Explorer 5 as our (XML) parser. We suggest suppliers use the same parser.
SGML Declaration
An SGML Declaration suitable for use with this DTD is in the file rscxml33.dcl. This Declaration allows an XML-encoded article to be processed by SGML software. It specifies features such as case-sensitivity for element and attribute names, quoting of attribute values, XML-style processing instructions and empty element syntax, and Unicode support.
Catalog file
The catalog file rscart3s.cat is in the standard OASIS catalog file format. It resolves all the PUBLIC identifiers declared in the DTD, as well as the PUBLIC identifier of the DTD itself. This catalog file invokes the SGML version of the DTD, rather than the XML version. It uses the file rscsgm36.dtd to set up the DTD's parameter entities for SGML. If required, an updated rscart3s.cat can be used to override the DTD's online SYSTEM identifier and point instead to a local copy.
Table support
The file calstab1.dtd contains the OASIS-supported DTD fragment which supports the interoperable CALS table model subset. Additions and changes to this model are declared in the body of the DTD itself, not here.
Entity declarations
Two files containing character entities are provided. One of these contains mappings of characters to numeric values that conform to Unicode 2.0 (rsc_x.ent). This is for use with the default XML interpretation of the DTD. It should be noted that we plan to use Unicode Combining Characters to partially solve the problem of 'one character over another'. This means that rendering software will need to support Combining Characters, ideally in a generalized manner.
The other file maps exactly the same characters to SDATA entities, and is for use with the SGML interpretation of the DTD (rsc_s.ent).
If an article contains any characters which are not in the RSC set, they should be declared in the article's internal DTD subset and RSC should be alerted to the need to add them to the standard set.
Character mappings file. RSC maintains information about special characters in a character mappings file (charmaps.xml). The entity declarations described above are generated from this file by XSLT style sheets. Characters in this file are categorized into one of the following classes:
non-ASCII character
ASCII character
ASCII diacritic
ligature
non-ASCII diacritic
combining
RSC character
These categories help to ensure that each character is mapped to the most appropriate result when different types of output encoding are generated:
ASCII: This involves:
mapping all diacritical characters [which fall outside the ASCII 255-character range] to the corresponding single letter
mapping all ligatures [which fall outside the ASCII 255-character range] to the corresponding pair of letters
mapping all other characters which fall outside the ASCII 255-character range to one or more ASCII characters, if there is an ASCII equivalent (e.g. —)
suppressing all combining characters
suppressing all other characters which fall outside the ASCII 255-character range
Unicode: This involves:
suppressing all combining characters
suppressing all RSC 'special use' characters which fall outside the Unicode standard character range
HTML: This involves:
mapping all letters plus combining characters to a suitable image file
mapping all RSC 'special use' characters which fall outside the Unicode standard character range to a suitable image file
mapping all Unicode characters which don't display in browsers to a suitable image file
XML: This involves:
outputting all non-ASCII characters as entity references, using the <name> element from charmaps.xml
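The ASCII output rules described above can be sketched in Python as follows. This is an illustration, not the XSLT that RSC actually uses: the two lookup tables are hypothetical stand-ins for the information held in charmaps.xml, and Unicode decomposition is used here as a convenient way to find the base letter of a diacritical character.

```python
import unicodedata

# Hypothetical tables; the real mappings are maintained in charmaps.xml.
LIGATURES = {"\u0153": "oe", "\u0152": "OE", "\ufb01": "fi", "\ufb02": "fl"}
EQUIVALENTS = {"\u2014": "--", "\u2013": "-", "\u2026": "..."}


def to_ascii(text: str) -> str:
    """Apply the ASCII output rules to a string, character by character."""
    out = []
    for ch in text:
        if ord(ch) <= 255:
            out.append(ch)                 # inside the 255-character range
        elif ch in LIGATURES:
            out.append(LIGATURES[ch])      # ligature -> pair of letters
        elif ch in EQUIVALENTS:
            out.append(EQUIVALENTS[ch])    # other character with an equivalent
        elif unicodedata.combining(ch):
            continue                       # combining characters are suppressed
        else:
            base = unicodedata.normalize("NFD", ch)[0]
            if base != ch and ord(base) <= 255:
                out.append(base)           # diacritic -> single base letter
            # anything else outside the range is suppressed
    return "".join(out)
```

The order of the tests matters: ligatures and explicit equivalents are looked up before the generic diacritic rule, so that a character listed in either table is never reduced to a single letter by accident.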
General conventions
Guidelines
Style guidelines. The style guidelines for each journal describe general conventions for article structure. Use these as a guide to the structure and content of articles.
In particular, while encoding articles these guidelines should be used to infer when a change of type style (e.g. to bold) implies a specific element type, as discussed below under Cross-references.
Semantics of the table model. The table model used is developed from the interoperable CALS table model subset supported by OASIS 1a-b. The OASIS web site contains a description of the generic CALS table model 1a, and a description of the semantics of this interoperable subset 1b.
Version 3.7 of the DTD simplifies the level of CALS table support that is required by removing the <spanspec> element type (which is not part of the interoperable subset). This element has been found to be unnecessary, since both horizontal and vertical spans within tables can be represented without it. (<colspec> provides all the information that is required for horizontal spanning, while the MOREROWS attribute supports vertical spans.) Version 3.7 also adds support for rotated tables by including the ORIENT attribute, which can be set to "land" to indicate a landscape, i.e. rotated, table.
Article structure
Each article consists of front matter, body matter and back matter.
The article itself can have a type attribute, which specifies what type of article it is. This table summarises the codes to be used for each type of article, and the types of article that are currently liable to appear in each journal published by the RSC. (See below for a key to the journal codes in this table.)
Table: Article type codes and usage
Journal columns: PO, EM, GC, DT, JM, P1, P2, JC, CC, FT, AN, AC, JA, MC, FD, NP, CS, IC/OC/PC, NJ, RC, QU, CE, GT
(The journal column in which each X falls is not recoverable from this copy; only the number of journals marked X is given for each article type.)
Article type                     Code   Journals marked X
Papers                           ART    14
Comms                            COM    10
Perspectives                     PER    2
Letters                          LET    5
Feature Articles                 FEA    4
Editorial                        EDI    13
Synopsis                         SYN    1
Full text                        ART    1
Research Articles                RES    1
Discussions                      DIS    1
Review Articles                  REV    11
Book Reviews                     BKR    4
News                             NWS    5
News articles                    NAR    1
Highlights                       HIG    4
Interviews                       INT    1
Technical note                   TEC    1
Events/Conference Diary          CNF    1
Conference reports               CRP    2
Synthetic abstract               SAB    1
Cover Feature                    COV    1
Focus                            FOC    2
Viewpoints                       VPT    2
Invited Lecture                  LEC    1
Keynote Article                  KEY    1
Hot off the Press Articles       HOT    1
Atomic Spectrometry Update       ASU    1
Analytical Methods Committee     AMS    1
Inter-laboratory Note            ILN    1
Critical Review                  CRV    1
Tutorial Review                  TRV    1
Glow Discharge Paper             GDP    1
Glow Discharge Comm              GDC    1
Glow Discharge Review            GDR    1
Glow Discharge News Article      GDN    1
Glow Discharge Technical Note    GDT    1
Front matter
The front matter consists of <art-admin>, which holds the article's unique manuscript number, <published>, which contains details of the journal, volume, issue in which the article has been printed and the relevant pagination details, and <art-front>, which is the front matter proper.
For the date on which a revised version of an article was issued, use the same date format, with role="revised".
Authors - we would like the corresponding author to be identified. There's no need to mark the others as 'princ'.
For affiliations, use <org><orgname></orgname></org> for the organisation and <address></address> for the address. Although the <org> group does contain its own <address> element, this shouldn't be used for encoding the articles. We don't require any org ids.
The <published> element should be set with the attribute type="print", along with the journal code; the other pubfront subelements should be left blank.
The body of each article consists of an <art-body>, containing one or more <section>s.
These are the top-level structural units within each article: lower levels are represented by <subsect1>, <subsect2>, etc. (N.B. the numbering of section-level element names represents their depth of nesting, not repetition.)
Care should be taken to ensure that the structure of the article, implied by the style of headings, is correctly reflected in the <section> and <subsectN> elements assigned. See <title> for details of heading typestyles.
Appendices
Any appendices to an article are placed within an <appmat> element, between the <art-body> and <art-back> elements. This contains one or more <appendix> elements, each optionally numbered and containing one or more <section>s.
Back matter
The back matter contains an optional <ack> element. This is followed by mandatory <biblist> and <compoundgrp> elements.
This last element is provided as a place to collect together <compound> elements, each of which defines the ID of a chemical compound mentioned in the article, and thus to provide a target for <compoundref> cross-references (which are normally set in bold face: see Cross-references). (The ultimate intention, not to be implemented at this stage, is to provide links back from these <compound> elements to the points in the article where the compound is defined or illustrated.)
Graphics
Graphical objects should be declared as external entities, with a suitable Notation. The RSC application provides a comprehensive set of possible notations, which ought to include all the image formats encountered. Let us know if any new image formats are encountered.
External entity declarations should include PUBLIC identifiers as well as SYSTEM identifiers, e.g.
<!ENTITY ugr1 PUBLIC "RSC// a904043i ugr1" "a904043i-u1.tif" NDATA tiff>
Graphics take the following attributes:
ID: a unique ID for this graphic (see notes below on assigning IDs) (required)
src: the entity which contains the graphic (see notes above on external entities)
height:
width:
pos: "float" for floating graphics; otherwise "fixed". "float" should be used for graphics marked as "A" blocks, while "fixed" should be used for "B" blocks and for graphics appearing in the body of the text. Graphics appearing within tables or equations should be assumed to be fixed.
Chemical formulae, equations, symbols for which no character entity is provided in the DTD and tables which are too complex to encode as XML should all be encoded as a <ugraphic> element. As well as the standard attributes for graphics, this has a displayed attribute which can take the value "displayed" (which indicates that the graphic should be set off from the surrounding text) or "inline" (which means that the graphic should form part of the current line).
Assigning unique id's
In order to make id's unique within each article, a prefix should be added to the identifier assigned by the author:
Table 1: id prefixes for different classes of target
Table footnotes should be given an id which is a combination of the table's id and a unique id for the footnote within that table, e.g. tab2fna. Table footnotes should be given letters (a, b, c, etc.).
typesetter-generated graphic (e.g. equations and tables which cannot be encoded in SGML/XML): ug
Thus, for example, a citation referred to in the paper as 8a should be given the id cit8a, while chemical compound 8•a should acquire the id chem8a.
We are using the number or letter in some of the id's to generate some of the numbering within the HTML article: for affiliations, equation numbering and table footnote lettering the aff, eqn, table fn should all be given the literal values that would appear in the text, e.g. affa, affb; eqn1, eqn2; tab1fna, tab3fnc. For the remaining id's a unique number or letter will be sufficient.
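The id examples above can be reproduced with a small helper. This sketch uses only the prefixes that appear in the examples in this section (cit, chem, aff, eqn, tab...fn); the dictionary keys and function names are invented for the illustration and are not part of the DTD.

```python
import re

# Prefixes taken from the examples in the text; the keys are hypothetical.
PREFIXES = {
    "citation": "cit",
    "compound": "chem",
    "affiliation": "aff",
    "equation": "eqn",
}


def make_id(kind: str, label: str) -> str:
    """Prefix the author's label, dropping characters such as the dot in 8.a."""
    cleaned = re.sub(r"[^0-9A-Za-z]", "", label)
    return PREFIXES[kind] + cleaned


def table_footnote_id(table_no: int, letter: str) -> str:
    """Combine the table's id with the footnote letter, e.g. tab2fna."""
    return f"tab{table_no}fn{letter}"
```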
Links and cross-references
Internal cross-references within an article should use the standard SGML/XML ID-IDREF mechanism. To enforce this, we have specified as #REQUIRED the id attribute for all the elements that cross-references might point to. It is not practicable to do the same for pointer elements, since their target is not always present. To allow for this, the idrefs attribute is not mandatory. Instead, a presence attribute is provided. When a linking element has no target, this attribute should always be specified, with the value presence="missing".
This table summarises the element types which indicate cross-references, and the target element type for each.
Table 3: Mapping of cross-reference element types to target element types
Cross-reference element type    Target element type
<compoundref>                   <compound>
<textref>                       any textual element with an ID attribute
<figref>                        <figure>
<schemref>                      <scheme>
<plateref>                      <plate>
<chartref>                      <chart>
<eqnref>                        <equation>
<boxref>                        <box>
<tableref>                      <table-entry>
<citref>                        <citgroup>
<fnoteref>                      <footnote>
<affref>                        <aff>
One specific point to note is that <citref> does not point to a <citation> or <journalcit> element: instead it points to <citgroup>. This design allows any number of citations to occur within a single numbered or sub-numbered part of a References list.
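The ID-IDREF rules can be checked mechanically. The following sketch lists pointer elements whose idrefs do not resolve, and pointer elements that have neither idrefs nor presence="missing". It treats the cross-reference element types named in Table 3 as the full set of pointers, which is an assumption, and it does not check that a target is of the expected element type.

```python
import xml.etree.ElementTree as ET

# Pointer element types as listed in Table 3.
POINTERS = {"compoundref", "textref", "figref", "schemref", "plateref",
            "chartref", "eqnref", "boxref", "tableref", "citref",
            "fnoteref", "affref"}


def unresolved_references(xml_text: str) -> list[str]:
    """Report pointer elements that break the idrefs/presence conventions."""
    root = ET.fromstring(xml_text)
    ids = {e.get("id") for e in root.iter() if e.get("id")}
    problems = []
    for elem in root.iter():
        if elem.tag not in POINTERS:
            continue
        idrefs = elem.get("idrefs")
        if idrefs is None:
            # A pointer with no target must say so explicitly.
            if elem.get("presence") != "missing":
                problems.append(f'<{elem.tag}> has no idrefs and no presence="missing"')
            continue
        for ref in idrefs.split():
            if ref not in ids:
                problems.append(f"<{elem.tag}> points to unknown id {ref}")
    return problems
```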
In the unlikely event that an external link to another article (also encoded in SGML/XML) needs to be made, the general-purpose
<link> element type is provided. This implements the Text Encoding Initiative (TEI) Extended Pointer mechanism, which allows all or part of a document to become the target of a link. It is anticipated that only the ID-based part of the TEI Extended Pointer syntax would be required in practice.
Do not use the <link> element without checking with RSC first. The linking strategy described here is likely to be reviewed once the W3C's XLink proposal reaches Recommendation status.
Recognising cross-references. This table summarises typographical conventions which are often used to represent various types of cross-reference. Where a change of font style indicates such a cross-reference, it should always be marked up as such. In such cases, the cross-reference should not also be marked up as a change of font style.
Table 4
type style     data type                           cross-reference type
superscript    arabic no. [+ letter suffix]        citref
superscript    letter                              affref
superscript    symbol                              fnoteref
bold           numbers, letters, roman numerals    compoundref
Numbering
For the present, numbers should be included in the <no> element if they are required. In the longer term, we plan to support the auto-numbering of sections by the addition of a single attribute. Once this is in place, it will no longer be necessary to number sections specifically.
There is no need (and no opportunity!) to number figures, schemes, boxes or plates. Suitable prefixes and numbers (e.g. "Fig 1.") will be supplied by style sheets. Other concepts (e.g. citations, equations, appendices, and chemical compounds) have an optional <no> element. This does not need to be used where the numbering scheme follows a simple sequence of arabic numbers, since the entries will be auto-numbered in this case. If any instance of a given element type has a non-standard number within an article, then the <no> element should be specified for all instances of that element type.
However, all of these concepts are allowed to have an ID, and some require one — these IDs still need to be specified even if the title or heading itself can be auto-numbered.
We can't (yet) auto-number tables in appendices, which require numbers in the form A1, A2, etc.
Low-level elements
Emphasis and font style elements
Changes in font style should be marked up with the appropriate emphasis tags unless they indicate a specific concept, as discussed above under Cross-references.
Individual elements can be used to mark bold text, italic text, bold italic, underlined text, SMALL CAPS, superscript and subscript. They can also be used in combination to represent, for example, superscript bold text.
Footnotes. Footnotes should be placed just after the first <fnoteref>.
All footnote characters should be auto-generated. In text, they follow the order:
dagger, double dagger, section sign ("curly s"), paragraph mark ("backwards P"), double vertical line, double asterisk, 2 daggers, 2 double daggers, 2 section signs, 2 paragraph marks, 2 double vertical lines, 3 asterisks, 3 daggers, etc.
In table footnotes, they just appear as a, b, c, d, etc., where these letters are taken from the end of the id attribute's value.
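The footnote sequence can be generated rather than looked up. The sketch below assumes the cycle implied by the order given above: five symbols followed by asterisks, with each symbol repeated once more on every pass and the asterisk always one ahead (double asterisk on the first pass, three on the second). The specific Unicode characters chosen for each mark are an assumption.

```python
# Assumed glyphs: dagger, double dagger, section sign, paragraph mark,
# double vertical line.
SYMBOLS = ["\u2020", "\u2021", "\u00a7", "\u00b6", "\u2016"]


def footnote_mark(n: int) -> str:
    """Return the mark for the n-th text footnote (n counts from 1)."""
    if n < 1:
        raise ValueError("footnote numbers start at 1")
    pass_no, pos = divmod(n - 1, len(SYMBOLS) + 1)
    repeat = pass_no + 1
    if pos < len(SYMBOLS):
        return SYMBOLS[pos] * repeat
    # Sixth position in each pass: asterisks, one more than the other marks.
    return "*" * (repeat + 1)


def table_footnote_letter(footnote_id: str) -> str:
    """Table footnote letter, taken from the end of the id (e.g. tab2fna)."""
    return footnote_id[-1]
```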
Text. Spacing:
Equation spacing: +, minus, divide and times are spaced on either side when in an equation, i.e. when the operator falls between two digits (e.g. 4 + 4). When the operator is followed by a single digit there is no space (e.g. +4). This also applies to proportional to, plus-minus, similar to, approx. equal to, >, < and their >= variants.
multiple citrefs shouldn't be spaced: <citref idrefs="cit1 cit4 cit5 cit12">1, 4, 5, 12</citref> should be: <citref idrefs="cit1 cit4 cit5 cit12">1,4,5,12</citref>
Figure, scheme, etc references should be placed at the end of the paragraph in which they are first referenced.
<p> in titles to be used for Green Chemistry font change. Second <p> of GC titles will contain the details for the smaller title content. Simple titles don't need to use p at all.
For elements where the content model is empty (ugraphic, colspec, icgraphic) the elements need a closing solidus for XML: <colspec colname="1" colwidth="2.82*" align="left"/>
Compoundrefs: these can take any form, but the ids don't have to exactly match, e.g. <compoundref idrefs="chem61a">6·1a</compoundref>
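The rule that multiple citrefs should not be spaced can be applied automatically. This sketch removes white space after commas inside <citref> content only, leaving the idrefs attribute and the surrounding text alone. It uses a regular expression rather than an XML parser, so it assumes that citref elements are not nested; the function name is invented for the example.

```python
import re

CITREF = re.compile(r"(<citref\b[^>]*>)(.*?)(</citref>)", re.DOTALL)


def tighten_citrefs(xml_text: str) -> str:
    """Remove spaces after commas within the content of each <citref>."""
    def fix(match: re.Match) -> str:
        start, content, end = match.groups()
        return start + re.sub(r",\s+", ",", content) + end
    return CITREF.sub(fix, xml_text)
```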
Tables
Tables will normally appear inline, marked up according to CALS-compatible SGML. The standard CALS attributes should be used to render the table in a form that is as close as possible to the printed result. This includes, but is not limited to, the relative widths of columns, spanning of rows and columns, and the use of lines to separate headings. The specific conventions listed below are intended to be compatible with the approach supported by Adept's table editor:
relative column widths: use <colspec> with colname="n" and colwidth="X.XX*", where X.XX is a ratio of 1.00, the default column width;
individual cells (<entry> element type) refer to their colspec with colname="n"
spanning: horizontal spans use namest="n" nameend="m" within <entry>; vertical spans use morerows="n" within <entry>
horizontal alignment: align="center", "right", "justify", "char" and "left" (default) within <entry>. When the align attribute is not specified for <entry>, the value in the appropriate <colspec> element will be used as a fallback
vertical alignment: valign="top", "middle", and "bottom" (default - !!) within <entry>. When the valign attribute is not specified for <entry>, the value in its parent <row> element will be used as a fallback, and failing that the value in the <row>'s own parent (<thead>, <tfoot> or <tbody>)
rules: in general, do not mark up ruler lines within tables. Default style rules will insert a rule below headings which span more than one cell. If absolutely necessary, use standard CALS conventions, i.e. rowsep="0" for e.g. bottom rule (?); "1" for vertical rule (?); ... and flag this as an exception
N.B. overall table width, row shading and non-standard row heights (other than spans) are recorded by Adept as processing instructions, and so are not encoded in the SGML
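As an illustration of the column-width and spanning conventions above, the following sketch builds <colspec> and spanning <entry> elements with Python's standard ElementTree module. The helper names are invented for the example, and only attributes mentioned in this section are used.

```python
import xml.etree.ElementTree as ET


def colspec(colname: int, ratio: float, align: str = "left") -> ET.Element:
    """Empty <colspec> with a relative width such as 2.82*."""
    return ET.Element("colspec", {
        "colname": str(colname),
        "colwidth": f"{ratio:.2f}*",
        "align": align,
    })


def spanning_entry(text: str, first_col: int, last_col: int,
                   morerows: int = 0) -> ET.Element:
    """<entry> spanning columns first_col..last_col and morerows extra rows."""
    attrs = {"namest": str(first_col), "nameend": str(last_col)}
    if morerows:
        attrs["morerows"] = str(morerows)
    entry = ET.Element("entry", attrs)
    entry.text = text
    return entry


# Empty elements are serialised with the closing solidus required by XML.
print(ET.tostring(colspec(1, 2.82), encoding="unicode"))
print(ET.tostring(spanning_entry("Yield (%)", 2, 4, morerows=1), encoding="unicode"))
```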
However, tables will sometimes be too complex to represent in this way, and so will be prepared as a graphic. To deal with this variation, a 'cover element' <table-entry> is provided, which contains either an inline <table> entry or a <ugraphic>. It is <table-entry> which requires a unique ID for <tableref> elements to point to, and which contains a <title> element.
One side-effect of this approach is that un-numbered tables can simply be encoded as <table>. From version 3.3 onwards, <table> can appear within text and between paragraphs.
Chemistry
Chemical compounds and simple formulae can often be represented as inline markup. <sup> and <inf> can be used to shift text, and <overbar> and <underbar> to place rules above or below chemical symbols. The character entity sets provided as part of the DTD (especially the ISO Chemistry set and the custom RSC set) support most chemical symbols that will be encountered. The <stack> element type can be used to encode the situation where one character appears directly above another.
Where chemical formulae are too complex to render as inline SGML, an inline or displayed <ugraphic> should be used instead.
Equations
Equations may appear inline, marked up in SGML using the tools available such as <fraction>: 1/3.
However, equations will frequently be too complex to represent in this way, and so will be prepared as a graphic. To deal with this variation, a 'cover element' <equation> is provided, which contains either an inline <eqntext> entry or a <ugraphic>. <equation> requires a unique ID for <eqnref> elements to point to.
Multi-line text equations can be accommodated by adding another <p>. Within <eqntext>, you should either have no <p> subelements (one-line or inline equations), or nothing but <p> subelements (multi-line equations).
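The rule for <eqntext> content (either no <p> subelements, or nothing but <p> subelements) can be tested as in the sketch below. The function name is an assumption, and the check looks only at the immediate children of <eqntext>.

```python
import xml.etree.ElementTree as ET


def eqntext_is_consistent(eqntext: ET.Element) -> bool:
    """True if <eqntext> has no <p> children, or only <p> children."""
    children = list(eqntext)
    if not any(child.tag == "p" for child in children):
        return True  # one-line or inline equation
    if any(child.tag != "p" for child in children):
        return False  # <p> mixed with other elements
    # Multi-line equation: no loose text is allowed around the <p> elements.
    loose = [eqntext.text] + [child.tail for child in children]
    return not any(text and text.strip() for text in loose)
```

For example, an <eqntext> holding two <p> elements passes, while one that mixes running text with a <p> does not.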
Citations
Where citations follow the standard pattern for journal articles, the <journalcit> element type should be used. In all other cases (including 'difficult' journal article citations, books, theses, computer software, etc.), the more flexible <citation> element type should be used.
<citext> should be used to mark up text within the References section which is not a citation of any kind.
Numbering citations. As noted above in Links and cross-references, the citation number is a property of the enclosing <citgroup> element, not the citation itself. This makes it easy to deal with the case where more than one citation is given under the same reference number. It also allows running text to be mixed with, or indeed take the place of, proper citations.
Note that the expected pattern for numbering citations is to use numbers for top-level entries, and letters for sub-entries. If the citations follow this pattern, the <no> element should not be provided for any <citgroup> element. Instead, nested <citgroup> elements should be used to represent the lower-level citations. (See the source SGML of these instructions for an example of this technique.)
Standard journal citations. Standard journal citations follow this model:
author (at least one)
optional article title
[journal] title
year
volume number
issue number
first page or page range
translation (optional)
Unless stated otherwise, each element should appear exactly once, and elements should appear in the order given. In such cases, <journalcit> can and should be used. The citation should be entered as a series of analysed subelements. No punctuation should be recorded between each component of the citation, and no style markup (e.g. italic for titles; bold for volume numbers) should be included. Punctuation and styling will be applied by the rendering process. Thus the citation:
G.H. Jonker and J.H. Van Santen, Physica, 1950, 16, 337
Non-standard citations. The <citation> element type should always be used for non-standard citations, i.e. those which do not fit the standard model. The type of citation should be specified in the type attribute. Allowed values are:
article (the default value - this doesn't need to be specified)
book
thesis
patent
software
other
This isn't being done at present.
Within citations, the following concepts should always be marked up when they are present:
authors (<citauth>)
titles (<title>)
editors (<editor>)
citpub (<citpub>)
place of publication (<pubplace>)
year of publication (<year>)
journal volume number (<volumeno>)
journal issue number (<issueno>)
the part of the work that is being cited: section, pagination, etc. (<biblscope>)
<citation> elements will be marked up as found, including all punctuation and style changes.
This is an example of a reference to a patent:
S. Iwaya, H. Masumura, Y. Midori, Y. Oikawa and H. Abe, US Patent, 4,404,029, 1983.
Book citations. One particular type of non-standard citation which will frequently occur is a reference to a book, either in whole or in part. Again, <citation> should be used to mark these up. The <editor>, <citpub> and <pubplace> element types will often be required within such citations. A fairly typical, simple, example is:
S. Brooks and B. Johansson, in Handbook of Magnetic Materials, ed. K. H. J. Buschow, 1993, 7th edn.
This should be encoded:
<citation type="book"><citauth><fname>S.</fname><surname>Brooks</surname></citauth> and <citauth><fname>B.</fname><surname>Johansson</surname></citauth>, in <title>Handbook of Magnetic Materials</title>, ed. <editor>K. H. J. Buschow</editor>, <year>1993</year>, 7th edn.</citation>
Note the following:
within <citauth>, analysis is the same as for standard citations. No space is required between the forename and surname because the rendering process will add one
no <it> element is required within the title: it will be rendered as italic
otherwise, all punctuation (i.e. all punctuation between analysed components) is provided exactly as in the source
the edition information does not fit the model for <biblscope>, and so is left as unanalysed text
A good mixed citation example:
<citgroup id="cit5"> <citation>During the preparation of this manuscript, diester <compoundref idrefs="chem1">1</compoundref> was isolated as a minor side product in the base promoted rearrangement of the analogous (<it>R</it>′,<it>R</it>′,<it>R</it>,<it>R</it>)-2,3-butane diacetal (BDA) protected dimethyl tartrate, see: <citauth> <fname>M. T.</fname> <surname>Barros</surname> </citauth> , <citauth> <fname>A. J.</fname> <surname>Burke</surname> </citauth> and <citauth> <fname>C. D.</fname> <surname>Maycock</surname> </citauth>, <title>Tetrahedron Lett.</title>, <year>1999</year>, <volumeno>40</volumeno>, <biblscope>1583</biblscope>.</citation></citgroup>
and a <citext>:
<citgroup id="cit8"> <citext>The strong bias towards axial silylation was seen to fall if the mono sodium alkoxide did <it>not</it> precipitate prior to addition of the silicon halide.</citext></citgroup>
Two other points:
a) Where a citref appears within another citation: we have extended the content model of citelt so that it can contain "m.simple-text", i.e. any element types which can occur within paragraphs. This change should make citelt a much better 'catch-all' for miscellaneous material within citations.
b) Where a citation includes a compoundref and a ugraphic of the compound: the compoundref is allowed, but the ugraphic isn't. We have created a new class 'para-graphic' for these two element types. They can now appear anywhere 'text-elts' can appear, as well as between paragraphs.
RSC journal abbreviations. The journals published by the RSC have the following abbreviations, which can be used within the SGML/XML framework, e.g. in
<journalref> elements:
Table 6
Abbreviation  Journal
AC            Analytical Communications
AN            Analyst
CC            Chemical Communications
CE            Cryst. Eng. Communications
CP            PCCP
CS            Chem. Soc. Reviews
DT            Dalton Transactions
EM            J. Environmental Monitoring
FD            Faraday Discussions
FT            Faraday Transactions
GC            Green Chemistry
GT            Geo. Trans.
IC/OC/PC      Ann Rep (Inorganic, Organic, Physical)
JA            JAAS
JC            JCR
JM            J. Materials Chemistry
MC            Mendeleev
NJ            New Journal of Chemistry
NP            Natural Product Reports
P1            Perkin Transactions 1
P2            Perkin Transactions 2
PO            Pesticide Outlook
RC            RCR
QU            Phys. Chem. Comm.
Lists
Lists can be entered as a
<list>, containing an optional
<head> and any number of
<item> elements. The
type attribute can be used to indicate the type of list. It should take one of the following values:
ordered
bulleted
simple
Note that, since <list> can occur within <item>, it is possible to declare lists nested to any depth.
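The nesting described above can be sketched as follows. The heading and item text are placeholders, and the example assumes that <item> may hold plain text directly alongside a nested <list>:

```xml
<!-- Illustrative only: the head and item text are placeholders -->
<list type="ordered">
<head>Sample preparation</head>
<item>Weigh the sample</item>
<item>Dissolve the sample in one of:
<list type="bulleted">
<item>water</item>
<item>ethanol</item>
</list>
</item>
</list>
```

The inner list sits inside the second <item> of the outer list, which is what allows nesting to any depth.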
General
If an internal reference is in effect impossible to encode, just put the text in and leave out the reference. It would be helpful to advise us in case an amendment to the DTD may be wise, but usually these are one-offs. One recent case had a number of equations in a single ugraphic, itself called scheme 1. In this case it was not possible to add eqnrefs to the scheme.
Appendix A. Alphabetical list of element types
Element definitions
This section contains a definition of every element type in the RSC DTD, including element types which are not required for the data capture work. These additional element types are included for editorial use within RSC, or to support future processing of the encoded articles. They are indicated thus:
RSC internal use only
a. 'anchor': a wrapper round a resource (an image, scheme, table, etc.). An anchor specifies a non-printable external entity which can augment the resource. Where appropriate, it should be represented as a clickable link to navigate to the external entity. Can contain zero or more:
elements representing graphics
equation
box
table-entry
table
src: an entity reference which defines the external entity
in HTML output,
a is represented as an
href attribute on the <a> element which is already wrapped around a graphic resource
above. The top half of a
stack. Contains 'characters only'.
rendered as superscript, before
below
abstract. An abstract of the article. Contains 'text or
paragraphs'.
rule above [and below], with the abstract itself output as a sequence of left-aligned bold
paragraphs
ack. Acknowledgements for the article. Contains 'text or paragraphs'.
title: an optional non-standard title for the acknowledgements section.
preceded by a rule
title is set as an a-heading
if title is not specified, the heading 'Acknowledgements' is output
address. A complete postal address. Can be represented by a
link, or by a sequence of address subelements:
city
postcode
state
country
addrelt
each separated by spacing but no punctuation.
id: a unique identifier for this address element
type: the type of address
address within aff is output in italic
other addresses are not currently styled
each top-level subelement within aff/address is followed by a comma, except for <postcode>s followed by a <country> element which is the last subelement of the address
addrelt. An element within a postal address. Used only when no more specific element type (e.g.
city) is appropriate. Can contain 'simple text'.
id: a unique identifier for this element
see
address
admin-event. A single event relating to the administration of an article, e.g. its receipt, acceptance, or rejection. Provided in versions 3.4 onwards of the DTD as a place-holder for RSC management information. Has a mixed content model, which allows the following subelements within text:
agent
address
date
admin-event (for complex administrative events)
type: the type of administrative event
currently suppressed
advert. An advertisement, i.e. any self-contained block of text which is to be 'dropped in' to a journal issue (including information on grants available, etc.). Contains a
link, or one or more
sections.
id: a unique identifier for this element
type: the type of advertisement
treated as a keep-together block
rules placed either side of it in HTML
aff. An author's affiliation. Contains one or more pairs of:
org (optional)
address (mandatory)
followed by any of the following which apply:
phone
fax
email
url
id: a unique identifier for this affiliation element. See
above for guidance on assigning id's.
affiliations are rendered as a 'small heading'
affiliation codes ('a', 'b', etc.) are auto-generated from the last letter of the aff element's id attribute ('affa', 'affb', etc.). They are rendered as italic superscript, and applied both as a prefix to the affiliation itself, and as a cross-referencing hyperlink from the relevant author(s)
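A minimal sketch of an affiliation follows. The organisation, address and e-mail values are invented placeholders, and the order of the address subelements is an assumption, since this section does not state whether their order is fixed:

```xml
<!-- Hypothetical affiliation: all values are placeholders -->
<aff id="affa">
<org>Department of Chemistry, University of Example</org>
<address><addrelt>Example Road</addrelt><city>Exampleton</city><postcode>EX1 2MP</postcode><country>UK</country></address>
<email>a.author@example.org</email>
</aff>
```

With the id 'affa', the style sheet would be expected to derive the affiliation code 'a', as described above.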
affref. A reference to an author's affiliation. In practice this element is not used, since authors' affiliations are indicated by the
aff attribute on
author.
idrefs: a space-separated list of <aff> identifiers
presence: 'missing' or 'notmissing' (the default value)
no support yet provided
agent. A person playing a role within an
admin-event. Contains one
person element.
role: the role played by the person in this administrative event
suppressed, as part of <admin-event>
appendix. An appendix to an article. Contains an optional
no and one or more
sections.
id: a unique identifier for this appendix element. See
above for guidance on assigning id's.
each appendix is preceded by an a-heading "Appendix N", where N is either the value of its <no> subelement or the element's actual sequence number
appmat. A container for appendix matter. See
above for general guidance.
Contains one or more
appendix elements.
currently placed after <art-back> (i.e. out of sequence)
is this the best thing to do with appendices?
art-admin. A container for administrative information relating to an article. Contains, in the order specified:
ms-id (required)
doi (optional)
pii (optional)
sici (optional)
office (optional)
received (optional and repeatable)
date (optional and repeatable)
admin-event (optional and repeatable)
the <art-admin> element is set as an inline italic sequence
art-back. A container for an article's back matter. Contains, in the order specified:
ack (optional)
biblist (required)
compoundgrp (required)
section (optional and repeatable)
no special formatting is associated with the <art-back> element type
art-body. A container for an article's body matter. See
above for general guidance.
Contains one or more
sections, or one or more
news-sections.
no special formatting is associated with the <art-body> element type
art-front. A container for an article's front matter. See
above for general guidance on analysing front matter.
Contains a
link, or the following elements in the order specified:
titlegrp (required)
authgrp (optional)
conference (optional)
art-toc-entry (optional)
arttoc (optional)
dedicate (optional)
biography (optional and repeatable)
abstract (optional and repeatable)
subject (optional and repeatable)
keyword (optional and repeatable)
the <art-front> element as a whole is suppressed, but its <titlegrp> subelement is treated specially. See its
documentation for details
then <authgrp>, <biography> and <abstract> are output, in that order
art-links. A container for links from an article to other resources. Contains any number of
suppinf and/or
fulltext elements.
no special formatting is associated with the <art-links> element type
art-toc-entry. Container for resources to use when creating the article's entry in the table of contents for a journal issue. Contains, in the following order:
ictext
icgraphic
currently suppressed from the article itself
article. An article. Contains a
link element, or the following elements in the order specified:
art-admin (optional)
published (optional and repeatable)
art-links (optional)
art-front (optional)
art-body (optional)
appmat (optional)
art-back (optional)
dtd: a FIXED attribute which specifies which version of the DTD was in use when this XML document was created. There is no need to enter a value for this attribute (and any value other than 'RSCPAx.y' for version x.y of the DTD will render the whole article invalid)
price-code: takes the value 'free', 'premium' or 'review'. If not specified, 'free' is assumed
type: the class of article, e.g. 'feature', 'communication'. The article type should be taken from the
list of codes given above, e.g. "ART" for a Paper
background: a reference to an external entity to be used as a background image for the article
the subelements of <article> are output in this order:
when outputting to HTML, the
type attribute is extracted, converted to its expanded form as listed above, and inserted within the article header before the article title
when typesetting, a simple combined graphic with article type included is inserted
no support for background images has been provided yet
articleref. RSC internal use only
A pointer to an article (within an issue), used when generating index entries. Contains a
link.
suppressed as part of <index>
arttitle. An article title within a citation or journalcit. Contains 'simple text or paragraphs'.
no special formatting is associated with the <arttitle> element type
arttoc. An article's table of contents. Entering an empty <arttoc> element is an instruction to generate an article table of contents from the section and subsection headings (levels a to d, i.e. <section> to <subsect3>) found in the article. In the HTML output, hyperlinks from the ToC to each section are generated. These are based on the section's
id if specified, otherwise on a unique system-generated code (which is liable to change each time the document is edited).
Can, if desired, contain
toc-head (optional) and
toc-entry (optional and repeatable).
the <arttoc> element is replaced by a table containing auto-generated section numbers in the left column (or the section's
no element, if specified), and section titles in the right column
Is it logical that sections with <no> elements get numbered in the article, while those without don't, even though both get numbered in the ToC?
no support is yet offered for specifically entered <toc-head> and <toc-entry> elements
requirements: in ASU a) table titles appear in contents b) References section gets picked up
authgrp. A container for details of authors and their affiliations. Contains one or more
author elements, followed by one or more
affs.
punctuation between multiple authors' names is added by the style sheet
links between authors and their affiliations are added by the style sheet
an indication of the 'corresponding' author is added by the style sheet
details of when/where received, when accepted, and when/how published are only output if an <authgrp> element is present
within news-item and book-review, <authgrp> is output after all other subelements. Authors and affiliations are output on separate lines, in italic, and right-justified.
author. One author of an article. Repeat for each distinct author. Contains a
person, followed by an optional
footnote.
aff: one or more idref's (separated by spaces), specifying which
aff elements apply to this author
key: a unique key for this author [not yet used]
role: can take the value 'princ' (principal author) or 'corres' (corresponding author)
punctuation between multiple authors' names is added by the style sheet
links between authors and their affiliations are added by the style sheet
an indication of the 'corresponding' author is added by the style sheet
the author's person subelement is output in bold
B in HTML output
below. The bottom half of a
stack. Contains 'characters only'.
rendered as subscript
SUB in HTML output, after
above
bi. Indicates that the contained text should be rendered as bold italic. This is preferable to using separate <bo> and <it> elements. Only use this element when it is not possible to deduce
why the text is rendered in this way. If possible, always use a more meaningful element type.
rendered as bold and italic
B and I in HTML output
biblist. A container for the bibliography at the end of an article. Contains a mixture of text and
citgroups.
title: a non-standard title for the bibliography. Can include a section number, if one is required.
if the
title attribute is specified, it is output as the heading for this section. Otherwise, the heading 'References' is output
both H3 in HTML output.
biblscope. The scope of a citation within the work cited. Can include references to sections, chapters, page ranges, etc. Contains 'simple text'.
no special formatting is associated with the <biblscope> element type
biography. A person's biography. Contains a
link, or one or more
sections.
id: a unique identifier for this <biography> element
<biography> is suppressed where it appears, but is output as a full-width one-row table
TABLE in HTML output, followed by a rule
HR in HTML output, after the article's front matter (so long as an
art-front element is present).
RK: Biography might be better as a two-cell table, with any plate as the left hand cell
bo. Indicates that the contained text should be rendered as bold. Only use this element when it is not possible to deduce
why the text is rendered in this way. If possible, always use a more meaningful element type (specifically
compoundref, which is the most common reason for bold-face within article text).
rendered as bold
B in HTML output
board. RSC internal use only
A journal or issue's [Editorial] Board. Contains a
link, or an optional
title followed by zero or more
groups and/or
members.
id: a unique identifier for this element
no special formatting is associated with the <board> element type
book-review. A book review, consisting of the citation of the book being reviewed, reviewer's details, and the review itself. Contains a
citation, followed by an optional
authgrp for the reviewer's details (i.e. the 'author' of the review), followed by one or more paragraphs (
p) and/or 'inter-paragraph elements'.
within <book-review>, <authgrp> is output at the end, right-justified (see
authgrp for details)
multiple <book-review> elements are separated by a line-break
box. A floating text box. Contains a single
section.
id: a mandatory unique identifier for this element
height: the height of the box, expressed as ...
width: the width of the box, expressed as ...
tint: the tint of the box, expressed as ...
I don't think these can be defined at capture. I really don't know what the best units would be though - absolutes or pixels NH>What is easiest to capture?
pos: can optionally take the value 'fixed' to indicate that the <box> cannot float
<box> elements are set as a centred 80%-width table with a border (not currently visible!)
boxref. A reference to a floating text box. Contains 'emphasised text' giving a human-readable description of the cross-reference. See
above for general guidance on creating cross-references.
idrefs: one or more space-separated idref's, specifying the box(es) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'
no special formatting is associated with the <boxref> element type
byline. RSC internal use only
a journal's byline. Contains 'simple text'.
type: the type of byline
no special formatting is associated with the <byline> element type
chart. A chart. Contains an optional
title. See
above for general guidance on encoding graphics.
id: a mandatory unique identifier for this element. See
above for guidance on assigning id's.
src: the entity which contains the graphic (see notes above on external entities)
height: the height of the graphic, expressed as ...
width: the width of the graphic, expressed as ...
I don't think these can be defined at capture. I really don't know what the best units would be though - absolutes or pixels NH>What is easiest to capture?
pos: 'float' for floating graphics, otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks
the chart is output within a centred half-width table
TABLE in HTML output as an image
IMG in HTML output
if the chart has a <title>, this is output in a separate row below the image; otherwise the heading 'Chart N' is generated, where N is the chart number as indicated in its
id attribute
Neil's code has the auto-generated heading centred, and the 'real' heading left-aligned. Is this intended?
a text break instruction
BR clear="all" in HTML output is output before and after the chart
chartref. A cross-reference to a chart. Contains 'emphasised text' giving a human-readable description of the cross-reference. See
above for general guidance on creating cross-references.
idrefs: one or more space-separated idref's, specifying the chart(s) to which cross-reference is being made
a link is made to the first idref specified in the
idrefs attribute
A in HTML output
citation. Container for an individual citation that doesn't fit the model for a standard journal citation (
journalcit). Should only be used if <journalcit> cannot. See above for general guidance on
encoding citations.
Contains mixed content, which can include the following element types as required:
citauth
title
year
volumeno
issueno
arttitle
biblscope
editor
citpub
pubplace
link
url
email
trans
'emphasis' elements
id: a unique identifier for this element
this attribute shouldn't be used, since it isn't intended to be pointed to now. It will be removed in the next version of the DTD
type: the type of citation
no special formatting is associated with the <citation> element type
citauth. An author within a
citation or
journalcit element. Contains a
link, or an optional
fname followed by a mandatory
surname.
no special formatting is associated with the <citauth> element type
citext. Citation text. Used only when it is not possible to encode material found within a citations list using
journalcit or
citation. (This should only apply when the text isn't actually a citation at all.) Contains 'simple text'.
no special formatting is associated with the <citext> element type
citgroup. A group of citations with a single reference number. (Most <citgroup>s will only contain a single
journalcit or
citation element.) See above for general guidance on
encoding citations.
Contains an optional
no element for a non-standard citation number, followed by one or more of the following, in any order:
citext
journalcit
citation
citgroup
*
commentary may also appear after the various elements above
id: a mandatory unique identifier for this element. See
above for guidance on assigning id's.
the citation is enclosed in an anchor group
A in HTML output, with a NAME attribute equal to its
id attribute
the content of <citgroup> is preceded by a displayed citation number, which is derived from the <citgroup>'s position in the citation list
citpub. The publisher of a citation. Contains 'simple text'.
To be added by data capture agency.
no special formatting is associated with the <citpub> element type
citref. A reference to a citation. Contains 'emphasised text' giving a human-readable description of the cross-reference. See
above for general guidance on creating cross-references.
idrefs: one or more space-separated idref's, specifying the citation(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'
position: can take the value 'super' or 'baseline'
a link is made to the first idref specified in the
idrefs attribute
A in HTML output
in HTML output the link has a TITLE attribute, generated from the text of the citation
unless the attribute
position="baseline" is specified, the citation will be displayed as small-type superscript
SMALL, SUP in HTML output
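The two display positions can be sketched as follows. The ids reuse cit5 and cit8 from the citation examples earlier in these guidelines; the surrounding sentences and the link text are placeholders:

```xml
<!-- Illustrative only: surrounding text and link text are placeholders -->
<p>The rearrangement has been reported previously.<citref idrefs="cit5 cit8">5,8</citref></p>
<p>For details see ref. <citref idrefs="cit5" position="baseline">5</citref>.</p>
```

In the first paragraph the reference would be displayed as small-type superscript; in the second it stays on the baseline. In both cases the link is made to the first idref listed.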
city. The name of a city. Must consist of character data only.
within <received>, the city name is preceded by '(in ' and followed by ')'
otherwise, no special treatment is applied to <city> elements
coden. RSC internal use only
A CODEN identifier for a journal. Contains character data only.
no special treatment is applied to <coden> elements
colspec. A specification of the characteristics of a column in a table. Empty element: has no data content.
colnum: the column's number
colname: the column's name
colwidth: the column's width, as a relative fraction of 1.00 (= average column width given equal spacing)
colsep: the column's column separator
rowsep: the column's row separator
align: the alignment of the column's content
char: the character to be used for alignment within the column
charoff: the offset for character alignment within the column
information in <colspec> is used to determine cell spanning
information in <colspec> is used to determine text alignment
commentary. A description of the value of a citgroup. Contains 'simple text'.
rating: used to denote the rank of the citation: 0 (default), 1 or 2
the rating is used to generate superior filled stars before the text of the commentary: 1 for a single superscript filled star, 2 for two stars
the commentary should be rendered as italic
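A sketch of a rated commentary inside a citgroup is given below. The citation and the commentary text are invented placeholders, and the id 'cit9' is a hypothetical value:

```xml
<!-- Hypothetical citgroup: the citation and commentary text are placeholders -->
<citgroup id="cit9"><citation type="book"><citauth><fname>A. N.</fname><surname>Author</surname></citauth>, <title>Example Book Title</title>, <year>2000</year>.</citation><commentary rating="2">A comprehensive review of the field.</commentary></citgroup>
```

With rating="2", two filled stars would be expected before the commentary text.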
compname. The name of a chemical compound. Contains a
link or 'simple text'.
no special formatting is associated with the <compname> element type
compound. Specifies the
id of a chemical compound. Optionally contains one or more
compoundref elements, each linking to a definition of that compound
id: a mandatory unique identifier for this element. See
above for guidance on assigning id's.
no special formatting is associated with the <compound> element type
compoundgrp. A container for zero or more
compound elements. A <compoundgrp> is required at the end of each article so that
compoundref elements have a target to point to. (At present no use is made of these links when rendering articles.)
no special formatting is associated with the <compoundgrp> element type
It should be explicitly suppressed, 'just in case'
compoundref. A reference to a chemical compound. Contains 'emphasised text' specifying the compound's code. See
above for general guidance on creating cross-references.
idrefs: one or more space-separated idref's, specifying the compound(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'
rendered as bold text
B in HTML output
conference. Information about a conference or similar meeting. Contains an optional sequence number (
no), followed by zero or more of the following, in any order:
confname
daterange
location
contact
id: a unique identifier for this element
no special formatting is associated with the <conference> element type
To be done ,,,
confgrp. A container for zero or more
conference elements.
id: a unique identifier for this element
no special formatting is associated with the <confgrp> element type
To be done ,,,
confname. A conference's name or title. Contains 'simple text'.
no special formatting is associated with the <confname> element type
To be done ,,,
contact. A contact, e.g. for a conference. Contains zero or more of the following, in any order:
person
address
phone
fax
email
url
id: a unique identifier for this element
no special formatting is associated with the <contact> element type
To be done ,,,
country. A country name. Must consist of character data only.
there is a comma after <postcode> unless it is immediately followed by <country>, in which case there is no punctuation
otherwise, no special formatting is associated with the <country> element type
cpyrt. RSC internal use only
A copyright statement. Contains 'simple text'.
output at the end of the article, after any footnotes, in a full-width table
TABLE in HTML output
preceded by a rule
HR in HTML output
followed by a space and the publication year, if specified
date. A general year-month-day date. Contains a
year, followed by an optional
month and an optional
day.
role: the role played by this date (e.g. 'accepted' or 'revised')
dates are either output as year-only (e.g. within the generated copyright statement), or formatted into an 'RSC date' (e.g. '21st November 2000')
<date> within art-admin with role='accepted' is output after <received>, with a prefix ', Accepted'
role='revised' isn't supported at present
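A minimal sketch of an acceptance date follows. It assumes that <month> holds a numeric month, which this section does not state:

```xml
<!-- Assumption: month is entered as a number; its permitted content is not given in this section -->
<date role="accepted"><year>2000</year><month>11</month><day>21</day></date>
```

Only the bare day number is entered. The suffix in an 'RSC date' such as '21st November 2000' is added when the date is formatted.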
daterange. A range of two
dates.
no special formatting is associated with the <daterange> element type
There should be a '-' between the two dates.
day. A numerical day: 1/2/3/.../31. Should not contain anything apart from the day number itself.
when formatted as part of an 'RSC date', a suffix is added to the day (e.g. '21st')
dd. A definition description, part of a
deflist. Contains 'text or paragraphs'.
no special formatting is associated with the <dd> element type
dedicate. A dedication. Contains 'text or paragraphs'.
no special formatting is associated with the <dedicate> element type
def. The definition of a term, part of a
deflist. Contains the
term itself, followed by its definition in a
dd.
no special formatting is associated with the <def> element type
deflist. A definition list, containing an optional
head, and one or more definitions (
def).
no special formatting is associated with the <deflist> element type
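The structure of a definition list can be sketched as follows. The heading is a placeholder, the two terms are taken from elsewhere in these guidelines, and the example assumes that <term> holds plain text:

```xml
<!-- Illustrative only: the heading is a placeholder -->
<deflist>
<head>Abbreviations</head>
<def><term>BDA</term><dd>butane diacetal</dd></def>
<def><term>DOI</term><dd>Digital Object Identifier</dd></def>
</deflist>
```

Each <def> pairs one <term> with the <dd> that defines it.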
denom. The denominator of a fraction. Contains 'simple text'.
rendered as small subscript
SMALL, SUB in HTML output
doi. A Digital Object Identifier. Contains character data only.
as part of
art-admin, this element is currently ignored. A DOI is instead constructed from the article's manuscript number, with the correct RSC DOI prefix.
Need to mention link type='DOI' - somewhere!
editnote. An editorial note. Use this element type for any comments generated by the editing process - these do not form part of the article. Contains the following, in this order:
the
note itself
who made the note
the
date the note was made
type: the type of editorial note.
The values this attribute can take may be controlled in future, but it can be used freely at present.
no special formatting is associated with the <editnote> element type
Is this simply because we haven't yet supported it? It should either be suppressed or picked out in some way.
editor. The editor of an article or book. Contains 'simple text'.
id: a unique identifier for this element
no special formatting is associated with the <editor> element type
email. An e-mail address. Contains character data only. Only enter the actual address: the prefix
E-mail: will be generated by style sheets.
the address is enclosed in an anchor
A with href='mailto:' plus the address in HTML output
the content of the anchor consists of the address prefixed by 'E-mail: '
entry. An entry (cell) in a table. See above for general guidance on
encoding tables.
Contains mixed content which can include text elements, graphics, and equations.
colname: the name of the column in which this cell appears
namest: the name of the start column for this cell
nameend: the name of the end column for this cell
morerows: the number of rows occupied by this cell
colsep: the column's column separator
rowsep: the column's row separator
align: the alignment of the column's content
char: the character to be used for alignment within the column
charoff: the offset for character alignment within the column
valign: the vertical alignment of the column's content
indent: the indentation of this cell (an RSC-specific attribute)
<entry> is formatted as a table cell
TD in HTML output
the column name and morerows attribute are used to generate suitable COLSPAN and ROWSPAN settings for the cell
align and valign are used to generate suitable ALIGN and VALIGN settings
eqnref. A reference to an equation.
Contains 'emphasised text' giving a human-readable description of the cross-reference. See
above for general guidance on creating cross-references.
idrefs: one or more space-separated idref's, specifying the equation(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'
a link is made to the first idref specified in the
idrefs attribute
A in HTML output
eqntext. An equation expressed in textual form. See above for general guidance on
encoding equations.
Contains 'simple text or paragraphs'. Use
ps to lay out multi-line equations.
display: can take the value 'displayed' or 'inline'. Use this attribute to indicate whether the equation should be set as a separate block, or rendered inline.
<eqntext>s occurring outside an
equation are set in a centred full-width table
TABLE in HTML output, with two breaks above and one below
no special action is taken for <eqntext>s within an
equation
No support is yet provided for the display attribute
equation. An equation. See above for general guidance on
encoding equations.
Contains an optional
no, followed by a textual equation (
eqntext) or a graphic displaying the equation (
ugraphic).
id: a mandatory unique identifier for this element. See
above for guidance on assigning id's.
<equation>s are set in a centred full-width table
TABLE in HTML output, with two breaks above and one below
the equation itself is set in a table cell
TD in HTML output, in which there is an anchor whose name is the id of the equation
A in HTML output
the equation
no, if specified, is set in a cell to the right of the equation. If it is not specified, an equation identifier is generated based on the <equation>'s id attribute. In both cases, the identifier is surrounded by parentheses and the whole entry is bold
B in HTML output
When there is a <no>, it isn't being output as bold at present
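A numbered textual equation and a reference to it might look like the sketch below. The id 'eqn1' is a hypothetical value that follows an assumed naming pattern, and the equation and link text are placeholders:

```xml
<!-- Hypothetical: the id value and the equation itself are placeholders -->
<equation id="eqn1"><no>1</no><eqntext><it>F</it> = <it>ma</it></eqntext></equation>
<p>The force follows from <eqnref idrefs="eqn1">eqn. (1)</eqnref>.</p>
```

If <no> were omitted, the displayed identifier would instead be generated from the id attribute, as described above.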
fax. A fax number. Can only contain character data.
no special formatting is associated with the <fax> element type
figref. A cross-reference to a figure. Contains 'emphasised text' giving a human-readable description of the cross-reference. See
above for general guidance on creating cross-references.
idrefs: one or more space-separated idref's, specifying the figure(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'
a link is made to the first idref specified in the
idrefs attribute
A in HTML output
figure. A figure. Contains an optional
title. See
above for general guidance on encoding graphics.
id: a mandatory unique identifier for this element. See
above for guidance on assigning id's.
src: the entity which contains the graphic (see notes above on external entities)
height: the height of the graphic, expressed as ...
width: the width of the graphic, expressed as ...
I don't think these can be defined at capture. I really don't know what the best units would be though - absolutes or pixels NH>What is easiest to capture?
pos: 'float' for floating graphics, otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks
the figure is output within a centred half-width table
TABLE in HTML output as an image
IMG in HTML output
if the figure has a <title>, this is output in a separate row
TR in HTML output below the image
the heading 'Fig. N' is generated, where N is the figure number as indicated in its
id attribute
a text break instruction
BR clear="all" in HTML output is output before and after the figure
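A floating figure and a cross-reference to it can be sketched as follows. The id and the entity name 'fig1' are assumed values, the title is a placeholder, and the height and width attributes are left out because their units are not yet defined:

```xml
<!-- Hypothetical: the id, the entity name and the title are placeholders -->
<figure id="fig1" src="fig1" pos="float"><title>Plot of yield against temperature.</title></figure>
<p>The trend is shown in <figref idrefs="fig1">Fig. 1</figref>.</p>
```

The entity named in src must be declared as an external entity, as described in the notes on external entities.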
fname. A person's first name. Contains 'simple text'.
spacing within the <fname> element is preserved
otherwise, no special treatment for <fname> elements
fnoteref. A reference to a footnote (at the end of the article, or in the footer of a table). Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on
footnotes, and on creating
cross-references.
idrefs: one or more space-separated idref's, specifying the footnote(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'
footnote references are rendered as small superscript anchors
SMALL, SUP, A in HTML output
the target for the anchor
HREF attribute in HTML output is the value of the id attribute
for <fnoteref>s within tables, the text of the anchor is a system-generated letter based on the id attribute
for <fnoteref>s outside tables, the text of the anchor is a symbol, allocated in the sequence: asterisk; dagger; double dagger; section sign; paragraph sign; double vertical line
<fnoteref>s within tables are output in red
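The symbol sequence used outside tables can be sketched as a simple lookup. The function name is hypothetical, and what happens after the sixth footnote is not stated in these guidelines, so this sketch simply refuses to guess.

```python
# Symbols in the order given above: asterisk, dagger, double dagger,
# section sign, paragraph sign, double vertical line.
FOOTNOTE_SYMBOLS = ["*", "\u2020", "\u2021", "\u00a7", "\u00b6", "\u2016"]


def footnote_symbol(position: int) -> str:
    """Return the symbol for the footnote at a 1-based position.

    Assumption: positions beyond the sixth are an error, because the
    guidelines do not say how the sequence continues.
    """
    if not 1 <= position <= len(FOOTNOTE_SYMBOLS):
        raise ValueError(f"no symbol defined for footnote {position}")
    return FOOTNOTE_SYMBOLS[position - 1]


for n in range(1, 7):
    print(n, footnote_symbol(n))
```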
footer. A sequence of paragraphs at the end of a news item, typically set in italic. Contains one or more
ps.
no special treatment for <footer> elements
Shouldn't they be set in italic, as per the spec? Needs discussion
footnote. A footnote in the article, or in a table footer. See
above for general guidance on encoding footnotes.
Footnotes in the article are placed at the point where the footnote reference is to appear in the rendered result. This means that
fnoteref is only required for such footnotes if the same footnote is referenced more than once. In contrast, table footnotes are placed within the
tfoot, and
are referenced by a separate
fnoteref.
Contains text or paragraphs.
id: a mandatory unique identifier for this element. See
above for guidance on assigning id's.
footnotes outside tables are removed to a separate section with title 'Footnotes'
H3 in HTML output, and replaced by system-generated footnote references (see
fnoteref for details of how these are rendered)
footnotes within tables are rendered in red, with a smaller font size
all footnotes are rendered as anchors
A in HTML output
the target for the anchor
HREF attribute in HTML output is the value of the id attribute
fpage. The number of the first page within an issue on which the printed version of an article appears. Can only contain character data.
print pagination within
<pubfront> is suppressed from the rendered version
otherwise, no special treatment for <fpage> elements
fraction. A fraction. Contains a numerator (
numer), followed by a denominator (
denom).
shape: takes values 'case' (an "above and below" fraction) or 'sol' (a "solidus" fraction)
no special formatting is associated with the <fraction> element type
Shouldn't we make some attempt to deal with the 'case' case? What can be done in HTML?
fulltext. A link to the full text of an article (e.g. in PDF).
Probably not required - do not use without checking with RSC.
Contains a
link element.
no special formatting is associated with the <fulltext> element type
group. RSC internal use only
A group of people with similar roles within an Editorial Board. Contains an optional
title, followed by zero or more
members.
no special formatting is associated with the <group> element type
head. A heading (e.g. for a list, index, or definition list). Contains paragraphs or text.
no special formatting is associated with the <head> element type
This could be replaced by further use of the existing <title> element type. Either that, or it should be supported in the style sheet!
icgraphic. A graphic to be included in an illustrated contents list entry. Empty element: has no contents. See
above for general guidance on encoding graphics.
id: a mandatory unique identifier for this element. See
above for guidance on assigning id's.
src: the entity which contains the graphic (see notes above on external entities)
height: the height of the graphic, expressed as ...
width: the width of the graphic, expressed as ...
I don't think these can be defined at capture. I really don't know what the best units would be, though: absolutes or pixels? NH> What is easiest to capture?
pos: 'float' for floating graphics; otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks
This isn't actually useful for <icgraphic>, but it 'inherits' this attribute by virtue of being a graphic element.
no special formatting is associated with the <icgraphic> element type
<icgraphic> and <ictext> should be suppressed.
ictext. Text describing the article, to be included in an illustrated contents list entry. Contains paragraphs or text.
no special formatting is associated with the <ictext> element type
Shouldn't icgraphic and ictext be suppressed, at least?
index. RSC internal use only
An [author] index. Contains an optional
head, followed by zero or more
index-entrys.
no special formatting is associated with the <index> element type
index-entry. RSC internal use only
An entry in an [author] index. Contains a
value, followed by one or more
articlerefs.
no special formatting is associated with the <index-entry> element type
inf. Inferior (subscript) text. Indicates that the contained text should be rendered as subscript. Only use this element when it is not possible to deduce
why the text is rendered in this way. If possible, always use a more meaningful element type.
rendered as small and subscript
SMALL, SUB in HTML output
info. Information, e.g. about a journal. Contains a
link, or one or more
sections.
type: can take the values 'author' (the default), 'illustration' or 'distribution'
level: can take the values 'full' (the default), 'brief' or 'paragraph'
no special formatting is associated with the <info> element type
issn. RSC internal use only
The International Standard Serial Number for a journal. Contains character data only.
type: the type of ISSN
no special formatting is associated with the <issn> element type
issue. RSC internal use only
One issue of a journal. Contains a
link, or the following elements in this order:
id: a mandatory unique identifier for this element
dtd: a FIXED attribute which specifies which version of the DTD was in use when this XML document was created. There is no need to enter a value for this attribute (and any value other than 'RSCPAx.y' for version x.y of the DTD will render the whole article invalid)
type: the type of issue
suppressed within
pubfront
otherwise, no special formatting is associated with the <issue> element type
issue-back. RSC internal use only
The back matter for an issue. Contains any number of any of the following, in any order:
board
issue-toc
index
advert
info
confgrp
no special formatting is associated with the <issue-back> element type
issue-front. RSC internal use only
The front matter for an issue. Contains any number of any of the following, in any order:
board
issue-toc
index
advert
info
confgrp
no special formatting is associated with the <issue-front> element type
issue-toc. RSC internal use only
The table of contents for an issue. Contains an optional
toc-head, followed by zero or more
toc-entry elements.
no special formatting is associated with the <issue-toc> element type
issueid. RSC internal use only
An identifier (other than the issue number) for an issue of a journal. Can only contain character data.
no special formatting is associated with the <issueid> element type
issueno. The issue number within a volume. Can only contain character data. When used within the
issue element, this should be a 3-digit number with leading zeroes.
Still true?
To be added by data capture agency
Still true?
no special formatting is associated with the <issueno> element type
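A minimal sketch of the 3-digit convention for issue numbers used within the issue element, assuming the convention still applies (it is queried above) and that issue numbers are positive integers below 1000. The function name is hypothetical.

```python
def format_issueno(issue_number: int) -> str:
    """Zero-pad an issue number to three digits, e.g. 7 -> '007'."""
    if not 0 < issue_number <= 999:
        raise ValueError("issue number must be between 1 and 999")
    return f"{issue_number:03d}"


print(format_issueno(7))
print(format_issueno(12))
```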
issueref. A reference to [a document describing] one issue of a journal. See
above for general guidance on creating cross-references.
Contains a
link, or these elements in the following order:
links within <issueref> are suppressed
otherwise, no special formatting is associated with the <issueref> element type
it. Indicates that the contained text should be rendered as italic. Only use this element when it is not possible to deduce
why the text is rendered in this way. If possible, always use a more meaningful element type.
rendered as italic
I in HTML output
item. An item within a list. See above for general guidance on
encoding lists.
Contains paragraphs or 'simple text'.
rendered as a list item
LI in HTML output
jnltrans. A translation of a simple journal citation (
journalcit). Also used for Chem. Abstracts references, with the abstract number in <fpage>.
Contains the following, in the order specified:
sertitle (optional)
year (optional)
volumeno (optional)
pages (optional and repeatable)
no special formatting is associated with the <jnltrans> element type
journal. RSC internal use only
A description of an RSC journal. Contains a
link, or these elements in the order specified:
title (one or more)
sercode
byline (optional and repeatable)
logo (optional and repeatable)
publisher
issn (one or more)
coden (optional)
board (optional and repeatable)
info (optional and repeatable)
advert (optional and repeatable)
cpyrt
volume (optional and repeatable)
id: a unique identifier for this element
no special formatting is associated with the <journal> element type
journalcit. A citation which follows the standard model for simple citations of journal articles. Use
citation for more complex cases, and for citations to anything other than journal articles. Use
citext only for text within the References section which is not a citation at all. See above for general guidance on
encoding citations.
Contains these elements in the order specified:
citauth (one or more)
title
year
volumeno (optional)
issueno (optional)
pages
jnltrans (optional)
link (optional and repeatable)
the citation is output within an anchor
A in HTML output, with NAME attribute equal to its
id attribute
This is not required.
a semicolon is output after all <journalcit>s except the last within its containing <citgroup>. This last <journalcit> is followed by a full stop
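The punctuation rule for citations within a <citgroup> can be illustrated as below. This is a sketch, not the style sheet's own logic: the function name is hypothetical, and the single space placed between citations is an assumption.

```python
from typing import Sequence


def punctuate_citgroup(citations: Sequence[str]) -> str:
    """Follow every citation but the last with ';' and the last with '.'."""
    parts = []
    last = len(citations) - 1
    for index, citation in enumerate(citations):
        terminator = "." if index == last else ";"
        parts.append(citation + terminator)
    # Assumption: citations are separated by a single space.
    return " ".join(parts)


print(punctuate_citgroup(["First citation", "Second citation", "Third citation"]))
```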
journalref. A reference to a document describing a journal. See above for general guidance on
creating cross-references, and for a list of RSC
journal codes.
It contains a
link element, which should have the appropriate journal code as its value. These codes are listed
below.
Contains a
link, or these elements in the order specified:
title (one or more)
sercode
byline (optional and repeatable)
logo (optional and repeatable)
publisher
issn (one or more)
coden (optional)
board (optional and repeatable)
info (optional and repeatable)
advert (optional and repeatable)
cpyrt
volume (optional and repeatable)
id: a unique identifier for this element
the <journalref> within
published is used to provide the journal title which appears at the head of the article
keyword. A keyword describing an article's content. Contains 'simple text'.
<keyword>s are suppressed from the rendered article
link. A link to [part of] another document. Contains simple text.
Although the attributes within <link> provide a powerful means of expressing links, they are not yet being used. Instead, the data content within <link> is used to specify the target document. This content will be a unique identifier for the document, e.g. a
journal code or an article's manuscript number.
type: the type of link, e.g. 'DOI' for DOI cross-references
doc: an entity reference defining the document to which the link is being made
from: the [start of the] target within the linked document, expressed as an XPath expression
to: (for ranges only), the end of the target within the linked document, expressed as an XPath expression
in general, <link>s are suppressed from the rendered article. Instead, such <link>s as are required for rendering (e.g. the link to a document describing an article's journal) are resolved by a pre-rendering edit which replaces the link by the actual document to which it points
list. A list. See above for general guidance on
encoding lists.
Contains an optional
head, followed by one or more
items.
type: the type of list, which should take one of the following values:
ordered
bulleted
simple
<list> is rendered as an unordered (bulleted) list
UL in HTML output
Should be extended to cope with all the allowed list types
location. A location (i.e. an address). Contains one or more of the following, in any order:
city
postcode
state
country
addrelt
no special formatting is associated with the <location> element type
logo. RSC internal use only
A logo. Contains a
ugraphic specifying the image to be used.
type: the type of logo
no special formatting is currently associated with the <logo> element type. Instead, the
sercode is used to construct the logo's file name
lpage. The number of a printed article's last page. Contains character data only.
<lpage> is suppressed from the rendered article
member. RSC internal use only
A member of a <group>. Contains an optional
role, followed by zero or more
persons.
no special formatting is associated with the <member> element type
month. A month. Contains character data only. Months should be specified in full, e.g. "January".
Since the style sheet can convert numeric months to their full form, should we be allowing, or even asking for, numeric months?
if a numeric month is entered, it is converted to its full form, e.g. '3' becomes 'March'
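The numeric-to-full conversion can be sketched as follows. The function name is hypothetical; accepting a leading zero ('03') and passing non-numeric content through unchanged are assumptions, since only the '3' to 'March' case is documented.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def full_month(value: str) -> str:
    """Convert a numeric month to its full name; leave other text as it is."""
    text = value.strip()
    if text.isascii() and text.isdigit():
        number = int(text)
        if not 1 <= number <= 12:
            raise ValueError(f"not a valid month number: {value!r}")
        return MONTHS[number - 1]
    return text


print(full_month("3"))        # the documented case
print(full_month("January"))  # already in full form
```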
ms-id. The RSC's unique identifier for an article. Contains character data only.
Conventions for formatting article identifiers are given
above.
To be added by data capture agency
output at the end of the article
also used to construct the article's DOI
the presence of <ms-id> triggers the generation of the "Received" statement
nameelt. A component of an organisation's name. Contains 'simple text'.
type: the type of name element
', ' is output after all <nameelt>s except the last in a sequence
news-article. A full article (with title and author details, and back matter such as a list of citations) found within a news section. Contains these elements, in the order specified:
art-front
art-body
appmat
art-back
id: a unique identifier for this element. See
above for guidance on assigning id's.
type: the type of news article
no special formatting is associated with the <news-article> element type
news-item. A relatively simple news item. For more complex material, use
news-article instead. Contains these elements, in the order specified:
title (optional)
authgrp (optional)
abstract (optional)
p or paragraph-level elements (optional and repeatable)
footer (optional)
id: a unique identifier for this element. See
above for guidance on assigning id's.
within <news-item>, <authgrp> is output at the end, right-justified (see
authgrp for details)
multiple <news-item> elements are separated by a line-break
news-section. A container for one or more news articles or (more usually) news items, plus other formats such as advertisements and conference listings. Can contain nested <news-section>s to support e.g. a two-level structure of news sections.
Contains an optional
title, followed by zero or more of the following, in any order:
news-section
news-article
news-item
book-review
advert
info
confgrp
p
paragraph-level elements
id: a unique identifier for this element. See
above for guidance on assigning id's.
no special formatting is associated with the <news-section> element type
no. A number or other identifier (for a table, figure, etc.). Contains character data only. See
above for general guidance on numbering strategy.
section <no>s are suppressed from normal output; instead, the <no> element, if present, is picked up and incorporated into the section
title
a similar strategy is applied to equation <no>s, which are enclosed in parentheses and output in bold
B in HTML output
no-of-pages. The number of pages in the printed version of an article. Contains character data only.
print pagination is suppressed from the rendered version
note. A note. Contains text or paragraphs.
no special formatting is associated with the <note> element type
Should be, e.g., italic and surrounded by '[..]'.
numer. The numerator of a fraction. Contains 'simple text'.
rendered as small superscript
SMALL, SUP in HTML output
office. The RSC office responsible for managing an article. Contains character data only.
like all
art-admin subelements, this is suppressed
org. An organisation's name and address. Contains a
link, or one or more
orgnames followed by zero or more
addresses.
id: a unique identifier for this element
within
aff, the level-1 subelements of <org> are followed by ', '
otherwise, no special formatting is associated with the <org> element type
Multiple <address> elements should be separated by ' and '.
orgname. An organisation's name. Contains one or more
nameelts.
within
aff, each <orgname> is followed by ', '
otherwise, no special formatting is associated with the <orgname> element type
overbar. An overbar. Indicates that a bar should be placed above all the text within this element. Contains 'simple text'.
no special formatting is associated with the <overbar> element type.
(Specifically, no means has been found to implement this feature within HTML output. Could try using a CSS text decoration instruction.)
p. A paragraph. Contains mixed content (i.e. text and subelements intermixed), including any of these elements, at any point and in any order:
roman
it
bo
bi
scp
sansserif
ul
sup
inf
list
footnote
note
overbar
underbar
stack
fraction
warning
unknown
email
url
ugraphic
eqntext
figure
scheme
plate
chart
equation
compname
compoundref
textref
figref
schemref
plateref
chartref
eqnref
boxref
tableref
citref
fnoteref
affref
by default, <p> is rendered as a paragraph
P in HTML output
within sections at any level, the first paragraph is rendered closed up to the preceding title (with no indentation), and is followed by a line break
BR clear="all" in HTML output
within sections at any level, subsequent paragraphs are indented by an em space [check], and followed by a line break
BR clear="all" in HTML output
pages. The range of pages covered by a citation. Contains a
fpage, optionally followed by a
lpage.
no special formatting is associated with the <pages> element type.
persname. A person's name. Contains the following, in the order specified:
no special formatting is associated with the <persname> element type.
person. Details about a person. Contains a
link, or the following elements in the order specified:
persname (required; repeatable)
biography (optional)
address (optional and repeatable)
id: a unique identifier for this <person> element
<person> within
author is rendered as bold
B in HTML output
otherwise, no special formatting is associated with the <person> element type.
phone. A telephone number. Contains character data only.
no special formatting is associated with the <phone> element type.
Should have a prefix, e.g. 'Tel. '.
pii. A Publisher Item Identifier. Contains character data only.
like all
art-admin subelements, this is suppressed
plate. A plate. Contains an optional
title. See
above for general guidance on encoding graphics.
id: a mandatory unique identifier for this element. See
above for guidance on assigning id's.
src: the entity which contains the graphic (see notes above on external entities)
height: the height of the graphic, expressed as ...
width: the width of the graphic, expressed as ...
I don't think these can be defined at capture. I really don't know what the best units would be, though: absolutes or pixels? NH> What is easiest to capture?
pos: 'float' for floating graphics; otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks
<plate>s within
biography are rendered as a left-aligned table cell
TD in HTML output
otherwise, the plate is output within a centred half-width table
TABLE in HTML output
the plate itself is rendered as an image
IMG in HTML output
if the plate has a <title>, this is output in a separate row
TR in HTML output below the image; otherwise the heading 'Plate N' is generated, where N is the plate number as indicated in its
id attribute
a text break instruction is output before and after the plate
BR clear="all" in HTML output
plateref. A reference to a plate. Contains 'emphasised text' giving a human-readable description of the cross-reference. See
above for general guidance on creating cross-references.
idrefs: one or more space-separated idref's, specifying the plate(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'
a link is made to the first idref specified in the
idrefs attribute
A in HTML output
postcode. A postcode. Contains character data only.
the address item before a <postcode> is
not followed by a comma
otherwise, no special formatting is associated with the <postcode> element type.
pubfront.
Should this be 'RSC internal use only'?
Publication front matter. Contains the following elements in the order specified:
fpage
lpage (optional)
no-of-pages
date
the contents of <pubfront> are all suppressed by default
published elements with type="print", and containing a <pubfront> with
year="PENDING", are rendered as the phrase "Publish PENDING" in red. If the <year> is empty, they are rendered as "Publish Pending", also in red
published elements with type="web" are rendered as a bold paragraph
B, P in HTML output
containing "Published on the Web ", followed by <pubfront><date>, formatted as described under
date
the
year within <pubfront>, from the
published element with type="print", is used in the copyright statement
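The two "pending" cases for print publication can be sketched as below. The function name is hypothetical, the red colouring is left out, and returning nothing when a real year is present is an assumption based on the contents of <pubfront> being suppressed by default.

```python
from typing import Optional


def print_publication_notice(year: Optional[str]) -> Optional[str]:
    """Return the phrase generated for a print <published> element, if any."""
    text = (year or "").strip()
    if text == "PENDING":
        return "Publish PENDING"
    if text == "":
        return "Publish Pending"
    # Assumption: with a real year, no phrase is generated here.
    return None


print(print_publication_notice("PENDING"))
print(print_publication_notice(""))
print(print_publication_notice("2000"))
```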
published. A link to a document/resource in which an article has been published. Contains a
citext, or the following elements in the order specified:
Use the analysed citation subelements to describe print publication, or <citext> to record online publication.
Is that right? Web publication uses <pubfront>. RK: not sure that this is right - citext??
type: the type of publication. Should take one of the values: "print", "HTML" or "PDF".
Should it be "HTML" or "web"??
doc: can specify a URL where the online publication is located
from: the [start of the] target within the linked document, expressed as an XPath expression
to: (for ranges only), the end of the target within the linked document, expressed as an XPath expression
the contents of <published> are all suppressed by default
<journalref><title> is selected from <published> with type="print", and used to specify the journal within a cell
TD in HTML output
in the header table at the start of the article. It is rendered as bold italic
B, I in HTML output
as noted under
pubfront, <published> with type="print" is used to generate "Publish Pending", "Published on the Web", and copyright statements
publisher. RSC internal use only
The publisher of a journal. Contains "organisation" subelements, i.e. a
link, or one or more
orgnames followed by zero or more
addresses.
<aff> now has <address>, and <org> within it also has <address> - overkill?
id: a unique identifier for this element
no special formatting is associated with the <publisher> element type
pubname. A publisher name. Contains 'simple text'.
This is no longer linked to anything, so should be removed from the DTD.
pubplace. The place of publication of a book, etc. Contains 'simple text'.
no special formatting is associated with the <pubplace> element type
qualifier. A qualification to a person's name, such as a title, an honorific, or a phrase such as 'the late'. Contains 'simple text'.
no special formatting is associated with the <qualifier> element type
received. A container for details of the date when, and place where, an article was received. Contains an optional
city, followed by a
date.
placed after an article's authors, as a bold italic paragraph
B, I, P in HTML output
"Received ", followed by the
city, if present, preceded by " (in " and followed by ") ", then the
date
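The assembly of the "Received" statement can be sketched as follows. The function name and the sample date and city are hypothetical, and the date is assumed to be already formatted as described under date. Because the literal pieces would put two spaces after "Received" when a city is present, the sketch collapses runs of whitespace, much as an HTML browser would when displaying the text.

```python
from typing import Optional


def received_statement(date: str, city: Optional[str] = None) -> str:
    """Build 'Received (in CITY) DATE', omitting the city part if absent."""
    pieces = ["Received "]
    if city:
        pieces.append(" (in " + city + ") ")
    pieces.append(date)
    # Collapse repeated spaces left by joining the literal fragments.
    return " ".join("".join(pieces).split())


print(received_statement("3rd March 2000", "Cambridge"))
print(received_statement("3rd March 2000"))
```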
role. RSC internal use only
A role played by one or more people. Contains 'simple text'.
no special formatting is associated with the <role> element type
roman. Indicates that the contained text should be rendered as a roman typeface. Contains 'simple text'.
Only use this element when it is not possible to deduce
why the text is rendered in this way. If possible, always use a more meaningful element type.
no special formatting is associated with the <roman> element type
Should be rendered as normal text.
row. A row in a table or table heading. See above for general guidance on
encoding tables.
Contains one or more
entry elements.
rowsep: whether there is a row separator ("0" means "no"; any other digit value means "yes")
valign: the vertical alignment of the row ("top", "middle" or "bottom")
rendered as a table row
TR in HTML output
the
valign attribute is used when specified; otherwise the <row>'s parent's
valign is used when specified; otherwise vertical alignment is set to "bottom"
VALIGN attribute in HTML output
Should this be some other value by default?
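The two attribute rules for <row> can be sketched as below. The function names are hypothetical. Treating multi-digit values such as "10" as "yes" and rejecting non-numeric values are assumptions, since the guidelines speak only of digit values.

```python
from typing import Optional


def has_row_separator(rowsep: str) -> bool:
    """'0' means no separator; any other digit value means yes."""
    if not (rowsep.isascii() and rowsep.isdigit()):
        raise ValueError(f"rowsep must be a digit value, got {rowsep!r}")
    return int(rowsep) != 0


def effective_valign(row_valign: Optional[str],
                     parent_valign: Optional[str]) -> str:
    """Use the row's own valign, then its parent's, then 'bottom'."""
    return row_valign or parent_valign or "bottom"


print(has_row_separator("0"), has_row_separator("1"))
print(effective_valign(None, "top"))
print(effective_valign(None, None))
```

If the default vertical alignment is ever changed, as queried above, only the final fallback value in this chain would need to change.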
sansserif. Indicates that the contained text should be rendered in a sans serif typeface. Contains 'simple text'.
Only use this element when it is not possible to deduce
why the text is rendered in this way. If possible, always use a more meaningful element type.
no special formatting is associated with the <sansserif> element type
Surely something should be done with this!
scheme. A scheme. Contains an optional
title. See
above for general guidance on encoding graphics.
id: a mandatory unique identifier for this element. See
above for guidance on assigning id's.
src: the entity which contains the graphic (see notes above on external entities)
height: the height of the graphic, expressed as ...
width: the width of the graphic, expressed as ...
I don't think these can be defined at capture. I really don't know what the best units would be, though: absolutes or pixels? NH> What is easiest to capture?
pos: 'float' for floating graphics; otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks
the scheme is output within a centred half-width table (TABLE)
the scheme itself is rendered as an image (IMG)
if the scheme has a <title>, this is output in a separate row (TR) below the image
the heading 'Scheme N' is generated, where N is the scheme number as indicated in its
id attribute
a text break instruction is output before and after the scheme
BR clear="all" in HTML output
schemref. A reference to a scheme. Contains 'emphasised text' giving a human-readable description of the cross-reference. See
above for general guidance on creating cross-references.
idrefs: one or more space-separated idref's, specifying the scheme(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'
a link is made to the first idref specified in the
idrefs attribute
A in HTML output
scp. Indicates that the contained text should be rendered in small caps. Contains 'simple text'.
Only use this element when it is not possible to deduce
why the text is rendered in this way. If possible, always use a more meaningful element type.
the contents of <scp> elements are converted to upper case and rendered as small type
SMALL in HTML output
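The small-caps approximation described above amounts to upper-casing the text and wrapping it in SMALL. A minimal sketch, with a hypothetical function name and sample text:

```python
import html


def render_scp(text: str) -> str:
    """Upper-case the content and wrap it in SMALL, as described for <scp>."""
    return "<SMALL>" + html.escape(text.upper()) + "</SMALL>"


print(render_scp("d-glucose"))
```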
section. A top-level section. Contains these elements in the order specified:
no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect1 (optional and repeatable)
id: a unique identifier for this element. See
above for guidance on assigning id's.
type: the type of section
no special formatting is associated with <section>s within
biography
otherwise, <section> is rendered as a separate division
DIV in HTML output
an anchor is generated at the start of the section, with a name based on the
id attribute if specified; otherwise a unique name is generated, based on the section's position within the article
A; NAME attribute in HTML output
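The choice of anchor name can be sketched as follows. The guidelines do not give the exact naming scheme, so using the id unchanged and the 'sect' prefix with dot-separated positions are both assumptions made for illustration. The function name and sample values are hypothetical.

```python
from typing import Optional, Sequence


def section_anchor_name(section_id: Optional[str],
                        position: Sequence[int]) -> str:
    """Prefer the id attribute; otherwise derive a name from the position.

    `position` is the 1-based path to the section within the article,
    e.g. (2, 1) for the first subsection of the second section.
    """
    if section_id:
        return section_id
    return "sect" + ".".join(str(number) for number in position)


print(section_anchor_name("sec-intro", (1,)))
print(section_anchor_name(None, (2, 1)))
```

Because no two sections share the same path, a position-based name is unique within the article.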
sercode. RSC internal use only
A serial (journal) code, conforming to the
list of codes given above. Contains character data only.
To be added by data capture agency
the value of <sercode> is used to locate the correct journal details when preparing the article for rendering
<sercode> is suppressed by default
the value of <sercode> is used to specify the pathname for associated image files, and to retrieve the correct journal logo
sertitle. A serial (journal) title. Contains 'simple text' or paragraphs.
type: the type of series title
within
citation and
journalcit, <sertitle> is rendered as italic
I in HTML output
within
journalcit ", " is output after all but the last <sertitle>
otherwise, no special formatting is associated with <sertitle>
the DTD now only has <sertitle> within <jnltrans>. Elsewhere it has become <title>. The style sheet needs updating to take account of this (the code described here will never be called upon), and <sertitle> should probably be removed from the DTD and replaced by <title> within <jnltrans>.
sici. A Serial Item Contribution Identifier. Contains character data only.
like all
art-admin subelements, this is suppressed
stack. One or more characters appearing directly above other characters (like a fraction without the horizontal line). Contains
above followed by
below.
below is output as subscript
SUB in HTML output, followed by
above as superscript
SUP in HTML output
Are these output in the wrong order?
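The output order currently described for <stack> can be sketched as below, which may help when deciding whether the order is wrong. The function name and sample values are hypothetical.

```python
import html


def render_stack(above: str, below: str) -> str:
    """Emit the lower part as SUB first, then the upper part as SUP."""
    return (
        "<SUB>" + html.escape(below) + "</SUB>"
        + "<SUP>" + html.escape(above) + "</SUP>"
    )


print(render_stack(above="2", below="n"))
```

Swapping the two concatenated parts would give the alternative order, with the superscript first.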
state. A geopolitical unit such as a state, county, etc. Contains character data only.
no special formatting is associated with <state>
subject. A broad subject heading, ideally taken from a controlled list. Contains 'simple text'.
type: the type of subject category
no special formatting is associated with <subject>
This element type should be suppressed
subsect1. A level-1 subsection. Contains these elements in the order specified:
no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect2 (optional and repeatable)
id: a unique identifier for this element. See
above for guidance on assigning id's.
type: the type of section
<subsect1> is rendered as a separate division
DIV in HTML output
an anchor is generated at the start of the sub-section, with a name based on the
id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article
A; NAME attribute in HTML output
subsect2. A level-2 subsection. Contains these elements in the order specified:
no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect3 (optional and repeatable)
id: a unique identifier for this element. See
above for guidance on assigning id's.
type: the type of section
<subsect2> is rendered as a separate division
DIV in HTML output
an anchor is generated at the start of the sub-section, with a name based on the
id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article
A; NAME attribute in HTML output
subsect3. A level-3 subsection. Contains these elements in the order specified:
no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect4 (optional and repeatable)
id: a unique identifier for this element. See
above for guidance on assigning id's.
type: the type of section
<subsect3> is rendered as a separate division
DIV in HTML output
an anchor is generated at the start of the sub-section, with a name based on the
id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article
A; NAME attribute in HTML output
subsect4. A level-4 subsection. Contains these elements in the order specified:
no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect5 (optional and repeatable)
id: a unique identifier for this element. See
above for guidance on assigning id's.
type: the type of section
<subsect4> is rendered as a separate division
DIV in HTML output
an anchor is generated at the start of the sub-section, with a name based on the
id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article
A; NAME attribute in HTML output
subsect5. A level-5 subsection. Contains these elements in the order specified:
no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect6 (optional and repeatable)
id: a unique identifier for this element. See
above for guidance on assigning id's.
type: the type of section
<subsect5> is rendered as a separate division
DIV in HTML output
an anchor is generated at the start of the sub-section, with a name based on the
id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article
A; NAME attribute in HTML output
subsect6. A level-6 subsection. Contains these elements in the order specified:
no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
id: a unique identifier for this element. See
above for guidance on assigning id's.
type: the type of section
<subsect6> is rendered as a separate division
DIV in HTML output
an anchor is generated at the start of the sub-section, with a name based on the
id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article
A; NAME attribute in HTML output
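The nesting of the sub-section levels described above can be illustrated with a short fragment. This is only a sketch: the id values, numbering and text are invented for illustration, and real id's should follow the guidance on assigning id's given earlier. The type attribute is omitted because its permitted values are not listed here.

```xml
<!-- Hypothetical example: ids, numbers and wording are illustrative only -->
<subsect2 id="sect2-1-1">
  <no>2.1.1</no>
  <title>Sample preparation</title>
  <p>Samples were prepared as described previously.</p>
  <!-- A subsect3 may only appear after any paragraphs and deflists -->
  <subsect3 id="sect2-1-1-1">
    <no>2.1.1.1</no>
    <title>Purification</title>
    <p>The crude product was recrystallised.</p>
  </subsect3>
</subsect2>
```

Each level follows the same pattern, with subsect6 being the deepest level and therefore containing no further sub-sections.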
subtitle. A [table] subtitle. Contains 'simple text' or paragraphs.
no special formatting is associated with <subtitle>
sup. Indicates that the contained text should be rendered in superscript. Contains 'simple text'.
Only use this element when it is not possible to deduce
why the text is rendered in this way. If possible, always use a more meaningful element type. <sup> is often mistakenly used instead of
<citref>.
the contents of <sup> elements are rendered as superscript
SUP in HTML output
suppinf. Contains a
link to supplementary information for an article.
<suppinf> is suppressed
surname. A surname. Contains 'simple text'.
no special treatment for <surname> elements
table. A table, encoded using CALS-compliant XML markup. See above for general guidance on
encoding tables.
(Tables which cannot be thus encoded should be prepared as images, and encoded as
ugraphics.)
Contains an optional
title, followed by an optional
subtitle, followed by one or more
tgroups. Note that <title> and <subtitle> within
table-entry should be used in preference to these elements, since this allows titles for XML-encoded and 'image' tables to be treated consistently.
Although, as the DTD notes, we can't clear %titles;, we
could set parameter entity %tbl.tbl-titles.mdl to "" and so remove this possibility.
pgwide: page width ("0" means "no"; any other digit value means "yes")
a rule
HR in HTML output is output before each <table>
spacing from the source document is preserved within <table>
<table> is rendered as a full-width table
TABLE in HTML output
for print, a
pgwide value of '0' signifies a single-column table, and '1' a page-width table
table-entry. 'cover group' for a table, whether declared inline as
table or given as a
ugraphic. See above for general guidance on
encoding tables.
Contains an optional
title, followed by an optional
subtitle, followed by either
table or
ugraphic.
id: a mandatory unique identifier for this element. See
above for guidance on assigning id's.
a break
BR in HTML output is output before and after each <table-entry>
an anchor is output, with name equal to the id attribute
A; NAME attribute in HTML output, followed by "Table " and a system-generated table number, in bold
B in HTML output, followed by the contents of <table-entry>
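As a sketch of the 'image table' case, a table-entry wrapping a ugraphic might look like the fragment below. The id values, the title text and the entity name given in src are hypothetical, and it is assumed here that ugraphic is an empty element whose graphic is supplied entirely through the src attribute.

```xml
<!-- Hypothetical example: a table supplied as an image rather than as CALS markup -->
<table-entry id="tab2">
  <title>Selected bond lengths</title>
  <!-- "tab2img" is an assumed entity name, declared elsewhere as an external entity -->
  <ugraphic id="ugr5" src="tab2img"/>
</table-entry>
```

Because the title sits in table-entry rather than in the table itself, the same title markup serves both XML-encoded and image tables.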
tableref. A reference to a table. Contains 'emphasised text' giving a human-readable description of the cross-reference. See
above for general guidance on creating cross-references.
idrefs: one or more space-separated idref's, specifying the table(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'
a link is made to the first idref specified in the
idrefs attribute
A in HTML output
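A minimal sketch of a table cross-reference follows. The idref values are hypothetical and must match the id attributes of table-entry elements in the same article.

```xml
<!-- Single target -->
<p>The yields are listed in <tableref idrefs="tab1">Table 1</tableref>.</p>

<!-- Several targets: per the rendering rule above, only the first idref ("tab1") is linked -->
<p>See <tableref idrefs="tab1 tab2">Tables 1 and 2</tableref>.</p>
```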
tbody. A table's body matter (i.e. the main table, ignoring any header or footer). See above for general guidance on
encoding tables.
Contains one or more
rows.
valign: the vertical alignment of the row ("top", "middle" or "bottom")
no special treatment for <tbody> elements within
tgroup
otherwise, <tbody> is rendered as a table body
TBODY in HTML output
term. A term being defined in a
deflist. Contains 'simple text'.
no special treatment for <term> elements
textref. A cross-reference to text elsewhere in the article. Contains 'emphasised text' giving a human-readable description of the cross-reference. See
above for general guidance on creating cross-references.
idrefs: one or more space-separated idref's, specifying the element(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'
a link is made to the first idref specified in the
idrefs attribute
A in HTML output
tfoot. The footer area of a table. See above for general guidance on
encoding tables.
Contains zero or more
colspecs, followed by one or more
rows.
Shouldn't <tfoot> have some CALS-style attributes?
no special treatment for <tfoot> elements within
tgroup, apart from outputting them after <tbody>
otherwise, <tfoot> is rendered as a table footer (TFOOT)
tgroup. A table group. See above for general guidance on
encoding tables.
Contains these elements, in the order specified:
colspec (optional and repeatable)
thead (optional)
tfoot (optional)
tbody
cols: the number of columns in the table
colsep: indicates a column separator ("0" means "no"; any other digit value means "yes")
rowsep: indicates a row separator ("0" means "no"; any other digit value means "yes")
align: default cell alignment. Takes one of the values "left", "right", "center", "justify" or "char"
within
tgroup, subelements are output in the order:
thead,
tbody,
tfoot without any special formatting. (In other words, the whole table is output as a single block: headers and footers are not treated specially.)
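The content order for tgroup can be sketched as below. Note the assumptions: the cell element name entry and the colname attribute are taken from the CALS table model rather than from this section, and the id, title and cell text are invented. Only the attributes described above (pgwide, cols, colsep, rowsep, align) are otherwise used.

```xml
<!-- Hypothetical example of a two-column CALS table -->
<table-entry id="tab1">
  <title>Yields of products</title>
  <table pgwide="0">
    <tgroup cols="2" colsep="0" rowsep="0" align="left">
      <colspec colname="c1"/>
      <colspec colname="c2"/>
      <thead>
        <row><entry>Compound</entry><entry>Yield (%)</entry></row>
      </thead>
      <!-- tfoot precedes tbody in the source, but is output after it -->
      <tfoot>
        <row><entry>Isolated yields</entry><entry></entry></row>
      </tfoot>
      <tbody>
        <row><entry>1a</entry><entry>85</entry></row>
        <row><entry>1b</entry><entry>72</entry></row>
      </tbody>
    </tgroup>
  </table>
</table-entry>
```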
thead. The header area of a table. See above for general guidance on
encoding tables.
Contains zero or more
colspecs, followed by one or more
rows.
Shouldn't <thead> have some CALS-style attributes?
no special treatment for <thead> elements within
tgroup
otherwise, <thead> is rendered as a table header
THEAD in HTML output
title. A title (of a figure, table, journal, etc.). Contains 'simple text' or paragraphs.
type: the type of title
the article title (within
titlegrp) is rendered as a level-2 heading
H2 in HTML output, with a rule above
HR in HTML output
all section titles are prefixed by a preceding
no element at the same level, if present
<title> within
section is rendered as an a-heading
<title> within
subsect1 is rendered as a b-heading
<title> within
subsect2 is preceded by a break and an em space, and rendered as bold (i.e. a c-heading)
<title> within
subsect3 is preceded by a break and an em space, and rendered as italic (i.e. a d-heading)
<title> within
citation is rendered as italic
<title> within
journalcit is rendered as italic, and followed by ", " if it is not the last <title>
<title> within
figure,
plate,
scheme and
chart is rendered in bold, in a left-aligned table cell
TD in HTML output, with a suitable prefix (e.g. "Fig. ")
spacing from the source document is preserved in <title>s within
table-entry
p elements within <title> do not generate any markup
also types: type="subtitle" will have to be rendered; whether or not paragraphs are included should also be made explicit. NH> type="addition": will additions ever be captured externally?
otherwise, no special formatting is associated with <title>
titlegrp. A container for an article's main titles. Contains one or more
titles.
no special formatting is associated with <titlegrp>
toc-entry. An entry in a table of contents. Contains 'simple text' or paragraphs.
no special formatting is associated with <toc-entry>
toc-head. Heading for a table of contents. Contains 'simple text'.
no special formatting is associated with <toc-head>
trans. A translation (of a citation)
Contains mixed content, which can include the following element types as required:
citauth
title
year
volumeno
issueno
arttitle
biblscope
editor
citpub
pubplace
link
url
email
trans
'emphasis' elements
no special formatting is associated with <trans>
Perhaps it should be - e.g. italic? RK to comment, please.
ugraphic. An untitled graphic. Use this element to encode any graphical content which doesn't have a title. See
above for general guidance on encoding graphics.
id: a mandatory unique identifier for this element. See
above for guidance on assigning id's.
src: the entity which contains the graphic (see notes above on external entities)
height: the height of the graphic, expressed as ...
width: the width of the graphic, expressed as ...
I don't think these can be defined at capture. I really don't know what the best units would be though - absolutes or pixels NH>What is easiest to capture?
pos: 'float' for floating graphics; otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks
display: how the graphic is to be displayed. Takes value "displayed" or "inline"
if the graphic does not have display="inline", and does not appear within an
equation (or a
table - it isn't possible to have tables nested inside tables), it is output within a centred half-width table, and a text break instruction is output before and after the graphic
BR clear="all" in HTML output
graphics with display="inline", and graphics within equations, are output with no additional markup
the graphic itself is rendered as an image
IMG in HTML output
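As an illustration of how src refers to an external entity, the fragment below pairs an entity declaration with a ugraphic that uses it. The entity name, file name and id are hypothetical, the notation bmp is one of those listed in Appendix B, and it is assumed that ugraphic is an empty element. The height and width attributes are left out because their units are not yet settled.

```xml
<!-- In the internal DTD subset: an unparsed external entity (file name is hypothetical) -->
<!ENTITY ugr1img SYSTEM "ugr1.bmp" NDATA bmp>

<!-- In the article text: an inline, fixed-position graphic -->
<ugraphic id="ugr1" src="ugr1img" pos="fixed" display="inline"/>
```

With display="inline" the graphic would be output as a bare image, without the surrounding half-width table or text breaks.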
ul. Indicates that the contained text should be underlined. Contains 'simple text'.
no special formatting is associated with the <ul> element type.
We could implement this as a CSS style - text decoration - but this wouldn't be totally cross-platform
underbar. An underbar. Indicates that a bar should be placed below all the text within this element. Contains 'simple text'.
In what way is this different from <ul>?
no special formatting is associated with the <underbar> element type.
Should aim to implement this as an underline - CSS text decoration again?
unknown. A feature in the text which cannot be encoded by any other element type in the DTD. Use the
type attribute to indicate the nature of the feature.
Do we need to generate some warning when this element is used?
Contains 'simple text'.
type: the type of 'unknown' information
rendered in a fixed-width font
KBD in HTML output, with a double break above and below
BR in HTML output
spacing from the source document is preserved in the rendered result
url. A URL. Contains character data only.
id: a unique identifier for this element
Probably not needed: drop from next version of DTD?
rendered as an anchor, with a target equal to the element's data content
A; HREF attribute in HTML output
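A sketch of the rendering described above, with the source markup followed by the approximate HTML result. The address is only an example, and the choice of link text in the output is an assumption, since only the target is specified here.

```xml
<!-- Source markup -->
<url>http://www.rsc.org/</url>

<!-- Approximate HTML output; the link text shown is assumed -->
<a href="http://www.rsc.org/">http://www.rsc.org/</a>
```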
value. RSC internal use only
The value of an index entry. Contains character data only.
no special formatting is associated with the <value> element type.
volume. RSC internal use only
One volume of a journal. Contains a
link, or the following elements in this order:
journalref
volumeno
date
issue (optional and repeatable)
id: a mandatory unique identifier for this element
no special formatting is associated with the <volume> element type.
volumeno. A journal volume number. Contains character data only.
When used within the <volume> element, this should be a 3-digit number with leading zeroes
Still true?
non-empty <volumeno> elements within
journalcit and
citation are rendered as bold
B in HTML output, and followed by ", " if they are not the last component of the citation
otherwise, no special formatting is associated with the <volumeno> element type.
volumeref. A reference to one volume of a journal. See
above for general guidance on creating cross-references.
Contains a
link, or the following elements in this order:
journalref (optional)
volumeno
date (optional)
issue (optional and repeatable)
no special formatting is associated with the <volumeref> element type.
warning. A warning. Contains 'simple text'.
no special formatting is associated with the <warning> element type.
Should be red text.
who. The identity of the person making an editorial note (
editnote). Contains 'simple text'.
Wouldn't it make more sense to have <person> in place of this element type - replace by <person> in next version of DTD.
no special formatting is associated with the <who> element type.
year. A 4-digit year. Contains character data only. The value "PENDING" is allowed for
date within
pubfront.
non-empty <year> elements within
journalcit are followed by ", " if they are not the last component of the citation
This also used to apply to <citation>, but no longer does.
published elements with type="print", and containing a <pubfront> with
year="PENDING", are rendered as the phrase "Publish PENDING" in red. If the <year> is empty, they are rendered as "Publish Pending", also in red
the <year> within
pubfront, from the
published element with type="print", is used in the copyright statement
otherwise, no special formatting is associated with the <year> element type.
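The "PENDING" case might be encoded along the following lines. This is a sketch under assumptions: the nesting of date directly inside pubfront is inferred from the description above, and any other children of published and pubfront are omitted.

```xml
<!-- Hypothetical example: an article whose print publication year is not yet known -->
<published type="print">
  <pubfront>
    <date><year>PENDING</year></date>
  </pubfront>
</published>
```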
Appendix B. Notations
Table. Notations recognized within the RSC application
Name | PUBLIC identifier where known
bmp | "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Microsoft Windows bitmap//EN"
(not given) | "ISO/IEC 10918:1993//NOTATION Digital Compression and Coding of Continuous-tone Still Images (JPEG)//EN"
mpeg1aud | "ISO/IEC 11172-3:1993//NOTATION Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 3: Audio//EN"
mpeg1vid | "ISO/IEC 11172-2:1993//NOTATION Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 2: Video//EN"
mpeg2aud | "ISO/IEC 13818-3:1995//NOTATION Coding of moving pictures and associated audio: Part 3. Audio//EN"
mpeg2vid | "ISO/IEC 13818-2:1995//NOTATION Information technology - Coding of moving pictures and associated audio: Part 2. Video//EN"
(not given) | "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Microsoft Windows Metafile//EN"
chemdraw |
eqn |
pdf |
ps |
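For orientation, notations such as those in the table above are declared in a DTD roughly as follows. This is a sketch rather than an extract from the RSC DTD: the system identifier given for pdf is a placeholder, chosen only because no PUBLIC identifier is listed for that notation.

```xml
<!-- A notation with a known PUBLIC identifier -->
<!NOTATION bmp PUBLIC "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Microsoft Windows bitmap//EN">

<!-- A notation without a PUBLIC identifier; the SYSTEM value here is a placeholder -->
<!NOTATION pdf SYSTEM "pdf">
```

An external entity for a graphic then names one of these notations after the NDATA keyword in its declaration.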
Appendix C. Changes to the RSC DTD
This Appendix lists the changes made to the RSC DTD from version 3.4 onwards.
Summary of changes in version 3.4
Version 3.4 of the RSC Article DTD is a maintenance release, which aims to solve problems encountered while encoding articles, and to provide the RSC with the opportunity to add improved management information to articles.
The following changes are relevant to the encoding of actual articles:
footnotes can now be added after an author's name (within
<author>, after
<person>)
the content models of a number of elements (
<footnote>,
<note>,
<head>,
<toc-entry>,
<ictext>,
<dedicate>,
<abstract>,
<dd> and
<ack>) have been extended so they can contain textual subelements as well as multiple paragraphs
all 'reference' elements (
<compoundref>,
<textref>,
<figref>,
<schemref>,
<plateref>, etc.) can now contain subelements which support font style changes
<persname> now has an optional
<qualifier> subelement that can appear at the start or end of the name. This allows titles, qualifications, and informal phrases such as 'the late' to be encoded
<ugraphic> now has a
src2 attribute to support the addition, specifically, of TeX versions of a graphic. This new attribute should no longer be used.
the RSC-specific entity set has been filled out with declarations for some commonly-required characters
some simple ISO entities have been added to the allowed character entity set
there is a new
<arttitle> element for encoding article titles within citations
within the
<published> element,
<volumeref>,
<issueref> and
<pubfront> are now optional
the content model for
<eqntext> has been changed to allow it to contain multiple paragraphs
the element type url has been added to the class 'general', which allows it to be used anywhere within text
the 'fixed' DTD version has been changed to '3.4'
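As a sketch of one of the changes listed above, a qualifier at the start of a name might be encoded as follows. The name is invented, and other name components are omitted because their element types are not described in this section.

```xml
<!-- Hypothetical example: qualifier placed at the start of the name -->
<persname>
  <qualifier>the late</qualifier>
  <surname>Smith</surname>
</persname>
```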
The following changes are only relevant to RSC's internal management procedures:
a new
<admin-event> has been added within
<art-admin>. This has a
type attribute, and subelements
<agent>,
<address> and
<date>. In addition, it can contain a nested
<admin-event>, thus supporting complex multi-level events if required. (In future, this element might be preferred to
<date> for encoding 'accepted' details.)
journalref, volumeref and issueref now have the same content model as journal, volume and issue respectively. This allows links to be replaced by the relevant content without invalidating the document
a
price-code attribute has been added to
<article>
<journal> now has an optional repeatable
<logo> element, containing a graphic
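The nested admin-event structure described above might be used as in the following sketch. The type values, the agent and address text, and the order of the subelements are assumptions made for illustration; the content of date is left as a placeholder comment because its content model is not given here.

```xml
<!-- Hypothetical example: a two-level administrative event -->
<admin-event type="received">
  <agent>Editorial office</agent>
  <address>Cambridge</address>
  <date><!-- date content as defined by the DTD --></date>
  <!-- A nested event, e.g. a later revision belonging to the same submission -->
  <admin-event type="revised">
    <date><!-- date content as defined by the DTD --></date>
  </admin-event>
</admin-event>
```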
Summary of changes in version 3.5
The following changes in version 3.5 will affect the encoding of articles:
the 'fixed' DTD version has been changed to '3.5'
<authgrp> within
<art-front> is now optional
<org> within
<aff> is now [optional and] repeatable;
<org> and
<address> are repeatable as a pair
<ack> now has an optional
title attribute
content model for
<trans> has been made the same as that for
<citation>.
N.B. this change is not upwards-compatible. The previous content model for trans allowed citext. This is replaced by the 'mixed content with %emph;' approach offered by %m.citation;
<email> has been added to the %gen; content model class, allowing email addresses to appear wherever this class is allowed (which is pretty well anywhere in textual content)
<url> and
<email> have been added to the %m.citation; content model class
<url> now has a
url attribute, which can be used to specify the url. If not used, the data content of the <url> element is taken to be the actual url, as before
there is a new
<a> element type, designed to support hyperlinks which use an image as the clickable link
there is a new
<subject> element type, which can contain a broad subject heading to categorise the article
within the content model for
<journalcit>,
<link> has been made into an 'optional extra', so that citations can be supported by e.g. a DOI and a COI
<link> now has a
type attribute, e.g. for COIs (for RSC internal use only) and DOIs
[usage convention only:] within
<suppinf>, the content of the
<link> element should now be 'INFO' or 'CRYSTAL'. 'INFO' corresponds to the single value that was previously allowed ('TRUE')
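Two of the version 3.5 changes above can be sketched as follows. The address and link text are examples only, and any attributes that link may require are omitted.

```xml
<!-- New style: the address is carried by the url attribute, leaving the content free for link text -->
<url url="http://www.rsc.org/">RSC home page</url>

<!-- Old style, still valid: the data content is itself the address -->
<url>http://www.rsc.org/</url>

<!-- Supplementary information: link content is now 'INFO' or 'CRYSTAL' -->
<suppinf><link>INFO</link></suppinf>
```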
The following changes are only relevant to RSC's internal management procedures:
<coden> element type added to header information
Summary of changes in version 3.6
This version contains the following changes:
the 'fixed' DTD version has been changed to '3.6'
the parameter entity a.dtd has been altered to RSCPA3.6, and is now actually used!
the new element type <a> is now actually allowed within a document
the new 'generated' set of entity declarations rsc_x.ent is used
element type no now has a content model of 'simple text' instead of just #PCDATA
the common attributes for graphics (%a.graphic;) now have a prefix attribute, which can have values 'prefix' (default) or 'noprefix'
year and pages are now optional within journalcit
additional arttitle element added between citauth and title
citext content model is now %m.simple-text-or-paras; to allow new paragraphs (or at least line breaks)
citation content model (%m.citation;) now includes pages
new element commentary added within citgroup
subsect2 and subsect3 can now have zero or more citref elements immediately after the section title
table cells can now contain paragraphs
Summary of changes in version 3.7
This version contains the following changes:
the 'fixed' DTD version has been changed to '3.7'
classification element type added to art-front after keyword; supported by separate DTD file class.dtd