Guidelines on the capture of RSC journal articles

Richard Light

doi:10.1039/RBL001

This is the old DTD documentation, not necessarily applicable to the Schema documentation.

Introduction

Scope of this document

These guidelines describe Version 3.7 of the RSC Primary Articles DTD.

Feedback and updates

We expect to learn a considerable amount about our developing XML application from the routine encoding of articles. Please let us know of any problems you encounter in using these instructions while trying to encode articles using the DTD provided. This will help us to improve both the application and its associated documentation.

We plan to issue updates to the DTD and documentation at regular, planned, intervals. You will be notified of these updates in advance, so that you can allocate resources to deal with any changes to data capture instructions or rendering software that might be required.

We intend to introduce the next version of the DTD in August 2000, with a preliminary version being available for comment and testing two weeks beforehand.

Format of this document

This document fulfils two functions. As well as containing instructions on the conventions to follow, it acts as an example of the results that are expected, being written to conform to the RSC Primary Articles DTD Version 3.7.

Since this document is fully XML-conformant, it can be browsed in Internet Explorer 5.0, or converted to HTML, using the XSLT style sheet provided.

Scope of the data capture work

The initial objective is to capture all the text within each article which can be encoded in SGML/XML (see next section). The DOCTYPE and document element will always be <article>. Within this, the <art-admin> (which holds the article's unique manuscript number), <published> (for articles which have already appeared in print), <art-front>, <art-body> and <art-back> element types will be routinely used, with an occasional <appmat>.

SGML/XML encoding

As far as possible, all the information in the articles presented should be encoded in SGML and included in the resulting document. Obvious exceptions are figures, which should be referenced as external entities in the standard manner (see Graphics below).

Both tables and equations are liable to be more difficult. If possible, these should be encoded in SGML, but we accept that there are liable to be cases where this is not feasible (or even possible) due to the complexity of the data or inadequacies in the DTD as currently drafted. In these cases the relevant object should be treated as a graphic. A particular example is where a table contains graphics spanned across rows or columns - this would be impossible to render accurately from the SGML. See Tables and Equations below for specific guidelines.

Articles should conform to XML as well as SGML conventions. This means that:

an XML declaration must be provided at the start of the article
processing instructions must be terminated by "?>"
empty elements must be terminated by "/>" (i.e. for colspec, ugraphic, icgraphic)
end-tags should always be provided, except for empty elements
element and attribute names must be entered in lower case, as per their definitions in the DTD
attribute values should always be quoted

A variety of tools can (and should) be used to check that articles consist of valid SGML/XML. The nsgmls program will check for SGML conformance. There is a wide variety of free or inexpensive XML-aware software. For example, if you open an XML document in Internet Explorer 5, its built-in XML parser will check the document for validity and report any errors.
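Some of the conventions listed above can be checked mechanically before a full validating parse. The following is a minimal sketch in Python, using only the standard library. The function name check_article is hypothetical, and the standard-library parser checks well-formedness only, not validity against the DTD.

```python
import xml.etree.ElementTree as ET


def check_article(xml_text):
    """Return a list of problems; an empty list means these basic checks passed."""
    # The XML declaration must be the very first thing in the file.
    if not xml_text.startswith("<?xml"):
        return ["missing XML declaration at the start of the article"]
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Covers missing end-tags, unquoted attribute values, unterminated
        # processing instructions and empty elements not closed with "/>".
        return [f"not well-formed: {err}"]
    problems = []
    for elem in root.iter():
        if elem.tag != elem.tag.lower():
            problems.append(f"element name not in lower case: {elem.tag}")
        for name in elem.attrib:
            if name != name.lower():
                problems.append(f"attribute name not in lower case: {name}")
    return problems
```

An empty result does not show that an article is valid. Entity references that are declared only in the external DTD are likely to be reported as errors by this sketch, so a validating parser is still needed for real articles.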

File naming conventions

All manuscripts will have a unique identifier, assigned by RSC, e.g. a901234h. As well as being used to name the file containing the encoded article, this identifier will be encoded as the <ms-id> element within the article.

The RSC will name graphics files as follows:

a901234h-f1.tif (for figure 1)
a901234h-s2.tif (for scheme 2)
a901234h-u1.tif (for ugraphic 1)

Graphic types: (from RSC) RK: We might be better splitting this into data capture sections and final supply sections. At data capture we will also require maths captured as TeX, as e.g. b000114c-t1.tex

f1, 2, 3.. figures
s1, 2, 3.. schemes
u1, 2, 3.. ugraphics
ga graphical abstract
c1, 2, .. charts

The following filename styles should be supplied to the RSC:

for ms-id use the form a908765g
for the SGML/XML files (and PDF in the future) use the filenames in the form a908765g.xml, .pdf
for graphics generated at data capture/typesetting (maths, table images where required) use the form a908765g-t1.tif, and increment the numbers as t2, t3, t4, etc through the document.

Lower-case should be used.
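The naming conventions above can be sketched as a single pattern. This is an illustration only: the shape assumed for a manuscript identifier (one letter, six digits, one letter) is inferred from the examples in this document rather than stated by RSC, and the function name parse_filename is hypothetical.

```python
import re

# Kind codes taken from the lists above; "t" marks graphics generated at
# data capture/typesetting.
KINDS = {"f": "figure", "s": "scheme", "u": "ugraphic", "c": "chart",
         "t": "typesetter graphic"}

# Assumption: a manuscript identifier is a letter, six digits and a letter.
NAME = re.compile(
    r"(?P<msid>[a-z][0-9]{6}[a-z])"
    r"(?:-(?:(?P<ga>ga)|(?P<kind>[fsuct])(?P<num>[1-9][0-9]*)))?"
    r"\.(?P<ext>[a-z]+)"
)


def parse_filename(name):
    """Split a filename into (ms-id, kind, number, extension), or return None."""
    match = NAME.fullmatch(name)
    if match is None:
        return None  # includes any name containing upper-case letters
    if match.group("ga"):
        kind, number = "graphical abstract", None
    elif match.group("kind"):
        kind, number = KINDS[match.group("kind")], int(match.group("num"))
    else:
        kind, number = "article", None
    return match.group("msid"), kind, number, match.group("ext")
```

For example, parse_filename("a901234h-f1.tif") identifies figure 1 of manuscript a901234h, while a name containing upper-case letters is rejected.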

File Delivery

We require, for each paper:

An SGML/XML file named as *.xml: File width - max width 1000 characters
Any graphics created for inline/displayed maths, as LZW-compressed TIFF files at 600 dpi. The SGML/XML file will require call-outs to these images. The images should be named as specified above (e.g. a901234h-t1.tif). Should we say any more about graphics formats, for figures, colour, other resolutions etc.?
The images as supplied by the RSC (figures, schemes, etc)
"also TeX" [RK]

In other words, all relevant files should be supplied. Each document and its associated files should be delivered as a zip file, named as above (e.g. a901234h.zip).

Form of PUBLIC identifiers

PUBLIC identifiers should be used throughout.

In addition, each PUBLIC identifier should be followed by a SYSTEM identifier giving a URL that locates the resource in question. This belt and braces strategy will allow articles to be treated as valid XML (XML requires a SYSTEM identifier), while offering us the flexibility of using SGML-aware software to interpret the PUBLIC identifiers in different ways, as necessary.

Thus the DOCTYPE declaration at the head of each article should always take the form:

<!DOCTYPE article PUBLIC "-//RSC//DTD RSC Primary Article DTD 3.7//EN" "http://www.rsc.org/dtds/rscart37.dtd">

PUBLIC identifiers should be constructed using the general format:

"RSC// [MS number] [object src]"

where the object src is the element type with number:

ugt (ugt1)
fig (fig3)
sch (sch2)
pl (pl13)
cht (cht1)
etc.

e.g.

"RSC// a706828h eqn3"

The names assigned within each article for the external entities it references should reflect the last component of the entity's PUBLIC identifier, e.g.

<!ENTITY eqn3 PUBLIC "RSC// a706828h eqn3" ...

Form of SYSTEM identifiers

The SYSTEM identifiers (i.e. filenames) assigned to each external entity should consist of the article's manuscript number followed by the entity's name, with a suitable suffix, e.g.:

<!ENTITY ugt3 PUBLIC "RSC// a706828h ugt3" "a706828h-t3.tif" NDATA tiff>
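The pattern for external entity declarations can be sketched as a small helper. The mapping from entity-name prefix to file-name code is an assumption: "ugt" to "t" and "ugr" to "u" follow the declarations quoted in this document, while the other entries are inferred from the graphics file naming list. The function name entity_declaration is hypothetical.

```python
# Assumed mapping from entity-name prefix to the file-name code.
FILE_CODES = {"ugt": "t", "ugr": "u", "fig": "f", "sch": "s", "cht": "c"}


def entity_declaration(ms_id, prefix, number, suffix="tif", notation="tiff"):
    """Build an external entity declaration with PUBLIC and SYSTEM identifiers."""
    if prefix not in FILE_CODES:
        raise ValueError(f"no file-name code known for prefix {prefix!r}")
    name = f"{prefix}{number}"
    public_id = f"RSC// {ms_id} {name}"
    system_id = f"{ms_id}-{FILE_CODES[prefix]}{number}.{suffix}"
    return f'<!ENTITY {name} PUBLIC "{public_id}" "{system_id}" NDATA {notation}>'
```

Calling entity_declaration("a706828h", "ugt", 3) reproduces the declaration shown above. Prefixes with no confirmed file-name code (for example pl) raise an error rather than guessing.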

Documents relating to the RSC DTD

The DTD itself is in the file rscart36.dtd. A number of other files are required before documents will parse against the DTD. They should all be stored in the same directory as the DTD itself, apart from the entities files (*.ent) which should be stored in a subdirectory named entities. We use Internet Explorer 5 as our (XML) parser. We suggest suppliers use the same parser.

SGML Declaration

An SGML Declaration suitable for use with this DTD is in the file rscxml33.dcl. This Declaration allows an XML-encoded article to be processed by SGML software. It specifies features such as case-sensitivity for element and attribute names, quoting of attribute values, XML-style processing instructions and empty element syntax, and Unicode support.

Catalog file

The catalog file rscart3s.cat is in the standard OASIS catalog file format. It resolves all the PUBLIC identifiers declared in the DTD, as well as the PUBLIC identifier of the DTD itself. This catalog file invokes the SGML version of the DTD, rather than the XML version. It uses the file rscsgm36.dtd to set up the DTD's parameter entities for SGML. If required, an updated rscart3s.cat can be used to override the DTD's online SYSTEM identifier and point instead to a local copy.

Table support

The file calstab1.dtd contains the OASIS-supported DTD fragment which supports the interoperable CALS table model subset. Additions and changes to this model are declared in the body of the DTD itself, not here.

Entity declarations

Two files containing character entities are provided. One of these contains mappings of characters to numeric values that conform to Unicode 2.0 (rsc_x.ent). This is for use with the default XML interpretation of the DTD. It should be noted that we plan to use Unicode Combining Characters to partially solve the problem of 'one character over another'. This means that rendering software will need to support Combining Characters, ideally in a generalized manner.

The other file maps exactly the same characters to SDATA entities, and is for use with the SGML interpretation of the DTD (rsc_s.ent).

If an article contains any characters which are not in the RSC set, they should be declared in the article's internal DTD subset and RSC should be alerted to the need to add them to the standard set.

Character mappings file. RSC maintains information about special characters in a character mappings file (charmaps.xml). The entity declarations described above are generated from this file by XSLT style sheets. Characters in this file are categorized into one of the following classes:

non-ASCII character
ASCII character
ASCII diacritic
ligature
non-ASCII diacritic
combining
RSC character

These categories help to ensure that each character is mapped to the most appropriate result when different types of output encoding are generated:

ASCII: This involves:

mapping all diacritical characters [which fall outside the ASCII 255-character range] to the corresponding single letter
mapping all ligatures [which fall outside the ASCII 255-character range] to the corresponding pair of letters
mapping all other characters which fall outside the ASCII 255-character range to one or more ASCII characters, if there is an ASCII equivalent (e.g. —)
suppressing all combining characters
suppressing all other characters which fall outside the ASCII 255-character range

Unicode: This involves:

suppressing all combining characters
suppressing all RSC 'special use' characters which fall outside the Unicode standard character range

HTML: This involves:

mapping all letters plus combining characters to a suitable image file
mapping all RSC 'special use' characters which fall outside the Unicode standard character range to a suitable image file
mapping all Unicode characters which don't display in browsers to a suitable image file

XML: This involves:

outputting all non-ASCII characters as entity references, using the <name> element from charmaps.xml
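The ASCII output rules above might be sketched as follows. This is not the RSC implementation, which is generated from charmaps.xml by XSLT. It uses Unicode compatibility decomposition as a stand-in for the character classes, takes the "255-character range" to mean code points below 256, and uses a small hypothetical EXTRA table for characters that do not decompose.

```python
import unicodedata

LIMIT = 256  # assumption: the "255-character range" means code points below 256

# Hypothetical explicit mappings for characters that do not decompose.
EXTRA = {"\u0153": "oe", "\u2014": "--"}


def to_ascii(text):
    """Apply the ASCII output rules: map, or failing that suppress, characters."""
    out = []
    for ch in text:
        if ord(ch) < LIMIT:
            out.append(ch)  # already inside the range: keep as it is
        elif ch in EXTRA:
            out.append(EXTRA[ch])  # ligature or character with an equivalent
        else:
            # Decompose letters with diacritics and ligatures, then drop the
            # combining marks and anything else still outside the range.
            for piece in unicodedata.normalize("NFKD", ch):
                if ord(piece) < LIMIT and not unicodedata.combining(piece):
                    out.append(piece)
    return "".join(out)
```

With these assumptions a letter with a diacritic outside the range is reduced to its base letter, a ligature becomes its component letters, and a Greek letter with no equivalent is suppressed.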

General conventions

Guidelines

Style guidelines. The style guidelines for each journal describe general conventions for article structure. Use these as a guide to the structure and content of articles.

In particular, while encoding articles these guidelines should be used to infer when a change of type style (e.g. to bold) implies a specific element type, as discussed below under Cross-references.

Semantics of the table model. The table model used is developed from the interoperable CALS table model subset supported by OASIS ^1a-b. The OASIS web site contains a description of the generic CALS table model ^1a, and a description of the semantics of this interoperable subset ^1b.

Version 3.7 of the DTD simplifies the level of CALS table support that is required by removing the <spanspec> element type (which is not part of the interoperable subset). This element has been found to be unnecessary, since both horizontal and vertical spans within tables can be represented without it. (<colspec> provides all the information that is required for horizontal spanning, while the MOREROWS attribute supports vertical spans.) Version 3.7 also adds support for rotated tables by including the ORIENT attribute, which can be set to "land" to indicate a landscape, i.e. rotated, table.

Article structure

Each article consists of front matter, body matter and back matter.

The article itself can have a type attribute, which specifies what type of article it is. This table summarises the codes to be used for each type of article, and the types of article that are currently liable to appear in each journal published by the RSC. (See below for a key to the journal codes in this table.)

Table -arttypes Article type codes and usage

Article type	Code	PO	EM	GC	DT	JM	P1	P2	JC	CC	FT	AN	AC	JA	MC	FD	NP	CS	IC/OC/PC	NJ	RC	QU	CE	GT
Papers	ART		X	X	X	X	X	X			X	X		X		X				X		X	X	X
Comms	COM			X	X	X	X	X		X	X		X	X	X
Perspectives	PER				X							X
Letters	LET			X	X															X		X		X
Feature Articles	FEA	X	X			X				X
Editorial	EDI	X		X	X	X	X	X		X		X	X							X		X	X	X
Synopsis	SYN								X
Full text	ART								X
Research Articles	RES										X
Discussions	DIS															X
Review Articles	REV	X		X			X					X	X	X			X	X	X		X			X
Book Reviews	BKR	X		X										X			X
News	NWS	X	X	X								X		X
News articles	NAR			X
Highlights	HIG			X								X	X										X
Interviews	INT													X
Technical note	TEC		X
Events/Conference Diary	CNF			X
Conference reports	CRP	X																					X
Synthetic abstract	SAB						X
Cover Feature	COV						X
Focus	FOC		X	X
Viewpoints	VPT		X	X
Invited Lecture	LEC										X
Keynote Article	KEY							X
Hot off the Press Articles	HOT																X
Atomic Spectrometry Update	ASU													X
Analytical Methods Committee	AMS											X
Inter-laboratory Note	ILN													X
Critical Review	CRV											X
Tutorial Review	TRV											X
Glow Discharge Paper	GDP													X
Glow Discharge Comm	GDC													X
Glow Discharge Review	GDR													X
Glow Discharge News Article	GDN													X
Glow Discharge Technical Note	GDT													X

Front matter

The front matter consists of <art-admin>, which holds the article's unique manuscript number, <published>, which contains details of the journal, volume, issue in which the article has been printed and the relevant pagination details, and <art-front>, which is the front matter proper.

For accepted date, use:

<date role="accepted"><year>1999</year><month>April</month><day>23</day></date>

For the date on which a revised version of an article was issued, use the same date format, with role="revised".
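A date element in this form could be generated as in the sketch below. It assumes, from the single example above, that the month is spelled out in English and that the day carries no leading zero. The function name date_element is hypothetical.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def date_element(day, role):
    """Build a <date> element with the given role, e.g. "accepted" or "revised"."""
    return (
        f'<date role="{role}"><year>{day.year}</year>'
        f"<month>{MONTHS[day.month - 1]}</month><day>{day.day}</day></date>"
    )
```

Calling date_element(date(1999, 4, 23), "accepted") reproduces the accepted-date example above.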

Authors - we would like the corresponding author to be identified. There's no need to mark the others as 'princ'.

For affiliations, use <org><orgname></orgname></org> for the organisation and <address></address> for the address. Although the <org> group does contain its own <address> element, this shouldn't be used for encoding the articles. We don't require any org ids.

The <published> element should be set with the attribute type="print", along with the journal code — the other pubfront subelements should be left blank.

<published type="print"> <journalref><link>GC</link></journalref> <volumeref><link>001 </link></volumeref> <issueref><link>unknown</link></issueref> <pubfront><fpage></fpage><no-of-pages></no-of-pages> <date><year></year></date> </pubfront> </published>

Body matter

The body of each article consists of an <art-body>, containing one or more <section>s.

These are the top-level structural units within each article: lower levels are represented by <subsect1>, <subsect2>, etc. (N.B. the numbering of section-level element names represents their depth of nesting, not repetition.)

Care should be taken to ensure that the structure of the article, implied by the style of headings, is correctly reflected in the <section> and <subsectN> elements assigned. See <title> for details of heading typestyles.

Appendices

Any appendices to an article are placed within an <appmat> element, between the <art-body> and <art-back> elements. This contains one or more <appendix> elements, each optionally numbered and containing one or more <section>s.

Back matter

The back matter contains an optional <ack> element. This is followed by mandatory <biblist> and <compoundgrp> elements.

This last element is provided as a place to collect together <compound> elements, each of which defines the ID of a chemical compound mentioned in the article, and thus to provide a target for <compoundref> cross-references (which are normally set in bold face: see Cross-references). (The ultimate intention, not to be implemented at this stage, is to provide links back from these <compound> elements to the points in the article where the compound is defined or illustrated.)

Graphics

Graphical objects should be declared as external entities, with a suitable Notation. The RSC application provides a comprehensive set of possible notations, which ought to include all the image formats encountered. Let us know if any new image formats are encountered.

External entity declarations should include PUBLIC identifiers as well as SYSTEM identifiers, e.g.

<!ENTITY ugr1 PUBLIC "RSC// a904043i ugr1" "a904043i-u1.tif" NDATA tiff>

Graphics take the following attributes:

ID: a unique ID for this graphic (see notes below on assigning IDs) (required)
src: the entity which contains the graphic (see notes above on external entities)
height:
width:
pos: "float" for floating graphics: otherwise "fixed". "float" should be used for graphics marked as "A" blocks, while "fixed" should be used for "B" blocks and for graphics appearing in the body of the text. Graphics appearing within tables, equations should be assumed to be fixed.

Chemical formulae, equations, symbols for which no character entity is provided in the DTD and tables which are too complex to encode as XML should all be encoded as a <ugraphic> element. As well as the standard attributes for graphics, this has a displayed attribute which can take the value "displayed" (which indicates that the graphic should be set off from the surrounding text) or "inline" (which means that the graphic should form part of the current line).

Assigning unique id's

In order to make id's unique within each article, a prefix should be added to the identifier assigned by the author:

Table 1 id prefixes for different classes of target

Class of target	Prefix
author affiliation	aff
chart	cht
chemical compound	chem
citation	cit
equation	eqn
figure	fig
footnote	fn
plate	pl
scheme	sch
table	tab
table footnote	tab + fn ¹
untitled graphic	ugr
typesetter-generated graphic (e.g. equations and tables which cannot be encoded in SGML/XML)	ug

¹ Table footnotes should be given an id which is a combination of the table's id and a unique id for the footnote within that table, e.g. tab2fna. Table footnotes should be given letters (a, b, c, etc).

Thus, for example, a citation referred to in the paper as ^8a should be given the id cit8a, while chemical compound 8•a should acquire the id chem8a.

We are using the number or letter in some of the id's to generate some of the numbering within the HTML article: for affiliations, equation numbering and table footnote lettering the aff, eqn, table fn should all be given the literal values that would appear in the text, e.g. affa, affb; eqn1, eqn2; tab1fna, tab3fnc. For the remaining id's a unique number or letter will be sufficient.
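The id rules can be sketched as a lookup plus a clean-up step. The clean-up rule (dropping everything except ASCII letters and digits from the author's label) is an assumption based on the examples given here, and the function names are hypothetical.

```python
import re

PREFIXES = {
    "author affiliation": "aff", "chart": "cht", "chemical compound": "chem",
    "citation": "cit", "equation": "eqn", "figure": "fig", "footnote": "fn",
    "plate": "pl", "scheme": "sch", "table": "tab", "untitled graphic": "ugr",
    "typesetter-generated graphic": "ug",
}


def make_id(target_class, label):
    """Prefix the author's label so that the id is unique within the article."""
    # Assumption: separators such as the dot in a compound number are dropped.
    cleaned = re.sub(r"[^0-9A-Za-z]", "", label)
    if not cleaned:
        raise ValueError("label contains no letters or digits")
    return PREFIXES[target_class] + cleaned


def table_footnote_id(table_label, letter):
    """Combine the table's id with "fn" and the footnote letter."""
    return make_id("table", table_label) + "fn" + letter
```

For example, make_id("citation", "8a") gives cit8a and table_footnote_id("2", "a") gives tab2fna.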

Links and cross-references

Internal cross-references within an article should use the standard SGML/XML ID-IDREF mechanism. To enforce this, we have specified as #REQUIRED the id attribute for all the elements that cross-references might point to. It is not practicable to do the same for pointer elements, since their target is not always present. To allow for this, the idrefs attribute is not mandatory. Instead, a presence attribute is provided. When a linking element has no target, this attribute should always be specified, with the value presence="missing".

This table summarises the element types which indicate cross-references, and the target element type for each.

Table 3 Mapping of cross-reference element types to target element types

Cross-reference element type	Target element type
<compoundref>	<compound>
<textref>	any textual element with an ID attribute
<figref>	<figure>
<schemref>	<scheme>
<plateref>	<plate>
<chartref>	<chart>
<eqnref>	<equation>
<boxref>	<box>
<tableref>	<table-entry>
<citref>	<citgroup>
<fnoteref>	<footnote>
<affref>	<aff>

One specific point to note is that <citref> does not point to a <citation> or <journalcit> element: instead it points to <citgroup>. This design allows any number of citations to occur within a single numbered or sub-numbered part of a References list.
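The ID-IDREF rules can be checked with a short script. The sketch below assumes the cross-reference element names in the table above, and checks only that each target id exists, not that it is of the expected element type. The function name unresolved_links is hypothetical.

```python
import xml.etree.ElementTree as ET

# Cross-reference element types, as listed in the table above.
REF_ELEMENTS = {
    "compoundref", "textref", "figref", "schemref", "plateref", "chartref",
    "eqnref", "boxref", "tableref", "citref", "fnoteref", "affref",
}


def unresolved_links(xml_text):
    """List cross-references that have no target and are not flagged as missing."""
    root = ET.fromstring(xml_text)
    known_ids = {elem.get("id") for elem in root.iter() if elem.get("id")}
    problems = []
    for elem in root.iter():
        if elem.tag not in REF_ELEMENTS:
            continue
        idrefs = elem.get("idrefs")
        if idrefs is None:
            if elem.get("presence") != "missing":
                problems.append(
                    f'<{elem.tag}> has neither idrefs nor presence="missing"'
                )
            continue
        for target in idrefs.split():
            if target not in known_ids:
                problems.append(f"<{elem.tag}> points to unknown id {target}")
    return problems
```

Elements such as <journalref> are deliberately ignored, since they are not cross-reference element types in the sense used here.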

In the unlikely event that an external link to another article (also encoded in SGML/XML) needs to be made, the general-purpose <link> element type is provided. This implements the Text Encoding Initiative (TEI) Extended Pointer mechanism, which allows all or part of a document to become the target of a link. It is anticipated that only the ID-based part of the TEI Extended Pointer syntax would be required in practice. Do not use the <link> element without checking with RSC first. The linking strategy described here is likely to be reviewed once the W3C's XLink proposal reaches Recommendation status.

Recognising cross-references. This table summarises typographical conventions which are often used to represent various types of cross-reference. Where a change of font style indicates such a cross-reference, it should always be marked up as such. In such cases, the cross-reference should not also be marked up as a change of font style.

Table 4

type style	data type	cross-reference type
superscript	arabic no. [+ letter suffix]	citref
superscript	letter	affref
superscript	symbol	fnoteref
bold	numbers, letters, roman numerals	compoundref

Numbering

For the present, numbers should be included in the <no> element if they are required. In the longer term, we plan to support the auto-numbering of sections by the addition of a single attribute. Once this is in place, it will no longer be necessary to number sections specifically.

There is no need (and no opportunity!) to number figures, schemes, boxes or plates. Suitable prefixes and numbers (e.g. "Fig 1.") will be supplied by style sheets. Other concepts (e.g. citations, equations, appendices, and chemical compounds) have an optional <no> element. This does not need to be used where the numbering scheme follows a simple sequence of arabic numbers, since the entries will be auto-numbered in this case. If any instance of a given element type has a non-standard number within an article, then the <no> element should be specified for all instances of that element type.
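The rule for when <no> is needed can be stated as a one-line test. This sketch assumes the labels are given in document order as strings; the function name needs_explicit_numbers is hypothetical.

```python
def needs_explicit_numbers(labels):
    """True if <no> should be given for every instance of this element type."""
    # Auto-numbering covers only the plain sequence 1, 2, 3, ...
    expected = [str(i) for i in range(1, len(labels) + 1)]
    return list(labels) != expected
```

A single non-standard label, such as 2a, means that every instance of that element type needs a <no> element.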

However, all of these concepts are allowed to have an ID, and some require one — these IDs still need to be specified even if the title or heading itself can be auto-numbered. We can't (yet) auto-number tables in appendices, which require numbers in the form A1, A2, etc.

Low-level elements

Emphasis and font style elements

Changes in font style should be marked up with the appropriate emphasis tags unless they indicate a specific concept, as discussed above under Cross-references.

Individual elements can be used to mark bold text, italic text, bold italic, underlined text, SMALL CAPS, ^superscript and _subscript. They can also be used in combination to represent, for example, ^{superscript bold text}.

Footnotes. Footnotes should be placed just after the first <fnoteref>.

All footnote characters should be auto-generated. In text, they follow the order:

dagger (†)
double dagger (‡)
section mark (§)
paragraph mark (¶)
double vertical line (‖)
double asterisk (**)
2 daggers (††)
2 double daggers (‡‡)
2 section marks (§§)
2 paragraph marks (¶¶)
2 double vertical lines (‖‖)
3 asterisks (***)
3 daggers (†††), etc.

In table footnotes, they just appear as a, b, c, d, etc, where these letters are taken from the end of the id attribute's value.
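The in-text order above can be read as a cycle of six marks that is repeated once more on each pass, with the asterisk always carrying one extra repeat. That reading is an assumption drawn from the list, and the function name footnote_symbol is hypothetical.

```python
# Dagger, double dagger, section mark, paragraph mark, double vertical line,
# asterisk.
BASE = ["\u2020", "\u2021", "\u00a7", "\u00b6", "\u2016", "*"]


def footnote_symbol(n):
    """Return the mark for the n-th footnote in the text (n starts at 1)."""
    if n < 1:
        raise ValueError("footnote numbers start at 1")
    cycle, position = divmod(n - 1, len(BASE))
    symbol = BASE[position]
    repeats = cycle + 1
    if symbol == "*":
        repeats += 1  # the list starts with a double asterisk, then three
    return symbol * repeats
```

Under this reading the sixth footnote is marked with a double asterisk, the seventh with two daggers and the twelfth with three asterisks.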

Text. Spacing:

Equation spacing: +, minus, divide and times are spaced on either side when in an equation. There is spacing around the mathematical character when it is between two digits, e.g. 4 + 4; when it is just the character and one digit there is no space, e.g. +4. This also applies to proportional to, plus-minus, similar to, approx. equal to, >, < and their >= variants.

multiple citrefs shouldn't be spaced: <citref idrefs="cit1 cit4 cit5 cit12">1, 4, 5, 12</citref> should be: <citref idrefs="cit1 cit4 cit5 cit12">1,4,5,12</citref>
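The citref spacing rule lends itself to an automatic clean-up. The sketch below works on the markup as text with a regular expression and assumes that <citref> elements are not nested. The function name tighten_citrefs is hypothetical.

```python
import re

CITREF = re.compile(r"(<citref\b[^>]*>)(.*?)(</citref>)", re.S)


def tighten_citrefs(markup):
    """Remove the spaces around commas inside each <citref> element."""
    def fix(match):
        start_tag, content, end_tag = match.groups()
        return start_tag + re.sub(r"\s*,\s*", ",", content) + end_tag
    return CITREF.sub(fix, markup)
```

Text outside <citref> elements, including commas in running prose, is left unchanged.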

Figure, scheme, etc references should be placed at the end of the paragraph in which they are first referenced.

<p> in titles is to be used for the Green Chemistry font change. The second <p> of a GC title will contain the smaller title content. Simple titles don't need to use <p> at all.

For elements where the content model is empty (ugraphic, colspec, icgraphic) the elements need a closing solidus for XML: <colspec colname="1" colwidth="2.82*" align="left"/>

Compoundrefs: these can take any form, and the displayed text doesn't have to match the id exactly, e.g. <compoundref idrefs="chem61a">6·1a</compoundref>

Tables

Tables will normally appear inline, marked up according to CALS-compatible SGML. The standard CALS attributes should be used to render the table in a form that is as close as possible to the printed result. This includes, but is not limited to, the relative widths of columns, spanning of rows and columns, and the use of lines to separate headings. The specific conventions listed below are intended to be compatible with the approach supported by Adept's table editor:

relative column widths: use <colspec> with colname="n" and colwidth="X.XX*", where X.XX is a ratio of 1.00, the default column width;
individual cells ( <entry> element type) refer to their colspec with colname="n"
spanning: horizontal spans use namest="n" nameend="m" within <entry>; vertical spans use morerows="n" within <entry>
horizontal alignment: align="center", "right", "justify", "char" and "left" (default) within <entry>. When the align attribute is not specified for <entry>, the value in the appropriate <colspec> element will be used as a fallback
vertical alignment: valign="top", "middle", and "bottom" (default - !!) within <entry>. When the valign attribute is not specified for <entry>, the value in its parent <row> element will be used as a fallback, and failing that the value in the <row>'s own parent (<thead>, <tfoot> or <tbody>)
rules: in general, do not mark up ruler lines within tables. Default style rules will insert a rule below headings which span more than one cell. If absolutely necessary, use standard CALS conventions, i.e. rowsep="0" for e.g. bottom rule (?); "1" for vertical rule (?); ... and flag this as an exception
N.B. overall table width, row shading and non-standard row heights (other than spans) are recorded by Adept as processing instructions, and so are not encoded in the SGML
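The column-width and spanning conventions above can be sketched as two helpers that emit the markup as text. They assume that column names are simply the column numbers, as in the examples in this document, and that the cell text is already valid markup. The function names are hypothetical.

```python
def colspecs(widths, default_width, align="left"):
    """Build one <colspec> per column, with widths relative to the default."""
    specs = []
    for number, width in enumerate(widths, start=1):
        ratio = width / default_width  # 1.00 means the default column width
        specs.append(
            f'<colspec colname="{number}" colwidth="{ratio:.2f}*" align="{align}"/>'
        )
    return specs


def spanning_entry(text, first_col, last_col, morerows=0):
    """Build an <entry> spanning columns first_col..last_col and extra rows."""
    attrs = f'namest="{first_col}" nameend="{last_col}"'
    if morerows:
        attrs += f' morerows="{morerows}"'
    return f"<entry {attrs}>{text}</entry>"
```

For example, a column 56.4 units wide against a default of 20 units gets colwidth="2.82*".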

However, tables will sometimes be too complex to represent in this way, and so will be prepared as a graphic. To deal with this variation, a 'cover element' <table-entry> is provided, which contains either an inline <table> entry or a <ugraphic>. It is <table-entry> which requires a unique ID for <tableref> elements to point to, and which contains a <title> element.

One side-effect of this approach is that un-numbered tables can simply be encoded as <table>. From version 3.3 onwards, <table> can appear within text and between paragraphs.

Chemistry

Chemical compounds and simple formulae can often be represented as inline markup. <sup> and <inf> can be used to shift text, and <overbar> and <underbar> to place rules above or below chemical symbols. The character entity sets provided as part of the DTD (especially the ISO Chemistry set and the custom RSC set) support most chemical symbols that will be encountered. The <stack> element type can be used to encode the situation where one character appears directly above another.

Where chemical formulae are too complex to render as inline SGML, an inline or displayed <ugraphic> should be used instead.

Equations

Equations may appear inline, marked up in SGML using the tools available such as <fraction>, e.g. ¹/₃.

However, equations will frequently be too complex to represent in this way, and so will be prepared as a graphic. To deal with this variation, a 'cover element' <equation> is provided, which contains either an inline <eqntext> entry or a <ugraphic>. <equation> requires a unique ID for <eqnref> elements to point to.

Multi-line text equations can be accommodated by adding another <p>. Within <eqntext>, you should either have no <p> subelements (one-line or inline equations), or nothing but <p> subelements (multi-line equations).
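The rule for <p> within <eqntext> can be checked as in the sketch below, which takes an already parsed element. It treats whitespace between <p> elements as harmless, which is an assumption. The function name eqntext_ok is hypothetical.

```python
import xml.etree.ElementTree as ET


def eqntext_ok(eqntext):
    """True if <eqntext> has no <p> children, or nothing but <p> children."""
    children = list(eqntext)
    p_children = [child for child in children if child.tag == "p"]
    if not p_children:
        return True  # one-line or inline equation
    if len(p_children) != len(children):
        return False  # <p> mixed with other elements
    # Multi-line equation: no text is allowed outside the <p> elements.
    stray = [eqntext.text] + [child.tail for child in children]
    return not any(text and text.strip() for text in stray)
```

An element can be obtained for testing with, for example, ET.fromstring("<eqntext><p>a = b</p><p>c = d</p></eqntext>").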

Citations

Where citations follow the standard pattern for journal articles, the <journalcit> element type should be used. In all other cases (including 'difficult' journal article citations, books, theses, computer software, etc.), the more flexible <citation> element type should be used. <citext> should be used to mark up text within the References section which is not a citation of any kind.

Numbering citations. As noted above in Links and cross-references, the citation number is a property of the enclosing <citgroup> element, not the citation itself. This makes it easy to deal with the case where more than one citation is given under the same reference number. It also allows running text to be mixed with, or indeed take the place of, proper citations.

Note that the expected pattern for numbering citations is to use numbers for top-level entries, and letters for sub-entries. If the citations follow this pattern, the <no> element should not be provided for any <citgroup> element. Instead, nested <citgroup> elements should be used to represent the lower-level citations. (See the source SGML of these instructions for an example of this technique.)

Standard journal citations. Standard journal citations follow this model:

author (at least one)
optional article title
[journal] title
year
volume number
issue number
first page or page range
translation (optional)

Unless stated otherwise, each element should appear exactly once, and elements should appear in the order given. In such cases, <journalcit> can and should be used. The citation should be entered as a series of analysed subelements. No punctuation should be recorded between each component of the citation, and no style markup (e.g. italic for titles; bold for volume numbers) should be included. Punctuation and styling will be applied by the rendering process. Thus the citation:

G.H. Jonker and J.H. Van Santen, Physica, 1950, 16, 337

should be encoded:

<journalcit><citauth><fname>G. H.</fname><surname>Jonker</surname></citauth> <citauth> <fname>J. H.</fname><surname>Van Santen</surname></citauth> <title>Physica</title><year>1950</year><volumeno> 16</volumeno> <pages><fpage>337</fpage></pages></journalcit>
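A standard journal citation of this shape could be generated from its analysed components, as in the sketch below. It covers only the components used in the example above (authors, journal title, year, volume and first page); the optional article title, issue number, page range and translation are left out because their element names are not shown here. The function name journalcit is hypothetical.

```python
from xml.sax.saxutils import escape


def journalcit(authors, title, year, volume, first_page):
    """Build a <journalcit>; authors is a list of (forenames, surname) pairs."""
    parts = ["<journalcit>"]
    for fname, surname in authors:
        parts.append(
            f"<citauth><fname>{escape(fname)}</fname>"
            f"<surname>{escape(surname)}</surname></citauth>"
        )
    # No punctuation or style markup: the rendering process supplies both.
    parts.append(f"<title>{escape(title)}</title>")
    parts.append(f"<year>{year}</year>")
    parts.append(f"<volumeno>{volume}</volumeno>")
    parts.append(f"<pages><fpage>{first_page}</fpage></pages>")
    parts.append("</journalcit>")
    return "".join(parts)
```

The output carries no white space between components, so it matches the encoded example above apart from spacing.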

Non-standard citations. The <citation> element type should always be used for non-standard citations, i.e. those which do not fit the standard model. The type of citation should be specified in the type attribute. Allowed values are:

article (the default value - this doesn't need to be specified)
book
thesis
patent
software
other

This isn't being done at present.

Within citations, the following concepts should always be marked up when they are present:

authors ( <citauth>)
titles ( <title>)
editors ( <editor>)
publishers ( <citpub>)
place of publication ( <pubplace>)
year of publication ( <year>)
journal volume number ( <volumeno>)
journal issue number ( <issueno>)
the part of the work that is being cited: section, pagination, etc. ( <biblscope>)

<citation> elements will be marked up as found, including all punctuation and style changes.

This is an example of a reference to a patent:

S. Iwaya, H. Masumura, Y. Midori, Y. Oikawa and H. Abe, US Patent, 4,404,029, 1983.

This should be encoded:

<citation type="patent"><citauth><fname>S.</fname><surname> Iwaya</surname></citauth>, <citauth><fname>H.</fname><surname>Masumura</surname> </citauth>, <citauth><fname>Y.</fname><surname>Midori</surname></citauth>, <citauth><fname>Y.</fname><surname>Oikawa</surname></citauth> and <citauth><fname>H.</fname><surname>Abe</surname></citauth>, <it>US Patent</it>, 4,404,029, <year>1983</year>.</citation>

Book citations. One particular type of non-standard citation which will frequently occur is a reference to a book, either in whole or in part. Again, <citation> should be used to mark these up. The <editor>, <citpub> and <pubplace> element types will often be required within such citations. A fairly typical, simple, example is:

S. Brooks and B. Johansson, in Handbook of Magnetic Materials, ed. K. H. J. Buschow, 1993, 7th edn.

This should be encoded:

<citation type="book"><citauth><fname>S.</fname><surname>Brooks</surname></citauth> and <citauth><fname>B.</fname><surname>Johansson</surname></citauth>, in <title>Handbook of Magnetic Materials</title>, ed. <editor>K. H. J. Buschow</editor>, <year>1993</year>, 7th edn.</citation>

Note the following:

within <citauth>, analysis is the same as for standard citations. No space is required between the forename and surname because the rendering process will add one
no <it> element is required within the title: it will be rendered as italic
otherwise, all punctuation (i.e. all punctuation between analysed components) is provided exactly as in the source
the edition information does not fit the model for <biblscope>, and so is left as unanalysed text

A good mixed citation example:

<citgroup id="cit5"> <citation>During the preparation of this manuscript, diester <compoundref idrefs="chem1">1</compoundref> was isolated as a minor side product in the base promoted rearrangement of the analogous (<it>R</it>′,<it>R</it>′,<it>R</it>,<it>R</it>)-2,3-butane diacetal (BDA) protected dimethyl tartrate, see: <citauth><fname>M. T.</fname><surname>Barros</surname></citauth>, <citauth><fname>A. J.</fname><surname>Burke</surname></citauth> and <citauth><fname>C. D.</fname><surname>Maycock</surname></citauth>, <title>Tetrahedron Lett.</title>, <year>1999</year>, <volumeno>40</volumeno>, <biblscope>1583</biblscope>.</citation></citgroup>

and a <citext>:

<citgroup id="cit8"> <citext>The strong bias towards axial silylation was seen to fall if the mono sodium alkoxide did <it>not</it> precipitate prior to addition of the silicon halide.</citext></citgroup>

Two other points:

a) Where a citref appears within another citation: we have extended the content model of citelt so that it can contain "m.simple-text", i.e. any element types which can occur within paragraphs. This change should make citelt a much better 'catch-all' for miscellaneous material within citations.

b) where a citation includes a compoundref and ugraphic of the compound. The compoundref is allowed, but the ugraphic isn't. We have created a new class 'para-graphic' for these two element types. They can now appear anywhere 'text-elts' can appear, as well as between paragraphs.

RSC journal abbreviations. The journals published by the RSC have the following abbreviations, which can be used within the SGML/XML framework, e.g. in <journalref> elements:

Table 6

AC	Analytical Communications
AN	Analyst
CC	Chemical Communications
CE	Cryst. Eng. Communications
CP	PCCP
CS	Chem. Soc. Reviews
DT	Dalton Transactions
EM	J. Environmental Monitoring
FD	Faraday Discussions
FT	Faraday Transactions
GC	Green Chemistry
GT	Geo. Trans.
IC/OC/PC	Ann Rep (Inorganic, Organic, Physical)
JA	JAAS
JC	JCR
JM	J. Materials Chemistry
MC	Mendeleev
NJ	New Journal of Chemistry
NP	Natural Product Reports
P1	Perkin Transactions 1
P2	Perkin Transactions 2
PO	Pesticide Outlook
RC	RCR
QU	Phys. Chem. Comm.

Lists

Lists can be entered as a <list>, containing an optional <head> and any number of <item> elements. The type attribute can be used to indicate the type of list. It should take one of the following values:

ordered
bulleted
simple

Note that, since <list> can occur within <item>, it is possible to declare lists nested to any depth.
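
As an illustration of the nesting described above, the following sketch shows an ordered list whose second item contains a bulleted sub-list. The head and item text is invented for this example, and it assumes that <item> accepts plain text alongside the nested <list>:

```xml
<list type="ordered">
  <head>Preparation of the sample</head>
  <item>Dissolve the solid in the solvent.</item>
  <item>Add the reagents:
    <list type="bulleted">
      <item>the base</item>
      <item>the silicon halide</item>
    </list>
  </item>
  <item>Filter the precipitate.</item>
</list>
```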

General

If there are internal references that are in effect impossible to encode, just put the text in and leave out the reference. It would be helpful to advise us, in case an amendment to the DTD may be wise, but usually these are one-offs. One recent case had a number of equations in a single ugraphic, itself called scheme 1. In this case it was not possible to add eqnrefs to the scheme.

Appendix A. Alphabetical list of element types

Element definitions

This section contains a definition of every element type in the RSC DTD, including element types which are not required for the data capture work. These additional element types are included for editorial use within RSC, or to support future processing of the encoded articles. They are indicated thus:

RSC internal use only

a. 'anchor': a wrapper round a resource (an image, scheme, table, etc.). An anchor specifies a non-printable external entity which can augment the resource. Where appropriate, it should be represented as a clickable link to navigate to the external entity. Can contain zero or more:

elements representing graphics
equation
box
table-entry
table

src: an entity reference which defines the external entity

in HTML output, a is represented as an href attribute on the <a> element which is already wrapped around a graphic resource

above. The top half of a stack. Contains 'characters only'.

rendered as superscript, before below

abstract. An abstract of the article. Contains 'text or paragraphs'.

rule above [and below], with the abstract itself output as a sequence of left-aligned bold paragraphs

ack. Acknowledgements for the article. Contains 'text or paragraphs'.

title: an optional non-standard title for the acknowledgements section.

preceded by a rule. Title is set as an a-heading
if title is not specified, the heading 'Acknowledgements' is output

address. A complete postal address. Can be represented by a link, or by a sequence of address subelements:

city
postcode
state
country
addrelt

each separated by spacing but no punctuation.

id: a unique identifier for this address element
type: the type of address

address within aff is output in italic; other addresses are not currently styled
each top-level subelement within aff/address is followed by a comma, except for <postcode>s followed by a <country> element which is the last subelement of the address

addrelt. An element within a postal address. Used only when no more specific element type (e.g. city) is appropriate. Can contain 'simple text'.

id: a unique identifier for this element

see address

admin-event. A single event relating to the administration of an article, e.g. its receipt, acceptance, or rejection. Provided in versions 3.4 onwards of the DTD as a place-holder for RSC management information. Has a mixed content model, which allows the following subelements within text:

agent
address
date
admin-event (for complex administrative events)

type: the type of administrative event

currently suppressed

aff. An author's affiliation. Contains one or more pairs of:

org (optional)
address (mandatory)

followed by any of the following which apply:

phone
fax
email
url

id: a unique identifier for this affiliation element. See above for guidance on assigning id's.

affiliations are rendered as a 'small heading'
affiliation codes ('a', 'b', etc.) are auto-generated from the last letter of the aff element's id attribute ('affa', 'affb', etc.). They are rendered as italic superscript, and applied both as a prefix to the affiliation itself, and as a cross-referencing hyperlink from the relevant author(s)
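
A possible encoding of a single affiliation is sketched below. The organisation, address and e-mail values are invented, the order of the address subelements is an assumption, and <org> is assumed to take plain text (its definition is not given in this part of the document):

```xml
<aff id="affa">
  <org>Department of Chemistry, University of Example</org>
  <address>
    <addrelt>1 Example Road</addrelt>
    <city>Exampleton</city>
    <postcode>EX1 2AB</postcode>
    <country>UK</country>
  </address>
  <email>a.author@example.org</email>
</aff>
```

With the id 'affa', the generated affiliation code would be 'a'. No commas are entered between the address subelements, and the e-mail address is entered without an 'E-mail:' prefix, since both are supplied by the style sheet.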

affref. A reference to an author's affiliation. In practice this element is not used, since authors' affiliations are indicated by the aff attribute on author.

idrefs: a space-separated list of <aff> identifiers
presence: 'missing' or 'notmissing' (the default value)

no support yet provided

agent. A person playing a role within an admin-event. Contains one person element.

role: the role played by the person in this administrative event

suppressed, as part of <admin-event>

appendix. An appendix to an article. Contains an optional no and one or more sections.

id: a unique identifier for this appendix element. See above for guidance on assigning id's.

each appendix is preceded by an a-heading "Appendix N", where N is either the value of its <no> subelement or the element's actual sequence number

appmat. A container for appendix matter. See above for general guidance.

Contains one or more appendix elements.

currently placed after <art-back> (i.e. out of sequence). Is this the best thing to do with appendices?

art-admin. A container for administrative information relating to an article. Contains, in the order specified:

ms-id (required)
doi (optional)
pii (optional)
sici (optional)
office (optional)
received (optional and repeatable)
date (optional and repeatable)
admin-event (optional and repeatable)

the <art-admin> element is set as an inline italic sequence

art-back. A container for an article's back matter. Contains, in the order specified:

ack (optional)
biblist (required)
compoundgrp (required)
section (optional and repeatable)

no special formatting is associated with the <art-back> element type

art-body. A container for an article's body matter. See above for general guidance.

Contains one or more sections, or one or more news-sections.

no special formatting is associated with the <art-body> element type

art-front. A container for an article's front matter. See above for general guidance on analysing front matter.

Contains a link, or the following elements in the order specified:

titlegrp (required)
authgrp (optional)
conference (optional)
art-toc-entry (optional)
arttoc (optional)
dedicate (optional)
biography (optional and repeatable)
abstract (optional and repeatable)
subject (optional and repeatable)
keyword (optional and repeatable)

the <art-front> element as a whole is suppressed, but its <titlegrp> subelement is treated specially. See its documentation for details
then <authgrp>, <biography> and <abstract> are output, in that order

art-links. A container for links from an article to other resources. Contains any number of suppinf and/or fulltext elements.

no special formatting is associated with the <art-links> element type

art-toc-entry. Container for resources to use when creating the article's entry in the table of contents for a journal issue. Contains, in the following order:

ictext
icgraphic

currently suppressed from the article itself

article. An article. Contains a link element, or the following elements in the order specified:

art-admin (optional)
published (optional and repeatable)
art-links (optional)
art-front (optional)
art-body (optional)
appmat (optional)
art-back (optional)

dtd: a FIXED attribute which specifies which version of the DTD was in use when this XML document was created. There is no need to enter a value for this attribute (and any value other than 'RSCPAx.y' for version x.y of the DTD will render the whole article invalid)
price-code: takes the value 'free', 'premium' or 'review'. If not specified, 'free' is assumed
type: the class of article, e.g. 'feature', 'communication'. The article type should be taken from the list of codes given above, e.g. "ART" for a Paper
background: a reference to an external entity to be used as a background image for the article

the subelements of <article> are output in this order:

<art-admin>
<art-front>
<art-body>
<art-back>
<appmat>

when outputting to HTML, the type attribute is extracted, converted to its expanded form as listed above, and inserted within the article header before the article title
when typesetting, a simple combined graphic with article type included is inserted
no support for background images has been provided yet
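
The overall shape of an article following the content model above might look like the sketch below. The manuscript number is a placeholder, and comments stand in for the real content of each container, so this is an outline rather than a valid document:

```xml
<article type="ART" price-code="free">
  <art-admin>
    <ms-id>a000000a</ms-id> <!-- placeholder manuscript number -->
  </art-admin>
  <art-front>
    <!-- titlegrp, then authgrp, abstract, etc. -->
  </art-front>
  <art-body>
    <!-- one or more sections -->
  </art-body>
  <appmat>
    <!-- one or more appendix elements -->
  </appmat>
  <art-back>
    <!-- ack, biblist, compoundgrp -->
  </art-back>
</article>
```

Note that <appmat> precedes <art-back> in the encoded document, even though it is output after it. The dtd attribute is omitted because it is FIXED.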

articleref. RSC internal use only

A pointer to an article (within an issue), used when generating index entries. Contains a link.

suppressed as part of <index>

arttitle. An article title within a citation or journalcit. Contains 'simple text or paragraphs'.

no special formatting is associated with the <arttitle> element type

arttoc. An article's table of contents. Entering an empty <arttoc> element is an instruction to generate an article table of contents from the section and subsection headings (levels a to d, i.e. <section> to <subsect3>) found in the article. In the HTML output, hyperlinks from the ToC to each section are generated. These are based on the section's id if specified, otherwise on a unique system-generated code (which is liable to change each time the document is edited).

Can, if desired, contain toc-head (optional) and toc-entry (optional and repeatable).

the <arttoc> element is replaced by a table containing auto-generated section numbers in the left column (or the section's no element, if specified), and section titles in the right column
Is it logical that sections with <no> elements get numbered in the article, while those without don't, even though both get numbered in the ToC?
no support is yet offered for specifically entered <toc-head> and <toc-entry> elements
requirements: in ASU a) table titles appear in contents b) References section gets picked up

authgrp. A container for details of authors and their affiliations. Contains one or more author elements, followed by one or more affs.

punctuation between multiple authors' names is added by the style sheet
links between authors and their affiliations are added by the style sheet
an indication of the 'corresponding' author is added by the style sheet
details of when/where received, when accepted, and when/how published are only output if an <authgrp> element is present
within news-item and book-review, <authgrp> is output after all other subelements. Authors and affiliations are output on separate lines, in italic, and right-justified.

author. One author of an article. Repeat for each distinct author. Contains a person, followed by an optional footnote.

aff: one or more idref's (separated by spaces), specifying which aff elements apply to this author
key: a unique key for this author [not yet used]
role: can take the value 'princ' (principal author) or 'corres' (corresponding author)

below. The bottom half of a stack. Contains 'characters only'.

rendered as subscript (SUB in HTML output), after above

bi. Indicates that the contained text should be rendered as bold italic. This is preferable to using separate <bo> and <it> elements. Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.

rendered as bold and italic (B and I in HTML output)

biblist. A container for the bibliography at the end of an article. Contains a mixture of text and citgroups.

title: a non-standard title for the bibliography. Can include a section number, if one is required.

if the title attribute is specified, it is output as the heading for this section. Otherwise, the heading 'References' is output (in both cases as H3 in HTML output).

biblscope. The scope of a citation within the work cited. Can include references to sections, chapters, page ranges, etc. Contains 'simple text'.

no special formatting is associated with the <biblscope> element type

biography. A person's biography. Contains a link, or one or more sections.

id: a unique identifier for this <biography> element

<biography> is suppressed where it appears, but is output as a full-width one-row table (TABLE in HTML output), followed by a rule (HR in HTML output), after the article's front matter (so long as an art-front element is present). RK: Biography might be better as a two-cell table, with any plate as the left-hand cell.

bo. Indicates that the contained text should be rendered as bold. Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type (specifically compoundref, which is the most common reason for bold-face within article text).

rendered as bold (B in HTML output)

board. RSC internal use only

A journal or issue's [Editorial] Board. Contains a link, or an optional title followed by zero or more groups and/or members.

id: a unique identifier for this element

no special formatting is associated with the <board> element type

book-review. A book review, consisting of the citation of the book being reviewed, reviewer's details, and the review itself. Contains a citation, followed by an optional authgrp for the reviewer's details (i.e. the 'author' of the review), followed by one or more paragraphs ( p) and/or 'inter-paragraph elements'.

within <book-review>, <authgrp> is output at the end, right-justified (see authgrp for details)
multiple <book-review> elements are separated by a line-break

box. A floating text box. Contains a single section.

id: a mandatory unique identifier for this element
height: the height of the box, expressed as ...
width: the width of the box, expressed as ...
tint: the tint of the box, expressed as ... [I don't think these can be defined at capture. I really don't know what the best units would be though: absolutes or pixels. NH: What is easiest to capture?]
pos: can optionally take the value 'fixed' to indicate that the <box> cannot float

<box> elements are set as a centred 80%-width table with a border (not currently visible!)

boxref. A reference to a floating text box. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.

idrefs: one or more space-separated idref's, specifying the box(es) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'

no special formatting is associated with the <boxref> element type

byline. RSC internal use only

A journal's byline. Contains 'simple text'.

type: the type of byline

no special formatting is associated with the <byline> element type

chart. A chart. Contains an optional title. See above for general guidance on encoding graphics.

id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
src: the entity which contains the graphic (see notes above on external entities)
height: the height of the graphic, expressed as ...
width: the width of the graphic, expressed as ... [I don't think these can be defined at capture. I really don't know what the best units would be though: absolutes or pixels. NH: What is easiest to capture?]
pos: 'float' for floating graphics; otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks

the chart is output within a centred half-width table (TABLE in HTML output) as an image (IMG in HTML output)
if the chart has a <title>, this is output in a separate row below the image; otherwise the heading 'Chart N' is generated, where N is the chart number as indicated in its id attribute. [Neil's code has the auto-generated heading centred, and the 'real' heading left-aligned. Is this intended?]
a text break instruction (BR clear="all" in HTML output) is output before and after the chart

chartref. A cross-reference to a chart. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.

idrefs: one or more space-separated idref's, specifying the chart(s) to which cross-reference is being made

a link (A in HTML output) is made to the first idref specified in the idrefs attribute

citation. Container for an individual citation that doesn't fit the model for a standard journal citation ( journalcit). Should only be used if <journalcit> cannot. See above for general guidance on encoding citations.

Contains mixed content, which can include the following element types as required:

citauth
title
year
volumeno
issueno
arttitle
biblscope
editor
citpub
pubplace
link
url
email
trans
'emphasis' elements

id: a unique identifier for this element. This attribute shouldn't be used, since it isn't intended to be pointed to now. It will be removed in the next version of the DTD
type: the type of citation

no special formatting is associated with the <citation> element type

citauth. An author within a citation or journalcit element. Contains a link, or an optional fname followed by a mandatory surname.

no special formatting is associated with the <citauth> element type

citext. Citation text. Used only when it is not possible to encode material found within a citations list using journalcit or citation. (This should only apply when the text isn't actually a citation at all.) Contains 'simple text'.

no special formatting is associated with the <citext> element type

citgroup. A group of citations with a single reference number. (Most <citgroup>s will only contain a single journalcit or citation element.) See above for general guidance on encoding citations.

Contains an optional no element for a non-standard citation number, followed by one or more of the following, in any order:

citext
journalcit
citation
citgroup

* commentary may also appear after the various elements above

id: a mandatory unique identifier for this element. See above for guidance on assigning id's.

the citation is enclosed in an anchor group (A in HTML output), with a NAME attribute equal to its id attribute
the content of <citgroup> is preceded by a displayed citation number, which is derived from the <citgroup>'s position in the citation list

citpub. The publisher of a citation. Contains 'simple text'.

To be added by data capture agency.

no special formatting is associated with the <citpub> element type

citref. A reference to a citation. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.

idrefs: one or more space-separated idref's, specifying the citation(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'
position: can take the value 'super' or 'baseline'

a link (A in HTML output) is made to the first idref specified in the idrefs attribute
in HTML output the link has a TITLE attribute, generated from the text of the citation
unless the attribute position="baseline" is specified, the citation will be displayed as small-type superscript (SMALL, SUP in HTML output)
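
The two positions might be used as in the sketch below. The sentences are invented for illustration; the identifiers 'cit5' and 'cit8' are taken from the citgroup examples given earlier:

```xml
<p>This rearrangement has been reported previously.<citref idrefs="cit5 cit8">5,8</citref>
The diester is described in ref. <citref idrefs="cit5" position="baseline">5</citref>.</p>
```

The first reference would be displayed as a superscript and linked to 'cit5' (the first idref); the second would remain on the baseline.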

city. The name of a city. Must consist of character data only.

within <received>, the city name is preceded by '(in ' and followed by ')'
otherwise, no special treatment is applied to <city> elements

coden. RSC internal use only

A CODEN identifier for a journal. Contains character data only.

no special treatment is applied to <coden> elements

colspec. A specification of the characteristics of a column in a table. Empty element: has no data content.

colnum: the column's number
colname: the column's name
colwidth: the column's width, as a relative fraction of 1.00 (= average column width given equal spacing)
colsep: the column's column separator
rowsep: the column's row separator
align: the alignment of the column's content
char: the character to be used for alignment within the column
charoff: the offset for character alignment within the column

information in <colspec> is used to determine cell spanning
information in <colspec> is used to determine text alignment

commentary. A description of the value of a citgroup. Contains 'simple text'.

rating: used to denote the rank of the citation: 0 (default), 1 or 2

the rating is used to generate superscript filled stars before the text of the commentary: one star for a rating of 1, two stars for a rating of 2
the commentary should be rendered as italic
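
A commentary attached to a citation might be encoded as sketched below. The attribute name 'rating' is inferred from the description above, the id 'cit1' is hypothetical, and the commentary text is invented; the journal citation is the one used as an example earlier:

```xml
<citgroup id="cit1">
  <journalcit><citauth><fname>G. H.</fname><surname>Jonker</surname></citauth><citauth><fname>J. H.</fname><surname>Van Santen</surname></citauth><title>Physica</title><year>1950</year><volumeno>16</volumeno><pages><fpage>337</fpage></pages></journalcit>
  <commentary rating="2">An early study of these materials.</commentary>
</citgroup>
```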

compname. The name of a chemical compound. Contains a link or 'simple text'.

no special formatting is associated with the <compname> element type

compound. Specifies the id of a chemical compound. Optionally contains one or more compoundref elements, each linking to a definition of that compound

id: a mandatory unique identifier for this element. See above for guidance on assigning id's.

no special formatting is associated with the <compound> element type

compoundgrp. A container for zero or more compound elements. A <compoundgrp> is required at the end of each article so that compoundref elements have a target to point to. (At present no use is made of these links when rendering articles.)

no special formatting is associated with the <compoundgrp> element type. It should be explicitly suppressed, 'just in case'

compoundref. A reference to a chemical compound. Contains 'emphasised text' specifying the compound's code. See above for general guidance on creating cross-references.

idrefs: one or more space-separated idref's, specifying the compound(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'

rendered as bold text (B in HTML output)

conference. Information about a conference or similar meeting. Contains an optional sequence number ( no), followed by zero or more of the following, in any order:

confname
daterange
location
contact

id: a unique identifier for this element

no special formatting is associated with the <conference> element type. To be done ...

confgrp. A container for zero or more conference elements.

id: a unique identifier for this element

no special formatting is associated with the <confgrp> element type. To be done ...

confname. A conference's name or title. Contains 'simple text'.

no special formatting is associated with the <confname> element type. To be done ...

contact. A contact, e.g. for a conference. Contains zero or more of the following, in any order:

person
address
phone
fax
email
url

id: a unique identifier for this element

no special formatting is associated with the <contact> element type. To be done ...

country. A country name. Must consist of character data only.

there is a comma after <postcode> unless it is immediately followed by <country>, in which case there is no punctuation
otherwise, no special formatting is associated with the <country> element type

cpyrt. RSC internal use only

A copyright statement. Contains 'simple text'.

output at the end of the article, after any footnotes, in a full-width table (TABLE in HTML output)
preceded by a rule (HR in HTML output)
followed by a space and the publication year, if specified

date. A general year-month-day date. Contains a year, followed by an optional month and an optional day.

role: the role played by this date (e.g. 'accepted' or 'revised')

dates are either output as year-only (e.g. within the generated copyright statement), or formatted into an 'RSC date' (e.g. '21st November 2000')
<date> within art-admin with role='accepted' is output after <received>, with a prefix ', Accepted'. role='revised' isn't supported at present
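
An acceptance date might be captured as in the sketch below. The numeric encoding of the month is an assumption, since the <month> element is not described in this part of the document:

```xml
<date role="accepted"><year>2000</year><month>11</month><day>21</day></date>
```

Only the bare day number is entered; the ordinal suffix ('21st') and the ', Accepted' prefix are left to the rendering process.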

daterange. A range of two dates.

no special formatting is associated with the <daterange> element type. There should be a '-' between the two dates.

day. A numerical day: 1/2/3/.../31. Should not contain anything apart from the day number itself.

when formatted as part of an 'RSC date', a suffix is added to the day (e.g. '21st')

dd. A definition description, part of a deflist. Contains 'text or paragraphs'.

no special formatting is associated with the <dd> element type

dedicate. A dedication. Contains 'text or paragraphs'.

no special formatting is associated with the <dedicate> element type

def. The definition of a term, part of a deflist. Contains the term itself, followed by its definition in a dd.

no special formatting is associated with the <def> element type

deflist. A definition list, containing an optional head, and one or more definitions ( def).

no special formatting is associated with the <deflist> element type
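
A short definition list might be encoded as sketched below. The element name <term> is assumed from the description of def ('the term itself'), and the head text is invented:

```xml
<deflist>
  <head>Abbreviations</head>
  <def><term>BDA</term><dd>2,3-butane diacetal</dd></def>
  <def><term>DOI</term><dd>Digital Object Identifier</dd></def>
</deflist>
```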

denom. The denominator of a fraction. Contains 'simple text'.

rendered as small subscript (SMALL, SUB in HTML output)

doi. A Digital Object Identifier. Contains character data only.

as part of art-admin, this element is currently ignored. A DOI is instead constructed from the article's manuscript number, with the correct RSC DOI prefix. [Need to mention link type='DOI' somewhere!]

editnote. An editorial note. Use this element type for any comments generated by the editing process - these do not form part of the article. Contains the following, in this order:

the note itself
who made the note
the date the note was made

type: the type of editorial note. The values this attribute can take may be controlled in future, but it can be used freely at present.

no special formatting is associated with the <editnote> element type. Is this simply because we haven't yet supported it? It should either be suppressed or picked out in some way.

editor. The editor of an article or book. Contains 'simple text'.

id: a unique identifier for this element

no special formatting is associated with the <editor> element type

email. An e-mail address. Contains character data only. Only enter the actual address: the prefix E-mail: will be generated by style sheets.

the address is enclosed in an anchor (in HTML output, A with href='mailto:' plus the address)
the content of the anchor consists of the address prefixed by 'E-mail: '

entry. An entry (cell) in a table. See above for general guidance on encoding tables.

Contains mixed content which can include text elements, graphics, and equations.

colname: the name of the column in which this cell appears
namest: the name of the start column for this cell
nameend: the name of the end column for this cell
morerows: the number of rows occupied by this cell
colsep: the column's column separator
rowsep: the column's row separator
align: the alignment of the column's content
char: the character to be used for alignment within the column
charoff: the offset for character alignment within the column
valign: the vertical alignment of the column's content
indent: the indentation of this cell (an RSC-specific attribute)

<entry> is formatted as a table cell (TD in HTML output)
the column name and morerows attribute are used to generate suitable COLSPAN and ROWSPAN settings for the cell
align and valign are used to generate suitable ALIGN and VALIGN settings
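
The interplay between colspec and entry can be sketched as follows. The column names 'c1' to 'c3' are hypothetical, the attribute values 'left', 'center' and 'top' follow the usual CALS table conventions and are assumed to be accepted by this DTD, and the enclosing table markup is omitted:

```xml
<!-- column specifications: each column is given a name that cells can refer to -->
<colspec colnum="1" colname="c1" align="left"/>
<colspec colnum="2" colname="c2" align="center"/>
<colspec colnum="3" colname="c3" align="center"/>

<!-- a heading cell spanning columns c2 to c3 -->
<entry namest="c2" nameend="c3" align="center">Yield (%)</entry>

<!-- an ordinary cell in column c1 -->
<entry colname="c1" valign="top">Compound</entry>
```

In HTML output the spanning cell would be expected to receive a COLSPAN of 2, worked out from the positions of 'c2' and 'c3' in the column specifications.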

eqnref. A reference to an equation.

Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.

idrefs: one or more space-separated idref's, specifying the equation(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'

a link (A in HTML output) is made to the first idref specified in the idrefs attribute

eqntext. An equation expressed in textual form. See above for general guidance on encoding equations.

Contains 'simple text or paragraphs'. Use ps to lay out multi-line equations.

display: can take the value 'displayed' or 'inline'. Use this attribute to indicate whether the equation should be set as a separate block, or rendered inline.

<eqntext>s occurring outside an equation are set in a centred full-width table (TABLE in HTML output), with two breaks above and one below
no special action is taken for <eqntext>s within an equation
No support is yet provided for the display attribute

equation. An equation. See above for general guidance on encoding equations.

Contains an optional no, followed by a textual equation ( eqntext) or a graphic displaying the equation ( ugraphic).

id: a mandatory unique identifier for this element. See above for guidance on assigning id's.

<equation>s are set in a centred full-width table (TABLE in HTML output), with two breaks above and one below
the equation itself is set in a table cell (TD in HTML output), in which there is an anchor (A in HTML output) whose name is the id of the equation
the equation no, if specified, is set in a cell to the right of the equation. If it is not specified, an equation identifier is generated based on the <equation>'s id attribute. In both cases, the identifier is surrounded by parentheses and the whole entry is bold (B in HTML output)
When there is a <no>, it isn't being output as bold at present
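
A numbered textual equation and a cross-reference to it might be encoded as in the sketch below. The id 'eqn1' is a hypothetical identifier (the guidance on assigning id's is given elsewhere in this document), and the equation and sentence are invented:

```xml
<equation id="eqn1">
  <no>1</no>
  <eqntext>PV = nRT</eqntext>
</equation>

<p>The pressure follows from <eqnref idrefs="eqn1">eqn. (1)</eqnref>.</p>
```

The parentheses around the equation number itself are not entered within <no>, since they are added on output.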

fax. A fax number. Can only contain character data.

no special formatting is associated with the <fax> element type

figref. A cross-reference to a figure. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.

idrefs: one or more space-separated idref's, specifying the figure(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'

a link is made to the first idref specified in the idrefs attribute A in HTML output

figure. A figure. Contains an optional title. See above for general guidance on encoding graphics.

the figure is output within a centred half-width table TABLE in HTML output as an image IMG in HTML output
if the figure has a <title>, this is output in a separate row TR in HTML output below the image
the heading 'Fig. N' is generated, where N is the figure number as indicated in its id attribute
a text break instruction BR clear="all" in HTML output is output before and after the figure

fname. A person's first name. Contains 'simple text'.

spacing within the <fname> element is preserved
otherwise, no special treatment for <fname> elements

fnoteref. A reference to a footnote (at the end of the article, or in the footer of a table). Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on footnotes, and on creating cross-references.

idrefs: one or more space-separated idref's, specifying the footnote(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'

footnote references are rendered as small superscript anchors SMALL, SUP, A in HTML output
the target for the anchor HREF attribute in HTML output is the value of the id attribute
for <fnoteref>s within tables, the text of the anchor is a system-generated letter based on the id attribute
for <fnoteref>s outside tables, the text of the anchor is a symbol, allocated in the sequence: asterisk; dagger; double dagger; section sign; paragraph sign; double vertical line
<fnoteref>s within tables are output in red
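
The symbol sequence used for footnote references outside tables can be sketched as below. This is only an illustration, not the style sheet's own code: the function name footnote_symbol is hypothetical, and since nothing above says what happens after the sixth footnote, the sketch simply refuses to guess.

```python
# Symbols in the order listed above: asterisk, dagger, double dagger,
# section sign, paragraph sign, double vertical line.
FOOTNOTE_SYMBOLS = ["*", "\u2020", "\u2021", "\u00a7", "\u00b6", "\u2016"]


def footnote_symbol(position):
    """Return the symbol for the footnote at the given 1-based position."""
    # ASSUMPTION: behaviour beyond the sixth footnote is not documented,
    # so this sketch raises an error instead of inventing a rule.
    if not 1 <= position <= len(FOOTNOTE_SYMBOLS):
        raise ValueError(f"no symbol defined for footnote {position}")
    return FOOTNOTE_SYMBOLS[position - 1]
```

For example, footnote_symbol(1) gives the asterisk and footnote_symbol(6) gives the double vertical line.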

footnote. A footnote in the article, or in a table footer. See above for general guidance on encoding footnotes.

Footnotes in the article are placed at the point where the footnote reference is to appear in the rendered result. This means that fnoteref is only required for such footnotes if the same footnote is referenced more than once. In contrast, table footnotes are placed within the tfoot, and are referenced by a separate fnoteref.

Contains text or paragraphs.

id: a mandatory unique identifier for this element. See above for guidance on assigning id's.

footnotes outside tables are removed to a separate section with title 'Footnotes' H3 in HTML output, and replaced by system-generated footnote references (see fnoteref for details of how these are rendered)
footnotes within tables are rendered in red, with a smaller font size
all footnotes are rendered as anchors A in HTML output
the target for the anchor HREF attribute in HTML output is the value of the id attribute

fpage. The number of the first page within an issue on which the printed version of an article appears. Can only contain character data.

print pagination within <pubfront> is suppressed from the rendered version
otherwise, no special treatment for <fpage> elements

fraction. A fraction. Contains a numerator (numer), followed by a denominator (denom).

shape: takes values 'case' (an "above and below" fraction) or 'sol' (a "solidus" fraction)

no special formatting is associated with the <fraction> element type
Shouldn't we make some attempt to deal with the 'case' case? What can be done in HTML?

fulltext. A link to the full text of an article (e.g. in PDF). Probably not required - do not use without checking with RSC.

Contains a link element.

no special formatting is associated with the <fulltext> element type

group. RSC internal use only

A group of people with similar roles within an Editorial Board. Contains an optional title, followed by zero or more members.

no special formatting is associated with the <group> element type

head. A heading (e.g. for a list, index, or definition list). Contains paragraphs or text.

no special formatting is associated with the <head> element type
This could be replaced by further use of the existing <title> element type. Either that, or it should be supported in the style sheet!

icgraphic. A graphic to be included in an illustrated contents list entry. Empty element: has no contents. See above for general guidance on encoding graphics.

no special formatting is associated with the <icgraphic> element type
<icgraphic> and <ictext> should be suppressed.

ictext. Text describing the article, to be included in an illustrated contents list entry. Contains paragraphs or text.

no special formatting is associated with the <ictext> element type
Shouldn't icgraphic and ictext be suppressed, at least?

index. RSC internal use only

An [author] index. Contains an optional head, followed by zero or more index-entrys.

no special formatting is associated with the <index> element type

index-entry. RSC internal use only

An entry in an [author] index. Contains a value, followed by one or more articlerefs.

no special formatting is associated with the <index-entry> element type

inf. Inferior (subscript) text. Indicates that the contained text should be rendered as subscript. Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.

rendered as small and subscript SMALL, SUB in HTML output

info. Information, e.g. about a journal. Contains a link, or one or more sections.

type: can take the values 'author' (the default), 'illustration' or 'distribution'
level: can take the values 'full' (the default), 'brief' or 'paragraph'

no special formatting is associated with the <info> element type

issn. RSC internal use only

The International Standard Serial Number for a journal. Contains character data only.

type: the type of ISSN

no special formatting is associated with the <issn> element type

issue. RSC internal use only

One issue of a journal. Contains a link, or the following elements in this order:

journalref (optional)
volumeref (optional)
issueno
issueid (optional)
issue-front (optional)
article (optional and repeatable)
issue-back (optional)

id: a mandatory unique identifier for this element
dtd: a FIXED attribute which specifies which version of the DTD was in use when this XML document was created. There is no need to enter a value for this attribute (and any value other than 'RSCPAx.y' for version x.y of the DTD will render the whole article invalid)
type: the type of issue

suppressed within pubfront
otherwise, no special formatting is associated with the <issue> element type
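
The 'RSCPAx.y' pattern required of the dtd attribute can be checked with a short routine such as the following. It is a sketch under the assumption that x and y are plain decimal numbers; the name parse_dtd_version is hypothetical and is not part of the DTD or style sheet.

```python
import re

# ASSUMPTION: x and y in 'RSCPAx.y' are plain decimal numbers.
DTD_VERSION = re.compile(r"RSCPA(\d+)\.(\d+)")


def parse_dtd_version(value):
    """Return (x, y) for a value of the form 'RSCPAx.y', else None."""
    match = DTD_VERSION.fullmatch(value)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A value of 'RSCPA3.7' is accepted for Version 3.7 of the DTD; anything else, such as a bare '3.7', is rejected.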

issue-back. RSC internal use only

The back matter for an issue. Contains any number of any of the following, in any order:

board
issue-toc
index
advert
info
confgrp

no special formatting is associated with the <issue-back> element type

issue-front. RSC internal use only

The front matter for an issue. Contains any number of any of the following, in any order:

board
issue-toc
index
advert
info
confgrp

no special formatting is associated with the <issue-front> element type

issue-toc. RSC internal use only

The table of contents for an issue. Contains an optional toc-head, followed by zero or more toc-entry elements.

no special formatting is associated with the <issue-toc> element type

issueid. RSC internal use only

An identifier (other than the issue number) for an issue of a journal. Can only contain character data.

no special formatting is associated with the <issueid> element type

issueno. The issue number within a volume. Can only contain character data. When used within the issue element, this should be a 3-digit number with leading zeroes. Still true?

To be added by data capture agency Still true?

no special formatting is associated with the <issueno> element type
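
If the 3-digit convention still applies within issue, the padding can be produced as sketched below. The function name format_issueno is hypothetical, and the refusal of values outside 0 to 999 is an assumption made for this illustration.

```python
def format_issueno(issue_number):
    """Return the issue number as three digits with leading zeroes."""
    # ASSUMPTION: numbers that cannot fit in three digits are rejected.
    if not 0 <= issue_number <= 999:
        raise ValueError(f"issue number out of range: {issue_number}")
    return f"{issue_number:03d}"
```

For example, issue 7 would be entered as '007' and issue 12 as '012'.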

issueref. A reference to [a document describing] one issue of a journal. See above for general guidance on creating cross-references.

Contains a link, or these elements in the following order:

journalref (optional)
volumeref (optional)
issueno
issueid (optional)
issue-front (optional)
article (optional and repeatable)
issue-back (optional)

id: a unique identifier for this element

links within <issueref> are suppressed
otherwise, no special formatting is associated with the <issueref> element type

it. Indicates that the contained text should be rendered as italic. Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.

rendered as italic I in HTML output

item. An item within a list. See above for general guidance on encoding lists.

Contains paragraphs or 'simple text'.

rendered as a list item LI in HTML output

jnltrans. A translation of a simple journal citation (journalcit). Also used for Chem. Abstracts references, with the abstract number in <fpage>.

Contains the following, in the order specified:

sertitle (optional)
year (optional)
volumeno (optional)
pages (optional and repeatable)

no special formatting is associated with the <jnltrans> element type

journal. RSC internal use only

A description of an RSC journal. Contains a link, or these elements in the order specified:

title (one or more)
sercode
byline (optional and repeatable)
logo (optional and repeatable)
publisher
issn (one or more)
coden (optional)
board (optional and repeatable)
info (optional and repeatable)
advert (optional and repeatable)
cpyrt
volume (optional and repeatable)

id: a unique identifier for this element

no special formatting is associated with the <journal> element type

journalcit. A citation which follows the standard model for simple citations of journal articles. Use citation for more complex cases, and for citations to anything other than journal articles. Use citext only for text within the References section which is not a citation at all. See above for general guidance on encoding citations.

Contains these elements in the order specified:

citauth (one or more)
title
year
volumeno (optional)
issueno (optional)
pages
jnltrans (optional)
link (optional and repeatable)

the citation is output within an anchor A in HTML output, with NAME attribute equal to its id attribute. This is not required.
a semicolon is output after all <journalcit>s except the last within its containing <citgroup>. This last <journalcit> is followed by a full stop
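
The punctuation rule for <journalcit>s within a <citgroup> can be sketched as follows. The function name punctuate_citgroup is hypothetical, and each citation is assumed to have been rendered to plain text already.

```python
def punctuate_citgroup(citations):
    """Append ';' to every citation except the last, which gets '.'."""
    last = len(citations) - 1
    return [
        text + ("." if index == last else ";")
        for index, text in enumerate(citations)
    ]
```

A group of three citations therefore ends in ';', ';' and '.' respectively, and a group of one ends in '.' only.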

journalref. A reference to a document describing a journal. See above for general guidance on creating cross-references, and for a list of RSC journal codes.

It contains a link element, which should have the appropriate journal code as its value. These codes are listed below.

Contains a link, or these elements in the order specified:

title (one or more)
sercode
byline (optional and repeatable)
logo (optional and repeatable)
publisher
issn (one or more)
coden (optional)
board (optional and repeatable)
info (optional and repeatable)
advert (optional and repeatable)
cpyrt
volume (optional and repeatable)

id: a unique identifier for this element

the <journalref> within published is used to provide the journal title which appears at the head of the article

keyword. A keyword describing an article's content. Contains 'simple text'.

<keyword>s are suppressed from the rendered article

link. A link to [part of] another document. Contains simple text.

Although the attributes within <link> provide a powerful means of expressing links, they are not yet being used. Instead, the data content within <link> is used to specify the target document. This content will be a unique identifier for the document, e.g. a journal code or an article's manuscript number.

type: the type of link, e.g. 'DOI' for DOI cross-references
doc: an entity reference defining the document to which the link is being made
from: the [start of the] target within the linked document, expressed as an XPath expression
to: (for ranges only), the end of the target within the linked document, expressed as an XPath expression

in general, <link>s are suppressed from the rendered article. Instead, such <link>s as are required for rendering (e.g. the link to a document describing an article's journal) are resolved by a pre-rendering edit which replaces the link by the actual document to which it points

list. A list. See above for general guidance on encoding lists.

Contains an optional head, followed by one or more items.

type: the type of list, which should take one of the following values:

ordered
bulleted
simple

<list> is rendered as an unordered (bulleted) list UL in HTML output
Should be extended to cope with all the allowed list types

location. A location (i.e. an address). Contains one or more of the following, in any order:

city
postcode
state
country
addrelt

no special formatting is associated with the <location> element type

logo. RSC internal use only

A logo. Contains a ugraphic specifying the image to be used.

type: the type of logo

no special formatting is currently associated with the <logo> element type. Instead, the sercode is used to construct the logo's file name

lpage. The number of a printed article's last page. Contains character data only.

<lpage> is suppressed from the rendered article

member. RSC internal use only

A member of a <group>. Contains an optional role, followed by zero or more persons.

no special formatting is associated with the <member> element type

month. A month. Contains character data only. Months should be specified in full, e.g. "January". Since the style sheet can convert numeric months to their full form, should we be allowing, or even asking for, numeric months?

if a numeric month is entered, it is converted to its full form, e.g. '3' becomes 'March'
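
The conversion of numeric months can be sketched as below. This is an illustration rather than the style sheet's code: the name full_month is hypothetical, and passing through anything that is not a number from 1 to 12 unchanged is an assumption.

```python
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def full_month(value):
    """Convert a numeric month such as '3' to its full form."""
    text = value.strip()
    if text.isdecimal():
        number = int(text)
        if 1 <= number <= 12:
            return MONTH_NAMES[number - 1]
    # ASSUMPTION: anything else (already a name, or out of range)
    # is passed through unchanged.
    return text
```

With this sketch both '3' and '03' become 'March', while 'January' is left as it is.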

ms-id. The RSC's unique identifier for an article. Contains character data only.

Conventions for formatting article identifiers are given above. To be added by data capture agency

output at the end of the article
also used to construct the article's DOI
the presence of <ms-id> triggers the generation of the "Received" statement

nameelt. A component of an organisation's name. Contains 'simple text'.

type: the type of name element

', ' is output after all <nameelt>s except the last in a sequence
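
The separator rule amounts to joining the name elements with ', ', as in this minimal sketch. The name join_nameelts is hypothetical and the values shown in the test are invented placeholders.

```python
def join_nameelts(nameelts):
    """Join a sequence of <nameelt> values, with ', ' after all but the last."""
    return ", ".join(nameelts)
```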

news-article. A full article (with title and author details, and back matter such as a list of citations) found within a news section. Contains these elements, in the order specified:

art-front
art-body
appmat
art-back

id: a unique identifier for this element. See above for guidance on assigning id's.
type: the type of news article

no special formatting is associated with the <news-article> element type

news-item. A relatively simple news item. For more complex material, use news-article instead. Contains these elements, in the order specified:

title (optional)
authgrp (optional)
abstract (optional)
p or paragraph-level elements (optional and repeatable)
footer (optional)

id: a unique identifier for this element. See above for guidance on assigning id's.

within <news-item>, <authgrp> is output at the end, right-justified (see authgrp for details)
multiple <news-item> elements are separated by a line-break

news-section. A container for one or more news articles or (more usually) news items, plus other formats such as advertisements and conference listings. Can contain nested <news-section>s to support e.g. a two-level structure of news sections.

Contains an optional title, followed by zero or more of the following, in any order:

news-section
news-article
news-item
book-review
advert
info
confgrp
p
paragraph-level elements

id: a unique identifier for this element. See above for guidance on assigning id's.

no special formatting is associated with the <news-section> element type

no. A number or other identifier (for a table, figure, etc.). Contains character data only. See above for general guidance on numbering strategy.

section <no>s are suppressed from normal output
instead, the <no> element, if present, is picked up and incorporated into the section title
a similar strategy is applied to equation <no>s, which are enclosed in parentheses and output in bold B in HTML output

no-of-pages. The number of pages in the printed version of an article. Contains character data only.

print pagination is suppressed from the rendered version

note. A note. Contains text or paragraphs.

no special formatting is associated with the <note> element type. Should be, e.g., italic and surrounded by '[..]'.

numer. The numerator of a fraction. Contains 'simple text'.

rendered as small superscript SMALL, SUP in HTML output

office. The RSC office responsible for managing an article. Contains character data only.

like all art-admin subelements, this is suppressed

org. An organisation's name and address. Contains a link, or one or more orgnames followed by zero or more addresses.

id: a unique identifier for this element

within aff, the level-1 subelements of <org> are followed by ', '
otherwise, no special formatting is associated with the <org> element type
Multiple <address> elements should be separated by ' and '.

orgname. An organisation's name. Contains one or more nameelts.

within aff, each <orgname> is followed by ', '
otherwise, no special formatting is associated with the <orgname> element type

overbar. An overbar. Indicates that a bar should be placed above all the text within this element. Contains 'simple text'.

no special formatting is associated with the <overbar> element type. (Specifically, no means has been found to implement this feature within HTML output. Could try using a CSS text decoration instruction.)

p. A paragraph. Contains mixed content (i.e. text and subelements intermixed), including any of these elements, at any point and in any order:

roman
it
bo
bi
scp
sansserif
ul
sup
inf
list
footnote
note
overbar
underbar
stack
fraction
warning
unknown
email
url
ugraphic
eqntext
figure
scheme
plate
chart
equation
compname
compoundref
textref
figref
schemref
plateref
chartref
eqnref
boxref
tableref
citref
fnoteref
affref

by default, <p> is rendered as a paragraph P in HTML output
within sections at any level, the first paragraph is rendered closed up to the preceding title (with no indentation), and is followed by a line break BR clear="all" in HTML output
within sections at any level, subsequent paragraphs are indented by an em space (check), and followed by a line break BR clear="all" in HTML output

pages. The range of pages covered by a citation. Contains an fpage, optionally followed by an lpage.

no special formatting is associated with the <pages> element type.

persname. A person's name. Contains the following, in the order specified:

qualifier (optional)
fname (optional)
surname
qualifier (optional)

id: a unique identifier for this element

no special formatting is associated with the <persname> element type.

person. Details about a person. Contains a link, or the following elements in the order specified:

persname (required; repeatable)
biography (optional)
address (optional and repeatable)

id: a unique identifier for this <person> element

<person> within author is rendered as bold B in HTML output
otherwise, no special formatting is associated with the <person> element type.

phone. A telephone number. Contains character data only.

no special formatting is associated with the <phone> element type. Should have a prefix, e.g. 'Tel. '.

pii. A Publisher Item Identifier. Contains character data only.

like all art-admin subelements, this is suppressed

plate. A plate. Contains an optional title. See above for general guidance on encoding graphics.

<plate>s within biography are rendered as a left-aligned table cell TD in HTML output
otherwise, the plate is output within a centred half-width table TABLE in HTML output
the plate itself is rendered as an image IMG in HTML output
if the plate has a <title>, this is output in a separate row TR in HTML output below the image; otherwise the heading 'Plate N' is generated, where N is the plate number as indicated in its id attribute
a text break instruction is output before and after the plate BR clear="all" in HTML output

plateref. A reference to a plate. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.

idrefs: one or more space-separated idref's, specifying the plate(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'

a link is made to the first idref specified in the idrefs attribute A in HTML output

postcode. A postcode. Contains character data only.

the address item before a <postcode> is not followed by a comma
otherwise, no special formatting is associated with the <postcode> element type.

pubfront. Should this be 'RSC internal use only'?

Publication front matter. Contains the following elements in the order specified:

fpage
lpage (optional)
no-of-pages
date

the contents of <pubfront> are all suppressed by default
published elements with type="print", and containing a <pubfront> with year="PENDING", are rendered as the phrase "Publish PENDING" in red. If the <year> is empty, they are rendered as "Publish Pending", also in red
published elements with type="web" are rendered as a bold paragraph B, P in HTML output "Published on the Web ", followed by <pubfront><date>, formatted as described under date
the year within <pubfront>, from the published element with type="print", is used in the copyright statement
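
The "pending" rule for print publication can be sketched as follows. The function name pending_phrase is hypothetical; the year is assumed to be passed in as a string, and the red colouring is left to the caller.

```python
def pending_phrase(year):
    """Return the phrase shown for a print <published>, or None if not pending."""
    if year == "PENDING":
        return "Publish PENDING"
    if year.strip() == "":
        # An empty <year> gives the mixed-case form of the phrase.
        return "Publish Pending"
    # ASSUMPTION: any other year means the article is in print, so no
    # pending phrase is generated.
    return None
```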

published. A link to a document/resource in which an article has been published. Contains a citext, or the following elements in the order specified:

journalref
volumeref (optional)
issueref (optional)
pubfront (optional)

Use the analysed citation subelements to describe print publication, or <citext> to record online publication. Is that right? Web publication uses <pubfront>. RK: not sure that this is right - citext??

type: the type of publication. Should take one of the values: "print", "HTML" or "PDF". Should it be "HTML" or "web"??
doc: can specify a URL where the online publication is located
from: the [start of the] target within the linked document, expressed as an XPath expression
to: (for ranges only), the end of the target within the linked document, expressed as an XPath expression

the contents of <published> are all suppressed by default
<journalref><title> is selected from <published> with type="print", and used to specify the journal within a cell TD in HTML output in the header table at the start of the article. It is rendered as bold italic B, I in HTML output
as noted under pubfront, <published> with type="print" is used to generate "Publish Pending", "Published on the Web", and copyright statements

publisher. RSC internal use only

The publisher of a journal. Contains "organisation" subelements, i.e. a link, or one or more orgnames followed by zero or more addresses. <aff> now has <address>, and <org> within it also has <address> - overkill?

id: a unique identifier for this element

no special formatting is associated with the <publisher> element type

pubname. A publisher name. Contains 'simple text'. This is no longer linked to anything, so should be removed from the DTD.

pubplace. The place of publication of a book, etc. Contains 'simple text'.

no special formatting is associated with the <pubplace> element type

qualifier. A qualification to a person's name, such as a title, an honorific, or a phrase such as 'the late'. Contains 'simple text'.

no special formatting is associated with the <qualifier> element type

received. A container for details of the date when, and place where, an article was received. Contains an optional city, followed by a date.

placed after an article's authors, as a bold italic paragraph B, I, P in HTML output
"Received ", followed by the city, if present, preceded by " (in " and followed by ") ", then the date

role. RSC internal use only

A role played by one or more people. Contains 'simple text'.

no special formatting is associated with the <role> element type

roman. Indicates that the contained text should be rendered as a roman typeface. Contains 'simple text'.

Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.

no special formatting is associated with the <roman> element type. Should be rendered as normal text.

row. A row in a table or table heading. See above for general guidance on encoding tables.

Contains one or more entry elements.

rowsep: whether there is a row separator ("0" means "no"; any other digit value means "yes")
valign: the vertical alignment of the row ("top", "middle" or "bottom")

rendered as a table row TR in HTML output
the valign attribute is used when specified; otherwise the <row>'s parent's valign is used when specified; otherwise vertical alignment is set to "bottom" VALIGN attribute in HTML output. Should this be some other value by default?
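
The fallback order for vertical alignment can be sketched as below. The function name resolve_valign is hypothetical, and an unspecified attribute is represented here by None.

```python
def resolve_valign(row_valign=None, parent_valign=None):
    """Pick the VALIGN value: the row's own, then its parent's, then 'bottom'."""
    if row_valign:
        return row_valign
    if parent_valign:
        return parent_valign
    return "bottom"
```

So a <row> with no valign inside a parent with valign="top" is rendered with "top", and a <row> with no alignment specified anywhere is rendered with "bottom".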

sansserif. Indicates that the contained text should be rendered in a sans serif typeface. Contains 'simple text'.

Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.

no special formatting is associated with the <sansserif> element type. Surely something should be done with this!

scheme. A scheme. Contains an optional title. See above for general guidance on encoding graphics.

the scheme is output within a centred half-width table TABLE in HTML output
the scheme itself is rendered as an image IMG in HTML output
if the scheme has a <title>, this is output in a separate row TR in HTML output below the image
the heading 'Scheme N' is generated, where N is the scheme number as indicated in its id attribute
a text break instruction is output before and after the scheme BR clear="all" in HTML output

schemref. A reference to a scheme. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.

idrefs: one or more space-separated idref's, specifying the scheme(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'

a link is made to the first idref specified in the idrefs attribute A in HTML output

scp. Indicates that the contained text should be rendered in small caps. Contains 'simple text'.

Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.

the contents of <scp> elements are converted to upper case and rendered as small type SMALL in HTML output
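
The small caps approximation can be sketched as follows. The name render_scp is hypothetical, and the escaping of markup characters is an addition made for this illustration rather than something stated above.

```python
from html import escape


def render_scp(text):
    """Upper-case the text and wrap it in SMALL, as described for <scp>."""
    # ASSUMPTION: markup characters are escaped; the source does not say
    # how the style sheet treats them.
    return f"<SMALL>{escape(text.upper())}</SMALL>"
```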

section. A top-level section. Contains these elements in the order specified:

no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect1 (optional and repeatable)

id: a unique identifier for this element. See above for guidance on assigning id's.
type: the type of section

no special formatting is associated with <section>s within biography
otherwise, <section> is rendered as a separate division DIV in HTML output
an anchor is generated at the start of the section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the section's position within the article A; NAME attribute in HTML output

sercode. RSC internal use only

A serial (journal) code, conforming to the list of codes given above. Contains character data only.

To be added by data capture agency

the value of sercode is used to locate the correct journal details when preparing the article for rendering
<sercode> is suppressed by default
the value of <sercode> is used to specify the pathname for associated image files, and to retrieve the correct journal logo

sertitle. A serial (journal) title. Contains 'simple text' or paragraphs.

type: the type of series title

within citation and journalcit, <sertitle> is rendered as italic I in HTML output
within journalcit ", " is output after all but the last <sertitle>
otherwise, no special formatting is associated with <sertitle>

the DTD now only has <sertitle> within <jnltrans>. Elsewhere it has become <title>. The style sheet needs updating to take account of this (the code described here will never be called upon), and <sertitle> should probably be removed from the DTD and replaced by <title> within <jnltrans>.

sici. A Serial Item Contribution Identifier. Contains character data only.

like all art-admin subelements, this is suppressed

stack. One or more characters appearing directly above other characters (like a fraction without the horizontal line). Contains above followed by below.

below is output as subscript SUB in HTML output, followed by above as superscript SUP in HTML output. Are these output in the wrong order?
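
The current output order for <stack> can be sketched as below, which may help in judging whether the order is the intended one. The name render_stack is hypothetical, and the contents of above and below are assumed to be plain text.

```python
from html import escape


def render_stack(above, below):
    """Emit below as SUB first, then above as SUP, as currently described."""
    return f"<SUB>{escape(below)}</SUB><SUP>{escape(above)}</SUP>"
```

Note that the subscript is emitted first even though above precedes below in the source element.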

state. A geopolitical unit such as a state, county, etc. Contains character data only.

no special formatting is associated with <state>

subject. A broad subject heading, ideally taken from a controlled list. Contains 'simple text'.

type: the type of subject category

no special formatting is associated with <subject>. This element type should be suppressed.

subsect1. A level-1 subsection. Contains these elements in the order specified:

no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect2 (optional and repeatable)

id: a unique identifier for this element. See above for guidance on assigning id's.
type: the type of section

<subsect1> is rendered as a separate division DIV in HTML output
an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article A; NAME attribute in HTML output

subsect2. A level-2 subsection. Contains these elements in the order specified:

no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect3 (optional and repeatable)

id: a unique identifier for this element. See above for guidance on assigning id's.
type: the type of section

<subsect2> is rendered as a separate division DIV in HTML output
an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article A; NAME attribute in HTML output

subsect3. A level-3 subsection. Contains these elements in the order specified:

no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect4 (optional and repeatable)

id: a unique identifier for this element. See above for guidance on assigning id's.
type: the type of section

<subsect3> is rendered as a separate division DIV in HTML output
an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article A; NAME attribute in HTML output

subsect4. A level-4 subsection. Contains these elements in the order specified:

no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect5 (optional and repeatable)

id: a unique identifier for this element. See above for guidance on assigning id's.
type: the type of section

<subsect4> is rendered as a separate division DIV in HTML output
an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article A; NAME attribute in HTML output

subsect5. A level-5 subsection. Contains these elements in the order specified:

no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)
subsect6 (optional and repeatable)

id: a unique identifier for this element. See above for guidance on assigning id's.
type: the type of section

<subsect5> is rendered as a separate division (DIV in HTML output)
an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article (A with NAME attribute in HTML output)

subsect6. A level-6 subsection. Contains these elements in the order specified:

no (optional)
title (optional)
p or paragraph-level elements (optional and repeatable)
deflist (optional and repeatable)

id: a unique identifier for this element. See above for guidance on assigning id's.
type: the type of section

<subsect6> is rendered as a separate division (DIV in HTML output)
an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article (A with NAME attribute in HTML output)
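
The content model shared by subsect2 to subsect6 can be sketched as follows. This is an illustrative fragment only: the id values, section numbers and text are invented for the example, and the optional type attribute is omitted because its permitted values are not listed in this entry.

```xml
<!-- Hypothetical level-2 subsection containing one level-3 subsection.
     Element order follows the documented content model:
     no, title, paragraphs, deflist, then nested subsections. -->
<subsect2 id="sect2-1">
  <no>2.1</no>
  <title>Illustrative level-2 heading</title>
  <p>Paragraph text belonging to the level-2 subsection.</p>
  <subsect3 id="sect2-1-1">
    <no>2.1.1</no>
    <title>Illustrative level-3 heading</title>
    <p>Paragraph text belonging to the level-3 subsection.</p>
  </subsect3>
</subsect2>
```

Since no and title are both optional, an unnumbered or untitled subsection would simply leave out the corresponding element.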

subtitle. A [table] subtitle. Contains 'simple text' or paragraphs.

no special formatting is associated with <subtitle>

sup. Indicates that the contained text should be rendered in superscript. Contains 'simple text'.

Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type. <sup> is often mistakenly used instead of <citref>.

the contents of <sup> elements are rendered as superscript (SUP in HTML output)

suppinf. Contains a link to supplementary information for an article.

<suppinf> is suppressed

surname. A surname. Contains 'simple text'.

no special treatment for <surname> elements

table. A table, encoded using CALS-compliant XML markup. See above for general guidance on encoding tables.

(Tables which cannot be thus encoded should be prepared as images, and encoded as ugraphics.)

Contains an optional title, followed by an optional subtitle, followed by one or more tgroups. Note that <title> and <subtitle> within table-entry should be used in preference to these elements, since this allows titles for XML-encoded and 'image' tables to be treated consistently. Although, as the DTD notes, we can't clear %titles;, we could set parameter entity %tbl.tbl-titles.mdl to "" and so remove this possibility.

pgwide: page width ("0" means "no"; any other digit value means "yes")

a rule (HR in HTML output) is output before each <table>
spacing from the source document is preserved within <table>
<table> is rendered as a full-width table (TABLE in HTML output)
for print, a pgwide value of '0' signifies a single-column table, and '1' a page-width table

table-entry. 'cover group' for a table, whether declared inline as table or given as a ugraphic. See above for general guidance on encoding tables.

Contains an optional title, followed by an optional subtitle, followed by either table or ugraphic.

id: a mandatory unique identifier for this element. See above for guidance on assigning id's.

a break (BR in HTML output) is output before and after each <table-entry>
an anchor is output, with name equal to the id attribute (A with NAME attribute in HTML output), followed by "Table " and a system-generated table number, in bold (B in HTML output), followed by the contents of <table-entry>

tableref. A reference to a table. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.

idrefs: one or more space-separated idref's, specifying the table(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'

a link is made to the first idref specified in the idrefs attribute (A in HTML output)
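
A minimal sketch of a table cross-reference is given below. The id value "tab1" is hypothetical and is assumed to match the id of a table-entry elsewhere in the same article; the optional presence attribute is omitted.

```xml
<!-- Hypothetical cross-reference; "tab1" must be the id of a table-entry
     in the same article. The element content is the human-readable
     description of the reference. -->
<p>The results are summarised in <tableref idrefs="tab1">Table 1</tableref>.</p>
```

If several idref's are given, only the first is used as the link target in the HTML output, as noted above.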

tbody. A table's body matter (i.e. the main table, ignoring any header or footer). See above for general guidance on encoding tables.

Contains one or more rows.

valign: the vertical alignment of the row ("top", "middle" or "bottom")

no special treatment for <tbody> elements within tgroup
otherwise, <tbody> is rendered as a table body (TBODY in HTML output)

term. A term being defined in a deflist. Contains 'simple text'.

no special treatment for <term> elements

textref. A cross-reference to text elsewhere in the article. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.

idrefs: one or more space-separated idref's, specifying the text element(s) to which cross-reference is being made
presence: can take the value 'missing' or 'notmissing'

a link is made to the first idref specified in the idrefs attribute (A in HTML output)

tfoot. The footer area of a table. See above for general guidance on encoding tables.

Contains zero or more colspecs, followed by one or more rows. Shouldn't <tfoot> have some CALS-style attributes?

no special treatment for <tfoot> elements within tgroup, apart from outputting them after <tbody>
otherwise, <tfoot> is rendered as a table footer (TFOOT)

tgroup. A table group. See above for general guidance on encoding tables.

Contains these elements, in the order specified:

colspec (optional and repeatable)
thead (optional)
tfoot (optional)
tbody

cols: the number of columns in the table
colsep: indicates a column separator ("0" means "no"; any other digit value means "yes")
rowsep: indicates a row separator ("0" means "no"; any other digit value means "yes")
align: default cell alignment. Takes one of the values "left", "right", "center", "justify" or "char"

within tgroup, subelements are output in the order: thead, tbody, tfoot without any special formatting. (In other words, the whole table is output as a single block: headers and footers are not treated specially.)
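
The table structure described in the entries for table-entry, table and tgroup can be sketched as follows. This is an assumption-laden illustration, not a definitive encoding: the id, title and cell contents are placeholders, and the <row>, <entry> and colname names are taken from the standard CALS table model rather than from this entry.

```xml
<!-- Hypothetical two-column table. The title is placed within table-entry
     rather than table, as recommended in the entry for table. -->
<table-entry id="tab1">
  <title>Placeholder table title</title>
  <table pgwide="0">
    <tgroup cols="2" colsep="0" rowsep="0" align="left">
      <!-- colname is a standard CALS attribute (assumed here) -->
      <colspec colname="c1"/>
      <colspec colname="c2"/>
      <thead>
        <row>
          <entry>Column heading A</entry>
          <entry>Column heading B</entry>
        </row>
      </thead>
      <tbody>
        <row>
          <entry>Cell A1</entry>
          <entry>Cell B1</entry>
        </row>
      </tbody>
    </tgroup>
  </table>
</table-entry>
```

Note that, if a tfoot were present, the content model would place it between thead and tbody in the source, even though it is output after tbody.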

thead. The header area of a table. See above for general guidance on encoding tables.

Contains zero or more colspecs, followed by one or more rows. Shouldn't <thead> have some CALS-style attributes?

no special treatment for <thead> elements within tgroup
otherwise, <thead> is rendered as a table header (THEAD in HTML output)

title. A title (of a figure, table, journal, etc.). Contains 'simple text' or paragraphs.

type: the type of title

the article title (within titlegrp) is rendered as a level-2 heading (H2 in HTML output), with a rule above (HR in HTML output)
all section titles are prefixed by a preceding no element at the same level, if present
<title> within section is rendered as an a-heading
<title> within subsect1 is rendered as a b-heading
<title> within subsect2 is preceded by a break and an em space, and rendered as bold (i.e. a c-heading)
<title> within subsect3 is preceded by a break and an em space, and rendered as italic (i.e. a d-heading)
<title> within citation is rendered as italic
<title> within journalcit is rendered as italic, and followed by ", " if it is not the last <title>
<title> within figure, plate, scheme and chart is rendered in bold, in a left-aligned table cell (TD in HTML output), with a suitable prefix (e.g. "Fig. ")
spacing from the source document is preserved in <title>s within table-entry
p elements within <title> do not generate any markup
also types: type="subtitle" will have to be rendered; whether or not paragraphs are included should also be explicit. NH: type="addition". Will additions ever be captured externally?
otherwise, no special formatting is associated with <title>

titlegrp. A container for an article's main titles. Contains one or more titles.

no special formatting is associated with <titlegrp>

toc-entry. An entry in a table of contents. Contains 'simple text' or paragraphs.

no special formatting is associated with <toc-entry>

toc-head. Heading for a table of contents. Contains 'simple text'.

no special formatting is associated with <toc-head>

trans. A translation (of a citation)

no special formatting is associated with <trans>. Perhaps it should be, e.g. italic? RK to comment, please.

ugraphic. An untitled graphic. Use this element to encode any graphical content which doesn't have a title. See above for general guidance on encoding graphics.

if the graphic does not have display="inline", and does not appear within an equation (or a table - it isn't possible to have tables nested inside tables), it is output within a centred half-width table, and a text break instruction is output before and after the graphic (BR clear="all" in HTML output)
graphics with display="inline", and graphics within equations, are output with no additional markup
the graphic itself is rendered as an image (IMG in HTML output)

ul. Indicates that the contained text should be underlined. Contains 'simple text'.

no special formatting is associated with the <ul> element type. We could implement this as a CSS style - text decoration - but this wouldn't be totally cross-platform

underbar. An underbar. Indicates that a bar should be placed below all the text within this element. Contains 'simple text'. In what way is this different from <ul>?

no special formatting is associated with the <underbar> element type. (Should aim to implement this as an underline: CSS text decoration again?)

unknown. A feature in the text which cannot be encoded by any other element type in the DTD. Use the type attribute to indicate the nature of the feature. Do we need to generate some warning when this element is used?

Contains 'simple text'.

type: the type of 'unknown' information

rendered in a fixed-width font (KBD in HTML output), with a double break above and below (BR in HTML output)
spacing from the source document is preserved in the rendered result

url. A URL. Contains character data only.

id: a unique identifier for this element. Probably not needed: drop from next version of DTD?

rendered as an anchor, with a target equal to the element's data content (A with HREF attribute in HTML output)
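
The two ways of giving a URL can be sketched as follows. The first form uses the data content as the link target; the second relies on the url attribute introduced in version 3.5 (see Appendix C). The address and surrounding text are illustrative only.

```xml
<!-- Form 1: the element's data content is itself the URL -->
<p>Further information is available at <url>http://www.rsc.org/</url>.</p>

<!-- Form 2 (DTD 3.5 onwards): the url attribute holds the target,
     leaving the data content free for a readable label -->
<p>Further information is available from the
<url url="http://www.rsc.org/">RSC web site</url>.</p>
```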

value. RSC internal use only

The value of an index entry. Contains character data only.

no special formatting is associated with the <value> element type.

volume. RSC internal use only

One volume of a journal. Contains a link, or the following elements in this order:

journalref
volumeno
date
issue (optional and repeatable)

id: a mandatory unique identifier for this element

no special formatting is associated with the <volume> element type.

volumeno. A journal volume number. Contains character data only.

When used within the <volume> element, this should be a 3-digit number with leading zeroes. Still true?

non-empty <volumeno> elements within journalcit and citation are rendered as bold (B in HTML output), and followed by ", " if they are not the last component of the citation
otherwise, no special formatting is associated with the <volumeno> element type.

volumeref. A reference to one volume of a journal. See above for general guidance on creating cross-references.

Contains a link, or the following elements in this order:

journalref (optional)
volumeno
date (optional)
issue (optional and repeatable)

no special formatting is associated with the <volumeref> element type.

warning. A warning. Contains 'simple text'.

no special formatting is associated with the <warning> element type. Should be red text.

who. The identity of the person making an editorial note (editnote). Contains 'simple text'. Wouldn't it make more sense to have <person> in place of this element type? Replace by <person> in next version of DTD.

no special formatting is associated with the <who> element type.

year. A 4-digit year. Contains character data only. The value "PENDING" is allowed for date within pubfront.

non-empty <year> elements within journalcit are followed by ", " if they are not the last component of the citation. This also used to apply to <citation>, but no longer does.
published elements with type="print", and containing a <pubfront> with year="PENDING", are rendered as the phrase "Publish PENDING" in red. If the <year> is empty, they are rendered as "Publish Pending", also in red
the <year> within pubfront, from the published element with type="print", is used in the copyright statement
otherwise, no special formatting is associated with the <year> element type.

Appendix B. Notations

Table: Notations recognized within the RSC application

Name	PUBLIC identifier where known
bmp	"+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Microsoft Windows bitmap//EN"
cgm	"-//USA-DOD//NOTATION Computer Graphics Metafile//EN"
cgm-binary	"ISO 8632/3//NOTATION Binary encoding//EN"
cgm-char	"ISO 8632/2//NOTATION Character encoding//EN"
cgm-clear	"ISO 8632/4//NOTATION Clear text encoding//EN"
eps	"+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Adobe Systems Encapsulated PostScript//EN"
fax	"-//USA-DOD//NOTATION CCITT Group 4 Facsimile Type 1 Untiled Raster//EN"
gif	"+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Compuserve Graphic Interchange Format//EN"
iges	"-//USA-DOD//NOTATION (ASME/ANSI Y14.26M-1987) Initial Graphics Exchange Specification//EN"
jpeg	"ISO/IEC 10918:1993//NOTATION Digital Compression and Coding of Continuous-tone Still Images (JPEG)//EN"
mpeg1aud	"ISO/IEC 11172-3:1993//NOTATION Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 3: Audio//EN"
mpeg1vid	"ISO/IEC 11172-2:1993//NOTATION Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 2: Video//EN"
mpeg2aud	"ISO/IEC 13818-3:1995//NOTATION Coding of moving pictures and associated audio: Part 3. Audio//EN"
mpeg2vid	"ISO/IEC 13818-2:1995//NOTATION Information technology - Coding of moving pictures and associated audio: Part 2. Video//EN"
pcx	"+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION ZSoft PCX bitmap//EN"
pict	"+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Apple Computer Quickdraw Picture//EN"
sgml	"+//ISO 8879:1986//NOTATION Information processing - Text and office systems - Standard Generalized Markup Language (SGML)//EN"
tex	"+//ISBN 0-201-13448-9::Knuth//NOTATION The TeXbook//EN"
tiff	"+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Aldus/Microsoft Tagged Interchange File Format//EN"
wmf	"+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Microsoft Windows Metafile//EN"
chemdraw
eqn
pdf
ps
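
As a hedged illustration of how a notation from the table above might be used, the fragment below shows a notation declaration and an external entity that refers to it. The exact wording of the declarations in the DTD is assumed rather than quoted, and the entity name fig1 and file name fig1.gif are hypothetical.

```xml
<!-- Assumed form of a notation declaration, as it might appear in the DTD -->
<!NOTATION gif PUBLIC
  "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Compuserve Graphic Interchange Format//EN">

<!-- Hypothetical external entity, declared in an article's internal subset,
     which refers to a GIF file through the notation name -->
<!ENTITY fig1 SYSTEM "fig1.gif" NDATA gif>
```

A notation should be declared only once, so an article would normally rely on the declaration supplied by the DTD and add only the entity declaration.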

Appendix C. Changes to the RSC DTD

This Appendix lists the changes made to the RSC DTD from version 3.4 onwards.

Summary of changes in version 3.4

Version 3.4 of the RSC Article DTD is a maintenance release, which aims to solve problems encountered while encoding articles, and to provide the RSC with the opportunity to add improved management information to articles.

The following changes are relevant to the encoding of actual articles:

footnotes can now be added after an author's name (within <author>, after <person>)
the content models of a number of elements (<footnote>, <note>, <head>, <toc-entry>, <ictext>, <dedicate>, <abstract>, <dd> and <ack>) have been extended so they can contain textual subelements as well as multiple paragraphs
all 'reference' elements (<compoundref>, <textref>, <figref>, <schemref>, <plateref>, etc.) can now contain subelements which support font style changes
<persname> now has an optional <qualifier> subelement that can appear at the start or end of the name. This allows titles, qualifications, and informal phrases such as 'the late' to be encoded
<ugraphic> now has a src2 attribute to support the addition, specifically, of TeX versions of a graphic. This new attribute should no longer be used.
the RSC-specific entity set has been filled out with declarations for some commonly-required characters
some simple ISO entities have been added to the allowed character entity set
there is a new <arttitle> element for encoding article titles within citations
within the <published> element, <volumeref>, <issueref> and <pubfront> are now optional
the content model for <eqntext> has been changed to allow it to contain multiple paragraphs
the element type url has been added to the class 'general', which allows it to be used anywhere within text
the 'fixed' DTD version has been changed to '3.4'

The following changes are only relevant to RSC's internal management procedures:

a new <admin-event> has been added within <art-admin>. This has a type attribute, and subelements <agent>, <address> and <date>. In addition, it can contain a nested <admin-event>, thus supporting complex multi-level events if required. (In future, this element might be preferred to <date> for encoding 'accepted' details.)
journalref, volumeref and issueref now have the same content model as journal, volume and issue respectively. This allows links to be replaced by the relevant content without invalidating the document
a price-code attribute has been added to <article>
<journal> now has an optional repeatable <logo> element, containing a graphic

Summary of changes in version 3.5

The following changes in version 3.5 will affect the encoding of articles:

the 'fixed' DTD version has been changed to '3.5'
<authgrp> within <art-front> is now optional
<org> within <aff> is now [optional and] repeatable; <org> and <address> are repeatable as a pair
<ack> now has an optional title attribute
content model for <trans> has been made the same as that for <citation>. N.B. this change is not upwards-compatible. The previous content model for trans allowed citext. This is replaced by the 'mixed content with %emph;' approach offered by %m.citation;
<email> has been added to the %gen; content model class, allowing email addresses to appear wherever this class is allowed (which is pretty well anywhere in textual content)
<url> and <email> have been added to the %m.citation; content model class
<url> now has a url attribute, which can be used to specify the url. If not used, the data content of the <url> element is taken to be the actual url, as before
there is a new <a> element type, designed to support hyperlinks which use an image as the clickable link
there is a new <subject> element type, which can contain a broad subject heading to categorise the article
within the content model for <journalcit>, <link> has been made into an 'optional extra', so that citations can be supported by e.g. a DOI and a COI
<link> now has a type attribute, for e.g. COIs (RSC internal use only) and DOIs
[usage convention only:] within <suppinf>, the content of the <link> element should now be 'INFO' or 'CRYSTAL'. 'INFO' corresponds to the single value that was previously allowed ('TRUE')

The following changes are only relevant to RSC's internal management procedures:

<coden> element type added to header information

Summary of changes in version 3.6

This version contains the following changes:

the 'fixed' DTD version has been changed to '3.6'
the parameter entity a.dtd has been altered to RSCPA3.6, and is now actually used!
the new element type <a> is now actually allowed within a document
the new 'generated' set of entity declarations rsc_x.ent is used
element type no now has a content model of 'simple text' instead of just #PCDATA
the common attributes for graphics (%a.graphic;) now have a prefix attribute, which can have values 'prefix' (default) or 'noprefix'
year and pages are now optional within journalcit
additional arttitle element added between citauth and title
citext content model is now %m.simple-text-or-paras; to allow new paragraphs (or at least line breaks)
citation content model (%m.citation;) now includes pages
new element commentary added within citgroup
subsect2 and subsect3 can now have zero or more citref elements immediately after the section title
table cells can now contain paragraphs

Summary of changes in version 3.7

This version contains the following changes:

the 'fixed' DTD version has been changed to '3.7'
classification element type added to art-front after keyword; supported by separate DTD file class.dtd

References

(a) http://www.oasis-open.org/html/a502.htm; (b) http://www.oasis-open.org/html/a503.htm.

PENDING