Royal Society of Chemistry
Paper
Guidelines on the capture of RSC journal articles
Richard Light (a) and Richard Kidd (b)
(a) Burgess Hill, West Sussex, UK
(b) RSC, Cambridge, UK. Last update 2 August 2002
Introduction
Scope of this document
These guidelines describe how to use Version 3.6 of the RSC Article DTD.
Feedback and updates
Please let us know of any problems you encounter in using these instructions while trying to encode articles using the DTD provided. This will help us to improve both the application and its associated documentation.
We plan to issue updates to the DTD and documentation at regular, planned intervals. You will be notified of these updates in advance, so that you can allocate resources to deal with any changes to data capture instructions or rendering software that might be required.
Format of this document
This document fulfils two functions. As well as containing instructions on the conventions to follow, it acts as an example of the results that are expected, being written to conform to the RSC Primary Articles DTD Version 3.6.
The XML version of this document can be browsed using Internet Explorer 5.0 or above at http://www.rsc.org/dtds/desc36.xml. This HTML version was created from the XML using Saxon.
Scope of the data capture work
The objective is to capture all the text within each article which can be encoded in XML (see next section). The DOCTYPE and document element will always be <article>. Within this, the <art-admin> (which holds the article's unique manuscript number), <published> (for articles which have already appeared in print), <art-front>, <art-body> and <art-back> element types will be routinely used, with an occasional <appmat>.
XML encoding
As far as possible, all the information in the articles presented should be encoded in XML and included in the resulting document. Obvious exceptions are figures, which should be referenced as external entities in the standard manner (see Graphics below).
Both tables and equations are liable to be more difficult. If possible, these should be encoded in XML, but we accept that there are liable to be cases where this is not possible due to the complexity of the data or inadequacies in the DTD as currently drafted. In these cases the relevant object should be treated as a graphic. A particular example is where a table contains graphics spanned across rows or columns - this would be impossible to render accurately from the XML. See Tables and Equations below for specific guidelines.
Articles should conform to XML as well as SGML conventions. This means that:
- an XML declaration must be provided at the start of the article
- processing instructions must be terminated by "?>"
- empty elements must be terminated by "/>" (i.e. for colspec, ugraphic, icgraphic)
- end-tags should always be provided, except for empty elements
- element and attribute names must be entered in lower case, as per their definitions in the DTD
- attribute values should always be quoted
A variety of tools can (and should) be used to check that articles consist of valid SGML/XML. The nsgmls program will check for SGML conformance. There is a wide variety of free or inexpensive XML-aware software. For example, if you open an XML document in Internet Explorer 5, its built-in XML parser will check the document for validity and report any errors.
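A quick well-formedness check can also be scripted. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the function name is hypothetical. It only catches well-formedness problems (missing end-tags, unquoted attribute values, empty elements without "/>") and does not validate against the RSC DTD.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed XML, else the parser's message.

    This does NOT validate against the RSC DTD; it only reports problems
    such as missing end-tags or unquoted attribute values.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


good = '<table><colspec colname="1" colwidth="2.82*" align="left"/></table>'
bad_unquoted = '<table><colspec colname=1/></table>'
bad_no_end_tag = '<table><colspec colname="1"></table>'

print(well_formedness_error(good))                        # None
print(well_formedness_error(bad_unquoted) is not None)    # True
print(well_formedness_error(bad_no_end_tag) is not None)  # True
```

A validating parser is still needed to confirm that an article conforms to the DTD itself.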
File naming conventions
All manuscripts will have a unique identifier, assigned by RSC, e.g. a901234h. As well as being used to name the file containing the encoded article, this identifier will be encoded as the <ms-id> element within the article.
The RSC will name graphics files as follows:
- a901234h-f1.tif (for figure 1)
- a901234h-s2.tif (for scheme 2)
- a901234h-u1.tif (for ugraphic 1)
- a901234h-t1.tif (for ugraphic 1 created by supplier)
Graphic types: (from RSC)
- f1, 2, 3.. figures
- s1, 2, 3.. schemes
- u1, 2, 3.. ugraphics
- t1, 2, 3.. ugraphics created by supplier
- ga graphical abstract
- c1, 2, .. charts
The following filename styles should be supplied to the RSC:
- for ms-id use the form a908765g
- for the XML files (and PDF) use the filenames in the form a908765g.xml, .pdf
- for graphics generated at data capture (maths, table images where required) use the form a908765g-t1.tif, and increment the numbers as t2, t3, t4, etc through the document.
Lower-case should be used.
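The naming rules above can be sketched as a small helper. This is an illustration only: the function names and the dictionary keys are hypothetical, and the graphical abstract code (ga) is left out because the text does not say how such files are numbered.

```python
# Suffix letters taken from the "Graphic types" list above.
GRAPHIC_CODES = {
    "figure": "f",
    "scheme": "s",
    "ugraphic": "u",
    "supplier_ugraphic": "t",  # ugraphics created by the supplier
    "chart": "c",
}


def graphic_filename(ms_id, kind, number, ext="tif"):
    """Build a graphics filename such as a901234h-f1.tif (always lower case)."""
    code = GRAPHIC_CODES[kind]
    return f"{ms_id}-{code}{number}.{ext}".lower()


def article_filename(ms_id, ext="xml"):
    """Build the XML (or PDF) filename for a manuscript, e.g. a908765g.xml."""
    return f"{ms_id}.{ext}".lower()


print(graphic_filename("a901234h", "figure", 1))             # a901234h-f1.tif
print(graphic_filename("a908765g", "supplier_ugraphic", 2))  # a908765g-t2.tif
print(article_filename("A908765G"))                          # a908765g.xml
```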
File Delivery
We require, for each paper:
- An XML file named as *.xml, with a maximum line width of 1000 characters
- MathType-created images for inline/displayed maths, as GIF files at 600 dpi. The XML file will require call-outs to these images. The images should be named as specified above (e.g. a901234h-t1.tif).
Each document and its associated files should be delivered as a zip file, named as above (e.g. a901234h.zip).
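Packaging can be automated along these lines. This is a sketch under stated assumptions: it uses Python's standard zipfile module, the function name is hypothetical, and the file contents shown are placeholders rather than real article data.

```python
import io
import zipfile


def build_delivery_zip(ms_id, files):
    """Pack an article's files into a zip archive held in memory.

    files maps each filename (e.g. "a901234h.xml") to its content as bytes.
    Returns (zip_name, zip_bytes); writing the bytes to disk is left to the
    caller.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(files):
            archive.writestr(name, files[name])
    return f"{ms_id}.zip", buffer.getvalue()


zip_name, data = build_delivery_zip(
    "a901234h",
    {"a901234h.xml": b"<article/>", "a901234h-t1.tif": b"placeholder"},
)
print(zip_name)  # a901234h.zip
```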
Form of PUBLIC identifiers
PUBLIC identifiers should be used throughout.
In addition, each PUBLIC identifier should be followed by a SYSTEM identifier giving a URL that locates the resource in question. This belt and braces strategy will allow articles to be treated as valid XML (XML requires a SYSTEM identifier), while offering us the flexibility of using SGML-aware software to interpret the PUBLIC identifiers in different ways, as necessary.
Thus the DOCTYPE declaration at the head of each article should always take the form:
<!DOCTYPE article PUBLIC "-//RSC//DTD RSC Primary Article DTD 3.6//EN" "http://www.rsc.org/dtds/rscart36.dtd">
PUBLIC identifiers should be constructed using the general format:
"RSC// [MS number] [object src]"
where the object src is the element type with number:
- ugt (ugt1)
- fig (fig3)
- sch (sch2)
- pl (pl13)
- cht (cht1)
- etc.
e.g.
"RSC// a706828h eqn3"
The names assigned within each article for the external entities it references should reflect the last component of the entity's PUBLIC identifier, e.g.
<!ENTITY eqn3 PUBLIC "RSC// a706828h eqn3" ...
Form of SYSTEM identifiers
The SYSTEM identifiers (i.e. filenames) assigned to each external entity should consist of the article's manuscript number followed by the entity's name, with a suitable suffix, e.g.:
<!ENTITY ugt3 PUBLIC "RSC// a706828h ugt3" "a706828h-t3.tif" NDATA tiff>
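The PUBLIC and SYSTEM identifier conventions can be combined in one helper. The sketch below is illustrative only; the function name and its parameters are hypothetical, and the default notation name tiff is taken from the example above.

```python
def entity_declaration(ms_id, entity_name, filename, notation="tiff"):
    """Build an external entity declaration following the RSC conventions.

    The PUBLIC identifier has the form "RSC// [MS number] [object src]",
    and the entity name repeats its last component.
    """
    public_id = f"RSC// {ms_id} {entity_name}"
    return (
        f'<!ENTITY {entity_name} PUBLIC "{public_id}" '
        f'"{filename}" NDATA {notation}>'
    )


print(entity_declaration("a706828h", "ugt3", "a706828h-t3.tif"))
# <!ENTITY ugt3 PUBLIC "RSC// a706828h ugt3" "a706828h-t3.tif" NDATA tiff>
```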
Documents relating to the RSC DTD
The DTD itself is in the file rscart36.dtd. A number of other files are required before documents will parse against the DTD. They should all be stored in the same directory as the DTD itself, apart from the entities files (*.ent) which should be stored in a subdirectory named entities. We use Internet Explorer 5 as our (XML) parser. We suggest suppliers use the same parser.
SGML Declaration
An SGML Declaration suitable for use with this DTD is in the file rscxml33.dcl. This Declaration allows an XML-encoded article to be processed by SGML software. It specifies features such as case-sensitivity for element and attribute names, quoting of attribute values, XML-style processing instructions and empty element syntax, and Unicode support.
Catalog file
The catalog file rscart3s.cat is in the standard OASIS catalog file format. It resolves all the PUBLIC identifiers declared in the DTD, as well as the PUBLIC identifier of the DTD itself. This catalog file invokes the SGML version of the DTD, rather than the XML version. It uses the file rscsgm36.dtd to set up the DTD's parameter entities for SGML. If required, an updated rscart3s.cat can be used to override the DTD's online SYSTEM identifier and point instead to a local copy.
Table support
The file calstab1.dtd contains the OASIS-supported DTD fragment which supports the interoperable CALS table model subset. Additions and changes to this model are declared in the body of the DTD itself, not here.
Entity declarations
Two files containing character entities are provided. One of these contains mappings of characters to numeric values that conform to Unicode 2.0 (rsc_x.ent). This is for use with the default XML interpretation of the DTD. It should be noted that we plan to use Unicode Combining Characters to partially solve the problem of 'one character over another'. This means that rendering software will need to support Combining Characters, ideally in a generalized manner.
The other file maps exactly the same characters to SDATA entities, and is for use with the SGML interpretation of the DTD (rsc_s.ent).
If an article contains any characters which are not in the RSC set, the RSC should be alerted to the need to add them to the standard set.
Character mappings file. RSC maintains information about special characters in a character mappings file (charmaps.xml). The entity declarations described above are generated from this file by XSLT style sheets. Characters in this file are categorized into one of the following classes:
- non-ASCII character
- ASCII character
- ASCII diacritic
- ligature
- non-ASCII diacritic
- combining
- RSC character
These categories help to ensure that each character is mapped to the most appropriate result when different types of output encoding are generated:
- ASCII: This involves:
  - mapping all diacritical characters [which fall outside the ASCII 255-character range] to the corresponding single letter
  - mapping all ligatures [which fall outside the ASCII 255-character range] to the corresponding pair of letters
  - mapping all other characters which fall outside the ASCII 255-character range to one or more ASCII characters, if there is an ASCII equivalent (e.g. an em dash)
  - suppressing all combining characters
  - suppressing all other characters which fall outside the ASCII 255-character range
- Unicode: This involves:
  - suppressing all combining characters
  - suppressing all RSC 'special use' characters which fall outside the Unicode standard character range
- HTML: This involves:
  - mapping all letters plus combining characters to a suitable image file
  - mapping all RSC 'special use' characters which fall outside the Unicode standard character range to a suitable image file
  - mapping all Unicode characters which don't display in browsers to a suitable image file
- XML: This involves:
  - outputting all non-ASCII characters as entity references, using the <name> element from charmaps.xml
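The ASCII output rules can be approximated in a few lines. This sketch is not the RSC's XSLT process and does not read charmaps.xml: it uses Python's unicodedata module as a stand-in, the ASCII_EQUIVALENTS table is a hypothetical sample, and it treats code points below 128 as ASCII (the text above speaks of a 255-character range, so the threshold may need adjusting).

```python
import unicodedata

# Hypothetical stand-ins for entries in charmaps.xml; the real file is not
# reproduced here.
ASCII_EQUIVALENTS = {
    "\u2014": "--",  # em dash
    "\u00e6": "ae",  # ae ligature
    "\u0153": "oe",  # oe ligature
}


def to_ascii(text):
    """Map text to ASCII following the rules listed for ASCII output."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                     # already ASCII
        elif ch in ASCII_EQUIVALENTS:
            out.append(ASCII_EQUIVALENTS[ch])  # ligatures, dashes, etc.
        elif unicodedata.combining(ch):
            continue                           # suppress combining characters
        else:
            # Split a diacritical letter into base letter + combining marks
            # and keep only the ASCII part. Characters with no ASCII
            # equivalent yield an empty string, i.e. they are suppressed.
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append("".join(c for c in decomposed if ord(c) < 128))
    return "".join(out)


print(to_ascii("Schr\u00f6dinger"))  # Schrodinger
print(to_ascii("\u00e6ther"))        # aether
```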
General conventions
Guidelines
Style guidelines. The style guidelines for each journal describe general conventions for article structure. Use these as a guide to the structure and content of articles.
In particular, while encoding articles these guidelines should be used to infer when a change of type style (e.g. to bold) implies a specific element type, as discussed below under Cross-references.
Semantics of the table model. The table model used is developed from the interoperable CALS table model subset supported by OASIS [1a, 1b]. The OASIS web site contains a description of the generic CALS table model [1a], and a description of the semantics of this interoperable subset [1b].
Version 3.6 of the DTD simplifies the level of CALS table support that is required by removing the <spanspec> element type (which is not part of the interoperable subset). This has been found to be unnecessary, since both horizontal and vertical spans within tables can be represented without it. (<colspec> provides all the information that is required for horizontal spanning, while the MOREROWS attribute supports vertical spans.) It adds support for rotated tables by including the ORIENT attribute, which can be set to "land" to indicate a landscape, i.e. rotated, table.
Article structure
Each article consists of front matter, body matter and back matter.
The article itself can have a type attribute, which specifies what type of article it is. This table summarises the codes to be used for each type of article, and the types of article that are currently liable to appear in each journal published by the RSC. (See below for a key to the journal codes in this table.)
Table: Article type codes and usage
| Article type                  | Code | PO | EM | GC | DT | JM | P1 | P2 | JC | CC | FT | AN | AC | JA | MC | FD | NP | CS | IC/OC/PC | NJ | RC | QU | CE | GT |
| Papers                        | ART  |    | X  | X  | X  | X  | X  | X  |    |    | X  | X  |    | X  |    | X  |    |    |          | X  |    | X  | X  | X  |
| Comms                         | COM  |    |    | X  | X  | X  | X  | X  |    | X  | X  |    | X  | X  | X  |    |    |    |          |    |    |    |    |    |
| Perspectives                  | PER  |    |    |    | X  |    |    |    |    |    |    | X  |    |    |    |    |    |    |          |    |    |    |    |    |
| Letters                       | LET  |    |    | X  | X  |    |    |    |    |    |    |    |    |    |    |    |    |    |          | X  |    | X  |    | X  |
| Feature Articles              | FEA  | X  | X  |    |    | X  |    |    |    | X  |    |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Editorial                     | EDI  | X  |    | X  | X  | X  | X  | X  |    | X  |    | X  | X  |    |    |    |    |    |          | X  |    | X  | X  | X  |
| Synopsis                      | SYN  |    |    |    |    |    |    |    | X  |    |    |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Full text                     | ART  |    |    |    |    |    |    |    | X  |    |    |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Research Articles             | RES  |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Discussions                   | DIS  |    |    |    |    |    |    |    |    |    |    |    |    |    |    | X  |    |    |          |    |    |    |    |    |
| Review Articles               | REV  | X  |    | X  |    |    | X  |    |    |    |    | X  | X  | X  |    |    | X  | X  | X        |    | X  |    |    | X  |
| Book Reviews                  | BKR  | X  |    | X  |    |    |    |    |    |    |    |    |    | X  |    |    | X  |    |          |    |    |    |    |    |
| News                          | NWS  | X  | X  | X  |    |    |    |    |    |    |    | X  |    | X  |    |    |    |    |          |    |    |    |    |    |
| News articles                 | NAR  |    |    | X  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Highlights                    | HIG  |    |    | X  |    |    |    |    |    |    |    | X  | X  |    |    |    |    |    |          |    |    |    | X  |    |
| Interviews                    | INT  |    |    |    |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |          |    |    |    |    |    |
| Technical note                | TEC  |    | X  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Events/Conference Diary       | CNF  |    |    | X  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Conference reports            | CRP  | X  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |          |    |    |    | X  |    |
| Synthetic abstract            | SAB  |    |    |    |    |    | X  |    |    |    |    |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Cover Feature                 | COV  |    |    |    |    |    | X  |    |    |    |    |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Focus                         | FOC  |    | X  | X  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Viewpoints                    | VPT  |    | X  | X  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Invited Lecture               | LEC  |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Keynote Article               | KEY  |    |    |    |    |    |    | X  |    |    |    |    |    |    |    |    |    |    |          |    |    |    |    |    |
| Hot off the Press Articles    | HOT  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    | X  |    |          |    |    |    |    |    |
| Atomic Spectrometry Update    | ASU  |    |    |    |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |          |    |    |    |    |    |
| Analytical Methods Committee  | AMS  |    |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |    |    |          |    |    |    |    |    |
| Inter-laboratory Note         | ILN  |    |    |    |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |          |    |    |    |    |    |
| Critical Review               | CRV  |    |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |    |    |          |    |    |    |    |    |
| Tutorial Review               | TRV  |    |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |    |    |          |    |    |    |    |    |
| Glow Discharge Paper          | GDP  |    |    |    |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |          |    |    |    |    |    |
| Glow Discharge Comm           | GDC  |    |    |    |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |          |    |    |    |    |    |
| Glow Discharge Review         | GDR  |    |    |    |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |          |    |    |    |    |    |
| Glow Discharge News Article   | GDN  |    |    |    |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |          |    |    |    |    |    |
| Glow Discharge Technical Note | GDT  |    |    |    |    |    |    |    |    |    |    |    |    | X  |    |    |    |    |          |    |    |    |    |    |
Front matter
The front matter consists of <art-admin>, which holds the article's unique manuscript number, <published>, which contains details of the journal, volume, issue in which the article has been printed and the relevant pagination details, and <art-front>, which is the front matter proper.
For accepted date, use:
<date role="accepted"><year>1999</year><month>April</month><day>23</day></date>
For the date on which a revised version of an article was issued, use the same date format, with role="revised".
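The date format can be generated programmatically. The sketch below assumes Python's standard xml.etree.ElementTree module; the function name and the MONTHS tuple are hypothetical helpers, and full English month names are assumed because the example above uses "April".

```python
import xml.etree.ElementTree as ET

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def date_element(role, year, month, day):
    """Serialise a <date> element with the given role ("accepted", "revised")."""
    date = ET.Element("date", {"role": role})
    ET.SubElement(date, "year").text = str(year)
    ET.SubElement(date, "month").text = MONTHS[month - 1]
    ET.SubElement(date, "day").text = str(day)
    return ET.tostring(date, encoding="unicode")


print(date_element("accepted", 1999, 4, 23))
# <date role="accepted"><year>1999</year><month>April</month><day>23</day></date>
```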
Authors - we would like the corresponding author to be identified. There's no need to mark the others as 'princ'.
For affiliations, use <org><orgname></orgname></org><address></address>, with the address in the <address> element that follows <org>. Although the <org> group does contain its own <address> element, this should not be used when encoding articles. We don't require any org ids.
The <published> element should be set with the attribute type="print", along with the journal code; the other pubfront subelements should be left blank.
<published type="print">
  <journalref><link>GC</link></journalref>
  <volumeref><link>001 </link></volumeref>
  <issueref><link>unknown</link></issueref>
  <pubfront><fpage></fpage><no-of-pages></no-of-pages> <date><year></year></date> </pubfront>
</published>
Body matter
The body of each article consists of an <art-body>, containing one or more <section>s.
These are the top-level structural units within each article: lower levels are represented by <subsect1>, <subsect2>, etc. (N.B. the numbering of section-level element names represents their depth of nesting, not repetition.)
Care should be taken to ensure that the structure of the article, implied by the style of headings, is correctly reflected in the <section> and <subsectN> elements assigned. See <title> for details of heading typestyles.
Appendices
Any appendices to an article are placed within an <appmat> element, between the <art-body> and <art-back> elements. This contains one or more <appendix> elements, each optionally numbered and containing one or more <section>s.
Back matter
The back matter contains an optional <ack> element. This is followed by mandatory <biblist> and <compoundgrp> elements.
This last element is provided as a place to collect together <compound> elements, each of which defines the ID of a chemical compound mentioned in the article, and thus to provide a target for <compoundref> cross-references (which are normally set in bold face: see Cross-references). (The ultimate intention, not to be implemented at this stage, is to provide links back from these <compound> elements to the points in the article where the compound is defined or illustrated.)
Graphics
Graphical objects should be declared as external entities, with a suitable Notation. The RSC application provides a comprehensive set of possible notations, which ought to include all the image formats encountered. Let us know if any new image formats are encountered.
External entity declarations should include PUBLIC identifiers as well as SYSTEM identifiers, e.g.
<!ENTITY ugr1 PUBLIC "RSC// a904043i ugr1" "a904043i-u1.tif" NDATA tiff>
Graphics take the following attributes:
- ID: a unique ID for this graphic (see notes below on assigning IDs) (required)
- src: the entity which contains the graphic (see notes above on external entities)
- height:
- width:
- pos: "float" for floating graphics; otherwise "fixed". "float" should be used for graphics marked as "A" blocks, while "fixed" should be used for "B" blocks and for graphics appearing in the body of the text. Graphics appearing within tables or equations should be assumed to be fixed.
Chemical formulae, equations, symbols for which no character entity is provided in the DTD and tables which are too complex to encode as XML should all be encoded as a <ugraphic> element. As well as the standard attributes for graphics, this has a displayed attribute which can take the value "displayed" (which indicates that the graphic should be set off from the surrounding text) or "inline" (which means that the graphic should form part of the current line).
Assigning unique id's
In order to make id's unique within each article, a prefix should be added to the identifier assigned by the author:
Table 1: id prefixes for different classes of target
| Class of target | id prefix |
| author affiliation | aff |
| chart | cht |
| chemical compound | chem |
| citation | cit |
| equation | eqn |
| figure | fig |
| footnote | fn |
| plate | pl |
| scheme | sch |
| table | tab |
| table footnote | tab + fn (note 1) |
| untitled graphic | ugr |
| typesetter-generated graphic (e.g. equations and tables which cannot be encoded in SGML/XML) | ug |
Note 1. Table footnotes should be given an id which is a combination of the table's id and a unique id for the footnote within that table, e.g. tab2fna. Table footnotes should be given letters (a, b, c, etc).
Thus, for example, a citation referred to in the paper as 8a should be given the id cit8a, while chemical compound 8a should acquire the id chem8a.
We are using the number or letter in some of the id's to generate some of the numbering within the HTML article: for affiliations, equation numbering and table footnote lettering the aff, eqn, table fn should all be given the literal values that would appear in the text, e.g. affa, affb; eqn1, eqn2; tab1fna, tab3fnc. For the remaining id's a unique number or letter will be sufficient.
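The prefixing scheme can be sketched as a lookup. The function names and dictionary keys below are hypothetical; the prefixes are those listed in the table above, with the typesetter-generated graphic prefix left out of this illustration.

```python
# Prefixes from the id prefix table above.
ID_PREFIXES = {
    "affiliation": "aff",
    "chart": "cht",
    "compound": "chem",
    "citation": "cit",
    "equation": "eqn",
    "figure": "fig",
    "footnote": "fn",
    "plate": "pl",
    "scheme": "sch",
    "table": "tab",
    "ugraphic": "ugr",
}


def make_id(kind, label):
    """Prefix the author's label so that the id is unique within the article."""
    return f"{ID_PREFIXES[kind]}{label}"


def table_footnote_id(table_label, footnote_letter):
    """Combine the table's id with the footnote letter, e.g. tab2fna."""
    return f"{make_id('table', table_label)}fn{footnote_letter}"


print(make_id("citation", "8a"))   # cit8a
print(make_id("compound", "8a"))   # chem8a
print(table_footnote_id(2, "a"))   # tab2fna
```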
Links and cross-references
Internal cross-references within an article should use the standard SGML/XML ID-IDREF mechanism. To enforce this, we have specified as #REQUIRED the id attribute for all the elements that cross-references might point to. It is not practicable to do the same for pointer elements, since their target is not always present. To allow for this, the idrefs attribute is not mandatory. Instead, a presence attribute is provided. When a linking element has no target, this attribute should always be specified, with the value presence="missing".
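A simple consistency check for this mechanism might look like the sketch below. It assumes Python's standard xml.etree.ElementTree module and a hypothetical function name, and the sample fragment is simplified for illustration rather than being a valid article.

```python
import xml.etree.ElementTree as ET


def unresolved_idrefs(root):
    """Return idrefs values that match no id attribute in the tree.

    Linking elements marked presence="missing" are skipped, since they are
    declared to have no target.
    """
    known_ids = {el.get("id") for el in root.iter() if el.get("id")}
    missing = []
    for el in root.iter():
        if el.get("presence") == "missing":
            continue
        for ref in (el.get("idrefs") or "").split():
            if ref not in known_ids:
                missing.append(ref)
    return missing


sample = ET.fromstring(
    '<p>See <figref idrefs="fig1">Fig. 1</figref>, '
    '<figref idrefs="fig2">Fig. 2</figref> and '
    '<citref presence="missing">9</citref>.'
    '<figure id="fig1"/></p>'
)
print(unresolved_idrefs(sample))  # ['fig2']
```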
This table summarises the element types which indicate cross-references, and the target element type for each.
Table 3: Mapping of cross-reference element types to target element types
| Cross-reference element type | Target element type |
| <compoundref> | <compound> |
| <textref> | any textual element with an ID attribute |
| <figref> | <figure> |
| <schemref> | <scheme> |
| <plateref> | <plate> |
| <chartref> | <chart> |
| <eqnref> | <equation> |
| <boxref> | <box> |
| <tableref> | <table-entry> |
| <citref> | <citgroup> |
| <fnoteref> | <footnote> |
| <affref> | <aff> |
One specific point to note is that <citref> does not point to a <citation> or <journalcit> element: instead it points to <citgroup>. This design allows any number of citations to occur within a single numbered or sub-numbered part of a References list.
In the unlikely event that an external link to another article (also encoded in SGML/XML) needs to be made, the general-purpose <link> element type is provided. This implements the Text Encoding Initiative (TEI) Extended Pointer mechanism, which allows all or part of a document to become the target of a link. It is anticipated that only the ID-based part of the TEI Extended Pointer syntax would be required in practice. Do not use the <link> element without checking with RSC first. The linking strategy described here is likely to be reviewed once the W3C's XLink proposal reaches Recommendation status.
Recognising cross-references. This table summarises typographical conventions which are often used to represent various types of cross-reference. Where a change of font style indicates such a cross-reference, it should always be marked up as such. In such cases, the cross-reference should not also be marked up as a change of font style.
Table 4
| type style | data type | cross-reference type |
| superscript | arabic no. [+ letter suffix] | citref |
| superscript | letter | affref |
| superscript | symbol | fnoteref |
| bold | numbers, letters, roman numerals | compoundref |
Numbering
For the present, numbers should be included in the <no> element if they are required.
There is no need (and no opportunity!) to number figures, schemes, boxes or plates. Suitable prefixes and numbers (e.g. "Fig 1.") will be supplied by style sheets. Other concepts (e.g. citations, equations, appendices, and chemical compounds) have an optional <no> element. This does not need to be used where the numbering scheme follows a simple sequence of arabic numbers, since the entries will be auto-numbered in this case. If any instance of a given element type has a non-standard number within an article, then the <no> element should be specified for all instances of that element type.
However, all of these concepts are allowed to have an ID, and some require one — these IDs still need to be specified even if the title or heading itself can be auto-numbered. We can't (yet) auto-number tables in appendices, which require numbers in the form A1, A2, etc.
Low-level elements
Emphasis and font style elements
Changes in font style should be marked up with the appropriate emphasis tags unless they indicate a specific concept, as discussed above under Cross-references.
Individual elements can be used to mark bold text, italic text, bold italic, underlined text, SMALL CAPS, superscript and subscript. They can also be used in combination to represent, for example, superscript bold text.
Footnotes. Footnotes should be placed just after the first <fnoteref>.
All footnote characters should be auto-generated. In text, they follow the order:
- dagger
- double dagger
- section sign (§)
- paragraph mark (¶)
- double vertical line
- double asterisk
- 2 daggers
- 2 double daggers
- 2 curly
- 2 backwards P
- 2 x double vertical lines
- 3 asterisks
- 3 daggers.....etc.
In table footnotes, they just appear as a, b, c, d, etc, where these letters are taken from the end of the id attribute's value.
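The order of footnote markers in text can be generated as follows. This is a sketch with a hypothetical function name; the rule that the asterisk always appears one more time than the other symbols in its cycle is inferred from the list above (double asterisk, then 3 asterisks), not stated explicitly.

```python
# dagger, double dagger, section sign, paragraph mark, double vertical line,
# asterisk
SYMBOLS = ["\u2020", "\u2021", "\u00a7", "\u00b6", "\u2016", "*"]


def footnote_marker(n):
    """Return the marker for the n-th footnote in the text (n starts at 1)."""
    cycle, position = divmod(n - 1, len(SYMBOLS))
    symbol = SYMBOLS[position]
    repeats = cycle + 1
    if symbol == "*":
        repeats += 1  # the list above starts with a double asterisk
    return symbol * repeats


print([footnote_marker(n) for n in range(1, 8)])
```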
Text. Spacing:
Equation spacing: the +, minus, divide and times signs are spaced on either side when they appear between two digits in an equation, e.g. 4 + 4. When the sign is attached to a single digit there is no space, e.g. +4. This also applies to proportional to, plus-minus, similar to, approximately equal to, >, < and their >= variants.
Multiple citrefs shouldn't be spaced: <citref idrefs="cit1 cit4 cit5 cit12">1, 4, 5, 12</citref> should be: <citref idrefs="cit1 cit4 cit5 cit12">1,4,5,12</citref>
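The unspaced form can be produced mechanically. The helper below is a hypothetical sketch; it assumes each citation id is the prefix cit followed by the citation's label.

```python
def citref(labels):
    """Build a <citref> for several citations, with no spaces after commas."""
    idrefs = " ".join(f"cit{label}" for label in labels)
    text = ",".join(str(label) for label in labels)
    return f'<citref idrefs="{idrefs}">{text}</citref>'


print(citref([1, 4, 5, 12]))
# <citref idrefs="cit1 cit4 cit5 cit12">1,4,5,12</citref>
```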
Figure, scheme, etc references should be placed at the end of the paragraph in which they are first referenced.
Use <p> in titles for the Green Chemistry font change. The second <p> of a GC title will contain the details for the smaller title content. Simple titles don't need to use <p> at all.
For elements where the content model is empty (ugraphic, colspec, icgraphic) the elements need a closing solidus for XML: <colspec colname="1" colwidth="2.82*" align="left"/>
Compoundrefs: the displayed text can take any form, and the id doesn't have to match it exactly, e.g. <compoundref idrefs="chem61a">6·1a</compoundref>
Tables
Tables will normally appear inline, marked up according to CALS-compatible SGML. The standard CALS attributes should be used to render the table in a form that is as close as possible to the printed result. This includes, but is not limited to, the relative widths of columns, spanning of rows and columns, and the use of lines to separate headings. The specific conventions listed below are intended to be compatible with the approach supported by Adept's table editor:
- relative column widths: use <colspec> with colname="n" and colwidth="X.XX*", where X.XX is a ratio of 1.00, the default column width;
- individual cells (<entry> element type) refer to their colspec with colname="n"
- spanning: horizontal spans use namest="n" nameend="m" within <entry>; vertical spans use morerows="n" within <entry>
- horizontal alignment: align="center", "right", "justify", "char" and "left" (default) within <entry>. When the align attribute is not specified for <entry>, the value in the appropriate <colspec> element will be used as a fallback
- vertical alignment: valign="top", "middle", and "bottom" (default - !!) within <entry>. When the valign attribute is not specified for <entry>, the value in its parent <row> element will be used as a fallback, and failing that the value in the <row>'s own parent (<thead>, <tfoot> or <tbody>)
- rules: in general, do not mark up ruler lines within tables. Default style rules will insert a rule below headings which span more than one cell. If absolutely necessary, use standard CALS conventions, i.e. rowsep="0" for e.g. bottom rule (?); "1" for vertical rule (?); ... and flag this as an exception
- N.B. overall table width, row shading and non-standard row heights (other than spans) are recorded by Adept as processing instructions, and so are not encoded in the SGML
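The column and span conventions above can be sketched as string builders. The function names are hypothetical; the attribute names (colname, colwidth, align, namest, nameend, morerows) are those given in the list, and xml.sax.saxutils.escape is used so that cell text containing "&" or "<" stays well-formed.

```python
from xml.sax.saxutils import escape


def colspec(colname, ratio=1.0, align="left"):
    """Empty <colspec> element with a relative width such as 2.82*."""
    return (
        f'<colspec colname="{colname}" colwidth="{ratio:.2f}*" '
        f'align="{align}"/>'
    )


def spanning_entry(text, first_col, last_col, morerows=0):
    """<entry> spanning columns first_col..last_col and, optionally, rows."""
    attrs = f'namest="{first_col}" nameend="{last_col}"'
    if morerows:
        attrs += f' morerows="{morerows}"'
    return f"<entry {attrs}>{escape(text)}</entry>"


print(colspec(1, 2.82))
# <colspec colname="1" colwidth="2.82*" align="left"/>
print(spanning_entry("Yield (%)", 2, 3))
# <entry namest="2" nameend="3">Yield (%)</entry>
```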
However, tables will sometimes be too complex to represent in this way, and so will be prepared as a graphic. To deal with this variation, a 'cover element' <table-entry> is provided, which contains either an inline <table> entry or a <ugraphic>. It is <table-entry> which requires a unique ID for <tableref> elements to point to, and which contains a <title> element.
One side-effect of this approach is that un-numbered tables can simply be encoded as <table>. From version 3.3 onwards, <table> can appear within text and between paragraphs.
Chemistry
Chemical compounds and simple formulae can often be represented as inline markup. <sup> and <inf> can be used to shift text, and <overbar> and <underbar> to place rules above or below chemical symbols. The character entity sets provided as part of the DTD (especially the ISO Chemistry set and the custom RSC set) support most chemical symbols that will be encountered. The <stack> element type can be used to encode the situation where one character appears directly above another.
Where chemical formulae are too complex to render as inline SGML, an inline or displayed <ugraphic> should be used instead.
Equations
Equations may appear inline, marked up in SGML using the tools available such as <fraction>: 1/3.
However, equations will frequently be too complex to represent in this way, and so will be prepared as a graphic. To deal with this variation, a 'cover element' <equation> is provided, which contains either an inline <eqntext> entry or a <ugraphic>. <equation> requires a unique ID for <eqnref> elements to point to.
Multi-line text equations can be accommodated by adding another <p>. Within <eqntext>, you should either have no <p> subelements (one-line or inline equations), or nothing but <p> subelements (multi-line equations).
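The rule for <p> within <eqntext> can be checked automatically. The sketch below assumes Python's standard xml.etree.ElementTree module and a hypothetical function name; the sample fragments are simplified illustrations, not extracts from real articles.

```python
import xml.etree.ElementTree as ET


def eqntext_is_consistent(eqntext):
    """True if <eqntext> has no <p> children, or nothing but <p> children."""
    children = list(eqntext)
    p_children = [c for c in children if c.tag == "p"]
    if not p_children:
        return True   # one-line or inline equation
    if len(p_children) != len(children):
        return False  # <p> mixed with other elements
    # Multi-line equation: no stray text is allowed outside the <p> elements.
    has_leading_text = bool((eqntext.text or "").strip())
    has_tail_text = any((c.tail or "").strip() for c in children)
    return not (has_leading_text or has_tail_text)


one_line = ET.fromstring("<eqntext>a = b + c</eqntext>")
multi_line = ET.fromstring("<eqntext><p>a = b</p><p>c = d</p></eqntext>")
mixed = ET.fromstring("<eqntext>a = b<p>c = d</p></eqntext>")

print(eqntext_is_consistent(one_line))    # True
print(eqntext_is_consistent(multi_line))  # True
print(eqntext_is_consistent(mixed))       # False
```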
Citations
Where citations follow the standard pattern for journal articles, the <journalcit> element type should be used. In all other cases (including 'difficult' journal article citations, books, theses, computer software, etc.), the more flexible <citation> element type should be used. <citext> should be used to mark up text within the References section which is not a citation of any kind.
Numbering citations. As noted above in Links and cross-references, the citation number is a property of the enclosing <citgroup> element, not the citation itself. This makes it easy to deal with the case where more than one citation is given under the same reference number. It also allows running text to be mixed with, or indeed take the place of, proper citations.
Note that the expected pattern for numbering citations is to use numbers for top-level entries, and letters for sub-entries. If the citations follow this pattern, the <no> element should not be provided for any <citgroup> element. Instead, nested <citgroup> elements should be used to represent the lower-level citations. (See the source SGML of these instructions for an example of this technique.)
Standard journal citations. Standard journal citations follow this model:
- author (at least one)
- optional article title
- [journal] title
- year
- volume number
- issue number
- first page or page range
- translation (optional)
Unless stated otherwise, each element should appear exactly once, and elements should appear in the order given. In such cases, <journalcit> can and should be used. The citation should be entered as a series of analysed subelements. No punctuation should be recorded between each component of the citation, and no style markup (e.g. italic for titles; bold for volume numbers) should be included. Punctuation and styling will be applied by the rendering process. Thus the citation:
G.H. Jonker and J.H. Van Santen, Physica, 1950, 16, 337
should be encoded:
<journalcit><citauth><fname>G. H.</fname><surname>Jonker</surname></citauth> <citauth> <fname>J. H.</fname><surname>Van Santen</surname></citauth> <title>Physica</title><year>1950</year><volumeno> 16</volumeno> <pages><fpage>337</fpage></pages></journalcit>
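A standard journal citation can be assembled from its analysed components as in the sketch below, which assumes Python's standard xml.etree.ElementTree module. The function name and parameters are hypothetical, and only the subelements used in the example above are produced (no article title, issue number or translation).

```python
import xml.etree.ElementTree as ET


def journalcit(authors, journal, year, volume, first_page):
    """Serialise a <journalcit> with no punctuation or style markup.

    authors is a list of (forenames, surname) pairs.
    """
    cit = ET.Element("journalcit")
    for fname, surname in authors:
        auth = ET.SubElement(cit, "citauth")
        ET.SubElement(auth, "fname").text = fname
        ET.SubElement(auth, "surname").text = surname
    ET.SubElement(cit, "title").text = journal
    ET.SubElement(cit, "year").text = str(year)
    ET.SubElement(cit, "volumeno").text = str(volume)
    pages = ET.SubElement(cit, "pages")
    ET.SubElement(pages, "fpage").text = str(first_page)
    return ET.tostring(cit, encoding="unicode")


print(journalcit(
    [("G. H.", "Jonker"), ("J. H.", "Van Santen")], "Physica", 1950, 16, 337
))
```

Punctuation and styling are deliberately absent, since the rendering process applies them.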
Non-standard citations. The <citation> element type should always be used for non-standard citations, i.e. those which do not fit the standard model. The type of citation should be specified in the type attribute. Allowed values are:
- article (the default value - this doesn't need to be specified)
- book
- thesis
- patent
- software
- other
This isn't being done at present.
Within citations, the following concepts should always be marked up when they are present:
- authors (<citauth>)
- titles (<title>)
- editors (<editor>)
- publisher (<citpub>)
- place of publication (<pubplace>)
- year of publication (<year>)
- journal volume number (<volumeno>)
- journal issue number (<issueno>)
- the part of the work that is being cited: section, pagination, etc. (<biblscope>)
<citation> elements will be marked up as found, including all punctuation and style changes.
This is an example of a reference to a patent:
S. Iwaya, H. Masumura, Y. Midori, Y. Oikawa and H. Abe, US Patent, 4,404,029, 1983.
This should be encoded:
<citation type="patent"><citauth><fname>S.</fname><surname> Iwaya</surname></citauth>, <citauth><fname>H.</fname><surname>Masumura</surname> </citauth>, <citauth><fname>Y.</fname><surname>Midori</surname></citauth>, <citauth><fname>Y.</fname><surname>Oikawa</surname></citauth> and <citauth><fname>H.</fname><surname>Abe</surname></citauth>, <it>US Patent</it>, 4,404,029, <year>1983</year>.</citation>
Book citations. One particular type of non-standard citation which will frequently occur is a reference to a book, either in whole or in part. Again, <citation> should be used to mark these up. The <editor>, <citpub> and <pubplace> element types will often be required within such citations. A fairly typical, simple, example is:
S. Brooks and B. Johansson, in Handbook of Magnetic Materials, ed. K. H. J. Buschow, 1993, 7th edn.
This should be encoded:
<citation type="book"><citauth><fname>S.</fname><surname> Brooks</surname></citauth> and <citauth><fname>B.</fname><surname>Johansson</surname> </citauth>, in <title>Handbook of Magnetic Materials</title>, ed. <editor> K. H. J. Buschow</editor>, <year>1993</year>, 7th edn.</citation>
Note the following:
- within <citauth>, analysis is the same as for standard citations. No space is required between the forename and surname because the rendering process will add one
- no <it> element is required within the title: it will be rendered as italic
- otherwise, all punctuation (i.e. all punctuation between analysed components) is provided exactly as in the source
- the edition information does not fit the model for <biblscope>, and so is left as unanalysed text
A good mixed citation example:
<citgroup id="cit5"> <citation>During the preparation of this manuscript, diester <compoundref idrefs="chem1">1</compoundref> was isolated as a minor side product in the base promoted rearrangement of the analogous (<it>R</it>,<it>R</it>,<it>R</it>,<it>R</it>)-2,3-butane diacetal (BDA) protected dimethyl tartrate, see: <citauth> <fname>M. T.</fname> <surname>Barros</surname> </citauth> , <citauth> <fname>A. J.</fname> <surname>Burke</surname> </citauth> and <citauth> <fname>C. D.</fname> <surname>Maycock</surname> </citauth>, <title>Tetrahedron Lett.</title>, <year>1999</year>, <volumeno>40</volumeno>, <biblscope>1583</biblscope>.</citation></citgroup>
and a <citext>:
<citgroup id="cit8"> <citext>The strong bias towards axial silylation was seen to fall if the mono sodium alkoxide did <it>not</it> precipitate prior to addition of the silicon halide.</citext></citgroup>
Two other points:
a) where a citref appears within another citation. We have extended the content model of citelt so that it can contain "m.simple-text", i.e. any element types which can occur within paragraphs. This change should make citelt a much better 'catch-all' for miscellaneous material within citations.
b) where a citation includes a compoundref and ugraphic of the compound. The compoundref is allowed, but the ugraphic isn't. We have created a new class 'para-graphic' for these two element types. They can now appear anywhere 'text-elts' can appear, as well as between paragraphs.
RSC journal abbreviations. The journals published by the RSC have the following abbreviations, which can be used within the SGML/XML framework, e.g. in <journalref> elements:
Table 6
AC | Analytical Communications
AN | Analyst
CC | Chemical Communications
CE | Cryst. Eng. Communications
CP | PCCP
CS | Chem. Soc. Reviews
DT | Dalton Transactions
EM | J. Environmental Monitoring
FD | Faraday Discussions
FT | Faraday Transactions
GC | Green Chemistry
GT | Geo. Trans.
IC/OC/PC | Ann Rep (Inorganic, Organic, Physical)
JA | JAAS
JC | JCR
JM | J. Materials Chemistry
MC | Mendeleev
NJ | New Journal of Chemistry
NP | Natural Product Reports
P1 | Perkin Transactions 1
P2 | Perkin Transactions 2
PO | Pesticide Outlook
RC | RCR
QU | Phys. Chem. Comm.
Lists
Lists can be entered as a <list>, containing an optional <head> and any number of <item> elements. The type attribute can be used to indicate the type of list. It should take one of the following values:
Note that, since <list> can occur within <item>, it is possible to declare lists nested to any depth.
General
If there are internal references that are in effect impossible, just put the text in and leave out the reference. It would be helpful to advise us in case an amendment to the DTD may be wise, but usually these are one-offs. One recent case had a number of equations in a single ugraphic, itself called scheme 1. In this case it was not possible to add eqnrefs to the scheme.
Appendix A. Alphabetical list of element types
Element definitions
This section contains a definition of every element type in the RSC DTD, including element types which are not required for the data capture work. These additional element types are included for editorial use within RSC, or to support future processing of the encoded articles. They are indicated thus:
RSC internal use only
a. 'anchor': a wrapper round a resource (an image, scheme, table, etc.). An anchor specifies a non-printable external entity which can augment the resource. Where appropriate, it should be represented as a clickable link to navigate to the external entity. Can contain zero or more:
- elements representing graphics
- equation
- box
- table-entry
- table
- src: an entity reference which defines the external entity
- in HTML output, a is represented as an href attribute on the <a> element which is already wrapped around a graphic resource
above. The top half of a stack. Contains 'characters only'.
- rendered as superscript, before below
abstract. An abstract of the article. Contains 'text or paragraphs'.
- rule above [and below], with the abstract itself output as a sequence of left-aligned bold paragraphs
ack. Acknowledgements for the article. Contains 'text or paragraphs'.
- title: an optional non-standard title for the acknowledgements section.
- preceded by a rule. Title is set as an a-heading
- if title is not specified, the heading 'Acknowledgements' is output
address. A complete postal address. Can be represented by a link, or by a sequence of address subelements:
- city
- postcode
- state
- country
- addrelt
each separated by spacing but no punctuation.
- id: a unique identifier for this address element
- type: the type of address
- address within aff is output in italic; other addresses are not currently styled
- each top-level subelement within aff/address is followed by a comma, except for <postcode>s followed by a <country> element which is the last subelement of the address
addrelt. An element within a postal address. Used only when no more specific element type (e.g. city) is appropriate. Can contain 'simple text'.
- id: a unique identifier for this element
admin-event. A single event relating to the administration of an article, e.g. its receipt, acceptance, or rejection. Provided in versions 3.4 onwards of the DTD as a place-holder for RSC management information. Has a mixed content model, which allows the following subelements within text:
- agent
- address
- date
- admin-event (for complex administrative events)
- type: the type of administrative event
advert. An advertisement, i.e. any self-contained block of text which is to be 'dropped in' to a journal issue (including information on grants available, etc.). Contains a link, or one or more sections.
- id: a unique identifier for this element
- type: the type of advertisement
- treated as a keep-together block, with rules placed either side of it in HTML
aff. An author's affiliation. Contains one or more pairs of:
- org (optional)
- address (mandatory)
followed by any of the following which apply:
- id: a unique identifier for this affiliation element. See above for guidance on assigning id's.
- affiliations are rendered as a 'small heading'
- affiliation codes ('a', 'b', etc.) are auto-generated from the last letter of the aff element's id attribute ('affa', 'affb', etc.). They are rendered as italic superscript, and applied both as a prefix to the affiliation itself, and as a cross-referencing hyperlink from the relevant author(s)
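The way affiliation codes follow from the id values can be sketched in Python as below. The function names are hypothetical, and the sketch assumes ids of the form 'affa', 'affb' as described above, together with the space-separated aff attribute on author.

```python
def affiliation_code(aff_id):
    """Return the code letter: the last character of the aff element's id."""
    return aff_id[-1]

def author_affiliation_codes(aff_attribute):
    """Return the codes for an author's space-separated aff attribute."""
    return [affiliation_code(aff_id) for aff_id in aff_attribute.split()]

print(affiliation_code("affa"))               # code shown before the affiliation
print(author_affiliation_codes("affa affb"))  # codes shown after the author's name
```

This is why the final letter of each aff id has to match the letter printed in the source article.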
affref. A reference to an author's affiliation. In practice this element is not used, since authors' affiliations are indicated by the aff attribute on author.
- idrefs: a space-separated list of <aff> identifiers
- presence: 'missing' or 'notmissing' (the default value)
agent. A person playing a role within an admin-event. Contains one person element.
- role: the role played by the person in this administrative event
- suppressed, as part of <admin-event>
appendix. An appendix to an article. Contains an optional no and one or more sections.
- id: a unique identifier for this appendix element. See above for guidance on assigning id's.
- each appendix is preceded by an a-heading "Appendix N", where N is either the value of its <no> subelement or the element's actual sequence number
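A minimal Python sketch of the heading rule just described follows. The function name and the sample markup are illustrative only; the empty <section/> elements are placeholders and do not represent valid section content.

```python
import xml.etree.ElementTree as ET

def appendix_headings(appmat_xml):
    """Return the 'Appendix N' heading for each appendix in turn."""
    appmat = ET.fromstring(appmat_xml)
    headings = []
    for position, appendix in enumerate(appmat.findall("appendix"), start=1):
        no = appendix.findtext("no")
        # Prefer the <no> subelement; fall back to the sequence number.
        label = no.strip() if no and no.strip() else str(position)
        headings.append(f"Appendix {label}")
    return headings

sample = """
<appmat>
  <appendix id="app1"><no>A</no><section/></appendix>
  <appendix id="app2"><section/></appendix>
</appmat>
"""
print(appendix_headings(sample))
```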
appmat. A container for appendix matter. See above for general guidance.
Contains one or more appendix elements.
- currently placed after <art-back> (i.e. out of sequence). Is this the best thing to do with appendices?
art-admin. A container for administrative information relating to an article. Contains, in the order specified:
- ms-id (required)
- doi (optional)
- pii (optional)
- sici (optional)
- office (optional)
- received (optional and repeatable)
- date (optional and repeatable)
- admin-event (optional and repeatable)
- the <art-admin> element is set as an inline italic sequence
art-back. A container for an article's back matter. Contains, in the order specified:
- ack (optional)
- biblist (required)
- compoundgrp (required)
- section (optional and repeatable)
- no special formatting is associated with the <art-back> element type
art-body. A container for an article's body matter. See above for general guidance.
Contains one or more sections, or one or more news-sections.
- no special formatting is associated with the <art-body> element type
art-front. A container for an article's front matter. See above for general guidance on analysing front matter.
Contains a link, or the following elements in the order specified:
- titlegrp (required)
- authgrp (optional)
- conference (optional)
- art-toc-entry (optional)
- arttoc (optional)
- dedicate (optional)
- biography (optional and repeatable)
- abstract (optional and repeatable)
- subject (optional and repeatable)
- keyword (optional and repeatable)
- the <art-front> element as a whole is suppressed, but its <titlegrp> subelement is treated specially. See its documentation for details
- then <authgrp>, <biography> and <abstract> are output, in that order
art-links. A container for links from an article to other resources. Contains any number of suppinf and/or fulltext elements.
- no special formatting is associated with the <art-links> element type
art-toc-entry. Container for resources to use when creating the article's entry in the table of contents for a journal issue. Contains, in the following order:
- currently suppressed from the article itself
article. An article. Contains a link element, or the following elements in the order specified:
- art-admin (optional)
- published (optional and repeatable)
- art-links (optional)
- art-front (optional)
- art-body (optional)
- appmat (optional)
- art-back (optional)
- dtd: a FIXED attribute which specifies which version of the DTD was in use when this XML document was created. There is no need to enter a value for this attribute (and any value other than 'RSCPAx.y' for version x.y of the DTD will render the whole article invalid)
- price-code: takes the value 'free', 'premium' or 'review'. If not specified, 'free' is assumed
- type: the class of article, e.g. 'feature', 'communication'. The article type should be taken from the list of codes given above, e.g. "ART" for a Paper
- background: a reference to an external entity to be used as a background image for the article
- the subelements of <article> are output in this order:
- <art-admin>
- <art-front>
- <art-body>
- <art-back>
- <appmat>
- when outputting to HTML, the type attribute is extracted, converted to its expanded form as listed above, and inserted within the article header before the article title
- when typesetting, a simple combined graphic with article type included is inserted
- no support for background images has been provided yet
articleref. RSC internal use only
A pointer to an article (within an issue), used when generating index entries. Contains a link.
- suppressed as part of <index>
arttitle. An article title within a citation or journalcit. Contains 'simple text or paragraphs'.
- no special formatting is associated with the <arttitle> element type
arttoc. An article's table of contents. Entering an empty <arttoc> element is an instruction to generate an article table of contents from the section and subsection headings (levels a to d, i.e. <section> to <subsect3>) found in the article. In the HTML output, hyperlinks from the ToC to each section are generated. These are based on the section's id if specified, otherwise on a unique system-generated code (which is liable to change each time the document is edited).
Can, if desired, contain toc-head (optional) and toc-entry (optional and repeatable).
- the <arttoc> element is replaced by a table containing auto-generated section numbers in the left column (or the section's no element, if specified), and section titles in the right column
- Is it logical that sections with <no> elements get numbered in the article, while those without don't, even though both get numbered in the ToC?
- no support is yet offered for specifically entered <toc-head> and <toc-entry> elements
- requirements: in ASU a) table titles appear in contents b) References section gets picked up
authgrp. A container for details of authors and their affiliations. Contains one or more author elements, followed by one or more affs.
- punctuation between multiple authors' names is added by the style sheet
- links between authors and their affiliations are added by the style sheet
- an indication of the 'corresponding' author is added by the style sheet
- details of when/where received, when accepted, and when/how published are only output if an <authgrp> element is present
- within news-item and book-review, <authgrp> is output after all other subelements. Authors and affiliations are output on separate lines, in italic, and right-justified.
author. One author of an article. Repeat for each distinct author. Contains a person, followed by an optional footnote.
- aff: one or more idref's (separated by spaces), specifying which aff elements apply to this author
- key: a unique key for this author [not yet used]
- role: can take the value 'princ' (principal author) or 'corres' (corresponding author)
- punctuation between multiple authors' names is added by the style sheet
- links between authors and their affiliations are added by the style sheet
- an indication of the 'corresponding' author is added by the style sheet
- the author's person subelement is output in bold (B in HTML output)
below. The bottom half of a stack. Contains 'characters only'.
- rendered as subscript (SUB in HTML output), after above
bi. Indicates that the contained text should be rendered as bold italic. This is preferable to using separate <bo> and <it> elements. Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.
- rendered as bold and italic (B and I in HTML output)
biblist. A container for the bibliography at the end of an article. Contains a mixture of text and citgroups.
- title: a non-standard title for the bibliography. Can include a section number, if one is required.
- if the title attribute is specified, it is output as the heading for this section. Otherwise, the heading 'References' is output (both as H3 in HTML output).
biblscope. The scope of a citation within the work cited. Can include references to sections, chapters, page ranges, etc. Contains 'simple text'.
- no special formatting is associated with the <biblscope> element type
biography. A person's biography. Contains a link, or one or more sections.
- id: a unique identifier for this <biography> element
- <biography> is suppressed where it appears, but is output as a full-width one-row table (TABLE in HTML output), followed by a rule (HR in HTML output), after the article's front matter (so long as an art-front element is present). RK: Biography might be better as a two-cell table, with any plate as the left hand cell
bo. Indicates that the contained text should be rendered as bold. Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type (specifically compoundref, which is the most common reason for bold-face within article text).
- rendered as bold (B in HTML output)
board. RSC internal use only
A journal or issue's [Editorial] Board. Contains a link, or an optional title followed by zero or more groups and/or members.
- id: a unique identifier for this element
- no special formatting is associated with the <board> element type
book-review. A book review, consisting of the citation of the book being reviewed, reviewer's details, and the review itself. Contains a citation, followed by an optional authgrp for the reviewer's details (i.e. the 'author' of the review), followed by one or more paragraphs (p) and/or 'inter-paragraph elements'.
- within <book-review>, <authgrp> is output at the end, right-justified (see authgrp for details)
- multiple <book-review> elements are separated by a line-break
box. A floating text box. Contains a single section.
- id: a mandatory unique identifier for this element
- height: the height of the box, expressed as ...
- width: the width of the box, expressed as ...
- tint: the tint of the box, expressed as ... I don't think these can be defined at capture. I really don't know what the best units would be though - absolutes or pixels NH>What is easiest to capture?
- pos: can optionally take the value 'fixed' to indicate that the <box> cannot float
- <box> elements are set as a centred 80%-width table with a border (not currently visible!)
boxref. A reference to a floating text box. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.
- idrefs: one or more space-separated idref's, specifying the box(es) to which cross-reference is being made
- presence: can take the value 'missing' or 'notmissing'
- no special formatting is associated with the <boxref> element type
byline. RSC internal use only
A journal's byline. Contains 'simple text'.
- no special formatting is associated with the <byline> element type
chart. A chart. Contains an optional title. See above for general guidance on encoding graphics.
- id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
- src: the entity which contains the graphic (see notes above on external entities)
- height: the height of the graphic, expressed as ...
- width: the width of the graphic, expressed as ... I don't think these can be defined at capture. I really don't know what the best units would be though - absolutes or pixels NH>What is easiest to capture?
- pos: 'float' for floating graphics; otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks
- the chart is output within a centred half-width table (TABLE in HTML output) as an image (IMG in HTML output)
- if the chart has a <title>, this is output in a separate row below the image; otherwise the heading 'Chart N' is generated, where N is the chart number as indicated in its id attribute. Neil's code has the auto-generated heading centred, and the 'real' heading left-aligned. Is this intended?
- a text break instruction (BR clear="all" in HTML output) is output before and after the chart
chartref. A cross-reference to a chart. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.
- idrefs: one or more space-separated idref's, specifying the chart(s) to which cross-reference is being made
- a link (A in HTML output) is made to the first idref specified in the idrefs attribute
citation. Container for an individual citation that doesn't fit the model for a standard journal citation (journalcit). Should only be used if <journalcit> cannot. See above for general guidance on encoding citations.
Contains mixed content, which can include the following element types as required:
- citauth
- title
- year
- volumeno
- issueno
- arttitle
- biblscope
- editor
- citpub
- pubplace
- link
- url
- email
- trans
- 'emphasis' elements
- id: a unique identifier for this element. This attribute shouldn't be used, since it isn't intended to be pointed to now. It will be removed in the next version of the DTD
- type: the type of citation
- no special formatting is associated with the <citation> element type
citauth. An author within a citation or journalcit element. Contains a link, or an optional fname followed by a mandatory surname.
- no special formatting is associated with the <citauth> element type
citext. Citation text. Used only when it is not possible to encode material found within a citations list using journalcit or citation. (This should only apply when the text isn't actually a citation at all.) Contains 'simple text'.
- no special formatting is associated with the <citext> element type
citgroup. A group of citations with a single reference number. (Most <citgroup>s will only contain a single journalcit or citation element.) See above for general guidance on encoding citations.
Contains an optional no element for a non-standard citation number, followed by one or more of the following, in any order:
- citext
- journalcit
- citation
- citgroup
- commentary (may also appear after the various elements above)
- id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
- the citation is enclosed in an anchor group (A in HTML output), with a NAME attribute equal to its id attribute
- the content of <citgroup> is preceded by a displayed citation number, which is derived from the <citgroup>'s position in the citation list
citpub. The publisher of a citation. Contains 'simple text'.
To be added by data capture agency.
- no special formatting is associated with the <citpub> element type
citref. A reference to a citation. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.
- idrefs: one or more space-separated idref's, specifying the citation(s) to which cross-reference is being made
- presence: can take the value 'missing' or 'notmissing'
- position: can take the value 'super' or 'baseline'
- a link (A in HTML output) is made to the first idref specified in the idrefs attribute
- in HTML output the link has a TITLE attribute, generated from the text of the citation
- unless the attribute position="baseline" is specified, the citation will be displayed as small-type superscript (SMALL, SUP in HTML output)
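The rendering rules for citref can be sketched in Python as below. This is an illustration under stated assumptions, not the actual style sheet: the function name is hypothetical, the nesting order of the SMALL, SUP and A elements is a guess, and the link target assumes that each citgroup is output with an anchor named after its id.

```python
from html import escape

def render_citref(idrefs, label, citation_text, position="super"):
    """Render a citref as an HTML link to the first idref listed."""
    first = idrefs.split()[0]
    link = (
        f'<a href="#{escape(first)}" title="{escape(citation_text)}">'
        f"{escape(label)}</a>"
    )
    if position == "baseline":
        return link
    # Default: small-type superscript around the link.
    return f"<small><sup>{link}</sup></small>"

print(render_citref("cit5 cit6", "5,6", "Tetrahedron Lett., 1999, 40, 1583"))
print(render_citref("cit5", "ref. 5", "Tetrahedron Lett., 1999, 40, 1583",
                    position="baseline"))
```

Only the first identifier in idrefs becomes a link target, so the order of the identifiers matters when a cross-reference covers several citations.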
city. The name of a city. Must consist of character data only.
- within <received>, the city name is preceded by '(in ' and followed by ')'
- otherwise, no special treatment is applied to <city> elements
coden. RSC internal use only
A CODEN identifier for a journal. Contains character data only.
- no special treatment is applied to <coden> elements
colspec. A specification of the characteristics of a column in a table. Empty element: has no data content.
- colnum: the column's number
- colname: the column's name
- colwidth: the column's width, as a relative fraction of 1.00 (= average column width given equal spacing)
- colsep: the column's column separator
- rowsep: the column's row separator
- align: the alignment of the column's content
- char: the character to be used for alignment within the column
- charoff: the offset for character alignment within the column
- information in <colspec> is used to determine cell spanning
- information in <colspec> is used to determine text alignment
commentary. A description of the value of a citgroup. Contains 'simple text'
- rating used to denote the rank of the citation: 0 (default), 1 or 2
- the rating is used to generate superior filled stars before the text of the commentary. 1 for a single superscript filled star, 2 for two stars
- the commentary should be rendered as italic
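As a small illustration of the rating rule, the Python sketch below maps a rating to its star prefix. The function name is hypothetical, and the use of the Unicode black star character for a 'filled star' is an assumption.

```python
def commentary_stars(rating=0):
    """Return the filled-star prefix for a commentary rating of 0, 1 or 2."""
    rating = int(rating)
    if rating not in (0, 1, 2):
        raise ValueError("rating must be 0, 1 or 2")
    # U+2605 BLACK STAR stands in for a 'filled star'.
    return "\u2605" * rating

print(repr(commentary_stars()))   # default rating: no stars
print(commentary_stars(1))
print(commentary_stars("2"))      # attribute values arrive as strings
```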
compname. The name of a chemical compound. Contains a link or 'simple text'.
- no special formatting is associated with the <compname> element type
compound. Specifies the id of a chemical compound. Optionally contains one or more compoundref elements, each linking to a definition of that compound
- id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
- no special formatting is associated with the <compound> element type
compoundgrp. A container for zero or more compound elements. A <compoundgrp> is required at the end of each article so that compoundref elements have a target to point to. (At present no use is made of these links when rendering articles.)
- no special formatting is associated with the <compoundgrp> element type. It should be explicitly suppressed, 'just in case'
compoundref. A reference to a chemical compound. Contains 'emphasised text' specifying the compound's code. See above for general guidance on creating cross-references.
- idrefs: one or more space-separated idref's, specifying the compound(s) to which cross-reference is being made
- presence: can take the value 'missing' or 'notmissing'
- rendered as bold text (B in HTML output)
conference. Information about a conference or similar meeting. Contains an optional sequence number (no), followed by zero or more of the following, in any order:
- confname
- daterange
- location
- contact
- id: a unique identifier for this element
- no special formatting is associated with the <conference> element type. To be done ...
confgrp. A container for zero or more conference elements.
- id: a unique identifier for this element
- no special formatting is associated with the <confgrp> element type. To be done ...
confname. A conference's name or title. Contains 'simple text'.
- no special formatting is associated with the <confname> element type. To be done ...
contact. A contact, e.g. for a conference. Contains zero or more of the following, in any order:
- person
- address
- phone
- fax
- email
- url
- id: a unique identifier for this element
- no special formatting is associated with the <contact> element type. To be done ...
country. A country name. Must consist of character data only.
- there is a comma after <postcode> unless it is immediately followed by <country>, in which case there is no punctuation
- otherwise, no special formatting is associated with the <country> element type
cpyrt. RSC internal use only
A copyright statement. Contains 'simple text'.
- output at the end of the article, after any footnotes, in a full-width table (TABLE in HTML output)
- preceded by a rule (HR in HTML output)
- followed by a space and the publication year, if specified
date. A general year-month-day date. Contains a year, followed by an optional month and an optional day.
- role: the role played by this date (e.g. 'accepted' or 'revised')
- dates are either output as year-only (e.g. within the generated copyright statement), or formatted into an 'RSC date' (e.g. '21st November 2000')
- <date> within art-admin with role='accepted' is output after <received>, with a prefix ', Accepted'. role='revised' isn't supported at present
daterange. A range of two dates.
- no special formatting is associated with the <daterange> element type. There should be a '-' between the two dates.
day. A numerical day: 1/2/3/.../31. Should not contain anything apart from the day number itself.
- when formatted as part of an 'RSC date', a suffix is added to the day (e.g. '21st')
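The 'RSC date' format can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that year, month and day are available as numbers; the month-only form is likewise an assumption, since only the year-only and full forms are described here.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

def day_suffix(day):
    """Return the English ordinal suffix for a day number (1 to 31)."""
    if 11 <= day % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")

def rsc_date(year, month=None, day=None):
    """Format a date as year-only or as a full 'RSC date'."""
    if month is None:
        return str(year)
    if day is None:
        return f"{MONTHS[month - 1]} {year}"
    return f"{day}{day_suffix(day)} {MONTHS[month - 1]} {year}"

print(rsc_date(2000, 11, 21))
print(rsc_date(2000))
```

The suffix is generated at rendering time, which is why the day element should hold nothing but the number itself.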
dd. A definition description, part of a deflist. Contains 'text or paragraphs'.
- no special formatting is associated with the <dd> element type
dedicate. A dedication. Contains 'text or paragraphs'.
- no special formatting is associated with the <dedicate> element type
def. The definition of a term, part of a deflist. Contains the term itself, followed by its definition in a dd.
- no special formatting is associated with the <def> element type
deflist. A definition list, containing an optional head, and one or more definitions (def).
- no special formatting is associated with the <deflist> element type
denom. The denominator of a fraction. Contains 'simple text'.
- rendered as small subscript (SMALL, SUB in HTML output)
doi. A Digital Object Identifier. Contains character data only.
- as part of art-admin, this element is currently ignored. A DOI is instead constructed from the article's manuscript number, with the correct RSC DOI prefix. Need to mention link type='DOI' - somewhere!
editnote. An editorial note. Use this element type for any comments generated by the editing process - these do not form part of the article. Contains the following, in this order:
- the note itself
- who made the note
- the date the note was made
- type: the type of editorial note. The values this attribute can take may be controlled in future, but it can be used freely at present.
- no special formatting is associated with the <editnote> element type. Is this simply because we haven't yet supported it? It should either be suppressed or picked out in some way.
editor. The editor of an article or book. Contains 'simple text'.
- id: a unique identifier for this element
- no special formatting is associated with the <editor> element type
email. An e-mail address. Contains character data only. Only enter the actual address: the prefix E-mail: will be generated by style sheets.
- in HTML output, the address is enclosed in an anchor (A) whose href is 'mailto:' plus the address
- the content of the anchor consists of the address prefixed by 'E-mail: '
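A minimal Python sketch of the generated anchor is given below. The function name and the sample address are illustrative; the output shape follows the two rules above.

```python
from html import escape

def email_anchor(address):
    """Wrap a captured e-mail address in a mailto anchor with its prefix."""
    address = address.strip()
    return (
        f'<a href="mailto:{escape(address)}">'
        f"E-mail: {escape(address)}</a>"
    )

# Only the bare address is captured; the prefix is generated.
print(email_anchor("author@example.org"))
```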
entry. An entry (cell) in a table. See above for general guidance on encoding tables.
Contains mixed content which can include text elements, graphics, and equations.
- colname: the name of the column in which this cell appears
- namest: the name of the start column for this cell
- nameend: the name of the end column for this cell
- morerows: the number of rows occupied by this cell
- colsep: the column's column separator
- rowsep: the column's row separator
- align: the alignment of the column's content
- char: the character to be used for alignment within the column
- charoff: the offset for character alignment within the column
- valign: the vertical alignment of the column's content
- indent: the indentation of this cell (an RSC-specific attribute)
- <entry> is formatted as a table cell (TD in HTML output)
- the column name and morerows attribute are used to generate suitable COLSPAN and ROWSPAN settings for the cell
- align and valign are used to generate suitable ALIGN and VALIGN settings
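How COLSPAN and ROWSPAN might be derived can be sketched in Python as below. The function name is hypothetical. The sketch assumes that column order is given by the colname values of the table's colspec elements, and that morerows follows the usual CALS convention of counting the additional rows spanned, so that ROWSPAN is morerows plus one.

```python
def cell_spans(column_names, namest=None, nameend=None, morerows=0):
    """Return (colspan, rowspan) for a table entry.

    column_names lists the colname values of the colspec elements in order.
    """
    if namest is not None and nameend is not None:
        start = column_names.index(namest)
        end = column_names.index(nameend)
        colspan = end - start + 1
    else:
        # An entry without a start and end column occupies a single column.
        colspan = 1
    rowspan = int(morerows) + 1
    return colspan, rowspan

columns = ["c1", "c2", "c3", "c4"]
print(cell_spans(columns, namest="c2", nameend="c4"))
print(cell_spans(columns, morerows="1"))
```

If the DTD instead treats morerows as the total number of rows occupied, the final addition should be dropped.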
eqnref. A reference to an equation.
Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.
- idrefs: one or more space-separated idref's, specifying the equation(s) to which cross-reference is being made
- presence: can take the value 'missing' or 'notmissing'
- a link is made to the first idref specified in the idrefs attribute A in HTML output
eqntext. An equation expressed in textual form. See above for general guidance on encoding equations.
Contains 'simple text or paragraphs'. Use ps to lay out multi-line equations.
- display: can take the value 'displayed' or 'inline'. Use this attribute to indicate whether the equation should be set as a separate block, or rendered inline.
- <eqntext>s occurring outside an equation are set in a centred full-width table TABLE in HTML output, with two breaks above and one below
- no special action is taken for <eqntext>s within an equation
- No support is yet provided for the display attribute
equation. An equation. See above for general guidance on encoding equations.
Contains an optional no, followed by a textual equation (eqntext) or a graphic displaying the equation (ugraphic).
- id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
- <equation>s are set in a centred full-width table TABLE in HTML output, with two breaks above and one below
- the equation itself is set in a table cell TD in HTML output, in which there is an anchor whose name is the id of the equation A in HTML output
- the equation no, if specified, is set in a cell to the right of the equation. If it is not specified, an equation identifier is generated based on the <equation>'s id attribute. In both cases, the identifier is surrounded by parentheses and the whole entry is bold B in HTML output
- When there is a <no>, it isn't being output as bold at present
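A numbered textual equation might be captured along the following lines. This is a sketch under assumptions: the id value eqn1 is hypothetical (follow the id conventions given earlier in these guidelines), and the equation text is invented.

```xml
<!-- Hypothetical id; the optional <no> comes first and is entered without parentheses -->
<equation id="eqn1">
  <no>1</no>
  <eqntext>x = a + b</eqntext>
</equation>
```

The parentheses around the equation number are generated at rendering time, so they are not entered in <no>.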
fax. A fax number. Can only contain character data.
- no special formatting is associated with the <fax> element type
figref. A cross-reference to a figure. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.
- idrefs: one or more space-separated idref's, specifying the figure(s) to which cross-reference is being made
- presence: can take the value 'missing' or 'notmissing'
- a link is made to the first idref specified in the idrefs attribute A in HTML output
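A cross-reference to two figures could be sketched as follows. The id values fig1 and fig2 are hypothetical and are assumed to match the id attributes of two <figure> elements in the same article.

```xml
<!-- Hypothetical ids fig1 and fig2; the element content is the human-readable text -->
<p>The spectra are shown in <figref idrefs="fig1 fig2">Figs. 1 and 2</figref>.</p>
```

Although both figures are listed in idrefs, the rendered link points only to the first of them (fig1 in this sketch).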
figure. A figure. Contains an optional title. See above for general guidance on encoding graphics.
- id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
- src: the entity which contains the graphic (see notes above on external entities)
- height: the height of the graphic, expressed as ...
- width: the width of the graphic, expressed as ... Editorial note: I don't think these can be defined at capture. I really don't know what the best units would be though - absolutes or pixels. NH: What is easiest to capture?
- pos: 'float' for floating graphics; otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks
- the figure is output within a centred half-width table TABLE in HTML output as an image IMG in HTML output
- if the figure has a <title>, this is output in a separate row TR in HTML output below the image
- the heading 'Fig. N' is generated, where N is the figure number as indicated in its id attribute
- a text break instruction BR clear="all" in HTML output is output before and after the figure
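A floating figure with a title might be captured roughly as shown here. This is a sketch only: the id value and the entity name fig1 are hypothetical, the entity is assumed to be declared as an external entity as described in the notes on graphics, and height and width are omitted because their units are not settled above.

```xml
<!-- Hypothetical id and entity name; src names an external entity, not a file path -->
<figure id="fig1" src="fig1" pos="float">
  <title>Absorption spectrum of the product.</title>
</figure>
```

The 'Fig. N' heading is generated by the style sheet, so it is not repeated inside <title>.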
fname. A person's first name. Contains 'simple text'.
- spacing within the <fname> element is preserved
- otherwise, no special treatment for <fname> elements
fnoteref. A reference to a footnote (at the end of the article, or in the footer of a table). Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on footnotes, and on creating cross-references.
- idrefs: one or more space-separated idref's, specifying the footnote(s) to which cross-reference is being made
- presence: can take the value 'missing' or 'notmissing'
- footnote references are rendered as small superscript anchors SMALL, SUP, A in HTML output
- the target for the anchor HREF attribute in HTML output is the value of the id attribute
- for <fnoteref>s within tables, the text of the anchor is a system-generated letter based on the id attribute
- for <fnoteref>s outside tables, the text of the anchor is a symbol, allocated in the sequence: asterisk; dagger; double dagger; section sign; paragraph sign; double vertical line
- <fnoteref>s within tables are output in red
footer. A sequence of paragraphs at the end of a news item, typically set in italic. Contains one or more ps.
- no special treatment for <footer> elements
- Shouldn't they be set in italic, as per the spec? Needs discussion
footnote. A footnote in the article, or in a table footer. See above for general guidance on encoding footnotes.
Footnotes in the article are placed at the point where the footnote reference is to appear in the rendered result. This means that fnoteref is only required for such footnotes if the same footnote is referenced more than once. In contrast, table footnotes are placed within the tfoot, and are referenced by a separate fnoteref.
Contains text or paragraphs.
- id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
- footnotes outside tables are removed to a separate section with title 'Footnotes' H3 in HTML output, and replaced by system-generated footnote references (see fnoteref for details of how these are rendered)
- footnotes within tables are rendered in red, with a smaller font size
- all footnotes are rendered as anchors A in HTML output
- the target for the anchor HREF attribute in HTML output is the value of the id attribute
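The placement rule for article footnotes can be illustrated with a short sketch. The id value fn1 and the wording are invented for the example.

```xml
<!-- The footnote sits where its reference mark should appear; no <fnoteref> is needed -->
<p>The catalyst was recovered by filtration<footnote id="fn1">Details of the recovery procedure are given in the experimental section.</footnote> and reused.</p>
```

At rendering time the footnote text is moved to the 'Footnotes' section and a generated reference symbol is left in its place. A separate <fnoteref> would only be needed if the same footnote were cited again elsewhere.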
fpage. The number of the first page within an issue on which the printed version of an article appears. Can only contain character data.
- print pagination within <pubfront> is suppressed from the rendered version
- otherwise, no special treatment for <fpage> elements
fraction. A fraction. Contains a numerator (numer), followed by a denominator (denom).
- shape: takes values 'case' (an "above and below" fraction) or 'sol' (a "solidus" fraction)
- no special formatting is associated with the <fraction> element type
- Shouldn't we make some attempt to deal with the 'case' case? What can be done in HTML?
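The two shape values can be sketched as follows, assuming that <denom>, like <numer>, accepts simple text.

```xml
<!-- A solidus fraction, i.e. one set in the form 1/2 -->
<p>The ratio was <fraction shape="sol"><numer>1</numer><denom>2</denom></fraction>.</p>

<!-- An "above and below" fraction -->
<p>The ratio was <fraction shape="case"><numer>1</numer><denom>2</denom></fraction>.</p>
```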
fulltext. A link to the full text of an article (e.g. in PDF). Probably not required - do not use without checking with RSC.
Contains a link element.
- no special formatting is associated with the <fulltext> element type
group. RSC internal use only
A group of people with similar roles within an Editorial Board. Contains an optional title, followed by zero or more members.
- no special formatting is associated with the <group> element type
head. A heading (e.g. for a list, index, or definition list). Contains paragraphs or text.
- no special formatting is associated with the <head> element type
- This could be replaced by further use of the existing <title> element type. Either that, or it should be supported in the style sheet!
icgraphic. A graphic to be included in an illustrated contents list entry. Empty element: has no contents. See above for general guidance on encoding graphics.
- id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
- src: the entity which contains the graphic (see notes above on external entities)
- height: the height of the graphic, expressed as ...
- width: the width of the graphic, expressed as ... Editorial note: I don't think these can be defined at capture. I really don't know what the best units would be though - absolutes or pixels. NH: What is easiest to capture?
- pos: 'float' for floating graphics; otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks. Editorial note: this isn't actually useful for <icgraphic>, but it 'inherits' this attribute by virtue of being a graphic element.
- no special formatting is associated with the <icgraphic> element type
- <icgraphic> and <ictext> should be suppressed.
ictext. Text describing the article, to be included in an illustrated contents list entry. Contains paragraphs or text.
- no special formatting is associated with the <ictext> element type
- Shouldn't icgraphic and ictext be suppressed, at least?
index. RSC internal use only
An [author] index. Contains an optional head, followed by zero or more index-entrys.
- no special formatting is associated with the <index> element type
index-entry. RSC internal use only
An entry in an [author] index. Contains a value, followed by one or more articlerefs.
- no special formatting is associated with the <index-entry> element type
inf. Inferior (subscript) text. Indicates that the contained text should be rendered as subscript. Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.
- rendered as small and subscript SMALL, SUB in HTML output
info. Information, e.g. about a journal. Contains a link, or one or more sections.
- type: can take the values 'author' (the default), 'illustration' or 'distribution'
- level: can take the values 'full' (the default), 'brief' or 'paragraph'
- no special formatting is associated with the <info> element type
issn. RSC internal use only
The International Standard Serial Number for a journal. Contains character data only.
- no special formatting is associated with the <issn> element type
issue. RSC internal use only
One issue of a journal. Contains a link, or the following elements in this order:
- journalref (optional)
- volumeref (optional)
- issueno
- issueid (optional)
- issue-front (optional)
- article (optional and repeatable)
- issue-back (optional)
- id: a mandatory unique identifier for this element
- dtd: a FIXED attribute which specifies which version of the DTD was in use when this XML document was created. There is no need to enter a value for this attribute (and any value other than 'RSCPAx.y' for version x.y of the DTD will render the whole article invalid)
- type: the type of issue
- suppressed within pubfront
- otherwise, no special formatting is associated with the <issue> element type
issue-back. RSC internal use only
The back matter for an issue. Contains any number of any of the following, in any order:
- board
- issue-toc
- index
- advert
- info
- confgrp
- no special formatting is associated with the <issue-back> element type
issue-front. RSC internal use only
The front matter for an issue. Contains any number of any of the following, in any order:
- board
- issue-toc
- index
- advert
- info
- confgrp
- no special formatting is associated with the <issue-front> element type
issue-toc. RSC internal use only
The table of contents for an issue. Contains an optional toc-head, followed by zero or more toc-entry elements.
- no special formatting is associated with the <issue-toc> element type
issueid. RSC internal use only
An identifier (other than the issue number) for an issue of a journal. Can only contain character data.
- no special formatting is associated with the <issueid> element type
issueno. The issue number within a volume. Can only contain character data. When used within the issue element, this should be a 3-digit number with leading zeroes. Editorial note: still true?
To be added by data capture agency. Editorial note: still true?
- no special formatting is associated with the <issueno> element type
issueref. A reference to [a document describing] one issue of a journal. See above for general guidance on creating cross-references.
Contains a link, or these elements in the following order:
- journalref (optional)
- volumeref (optional)
- issueno
- issueid (optional)
- issue-front (optional)
- article (optional and repeatable)
- issue-back (optional)
- id: a unique identifier for this element
- links within <issueref> are suppressed
- otherwise, no special formatting is associated with the <issueref> element type
it. Indicates that the contained text should be rendered as italic. Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.
- rendered as italic I in HTML output
item. An item within a list. See above for general guidance on encoding lists.
Contains paragraphs or 'simple text'.
- rendered as a list item LI in HTML output
jnltrans. A translation of a simple journal citation (journalcit). Also used for Chem. Abstracts references, with the abstract number in <fpage>.
Contains the following, in the order specified:
- sertitle (optional)
- year (optional)
- volumeno (optional)
- pages (optional and repeatable)
- no special formatting is associated with the <jnltrans> element type
journal. RSC internal use only
A description of an RSC journal. Contains a link, or these elements in the order specified:
- title (one or more)
- sercode
- byline (optional and repeatable)
- logo (optional and repeatable)
- publisher
- issn (one or more)
- coden (optional)
- board (optional and repeatable)
- info (optional and repeatable)
- advert (optional and repeatable)
- cpyrt
- volume (optional and repeatable)
- id: a unique identifier for this element
- no special formatting is associated with the <journal> element type
journalcit. A citation which follows the standard model for simple citations of journal articles. Use citation for more complex cases, and for citations to anything other than journal articles. Use citext only for text within the References section which is not a citation at all. See above for general guidance on encoding citations.
Contains these elements in the order specified:
- citauth (one or more)
- title
- year
- volumeno (optional)
- issueno (optional)
- pages
- jnltrans (optional)
- link (optional and repeatable)
- the citation is output within an anchor A in HTML output, with NAME attribute equal to its id attribute. Editorial note: this is not required.
- a semicolon is output after all <journalcit>s except the last within its containing <citgroup>. This last <journalcit> is followed by a full stop
journalref. A reference to a document describing a journal. See above for general guidance on creating cross-references, and for a list of RSC journal codes.
It contains a link element, which should have the appropriate journal code as its value. These codes are listed below.
Contains a link, or these elements in the order specified:
- title (one or more)
- sercode
- byline (optional and repeatable)
- logo (optional and repeatable)
- publisher
- issn (one or more)
- coden (optional)
- board (optional and repeatable)
- info (optional and repeatable)
- advert (optional and repeatable)
- cpyrt
- volume (optional and repeatable)
- id: a unique identifier for this element
- the <journalref> within published is used to provide the journal title which appears at the head of the article
keyword. A keyword describing an article's content. Contains 'simple text'.
- <keyword>s are suppressed from the rendered article
link. A link to [part of] another document. Contains simple text.
Although the attributes within <link> provide a powerful means of expressing links, they are not yet being used. Instead, the data content within <link> is used to specify the target document. This content will be a unique identifier for the document, e.g. a journal code or an article's manuscript number.
- type: the type of link, e.g. 'DOI' for DOI cross-references
- doc: an entity reference defining the document to which the link is being made
- from: the [start of the] target within the linked document, expressed as an XPath expression
- to: (for ranges only), the end of the target within the linked document, expressed as an XPath expression
- in general, <link>s are suppressed from the rendered article. Instead, such <link>s as are required for rendering (e.g. the link to a document describing an article's journal) are resolved by a pre-rendering edit which replaces the link by the actual document to which it points
list. A list. See above for general guidance on encoding lists.
Contains an optional head, followed by one or more items.
- type: the type of list, which should take one of the following values:
- <list> is rendered as an unordered (bulleted) list UL in HTML output
- Should be extended to cope with all the allowed list types
location. A location (i.e. an address). Contains one or more of the following, in any order:
- city
- postcode
- state
- country
- addrelt
- no special formatting is associated with the <location> element type
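A simple location might be captured as in the sketch below. The postcode is a made-up placeholder, and it is assumed here that <city> and <country>, like <postcode>, contain character data.

```xml
<!-- Sub-elements may appear in any order; the postcode shown is a placeholder -->
<location>
  <city>Cambridge</city>
  <postcode>AB1 2CD</postcode>
  <country>UK</country>
</location>
```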
logo. RSC internal use only
A logo. Contains a ugraphic specifying the image to be used.
- no special formatting is currently associated with the <logo> element type. Instead, the sercode is used to construct the logo's file name
lpage. The number of a printed article's last page. Contains character data only.
- <lpage> is suppressed from the rendered article
member. RSC internal use only
A member of a <group>. Contains an optional role, followed by zero or more persons.
- no special formatting is associated with the <member> element type
month. A month. Contains character data only. Months should be specified in full, e.g. "January". Editorial note: since the style sheet can convert numeric months to their full form, should we be allowing, or even asking for, numeric months?
- if a numeric month is entered, it is converted to its full form, e.g. '3' becomes 'March'
ms-id. The RSC's unique identifier for an article. Contains character data only.
Conventions for formatting article identifiers are given above. To be added by data capture agency.
- output at the end of the article
- also used to construct the article's DOI
- the presence of <ms-id> triggers the generation of the "Received" statement
nameelt. A component of an organisation's name. Contains 'simple text'.
- type: the type of name element
- ', ' is output after all <nameelt>s except the last in a sequence
news-article. A full article (with title and author details, and back matter such as a list of citations) found within a news section. Contains these elements, in the order specified:
- art-front
- art-body
- appmat
- art-back
- id: a unique identifier for this element. See above for guidance on assigning id's.
- type: the type of news article
- no special formatting is associated with the <news-article> element type
news-item. A relatively simple news item. For more complex material, use news-article instead. Contains these elements, in the order specified:
- title (optional)
- authgrp (optional)
- abstract (optional)
- p or paragraph-level elements (optional and repeatable)
- footer (optional)
- id: a unique identifier for this element. See above for guidance on assigning id's.
- within <news-item>, <authgrp> is output at the end, right-justified (see authgrp for details)
- multiple <news-item> elements are separated by a line-break
news-section. A container for one or more news articles or (more usually) news items, plus other formats such as advertisements and conference listings. Can contain nested <news-section>s to support e.g. a two-level structure of news sections.
Contains an optional title, followed by zero or more of the following, in any order:
- news-section
- news-article
- news-item
- book-review
- advert
- info
- confgrp
- p
- paragraph-level elements
- id: a unique identifier for this element. See above for guidance on assigning id's.
- no special formatting is associated with the <news-section> element type
no. A number or other identifier (for a table, figure, etc.). Contains character data only. See above for general guidance on numbering strategy.
- section <no>s are suppressed from normal output
- instead, the <no> element, if present, is picked up and incorporated into the section title
- a similar strategy is applied to equation <no>s, which are enclosed in parentheses and output in bold B in HTML output
no-of-pages. The number of pages in the printed version of an article. Contains character data only.
- print pagination is suppressed from the rendered version
note. A note. Contains text or paragraphs.
- no special formatting is associated with the <note> element type. Editorial note: it should be, e.g., italic and surrounded by '[..]'.
numer. The numerator of a fraction. Contains 'simple text'.
- rendered as small superscript SMALL, SUP in HTML output
office. The RSC office responsible for managing an article. Contains character data only.
- like all art-admin subelements, this is suppressed
org. An organisation's name and address. Contains a link, or one or more orgnames followed by zero or more addresses.
- id: a unique identifier for this element
- within aff, the level-1 subelements of <org> are followed by ', '
- otherwise, no special formatting is associated with the <org> element type
- Multiple <address> elements should be separated by ' and '.
orgname. An organisation's name. Contains one or more nameelts.
- within aff, each <orgname> is followed by ', '
- otherwise, no special formatting is associated with the <orgname> element type
overbar. An overbar. Indicates that a bar should be placed above all the text within this element. Contains 'simple text'.
- no special formatting is associated with the <overbar> element type. (Specifically, no means has been found to implement this feature within HTML output. Could try using a CSS text decoration instruction.)
p. A paragraph. Contains mixed content (i.e. text and subelements intermixed), including any of these elements, at any point and in any order:
- roman
- it
- bo
- bi
- scp
- sansserif
- ul
- sup
- inf
- list
- footnote
- note
- overbar
- underbar
- stack
- fraction
- warning
- unknown
- email
- url
- ugraphic
- eqntext
- figure
- scheme
- plate
- chart
- equation
- compname
- compoundref
- textref
- figref
- schemref
- plateref
- chartref
- eqnref
- boxref
- tableref
- citref
- fnoteref
- affref
- by default, <p> is rendered as a paragraph P in HTML output
- within sections at any level, the first paragraph is rendered closed up to the preceding title (with no indentation), and is followed by a line break BR clear="all" in HTML output
- within sections at any level, subsequent paragraphs are indented by an em space (to be checked), and followed by a line break BR clear="all" in HTML output
pages. The range of pages covered by a citation. Contains a fpage, optionally followed by a lpage.
- no special formatting is associated with the <pages> element type.
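A page range in a citation might be captured as follows. The page numbers are invented.

```xml
<!-- First page is required; last page is optional -->
<pages><fpage>1234</fpage><lpage>1240</lpage></pages>

<!-- A single-page reference needs only the first page -->
<pages><fpage>1234</fpage></pages>
```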
persname. A person's name. Contains the following, in the order specified:
- qualifier (optional)
- fname (optional)
- surname
- qualifier (optional)
- id: a unique identifier for this element
- no special formatting is associated with the <persname> element type.
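The element order above can be sketched with an invented name. It is assumed that <surname>, like <fname> and <qualifier>, contains simple text.

```xml
<!-- Invented name; the optional leading qualifier precedes the first name -->
<persname>
  <qualifier>Professor</qualifier>
  <fname>Jane A.</fname>
  <surname>Smith</surname>
</persname>
```

Spacing inside <fname> is preserved, so "Jane A." is rendered exactly as entered.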
person. Details about a person. Contains a link, or the following elements in the order specified:
- persname (required; repeatable)
- biography (optional)
- address (optional and repeatable)
- id: a unique identifier for this <person> element
- <person> within author is rendered as bold B in HTML output
- otherwise, no special formatting is associated with the <person> element type.
phone. A telephone number. Contains character data only.
- no special formatting is associated with the <phone> element type. Editorial note: it should have a prefix, e.g. 'Tel. '.
pii. A Publisher Item Identifier. Contains character data only.
- like all art-admin subelements, this is suppressed
plate. A plate. Contains an optional title. See above for general guidance on encoding graphics.
- id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
- src: the entity which contains the graphic (see notes above on external entities)
- height: the height of the graphic, expressed as ...
- width: the width of the graphic, expressed as ... Editorial note: I don't think these can be defined at capture. I really don't know what the best units would be though - absolutes or pixels. NH: What is easiest to capture?
- pos: 'float' for floating graphics; otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks
- <plate>s within biography are rendered as a left-aligned table cell TD in HTML output
- otherwise, the plate is output within a centred half-width table TABLE in HTML output
- the plate itself is rendered as an image IMG in HTML output
- if the plate has a <title>, this is output in a separate row TR in HTML output below the image; otherwise the heading 'Plate N' is generated, where N is the plate number as indicated in its id attribute
- a text break instruction is output before and after the plate BR clear="all" in HTML output
plateref. A reference to a plate. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.
- idrefs: one or more space-separated idref's, specifying the plate(s) to which cross-reference is being made
- presence: can take the value 'missing' or 'notmissing'
- a link is made to the first idref specified in the idrefs attribute A in HTML output
postcode. A postcode. Contains character data only.
- the address item before a <postcode> is not followed by a comma
- otherwise, no special formatting is associated with the <postcode> element type.
pubfront. Editorial note: should this be 'RSC internal use only'?
Publication front matter. Contains the following elements in the order specified:
- fpage
- lpage (optional)
- no-of-pages
- date
- the contents of <pubfront> are all suppressed by default
- published elements with type="print", and containing a <pubfront> with year="PENDING", are rendered as the phrase "Publish PENDING" in red. If the <year> is empty, they are rendered as "Publish Pending", also in red
- published elements with type="web" are rendered as a bold paragraph B, P in HTML output "Published on the Web ", followed by <pubfront><date>, formatted as described under date
- the year within <pubfront>, from the published element with type="print", is used in the copyright statement
published. A link to a document/resource in which an article has been published. Contains a citext, or the following elements in the order specified:
- journalref
- volumeref (optional)
- issueref (optional)
- pubfront (optional)
Use the analysed citation subelements to describe print publication, or <citext> to record online publication. Editorial note: is that right? Web publication uses <pubfront>. RK: not sure that this is right - citext??
- type: the type of publication. Should take one of the values: "print", "HTML" or "PDF". Editorial note: should it be "HTML" or "web"?
- doc: can specify a URL where the online publication is located
- from: the [start of the] target within the linked document, expressed as an XPath expression
- to: (for ranges only), the end of the target within the linked document, expressed as an XPath expression
- the contents of <published> are all suppressed by default
- <journalref><title> is selected from <published> with type="print", and used to specify the journal within a cell TD in HTML output in the header table at the start of the article. It is rendered as bold italic B, I in HTML output
- as noted under pubfront, <published> with type="print" is used to generate "Publish Pending", "Published on the Web", and copyright statements
publisher. RSC internal use only
The publisher of a journal. Contains "organisation" subelements, i.e. a link, or one or more orgnames followed by zero or more addresses. Editorial note: <aff> now has <address>, and <org> within it also has <address> - overkill?
- id: a unique identifier for this element
- no special formatting is associated with the <publisher> element type
pubname. A publisher name. Contains 'simple text'. Editorial note: this is no longer linked to anything, so should be removed from the DTD.
pubplace. The place of publication of a book, etc. Contains 'simple text'.
- no special formatting is associated with the <pubplace> element type
qualifier. A qualification to a person's name, such as a title, an honorific, or a phrase such as 'the late'. Contains 'simple text'.
- no special formatting is associated with the <qualifier> element type
received. A container for details of the date when, and place where, an article was received. Contains an optional city, followed by a date.
- placed after an article's authors, as a bold italic paragraph B, I, P in HTML output
- the generated text is "Received ", followed by the city (if present, preceded by " (in " and followed by ") "), and then the date
role. RSC internal use only
A role played by one or more people. Contains 'simple text'.
- no special formatting is associated with the <role> element type
roman. Indicates that the contained text should be rendered as a roman typeface. Contains 'simple text'.
Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.
- no special formatting is associated with the <roman> element type. Editorial note: it should be rendered as normal text.
row. A row in a table or table heading. See above for general guidance on encoding tables.
Contains one or more entry elements.
- rowsep: whether there is a row separator ("0" means "no"; any other digit value means "yes")
- valign: the vertical alignment of the row ("top", "middle" or "bottom")
- rendered as a table row TR in HTML output
- the valign attribute is used when specified; otherwise the valign of the <row>'s parent is used when specified; otherwise vertical alignment is set to "bottom" (VALIGN attribute in HTML output). Editorial note: should this be some other value by default?
sansserif. Indicates that the contained text should be rendered in a sans serif typeface. Contains 'simple text'.
Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.
- no special formatting is associated with the <sansserif> element type. Editorial note: surely something should be done with this!
scheme. A scheme. Contains an optional title. See above for general guidance on encoding graphics.
- id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
- src: the entity which contains the graphic (see notes above on external entities)
- height: the height of the graphic, expressed as ...
- width: the width of the graphic, expressed as ... Editorial note: I don't think these can be defined at capture. I really don't know what the best units would be though - absolutes or pixels. NH: What is easiest to capture?
- pos: 'float' for floating graphics; otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks
- the scheme is output within a centred half-width table TABLE in HTML output
- the scheme itself is rendered as an image IMG in HTML output
- if the scheme has a <title>, this is output in a separate row TR in HTML output below the image
- the heading 'Scheme N' is generated, where N is the scheme number as indicated in its id attribute
- a text break instruction is output before and after the scheme BR clear="all" in HTML output
schemref. A reference to a scheme. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.
- idrefs: one or more space-separated idref's, specifying the scheme(s) to which cross-reference is being made
- presence: can take the value 'missing' or 'notmissing'
- a link is made to the first idref specified in the idrefs attribute A in HTML output
scp. Indicates that the contained text should be rendered in small caps. Contains 'simple text'.
Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type.
- the contents of <scp> elements are converted to upper case and rendered as small type SMALL in HTML output
section. A top-level section. Contains these elements in the order specified:
- no (optional)
- title (optional)
- p or paragraph-level elements (optional and repeatable)
- deflist (optional and repeatable)
- subsect1 (optional and repeatable)
- id: a unique identifier for this element. See above for guidance on assigning id's.
- type: the type of section
- no special formatting is associated with <section>s within biography
- otherwise, <section> is rendered as a separate division (DIV in HTML output)
- an anchor is generated at the start of the section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the section's position within the article (A; NAME attribute in HTML output)
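As a rough illustration of the content model described for section, the fragment below nests one subsect1 inside a section, with the optional no and title elements given first. The id values, numbering and text are invented for the example and do not follow any official RSC id convention.

```xml
<!-- Illustrative fragment only: ids, numbers and text are invented -->
<section id="sect1">
  <no>1</no>
  <title>Introduction</title>
  <p>Opening paragraph of the section.</p>
  <subsect1 id="sect1-1">
    <no>1.1</no>
    <title>Background</title>
    <p>Paragraph within the level-1 subsection.</p>
  </subsect1>
</section>
```

The same pattern would repeat one level down, with subsect2 placed after the paragraphs of subsect1, and so on.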
sercode. RSC internal use only
A serial (journal) code, conforming to the list of codes given above. Contains character data only.
To be added by data capture agency
- the value of sercode is used to locate the correct journal details when preparing the article for rendering
- <sercode> is suppressed by default
- the value of <sercode> is used to specify the pathname for associated image files, and to retrieve the correct journal logo
sertitle. A serial (journal) title. Contains 'simple text' or paragraphs.
- type: the type of series title
- within citation and journalcit, <sertitle> is rendered as italic (I in HTML output)
- within journalcit, ", " is output after all but the last <sertitle>
- otherwise, no special formatting is associated with <sertitle>
Editorial note: the DTD now only has <sertitle> within <jnltrans>; elsewhere it has become <title>. The style sheet needs updating to take account of this (the code described here will never be called), and <sertitle> should probably be removed from the DTD and replaced by <title> within <jnltrans>.
sici. A Serial Item Contribution Identifier. Contains character data only.
- like all art-admin subelements, this is suppressed
stack. One or more characters appearing directly above other characters (like a fraction without the horizontal line). Contains above followed by below.
- below is output as subscript (SUB in HTML output), followed by above as superscript (SUP in HTML output) [Editorial note: Are these output in the wrong order?]
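A minimal sketch of a stack, assuming that above and below can each hold plain character data (their content models are not described here). The characters used are placeholders.

```xml
<!-- Illustrative only: "n" is placed above "i", with no fraction bar -->
<stack><above>n</above><below>i</below></stack>
```

Note the element order: above comes first in the source, even though the rendering rule above outputs below first.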
state. A geopolitical unit such as a state, county, etc. Contains character data only.
- no special formatting is associated with <state>
subject. A broad subject heading, ideally taken from a controlled list. Contains 'simple text'.
- type: the type of subject category
- no special formatting is associated with <subject>. [Editorial note: This element type should be suppressed.]
subsect1. A level-1 subsection. Contains these elements in the order specified:
- no (optional)
- title (optional)
- p or paragraph-level elements (optional and repeatable)
- deflist (optional and repeatable)
- subsect2 (optional and repeatable)
- id: a unique identifier for this element. See above for guidance on assigning id's.
- type: the type of section
- <subsect1> is rendered as a separate division (DIV in HTML output)
- an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article (A; NAME attribute in HTML output)
subsect2. A level-2 subsection. Contains these elements in the order specified:
- no (optional)
- title (optional)
- p or paragraph-level elements (optional and repeatable)
- deflist (optional and repeatable)
- subsect3 (optional and repeatable)
- id: a unique identifier for this element. See above for guidance on assigning id's.
- type: the type of section
- <subsect2> is rendered as a separate division (DIV in HTML output)
- an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article (A; NAME attribute in HTML output)
subsect3. A level-3 subsection. Contains these elements in the order specified:
- no (optional)
- title (optional)
- p or paragraph-level elements (optional and repeatable)
- deflist (optional and repeatable)
- subsect4 (optional and repeatable)
- id: a unique identifier for this element. See above for guidance on assigning id's.
- type: the type of section
- <subsect3> is rendered as a separate division (DIV in HTML output)
- an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article (A; NAME attribute in HTML output)
subsect4. A level-4 subsection. Contains these elements in the order specified:
- no (optional)
- title (optional)
- p or paragraph-level elements (optional and repeatable)
- deflist (optional and repeatable)
- subsect5 (optional and repeatable)
- id: a unique identifier for this element. See above for guidance on assigning id's.
- type: the type of section
- <subsect4> is rendered as a separate division (DIV in HTML output)
- an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article (A; NAME attribute in HTML output)
subsect5. A level-5 subsection. Contains these elements in the order specified:
- no (optional)
- title (optional)
- p or paragraph-level elements (optional and repeatable)
- deflist (optional and repeatable)
- subsect6 (optional and repeatable)
- id: a unique identifier for this element. See above for guidance on assigning id's.
- type: the type of section
- <subsect5> is rendered as a separate division (DIV in HTML output)
- an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article (A; NAME attribute in HTML output)
subsect6. A level-6 subsection. Contains these elements in the order specified:
- no (optional)
- title (optional)
- p or paragraph-level elements (optional and repeatable)
- deflist (optional and repeatable)
- id: a unique identifier for this element. See above for guidance on assigning id's.
- type: the type of section
- <subsect6> is rendered as a separate division (DIV in HTML output)
- an anchor is generated at the start of the sub-section, with a name based on the id attribute if specified; otherwise a unique name is generated, based on the sub-section's position within the article (A; NAME attribute in HTML output)
subtitle. A [table] subtitle. Contains 'simple text' or paragraphs.
- no special formatting is associated with <subtitle>
sup. Indicates that the contained text should be rendered in superscript. Contains 'simple text'.
Only use this element when it is not possible to deduce why the text is rendered in this way. If possible, always use a more meaningful element type. <sup> is often mistakenly used instead of <citref>.
- the contents of <sup> elements are rendered as superscript (SUP in HTML output)
suppinf. Contains a link to supplementary information for an article.
surname. A surname. Contains 'simple text'.
- no special treatment for <surname> elements
table. A table, encoded using CALS-compliant XML markup. See above for general guidance on encoding tables.
(Tables which cannot be thus encoded should be prepared as images, and encoded as ugraphics.)
Contains an optional title, followed by an optional subtitle, followed by one or more tgroups. Note that <title> and <subtitle> within table-entry should be used in preference to these elements, since this allows titles for XML-encoded and 'image' tables to be treated consistently. Although, as the DTD notes, we can't clear %titles;, we could set parameter entity %tbl.tbl-titles.mdl to "" and so remove this possibility.
- pgwide: page width ("0" means "no"; any other digit value means "yes")
- a rule (HR in HTML output) is output before each <table>
- spacing from the source document is preserved within <table>
- <table> is rendered as a full-width table (TABLE in HTML output)
- for print, a pgwide value of '0' signifies a single-column table, and '1' a page-width table
table-entry. 'cover group' for a table, whether declared inline as table or given as a ugraphic. See above for general guidance on encoding tables.
Contains an optional title, followed by an optional subtitle, followed by either table or ugraphic.
- id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
- a break (BR in HTML output) is output before and after each <table-entry>
- an anchor is output, with name equal to the id attribute (A; NAME attribute in HTML output), followed by "Table " and a system-generated table number, in bold (B in HTML output), followed by the contents of <table-entry>
tableref. A reference to a table. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.
- idrefs: one or more space-separated idref's, specifying the table(s) to which cross-reference is being made
- presence: can take the value 'missing' or 'notmissing'
- a link is made to the first idref specified in the idrefs attribute (A in HTML output)
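For orientation, a cross-reference to a table might look like the sketch below. It assumes a table-entry elsewhere in the article carries id="tab1"; that id and the sentence text are invented for the example.

```xml
<!-- Illustrative only: "tab1" must match the id of a table-entry in the same article -->
<p>The results are summarised in <tableref idrefs="tab1">Table 1</tableref>.</p>
```

If several ids are listed in idrefs, only the first is used as the link target in the HTML output, as noted above.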
tbody. A table's body matter (i.e. the main table, ignoring any header or footer). See above for general guidance on encoding tables.
Contains one or more rows.
- valign: the vertical alignment of the row ("top", "middle" or "bottom")
- no special treatment for <tbody> elements within tgroup
- otherwise, <tbody> is rendered as a table body (TBODY in HTML output)
term. A term being defined in a deflist. Contains 'simple text'.
- no special treatment for <term> elements
textref. A cross-reference to text elsewhere in the article. Contains 'emphasised text' giving a human-readable description of the cross-reference. See above for general guidance on creating cross-references.
- idrefs: one or more space-separated idref's, specifying the text element(s) to which cross-reference is being made
- presence: can take the value 'missing' or 'notmissing'
- a link is made to the first idref specified in the idrefs attribute (A in HTML output)
tfoot. The footer area of a table. See above for general guidance on encoding tables.
Contains zero or more colspecs, followed by one or more rows. Shouldn't <tfoot> have some CALS-style attributes?
- no special treatment for <tfoot> elements within tgroup, apart from outputting them after <tbody>
- otherwise, <tfoot> is rendered as a table footer (TFOOT)
tgroup. A table group. See above for general guidance on encoding tables.
Contains these elements, in the order specified:
- colspec (optional and repeatable)
- thead (optional)
- tfoot (optional)
- tbody
- cols: the number of columns in the table
- colsep: indicates a column separator ("0" means "no"; any other digit value means "yes")
- rowsep: indicates a row separator ("0" means "no"; any other digit value means "yes")
- align: default cell alignment. Takes one of the values "left", "right", "center", "justify" or "char"
- within tgroup, subelements are output in the order: thead, tbody, tfoot without any special formatting. (In other words, the whole table is output as a single block: headers and footers are not treated specially.)
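The following sketch puts the table elements together: a table-entry wrapping a table with a single tgroup. It assumes the standard CALS entry element for cells, which is not described in this part of the reference; the id, title and cell contents are placeholders.

```xml
<!-- Illustrative only: id, title and cell values are invented;
     <entry> is assumed from the CALS table model -->
<table-entry id="tab1">
  <title>Example table title</title>
  <table pgwide="0">
    <tgroup cols="2" colsep="0" rowsep="0" align="left">
      <thead>
        <row><entry>Column A</entry><entry>Column B</entry></row>
      </thead>
      <tbody>
        <row><entry>value 1</entry><entry>value 2</entry></row>
        <row><entry>value 3</entry><entry>value 4</entry></row>
      </tbody>
    </tgroup>
  </table>
</table-entry>
```

The title is placed on table-entry rather than on table, in line with the preference stated in the table description.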
thead. The header area of a table. See above for general guidance on encoding tables.
Contains zero or more colspecs, followed by one or more rows. Shouldn't <thead> have some CALS-style attributes?
- no special treatment for <thead> elements within tgroup
- otherwise, <thead> is rendered as a table header (THEAD in HTML output)
title. A title (of a figure, table, journal, etc.). Contains 'simple text' or paragraphs.
- the article title (within titlegrp) is rendered as a level-2 heading (H2 in HTML output), with a rule above (HR in HTML output)
- all section titles are prefixed by a preceding no element at the same level, if present
- <title> within section is rendered as an a-heading
- <title> within subsect1 is rendered as a b-heading
- <title> within subsect2 is preceded by a break and an em space, and rendered as bold (i.e. a c-heading)
- <title> within subsect3 is preceded by a break and an em space, and rendered as italic (i.e. a d-heading)
- <title> within citation is rendered as italic
- <title> within journalcit is rendered as italic, and followed by ", " if it is not the last <title>
- <title> within figure, plate, scheme and chart is rendered in bold, in a left-aligned table cell (TD in HTML output), with a suitable prefix (e.g. "Fig. ")
- spacing from the source document is preserved in <title>s within table-entry
- p elements within <title> do not generate any markup
- [Editorial note: the type attribute also needs handling: type="subtitle" will have to be rendered, and whether or not paragraphs are included should be explicit. NH> type="addition": will additions ever be captured externally?]
- otherwise, no special formatting is associated with <title>
titlegrp. A container for an article's main titles. Contains one or more titles.
- no special formatting is associated with <titlegrp>
toc-entry. An entry in a table of contents. Contains 'simple text' or paragraphs.
- no special formatting is associated with <toc-entry>
toc-head. Heading for a table of contents. Contains 'simple text'.
- no special formatting is associated with <toc-head>
trans. A translation (of a citation)
Contains mixed content, which can include the following element types as required:
- citauth
- title
- year
- volumeno
- issueno
- arttitle
- biblscope
- editor
- citpub
- pubplace
- link
- url
- email
- trans
- 'emphasis' elements
- no special formatting is associated with <trans>. [Editorial note: Perhaps it should be, e.g. italic? RK to comment, please.]
ugraphic. An untitled graphic. Use this element to encode any graphical content which doesn't have a title. See above for general guidance on encoding graphics.
- id: a mandatory unique identifier for this element. See above for guidance on assigning id's.
- src: the entity which contains the graphic (see notes above on external entities)
- height: the height of the graphic, expressed as ...
- width: the width of the graphic, expressed as ... [Editorial note: I don't think these can be defined at capture. I really don't know what the best units would be, though: absolute units or pixels? NH> What is easiest to capture?]
- pos: 'float' for floating graphics; otherwise 'fixed'. 'float' should be used for graphics marked as "A" blocks, while 'fixed' should be used for "B" blocks
- display: how the graphic is to be displayed. Takes value "displayed" or "inline"
- if the graphic does not have display="inline", and does not appear within an equation (or a table - it isn't possible to have tables nested inside tables), it is output within a centred half-width table, and a text break instruction is output before and after the graphic (BR clear="all" in HTML output)
- graphics with display="inline", and graphics within equations, are output with no additional markup
- the graphic itself is rendered as an image (IMG in HTML output)
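A minimal sketch of a ugraphic, assuming it is an empty element and that an external entity named ugr1 has been declared for the image file. The id and entity name are invented; height and width are omitted because their units are not yet settled.

```xml
<!-- Illustrative only: "ugr1" is a hypothetical id and entity name -->
<ugraphic id="ugr1" src="ugr1" pos="fixed" display="displayed"/>
```

With display="displayed" this graphic would be output in the centred half-width table described above; display="inline" would suppress that wrapper.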
ul. Indicates that the contained text should be underlined. Contains 'simple text'.
- no special formatting is associated with the <ul> element type. [Editorial note: We could implement this as a CSS style (text decoration), but this wouldn't be totally cross-platform.]
underbar. An underbar. Indicates that a bar should be placed below all the text within this element. Contains 'simple text'. [Editorial note: In what way is this different from <ul>?]
- no special formatting is associated with the <underbar> element type. [Editorial note: Should aim to implement this as an underline (CSS text decoration again?).]
unknown. A feature in the text which cannot be encoded by any other element type in the DTD. Use the type attribute to indicate the nature of the feature. Do we need to generate some warning when this element is used?
Contains 'simple text'.
- type: the type of 'unknown' information
- rendered in a fixed-width font (KBD in HTML output), with a double break above and below (BR in HTML output)
- spacing from the source document is preserved in the rendered result
url. A URL. Contains character data only.
- id: a unique identifier for this element. [Editorial note: Probably not needed: drop from next version of DTD?]
- rendered as an anchor, with a target equal to the element's data content (A; HREF attribute in HTML output)
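The two forms below sketch how a url might be encoded. The first uses the data content as the target, as described above. The second uses the url attribute introduced in version 3.5 (see Appendix C). The link text in the second form is invented.

```xml
<!-- Form 1: the element content is itself the target -->
<url>http://www.rsc.org/dtds/desc36.xml</url>

<!-- Form 2 (version 3.5 onwards): the target is given in the url attribute -->
<url url="http://www.rsc.org/dtds/desc36.xml">XML version of these guidelines</url>
```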
value. RSC internal use only
The value of an index entry. Contains character data only.
- no special formatting is associated with the <value> element type.
volume. RSC internal use only
One volume of a journal. Contains a link, or the following elements in this order:
- journalref
- volumeno
- date
- issue (optional and repeatable)
- id: a mandatory unique identifier for this element
- no special formatting is associated with the <volume> element type.
volumeno. A journal volume number. Contains character data only.
When used within the <volume> element, this should be a 3-digit number with leading zeroes. [Editorial note: Still true?]
- non-empty <volumeno> elements within journalcit and citation are rendered as bold (B in HTML output), and followed by ", " if they are not the last component of the citation
- otherwise, no special formatting is associated with the <volumeno> element type.
volumeref. A reference to one volume of a journal. See above for general guidance on creating cross-references.
Contains a link, or the following elements in this order:
- journalref (optional)
- volumeno
- date (optional)
- issue (optional and repeatable)
- no special formatting is associated with the <volumeref> element type.
warning. A warning. Contains 'simple text'.
- no special formatting is associated with the <warning> element type. [Editorial note: Should be red text.]
who. The identity of the person making an editorial note (editnote). Contains 'simple text'. Wouldn't it make more sense to have <person> in place of this element type - replace by <person> in next version of DTD.
- no special formatting is associated with the <who> element type.
year. A 4-digit year. Contains character data only. The value "PENDING" is allowed for date within pubfront.
- non-empty <year> elements within journalcit are followed by ", " if they are not the last component of the citation. [Editorial note: This also used to apply to <citation>, but no longer does.]
- published elements with type="print", and containing a <pubfront> with year="PENDING", are rendered as the phrase "Publish PENDING" in red. If the <year> is empty, they are rendered as "Publish Pending", also in red
- the <year> within pubfront, from the published element with type="print", is used in the copyright statement
- otherwise, no special formatting is associated with the <year> element type.
Appendix B. Notations
Table: Notations recognized within the RSC application
Name | PUBLIC identifier where known |
bmp | "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Microsoft Windows bitmap//EN" |
cgm | "-//USA-DCD//NOTATION Computer Graphics Metafile//EN" |
cgm-binary | "ISO 8632/3//NOTATION Binary encoding//EN" |
cgm-char | "ISO 8632/2//NOTATION Character encoding//EN" |
cgm-clear | "ISO 8632/4//NOTATION Clear text encoding//EN" |
eps | "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Adobe Systems Encapulated PostScript//EN" |
fax | "-//USA-DOD//NOTATION CCITT Group 4 Facsimile Type 1 Untiled Raster//EN" |
gif | "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Compuserve Graphic Interchange Format//EN" |
iges | "-//USA-DOD//NOTATION (ASME/ANSI Y14.26M-1987) Initial Graphics Exchange Specification//EN" |
jpeg | "ISO/IEC 10918:1993//NOTATION Digital Compression and Coding of Continuous-tone Still Images (JPEG)//EN" |
mpeg1aud | "ISO/IEC 11172-3:1993//NOTATION Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 3: Audio//EN" |
mpeg1vid | "ISO/IEC 11172-2:1993//NOTATION Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s - Part 2: Video//EN" |
mpeg2aud | "ISO/IEC 13818-3:1995//NOTATION Coding of moving pictures and associated audio: Part 3. Audio//EN" |
mpeg2vid | "ISO/IEC 13818-2:1995//NOTATION Information technology - Coding of moving pictures and associated audio: Part 2. Video//EN" |
pcx | "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION ZSoft PCX bitmap//EN" |
pict | "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Apple Computer Quickdraw Picture//EN" |
sgml | "+//ISO 8879:1986//NOTATION Information processing - Text and office systems - Standard Generalized Markup Language (SGML)//EN" |
tex | "+//ISBN 0-201-13448-9::Knuth//NOTATION The TeXbook//EN" |
tiff | "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Aldus/Microsoft Tagged Interchange File Format//EN" |
wmf | "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Microsoft Windows Metafile//EN" |
chemdraw | |
eqn | |
pdf | |
ps | |
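To show how a notation name from this list is used, the sketch below gives a notation declaration in standard XML syntax, followed by an unparsed external entity that refers to it. The RSC DTD presumably supplies the notation declarations itself, so in practice only the entity declaration would be written per article. The entity name scheme1 and the file name scheme1.gif are hypothetical.

```xml
<!-- Notation declaration, using the PUBLIC identifier listed for gif -->
<!NOTATION gif PUBLIC "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Compuserve Graphic Interchange Format//EN">

<!-- Hypothetical external entity for one image file, tied to the gif notation -->
<!ENTITY scheme1 SYSTEM "scheme1.gif" NDATA gif>
```

A graphic element would then name the entity in its src attribute, for example src="scheme1".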
Appendix C. Changes to the RSC DTD
This Appendix lists the changes made to the RSC DTD from version 3.4 onwards.
Summary of changes in version 3.4
Version 3.4 of the RSC Article DTD is a maintenance release, which aims to solve problems encountered while encoding articles, and to provide the RSC with the opportunity to add improved management information to articles.
The following changes are relevant to the encoding of actual articles:
- footnotes can now be added after an author's name (within <author>, after <person>)
- the content models of a number of elements (<footnote>, <note>, <head>, <toc-entry>, <ictext>, <dedicate>, <abstract>, <dd> and <ack>) have been extended so they can contain textual subelements as well as multiple paragraphs
- all 'reference' elements (<compoundref>, <textref>, <figref>, <schemref>, <plateref>, etc.) can now contain subelements which support font style changes
- <persname> now has an optional <qualifier> subelement that can appear at the start or end of the name. This allows titles, qualifications, and informal phrases such as 'the late' to be encoded
- <ugraphic> now has a src2 attribute to support the addition, specifically, of TeX versions of a graphic. This new attribute should no longer be used.
- the RSC-specific entity set has been filled out with declarations for some commonly-required characters
- some simple ISO entities have been added to the allowed character entity set
- there is a new <arttitle> element for encoding article titles within citations
- within the <published> element, <volumeref>, <issueref> and <pubfront> are now optional
- the content model for <eqntext> has been changed to allow it to contain multiple paragraphs
- the element type url has been added to the class 'general', which allows it to be used anywhere within text
- the 'fixed' DTD version has been changed to '3.4'
The following changes are only relevant to RSC's internal management procedures:
- a new <admin-event> has been added within <art-admin>. This has a type attribute, and subelements <agent>, <address> and <date>. In addition, it can contain a nested <admin-event>, thus supporting complex multi-level events if required. (In future, this element might be preferred to <date> for encoding 'accepted' details.)
- journalref, volumeref and issueref now have the same content model as journal, volume and issue respectively. This allows links to be replaced by the relevant content without invalidating the document
- a price-code attribute has been added to <article>
- <journal> now has an optional repeatable <logo> element, containing a graphic
Summary of changes in version 3.5
The following changes in version 3.5 will affect the encoding of articles:
- the 'fixed' DTD version has been changed to '3.5'
- <authgrp> within <art-front> is now optional
- <org> within <aff> is now [optional and] repeatable; <org> and <address> are repeatable as a pair
- <ack> now has an optional title attribute
- the content model for <trans> has been made the same as that for <citation>. N.B. this change is not upwards-compatible. The previous content model for <trans> allowed citext. This is replaced by the 'mixed content with %emph;' approach offered by %m.citation;
- <email> has been added to the %gen; content model class, allowing email addresses to appear wherever this class is allowed (which is pretty well anywhere in textual content)
- <url> and <email> have been added to the %m.citation; content model class
- <url> now has a url attribute, which can be used to specify the url. If not used, the data content of the <url> element is taken to be the actual url, as before
- there is a new <a> element type, designed to support hyperlinks which use an image as the clickable link
- there is a new <subject> element type, which can contain a broad subject heading to categorise the article
- within the content model for <journalcit>, <link> has been made into an 'optional extra', so that citations can be supported by e.g. a DOI and a COI
- <link> now has a type attribute, for e.g. COIs (RSC internal use only) and DOIs
- [usage convention only:] within <suppinf>, the content of the <link> element should now be 'INFO' or 'CRYSTAL'. 'INFO' corresponds to the single value that was previously allowed ('TRUE')
The following changes are only relevant to RSC's internal management procedures:
- <coden> element type added to header information
Summary of changes in version 3.6
This version contains the following changes:
- the 'fixed' DTD version has been changed to '3.6'
- the parameter entity a.dtd has been altered to RSCPA3.6, and is now actually used!
- the new element type <a> is now actually allowed within a document
- the new 'generated' set of entity declarations rsc_x.ent is used
- element type no now has a content model of 'simple text' instead of just #PCDATA
- the common attributes for graphics (%a.graphic;) now have a prefix attribute, which can have values 'prefix' (default) or 'noprefix'
- year and pages are now optional within journalcit
- additional arttitle element added between citauth and title
- citext content model is now %m.simple-text-or-paras; to allow new paragraphs (or at least line breaks)
- citation content model (%m.citation;) now includes pages
- new element commentary added within citgroup
References
1 (a) http://www.oasis-open.org/html/a502.htm; (b) http://www.oasis-open.org/html/a503.htm.
This journal is © The Royal Society of Chemistry