Glyph Taxonomy
Using the Glyph Taxonomy
This Taxonomy is the authority file for the classification of symbols (peripheral graphemes as defined in the Transliteration Guide) for use in DHARMA digital editions. Each entry has a unique identifier or token, displayed in the header of the entry card before a colon. The entries are organised taxonomically into two "kingdoms": etic tokens (objectively describing what a glyph looks like) and emic tokens (identifying what a glyph means). Each kingdom contains a limited number of genera (corresponding to a basic shape in the etic kingdom and to a language in the emic kingdom), with a potentially unlimited number of species within each genus and subspecies within each species. Tokens consist of identifiers for a genus and, where applicable, species and subspecies, separated by hyphens. Each part is an uninterrupted string containing only basic (unaccented) Latin letters or Arabic numerals. The letters are normally in lower case, but camelCase may be used when more than one word is needed at one level, e.g. doubleBar. A symbol token encoded in a DHARMA digital edition associates a symbol with an entry in this Taxonomy.
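The token format described above can be checked mechanically. The following is a minimal sketch, not part of the DHARMA tooling: the function names are hypothetical, and the sample tokens other than doubleBar are invented for illustration. It assumes every level must contain at least one character and does not limit the number of levels.

```python
import re

# One level (genus, species or subspecies): basic Latin letters and digits only.
# Assumption: a level must contain at least one character.
_PART = re.compile(r"[A-Za-z0-9]+")


def split_token(token: str) -> list[str]:
    """Split a token into its levels; raise ValueError if any level is malformed."""
    parts = token.split("-")
    for part in parts:
        if not _PART.fullmatch(part):
            raise ValueError(f"invalid level {part!r} in token {token!r}")
    return parts


def is_valid_token(token: str) -> bool:
    """Return True if the token follows the hyphen-separated format."""
    try:
        split_token(token)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical tokens, used only to exercise the check.
    for candidate in ["doubleBar", "circle-spoked", "daṇḍa", "bar--double"]:
        print(candidate, is_valid_token(candidate))
```

A check of this kind only verifies the shape of a token; whether the token actually exists in the Taxonomy has to be looked up in the authority file itself.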
In addition to the identifier, each entry includes a name and a description. Names are either very concise indications of the symbol's appearance (in the case of etic tokens), or terms designating the symbol's meaning (for emic tokens). Descriptions contain more detailed, but still concise, information about the symbol, which serves a dual purpose. On the one hand, it informs the reader of a digital edition about a symbol represented in it. On the other hand, it tells the creator of a digital edition how to recognise a symbol in the original text as a member of a particular class in this Taxonomy. Many entries also include a "mapping", which is a Unicode character that is in some way similar to the original symbol.
When an encoded symbol is rendered for display in the DHARMA environment, the mapping (the "equivalent" computer-compatible character) is shown by preference. If no mapping is available for an entry, the name of the entry is displayed instead. In both cases, the display is completed by a mouseover tooltip showing the description of the symbol.
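The display rule above amounts to a simple fallback. The sketch below illustrates it under stated assumptions: the Entry class, its field names and the sample entry are hypothetical and do not reflect how the DHARMA environment is actually implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical in-memory view of one Taxonomy entry."""
    token: str
    name: str
    description: str
    mapping: Optional[str] = None  # Unicode character, if the entry has one


def render(entry: Entry) -> tuple[str, str]:
    """Return (text shown in the edition, tooltip text) for an entry."""
    # Prefer the mapping; fall back to the entry name when there is none.
    shown = entry.mapping if entry.mapping else entry.name
    # The description is the tooltip in both cases.
    return shown, entry.description


if __name__ == "__main__":
    # Invented sample entry, for illustration only.
    sample = Entry("circle", "circle", "A fairly large circle.", mapping="○")
    print(render(sample))
```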
Neither the tokens nor the descriptions and mapping equivalents in this authority file are meant to provide an accurate representation of all possible symbols; they merely offer a practical classification. To fulfil this purpose, the taxonomy must remain reasonably small, so that the categories do not overlap and can be objectively distinguished from one another. When encoding an edition, it is always acceptable to use a higher-level generic token rather than a lower-level one whose description does not fully match the original symbol. It is likewise acceptable to classify unique or unusual symbols as "unclassified". Details of original glyphs can always be provided in the palaeographic description of the edition.
Some entries contain alternative identifiers and/or alternative names. Alternative identifiers are for preserving compatibility with earlier digital editions. In newly created editions, always use the primary identifier. Alternative names, usually in source languages, are included for completeness' sake and in order to aid encoders in finding the identifiers for symbols that they might call by a source-language name. Alternative names will not be shown in displayed editions.
Visual illustrations are available for some of the entries, showing either a schematic illustration of what a symbol of the given class looks like, or actual instances encountered in our practice. These serve to help encoders assign real-life symbols to classes, and to help end-users visualise the symbols in a digital edition.
Etic tokens
This section contains tokens created using English or international terms referring to simple geometric and iconic shapes, for an objective description of what a symbol looks like.
Vertical bars (daṇḍas)
This subsection (genus) includes symbols that consist of, or are palaeographically derived from, a single vertical bar that is about as tall as an average character body. There are separate sections with dedicated tokens for double bars and short bars.
Distinguish from short bars and double bars.
Distinguish from double bullets.
Bullets
This subsection (genus) includes symbols that resemble typographic bullets. A bullet may be a dot, a small filled or hollow circle, a short dash or a small asterisk, as well as a small group of any such symbols.
Distinguish from clustered triple bullets.
Circles
This subsection (genus) includes circles and symbols based predominantly on the circle shape. Use these tokens for fairly large circles; small circular symbols are bullets.
Distinguish from concentric circles, which may have a dot in the centre but involve two or more circles.
Distinguish from bullseyes, which only have a dot in the centre, without additional circles.
Spoked circles may be distinguished from more iconic wheel symbols.
Crosses
This subsection (genus) includes symbols that consist of, or are palaeographically derived from, crossing straight lines and are fairly large in size. Small crosses are bullets. Svastika (swastika) symbols are in the emic section as svastika and variants.
Dashes
This subsection (genus) includes symbols that consist of, or are palaeographically derived from, horizontal strokes.
Double vertical bars (daṇḍas)
This subsection (genus) includes symbols that consist of, or are palaeographically derived from, tall double vertical bars. There are separate sections with dedicated tokens for single bars and short bars.
Florets
This subsection (genus) includes symbols that represent stylised flowers, like typographic florets or fleurons, involving any combination of some or all of the following elements: a central circle or dot, a number of petals, and a number of radial lines.
Flourishes
This subsection (genus) includes curlicues and abstract ornamental patterns consisting of one or more lines which are often curved. Symbols identified by the emic name gomūtra should be classified as tapering flourishes.
Short vertical bars
This subsection (genus) includes symbols that consist of, or are palaeographically derived from, short vertical bars. Double short bars are included in the present section. There are separate sections with dedicated tokens for full-length single bars and double bars.
Spirals
This subsection (genus) includes symbols which are, or are based on, spirals. Spirals may appear in any orientation, may have any number of full turns from less than one to several, and may have an extended tail.
Miscellaneous glyphs
This subsection (genus) includes symbols which cannot be described in simple geometric terms but represent an object iconically or may be suitably described in iconic terms. All tokens in this section must begin with "x-".
Wheels may be distinguished from less iconic spoked circles.
Emic and ideographic tokens
This section contains tokens for symbols that can be readily identified by an emic designation in a source language, and for ideograms which we identify by a fully or partially English name.
Symbols with Balinese names
Symbols with Sanskrit names
Symbols with Tamil names
Editing the Glyph Taxonomy
Contributors who encode text for DHARMA can and should edit this authority file, but only when there is good reason to do so. Major edits (new entries and substantive changes to existing entries) must be logged in the <revisionDesc> by creating a new <change> element above the existing ones and recording the date, your DHARMA ID and the nature of your edit. Minor changes to existing content may be noted next to the changed part in an XML comment or, when superficial (e.g. correcting typos), be silent.
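As an illustration of the logging step, the sketch below places a new <change> element above the existing ones using Python's standard library. It rests on several assumptions: the date and contributor are recorded in the TEI attributes @when and @who, the element names carry no namespace prefix, and the ID and note shown are placeholders. The exact form DHARMA expects for these values should be copied from the existing <change> elements in the file.

```python
import xml.etree.ElementTree as ET


def log_change(revision_desc: ET.Element, when: str, who: str, note: str) -> ET.Element:
    """Insert a new <change> as the first child of <revisionDesc>."""
    change = ET.Element("change", {"when": when, "who": who})
    change.text = note
    # Index 0 puts the new element above all existing <change> elements.
    revision_desc.insert(0, change)
    return change


if __name__ == "__main__":
    # Invented fragment; the real file is namespaced TEI and far larger.
    revision_desc = ET.fromstring(
        '<revisionDesc><change when="2020-01-01" who="someone">'
        "Initial version</change></revisionDesc>"
    )
    log_change(revision_desc, "2024-05-17", "yourDharmaId", "Added a new entry")
    print(ET.tostring(revision_desc, encoding="unicode"))
```

In practice most contributors will make this edit by hand in an XML editor; the code only makes the required position and content of the new element explicit.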
Adding new details to existing entries
New identifiers should not be added to an existing entry. Entries with more than one identifier exist only for legacy purposes.
Feel free to add further source-language names to existing entries. Doing so was not a priority when this Taxonomy was created, but if you have the inclination, go ahead. Names are listed as pairs like <label>name</label><item>WHATEVER</item>, where WHATEVER is the name of the symbol. To add a new name, insert such a pair below the last pre-existing name. Enter the desired name. Add @xml:lang to the <item> element, with the three-letter ISO tag for the language of the name, e.g. <item xml:lang="san">daṇḍa</item>. Preferably also add an XML comment with the date and your DHARMA ID or name to note that you've added this name.
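The procedure for adding a source-language name can be sketched as follows. This is an illustration rather than a recommended workflow: it assumes the entry is a <list> whose fields are <label>/<item> pairs with each <item> directly following its <label>, it ignores XML namespaces, and the field labels other than "name" in the sample are invented.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def add_name(entry: ET.Element, name: str, lang: str) -> ET.Element:
    """Insert a new name pair below the last existing name pair of an entry."""
    last = None
    for index, child in enumerate(entry):
        if child.tag == "label" and (child.text or "").strip() == "name":
            last = index
    if last is None:
        raise ValueError("entry has no existing name pair")
    label = ET.Element("label")
    label.text = "name"
    item = ET.Element("item", {XML_LANG: lang})
    item.text = name
    # Assumption: the <item> of the last name sits at last + 1.
    entry.insert(last + 2, label)
    entry.insert(last + 3, item)
    return item


if __name__ == "__main__":
    # Invented sample entry, for illustration only.
    entry = ET.fromstring(
        "<list><label>identifier</label><item>bar</item>"
        "<label>name</label><item>vertical bar</item>"
        "<label>description</label><item>A tall vertical bar.</item></list>"
    )
    add_name(entry, "daṇḍa", "san")
    print(ET.tostring(entry, encoding="unicode"))
```

The XML comment recording the date and contributor is left out of the sketch and would still need to be added next to the new pair.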
Modifying existing content
Content already present in the authority file should normally be edited only by a project PI, Michaël or Dan. Content that was added by others may be edited as needed by the person who added it. In all other cases, consider very carefully if the edit is necessary. If it is, be very sure that it is consistent with the rest of the file and will not lead to unintended consequences.
Adding new entries
New subspecies and species may be created as needed. New genera should only be created (as and when needed) in the emic/ideographic section. Etic tokens that do not fit into an existing genus shall be assigned to the miscellaneous genus ("x-"). New sections will not be created in the lifetime of DHARMA, although sections for alphabetic and numeric graphemes are envisioned and may be added in the future.
Before creating a new token, make very certain that it is necessary and useful. Before creating a new symbol entry, make very certain (1) that the entry does not already exist, perhaps by a slightly different name; and (2) that the new entry is a useful addition to the Taxonomy. A new entry can only be useful if the symbol it describes can be objectively distinguished from all other symbols listed in the Taxonomy and this distinction is considered relevant to research. Just because four-petalled florets can be objectively distinguished from eight-petalled florets does not necessarily mean that the distinction is relevant. Even if the strategies of using four- versus eight-petalled florets could arguably be researched, it is not feasible to expect that this distinction will be consistently made in a corpus large enough to study such patterns, so we prefer not to count the petals on our florets.
To add a new symbol entry, decide on its token in the pattern "genus-species" or "genus-species-subspecies" (sub-subspecies may be added if absolutely necessary), using terminology similar to pre-existing tokens. Make sure that your token does not already exist. Copy and paste an existing entry (the entire <list></list> container) that is similar to your new entry. Make sure you maintain the alphabetical order of primary identifiers within each subsection. Edit the fields of the new entry to correspond to your new symbol. When creating a new subspecies for an existing species, it is advisable to keep the mapping identical to that of the higher category unless a distinctive Unicode character with decent font support can be found for the new glyph.
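Two of the steps above, checking that the token does not already exist and finding its place in the alphabetical order, can be sketched as below. The function name and sample tokens are hypothetical, and the sketch assumes that the primary identifiers of a subsection are already sorted and that ordering ignores letter case; the file's actual ordering convention should be checked against the existing entries.

```python
import bisect


def insertion_index(existing: list[str], new_token: str) -> int:
    """Return the position at which new_token keeps the list alphabetical.

    Raises ValueError if the token is already present.
    """
    # Assumption: ordering is case-insensitive, so camelCase does not matter.
    keys = [token.casefold() for token in existing]
    key = new_token.casefold()
    if key in keys:
        raise ValueError(f"token {new_token!r} already exists")
    return bisect.bisect_left(keys, key)


if __name__ == "__main__":
    # Invented identifiers for one subsection, already in alphabetical order.
    subsection = ["circle", "circle-dotted", "circle-spoked"]
    position = insertion_index(subsection, "circle-plain")
    subsection.insert(position, "circle-plain")
    print(subsection)
```

The duplicate check here covers only exact matches; finding an existing entry "by a slightly different name" still requires reading the names and descriptions in the relevant subsection.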