Glyph Taxonomy

Using the Glyph Taxonomy

This Taxonomy is the authority file for the classification of symbols (peripheral graphemes as defined in the Transliteration Guide) for use in DHARMA digital editions. Each entry has a unique identifier or token, displayed in the header of the entry card before a colon. The entries are organised taxonomically into two "kingdoms", that of etic tokens (objectively describing what a glyph looks like) and that of emic tokens (identifying what a glyph means); a limited number of genera within each kingdom (corresponding to a basic shape in the etic kingdom and to a language in the emic kingdom), and a potentially unlimited number of species within each genus and subspecies within each species. Tokens consist of identifiers for a genus and, where applicable, species and subspecies, separated by hyphens. Each part is an uninterrupted string containing only basic (unaccented) Latin letters or Arabic numerals. The letters are normally in lower case, but camelCase may be used when you need more than one word at one level, e.g. doubleBar. A symbol token encoded in a DHARMA digital edition associates a symbol with an entry in this Taxonomy.
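The token syntax just described lends itself to a mechanical check. The sketch below is a minimal Python illustration, not part of any DHARMA tooling; the names `PART_RE`, `split_token` and `is_valid_token` are hypothetical, and it assumes that "camelCase" means a part starting with lower-case letters or digits, optionally followed by capitalised words.

```python
import re

# One level of a token (genus, species or subspecies). Assumption: camelCase
# means lower-case ASCII letters/digits first, then optional capitalised words,
# e.g. "bar", "doubleBar", "egg2apo". Accented letters never match.
PART_RE = re.compile(r"[a-z0-9]+(?:[A-Z][a-z0-9]*)*")


def split_token(token: str) -> list[str]:
    """Split a token at its hyphens into genus, species, subspecies, ..."""
    return token.split("-")


def is_valid_token(token: str) -> bool:
    """True if every hyphen-separated part is a non-empty, well-formed string."""
    return all(PART_RE.fullmatch(part) for part in split_token(token))
```

For instance, `split_token("bullet-double-hollow")` yields genus, species and subspecies as `["bullet", "double", "hollow"]`, while `is_valid_token("daṇḍa")` is `False` because of the accented letters.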

In addition to the identifier, each entry includes a name and a description. Names are either very concise indications of the symbol's appearance (in the case of etic tokens), or terms designating the symbol's meaning (for emic tokens). Descriptions contain more detailed, but still concise, information about the symbol, which serves a dual purpose. On the one hand, it informs the reader of a digital edition about a symbol represented in it. On the other hand, it tells the creator of a digital edition how to recognise a symbol in the original text as a member of a particular class in this ontology. Many entries also include a "mapping", which is a Unicode character that is in some way similar to the original symbol.

When an encoded symbol is rendered for display in the DHARMA environment, the mapping (the "equivalent" computer-compatible character) will be shown by preference. If no mapping is available for an entry, then the name of the entry will be displayed. In either case, the display will be completed by a mouseover tooltip showing the description of the symbol.
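The preference order for display can be modelled in a few lines. This is only a sketch of the rule stated above, not the DHARMA rendering code; the `Entry` class and the `render` function are hypothetical stand-ins for however an entry is actually stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical in-memory form of one Taxonomy entry."""
    token: str
    name: str
    description: str
    mapping: Optional[str] = None  # None when the entry has no mapping


def render(entry: Entry) -> tuple[str, str]:
    """Return (displayed text, tooltip text) following the stated preference order."""
    # Mapping first; a missing (None or empty) mapping falls back to the name.
    shown = entry.mapping if entry.mapping else entry.name
    # The description is always offered as the tooltip.
    return shown, entry.description
```

With these assumptions, the bar entry would display `|`, whereas x-flag, which has no mapping character, would display its name "flag"; both carry their description as the tooltip.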

Neither the tokens nor the descriptions and mappings in this authority file are meant to represent every possible symbol accurately; they are merely a practical classification. To fulfil this purpose, the taxonomy must remain reasonably small, so that the categories do not overlap and can be objectively distinguished from one another. When encoding an edition, it is always acceptable to use a higher-level generic token rather than a lower-level one whose description does not fully match the original symbol. It is likewise acceptable to classify unique or unusual symbols as "unclassified". Details of original glyphs can always be provided in the palaeographic description of the edition.
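Because tokens are built from hyphen-separated levels, the more generic tokens available for a given symbol can be listed by dropping levels from the right. The Python sketch below assumes the token follows the genus-species-subspecies pattern; some primary identifiers in this Taxonomy (e.g. crescentMid) do not, and the function name `ancestors` is hypothetical.

```python
def ancestors(token: str) -> list[str]:
    """Return progressively more generic tokens, nearest first.

    Assumes the genus-species-subspecies pattern; identifiers without
    hyphens (a bare genus, or one such as crescentMid) yield an empty list.
    """
    parts = token.split("-")
    # Drop one level at a time from the right, stopping at the genus.
    return ["-".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
```

So an encoder unsure about bullet-double-hollow could fall back to bullet-double, or further to bullet.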

Some entries contain alternative identifiers and/or alternative names. Alternative identifiers are for preserving compatibility with earlier digital editions. In newly created editions, always use the primary identifier. Alternative names, usually in source languages, are included for completeness' sake and in order to aid encoders in finding the identifiers for symbols that they might call by a source-language name. Alternative names will not be shown in displayed editions.
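When migrating an older edition, each alternative identifier needs to be replaced by the primary identifier of its entry. The Python sketch below illustrates such a lookup; the small `ALTERNATIVES` table is copied from a few entries of this Taxonomy by hand, whereas a real tool would presumably read the whole XML file, and all function names are hypothetical.

```python
# A sample of primary identifiers and their alternative identifiers, copied
# from entries in this Taxonomy; a real tool would read them from the XML file.
ALTERNATIVES = {
    "bar": ["danda", "dandaPlain"],
    "bar-broken": ["dandaGap"],
    "bullet-double": ["dotDouble"],
    "bullet-middle": ["dotMid", "dotMiddle"],
}


def build_legacy_index(alternatives: dict[str, list[str]]) -> dict[str, str]:
    """Map every alternative identifier to its primary identifier."""
    index: dict[str, str] = {}
    for primary, alts in alternatives.items():
        for alt in alts:
            # An alternative identifier must point to exactly one entry.
            if index.get(alt, primary) != primary:
                raise ValueError(f"{alt!r} is claimed by two entries")
            index[alt] = primary
    return index


def to_primary(token: str, index: dict[str, str]) -> str:
    """Return the primary identifier for a legacy token; other tokens pass through."""
    return index.get(token, token)


LEGACY = build_legacy_index(ALTERNATIVES)
```

Here `to_primary("dandaGap", LEGACY)` gives `"bar-broken"`, and tokens that are already primary identifiers are returned unchanged.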

Visual illustrations are available for some of the entries, showing either a schematic illustration of what a symbol of the given class looks like, or actual instances encountered in our practice. These serve to help encoders assign real-life symbols to classes, and to help end-users visualise the symbols in a digital edition.

Etic tokens

This section contains tokens created using English or international terms referring to simple geometric and iconic shapes, for an objective description of what a symbol looks like.

Vertical bars (daṇḍas)

This subsection (genus) includes symbols that consist of, or are palaeographically derived from, a single vertical bar that is about as tall as an average character body. There are separate sections with dedicated tokens for double bars and short bars.

bar: bar
Description
a tall, single vertical bar
Alternative Name
daṇḍa [Sanskrit]
Mapping
|
Alternative Identifiers
danda; dandaPlain

Distinguish from short bars and double bars.

bar-broken: broken bar
Description
a tall, single vertical bar with a gap in the middle
Mapping
¦
Alternative Identifier
dandaGap

Distinguish from double bullets.

bar-crossed: crossed bar
Description
a tall, single vertical bar crossed by a shorter, predominantly horizontal line
Mapping
Alternative Identifier
dandaCross

bar-curly: curly bar
Description
a curly and/or notched vertical line that may resemble a figure 3 or a curly bracket {
Mapping
|
Alternative Identifiers
dandaNotch; squiggleVertical

bar-hooked: hooked bar
Description
a tall, single vertical bar with a hook or bend on top
Mapping
Alternative Identifier
dandaHooked

bar-ornate: ornate bar
Description
a tall, single vertical bar with unspecified complex ornamentation
Mapping
/
Alternative Identifier
dandaOrnate

bar-serifed: serifed bar
Description
a tall, single vertical bar with a headmark or small horizontal line on top
Mapping
|
Alternative Identifier
dandaSerif

bar-strokeLeft: bar with stroke on left
Description
a tall, single vertical bar with a shorter, predominantly horizontal line attached on the left
Mapping
Alternative Identifier
dandaStrokeLeft

Bullets

This subsection (genus) includes symbols that resemble typographic bullets. A bullet may be a dot, a small filled or hollow circle, a short dash or a small asterisk, as well as a small group of any such symbols.

bullet: bullet
Description
a small symbol resembling a typographic bullet
Alternative Name
dot
Mapping

bullet-double: double bullet
Description
a pair of dots or dotlike symbols one below the other
Mapping
Alternative Identifier
dotDouble

bullet-double-hollow: hollow double bullet
Description
a pair of small hollow circles one below the other
Mapping
Alternative Identifier
circleDouble

bullet-high: high bullet
Description
a dot or dotlike symbol positioned high in the line
Mapping
Alternative Identifier
dotHigh

bullet-high-hollow: hollow high bullet
Description
a small hollow circle positioned high in the line
Mapping
Alternative Identifier
circleHigh

bullet-hollow: hollow bullet
Description
a small hollow circle resembling a typographic bullet
Mapping
Alternative Identifier
circleSmall

bullet-low: low bullet
Description
a dot or dotlike symbol positioned low in the line
Mapping
Alternative Identifier
dotLow

bullet-low-hollow: hollow low bullet
Description
a small hollow circle positioned low in the line
Mapping
Alternative Identifier
circleLow

bullet-middle: middle bullet
Description
a small symbol resembling a typographic bullet, positioned at middle height in the line
Mapping
Alternative Identifiers
dotMid; dotMiddle

bullet-middle-hollow: hollow middle bullet
Description
a small hollow circle positioned at middle height in the line
Mapping
Alternative Identifiers
circleMed; circleMid

bullet-triple: triple bullet
Description
three dots or dotlike symbols in a vertical row
Mapping
Alternative Identifier
dotTriple

Distinguish from clustered triple bullets.

bullet-triple-clustered: clustered triple bullet
Description
three dots or dotlike symbols in a triangular cluster
Mapping
Alternative Identifiers
bulletTriangle; dotTriangle

Circles

This subsection (genus) includes circles and symbols based predominantly on the circle shape. Use these tokens for fairly large circles; small circular symbols are bullets.

circle: circle
Description
a fairly large circle or roughly circular symbol
Mapping

circle-bullseye: circle with a dot in the centre
Description
a single fairly large circle with a dot or filled circle in the centre like a bullseye
Mapping
⦿
Alternative Identifiers
circleTarget; encircled-dot; eye

Distinguish from concentric circles, which may have a dot in the centre but involve two or more circles.

circle-bullseyeCross: circle with an X in the centre
Description
a fairly large circle with an x, + or asterisk in the centre
Mapping
Alternative Identifier
circle-circleCross

circle-concentric: concentric circle
Description
a target-like symbol consisting of a circle containing one or more smaller circles with or without a dot in the centre
Mapping
Alternative Identifiers
circle-in-circle; circleConcentric; encircled-circle

Distinguish from bullseyes, which only have a dot in the centre, without additional circles.

circle-horned: horned circle
Description
a circle topped by two or more curved strokes like horns
Mapping
Alternative Identifier
circleHorned

circle-large: large circle
Description
a conspicuously large circle, bigger than a typical character body
Mapping
Alternative Identifier
circleLarge

circle-lined: lined circle
Description
a circle with one or more non-crossing lines inside; the lines may be straight or curved and may resemble diameters or chords
Mapping
Alternative Identifiers
circleCurve; tennisBall

circle-ornate: ornate circle
Description
a circle with unspecified complex ornamentation
Mapping
Alternative Identifier
circleOrnate

circle-oval-concentric: concentric oval
Description
a horizontal oval containing one or more concentric smaller ovals and/or a dot in the centre like an eye
Mapping
Alternative Identifier
circleConcentricOval

circle-spoked: circle with spokes
Description
a circle with two or more crossing diameters, or three or more radii
Mapping

Spoked circles may be distinguished from more iconic wheel symbols.

Crosses

This subsection (genus) includes symbols that consist of, or are palaeographically derived from, crossing straight lines and are fairly large in size. Small crosses are bullets. Svastika (swastika) symbols are classified in the emic section under san-svastika and its variants.

cross: cross
Description
two fairly long and largely straight lines crossing approximately at right angles, in any orientation
Mapping

cross-tilted: tilted cross
Description
a cross with the arms at a tilted angle, resembling an X, a saltire or St. Andrew's cross
Mapping
Alternative Identifier
crossX

cross-upright: upright cross
Description
a cross with the arms close to vertical and horizontal, resembling a + symbol
Mapping
Alternative Identifier
crossPlus

Dashes

This subsection (genus) includes symbols that consist of, or are palaeographically derived from, horizontal strokes.

dash: dash
Description
a predominantly horizontal line
Mapping
Alternative Identifier
dashPlain

crescentMid: concave dash
Description
an arched horizontal line with the middle bulging downward like a horizontal crescent or a breve
Mapping
Alternative Identifiers
dash-concave; dashConcave

dash-convex: convex dash
Description
an arched horizontal line with the middle bulging upward like an inverted breve
Mapping
Alternative Identifier
dashConvex

dash-convex-notched: notched convex dash
Description
an arched horizontal line with the middle bulging upward and a notch in the centre, like a figure 3 rotated 90 degrees to the left
Mapping
Alternative Identifier
dashDoubleconvex

dash-double: double dash
Description
a parallel pair of predominantly horizontal lines
Mapping
Alternative Identifiers
dashDouble; equalSign

dash-hooked: hooked dash
Description
a predominantly horizontal line with a hook at one end
Mapping
Alternative Identifiers
dashHook; dashHooked

dash-long: long dash
Description
a conspicuously long horizontal line
Mapping
Alternative Identifier
dashLong

dash-obelus: obelus dash
Description
a predominantly horizontal line with dots above and below, like a division symbol or a horizontal obelus
Mapping
÷
Alternative Identifier
dashDoubledot

dash-triple: triple dash
Description
a parallel triplet of predominantly horizontal lines
Mapping
Alternative Identifier
dashTriple

dash-wavy: wavy dash
Description
a sinuous horizontal line like a tilde
Mapping
Alternative Identifier
dashWavy

Double vertical bars (daṇḍas)

This subsection (genus) includes symbols that consist of, or are palaeographically derived from, tall double vertical bars. There are separate sections with dedicated tokens for single bars and short bars.

ddanda: double bar
Description
a parallel pair of tall vertical bars
Mapping
Alternative Identifiers
ddandaPlain; doubleBar

ddandaCross: crossed double bar
Description
a parallel pair of tall vertical bars crossed by a shorter, predominantly horizontal line
Mapping
Alternative Identifiers
ddandaCrossed; doubleBar-crossed

ddandaHook: hooked double bar
Description
a parallel pair of tall vertical bars with a hook or bend at the top of one or both
Mapping
Alternative Identifiers
ddandaHooked; doubleBar-hooked

ddandaOrnate: ornate double bar
Description
a parallel pair of tall vertical bars with unspecified complex ornamentation
Mapping
Alternative Identifier
doubleBar-ornate

ddandaSerif: serifed double bar
Description
a parallel pair of tall vertical bars with a headmark or small horizontal line on top
Mapping
Alternative Identifier
doubleBar-serifed

ddanda-StrokeLeft: double bar with stroke on left
Description
a parallel pair of tall vertical bars with a shorter, predominantly horizontal line attached on the left
Mapping
Alternative Identifier
doubleBar-strokeLeft

ddanda-StrokeRight: double bar with stroke on right
Description
a parallel pair of tall vertical bars with a shorter, predominantly horizontal line attached on the right
Mapping
Alternative Identifier
doubleBar-strokeRight

ddandaTailRight: double bar with tail on right
Description
a parallel pair of tall vertical bars with an appendix near the lower end on the right
Mapping
Alternative Identifier
doubleBar-tailRight

Florets

This subsection (genus) includes symbols that represent stylised flowers, like typographic florets or fleurons, involving any combination of some or all of the following elements: a central circle or dot, a number of petals, and a number of radial lines.

fleuron: floret
Description
a stylised flower
Mapping
Alternative Identifiers
floret; floretQuatrefoil; flower; simpleFinial

complexFinial: complex floret
Description
a stylised flower with complex detail and/or ornamentation
Mapping
Alternative Identifiers
floret-complex; floretComplex

floret-petal: petal

Flourishes

This subsection (genus) includes curlicues and abstract ornamental patterns consisting of one or more lines which are often curved. Symbols identified by the emic name gomūtra should be classified as tapering flourishes.

flourish: flourish
Description
an abstract ornamental pattern
Mapping
Alternative Identifiers
ornament; scroll

flourish-taperBidirectional: flourish tapering both ways
Description
an ornamental curlicue or series of bars tapering out in both directions from the centre
Mapping
Alternative Identifier
gomutraDouble

flourish-taperIn: flourish tapering in
Description
an ornamental curlicue tapering to the left, an initial gomūtra symbol
Mapping
Alternative Identifier
gomutraInitial

flourish-taperOut: flourish tapering out
Description
an ornamental curlicue tapering to the right, a final gomūtra symbol
Mapping
Alternative Identifier
gomutraFinal

flourish-taperOut-bars: bars tapering out
Description
an ornamental curlicue consisting of a series of vertical bars tapering to the right, a final gomūtra symbol
Mapping
Alternative Identifier
gomutraFinalBars

flourish-taperOut-complex: complex flourish tapering out
Description
an ornamental curlicue tapering to the right, a complex final gomūtra symbol
Mapping
Alternative Identifier
gomutraFinalComplex

Short vertical bars

This subsection (genus) includes symbols that consist of, or are palaeographically derived from, short vertical bars. Double short bars are included in the present section. There are separate sections with dedicated tokens for full-length single bars and double bars.

comma: short bar
Description
a short, predominantly vertical bar that may be straight or curved
Mapping
Alternative Identifiers
commaSmall; dotCommaMid; shortBar

ddandaSmall: double short bar
Description
a parallel pair of short, predominantly vertical bars that may be straight or curved
Mapping
Alternative Identifier
shortBar-double

commaHigh: high short bar
Description
a short, predominantly vertical bar that may be straight or curved, positioned high in the line
Mapping
Alternative Identifier
shortBar-high

Spirals

This subsection (genus) includes symbols that are, or are based on, spirals. Spirals may appear in any orientation, may have any number of turns, from less than one full turn to several, and may have an extended tail.

spiral: spiral
Description
a spiral symbol
Mapping

spiral-left: counterclockwise spiral
Description
a spiral that turns counterclockwise, i.e. to the left when proceeding from the centre
Mapping
Alternative Identifier
spiralL

spiral-right: clockwise spiral
Description
a spiral that turns clockwise, i.e. to the right when proceeding from the centre
Mapping
Alternative Identifier
spiralR

spiral-spoked: spoked spiral
Description
a spiral symbol with spokes
Mapping

Miscellaneous glyphs

This subsection (genus) includes symbols which cannot be described in simple geometric terms but represent an object iconically or may be suitably described in iconic terms. All tokens in this section must begin with "x-".

x-beast: beast
Description
iconic symbol depicting the face or body of a real or mythical beast or monster
Mapping

x-flag: flag
Description
iconic symbol depicting a flag
Mapping

x-linga: liṅga
Description
iconic symbol depicting a liṅga

x-wheel: wheel
Description
iconic symbol depicting a wheel
Mapping

Wheels may be distinguished from less iconic spoked circles.

Emic and ideographic tokens

This section contains tokens for symbols that can be readily identified by an emic designation in a source language, and for ideograms which we identify by a fully or partially English name.

Symbols with Balinese names

ban-XXX: XXX
Description
XXX
Mapping
XXX

Symbols with Sanskrit names

san-nandyavarta: nandyāvarta [Sanskrit]
Description
the symbol called nandyāvarta

san-om: oṁ [Sanskrit]
Description
a symbol representing oṁ
Mapping

san-siddham: siddham [Sanskrit]
Description
a symbol representing siddham

san-srivatsa: śrīvatsa [Sanskrit]
Description
the symbol called śrīvatsa

san-svastika: svastika [Sanskrit]
Description
a svastika symbol
Mapping

san-svastika-left: counterclockwise svastika [English]
Description
a svastika symbol with the arms counterclockwise
Alternative Name
sauvastika [Sanskrit]
Mapping

san-svastika-right: clockwise svastika
Description
a svastika symbol with the arms clockwise
Alternative Name
svastika [Sanskrit]
Mapping

san-triratna: triratna [Sanskrit]
Description
the symbol called triratna or nandipada

Symbols with Tamil names

tam-aka: āka [Tamil]
Mapping
𑿭

tam-alakku: āḻākku [Tamil]
Mapping
𑿗

tam-cevitu: ceviṭu [Tamil]
Mapping
𑿖

tam-kacu: kācu [Tamil]
Mapping
𑿝

tam-kalam: kalam [Tamil]

tam-kalancu: kaḻañcu [Tamil]

tam-kani: kāṇi [Tamil]
Description
A symbol for kāṇi not used as the fraction 1/80.
Mapping
𑿂

tam-kannaru: kaṇṇāṟu [Tamil]

tam-kil: kīḻ [Tamil]
Mapping
𑿔

tam-kuli: kuḻi [Tamil]
Mapping
𑿢

tam-kuruni: kuṟuṇi [Tamil]
Mapping
𑿚

tam-ma: mā [Tamil]
Description
A symbol for mā not used as the fraction 1/20.
Mapping
𑿈

tam-matam: mātam [Tamil]
Mapping

tam-merpati: mēṟpaṭi [Tamil]
Mapping

tam-mutal: mutal [Tamil]
Mapping
𑿯

tam-nal: nāḷ [Tamil]

tam-nali: nāḻi [Tamil]

tam-nancey: naṉcey [Tamil]
Mapping
𑿤

tam-nayaka: nāyaka [Tamil]
Description
A symbol representing nāyaka. Abbreviated form of ṉāyaka, ṉāyakka.

tam-nel: nel [Tamil]
Mapping
𑿕

tam-nilam: nilam [Tamil]
Mapping
𑿦

tam-panam: paṇam [Tamil]
Mapping
𑿞

tam-panavitai: paṇaviṭai [Tamil]

tam-param: pāram [Tamil]
Mapping
𑿡

tam-patakku: patakku [Tamil]
Mapping
𑿛

tam-pillai: piḷḷai [Tamil]

tam-pillaiyarCuli: piḷḷaiyār cūḻi [Tamil]
Mapping

tam-pon: poṉ [Tamil]
Mapping
𑿟

tam-puncey: puṉcey [Tamil]
Mapping
𑿥

tam-rupay: rūpāy [Tamil]
Mapping

tam-teti: tēti [Tamil]
Mapping

tam-tuni: tūṇi [Tamil]

tam-ulakku: uḻakku [Tamil]
Mapping
𑿘

tam-uppalam: uppaḷam [Tamil]
Description
A symbol representing uppaḷam, ‘salt-pan’. See SII 7.485.

tam-varakan: varākaṉ [Tamil]
Mapping
𑿠

tam-varusam: varuṣam [Tamil]
Mapping

tam-veli: vēli [Tamil]
Mapping
𑿣

tam-yantu: yāṇṭu [Tamil]

Editing the Glyph Taxonomy

Contributors who encode text for DHARMA can and should edit this authority file, but only when there is good reason to do so. Major edits (new entries and substantive changes to existing entries) must be logged in the <revisionDesc> by creating a new <change> element above the existing ones and recording the date, your DHARMA ID and the nature of your edit. Minor changes to existing content may be noted next to the changed part in an XML comment or, when superficial (e.g. correcting typos), left unrecorded.
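The <change> element is normally written by hand in the XML file. The Python sketch below merely shows one way to produce a well-formed element to paste in above the existing ones; putting the date in @when and the DHARMA ID in @who follows general TEI practice but is an assumption here, as is the form of the ID (the value "jdoe" in the example is made up), and the helper name `change_entry` is hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET
from typing import Optional


def change_entry(who: str, note: str, when: Optional[datetime.date] = None) -> str:
    """Serialise a <change> element recording date, editor ID and nature of the edit.

    Assumption: the date goes in @when and the DHARMA ID in @who.
    """
    when = when or datetime.date.today()  # default to today's date
    change = ET.Element("change", {"when": when.isoformat(), "who": who})
    change.text = note  # the nature of the edit, escaped automatically
    return ET.tostring(change, encoding="unicode")
```

For example, `change_entry("jdoe", "Added bar-x", datetime.date(2024, 1, 2))` returns `<change when="2024-01-02" who="jdoe">Added bar-x</change>`.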

Adding new details to existing entries

New identifiers should not be added to an existing entry. Entries with more than one identifier exist only for legacy purposes.

Feel free to add further source-language names to existing entries. Doing so was not a priority when this Taxonomy was created, but if you have the inclination, go ahead. Names are listed as pairs like <label>name</label><item>WHATEVER</item>, where WHATEVER is the name of the symbol. To add a new name, insert such a pair below the last pre-existing name. Enter the desired name. Add @xml:lang to the <item> element, with the three-letter ISO tag for the language of the name, e.g. <item xml:lang="san">daṇḍa</item>. Preferably also add an XML comment with the date and your DHARMA ID or name to note that you've added this name.
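The name pair described above can also be generated rather than typed, which guarantees correct escaping of the name. This is a minimal Python sketch with a hypothetical helper `name_pair`; it only checks that the language code has the shape of three lower-case letters, not that it is a registered ISO code.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced attribute name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def name_pair(name: str, lang: str) -> str:
    """Return a <label>name</label><item xml:lang="..."> pair for a new name."""
    # Shape check only: three lower-case ASCII letters, e.g. "san", "tam".
    if not (len(lang) == 3 and lang.isascii() and lang.isalpha() and lang.islower()):
        raise ValueError(f"expected a three-letter language code, got {lang!r}")
    label = ET.Element("label")
    label.text = "name"
    item = ET.Element("item", {XML_LANG: lang})
    item.text = name  # characters such as & and < are escaped automatically
    return ET.tostring(label, encoding="unicode") + ET.tostring(item, encoding="unicode")
```

Calling `name_pair("daṇḍa", "san")` reproduces the example pair given above, ready to paste below the last existing name.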

Modifying existing content

Content already present in the authority file should normally be edited only by a project PI, Michaël or Dan. Content that was added by others may be edited as needed by the person who added it. In all other cases, consider very carefully if the edit is necessary. If it is, be very sure that it is consistent with the rest of the file and will not lead to unintended consequences.

Adding new entries

New subspecies and species may be created as needed. New genera should only be created (as and when needed) in the emic/ideographic section. Etic tokens that do not fit into an existing genus shall be assigned to the miscellaneous genus ("x-"). New sections will not be created in the lifetime of DHARMA, although sections for alphabetic and numeric graphemes are envisioned and may be added in the future.

Before creating a new token, make very certain that it is necessary and useful. Before creating a new symbol entry, make very certain (1) that the entry does not already exist, perhaps under a slightly different name; and (2) that the new entry is a useful addition to the Taxonomy. A new entry can only be useful if the symbol it describes can be objectively distinguished from all other symbols listed in the Taxonomy and this distinction is considered relevant to research. Just because four-petalled florets can be objectively distinguished from eight-petalled florets does not necessarily mean that the distinction is relevant. Even if the strategies of using four- versus eight-petalled florets could arguably be researched, it is unrealistic to expect that this distinction will be made consistently across a corpus large enough to study such patterns, so we prefer not to count the petals on our florets.

To add a new symbol entry, decide on its token in the pattern "genus-species" or "genus-species-subspecies" (sub-subspecies may be added if absolutely necessary), using terminology similar to pre-existing tokens. Make sure that your token does not already exist. Copy and paste an existing entry (the entire <list></list> container) that is similar to your new entry. Make sure you maintain the alphabetical order of primary identifiers within each subsection. Edit the fields of the new entry to correspond to your new symbol. When creating a new subspecies for an existing species, it is advisable to keep the mapping identical to that of the higher category unless a distinctive Unicode character with decent font support can be found for the new glyph.
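Checking the alphabetical order after inserting an entry can be automated along these lines. The Python sketch below is hypothetical and assumes that "alphabetical" means a case-insensitive, character-by-character comparison; the collation the maintainers actually use is not stated.

```python
def out_of_order(identifiers: list[str]) -> list[tuple[str, str]]:
    """Return adjacent pairs that break case-insensitive alphabetical order.

    Assumption: identifiers are compared via str.casefold(), character by
    character, so "-" sorts before letters and digits.
    """
    keys = [ident.casefold() for ident in identifiers]
    return [
        (identifiers[i], identifiers[i + 1])
        for i in range(len(identifiers) - 1)
        if keys[i] > keys[i + 1]
    ]
```

A flagged pair is a prompt to look rather than proof of an error: subsections whose primary identifiers are legacy-style (the short bars, for instance, listed as comma, ddandaSmall, commaHigh) appear to be ordered by their newer alternative identifiers (shortBar, shortBar-double, shortBar-high) instead.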

Legacy stuff

ddandaStrokeLeft
Description
double vertical bar with short horizontal appendix at middle of left bar
Mapping
||

ddandaStrokeRight
Description
double vertical bar with short horizontal appendix at middle of right bar
Mapping
||

ddandaDotTriple
Description
double vertical bar enclosing three dots
Mapping
||

circleCross
Description
circle with cross inside
Mapping

circleFinial
Description
circle with a floret inside
Mapping
Alternative Identifier
circleFloret

egg2apo
Description
circle topped by a double (or multiple) curved stroke
Mapping

circleTriangle
Description
three small circles arranged in a triangle
Mapping

circleWheel
Description
a circular shape depicting a wheel with spokes
Mapping

finalGomutra
Description
final gomūtra
Mapping

initialGomutra
Description
initial gomūtra
Mapping

braceCurlyClose
Description
a wavy or zigzag vertical line
Mapping

indistinct
Description
symbol that is definitely present, but is so damaged (or unclear in the surrogate for a reason other than damage) that it cannot be assigned to any of the primary categories

unknown
Description
symbol that cannot be assigned to a genus

pc
Mapping

tēti
Mapping

mācam
Mapping

varuṣam
Mapping