Document design principles

Michal Měchura edited this page Jan 19, 2022 · 5 revisions

This is a proposal for how to organize the content of the DMLex standard. The current working draft is organized according to this proposal.

Some bits of this text should probably be in the standard itself.

Entity types

The standard defines the existence of:

  • Object types such as Entry, Sense, Translation.

    • An object can have another object as its parent and can have other objects as its children: the standard defines which object types are allowed to be parents/children of which other object types and with what arities.
    • An object can have properties. The standard defines, for each object type, which properties it can have. More about properties below.
  • Relation types such as EntrySet, SensePair, SenseTuple.

    • A relation is something which connects two or more objects in a way other than parent-child. The objects involved in a relation are called its participants: the standard defines, for each relation type, objects of which type are allowed to be participants, with what arities, and what role they play in the relation. For example, a SubsenseRelation is allowed to have exactly one Sense as a participant with the role “superordinate sense” and exactly one Sense as a participant with the role “subordinate sense”.
    • A relation can have properties. The standard defines, for each relation type, which properties it can have. More about properties below.
  • Marker types such as HeadwordMarker, PlaceholderMarker.

    • A marker is an entity which adds inline markup to a property of an object. The standard defines which marker types are allowed to add inline markup to which properties of which object types.

So “objects”, “relations” and “markers” make up the vocabulary in which we express our standard. This vocabulary is implementation-independent. In XML these entities would probably be implemented as XML elements, while in a relational database they would probably be implemented as tables.

Parents and children

Each object can have at most one parent. The top-level type in DMLex is LexicographicResource, and all other objects are children or descendants of an instance of that type. For each object type, DMLex prescribes the types of objects that can be its parent and its children.
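The parent/child rule can be sketched in a few lines of Python. This is only an illustration: the `ALLOWED_CHILDREN` table, the `Obj` class and the `add_child` helper are hypothetical names, and the particular nesting shown (Entry under LexicographicResource, Sense under Entry, Translation under Sense) is an assumption made for the example, not something this page prescribes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical table: parent type -> child types allowed under it.
# The real table is whatever the standard prescribes.
ALLOWED_CHILDREN = {
    "LexicographicResource": {"Entry"},
    "Entry": {"Sense"},
    "Sense": {"Translation"},
}


@dataclass(eq=False)
class Obj:
    type_name: str
    parent: Optional["Obj"] = field(default=None, repr=False)
    children: List["Obj"] = field(default_factory=list)


def add_child(parent: Obj, child: Obj) -> None:
    """Attach child to parent, enforcing the single-parent and type rules."""
    if child.parent is not None:
        raise ValueError("an object can have at most one parent")
    if child.type_name not in ALLOWED_CHILDREN.get(parent.type_name, set()):
        raise ValueError(
            f"{child.type_name} is not allowed under {parent.type_name}"
        )
    child.parent = parent
    parent.children.append(child)
```

An implementation would presumably derive such a table from the standard itself rather than hard-coding it.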

Participants

Relations are allowed to have participants. A participant in a relation is a reference to an object somewhere in the same LexicographicResource (internal links) or in another LexicographicResource (external links). For each participant of each relation, DMLex prescribes the object type and the arity.
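As a rough sketch, the per-role type and arity rules could be checked as follows. The SubsenseRelation rule mirrors the example given under "Entity types"; the `RELATION_RULES` layout, the short role keys and the `validate_relation` function are assumptions made for illustration.

```python
from collections import Counter

# Role -> (required object type, minimum count, maximum count).
# The dictionary layout and the role keys are hypothetical.
RELATION_RULES = {
    "SubsenseRelation": {
        "superordinate": ("Sense", 1, 1),
        "subordinate": ("Sense", 1, 1),
    },
}


def validate_relation(relation_type, participants):
    """Check a list of (role, object_type, object_id) tuples.

    Returns a list of error messages; an empty list means the
    participants satisfy the rules for this relation type.
    """
    rules = RELATION_RULES[relation_type]
    errors = []
    counts = Counter(role for role, _, _ in participants)
    for role, obj_type, _ in participants:
        if role not in rules:
            errors.append(f"unknown role: {role}")
        elif obj_type != rules[role][0]:
            errors.append(
                f"role {role} requires {rules[role][0]}, got {obj_type}"
            )
    for role, (_, lo, hi) in rules.items():
        if not lo <= counts[role] <= hi:
            errors.append(
                f"role {role} has {counts[role]} participants, expected {lo}..{hi}"
            )
    return errors
```

Here the object identifiers are opaque strings, so the same check works whether a participant is an internal or an external link.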

Properties

Objects, relations and markers are allowed to have properties. A property is always something atomic and literal, typically a string of text (and never a reference to another object). For example, many objects have a property named text. An entity can never have more than one property of a given name.

The term “property” is another part of our abstract, implementation-independent vocabulary. In XML, properties would typically be implemented as XML attributes, while in a relational database they would typically be implemented as columns in tables.

What is and what isn’t a property

How do we decide that something is a property and not an object or a relation? There are two criteria.

  1. The criterion of atomicity. It is a property if its value is something atomic and literal, like a string or a number (including items from controlled vocabularies), but not a reference to an object.

  2. The criterion of arity. It is a property if the object can only ever have at most one of it, never more.

Both criteria need to be met in order for something to be treated in our standard as a property of an object (and not as an object). And, conversely, if something meets both criteria, then our standard must treat it as a property of an object (and not as an object).
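The two criteria amount to a simple conjunction, which the following sketch spells out. The function name `is_property` and its parameters are hypothetical; `max_arity=None` is used here to mean "no upper limit".

```python
from typing import Optional


def is_property(is_atomic_literal: bool, max_arity: Optional[int]) -> bool:
    """Return True only if both criteria for propertyhood are met.

    is_atomic_literal: the value is a string, number or controlled-vocabulary
        item, and not a reference to an object (criterion of atomicity).
    max_arity: the most instances one entity may have; None means unbounded
        (criterion of arity requires at most one).
    """
    meets_atomicity = is_atomic_literal
    meets_arity = max_arity is not None and max_arity <= 1
    return meets_atomicity and meets_arity
```

Anything for which this returns False would have to be modelled as an object (or a relation) instead.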

Headwords are properties, not objects

In most cases, the fact that we are treating something as a property (and not as an object) will be uncontroversial. The one case that might be surprising is that we are treating headwords as properties of Entry objects (and not as objects).

This is because a headword meets both of our criteria for “propertyhood”: its value is a literal string, and an entry can never have more than one.

I think that we agreed at some point that we want to prohibit entries from having more than one headword. If I am not mistaken and we do indeed want to prohibit that, then treating headwords as properties of Entry is the way to do it. If, on the other hand, we want to make it possible for an entry to have more than one headword – either in the core of our standard or in a possible future extension – then we must create a Headword object type as a child of Entry. (@michmech)

The core and the modules

The standard is broken down into a core plus several modules. Anybody who wants to claim that they have implemented DMLex has to implement at least the core. The modules are optional. Implementers can say e.g. “we are implementing DMLex core plus this, this and this module”.

The core defines several entity types (“entities” is a general term for objects, relations, and markers). Each module defines additional entity types or extends entity types defined in the core.

It may sometimes happen that a module (let’s call it Module A) extends an entity type defined in another module (let’s call it Module B). If an implementer has decided to implement Module A but not Module B, then obviously they can’t implement Module A’s extensions to Module B. In that case the implementer’s implementation of Module A is valid nonetheless.
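One way to picture this rule: an implementation of Module A only has to cover those extensions whose target entity type comes from a module it actually implements. The sketch below assumes hypothetical names throughout (`DEFINED_IN`, `MODULE_A_EXTENSIONS`, `TypeB`, `extensions_to_implement`).

```python
# Hypothetical: which module defines each entity type.
DEFINED_IN = {
    "Entry": "core",
    "Sense": "core",
    "TypeB": "ModuleB",  # placeholder for an entity type defined in Module B
}

# Hypothetical: entity types that Module A extends.
MODULE_A_EXTENSIONS = ["Entry", "TypeB"]


def extensions_to_implement(extensions, defined_in, implemented_modules):
    """Keep only extensions whose target type belongs to an implemented module."""
    return [t for t in extensions if defined_in[t] in implemented_modules]
```

With `{"core", "ModuleA"}` implemented, only the extension to Entry remains, and the implementation of Module A still counts as valid.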

Uniqueness of names

The names of entities (i.e. objects, relations and markers) are unique across the entire DMLex standard. There is no need to qualify them with the name of the module they come from.

The names of properties are unique only within the scope of the entity type they belong to. So, when talking about a property, it is always necessary to mention the entity type it belongs to, e.g. “the text property of the Example object”.
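The two scopes of uniqueness can be checked mechanically, as in this sketch. Both function names and the input shapes are assumptions: entity names are checked across all modules at once, while property names are only compared within the same entity type.

```python
from collections import Counter


def duplicate_entity_names(modules):
    """modules: dict of module name -> list of entity type names it defines.

    Returns entity names defined more than once anywhere in the standard.
    """
    counts = Counter(name for names in modules.values() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)


def duplicate_property_names(properties):
    """properties: list of (entity_type, property_name) pairs.

    Returns pairs that occur more than once; the same property name on
    two different entity types is not a duplicate.
    """
    counts = Counter(properties)
    return sorted(pair for pair, n in counts.items() if n > 1)
```

For instance, a `text` property on both Example and Translation is fine, whereas two entity types named Example in different modules would be reported.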

Naming conventions

We use PascalCase for the names of entity types and camelCase for the names of properties.
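A loose check of these conventions might look like the sketch below. The patterns are an assumption about what counts as PascalCase and camelCase (they only test the first character and restrict the rest to ASCII letters and digits), so they would accept a name like `ENTRY`.

```python
import re

# Loose approximations: first letter upper/lower, then letters and digits only.
PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")
CAMEL_CASE = re.compile(r"^[a-z][A-Za-z0-9]*$")


def is_entity_type_name(name: str) -> bool:
    """True if the name looks like PascalCase, e.g. HeadwordMarker."""
    return PASCAL_CASE.match(name) is not None


def is_property_name(name: str) -> bool:
    """True if the name looks like camelCase, e.g. text."""
    return CAMEL_CASE.match(name) is not None
```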

Names in DMLex versus names in the implementations

Authors of implementations and/or serializations of DMLex do not have to use the same names as the names we are using in the standard, and they do not have to follow the same naming conventions. They just have to say “our x is an implementation of y from DMLex”.
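In practice such a statement could be as small as a lookup table from local names to DMLex names. The local names below (`lexicon`, `article`, `meaning`) are invented for illustration, as is the `describe` helper.

```python
# Hypothetical implementation-specific names mapped to DMLex names.
NAME_MAP = {
    "lexicon": "LexicographicResource",
    "article": "Entry",
    "meaning": "Sense",
}


def describe(local_name: str) -> str:
    """Produce the kind of statement an implementer is expected to make."""
    return f"our {local_name} is an implementation of {NAME_MAP[local_name]} from DMLex"
```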