Geoffroy Noël edited this page Sep 14, 2016 · 4 revisions

The Python description of the data model used by the Text Viewer/Editor can be found in digipal_text/models.py.

Tables

The main tables are:

  • TextContent: one textual 'representation' of an item part. 'Representation' is quite open: it can be a translation, a transcription, a codicological description, etc.
  • TextContentXML: one XML instance of a TextContent. TextContent is only a container, whereas a TextContentXML record holds the entire XML representation in one field. One TextContent could, in principle, have multiple TextContentXML records associated with it. It was designed that way to allow different markups of the same text. In practice, though, the framework currently assumes there is only one instance. The framework can deal efficiently with a very large XML document. More about this below.
  • TextContentXMLCopy: a compressed backup of the entire XML document stored in a TextContentXML. Backups occur at regular intervals or when a user presses the save button in the Text Editor. There is no interface yet to view or restore a copy; this would require further development. However, there are command line tools to list and restore copies.
  • TextUnit: a 'virtual' Django model that represents a piece of the XML found in a TextContentXML at a precise (sub)location. With a very small addition to the code, a TextUnit can be customised into something more specific, for example an Entry (e.g. Exon) or a Clause (e.g. Models of Authority).
  • TextAnnotation: a link between a DigiPal Annotation and a unit of text.
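
As an illustration of the backup idea behind TextContentXMLCopy, here is a minimal sketch in plain Python. It is not the code from digipal_text/models.py: the class below is a simple stand-in rather than a Django model, the field and function names (content, copies, backup, restore) are hypothetical, and the use of zlib is an assumption, since the actual compression method is not described here.

```python
import zlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextContentXML:
    """Plain-Python stand-in (not the Django model): the whole XML lives in one field."""
    content: str
    # Hypothetical list of compressed snapshots, playing the role of TextContentXMLCopy.
    copies: List[bytes] = field(default_factory=list)


def backup(record: TextContentXML) -> bytes:
    """Compress the entire XML document and keep it as a copy."""
    compressed = zlib.compress(record.content.encode("utf-8"))
    record.copies.append(compressed)
    return compressed


def restore(record: TextContentXML, index: int = -1) -> str:
    """Replace the current content with a stored copy (the latest one by default)."""
    record.content = zlib.decompress(record.copies[index]).decode("utf-8")
    return record.content
```

A typical use would be to call backup(record) before each save, and restore(record) if an edit turns out to be wrong.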

Locations, Location Types and Sublocations

The Text framework can handle very large XML documents, provided that smaller pieces are marked up. For instance, we mark up every page number in the Exon manuscript: '2r' precedes the text for page 2 recto and is marked up as a location of type 'locus'. This allows the Text Viewer to dynamically list all the locuses found in a text in a navigational drop-down, so that the user can jump directly to a unit. The Text Viewer can therefore deal with large content by pulling only one small unit at a time into your browser. It also means that every page in your manuscript is addressable: it has a unique URL (e.g. .../translation/locus/2r/) and can be annotated.
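
To make the idea of units more concrete, here is a small sketch that cuts a document into units at each location marker. The marker format (a span with data-dpt attributes) and the names split_units and unit_path are assumptions made for this example only; the real extraction code in the framework may work differently.

```python
import re
from typing import Dict

# Assumed marker format, for illustration only.
LOCATION_RE = re.compile(
    r'<span data-dpt="location" data-dpt-loctype="(?P<type>[^"]+)">(?P<label>[^<]+)</span>'
)


def split_units(xml: str, location_type: str = "locus") -> Dict[str, str]:
    """Map each location label to the content that follows its marker."""
    markers = [m for m in LOCATION_RE.finditer(xml) if m.group("type") == location_type]
    units = {}
    for i, marker in enumerate(markers):
        # A unit ends where the next marker of the same type starts.
        end = markers[i + 1].start() if i + 1 < len(markers) else len(xml)
        units[marker.group("label").strip()] = xml[marker.end():end].strip()
    return units


def unit_path(content_type: str, location_type: str, location: str) -> str:
    """Relative address of a unit, in the shape of the URL shown above."""
    return "%s/%s/%s/" % (content_type, location_type, location)


SAMPLE = (
    '<span data-dpt="location" data-dpt-loctype="locus">1v</span> Text of page 1 verso. '
    '<span data-dpt="location" data-dpt-loctype="locus">2r</span> Text of page 2 recto.'
)

units = split_units(SAMPLE)
print(list(units))   # labels that could feed a navigational drop-down
print(units["2r"])   # only this unit would be sent to the browser
```

Only the requested unit needs to leave the server, which is why the size of the whole document matters little to the browser.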

By marking up the different texts of the same Item Part with a compatible set of locations, the Text Viewer can automatically synchronise the panes (e.g. translation, image, transcription panes all showing content of page 2r).

The framework can accept other types of locations, for instance 'entry' in the Exon manuscript. It is possible to customise the framework to add your own location types. The current limitation is that, for a given type, all locations are sequential and total; that is, there is no nesting and there is no piece of text that doesn't belong to a location. The 'locus' type is special in the sense that it is supposed to match the locus assigned to the DigiPal Image records.
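
The 'total' constraint can be checked mechanically. The sketch below is not part of the framework: it reuses an assumed marker format (a span with data-dpt attributes), the helper names marker and check_locations are hypothetical, and the duplicate-label check is an extra assumption derived from the fact that each location needs its own URL.

```python
import re
from typing import List

# Assumed marker format, for illustration only.
MARKER_RE = re.compile(r'<span data-dpt="location" data-dpt-loctype="([^"]+)">([^<]+)</span>')


def marker(location_type: str, label: str) -> str:
    """Build a location marker in the assumed format."""
    return '<span data-dpt="location" data-dpt-loctype="%s">%s</span>' % (location_type, label)


def check_locations(xml: str, location_type: str) -> List[str]:
    """Return a list of problems; an empty list means the text looks fine."""
    markers = [m for m in MARKER_RE.finditer(xml) if m.group(1) == location_type]
    if not markers:
        return ["no '%s' location found" % location_type]
    problems = []
    # Totality: any text before the first marker belongs to no location.
    leading = MARKER_RE.sub("", xml[:markers[0].start()])  # ignore markers of other types
    leading = re.sub(r"<[^>]+>", "", leading).strip()      # ignore remaining tags
    if leading:
        problems.append("text before the first location: %r" % leading[:30])
    # Assumption: labels must be unique so that each location keeps its own URL.
    seen = set()
    for m in markers:
        label = m.group(2).strip()
        if label in seen:
            problems.append("duplicate location: %s" % label)
        seen.add(label)
    return problems


good = marker("locus", "1v") + " First page. " + marker("locus", "2r") + " Second page."
print(check_locations(good, "locus"))
print(check_locations("Preface. " + good, "locus"))
```

Nesting cannot occur in this marker style because a location simply runs until the next marker of the same type.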

There is also the notion of sublocation in the location system. A sublocation is one piece of content within a unit. For instance, the address clause in the translation of a charter can be a sublocation of the 'translation/locus/face' unit. This concept allows the Text Viewer to extract more refined pieces of content and synchronise a particular region of an image with a piece of text. By marking up a person name in the text you can link it to a rectangular box in the image. There is no actual independent image of that person name in the database; the Text Viewer has to first retrieve and display the image for the entire page 2r, then highlight the region for 'William'. Hence the idea of a sublocation, which can only exist within a location.
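
A sublocation lookup can be pictured as a search inside a unit that has already been retrieved. The following sketch assumes that a sublocation is an element identified by a set of attributes; the attribute names (data-dpt, data-dpt-type), the sample text and the function get_sublocation are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def get_sublocation(unit_xml: str, attrs: Dict[str, str]) -> Optional[str]:
    """Return the text of the first element in the unit matching all the given attributes."""
    # Wrap the fragment so that it parses as a single well-formed XML document.
    root = ET.fromstring("<unit>%s</unit>" % unit_xml)
    for element in root.iter():
        if all(element.get(name) == value for name, value in attrs.items()):
            return "".join(element.itertext()).strip()
    return None


# Invented content for a 'translation/locus/face' unit.
UNIT = (
    '<p>In the name of God. '
    '<span data-dpt="clause" data-dpt-type="address">To all the sons of holy mother church</span>'
    ', greeting.</p>'
)

address = get_sublocation(UNIT, {"data-dpt": "clause", "data-dpt-type": "address"})
print(address)
```

The function takes the unit as input on purpose: as described above, a sublocation can only be resolved once its enclosing location has been retrieved.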

Content Type

Every TextContent has a type. It can be a translation, a transcription, etc. You can also define your own types in the database. Many things in the framework can be customised according to the type: the rendering of the markup in the Text Viewer and in search results, by using type-aware CSS rules; the auto-markup process (i.e. the conversion from plain text conventions into XML markup); and the actual buttons in the Text Editor used to mark up different parts of the text.
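
The auto-markup step can be thought of as a set of rewrite rules selected by content type. The sketch below only illustrates that principle: the plain text conventions ('[2r]' for a locus, '|' for a line break in a transcription), the output markup and the names RULES_BY_TYPE and auto_markup are all invented for this example and are not the framework's actual conventions.

```python
import re
from html import escape

# Hypothetical convention: '[2r]' in plain text marks the start of locus 2r.
LOCUS_RULE = (
    re.compile(r"\[(\d+[rv])\]"),
    r'<span data-dpt="location" data-dpt-loctype="locus">\1</span>',
)

# Hypothetical per-type rules: only transcriptions turn '|' into a line break.
RULES_BY_TYPE = {
    "translation": [LOCUS_RULE],
    "transcription": [LOCUS_RULE, (re.compile(r"\s*\|\s*"), "<br/>")],
}


def auto_markup(text: str, content_type: str) -> str:
    """Convert plain text conventions into XML markup for the given content type."""
    xml = escape(text, quote=False)  # keep '&', '<' and '>' from breaking the XML
    for pattern, replacement in RULES_BY_TYPE.get(content_type, []):
        xml = pattern.sub(replacement, xml)
    return xml


print(auto_markup("[2r] Willelmus rex | salutem", "transcription"))
print(auto_markup("[2r] King William | greeting", "translation"))
```

Keeping the rules in a table keyed by type means a new content type only needs a new entry, not a new conversion function.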

The mark-up is currently compatible with XHTML, but it is designed to map easily to other schemas (e.g. TEI). One current limitation of the Text Editor is the absence of an editorial interface for the definition of attributes.
