<?xml version="1.0"?>
<?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_odds.rnc" type="compact"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>ESDS Qualidata project recommendations</title>
        <author>James Cummings</author>
      </titleStmt>
      <publicationStmt>
        <p>for use by ESDS Qualidata</p>
      </publicationStmt>
      <sourceDesc>
        <p>Generated with TEI Roma</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front>
      <divGen type="toc"/>
    </front>
    <body>
      <div>
        <head>Project Documentation</head>
        <div>
          <head>Introduction</head>
          <p>The first deliverable was the selection of TEI P5 elements suitable for encoding ESDS
            materials, together with recommendations on how best to store the required metadata and
            how to structure the encoded documents. In addition, a Relax NG schema was produced,
            along with customised TEI documentation for this schema. This schema may be modified
            later in response to the evolving needs of the project. This file is authored as a TEI
            ODD XML file, which is then passed through TEI Roma to produce schemas (RNG/RNC/XSD/DTD)
            and documentation (PDF/HTML). All changes should be made in the TEI ODD file rather than
            in the generated schemas or documentation.</p>
        </div>
        <div>
          <head>Scope of Schema</head>
          <p>In deciding on the metadata scheme and the forms of markup to be used to encode the
            interviews, it was agreed that a very light encoding should be used. However, to allow
            for later expansion and re-purposing of the materials, and for re-use of the schema, a
            number of extra elements have been included. Although these could be removed now, it
            was thought more straightforward to include them from the outset rather than introduce
            them at a later date. Around 130 elements are included; while this may seem a lot, it
            is a great reduction from the almost 500 elements that the TEI currently contains. The
            schema could easily be constrained further to an extremely minimal one if desired.</p>
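          <p>As a minimal sketch of how such further constraint might be done, additional
            <gi>elementSpec</gi> declarations with <att>mode</att> set to <val>delete</val> can be
            added to the <gi>schemaSpec</gi> in this ODD file, in the same way as the existing
            deletions in the Schema Documentation section, and the schemas then regenerated with TEI
            Roma. The two elements removed here (<gi>kinesic</gi> and <gi>writing</gi> from the
            spoken module) are chosen purely for illustration: <egXML
              xmlns="http://www.tei-c.org/ns/Examples"><![CDATA[<schemaSpec docLang="en" ident="esds" targetLang="en" xml:lang="en">
  <!-- existing moduleRef and elementSpec declarations go here -->
  <!-- hypothetical further deletions, for illustration only -->
  <elementSpec ident="kinesic" mode="delete" module="spoken"/>
  <elementSpec ident="writing" mode="delete" module="spoken"/>
</schemaSpec>]]></egXML>
          </p>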
        </div>
        <div>
          <head>Processing of Files</head>
          <p>As the project is already producing its files in a set XML format, it was recommended
            that it continue to produce files in that format. The resulting files could then be
            converted <foreign>en masse</foreign> at an appropriate time in the future. None of the
            sample files provided was found to contain structures that could not easily be
            converted to the proposed TEI format. It was also recommended that the conversion be
            done using XSLT. This in turn made the tightness of the schema less of a concern, since
            the conversion will be under the control of ESDS.</p>
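          <p>A minimal XSLT 1.0 sketch of such a conversion is given below. The source format it
            reads is purely hypothetical (an 'interview' root element with <att>id</att> and
            <att>title</att> attributes, containing 'turn' elements with a <att>speaker</att>
            attribute), since the project's actual XML format is not described here; a real
            stylesheet would have to be written against the sample files. The sketch maps each turn
            to a consecutively numbered <gi>u</gi> and fills in a minimal <gi>fileDesc</gi>; it
            copies only the text of each turn and omits the <gi>profileDesc</gi> described below,
            which a real conversion would also need to generate: <egXML
              xmlns="http://www.tei-c.org/ns/Examples"><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.tei-c.org/ns/1.0">

  <!-- HYPOTHETICAL source format:
       <interview id="..." title="..."><turn speaker="...">...</turn></interview> -->
  <xsl:template match="/interview">
    <TEI>
      <teiHeader>
        <fileDesc>
          <titleStmt>
            <title><xsl:value-of select="@title"/></title>
          </titleStmt>
          <publicationStmt>
            <distributor>ESDS Qualidata</distributor>
            <idno type="intNum"><xsl:value-of select="@id"/></idno>
          </publicationStmt>
          <sourceDesc>
            <p>Converted from the project's original XML format.</p>
          </sourceDesc>
        </fileDesc>
      </teiHeader>
      <text>
        <body>
          <div>
            <xsl:apply-templates select="turn"/>
          </div>
        </body>
      </text>
    </TEI>
  </xsl:template>

  <!-- Each turn becomes an utterance; position() within the selected
       turns gives the consecutive numbering u1, u2, u3, ...
       Only the text content of the turn is copied. -->
  <xsl:template match="turn">
    <u who="#{@speaker}" xml:id="u{position()}">
      <xsl:value-of select="."/>
    </u>
  </xsl:template>
</xsl:stylesheet>]]></egXML>
          </p>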
        </div>
        <div>
          <head>Later Deliverables</head>
          <p>It is believed that this initial proposed schema is robust enough to deal with the
            majority of the other deliverables. The TEI ODD file will be edited to document these in
            due course, and the documentation and schemas regenerated at that point.</p>
        </div>
        <div>
          <head>Elements Included</head>
          <list type="gloss">
            <label>Text Structure:</label>
            <item>TEI, body, div, group, text.</item>
            <label>TEI Header:</label>
            <item>authority, availability, change, distributor, editorialDecl, equipment, extent,
              fileDesc, funder, idno, keywords, langUsage, language, noteStmt, principal,
              profileDesc, projectDesc, publicationStmt, recording, recordingStmt, revisionDesc,
              sourceDesc, sponsor, state, teiHeader, titleStmt.</item>
            <label>Core:</label>
            <item>abbr, addrLine, address, author, bibl, corr, date, desc, distinct, divGen, editor,
              emph, equiv, expan, foreign, gap, gloss, graphic, head, measure, mentioned, milestone,
              name, note, orig, p, pb, ptr, pubPlace, q, ref, reg, resp, respStmt, rs, sic,
              soCalled, teiCorpus, term, title, unclear.</item>
            <label>Corpus:</label>
            <item>activity, firstLang, locale, particDesc, setting, settingDesc, textDesc.</item>
            <label>Linking:</label>
            <item>anchor, join, joinGrp, link, linkGrp, seg.</item>
            <label>Names &amp; Dates:</label>
            <item>addName, affiliation, age, birth, death, education, faith, floruit, forename,
              genName, langKnowledge, langKnown, listPerson, nameLink, nationality, occasion,
              occupation, orgDivn, orgName, orgTitle, orgType, particLinks, persEvent, persName,
              persState, persTrait, person, personGrp, placeName, relation, residence, roleName, sex,
              socecStatus, surname.</item>
            <label>Spoken:</label>
            <item>event, kinesic, pause, shift, u, vocal, writing.</item>
          </list>
        </div>
        <div>
          <head>Structure of Files</head>
          <p>The recommendation is to use a single file per interview, with any grouping of files
            done at a later stage if necessary. The basic structure should be: <egXML
              xmlns="http://www.tei-c.org/ns/Examples"><![CDATA[<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><!-- various header elements --></teiHeader>
  <text>
    <body>
      <div>
        <!-- enclose the entire interview, or each section of the
             interview, in a 'div' element; if a section has a title,
             use a 'head' element -->
        <u who="#interviewer" xml:id="u1"><!-- interviewer starts --></u>
        <u who="#subject" xml:id="u2"><!-- subject answering --></u>
        <u who="#interviewer" xml:id="u3"><!-- more interviewer talking --></u>
        <u who="#subject" xml:id="u4"><!-- subject continues --></u>
      </div>
    </body>
  </text>
</TEI>]]></egXML>
          </p>
          <p>Here the <gi>TEI</gi> element is in the TEI namespace, and XML comments indicate
            missing data or notes. Each file should have a <gi>teiHeader</gi> (discussed below),
            followed by a <gi>text</gi> element. This should contain a <gi>body</gi> element holding
            either one <gi>div</gi> for the entire interview or one <gi>div</gi> for each section if
            the interview is segmented. Each utterance (in this kind of interview, a stretch of
            speech by one person in normal turn-taking) is marked with a <gi>u</gi> element. This
            element must have a <att>who</att> attribute, which should be a URI pointer back to the
            header, where a <gi>person</gi> element defines an <att>xml:id</att> for that speaker.
            Either generic <att>xml:id</att> values such as <val>interviewer</val> and
            <val>subject</val> can be used (referenced in the <gi>u</gi> element's <att>who</att>
            attribute as <val>#interviewer</val>, since it is a URI fragment), or specific values
            relating solely to that file.</p>
          <p>These elements are described as follows: <specList>
              <specDesc key="TEI"/>
              <specDesc key="teiHeader"/>
              <specDesc key="text"/>
              <specDesc key="body"/>
              <specDesc key="div"/>
              <specDesc atts="who" key="u"/>
            </specList>
          </p>
        </div>
        <div>
          <head>The teiHeader</head>
          <p>The <gi>teiHeader</gi> element should contain all the metadata for that particular
            interview. For the purposes of this project, its two most important child elements
            are: <specList>
              <specDesc key="fileDesc"/>
              <specDesc key="profileDesc"/>
            </specList>
          </p>
          <p>A template <gi>fileDesc</gi> might look like: <egXML
              xmlns="http://www.tei-c.org/ns/Examples">
              <fileDesc>
                <titleStmt>
                  <title><!-- Title of file --></title>
                  <title type="collection"><!-- Optional title of collection --></title>
                </titleStmt>
                <publicationStmt>
                  <authority><!-- Name of depositor --></authority>
                  <distributor>ESDS Qualidata</distributor>
                  <idno type="intNum"><!-- Unique interview number --></idno>
                </publicationStmt>
                <sourceDesc>
                  <bibl><!-- Bibliographic Information concerning the transcript --></bibl>
                </sourceDesc>
              </fileDesc>
            </egXML>
          </p>
          <p>As much information as is available should be included. There is room for much more
            detail than this simple template shows; consult the TEI Guidelines for further
            information. The most important pieces of information here are the <gi>title</gi> and
            the <gi>idno</gi>, since these are what identify the file.</p>
          <p>The <gi>profileDesc</gi> for this project will usually contain two important child
            elements: <specList>
              <specDesc key="particDesc"/>
              <specDesc key="settingDesc"/>
            </specList>
          </p>
          <p>The <gi>particDesc</gi> element should contain a <gi>listPerson</gi> element with at
            least two <gi>person</gi> elements, one for the subject and one for the interviewer. The
              <gi>person</gi> element for the subject should contain as much information about the
            interviewee as is available. Information which can be recorded here includes: <specList>
              <specDesc key="affiliation"/>
              <specDesc key="age"/>
              <specDesc key="birth"/>
              <specDesc key="death"/>
              <specDesc key="education"/>
              <specDesc key="faith"/>
              <specDesc key="floruit"/>
              <specDesc key="langKnowledge"/>
              <specDesc key="nationality"/>
              <specDesc key="occupation"/>
              <specDesc key="sex"/>
              <specDesc key="socecStatus"/>
              <specDesc key="persState"/>
              <specDesc key="persTrait"/>
            </specList>
          </p>
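          <p>Putting these together, a minimal <gi>profileDesc</gi> with the two required
            <gi>person</gi> elements might be structured as follows. This is a sketch using the
            generic identifiers suggested above; the subject's details are left as a comment and
            would be filled in as in the template below: <egXML
              xmlns="http://www.tei-c.org/ns/Examples">
              <profileDesc>
                <particDesc>
                  <listPerson>
                    <person xml:id="interviewer">
                      <p>Interviewer Unknown</p>
                    </person>
                    <person xml:id="subject">
                      <!-- details of the interviewee -->
                    </person>
                  </listPerson>
                </particDesc>
                <settingDesc><!-- setting of the interview --></settingDesc>
              </profileDesc>
            </egXML>
          </p>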
          <p>For the purposes of this project, <gi>age</gi>, <gi>birth</gi>, <gi>sex</gi>, and
              <gi>occupation</gi> will most commonly be recorded, where known. A template
              <gi>person</gi> element for a subject might look like this: <egXML
              xmlns="http://www.tei-c.org/ns/Examples">
              <person xml:id="subject">
                <persName type="unanonymised">
                  <!-- 
              type="unanonymised" or type="anonymised"
              If  type="anonymised", then content of persName
              should just be the anonymisation number, e.g. 'g24'.
            -->
                  <roleName type="honorific"
                    ><!-- Optional: Mr, Mrs, Dr, Prof, Sir,
                Rev., etc. if known--></roleName>
                  <forename><!-- forename of subject, include multiple
              forename elements for multiple forenames --></forename>
                  <surname><!-- surname of the subject --></surname>
                </persName>
                <birth date="1887"
                  ><!-- Date of birth if known, give to
              precision available, attribute in W3C format.  e.g.:
              <birth date="1887">1887</birth>
              <birth date="1887-01">January 1887</birth>
              <birth date="1887-01-25">25 January 1887</birth>
            --></birth>
                <occupation><!-- occupation if available --></occupation>
                <sex value="1"
                  ><!--
              Use ISO 5218:1977 values for attribute.  e.g.:
              <sex value="0">Unknown</sex>
              <sex value="1">Male</sex>
              <sex value="2">Female</sex>
              <sex value="9">Not Applicable</sex>
            --></sex>
                <persState type="marriage">
                  <p><!-- marital status if known --></p>
                </persState>
              </person>
            </egXML>
          </p>
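          <p>Where the subject has been anonymised, the same template applies, with
            <att>type</att> set to <val>anonymised</val> on <gi>persName</gi>, whose content is then
            just the anonymisation number. A sketch using the example values given in the comments
            of the template above (the combination of values is illustrative only): <egXML
              xmlns="http://www.tei-c.org/ns/Examples">
              <person xml:id="subject">
                <persName type="anonymised">g24</persName>
                <birth date="1887-01">January 1887</birth>
                <sex value="2">Female</sex>
              </person>
            </egXML>
          </p>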
          <p>There should always be a <gi>person</gi> element for both the subject and the
            interviewer, even if no data is available for the interviewer. In that case it can
            consist merely of: <egXML
              xmlns="http://www.tei-c.org/ns/Examples">
              <person xml:id="interviewer">
                <p>Interviewer Unknown</p>
              </person>
            </egXML>
          </p>
          <p>The <gi>settingDesc</gi> element records information about the <gi>setting</gi> of
            the interview. A template might look like: <egXML
              xmlns="http://www.tei-c.org/ns/Examples">
              <settingDesc>
                <setting>
                  <date value="1979-05-03">3 May 1979</date>
                  <locale>
                    <placeName><!-- place of interview if known --></placeName>
                  </locale>
                </setting>
              </settingDesc>
            </egXML>
          </p>
          <p>In addition to the <gi>date</gi> and <gi>locale</gi>, other information can be recorded
            if known, such as the <gi>activity</gi> the subject is undertaking whilst being
            interviewed (e.g. 'sewing', 'driving', 'skiing').</p>
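          <p>A sketch of a fuller <gi>setting</gi>, recording the activity alongside the date and
            place of the interview (the activity shown is one of the illustrative values above):
            <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <settingDesc>
                <setting>
                  <date value="1979-05-03">3 May 1979</date>
                  <locale>
                    <placeName><!-- place of interview if known --></placeName>
                  </locale>
                  <activity>sewing</activity>
                </setting>
              </settingDesc>
            </egXML>
          </p>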
        </div>
        <div>
          <head>Transcription of Speech</head>
          <p>As mentioned above, each utterance should be in a <gi>u</gi> element. This should look
            something like: <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <div>
                <!-- enclose the entire interview, or each section of the
            interview, in a 'div' element; if a section has a title,
            use a 'head' element -->
                <u who="#interviewer" xml:id="u1"><!-- Interviewer starts --></u>
                <u who="#subject" xml:id="u2"><!-- subject answering --></u>
                <u who="#interviewer" xml:id="u3"><!-- more interviewer
              talking --></u>
                <u who="#subject" xml:id="u4"><!-- subject continues --></u>
                <!-- many elements are available inside each 'u'; if none
            fits the situation, use <seg type="type of thing"> -->
              </div>
            </egXML>
          </p>
          <p>The important aspects are the <att>who</att> attribute, which should point to the
              <att>xml:id</att> of a <gi>person</gi> in the header, and the <att>xml:id</att> of
            each utterance, which should be numbered consecutively from <val>u1</val>.</p>
          <p>There are various optional phrase-level elements available if the project's needs
            eventually call for them.</p>
          <p> These include phenomena of interest in spoken corpora such as: <specList>
              <specDesc key="event"/>
              <specDesc key="kinesic"/>
              <specDesc key="pause"/>
              <specDesc key="shift"/>
              <specDesc key="vocal"/>
              <specDesc key="writing"/>
            </specList>
          </p>
          <p>Other elements commonly used in such transcriptions include: <specList>
              <specDesc key="abbr"/>
              <specDesc key="corr"/>
              <specDesc key="distinct"/>
              <specDesc key="emph"/>
              <specDesc key="expan"/>
              <specDesc key="foreign"/>
              <specDesc key="gap"/>
              <specDesc key="mentioned"/>
              <specDesc key="orig"/>
              <specDesc key="ref"/>
              <specDesc key="reg"/>
              <specDesc key="sic"/>
              <specDesc key="soCalled"/>
              <specDesc key="unclear"/>
            </specList>
          </p>
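          <p>As an illustration only, an utterance using a few of these elements might look like
            the following. The speech is hypothetical, and the <att>reason</att> value on
            <gi>gap</gi> is a suggestion rather than an agreed project convention: <egXML
              xmlns="http://www.tei-c.org/ns/Examples">
              <u who="#subject" xml:id="u5">Well <pause/> after the war my father
                <unclear>found</unclear> work at the mill <vocal><desc>laughs</desc></vocal>
                and then <gap reason="inaudible"/></u>
            </egXML>
          </p>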
        </div>
        <div>
          <head>Names and Dates</head>
          <p>A variety of elements is available for recording names and dates, both in the header
            and elsewhere. In general, the <gi>date</gi> element should be given values in W3C or
            ISO standard format (e.g. <val>1979-05-03</val>); these standards also have established
            formulations for date ranges and durations. Some of the elements available include: <specList>
              <specDesc key="addName"/>
              <specDesc key="address"/>
              <specDesc key="addrLine"/>
              <specDesc key="date"/>
              <specDesc key="forename"/>
              <specDesc key="genName"/>
              <specDesc key="name"/>
              <specDesc key="nameLink"/>
              <specDesc key="orgDivn"/>
              <specDesc key="orgName"/>
              <specDesc key="orgTitle"/>
              <specDesc key="orgType"/>
              <specDesc key="persName"/>
              <specDesc key="placeName"/>
              <specDesc key="roleName"/>
              <specDesc key="surname"/>
            </specList>
          </p>
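          <p>For example, names and dates mentioned in the course of an interview might be marked
            up as in the following hypothetical utterance, where the names themselves are left as
            comments and the date value follows the W3C format used elsewhere in this document:
            <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <u who="#subject" xml:id="u12">I started working for
                <persName><roleName type="honorific">Mr</roleName>
                  <surname><!-- surname --></surname></persName> at
                <orgName><!-- name of firm --></orgName> in
                <placeName><!-- town --></placeName> in
                <date value="1926-05">May 1926</date>.</u>
            </egXML>
          </p>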
        </div>
        <div>
          <head>ID references with xml:id</head>
          <p>As mentioned previously, the <att>who</att> attribute on the <gi>u</gi> element should
            refer back to an <att>xml:id</att> in the header. If generic <att>xml:id</att> values
            are used (e.g. <val>interviewer</val> and <val>subject</val>), a problem arises if the
            interviews are later treated as a corpus and gathered into a single file under a
            <gi>teiCorpus</gi> structure. (If this is done, the TEI recommends using XInclude rather
            than physically copying the files into one large file.) Whichever method is used, every
            interview would define the same identifiers, so the <att>xml:id</att> values would no
            longer be unique and the master file would fail to validate.</p>
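          <p>A minimal sketch of such a corpus file is shown below. The file names are hypothetical,
            and each included file is assumed to be one complete interview whose root is a
            <gi>TEI</gi> element. If both files use the generic identifiers, both will define
            <val>subject</val> once the inclusions are resolved, which is exactly the clash
            described above: <egXML
              xmlns="http://www.tei-c.org/ns/Examples"><![CDATA[<teiCorpus xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <teiHeader><!-- metadata for the corpus as a whole --></teiHeader>
  <xi:include href="int0004.xml"/>
  <xi:include href="int0005.xml"/>
</teiCorpus>]]></egXML>
          </p>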
          <p>A simple transformation is therefore recommended before the files are gathered
            together. It would replace each <att>xml:id</att> with a compound value based on the
            unique <gi>idno</gi> element in the header. Thus, if an interview's <gi>idno</gi> is
            <val>int0004</val>, the <att>xml:id</att> values in that file might be transformed to
            <val>int0004-interviewer</val> and <val>int0004-subject</val>. Care must also be taken
            to transform any references to these identifiers, most notably in the <att>who</att>
            attribute of each utterance. The entire process is easily scripted with XSLT, so it
            should be of little concern, but I felt it worth mentioning here.</p>
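          <p>A minimal XSLT 1.0 sketch of this transformation follows. It assumes that the interview
            number is held in an <gi>idno</gi> with <att>type</att> <val>intNum</val>, as in the
            <gi>fileDesc</gi> template above, and that each <att>who</att> holds a single local
            pointer such as <val>#subject</val>. Any other pointing attributes in use (for example
            <att>target</att> on <gi>ptr</gi> or <gi>ref</gi>) would need similar templates: <egXML
              xmlns="http://www.tei-c.org/ns/Examples"><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- The file's unique interview number, e.g. 'int0004' -->
  <xsl:variable name="prefix"
    select="normalize-space(/tei:TEI/tei:teiHeader/tei:fileDesc/tei:publicationStmt/tei:idno[@type='intNum'])"/>

  <!-- Identity template: copy everything not matched below unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- xml:id="subject" becomes xml:id="int0004-subject"
       (utterance ids such as u1 become int0004-u1) -->
  <xsl:template match="@xml:id">
    <xsl:attribute name="xml:id">
      <xsl:value-of select="concat($prefix, '-', .)"/>
    </xsl:attribute>
  </xsl:template>

  <!-- who="#subject" becomes who="#int0004-subject";
       assumes exactly one '#'-prefixed pointer per attribute -->
  <xsl:template match="tei:u/@who">
    <xsl:attribute name="who">
      <xsl:value-of select="concat('#', $prefix, '-', substring-after(., '#'))"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>]]></egXML>
          </p>
          <p>Running a stylesheet of this kind over each interview file with any XSLT 1.0 processor
            before the corpus is assembled would leave every identifier unique across the
            collection, while each file on its own remains valid.</p>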
        </div>
      </div>
      <div>
        <head>Schema Documentation</head>
        <schemaSpec docLang="en" ident="esds" targetLang="en" xml:lang="en">
          <moduleRef key="core"/>
          <moduleRef key="tei"/>
          <moduleRef key="header"/>
          <moduleRef key="textstructure"/>
          <moduleRef key="corpus"/>
          <moduleRef key="linking"/>
          <moduleRef key="namesdates"/>
          <moduleRef key="spoken"/>
          <elementSpec ident="add" mode="delete" module="core"/>
          <elementSpec ident="altIdent" mode="delete" module="core"/>
          <elementSpec ident="analytic" mode="delete" module="core"/>
          <elementSpec ident="biblFull" mode="delete" module="core"/>
          <elementSpec ident="biblItem" mode="delete" module="core"/>
          <elementSpec ident="biblStruct" mode="delete" module="core"/>
          <elementSpec ident="binaryObject" mode="delete" module="core"/>
          <elementSpec ident="cb" mode="delete" module="core"/>
          <elementSpec ident="choice" mode="delete" module="core"/>
          <elementSpec ident="cit" mode="delete" module="core"/>
          <elementSpec ident="del" mode="delete" module="core"/>
          <elementSpec ident="equiv" mode="delete" module="core"/>
          <elementSpec ident="headItem" mode="delete" module="core"/>
          <elementSpec ident="headLabel" mode="delete" module="core"/>
          <elementSpec ident="hi" mode="delete" module="core"/>
          <elementSpec ident="l" mode="delete" module="core"/>
          <elementSpec ident="lb" mode="delete" module="core"/>
          <elementSpec ident="lg" mode="delete" module="core"/>
          <elementSpec ident="num" mode="delete" module="core"/>
          <elementSpec ident="quote" mode="delete" module="core"/>
          <elementSpec ident="sp" mode="delete" module="core"/>
          <elementSpec ident="speaker" mode="delete" module="core"/>
          <elementSpec ident="stage" mode="delete" module="core"/>
          <elementSpec ident="time" mode="delete" module="core"/>
          <elementSpec ident="biblScope" mode="delete" module="core"/>
          <elementSpec ident="imprint" mode="delete" module="core"/>
          <elementSpec ident="label" mode="delete" module="core"/>
          <elementSpec ident="list" mode="delete" module="core"/>
          <elementSpec ident="listBibl" mode="delete" module="core"/>
          <elementSpec ident="meeting" mode="delete" module="core"/>
          <elementSpec ident="monogr" mode="delete" module="core"/>
          <elementSpec ident="postBox" mode="delete" module="core"/>
          <elementSpec ident="postCode" mode="delete" module="core"/>
          <elementSpec ident="series" mode="delete" module="core"/>
          <elementSpec ident="street" mode="delete" module="core"/>
          <elementSpec ident="broadcast" mode="delete" module="header"/>
          <elementSpec ident="cRefPattern" mode="delete" module="header"/>
          <elementSpec ident="catDesc" mode="delete" module="header"/>
          <elementSpec ident="catRef" mode="delete" module="header"/>
          <elementSpec ident="category" mode="delete" module="header"/>
          <elementSpec ident="classCode" mode="delete" module="header"/>
          <elementSpec ident="classDecl" mode="delete" module="header"/>
          <elementSpec ident="correction" mode="delete" module="header"/>
          <elementSpec ident="creation" mode="delete" module="header"/>
          <elementSpec ident="edition" mode="delete" module="header"/>
          <elementSpec ident="editionStmt" mode="delete" module="header"/>
          <elementSpec ident="encodingDesc" mode="delete" module="header"/>
          <elementSpec ident="fsdDecl" mode="delete" module="header"/>
          <elementSpec ident="hyphenation" mode="delete" module="header"/>
          <elementSpec ident="interpretation" mode="delete" module="header"/>
          <elementSpec ident="metDecl" mode="delete" module="header"/>
          <elementSpec ident="metSym" mode="delete" module="header"/>
          <elementSpec ident="namespace" mode="delete" module="header"/>
          <elementSpec ident="normalization" mode="delete" module="header"/>
          <elementSpec ident="quotation" mode="delete" module="header"/>
          <elementSpec ident="refsDecl" mode="delete" module="header"/>
          <elementSpec ident="rendition" mode="delete" module="header"/>
          <elementSpec ident="samplingDecl" mode="delete" module="header"/>
          <elementSpec ident="scriptStmt" mode="delete" module="header"/>
          <elementSpec ident="segmentation" mode="delete" module="header"/>
          <elementSpec ident="seriesStmt" mode="delete" module="header"/>
          <elementSpec ident="stdVals" mode="delete" module="header"/>
          <elementSpec ident="tagUsage" mode="delete" module="header"/>
          <elementSpec ident="tagsDecl" mode="delete" module="header"/>
          <elementSpec ident="taxonomy" mode="delete" module="header"/>
          <elementSpec ident="textClass" mode="delete" module="header"/>
          <elementSpec ident="variantEncoding" mode="delete" module="header"/>
          <elementSpec ident="argument" mode="delete" module="textstructure"/>
          <elementSpec ident="back" mode="delete" module="textstructure"/>
          <elementSpec ident="byline" mode="delete" module="textstructure"/>
          <elementSpec ident="closer" mode="delete" module="textstructure"/>
          <elementSpec ident="dateline" mode="delete" module="textstructure"/>
          <elementSpec ident="div0" mode="delete" module="textstructure"/>
          <elementSpec ident="div1" mode="delete" module="textstructure"/>
          <elementSpec ident="div2" mode="delete" module="textstructure"/>
          <elementSpec ident="div3" mode="delete" module="textstructure"/>
          <elementSpec ident="div4" mode="delete" module="textstructure"/>
          <elementSpec ident="div5" mode="delete" module="textstructure"/>
          <elementSpec ident="div6" mode="delete" module="textstructure"/>
          <elementSpec ident="div7" mode="delete" module="textstructure"/>
          <elementSpec ident="docAuthor" mode="delete" module="textstructure"/>
          <elementSpec ident="docDate" mode="delete" module="textstructure"/>
          <elementSpec ident="docEdition" mode="delete" module="textstructure"/>
          <elementSpec ident="docImprint" mode="delete" module="textstructure"/>
          <elementSpec ident="docTitle" mode="delete" module="textstructure"/>
          <elementSpec ident="epigraph" mode="delete" module="textstructure"/>
          <elementSpec ident="front" mode="delete" module="textstructure"/>
          <elementSpec ident="imprimatur" mode="delete" module="textstructure"/>
          <elementSpec ident="opener" mode="delete" module="textstructure"/>
          <elementSpec ident="salute" mode="delete" module="textstructure"/>
          <elementSpec ident="signed" mode="delete" module="textstructure"/>
          <elementSpec ident="titlePage" mode="delete" module="textstructure"/>
          <elementSpec ident="titlePart" mode="delete" module="textstructure"/>
          <elementSpec ident="trailer" mode="delete" module="textstructure"/>
          <elementSpec ident="channel" mode="delete" module="corpus"/>
          <elementSpec ident="constitution" mode="delete" module="corpus"/>
          <elementSpec ident="derivation" mode="delete" module="corpus"/>
          <elementSpec ident="domain" mode="delete" module="corpus"/>
          <elementSpec ident="factuality" mode="delete" module="corpus"/>
          <elementSpec ident="interaction" mode="delete" module="corpus"/>
          <elementSpec ident="preparedness" mode="delete" module="corpus"/>
          <elementSpec ident="purpose" mode="delete" module="corpus"/>
          <elementSpec ident="ab" mode="delete" module="linking"/>
          <elementSpec ident="alt" mode="delete" module="linking"/>
          <elementSpec ident="altGrp" mode="delete" module="linking"/>
          <elementSpec ident="timeline" mode="delete" module="linking"/>
          <elementSpec ident="when" mode="delete" module="linking"/>
          <elementSpec ident="bloc" mode="delete" module="namesdates"/>
          <elementSpec ident="country" mode="delete" module="namesdates"/>
          <elementSpec ident="distance" mode="delete" module="namesdates"/>
          <elementSpec ident="district" mode="delete" module="namesdates"/>
          <elementSpec ident="geog" mode="delete" module="namesdates"/>
          <elementSpec ident="geogName" mode="delete" module="namesdates"/>
          <elementSpec ident="hour" mode="delete" module="namesdates"/>
          <elementSpec ident="minute" mode="delete" module="namesdates"/>
          <elementSpec ident="month" mode="delete" module="namesdates"/>
          <elementSpec ident="offset" mode="delete" module="namesdates"/>
          <elementSpec ident="region" mode="delete" module="namesdates"/>
          <elementSpec ident="second" mode="delete" module="namesdates"/>
          <elementSpec ident="week" mode="delete" module="namesdates"/>
          <elementSpec ident="year" mode="delete" module="namesdates"/>
          <elementSpec module="core" ident="item" mode="delete"/>
          <elementSpec module="namesdates" ident="day" mode="delete"/>
          <elementSpec module="namesdates" ident="settlement" mode="delete"/>
        </schemaSpec>
      </div>
    </body>
  </text>
</TEI>

