Presentation published 11 years ago by user ccs.uky.edu
1 Processing Textual Sources for Linguistic and Literary Research: What a 'Solitary Scholar' Can Do
Alexei Lavrentiev
Ecole Normale Supérieure Lettres et Sciences humaines, Lyon, France
University of Kentucky, October
2 Two projects
Scholarly re-edition of an 1861 anonymous folklore collection
Corpus of Medieval French manuscript transcriptions for the study of punctuation
3 Folklore Project 1/14
4 Project Team
Vera Kuznetsova
– Senior Researcher, Institute of Philology SB RAS
– Specialist in Russian folklore
Olga Laguta
– Professor, Novosibirsk State University
– Linguist
Alexei Lavrentiev
5 Objectives
Verify the authenticity of folklore texts in the collection
Analyze linguistic features of the texts
Learn more about the author of the collection
Make these texts available to the scholarly community
6 Challenges
Encode data in a sustainable format (TEI XML) using available tools
– Microsoft Office (Word, Access)
– XML processing software (XML Spy)
– Perl
Configure the tools for users with virtually no experience in IT
7 Workflow
Word Documents
Perl script
Tokenized XML-TEI documents
XSL Stylesheets
Access Database
Printed edition
Lemmatized XML-TEI documents
Vocabulary with contexts
Linguistic analysis
Metadata
8 Word document
9 Metadata file
[1. File name] chtochelovekzakhochet ;
[number] 20 ;
[2. Text title (in the source)] Что человек захочет, то и сделает (What a man wants, he will do) ;
[3. Text title (working)] Что человек захочет (What a man wants) ;
[4. Collective editor of the electronic version] Russian Language in Siberia Sector, Institute of Philology SB RAS ;
[5. Responsible persons] :
[role] Text entry and preliminary markup ; [name] Kuznetsova Vera Stanislavovna, Aleshina Olga Nikolaevna ;
[role] Conversion to XML-TEI format, validation ; [name] Lavrentiev Alexei Mikhailovich.
[6. Project information] : Corpus of Russian folklore prose texts (legends) ;
[7. Source information] :
[Information on editor(s), compiler(s), etc.] :
[role] preparation for publication ; [name] Kuznetsova Vera Stanislavovna ;
[role] compiler of the collection ; [name] anonymous ;
[role] author of the record ; [name] not specified.
[Place of recording] not specified ; [Publisher] F. Ivanov printing house ; [Place of publication] Saint Petersburg ; [Year of publication] 1861 ; [ISBN] ????.
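A minimal sketch of how a file in this bracketed-label layout could be read before its fields are merged into a TEI header. It is written in Python for illustration and is not the project's Perl script. The function name `parse_metadata` and the pattern `FIELD` are hypothetical, and the sketch assumes that every field is a label in square brackets followed by its value, closed by ";", ":" or ".". The sample is a shortened excerpt with labels rendered in English.

```python
import re

# Assumed layout: a label in square brackets, then everything up to the next "[".
FIELD = re.compile(r"\[([^\]]+)\]([^\[]*)")

def parse_metadata(text):
    """Return (label, value) pairs in file order; repeated labels are kept."""
    pairs = []
    for label, value in FIELD.findall(text):
        # Drop the separators (";", ":", ".") that close a field.
        pairs.append((label.strip(), value.strip().rstrip(";:.").strip()))
    return pairs

sample = "[1. File name] chtochelovekzakhochet ; [Year of publication] 1861 ; [ISBN] ????."
for label, value in parse_metadata(sample):
    print(f"{label}: {value}")
```

Group labels such as "[5. Responsible persons] :" come back with an empty value, so a caller can treat them as section headings. A value that legitimately ends in a full stop would lose it under this simple rule.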
10 Perl script
Takes a Word document saved in HTML (filtered) format
Takes the metadata
Produces an XML-TEI document
– Tokenizes and gives ID to and
– Transforms analytical markup into elements
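The tokenization step can be sketched as follows. This is a Python illustration, not the Perl script described on the slide. The element names `w` (word), `pc` (punctuation) and `s` (enclosing segment) are assumptions, since the slide text does not preserve the names actually used, and the ID scheme (`t1`, `t2`, ...) and the function name `tokenize_to_tei` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# A word is a run of letters/digits, optionally with inner hyphens;
# any other non-space character is treated as a punctuation token.
TOKEN = re.compile(r"\w+(?:-\w+)*|[^\w\s]")

def tokenize_to_tei(text, prefix="t"):
    """Wrap each token in <w> or <pc> (assumed names) with a sequential xml:id."""
    seg = ET.Element("s")
    for n, tok in enumerate(TOKEN.findall(text), start=1):
        tag = "w" if tok[0].isalnum() or tok[0] == "_" else "pc"
        el = ET.SubElement(seg, tag, {"xml:id": f"{prefix}{n}"})
        el.text = tok
    return seg

seg = tokenize_to_tei("Что человек захочет, то и сделает.")
print(ET.tostring(seg, encoding="unicode"))
```

Giving every token a stable ID is what allows later steps, such as lemmatization in a database, to point back at a precise place in the text.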
11 XML Document
12 XSLT Stylesheets
Produce legible text for proofreading
Produce tables to be exported to the database
13 Access Database
14 Access Database
15 Access Database
16 Results
Printed edition
– Texts
– Linguistic analysis supplement
– Indexes
XML-TEI lemmatized text corpus
XSLT stylesheets
Access database
– Morphological table
– Forms for lemmatization and dictionary
Problem: no direct connection between the printed edition and the XML texts
17 Challenges
Create an adequate representation of linguistically relevant data from a medieval manuscript
– Multiple visualizations according to various editing traditions
Annotate and analyze the use of punctuation marks
Punctuation Project 1/12
18 Project History
: first transcriptions using ASCII special characters
2001: first annotation using Excel
2003: XML-TEI (Charrette-style) transcriptions
: XML-TEI (Menota-style) transcriptions
22 Special data to be encoded
Variant character glyphs
Abbreviations
Large initials
Abnormal word spacing
23 Multiple visualizations
Extract from Ms. Lyon BM, P.A. 77, Queste del saint Graal. Photo: BM Lyon. Transcription: Graal Project
Normalized Presentation
[ § 7] Endementres qu'il parloient einsi si entra laienz uns vaslez qui dist au roi: « Sire noveles vos aport mout merveilleuses. – Queles ?
Diplomatic Presentation
[ § 7] ENdementres qu'il parloient einsi si entra laienz uns uaslez qui dist au roi. Sire noueles uos aport mout merueilleuses. Queles
Imitative Presentation
[ § 7] E Ndementreſ quıl parloıent eínſı ſı entͣ laıenz unſ uaſlez quı dıſt au roı. Sıre noueleſ uoſ apot mout merueılleuſeſ. Queleſ
XML Transcription
Endementres ENdementres E Ndementre&slong; qu
24 Encoding choices
Menota-style TEI extension
– Multiple representation at a word level (norm, dipl, facs, pal?)
Additional elements
– punct, mdv_dropcap, mdv_lb…
Additional attributes
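A minimal sketch of what multiple representation at the word level makes possible: each word carries several readings, and a presentation is produced by picking one level throughout. The fragment below is a simplified, hypothetical structure. The child element names are taken from the levels listed on the slide (norm, dipl, facs), while the actual Menota-style markup, with its namespaces and wrapper elements, is not reproduced here. The word forms come from the presentations of the Queste extract.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: one child element per representation level.
SAMPLE = """<text>
  <w><norm>Sire</norm><dipl>Sire</dipl><facs>Sıre</facs></w>
  <w><norm>noveles</norm><dipl>noueles</dipl><facs>noueleſ</facs></w>
  <w><norm>vos</norm><dipl>uos</dipl><facs>uoſ</facs></w>
</text>"""

def render(root, level):
    """Join the chosen level of every <w>, with one space between words."""
    return " ".join(w.findtext(level, default="") for w in root.iter("w"))

root = ET.fromstring(SAMPLE)
for level in ("norm", "dipl", "facs"):
    print(level, "->", render(root, level))
```

Keeping all levels in one transcription means that the normalized, diplomatic and imitative presentations cannot drift apart, because each is derived from the same word elements.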
25 Workflow
Compact syntax transcription
– XML + shortcut characters (cf. Wiki)
Text description using Access Database
– Ms Description
– Text typology
Expanding to a standard XML format using a Perl script
Export to tabular format for annotation
Re-integration of annotation into XML documents
Export and analysis using Weblex software
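The export and re-integration steps can be sketched as a round trip keyed on token IDs. This is a Python illustration under stated assumptions, not the project's Perl code. The `id` attribute, the `ana` attribute, the value `strong` and the function names are hypothetical, and only the `punct` element name comes from the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET

def export_rows(root):
    """One row per token: its id and text, plus an empty column for the annotator."""
    return [{"id": el.get("id"), "form": el.text or "", "ana": ""}
            for el in root.iter() if el.get("id")]

def reintegrate(root, rows):
    """Copy non-empty annotations back onto the elements with matching ids."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for row in rows:
        if row["ana"] and row["id"] in by_id:
            by_id[row["id"]].set("ana", row["ana"])
    return root

root = ET.fromstring('<s><w id="w1">roi</w><punct id="p1">.</punct><w id="w2">Sire</w></s>')

# Write the table as it would be handed to an annotator.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "form", "ana"])
writer.writeheader()
writer.writerows(export_rows(root))

# Simulate the annotator filling in one cell, then read the table back.
rows = list(csv.DictReader(io.StringIO(buf.getvalue())))
for row in rows:
    if row["id"] == "p1":
        row["ana"] = "strong"
reintegrate(root, rows)
print(ET.tostring(root, encoding="unicode"))
```

Matching on IDs rather than on position means rows can be sorted or filtered in the spreadsheet without breaking the link back to the XML.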
26 Compact syntax
27 Manuscript description
28 Expanded XML
29 Annotation
30 Weblex
31 Results
25 fragments of manuscripts transcribed and described
Encoding guidelines
Integrated database of text descriptors (editions and transcriptions)
Perl scripts for conversions
XSLT stylesheets
32 Thank You!