Part 1: The Problem
How do we teach ab-initio students to read authentic Russian texts in a year?
Part 2: A potential corpus-based solution
The use of corpora and corpus tools to train ab-initio students to read authentic academic texts
ReadingCorp project
- Motivated by the demand for specialist PG language training in Russian and the findings of previous research (Russian for Research 2008)
- 6-month project funded by the Centre for East European Language Based Area Studies (CEELBAS) and carried out at the University of Sheffield in 2008
- The project aimed to:
  - build up a profile of the PG language training offered at CEELBAS institutions and identify the methods of, and problems in, teaching languages for research;
  - identify the demand for language training for research purposes at member departments and establish what such training should include;
  - look at new modes of delivery, such as distance- and computer-aided learning, and the possibility of sharing resources.
- Departments of Russian and Slavonic Studies are attracting more PG students who do not know Russian and whose research is therefore restricted (the same is true of other languages)
- Students are unable to read primary sources, use archives or work with some online packages without Russian
- "You simply can't do Russian-related economic research without Russian"
- "Without language skills research is much impaired"
- There is a massive demand for PG language training across CEELBAS institutions
- Potentially good researchers are being lost due to the lack of adequate PG language training
- Conventional PG-focused intensive courses are effective but impractical at most institutions; they are not financially sustainable at any institution in the long term
- Other methods (piggy-backing, non-intensive reading modules, following UG programmes) do not work
- It is not possible to offer specialist tuition to the individual student or to cover all research areas
- Texts are out-dated and/or more suited to some disciplines than others; their content is determined subjectively by linguists
- A cost-effective way of delivering shared PG language programmes is necessary
Corpora are well suited to LSP learning and teaching for several reasons:
- they can inform us of key items of vocabulary and grammar points that require instruction in specific domains;
- frequency data shape materials and syllabus design;
- breadth of topics: a corpus can be created on any topic, no matter how specialist, for which there is enough available material;
- needs of the individual: a corpus can be created from articles directly relevant to an individual student's research topic;
- there is no printing/publication lag: corpora can be created on current events, yesterday's news stories, etc.;
- they can be built within hours.
- Corpora can be used directly or indirectly
- Corpora can be used in combination with traditional teaching practices (blended learning)
- Corpora have been used successfully for language-for-research projects in the past: German for Chemists (Butler) and the Warwick course of Italian Language for PG students of Renaissance Studies
- 2-year project funded by the AHRC (Collaborative Language Skills Training project)
- Run at the Department of Russian and Slavonic Studies (Sheffield), GRASS and CTS (Leeds)
- Combines knowledge and practice of PG language teaching methods (Sheffield / Leeds) with technological expertise in creating corpus tools for language learning purposes (Leeds)
- To explore possibilities for using corpora to achieve reading competence in Russian
- To create tools, reference materials (keyword lists, annotated readers, a grammar for researchers) and exercises to support the acquisition of vocabulary from specific and varied domains
- To actively engage students in vocabulary identification exercises
- It may seem ridiculous to suggest that a complete beginner with no formal training in linguistics or experience in learning a foreign language can learn Russian in a year
- We focus solely on reading skills
- Our aim is for students to read authentic texts with the help of dictionaries and our tools and materials; we do not expect them to pick up a text and read it as someone with years of training would
- Why within a year?
Corpus
- The Russian Academic Corpus (RAC)
Technology (additions to the IntelliText Interface)
- Keyword list generator (single- and multi-word; POS-specific)
- Grammar frequency
- Advanced options for navigating texts
- Vocabulary highlights (general academic, discipline-specific keywords)
- Automatic grammar classification
Pedagogy
- Readers from 13 academic disciplines
- Cleaned keyword lists from 13 academic disciplines
- Transferable teaching materials
- A PG-focused grammar
- Contains approximately 5 million words
- Used for compiling frequency lists and in teaching
- Made up of 13 sub-corpora (art, criminology, culture, ecology, economics, geography, history, international relations, linguistics, medicine, politics, religion, sociology)
- The sub-corpora are roughly equal in size and each contains 50 texts
- The main corpus is freely available via the IntelliText Interface
- Individual sub-corpora are available on demand
- General academic and discipline-specific keywords were extracted
- Single words (discipline-specific) and multi-word units (general academic and discipline-specific) were cleaned: anomalies were removed and lemmatised forms were restored to their original form (то не менее > тем не менее, по отношение к > по отношению к)
- 100 keywords for each subject area
- Translations are provided for all lists, and collocations for single words
| Phrase | Translation |
|---|---|
| вместе с тем | moreover; that said |
| тем не менее | nevertheless |
| в зависимости от | depending on |
| состоит в том | is |
| заключается в том | is |
| в это время | at the (this / that) time |
| по отношению к | with regard to |
| список используемой литературы | bibliography |
| может привести к | may lead to |
| один из важных | an important |
| включает в себя | includes |
| Keyword | Translation | Key collocations |
|---|---|---|
| вода | water | сточные воды "waste water"; пресная вода "fresh water"; морская вода "sea water"; грунтовые воды "ground waters"; качество воды "water quality" |
| отходы | waste | бытовые отходы "domestic waste"; промышленные отходы "industrial waste"; твёрдые отходы "solid waste"; переработка отходов "waste processing"; размещение отходов "waste disposal" |
| Lexical bundle | Translation |
|---|---|
| рынок труда | labour market |
| национальная экономика | national economy |
| оплата труда | remuneration of labour |
| на рынке | on the market |
| спрос на | demand for |
| социальная политика | social policy |
| рабочая сила | work force |
| цена на | price of |
| предпринимательский риск | entrepreneurial risk |
| предпринимательская деятельность | entrepreneurship |
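The slides do not say which statistic was used to extract the keywords. As an illustration only, the sketch below ranks words by log-likelihood keyness, a common choice in corpus linguistics, by comparing a sub-corpus with a reference corpus. The function names and the `top_n` default are hypothetical and are not part of the IntelliText Interface.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score.

    a: frequency of the word in the study corpus
    b: frequency of the word in the reference corpus
    c: size (token count) of the study corpus
    d: size (token count) of the reference corpus
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score


def keywords(study_tokens, reference_tokens, top_n=100):
    """Return the top_n words over-represented in study_tokens, with scores."""
    if not study_tokens or not reference_tokens:
        raise ValueError("both corpora must contain at least one token")
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words that are relatively more frequent in the study corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]
```

In practice the tokens would be lemmas, which is why a manual cleaning pass of the kind described above is still needed to restore forms such as тем не менее.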
- 10 readers from each of the 13 sub-corpora
- Each text contains approximately 200 words
- The readers may be used to train general academic vocabulary or discipline-specific vocabulary
- Manually annotated
- Freely available
Криминогенность личности представляет собой качественное выражение соотношения негативной и позитивной направленности личности. А преступление является объективным, реальным показателем криминогенности личности. Криминогенность можно рассматривать с двух позиций. Исходя из первой, «криминогенность рождается и умирает вместе с преступлением». Однако криминогенность можно рассматривать не только как результат, но и как процесс ее становления. Таким образом, можно выделить три стадии генезиса криминогенности личности преступника: Формирование криминогенности личности, которая в этот период совершает аморальные поступки и правонарушения неуголовного характера.
- Focus on receptive, not productive, language skills
- Grammar identification: our aim is for users to identify and understand the use of grammatical features, with our notes and tools, not to be able to construct them
- Grammar forms were selected on the basis of their frequency in academic texts: participles, gerunds and passive constructions were introduced early; some points of grammar commonly covered in the first year of UG programmes were not included
The following information is included for each point of grammar:
- an English-language commentary on how and for what purpose it is used;
- information on what the form looks like (identification);
- lists of other points of grammar that have the same form and notes on how to tell them apart (disambiguation);
- an annotated list of common words within the category;
- corpus examples and translations.
- Use: -ing forms: "judging by his comments, I'd say that..."
- Looks like: принимая, судя, опираясь
- Common exceptions: будучи
- Can be confused with: soft feminine nouns (Nom. Sing.), e.g. неделя; hard feminine adjectives (Nom. Sing.), e.g. интересная; soft masculine nouns (Gen. Sing.), e.g. трамвая
- Disambiguation: gerunds are very unlikely to be directly preceded by words ending in -ая or -ого; words ending in -а rarely follow gerunds (BUT принимая лекарства)
| Gerund | Translation | Notes |
|---|---|---|
| говоря | speaking, talking | о "about" + Prep.; не говоря уже о "not to mention"; по-иному / иначе говоря "put another way, in other words"; строго говоря "strictly speaking" |
| исходя | on the basis of, on the strength of, based on the assumption that | из "from" + Gen.; исходя из этого "on this basis"; исходя из того, что "on the basis of" (+ verb) |
| начиная | starting | с "from" + Gen. |
| будучи | being | Instr. |
| учитывая | considering | Acc. |
| имея | having | в виду |
| считая | considering | что "that"; Acc. |
| опираясь | based, drawing; relying | на "on" + Acc. |
| рассматривая | viewing, considering | Acc. |
| стремясь | trying, in an attempt to | with verb infinitives; к + Dat. |
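The identification and disambiguation notes for gerunds can be read as a small rule set. The sketch below is one possible encoding of those notes and nothing more: it flags words ending in -я or -ясь (plus the exception будучи), discards a candidate directly preceded by a word ending in -ая or -ого, and marks a candidate as less certain when the following word ends in -а. The function names and the returned `(word, confident)` pairs are hypothetical, and this is not a description of how the ReadingCorp grammar classifier works.

```python
import re

GERUND_ENDINGS = ("я", "ясь")          # "looks like" endings from the notes
EXCEPTIONS = {"будучи"}                # common exception listed in the notes
BLOCKING_PREV_ENDINGS = ("ая", "ого")  # a preceding word like this rules a gerund out


def tokenize(text):
    """Lower-case the text and return Cyrillic word tokens (hyphenated words kept whole)."""
    return re.findall(r"[а-яё]+(?:-[а-яё]+)*", text.lower())


def gerund_candidates(text):
    """Return (word, confident) pairs for tokens that may be gerunds.

    confident is False when the next word ends in -а, which the notes say
    is rare after a gerund (although принимая лекарства shows it does occur).
    """
    tokens = tokenize(text)
    found = []
    for i, tok in enumerate(tokens):
        if tok not in EXCEPTIONS and not tok.endswith(GERUND_ENDINGS):
            continue
        prev_tok = tokens[i - 1] if i > 0 else ""
        if prev_tok.endswith(BLOCKING_PREV_ENDINGS):
            continue  # e.g. "нового трамвая": a noun phrase, not a gerund
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        found.append((tok, not next_tok.endswith("а")))
    return found
```

These rules only narrow the field. In интересная неделя the noun неделя is correctly discarded, but the adjective интересная is still flagged because it also ends in -я, so a part-of-speech tagger or a human reader has to make the final call.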
- For texts that are available online or that have been digitised
- The ReadingCorp tools allow users to annotate their texts according to vocabulary and grammar
- Vocabulary highlights work for any text uploaded to the system, as the list of academic words is stable and our tools automatically classify texts and corpora according to keywords
- Automatic grammar classification helps users identify or disambiguate parts of speech
- Demo with Space corpus
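To make the idea of vocabulary highlights concrete, here is a minimal sketch that marks general academic words and discipline-specific keywords in a text. Everything in it is an assumption for illustration: the bracket markers, the function name and the matching on surface forms. A real tool for Russian would need to match lemmas and multi-word units.

```python
import re


def highlight(text, general_academic, discipline_specific):
    """Wrap discipline keywords in [[...]] and general academic words in [...].

    Both word lists are sets of lower-case surface forms (hypothetical input
    format); any other word is left unchanged.
    """
    def mark(match):
        word = match.group(0)
        key = word.lower()
        if key in discipline_specific:
            return "[[" + word + "]]"
        if key in general_academic:
            return "[" + word + "]"
        return word

    # Only runs of Cyrillic letters are treated as words; punctuation is untouched.
    return re.sub(r"[А-Яа-яЁё]+", mark, text)
```

For example, with `{"однако"}` as the general academic list and `{"вода", "отходы"}` as the discipline list, the sentence "Однако вода и отходы" becomes "[Однако] [[вода]] и [[отходы]]".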
- Initial corpus training (either one session over an afternoon or two shorter sessions)
- Introduction to the Cyrillic alphabet (if necessary)
- 1 class a week focusing on (1) guided reading and (2) hands-on vocabulary building exercises
- Exercises are based around keywords
| Combination | Lexical bundle | Translation |
|---|---|---|
| Adj. + Search Word (SW) | совокупный спрос | aggregate demand |
| Verb + Adj. + SW + Noun | отражать платежеспособный спрос населения | to reflect the population's purchasing power |
| Verb + Adj. + SW | пользоваться большим спросом | to be in high demand |
| Verb + SW + Noun | удовлетворить спрос покупателей | to meet customers' demands |
| Noun + SW + Prep. | увеличение спроса на | rise in demand for |
| SW + Verb | спрос падает | demand is decreasing |
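Combinations like those in the table are typically found by looking at a search word in context. The sketch below is a simple keyword-in-context (KWIC) listing, offered as an assumption about how such an exercise could be prepared rather than as the project's own tool. Matching on a stem prefix is a crude stand-in for lemmatisation, and the function name and `window` parameter are hypothetical.

```python
import re


def kwic(text, stem, window=2):
    """Return (left context, match, right context) for each word starting with stem.

    stem:   lower-case prefix of the search word, e.g. "спрос"
    window: number of words of context to keep on each side
    """
    tokens = re.findall(r"[А-Яа-яЁё]+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower().startswith(stem):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines
```

Students can then sort the resulting lines by the word to the left or right of the search word to spot recurring patterns such as Adj. + SW or SW + Verb. Note that a prefix match will also pick up unrelated words that share the stem, so the output needs checking by hand.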
Tutors working with students whose research is in an area other than those covered by ReadingCorp may:
- use our interface to create keyword lists and analyse texts
- use the readers for general reading practice
- access the RAC
- use the grammar
- use the keyword lists from the RAC
They will need to:
- create keyword lists for the subject by building a small corpus
- add their own examples to the material templates
Is a corpus-based approach:
- suitable for distance learning?
- cost-effective and sustainable?
- transferable to other languages and domains?
Does a corpus-based approach:
- cover contemporary research topics?
- cater for the needs of the individual student?
- help structure syllabi?
- allow ab-initio students to acquire the reading skills they need to carry out their research effectively?
- Corpora go beyond the traditional course book and offer exciting possibilities for LSP learning and teaching
- A corpus-based approach is particularly well suited to training reading competence in specific domains
- It makes the goal of reading and understanding authentic academic texts in Russian within a year a realistic objective
- BUT will advances in machine translation and optical character recognition make specialised reading courses redundant?
- As machine translation becomes more reliable, as more material is digitised and made available online and as OCR technology becomes more accurate, will students need anything other than a scanner and Google Translate?