Leonid Iomdin, Institute for Information Transmission Problems, Russian Academy of Sciences (iomdin@iitp.ru, iomdin@gmail.com)

Program Overview:
1. Basic Principles of the Meaning-Text Theory by Igor Mel'čuk. Language as a Universal Translator of Senses to Texts and Texts to Senses. Text analysis and text generation. The theory of integral linguistic description by Juri Apresjan. The grammar and the dictionary of a language.
2. Two syntactic levels of sentence representation: surface syntax and deep syntax.
December 21, Lectures

3. The dependency tree structure as a syntactic representation of the sentence. Dependency tree vs. constituent tree: advantages and drawbacks of both types of representation. Limits of the dependency tree. The hypothesis of two syntactic starts.
4. The notion of syntactic relation. Major classes of syntactic relations: actantial, attributive, coordinative and auxiliary.
5. The notion of syntactic feature. Syntactic features vs. semantic features.

6. Actants and valencies. Active, passive and distant valencies. The government pattern of a dictionary entry. An overview of actantial syntactic relations: the predicative relation, the agentive relation, completive relations.
7. An overview of attributive syntactic relations. Grammatical agreement. Numerals and quantitative constructions. The system of quantification syntax of Russian.
8. Grammatical coordination as a type of grammatical subordination. An overview of coordinative syntactic relations.

9. Auxiliary syntactic relations. Analytical grammatical forms as an object of syntax.
10. Microsyntax of language. Minor type sentences. Syntactic idioms.
11. Lexical functions in the dictionary and the grammar.
12. Syntactic description and syntactic rules. Dependency syntax in NLP. Dependency syntax in machine translation. Syntactically tagged corpus of texts.

Lexical Functions
Substitute LFs: synonyms, antonyms, converse terms, derivatives.
Collocate LFs: Magn = 'a high degree of what is denoted by X'; Oper/Func...

Lexical Functions: Magn
Magn(disease) = grave; Magn(fog) = heavy; Magn(control) = strict;
Magn(болезнь 'disease') = тяжелый 'heavy'; Magn(туман 'fog') = густой 'dense'; Magn(контроль 'control') = строгий 'strict'
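Since an LF behaves like a function from a keyword to a value, the simplest machine-readable form of these examples is a lookup table. A minimal sketch, assuming a plain Python dictionary keyed by (function, keyword); the names `LF_LEXICON` and `lf_value` are hypothetical and do not come from ETAP-3:

```python
from typing import Optional

# Hypothetical LF lexicon: (lexical function, keyword) -> value.
# The entries are the Magn examples from the slide.
LF_LEXICON: dict[tuple[str, str], str] = {
    ("Magn", "disease"): "grave",
    ("Magn", "fog"): "heavy",
    ("Magn", "control"): "strict",
    ("Magn", "болезнь"): "тяжелый",
    ("Magn", "туман"): "густой",
    ("Magn", "контроль"): "строгий",
}


def lf_value(function: str, keyword: str) -> Optional[str]:
    """Return the value of `function` for `keyword`, or None if unlisted."""
    return LF_LEXICON.get((function, keyword))


# One function, different idiomatic values depending on the keyword.
for noun in ("disease", "fog", "control"):
    print(f"Magn({noun}) = {lf_value('Magn', noun)}")
```

Storing the value per keyword, rather than computing it, is what captures idiomaticity: nothing in the meaning of Magn predicts heavy for fog but grave for disease.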

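Lexical Functions: the Oper/Func Family
Oper_i(X) is a semantically light verb whose subject expresses the i-th actant of X and whose direct object is X itself; Func_i(X) is a light verb whose subject is X and whose object expresses the i-th actant of X.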

Examples of LF Oper
Oper1(invitation) = issue; Oper2(invitation) = receive; Oper1(defeat) = suffer; Oper2(resistance) = encounter; Oper2(respect) = enjoy

Examples of LF Func
Func1(fear) = possess; Func2(decision) = concern; Func1(responsibility) = rest (with); Func2(vengeance) = fall (upon)
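The Oper and Func families differ only in which participant becomes the grammatical subject. A sketch that builds a bare (subject, verb, object) frame under the standard Meaning-Text definitions of Oper_i and Func_i; the function name `light_verb_frame` and the tuple output are illustrative assumptions, and Func0 (which has no object) is deliberately left out:

```python
import re


def light_verb_frame(lf: str, keyword: str, verb: str,
                     actants: dict[int, str]) -> tuple[str, str, str]:
    """Return (subject, verb, object) for an Oper_i or Func_i collocation.

    `lf` is written like "Oper1" or "Func2"; `actants` maps the keyword's
    actant numbers to the phrases that fill them.
    """
    match = re.fullmatch(r"(Oper|Func)([1-9])", lf)
    if match is None:
        raise ValueError(f"unsupported lexical function: {lf}")
    family, index = match.group(1), int(match.group(2))
    actant = actants[index]
    if family == "Oper":
        # Oper_i: the i-th actant is the subject, the keyword is the object.
        return (actant, verb, keyword)
    # Func_i: the keyword is the subject, the i-th actant is the object.
    return (keyword, verb, actant)


print(light_verb_frame("Oper2", "invitation", "receive",
                       {1: "the committee", 2: "the guests"}))
print(light_verb_frame("Func1", "fear", "possess", {1: "Peter"}))
```

The first call yields the guests receive the invitation (second actant as subject), the second fear possesses Peter (keyword as subject).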

General Properties of Lexical Functions
Universality.
Intralinguistic idiomaticity: grave disease, heavy fog, but *heavy disease, *grave fog.
Cross-linguistic idiomaticity: Rus. tjazhelaja bolezn', lit. 'heavy disease'; Rus. gustoj tuman, lit. 'dense fog'.

General Properties of Lexical Functions (cont.)
Paraphrasing potential:
He respects [X] his teachers.
He has [Oper1(S0(X))] respect [S0(X)] for his teachers.
He treats [Labor12(S0(X))] his teachers with respect.
His teachers enjoy [Oper2(S0(X))] his respect.
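Once the LF values of S0(respect) are listed, the paraphrases follow mechanically. A sketch, assuming a hypothetical mini-lexicon for the single noun respect whose values are stored already inflected, plus hand-written English templates; word order and agreement are hard-coded here, not derived by rule:

```python
# Hypothetical LF values for the noun "respect" (S0 of the verb), stored
# already inflected for these sentences; a real generator would inflect
# the lemmas (have, enjoy, treat) by rule.
RESPECT_LFS = {
    "S0": "respect",
    "Oper1": "has",       # 1st actant as subject: he has respect for ...
    "Oper2": "enjoy",     # 2nd actant as subject: ... enjoy his respect
    "Labor12": "treats",  # 1st actant subject, 2nd actant object, "with respect"
}


def capitalize(sentence: str) -> str:
    return sentence[:1].upper() + sentence[1:]


def paraphrases(subj: str, possessive: str, obj: str) -> list[str]:
    """Generate the LF-based paraphrases of '<subj> respects <obj>'."""
    lf = RESPECT_LFS
    return [capitalize(s) for s in (
        f"{subj} respects {obj}",
        f"{subj} {lf['Oper1']} {lf['S0']} for {obj}",
        f"{subj} {lf['Labor12']} {obj} with {lf['S0']}",
        f"{obj} {lf['Oper2']} {possessive} {lf['S0']}",
    )]


for sentence in paraphrases("he", "his", "his teachers"):
    print(sentence)
```

The only lexical knowledge involved is the LF table; the templates themselves are the same for any verb with a noun S0 and the corresponding Oper and Labor values.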

LFs in Practical Applications
Syntactic and lexical ambiguity resolution in parsers.
Idiomatic translation of a large class of set expressions in machine translation.
Sentence paraphrasing.

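Lexical Ambiguity Resolution
to draw a distinction – provodit' razlichie
Both verbs are extremely ambiguous: draw has more than 50 meanings, provodit' more than 10. Once each verb is recognized as Oper1 of its noun (Oper1(distinction) = draw, Oper1(razlichie) = provodit'), the collocation is translated as a whole, and the choice among these meanings does not arise.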

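Syntactic Ambiguity Resolution
support of the army: 'support by the army' or 'support (given) to the army'.
The president had [Y = Oper2(X)] the support [X] of the army. Since had is Oper2 of support, its subject, the president, is the second actant (the one supported); the army can then only be the first actant, so the reading is 'support by the army'.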

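Syntactic Ambiguity Resolution (cont.)
The fear [X] of his wife possessed [Y = Func1(X)] Peter. Since possess is Func1 of fear, its object, Peter, is the first actant (the one who fears), so of his wife can only express the second actant: Peter fears his wife. Compare The fears of his wife infected Peter, where his wife is the one who fears.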
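Both ambiguity slides rely on the same inference: recognizing the verb as Oper_i or Func_i of the noun fixes which actant slot the verb's other argument occupies, so the of-phrase must take the remaining one. A sketch under simplifying assumptions: a hypothetical table of verb-to-LF pairs, nouns with exactly two actants, and an invented function name `of_phrase_actant`:

```python
from typing import Optional

# Hypothetical table: (noun, governing verb) -> the LF the verb realizes.
VERB_LF = {
    ("support", "have"): "Oper2",
    ("fear", "possess"): "Func1",
}


def of_phrase_actant(noun: str, verb: str) -> Optional[int]:
    """Return which actant (1 or 2) the of-phrase of `noun` must fill.

    Assumes a noun with exactly two actants. For Oper_i the verb's subject
    fills actant i; for Func_i the verb's object does. Either way actant i
    is taken, so the of-phrase gets the other one. Returns None when the
    verb is not a known LF of the noun (the ambiguity stays unresolved).
    """
    lf = VERB_LF.get((noun, verb))
    if lf is None:
        return None
    taken = int(lf[-1])
    return 2 if taken == 1 else 1


# The president had the support of the army -> 1: the army is the supporter.
print(of_phrase_actant("support", "have"))
# The fear of his wife possessed Peter -> 2: his wife is what Peter fears.
print(of_phrase_actant("fear", "possess"))
```

A parser could apply such a check only after the LF pairing is confirmed in the dictionary; for a verb with no LF link to the noun the function returns None and both readings remain open.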

Idiomatic Translation: LF Temp
March: in – mart: v2
Tuesday: on – vtornik: v1
dawn: at – rassvet: na2
moment: at – moment: v1
Easter: at – pasxa: na1
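The point of the Temp table is that the Russian preposition is chosen from the noun's own Temp value, not translated from the English preposition. A sketch of that lookup; reading the final index as a case (1 = accusative, 2 = prepositional) is our assumption, chosen because it fits forms like vo vtornik and v marte, and the names `TEMP` and `translate_temporal` are hypothetical:

```python
# Hypothetical encoding of the slide's Temp values:
# English noun -> (English preposition, Russian noun, Russian Temp value).
TEMP = {
    "March": ("in", "mart", "v2"),
    "Tuesday": ("on", "vtornik", "v1"),
    "dawn": ("at", "rassvet", "na2"),
    "moment": ("at", "moment", "v1"),
    "Easter": ("at", "pasxa", "na1"),
}

# Assumed reading of the index in values like "v2" or "na1".
CASE_BY_INDEX = {"1": "accusative", "2": "prepositional"}


def translate_temporal(noun: str) -> tuple[str, str, str]:
    """Return (Russian preposition, Russian noun, case) for an English time noun.

    The English preposition is ignored on purpose: the Russian one comes
    from the Russian noun's own Temp value.
    """
    _english_prep, ru_noun, value = TEMP[noun]
    preposition, index = value[:-1], value[-1]
    return preposition, ru_noun, CASE_BY_INDEX[index]


print(translate_temporal("Tuesday"))
print(translate_temporal("dawn"))
```

The three nouns that take at in English come out with v or na depending on the noun, which is exactly what word-by-word translation of prepositions gets wrong.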

Sentence Paraphrasing
X = Conv12(X): This group consists of 20 persons – Twenty persons comprise this group.
X + Y = Anti1(X) + Anti2(Y): He began to observe the rules – He stopped violating the rules.
X = Labor12 + S0(X): He respects his parents – He treats his parents with respect.
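The second rule replaces both a phase verb and its complement by their antonyms, which preserves the meaning. A toy sketch of that double substitution; the antonym tables are hypothetical, they store the exact forms the construction needs so that no morphology is required, and the sketch does not model what the indices in Anti1 and Anti2 distinguish (it simply uses one table per position):

```python
# Hypothetical antonym pairs, stored in the forms the construction needs
# (phase verb + infinitive or gerund), so no morphology is required.
ANTI_PHASE = {"began": "stopped", "stopped": "began"}
ANTI_VERB = {"to observe": "violating", "violating": "to observe"}


def anti_paraphrase(phase_verb: str, verb_phrase: str, obj: str) -> str:
    """Apply X + Y = Anti(X) + Anti(Y): replacing both parts keeps the meaning."""
    return f"{ANTI_PHASE[phase_verb]} {ANTI_VERB[verb_phrase]} {obj}"


print("He " + anti_paraphrase("began", "to observe", "the rules"))
```

Because the tables are symmetric, the rule also runs in the opposite direction, from stopped violating back to began to observe.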

ETAP-3 Options
1. Machine Translation
2. Deeply Annotated Text Corpus of Russian (SynTagRus)
3. Translation System Based on the UNL (Universal Networking Language) Interlingua
4. Synonymous and Quasi-Synonymous Paraphrasing of Utterances
5. Computer-Aided Language Learning Tool
6. New Developments: Semantics and Ontologies

SynTagRus
Currently the treebank contains over 42,000 sentences (over 600,000 words) belonging to texts of a variety of genres (contemporary fiction, popular science, newspaper and journal articles dated between 1960 and 2009, online news, etc.) and is steadily growing. It is an integral but fully autonomous part of the Russian National Corpus, developed in a nationwide research project. It can be freely consulted on the Web (

Since Russian is a language with relatively free word order, SynTagRus adopted a dependency-based annotation scheme, in a way parallel to the Prague Dependency Treebank (see, e.g., Hajič et al. 2000).

[Figure: screenshot of the SynTagRus dependency tree for sentence (1)]

What we have just seen is a screenshot of the dependency tree for the sentence
(1) Наибольшее возмущение участников митинга вызвал продолжающийся рост цен на бензин, устанавливаемых нефтяными компаниями
'It was the continuing growth of petrol prices set by oil companies that caused the greatest indignation among the participants of the rally.'

Here, nodes represent words (lemmas) assigned morphological and part-of-speech tags, while arcs are labeled with the names of syntactic links. The tagging uses about 75 syntactic links, about half of which were proposed in Igor Mel'čuk's Meaning-Text Theory (Mel'čuk 1988).
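A tree of this kind can be stored as a list of nodes, each holding its lemma, tags, the id of its head and the label of the incoming link. A sketch with a well-formedness check; the `Node` layout and the English link labels are illustrative assumptions, not the SynTagRus format. Giving each node a single `head` field makes "at most one parent" structural, so the check only needs to confirm a single root, existing heads and the absence of cycles:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A dependency-tree node: a lemma with its tags and its incoming arc."""
    id: int
    lemma: str
    pos: str
    head: Optional[int]  # id of the syntactic parent; None for the root
    link: Optional[str]  # name of the syntactic link from the parent


def is_tree(nodes: list[Node]) -> bool:
    """True if there is exactly one root, every head exists, and no cycles."""
    by_id = {n.id: n for n in nodes}
    if sum(1 for n in nodes if n.head is None) != 1:
        return False
    for node in nodes:
        seen: set[int] = set()
        current = node
        while current.head is not None:
            if current.id in seen or current.head not in by_id:
                return False  # a cycle, or an arc from a missing node
            seen.add(current.id)
            current = by_id[current.head]
    return True


# "Петр крепко спит" ('Peter is sound asleep'); link names are illustrative.
sentence = [
    Node(1, "Петр", "S", head=3, link="predicative"),
    Node(2, "крепко", "ADV", head=3, link="adverbial"),
    Node(3, "спать", "V", head=None, link=None),
]
print(is_tree(sentence))
```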

Normally, one token corresponds to one node in the dependency tree. There is, however, a noticeable number of exceptions. The main types of exceptions are:

1) composite words like пятидесятиэтажный 'fifty-storeyed', where one token corresponds to two or more nodes;
2) so-called phantom nodes, used to represent hard cases of ellipsis, which do not correspond to any particular token in the sentence (cf. Я купил рубашку, а он галстук, lit. 'I bought a shirt and he a tie', which is expanded into Я купил рубашку, а он купил PHANTOM галстук 'I bought a shirt and he bought PHANTOM a tie');
3) multiword expressions like по крайней мере 'at least', where several tokens correspond to one node.
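The three exception types can be told apart once each node records which token indices it is anchored to. A sketch assuming a hypothetical `TreeNode` with a tuple of token ids (empty for a phantom); the lemmas chosen for the split composite word are illustrative only:

```python
from dataclasses import dataclass


@dataclass
class TreeNode:
    """A tree node and the indices of the sentence tokens it is anchored to."""
    lemma: str
    token_ids: tuple[int, ...]  # empty for a phantom node


def node_kind(node: TreeNode, all_nodes: list[TreeNode]) -> str:
    """Classify a node by how it corresponds to the sentence tokens."""
    if not node.token_ids:
        return "phantom"          # restored ellipsis: no token at all
    if len(node.token_ids) > 1:
        return "multiword"        # several tokens, one node
    sharing = [n for n in all_nodes if n.token_ids == node.token_ids]
    if len(sharing) > 1:
        return "composite part"   # one token split over several nodes
    return "regular"              # the normal one-to-one case


# Я купил рубашку, а он галстук: tokens 0..5 (punctuation left out),
# with a phantom copy of купить restored before галстук.
ellipsis = [TreeNode("я", (0,)), TreeNode("купить", (1,)),
            TreeNode("рубашка", (2,)), TreeNode("а", (3,)),
            TreeNode("он", (4,)), TreeNode("купить", ()),
            TreeNode("галстук", (5,))]
print([node_kind(n, ellipsis) for n in ellipsis])
```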

Morphological tagging of SynTagRus is based on a comprehensive morphological dictionary of Russian comprising about 130,000 entries (over 4 million word forms). The ETAP-3 morphological analyzer uses the dictionary to produce the morphological annotation of the corpus words, which includes the lemma, the POS tag and, depending on the POS, a set of morphological features.

Syntactic Markup Language
The syntactic markup language of the corpus is XML, because it is universally accepted and satisfies several important requirements that the corpus must meet:

1) the corpus must contain several layers of linguistic data that can be extracted from the annotation independently of each other;
2) it must be scalable and extensible, both quantitatively and qualitatively, so that new types of information can be added easily;
3) it must be supported by standard programming tools for text parsing, sophisticated search, and conversion.
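As an illustration of requirements 1) and 3), the sketch below reads the morphological and syntactic layers of one sentence separately using only Python's standard `xml.etree.ElementTree`. The element and attribute names (`S`, `W`, `ID`, `DOM`, `LEMMA`, `FEAT`, `LINK`) and the feature and link values follow the layout commonly seen in distributed SynTagRus files, but treat them as assumptions, not as a specification of the format:

```python
import xml.etree.ElementTree as ET

# One sentence in an assumed SynTagRus-like layout: each <W> is a word,
# DOM is the ID of its syntactic head ("_root" for the top node).
SAMPLE = """
<S ID="1">
  <W ID="1" DOM="3" LEMMA="ПЕТР" FEAT="S ЕД МУЖ ИМ ОД" LINK="предик">Петр</W>
  <W ID="2" DOM="3" LEMMA="КРЕПКО" FEAT="ADV" LINK="обст">крепко</W>
  <W ID="3" DOM="_root" LEMMA="СПАТЬ" FEAT="V НЕСОВ ИЗЪЯВ НЕПРОШ ЕД 3-Л">спит</W>.
</S>
"""


def read_layers(xml_text: str):
    """Return (morphology, syntax) extracted independently from one <S>."""
    sentence = ET.fromstring(xml_text.strip())
    words = {w.get("ID"): w for w in sentence.iter("W")}
    # Morphological layer: (lemma, features) for every word.
    morphology = [(w.get("LEMMA"), w.get("FEAT")) for w in words.values()]
    # Syntactic layer: (head lemma, link, dependent lemma) for non-root words.
    syntax = [
        (words[w.get("DOM")].get("LEMMA"), w.get("LINK"), w.get("LEMMA"))
        for w in words.values()
        if w.get("DOM") != "_root"
    ]
    return morphology, syntax


morphology, syntax = read_layers(SAMPLE)
print(syntax)
```

Adding a new layer (for instance LF information) would mean adding attributes or child elements; code that reads only the existing layers keeps working, which is the extensibility argued for in requirement 2).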

Structure Editor
Structure Editor (StrEd) is a complex software environment designed for:
1. automatic generation of morphosyntactic and lexical functional annotation of texts;
2. manual editing of annotation results;
3. fully manual annotation.
Automatic generation is possible only for texts in natural languages supported by the ETAP-3 linguistic processor.

In principle, Structure Editor is not language-specific and can be used to annotate texts in any natural language, primarily one with rich morphology.

StrEd allows the annotator to use diverse dialog interfaces in order to:
1. view the whole text;
2. view a sentence as a table in which every line corresponds to a particular word of the sentence;
3. view the syntactic dependency tree of a sentence;
4. view information on a particular word of the sentence;
5. view the discrepancies between the results of automatic and manual tagging of a sentence.

[Figure: StrEd view presenting the sample text at an initial stage, with no morphosyntactic tagging performed]

As a rule, the first step of text annotation is automatic tagging. The annotator then revises the sentences, detecting and correcting errors. The Edit Structure dialog provides a convenient way to view and manipulate the dependency tree.

[Figure: Structure Editor screenshot]

In this view, the annotator can perform all typical actions that modify the original tagging: in particular, rearrange the structure or delete syntactic relations with simple mouse gestures, and alter lemmas, syntactic links, or grammatical features. If these operations do not suffice to obtain the desired result, the annotator can continue editing in another dialog, intended for viewing and manipulating sentence properties, which supports less typical operations on the sentence.

[Figure: Structure Editor screenshot]

Morphosyntactic Annotation
Петр крепко спит. 'Peter is sound asleep.'

A Sentence of Average Complexity
Пчелиные ульи и муравьиные колонии служат хорошим примером: несмотря на относительную простоту организма отдельных насекомых и незначительные возможности их мозга, образуемый ими социум представляет собой весьма сложную систему, отличающуюся исключительной прочностью и слаженностью функционирования.
'Beehives and ant colonies serve as a good example: despite the relative simplicity of the bodies of individual insects and the modest capacities of their brains, the society they form is a very complex system, distinguished by exceptional robustness and harmony of functioning.'

[Figure: morphosyntactic annotation]

Lexical Functional Annotation
The newest version of SynTagRus contains partial lexical functional annotation: for collocations that can be described with the apparatus of lexical functions, the tagging includes information on the arguments and values of these functions.

[Figure: lexical functional annotation]

Lexical functional annotation of a corpus sentence can be produced in three ways:
1. automatically, together with syntactic parsing, by running the ETAP-3 parser on the sentence;
2. automatically, by running a subset of ETAP-3 rules on a ready syntactic structure of the sentence approved by the expert, using the StrEd option Let ETAP find them (LFs);
3. manually.
The list of LF arguments and values, irrespective of how it was produced, can be edited manually: information on functions can be modified, added, or removed.

Annotation Tools
Given the significant size of SynTagRus (over 500,000 words), the annotation process has to be automated as fully as possible. On the other hand, automatic annotation has to allow for verification and, if need be, correction by a human expert. This means that the environment has to support comfortable viewing and editing of annotated texts.

Intellectual Debugger
To diagnose nontrivial annotation errors, a powerful instrument, the Intellectual Debugger (IntelDeb), was specially created. It verifies, in one quick step, whether the current syntactic annotation of a sentence (possibly the result of several human interventions) is compatible with at least one of the parses that the automatic ETAP-3 parser can in principle produce.

IntelDeb can be viewed as a special parser which, unlike the regular ETAP-3 parser, does not produce multiple parses of a sentence. Instead, if IntelDeb finds that the structure under verification is inadmissible, its goal is to diagnose the cause, or causes, as precisely as possible.

The underlying idea is to run the parser consecutively on all binary subtrees present in the annotation and see whether the existing syntactic rules and dictionaries permit the construction of such subtrees. For a specific syntactic link, the algorithm checks all relevant rules (there may be dozens of them) and all possible lemmas for the given pair of words, starting with the rules and lemmas cited in the annotation and gradually loosening the grip, resorting to other rules and lemmas if the current choice cannot be confirmed.
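The checking loop described above can be sketched as follows. Everything here is a hypothetical simplification: real ETAP-3 rules are far richer than the (link, head POS, dependent POS) triples used as "rules" below, and the three-step loosening order (annotated analysis, then alternative lemmas with the same link, then other links) is our reading of the prose, not IntelDeb's actual procedure:

```python
from dataclasses import dataclass
from itertools import product

Analysis = tuple[str, str]  # (lemma, part of speech)


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule licensing one link between two parts of speech."""
    name: str
    link: str
    head_pos: str
    dep_pos: str


RULES = [
    Rule("subject", "predicative", "V", "S"),
    Rule("adverb", "adverbial", "V", "ADV"),
    Rule("adjective", "modificative", "S", "A"),
]


def rules_for(link: str, head_pos: str, dep_pos: str) -> list[Rule]:
    return [r for r in RULES
            if (r.link, r.head_pos, r.dep_pos) == (link, head_pos, dep_pos)]


def check_arc(head: list[Analysis], dep: list[Analysis], link: str) -> str:
    """Diagnose one binary subtree; element 0 of each list is the annotated analysis."""
    (_, head_pos), (_, dep_pos) = head[0], dep[0]
    # Step 1: annotated lemmas with the annotated link.
    if rules_for(link, head_pos, dep_pos):
        return "confirmed"
    # Step 2: loosen the grip on lemmas, keep the annotated link.
    for (h_lemma, h_pos), (d_lemma, d_pos) in product(head, dep):
        if rules_for(link, h_pos, d_pos):
            return f"link ok with other lemmas: {h_lemma} -> {d_lemma}"
    # Step 3: keep the annotated lemmas, look for other links.
    links = sorted({r.link for r in RULES
                    if (r.head_pos, r.dep_pos) == (head_pos, dep_pos)})
    if links:
        return f"wrong link {link!r}; rules allow {links}"
    return "no rule licenses this pair"


def debug(arcs: list[tuple[list[Analysis], list[Analysis], str]]) -> list[str]:
    """Run the check on every (head, dependent, link) arc of the annotation."""
    return [check_arc(h, d, link) for h, d, link in arcs]


spat = [("спать", "V")]
petr = [("Петр", "S")]
krepko = [("крепко", "ADV"), ("крепкий", "A")]  # adverb vs. short adjective
print(debug([(spat, petr, "predicative"), (spat, krepko, "predicative")]))
```

Unlike a parser, the function never builds alternatives for the whole sentence; it reports, arc by arc, the least drastic change under which that arc would become licensed.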

The Hypothesis of Two Syntactic Starts
We will be dealing with a special type of sentence containing embedded (semi-)phraseological expressions like He does the Devil knows what or its Russian equivalent Он занимается чёрт знает чем.

It is very difficult to build adequate syntactic representations for such sentences. A controversial solution to this problem is proposed here: to admit that sentences of this type have two syntactic starts, or syntactic heads.

Problem
(1) Он занимается чёрт знает чем
(2) He does the Devil knows what
(3) Мне было – так лестно / Лезть за тобою – Бог / Знает куда! (Marina Tsvetayeva)
(4) I felt so flattered to climb after you God knows where

Why is it difficult to build adequate surface syntactic representations for these sentences? Because it is unclear what the syntactic role of the verb знать (or know) is in (1)–(4). This verb cannot be the absolute head of the surface syntactic tree, as it is in
(1') Один чёрт знает, чем он занимается
(2') The devil only knows what he does
where знает and knows are the tops of the trees.

Indeed, if we compare
(2) He does the Devil knows what
(2') The devil only knows what he does
we will see that (2') is neither syntactically nor semantically equivalent to (2). Syntactically, the subject of knows is free in (2') but not in (2): John only knows what he does is fine, whereas *He does John knows what is not. Semantically, (2), in contrast to (2'), expresses disapproval, a negative attitude of the speaker toward the subject and his activity.

There is no reasonable syntactic governor for знает/knows in (1) and (2). If we subordinate it to the main verb of the sentence, we face the problem of determining the syntactic relation between the two verbs.

We might see the syntactic governor of knows in the pronominal word (what, where, etc.). Phraseological expressions like devil knows could then be suspected of having turned into merged lexical units equivalent to indefinite particles like -ever.

Such a solution does not hold, since embedded constructions of this type are not confined to the phraseological expressions cited and may include fairly free clauses formed with various verbs:

Когда я был подростком, сильное впечатление на меня произвела вычитанная не помню уже в какой книге история панамской авантюры. 'When I was a teenager, I was deeply impressed by the story of the Panama affair that I read in I don't remember which book' (Novoye Vremya)

Moreover, the second parts of these constructions are not necessarily interrogative pronominal words. In Russian, they may be represented by the conjunction или 'or' or the particle ли 'whether':

Его судят за преступление, которое он неизвестно совершил или нет, lit. 'He is being tried for a crime which it is not clear if he committed or not'

Кроме того, есть еще такие сдерживающие факторы, как наличие Северной Кореи с непонятно имеющимся ли у нее ядерным оружием 'Besides, there are such deterrent factors as the presence of North Korea with nuclear weapons that it might or might not have', lit. '… the presence of North Korea with it-is-unclear-whether-available-to-it nuclear weapons'

While the second verbs of the sentences considered have no evident syntactic governor, the pronominal words have as many as two plausible candidates for governor.

(2) He does the Devil knows what

On the one hand, one may suggest that what (чем in (1)) instantiates the first completive valency of do (заниматься in (1)). In the Russian example, чем is the only word of sentence (1) in the instrumental case, which is exactly the case required by заниматься.

On the other hand, the same pronominal word may be viewed as instantiating the first completive valency of the verb know, the way it does in isolated (elliptic) sentences like I know what.

So the syntactic structure of (1) and (2) has two oddities at once: one word in need of a syntactic parent (know) has no good candidate, while another word (what) has two.

Solution
The duality of syntactic dominance for what in (2) is far from trivial and requires further reasoning. In simple single-clause sentences, pronominal words like what cannot depend on verbs that, unlike know, do not take propositional complements: *I do what.

Such pronouns can either form a special question like What do you do?, in which case the pronoun is interrogative too, or, in Russian, a highly colloquial general question like Вы занимаетесь чем? 'Do you do anything?', where чем is an indefinite pronoun that really means 'anything'.

Assuming that (2) is not a single-clause sentence, we should determine which clauses it consists of. The most natural assumption is that (2) consists of two clauses, one formed by the verb does and the other by the verb knows.

Where are the boundaries of the two clauses? The left-hand boundaries of both clauses are evident: for the first clause, it is the beginning of the whole sentence; for the second clause, it is the word Devil, the subject of the verb knows.

Hypothesis: the right-hand boundaries of both clauses are the same and coincide with the end of the sentence, so that the pronominal word what belongs to both clauses.

If we now compare (2) with
(5) John knows what he does,
we will see that in (5) the subordinate clause what he does is syntactically dependent on the main verb knows. The lack of such subordination distinguishes the second clause of (2) from the subordinate clause of (5): the head of the second clause of (2) has no syntactic parent at all. This is the most crucial characteristic of sentences of this type.

Sentences (5) and (2) unfold differently: (5) is produced smoothly by the speaker, while (2) involves a kind of leap in mid-generation: before the first clause is finished, the second clause begins to evolve, and from then on the two proceed together until the end of the sentence.

The second clause in (2) behaves like a tributary that flows into a river and joins its course.

The evolution of sentence (2) resembles the relation between a main clause and a parenthetical clause placed in the middle of the sentence, as in
(6) At this moment a young man (this was John) rose from his place.

The crucial difference between (6) and (2) is that a parenthetical clause ends before the main clause, while in (2) the tributary clause ends together with the first clause.

If this stance is taken, we have to admit that sentences of this type have two syntactic starts.

Such sentences violate the fundamental requirement of the surface syntactic component of the Meaning-Text Theory that the syntactic structure of any sentence be a tree.
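The conflict with the tree requirement can be made concrete. A sketch assuming a hypothetical encoding in which each word lists its syntactic parents (an empty list marks a syntactic start); the structure given for (5) is a simplified one chosen for illustration. Under the two-starts analysis of (2), the check reports both anomalies discussed above: two starts, and a word (what) with two parents:

```python
def tree_violations(parents: dict[str, list[str]]) -> list[str]:
    """List the ways a dependency structure departs from a tree."""
    problems = []
    starts = [word for word, ps in parents.items() if not ps]
    if len(starts) != 1:
        problems.append(f"{len(starts)} syntactic starts: {starts}")
    for word, ps in parents.items():
        if len(ps) > 1:
            problems.append(f"{word!r} has {len(ps)} parents: {ps}")
    return problems


# (5) John knows what he does: a single start (simplified structure).
sentence_5 = {"knows": [], "John": ["knows"], "does": ["knows"],
              "what": ["does"], "he": ["does"]}

# (2) He does the Devil knows what, under the two-starts analysis.
sentence_2 = {"does": [], "knows": [], "He": ["does"], "Devil": ["knows"],
              "the": ["Devil"], "what": ["does", "knows"]}

print(tree_violations(sentence_5))
for problem in tree_violations(sentence_2):
    print(problem)
```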

Discussion
One more syntactic peculiarity is that, in Russian, expressions like чёрт знает что may include a personal pronoun whose syntactic status is unclear:

Ему давно уже пора дом покупать, снимает чёрт его знает что! lit. 'It's high time he bought a house, he rents the Devil knows him what!' (Alexander Torin, Gelikon Plus, St. Petersburg, 2000);

Деньги уходят чёрт их знает куда, lit. 'Money goes the devil knows it where' (Vladimir Lenin, in a letter to his mother, 1895).

The constructions discussed are subject to rather tight lexical restrictions.

Within the phraseological subset, the constructions are formed with the verbs знать and, occasionally, ведать 'know', almost always in the present tense. Their subjects can be:
1) the nouns чёрт, дьявол 'devil', леший 'wood goblin', бес and бис 'demon', шут 'jester' and пёс 'dog' (the last two are probably euphemisms for чёрт), practically always in the singular;
2) derogatory nouns like фиг or хрен, which are in fact euphemisms for an obscene word, as in В стране скоро фиг знает что начнется 'Soon goodness knows what will start in this country', or this obscene word itself;
3) the nouns Бог 'God', Господь 'Lord', Аллах 'Allah', Всевышний 'the Almighty', as in Мне не нравится, что на юбилей города приглашают Бог знает кого 'I don't like it that they invite God knows whom to the city anniversary'.

Первая корректура ушла из издательства Будда знает сколько времени назад, lit. 'The first proof-sheet left the publisher Buddha knows how long ago' (from a posting about the publication of a manuscript on East Asia).

The semantics of the Devil knows what construction is very interesting and deserves special attention and careful study.

The meanings of the collocations that represent the construction are remarkably close to each other. All of them have a strong evaluative component that expresses the speaker's negative attitude toward the participant or circumstance of the situation conveyed by the collocation.

There is a noticeable difference in meaning between the collocations based on God and the remaining ones. In the former, the speaker's negative attitude becomes milder and is replaced by regret and, possibly, compassion. To my mind, the speaker's negative attitude belongs to the assertive part of the meaning rather than to the presupposition. In particular, this may account for the fact that sentences like #He betrayed the devil knows whom are infelicitous:

in all probability, the semantics of the verb betray ('be disloyal to') requires that its object deserve loyalty, while the collocation the devil knows whom introduces an unknown and/or bad person who does not deserve loyalty.

The construction considered here has a clear negative trend. As a matter of fact, expressions like the Devil knows what, God knows where, etc., introduce unknown entities: He went God knows where really means the same as He went nobody knows where.

At least some of the collocations that represent the construction lack compositionality. An example is the expression containing the Russian word сколько or its English equivalent how much: sentences like Он получил чёрт знает сколько денег 'He got the devil knows how much money' refer to situations that involve an indefinitely large amount of money, but never to situations that involve an indefinitely small amount.

The constructions considered here are unique and have no close cognates in the language. In particular, constructions like Иди куда хочешь 'Go wherever you please', Он танцует с кем попало 'He would dance with the first person he comes across', and Ребенок ест что ни попадя 'The child eats whatever comes to hand', which share with our constructions the presence of interrogative pronouns and the meaning of indefiniteness, are nonetheless drastically different from them.

Most importantly, they do not have an additional syntactic start.

Microsyntax of Language. Minor Type Sentences. Syntactic Idioms.

Syntactic Idioms
Syntactic phrasemes are idiomatic units with syntactic peculiarities not shared by ordinary non-idiomatic expressions. The term syntactic phraseme was introduced in [Boguslavsky-Iomdin 1982].

The term has frequently been used by Igor A. Mel'čuk. Jackendoff (1997) uses the term syntactic idiom, focusing on the presence of variable parts in such idioms (like The hell with X, or Russian Z-у не до X-а, as in мне не до смеху 'I am in no mood for laughing'). Jackendoff, Ray. Twisting the Night Away. // Language, Vol. 73 (1997), pp. 534–559.

What place do syntactic idioms occupy in the general syntactic system of language?

Syntax and Microsyntax
The general syntactic system of a language can in fact be divided into two unequal parts, or two syntaxes: the basic syntax, which embraces a comparatively small number of basic constructions, and the peripheral syntax, which comprises a much greater number of constructions.

Basic constructions are frequent, non-idiomatic, and built by very general grammar rules. Each peripheral construction occurs in texts much less frequently than any basic one, although their combined frequency is very high. Peripheral constructions are varied and extremely difficult to incorporate into the general system of syntax.

The part of syntax constituted by peripheral constructions is sometimes referred to as minor type sentences. I propose the term microsyntax for this part of syntax.

This division has nothing to do with the greater or lesser importance of either part of syntax. Rather, it reflects the fact that the study of peripheral linguistic structures requires much more individual and fine-grained tools than the study of basic structures.

Microsyntax consists of objects of two main types: nonstandard syntactic constructions and syntactic idioms.

The boundary between these objects is not very distinct. The main discriminating criterion is the degree of lexicalization.

Nonstandard Syntactic Constructions
Russian modal impersonal constructions with an infinitive and a dative: Z-у X-овать 'Z is in for X': Тебе выходить на следующей 'You must get off at the next stop'; Хозяйке всю ночь посуду мыть 'The hostess is in for a night of dishwashing'.

Russian modal impersonal constructions with an infinitive, a dative, and a negation: Z-у не X-овать 'There is no chance that Z will do X': Этому не бывать 'This will never happen'; Не видать тебе золота, покамест не достанешь крови человеческой! 'You will never see gold until you procure human blood!' (Nikolai Gogol).

Coordinative constructions with lexically identical elements: ну упал и упал, lit. 'he fell and fell', i.e. 'his fall seemed to have no dramatic consequences'; бывают аварии и аварии, lit. 'there are accidents and accidents', i.e. 'accidents vary'.

Further coordinative constructions with lexically identical elements: сказал, что его зовут так-то и так-то 'he said that his name was so-and-so' (he gave one name, not two); надо сделать то-то и то-то 'we have to do this and this' (possibly only one thing is to be done).

Vocative construction with lexically identical elements: Вась, а Вась 'Vasya, oh Vasya', i.e. 'Vasya, can you hear me?'; Иван Иваныч, а Иван Иваныч 'Ivan Ivanovich, oh Ivan Ivanovich'.

Nonstandard Syntactic Construction or Word Sense?

A curious lexical phenomenon in Russian is associated with the verb быть 'be': in the future tense (буду, будешь, etc.) it is equivalent to буду есть 'I will eat' or буду пить 'I will drink':

Я не буду кашу 'I will not eat porridge'; Ты что будешь? 'What will you have?' In this sense the past tense and the zero present are impossible: *Я не был кашу, *Ты что был?, *Я не кашу, *Ты что?

In no case can such expressions be considered ellipsis, because they do not require any preceding context in which a verb like есть or пить occurs.

Further, these expressions obey very specific semantic restrictions: only words denoting food or drink (plus pronouns) can be used with буду.

Accordingly, one cannot say *Я буду аспирин 'I will have an aspirin', even though the normal Russian verbs used with the name of a medicine are пить or выпить 'drink': Выпей таблетку аспирина 'Take an aspirin pill'; Она всегда пьет аспирин, когда у нее болит голова 'She always takes aspirin when she has a headache'.

Interestingly, this construction can only refer to an actual event of eating or drinking: Поедем на Кавказ, будем пить вино 'We will travel to the Caucasus and drink wine', but never *Поедем на Кавказ, будем вино.

This means that the construction can only be used with reference to the immediate future.

Additionally, the construction normally refers to a situation where food or drink is offered by one person and accepted by another. So it is natural to say мы будем кофе и рогалики 'we will have coffee and rolls' when addressing a waiter or accepting his offer. It is totally unacceptable, however, when a group in a café is discussing the menu among themselves: *давай будем кофе и рогалики 'let's have coffee and rolls'.

One cannot imagine someone saying вот увидишь, он будет кофе 'you'll see, he will be (having) coffee' when predicting the behavior of a person sitting alone in his kitchen with no one waiting on him.

Nonstandard Syntactic Construction!

My solution is that this is a construction rather than a word sense, because too many special conditions would have to be stated in order to postulate a separate sense of the verb быть.
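The following Python sketch lists the checks a parser would need if this usage of быть were treated as a construction, which shows how many conditions are involved. It is a minimal, hypothetical illustration. The toy lexicon (`SEMANTIC_CLASS`), the `Context` flags and the name `licenses_budu_construction` are assumptions and do not belong to any existing system. The sketch also simply takes as given whether the situation is immediate and involves an offer; in practice that information would have to come from discourse analysis.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: the semantic class of a few nouns from the examples.
SEMANTIC_CLASS = {"кашу": "food", "рогалики": "food", "кофе": "drink",
                  "вино": "drink", "аспирин": "medicine"}
PRONOUNS = {"что", "это"}
FUTURE_OF_BYT = {"буду", "будешь", "будет", "будем", "будете", "будут"}


@dataclass
class Context:
    immediate_future: bool   # an actual, imminent act of eating or drinking
    offer_situation: bool    # food or drink offered by one party, taken by another


def licenses_budu_construction(verb: str, obj: str, ctx: Context) -> bool:
    """Check the conditions on 'быть (future) + food/drink noun' listed in the text."""
    if verb not in FUTURE_OF_BYT:
        return False          # *Я не был кашу: only the future tense is possible
    if obj not in PRONOUNS and SEMANTIC_CLASS.get(obj) not in {"food", "drink"}:
        return False          # *Я буду аспирин: medicines do not qualify
    # The text says an offer is *normally* involved; a strict check is a simplification.
    return ctx.immediate_future and ctx.offer_situation


waiter = Context(immediate_future=True, offer_situation=True)
trip = Context(immediate_future=False, offer_situation=False)
print(licenses_budu_construction("будем", "кофе", waiter))    # True
print(licenses_budu_construction("буду", "аспирин", waiter))  # False
print(licenses_budu_construction("будем", "вино", trip))      # False
```

A word-sense analysis would need to encode all of these restrictions in a lexical entry for быть. The sketch shows them as conditions on a configuration instead.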

Syntactic Idioms

Z-у не до X-a 'Z is past X', 'Z is in no mood for X': Z is busy with more important things than X and believes that X can be disregarded. Here, two elements are lexically bound: не 'not' and до 'up to'.

руки чешутся (сделать что-л.) 'one's fingers are itching (to do smth.)': У меня руки чешутся побить его 'My fingers itch to give him a thrashing'.

What Will Come Next

I will be considering a polysemous Russian adverbial syntactic idiom, ВСЁ РАВНО. It has three senses:

- всё равно 1 'all the same', as in Я всё равно сижу дома 'I am staying at home all the same';
- всё равно 2 'it makes no difference', as in Нам всё равно, куда ехать 'We don't care where we'll be going';
- всё равно 3 'tantamount', as in Сняться в плохом фильме всё равно что плюнуть в вечность 'To star in a bad movie is equivalent to spitting into eternity'.

Syntactic Phrasemes всё равно

The idiom всё равно has two fixed lexical elements and three clearly discernible senses.

None of these units can be considered a nonsyntactic idiom because every one of them has syntactic and combinatorial properties not shared by any other lexical units of Russian.

Identification of Syntactic Idioms

Identifying a syntactic idiom in a text is a serious problem, since the same sequence of words may also occur as a free combination:

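Соглашаться на всё равно как и не соглашаться ни на что – одинаково неприемлемые решения 'Agreeing to everything, like refusing to agree to anything, is an equally unacceptable solution'; Почти всё равно нулю 'Almost everything is equal to zero'; Не всё ли тебе равно? 'Isn't it all the same to you?' Only the last sentence contains the idiom, and there it is split by ли and тебе. In the first sentence, всё is governed by на, and равно как means 'just as'. In the second, равно is the short adjective 'equal' governing нулю.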

Он работает в одиночку 'He works alone'; Он шел в одиночку 'He was walking alone' vs. 'He was walking to a solitary cell'; Он влюбился в одиночку 'He fell in love with a single mother'.

Identification of Lexical Units

Он что-то знает 'He knows something'; Что-то он теперь поделывает? 'I wonder what he is doing these days'.

Description of a Syntactic Idiom

The full description of the syntactic behavior of a syntactic idiom must include:

(1) lexical and morphological identification of its constituents;
(2) identification of the syntactic relations holding between the idiom's constituents, and of their direction;
(3) determination of the syntactic peculiarities that govern the interaction of the idiom with other elements of the sentence.
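The three components of such a description could be stored as a machine dictionary entry. The Python sketch below shows one minimal, hypothetical way to do it, using всё равно 2 as the example. The class names, the relation label and the choice of равно as the internal head are assumptions made for illustration only. The feature values are taken from the description of всё равно 2 further on in these slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constituent:
    lemma: str
    pos: str
    grammemes: tuple[str, ...]


@dataclass
class SyntacticIdiom:
    name: str
    # (1) lexical and morphological identification of the constituents
    constituents: list[Constituent]
    # (2) internal syntactic relations as (head index, relation label, dependent index)
    relations: list[tuple[int, str, int]]
    # (3) features governing interaction with the rest of the sentence
    external: dict[str, object] = field(default_factory=dict)

    def root(self) -> int:
        """Index of the constituent that depends on nothing inside the idiom."""
        dependents = {dep for _, _, dep in self.relations}
        roots = [i for i in range(len(self.constituents)) if i not in dependents]
        if len(roots) != 1:
            raise ValueError(f"{self.name}: expected exactly one internal root, got {roots}")
        return roots[0]


VSYO_RAVNO_2 = SyntacticIdiom(
    name="всё равно 2",
    constituents=[
        Constituent("всё", "NOUN", ("nom", "sg")),
        Constituent("равный", "ADJ", ("short", "sg", "neut")),
    ],
    # Hypothetical: head = равно; the label is a placeholder (the text only says it is not predicative).
    relations=[(1, "idiom-internal", 0)],
    external={
        "class": "predicative adverb",
        "predqu": True,
        "predthat": True,
        "dative_subject": True,
        "separable_in": "negative general question",
    },
)
print(VSYO_RAVNO_2.root())  # 1
```

With this layout, component (2) is checked for well-formedness by `root()`: the internal relations must leave exactly one constituent that is not a dependent.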

Syntactic Idioms всё равно

The lexical and morphological identification is the same for all three lexical units of the idiom vocable: each is composed of the noun всё 'all' in the nominative singular and the adjective равный 'equal' in the short form, singular neuter.

Compare Мне всё безразлично, lit. 'to me everything is indifferent', i.e. 'All is the same to me', with Мне всё равно 'It's all the same to me'. The idiom contains no subject, but one can be added: Мне это всё равно 'This is all the same to me'; Мне всё равно, куда он пойдет 'I don't care where he will go'; even Мне всё всё равно, lit. 'all is all the same to me'.

The syntactic relation to be postulated between the idiom's syntactic head and its syntactic daughter is not predicative.

Syntactic Idiom всё равно 1

всё равно 1 is a sentential adverb. Its behavior is the same as that of nonidiomatic sentential adverbs like наверняка 'surely', непременно 'certainly', точно 'definitely', напрасно 'in vain'. Usually, it depends on the sentence head, a finite verb or an infinitive:

Всё равно я его люблю 'I love him all the same'; Тебе всё равно вставать рано 'You will have to get up early in any case'; Он всё равно хороший 'He is good all the same'.

Всё равно 1 cannot accept any syntactic dependents, not even particles: *Не всё равно я его люблю 'I love him not all the same'; *Тебе совершенно всё равно вставать рано 'You will have to get up early in perfectly any case'; *Он почти всё равно хороший 'He is good almost all the same'.

The elements of всё равно 1 have a fixed order, and no other words can be inserted between them.

Of the three idioms, всё равно 1 has advanced furthest toward becoming a single word. The only notable distinction is phonetic and prosodic: two accents and a nonreduced [o] in the element всё.

Syntactic Idiom всё равно 2

всё равно 2 is a predicative adverb, resembling other predicatives like жаль 'a pity'. In the sentence, всё равно 2 serves as part of the predicate, the other part of which is a copula: Ему было всё равно 'It was all the same to him'.

Всё равно 2 has the same set of syntactic features as predicative words like интересно 'I wonder' and любопытно 'I am curious', including the following:

The feature predqu represents a word's ability to accept a subject clause in the form of an indirect or alternative question: Ей было всё равно, куда идти 'It was all the same to her where to go'.

The feature predthat represents a word's ability to accept a subject clause introduced by the conjunction что 'that': Ей было всё равно, что ребенок устал и хочет спать 'It was all the same to her that the child was tired and sleepy'.
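The two features can be pictured as a lexicon lookup combined with a rough classification of the subject clause. The Python sketch below is a hedged toy. The feature names predqu and predthat come from the text. The lexicon fragment, the list of question words and the first-word heuristic are assumptions made for illustration. Note that a clause-initial что is treated as ambiguous between the conjunction 'that' and the pronoun 'what'.

```python
# Feature names are from the text; the lexicon fragment itself is hypothetical.
PREDICATIVE_FEATURES = {
    "всё равно 2": {"predqu", "predthat"},
    "интересно": {"predqu", "predthat"},
    "любопытно": {"predqu", "predthat"},
    "всё равно 1": set(),   # a sentential adverb: it takes no dependents at all
}

# Hypothetical, incomplete list of words opening an indirect question.
QU_WORDS = {"куда", "где", "когда", "кто", "как", "почему", "зачем", "откуда"}


def clause_features(clause: str) -> set[str]:
    """Which subject-clause features a clause could satisfy, judged from surface cues."""
    words = clause.lower().replace(",", " ").split()
    if not words:
        return set()
    if words[0] == "что":
        # что is both the conjunction 'that' and the pronoun 'what'
        return {"predthat", "predqu"}
    if words[0] in QU_WORDS or "или" in words or "ли" in words:
        return {"predqu"}   # indirect or alternative question
    return set()


def accepts_subject_clause(predicative: str, clause: str) -> bool:
    """True if the predicative has a feature that the clause can satisfy."""
    return bool(PREDICATIVE_FEATURES.get(predicative, set()) & clause_features(clause))


print(accepts_subject_clause("всё равно 2", "куда идти"))                       # True
print(accepts_subject_clause("всё равно 2", "что ребенок устал и хочет спать"))  # True
print(accepts_subject_clause("всё равно 1", "куда идти"))                       # False
```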

всё равно 2 subcategorizes a noun in the dative, which implements the idiom's subject valency, as it expresses the subject of the state. This subject need not be human, but it must be a volitional entity:

Дамы здесь ни при чем, дамам это всё равно, – отвечал пират, буквально сжигая швейцара глазами, – а это милиции не всё равно! (М. Булгаков, Мастер и Маргарита) 'The ladies have nothing to do with it, it is all the same to the ladies,' replied the pirate, literally burning the doorman with his eyes, 'but it is not all the same to the police!' (Mikhail Bulgakov, The Master and Margarita)

The elements of this idiom also have a fixed order, but under certain conditions (in a negative general question) they may be separated by one or more other words: Не всё ли тебе равно, чтó со мной будет? 'Isn't it all the same to you what will become of me?'

In such sentences, some words may depend on the syntactic daughter of the idiom rather than on its syntactic head.
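The elements of всё равно 1 cannot be separated, but those of всё равно 2 can be split in a negative general question. A string matcher therefore has to allow a gap, but only in that configuration. The Python sketch below is a minimal illustration. It assumes whitespace tokenization with punctuation attached to words, and the function name and the `max_gap` limit are hypothetical. As the last call shows, surface matching of this kind also fires on the free phrase Почти всё равно нулю, which is exactly why identifying syntactic idioms is hard.

```python
def find_vsyo_ravno(tokens: list[str], max_gap: int = 3) -> list[tuple[int, int]]:
    """Return (index of всё, index of равно) pairs for candidate occurrences."""
    toks = [t.lower().strip(",.?!") for t in tokens]
    matches = []
    # Contiguous всё равно: the only option for всё равно 1, the normal one for всё равно 2.
    for i in range(len(toks) - 1):
        if toks[i] == "всё" and toks[i + 1] == "равно":
            matches.append((i, i + 1))
    # Split всё ... равно: allowed only in a negative general question "Не всё ли ... равно".
    # max_gap is how many tokens after ли are scanned for равно (a hypothetical limit).
    if toks[:3] == ["не", "всё", "ли"]:
        for j in range(3, min(len(toks), 3 + max_gap)):
            if toks[j] == "равно":
                matches.append((1, j))
                break
    return matches


print(find_vsyo_ravno("Не всё ли тебе равно, что со мной будет?".split()))  # [(1, 4)]
print(find_vsyo_ravno("Всё равно я его люблю".split()))                     # [(0, 1)]
print(find_vsyo_ravno("Почти всё равно нулю".split()))  # [(1, 2)]: a free phrase, not the idiom
```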

Syntactic Idiom всё равно 3

всё равно 3 is a predicative adverb, too. However, its syntactic properties are extremely idiosyncratic and do not seem to have close analogues among other lexical units of Russian.

всё равно 3 subcategorizes the conjunction что 'that' or как 'as': Никогда не следует сожалеть, что человека обуревают страсти. Это всё равно, как если бы мы стали сожалеть, что он человек 'One should never regret that man is passionate. That would be equivalent to regretting that he is human'.

всё равно 3 forms part of the predicate together with the copula. However, it constrains the subject, which can only be a nomen actionis, the pronoun это 'this', or an infinitive. In the latter case, the conjunction must be followed by another infinitive, so that the sentence becomes a bi-infinitive one.

In contrast to всё равно 2, всё равно 3 does not accept a subject of the state: *Сняться в плохом фильме мне всё равно что плюнуть в вечность 'To star in a bad movie is, to me, equivalent to spitting into eternity'.

As a matter of fact, всё равно 3 has no subject valency at all. Consider the utterance Сняться в плохом фильме для меня всё равно что плюнуть в вечность 'For me, starring in a bad movie is equivalent to spitting into eternity'. Here the phrase для меня 'for me' denotes the subject who evaluates the situation, not the subject of the equivalence.

The polysemy of a syntactic idiom entails additional difficulties for NLP. The system must not only distinguish the syntactic idiom from free phrases but also discriminate between the senses within a vocable.

Мне всё равно лететь 'I have to fly all the same' (sense 1); Мне всё равно, лететь или не лететь 'It is all the same to me whether I fly or not' (sense 2). The sentence Мне всё равно, чёрт возьми, чистить картошку или мыть туалет! is ambiguous: 'To hell with it, it is all the same to me whether I peel the potatoes or scrub the toilet' (sense 2) vs. 'To hell with it, I have to peel the potatoes or scrub the toilet all the same' (sense 1).

In such cases, a helpful method of ambiguity resolution is interactive human-machine sense disambiguation.
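The Python sketch below shows how surface cues might narrow down the candidate senses before a human is consulted. The cues are read off the examples in these slides: a dative pronoun before the idiom, a following что or как, a comma after равно, and a parenthetical such as чёрт возьми. The word lists, the function names and the `ask` callback are hypothetical. A real system would rely on a full syntactic analysis rather than on heuristics of this kind.

```python
DATIVE_PRONOUNS = {"мне", "тебе", "ему", "ей", "нам", "вам", "им"}   # hypothetical, incomplete
PARENTHETICALS = [("чёрт", "возьми"), ("кстати",), ("конечно",)]     # hypothetical, incomplete


def candidate_senses(tokens: list[str]) -> set[int]:
    """Heuristically narrow down the senses (1, 2, 3) of the first contiguous всё равно."""
    toks = [t.lower().strip(",.?!") for t in tokens]
    i = next((k for k in range(len(toks) - 1)
              if toks[k] == "всё" and toks[k + 1] == "равно"), None)
    if i is None:
        return set()
    comma_after = tokens[i + 1].endswith(",")
    rest = toks[i + 2:]
    has_dative = any(t in DATIVE_PRONOUNS for t in toks[:i])
    if not rest:
        # Sentence-final: Ему было всё равно (sense 2) or a bare sentential adverb (sense 1).
        return {2} if has_dative else {1, 2}
    if rest[0] in {"что", "как"} and not has_dative:
        return {3}          # sense 3 takes что/как and no dative subject of the state
    if not comma_after:
        return {1}          # Мне всё равно лететь
    senses = {2}            # the comma may introduce a subject clause ...
    if any(tuple(rest[:len(p)]) == p for p in PARENTHETICALS):
        senses.add(1)       # ... or it may merely open a parenthetical
    return senses


def disambiguate(tokens: list[str], ask=input) -> int:
    """Return a single sense, asking a human when the heuristics leave several."""
    senses = candidate_senses(tokens)
    if not senses:
        raise ValueError("no contiguous всё равно found")
    if len(senses) == 1:
        return next(iter(senses))
    options = "/".join(str(s) for s in sorted(senses))
    return int(ask(f"{' '.join(tokens)}\nSense of 'всё равно' ({options})? "))


print(candidate_senses("Мне всё равно лететь".split()))                  # {1}
print(candidate_senses("Мне всё равно, лететь или не лететь.".split()))  # {2}
print(candidate_senses("Мне всё равно, чёрт возьми, чистить картошку или мыть туалет!".split()))  # {1, 2}
```

Only the last sentence reaches the interactive step. Passing `ask` as a parameter lets the same code run with a console prompt, a GUI dialog or a test stub.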

Russian Syntax of Quantification

Several syntactic relations are involved: quantitative, quantitative-auxiliary, approximative-quantitative, and approximative-ordinal.

Approximative-Ordinal Syntactic Relation

(1) Он приедет числа двадцатого 'He will come on approximately the twentieth'.
(2) Вчерашний день, часу в шестом, / Зашел я на Сенную 'Yesterday, at about six o'clock (lit. in the sixth hour), I went to the Hay Square' (Nikolay Nekrasov).
(3) *Машина остановилась цикле на первом 'The machine stopped at about the first cycle'.
(4) Она вернулась только часу в первом 'She returned only between twelve and one'.
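In (2) and (4), the ordinal hour names an interval rather than a point. The N-th hour of a twelve-hour clock runs from N-1 to N o'clock, so часу в первом is 'between twelve and one' and часу в шестом is 'between five and six'. Below is a minimal Python sketch of that arithmetic; the function name is hypothetical.

```python
def hour_interval(ordinal: int) -> tuple[int, int]:
    """Clock span meant by 'часу в N-ом' (in the N-th hour) on a 12-hour clock."""
    if not 1 <= ordinal <= 12:
        raise ValueError("ordinal hour must be between 1 and 12")
    start = 12 if ordinal == 1 else ordinal - 1   # the first hour starts at twelve
    return start, ordinal


print(hour_interval(6))  # (5, 6): часу в шестом
print(hour_interval(1))  # (12, 1): часу в первом
```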

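Quantitative Syntactic Relation

(1a) Книга называется "Три товарища". 'The book is called "Three Comrades".'
(1b) Книга называется "Двадцать три товарища". 'The book is called "Twenty-Three Comrades".'
(2a) Он увидел трех товарищей. 'He saw three comrades.'
(2b) *Он увидел двадцать трех товарищей.
(2c) Он увидел двадцать три товарища. 'He saw twenty-three comrades.'
(3) Он поговорил с тремя товарищами. 'He talked with three comrades.'
(4) Он знал одного лингвиста. 'He knew one linguist.'
(5) Он знал двадцать одного лингвиста. 'He knew twenty-one linguists.'
(6a) Имеется десять красок. 'There are ten paints.'
(6b) Имеется примерно десять красок. 'There are approximately ten paints.'
(6c) Имеется десять различных акварельных красок. 'There are ten different watercolor paints.'
(6d) Имеется примерно десять различных акварельных красок. 'There are approximately ten different watercolor paints.'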

Approximative-Quantitative Syntactic Relation

(1) Мы провели там часа два. 'We spent about two hours there.'
(2) Можно уйти часа в два. 'We may leave at about two o'clock.'
(3) Он заработает тысяч пять с половиной. 'He will earn about five and a half thousand.'
(4) Он заработает тысяч пять с половиной рублей. 'He will earn about five and a half thousand roubles.'

Quantitative and Approximative-Quantitative Syntactic Relations

(1a) Книга называется "Три товарища".
(1b) *Книга называется "Товарища три".
(2a) Имеется десять красок.
(2b) Имеется примерно десять красок.
(2c) ?Имеется красок десять.
(3a) Имеется десять различных акварельных красок.
(3b) Имеется примерно десять различных акварельных красок.
(3c) *Имеется различных акварельных красок десять.

(4a) Я потратил двадцать два рубля. 'I spent twenty-two roubles.'
(4b) Я потратил примерно двадцать два рубля. 'I spent approximately twenty-two roubles.'
(4c) Я потратил рубля двадцать два. 'I spent about twenty-two roubles.'
(5a) Я потратил двадцать один рубль. 'I spent twenty-one roubles.'
(5b) Я потратил примерно двадцать один рубль. 'I spent approximately twenty-one roubles.'
(5c) ??Я потратил рубль двадцать один.
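In (4a) and (5a), the noun takes different forms: рубля after двадцать два, but рубль after двадцать один. This follows the familiar rule that the last word of a compound numeral determines the form of the noun. The Python sketch below is a minimal illustration of that rule for the nominative and inanimate accusative only. Oblique cases, as in (3) с тремя товарищами, follow a different pattern, and the animate accusative has its own complications. The function name and the form labels are hypothetical.

```python
def noun_form_after_numeral(n: int) -> str:
    """Form of a noun after cardinal n (nominative / inanimate accusative only)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    last_two, last = n % 100, n % 10
    if 11 <= last_two <= 14:
        return "gen.pl"   # одиннадцать рублей, двенадцать рублей
    if last == 1:
        return "nom.sg"   # двадцать один рубль
    if last in (2, 3, 4):
        return "gen.sg"   # двадцать два рубля
    return "gen.pl"       # десять красок, двадцать пять рублей


FORMS = {"nom.sg": "рубль", "gen.sg": "рубля", "gen.pl": "рублей"}
for n in (21, 22, 25, 11, 10):
    print(n, FORMS[noun_form_after_numeral(n)])
```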