Speech and Language Processing: Lecture 3 (Chapter 3 of SLP)

7/24/2015 Speech and Language Processing - Jurafsky and Martin

2. Today: English Morphology; Finite-State Transducers

3. Words: Finite-state methods are particularly useful in dealing with a lexicon. Many devices, most with limited memory, need access to large lists of words, and they need to perform fairly sophisticated tasks with those lists. So we'll first talk about some facts about words and then come back to computational methods.

4. English Morphology: Morphology is the study of the ways that words are built up from smaller meaningful units called morphemes. We can usefully divide morphemes into two classes. Stems: the core meaning-bearing units. Affixes: bits and pieces that adhere to stems to change their meanings and grammatical functions.

5. English Morphology: We can further divide morphology into two broad classes: inflectional and derivational.

6. Word Classes: By word class, we have in mind familiar notions like noun and verb. We'll go into the gory details in Chapter 5. Right now we're concerned with word classes because the way that stems and affixes combine is based to a large degree on the word class of the stem.

7. Inflectional Morphology: Inflectional morphology concerns the combination of stems and affixes where the resulting word has the same word class as the original and serves a grammatical/semantic purpose that is different from the original but nevertheless transparently related to it.

8. Nouns and Verbs in English: Nouns are simple: markers for plural and possessive. Verbs are only slightly more complex: markers appropriate to the tense of the verb.

9. Regulars and Irregulars: It is a little complicated by the fact that some words misbehave (refuse to follow the rules): mouse/mice, goose/geese, ox/oxen; go/went, fly/flew. The terms regular and irregular are used to refer to words that follow the rules and those that don't.

10. Regular and Irregular Verbs
Regulars:   walk, walks, walking, walked, walked
Irregulars: eat, eats, eating, ate, eaten
            catch, catches, catching, caught, caught
            cut, cuts, cutting, cut, cut

11. Inflectional Morphology: So inflectional morphology in English is fairly straightforward, but it is complicated by the fact that there are irregularities.

12. Derivational Morphology: Derivational morphology is the messy stuff that no one ever taught you: quasi-systematicity, irregular meaning change, and changes of word class.

13. Derivational Examples: Verbs and Adjectives to Nouns
-ation   computerize   computerization
-ee      appoint       appointee
-er      kill          killer
-ness    fuzzy         fuzziness

14. Derivational Examples: Nouns and Verbs to Adjectives
-al      computation   computational
-able    embrace       embraceable
-less    clue          clueless

15. Example: Compute
Many paths are possible. Start with compute:
computer -> computerize -> computerization
computer -> computerize -> computerizable
But not all paths/operations are equally good (allowable?). Start with clue:
clue -> *clueable

16. Morphology and FSAs: We'd like to use the machinery provided by FSAs to capture these facts about morphology: accept strings that are in the language, reject strings that are not, and do so in a way that doesn't require us to in effect list all the words in the language.

17. Start Simple: Regular singular nouns are OK. Regular plural nouns have an -s on the end. Irregulars are OK as is.
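The three rules above can be sketched as a small recognizer. This is only an illustration, not the machine from the lecture figures: the state names (q0, q1, q2), the arc labels, and the tiny word lists are assumptions made for the example.

```python
# Toy lexicon: arc label -> morphemes that may be read on that arc.
# The word lists are illustrative assumptions, not a real lexicon.
LEXICON = {
    "reg-noun": {"cat", "dog", "fox"},
    "irreg-sg-noun": {"goose", "mouse", "ox"},
    "irreg-pl-noun": {"geese", "mice", "oxen"},
    "plural-s": {"s"},
}

# (source state, arc label) -> destination state
ARCS = {
    ("q0", "reg-noun"): "q1",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
    ("q1", "plural-s"): "q2",
}
ACCEPT = {"q1", "q2"}


def accepts(word: str, state: str = "q0") -> bool:
    """Return True if some path through the machine consumes the whole word."""
    if word == "":
        return state in ACCEPT
    for (src, label), dst in ARCS.items():
        if src != state:
            continue
        for morph in LEXICON[label]:
            # Try every morpheme that matches the front of the remaining input.
            if word.startswith(morph) and accepts(word[len(morph):], dst):
                return True
    return False


if __name__ == "__main__":
    for w in ["cat", "cats", "geese", "gooses", "foxs"]:
        print(w, accepts(w))
```

Note that this machine accepts the misspelling foxs: it knows only that regular plurals take an -s, and nothing yet about spelling changes.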

18. Simple Rules [figure]

19. Now Plug in the Words [figure]

20. Derivational Rules: If everything is an accept state, how do things ever get rejected?

21. Parsing/Generation vs. Recognition: We can now run strings through these machines to recognize strings in the language. But recognition is usually not quite what we need. Often, if we find some string in the language, we might like to assign a structure to it (parsing). Or we might have some structure and want to produce a surface form for it (production/generation). Example: from cats to cat +N +PL.

22. Finite State Transducers: The simple story: add another tape, and add extra symbols to the transitions. On one tape we read cats; on the other we write cat +N +PL.

23. FSTs [figure]

24. Applications: The kind of parsing we're talking about is normally called morphological analysis. It can be either an important stand-alone component of many applications (spelling correction, information retrieval) or simply a link in a chain of further linguistic analysis.

25. Transitions: c:c means read a c on one tape and write a c on the other. +N:ε means read a +N symbol on one tape and write nothing on the other. +PL:s means read +PL and write an s.
c:c  a:a  t:t  +N:ε  +PL:s
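The transition notation can be made concrete with a minimal sketch of a single-path transducer for cat +N +PL and cats. The state numbers, the ARCS table layout, and the function names generate and parse are assumptions for illustration; a real lexicon transducer would share states across many words.

```python
EPS = ""  # epsilon: write nothing on the surface tape

# (state, lexical symbol) -> (surface symbol, next state)
ARCS = {
    (0, "c"): ("c", 1),
    (1, "a"): ("a", 2),
    (2, "t"): ("t", 3),
    (3, "+N"): (EPS, 4),
    (4, "+PL"): ("s", 5),
}
FINAL = {5}


def generate(lexical_symbols):
    """Read the lexical tape, write the surface tape (cat +N +PL -> cats)."""
    state, out = 0, []
    for sym in lexical_symbols:
        if (state, sym) not in ARCS:
            return None  # no transition: the input is rejected
        surface, state = ARCS[(state, sym)]
        out.append(surface)
    return "".join(out) if state in FINAL else None


def parse(surface, state=0):
    """Run the same arcs in the other direction (cats -> cat +N +PL)."""
    if state in FINAL and surface == "":
        return []
    for (src, lex), (surf, dst) in ARCS.items():
        # An epsilon surface symbol matches without consuming any input.
        if src == state and surface.startswith(surf):
            rest = parse(surface[len(surf):], dst)
            if rest is not None:
                return [lex] + rest
    return None


if __name__ == "__main__":
    print(generate(["c", "a", "t", "+N", "+PL"]))
    print(parse("cats"))
```

The same table serves both directions, which is the practical appeal of a transducer: generation and parsing differ only in which tape is treated as input.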

26. Typical Uses: Typically, we'll read from one tape using the first symbol on the machine transitions (just as in a simple FSA), and we'll write to the second tape using the other symbols on the transitions.

27. Ambiguity: Recall that in non-deterministic recognition multiple paths through a machine may lead to an accept state; it didn't matter which path was actually traversed. In FSTs the path to an accept state does matter, since different paths represent different parses and different outputs will result.

28. Ambiguity: What's the right parse (segmentation) for unionizable: union-ize-able or un-ion-ize-able? Each represents a valid path through the derivational morphology machine.

29. Ambiguity: There are a number of ways to deal with this problem: simply take the first output found; find all the possible outputs (all paths) and return them all (without choosing); or bias the search so that only one or a few likely paths are explored.

30. The Gory Details: Of course, it's not as easy as cat +N +PL -> cats. As we saw earlier, there are geese, mice and oxen. But there are also a whole host of spelling/pronunciation changes that go along with inflectional changes: cats vs. dogs, fox and foxes.

31. Multi-Tape Machines: To deal with these complications, we will add more tapes and use the output of one tape machine as the input to the next. So to handle irregular spelling changes we'll add intermediate tapes with intermediate symbols.

32. Multi-Level Tape Machines: We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape.

33. Lexical to Intermediate Level [figure]

34. Intermediate to Surface: The "add an e" rule, as in fox^s# -> foxes#.
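As a rough sketch, the rule can be approximated with a regular-expression rewrite instead of an explicit transducer. The context set used here (x, s, z before the morpheme boundary ^ and a word-final s#) follows the textbook's statement of E-insertion; the ch and sh cases are left out, and the function name e_insertion is an assumption.

```python
import re


def e_insertion(intermediate: str) -> str:
    """Map an intermediate form such as 'fox^s#' to a surface form such as 'foxes#'."""
    # Insert e where the boundary ^ follows x, s, or z and precedes word-final s.
    with_e = re.sub(r"([xsz])\^(?=s#)", r"\1e", intermediate)
    # Any remaining morpheme boundary is simply deleted on the surface tape.
    return with_e.replace("^", "")


if __name__ == "__main__":
    for form in ["fox^s#", "cat^s#", "dog#"]:
        print(form, "->", e_insertion(form))
```

Forms outside the rule's context, such as cat^s# or dog#, pass through with nothing changed except the removal of the boundary symbol.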

35. Foxes [figure]

36. Note: A key feature of this machine is that it doesn't do anything to inputs to which it doesn't apply, meaning that they are written out unchanged to the output tape.

37. Overall Scheme: We now have one FST that has explicit information about the lexicon (actual words, their spelling, facts about word classes and regularity); it maps the lexical level to intermediate forms. We have a larger set of machines that capture orthographic/spelling rules; they map intermediate forms to surface forms.

38. Overall Scheme [figure]

39. Cascades: This is an architecture that we'll see again and again. Overall processing is divided up into distinct rewrite steps. The output of one layer serves as the input to the next. The intermediate tapes may or may not wind up being useful in their own right.
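A minimal sketch of such a cascade, with plain functions standing in for the two transducers. The irregular-plural table, the +SG tag, and all function names are assumptions for illustration; only the lexical, intermediate, and surface levels come from the slides.

```python
import re

# Illustrative assumption: a tiny table of irregular plurals.
IRREGULAR_PLURALS = {"goose": "geese", "mouse": "mice", "ox": "oxen"}


def lexical_to_intermediate(lexical: str) -> str:
    """Stage 1: 'fox +N +PL' -> 'fox^s#' (lexicon knowledge lives here)."""
    stem, *tags = lexical.split()
    if tags == ["+N", "+PL"]:
        if stem in IRREGULAR_PLURALS:
            return IRREGULAR_PLURALS[stem] + "#"
        return stem + "^s#"
    if tags == ["+N", "+SG"]:
        return stem + "#"
    raise ValueError(f"unsupported analysis: {lexical!r}")


def intermediate_to_surface(intermediate: str) -> str:
    """Stage 2: spelling rules, then drop the boundary symbols ^ and #."""
    with_e = re.sub(r"([xsz])\^(?=s#)", r"\1e", intermediate)
    return with_e.replace("^", "").replace("#", "")


def generate(lexical: str):
    """Run the cascade and return every tape, including the intermediate one."""
    intermediate = lexical_to_intermediate(lexical)
    surface = intermediate_to_surface(intermediate)
    return intermediate, surface


if __name__ == "__main__":
    for analysis in ["fox +N +PL", "cat +N +PL", "goose +N +PL", "cat +N +SG"]:
        print(analysis, "->", generate(analysis))
```

Returning the intermediate tape alongside the surface form makes each layer easy to inspect on its own, even when a caller only needs the final output.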

40. Overall Plan [figure]

41. Final Scheme [figure]

42. Composition:
1. Create a set of new states that correspond to each pair of states from the original machines. (New states are called (x, y), where x is a state from M1 and y is a state from M2.)
2. Create a new FST transition table for the new machine according to the following intuition…

43. Composition: There should be a transition between two states in the new machine if it's the case that the output for a transition from a state from M1 is the same as the input to a transition from M2 or…

44. Composition: $\delta_3((x_a, y_a),\ i{:}o) = (x_b, y_b)$ iff there exists $c$ such that $\delta_1(x_a,\ i{:}c) = x_b$ and $\delta_2(y_a,\ c{:}o) = y_b$.
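The definition can be sketched directly in code. The representation (each machine as a set of (source, input, output, destination) arcs) and the name compose are assumptions for illustration. Epsilon transitions need extra care in a full composition algorithm and are not handled here.

```python
def compose(m1, m2):
    """Compose two transducers given as sets of (src, inp, out, dst) arcs.

    The new machine has an arc ((xa, ya), i, o, (xb, yb)) whenever M1 has
    (xa, i, c, xb) and M2 has (ya, c, o, yb) for some shared symbol c.
    Epsilon arcs are not treated specially in this sketch.
    """
    arcs = set()
    for (xa, i, c1, xb) in m1:
        for (ya, c2, o, yb) in m2:
            if c1 == c2:  # M1's output matches M2's input
                arcs.add(((xa, ya), i, o, (xb, yb)))
    return arcs


if __name__ == "__main__":
    # Toy machines: M1 rewrites a as b, M2 rewrites b as c.
    m1 = {(0, "a", "b", 1)}
    m2 = {(0, "b", "c", 1)}
    print(compose(m1, m2))
```

In the toy example the composed machine maps a directly to c, so the intermediate symbol b never has to be written to a tape.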