Computer-aided translation (CAT) tools: Wordfast, OmegaT and the Google Translator Toolkit

Translation Memory
A Translation Memory (TM) is a database of translated segments, mostly a database of pairs of sentences.
Advantages of TM:
avoids having to re-translate anything that has already been translated;
allows workgroups to share translations that were previously done;
allows translators to build up a valuable database of translations.

TM-based CAT rests on two essential methods: segmentation and translation memory. Each of these methods boosts productivity in its own way.
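The two methods can be illustrated with a minimal Python sketch. It is not how Wordfast or OmegaT are implemented: the sentence-splitting rule, the `TranslationMemory` class and the `pretranslate` helper are assumptions made for illustration, and the sample translations are invented.

```python
import re


def segment(text):
    """Split text into sentence-like segments.

    Assumed rule: a segment ends at '.', '!' or '?' followed by whitespace.
    Real CAT tools use richer, configurable segmentation rules.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


class TranslationMemory:
    """A toy TM: a database of (source segment, target segment) pairs."""

    def __init__(self):
        self._units = {}

    def add(self, source, target):
        # Store one translation unit (TU).
        self._units[source] = target

    def lookup(self, source):
        # Exact match only; returns None when the segment is unknown.
        return self._units.get(source)


def pretranslate(text, tm):
    """Segment the text and attach the stored translation to each segment."""
    return [(seg, tm.lookup(seg)) for seg in segment(text)]


tm = TranslationMemory()
tm.add("Save the file.", "Enregistrez le fichier.")
result = pretranslate("Save the file. Close the window.", tm)
```

Here the first segment is already in the TM and does not have to be translated again, while the second comes back with `None` and is left for the translator.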

WORDFAST
Translation memory (TM)
Size: up to 1,000,000 Translation Units (TU) per TM.
Format: Wordfast uses TMs in either plain text format or Unicode format.
TM engine performance: the Wordfast TM engine is built to spot exact and/or fuzzy matches in less than half a second in most cases. If no fuzzy or exact match exists, Wordfast can retrieve expressions or text relevant to the source segment being translated.
Integration: the Wordfast TM engine is fully integrated into MS Word, so you do not need to run another application.
Networking: up to 20 simultaneous users can share the same TM over a LAN (Local Area Network).

Supported languages
Any of the languages supported by MS Word: all European, Latin-based languages, Chinese/Japanese/Korean, right-to-left languages (Arabic, Hebrew), Cyrillic, in addition to Central European, Greek, various forms of Hindi, numerous minority languages, etc.
Wordfast (WF) can use up to three glossaries simultaneously.
Size: the size of a glossary in Wordfast has been deliberately limited to 250,000 entries. Most project-specific glossaries supplied by clients have far fewer than 10,000 entries, and closer to 1,000 for most.

For fuzzy propositions (those whose analogy rate lies between the "fuzzy threshold" defined in Wordfast, which is 75 by default, and 99), the target segment will be proposed against a yellow background. For an unknown source segment, i.e. one that cannot be found in the TM or whose analogy rate falls below the fuzzy threshold, the target segment will be empty and displayed against a gray background.
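The threshold rule can be sketched in Python as follows. This is only an illustration: Wordfast's actual analogy-rate algorithm is not described here, so the sketch substitutes the similarity ratio from the standard `difflib` module, and the names `analogy_rate`, `classify` and `best_match` are hypothetical. The background colour for exact matches is not stated in the text, so none is assigned.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 75  # default fuzzy threshold quoted for Wordfast

# Background colours as described in the text (exact matches: not stated).
BACKGROUND = {"fuzzy": "yellow", "none": "gray"}


def analogy_rate(a, b):
    """Stand-in similarity score from 0 to 100 (an assumption, not Wordfast's formula)."""
    return int(SequenceMatcher(None, a, b).ratio() * 100)


def classify(rate, threshold=FUZZY_THRESHOLD):
    """Map an analogy rate to 'exact', 'fuzzy' or 'none'."""
    if rate >= 100:
        return "exact"
    if rate >= threshold:
        return "fuzzy"
    return "none"


def best_match(source, tm, threshold=FUZZY_THRESHOLD):
    """Return (kind, rate, proposed target) for the best-scoring TM entry.

    `tm` is a plain dict mapping source segments to target segments.
    Below the threshold, the proposed target is empty.
    """
    best_rate, best_target = 0, ""
    for tm_source, tm_target in tm.items():
        rate = analogy_rate(source, tm_source)
        if rate > best_rate:
            best_rate, best_target = rate, tm_target
    kind = classify(best_rate, threshold)
    if kind == "none":
        best_target = ""
    return kind, best_rate, best_target
```

Passing a different `threshold` models a tool or a user setting with a lower or higher cut-off.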

OmegaT TM-based tool (CAT)

OmegaT features
OmegaT was first developed by Keith Godfrey in 2000 and is currently developed by a team led by Didier Briel. The name OmegaT is a registered trademark in Germany. Its first public release was in February 2001.
OmegaT is free.
OmegaT is intended for professional translators. Its features include customisable segmentation using regular expressions, translation memory with fuzzy matching and match propagation, glossary matching, dictionary matching, translation memory and reference material searching, and inline spell-checking using Hunspell spelling dictionaries.
OmegaT runs on Linux, Mac OS X and Microsoft Windows 2000 or higher, and requires Java 1.5. It is available in 27 languages. Ukrainian is encoded as UK-01.
According to a 2010 survey of 458 professional translators, OmegaT is used one third as much as Wordfast, DejaVu and MemoQ, and one eighth as much as the market leader, Trados.
There is a "standard" version, which always has a complete user manual, and a "latest" version, which includes features that are not yet documented in the user manual.

For glossaries, OmegaT mainly uses tab-delimited plain text files in UTF-8 encoding with the .txt extension. The structure of a glossary file is extremely simple: the first column contains the source language word, the second column contains the corresponding target language words, and the third column (optional) can contain anything, including comments on context (compare Wordfast). You can create your own glossary.txt file simply by typing a word, pressing Tab, and typing a translation.
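The file structure described above can be read with a few lines of Python. This is a minimal sketch, not OmegaT's own parser: the function names `parse_glossary` and `load_glossary` and the sample entries are made up for illustration.

```python
def parse_glossary(text):
    """Parse tab-delimited glossary text into (source, target, comment) tuples.

    Column 1: source term; column 2: target term(s); column 3: optional comment.
    Blank lines and lines without a second column are skipped.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        columns = line.split("\t")
        if len(columns) < 2:
            continue
        source = columns[0].strip()
        target = columns[1].strip()
        comment = columns[2].strip() if len(columns) > 2 else ""
        entries.append((source, target, comment))
    return entries


def load_glossary(path):
    """Read a UTF-8 glossary file, e.g. glossary.txt, from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_glossary(handle.read())


# Hypothetical English-Italian sample: word, Tab, translation, optional comment.
sample = "translation memory\tmemoria di traduzione\tCAT term\nglossary\tglossario\n"
entries = parse_glossary(sample)
```

The second sample line has no third column, so its comment comes back as an empty string.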

The default fuzzy match threshold is 30 per cent in OmegaT and 75 per cent in Wordfast. To move to the next segment in OmegaT, press ENTER (compare Wordfast).

The Google Translator Toolkit

A bit of history
The Google Translator Toolkit (GTT) was released in June 2009 as a result of the emergence of massive online collaboration. The traditional focus has been on what we could call MT-assisted TM, but the GTT is the first tool to fall into the opposite category of TM-assisted MT. Whereas in MT-assisted TM the translator still uses a TM editor, the GTT (when the preferred "Pre-fill with machine translation" option is chosen) actually involves working in an MT editor. Thus, the GTT appears to signal a whole new era of translating. In 2001 Yves Champollion, developer of the well-known TM tool Wordfast, was already envisaging such a future, in which translators would simply be proofreading MT.

Professional translators accustomed to working with TM, and trainers in the computer-assisted translation (CAT) area, seem reluctant to accept that there may come a time when proofreading MT is more efficient than translating in the traditional way. Some scholars still maintain that, however much post-editing is done, translating from scratch will always produce better results.

The fact that the GTT is free on the Internet and runs in MT mode by default clearly indicates that Google, at least, believes the proofreading-MT model will prevail.

Potential users
The Google Translator Toolkit not only goes a step further but, coming from outside the translation industry, is addressed not to the professional translator who works in localization*, but to a new type of user that traditional TM never addressed: any web-enabled and motivated bilingual.
* LOCALIZATION is the process of translating a product into different languages or adapting a product for a specific country or region; the adaptation of computer software for non-native environments, especially other nations and cultures.

GTT features
As its introduction says, the GTT is a powerful and easy-to-use editor that helps translators work faster and better.
Upload and translate documents: use documents from your desktop or the web.
Download and publish translations: publish translations to Wikipedia or Knol.
Chat and share translations online: collaborate online with other translators.
Use advanced tools: use features like translation memories and multilingual glossaries.

GENERALISING AND SYSTEMISING THE ADVANTAGES AND DISADVANTAGES OF THE GTT
The data showed no obvious time advantage to be gained from post-editing, so the first hypothesis, that translating from MT will produce faster results, is not supported. The second hypothesis, that translating from the MT version of the source would produce results similar to those produced when translating from the source text (ST) in the traditional way, is supported. This is a claim that could revolutionise the way in which translation is performed and taught in the future.

Advantages
With the GTT, a translator can:
upload documents from any source and in popular formats, e.g. HTML (.html), MS Word (.doc), plain text (.txt);
share the document with anyone on the Net;
keep the translation open for further proofreading;
enrich the TM base globally.

Disadvantage(s)
Wrong or poor translations can accumulate as trash on the Internet.
The translator has to create a Google account.

CONCLUSIONS
Previously, machine translation has been used in two modes: on its own, to gist information written in a language unfamiliar to the user; and along with controlled language input and post-edited output, as a cheap and fast alternative to full human translation when quality is not a priority. The GTT offers a third possibility: MT can be used as the intermediate format to achieve full natural-language, outbound-quality translation.

The trend towards TM-MT integration is the localization industry's response to the limitations of TM. Until now, TM-MT integration has taken the form of MT-assisted TM, in which database matches are offered where available and no-match segments are seeded by MT. The GTT is already pointing to the next stage, TM-assisted MT, in which the match retrieval, terminology and quality assurance now available in the TM editor will take place directly in the MT window.
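The MT-assisted TM workflow described here, with TM matches first and MT as the fallback, can be sketched in Python. The sketch is an assumption-laden illustration, not the GTT's or any tool's implementation: `prefill` is a hypothetical helper, the TM is a plain dict with exact matching only, and `fake_mt` is a stand-in for a real machine translation engine.

```python
def prefill(segments, tm, machine_translate):
    """Pre-fill each segment: use the TM match if one exists, otherwise seed with MT.

    Returns a list of (source, proposed target, origin) tuples, where origin is
    "TM" or "MT" so the translator knows what is being proofread.
    """
    result = []
    for seg in segments:
        if seg in tm:
            result.append((seg, tm[seg], "TM"))
        else:
            result.append((seg, machine_translate(seg), "MT"))
    return result


def fake_mt(segment):
    """Placeholder MT engine (hypothetical): just tags the source text."""
    return "[MT] " + segment


tm = {"Save the file.": "Enregistrez le fichier."}
proposals = prefill(["Save the file.", "Close the window."], tm, fake_mt)
```

In either mode the translator ends up reviewing a pre-filled target; what changes between MT-assisted TM and TM-assisted MT is which editor and which source of proposals is primary.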