Machine translation (MT) was long ridiculed. Since the success of Google Translate and DeepL in recent years, however, it has been considered an important technology for producing translations. Expectations are high: reduced translation costs with equivalent quality, thanks to post-editing by trained revisers. Can it be done? Yes, but not for all documentation types and not for all purposes.
Experienced in machine translation since 2008
D.O.G. started to engage actively with machine translation as far back as 2008. At that time, the company was involved in a joint research project with the French university ISIT: "Enhancing Machine Translation Quality with ErrorSpy". This project resulted in functions in our quality assurance software ErrorSpy that recognize and correct MT errors more effectively.
Our MT offer
- We sit down with you and advise you on the advantages and disadvantages of various alternatives.
- Together with you, we work out a specification sheet in which systems, workflows, integration requirements and quality guidelines are defined.
- We use the most suitable MT system for you, e.g. a Neural Machine Translation System (NMT), which we train with your data.
- We set up the quality management system and employ a team of post-editors to correct the machine-generated texts according to the agreed criteria. We use our quality assurance software and terminology agreed with you, which we can maintain together with you in our terminology management system LookUp if you wish.
- We continuously maintain the language resources, such as translation memories and terminology, which are important to ensure optimum training of the translation engine.
- You have a dedicated D.O.G. contact person who coordinates the team of developers, post-editors and translators for your projects.
If you want to travel to Rome, you can choose between different means of transport. The same applies to machine translation. There is no single system and no single procedure, but several alternatives depending on your objectives, data volume and budget. You can therefore choose between alternatives such as the following:
- Pre-trained neural systems (NMT), which have been trained, for example, on large amounts of data in a technical domain
- Neural systems (NMT), which are specially trained on your data
- Statistical machine translation systems (SMT), which are more suitable in certain situations because they handle specialized domain terminology better
- There are also other options to consider. We are happy to advise you about this.
Do you want to set up a translation portal on your intranet that delivers fully automated translations? Do you want to support translators while they work with CAT tools (computer-aided translation, i.e. translation memory systems)? Or do you need post-edited documents whose quality is similar to that of human translations?
Different workflows and implementations are created depending on the objectives. Here, too, our team of specialists and software developers will help you implement the appropriate solution.
Machine translation works but not for all texts
Companies produce and use a lot of information. This information serves different purposes (pure information and communication, advertising, operation of machines, etc.), may be legally binding or not (like the operating instructions required by the European Machinery Directive 2006/42/EC), and may use complex formulations or be highly standardized.
These characteristics and criteria will determine whether the use of machine translation is recommended or not.
Generally speaking, it is important to understand that machine translation works better when:
- the text is written simply, with short sentences, few ambiguities, and standardized terminology and syntax. Better still: the author knows how MT works and writes texts that are optimized for machine translation.
- the time factor is paramount: MT is clearly faster, even with post-editing.
- the text is for information purposes only. This is the case, for example, with internal communication, when you want to know "what is in the document/mail", or when technical customer service needs to understand what kind of problem has to be solved.
- the subject area and language combination have previously been learned by the translation engine. This happens in the machine learning phase, when a relatively large volume of bilingual texts (mainly segments from translation memories) is processed by the deep learning algorithm.
- quality losses compared to human translation are acceptable: a correct translation is not necessarily a good translation. In addition, the residual risk of a mistranslation is greater with machine translation than with human translators.
- the volume of text is sufficient to operate the system economically.
MT does not necessarily compete with translation memory systems. If anything, it is yet another tool that can be used to produce translations.
Machines make different mistakes
People make mistakes; machines do too, but they make different ones. For example, they translate proper names, fail to understand many abbreviations, or add information that does not appear in the source text. They also sometimes omit words in the translation. This is not always easy to spot, especially if the translated sentence sounds good in every other respect.
That is why it is important to train post-editors and inform them about the types of errors they must look for. The typology of these errors differs depending on the type of translation engine used: statistical machine translation (SMT) systems make different errors than neural machine translation (NMT) systems.
Since its foundation more than 20 years ago, D.O.G. has been intensively involved with the topic of translation quality and since 2008, specifically with the quality of machine translation systems. We have developed our own tools and metrics to monitor this quality. Before machine translation starts, we identify the segments that are not suitable for the MT process and sort them out or mark them for further processing.
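The pre-sorting step described above can be sketched in a few lines. The criteria and limits below are illustrative assumptions, not D.O.G.'s actual selection rules:

```python
# Sketch: flagging source segments that are poor candidates for MT.
# Criteria and thresholds are invented for illustration.

def mt_suitable(segment, max_words=25):
    """Return True if a segment looks suitable for machine translation."""
    words = segment.split()
    if len(words) > max_words:          # very long sentences degrade MT output
        return False
    if not segment.rstrip().endswith((".", "!", "?", ":")):
        return False                    # fragments and labels often mistranslate
    return True

segments = [
    "Tighten the screws to the specified torque.",
    "see fig. 4",                       # fragment: sort out or mark for review
]
print([mt_suitable(s) for s in segments])  # [True, False]
```

Segments flagged as unsuitable can then be routed to a human translator or marked for special handling, as the text describes.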
You can profit from our experience.
The data and the training
Good data is crucial for machine learning. True to the motto "garbage in, garbage out", the success of machine translation depends heavily on how extensive and how clean the training data is. Translation memories (TMs) are the primary candidates for training. Due to their creation history, however, TMs contain many segments that impede the training process: empty segments, incorrectly assigned translations (e.g. in the case of incorrectly segmented sentences) or inconsistent terms and wording.
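A TM cleaning pass of the kind described above might look like the following sketch. The heuristics and thresholds are illustrative assumptions, not D.O.G.'s actual cleaning pipeline:

```python
# Sketch: filtering translation-memory segment pairs before MT training.
# Heuristics (length ratio, duplicate removal) are illustrative only.

def clean_tm(pairs):
    """Keep only segment pairs that look usable for training.

    pairs: list of (source, target) strings.
    """
    cleaned = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:              # empty segments
            continue
        if src == tgt:                      # untranslated copies
            continue
        ratio = len(src) / len(tgt)
        if ratio > 3 or ratio < 1 / 3:      # likely misalignment
            continue
        cleaned.append((src, tgt))
    # Remove exact duplicates while preserving order
    return list(dict.fromkeys(cleaned))

pairs = [
    ("Press the start button.", "Drücken Sie die Starttaste."),
    ("", "Leeres Segment"),                  # empty source: dropped
    ("Error", "Ein sehr langer, offensichtlich falsch zugeordneter Satz."),
    ("Press the start button.", "Drücken Sie die Starttaste."),  # duplicate
]
print(clean_tm(pairs))  # only the first pair survives
```

Real cleaning also has to address inconsistent terminology across segments, which requires a termbase rather than simple per-pair rules.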
Training is not a one-time event. New texts and topics are added. Machine errors can be corrected by retraining. Our post-editors give us regular feedback on typical recurring machine errors. Our developers use this information for regular training of the translation engines. Thus, over time, the engine adapts itself better and better to your linguistic wishes and special features.
You can also benefit from our experience here. We help you optimize the training data to achieve a better result. We have developed methods and tools for this purpose.
Many companies need machine translation solutions in connection with post-editing. Although the ISO 18587:2017 standard on post-editing of machine translation output defines two post-editing levels, full and light post-editing, in practice the criteria for acceptable quality depend strongly on the objectives pursued and the engines used. We help you define the optimal quality objectives for the work of the post-editors for the solution you have chosen. This results in guidelines for the post-editors' work.
Machine translation and data security
One important reason why many companies opt for machine translation is data security. No company management wants to be responsible for the unwanted disclosure of confidential information because careless employees use a free translation service on the Internet to translate it.
The engines we train for you are either located on our own server in Germany, which is operated in accordance with EU law (General Data Protection Regulation, GDPR), or we work using secured cloud services in Germany.
You also have the option to install the trained engine on your own server.
The success of machine translation solutions stands and falls with the ability to detect translation errors and use the correct special domain terminology. This is where D.O.G. offers unique technological advantages. Since the foundation of the company more than 20 years ago, we have placed great emphasis on developing software products that support the quality of our services. In particular, ErrorSpy and LookUp help us with this.
LookUp is an intelligent terminology management system. LookUp makes it possible to record the relations between terms, making context information available. Our software-supported checks take these relations and your terminology into account. Consequently, our post-editors are better able to detect machine translation errors.
ErrorSpy supports the post-editor in detecting and correcting machine translation errors:
- ErrorSpy can detect missing translations
- ErrorSpy reports terminology errors
- ErrorSpy checks the punctuation, the numbers, the consistency of the translation
- ErrorSpy also works with regular expressions. Typical machine errors can be stored as a regular expression and automatically detected.
- ErrorSpy recognizes the context and can report incorrect translations in a specific context (e.g. when translating a word like "output").
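Checks of the kind listed above can be sketched as simple rules over segment pairs. The rules below are illustrative examples of the approach, not ErrorSpy's actual rule set or API:

```python
import re

# Sketch: rule-based QA checks for post-editing, illustrative only.
# Each rule flags a typical machine translation error pattern.

RULES = [
    ("number mismatch",
     lambda src, tgt: sorted(re.findall(r"\d+(?:[.,]\d+)?", src))
                      != sorted(re.findall(r"\d+(?:[.,]\d+)?", tgt))),
    ("missing translation",
     lambda src, tgt: bool(src.strip()) and not tgt.strip()),
    ("double space (typical machine artifact)",
     lambda src, tgt: "  " in tgt),
]

def check_segment(src, tgt):
    """Return the names of all rules that flag this segment pair."""
    return [name for name, rule in RULES if rule(src, tgt)]

print(check_segment("Tighten to 12 Nm.", "Mit 10 Nm  anziehen."))
# ['number mismatch', 'double space (typical machine artifact)']
```

Because typical machine errors recur, encoding them as rules lets the post-editor concentrate on the errors that automated checks cannot catch.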
Costs and benefits
Implementing a machine translation solution for your company is a project with two main components: setting up the system and using the system.
Setting up a machine translation system: a model must be trained for each language combination. For this purpose, the system requires relatively large amounts of high-quality bilingual data. The training is performed on special computers with GPUs (Graphics Processing Units), which allow large-scale parallel processing. The training can take several days of machine time, and the training parameters must be optimized in various test runs. This is a cost factor.
Operating and using the system: even in the operational phase, the system must learn from its mistakes and be trained with new topics and data. Finally, there is the cost of post-editing, which essentially reflects the time that post-editors invest in correcting machine errors.
We calculate the costs for setting up the solution (one-off costs) and for using the system separately. This means that you pay a price per word for the day-to-day operation of the system, which is lower than the cost of a human translation. After agreeing on the key data for your solution, we can make you an offer.
The cost-effectiveness of an MT solution ultimately depends on its deployment model. If you are already achieving large savings by using translation memory systems and would like to translate the same documents by machine in the future, the additional savings could be quite low. However, if you translate new texts and content that you have not previously translated, or have translated without translation technologies, then the profitability of a machine translation solution increases very quickly.
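The trade-off between one-off setup costs and a lower per-word price can be reduced to a simple break-even estimate. All figures below are invented for illustration; actual prices depend on the agreed solution:

```python
# Sketch: break-even estimate for an MT solution with one-off setup costs
# and a lower per-word price than human translation. Figures are assumed.

def breakeven_words(setup_cost, human_rate, mt_rate):
    """Word volume at which the MT setup cost pays for itself."""
    return setup_cost / (human_rate - mt_rate)

setup = 8000.0   # one-off training/setup cost in EUR (assumed)
human = 0.18     # price per word, human translation (assumed)
mt = 0.10        # price per word, MT plus post-editing (assumed)

print(round(breakeven_words(setup, human, mt)))  # 100000 words
```

Above the break-even volume, every additional word translated by machine widens the savings, which is why high-volume, previously untranslated content profits most.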
Request your offer free of charge and without any commitment: