Translation Memory Cleanup
Cleanup of translation memories in multiple languages
Maintenance of your translation databases
Translation memory maintenance reduces translation costs
In many companies, translation memories are at the heart of translation production. They are part of a company's so-called linguistic assets. Translation memories contain previously translated sentences (called segments) created by human translators or machine translation systems. Once translated, content can be reused, which saves work and costs. At the same time, this reuse ensures consistent translation of texts and thus higher quality.
However, if a translation memory contains errors or is outdated, this can cause major problems with your translations. Erroneous translation memories occur, for example, when special terms have changed or when several translators have provided different translations over time without standardizing their style or terminology. Translation errors resulting from a lack of knowledge about a company’s products or contextual meanings spread like a virus as soon as they enter a translation memory unnoticed.
To avoid erroneous translation memories, it is important to regularly update and maintain your translation memories. This task certainly involves costs. Moreover, this work requires linguistic and special subject knowledge in several languages. That’s exactly what we are here for, and we know how to keep costs down for this type of task. We can assist you with tools and methods that will help you keep your translation memories at a high level of quality.
If you maintain your translation memories regularly, you can be sure that you will save time and money and reuse correct translations of the best possible quality.
Which translation memory formats do we process?
Overall, there are many different file formats for translation memories or translated segments. We can handle most of them.
Translation Memories (TMs)
TMs are usually stored in a database, as is the case with common translation memory systems such as Across, Trados, and memoQ. They can be exchanged in various formats, primarily TMX or one of its tool-specific variants.
TMX (Translation Memories eXchange)
This is a widely used file format for translation memories based on XML (eXtensible Markup Language). TMX files are easily interchangeable and can be used with a wide range of translation tools and software programs.
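To illustrate how simple the TMX structure is, the following minimal sketch reads translation units from a TMX string using only the Python standard library. The sample content and the function name `read_tmx` are assumptions made for this illustration; real TMX files usually carry more header attributes, metadata, and inline tags.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xml:lang attribute carried by <tuv> elements.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample file content, kept deliberately small.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          creationtool="example" creationtoolversion="1.0"
          adminlang="en" o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Do not stay under suspended loads.</seg></tuv>
      <tuv xml:lang="de"><seg>Nicht unter schwebenden Lasten aufhalten.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

def read_tmx(tmx_text):
    """Return one dict per translation unit, mapping language code to segment text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG)
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() joins all text inside <seg>, including text nested in inline elements.
                unit[lang] = "".join(seg.itertext())
        units.append(unit)
    return units

units = read_tmx(SAMPLE_TMX)
print(units)
```

Each translation unit (`tu`) holds one variant (`tuv`) per language, which is why the same file can carry any number of language pairs.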
XLIFF (XML Localization Interchange File Format)
This is another XML-based file format for translated segments that is widely used in the localization industry. There are some variants of XLIFF such as SDLXLIFF, which is generated by Trados.
SDLTM (proprietary format for Trados databases)
SDLTM is Trados Studio’s internal format for translation memories. An SDLTM file is an SQLite database, and its content can be exported to TMX.
CSV (Comma-Separated Values)
This is a simple file format commonly used for storing data in tabular form: each line is a row, and the columns within a row are separated by commas. CSV files are relatively easy to handle and can be used with a wide range of software programs and tools.
PHP is a widely used open-source general-purpose scripting language that is particularly suitable for web development and can be embedded in HTML.
JSON is a data format used to store data in a structured way. It is often used to store data in a database or to transfer data between different parts of a web application.
Translation memory cleanup challenges
We have the necessary knowledge
Key aspects of translation database cleansing
The top six challenges in cleaning up translation memories are:
Identifying incorrect data: Cleaning up a TM requires identifying, finding, and correcting inaccurate or incomplete data. This can be a time-consuming process and requires knowledge of various error patterns in TMs.
Duplicates: Duplicate entries are not just 100% identical strings. Different phrases for the same content must be identified and either eliminated or merged.
Incomplete segments: Truncated segments occur when a text is not properly segmented before translation begins. This can lead to incorrect translations and requires manual correction.
Contextual errors: Contextual errors occur when the source segment has been translated correctly but does not fit the context of the target language. In this case, an expert translator must review the translation and check it for accuracy.
Wrong terminology: Wrong terminology can lead to errors in translation memories. In this case, manual correction of the terms used in the TM is required.
Translation errors: Incorrect translations can occur for various reasons. Detecting errors in meaning requires translation expertise.
Translation memory cleanup in 6 steps
Data cleansing procedure
This is how the cleanup of your translation memories works:
Cleanup and maintenance of translation memories
Our customers work in these industries
Translation databases in the following languages
We clean translation databases in a large number of language combinations. Very popular combinations for TMs are:
Quality and time savings through clean TMs
The cleaning of translation databases in detail
The analysis of translation memories
Before we start the analysis, we discuss with our clients the goal they want to achieve. Some of the aspects we examine during a translation memory analysis are:
The analysis aims to understand the strengths and weaknesses of the memory and identify areas where it can be improved.
Data cleansing: The types of error
Here are some of the typical errors that can be found in a TM:
Duplicates: These occur when a sentence has multiple translations. We also look for sentences in the source language that have essentially the same meaning, which leads to unnecessary translation variants.
False translations: These occur when the translation does not accurately reflect the meaning of the source language. Sometimes translations come from neural machine translation engines and contain so-called “hallucinations”, i.e., words that do not exist in the source text.
Terminology errors: These occur when the terminology used in the translation memory is inconsistent or even incorrect.
Spelling errors: These occur when the translation (or the source text) has not been spell-checked.
Formal errors: These occur when formal aspects are incorrect, such as missing closing parentheses, incorrect numbers, or incorrect encoding of special characters.
Incomplete sentences: These occur when the text to be translated has not been segmented correctly. This results in incomplete segments, which can lead to larger errors because languages differ in their syntactic and morphological structure.
Punctuation errors: These occur when the translation contains incorrect punctuation or when punctuation marks are missing.
Deprecated translations: These may contain, for example, incorrect references, links, terms, or product names.
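Some of the error types listed above can be detected mechanically. The following is a minimal sketch, not a description of any particular tool: it flags source sentences with more than one translation, number mismatches, and unbalanced parentheses. The sample segments and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample data: (source segment, target segment) pairs.
SEGMENTS = [
    ("Tighten the screw (M8).", "Schraube (M8) festziehen."),
    ("Tighten the screw (M8).", "Die Schraube (M8 anziehen."),
    ("Wait 30 seconds.", "20 Sekunden warten."),
    ("Switch off the device.", "Gerät ausschalten."),
]

def inconsistent_sources(pairs):
    """Return source segments that have more than one distinct translation."""
    targets = defaultdict(set)
    for source, target in pairs:
        targets[source].add(target)
    return {s: sorted(t) for s, t in targets.items() if len(t) > 1}

def numbers_differ(source, target):
    """True if the digit sequences in source and target are not the same."""
    return sorted(re.findall(r"\d+", source)) != sorted(re.findall(r"\d+", target))

def unbalanced_parentheses(text):
    """True if opening and closing round brackets do not pair up."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return True
    return depth != 0

duplicates = inconsistent_sources(SEGMENTS)
number_errors = [pair for pair in SEGMENTS if numbers_differ(*pair)]
bracket_errors = [pair for pair in SEGMENTS if unbalanced_parentheses(pair[1])]

print(duplicates)
print(number_errors)
print(bracket_errors)
```

Checks like these only produce candidates. The number check, for instance, is naive and would flag legitimate differences such as locale-specific number formats, so a reviewer still decides which hits are real errors.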
About the importance of metadata
Metadata and attributes are important in translation memories because they provide information about each translated segment to which they are assigned. TM attributes can contain a variety of information, such as the name of the translated document, the project number, the acquisition or modification date, the frequency of translation reuse, the segment origin (e.g., alignment or MT), or the editing status of the segment. This information can be very useful in various contexts, such as when using TMs to train an MT engine.
Specific benefits of metadata include:
Technologies used for data cleansing
There are several tools and methods for cleaning translation memories that can be used depending on the task.
The most important tool we use is ErrorSpy, our translation quality assurance software. We started developing ErrorSpy about 20 years ago, and it has become a Swiss Army knife of quality control. Within seconds, ErrorSpy provides a list of possible errors, such as terminology, number, or consistency errors, for our reviewers to sift through.
We also work with regular expressions that allow us to recognize certain patterns in translation memories and automatically correct some of them. For example, we can recognize and change the spelling of product names, date formats, superfluous spaces, remnants of old spelling, or certain word sequences.
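As an illustration of this kind of pattern-based correction, here is a minimal sketch using Python's `re` module. The rules, the target date convention, and the product name "MaxiPress" are assumptions chosen for the example; real rule sets depend on the language and the customer's style guide.

```python
import re

def normalize_segment(text):
    """Apply a few formal corrections to a single segment."""
    # Collapse runs of spaces or tabs into one space and trim both ends.
    text = re.sub(r"[ \t]+", " ", text).strip()
    # Remove a space that precedes punctuation (not suitable for French, for example).
    text = re.sub(r" ([.,;:!?])", r"\1", text)
    # Rewrite dates from MM/DD/YYYY to DD.MM.YYYY (assumed target convention).
    text = re.sub(r"\b(\d{2})/(\d{2})/(\d{4})\b", r"\2.\1.\3", text)
    # Unify the spelling of a hypothetical product name.
    text = re.sub(r"\bMaxi[ -]?Press\b", "MaxiPress", text, flags=re.IGNORECASE)
    return text

result = normalize_segment("  Released on 03/15/2024 :  see the  maxi press manual .")
print(result)
```

Rules of this kind are best applied per language, since conventions for spacing, dates, and punctuation differ between target languages.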
Artificial intelligence methods are used for certain tasks. Among other things, they are very useful for detecting semantic similarities. For example, we can find out that the statements “Do not stay under suspended loads.” and “It is forbidden to stay under a suspended load.” actually have the same meaning and only need a single translation.
Finally, a number of other linguistic tools or self-written programs help us to detect and solve other typical problems such as incomplete sentences, ambiguities, or spelling errors.
Cleanup service for your language data
Seven reasons to work with us
Curation of translation memories
Our services at a glance
What our customers say
"We have been working with D.O.G. for many years and appreciate their team as a competent partner. We have our user manuals translated into 25 languages and our new website. No matter if the translations are needed later in InDesign or Typo3, technical requirements are no problem. Even if it is urgent, you can rely on D.O.G. The first time we ordered Japanese translations for a new client, they were highly praised when we asked."
"Very good response to quotations and competence in case of queries regarding translations. Reliable handling of the translations with integration of a TMS as well as fast delivery of the translations."
"High-quality technical translations are essential, especially for our operating instructions and customer documentation for materials testing machines. D.O.G. provides us with all the translations we require in the highest quality and also with absolute adherence to deadlines. We are very satisfied with the translation work and can always recommend D.O.G. GmbH."
Frequently asked questions about cleaning up translation memories
First of all, it is important that there is a process to keep translation memories "clean". This includes selecting the right translation partner, maintaining and using a corporate terminology, and using attributes to make the most of the translated segments. It is recommended that translation memories be cleaned up regularly every 3 to 6 months, with occasional additional reviews in between.
- Quality deficiencies: If a translation memory is not cleaned up, the quality of translations decreases. Over time, translation memories can be filled with incorrect, inconsistent, poorly translated, or inaccurate translations, resulting in low-quality output.
- Unnecessary costs: When a translation memory is not cleaned up, translation costs can increase. Fewer segments can be reused, and quality assurance costs are higher due to errors or inconsistencies in translated segments.
- Safety risk: Over time, translation memories can be filled with serious errors that can lead to incorrect actions by the user of a device or software and cause property damage or personal injury.
- Compatibility issues: If a translation memory is not cleaned up, it may cause compatibility issues when used with other systems.
Translation memories can be cleaned up both manually and automatically. Automatic cleanup can detect and remove duplicate segments or delete segments that are too short to be useful. It can also make formal changes to the content of TMs, e.g. using regular expressions, and add metadata to segments.
Manual procedures are required whenever human judgment is needed. This is the case, for example, when deciding whether a translation is incorrect or whether a technical term needs to be changed.
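The automatic part of such a cleanup can be sketched in a few lines. The example below removes exact duplicates and very short segments from a list of segment pairs. The function name `clean_units` and the threshold `min_chars` are hypothetical, and a sensible threshold depends on the content.

```python
def clean_units(units, min_chars=3):
    """Drop exact duplicate pairs and pairs whose source is too short to be useful."""
    seen = set()
    kept = []
    for source, target in units:
        # Ignore leading and trailing whitespace when comparing segments.
        source, target = source.strip(), target.strip()
        if len(source) < min_chars:
            continue  # too short to be a useful match
        key = (source, target)
        if key in seen:
            continue  # exact duplicate of an earlier unit
        seen.add(key)
        kept.append(key)
    return kept

# Hypothetical sample data: (source segment, target segment) pairs.
units = [
    ("Switch off the device.", "Gerät ausschalten."),
    ("Switch off the device.", "Gerät ausschalten. "),
    ("OK", "OK"),
    ("Open the cover.", "Abdeckung öffnen."),
]
cleaned = clean_units(units)
print(cleaned)
```

Anything beyond such formal criteria, for example deciding which of two competing translations is correct, remains a manual task.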