Translation Memory Cleanup

Cleanup of translation memories in multiple languages

This page was machine-translated and post-edited. It is an example of what machine translation combined with post-editing can achieve.

Maintenance of your translation databases

Translation memory maintenance reduces translation costs

In many companies, translation memories are at the heart of translation production. They are part of a company's so-called linguistic assets. Translation memories contain previously translated sentences (called segments) produced by human translators or machine translation systems. Once content has been translated, it can be reused. This saves work and money. At the same time, reuse ensures consistent translations and thus higher quality.

However, if a translation memory contains errors or is outdated, it can cause major problems in your translations. Translation memories become erroneous, for example, when specialist terms have changed or when several translators have provided different translations over time without standardizing style or terminology. Translation errors caused by a lack of knowledge about a company’s products or about contextual meanings spread like a virus once they enter a translation memory unnoticed.

To avoid erroneous translation memories, it is important to update and maintain them regularly. This work costs money, and it requires linguistic and subject-matter knowledge in several languages. That is exactly what we are here for, and we know how to keep the costs of this kind of work down. We support you with tools and methods that help you keep your translation memories at a high level of quality.

If you maintain your translation memories regularly, you can be sure that you will save time and money and reuse correct translations of the best possible quality.

Translation into all languages
We translate into all languages

Do you need a translation? We will send you a quote within the shortest possible time. Send us your request using this quote form.

File formats

Which translation memory formats do we process?

There are many different file formats for translation memories and translated segments. We can handle most of them.

Translation Memories (TMs)

Translation memories are usually stored in a database, as is the case with common translation memory systems such as Across, Trados, memoQ and others. These TMs can be exchanged in various formats, primarily TMX or one of its tool-specific variants.

TMX (Translation Memory eXchange)

This is a widely used file format for translation memories based on XML (eXtensible Markup Language). TMX files are easily interchangeable and can be used with a wide range of translation tools and software programs.
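To illustrate the structure of a TMX file, here is a minimal Python sketch that reads the source/target pairs for one language pair with the standard library's `xml.etree.ElementTree`. The function name `read_tmx_pairs` and the sample content are hypothetical; the sketch assumes TMX 1.4-style `xml:lang` attributes and matches language codes exactly (so `en` does not match `en-US`).

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx_pairs(tmx_text: str, src_lang: str, tgt_lang: str) -> list[tuple[str, str]]:
    """Return (source, target) text pairs for one language pair from a TMX string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):              # one <tu> = one translation unit
        texts = {}
        for tuv in tu.findall("tuv"):       # one <tuv> per language variant
            lang = tuv.get(XML_LANG) or tuv.get("lang")  # older TMX versions use "lang"
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also returns text inside inline elements such as <bpt>;
                # a production tool would handle inline markup separately.
                texts[lang.lower()] = "".join(seg.itertext()).strip()
        src, tgt = texts.get(src_lang.lower()), texts.get(tgt_lang.lower())
        if src is not None and tgt is not None:
            pairs.append((src, tgt))
    return pairs

SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Do not stay under suspended loads.</seg></tuv>
      <tuv xml:lang="de"><seg>Nicht unter schwebenden Lasten aufhalten.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    print(read_tmx_pairs(SAMPLE_TMX, "en", "de"))
```

Translation units that lack one of the two languages are skipped rather than reported; a cleanup tool would usually list them separately.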

XLIFF (XML Localization Interchange File Format)

This is another XML-based file format for translated segments and is widely used in the localization industry. There are tool-specific variants of XLIFF, such as SDLXLIFF, which is generated by Trados.

SDLTM (proprietary format for Trados databases)

SDLTM is Trados Studio’s internal, file-based format for translation memories. An SDLTM file is a database (based on SQLite), and its content can be exported to TMX.

CSV (Comma-Separated Values)

This is a simple file format commonly used for storing data in tabular form: each row is a line, and the values (columns) within a row are separated by commas. CSV files are relatively easy to handle and can be used with a wide range of software programs and tools.

PHP files in website translation

PHP is a widely used open-source general-purpose scripting language that is particularly suited to web development and can be embedded in HTML.

JSON (JavaScript Object Notation)

JSON is a data format used to store data in a structured way. It is often used to store data in a database or to transfer data between different parts of a web application.

Translation memory cleanup challenges

We have the necessary knowledge

Key aspects of translation database cleansing

The top six challenges in cleaning up translation memories are:

  1. Identifying incorrect data: Cleaning up a TM requires finding and correcting inaccurate or incomplete data. This can be time-consuming and requires knowledge of the typical error patterns in TMs.

  2. Duplicates: Duplicate entries are not just 100% identical strings. Different wordings of the same content must also be identified and either eliminated or merged.

  3. Incomplete segments: Truncated segments occur when a text is not properly segmented before translation begins. This can lead to incorrect translations and requires manual correction.

  4. Contextual errors: Contextual errors occur when the source segment has been translated correctly but the translation does not fit the context in the target language. In this case, an expert translator must review the translation and check it for accuracy.

  5. Wrong terminology: Incorrect terms lead to errors in translation memories. In this case, the terms used in the TM must be corrected manually.

  6. Translation errors: Incorrect translations can occur for various reasons. Detecting errors in meaning requires translation expertise.

Translation memory cleanup in 6 steps
Data cleansing procedure

This is how the cleanup of your translation memories works:

Analysis

You give us your translation memories
We analyze the data and the types of errors and discuss cleanup options with you.

Offer and project start

You check the offer and place the order
The offer specifies what needs to be cleaned up and which services are required (e.g. building up a terminology).

Cleanup according to specification

We clean your data step by step
We correct the various errors in the TM and contact you if specific questions arise.

Quality assurance

We check the quality of the result
We check the cleaned translation memory for remaining errors.

Adding attributes

The adjusted segments receive attributes
The cleaned segments receive an attribute confirming that they have been checked. Further attributes can be added. The name of the attribute is agreed in advance.

Delivery and maintenance

We provide the cleaned memory and terminology
You receive the cleaned TM and usually also the extracted terminology. We recommend regular maintenance.
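The step “Adding attributes” could look like this for a TMX file. This is a minimal sketch, assuming the attribute is stored as a TMX `<prop>` element; the property type `x-checked` and the function `mark_checked` are hypothetical stand-ins for whatever name is agreed with the client (by TMX convention, user-defined property types start with `x-`).

```python
import xml.etree.ElementTree as ET

def mark_checked(tu: ET.Element, prop_type: str = "x-checked", value: str = "yes") -> None:
    """Set a <prop> on a TMX <tu>, updating it if the property already exists."""
    for prop in tu.findall("prop"):
        if prop.get("type") == prop_type:
            prop.text = value  # attribute already present: just update its value
            return
    prop = ET.Element("prop", {"type": prop_type})
    prop.text = value
    # In a TMX <tu>, <note> and <prop> elements precede the <tuv> elements,
    # so the new property is inserted at the front.
    tu.insert(0, prop)

if __name__ == "__main__":
    tu = ET.fromstring('<tu><tuv xml:lang="en"><seg>Press Start.</seg></tuv></tu>')
    mark_checked(tu)
    print(ET.tostring(tu, encoding="unicode"))
```

Updating an existing property instead of adding a second one keeps the function safe to run repeatedly over the same memory.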

Language combinations

Translation databases in the following languages

We clean translation databases in a large number of language combinations. Very popular combinations for TMs are:

Quality and time savings through clean TMs

The cleaning of translation databases in detail

The analysis of translation memories

Before we start the analysis, we discuss with our clients the goal they want to achieve. Some of the aspects we examine during a translation memory analysis are:

The analysis aims to understand the strengths and weaknesses of the memory and identify areas where it can be improved.

Data cleansing: The types of errors

Here are some of the typical errors that can be found in a TM:

  1. Duplicates: A source sentence has multiple translations. We also look for source sentences that have essentially the same meaning and therefore lead to unnecessary translation variants.

  2. Incorrect translations: These occur when the translation does not accurately reflect the meaning of the source text. Sometimes translations come from neural machine translation engines and contain so-called “hallucinations”, i.e., content that does not exist in the source text.

  3. Terminology errors: These occur when the terminology used in the translation memory is inconsistent or even incorrect.

  4. Spelling errors: These occur when the translation (or the source text) has not been spell-checked.

  5. Formal errors: These occur when formal aspects are incorrect, for example missing closing parentheses, incorrect numbers or incorrectly encoded special characters.

  6. Incomplete sentences: These occur when the text to be translated has not been segmented correctly. The resulting incomplete segments can cause even bigger errors, because languages differ in their syntactic and morphological structure.

  7. Punctuation errors: These occur when the translation contains incorrect punctuation or when punctuation marks are missing.

  8. Outdated translations: These may contain, for example, incorrect references, links, terms or product names.
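Some of the formal errors listed above can be flagged automatically. The following Python sketch is a simplified illustration, not our ErrorSpy implementation: it assumes the TM is already available as a list of (source, target) string pairs and flags empty targets, number mismatches, unbalanced parentheses and sources with conflicting translations. The function names are hypothetical; errors of meaning, such as hallucinations, still need a human reviewer.

```python
import re
from collections import defaultdict

NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def numbers(text: str) -> list[str]:
    # Treat "," and "." alike so that "1,5" (German) matches "1.5" (English).
    # This is a simplification; real checks need locale-aware rules.
    return sorted(n.replace(",", ".") for n in NUMBER.findall(text))

def find_issues(pairs: list[tuple[str, str]]) -> list[tuple[int, str]]:
    """Return sorted (segment index, issue) tuples for simple formal checks."""
    issues = []
    by_source = defaultdict(list)  # source text -> indices of its occurrences
    for i, (src, tgt) in enumerate(pairs):
        by_source[src.strip()].append(i)
        if not tgt.strip():
            issues.append((i, "empty target"))
        if numbers(src) != numbers(tgt):
            issues.append((i, "number mismatch"))
        if tgt.count("(") != tgt.count(")"):
            issues.append((i, "unbalanced parentheses"))
    # The same source with different targets means inconsistent translations.
    for indices in by_source.values():
        if len({pairs[i][1].strip() for i in indices}) > 1:
            issues.extend((i, "conflicting translations") for i in indices)
    return sorted(issues)

PAIRS = [
    ("Tighten the screw to 5 Nm.", "Die Schraube mit 6 Nm anziehen."),
    ("Open the cover (see Fig. 2).", "Die Abdeckung öffnen (siehe Abb. 2."),
    ("Press Start.", "Start drücken."),
    ("Press Start.", "Auf Start drücken."),
]

if __name__ == "__main__":
    for index, issue in find_issues(PAIRS):
        print(index, issue)
```

The result is a list for reviewers to sift through, not an automatic correction: a number mismatch, for example, can be intentional when units are converted.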

About the importance of metadata

Metadata and attributes are important in translation memories because they provide information about each translated segment to which they are assigned. TM attributes can contain a variety of information, such as the name of the translated document, the project number, the creation or modification date, how often the translation has been reused, the segment origin (e.g., alignment or MT), or the editing status of the segment. This information can be very useful in various contexts, such as when using TMs to train an MT engine.

Specific benefits of metadata include:


Technologies used for data cleansing

There are several tools and methods for cleaning translation memories that can be used depending on the task.

The most important tool we use is ErrorSpy, our translation quality assurance software. We started developing ErrorSpy about 20 years ago, and it has become a Swiss Army knife of quality control. Within seconds, ErrorSpy provides a list of possible errors, such as terminology, number, or consistency errors, for our reviewers to sift through.

We also work with regular expressions that allow us to recognize certain patterns in translation memories and correct some of them automatically. For example, we can recognize and change the spelling of product names, date formats, superfluous spaces, remnants of outdated spelling rules, or certain word sequences.
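As an illustration of such pattern-based corrections, here is a minimal Python sketch using the standard `re` module. The rules are hypothetical examples: the product name `ProLift` is invented, the date rule assumes US-style month/day/year input, and the space-before-punctuation rule suits English but not, say, French. In practice, each rule is agreed per language and product, and automatic replacements are reviewed.

```python
import re

def iso_date(m: re.Match) -> str:
    # Convert a month/day/year match (US style, an assumption) into YYYY-MM-DD.
    return f"{m[3]}-{int(m[1]):02d}-{int(m[2]):02d}"

# Hypothetical rules: (compiled pattern, replacement string or function), applied in order.
RULES = [
    (re.compile(r"[ \t]{2,}"), " "),                               # superfluous spaces
    (re.compile(r"[ \t]+([,.;:!?])"), r"\1"),                      # space before punctuation
    (re.compile(r"\bpro[- ]?lift\b", re.IGNORECASE), "ProLift"),   # product name spelling
    (re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b"), iso_date),    # date format
]

def apply_rules(text: str) -> str:
    """Apply all rules in order and strip leading/trailing whitespace."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()

if __name__ == "__main__":
    print(apply_rules("Install the  pro-lift on 4/3/2023 ."))
```

Keeping the rules in a list makes them easy to review, extend and switch per language pair.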

Artificial intelligence methods are used for certain tasks. Among other things, they are very useful for detecting semantic similarities. For example, we can find out that the statements “Do not stay under suspended loads.” and “It is forbidden to stay under a suspended load.” actually have the same meaning and only need a single translation.

Finally, a number of other linguistic tools or self-written programs help us to detect and solve other typical problems such as incomplete sentences, ambiguities, or spelling errors.

Working with D.O.G. GmbH
Cleanup service for your language data
Seven reasons to work with us

Why should you use our translation memory cleanup services?

  1. We guarantee time and cost savings through cleansed TMs.
  2. We have the right tools and technologies for the job.
  3. We have more than 20 years of experience with translation memory cleansing.
  4. Our quality assurance meets and exceeds DIN EN ISO 17100.
  5. We are familiar with AI-based quality assurance methods.
  6. We guarantee reliable services and quality.
  7. You have no fixed costs; you only pay for what you need.

Curation of translation memories

Our services at a glance

We offer a wide range of services for cleaning and curating translation memories, such as

Frequently asked questions about cleaning up translation memories
How often should translation memories be cleaned up?

First of all, it is important that there is a process to keep translation memories “clean”. This includes selecting the right translation partner, maintaining and using a corporate terminology, and using attributes to make the most of the translated segments. It is recommended that translation memories be cleaned up regularly every 3 to 6 months, with occasional additional reviews in between.

What are the risks of not cleaning translation memories?

  1. Quality deficiencies: If a translation memory is not cleaned up, translation quality decreases. Over time, translation memories can fill up with incorrect, inconsistent, poorly translated or inaccurate translations, resulting in low-quality output.
  2. Unnecessary costs: When a translation memory is not cleaned up, translation costs can increase. Fewer segments can be reused, and quality assurance costs are higher due to errors or inconsistencies in translated segments.
  3. Safety risk: Over time, translation memories can fill up with serious errors that can lead users of a device or software to act incorrectly, causing property damage or personal injury.
  4. Compatibility issues: A translation memory that has not been cleaned up may cause compatibility issues when used with other systems.

Can a translation memory be cleaned automatically, or is manual intervention required?

Translation memories can be cleaned up both manually and automatically. Automatic cleanup can detect and remove duplicate segments or delete segments that are too short to be useful. Automated methods can also make formal changes to TM content, e.g. using regular expressions, and add metadata to segments.

Manual procedures are required whenever human judgment is needed. This is the case, for example, when deciding whether a translation is incorrect or whether a technical term needs to be changed.
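The automatic part can be sketched in a few lines of Python. This minimal example assumes the TM is a list of (source, target) pairs; it normalizes whitespace, removes exact duplicates and drops segments below a minimum word count. The function name and the threshold `min_words=2` are hypothetical: whether a short segment such as “OK” is useless depends on the project, and pairs with the same source but different targets are deliberately kept for human review.

```python
def dedupe_and_filter(pairs: list[tuple[str, str]], min_words: int = 2) -> list[tuple[str, str]]:
    """Remove exact duplicate pairs and very short segments, keeping the first occurrence."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize runs of whitespace so "Press  Start." equals "Press Start.".
        key = (" ".join(src.split()), " ".join(tgt.split()))
        if key in seen:
            continue                    # exact duplicate after normalization
        if len(key[0].split()) < min_words:
            continue                    # source too short (threshold is an assumption)
        seen.add(key)
        kept.append(key)
    return kept

if __name__ == "__main__":
    tm = [
        ("Press  Start.", "Start drücken."),
        ("Press Start.", "Start drücken."),
        ("OK", "OK"),
        ("Press Start.", "Auf Start drücken."),  # different target: kept for review
    ]
    print(dedupe_and_filter(tm))
```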

Different options for different budgets

Strategies for cleaning and curating translation memories

Cleaning up translation memories is a complex and sometimes costly and lengthy process. Depending on the severity of the errors in the translation memories, the time budget and the cost budget, different strategies can be developed.

Option #1:

Complete cleanup of all errors: This provides the greatest assurance regarding the quality of the final, checked TMs, but not always the best cost-benefit ratio. For example, some segments are never used again, and others are very old and relate to products that are no longer being developed.

Option #2:

Cleanup of only part of the translation memories: Some TMs were created 10 or more years ago and contain many segments that are no longer up to date. The cleanup can be limited to, for example, the last three or five years. This reduces the effort required.

Option #3:

Restrict the quality criteria and review only certain aspects. For example, you could specify that terminology standardization is limited to 50 or 100 key terms.

Option #4:

Work with attributes and match penalties: When segments from a translation memory are used, the match value between the segment in the memory and the segment in the text can be reduced. For example, unchecked segments can receive an attribute and a 2-3% penalty on matches (hits from the translation memory when the same sentence occurs in the text). Unchecked translations then appear as fuzzy matches that the translator should check before using them. After the translation is completed and checked, all segments used are given the attribute “checked” (or similar).

These options can be combined with each other.
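The arithmetic behind option #4 is simple. In practice, the penalty is configured in the translation memory system; the following hypothetical Python function (`effective_match`, with an assumed attribute name `checked` and a 3% default penalty) only illustrates how a 100% match for an unchecked segment drops below 100% and is therefore treated as a fuzzy match.

```python
def effective_match(match: int, attributes: set[str], penalty: int = 3,
                    checked_attr: str = "checked") -> int:
    """Return the match value after applying a penalty to unchecked segments."""
    if checked_attr in attributes:
        return match                # checked segments keep their full match value
    return max(0, match - penalty)  # unchecked segments are penalized

def is_exact(match: int) -> bool:
    # Only 100% matches count as exact; anything below is a fuzzy match.
    return match >= 100

if __name__ == "__main__":
    for attrs in (set(), {"checked"}):
        value = effective_match(100, attrs)
        print(sorted(attrs), value, "exact" if is_exact(value) else "fuzzy")
```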

Service - Overview

We check these aspects of translation memories

Linguistic aspects:

Technical and content aspects:

Technical aspects:

Would you like to have your translation memories cleaned up?

Then you should talk to us, because there are many ways and means to save costs. You can benefit from our experience with numerous cleanup projects. Contact us without obligation.
