In the third and last post of this series about Warzone 2100, I am going to touch on how the subtitles of video sequences were translated, and why the game had to be modified to facilitate this process.

Intro

For the upcoming discussion, I adopt HTML5’s definition of subtitles and captions. That is, subtitles assume that the sound is available but cannot be understood, while captions aim to describe all the audible content, and they are suitable when sound is not available. But first, what exactly is the content that needs to be subtitled (or captioned)?

Like many other RTS games, Warzone 2100 advances the story using a series of video sequences (FMVs) that appear between missions or during them. Other than that, the game does not feature any dialogues or conversations while playing the missions themselves (units do not speak, unlike in the C&C games, for instance). When I first started working on the game, many of those sequences already had some kind of subtitles. However, the majority of them suffered from several issues, which I am going to describe in some detail.

Valuable resources

I would like to point you to Max Deryagin’s blog. Max is one of the distinguished veterans in the AVT field, and his blog contains a lot of great articles. Additionally, he dedicated two posts (this and this) to subtitling games. You should definitely check them out if you are interested in this subject or happen to be a game translator.

Issues

Despite subtitles becoming available in most, if not all, video games, many games still do not adhere to even the most basic of film and TV subtitling guidelines, such as line length or font size. Therefore, it did not strike me as odd that a game which initially came out in 1999 suffered from some problems related to them.

Admittedly, Warzone 2100 got a few things right. From an accessibility viewpoint, the texts are displayed in white with a grey outline, ensuring good contrast. This, paired with a large enough font size, makes the subtitles easily readable. Furthermore, most characters speak Standard English at a moderate speed, so subtitles can remain on the screen long enough, giving the player enough time to read and process them.

And now for the issues. First and foremost, there is a great deal of variation in the number of characters per line. Some lines are extremely long, while others are just a few words in length. As you might know, in film and TV, we often do not go beyond 45 characters. However, there is a lack of standardization in games, so it is not unusual for whole paragraphs to appear on the screen. These subtitles are hard to read, because the eyes will be forced to move excessively, bouncing repeatedly between the text and the video.

Unless it is required, I do not adhere to arbitrary numbers such as 42 or 43. Rather, I try to keep the lines under 45-50 characters, depending on the situation. For instance, if the game is fast-paced (say, an FPS), then the player will benefit from segmenting subtitles into even smaller chunks that can be processed at a glance.

Another issue is arbitrary line breaks. The original subtitles did not include line breaks. However, since some lines are long, they get broken at arbitrary points upon reaching the edge of the screen (the location of these points depends on the selected font size and resolution). Finally, a number of subtitles were out of sync with the audio, and many voice lines were not transcribed.

So with this in mind, and since the game is open source, I decided first to re-transcribe all sequences, before translating the transcriptions into Arabic, while ensuring that:

  • All lines are under 50 characters.
  • Line breaks occur at natural points.
  • There is a minimum gap between each subtitle and the next.
  • Subtitles do not run too quickly.
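Most of these guidelines can be checked mechanically. The following is a minimal sketch, not the tooling I actually used: the Cue class, the check_cues function, and the two thresholds for gap and reading speed are assumptions made for illustration. Only the 50-character limit comes from the list above, and whether a break falls at a natural point still has to be judged by a human.

```python
from dataclasses import dataclass

MAX_CHARS_PER_LINE = 50       # from the guideline above: lines stay under 50
MIN_GAP_SECONDS = 0.1         # assumed value, adjust to taste
MAX_CHARS_PER_SECOND = 17.0   # assumed reading-speed ceiling


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # lines separated by "\n"


def check_cues(cues):
    """Return a list of human-readable problems found in the cues."""
    problems = []
    for i, cue in enumerate(cues):
        # Line length: every line must be strictly under the limit.
        for line in cue.text.split("\n"):
            if len(line) >= MAX_CHARS_PER_LINE:
                problems.append(f"cue {i + 1}: line has {len(line)} characters")
        # Reading speed: characters (without line breaks) per second on screen.
        duration = cue.end - cue.start
        chars = len(cue.text.replace("\n", ""))
        if duration <= 0 or chars / duration > MAX_CHARS_PER_SECOND:
            problems.append(f"cue {i + 1}: reads too fast")
        # Gap: compare the end of this cue with the start of the next one.
        if i + 1 < len(cues) and cues[i + 1].start - cue.end < MIN_GAP_SECONDS:
            problems.append(f"cue {i + 1}: gap to next cue is too short")
    return problems
```

Running check_cues over a whole sequence returns an empty list when every cue passes, which makes it easy to use as a quick sanity check before uploading a file.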

Implementing these changes, however, was not straightforward, due to a couple of technical difficulties.

Technical difficulties and solutions

In the case of movies or online videos, the translator or transcriber normally works with a file in a common subtitle format (such as SSA, SRT, WebVTT, etc.). This allows them to edit these files in specialized subtitling software quickly and accurately, without having to worry about the technical details of how they are going to be displayed.

Like many programs and video games, Warzone 2100 internally uses the gettext localization system to translate all textual assets. All strings are extracted from the game files and stored in a single .po file. Then, this file is translated into several other .po files using Crowdin. So this is the first problem: strings that belong to subtitles get mixed up with text fragments that belong to things like menus, descriptions, etc. This issue is not unique to Warzone. This practice of extracting all strings and dumping them in a single file, whether it is a .po file or an Excel sheet, is so widespread that one can claim this is how the majority of games are translated.

The second problem concerns the files that contain the original English subtitles. Warzone uses its own subtitling format. It is a very basic format where each file consists of a series of independent lines, and a line is made up of four space-separated numerical values followed by a string. The numerical values represent the x and y coordinates, and the start and end times. The rationale behind adopting this format was probably the need to specify the exact position of the text. This is an example of a subtitle in this format (zero coordinates mean that the line will appear below the previous one):

[x]     [y]     [start] [end]   [string]

20      432     6.0     13.0    _("Your attacks upon us will not go unpunished.")

0       0       6.0     13.0    _("You are in contravention of the New Paradigm.")
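To make the structure of such a line concrete, here is a minimal parsing sketch. It is not the game's own parser: LegacyLine and parse_legacy_line are hypothetical names, and the regular expression assumes that every string is wrapped in _("...") exactly as in the example above.

```python
import re
from typing import NamedTuple


class LegacyLine(NamedTuple):
    x: int
    y: int
    start: float  # seconds
    end: float    # seconds
    text: str


# Four whitespace-separated numbers, then a string wrapped in _("...").
# Assumption: the wrapper always looks like the example in the post.
LINE_RE = re.compile(
    r'^\s*(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+_\("(.*)"\)\s*$'
)


def parse_legacy_line(raw):
    """Parse one line of the format, or return None if it does not match."""
    match = LINE_RE.match(raw)
    if match is None:
        return None
    x, y, start, end, text = match.groups()
    return LegacyLine(int(x), int(y), float(start), float(end), text)
```

Lines that do not fit the pattern, such as blank lines or the bracketed header shown above, simply yield None.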

The format had obvious flaws. First, most, if not all, subtitling software cannot handle it, which leaves transcribers with no option but to edit timestamps manually or write some kind of converter. Second, the format has no notion of a subtitle; it deals with individual lines, because you cannot embed line breaks into the strings. For instance, a subtitle that spans two lines

We cannot hold our positions

for much longer.

will be represented in Warzone’s format as:

20      432     31.0    35.0    _("We cannot hold our positions")

0       0       31.0    35.0    _("for much longer.")

As you can see, each line will appear as a separate string in Crowdin. Even if this break is natural in English, it might not be in other languages when the lines get translated, since line break rules differ from one language to another.
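The kind of converter mentioned earlier could look roughly like the sketch below. It is an illustration under stated assumptions, not the script that was actually used: the function names are hypothetical, the input is assumed to be already parsed into (x, y, start, end, text) tuples, and a line with zero coordinates is assumed to belong to the subtitle started by the previous line.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def legacy_to_srt(lines):
    """Merge legacy lines into cues and return them as SRT text.

    `lines` is an iterable of (x, y, start, end, text) tuples.
    """
    cues = []
    for x, y, start, end, text in lines:
        if x == 0 and y == 0 and cues:
            # Continuation line: append to the text of the previous cue.
            cues[-1][2].append(text)
        else:
            cues.append((start, end, [text]))

    blocks = []
    for index, (start, end, texts) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n" + "\n".join(texts))
    return "\n\n".join(blocks) + "\n"
```

With the two lines of the example above as input, the result is a single SRT cue that runs from 00:00:31,000 to 00:00:35,000 and contains both lines, so a translator sees the whole subtitle at once and can move the line break freely.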

To sort this mess out, I decided to adopt an approach that is uncommon in the world of video games and make the game read subtitles from SRT files. In addition, rather than going through gettext, each file is translated into several versions, one for each supported language, and the game chooses the correct file before playing a specific sequence.
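As a rough illustration of the selection step, here is a minimal sketch. It does not reflect how the game itself is implemented: the naming scheme (sequence.language.srt), the fallback order, and both function names are assumptions made for this example.

```python
from pathlib import Path


def candidate_names(sequence, language):
    """List file names to try, from most to least specific.

    Assumed naming scheme: "<sequence>.<language>.srt", with the plain
    "<sequence>.srt" holding the original English subtitles.
    """
    names = []
    if language:
        names.append(f"{sequence}.{language}.srt")
        base = language.split("_")[0]  # e.g. "ar_SA" falls back to "ar"
        if base != language:
            names.append(f"{sequence}.{base}.srt")
    names.append(f"{sequence}.srt")
    return names


def pick_subtitle_file(directory, sequence, language):
    """Return the first existing candidate as a Path, or None."""
    for name in candidate_names(sequence, language):
        path = Path(directory) / name
        if path.is_file():
            return path
    return None
```

Falling back to the original file means that a sequence without a translation still gets English subtitles instead of none at all.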

With this approach, the majority of the aforementioned issues were solved. I re-transcribed all sequences, following the previously highlighted guidelines. All SRT files were then uploaded to Crowdin and retranslated (a backup of the old translations was made, so nothing was lost). Crowdin provides basic support for SRT files, so line breaks in the target do not need to mirror those of the source, as they did in the old format. However, it does not provide the ability to adjust the timings. For this, we would need a platform that specializes in subtitling, such as Amara, but honestly, that would be overkill right now.

Conclusion

We have reached the end of our Warzone 2100 journey. I really learned a lot and had so much fun while translating it, and I hope that you found this series interesting, whether you translate into Arabic, or work with a different pair of languages. If you have any questions or remarks, please feel free to leave a comment below.