Audiovisual Communication and Subtitling from the Perspective of Semiotic Cohesion: A Case Study of “Garden of Eden”

Loreta Ulvydienė Huber
Vilnius University Kaunas Faculty
Institute of Language, Literature and Translation Studies
Muitinės g. 8, LT-44280 Kaunas, Lithuania
Research interests: media communication, semiotics of culture, audiovisual translation

Viktorija Lideikytė
Vilnius University Kaunas Faculty
Institute of Language, Literature and Translation Studies
Muitinės g. 8, LT-44280 Kaunas, Lithuania
Research interests: media communication, semiotics of culture, audiovisual translation

Abstract. Films as multimodal products have increasing entertainment value, so the need to transfer them to other cultures arises. Audiovisual translation (AVT) becomes the leading practice to translate and adapt multimodal discourse to various audiences. Together with audio description, which translates the visual into spoken language and thereby complements the sounds and dialogues of films, subtitling deals with changes within the semiotic system. Since subtitles have to interact and work in synchrony with dialogue and image, a great variety of problems arises when this translation mode is employed because of the many constraints that exist. However, semiotic cohesion between subtitles and other elements such as moving pictures, verbal and non-verbal language and camera editing should be retained. The aim of the paper is to analyse the cases of semiotic cohesion in the English subtitles of the Lithuanian film Garden of Eden (2015). The research is carried out within the framework of multimodal discourse analysis, which permits the incorporation of all identifiable communicative modes. The practical investigation follows three underlying directions: assessment of semiotic cohesion, identification of particular form(s) of semiotic cohesion depending on its (dis)appearance on screen, and the analysis of the selected instances.

Keywords: semiotic cohesion, AVT, culture-bound elements, explicitation, implicitation.

Submitted 01 May 2020 / Accepted 18 January 2021
Įteikta 2020 05 01 / Priimta 2021 01 18
Films are multimodal products that have increasing entertainment value. The UK’s lead organisation for film, television and the moving image aims to ensure that by 2022 industry, policy-makers and the public alike will understand and champion the cultural value of film¹. Since films are viewed as cultural products, audiovisual translation (AVT), with subtitled content reaching over 167.1 million² subscribers, becomes the leading practice to translate and adapt multimodal discourse to various audiences.

Together with audio description, which translates the visual into spoken language and thereby complements the sounds and dialogues of films, subtitling deals with changes within the semiotic system.

Since subtitles have to interact and work in synchrony with dialogue and image, a great variety of problems arises when this translation mode is employed: not only do subtitles face many constraints (e.g. of time and space), but semiotic cohesion between subtitles and other elements, such as moving pictures and verbal and non-verbal language, should also be retained.

The paper discusses cases of semiotic cohesion in the English subtitles of the Lithuanian film Garden of Eden (2015), directed by Algimantas Puipa. The plot focuses on the lives of seniors who were scattered all over the world and return to Lithuania to spend their golden years in a luxurious nursing home called “Garden of Eden”. The film examines the themes of death and euthanasia, faith and nostalgia for lost time; moreover, irony and sarcasm help to reveal serious questions of everyday realia and quotidian existence.

As the perspective of the semiotics of culture and language implies, it is challenging to impart irony and sarcasm in translation, particularly in subtitling. The practical investigation therefore follows three underlying directions: assessment of semiotic cohesion, identification of particular form(s) of semiotic cohesion depending on its (dis)appearance on screen, and the analysis of the selected instances.

1. Semiotic Channels and Semiotic Cohesion in Multimodal Communication

Film, as an example of an audiovisual text that encompasses different channels of communication, has been discussed by a number of translation scholars.

Mona Baker argues that film encompasses four simultaneous semiotic channels, i.e., verbal auditory channel (speech and dialogs), non-verbal auditory channel (music and sound effects), verbal visual channel (subtitles and written signs) and non-verbal visual channel (picture composition and flow) (Baker, 2001, p. 245).

Hartmut Stöckl, in a similar manner, introduces four core modes or abstract types of semiotic resources: sound, music, image and language, claiming that the visual and auditory semiotic resources required to create and interpret audiovisual texts can be grouped under them (Stöckl, 2004, pp. 11–15). Sound and music can be realized through auditory and visual media and are usually accompanied by, or in synchrony with, the images. Although music is usually heard in films, in some cases it can also be seen, for example when a printed score or sheet music relevant to the plot is presented in the visual texture of a film. The same applies to image and language, which can also be instantiated in several medial variants. Pictures can be moving, which is essential to the make-up of audiovisual texts, or, in some cases, directors can choose to use “freeze-frames” (Stöckl, 2004, p. 13). Language, meanwhile, can be expressed through static writing, i.e., subtitles that are used to translate dialogue into another language, or through dynamic writing, which serves not to translate a film but to interact with it (Stöckl, 2004, pp. 11–15)³.

Ultimately, each core mode proposed by Stöckl commands a set of sub-modes, but these are beyond the scope of this analysis and, therefore, will not be developed further.

Some similarities between four semiotic channels presented by Baker (2001) and four core modes established by Stöckl (2004) are displayed in Table 1:

Table 1. Similarities between semiotic channels and core modes

Semiotic Channels by Baker (2001) | Core Modes by Stöckl (2004)
1. Verbal auditory channel (speech and dialogs) | 1. Language (speech, static and dynamic subtitles)
2. Non-verbal auditory channel (music and sound effects) | 2. Music (performed, i.e., heard music and score/sheet music, i.e., music which can be seen)
3. Verbal visual channel (subtitles and written signs) | 3. Sound (sound effects)
4. Non-verbal visual channel (picture composition and flow) | 4. Image (still and moving)

The verbal auditory channel coincides with the language core mode as it includes the speech presented in a film. However, this core mode of Stöckl’s also includes subtitles, even though they fall under the verbal visual semiotic channel. The non-verbal auditory channel correlates with the music and sound core modes, which involve the music and sound effects of a film. The non-verbal visual channel conforms to the image core mode as they both include static picture compositions and a dynamic picture flow.

Moreover, with regard to the first core mode defined by Stöckl, language, Díaz-Cintas and Remael (2007, p. 45) speak about the importance of subtitles that are “an addition to the finished film, and if they are to function effectively, they must interact with and rely on all the film’s different channels.” Hence, the quality of subtitles and semiotic cohesion in a subtitled product depends on the translator’s or subtitler’s skill and ability to keep subtitles working in synchrony with the image, sound and dialogue of a film.

Adriana Tortoriello (2012, p. 63) relies on Kinga Klaudy (2005) and distinguishes two ways to achieve semiotic cohesion in subtitles: explicitation and implicitation. Explicitation occurs when verbal and non-verbal information is reiterated on the screen, i.e., when the translator chooses to subtitle what is already visible or audible in the source text. Implicitation, meanwhile, occurs when meaningful lexical elements of the SL text are dropped in the TL text.

Christopher Taylor (2016, p. 40) lists additional types of semiotic resources. They are “written words, images, gesture, gaze, paralinguistic features such as intonation and volume, music, light, perspective, and other film techniques such as fade-outs, flashbacks, etc.”

As can be noted from the data provided by the translation theorists, establishing cohesion involves adjusting the translation according to several different semiotic channels or modes.

Speaking about semiotic channels, Francesco Vitucci (2017, p. 84) also investigates explicitation and believes that explicitation strategies are important in establishing semiotic cohesion in subtitles.

Furthermore, Vitucci (2017, p. 89) claims that in AVT the transfer of the “multisemiotic system in the subtitles” is called “intersemiotic explicitation” and that it has three rendering strategies: addition, in the case of language insertions inside the subtitles which are not present in the source text; specification, in the case of nominalizations justified by the iconic presence of objects on the screen which are not mentioned in the soundtrack; and reformulation, which manifests itself at the textual level and aims at replacing a vague syntax with informative sentences.

The strategies of rendering intersemiotic explicitation may help establish cohesion between the visual channel of the film and the subtitles.

Vitucci identifies various types of explicitation that allow creating semiotic cohesion between subtitles and the audiovisual channel:

1) Explicitation induced by the decoding of symbolic gestures (i.e. when the meaning of symbolic gestures is made explicit);

2) Explicitation induced by the iconic decoding of source culture elements (i.e. when iconically provided source culture elements are made more explicit in the translation);

3) Explicitation induced by the decoding of deictic iconic gestures (i.e. when the meaning of deictic iconic gestures and facial expressions is provided more explicitly in the translation);

4) Explicitation induced by the combination of deictic gestures and evocative use of language (i.e. when the meaning is made more explicit due to evocative use of language and gestures);

5) Explicitation induced by iconic non-verbal rebuses (i.e. when a visual reference found in the source culture but not present in the target culture is provided in an explicit form) (2017, pp. 90–99).

The translator, as Jan Pedersen (2005, pp. 1–18) observes, “has to help the Target Text audience make sense of the utterance of which the reference is a part. This task often clashes with the by now ‘famous and infamous time-and-space constraints of subtitling’” (Gottlieb, 2004, p. 219 in Pedersen, 2005, pp. 1–18). This means that certain devices that are at other translators’ disposal, such as footnotes, are virtually non-existent in subtitling and that the possibility of using other devices, such as explicitation, is limited. However, there remain several strategies to solve these crisis points, ranging from complete retention to complete omission via such strategies as generalization and adaptation (Pedersen, 2005, pp. 1–18). Thus, subtitles will offer a condensed text, given that viewers can rely on the non-verbal text to make the right connections (see Pedersen, 2005; Klaudy, 2005).

With respect to the quality of subtitles, semiotic cohesion seems to be the primary point at issue. Several forms of semiotic cohesion which should be achieved in order to produce a fluently subtitled multimodal product are distinguished by Jorge Díaz-Cintas and Aline Remael (2007, pp. 49–53):

a) Interaction between words and images.

b) Interaction between speech and gesture.

c) Interaction between subtitles and camera movement / editing.

In other words, subtitles must interact with the verbal and the non-verbal language of the characters, the techniques of montage adjusted in the film and the images visible on the screen.

Hence, although multimodality is a complex system of different signs, it is easily understood by the audience watching or reading a multimodal product. On the other hand, it presents a great variety of problems and challenges for the translator, who is responsible for the fluency of the subtitled text: subtitles must not only clearly convey the message of the source text but also retain semiotic cohesion between all semiotic channels of a film.

Evidently, the production and interpretation of a complex of semiotic modalities that are made available via the synchronized use of multiple media constitute audiovisual communication. This phenomenon usually goes unnoticed by viewers since they are ordinarily able to make inter-modal connections and grasp the information realized through different semiotic resources in a subconscious manner. As Stöckl notes, “all modes <...> become a single unified gestalt in perception, and it is our neurological and cognitive disposition for multimodal information processing that is responsible for this kind of ease in our handling of multimodal artefacts” (2004, p. 16).

2. Garden of Eden: Issues of Semiotic Cohesion

The title Garden of Eden alludes to the Biblical garden of God, but the film raises questions of death and anxiety instead. It is a story of a Swedish woman working at a nursing home in Lithuania. The action takes place in 2025. The protagonist becomes close to the residents who have come there to die in a respectful setting. The residents of the nursing home do not surrender to stagnation and have a lot of strange wishes that are to be fulfilled by the so-called servants of the “Garden of Eden”.

The movie abounds in different culture-bound elements of verbal and non-verbal communication, such as body language, postures, gestures, and onomatopoeic words. In addition, bearing in mind that audiovisual translation does not occur between words but between cultures, the types of humour the screenwriter employed in the movie, such as wordplay and satire, are challenging to transfer in subtitling; thus, in such cases inter-cohesiveness between different channels becomes indispensable.

Examples of cohesive interrelations between audio and visual channels, of semiotic cohesion between subtitles and gesture, and of the intersemiotic balance between the written text and camera editing are discussed further. In each case, a linguistic description of the phenomenon is followed by its explanation.

To begin with, the following case illustrates the importance of cohesive interrelations between audio and visual channels. The woman and the man are sitting in the car. It is early morning and the woman is waking up from a nap: she is reclining in the passenger’s seat with her eyes closed and pronounces kukū, which is an onomatopoeic phrase imitating the sound made by a cuckoo:

Segment 1. 1:13:06⁴


Table 2. Transcription of speech

Source Text | Target Text
kukū | (not subtitled)

The woman reclines in a passenger’s seat, her eyes are closed.

The intention of the verbal sound kukū uttered by the woman is to give a sign that she is no longer asleep. It has the function of a greeting in this context since birds usually sing at dawn. Lithuanian kukū, which in English would be translated as cuckoo, here means Good morning, I woke up. Thus, although the word also has an informal meaning credited to American English and has been used to mean “stupid person” since at least the 1580s, the imitation of the sound of a cuckoo in the context of the film serves as a playful and cheerful greeting and a request for attention; it also reflects the personality of the character and the relationship between the interlocutors.

Although the onomatopoeic word sounds similar in both Lithuanian and English, the pronunciation is not exactly the same. In the Lithuanian word, the second syllable is stressed, while in English, the first, the second or both syllables can be stressed, depending on the country the speaker comes from, their dialect and so on. In this case, the translator chose not to subtitle the onomatopoeic sound heard in the source text; thus, implicitation seems to have been the option chosen for solving the situation.

In the case of this onomatopoeia, the strategy of omission was employed. On the one hand, the linguistic plane, which can be presumed to be understood by the target audience, would generate, to put it in Tortoriello’s terms (2012, p. 61), redundancy in the target text if it were accompanied by simultaneous subtitles, and would thus lead to semiotic tautology. In other words, subtitles would produce needless verbalisation in this case. Moreover, the informative nature of the sound kukū made by its referent can be naturally perceived by the viewers since it sounds similar in both the source and target languages and imparts comparable associations and meaning.

On the other hand, the untranslated verbal utterance with high semantic potential can entail a slight defamiliarisation in the target text. The possibility remains that the audience will not decipher its meaning due to the differences in pronunciation. Therefore, in order to avoid misinterpretations, the verbal phrase kukū should be rendered in the linguistic code and appear in subtitles as cuckoo, since onomatopoeias are not the same across all languages and usually conform to the broader linguistic system they are part of (Bredin, 2013, p. 557). Hence, explicitation in this case would help to facilitate semiotic cohesion and retain the interactive relationship between speech and subtitles.

A similar case is connected with the scene where the man is telling the woman about his secret plan to escape the nursing home:

Segment 2. 1:16:35


Table 3. Transcription of speech

Source Text | Target Text
oho | (blank line)

The woman sits with her hands crossed, her face demonstrates the astonishment.

Having grasped her friend’s intentions, the woman is surprised and astonished. In the source text, she utters oho, which is a Lithuanian interjection, to express amazement. In the target text, the blank line appears in subtitles and the expression remains untranslated.

It is necessary to stress that in English oho also serves as an exclamation to express pleased surprise or recognition. Although the interjection has the same meaning in both the source and target languages, its pronunciation is different and can be misunderstood by the spectators when implicitated. Furthermore, the more common exclamations used to show surprise in informal English dialogue are ooh and wow, which could appear in the target text as equivalents of the Lithuanian oho.

In addition to this, semiotic cohesion displays cohesive trinomial relations among the subtitles, the aural and the visual. In this case, the subtitles, i.e., the verbal visual channel, do not coincide with the verbal auditory channel. Although very short, the verbal phrase has a high semantic potential as it reveals the character’s emotional state. Moreover, paralanguage is also important because the intonation and the pitch that accompany the utterance convey the same information as the linguistic code and thus strengthen its effect.

The semiotic cohesion in subtitles is also achieved through the synchrony between the verbal-visual channel and the verbal auditory channel, i.e., between subtitles and spoken language. As the analysed examples demonstrate, subtitlers tend to pursue implicitation and verbal information remains untranslated in some cases.

Speaking about semiotic cohesion between subtitles and gesture, the following segment is taken from the episode in which the old lady refuses to move from her place and is indirectly asking to leave her alone:

Segment 3. 26:00


Table 4. Transcription of speech

Source Text | Target Text
Eikit. | Just go!

The expressive face: eyes full of anguish, physiognomy expressing sorrow. The woman shakes her head and waves her hand.

When the nurse asks the old lady to have a meal, the lady shakes her head and waves her hand aside, demonstrating that she wants to be left alone. Moreover, the woman’s face is also very expressive: her eyes are full of anguish, and her physiognomy clearly expresses her sorrow. The old lady’s gestures are accompanied by a very quiet, almost inaudible verbal utterance Eikit. (En. Go.), which is translated as Just go! The utterance is accompanied by an intonation that conveys helplessness and sadness. The translator chooses to end this phrase with an exclamation mark, which signals the imperative. However, the command is not a strict one: as the paralanguage, i.e., the pitch and the intonation of her voice, as well as the conspicuous kinesics, illustrate, it is rather a plea to let her enjoy the moment and recall her old times in solitude.

The translator chooses to explicitate the verbal information, which is complemented by a non-verbal channel. This does not lead to “semiotic tautology”. As the instance illustrates, an attempt has been made to convey the non-verbal features of the original text via orthotypographical means. Nevertheless, an ellipsis (...) at the end of the dialogical line could have been one more option that would have helped reveal the speaker’s emotions, intonation and intention, probably retaining the semiotic cohesion between speech and gestures more faithfully.

One more form of semiotic cohesion is the intersemiotic balance between the written text and camera editing; in other words, it deals with inter-cohesiveness between subtitles and camera movement. Subtitles must respect the visual narrative structure of a film and coincide with the picture presented on the screen. For example, the viewer has to see the character who is speaking at the moment while their verbal utterances are presented in the form of subtitles.

Segment 4 illustrates a case of implicitation that does not disrupt the relationship between the visual and acoustic channels, although at first sight the camera movement appears to conflict with the linear sequence of the subtitles:

Segment 4. 14:12


Table 5. Transcription of speech

Source Text | Target Text
Ar jūs matėte akiniuotą zuikį, griaužiantį morkas? | Have you ever seen a rabbit with glasses?

The man sits still, takes a serious look and squints through his glasses.

The action of this episode takes place in the dining room. The waiter serves some steamed carrots to the senior, who in turn thanks him and says they are very delicious and healthy for the eyes. The kind waiter starts joking, asking if the senior has ever seen a rabbit with glasses. The encoded meaning of the joke is that rabbits eat a lot of carrots and, ostensibly, this is the reason why they do not need spectacles. The ST in Lithuanian reads “Have you ever seen a rabbit with glasses who would eat carrots?”, whereas the TT is reduced, as the need for brevity calls for a reduction.

In addition, while the humorous line is displayed in the form of subtitles, the camera is focused on another senior nearby who is sitting still and is not eating. He takes a very serious look and squints through his glasses, which produces a humorous effect as the viewers naturally and unconsciously understand that this man probably does not eat carrots. Two contrasting ideas, i.e., the juxtaposition of the shot of the man wearing glasses and the subtitles on the screen, create an antithesis, and the spectacled man is indirectly identified with a rabbit. The creation of two quickly alternating shots in this scene is related to film semiotics, and they are deliberately used as an artistic tool.

As the selected instance illustrates, the subtitler chose to settle into a slower subtitling pace, as the waiter’s verbal utterance is presented not in the shot in which the speaker is visible but in the other shot, where the implied subject of the dialogical line is seen. Ignoring camera editing is quite a dangerous alternative as it can lead to disruption if the visual narration and subtitles do not work in synchrony. However, in this case, the humorous effect is created, and the initial idea the film director aimed at is retained.


Semiotic cohesion is the focal point in discussing the quality of subtitles. Several forms of it should be achieved to produce a fluently subtitled product: semiotic cohesion between words and images, semiotic cohesion between speech and gesture, and semiotic cohesion between subtitles and camera movement / editing. Therefore, the translator’s task is to translate the text, maintain the interrelations between the semiotic channels of the film and sustain semiotic cohesion between the subtitles and the above-mentioned elements of a movie.

Even though implicitation and text reduction are commonly employed strategies in subtitling, the selected instances illustrate that implicitation, when omission is employed, may cause the loss of semiotic cohesion between the audio and visual channels of the multimodal product.

The cases discussed prove that explicitations overcome the situation where the target audience cannot grasp the speaker’s communicative intentions. Explicitations expedite comprehension by reinforcing the association of words with either imagery / actual objects or situations and the additive effect of image and translation together results in a powerful combination.

As regards explicitation and implicitation in subtitling, the scope of the research was narrow, and the results may be considered only indicative. Nevertheless, the results of the case study suggest that when translating from Lithuanian into English for subtitling, source language oriented cultural transference prevails.

In cases of humour conveyance, a decisive role is played by the semiotic interrelations between subtitles, the speech and gestures of the characters, and camera editing, which in most cases have been sustained.

Bearing in mind that the viewer’s perception is crucial in evaluating the final product, further studies can address the audience perception of culture-bound elements and humour rendering when translating from Lithuanian into English.


