Trailer Mashups

by miguelb. Average Reading Time: about 2 minutes.

The final project for my Reading and Writing Electronic Text class was a joint effort between Craig Protzel and me. Craig had worked on remixing trailers for his midterm and asked if I wanted to help him take the idea further as a final project. I agreed and hilarity ensued.

The premise behind Trailer Mashups is to take any movie and create a trailer for it in another genre. Our example was to take Jaws and cut a trailer for it as a romantic comedy. Craig’s previous profession involved creating and producing trailers for the movie industry, so he has intimate knowledge of how genre-specific trailers need to be structured.

Trailer Structure

Our first step was to establish the structure for a typical romantic comedy trailer.

  • VO LINE 1 – INTRODUCE WORLD / LOCATION

    This was done using Stanford’s Named Entity Recognizer (NER). We ran it on a corpus containing all of the lines from the script, and a simple concordance of the tagged locations gave us the most relevant one.

  • VO LINE 2 – INTRODUCE MAIN CHARACTERS

    This was a manual process in which we looked at IMDB for the leading male and female roles.

  • VO LINE 3 – ESTABLISH CURRENT SITUATION

    Using sentiment analysis, we were able to tag each line in the dialog with a score ranging from -5 to 5. A random sample of lines with a score over 2 was then chosen for the positive lines needed in this section.

  • VO LINE 4 – INTRODUCE MAIN CONFLICT

    Using the same process as with the positive lines, in this section we look for negative lines.

  • VO LINE 5 – DATE/TIME

    A manually created list of “time words” is used to search for the most relevant date/time setting.

  • VO LINE 6 – SPEAK TO AUDIENCE

    Using a process similar to the one for “time words”, we searched the dialog with a list of action words.

  • VO LINE 7 – WRAP UP

    A random line from each of the male and female leads is chosen for this section.

  • VO LINE 8 – MOVIE TITLE
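
As a rough illustration of the first step in the list above, the location pick boils down to a frequency count over place names that the NER has already tagged. The sketch assumes the NER output has been reduced to one location string per entry; `most_common_location` and the sample data are made up for illustration and are not taken from our Concordance class.

```python
from collections import Counter

def most_common_location(tagged_locations, exclude=()):
    """Return the most frequent location, ignoring names listed in `exclude`."""
    excluded = set(name.lower() for name in exclude)
    counts = Counter(
        loc.strip().lower()
        for loc in tagged_locations
        if loc.strip() and loc.strip().lower() not in excluded
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical NER output: one LOCATION entity per entry
sample = ["Amity", "Amity", "New York", "amity", "Brody", "Boston"]
print(most_common_location(sample, exclude=["brody"]))  # prints "amity"
```

The `exclude` argument plays the same role as deleting character names from the concordance, since an NER can mistake a surname for a place.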

Content

Next we had to gather content. We found scripts on the internet and then imported them into Final Draft. Final Draft’s native file format is an .fdx file, which is basically an XML file.
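
Because the file is XML, Python’s standard `xml.etree.ElementTree` module is enough to walk it. The sketch below assumes the usual .fdx layout of `Paragraph` elements with a `Type` attribute and `Text` children; the sample document and the `parse_fdx` helper are made up for illustration and are not our actual parser.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written stand-in for a real .fdx file (assumed layout)
SAMPLE_FDX = """<FinalDraft DocumentType="Script" Version="1">
  <Content>
    <Paragraph Type="Scene Heading"><Text>EXT. BEACH - DAY</Text></Paragraph>
    <Paragraph Type="Character"><Text>BRODY</Text></Paragraph>
    <Paragraph Type="Dialogue"><Text>You're gonna need a bigger boat.</Text></Paragraph>
  </Content>
</FinalDraft>
"""

def parse_fdx(xml_text):
    """Return (scene_heading, character, dialogue) tuples in script order."""
    root = ET.fromstring(xml_text)
    scene, speaker, rows = None, None, []
    for para in root.iter("Paragraph"):
        # A paragraph's text can be split across several <Text> children
        text = "".join(t.text or "" for t in para.findall("Text")).strip()
        kind = para.get("Type")
        if kind == "Scene Heading":
            scene = text
        elif kind == "Character":
            speaker = text
        elif kind == "Dialogue" and speaker is not None:
            rows.append((scene, speaker, text))
    return rows

for row in parse_fdx(SAMPLE_FDX):
    print(row)
```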

Structuring Data

After being able to parse through the script, we had to figure out how to structure all of the data we wanted to extract from it. We decided the best course of action was to establish a data hierarchy with the “Character” being the most important component.

The data model looks like this:
Script
|
Characters
|
Scenes
|
Dialogues

All of the methods written were structured in a way to access the data through the characters. They became the glue that connected all of the other components together.
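
To make the "characters as glue" idea concrete, here is a stripped-down sketch of a container in which scenes and dialogue are only ever reached through a character. `MiniScript` and its methods are hypothetical stand-ins; our real Script class is not listed in this post and is more involved.

```python
class MiniScript(object):
    """Hypothetical container: everything is reached through characters."""

    def __init__(self):
        # lowercased name -> {"scenes": set of scene numbers, "dialogues": list}
        self.characters = {}

    def add_line(self, name, scene_num, text):
        entry = self.characters.setdefault(
            name.lower(), {"scenes": set(), "dialogues": []}
        )
        entry["scenes"].add(scene_num)
        entry["dialogues"].append((scene_num, text))

    def shared_scenes(self, name_a, name_b):
        """Scene numbers in which both characters speak."""
        a = self.characters[name_a.lower()]["scenes"]
        b = self.characters[name_b.lower()]["scenes"]
        return sorted(a & b)

script = MiniScript()
script.add_line("BRODY", 1, "Hello.")
script.add_line("ELLEN", 1, "Hi there.")
script.add_line("BRODY", 2, "See you later.")
print(script.shared_scenes("brody", "ellen"))  # prints [1]
```

Keying everything on the character makes questions like "which scenes do the two leads share?" a simple set intersection.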

#trailerMashup.py

# -*- coding: utf-8 -*-
from Script import Script
from Concordance import Concordance
from markov_by_char import CharacterMarkovGenerator
import markov
import random
import csv
import re
import string
import sys

voLines = []

def runMarkov():
    voLines = []

    # Markov order (n-gram length) to use for each VO line's corpus
    nums = {1: 5, 2: 4, 3: 5, 4: 5, 5: 4, 6: 4, 7: 4, 8: 4}

    # Walk the keys in sorted order so voLines[0] is always VO line 1;
    # plain dict iteration order is not guaranteed.
    for key in sorted(nums):
        script = open('corpus/corpus_copy%02d.txt' % key, 'r')
        try:
            voLines.append( markov_for_all(script, nums[key]) )
        finally:
            script.close()

    return voLines

def markov_for_all(script, num):
    generator = CharacterMarkovGenerator(n=num, max=100)
    outputList = list()
    for line in script:
        line = line.strip()
        generator.feed(line)

    # Generate a few candidates and pick one at random
    for i in range(5):
        outputList.append(generator.generate())

    return random.choice(outputList)



jaws = Script(sys.argv[1])
jaws.run()

#run markov for VO lines
voLines = runMarkov()  



#-----------------------------------------------------------
#print 1st VO line
print "#[ V01 ]:  " + voLines[0]


#get the location
location = None
concord = Concordance()
locationFile = open("corpus/jaws_locations.txt", 'r')
for line in locationFile:
    line = line.strip()
    concord.feed(line)
locationFile.close()
# -- Delete keys that are characters
for key in jaws.characters.keys():
    if key in concord.concord:
        del concord.concord[key]


#-----------------------------------------------------------
#print 1st dialog line
locationLines = []
# -- Keep searching until we get a location that appears in the dialogue
while len(locationLines) == 0:
    location = random.choice(concord.most_common_words(5))
    # escape the location so it is matched literally
    regex = re.compile(r'\s%s\s' % re.escape(location))
    locationLines = jaws.printDialogueForConditionOnAll(regex, 100)

# pick from the matches already found
print "\t#[ 1 ]:  " + random.choice(locationLines)


#-----------------------------------------------------------
#print 2nd VO line
print "\n#[ V02 ]:  " + voLines[1]


#-----------------------------------------------------------
#print 2nd 3rd dialog lines

#get main characters
brody = jaws.characters['brody']
ellen = jaws.characters['ellen']

print "\t#[ 2 ]:  " + brody.randomLine(30)
print "\t#[ 3 ]:  " + ellen.randomLine(30)

#-----------------------------------------------------------
#print 3rd VO line
print "\n#[ V03 ]:  " + voLines[2]

#-----------------------------------------------------------
#print 4th 5th dialog lines
cameTogetherScene = jaws.printDialogueForConditionWithinScene([brody, ellen], r'\byou\b.+\?$')
print "\t#[ 4 ]:  " + cameTogetherScene[0][0][0]  #this is super ugly, but so be it
print "\t#[ 5 ]:  " + cameTogetherScene[1][0][0]

#-----------------------------------------------------------
#print 6th 7th dialog lines
cameTogetherScene = jaws.printDialogueForConditionWithinScene([ellen, brody], r'\byou\b.+\?$')
print "\n\t#[ 6 ]:  " + cameTogetherScene[0][0][0]  #this is super ugly, but so be it
print "\t#[ 7 ]:  " + cameTogetherScene[1][0][0]

#-----------------------------------------------------------
#print 4th VO line
print "\n#[ V04 ]:  " + voLines[3]

#-----------------------------------------------------------
#print 8th 9th dialog lines
positiveLines = []
for key, value in jaws.characters.items():
    for line in value.sentimentLines("pos", 1):
        if len(line) < 100:
            positiveLines.append(line)

# pick two different entries so the same line is not printed twice
selectedPositive = random.sample(positiveLines, 2)
print "\t#[ 8 ]:  " + selectedPositive[0]
print "\t#[ 9 ]:  " + selectedPositive[1]

#-----------------------------------------------------------
#print 5th VO line
print "\n#[ V05 ]:  " + voLines[4]



#-----------------------------------------------------------
#print 10th 11th dialog lines
negativeLines = []
for key, value in jaws.characters.items():
    for line in value.sentimentLines("neg", 1):
        if len(line) < 100:
            negativeLines.append(line)

# pick two different entries so the same line is not printed twice
selectedNegative = random.sample(negativeLines, 2)
print "\t#[ 10 ]:  " + selectedNegative[0]
print "\t#[ 11 ]:  " + selectedNegative[1]

#print 6th VO line
print "\n#[ V06 ]:  " + voLines[5]



#-----------------------------------------------------------
#print 12th 13th 14th dialog lines
exclamationList = list()
selectedExclamations = list()
allDialogue = list()
for key, value in jaws.characters.items():
    c = jaws.characters[key]
    for d in c.dialogues:
        allDialogue.append(d.text)

for line in allDialogue:
    if re.search(r"\!$", line):
        if len(line) < 50:
            exclamationList.append(line)

selectedExclamations = random.sample(exclamationList, 3)
print "\t#[ 12 ]:  " + selectedExclamations[0]
print "\t#[ 13 ]:  " + selectedExclamations[1]
print "\t#[ 14 ]:  " + selectedExclamations[2]


#-----------------------------------------------------------
#print 7th VO line
print "\n#[ V07 ]:  " + voLines[6]
actionSet = set()
allactions = []
for line in open("corpus/corpus_action.txt"):
    line = line.strip()
    if line:  # skip blank lines; an empty pattern would match every line
        allactions.append(line)

for line in jaws.allDialogue():
    for actionword in allactions:
        if re.search(actionword, line):
            actionSet.add(line)

# sample from a list; newer Python versions cannot sample directly from a set
selectedActions = random.sample(list(actionSet), 2)
print "\t#[ 15 ]:  " + selectedActions[0]
print "\t#[ 16 ]:  " + selectedActions[1]

#-----------------------------------------------------------
#print 8th VO line
print "\n#[ V08 ]:  " + voLines[7]

meaningful_regex = r"(true|truth|life|eternity|forever|destiny|fate|future|finally|born|love|family|die|death|fortune|divine|wise|wisdom|experience|mental)"
meaningful_words = ["true", "truth", "life", "eternity", "forever", "destiny", "fate", "future", "finally", "born", "love", "family", "die", "death", "fortune", "divine", "wise", "wisdom", "experience", "mental"]
meaningfulList = list()

for line in jaws.allDialogue():
    if re.search(meaningful_regex, line):
        if len(line) < 140:
            if re.search(r"[^?]$", line):
                meaningfulList.append(line)

selectedMeaningful = random.sample(meaningfulList, 2)
print "\t#[ 17 ]:  " + selectedMeaningful[0]
print "\t#[ 18 ]:  " + selectedMeaningful[1]



sexyList = []
sexy_regex = r"(sex|kiss|hug|\bbody\b|naked|nude|bed|bedroom|relationship|relationships|legs|drunk|virgin|pregnant)"
for line in jaws.allDialogue():
    if re.search(sexy_regex, line):
        if len(line) < 60:
            sexyList.append(line)

selectedSexy = random.sample(sexyList, 1)
print "\n\t#[ 19 ]:  " + selectedSexy[0]
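
The voice-over lines come from `CharacterMarkovGenerator`, which is not listed in this post. For readers unfamiliar with the technique, here is a minimal sketch of a character-level Markov generator with the same `feed`/`generate` shape. `MiniCharMarkov`, its internals, and the sample lines are an illustration only, not the class we actually imported.

```python
import random

class MiniCharMarkov(object):
    """Character-level Markov chain of order n (a stand-in, not our real class)."""

    def __init__(self, n=4, max_len=100):
        self.n = n
        self.max_len = max_len
        self.transitions = {}  # n-character context -> list of next characters
        self.starts = []       # first n characters of every line fed in

    def feed(self, line):
        if len(line) < self.n:
            return
        self.starts.append(line[:self.n])
        for i in range(len(line) - self.n):
            context = line[i:i + self.n]
            self.transitions.setdefault(context, []).append(line[i + self.n])

    def generate(self):
        out = random.choice(self.starts)
        while len(out) < self.max_len:
            followers = self.transitions.get(out[-self.n:])
            if not followers:
                break  # this context was only ever seen at the end of a line
            out += random.choice(followers)
        return out

gen = MiniCharMarkov(n=3, max_len=40)
for line in ["in a world of sharks", "in a town by the sea", "one man in a boat"]:
    gen.feed(line)
print(gen.generate())
```

A higher `n` copies longer stretches of the source text verbatim, while a lower `n` produces more novel but less readable output, which is why a different order per VO corpus can make sense.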

The Character Class

#Character.py
# -*- coding: utf-8 -*-
import re
import random

class Character(object):

    def __init__(self, mName):
        self.name = mName.encode('utf-8')
        self.numWords = 0
        self.scenes = set()
        self.dialogues = []

    def totalwWordCount(self):
        if self.numWords == 0:
            for d in self.dialogues:
                self.numWords += d.count()

        return self.numWords


    def printDialogues(self):
        for d in self.dialogues:
            print "scene: " + str(d.sceneNum) + " text: " + d.text.lower() + " score: " + str(d.score)

    def randomLine(self, length):
        # Choose only among lines shorter than `length`.
        # Returns an empty string if no line is short enough.
        candidates = [d.text for d in self.dialogues if len(d.text) < length]
        if not candidates:
            return ""

        return random.choice(candidates)

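    def sentimentLines(self, posOrNeg, threshold):
        # threshold is a positive magnitude: "pos" keeps lines scoring at or
        # above it, "neg" keeps lines scoring at or below its negative, so
        # neutral (zero-score) lines are excluded from both.
        plines = []
        for d in self.dialogues:
            if posOrNeg == "pos":
                if d.score >= threshold:
                    plines.append(d.text)
            elif posOrNeg == "neg":
                if d.score <= -threshold:
                    plines.append(d.text)

        return plines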


    def printDialoguesForCondition(self, regex):
        # map each matching line (lowercased) to the scene it first appears in
        dDict = {}
        for d in self.dialogues:
            dialogueText = d.text.lower()
            if re.search(regex, dialogueText):
                if dialogueText not in dDict:
                    dDict[dialogueText] = d.sceneNum

        return dDict

The Scene Class

# -*- coding: utf-8 -*-
class Scene(object):

    #internal class var for numbering Scenes
    index = 1

    def __init__(self, mLength, mText):
        self.num    = Scene.index
        self.length = mLength
        self.text   = mText.encode('utf-8')
        Scene.index += 1

    def __str__(self):
        return "SCENE: " + str(self.num) + " | name: " + self.text + " | length: " + str(self.length)

The Dialogue Class

# -*- coding: utf-8 -*-
import re

class Dialogue(object):

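    def get_words_from_file(filename):
        # Load a word list, one word per line.
        # Blank lines and lines starting with ';' (comments) are skipped.
        all_words = list()
        f = open(filename)
        for line in f:
            line = line.strip()
            if line and not line.startswith(';'):
                all_words.append(line)
        f.close()

        return all_words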

    positive_word_list = get_words_from_file("positive-words.txt")
    negative_word_list = get_words_from_file("negative-words.txt")

    def __init__(self, mText, mSceneNum):
        self.text     = mText.encode('utf-8')
        self.sceneNum = mSceneNum
        # count words rather than characters
        self.numWords = len(mText.split())
        self.score    = 0
        #TODO: add timeStamp ivar

        self.setScore()


    def count(self):
        return self.numWords

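    def setScore(self):
        # lowercase and strip surrounding punctuation so that "Great!" still
        # matches the lexicon entry "great"
        words           = [w.strip('.,!?;:"\'()-').lower() for w in self.text.split()]
        pos_score       = len( set(words) & set(Dialogue.positive_word_list) )
        neg_score       = len( set(words) & set(Dialogue.negative_word_list) )
        sentiment_score = pos_score - neg_score
        self.score      = sentiment_score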


    def __str__(self):
        return "DIALOGUE: text: " + self.text + " | sceneNum: " + str(self.sceneNum) + " | word count: " + str(self.numWords)
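
The scoring idea in `setScore` is easy to try in isolation. The sketch below is a standalone version that assumes two tiny, made-up word lists in place of positive-words.txt and negative-words.txt; it also lowercases and strips punctuation, which is a choice made for this example.

```python
def sentiment_score(text, positive_words, negative_words):
    """Count distinct positive words minus distinct negative words in `text`."""
    words = set(w.strip('.,!?;:"\'()-').lower() for w in text.split())
    return len(words & set(positive_words)) - len(words & set(negative_words))

# Tiny made-up lexicons; real word lists are much longer
POSITIVE = ["love", "great", "happy"]
NEGATIVE = ["afraid", "terrible", "dead"]

print(sentiment_score("I love this great town!", POSITIVE, NEGATIVE))     # prints 2
print(sentiment_score("He's dead, and I'm afraid.", POSITIVE, NEGATIVE))  # prints -2
```

Because the words are collected into a set, a word repeated within one line only counts once.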

Next Steps

The goal of this project was to reach a point where creating one of these trailers is doable via a website. Craig cut an example trailer based on what our script produced. Here’s the result: https://vimeo.com/41870939

Additionally, we need a better way to pull lines from our script corpus. Right now it’s a smattering of processes, and the results are still not as rich as if a human were doing the selection. At some point during the selection process, we still have to resort to random sampling. Machine learning techniques could help here; a Bayesian classifier trained on a corpus of romantic comedies is one possible tool.
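
To give a sense of what that could look like, here is a minimal naive Bayes sketch in plain Python. Nothing here was built for the project: the `NaiveBayes` class, the labels, and the four training lines are all invented for illustration, and a real version would need a proper corpus of romantic comedy dialogue.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes(object):
    """Multinomial naive Bayes over bag-of-words, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of lines seen
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / float(total_docs))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word] + 1
                score += math.log(count / float(total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = NaiveBayes()
nb.train("i love you", "romcom")
nb.train("will you marry me", "romcom")
nb.train("the shark is coming", "other")
nb.train("get out of the water", "other")
print(nb.classify("i love the way you smile"))  # prints "romcom"
```

A classifier like this could rank every line of a script by how much it "sounds like" a romantic comedy, leaving random sampling as a tie-breaker instead of the main selection step.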
