Trailer Mashups
by miguelb.
The final project for my Reading and Writing Electronic Text class was a joint effort between Craig Protzel and me. Craig had worked on remixing trailers for his midterm and asked if I wanted to help him take the idea further as a final project. I agreed and hilarity ensued.
The premise behind Trailer Mashups is to take any movie and create a trailer for it in another genre. Our example was to take Jaws and cut a trailer for it as a romantic comedy. Craig’s previous profession involved creating and producing trailers for the movie industry, so he has intimate knowledge of how genre-specific trailers need to be structured.
Trailer Structure
Our first step was to establish the structure for a typical romantic comedy trailer.
- VO LINE 1 – INTRODUCE WORLD / LOCATION
This was done using Stanford’s Named Entity Recognizer (NER). We ran the engine on a corpus containing all of the lines from the script, and a simple concordance of the results gave us the most relevant location.
- VO LINE 2 – INTRODUCE MAIN CHARACTERS
This was a manual process in which we looked at IMDB for the leading male and female roles.
- VO LINE 3 – ESTABLISH CURRENT SITUATION
Using sentiment analysis, we were able to tag each line of dialogue with a score ranging from -5 to 5. A random sample of lines with a score over 2 was then chosen to supply the positive lines needed for this section.
- VO LINE 4 – INTRODUCE MAIN CONFLICT
Using the same process as for the positive lines, in this section we look for negative lines.
- VO LINE 5 – DATE/TIME
A manually created list of “time words” is used to search for the most relevant date/time setting.
- VO LINE 6 – SPEAK TO AUDIENCE
Using a process similar to the one for “time” words, we searched the dialogue with a list of action words.
- VO LINE 7 – WRAP UP
A random line for each of the male and female leads is chosen for this section.
- VO LINE 8 – MOVIE TITLE
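The location step for VO line 1 can be sketched as a frequency count over the place names a tagger returns. This is a minimal illustration, not the project's code: it is written for Python 3, the `pick_location` helper is hypothetical, and the list of mentions is made up rather than taken from the real NER output.

```python
from collections import Counter
import random

def pick_location(location_mentions, character_names, top_n=5):
    """Pick one of the top_n most frequent location mentions at random."""
    counts = Counter(m.strip().lower() for m in location_mentions if m.strip())
    # Taggers sometimes label character names as places, so drop them.
    for name in character_names:
        counts.pop(name.lower(), None)
    candidates = [word for word, _ in counts.most_common(top_n)]
    return random.choice(candidates) if candidates else None

# Made-up tagger output, for illustration only.
mentions = ["Amity", "amity", "Brody", "beach", "Amity", "beach", "harbor"]
print(pick_location(mentions, ["brody", "ellen"], top_n=1))
```

Choosing at random among the top few candidates, rather than always taking the single most frequent one, leaves room to retry when no usable dialogue line mentions the chosen place.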
Content
Next we had to gather content. We found scripts on the internet and then imported them into Final Draft. Final Draft’s native file format is an .fdx file, which is basically an XML file.
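Because the file is XML, a standard parser can walk it. The sketch below is an assumption-laden illustration in Python 3, not the project's parser: it assumes the script body is a sequence of Paragraph elements whose Type attribute is "Scene Heading", "Character" or "Dialogue", each holding one or more Text children. The sample document and its lines are invented, and `parse_fdx` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed layout; not a real script excerpt.
SAMPLE_FDX = """<FinalDraft DocumentType="Script" Version="1">
  <Content>
    <Paragraph Type="Scene Heading"><Text>EXT. BEACH - DAY</Text></Paragraph>
    <Paragraph Type="Character"><Text>BRODY</Text></Paragraph>
    <Paragraph Type="Dialogue"><Text>Is the water </Text><Text>safe?</Text></Paragraph>
    <Paragraph Type="Character"><Text>ELLEN</Text></Paragraph>
    <Paragraph Type="Dialogue"><Text>Of course it is.</Text></Paragraph>
  </Content>
</FinalDraft>"""

def parse_fdx(xml_text):
    """Return a list of (scene_number, character, dialogue_text) tuples."""
    root = ET.fromstring(xml_text)
    scene_num = 0
    speaker = None
    rows = []
    for para in root.iter("Paragraph"):
        kind = para.get("Type")
        # A paragraph may be split into several Text runs; join them.
        text = "".join(t.text or "" for t in para.iter("Text")).strip()
        if kind == "Scene Heading":
            scene_num += 1
            speaker = None
        elif kind == "Character":
            speaker = text.lower()
        elif kind == "Dialogue" and speaker:
            rows.append((scene_num, speaker, text))
    return rows

for row in parse_fdx(SAMPLE_FDX):
    print(row)
```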
Structuring Data
Once we could parse the script, we had to figure out how to structure all of the data we wanted to extract from it. We decided the best course of action was to establish a data hierarchy with the “Character” being the most important component.
The data model looks like this: Script | Characters | Scenes | Dialogues
All of the methods written were structured in a way to access the data through the characters. They became the glue that connected all of the other components together.
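To make the character-centred hierarchy concrete, here is a small Python 3 sketch. It is not the project's Script class, which is not shown in this post; `MiniScript`, `Role` and `Line` are hypothetical names, and the dialogue is invented. The point is only that every lookup starts from a character and reaches scenes and dialogue from there.

```python
from dataclasses import dataclass, field

@dataclass
class Line:
    scene_num: int
    text: str

@dataclass
class Role:
    name: str
    scenes: set = field(default_factory=set)      # scene numbers the role appears in
    dialogues: list = field(default_factory=list)  # Line objects, in script order

class MiniScript:
    """Hypothetical container: everything is reached through characters."""

    def __init__(self):
        self.characters = {}

    def add_dialogue(self, name, scene_num, text):
        key = name.lower()
        role = self.characters.setdefault(key, Role(key))
        role.scenes.add(scene_num)
        role.dialogues.append(Line(scene_num, text))

    def shared_scenes(self, first, second):
        # Scenes in which both characters speak.
        return self.characters[first].scenes & self.characters[second].scenes

script = MiniScript()
script.add_dialogue("BRODY", 1, "Is the water safe?")
script.add_dialogue("ELLEN", 1, "Of course it is.")
script.add_dialogue("BRODY", 2, "Close the beach.")
print(script.shared_scenes("brody", "ellen"))
```

Keeping the scene numbers on each character makes questions like "which scenes do the two leads share" a simple set intersection.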
```python
#trailerMashup.py
# -*- coding: utf-8 -*-

from Script import Script
from Concordance import Concordance
from markov_by_char import CharacterMarkovGenerator
import markov
import random
import csv
import re
import string
import sys

voLines = []

def runMarkov():
    scripts = dict()
    voLines = []
    # Markov order (n) to use for each of the eight VO line corpora
    nums = {1: 5, 2: 4, 3: 5, 4: 5, 5: 4, 6: 4, 7: 4, 8: 4}

    scripts[1] = open('corpus/corpus_copy01.txt', 'r')
    scripts[2] = open('corpus/corpus_copy02.txt', 'r')
    scripts[3] = open('corpus/corpus_copy03.txt', 'r')
    scripts[4] = open('corpus/corpus_copy04.txt', 'r')
    scripts[5] = open('corpus/corpus_copy05.txt', 'r')
    scripts[6] = open('corpus/corpus_copy06.txt', 'r')
    scripts[7] = open('corpus/corpus_copy07.txt', 'r')
    scripts[8] = open('corpus/corpus_copy08.txt', 'r')

    # sorted() keeps the VO lines in order 1..8 regardless of dict ordering
    for key in sorted(scripts):
        #print markov_for_all(scripts[key], nums[key])
        voLines.append( markov_for_all(scripts[key], nums[key]) )

    return voLines

def markov_for_all(script, num):
    generator = CharacterMarkovGenerator(n=num, max=100)
    lines = set()
    outputList = list()

    for line in script:
        line = line.strip()
        generator.feed(line)

    # generate five candidates and keep one at random
    for i in range(5):
        outputList.append(generator.generate())

    return random.choice(outputList)

jaws = Script(sys.argv[1])
jaws.run()

#run markov for VO lines
voLines = runMarkov()

#-----------------------------------------------------------
#print 1st VO line
print "#[ V01 ]: " + voLines[0]

#get the location
location = None
concord = Concordance()
locationFile = open("corpus/jaws_locations.txt", 'r')
for line in locationFile:
    line = line.strip()
    concord.feed(line)

# -- Delete keys that are characters
for key in jaws.characters.keys():
    if concord.concord.has_key(key):
        del concord.concord[key]

#-----------------------------------------------------------
#print 1st dialog line
locationLines = []
# -- Keep searching until we get a location
while len(locationLines) == 0:
    location = random.choice(concord.most_common_words(5))
    regex = re.compile(r'\s%s\s' % location)
    locationLines = jaws.printDialogueForConditionOnAll(regex, 100)

print "\t#[ 1 ]: " + random.choice(locationLines)

#-----------------------------------------------------------
#print 2nd VO line
print "\n#[ V02 ]: " + voLines[1]

#-----------------------------------------------------------
#print 2nd 3rd dialog lines
#get main characters
brody = jaws.characters['brody']
ellen = jaws.characters['ellen']
print "\t#[ 2 ]: " + brody.randomLine(30)
print "\t#[ 3 ]: " + ellen.randomLine(30)

#-----------------------------------------------------------
#print 3rd VO line
print "\n#[ V03 ]: " + voLines[2]

#-----------------------------------------------------------
#print 4th 5th dialog lines
cameTogetherScene = jaws.printDialogueForConditionWithinScene([brody, ellen], r'\byou\b.+\?$')
print "\t#[ 4 ]: " + cameTogetherScene[0][0][0] #this is super ugly, but so be it
print "\t#[ 5 ]: " + cameTogetherScene[1][0][0]

#-----------------------------------------------------------
#print 6th 7th dialog lines
cameTogetherScene = jaws.printDialogueForConditionWithinScene([ellen, brody], r'\byou\b.+\?$')
print "\n\t#[ 6 ]: " + cameTogetherScene[0][0][0] #this is super ugly, but so be it
print "\t#[ 7 ]: " + cameTogetherScene[1][0][0]

#-----------------------------------------------------------
#print 4th VO line
print "\n#[ V04 ]: " + voLines[3]

#-----------------------------------------------------------
#print 8th 9th dialog lines
positiveLines = []
for key, value in jaws.characters.items():
    c = jaws.characters[key]
    for line in c.sentimentLines("pos", 1):
        if len(line) < 100:
            positiveLines.append(line)

#print positiveLines
print "\t#[ 8 ]: " + random.choice(positiveLines)
random.shuffle(positiveLines)
print "\t#[ 9 ]: " + random.choice(positiveLines)

#-----------------------------------------------------------
#print 5th VO line
print "\n#[ V05 ]: " + voLines[4]

#-----------------------------------------------------------
#print 10th 11th dialog lines
negativeLines = []
for key, value in jaws.characters.items():
    c = jaws.characters[key]
    for line in c.sentimentLines("neg", 1):
        if len(line) < 100:
            negativeLines.append(line)

#print negativeLines
print "\t#[ 10 ]: " + random.choice(negativeLines)
random.shuffle(negativeLines)
print "\t#[ 11 ]: " + random.choice(negativeLines)

#-----------------------------------------------------------
#print 6th VO line
print "\n#[ V06 ]: " + voLines[5]

#-----------------------------------------------------------
#print 12th 13th 14th dialog lines
exclamationList = list()
selectedExclamations = list()
allDialogue = list()
for key, value in jaws.characters.items():
    c = jaws.characters[key]
    for d in c.dialogues:
        allDialogue.append(d.text)

# short lines that end in an exclamation mark
for line in allDialogue:
    if re.search(r"\!$", line):
        if len(line) < 50:
            exclamationList.append(line)

selectedExclamations = random.sample(exclamationList, 3)
print "\t#[ 12 ]: " + selectedExclamations[0]
print "\t#[ 13 ]: " + selectedExclamations[1]
print "\t#[ 14 ]: " + selectedExclamations[2]

#-----------------------------------------------------------
#print 7th VO line
print "\n#[ V07 ]: " + voLines[6]

actionSet = set()
allactions = []
for line in open("corpus/corpus_action.txt"):
    line = line.strip()
    allactions.append(line)

# collect every dialogue line that contains an action word
for line in jaws.allDialogue():
    for actionword in allactions:
        if re.search(actionword, line):
            actionSet.add(line)

selectedActions = random.sample(actionSet, 2)
print "\t#[ 15 ]: " + selectedActions[0]
print "\t#[ 16 ]: " + selectedActions[1]

#-----------------------------------------------------------
#print 8th VO line
print "\n#[ V08 ]: " + voLines[7]

meaningful_regex = r"(true|truth|life|eternity|forever|destiny|fate|future|finally|born|love|family|die|death|fortune|divine|wise|wisdom|experience|mental)"
# the original was missing the comma between "forever" and "destiny"
meaningful_words = ["true", "truth", "life", "eternity", "forever", "destiny", "fate", "future", "finally", "born", "love", "family", "die", "death", "fortune", "divine", "wise", "wisdom", "experience", "mental"]
meaningfulList = list()
for line in jaws.allDialogue():
    if re.search(meaningful_regex, line):
        if len(line) < 140:
            # skip lines that end in a question mark
            if re.search(r"[^?]$", line):
                meaningfulList.append(line)

selectedMeaningful = random.sample(meaningfulList, 2)
print "\t#[ 17 ]: " + selectedMeaningful[0]
print "\t#[ 18 ]: " + selectedMeaningful[1]

sexyList = []
sexy_regex = r"(sex|kiss|hug|\bbody\b|naked|nude|bed|bedroom|relationship|relationships|legs|drunk|virgin|pregnant)"
for line in jaws.allDialogue():
    if re.search(sexy_regex, line):
        if len(line) < 60:
            sexyList.append(line)

selectedSexy = random.sample(sexyList, 1)
print "\n\t#[ 19 ]: " + selectedSexy[0]
```
The Character Class
```python
#Character.py
# -*- coding: utf-8 -*-
import re
import random

class Character(object):

    def __init__(self, mName):
        self.name = mName.encode('utf-8')
        self.numWords = 0
        self.scenes = set()
        self.dialogues = []

    def totalwWordCount(self):
        # computed lazily the first time it is requested
        if self.numWords == 0:
            for d in self.dialogues:
                self.numWords += d.count()
        return self.numWords

    def printDialogues(self):
        for d in self.dialogues:
            print "scene: " + str(d.sceneNum) + " text: " + d.text.lower() + "score: " + str(d.score)

    def randomLine(self, length):
        # note: loops forever if the character has no line shorter than length
        selectedLine = ""
        goodline = False
        while goodline == False:
            selectedLine = random.choice(self.dialogues).text
            if len(selectedLine) < length:
                goodline = True
        return selectedLine

    def sentimentLines(self, posOrNeg, threshold):
        plines = []
        for d in self.dialogues:
            if posOrNeg == "pos":
                if d.score >= threshold:
                    plines.append(d.text)
                    #print d.text
            elif posOrNeg == "neg":
                if d.score < threshold:
                    plines.append(d.text)
        return plines

    def printDialoguesForCondition(self, regex):
        #dList = []
        # maps each matching line (lowercased) to its scene number
        dDict = {}
        for d in self.dialogues:
            dialogueText= d.text.lower()
            if re.search(regex, dialogueText):
                #dList.append(d.text)
                if d.text.lower() not in dDict:
                    dDict[d.text.lower()] = d.sceneNum
        return dDict
```
The Scene Class
```python
# -*- coding: utf-8 -*-

class Scene(object):

    #internal class var for numbering Scenes
    index = 1

    def __init__(self, mLength, mText):
        self.num = Scene.index
        self.length = mLength
        self.text = mText.encode('utf-8')
        Scene.index += 1

    def __str__(self):
        return "SCENE: " + str(self.num) + " | name: " + self.text + " | length: " + str(self.length)
```
The Dialogue Class
```python
# -*- coding: utf-8 -*-
import re

class Dialogue(object):

    # helper used only while the class body is evaluated, to load the word lists
    def get_words_from_file(filename):
        all_words = list()
        for line in open(filename):
            # lines starting with ';' are comments in the word list files
            if re.search(r'^;', line):
                pass
            else:
                line = line.strip()
                all_words.append(line)
        return all_words

    positive_word_list = get_words_from_file("positive-words.txt")
    negative_word_list = get_words_from_file("negative-words.txt")

    def __init__(self, mText, mSceneNum):
        self.text = mText.encode('utf-8')
        self.sceneNum = mSceneNum
        # count words, not characters (the original used len(mText))
        self.numWords = len(mText.split())
        self.score = 0
        #TODO: add timeStamp ivar
        self.setScore()

    def count(self):
        return self.numWords

    def setScore(self):
        # score = distinct positive words minus distinct negative words
        words = self.text.split()
        pos_score = len( set(words) & set(Dialogue.positive_word_list) )
        neg_score = len( set(words) & set(Dialogue.negative_word_list) )
        sentiment_score = pos_score - neg_score
        self.score = sentiment_score

    def __str__(self):
        return "DIALOGUE: text: " + self.text + " | sceneNum: " + str(self.sceneNum) + " | word count: " + str(self.numWords)
```
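The setScore method above splits on whitespace and compares the tokens as they are, so a word with a capital letter or trailing punctuation would not match a lowercase entry in the word lists (assuming the lists are lowercase). One possible refinement, sketched here in Python 3 with tiny made-up word sets standing in for the real list files, is to normalize the text before intersecting. `sentiment_score` is a hypothetical helper, not part of the project.

```python
import re

def sentiment_score(text, positive_words, negative_words):
    """Distinct positive words minus distinct negative words, after normalizing."""
    # Lowercase and keep only letters and apostrophes, so "Great!" becomes "great".
    words = set(re.findall(r"[a-z']+", text.lower()))
    return len(words & positive_words) - len(words & negative_words)

# Tiny illustrative word sets; the project loads much larger lists from files.
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"afraid", "dead", "terrible"}

print(sentiment_score("I LOVE this. Great!", POSITIVE, NEGATIVE))
print(sentiment_score("He's dead, and I'm afraid.", POSITIVE, NEGATIVE))
```

Passing the word lists in as sets also avoids rebuilding a set from each list on every call.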
Next Steps
The goal of this project was to reach a point where creating one of these trailers is doable via a website. Craig cut an example trailer based on what our script produced. Here’s the result: https://vimeo.com/41870939
Additionally, we need a better way to pull lines from our script corpus. Right now it’s a smattering of processes, and the results are still not as rich as if a human were doing the selection. At some point during the selection process, we have to resort to random sampling. Machine learning techniques could help here. A Bayesian classifier trained on a corpus of romantic comedies is one possible tool.
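As a rough idea of what that could look like, here is a small naive Bayes classifier in Python 3 with add-one smoothing. It is only a sketch of the technique, not something we built: the class, the labels and the handful of training lines are all invented for illustration, and a real version would train on dialogue from many romantic comedies.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training lines
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        scores = {}
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for token in tokens:
                # Add-one smoothing so unseen words do not zero out a label.
                count = self.word_counts[label][token] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

# Invented training lines, for illustration only.
classifier = NaiveBayes()
for line in ["I think I love you", "will you marry me", "kiss me before I go"]:
    classifier.train(line, "romcom")
for line in ["get out of the water", "the shark is still out there", "load the harpoon gun"]:
    classifier.train(line, "other")

print(classifier.classify("do you love me"))
print(classifier.classify("the shark is out there"))
```

Ranking candidate lines by their "romcom" score, instead of sampling at random from everything that passes a keyword filter, would be one way to use it.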
