MovieRecommender

Modern Movie Recommender

TMDB is a database that hosts information about movies and TV shows. From here, you can get information like cast, budget, and revenue, as well as basic information about a movie such as the date it was released and a brief synopsis.

In this project, I am going to build a movie recommendation app using data pulled back from TMDB.
TMDB has a robust API that is free to use for small projects. It has information on movies and TV shows. I will limit myself to movies made in 2000 or later. Using overviews of movies from TMDB, I will build recommendations based on similarities (found with natural language processing, or NLP) in the text used to describe each movie, along with its genres. These recommendations will then be served by a RESTful API built in Flask. The Flask web app I built is located here:

Get Movie Recommendations

Below is a guide to all the steps involved in this end-to-end project. I won’t go super in-depth on the methods or techniques; this is more about showing how to replicate the work if you’d like to solve a similar problem.

Pulling Data from TMDB

TMDB has a well-documented API. We will be using the “discover” and “movie” libraries. The documentation for the discover API shows that only the API key is required, but there are several optional values to help us pull the data we need. I’ve decided to use a few of these options; they appear as the request parameters in the code further down.

For the sake of keeping this manageable, I only pulled back movies from the year 2000 to now. I’m partial to newer movies, but I also am going in with the assumption that data for newer movies will be more accurate.

The code below will cycle through each page of results from the API for each year, returning the top 1,000 grossing movies for each year:

# import necessary libraries
import time
import pandas as pd
import numpy as np
import json
import requests
from config import tmdb_key
from rake_nltk import Rake
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
# declare static URL for API, empty list
tmdb_url = r"https://api.themoviedb.org/3/discover/movie"
tmdb = []

# Each page has 20 results
# To limit to widely known movies - limit to top 1,000 grossing movies per year; only pull back 50 pages per year for Discover Library API call
# Sort by revenue to get top 1,000, stop if there are no more pages
for year in range(2000, 2021):
    for page in range(1, 51):

        discover_params = {
            'api_key': tmdb_key,
            'primary_release_year': year,
            'include_adult': "false",
            'include_video': "false",
            'with_original_language': 'en',
            'sort_by': 'revenue.desc',
            'page': page
        }
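        try:
            response = requests.get(tmdb_url, params = discover_params)
            results = response.json()['results']
            for item in results:
                tmdb.append(item)
            print("Year {}, Page {} done".format(year, page))
        # stop paging this year on a failed request or a response without results
        except (requests.RequestException, ValueError, KeyError):
            break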
print("Count of movies: {:,}".format(len(tmdb)))
Count of movies: 21,000
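
The loop above relies on an exception to stop paging. TMDB's discover responses also carry a `total_pages` field, so one hedged alternative is to work out the page range up front from the first response. The sketch below makes no request; `pages_to_fetch` and the two sample payloads are hypothetical names and values used only for illustration.

```python
def pages_to_fetch(first_page_response, max_pages=50):
    """Return the page numbers worth requesting for one year.

    first_page_response: parsed JSON of page 1 (assumed to carry 'total_pages').
    max_pages: cap on pages per year (50 pages x 20 results = 1,000 movies).
    """
    total = int(first_page_response.get("total_pages", 1))
    last_page = max(1, min(total, max_pages))
    return list(range(1, last_page + 1))


# Hypothetical payloads standing in for real API responses
sample_response = {"page": 1, "total_pages": 312, "results": []}
short_year = {"page": 1, "total_pages": 7, "results": []}

print(len(pages_to_fetch(sample_response)))  # capped at 50
print(pages_to_fetch(short_year))            # pages 1 through 7
```

With the range known in advance, the inner loop no longer needs to treat an error as the signal that a year has run out of pages.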

Once we have this metadata from the Discover library, we can use the JSON results that we’ve put into a list to get specific information on each movie. I’m particularly interested in the overview and genre information from the movie library:

I’m going to declare a shell for the DataFrame that will house the information from the movie API call, then for each loop, make a new record in the DataFrame based on features I’d like to pull back:

columns = ['tmdb_id', 'imdb_id', 'title', 'budget', 'revenue', 'release_date', 'release_year', 'genres', 'overview']
movies_df = pd.DataFrame(columns=columns)
# static TMDB movie API URL
tmdb_movie_url = r"https://api.themoviedb.org/3/movie/"

print("Start time: ", time.strftime("%H:%M:%S", time.localtime(time.time())))

for movie in tmdb:
    movie_id = movie['id']
    overview = movie['overview']
    release_year = int(movie['release_date'][:4])
    
    # params specific to movie API
    tmdb_movie_params = {
        'api_key': tmdb_key,
        'language': "en-US"
    }
    
    # make request to API
    tmdb_response = requests.get(tmdb_movie_url+str(movie_id), params = tmdb_movie_params)
    # loop over only if data exists
    if tmdb_response.status_code == 200:
        tmdb_movie_json = tmdb_response.json()

        # make sure data exists - else save as NA
        # add into our DataFrame of movies
        # results show genre data needs to be flattened
        # return empty if no genre data
        try:
            genres_pd = pd.json_normalize(tmdb_movie_json['genres'])
            genres = genres_pd['name'].str.cat(sep = ', ')
            movies_df.loc[len(movies_df)] = [movie_id,
                                            tmdb_movie_json['imdb_id'],
                                            tmdb_movie_json['title'],
                                            tmdb_movie_json['budget'],
                                            tmdb_movie_json['revenue'],
                                            tmdb_movie_json['release_date'],
                                            release_year,
                                            genres,
                                            overview]
        # an empty genre list has no 'name' column, which raises KeyError
        except KeyError:
            movies_df.loc[len(movies_df)] = [movie_id,
                                    tmdb_movie_json['imdb_id'],
                                    tmdb_movie_json['title'],
                                    tmdb_movie_json['budget'],
                                    tmdb_movie_json['revenue'],
                                    tmdb_movie_json['release_date'],
                                    release_year,
                                    "NA",
                                    overview]
        print(".", end = " ")
        
print("\nEnded loop at: ", time.strftime("%H:%M:%S", time.localtime(time.time())))

This took roughly 50 minutes. From here, I’m adding a column that will have each genre split separately:

# add column which treats genres as a list
# genres were joined with ', ' above, so split on the same separator
movies_df['genre_list'] = movies_df.genres.str.split(', ')
movies_df.head(10)
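
One detail worth checking when turning the genre string back into a list is the separator. The genres were joined with a comma followed by a space, so splitting on a bare comma leaves a leading space on every genre after the first. The short sketch below shows the difference on a single string using plain Python; the variable names are illustrative.

```python
genres = "Action, Adventure, Thriller"

by_comma = genres.split(",")
by_comma_space = genres.split(", ")

print(by_comma)        # ['Action', ' Adventure', ' Thriller']
print(by_comma_space)  # ['Action', 'Adventure', 'Thriller']

# Stripping each piece gives the same clean result regardless of separator
stripped = [g.strip() for g in by_comma]
print(stripped)        # ['Action', 'Adventure', 'Thriller']
```

Stray spaces like these are harmless once the words are joined and tokenized later, but they matter if the list is ever used for exact matching on genre names.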

After this was done, I saved to a data folder, so I could have a place to pick up from:

# Save data
movies_df.to_pickle('../Data/movies_df.pkl')

Creating Our Recommendation Data

To make accurate recommendations, we’ll use a few different NLP techniques to clean and vectorize the overview of each movie, then use the keywords from each overview along with the genres to make movie recommendations.
Before I go any further, I want to make sure that I have good data to make recommendations with. So I’m going to limit the data to movies that have a genre and an overview of a minimum length. I’m also using the IMDB ID as an indicator of whether the data for a specific movie is reliable, on the assumption that movies without one do not have reliable information.

tmdb_df = pd.read_pickle('../Data/movies_df.pkl')
# Before processing for key words, remove any movie titles from database that do not have an overview or genre

# Going to link to IMDB - exclude anything without an IMDB ID populated
imdb_mask =  tmdb_df['imdb_id'].str.len() > 0
# Overview must be at least three words long
overview_length_mask = tmdb_df['overview'].str.split().apply(lambda x: len(x)) >= 3
# Remove values of NA in genre column
genre_not_empty_mask = tmdb_df['genres'] != "NA"
tmdb_df1 = tmdb_df[(imdb_mask) & (overview_length_mask) & (genre_not_empty_mask)].copy()
# reset index, discarding the old one
tmdb_df1.reset_index(drop=True, inplace=True)
tmdb_df1

After doing so, we are left with 9,650 movies:

tmdb_id imdb_id title budget revenue release_date release_year genres overview genre_list
0 955 tt0120755 Mission: Impossible II 125000000 546388105 2000-05-24 2000 Action, Adventure, Thriller With computer genius Luther Stickell at his si... [Action, Adventure, Thriller]
1 98 tt0172495 Gladiator 103000000 460583960 2000-05-01 2000 Action, Adventure, Drama In the year 180, the death of emperor Marcus A... [Action, Adventure, Drama]
2 8358 tt0162222 Cast Away 90000000 429632142 2000-12-22 2000 Adventure, Drama Chuck Nolan, a top international manager for F... [Adventure, Drama]
3 3981 tt0207201 What Women Want 70000000 374111707 2000-12-15 2000 Comedy, Romance Advertising executive Nick Marshall is as cock... [Comedy, Romance]
4 10567 tt0130623 Dinosaur 127500000 354248063 2000-05-19 2000 Animation, Family An orphaned dinosaur raised by lemurs joins an... [Animation, Family]
... ... ... ... ... ... ... ... ... ... ...
9645 714996 tt11957868 Peach 0 0 2020-01-13 2020 Comedy A socially anxious young woman lands a hot dat... [Comedy]
9646 714936 tt11754128 Atlas 0 0 2020-01-13 2020 Science Fiction Atlas and his dog Charlie are both locked into... [Science Fiction]
9647 714847 tt12299114 Trapped 0 0 2020-04-24 2020 Thriller A stressed young drug addicted person who is h... [Thriller]
9648 714842 tt12498618 8:46 0 0 2020-06-11 2020 Comedy, Documentary From Dave: Normally I wouldn't show you someth... [Comedy, Documentary]
9649 714836 tt12525356 Brock: Over the Top 0 0 2020-06-22 2020 Documentary Brock: Over the Top is a feature length docume... [Documentary]

9650 rows × 10 columns

The next part is to start preprocessing for our text data. Lemmatization is the process of removing inflections to return a word to its root form. This way, similar words can be analysed as a single item, as identified by the word’s lemma. For reference, see the table below:

Original Word | Word Lemma
--- | ---
Copied | Copy
Copying | Copy
Copies | Copy

Let’s go ahead and lemmatize our data. First, we’ll build a function to map each word’s part-of-speech tag to the form the lemmatizer expects, then lemmatize each word based on its tag:

# More pre-processing
# 1: Noise removal - get rid of non alphanumeric text
# 2: lemmatize text so root stays, but not different tenses or versions of same word (i.e. terrifying -> terrify)
    ## Much more powerful when part of speech for word is accurately identified
    ## Build function to accurately identify pos_tag, another to lemmatize text and return sentence lemmatized
lemm = WordNetLemmatizer()

def get_pos(tag):
    lemma_tag = tag[0].lower()
    return {
        "n": "n",
        "v": "v",
        "r": "r",
        "j": "a"
    }.get(lemma_tag, 'n')

def lemmatize(series):
    # RegEx removal of anything that is not alphanumeric
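    # (regex=True is required in recent pandas, where str.replace defaults to literal matching)
    s1 = series.str.replace(r'[^a-zA-Z\d\s:]', '', regex=True)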
    # tokenize
    s2 = s1.str.split()
    # get parts of speech tags for each word in each overview
    s3 = s2.apply(lambda x: pos_tag(x))
    # lemmatization
    s4 = s3.apply(lambda x:[lemm.lemmatize(word, pos=get_pos(tag)) for word, tag in x])
    # convert back to series
    lemma_series = s4.apply(lambda x: ' '.join(x))
    # lemmatized series
    return lemma_series
tmdb_df1['overview_lemma'] = lemmatize(tmdb_df1['overview'])
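
To see what `get_pos` does in isolation: `pos_tag` returns Penn Treebank tags (for example VBZ or JJ), while the WordNet lemmatizer expects one of n, v, r or a. The function keys on the first letter of the tag and falls back to noun. The sketch below repeats the mapping so it can run on its own; the sample tags are illustrative and are not output from the dataset.

```python
def get_pos(tag):
    # Same mapping as above: first letter of the Penn Treebank tag -> WordNet POS
    lemma_tag = tag[0].lower()
    return {"n": "n", "v": "v", "r": "r", "j": "a"}.get(lemma_tag, "n")


sample_tags = ["NN", "NNP", "VBZ", "VBG", "JJ", "RB", "IN", "DT"]
for tag in sample_tags:
    print(tag, "->", get_pos(tag))
```

Nouns, verbs, adjectives and adverbs map to their WordNet codes, and everything else (prepositions, determiners and so on) is treated as a noun, which usually leaves the word unchanged.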

Lemma text versus non lemma text:

# Inspect
tmdb_df1['overview_lemma'][0]
'With computer genius Luther Stickell at his side and a beautiful thief on his mind agent Ethan Hunt race across Australia and Spain to stop a former IMF agent from unleash a genetically engineer biological weapon call Chimera This mission should Hunt choose to accept it plunge him into the center of an international crisis of terrify magnitude'
t1 = tmdb_df1['overview'].str.replace(r'[^a-zA-Z\d\s:]', '', regex=True)
t1[0]
'With computer genius Luther Stickell at his side and a beautiful thief on his mind agent Ethan Hunt races across Australia and Spain to stop a former IMF agent from unleashing a genetically engineered biological weapon called Chimera This mission should Hunt choose to accept it plunges him into the center of an international crisis of terrifying magnitude'

Overall, six words were lemmatized in MI2’s overview (races, unleashing, engineered, called, plunges, terrifying).
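
A quick way to check which words the lemmatizer changed is to compare the two token lists position by position. The sketch below does this on an excerpt of the overview shown above rather than the full text; the variable names are illustrative.

```python
original = ("agent Ethan Hunt races across Australia and Spain to stop a former "
            "IMF agent from unleashing a genetically engineered biological "
            "weapon called Chimera")
lemmatized = ("agent Ethan Hunt race across Australia and Spain to stop a former "
              "IMF agent from unleash a genetically engineer biological "
              "weapon call Chimera")

# Lemmatization here maps tokens one-to-one, so the two lists can be zipped
changed = [(before, after)
           for before, after in zip(original.split(), lemmatized.split())
           if before != after]
print(changed)
```

The same comparison works on the full overview, as long as both strings have had punctuation removed in the same way.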
From here, I’m going to use a method that extracts keywords from text based on word occurrence and co-occurrence, to make sure our recommender only uses important keywords to make recommendations:

from rake_nltk import Rake
# Rake - rapid automatic keyword extraction (similar in purpose to TF-IDF keyword scoring)
    ## Gets keywords based on frequency of word occurrence and co-occurrence with other words in text
key_words = []
RAKE = Rake() 
for index, row in tmdb_df1.iterrows():
    RAKE.extract_keywords_from_text(row['overview_lemma'])
    key_word_scores = RAKE.get_word_degrees()
    key_words.append(list(key_word_scores.keys()))

Let’s check what was extracted:

# Check
print(tmdb_df1['overview_lemma'][0])
print(key_words[0])
print('Total words in overview: {:,}'.format(len(tmdb_df1['overview_lemma'][0].split())))
print('Total extracted: {:,}'.format(len(key_words[0])))
With computer genius Luther Stickell at his side and a beautiful thief on his mind agent Ethan Hunt race across Australia and Spain to stop a former IMF agent from unleash a genetically engineer biological weapon call Chimera This mission should Hunt choose to accept it plunge him into the center of an international crisis of terrify magnitude
['genetically', 'engineer', 'biological', 'weapon', 'call', 'chimera', 'mission', 'plunge', 'terrify', 'magnitude', 'former', 'imf', 'agent', 'beautiful', 'thief', 'side', 'stop', 'spain', 'center', 'hunt', 'choose', 'computer', 'genius', 'luther', 'stickell', 'unleash', 'international', 'crisis', 'mind', 'ethan', 'race', 'across', 'australia', 'accept']
Total words in overview: 58
Total extracted: 34
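
For intuition about the word degrees returned by `get_word_degrees()`, here is a simplified, standard-library sketch of the idea behind RAKE: split the text into candidate phrases at stopwords, then score each word by how many words it co-occurs with inside those phrases (counting itself). The tiny stopword set and the function name `word_degrees` are assumptions for illustration. rake_nltk uses NLTK's stopword list and its own tokenization, so its real output will differ.

```python
from collections import defaultdict

# Deliberately tiny stopword set, for illustration only
STOPWORDS = {"a", "an", "the", "to", "from", "of", "and", "on", "at", "his", "it"}


def word_degrees(text):
    # 1) Break the text into candidate phrases wherever a stopword appears
    phrases, current = [], []
    for word in text.lower().split():
        if word in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(word)
    if current:
        phrases.append(current)

    # 2) A word's degree grows by the phrase length for each phrase it appears in
    degrees = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            degrees[word] += len(phrase)
    return dict(degrees)


degrees = word_degrees(
    "stop a former IMF agent from unleash a genetically engineer biological weapon"
)
print(degrees)
```

Words that sit inside longer phrases ("genetically engineer biological weapon") end up with higher degrees than isolated ones ("stop"), while the stopwords themselves never appear as keys. That matches what the extraction above shows: the keyword list is the overview minus its stopwords.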

Overall, I think this is a good enough step for this problem. Let’s go ahead and append back to our dataframe.

# add back to dataframe
tmdb_df1['Key_Words'] = pd.Series(key_words)

The last step is to take the key words and the genres, and combine into a “bag of words” for each movie.

# Final combined DF with title and final list of words
recommend_df = pd.DataFrame(columns = ['Title', 'Recommender_BOW'])
# Iterate through each row of movie data, combine overview text and genre tags into one text column as bag of words
for i in range(len(tmdb_df1)):
    combined_row = [*tmdb_df1['Key_Words'].tolist()[i], *tmdb_df1['genre_list'].tolist()[i]]
    # join genre & key words from overview while removing double spaced characters
    recommend_df.loc[len(recommend_df)] = [tmdb_df1.loc[i, 'title'], ' '.join(combined_row).lower().replace('  ', ' ')]

Build recommendation

An easy way to gauge how related two movies are is to check whether similar words are used to describe them. Before we can do that, though, we need a way to represent the text numerically. An easy option is CountVectorizer() from sklearn:

from sklearn.feature_extraction.text import CountVectorizer
# Vector representation of our bag of words using Count_Vectorizer: convert raw text into a sparse matrix to numerically represent words
# (can also use Python's collections library counter class)
    ## Because we have extracted key words - should just be binary i.e. whether word exists, rather than count of words
    ## Will have min document frequency of 2, max document frequency of 85%
    ## Another option would have been to include bigrams, but RAKE re-ordered key words from overview when extracting
#CV = CountVectorizer(min_df = 2, max_df = 0.85, ngram_range = (1,2))
CV = CountVectorizer(min_df = 2, max_df = 0.85)
CV_Matrix = CV.fit_transform(recommend_df['Recommender_BOW'])
# Take a look at the CV after processing words for our recommender
print("Count Vectorizer number of documents: {:,}".format(CV_Matrix.shape[0]))
print("Count Vectorizer number of unique words (vocabulary size): {:,}".format(CV_Matrix.shape[1]))
Count Vectorizer number of documents: 9,650
Count Vectorizer number of unique words (vocabulary size): 13,615
# Dictionary of word and position representing place in sparse matrix
print("Word: {} \nPosition: {:,}".format(list(CV.vocabulary_.keys())[0], list(CV.vocabulary_.values())[0]))
Word: genetically 
Position: 5,171
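
To make the `min_df` and `max_df` settings concrete, the sketch below reproduces the document-frequency filter with `collections.Counter` on four made-up bags of words. It illustrates the idea only and is not scikit-learn's implementation; the toy documents and the helper name `filter_vocabulary` are assumptions.

```python
from collections import Counter

docs = [
    "agent mission spy action thriller",
    "agent heist action comedy",
    "dinosaur lemur family action",
    "romance comedy action",
]


def filter_vocabulary(docs, min_df=2, max_df=0.85):
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for doc in docs for word in set(doc.split()))
    max_count = max_df * len(docs)
    # Keep words that are neither too rare nor too common
    return sorted(w for w, df in doc_freq.items() if min_df <= df <= max_count)


vocab = filter_vocabulary(docs)
print(vocab)  # ['agent', 'comedy']
```

Here "action" is dropped because it appears in every document (above the 85% ceiling), and words such as "dinosaur" are dropped because they appear only once. Neither kind of word helps distinguish which movies are alike.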

We can now numerically compare the ‘bag of words’ for each movie to each other. To do so, we can calculate the cosine similarity, a common metric for comparing documents represented as word-count vectors: it measures the angle between two vectors and ignores their length.

from sklearn.metrics.pairwise import cosine_similarity
# Matrix representing cosine_similarity once our vocabulary is transformed into a numeric vector representation
cosine_similarities = cosine_similarity(CV_Matrix, CV_Matrix)
# Inspect
print(cosine_similarities)
# Movie indices ordered by ascending similarity to the first movie (Mission: Impossible II); the closest matches come last
print(cosine_similarities[0].argsort())
[[1.         0.05484085 0.05976143 ... 0.04364358 0.         0.02020305]
 [0.05484085 1.         0.11470787 ... 0.         0.         0.03877834]
 [0.05976143 0.11470787 1.         ... 0.04564355 0.         0.02112886]
 ...
 [0.04364358 0.         0.04564355 ... 1.         0.         0.        ]
 [0.         0.         0.         ... 0.         1.         0.07559289]
 [0.02020305 0.03877834 0.02112886 ... 0.         0.07559289 1.        ]]
[4824 3701 3700 ... 7066 2769    0]
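
As a small worked example of the metric, the cosine similarity between two count vectors is their dot product divided by the product of their lengths. The sketch below computes it by hand for two made-up bags of words using only the standard library; the function and variable names are illustrative.

```python
import math
from collections import Counter


def cosine(bag_a, bag_b):
    a, b = Counter(bag_a.split()), Counter(bag_b.split())
    # Dot product only needs the words the two bags share
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


score = cosine("agent mission imf action thriller", "agent imf team action")
print(round(score, 4))  # 0.6708
```

The two bags share three words out of five and four, giving 3 / (sqrt(5) * sqrt(4)). Bags with no words in common score 0, and identical bags score 1, which is why the diagonal of the matrix above is all ones.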

At this point, I save the cleaned TMDB DataFrame and the cosine similarity matrix as a checkpoint to come back to:

# Save data
tmdb_df1.to_pickle('../Data/tmdb_movies.pkl')
with open("../Data/movie_similarities.npy", 'wb') as npy:
    np.save(npy, cosine_similarities)

Inspect our results

The last part is to build a function to get recommendations for each movie, and review the results:

# Build recommender based on above
def recommend(title):
    idx = tmdb_df1[tmdb_df1['title'] == title].index[0]
    # top 50 recommendations
    similar_movies = pd.Series(cosine_similarities[idx]).sort_values(ascending = False)[1:51]
    # add similarity scores for top 50 - instead of iterating, pull back all data and drop null values
    recommend = pd.concat([tmdb_df1['imdb_id'], tmdb_df1['title'], tmdb_df1['release_year'], tmdb_df1['overview'], similar_movies], axis=1)
    recommend.columns = ['IMDB ID', 'Title', 'Year', 'Overview', 'Similarity Score']
    recommend = recommend.dropna()
    recommend = recommend.sort_values(by='Similarity Score', ascending=False)
    
    return recommend
recommend('Mission: Impossible II')[:20]
IMDB ID Title Year Overview Similarity Score
2769 tt0317919 Mission: Impossible III 2006 Retired from active duty to train new IMF agen... 0.308607
7066 tt2381249 Mission: Impossible - Rogue Nation 2015 Ethan and team take on their most impossible m... 0.271052
9059 tt5033998 Charlie's Angels 2019 When a systems engineer blows the whistle on a... 0.216225
8541 tt4912910 Mission: Impossible - Fallout 2018 When an IMF mission ends badly, the world is f... 0.212512
5194 tt1509767 The Three Musketeers 2011 The hot-headed young D'Artagnan along with thr... 0.198898
5138 tt1229238 Mission: Impossible - Ghost Protocol 2011 Ethan Hunt and his team are racing against tim... 0.197245
6204 tt1517260 The Host 2013 A parasitic alien soul is injected into the bo... 0.195180
1058 tt0283160 Extreme Ops 2002 While filming an advertisement, some extreme s... 0.187523
4807 tt1032751 The Warrior's Way 2010 A warrior-assassin is forced to hide in a smal... 0.187523
5227 tt0993842 Hanna 2011 A 16-year-old girl raised by her father to be ... 0.184428
989 tt0280486 Bad Company 2002 When a Harvard-educated CIA agent is killed du... 0.180702
7567 tt0918940 The Legend of Tarzan 2016 Tarzan, having acclimated to life in London, i... 0.180702
6469 tt6703928 A Fool's Paradise 2013 James Bond is sent on a mission to investigate... 0.179284
8673 tt4669264 Beirut 2018 In 1980s Beirut, Mason Skiles is a former U.S.... 0.176547
4745 tt1245526 RED 2010 When his peaceful life is threatened by a high... 0.176227
8054 tt3501632 Thor: Ragnarok 2017 Thor is imprisoned on the other side of the un... 0.176227
497 tt0266987 Spy Game 2001 On the day of his retirement, a veteran CIA ag... 0.172516
6221 tt2312718 Homefront 2013 Phil Broker is a former DEA agent who has gone... 0.170367
8952 tt9314132 When They Run 2018 A survivor of a zombie apocalypse is on the ru... 0.169031
8647 tt5177088 The Girl in the Spider's Web 2018 In Stockholm, Sweden, hacker Lisbeth Salander ... 0.169031

Not too shabby! We seem to recommend other Mission: Impossible movies, as well as other action movies. To get better results, we could try different keyword extraction methods or other preprocessing steps, or change the vector representation of our vocabulary.

However, upon further inspection, I noticed that I was using a 9,650 x 9,650 2D NumPy array inside the recommend function. That is a very large object (roughly 745 MB if stored as 64-bit floats) and would be hard to serve over the web without bogging down memory resources. Instead, I’m going to compute the top 20 suggestions for each movie and save them to a dictionary keyed by each movie’s index, with the indexes and similarity scores of its matches as values. This will be used to get the recommendations on our web application.
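
To put a number on the size of that array, the arithmetic below assumes the similarities are stored as 64-bit floats (NumPy's default float type) and compares the full matrix with keeping only the top 20 scores per movie. The variable names are illustrative.

```python
n_movies = 9_650
bytes_per_float64 = 8

full_matrix_bytes = n_movies * n_movies * bytes_per_float64
print(f"Full matrix: {full_matrix_bytes / 1e6:,.0f} MB")  # Full matrix: 745 MB

# Keeping only the top 20 neighbours per movie stores far fewer scores
top_k = 20
top_k_scores = n_movies * top_k
print(f"Scores kept: {top_k_scores:,} of {n_movies * n_movies:,}")
```

Keeping 193,000 scores instead of roughly 93 million is what makes the lookup small enough to load inside a web app.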

Build recommender function

Below, I create a dictionary that takes the index value for each movie as the key, with the indexes of the top 20 most similar movies based on the cosine similarity between their key words and genres for each movie as the dictionary values.

recommender_dict = {}
for i in range(len(tmdb_df1)):
    recommender_dict.update({i: pd.Series(cosine_similarities[i]).sort_values(ascending = False)[1:21].to_dict()})
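
The slice `[1:21]` assumes that each movie's own entry sorts first, which holds when its self-similarity of 1.0 is the unique maximum in its row. A hedged alternative that skips the movie by index instead is sketched below with the standard library on a made-up 4x4 matrix; `top_k_neighbours` is a hypothetical helper and not part of the original code.

```python
import heapq


def top_k_neighbours(similarities, k):
    """Map each row index to {neighbour_index: score} for its k best matches."""
    result = {}
    for i, row in enumerate(similarities):
        # Exclude the movie itself by index rather than by sort position
        candidates = ((score, j) for j, score in enumerate(row) if j != i)
        best = heapq.nlargest(k, candidates)
        result[i] = {j: score for score, j in best}
    return result


toy = [
    [1.0, 0.3, 0.1, 0.0],
    [0.3, 1.0, 0.2, 0.5],
    [0.1, 0.2, 1.0, 0.4],
    [0.0, 0.5, 0.4, 1.0],
]
neighbours = top_k_neighbours(toy, k=2)
print(neighbours[0])  # {1: 0.3, 2: 0.1}
```

The result has the same shape as `recommender_dict`: one small dictionary of neighbour indexes and scores per movie. In this sketch, ties in score are broken in favour of the higher index.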

Let’s go ahead and save for future use:

import pickle

# the context manager closes the file even if pickling fails
with open("../Data/cosine_dict.pkl", "wb") as f:
    pickle.dump(recommender_dict, f)

Our new recommender function is going to use the newly created dictionary. It will still search for the index of the title requested by the user, but this time, it will use that as the dictionary key. From there, we’re mapping back to the original dataframe, and if there is a match on the dataframe index and an index value within the dictionary, a score will be returned. All other movies will show NaN. We’ll remove null values, and sort by highest similarity, and return those top 20 results:

def recommend(title):
    idx = tmdb_df1[tmdb_df1['title'] == title].index[0]
    dict_ref = recommender_dict[idx]
    df_copy = tmdb_df1.copy()
    # movies outside the top 20 get NaN here and are filtered out below
    df_copy['similarity'] = df_copy.index.map(dict_ref)
    df_cleaned = df_copy[df_copy.similarity.notna()]
    df_sorted = df_cleaned.sort_values(by='similarity', ascending=False)
    return df_sorted

Are the results the same? Let’s take a look at our example movie Mission: Impossible II:

recommend('Mission: Impossible II')
tmdb_id imdb_id title budget revenue release_date release_year genres overview genre_list overview_lemma Key_Words similarity
2769 956 tt0317919 Mission: Impossible III 150000000 397850012 2006-05-03 2006 Action, Adventure, Thriller Retired from active duty to train new IMF agen... [Action, Adventure, Thriller] Retired from active duty to train new IMF agen... [mission, call, back, retired, train, new, imf... 0.308607
7066 177677 tt2381249 Mission: Impossible - Rogue Nation 150000000 682330139 2015-07-23 2015 Action, Adventure Ethan and team take on their most impossible m... [Action, Adventure] Ethan and team take on their most impossible m... [imf, syndicate, destroy, team, take, highlysk... 0.271052
9059 458897 tt5033998 Charlie's Angels 48000000 73279888 2019-11-14 2019 Action, Adventure, Comedy When a systems engineer blows the whistle on a... [Action, Adventure, Comedy] When a system engineer blow the whistle on a d... [system, engineer, blow, line, across, protect... 0.216225
8541 353081 tt4912910 Mission: Impossible - Fallout 178000000 791017452 2018-07-13 2018 Action, Adventure When an IMF mission ends badly, the world is f... [Action, Adventure] When an IMF mission end badly the world be fac... [time, hunt, loyalty, assassin, world, race, f... 0.212512
5194 52451 tt1509767 The Three Musketeers 75000000 132274484 2011-08-31 2011 Action, Adventure, Thriller The hot-headed young D'Artagnan along with thr... [Action, Adventure, Thriller] The hotheaded young DArtagnan along with three... [engulf, europe, hotheaded, young, dartagnan, ... 0.198898
5138 56292 tt1229238 Mission: Impossible - Ghost Protocol 145000000 694713380 2011-12-07 2011 Action, Adventure, Thriller Ethan Hunt and his team are racing against tim... [Action, Adventure, Thriller] Ethan Hunt and his team be race against time t... [bombing, force, disavow, kremlin, stop, ethan... 0.197245
6204 72710 tt1517260 The Host 44000000 63327201 2013-03-22 2013 Action, Adventure, Romance, Science Fiction, T... A parasitic alien soul is injected into the bo... [Action, Adventure, Romance, Science Fictio... A parasitic alien soul be inject into the body... [melanie, stryder, instead, inject, body, carr... 0.195180
4807 46528 tt1032751 The Warrior's Way 42000000 11087569 2010-12-02 2010 Action, Adventure, Fantasy, Thriller, Western A warrior-assassin is forced to hide in a smal... [Action, Adventure, Fantasy, Thriller, Wes... A warriorassassin be force to hide in a small ... [mission, hide, refuse, american, badlands, fo... 0.187523
1058 15074 tt0283160 Extreme Ops 40000000 10959475 2002-11-27 2002 Action, Adventure, Drama, Thriller While filming an advertisement, some extreme s... [Action, Adventure, Drama, Thriller] While film an advertisement some extreme sport... [advertisement, terrorist, film, group, extrem... 0.187523
5227 50456 tt0993842 Hanna 30000000 63782078 2011-04-07 2011 Action, Adventure, Thriller A 16-year-old girl raised by her father to be ... [Action, Adventure, Thriller] A 16yearold girl raise by her father to be the... [mission, across, europe, tracked, dispatch, h... 0.184428
7567 258489 tt0918940 The Legend of Tarzan 180000000 356743061 2016-06-06 2016 Action, Adventure Tarzan, having acclimated to life in London, i... [Action, Adventure] Tarzan have acclimate to life in London be cal... [investigate, call, back, acclimate, tarzan, m... 0.180702
989 3132 tt0280486 Bad Company 70000000 65977295 2002-06-07 2002 Action, Adventure, Comedy, Thriller When a Harvard-educated CIA agent is killed du... [Action, Adventure, Comedy, Thriller] When a Harvardeducated CIA agent be kill durin... [kill, twin, brother, harvardeducated, cia, ag... 0.180702
6469 699220 tt6703928 A Fool's Paradise 0 0 2013-05-04 2013 Action, Thriller James Bond is sent on a mission to investigate... [Action, Thriller] James Bond be send on a mission to investigate... [mission, send, investigate, michael, kristato... 0.179284
8673 399248 tt4669264 Beirut 0 7258534 2018-04-11 2018 Action, Drama, Thriller In 1980s Beirut, Mason Skiles is a former U.S.... [Action, Drama, Thriller] In 1980s Beirut Mason Skiles be a former US di... [mission, former, us, diplomat, call, back, ci... 0.176547
4745 39514 tt1245526 RED 58000000 71664962 2010-10-13 2010 Action, Adventure, Comedy, Crime, Thriller When his peaceful life is threatened by a high... [Action, Adventure, Comedy, Crime, Thriller] When his peaceful life be threaten by a highte... [peaceful, life, uncover, old, team, assailant... 0.176227
8054 284053 tt3501632 Thor: Ragnarok 180000000 853977126 2017-10-25 2017 Action, Adventure, Comedy, Fantasy Thor is imprisoned on the other side of the un... [Action, Adventure, Comedy, Fantasy] Thor be imprison on the other side of the univ... [asgardian, civilization, end, destruction, th... 0.176227
497 1535 tt0266987 Spy Game 115000000 143049560 2001-11-18 2001 Action, Crime, Thriller On the day of his retirement, a veteran CIA ag... [Action, Crime, Thriller] On the day of his retirement a veteran CIA age... [former, protg, die, arrest, international, sc... 0.172516
6221 204082 tt2312718 Homefront 22000000 43058898 2013-11-12 2013 Action, Thriller Phil Broker is a former DEA agent who has gone... [Action, Thriller] Phil Broker be a former DEA agent who have go ... [school, recently, widow, event, cost, 9yearso... 0.170367
3316 1620 tt0465494 Hitman 24000000 99965753 2007-11-21 2007 Action, Crime, Drama, Thriller The best-selling videogame, Hitman, roars to l... [Action, Crime, Drama, Thriller] The bestselling videogame Hitman roar to life ... [prey, international, intrigue, barrel, blaze,... 0.169031
8647 446807 tt5177088 The Girl in the Spider's Web 43000000 17894345 2018-10-25 2018 Action, Crime, Thriller In Stockholm, Sweden, hacker Lisbeth Salander ... [Action, Crime, Thriller] In Stockholm Sweden hacker Lisbeth Salander be... [exist, computer, engineer, stockholm, sweden,... 0.169031

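It worked! The only difference in the top 20 comes from a tie at 0.169031: Hitman appears here in place of When They Run. Time to build a Flask app and deploy.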

Build a Flask App to Serve Recommendations

Since front end/HTML isn’t my forte, I’ll just disclose that I used Bootstrap to design the front end. IMO, it is the quickest way to get up and running with having a functioning, responsive site without worrying about customizing heavily or advanced frameworks.

I’ll just showcase the app.py file to build the Flask app. In it, I do a few things:

from flask import Flask, render_template, request, Response
import pandas as pd
from helper import get_movies, choose, overview, imdb, recommend

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():

    movielist = get_movies()
    if request.method == 'GET':
        return render_template('index.html', recommendations = pd.DataFrame(columns=['title', 'ID', 'year', 'overview']), movielist=movielist, movie="", overview = "", imdb="")
    
    if request.method == 'POST':

        if request.form.get('submit') == 'search':
            movie = request.form.get('movie')
            if movie in movielist:
                recs = recommend(movie)[:20]
                return render_template('index.html', recommendations = recs, movielist=movielist, movie=movie, overview = overview(movie), imdb=imdb(movie))
            else:
                return render_template('error.html')
        elif request.form.get('submit') == 'random':
            movie = choose()
            recs = recommend(movie)[:20]
            return render_template('index.html', recommendations = recs, movielist=movielist, movie=movie, overview = overview(movie), imdb=imdb(movie))

        

if __name__ == '__main__':
    app.run(debug=True)
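
The `helper` module imported at the top of `app.py` is not shown here. Purely as an illustration of the shape such helpers could take, the sketch below implements `get_movies`, `choose`, `overview` and `imdb` over a small in-memory dictionary. Everything in it is an assumption: the real helpers presumably read the pickled DataFrame and recommendation dictionary saved earlier, and may return different values (for example, `imdb` might return an ID rather than a URL).

```python
import random

# Hypothetical stand-in for the saved movie data (overviews truncated as displayed above)
MOVIES = {
    "Mission: Impossible II": {
        "imdb_id": "tt0120755",
        "overview": "With computer genius Luther Stickell at his si...",
    },
    "Gladiator": {
        "imdb_id": "tt0172495",
        "overview": "In the year 180, the death of emperor Marcus A...",
    },
}


def get_movies():
    """Return all known titles, e.g. to populate a search box."""
    return sorted(MOVIES)


def choose():
    """Pick a random title for the 'random' button."""
    return random.choice(get_movies())


def overview(title):
    """Return the synopsis shown alongside the recommendations."""
    return MOVIES[title]["overview"]


def imdb(title):
    # Assumed here to build a link to the movie's IMDb page
    return "https://www.imdb.com/title/{}/".format(MOVIES[title]["imdb_id"])


print(get_movies())
print(imdb("Gladiator"))
```

Keeping these lookups in a separate module leaves the route function short: it only decides which template to render and with which values.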