MovieRecommender

Modern Movie Recommender

TMDB is a database that hosts information about movies and TV shows. From it, you can get details like cast, budget, and revenue, as well as basic information about a movie such as its release date and a brief synopsis.

In this project, I am going to build a movie recommendation app using data pulled back from TMDB.
TMDB has a robust API that is free to use for small projects. I will limit myself to movies made in 2000 or later. Using TMDB’s movie overviews, I will build recommendations based on similarities in the text used to describe each movie (using natural language processing, or NLP), along with their genres. These recommendations will then be served through a web app built in Flask. The Flask web app I built is located here:

Get Movie Recommendations

Below is a guide to all the steps involved in this end-to-end project. I won’t go very in-depth on the methods or techniques; this is more of a showcase of how to replicate the work if you’d like to solve a similar problem.

Pulling Data from TMDB

TMDB has a well-documented API. We will be using the “discover” and “movie” libraries. Looking over the documentation for the discover API, we can see that only the API key is required, but there are several optional parameters to help us pull the data we need. I’ve decided to use a few of them:

- primary_release_year: cycle through each year to pull back a large selection of movies from each year
- with_original_language: focus on English-language movies only
- sort_by: sort each year’s movies by revenue in descending order, using revenue as a proxy for popularity
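To see exactly what these options turn into, the sketch below builds the Discover request URL for a single year and page without calling the API. It assumes the same parameters as the loop that follows; `discover_url` is a hypothetical helper name and `YOUR_API_KEY` is a placeholder, not a real key.

```python
from urllib.parse import urlencode

TMDB_DISCOVER_URL = "https://api.themoviedb.org/3/discover/movie"


def discover_url(api_key, year, page):
    """Build the Discover query URL for one year and page (no request is sent)."""
    params = {
        "api_key": api_key,  # placeholder key, supply your own
        "primary_release_year": year,
        "include_adult": "false",
        "include_video": "false",
        "with_original_language": "en",
        "sort_by": "revenue.desc",
        "page": page,
    }
    return f"{TMDB_DISCOVER_URL}?{urlencode(params)}"


# 20 results per page x 50 pages = the top 1,000 movies by revenue for a year
print(discover_url("YOUR_API_KEY", 2000, 1))
```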

To keep this manageable, I only pulled back movies released from 2000 through 2020. I’m partial to newer movies, and I’m also going in with the assumption that data for newer movies will be more accurate.

The code below will cycle through each page of results from the API for each year, returning the top 1,000 grossing movies for each year:

# import necessary libraries
import time
import pandas as pd
import numpy as np
import json
import requests
from config import tmdb_key
from rake_nltk import Rake
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer

# declare static URL for API, empty list
tmdb_url = r"https://api.themoviedb.org/3/discover/movie"
tmdb = []

# Each page has 20 results
# To limit to widely known movies - limit to top 1,000 grossing movies per year; only pull back 50 pages per year for Discover Library API call
# Sort by revenue to get top 1,000, stop if there are no more pages
for year in range(2000, 2021):
    for page in range(1, 51):

        discover_params = {
            'api_key': tmdb_key,
            'primary_release_year': year,
            'include_adult': "false",
            'include_video': "false",
            'with_original_language': 'en',
            'sort_by': 'revenue.desc',
            'page': page
        }
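        try:
            response = requests.get(tmdb_url, params=discover_params, timeout=30)
            response.raise_for_status()
            results = response.json()['results']
        except (requests.RequestException, KeyError, ValueError):
            # stop paging this year if the request fails or the payload is malformed
            break
        # an empty page means there are no more pages for this year
        if not results:
            break
        tmdb.extend(results)
        print("Year {}, Page {} done".format(year, page))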

print("Count of movies: {:,}".format(len(tmdb)))

Count of movies: 21,000

Once we have this metadata from the Discover library, we can use the JSON results we’ve collected in a list to get more specific information on each movie. The Discover results already include each movie’s overview, but the IMDB ID, budget, revenue, and genre names (Discover only returns genre IDs) come from the movie library.

I’m going to declare an empty shell for the DataFrame that will house the information from the movie API call, then, on each iteration of the loop, add a new row based on the features I’d like to pull back:

columns = ['tmdb_id', 'imdb_id', 'title', 'budget', 'revenue', 'release_date', 'release_year', 'genres', 'overview']
movies_df = pd.DataFrame(columns=columns)

# static TMDB movie API URL
tmdb_movie_url = r"https://api.themoviedb.org/3/movie/"

print("Start time: ", time.strftime("%H:%M:%S", time.localtime(time.time())))

for movie in tmdb:
    movie_id = movie['id']
    overview = movie['overview']
    release_year = int(movie['release_date'][:4])
    
    # params specific to movie API
    tmdb_movie_params = {
        'api_key': tmdb_key,
        'language': "en-US"
    }
    
    # make request to API
    tmdb_response = requests.get(tmdb_movie_url+str(movie_id), params = tmdb_movie_params)
    # loop over only if data exists
    if tmdb_response.status_code == 200:
        tmdb_movie_json = tmdb_response.json()

        # genre data comes back as a list of {"id", "name"} objects, so flatten it
        # into one comma-separated string; save "NA" if there is no genre data
        genre_names = [genre['name'] for genre in tmdb_movie_json.get('genres') or []]
        genres = ', '.join(genre_names) if genre_names else "NA"
        # add a new row to our DataFrame of movies
        movies_df.loc[len(movies_df)] = [movie_id,
                                         tmdb_movie_json['imdb_id'],
                                         tmdb_movie_json['title'],
                                         tmdb_movie_json['budget'],
                                         tmdb_movie_json['revenue'],
                                         tmdb_movie_json['release_date'],
                                         release_year,
                                         genres,
                                         overview]
        print(".", end = " ")
        
print("\nEnded loop at: ", time.strftime("%H:%M:%S", time.localtime(time.time())))

This took roughly 50 minutes. From here, I’m adding a column that holds each movie’s genres as a list:

# add column which treats genres as a list (split on ', ' so no leading spaces remain)
movies_df['genre_list'] = movies_df.genres.str.split(', ')
movies_df.head(10)
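Here is a small, self-contained illustration of the genre handling above, using a trimmed, made-up payload shaped like a TMDB movie-details response (the `flatten_genres` name is hypothetical). It also shows why the split separator matters: splitting on a bare comma keeps a leading space on every genre after the first.

```python
# A trimmed, illustrative movie-details payload (not a real API response):
# the movie endpoint returns genres as a list of {"id", "name"} objects.
details = {
    "title": "Mission: Impossible II",
    "genres": [
        {"id": 28, "name": "Action"},
        {"id": 12, "name": "Adventure"},
        {"id": 53, "name": "Thriller"},
    ],
}


def flatten_genres(movie_details):
    """Join genre names into one comma-separated string, or "NA" when there are none."""
    names = [genre["name"] for genre in movie_details.get("genres") or []]
    return ", ".join(names) if names else "NA"


genres = flatten_genres(details)
print(genres)                          # Action, Adventure, Thriller
print(genres.split(","))               # ['Action', ' Adventure', ' Thriller']
print(genres.split(", "))              # ['Action', 'Adventure', 'Thriller']
print(flatten_genres({"genres": []}))  # NA
```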

After this was done, I saved the DataFrame to a data folder as a checkpoint to pick up from:

# Save data
movies_df.to_pickle('../Data/movies_df.pkl')

Creating Our Recommendation Data

To make accurate recommendations, we’ll use a few different NLP techniques to clean and vectorize each movie’s overview, then use the keywords from each overview along with the genres to make recommendations.
Before I go any further, I want to make sure I will have good data to make recommendations with, so I’m limiting the data to movies that have a genre and an overview of a minimum length. I’m also using the IMDB ID as an indicator of whether the data for a specific movie is reliable, with the assumption that movies without one do not have reliable information.

tmdb_df = pd.read_pickle('../Data/movies_df.pkl')

# Before processing for key words, remove any movie titles from database that do not have an overview or genre

# Going to link to IMDB - exclude anything without an IMDB ID populated
imdb_mask = tmdb_df['imdb_id'].str.len() > 0
# Overview must have at least a three-word summary
overview_length_mask = tmdb_df['overview'].str.split().str.len() >= 3
# Remove values of NA in genre column
genre_not_empty_mask = tmdb_df['genres'] != "NA"
tmdb_df1 = tmdb_df[imdb_mask & overview_length_mask & genre_not_empty_mask].copy()
# reset index so rows are numbered 0..n-1 (later steps rely on these positions)
tmdb_df1.reset_index(drop=True, inplace=True)
tmdb_df1

After doing so, we are left with 9,650 movies:

	tmdb_id	imdb_id	title	budget	revenue	release_date	release_year	genres	overview	genre_list
0	955	tt0120755	Mission: Impossible II	125000000	546388105	2000-05-24	2000	Action, Adventure, Thriller	With computer genius Luther Stickell at his si...	[Action, Adventure, Thriller]
1	98	tt0172495	Gladiator	103000000	460583960	2000-05-01	2000	Action, Adventure, Drama	In the year 180, the death of emperor Marcus A...	[Action, Adventure, Drama]
2	8358	tt0162222	Cast Away	90000000	429632142	2000-12-22	2000	Adventure, Drama	Chuck Nolan, a top international manager for F...	[Adventure, Drama]
3	3981	tt0207201	What Women Want	70000000	374111707	2000-12-15	2000	Comedy, Romance	Advertising executive Nick Marshall is as cock...	[Comedy, Romance]
4	10567	tt0130623	Dinosaur	127500000	354248063	2000-05-19	2000	Animation, Family	An orphaned dinosaur raised by lemurs joins an...	[Animation, Family]
...	...	...	...	...	...	...	...	...	...	...
9645	714996	tt11957868	Peach	0	0	2020-01-13	2020	Comedy	A socially anxious young woman lands a hot dat...	[Comedy]
9646	714936	tt11754128	Atlas	0	0	2020-01-13	2020	Science Fiction	Atlas and his dog Charlie are both locked into...	[Science Fiction]
9647	714847	tt12299114	Trapped	0	0	2020-04-24	2020	Thriller	A stressed young drug addicted person who is h...	[Thriller]
9648	714842	tt12498618	8:46	0	0	2020-06-11	2020	Comedy, Documentary	From Dave: Normally I wouldn't show you someth...	[Comedy, Documentary]
9649	714836	tt12525356	Brock: Over the Top	0	0	2020-06-22	2020	Documentary	Brock: Over the Top is a feature length docume...	[Documentary]

9650 rows × 10 columns

The next step is to preprocess our text data. Lemmatization is the process of removing inflections to return a word to its root form, so that different forms of a word can be analyzed as a single item, identified by the word’s lemma. For reference, see the table below:

Original Word	Word Lemma
Copied	Copy
Copying	Copy
Copies	Copy

Let’s go ahead and lemmatize our data. First, we’ll build a function that maps each word’s part-of-speech tag to the form the lemmatizer expects, and then lemmatize each word based on its tag:

# More pre-processing
# 1: Noise removal - get rid of non alphanumeric text
# 2: lemmatize text so root stays, but not different tenses or versions of same word (i.e. terrifying -> terrify)
    ## Much more powerful when part of speech for word is accurately identified
    ## Build one function to map each pos_tag to a WordNet tag, another to lemmatize text and return the lemmatized sentence
    ## (requires the NLTK WordNet and POS-tagger data packages; see nltk.download)
lemm = WordNetLemmatizer()

def get_pos(tag):
    lemma_tag = tag[0].lower()
    return {
        "n": "n",
        "v": "v",
        "r": "r",
        "j": "a"
    }.get(lemma_tag, 'n')

def lemmatize(series):
    # RegEx removal of anything that is not a letter, digit, whitespace, or colon
    s1 = series.str.replace(r'[^a-zA-Z\d\s:]', '', regex=True)
    # tokenize
    s2 = s1.str.split()
    # get parts of speech tags for each word in each overview
    s3 = s2.apply(lambda x: pos_tag(x))
    # lemmatization
    s4 = s3.apply(lambda x:[lemm.lemmatize(word, pos=get_pos(tag)) for word, tag in x])
    # convert back to series
    lemma_series = s4.apply(lambda x: ' '.join(x))
    # lemmatized series
    return lemma_series

tmdb_df1['overview_lemma'] = lemmatize(tmdb_df1['overview'])
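Two pieces of `lemmatize()` can be checked without NLTK: the character filter and the tag mapping. The sketch below reuses the same regular expression and the same `get_pos` logic; the (word, tag) pairs are hand-written examples in the shape `nltk.pos_tag` returns, not real tagger output. Note how the filter joins hyphenated words and drops apostrophes, which is where tokens like "Harvardeducated" and "DArtagnan" in the later output come from.

```python
import re

# Same character filter as lemmatize(): keep letters, digits, whitespace and colons
NON_ALNUM = re.compile(r"[^a-zA-Z\d\s:]")


def get_pos(tag):
    """Map a Penn Treebank tag (as returned by nltk.pos_tag) to a WordNet POS letter."""
    return {"n": "n", "v": "v", "r": "r", "j": "a"}.get(tag[0].lower(), "n")


print(NON_ALNUM.sub("", "Harvard-educated CIA agent"))  # Harvardeducated CIA agent
print(NON_ALNUM.sub("", "D'Artagnan, 16-year-old"))     # DArtagnan 16yearold

# Hand-written (word, tag) pairs; unknown tag families (e.g. DT) fall back to noun
tagged = [("races", "VBZ"), ("terrifying", "VBG"), ("beautiful", "JJ"),
          ("genetically", "RB"), ("Hunt", "NNP"), ("the", "DT")]
print([(word, get_pos(tag)) for word, tag in tagged])
# [('races', 'v'), ('terrifying', 'v'), ('beautiful', 'a'), ('genetically', 'r'), ('Hunt', 'n'), ('the', 'n')]
```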

Comparing the lemmatized text with the original (non-lemmatized) text:

# Inspect
tmdb_df1['overview_lemma'][0]

'With computer genius Luther Stickell at his side and a beautiful thief on his mind agent Ethan Hunt race across Australia and Spain to stop a former IMF agent from unleash a genetically engineer biological weapon call Chimera This mission should Hunt choose to accept it plunge him into the center of an international crisis of terrify magnitude'

t1 = tmdb_df1['overview'].str.replace(r'[^a-zA-Z\d\s:]', '', regex=True)
t1[0]

'With computer genius Luther Stickell at his side and a beautiful thief on his mind agent Ethan Hunt races across Australia and Spain to stop a former IMF agent from unleashing a genetically engineered biological weapon called Chimera This mission should Hunt choose to accept it plunges him into the center of an international crisis of terrifying magnitude'
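A quick way to list exactly which tokens lemmatization changed is to compare the two strings above word by word. This sketch hard-codes both printed strings rather than reading them from `tmdb_df1`:

```python
original = ("With computer genius Luther Stickell at his side and a beautiful thief on his mind "
            "agent Ethan Hunt races across Australia and Spain to stop a former IMF agent from "
            "unleashing a genetically engineered biological weapon called Chimera This mission "
            "should Hunt choose to accept it plunges him into the center of an international "
            "crisis of terrifying magnitude")
lemmatized = ("With computer genius Luther Stickell at his side and a beautiful thief on his mind "
              "agent Ethan Hunt race across Australia and Spain to stop a former IMF agent from "
              "unleash a genetically engineer biological weapon call Chimera This mission "
              "should Hunt choose to accept it plunge him into the center of an international "
              "crisis of terrify magnitude")

# Lemmatization replaces words one for one, so the two token lists line up
changed = [(before, after)
           for before, after in zip(original.split(), lemmatized.split())
           if before != after]
print(len(changed))  # 6
print(changed)
```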

Overall, six words were lemmatized in MI2’s overview (races, unleashing, engineered, called, plunges, terrifying).
From here, I’m going to use RAKE, a method that extracts keywords from text based on word occurrence and co-occurrence, to make sure our recommender only uses important keywords to make recommendations:

from rake_nltk import Rake
# RAKE - Rapid Automatic Keyword Extraction (an alternative to TF-IDF for surfacing important words)
    ## Gets keywords based on frequency of word occurrence and co-occurrence with other words in text
key_words = []
RAKE = Rake() 
for index, row in tmdb_df1.iterrows():
    RAKE.extract_keywords_from_text(row['overview_lemma'])
    key_word_scores = RAKE.get_word_degrees()
    key_words.append(list(key_word_scores.keys()))

Let’s check what was extracted:

# Check
print(tmdb_df1['overview_lemma'][0])
print(key_words[0])
print('Total words in overview: {:,}'.format(len(tmdb_df1['overview_lemma'][0].split())))
print('Total extracted: {:,}'.format(len(key_words[0])))

With computer genius Luther Stickell at his side and a beautiful thief on his mind agent Ethan Hunt race across Australia and Spain to stop a former IMF agent from unleash a genetically engineer biological weapon call Chimera This mission should Hunt choose to accept it plunge him into the center of an international crisis of terrify magnitude
['genetically', 'engineer', 'biological', 'weapon', 'call', 'chimera', 'mission', 'plunge', 'terrify', 'magnitude', 'former', 'imf', 'agent', 'beautiful', 'thief', 'side', 'stop', 'spain', 'center', 'hunt', 'choose', 'computer', 'genius', 'luther', 'stickell', 'unleash', 'international', 'crisis', 'mind', 'ethan', 'race', 'across', 'australia', 'accept']
Total words in overview: 58
Total extracted: 34
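For intuition about what `get_word_degrees()` returns, here is a simplified re-implementation of RAKE’s word-degree scoring. It is a sketch, not rake_nltk’s code: it uses a tiny hand-picked stopword list and does not split phrases at punctuation. Candidate phrases are runs of words between stopwords, and each word’s degree adds up the lengths of the phrases it appears in.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; rake_nltk uses NLTK's English stopword list by default
STOPWORDS = {"a", "an", "and", "at", "from", "his", "of", "on", "the", "to", "with"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split lowercased text into runs of consecutive non-stopwords (candidate phrases)."""
    phrases, current = [], []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in stopwords:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(token)
    if current:
        phrases.append(current)
    return phrases


def word_degrees(text, stopwords=STOPWORDS):
    """Degree of a word = total length of the candidate phrases it appears in."""
    degrees = defaultdict(int)
    for phrase in candidate_phrases(text, stopwords):
        for word in phrase:
            degrees[word] += len(phrase)
    return dict(degrees)


print(word_degrees("computer genius Luther Stickell at his side"))
# {'computer': 4, 'genius': 4, 'luther': 4, 'stickell': 4, 'side': 1}
```

Because the loop above keeps only the dictionary’s keys, the extracted key words amount to the overview’s unique non-stopword words; the degree scores would only matter if we ranked or filtered by them.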

Overall, I think this is good enough for this problem. Let’s go ahead and add the key words back to our DataFrame.

# add back to dataframe
tmdb_df1['Key_Words'] = pd.Series(key_words)

The last step is to take the key words and the genres and combine them into a “bag of words” for each movie.

# Final combined DF with title and final list of words
# For each movie, combine overview key words and genre tags into one text column as a bag of words,
# lowercased and with double-spaced characters removed
bags_of_words = [' '.join([*kw, *genre_tags]).lower().replace('  ', ' ')
                 for kw, genre_tags in zip(tmdb_df1['Key_Words'], tmdb_df1['genre_list'])]
recommend_df = pd.DataFrame({'Title': tmdb_df1['title'], 'Recommender_BOW': bags_of_words})

Build recommendation

A simple way to gauge how related two movies are is to check whether similar words are used to describe them. Before we can do that, we need a way to represent the text numerically. An easy option is CountVectorizer() from sklearn:

from sklearn.feature_extraction.text import CountVectorizer
# Vector representation of our bag of words using CountVectorizer: convert raw text into a sparse matrix of word counts
# (can also use Python's collections library Counter class)
    ## Because RAKE returns each key word once per movie, counts are mostly 0 or 1 (binary=True would enforce presence/absence)
    ## min_df=2 ignores words found in fewer than 2 movies; max_df=0.85 ignores words found in more than 85% of movies
    ## Another option would have been to include bigrams, but RAKE re-ordered key words from overview when extracting
#CV = CountVectorizer(min_df = 2, max_df = 0.85, ngram_range = (1,2))
CV = CountVectorizer(min_df = 2, max_df = 0.85)
CV_Matrix = CV.fit_transform(recommend_df['Recommender_BOW'])

# Take a look at the CV after processing words for our recommender
print("Count Vectorizer number of documents: {:,}".format(CV_Matrix.shape[0]))
print("Count Vectorizer number of unique words (vocabulary size): {:,}".format(CV_Matrix.shape[1]))

Count Vectorizer number of documents: 9,650
Count Vectorizer number of unique words (vocabulary size): 13,615

# Dictionary mapping each word to its column index in the sparse matrix
print("Word: {} \nPosition: {:,}".format(list(CV.vocabulary_.keys())[0], list(CV.vocabulary_.values())[0]))

Word: genetically 
Position: 5,171
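To make `min_df`, `max_df`, and the word positions concrete, here is a simplified, pure-Python imitation of CountVectorizer’s document-frequency filtering. It splits on whitespace only (the real tokenizer also lowercases and drops single-character tokens), and `build_vocabulary` is a hypothetical name. Surviving words are sorted alphabetically and numbered, which is how CountVectorizer assigns column indexes; the first key printed above is the first word it encountered ("genetically"), while the value is that word’s alphabetical position.

```python
from collections import Counter


def build_vocabulary(docs, min_df=2, max_df=0.85):
    """Imitate CountVectorizer's document-frequency filtering on whitespace tokens.

    min_df (int): drop words found in fewer than min_df documents.
    max_df (float): drop words found in more than max_df * len(docs) documents.
    """
    doc_freq = Counter(word for doc in docs for word in set(doc.split()))
    max_count = max_df * len(docs)
    kept = sorted(word for word, df in doc_freq.items() if min_df <= df <= max_count)
    # column index = position in the alphabetically sorted vocabulary
    return {word: position for position, word in enumerate(kept)}


docs = [
    "mission agent action",
    "mission agent spy action",
    "romance action drama",
    "mission heist action drama",
]
# "action" is in 4/4 documents (> 85%); spy, romance and heist appear only once
print(build_vocabulary(docs))  # {'agent': 0, 'drama': 1, 'mission': 2}
```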

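We can now numerically compare each movie’s bag of words to every other movie’s. To measure similarity, we can calculate the cosine similarity: the cosine of the angle between two word-count vectors. It is a common metric for comparing text, since it depends on which words two descriptions share rather than on how long each description is.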
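As a worked example of the formula cos(a, b) = (a · b) / (‖a‖ ‖b‖), the sketch below computes it by hand for two hypothetical bags of words, without sklearn. The `cosine` helper is illustrative only; `cosine_similarity` applies the same formula to every pair of rows in the sparse matrix.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between the word-count vectors of two bags of words."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 3 shared words; norms are sqrt(5) and sqrt(4), so the score is 3 / (2 * sqrt(5))
print(round(cosine("mission agent spain action adventure", "mission agent action drama"), 4))  # 0.6708
```

Doubling every count in one bag leaves the score unchanged, which is why a longer overview is not automatically judged more similar.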

from sklearn.metrics.pairwise import cosine_similarity
# Matrix of pairwise cosine similarities once our vocabulary is transformed into a numeric vector representation
cosine_similarities = cosine_similarity(CV_Matrix, CV_Matrix)

# Inspect
print(cosine_similarities)
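# Indexes sorted by similarity to the first movie (Mission: Impossible II), lowest to highest;
# the last entries are the most similar, ending with the movie itself (index 0)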
print(cosine_similarities[0].argsort())

[[1.         0.05484085 0.05976143 ... 0.04364358 0.         0.02020305]
 [0.05484085 1.         0.11470787 ... 0.         0.         0.03877834]
 [0.05976143 0.11470787 1.         ... 0.04564355 0.         0.02112886]
 ...
 [0.04364358 0.         0.04564355 ... 1.         0.         0.        ]
 [0.         0.         0.         ... 0.         1.         0.07559289]
 [0.02020305 0.03877834 0.02112886 ... 0.         0.07559289 1.        ]]
[4824 3701 3700 ... 7066 2769    0]

At this point, I save the cleaned TMDB DataFrame and the cosine similarity matrix as a checkpoint to come back to:

# Save data
tmdb_df1.to_pickle('../Data/tmdb_movies.pkl')

with open("../Data/movie_similarities.npy", 'wb') as npy:
    np.save(npy, cosine_similarities)

Inspect our results

The last part is to build a function that gets recommendations for a movie, and then review the results:

# Build recommender based on above
def recommend(title):
    idx = tmdb_df1[tmdb_df1['title'] == title].index[0]
    # top 50 recommendations (position 0 is the movie itself, so skip it)
    similar_movies = pd.Series(cosine_similarities[idx]).sort_values(ascending=False).iloc[1:51]
    # add similarity scores for top 50 - instead of iterating, pull back all data and drop null values
    results = pd.concat([tmdb_df1['imdb_id'], tmdb_df1['title'], tmdb_df1['release_year'], tmdb_df1['overview'], similar_movies], axis=1)
    results.columns = ['IMDB ID', 'Title', 'Year', 'Overview', 'Similarity Score']
    results = results.dropna()
    results = results.sort_values(by='Similarity Score', ascending=False)

    return results

recommend('Mission: Impossible II')[:20]

	IMDB ID	Title	Year	Overview	Similarity Score
2769	tt0317919	Mission: Impossible III	2006	Retired from active duty to train new IMF agen...	0.308607
7066	tt2381249	Mission: Impossible - Rogue Nation	2015	Ethan and team take on their most impossible m...	0.271052
9059	tt5033998	Charlie's Angels	2019	When a systems engineer blows the whistle on a...	0.216225
8541	tt4912910	Mission: Impossible - Fallout	2018	When an IMF mission ends badly, the world is f...	0.212512
5194	tt1509767	The Three Musketeers	2011	The hot-headed young D'Artagnan along with thr...	0.198898
5138	tt1229238	Mission: Impossible - Ghost Protocol	2011	Ethan Hunt and his team are racing against tim...	0.197245
6204	tt1517260	The Host	2013	A parasitic alien soul is injected into the bo...	0.195180
1058	tt0283160	Extreme Ops	2002	While filming an advertisement, some extreme s...	0.187523
4807	tt1032751	The Warrior's Way	2010	A warrior-assassin is forced to hide in a smal...	0.187523
5227	tt0993842	Hanna	2011	A 16-year-old girl raised by her father to be ...	0.184428
989	tt0280486	Bad Company	2002	When a Harvard-educated CIA agent is killed du...	0.180702
7567	tt0918940	The Legend of Tarzan	2016	Tarzan, having acclimated to life in London, i...	0.180702
6469	tt6703928	A Fool's Paradise	2013	James Bond is sent on a mission to investigate...	0.179284
8673	tt4669264	Beirut	2018	In 1980s Beirut, Mason Skiles is a former U.S....	0.176547
4745	tt1245526	RED	2010	When his peaceful life is threatened by a high...	0.176227
8054	tt3501632	Thor: Ragnarok	2017	Thor is imprisoned on the other side of the un...	0.176227
497	tt0266987	Spy Game	2001	On the day of his retirement, a veteran CIA ag...	0.172516
6221	tt2312718	Homefront	2013	Phil Broker is a former DEA agent who has gone...	0.170367
8952	tt9314132	When They Run	2018	A survivor of a zombie apocalypse is on the ru...	0.169031
8647	tt5177088	The Girl in the Spider's Web	2018	In Stockholm, Sweden, hacker Lisbeth Salander ...	0.169031

Not too shabby! We recommend the other Mission: Impossible movies, as well as other action movies. To get better results, we could try different keyword-extraction methods or other preprocessing steps, or change how we vectorize the vocabulary.

However, on further inspection, I noticed that the recommend function relies on a 9,650 x 9,650 2D NumPy array. That is roughly 93 million 64-bit floats (about 745 MB), which would be hard to serve over the web without bogging down memory. Instead, I’m going to precompute the top 20 suggestions for each movie and save them to a dictionary keyed by each movie’s index, along with their similarity scores. This dictionary will be used to get the recommendations in our web application.
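The arithmetic behind that concern, plus an alternative way to build a top-20 table, is sketched below. It assumes the similarity matrix is a dense float64 NumPy array; `dense_matrix_bytes` and `top_k_neighbors` are hypothetical helpers, not part of the original code. Instead of fully sorting each 9,650-long row, `np.argpartition` pulls out the k largest scores and only those k are sorted. It also excludes each movie explicitly rather than assuming the movie itself sorts first, which could fail if two movies had identical bags of words.

```python
import numpy as np


def dense_matrix_bytes(n_items, itemsize=8):
    """Memory needed for an n x n matrix of 8-byte (float64) values."""
    return n_items * n_items * itemsize


print(f"{dense_matrix_bytes(9650) / 1e6:,.0f} MB")  # 745 MB


def top_k_neighbors(similarities, k=20):
    """Map each row index to {neighbor index: score} for its k most similar items, excluding itself."""
    n = similarities.shape[0]
    k = min(k, n - 1)
    neighbors = {}
    for i in range(n):
        row = similarities[i].astype(float)  # copy, so the original matrix is untouched
        row[i] = -np.inf                     # never recommend a movie to itself
        # argpartition gathers the k largest scores in linear time, without a full sort
        top = np.argpartition(-row, k - 1)[:k]
        # then order just those k by descending score
        top = top[np.argsort(-row[top], kind="stable")]
        neighbors[i] = {int(j): float(row[j]) for j in top}
    return neighbors
```

Calling `top_k_neighbors(cosine_similarities, k=20)` should produce a dictionary with the same shape as `recommender_dict`, though tied scores may come back in a different order.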

Build recommender function

Below, I create a dictionary that uses each movie’s index as the key. Each value is another dictionary mapping the indexes of the 20 most similar movies (based on the cosine similarity between their key words and genres) to their similarity scores.

recommender_dict = {}
for i in range(len(tmdb_df1)):
    # skip position 0 (the movie itself) and keep the next 20 {index: score} pairs
    recommender_dict[i] = pd.Series(cosine_similarities[i]).sort_values(ascending=False).iloc[1:21].to_dict()

Let’s go ahead and save for future use:

import pickle

# the context manager closes the file even if pickling fails
with open("../Data/cosine_dict.pkl", "wb") as f:
    pickle.dump(recommender_dict, f)

Our new recommender function is going to use the newly created dictionary. It still searches for the index of the title requested by the user, but this time it uses that index as the dictionary key. From there, we map back to the original DataFrame: if a DataFrame index matches an index within the dictionary, its score is returned, and all other movies show NaN. We’ll remove the null values, sort by highest similarity, and return those top 20 results:

def recommend(title):
    idx = tmdb_df1[tmdb_df1['title'] == title].index[0]
    dict_ref = recommender_dict[idx]
    df_copy = tmdb_df1.copy()
    # movies in the top 20 get their similarity score; all others become NaN
    df_copy['similarity'] = df_copy.index.map(dict_ref)
    df_cleaned = df_copy[df_copy['similarity'].notna()]
    df_sorted = df_cleaned.sort_values(by='similarity', ascending=False)
    return df_sorted

Are the results the same? Let’s take a look at our example movie Mission: Impossible II:

recommend('Mission: Impossible II')

	tmdb_id	imdb_id	title	budget	revenue	release_date	release_year	genres	overview	genre_list	overview_lemma	Key_Words	similarity
2769	956	tt0317919	Mission: Impossible III	150000000	397850012	2006-05-03	2006	Action, Adventure, Thriller	Retired from active duty to train new IMF agen...	[Action, Adventure, Thriller]	Retired from active duty to train new IMF agen...	[mission, call, back, retired, train, new, imf...	0.308607
7066	177677	tt2381249	Mission: Impossible - Rogue Nation	150000000	682330139	2015-07-23	2015	Action, Adventure	Ethan and team take on their most impossible m...	[Action, Adventure]	Ethan and team take on their most impossible m...	[imf, syndicate, destroy, team, take, highlysk...	0.271052
9059	458897	tt5033998	Charlie's Angels	48000000	73279888	2019-11-14	2019	Action, Adventure, Comedy	When a systems engineer blows the whistle on a...	[Action, Adventure, Comedy]	When a system engineer blow the whistle on a d...	[system, engineer, blow, line, across, protect...	0.216225
8541	353081	tt4912910	Mission: Impossible - Fallout	178000000	791017452	2018-07-13	2018	Action, Adventure	When an IMF mission ends badly, the world is f...	[Action, Adventure]	When an IMF mission end badly the world be fac...	[time, hunt, loyalty, assassin, world, race, f...	0.212512
5194	52451	tt1509767	The Three Musketeers	75000000	132274484	2011-08-31	2011	Action, Adventure, Thriller	The hot-headed young D'Artagnan along with thr...	[Action, Adventure, Thriller]	The hotheaded young DArtagnan along with three...	[engulf, europe, hotheaded, young, dartagnan, ...	0.198898
5138	56292	tt1229238	Mission: Impossible - Ghost Protocol	145000000	694713380	2011-12-07	2011	Action, Adventure, Thriller	Ethan Hunt and his team are racing against tim...	[Action, Adventure, Thriller]	Ethan Hunt and his team be race against time t...	[bombing, force, disavow, kremlin, stop, ethan...	0.197245
6204	72710	tt1517260	The Host	44000000	63327201	2013-03-22	2013	Action, Adventure, Romance, Science Fiction, T...	A parasitic alien soul is injected into the bo...	[Action, Adventure, Romance, Science Fictio...	A parasitic alien soul be inject into the body...	[melanie, stryder, instead, inject, body, carr...	0.195180
4807	46528	tt1032751	The Warrior's Way	42000000	11087569	2010-12-02	2010	Action, Adventure, Fantasy, Thriller, Western	A warrior-assassin is forced to hide in a smal...	[Action, Adventure, Fantasy, Thriller, Wes...	A warriorassassin be force to hide in a small ...	[mission, hide, refuse, american, badlands, fo...	0.187523
1058	15074	tt0283160	Extreme Ops	40000000	10959475	2002-11-27	2002	Action, Adventure, Drama, Thriller	While filming an advertisement, some extreme s...	[Action, Adventure, Drama, Thriller]	While film an advertisement some extreme sport...	[advertisement, terrorist, film, group, extrem...	0.187523
5227	50456	tt0993842	Hanna	30000000	63782078	2011-04-07	2011	Action, Adventure, Thriller	A 16-year-old girl raised by her father to be ...	[Action, Adventure, Thriller]	A 16yearold girl raise by her father to be the...	[mission, across, europe, tracked, dispatch, h...	0.184428
7567	258489	tt0918940	The Legend of Tarzan	180000000	356743061	2016-06-06	2016	Action, Adventure	Tarzan, having acclimated to life in London, i...	[Action, Adventure]	Tarzan have acclimate to life in London be cal...	[investigate, call, back, acclimate, tarzan, m...	0.180702
989	3132	tt0280486	Bad Company	70000000	65977295	2002-06-07	2002	Action, Adventure, Comedy, Thriller	When a Harvard-educated CIA agent is killed du...	[Action, Adventure, Comedy, Thriller]	When a Harvardeducated CIA agent be kill durin...	[kill, twin, brother, harvardeducated, cia, ag...	0.180702
6469	699220	tt6703928	A Fool's Paradise	0	0	2013-05-04	2013	Action, Thriller	James Bond is sent on a mission to investigate...	[Action, Thriller]	James Bond be send on a mission to investigate...	[mission, send, investigate, michael, kristato...	0.179284
8673	399248	tt4669264	Beirut	0	7258534	2018-04-11	2018	Action, Drama, Thriller	In 1980s Beirut, Mason Skiles is a former U.S....	[Action, Drama, Thriller]	In 1980s Beirut Mason Skiles be a former US di...	[mission, former, us, diplomat, call, back, ci...	0.176547
4745	39514	tt1245526	RED	58000000	71664962	2010-10-13	2010	Action, Adventure, Comedy, Crime, Thriller	When his peaceful life is threatened by a high...	[Action, Adventure, Comedy, Crime, Thriller]	When his peaceful life be threaten by a highte...	[peaceful, life, uncover, old, team, assailant...	0.176227
8054	284053	tt3501632	Thor: Ragnarok	180000000	853977126	2017-10-25	2017	Action, Adventure, Comedy, Fantasy	Thor is imprisoned on the other side of the un...	[Action, Adventure, Comedy, Fantasy]	Thor be imprison on the other side of the univ...	[asgardian, civilization, end, destruction, th...	0.176227
497	1535	tt0266987	Spy Game	115000000	143049560	2001-11-18	2001	Action, Crime, Thriller	On the day of his retirement, a veteran CIA ag...	[Action, Crime, Thriller]	On the day of his retirement a veteran CIA age...	[former, protg, die, arrest, international, sc...	0.172516
6221	204082	tt2312718	Homefront	22000000	43058898	2013-11-12	2013	Action, Thriller	Phil Broker is a former DEA agent who has gone...	[Action, Thriller]	Phil Broker be a former DEA agent who have go ...	[school, recently, widow, event, cost, 9yearso...	0.170367
3316	1620	tt0465494	Hitman	24000000	99965753	2007-11-21	2007	Action, Crime, Drama, Thriller	The best-selling videogame, Hitman, roars to l...	[Action, Crime, Drama, Thriller]	The bestselling videogame Hitman roar to life ...	[prey, international, intrigue, barrel, blaze,...	0.169031
8647	446807	tt5177088	The Girl in the Spider's Web	43000000	17894345	2018-10-25	2018	Action, Crime, Thriller	In Stockholm, Sweden, hacker Lisbeth Salander ...	[Action, Crime, Thriller]	In Stockholm Sweden hacker Lisbeth Salander be...	[exist, computer, engineer, stockholm, sweden,...	0.169031

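It worked! The similarity scores match the earlier results; the only differences come from tied scores. The Warrior’s Way and Extreme Ops (0.187523) and Bad Company and The Legend of Tarzan (0.180702) swap places, and Hitman takes the 0.169031 slot that When They Run held before. Since neither sort is stable, tied movies can come back in a different order. Time to build a Flask app and deploy.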

Build a Flask App to Serve Recommendations

Since front-end/HTML work isn’t my forte, I’ll just disclose that I used Bootstrap to design the front end. IMO, it is the quickest way to get a functioning, responsive site up and running without heavy customization or advanced frameworks.

I’ll just showcase the app.py file that builds the Flask app. In it, I do a few things:

- Use Flask’s render_template function to serve an HTML file, which takes variables and uses the Jinja2 templating language to display results.
- Build helper functions (imported from helper.py) to:
  - Populate a list of movies within the search bar
  - Get the overview, IMDB ID and title of the requested movie to serve as a summary before the recommendations
  - Pick a random movie when the “random” button is used
  - And of course, the recommend function created earlier

from flask import Flask, render_template, request
import pandas as pd
from helper import get_movies, choose, overview, imdb, recommend

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():

    movielist = get_movies()
    if request.method == 'GET':
        return render_template('index.html', recommendations=pd.DataFrame(columns=['title', 'ID', 'year', 'overview']), movielist=movielist, movie="", overview="", imdb="")

    # POST: either a search for a specific movie or a request for a random one
    action = request.form.get('submit')
    if action == 'search':
        movie = request.form.get('movie')
        if movie not in movielist:
            return render_template('error.html')
    elif action == 'random':
        movie = choose()
    else:
        # unknown form submission: show the error page instead of returning None
        return render_template('error.html')

    recs = recommend(movie)[:20]
    return render_template('index.html', recommendations=recs, movielist=movielist, movie=movie, overview=overview(movie), imdb=imdb(movie))


if __name__ == '__main__':
    # debug=True is meant for local development only
    app.run(debug=True)
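The post doesn’t show helper.py, so below is one possible sketch of it, assuming it reads the two checkpoint files saved earlier (the paths and the private `_data`/`_row`/`load_data` helpers are hypothetical). The five public functions match the names app.py imports; the column renaming in `recommend` is a guess based on the empty DataFrame the GET branch passes to the template. `load_data` also accepts objects directly, so the module can be tried without the pickle files.

```python
import pickle
import random

import pandas as pd

# Hypothetical locations: the checkpoint files saved earlier in this post
MOVIES_PATH = "../Data/tmdb_movies.pkl"
NEIGHBORS_PATH = "../Data/cosine_dict.pkl"

_movies = None     # cleaned TMDB DataFrame (tmdb_df1)
_neighbors = None  # {movie index: {similar movie index: score}}


def load_data(movies=None, neighbors=None):
    """Load the saved DataFrame and similarity dictionary (or use objects passed in)."""
    global _movies, _neighbors
    _movies = movies if movies is not None else pd.read_pickle(MOVIES_PATH)
    if neighbors is None:
        with open(NEIGHBORS_PATH, "rb") as f:
            neighbors = pickle.load(f)
    _neighbors = neighbors


def _data():
    """Return the movie DataFrame, loading both files on first use."""
    if _movies is None:
        load_data()
    return _movies


def _row(title):
    movies = _data()
    return movies[movies["title"] == title].iloc[0]


def get_movies():
    """Titles for the search bar."""
    return sorted(_data()["title"].unique().tolist())


def choose():
    """A random title for the 'random' button."""
    return random.choice(_data()["title"].tolist())


def overview(title):
    return _row(title)["overview"]


def imdb(title):
    return _row(title)["imdb_id"]


def recommend(title):
    """Recommendations for a title, using the precomputed neighbor dictionary."""
    movies = _data()
    idx = movies.index[movies["title"] == title][0]
    scores = pd.Series(_neighbors[idx], name="similarity", dtype=float)
    recs = movies.loc[scores.index].assign(similarity=scores)
    recs = recs.sort_values("similarity", ascending=False, kind="mergesort")  # stable for ties
    # Column names assumed to match the empty DataFrame app.py passes to the template
    return recs.rename(columns={"imdb_id": "ID", "release_year": "year"})[
        ["title", "ID", "year", "overview", "similarity"]]
```

Loading the data once (here, lazily on first use) means each request only does a dictionary lookup and a 20-row sort, rather than touching the full similarity matrix.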