TMDB is a database that hosts information about movies and TV shows. From it, you can get information such as cast, budget and revenue, as well as basic details about a movie, like its release date and a brief synopsis.
In this project, I am going to build a movie recommendation app using data pulled from TMDB.
TMDB has a robust API that is free to use for small projects. I will limit myself to movies released in 2000 or later. Using TMDB's movie overviews, I will recommend movies based on similarities in the text used to describe them (using natural language processing, or NLP) along with their genres. These recommendations will then be served through a web app built with Flask. The Flask web app I built is located here:
Below is a guide to all the steps involved in this end-to-end project. I won't go into much depth describing the methods or techniques; this is more about showing how to replicate the work if you'd like to solve a similar problem.
TMDB has a well-documented API. We will be using the Discover and Movie endpoints. The documentation for the Discover API shows that only the API key is required, but there are several optional parameters to help us pull the data we need. I've decided to use a few of them: the primary release year, English as the original language, excluding adult titles and videos, sorting by revenue (descending), and the page number.
To keep this manageable, I only pulled movies released from 2000 to now. I'm partial to newer movies, and I'm also assuming that data for newer movies will be more accurate.
```python
# import necessary libraries
import time
import pandas as pd
import numpy as np
import json
import requests
from config import tmdb_key  # local config.py that holds the TMDB API key
from rake_nltk import Rake
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
# NLTK data (WordNet, stopwords, tokenizer and POS-tagger models) must be installed with nltk.download()
```
```python
# declare static URL for the Discover API and an empty list for the results
tmdb_url = "https://api.themoviedb.org/3/discover/movie"
tmdb = []
# Each page has 20 results
# To limit to widely known movies, keep the top 1,000 grossing movies per year: 50 pages per year from the Discover API
# Sort by revenue to get the top 1,000, and stop early if there are no more pages
for year in range(2000, 2021):
    for page in range(1, 51):
        discover_params = {
            'api_key': tmdb_key,
            'primary_release_year': year,
            'include_adult': "false",
            'include_video': "false",
            'with_original_language': 'en',
            'sort_by': 'revenue.desc',
            'page': page
        }
        try:
            response = requests.get(tmdb_url, params=discover_params)
            response.raise_for_status()
            results = response.json()['results']
        except (requests.RequestException, ValueError, KeyError):
            # request failed or returned unexpected data: move on to the next year
            break
        if not results:
            # past the last page for this year
            break
        tmdb.extend(results)
        print("Year {}, Page {} done".format(year, page))
print("Count of movies: {:,}".format(len(tmdb)))
```
```
Count of movies: 21,000
```
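Stopping on an error works, but TMDB's paged responses also report a `total_pages` field, so a loop can stop as soon as it reaches the last page for a year. Below is a minimal sketch of that idea. The HTTP call is replaced by a hypothetical `fetch_page(year, page)` function, assumed to return the parsed JSON with `results` and `total_pages` keys, so the paging logic can be tried offline.

```python
def fetch_all_pages(fetch_page, years, max_pages=50):
    """Collect Discover-style results for each year, stopping at the last available page.

    fetch_page(year, page) is a hypothetical stand-in for requests.get(...).json();
    it is assumed to return a dict with 'results' and 'total_pages' keys.
    """
    movies = []
    for year in years:
        for page in range(1, max_pages + 1):
            data = fetch_page(year, page)
            movies.extend(data["results"])
            # no more pages for this year: move on to the next one
            if page >= data["total_pages"]:
                break
    return movies


# Offline usage example with a fake fetcher: 2000 has 3 pages, 2001 has 60 pages
def fake_fetch(year, page):
    total_pages = {2000: 3, 2001: 60}[year]
    return {"results": [(year, page, i) for i in range(20)], "total_pages": total_pages}

movies = fetch_all_pages(fake_fetch, [2000, 2001])
print(len(movies))  # 3 * 20 + 50 * 20 = 1060
```

In the real loop, `fetch_page` would wrap the `requests.get(tmdb_url, params=discover_params)` call, and `max_pages` keeps the 1,000-movies-per-year cap.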
Once we have this metadata from the Discover endpoint, we can use the JSON results we've put into a list to get specific information on each movie from the Movie endpoint. The overview already comes with the Discover results; from the Movie endpoint I'm mainly interested in the genres, along with the IMDB ID, budget and revenue:
```python
columns = ['tmdb_id', 'imdb_id', 'title', 'budget', 'revenue', 'release_date', 'release_year', 'genres', 'overview']
# collect one dict per movie; building the DataFrame once at the end is faster than growing it one .loc row at a time
rows = []
# static TMDB movie API URL and the params specific to the movie API
tmdb_movie_url = "https://api.themoviedb.org/3/movie/"
tmdb_movie_params = {
    'api_key': tmdb_key,
    'language': "en-US"
}
print("Start time: ", time.strftime("%H:%M:%S", time.localtime()))
for movie in tmdb:
    movie_id = movie['id']
    # overview and release date come from the Discover results
    overview = movie['overview']
    release_year = int(movie['release_date'][:4])
    # make request to API; only keep movies the API returned data for
    tmdb_response = requests.get(tmdb_movie_url + str(movie_id), params=tmdb_movie_params)
    if tmdb_response.status_code != 200:
        continue
    tmdb_movie_json = tmdb_response.json()
    # genre data is a list of {'id': ..., 'name': ...} dicts and needs to be flattened;
    # save "NA" if there is no genre data
    genres = ', '.join(g['name'] for g in tmdb_movie_json.get('genres') or []) or "NA"
    rows.append({
        'tmdb_id': movie_id,
        'imdb_id': tmdb_movie_json['imdb_id'],
        'title': tmdb_movie_json['title'],
        'budget': tmdb_movie_json['budget'],
        'revenue': tmdb_movie_json['revenue'],
        'release_date': tmdb_movie_json['release_date'],
        'release_year': release_year,
        'genres': genres,
        'overview': overview,
    })
    print(".", end=" ")
movies_df = pd.DataFrame(rows, columns=columns)
print("\nEnded loop at: ", time.strftime("%H:%M:%S", time.localtime()))
```
This took roughly 50 minutes. Next, I add a column that holds each movie's genres as a list:
```python
# add column which treats genres as a list (split on ", " so genre names don't keep a leading space)
movies_df['genre_list'] = movies_df.genres.str.split(', ')
movies_df.head(10)
```
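The Movie endpoint returns genres as a list of objects with `id` and `name` fields, which is why they need flattening. The pure-Python sketch below (sample values are illustrative) shows the round trip from that list to the comma-separated `genres` string and back, and why splitting on `','` alone leaves a leading space on every genre after the first.

```python
# Sample shaped like the `genres` field of a TMDB movie response (illustrative values)
genres_json = [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 53, "name": "Thriller"}]

def flatten_genres(genres):
    """Join genre names into one comma-separated string, or "NA" if there are none."""
    return ", ".join(g["name"] for g in genres) or "NA"

def split_genres(genres_str):
    """Split the comma-separated string back into a list, trimming surrounding spaces."""
    return [g.strip() for g in genres_str.split(",")]

genres_str = flatten_genres(genres_json)
print(genres_str)                # Action, Adventure, Thriller
print(genres_str.split(","))     # ['Action', ' Adventure', ' Thriller'] (note the leading spaces)
print(split_genres(genres_str))  # ['Action', 'Adventure', 'Thriller']
```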
Once this was done, I saved the DataFrame to a data folder so I'd have a checkpoint to pick up from:
```python
# Save data
movies_df.to_pickle('../Data/movies_df.pkl')
```
To make accurate recommendations, we'll use a few different NLP techniques to clean and vectorize each movie's overview, and then combine the keywords from each overview with the movie's genres to make recommendations.
Before going any further, I want to make sure I have good data to make recommendations with, so I'm limiting the data to movies that have a genre and an overview of a minimum length. I'm also using the IMDB ID as an indicator of whether a movie's data is reliable, assuming that movies without one do not have reliable information.
```python
tmdb_df = pd.read_pickle('../Data/movies_df.pkl')
# Before processing for key words, remove any movies that do not have an overview or genre
# Going to link to IMDB - exclude anything without an IMDB ID populated
imdb_mask = tmdb_df['imdb_id'].str.len() > 0
# Overview must have at least a three-word summary
overview_length_mask = tmdb_df['overview'].str.split().str.len() >= 3
# Remove values of NA in genre column
genre_not_empty_mask = tmdb_df['genres'] != "NA"
tmdb_df1 = tmdb_df[imdb_mask & overview_length_mask & genre_not_empty_mask].copy()
# reset index so rows are numbered 0..n-1 (drop=True discards the old index)
tmdb_df1.reset_index(drop=True, inplace=True)
tmdb_df1
```
After doing so, we are left with 9,650 movies:
| | tmdb_id | imdb_id | title | budget | revenue | release_date | release_year | genres | overview | genre_list |
|---|---|---|---|---|---|---|---|---|---|---|
0 | 955 | tt0120755 | Mission: Impossible II | 125000000 | 546388105 | 2000-05-24 | 2000 | Action, Adventure, Thriller | With computer genius Luther Stickell at his si... | [Action, Adventure, Thriller] |
1 | 98 | tt0172495 | Gladiator | 103000000 | 460583960 | 2000-05-01 | 2000 | Action, Adventure, Drama | In the year 180, the death of emperor Marcus A... | [Action, Adventure, Drama] |
2 | 8358 | tt0162222 | Cast Away | 90000000 | 429632142 | 2000-12-22 | 2000 | Adventure, Drama | Chuck Nolan, a top international manager for F... | [Adventure, Drama] |
3 | 3981 | tt0207201 | What Women Want | 70000000 | 374111707 | 2000-12-15 | 2000 | Comedy, Romance | Advertising executive Nick Marshall is as cock... | [Comedy, Romance] |
4 | 10567 | tt0130623 | Dinosaur | 127500000 | 354248063 | 2000-05-19 | 2000 | Animation, Family | An orphaned dinosaur raised by lemurs joins an... | [Animation, Family] |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9645 | 714996 | tt11957868 | Peach | 0 | 0 | 2020-01-13 | 2020 | Comedy | A socially anxious young woman lands a hot dat... | [Comedy] |
9646 | 714936 | tt11754128 | Atlas | 0 | 0 | 2020-01-13 | 2020 | Science Fiction | Atlas and his dog Charlie are both locked into... | [Science Fiction] |
9647 | 714847 | tt12299114 | Trapped | 0 | 0 | 2020-04-24 | 2020 | Thriller | A stressed young drug addicted person who is h... | [Thriller] |
9648 | 714842 | tt12498618 | 8:46 | 0 | 0 | 2020-06-11 | 2020 | Comedy, Documentary | From Dave: Normally I wouldn't show you someth... | [Comedy, Documentary] |
9649 | 714836 | tt12525356 | Brock: Over the Top | 0 | 0 | 2020-06-22 | 2020 | Documentary | Brock: Over the Top is a feature length docume... | [Documentary] |
9650 rows × 10 columns
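Lemmatization reduces each word to its base dictionary form, or lemma, so that different forms of the same word are treated as one term:
Original Word | Word Lemma |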
---|---|
Copied | Copy |
Copying | Copy |
Copies | Copy |
Let's go ahead and lemmatize our data. First, we'll build a function that maps each word's part-of-speech tag to the format the WordNet lemmatizer expects, and then lemmatize each word based on its tag:
```python
# More pre-processing
# 1: Noise removal - get rid of anything that is not a letter, digit, whitespace or colon
# 2: Lemmatize text so the root stays, but not different tenses or versions of the same word (e.g. terrifying -> terrify)
## Much more powerful when the part of speech for each word is accurately identified
## Build one function to map pos_tag output to WordNet POS codes, another to lemmatize text and return the lemmatized sentence
lemm = WordNetLemmatizer()

def get_pos(tag):
    # Penn Treebank tags start with N (noun), V (verb), R (adverb) or J (adjective);
    # WordNet uses 'a' for adjectives, and everything else defaults to noun
    lemma_tag = tag[0].lower()
    return {
        "n": "n",
        "v": "v",
        "r": "r",
        "j": "a"
    }.get(lemma_tag, 'n')

def lemmatize(series):
    # RegEx removal of anything that is not alphanumeric (whitespace and colons are kept);
    # regex=True is required in pandas 2.0+, where str.replace defaults to a literal match
    s1 = series.str.replace(r'[^a-zA-Z\d\s:]', '', regex=True)
    # tokenize
    s2 = s1.str.split()
    # get parts of speech tags for each word in each overview
    s3 = s2.apply(pos_tag)
    # lemmatization
    s4 = s3.apply(lambda x: [lemm.lemmatize(word, pos=get_pos(tag)) for word, tag in x])
    # join the tokens back into one string per overview
    lemma_series = s4.apply(' '.join)
    # lemmatized series
    return lemma_series

tmdb_df1['overview_lemma'] = lemmatize(tmdb_df1['overview'])
```
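The noise-removal pattern deletes punctuation rather than replacing it with a space, which is why the lemmatized overviews contain joined tokens such as "DArtagnan", "16yearold" and "Harvardeducated". A quick standard-library check with Python's `re` module, using the same character class (note that in pandas 2.0 and later, `Series.str.replace` treats the pattern as a literal string unless `regex=True` is passed):

```python
import re

# Same character class as in lemmatize(): anything that is not a letter, digit, whitespace or colon
NOISE = re.compile(r"[^a-zA-Z\d\s:]")

def remove_noise(text):
    """Delete punctuation such as apostrophes and hyphens (no space is inserted)."""
    return NOISE.sub("", text)

print(remove_noise("The hot-headed young D'Artagnan"))  # The hotheaded young DArtagnan
print(remove_noise("A 16-year-old girl"))               # A 16yearold girl
print(remove_noise("Mission: Impossible II"))           # Mission: Impossible II (colon kept)
```

Substituting a space instead (`NOISE.sub(" ", text)`) would keep "hot headed" as two separate words; which behavior is preferable depends on whether hyphenated compounds should count as one key word.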
Lemmatized versus non-lemmatized text:
```python
# Inspect
tmdb_df1['overview_lemma'][0]
```
```
'With computer genius Luther Stickell at his side and a beautiful thief on his mind agent Ethan Hunt race across Australia and Spain to stop a former IMF agent from unleash a genetically engineer biological weapon call Chimera This mission should Hunt choose to accept it plunge him into the center of an international crisis of terrify magnitude'
```
```python
t1 = tmdb_df1['overview'].str.replace(r'[^a-zA-Z\d\s:]', '', regex=True)
t1[0]
```
```
'With computer genius Luther Stickell at his side and a beautiful thief on his mind agent Ethan Hunt races across Australia and Spain to stop a former IMF agent from unleashing a genetically engineered biological weapon called Chimera This mission should Hunt choose to accept it plunges him into the center of an international crisis of terrifying magnitude'
```
Overall, six words were lemmatized in MI2's overview (races, unleashing, engineered, called, plunges, terrifying).
From here, I'm going to use RAKE (Rapid Automatic Keyword Extraction), a method that gets keywords from text based on word occurrence and co-occurrence, to make sure our recommender only uses important keywords to make recommendations:
```python
from rake_nltk import Rake
# RAKE - rapid automatic keyword extraction (a frequency-based scoring method, loosely comparable to TF-IDF)
## Gets keywords based on frequency of word occurrence and co-occurrence with other words in the text
key_words = []
RAKE = Rake()
for index, row in tmdb_df1.iterrows():
    RAKE.extract_keywords_from_text(row['overview_lemma'])
    # get_word_degrees() maps each candidate (non-stopword) word to its degree; keep the words
    key_word_scores = RAKE.get_word_degrees()
    key_words.append(list(key_word_scores.keys()))
```
Let’s check what was extracted:
```python
# Check
print(tmdb_df1['overview_lemma'][0])
print(key_words[0])
print('Total words in overview: {:,}'.format(len(tmdb_df1['overview_lemma'][0].split())))
print('Total extracted: {:,}'.format(len(key_words[0])))
```
```
With computer genius Luther Stickell at his side and a beautiful thief on his mind agent Ethan Hunt race across Australia and Spain to stop a former IMF agent from unleash a genetically engineer biological weapon call Chimera This mission should Hunt choose to accept it plunge him into the center of an international crisis of terrify magnitude
['genetically', 'engineer', 'biological', 'weapon', 'call', 'chimera', 'mission', 'plunge', 'terrify', 'magnitude', 'former', 'imf', 'agent', 'beautiful', 'thief', 'side', 'stop', 'spain', 'center', 'hunt', 'choose', 'computer', 'genius', 'luther', 'stickell', 'unleash', 'international', 'crisis', 'mind', 'ethan', 'race', 'across', 'australia', 'accept']
Total words in overview: 58
Total extracted: 34
```
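For intuition on what RAKE's word degrees measure, here is a minimal pure-Python sketch of the scoring idea. It is a simplification, not rake_nltk's implementation: the stopword list is a tiny hypothetical one (rake_nltk uses NLTK's English stopwords), and phrases are split only at stopwords, since the cleaned overviews no longer contain punctuation. A word's degree is the total length of the candidate phrases it appears in, and its score is degree divided by frequency. One consequence worth noting: the keys of `get_word_degrees()` include every candidate word, so in the example above the 34 extracted words are exactly the distinct non-stopwords of the 58-word overview. Keeping only the highest-scoring phrases would mean ranking them, for example with rake_nltk's `get_ranked_phrases()`.

```python
import re
from collections import defaultdict

# Hypothetical, tiny stopword list for illustration only
STOPWORDS = {"a", "an", "and", "the", "to", "of", "on", "at", "his", "from", "with", "in", "into", "it", "him"}

def rake_scores(text, stopwords=STOPWORDS):
    """Minimal RAKE: split text into phrases at stopwords, score words by degree / frequency."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # 1. Candidate phrases: maximal runs of consecutive non-stopwords
    phrases, current = [], []
    for word in words:
        if word in stopwords:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(word)
    if current:
        phrases.append(current)
    # 2. Frequency (occurrence) and degree (co-occurrence: total length of phrases containing the word)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    # 3. Word score = degree / frequency; phrase score = sum of its word scores
    word_score = {w: degree[w] / freq[w] for w in freq}
    phrase_scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return dict(degree), phrase_scores

degrees, phrase_scores = rake_scores("agent Ethan Hunt races to stop a former IMF agent")
print(degrees["agent"])                           # 7: appears in phrases of length 4 and 3
print(max(phrase_scores, key=phrase_scores.get))  # agent ethan hunt races
```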
Overall, I think this is good enough for this problem. Let's append the key words back to our DataFrame and combine them with each movie's genres:
```python
# add back to dataframe
tmdb_df1['Key_Words'] = pd.Series(key_words)
# Final combined DF with title and final list of words:
# combine overview key words and genre tags for each movie into one bag-of-words text column,
# lowercased, with double spaces collapsed
recommend_df = pd.DataFrame({
    'Title': tmdb_df1['title'],
    'Recommender_BOW': [
        ' '.join([*words, *genres]).lower().replace('  ', ' ')
        for words, genres in zip(tmdb_df1['Key_Words'], tmdb_df1['genre_list'])
    ],
})
```
A simple way to judge how related two movies are is to check whether similar words are used to describe them. First, though, we need a numeric representation of the text; one easy option is CountVectorizer() from scikit-learn:
```python
from sklearn.feature_extraction.text import CountVectorizer
# Vector representation of our bag of words using CountVectorizer: convert raw text into a sparse matrix to numerically represent words
# (Python's collections.Counter class can also count words)
## Key words are already de-duplicated per movie, so counts are mostly 0 or 1 (a genre word can repeat a key word);
## CountVectorizer(binary=True) would record strict presence/absence instead
## Min document frequency of 2 (word must appear in at least 2 movies), max document frequency of 85%
## Another option would have been to include bigrams, but RAKE re-ordered key words from overview when extracting
#CV = CountVectorizer(min_df = 2, max_df = 0.85, ngram_range = (1,2))
CV = CountVectorizer(min_df=2, max_df=0.85)
CV_Matrix = CV.fit_transform(recommend_df['Recommender_BOW'])
# Take a look at the CV after processing words for our recommender
print("Count Vectorizer number of documents: {:,}".format(CV_Matrix.shape[0]))
print("Count Vectorizer number of unique words (vocabulary size): {:,}".format(CV_Matrix.shape[1]))
```
```
Count Vectorizer number of documents: 9,650
Count Vectorizer number of unique words (vocabulary size): 13,615
```
```python
# Dictionary of word and position representing place in sparse matrix
print("Word: {} \nPosition: {:,}".format(list(CV.vocabulary_.keys())[0], list(CV.vocabulary_.values())[0]))
```
```
Word: genetically
Position: 5,171
```
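To make the `min_df`/`max_df` settings concrete, here is a small pure-Python sketch of what CountVectorizer does with them. It is an approximation for illustration: it mimics scikit-learn's default token pattern (words of two or more characters) and alphabetical column order, treats `min_df` as an absolute document count and `max_df` as a fraction of documents (the types used above), and returns a dense list of lists instead of a sparse matrix.

```python
import re
from collections import Counter

TOKEN = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token_pattern

def count_vectorize(docs, min_df=2, max_df=0.85):
    """Return (vocabulary, matrix): word -> column index, and one count row per document."""
    tokenized = [TOKEN.findall(doc.lower()) for doc in docs]
    # document frequency: in how many documents each word appears
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    max_docs = max_df * len(docs)
    kept = sorted(w for w, n in doc_freq.items() if min_df <= n <= max_docs)
    vocabulary = {word: i for i, word in enumerate(kept)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for word in tokens:
            if word in vocabulary:
                row[vocabulary[word]] += 1
        matrix.append(row)
    return vocabulary, matrix

docs = ["mission agent action", "mission spy action", "action comedy", "spy action thriller"]
vocabulary, matrix = count_vectorize(docs)
print(vocabulary)  # {'mission': 0, 'spy': 1}
print(matrix)      # [[1, 0], [1, 1], [0, 0], [0, 1]]
```

Here "agent", "comedy" and "thriller" appear in only one document (below `min_df`), and "action" appears in all four, more than 85% of them, so only "mission" and "spy" survive.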
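We can now numerically compare the "bag of words" for each movie to every other movie. To measure similarity, we can calculate the cosine similarity: the cosine of the angle between two vectors. It is a common metric for comparing sparse word-count vectors like these because it normalizes for vector length, so a movie with a long list of key words isn't automatically scored as more similar to everything.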
```python
from sklearn.metrics.pairwise import cosine_similarity
# Matrix of pairwise cosine similarities between every movie's vector representation
cosine_similarities = cosine_similarity(CV_Matrix, CV_Matrix)
# Inspect
print(cosine_similarities)
# Movie indices sorted by ascending similarity to the first movie (Mission: Impossible II);
# the most similar are at the end: the movie itself (0), then 2769 and 7066
print(cosine_similarities[0].argsort())
```
```
[[1. 0.05484085 0.05976143 ... 0.04364358 0. 0.02020305]
[0.05484085 1. 0.11470787 ... 0. 0. 0.03877834]
[0.05976143 0.11470787 1. ... 0.04564355 0. 0.02112886]
...
[0.04364358 0. 0.04564355 ... 1. 0. 0. ]
[0. 0. 0. ... 0. 1. 0.07559289]
[0.02020305 0.03877834 0.02112886 ... 0. 0.07559289 1. ]]
[4824 3701 3700 ... 7066 2769 0]
```
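Cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(a, b) = a·b / (‖a‖ ‖b‖). For 0/1 keyword vectors this reduces to the number of shared words divided by the square root of the product of each movie's word count. A minimal pure-Python sketch on two made-up keyword vectors:

```python
import math

def cosine(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # an all-zero vector shares nothing with anything; scikit-learn also gives 0 here
        return 0.0
    return dot / (norm_a * norm_b)

# columns: mission, agent, action, spy, thriller (made-up example vectors)
movie_a = [1, 1, 1, 0, 0]
movie_b = [1, 0, 1, 1, 1]
print(cosine(movie_a, movie_b))  # 2 shared words / sqrt(3 * 4), about 0.577
print(cosine(movie_a, movie_a))  # 1.0, up to floating-point rounding
```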
```python
# Save data
tmdb_df1.to_pickle('../Data/tmdb_movies.pkl')
with open("../Data/movie_similarities.npy", 'wb') as npy:
    np.save(npy, cosine_similarities)
```
The last part is to build a function to get recommendations for each movie, and review the results:
```python
# Build recommender based on above
def recommend(title):
    idx = tmdb_df1[tmdb_df1['title'] == title].index[0]
    # top 50 recommendations, skipping position 0 (the movie itself)
    similar_movies = pd.Series(cosine_similarities[idx]).sort_values(ascending=False).iloc[1:51]
    # add similarity scores for top 50 - instead of iterating, pull back all data and drop null values
    recs = pd.concat([tmdb_df1['imdb_id'], tmdb_df1['title'], tmdb_df1['release_year'], tmdb_df1['overview'], similar_movies], axis=1)
    recs.columns = ['IMDB ID', 'Title', 'Year', 'Overview', 'Similarity Score']
    recs = recs.dropna()
    recs = recs.sort_values(by='Similarity Score', ascending=False)
    return recs

recommend('Mission: Impossible II')[:20]
```
| | IMDB ID | Title | Year | Overview | Similarity Score |
|---|---|---|---|---|---|
2769 | tt0317919 | Mission: Impossible III | 2006 | Retired from active duty to train new IMF agen... | 0.308607 |
7066 | tt2381249 | Mission: Impossible - Rogue Nation | 2015 | Ethan and team take on their most impossible m... | 0.271052 |
9059 | tt5033998 | Charlie's Angels | 2019 | When a systems engineer blows the whistle on a... | 0.216225 |
8541 | tt4912910 | Mission: Impossible - Fallout | 2018 | When an IMF mission ends badly, the world is f... | 0.212512 |
5194 | tt1509767 | The Three Musketeers | 2011 | The hot-headed young D'Artagnan along with thr... | 0.198898 |
5138 | tt1229238 | Mission: Impossible - Ghost Protocol | 2011 | Ethan Hunt and his team are racing against tim... | 0.197245 |
6204 | tt1517260 | The Host | 2013 | A parasitic alien soul is injected into the bo... | 0.195180 |
1058 | tt0283160 | Extreme Ops | 2002 | While filming an advertisement, some extreme s... | 0.187523 |
4807 | tt1032751 | The Warrior's Way | 2010 | A warrior-assassin is forced to hide in a smal... | 0.187523 |
5227 | tt0993842 | Hanna | 2011 | A 16-year-old girl raised by her father to be ... | 0.184428 |
989 | tt0280486 | Bad Company | 2002 | When a Harvard-educated CIA agent is killed du... | 0.180702 |
7567 | tt0918940 | The Legend of Tarzan | 2016 | Tarzan, having acclimated to life in London, i... | 0.180702 |
6469 | tt6703928 | A Fool's Paradise | 2013 | James Bond is sent on a mission to investigate... | 0.179284 |
8673 | tt4669264 | Beirut | 2018 | In 1980s Beirut, Mason Skiles is a former U.S.... | 0.176547 |
4745 | tt1245526 | RED | 2010 | When his peaceful life is threatened by a high... | 0.176227 |
8054 | tt3501632 | Thor: Ragnarok | 2017 | Thor is imprisoned on the other side of the un... | 0.176227 |
497 | tt0266987 | Spy Game | 2001 | On the day of his retirement, a veteran CIA ag... | 0.172516 |
6221 | tt2312718 | Homefront | 2013 | Phil Broker is a former DEA agent who has gone... | 0.170367 |
8952 | tt9314132 | When They Run | 2018 | A survivor of a zombie apocalypse is on the ru... | 0.169031 |
8647 | tt5177088 | The Girl in the Spider's Web | 2018 | In Stockholm, Sweden, hacker Lisbeth Salander ... | 0.169031 |
Not too shabby! The top recommendations include the other Mission: Impossible movies, as well as other action movies. To get better results, we could try different keyword-extraction or preprocessing steps, or change how we vectorize the vocabulary.
However, on further inspection, I noticed that the recommend function relies on a 9,650 x 9,650 2D NumPy array. That is about 93 million float64 values, or roughly 745 MB, which would be hard to serve over the web without eating up memory. Instead, I'm going to precompute the top 20 suggestions for each movie and save them to a dictionary keyed by each movie's index, along with their similarity scores. This is what the web application will use to get recommendations.
Below, I create a dictionary that takes the index of each movie as the key and, as the value, a dictionary mapping the indexes of its 20 most similar movies (by the cosine similarity of their key words and genres) to their similarity scores.
```python
recommender_dict = {}
for i in range(len(tmdb_df1)):
    # top 20 similarity scores for movie i, skipping position 0 (the movie itself)
    recommender_dict[i] = pd.Series(cosine_similarities[i]).sort_values(ascending=False).iloc[1:21].to_dict()
```
Let’s go ahead and save for future use:
```python
import pickle
with open("../Data/cosine_dict.pkl", "wb") as f:
    pickle.dump(recommender_dict, f)
```
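Sorting a full row of 9,650 scores just to keep 20 is more work than needed; `heapq.nlargest` from the standard library selects the top k in a single pass. The sketch below is an alternative to the pandas version, shown on a made-up 3 × 3 matrix: it builds the same kind of {movie index: {neighbour index: similarity}} dictionary from plain Python rows, and excludes each movie by index rather than assuming it sorts first (a movie with an identical bag of words would also score 1.0).

```python
import heapq

def top_k_neighbours(similarity_rows, k=20):
    """Map each row index to {other index: similarity} for its k most similar rows, best first."""
    neighbours = {}
    for i, row in enumerate(similarity_rows):
        # (score, index) pairs for every other movie; nlargest returns them in descending order
        best = heapq.nlargest(k, ((score, j) for j, score in enumerate(row) if j != i))
        neighbours[i] = {j: score for score, j in best}
    return neighbours

sims = [[1.0, 0.2, 0.5],
        [0.2, 1.0, 0.1],
        [0.5, 0.1, 1.0]]
print(top_k_neighbours(sims, k=1))  # {0: {2: 0.5}, 1: {0: 0.2}, 2: {0: 0.5}}
```

Stored this way, the dictionary holds 9,650 × 20 = 193,000 scores instead of about 93 million.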
Our new recommender function uses the newly created dictionary. It still looks up the index of the title requested by the user, but this time it uses that index as the dictionary key. We then map the dictionary back onto the original DataFrame: rows whose index appears in the dictionary get a similarity score, and every other movie gets NaN. Finally, we drop the null values, sort by highest similarity, and return those top 20 results:
```python
def recommend(title):
    idx = tmdb_df1[tmdb_df1['title'] == title].index[0]
    # {movie index: similarity} for the 20 most similar movies
    dict_ref = recommender_dict[idx]
    df_copy = tmdb_df1.copy()
    # movies outside the top 20 map to NaN
    df_copy['similarity'] = df_copy.index.map(dict_ref)
    df_cleaned = df_copy[df_copy['similarity'].notna()]
    df_sorted = df_cleaned.sort_values(by='similarity', ascending=False)
    return df_sorted
```
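Are the results the same? Let's look at our example movie, Mission: Impossible II. The scores match the earlier output; the only differences are among tied scores, where the sort decides which tied movie comes first and, at the 0.169031 cutoff, which one makes the top 20 (Hitman instead of When They Run):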
```python
recommend('Mission: Impossible II')
```
| | tmdb_id | imdb_id | title | budget | revenue | release_date | release_year | genres | overview | genre_list | overview_lemma | Key_Words | similarity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2769 | 956 | tt0317919 | Mission: Impossible III | 150000000 | 397850012 | 2006-05-03 | 2006 | Action, Adventure, Thriller | Retired from active duty to train new IMF agen... | [Action, Adventure, Thriller] | Retired from active duty to train new IMF agen... | [mission, call, back, retired, train, new, imf... | 0.308607 |
7066 | 177677 | tt2381249 | Mission: Impossible - Rogue Nation | 150000000 | 682330139 | 2015-07-23 | 2015 | Action, Adventure | Ethan and team take on their most impossible m... | [Action, Adventure] | Ethan and team take on their most impossible m... | [imf, syndicate, destroy, team, take, highlysk... | 0.271052 |
9059 | 458897 | tt5033998 | Charlie's Angels | 48000000 | 73279888 | 2019-11-14 | 2019 | Action, Adventure, Comedy | When a systems engineer blows the whistle on a... | [Action, Adventure, Comedy] | When a system engineer blow the whistle on a d... | [system, engineer, blow, line, across, protect... | 0.216225 |
8541 | 353081 | tt4912910 | Mission: Impossible - Fallout | 178000000 | 791017452 | 2018-07-13 | 2018 | Action, Adventure | When an IMF mission ends badly, the world is f... | [Action, Adventure] | When an IMF mission end badly the world be fac... | [time, hunt, loyalty, assassin, world, race, f... | 0.212512 |
5194 | 52451 | tt1509767 | The Three Musketeers | 75000000 | 132274484 | 2011-08-31 | 2011 | Action, Adventure, Thriller | The hot-headed young D'Artagnan along with thr... | [Action, Adventure, Thriller] | The hotheaded young DArtagnan along with three... | [engulf, europe, hotheaded, young, dartagnan, ... | 0.198898 |
5138 | 56292 | tt1229238 | Mission: Impossible - Ghost Protocol | 145000000 | 694713380 | 2011-12-07 | 2011 | Action, Adventure, Thriller | Ethan Hunt and his team are racing against tim... | [Action, Adventure, Thriller] | Ethan Hunt and his team be race against time t... | [bombing, force, disavow, kremlin, stop, ethan... | 0.197245 |
6204 | 72710 | tt1517260 | The Host | 44000000 | 63327201 | 2013-03-22 | 2013 | Action, Adventure, Romance, Science Fiction, T... | A parasitic alien soul is injected into the bo... | [Action, Adventure, Romance, Science Fictio... | A parasitic alien soul be inject into the body... | [melanie, stryder, instead, inject, body, carr... | 0.195180 |
4807 | 46528 | tt1032751 | The Warrior's Way | 42000000 | 11087569 | 2010-12-02 | 2010 | Action, Adventure, Fantasy, Thriller, Western | A warrior-assassin is forced to hide in a smal... | [Action, Adventure, Fantasy, Thriller, Wes... | A warriorassassin be force to hide in a small ... | [mission, hide, refuse, american, badlands, fo... | 0.187523 |
1058 | 15074 | tt0283160 | Extreme Ops | 40000000 | 10959475 | 2002-11-27 | 2002 | Action, Adventure, Drama, Thriller | While filming an advertisement, some extreme s... | [Action, Adventure, Drama, Thriller] | While film an advertisement some extreme sport... | [advertisement, terrorist, film, group, extrem... | 0.187523 |
5227 | 50456 | tt0993842 | Hanna | 30000000 | 63782078 | 2011-04-07 | 2011 | Action, Adventure, Thriller | A 16-year-old girl raised by her father to be ... | [Action, Adventure, Thriller] | A 16yearold girl raise by her father to be the... | [mission, across, europe, tracked, dispatch, h... | 0.184428 |
7567 | 258489 | tt0918940 | The Legend of Tarzan | 180000000 | 356743061 | 2016-06-06 | 2016 | Action, Adventure | Tarzan, having acclimated to life in London, i... | [Action, Adventure] | Tarzan have acclimate to life in London be cal... | [investigate, call, back, acclimate, tarzan, m... | 0.180702 |
989 | 3132 | tt0280486 | Bad Company | 70000000 | 65977295 | 2002-06-07 | 2002 | Action, Adventure, Comedy, Thriller | When a Harvard-educated CIA agent is killed du... | [Action, Adventure, Comedy, Thriller] | When a Harvardeducated CIA agent be kill durin... | [kill, twin, brother, harvardeducated, cia, ag... | 0.180702 |
6469 | 699220 | tt6703928 | A Fool's Paradise | 0 | 0 | 2013-05-04 | 2013 | Action, Thriller | James Bond is sent on a mission to investigate... | [Action, Thriller] | James Bond be send on a mission to investigate... | [mission, send, investigate, michael, kristato... | 0.179284 |
8673 | 399248 | tt4669264 | Beirut | 0 | 7258534 | 2018-04-11 | 2018 | Action, Drama, Thriller | In 1980s Beirut, Mason Skiles is a former U.S.... | [Action, Drama, Thriller] | In 1980s Beirut Mason Skiles be a former US di... | [mission, former, us, diplomat, call, back, ci... | 0.176547 |
4745 | 39514 | tt1245526 | RED | 58000000 | 71664962 | 2010-10-13 | 2010 | Action, Adventure, Comedy, Crime, Thriller | When his peaceful life is threatened by a high... | [Action, Adventure, Comedy, Crime, Thriller] | When his peaceful life be threaten by a highte... | [peaceful, life, uncover, old, team, assailant... | 0.176227 |
8054 | 284053 | tt3501632 | Thor: Ragnarok | 180000000 | 853977126 | 2017-10-25 | 2017 | Action, Adventure, Comedy, Fantasy | Thor is imprisoned on the other side of the un... | [Action, Adventure, Comedy, Fantasy] | Thor be imprison on the other side of the univ... | [asgardian, civilization, end, destruction, th... | 0.176227 |
497 | 1535 | tt0266987 | Spy Game | 115000000 | 143049560 | 2001-11-18 | 2001 | Action, Crime, Thriller | On the day of his retirement, a veteran CIA ag... | [Action, Crime, Thriller] | On the day of his retirement a veteran CIA age... | [former, protg, die, arrest, international, sc... | 0.172516 |
6221 | 204082 | tt2312718 | Homefront | 22000000 | 43058898 | 2013-11-12 | 2013 | Action, Thriller | Phil Broker is a former DEA agent who has gone... | [Action, Thriller] | Phil Broker be a former DEA agent who have go ... | [school, recently, widow, event, cost, 9yearso... | 0.170367 |
3316 | 1620 | tt0465494 | Hitman | 24000000 | 99965753 | 2007-11-21 | 2007 | Action, Crime, Drama, Thriller | The best-selling videogame, Hitman, roars to l... | [Action, Crime, Drama, Thriller] | The bestselling videogame Hitman roar to life ... | [prey, international, intrigue, barrel, blaze,... | 0.169031 |
8647 | 446807 | tt5177088 | The Girl in the Spider's Web | 43000000 | 17894345 | 2018-10-25 | 2018 | Action, Crime, Thriller | In Stockholm, Sweden, hacker Lisbeth Salander ... | [Action, Crime, Thriller] | In Stockholm Sweden hacker Lisbeth Salander be... | [exist, computer, engineer, stockholm, sweden,... | 0.169031 |
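Since front end/HTML isn't my forte, I'll just mention that I used Bootstrap to design the front end. In my opinion, it is the quickest way to get a functioning, responsive site up and running without heavy customization or more advanced frameworks.
I'll just showcase the app.py file that builds the Flask app. It loads the list of movie titles and renders an empty search page on a GET request. On a POST request, it either returns recommendations for the movie the user searched for (or an error page if the title isn't in the list), or uses choose() to pick a movie when the user clicks the random button and returns its recommendations: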
```python
from flask import Flask, render_template, request
import pandas as pd
# helper.py holds the data-loading and recommendation functions used below
from helper import get_movies, choose, overview, imdb, recommend

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    movielist = get_movies()
    if request.method == 'GET':
        # first visit: render the page with no recommendations yet
        return render_template('index.html', recommendations=pd.DataFrame(columns=['title', 'ID', 'year', 'overview']), movielist=movielist, movie="", overview="", imdb="")
    submit = request.form.get('submit')
    if submit == 'search':
        movie = request.form.get('movie')
        if movie not in movielist:
            return render_template('error.html')
    elif submit == 'random':
        movie = choose()
    else:
        # any other submission: show the error page instead of returning None (which Flask rejects)
        return render_template('error.html')
    recs = recommend(movie)[:20]
    return render_template('index.html', recommendations=recs, movielist=movielist, movie=movie, overview=overview(movie), imdb=imdb(movie))

if __name__ == '__main__':
    # debug mode is meant for local development only
    app.run(debug=True)
```