MovieMetaTagger (Natural Language Processing Project)

Overview: This is an NLP project that generates tags for a given movie after extracting details about the movie using its IMDb ID.
The project uses BeautifulSoup for web scraping and LangChain to manage the interaction with the OpenAI API for generating and cleaning tags.

Approach for the Project

For this project, we made three major classes:

DataExtractor Class

This class extracts data about a movie using its IMDb ID. Initializing an object of this class requires two things:
1. The TMDb API key
2. The OMDb API key

Once the object is initialized, call the extract_data function of this class with the IMDb ID to get the details of the movie.
The function returns a dictionary containing the following details of the movie:
'IMDb ID', 'Title', 'Plot Synopsis (IMDb)', 'Movie Summary (TMDb)', 'About Movie (Wikipedia)', 'Plot Summary (OMDb)', 'Director', 'Cast', 'Genres', 'Keywords'

The class extracts data from multiple sources:
- IMDb: Plot synopsis and basic information.
- TMDb: Movie summary, genres, cast, and keywords.
- OMDb: Plot summary.
- Wikipedia: Detailed plot summary using Wikidata.
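
The multi-source extraction described above can be pictured as merging partial results into one dictionary. The following is only a minimal sketch under stated assumptions, not the project's actual DataExtractor code: build_record, fake_tmdb, and fake_omdb are hypothetical names, the fetchers are stubs that make no network calls, and the "first non-empty value wins" rule is an illustrative choice. Only the ten dictionary keys come from this README.

```python
from typing import Any, Callable, Dict, List

# The ten keys this README lists for the returned dictionary.
EXPECTED_KEYS: List[str] = [
    "IMDb ID", "Title", "Plot Synopsis (IMDb)", "Movie Summary (TMDb)",
    "About Movie (Wikipedia)", "Plot Summary (OMDb)",
    "Director", "Cast", "Genres", "Keywords",
]

# A fetcher takes an IMDb ID and returns whatever fields its source provides.
Fetcher = Callable[[str], Dict[str, Any]]


def build_record(imdb_id: str, fetchers: List[Fetcher]) -> Dict[str, Any]:
    """Merge partial results from several sources (hypothetical helper)."""
    record: Dict[str, Any] = {key: None for key in EXPECTED_KEYS}
    record["IMDb ID"] = imdb_id
    for fetch in fetchers:
        try:
            partial = fetch(imdb_id)
        except Exception:
            continue  # one failing source should not discard the others
        for key, value in partial.items():
            # Keep the first non-empty value seen for each known key.
            if key in record and record[key] is None and value:
                record[key] = value
    return record


# Stub fetchers standing in for real API calls (assumed data, no network).
def fake_tmdb(imdb_id: str) -> Dict[str, Any]:
    return {"Title": "Example Movie", "Genres": ["Drama"],
            "Movie Summary (TMDb)": "A short summary."}


def fake_omdb(imdb_id: str) -> Dict[str, Any]:
    return {"Title": "Other Title", "Plot Summary (OMDb)": "A short plot."}


record = build_record("tt0000000", [fake_tmdb, fake_omdb])
print(record["Title"], record["Genres"])
```

Fields that no source supplies stay None, so downstream code can tell "missing" apart from "empty string".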

TagGenerator Class

This class generates the tags for the movie based on the details extracted by the previous class. It does two things:
1. It generates tags based on the extracted movie details.
2. It then cleans the generated tags, i.e., it removes repeated tags, irrelevant tags, etc.

Initializing an object of this class requires three things:

1. The API key for OpenAI
2. The model to use (I have set it to GPT-4 by default; this can be changed)
3. The temperature value, which controls the randomness of the generation (I have set it to 0 by default, as we want the output to be less random and more repeatable; this can also be changed)

Once the object is initialized, call the generate_tags function of this class to generate tags based on the movie details.

The function returns a list containing the tags for the movie.
The class uses LangChain to manage the interaction with the OpenAI API for generating and cleaning tags.

tags_generator_template: The prompt template used for generating the tags.
tags_cleaner_template: The prompt template used for cleaning the generated tags.
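
The generate-then-clean flow can be sketched as two prompt calls chained together. This is an illustration under assumptions, not the project's implementation: the prompt wording is invented (the real tags_generator_template and tags_cleaner_template are not reproduced in this README), plain str.format stands in for LangChain's prompt handling, and generate_and_clean, parse_tags, and fake_llm are hypothetical names. The llm argument is any callable that maps a prompt string to a reply string, so the sketch runs without an API key.

```python
from typing import Any, Callable, Dict, List

# Hypothetical prompt wording; the project's real templates may differ.
GENERATOR_PROMPT = (
    "Suggest comma-separated tags for this movie.\n"
    "Title: {title}\nPlot: {plot}\nGenres: {genres}"
)
CLEANER_PROMPT = (
    "Remove repeated or irrelevant tags and return the rest, "
    "comma-separated.\nTags: {tags}"
)


def parse_tags(text: str) -> List[str]:
    """Split a comma-separated reply, normalise case, drop blanks and repeats."""
    seen = set()
    tags: List[str] = []
    for raw in text.split(","):
        tag = raw.strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


def generate_and_clean(details: Dict[str, Any],
                       llm: Callable[[str], str]) -> List[str]:
    """Run the generation prompt, then the cleaning prompt (hypothetical)."""
    prompt = GENERATOR_PROMPT.format(
        title=details.get("Title", ""),
        plot=details.get("Plot Summary (OMDb)", ""),
        genres=", ".join(details.get("Genres") or []),
    )
    draft = parse_tags(llm(prompt))
    cleaned = llm(CLEANER_PROMPT.format(tags=", ".join(draft)))
    return parse_tags(cleaned)


def fake_llm(prompt: str) -> str:
    """Stub model: echoes tags for the cleaner, canned reply otherwise."""
    if "Tags: " in prompt:
        return prompt.split("Tags: ", 1)[1]
    return "Heist, heist, Thriller, , Ensemble Cast"


details = {"Title": "Example Movie", "Plot Summary (OMDb)": "A crew plans a robbery.",
           "Genres": ["Crime", "Thriller"]}
print(generate_and_clean(details, fake_llm))
```

Passing the model in as a callable keeps the parsing and de-duplication logic testable on its own; in the real project that role is played by the LangChain chain configured with the chosen model and temperature.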

TagScorer Class

This class scores the tags for the movie based on their relevance to the movie details.
Initializing an object of this class requires only one thing:
1. The API key for OpenAI
This time, I did not add the model name and the temperature as parameters, because I assumed they will be the same as for the previous class.
In case the parameters (model name and temperature) are changed in the TagGenerator class, make sure to change them here too.

Once the object is initialized, call the score_tags function of this class to score the tags based on the movie details.

The function returns a list containing the scored tags for the movie.
The class also uses LangChain to manage the interaction with the OpenAI API for scoring the tags.

tags_scoring_template: The prompt template used for scoring the tags.

Keep in mind that the scores will generally be high, because the previous class has already removed irrelevant tags. The tags that remain are the ones judged relevant to the movie details, so they naturally score well.
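
Since the scores tend to cluster near the top of the range, ranking the tags is often more informative than reading the absolute values. The sketch below shows one way a model reply could be turned into a ranked list. It assumes a "tag: score" line format, which this README does not specify, and parse_scores is a hypothetical helper rather than part of the TagScorer class.

```python
from typing import List, Tuple


def parse_scores(reply: str) -> List[Tuple[str, float]]:
    """Parse 'tag: score' lines (assumed format) and rank by score, highest first."""
    scored: List[Tuple[str, float]] = []
    for line in reply.splitlines():
        # Split on the last colon so tags containing colons still parse.
        tag, sep, value = line.rpartition(":")
        if not sep or not tag.strip():
            continue  # no colon or empty tag: not a score line
        try:
            score = float(value)
        except ValueError:
            continue  # the part after the colon is not a number
        scored.append((tag.strip(), score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up reply text for illustration only.
reply = "thriller: 8.5\nheist: 9\nnot a score line\nensemble cast: 7"
ranked = parse_scores(reply)
print(ranked)
```

Lines that do not match the assumed format are skipped instead of raising, which keeps one malformed line in a reply from losing the rest of the scores.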

The final outputs produced by running the files have been uploaded to the Outputs folder.
