nottpande/MovieMetaTagger

MovieMetaTagger (Natural Language Processing Project)

Overview: This is an NLP project that generates tags for a given movie after extracting details about the movie using its IMDb ID.
The project uses BeautifulSoup for web scraping and LangChain to manage the interaction with the OpenAI API for generating and cleaning tags.

Approach for the Project

For this project, we made three major classes:


  • DataExtractor Class
  • This class extracts data about a movie from its IMDb ID. Initializing an object of this class requires two things:
    1. The TMDb API key
    2. The OMDb API key

    Once the object is initialized, call the extract_data function of this class with the IMDb ID to get the details of the movie.
    The function returns a dictionary containing the following details of the movie:
    'IMDb ID', 'Title', 'Plot Synopsis (IMDb)', 'Movie Summary (TMDb)', 'About Movie (Wikipedia)', 'Plot Summary (OMDb)', 'Director', 'Cast', 'Genres', 'Keywords'

    The class extracts data from multiple sources:
    - IMDb: Plot synopsis and basic information.
    - TMDb: Movie summary, genres, cast, and keywords.
    - OMDb: Plot summary.
    - Wikipedia: Detailed plot summary using Wikidata.
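As a rough illustration of the two keyed sources, the sketch below builds the request URLs for an IMDb ID. It is not the repository's code: the function names are hypothetical, and it assumes the public OMDb lookup-by-ID parameters (`i`, `plot`, `apikey`) and the TMDb v3 `find` endpoint with `external_source=imdb_id`. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumed public endpoints; the repository may call them differently.
OMDB_BASE = "https://www.omdbapi.com/"
TMDB_FIND_BASE = "https://api.themoviedb.org/3/find/"


def omdb_url(imdb_id: str, omdb_key: str) -> str:
    """Build an OMDb lookup URL for an IMDb ID, asking for the full plot."""
    query = urlencode({"i": imdb_id, "plot": "full", "apikey": omdb_key})
    return f"{OMDB_BASE}?{query}"


def tmdb_find_url(imdb_id: str, tmdb_key: str) -> str:
    """Build a TMDb 'find by external ID' URL for an IMDb ID."""
    query = urlencode({"external_source": "imdb_id", "api_key": tmdb_key})
    return f"{TMDB_FIND_BASE}{imdb_id}?{query}"


if __name__ == "__main__":
    # Placeholder keys; real keys come from the OMDb and TMDb dashboards.
    print(omdb_url("tt0111161", "OMDB_KEY"))
    print(tmdb_find_url("tt0111161", "TMDB_KEY"))
```

The responses from such requests would then be merged, together with the scraped IMDb and Wikipedia text, into the dictionary described above.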
  • TagGenerator Class
  • This class generates tags for the movie from the details extracted by the previous class. It essentially does two things:
    1. It generates tags based on the extracted movie details.
    2. It then cleans the generated tags, i.e., it removes repeated tags, irrelevant tags, etc.

    During the initialization of an object of this class, there are three things required:

    1. The API key for OpenAI
    2. The model to use (set to GPT-4 by default; this can be changed)
    3. The temperature value, which controls the randomness of the generation (set to 0 by default, because we want the output to be as deterministic as possible; this can also be changed)
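One way to picture these three arguments and their defaults is a small settings object. This is only a sketch: `LLMSettings` and its field names are hypothetical and do not appear in the repository, and the 0 to 2 temperature bound is an assumption taken from the range the OpenAI chat API accepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the three constructor arguments."""

    openai_api_key: str
    model: str = "gpt-4"       # default named in this README
    temperature: float = 0.0   # 0 keeps the output as repeatable as possible

    def __post_init__(self) -> None:
        # Fail early on obviously invalid values.
        if not self.openai_api_key:
            raise ValueError("an OpenAI API key is required")
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0 and 2")


if __name__ == "__main__":
    settings = LLMSettings(openai_api_key="sk-placeholder")
    print(settings.model, settings.temperature)
```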

    Once the object is initialized, call the generate_tags function of this class to generate tags based on the movie details.

    The function will return a list containing the tags for the movie.
    The class uses LangChain to manage the interaction with the OpenAI API for generating and cleaning tags.

    • tags_generator_template: This is the prompt template that is used for generating the tags.
    • tags_cleaner_template: This is the prompt template that is used for cleaning the generated tags.
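The generate-then-clean flow can be sketched without any LLM dependency by passing the model in as a plain callable. Everything here is an assumption made for illustration: the template wording, the comma-separated output format, and the helper names are hypothetical, and the repository's actual templates and LangChain chains may look quite different.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for tags_generator_template and tags_cleaner_template.
GENERATOR_TEMPLATE = (
    "Suggest descriptive tags for this movie as a comma-separated list.\n"
    "Title: {title}\nPlot: {plot}\nGenres: {genres}"
)
CLEANER_TEMPLATE = (
    "Remove irrelevant tags from this comma-separated list "
    "and return the rest in the same format.\nTags: {tags}"
)


def parse_tags(text: str) -> List[str]:
    """Split comma-separated text, drop blanks, and remove
    case-insensitive duplicates while keeping first-seen order."""
    seen, tags = set(), []
    for raw in text.split(","):
        tag = raw.strip()
        key = tag.lower()
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags


def generate_tags(details: Dict[str, str], llm: Callable[[str], str]) -> List[str]:
    """Step 1: ask the model for tags. Step 2: ask it to clean them."""
    draft_tags = parse_tags(llm(GENERATOR_TEMPLATE.format(**details)))
    cleaned = llm(CLEANER_TEMPLATE.format(tags=", ".join(draft_tags)))
    return parse_tags(cleaned)
```

In real use, `llm` would wrap the OpenAI call made through LangChain; in a test it can be any function that maps a prompt string to a response string.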

  • TagScorer Class
  • This class scores the tags for the movie based on their relevance to the movie details.
    Initializing an object of this class requires only one thing:
    1. The API key for OpenAI
    The model name and temperature are not constructor parameters here, because they are assumed to be the same as in the TagGenerator class.
    If the parameters (model name and temperature) are changed in the TagGenerator class, make sure to change them here too.

    Once the object is initialized, call the score_tags function of this class to score the tags based on the movie details.

    The function will return a list containing the scored tags for the movie.
    The class also uses LangChain to manage the interaction with the OpenAI API for scoring the tags.

    • tags_scoring_template: This is the prompt template that is used for scoring the tags.
    Keep in mind that the scores will generally be high: the previous class has already ensured that only the 'relevant' tags remain, so the surviving tags naturally score well against the movie details.
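A minimal sketch of the scoring step follows, again with the model passed in as a callable. The template text, the 0 to 10 scale, and the one-pair-per-line `tag: score` response format are all assumptions made for this example; the README does not specify the format that score_tags actually uses.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical stand-in for tags_scoring_template.
SCORING_TEMPLATE = (
    "Rate each tag from 0 to 10 for relevance to the movie.\n"
    "Answer with one 'tag: score' pair per line.\n"
    "Movie: {summary}\nTags: {tags}"
)

# Matches lines such as "time travel: 8" or "heist : 9.5".
LINE_PATTERN = re.compile(r"^\s*(.+?)\s*:\s*(\d+(?:\.\d+)?)\s*$")


def parse_scores(text: str) -> List[Tuple[str, float]]:
    """Turn 'tag: score' lines into (tag, score) pairs, highest score first.
    Lines that do not match the expected shape are skipped."""
    scored = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            scored.append((match.group(1), float(match.group(2))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def score_tags(summary: str, tags: List[str],
               llm: Callable[[str], str]) -> List[Tuple[str, float]]:
    """Format the scoring prompt, call the model, and parse its reply."""
    prompt = SCORING_TEMPLATE.format(summary=summary, tags=", ".join(tags))
    return parse_scores(llm(prompt))
```

Skipping malformed lines instead of raising keeps one stray sentence in the model's reply from discarding the whole result, at the cost of silently dropping any tag whose line was garbled.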

    The final outputs produced by running the files have been uploaded to the Outputs folder.
