Text-Summarizer

Introduction

Text-Summarizer:

A powerful and efficient text summarization tool designed to condense large bodies of text into concise summaries, preserving key information and insights.

Dataset Used

Samsum Dataset

The SAMSum dataset contains about 16k messenger-like conversations, each paired with a summary. The dataset is available on Hugging Face: https://huggingface.co/datasets/samsum?row=10

Model Used

Google Pegasus Model

The project uses the Google Pegasus model, a transformer-based encoder-decoder model built for abstractive summarization. Developed by Google Research, Pegasus stands for Pre-training with Extracted Gap-sentences for Abstractive Summarization. During pre-training, whole sentences are removed from a document and the model learns to generate them from the remaining text. Because this objective closely resembles summarization, the model adapts well to producing abstractive summaries of longer texts.
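To make the gap-sentence idea concrete, here is a toy sketch of how one training pair could be built. It is an illustration under stated assumptions, not the Pegasus implementation: the naive sentence splitter, the word-overlap score (standing in for the importance scoring used in the paper), and the `<mask_1>` placeholder string are all simplifications.

```python
import re

MASK = "<mask_1>"  # placeholder for a removed sentence (illustrative)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_score(sentence, others):
    """Fraction of the sentence's distinct words that also occur in the other sentences."""
    words = set(re.findall(r"\w+", sentence.lower()))
    rest = set(re.findall(r"\w+", " ".join(others).lower()))
    return len(words & rest) / len(words) if words else 0.0


def gap_sentence_example(text, n_gaps=1):
    """Return (masked_input, target) for a toy gap-sentence objective."""
    sentences = split_sentences(text)
    scores = [
        overlap_score(s, sentences[:i] + sentences[i + 1:])
        for i, s in enumerate(sentences)
    ]
    # Pick the n_gaps highest-scoring sentences; ties go to the earlier one.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:n_gaps])
    masked = [MASK if i in chosen else s for i, s in enumerate(sentences)]
    target = " ".join(sentences[i] for i in chosen)
    return " ".join(masked), target
```

The model would receive the masked text as input and be trained to produce the target, which is the same input-to-shorter-output shape as summarization.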

Workflow of Project

Update config.yaml
Update params.yaml
Update entity
Update the configuration manager in src config
Update the components
Update the Pipeline
Update the main.py
Update the app.py
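The first four steps above can be sketched as follows. This is a minimal illustration, not the repository's code: `DataIngestionConfig`, `ConfigurationManager`, the `data_ingestion` section, and every key and value in `example_config` are hypothetical names. A plain dict stands in for the parsed contents of config.yaml so the sketch runs without extra dependencies.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataIngestionConfig:
    """Entity: a typed, read-only view of one config section (hypothetical fields)."""
    root_dir: Path
    source_url: str
    local_data_file: Path
    unzip_dir: Path


class ConfigurationManager:
    """Turns the parsed config mapping into entity objects for the components."""

    def __init__(self, config):
        # `config` stands in for the dict obtained by parsing config.yaml.
        self.config = config

    def get_data_ingestion_config(self):
        section = self.config["data_ingestion"]
        return DataIngestionConfig(
            root_dir=Path(section["root_dir"]),
            source_url=section["source_url"],
            local_data_file=Path(section["local_data_file"]),
            unzip_dir=Path(section["unzip_dir"]),
        )


# Hypothetical values, shaped like a parsed config.yaml.
example_config = {
    "data_ingestion": {
        "root_dir": "artifacts/data_ingestion",
        "source_url": "https://example.com/data.zip",
        "local_data_file": "artifacts/data_ingestion/data.zip",
        "unzip_dir": "artifacts/data_ingestion",
    }
}

ingestion_config = ConfigurationManager(example_config).get_data_ingestion_config()
```

A component then receives `ingestion_config` instead of reading the YAML itself, which keeps paths and URLs in one place.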

Pipeline

Data Ingestion

The data ingestion phase downloads the dataset from Hugging Face and unzips it into a designated directory.
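A minimal sketch of this stage using only the standard library is shown below. The function names `download_file` and `extract_zip` are hypothetical, and the source URL and target paths are left to the caller because the README does not state them.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_file(url, destination):
    """Download `url` to `destination` unless the file is already there."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    if not destination.exists():
        urllib.request.urlretrieve(url, str(destination))
    return destination


def extract_zip(zip_path, unzip_dir):
    """Extract every member of the archive into `unzip_dir`."""
    unzip_dir = Path(unzip_dir)
    unzip_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(unzip_dir)
    return unzip_dir
```

Skipping the download when the archive already exists makes the stage safe to re-run.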

Data Validation

After ingestion, the dataset undergoes validation to ensure all required files are present and correctly formatted. This process checks for the presence of 'train', 'test', and 'validation' directories and logs the status.
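The directory check described above might look like the sketch below. The function name `validate_dataset_dir` and the log message wording are assumptions; only the three directory names come from the text.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

REQUIRED_SPLITS = ("train", "test", "validation")


def validate_dataset_dir(data_dir):
    """Check that every required split directory exists and log the result."""
    data_dir = Path(data_dir)
    missing = [name for name in REQUIRED_SPLITS if not (data_dir / name).is_dir()]
    is_valid = not missing
    if is_valid:
        logger.info("Validation passed: all splits found in %s", data_dir)
    else:
        logger.warning("Validation failed: missing %s in %s", missing, data_dir)
    return is_valid, missing
```

Returning the list of missing directories alongside the boolean lets a later stage report exactly what went wrong.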

Data Transformation

In the data transformation phase, the dataset is further processed to prepare it for model training. This includes tokenization using the Google Pegasus tokenizer (google/pegasus-cnn_dailymail).
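As a rough sketch, tokenizing a batch could look like the code below. The function names and the maximum lengths (1024 and 128) are assumptions, not values taken from the repository. `dialogue` and `summary` are the SAMSum column names, and the `text_target` argument requires a reasonably recent `transformers` release.

```python
def load_pegasus_tokenizer(name="google/pegasus-cnn_dailymail"):
    """Load the tokenizer named in the text (needs `transformers` and `sentencepiece`)."""
    from transformers import AutoTokenizer  # imported lazily: heavy optional dependency
    return AutoTokenizer.from_pretrained(name)


def convert_examples_to_features(batch, tokenizer,
                                 max_input_length=1024, max_target_length=128):
    """Tokenize a batch of dialogues and their reference summaries."""
    inputs = tokenizer(batch["dialogue"],
                       max_length=max_input_length, truncation=True)
    # `text_target` tokenizes the summaries so they can be used as decoder labels.
    targets = tokenizer(text_target=batch["summary"],
                        max_length=max_target_length, truncation=True)
    return {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
        "labels": targets["input_ids"],
    }
```

With the Hugging Face `datasets` library, a function of this shape is typically applied with `dataset.map(..., batched=True)`, passing the tokenizer via `fn_kwargs`.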

Model Training

The model training phase involves training the Google Pegasus model on the transformed dataset.

Model Evaluation

Finally, the trained model is evaluated on the same dataset used for training.
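The README does not say which metric the evaluation stage reports. ROUGE is a common choice for summarization, so as an assumption, here is a minimal ROUGE-1 F1 sketch. It is a simplified stand-in (lowercased word tokens, no stemming) for illustration only, and `rouge1_f1` is a hypothetical name.

```python
from collections import Counter
import re


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference summary and a generated one."""
    ref_counts = Counter(re.findall(r"\w+", reference.lower()))
    cand_counts = Counter(re.findall(r"\w+", candidate.lower()))
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

Scores computed on data the model was trained on tend to be optimistic, so a held-out split such as the test set usually gives a more reliable picture.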

Running the Project

To clone and run the project, follow these steps:

Clone the repository:

git clone https://github.com/Kaustbh/Text-Summarizer.git

Navigate to the project directory:

cd Text-Summarizer

Create and activate a Python virtual environment (optional but recommended):

python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate

Install the required packages:

pip install -r requirements.txt

Run the Flask app:

flask --app app run --debug

Contributing

Contributions to this project are welcome. If you encounter any issues or have suggestions for improvements, please submit a pull request or open an issue on the GitHub repository.

License

This project is licensed under the MIT License.

