
zavierferodova/py-cspdf


Python-CSPDF

Python-CSPDF checks the similarity of PDF files in the current working directory and stores the results in a CSV file. The project is inspired by diff-pdf.

Installation

pip install -r requirements.py

Before Use !!

  1. Install all required dependencies.
  2. Copy cspdf.py into the directory that contains the PDF files to be compared.
  3. Run the cspdf.py script.
  4. Note: This script works on PDF files only. If you have Word documents, convert them to PDF first.

Usage

  1. Check the similarity of all PDF files in the current working directory
    python cspdf.py -a -o comparison.csv
  2. Compare one PDF file against all PDF files in the current working directory
    python cspdf.py -t a.pdf -o comparison.csv
  3. Include image comparison in the similarity check (slower processing)
    # Just add the -i or --image argument
    python cspdf.py -i -t a.pdf -o comparison.csv
  4. Show help
    python cspdf.py -h
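
The two comparison modes above (all files with -a, one target file with -t) come down to choosing which pairs of files to score and then writing one row per pair to the CSV file. The following is a minimal sketch of that flow, not the actual cspdf.py code: the function names, the CSV column names, and the placeholder scoring function are all assumptions made for illustration.

```python
import csv
import io
from itertools import combinations


def pairs_all(names):
    """Every unordered pair of files, roughly what the -a mode needs."""
    return list(combinations(sorted(names), 2))


def pairs_target(target, names):
    """The target file paired with every other file, as in the -t mode."""
    return [(target, other) for other in sorted(names) if other != target]


def write_report(pairs, score, out):
    """Write one CSV row per pair. The column names are hypothetical."""
    writer = csv.writer(out)
    writer.writerow(["file_a", "file_b", "similarity"])
    for a, b in pairs:
        writer.writerow([a, b, f"{score(a, b):.4f}"])


if __name__ == "__main__":
    files = ["a.pdf", "b.pdf", "c.pdf"]
    buffer = io.StringIO()
    # Placeholder score; a real run would compare the extracted PDF contents.
    write_report(pairs_target("a.pdf", files), lambda a, b: 0.0, buffer)
    print(buffer.getvalue())
```

Passing the scoring function in as an argument keeps the pairing and reporting logic independent of how similarity is actually measured.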

Similarity Check Methods

  1. Text similarity with Sequence Matcher
  2. Image similarity with Structural Similarity Index (SSIM)
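
To illustrate the text method, here is a minimal sketch that assumes "Sequence Matcher" refers to difflib.SequenceMatcher from the Python standard library. The text_similarity helper is a hypothetical name and is not taken from cspdf.py. In practice the two strings would be the text extracted from each PDF.

```python
from difflib import SequenceMatcher


def text_similarity(text_a: str, text_b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    # autojunk=False turns off difflib's heuristic that treats very frequent
    # elements as junk, which can distort the ratio on long texts.
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    print(text_similarity("abcd", "bcde"))  # 0.75
    print(text_similarity("same text", "same text"))  # 1.0
```

The ratio is 2 * M / T, where M is the number of matching characters and T is the combined length of both strings, so identical texts score 1.0 and texts with nothing in common score 0.0.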

Libraries

  1. PDFMiner
  2. PyMuPDF
  3. OpenCV Python
  4. Scikit Image
  5. TQDM Progress Bar

Credits

Made by Zavier, enjoyy ✨
