project for nis8021. RoBERTa on Quora Question Pairs Dataset.

Bowen-n/quora-question-pairs

Quora-Question-Pairs Project

This is the final project for nis8021: sentence similarity prediction on the Quora Question Pairs (QQP) dataset. The dataset is split into train/valid/test sets under ./data.

Result

| Model | Test Accuracy | Test F1 Score |
| --- | --- | --- |
| RoBERTa_base + CE loss | 0.910633 | 0.904185 |
| RoBERTa_pretrained + CE loss | 0.913156 | 0.907138 |
| RoBERTa_pretrained + Focal loss | 0.913453 | 0.907755 |
| Stacking | 0.917955 | 0.912335 |
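For reference, the two reported metrics can be computed from 0/1 labels and predictions roughly as follows. This is a minimal sketch, not the project's evaluation code: the function name `accuracy_and_f1` is hypothetical, and it assumes F1 is the binary F1 of the positive (label 1) class, whereas the project may use a different averaging.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, binary F1 for label 1) for two equal-length 0/1 lists.

    Assumption: F1 is computed on the positive class only; the project's own
    evaluation may average differently.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Toy example with made-up labels, not project data.
acc, f1 = accuracy_and_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```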

Focal loss

Focal loss has two parameters, $\alpha$ and $\gamma$. $\alpha$ is set from the distribution of 0/1 labels in the data, and $\gamma$ is estimated from the distribution of easy versus hard samples. Both are computed by the following script:

python cal_focal_params.py
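As an illustration of how $\alpha$ and $\gamma$ enter the loss, here is a minimal sketch of the standard binary focal loss, $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$. It is not the code in cal_focal_params.py or the training scripts. The helper `alpha_from_labels` is hypothetical and uses one common heuristic (weight the positive class by the fraction of negatives), which may differ from what the project does. The default values of `alpha` and `gamma` are placeholders, not the project's settings.

```python
import math


def alpha_from_labels(labels):
    """Hypothetical heuristic: alpha for class 1 = fraction of class-0 samples."""
    positives = sum(labels)
    return 1.0 - positives / len(labels)


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for a single example.

    p: predicted probability of class 1; y: true label (0 or 1).
    alpha weights class 1 and (1 - alpha) weights class 0.
    gamma down-weights examples that are already classified well.
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)           # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# With gamma = 0 and alpha = 0.5 the loss reduces to half the cross-entropy.
half_ce = focal_loss(0.5, 1, alpha=0.5, gamma=0.0)
# An easy example (p = 0.9) contributes less than a hard one (p = 0.6).
easy, hard = focal_loss(0.9, 1), focal_loss(0.6, 1)
```

With $\gamma = 0$ the modulating factor disappears and only the class weighting remains, which is why focal loss is often described as a generalization of weighted cross-entropy.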

Run

  1. Pretrain on QQP
bash run_pretrain.sh
  2. Finetune RoBERTa_base with CE loss
bash run_finetune_roberta-base.sh
  3. Finetune RoBERTa_pretrained with CE loss
bash run_finetune_ce.sh
  4. Finetune RoBERTa_pretrained with focal loss
bash run_finetune_focal.sh
  5. Run inference to construct the dataset for stacking
bash inference.sh
  6. Train the stacking model
python stacking.py
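The last two steps follow the usual stacking pattern: the base models' predictions become input features for a second-level model. The sketch below illustrates that idea with a small logistic-regression meta-learner in plain Python. It is an assumption about the general technique, not the contents of stacking.py: the function names, the choice of logistic regression, and the toy probabilities are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_stacker(features, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner with batch gradient descent.

    features: one row per example, one column per base model
              (each value is that model's predicted probability of label 1).
    labels:   true 0/1 labels.
    """
    n, n_feat = len(features), len(features[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 if the meta-learner's probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Made-up probabilities from three hypothetical base models, not project data.
base_probs = [[0.9, 0.8, 0.85], [0.2, 0.3, 0.1], [0.7, 0.6, 0.9],
              [0.1, 0.2, 0.3], [0.6, 0.9, 0.7], [0.4, 0.1, 0.2]]
labels = [1, 0, 1, 0, 1, 0]
w, b = train_stacker(base_probs, labels)
train_preds = [predict(w, b, x) for x in base_probs]
```

In practice the meta-learner should be trained on predictions for data the base models did not see during fine-tuning (for example the validation split), otherwise it learns from overconfident outputs.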

Finetune Curve
