Solution to the Google QUEST Q&A Labeling competition: https://www.kaggle.com/c/google-quest-challenge. 240th place with no post-processing; with post-processing of the model output, the score would have landed in the silver medal zone.
Computers are good at answering questions that have a single, verifiable answer, but humans are often still better at answering questions about opinions, recommendations, or personal experiences. The task is predictive modeling for natural language understanding: predicting subjective aspects of questions and their answers, e.g. whether an answer is helpful, interesting, or well-written. Submissions are evaluated by Spearman's rank correlation coefficient across the target columns.
- BERT
- Pooling and then concatenation of last few hidden states from BERT
- Multi-sample dropout
- GroupKFold split with 5 folds
- Augmentation with multiple truncations of the sequence (head, tail, mix)
- TTA (test-time augmentation)
- AdamW with linear warmup and linear annealing learning rate schedule
- BCE loss over the 30 soft target labels
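The pooled-hidden-state feature from the list above can be sketched as follows. The write-up does not specify the pooling or the layer count, so mean pooling over the last four hidden states (and the toy shapes) are assumptions:

```python
import torch

def pool_concat_hidden_states(hidden_states, num_layers=4):
    """Mean-pool each of the last `num_layers` hidden states over the token
    dimension and concatenate the results along the feature dimension."""
    pooled = [h.mean(dim=1) for h in hidden_states[-num_layers:]]
    return torch.cat(pooled, dim=-1)

# Hypothetical BERT-base-like shapes: 13 states (embeddings + 12 layers),
# batch of 2, sequence length 8, hidden size 16 (shrunk for illustration).
hidden_states = [torch.randn(2, 8, 16) for _ in range(13)]
features = pool_concat_hidden_states(hidden_states)  # shape (2, 4 * 16)
```

The concatenated vector then feeds the classification head instead of the usual single `[CLS]` embedding.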
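Multi-sample dropout applies several independent dropout masks to the same features and averages the resulting logits, which tends to stabilize training of the head. A minimal sketch; the sample count, dropout rate, and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class MultiSampleDropoutHead(nn.Module):
    """Average logits from several dropout masks over one shared linear layer."""

    def __init__(self, in_features, out_features, num_samples=5, p=0.5):
        super().__init__()
        self.dropouts = nn.ModuleList(nn.Dropout(p) for _ in range(num_samples))
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x):
        # One forward pass per dropout mask, averaged into a single prediction.
        return torch.stack([self.fc(d(x)) for d in self.dropouts], dim=0).mean(dim=0)

head = MultiSampleDropoutHead(in_features=64, out_features=30)
logits = head(torch.randn(2, 64))  # shape (2, 30)
```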
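The grouped 5-fold split keeps all rows that share a group (e.g. rows derived from the same question) in the same fold, so validation scores are not inflated by near-duplicates across folds. A sketch with scikit-learn's `GroupKFold` on toy data:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical toy data: 10 rows from 5 distinct questions (two rows each).
X = np.arange(20).reshape(10, 2)
y = np.zeros(10)
groups = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

# Each fold's validation set contains whole groups only.
folds = list(GroupKFold(n_splits=5).split(X, y, groups=groups))
```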
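The truncation augmentation can be sketched as below: when a tokenized sequence exceeds the maximum length, keep its head, its tail, or a mix of both ends. The function name and the 50/50 head/tail split in "mix" mode are assumptions; at test time, averaging predictions over the same variants is one plausible form of the TTA listed above.

```python
def truncate(tokens, max_len, mode="head"):
    """Shorten a token list to max_len by keeping its head, tail, or both ends."""
    if len(tokens) <= max_len:
        return tokens
    if mode == "head":
        return tokens[:max_len]
    if mode == "tail":
        return tokens[-max_len:]
    if mode == "mix":  # keep roughly half from each end
        head = max_len // 2
        return tokens[:head] + tokens[-(max_len - head):]
    raise ValueError(f"unknown mode: {mode}")

tokens = list(range(100))
variants = [truncate(tokens, 32, mode=m) for m in ("head", "tail", "mix")]
```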
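The learning-rate schedule, linear warmup to the peak rate followed by linear annealing to zero, can be sketched with a `LambdaLR`; the peak rate and step counts here are placeholders, not the values used in training:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def linear_warmup_linear_decay(warmup_steps, total_steps):
    """Return an LR multiplier: ramp 0 -> 1 over warmup_steps, then decay 1 -> 0."""
    def multiplier(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return multiplier

model = torch.nn.Linear(4, 1)  # stand-in for the BERT model
optimizer = AdamW(model.parameters(), lr=3e-5)
scheduler = LambdaLR(optimizer, linear_warmup_linear_decay(warmup_steps=10, total_steps=100))
```

`scheduler.step()` is called once per optimizer step, so the rate peaks at step 10 and reaches zero at step 100.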
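`BCEWithLogitsLoss` accepts fractional targets, so the 30 soft labels in [0, 1] can be fit directly without binarizing them. A minimal illustration with random tensors standing in for model outputs and labels:

```python
import torch

logits = torch.randn(4, 30)  # raw model outputs for the 30 targets
targets = torch.rand(4, 30)  # soft labels in [0, 1]
loss = torch.nn.BCEWithLogitsLoss()(logits, targets)  # scalar loss
```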
- NVIDIA Tesla P100
- NVIDIA Apex
- PyTorch 1.2.0
- transformers 2.3.0