Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
Updated May 9, 2024 - Python
Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
🤖 A PyTorch library of curated Transformer models and their composable components
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla", accepted in Findings of the Annual Conference of the North American Chap…
CINO: Pre-trained Language Models for Chinese Minority Languages
Unattended Lightweight Text Classifiers with LLM Embeddings
PyTorch implementation of sentiment analysis for long texts written in Serbian (a low-resource language), using a pretrained multilingual RoBERTa-based model (XLM-R) on a small dataset.
An implementation of drophead regularization for pytorch transformers
Deep-learning system proposed by HFL for SemEval-2022 Task 8: Multilingual News Similarity
Resources and tools for the Tutorial - "Hate speech detection, mitigation and beyond" presented at ICWSM 2021
This is a PyTorch (+ Hugging Face transformers) implementation of a "simple" text classifier built on BERT-based models. In this lab we will see how simple it is to use BERT for a sentence classification task, obtaining state-of-the-art results in a few lines of Python code.
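As that blurb suggests, inference with a BERT-style sentence classifier takes only a few lines via the Hugging Face `transformers` pipeline API. This is a minimal sketch, not the repository's actual code; the checkpoint name is an illustrative, publicly available model chosen here for the example.

```python
# Minimal sketch: sentence classification with a BERT-family model via the
# Hugging Face `transformers` pipeline API. The checkpoint below is an
# illustrative public model, not necessarily the one the repository uses.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("BERT makes sentence classification straightforward.")
print(result)  # a list of dicts with 'label' and 'score' keys
```

Swapping in an XLM-R checkpoint fine-tuned for classification works the same way, since the pipeline abstracts over tokenizer and model architecture.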
Improving Low-Resource Neural Machine Translation of Related Languages by Transfer Learning
A transformer-based language detection model that recognizes the language in which a given text is written.
Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.
Notebooks to fine-tune `bert-small-amharic`, `bert-mini-amharic`, and `xlm-roberta-base` models on an Amharic text classification dataset using the transformers library
Improving Bilingual Lexicon Induction with Cross-Encoder Reranking (Findings of EMNLP 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
1st place solution to AI IJC Customer Service task
NLP Workshop - ML India
Our source code for the EACL 2021 workshop task Offensive Language Identification in Dravidian Languages. We ranked 4th, 4th, and 3rd in the Tamil, Malayalam, and Kannada tracks of this task! 🥳