# Word Embeddings
Giuseppe Attardi edited this page Aug 19, 2015
Word embeddings are dense vector representations that capture syntactic and semantic aspects of a word, and they can be learned from unannotated plain text. For example, you can use the plain text extracted from a Wikipedia dump with WikiExtractor.
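Once trained, embeddings place related words close together in vector space, which is what makes them useful as features for downstream NLP tasks. The following sketch illustrates the idea with made-up vectors stored in the common one-word-per-line text format; the toy words, values, and the exact file format are assumptions for illustration, not the actual output of DeepNL's scripts.

```python
import numpy as np

# Toy embeddings in a common text format: a word followed by its
# vector components. The values below are invented for illustration;
# the file produced by dl-words.py may use a different layout.
raw = """king 0.5 0.7 0.1
queen 0.45 0.72 0.15
apple 0.9 0.05 0.6"""

# Parse each line into a word -> vector mapping.
embeddings = {}
for line in raw.splitlines():
    parts = line.split()
    embeddings[parts[0]] = np.array([float(x) for x in parts[1:]])

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, lower otherwise."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically related words end up close in the vector space.
print(cosine(embeddings["king"], embeddings["queen"]))  # high
print(cosine(embeddings["king"], embeddings["apple"]))  # lower
```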
DeepNL provides three methods for creating word embeddings:
- Collobert & Weston [1], through the script `dl-words.py`
- Hellinger PCA [2], through the script `dl-words-pca.py`
- Sentiment-specific word embeddings [3], through the script `dl-sentiwords.py`
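Of the three, Hellinger PCA [2] is the simplest to sketch: build a word-by-context co-occurrence matrix, normalize each row into a probability distribution, take the square root of the probabilities (so Euclidean distance matches scaled Hellinger distance), and reduce dimensionality with PCA. The sketch below uses a tiny made-up count matrix and SVD-based PCA; it illustrates the technique from the paper, not the actual implementation in `dl-words-pca.py`.

```python
import numpy as np

# Toy word-by-context co-occurrence counts (rows: target words).
# Real counts would be collected from a corpus; these are invented.
counts = np.array([
    [10., 2., 0., 1.],
    [ 9., 3., 1., 0.],
    [ 0., 1., 8., 7.],
])

# Row-normalize counts into co-occurrence probability distributions.
P = counts / counts.sum(axis=1, keepdims=True)

# Hellinger transform: the Euclidean distance between sqrt-probability
# rows equals (up to a constant factor) the Hellinger distance between
# the original distributions.
H = np.sqrt(P)

# PCA via SVD on the centered matrix; keep the top-k components
# as the word embeddings.
H_centered = H - H.mean(axis=0)
U, S, Vt = np.linalg.svd(H_centered, full_matrices=False)

k = 2  # embedding dimensionality
embeddings = U[:, :k] * S[:k]
print(embeddings.shape)  # (3, 2)
```

Words 0 and 1 have similar co-occurrence distributions, so their embeddings end up closer to each other than to word 2.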
References:
1. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12, 2493-2537.
2. Lebret, R., & Collobert, R. (2014). Word Embeddings through Hellinger PCA. Proceedings of EACL 2014, 482.
3. Attardi, G. (2015). Representation of Word Sentiment, Idioms and Senses. Proceedings of the 6th Italian Information Retrieval Workshop (IIR 2015), Cagliari.