
Word embeddings are dense vector representations of words that capture syntactic and semantic properties and can be learned from unannotated plain text.

For example, you can use the plain text extracted from a Wikipedia dump with WikiExtractor, as sketched below.
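A minimal pipeline might look like the following (the dump URL points at the latest English Wikipedia dump; file and directory names are illustrative, and you should check WikiExtractor's documentation for its current options):

```sh
# Fetch an English Wikipedia dump.
wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

# WikiExtractor writes plain-text articles into subdirectories of the -o directory.
python WikiExtractor.py -o extracted enwiki-latest-pages-articles.xml.bz2

# Concatenate the extracted files into a single training corpus.
cat extracted/*/* > wiki-text.txt
```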

DeepNL provides three methods for creating word embeddings (a sample invocation is sketched after this list):

  1. Collobert & Weston [1], through the script dl-words.py
  2. Hellinger PCA [2], through the script dl-words-pca.py
  3. Sentiment-Specific Word Embeddings [3], through the script dl-sentiwords.py
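As a minimal sketch, assuming dl-words.py follows the same command-line conventions as the other DeepNL scripts, training Collobert & Weston embeddings on the corpus produced above might look like this. The --vocab/--vectors option names and the positional corpus argument are assumptions, not confirmed from the script itself; run the script with --help for its actual interface.

```sh
# Hypothetical invocation: option names are assumptions; check dl-words.py --help.
# vocab.txt: one word per line; vectors.txt: one vector per line (assumed output format).
dl-words.py --vocab vocab.txt --vectors vectors.txt wiki-text.txt
```

The resulting vocabulary/vectors pair can then be passed to the other DeepNL tools that accept pre-trained embeddings.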

Usage

See the corresponding wiki pages for each of the three scripts.

References

  1. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural Language Processing (Almost) from Scratch. The Journal of Machine Learning Research, 12, 2493-2537.
  2. Lebret, R., & Collobert, R. (2014). Word Embeddings through Hellinger PCA. Proceedings of EACL 2014, 482.
  3. Attardi, G. (2015). Representation of Word Sentiment, Idioms and Senses. Proceedings of the 6th Italian Information Retrieval Workshop (IIR 2015), Cagliari.