
Word embeddings are dense vector representations of words that capture syntactic and semantic properties and can be learned from unannotated plain text.

For example, you can use the plain text extracted from a Wikipedia dump with WikiExtractor, as sketched below.
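A minimal pipeline might look like the following (the dump URL points at the latest English Wikipedia dump; file and directory names are illustrative, and you should check WikiExtractor's documentation for its current options):

```sh
# Fetch an English Wikipedia dump.
wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

# WikiExtractor writes plain-text articles into subdirectories of the -o directory.
python WikiExtractor.py -o extracted enwiki-latest-pages-articles.xml.bz2

# Concatenate the extracted files into a single training corpus.
cat extracted/*/* > wiki-text.txt
```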

DeepNL provides three methods for creating word embeddings (a sample invocation is sketched after this list):

  1. Collobert & Weston [1], through the script dl-words.py
  2. Hellinger PCA [2], through the script dl-words-pca.py
  3. Sentiment-Specific Word Embeddings [3], through the script dl-sentiwords.py
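As a minimal sketch, assuming dl-words.py follows the same command-line conventions as the other DeepNL scripts, training Collobert & Weston embeddings on the corpus produced above might look like this. The --vocab/--vectors option names and the positional corpus argument are assumptions, not confirmed from the script itself; run the script with --help for its actual interface.

```sh
# Hypothetical invocation: option names are assumptions; check dl-words.py --help.
# vocab.txt: one word per line; vectors.txt: one vector per line (assumed output format).
dl-words.py --vocab vocab.txt --vectors vectors.txt wiki-text.txt
```

The resulting vocabulary/vectors pair can then be passed to the other DeepNL tools that accept pre-trained embeddings.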

Usage

See the corresponding wiki pages for each of the three scripts.

References

  1. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural Language Processing (Almost) from Scratch. The Journal of Machine Learning Research, 12, 2493-2537.
  2. Lebret, R., & Collobert, R. (2014). Word Embeddings through Hellinger PCA. Proceedings of EACL 2014, 482.
  3. Attardi, G. (2015). Representation of Word Sentiment, Idioms and Senses. Proceedings of the 6th Italian Information Retrieval Workshop (IIR 2015), Cagliari.