This project uses Python, NLTK, and Elasticsearch to perform query expansion over a dataset crawled from Snopes fact checks. The crawler's design and implementation are accessible from this repository.
Snopes fact checks cover rumors and questionable claims of the day. After gathering the data via the crawler, we index it into a search engine so it can be used for information retrieval. We use Elasticsearch, which has a large community and builds on the power of the Apache Lucene indexing and search library.
We use query expansion, a technique for improving the quality of search results, and rely on the WordNet database to find semantic relations between words. To simplify working with WordNet, as well as tokenizing the queries and other preprocessing, we use the NLTK Python module.
The general idea behind query expansion here is that, for every token in the query, the synonyms are combined with OR, and the resulting per-token clauses are combined with AND.
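The OR-of-synonyms / AND-of-tokens idea can be sketched as an Elasticsearch bool query built from plain dictionaries (an illustrative sketch: the function name, synonym map, and the `claim` field are assumptions, not the repository's actual code):

```python
# Sketch: build an Elasticsearch bool query that ORs each token with its
# synonyms and ANDs the per-token clauses together. The `field` name is an
# assumption about the index schema.

def expand_query(tokens, synonyms, field="claim"):
    must_clauses = []
    for token in tokens:
        variants = [token] + synonyms.get(token, [])
        # OR over the token and its synonyms: bool/should with
        # minimum_should_match=1 means "at least one variant must match".
        must_clauses.append({
            "bool": {
                "should": [{"match": {field: v}} for v in variants],
                "minimum_should_match": 1,
            }
        })
    # AND across the query tokens: bool/must requires every clause to match.
    return {"query": {"bool": {"must": must_clauses}}}


query = expand_query(["fake", "news"], {"fake": ["bogus", "phony"]})
```

The resulting dictionary can be passed directly as the request body of an Elasticsearch search call.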
- Python: 3.7.0
- Elasticsearch: 7.16.0
- NLTK: 3.6.7
Clone the repository:
git clone https://github.com/mohsenMahmoodzadeh/query-expansion-with-elasticsearch.git
Create a virtual environment (to avoid conflicts):
virtualenv -p python3.7 fcquery
# this may vary depending on your shell
. fcquery/bin/activate
Install the dependencies:
pip install -r requirements.txt
The dataset is accessible from here. Put it in the root directory of the project.
First, download the Elasticsearch distribution from here and run it according to the installation guide on the website.
Once the Elasticsearch service is up, run the following command to index the data into the engine:
python create_index.py
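Conceptually, the indexing step turns crawled fact-check records into bulk actions for Elasticsearch. A hedged sketch (the index name `snopes` and the record fields are assumptions about the dataset, not the script's exact code):

```python
# Sketch: convert crawled records into the action dicts consumed by
# elasticsearch.helpers.bulk(). Index and field names are illustrative.

def bulk_actions(records, index="snopes"):
    for i, rec in enumerate(records):
        yield {
            "_index": index,
            "_id": i,  # stable id so re-runs overwrite instead of duplicating
            "_source": {"claim": rec["claim"], "rating": rec["rating"]},
        }


records = [{"claim": "The moon is made of cheese", "rating": "False"}]
actions = list(bulk_actions(records))
# With a running cluster, something like
#   helpers.bulk(Elasticsearch("http://localhost:9200"), actions)
# would submit them in one round trip.
```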
After the indexing phase completes, run the script below to query the data, expand the queries, and save the results into the result/ directory.
python search_index.py
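The result-saving step might look like the following sketch (the helper name and output file name are assumptions; the script's actual layout may differ):

```python
# Sketch: persist search hits as JSON under the result/ directory,
# mirroring what search_index.py is described to do.
import json
import os


def save_results(query, hits, out_dir="result"):
    os.makedirs(out_dir, exist_ok=True)  # create result/ if it doesn't exist
    path = os.path.join(out_dir, "results.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"query": query, "hits": hits}, f, indent=2)
    return path
```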
- Preprocess the data so it is ready for analysis. This can include tasks such as encoding, lowercasing, converting to numeric types, etc.
- Analyze the data by applying DSL queries and creating dashboards.
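A first cut at the planned preprocessing step might look like this (purely illustrative; the record fields are assumptions):

```python
# Sketch of the planned preprocessing: normalize string fields to lowercase
# and coerce numeric-looking strings to integers.

def preprocess(record):
    out = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip().lower()
            if value.isdigit():
                value = int(value)  # e.g. "2021" -> 2021
        out[key] = value
    return out
```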
Fixes and improvements are more than welcome, so raise an issue or send a PR!