DEPRECATED - no longer actively maintained

Elasticsearch Updater

Scheduled service to load data from a URL into Elasticsearch

The elasticsearch-updater is a dockerised application that will update an Elasticsearch instance on a regular basis using JSON data from a URL.

The file specified in the JSON_FILE_URL environment variable is used as the source of the update if it is available, is valid JSON, and its total record count has not dropped by more than the amount allowed by CHANGE_THRESHOLD, described below.

The destination instance is specified in the ES_HOST and ES_PORT variables.

As bespoke code is required to create mappings and perform transforms, Elasticsearch configurations must be provided within the elasticsearch-updater repository. ES configurations are stored on the config/esConfig object, and are selected with the ES_INDEX environment variable. Mappings and transforms for each available index are held in a folder with the same name as the configuration, e.g. profiles/mapping.json.

An ES configuration must provide a type, an idKey and a mapping definition, and may also provide a transform function. The type is the index type used in the mapping, and the idKey identifies the unique id in the data. For the profiles configuration these are gps and choicesId respectively.
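As a rough illustration of the shape described above, a configuration entry might look like the sketch below. Only `type`, `idKey`, the mapping and the optional transform come from this README; the `getConfig` helper, the fields inside the mapping and the body of the transform are hypothetical and chosen purely for the example.

```javascript
// Hypothetical shape of one entry on the config/esConfig object.
// The mapping fields and the transform body are illustrative assumptions.
const esConfig = {
  profiles: {
    type: 'gps',
    idKey: 'choicesId',
    // In the repository the mapping is held in profiles/mapping.json.
    mapping: {
      mappings: {
        gps: {
          properties: {
            choicesId: { type: 'integer' },
            name: { type: 'text' },
          },
        },
      },
    },
    // Optional: applied to each record before it is indexed.
    transform: (record) =>
      Object.assign({}, record, { name: String(record.name).trim() }),
  },
};

// Select a configuration by name, in the way ES_INDEX selects one.
function getConfig(indexName) {
  const config = esConfig[indexName];
  if (!config) {
    throw new Error(`No ES configuration found for index '${indexName}'`);
  }
  return config;
}
```

Failing fast on an unknown name keeps a mistyped ES_INDEX from silently producing an empty update.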

The file download and Elasticsearch update will run on startup, then on a daily schedule while the container continues to run.

The time of day defaults to 7am, and can be changed via the UPDATE_SCHEDULE environment variable. The schedule is run using node-schedule, which uses a Cron-like syntax. Further details are available in the node-schedule documentation.

Note: the container time is GMT and does not take account of daylight saving. You may need to subtract an hour from the time if it is currently BST.
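The run-on-startup-then-on-schedule behaviour can be sketched as follows. This is not the repository's code: `start` and `runUpdate` are hypothetical names, and the scheduler is passed in as a parameter so the sketch runs without node-schedule installed.

```javascript
// Sketch, assuming the scheduler is supplied by the caller.
// `start` and `runUpdate` are illustrative names, not taken from the repository.
function start(runUpdate, scheduleJob, env = process.env) {
  // Cron-like rule: minute hour day-of-month month day-of-week.
  const rule = env.UPDATE_SCHEDULE || '0 7 * * *';
  runUpdate(); // run once on startup
  return scheduleJob(rule, runUpdate); // then on the daily schedule
}

// With node-schedule installed, usage might look like:
//   const schedule = require('node-schedule');
//   start(runUpdate, (rule, job) => schedule.scheduleJob(rule, job));
```

To run at 7am local time during BST, the rule would be set to `0 6 * * *`, because the container clock stays on GMT.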

When updating the Elasticsearch instance, the new data is inserted into a date-stamped index and validated against the existing index. Once validation passes, the existing index is deleted and an alias is set up to the new index, e.g. profiles_20170629140702 will be aliased to profiles upon successful validation.
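The date-stamped name in the example above appears to follow a YYYYMMDDHHmmss pattern. A minimal sketch of generating such a name is shown below; `timestampedIndexName` is a hypothetical helper, and the use of UTC is an assumption based on the container running on GMT.

```javascript
// Sketch: build a date-stamped index name such as profiles_20170629140702.
// Assumes UTC; the function name is illustrative, not from the repository.
function timestampedIndexName(alias, date = new Date()) {
  const pad = (n) => String(n).padStart(2, '0');
  const stamp = [
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1), // months are zero-based
    pad(date.getUTCDate()),
    pad(date.getUTCHours()),
    pad(date.getUTCMinutes()),
    pad(date.getUTCSeconds()),
  ].join('');
  return `${alias}_${stamp}`;
}
```

Because the alias name stays constant, clients can keep querying profiles while the underlying index is replaced.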

Validation will fail if the count of records drops significantly. The allowable drop in record count is controlled by the CHANGE_THRESHOLD environment variable. By default this is set to 0.99 which prevents the data being loaded if the new count is less than 99% of the previous count.
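The threshold check amounts to a single comparison. The sketch below is one possible reading of it; `isCountAcceptable` is a hypothetical name, and treating an empty previous index as always acceptable is an assumption rather than documented behaviour.

```javascript
// Sketch of the record-count validation, assuming the rule
// "new count must be at least threshold * previous count".
function isCountAcceptable(previousCount, newCount, threshold = 0.99) {
  if (previousCount === 0) {
    return true; // assumption: nothing to compare against on a first load
  }
  return newCount >= previousCount * threshold;
}
```

With the default threshold, a previous count of 1000 records accepts a new count of 995 and rejects a new count of 980.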

Environment variables

Environment variables are expected to be managed by the environment in which the application is being run. This is best practice as described by twelve-factor.

| Variable | Description | Default | Required |
| --- | --- | --- | --- |
| `NODE_ENV` | Node environment | development | |
| `LOG_LEVEL` | Log level | Depends on `NODE_ENV` | |
| `JSON_FILE_URL` | Publicly available URL of JSON data | | yes |
| `ES_HOST` | Host name of Elasticsearch server | | yes |
| `ES_INDEX` | Elasticsearch configuration to read | | yes |
| `ES_PORT` | Port of Elasticsearch server | 27017 | |
| `ES_REPLICAS` | Number of replicas configured for the index | 1 | |
| `ES_TIMEOUT_SECONDS` | Maximum time in seconds to wait for response from Elasticsearch | 180 | |
| `ES_SHARDS` | Number of shards for the index | 5 | |
| `CHANGE_THRESHOLD` | Factor the data count can change by before erroring | 0.99 | |
| `UPDATE_SCHEDULE` | Time of day to run the update | `0 7 * * *` (7 am) | |
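A minimal sketch of reading these variables with the defaults listed above might look like this. `loadConfig` and the property names on the returned object are hypothetical; `LOG_LEVEL` is left out because its default depends on `NODE_ENV` in a way this README does not spell out.

```javascript
// Sketch only: the function and the returned property names are illustrative.
function loadConfig(env = process.env) {
  const required = ['JSON_FILE_URL', 'ES_HOST', 'ES_INDEX'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return {
    nodeEnv: env.NODE_ENV || 'development',
    jsonFileUrl: env.JSON_FILE_URL,
    esHost: env.ES_HOST,
    esIndex: env.ES_INDEX,
    // Environment values are strings, so numeric settings are converted.
    esPort: Number(env.ES_PORT || 27017),
    esReplicas: Number(env.ES_REPLICAS || 1),
    esTimeoutSeconds: Number(env.ES_TIMEOUT_SECONDS || 180),
    esShards: Number(env.ES_SHARDS || 5),
    changeThreshold: Number(env.CHANGE_THRESHOLD || 0.99),
    updateSchedule: env.UPDATE_SCHEDULE || '0 7 * * *',
  };
}
```

Checking the required variables in one place means a misconfigured container fails at startup rather than at the first scheduled update.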

Docker Compose Structure for Deployment and Development

The docker-compose.yml files used for development and for deployment via Rancher have a similar structure. A stack is run with three elasticsearch-updater containers, each with a different configuration, all pointing at the same Elasticsearch instance.

The convention for environment variables used in the Rancher configuration is to add a suffix to each of the variables in the table above. These are then mapped to the appropriate suffix-less variable in the container, e.g. for the pharmacies container JSON_FILE_URL_PHARMACIES is mapped to JSON_FILE_URL, ES_HOST_PHARMACIES is mapped to ES_HOST, and so on.
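In the stack itself this mapping is done in the compose configuration, but the convention can be illustrated in a few lines of JavaScript. `mapSuffixedVariables` is a hypothetical helper written for this example and is not part of the repository.

```javascript
// Illustration of the suffix convention: NAME_SUFFIX on the host
// becomes NAME inside the container. Not part of the repository.
const VARIABLE_NAMES = [
  'JSON_FILE_URL', 'ES_HOST', 'ES_INDEX', 'ES_PORT', 'ES_REPLICAS',
  'ES_TIMEOUT_SECONDS', 'ES_SHARDS', 'CHANGE_THRESHOLD', 'UPDATE_SCHEDULE',
];

function mapSuffixedVariables(env, suffix) {
  const mapped = {};
  VARIABLE_NAMES.forEach((name) => {
    const value = env[`${name}_${suffix}`];
    if (value !== undefined) {
      mapped[name] = value; // only copy variables that are actually set
    }
  });
  return mapped;
}
```

For example, calling it with the suffix `PHARMACIES` would pick up `JSON_FILE_URL_PHARMACIES` and `ES_HOST_PHARMACIES` and expose them as `JSON_FILE_URL` and `ES_HOST`.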

Architecture Decision Records

This repo uses Architecture Decision Records to record architectural decisions for this project. They are stored in doc/adr.
