fearless-pioneer/simple-spark-flow

Python Template

License: Apache 2.0 | Python 3.10 | pdm-managed | Code style: black | Imports: isort | Type Checking: mypy | Linting: ruff

Preparation

Install Python 3.10 with pyenv or Anaconda, then run the following command:

$ make init             # set up packages (only needed once)

Data Setup

First, create a data folder in the root of this repository.

Next, open the eCommerce Events History in Cosmetics Shop dataset page and download the entire dataset. The download is a single file named archive.zip.

Lastly, unzip archive.zip and move the five CSV files from the extracted archive folder into the ./data directory.
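The manual steps above can also be scripted. The following is a minimal sketch, not part of this repository: the helper name `extract_csvs` is hypothetical, and it assumes archive.zip has been downloaded into the repository root. It creates the data folder if needed and copies every CSV found in the archive directly into it, ignoring any folder structure inside the zip.

```python
import shutil
import zipfile
from pathlib import Path


def extract_csvs(archive_path: Path, data_dir: Path) -> list[Path]:
    """Copy every CSV inside the zip archive into data_dir, flattening folders.

    Hypothetical helper for illustration; not provided by this repository.
    """
    # Create ./data (or whichever target is given) if it does not exist yet.
    data_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as zf:
        for member in zf.infolist():
            # Skip directory entries and anything that is not a CSV file.
            if member.is_dir() or not member.filename.lower().endswith(".csv"):
                continue
            # Keep only the file name so the CSV lands directly in data_dir.
            target = data_dir / Path(member.filename).name
            with zf.open(member) as src, target.open("wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
    return sorted(extracted)


if __name__ == "__main__":
    # Assumed location: archive.zip in the current (repository root) directory.
    archive = Path("archive.zip")
    if archive.exists():
        for path in extract_csvs(archive, Path("data")):
            print(path)
    else:
        print("archive.zip not found; download the dataset first")
```

Run from the repository root, this should leave the CSV files in ./data, the same result as the manual steps. The original archive.zip is left untouched.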

Spark Setup

TBD