Data Validation #54

Rob1Ham · 2020-02-04T16:27:07Z

Currently, the /train route pushes all validly solved Sudoku puzzles that came from uploaded images into the S3 folder pre_validated_data. sagemaker_train is the S3 folder that the SageMaker endpoint pulls from to train the model.

The reason for this split is that data is currently submitted as-is, with no check that the digits the model predicts for the newly uploaded images are accurate.

Ideally, all of the newly generated values stored in the pre_validated_data folder would be grouped by predicted digit class and then validated by humans, so that only accurate predictions are added to the SageMaker training data.
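A minimal sketch of the grouping step, using only the standard library. It assumes a hypothetical CSV layout with an `image_key` column (the S3 key of the cropped digit image) and a `predicted_digit` column; the actual columns written by /train may differ. The function name and sample data are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout (assumption): one row per uploaded digit image,
# holding the S3 key of the cell image and the model's predicted digit.
SAMPLE_CSV = """image_key,predicted_digit
pre_validated_data/cell_001.png,7
pre_validated_data/cell_002.png,3
pre_validated_data/cell_003.png,7
pre_validated_data/cell_004.png,1
"""


def group_by_predicted_digit(csv_text: str) -> dict[int, list[str]]:
    """Return {predicted_digit: [image_key, ...]} with digits in ascending order."""
    groups: dict[int, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[int(row["predicted_digit"])].append(row["image_key"])
    # Sort by digit so a reviewer sees all 1s together, then all 3s, and so on.
    return {digit: groups[digit] for digit in sorted(groups)}


if __name__ == "__main__":
    for digit, keys in group_by_predicted_digit(SAMPLE_CSV).items():
        print(digit, keys)
```

Grouping by predicted class means a reviewer scanning a page of supposed 7s can spot the odd one out quickly, which is much faster than checking each image in isolation.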

This could be done either in coordination with a future Labs front-end team, by building an admin panel for digit validation, or with a simple Flask HTML page that loads the .csv of new predictions sorted by predicted digit and lets a reviewer update the predicted class of any misclassified digit. This would allow the model to be retrained on an ongoing basis as users upload more Sudoku puzzles, with a SageMaker training job automated to run either on a schedule or once a threshold of newly shared puzzles is reached.
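The review-and-retrain logic could be sketched as below, independent of whether the UI is a Labs admin panel or a Flask page. Everything here is hypothetical: the function names, the label format (`{image_key: digit}`), and the default threshold of 100 puzzles or 7 days are placeholders, not values from this project. Actually launching the SageMaker job and copying files into sagemaker_train are left out.

```python
from datetime import datetime, timedelta


def apply_corrections(predictions: dict[str, int],
                      corrections: dict[str, int]) -> dict[str, int]:
    """Return validated labels: reviewer corrections override model predictions."""
    validated = dict(predictions)
    for image_key, true_digit in corrections.items():
        if image_key not in validated:
            raise KeyError(f"unknown image: {image_key}")
        validated[image_key] = true_digit
    return validated


def should_retrain(new_puzzles: int,
                   last_train: datetime,
                   now: datetime,
                   puzzle_threshold: int = 100,          # hypothetical default
                   max_age: timedelta = timedelta(days=7)  # hypothetical default
                   ) -> bool:
    """Trigger a train job once enough new puzzles arrive OR enough time passes.

    A time-based run is skipped when there is no new data to train on.
    """
    if new_puzzles <= 0:
        return False
    return new_puzzles >= puzzle_threshold or now - last_train >= max_age


if __name__ == "__main__":
    labels = apply_corrections({"cell_001.png": 7, "cell_002.png": 1},
                               {"cell_002.png": 4})
    print(labels)
    t0 = datetime(2020, 2, 4)
    print(should_retrain(10, t0, t0 + timedelta(days=8)))
```

Keeping corrections as a separate mapping means the original model predictions are never overwritten in place, so misclassification counts per digit remain available for later analysis.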
