Data Validation #54

Open
Rob1Ham opened this issue Feb 4, 2020 · 0 comments

Rob1Ham commented Feb 4, 2020

Currently, the `/train` route pushes every validly solved Sudoku puzzle that came from an uploaded image into the S3 folder `pre_validated_data`. `sagemaker_train` is the S3 folder that the SageMaker endpoint pulls from to train the model.
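A minimal sketch of the two-stage layout, assuming both folders are key prefixes in the same bucket. The trailing-slash prefix form and the helper names `staging_key` and `promote_key` are hypothetical, not taken from the repo:

```python
# Hypothetical layout: both folders are prefixes in the same S3 bucket.
STAGING_PREFIX = "pre_validated_data/"
TRAINING_PREFIX = "sagemaker_train/"


def staging_key(filename: str) -> str:
    """Key under which the /train route would store a newly solved puzzle."""
    return STAGING_PREFIX + filename


def promote_key(key: str) -> str:
    """Map a staged key to the key that SageMaker training reads from.

    Raises ValueError for keys outside the staging folder, so only data
    that went through the validation stage can be promoted.
    """
    if not key.startswith(STAGING_PREFIX):
        raise ValueError(f"not a staged key: {key!r}")
    return TRAINING_PREFIX + key[len(STAGING_PREFIX):]


if __name__ == "__main__":
    k = staging_key("puzzle_0001.csv")
    print(k, "->", promote_key(k))
```

Once a file has been validated, the actual move could be done with boto3's `copy_object`, e.g. `s3.copy_object(Bucket=bucket, Key=promote_key(k), CopySource={"Bucket": bucket, "Key": k})`, where `bucket` stands in for the project's bucket name.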

This split exists because data is currently submitted as-is, with no check that the digits the model predicts for newly uploaded images are actually correct.

Ideally, all newly generated values stored in `pre_validated_data` would be grouped by predicted digit class and then validated by humans, so that only accurate predictions are added to the SageMaker training data.
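The grouping step might look something like the sketch below. It assumes a hypothetical CSV schema with `image_id` and `predicted_digit` columns; the real prediction file may use different columns.

```python
import csv
import io
from collections import defaultdict


def group_by_prediction(csv_text: str) -> dict[int, list[str]]:
    """Group image ids by predicted digit, ordered by digit.

    Assumes a hypothetical schema with `image_id` and `predicted_digit`
    columns; the project's actual prediction CSV may differ.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[int(row["predicted_digit"])].append(row["image_id"])
    # Sort by digit so a reviewer sees all 1s, then all 2s, and so on.
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    sample = """image_id,predicted_digit
p1_r0c0,5
p1_r0c1,3
p1_r1c4,5
"""
    for digit, ids in group_by_prediction(sample).items():
        print(digit, ids)
```

Grouping by predicted class means a reviewer looks at, say, every cell the model called a "5" side by side, where a misclassified digit tends to stand out.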

This could be done either in coordination with a future Labs front-end team, who could build an admin panel for digit validation, or as a simple Flask HTML page that loads the .csv of new predictions sorted by predicted digit and lets a reviewer correct the predicted class of any misclassified digits. This would let the model be retrained on an ongoing basis as users upload more Sudoku puzzles, with a SageMaker training job automated to run either on a schedule or once a threshold number of new puzzles has been shared.
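A rough sketch of the two pieces of logic this would need: applying a reviewer's corrections to the loaded predictions, and deciding when to start a retraining job. The row schema, the function names, and the 7-day / 100-puzzle defaults are all placeholders, not values from the project.

```python
from datetime import datetime, timedelta


def apply_corrections(rows, corrections):
    """Return validated rows with a 'label' field.

    rows: dicts with 'image_id' and 'predicted_digit' (hypothetical schema).
    corrections: {image_id: correct_digit} entered by a reviewer for
    misclassified cells; every other prediction is treated as confirmed.
    """
    validated = []
    for row in rows:
        label = corrections.get(row["image_id"], int(row["predicted_digit"]))
        validated.append({**row, "label": label})
    return validated


def should_retrain(last_trained, now, new_puzzles,
                   interval=timedelta(days=7), threshold=100):
    """Trigger retraining on a schedule OR after enough new puzzles.

    The interval and threshold defaults are hypothetical placeholders.
    """
    return now - last_trained >= interval or new_puzzles >= threshold


if __name__ == "__main__":
    rows = [
        {"image_id": "p1_r0c0", "predicted_digit": "5"},
        {"image_id": "p1_r0c1", "predicted_digit": "8"},
    ]
    # A reviewer marked p1_r0c1 as actually being a 3.
    for r in apply_corrections(rows, {"p1_r0c1": 3}):
        print(r["image_id"], r["label"])
    print(should_retrain(datetime(2020, 2, 4), datetime(2020, 2, 6),
                         new_puzzles=120))
```

Combining both conditions with `or` covers the two options in the issue: a quiet period still gets a periodic retrain, and a burst of uploads does not have to wait for the schedule.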
