How to best cache a task based on input file content? #15371
Unanswered
NiklasKappel
asked this question in
Q&A
Replies: 1 comment
-
Hey @NiklasKappel! Writing a cache key function is the suggested way to cache based on the contents of a file or directory. Be sure to handle cases where a file or directory doesn't exist in your cache key function. Exceptions in a cache key function will prevent your tasks from executing.
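To make that concrete, here is a minimal sketch of what such a cache key function could look like. It assumes the `(context, parameters)` signature that Prefect passes to a `cache_key_fn` and uses only the standard library. The names `file_aware_cache_key` and `_digest_path` are hypothetical, and the lookup of `context.task.task_key` is an assumption guarded with `getattr`. Returning `None` on an I/O error is meant as "no cache key for this run"; treat that behavior as an assumption and check it against your Prefect version.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict, Optional


def _digest_path(path: Path) -> str:
    """Return a token that changes whenever the content behind `path` changes."""
    if path.is_file():
        h = hashlib.sha256()
        with path.open("rb") as f:
            # Read in 1 MiB chunks so large files are not loaded into memory at once.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return "file:" + h.hexdigest()
    if path.is_dir():
        h = hashlib.sha256()
        # Sort so the digest does not depend on filesystem listing order.
        for child in sorted(p for p in path.rglob("*") if p.is_file()):
            h.update(child.relative_to(path).as_posix().encode())
            h.update(_digest_path(child).encode())
        return "dir:" + h.hexdigest()
    # Missing paths get a fixed token instead of raising.
    return "missing"


def file_aware_cache_key(context: Any, parameters: Dict[str, Any]) -> Optional[str]:
    """Hypothetical cache key function: hashes Path arguments by content."""
    try:
        normalized = {
            name: _digest_path(value) if isinstance(value, Path) else value
            for name, value in parameters.items()
        }
        # Assumption: the run context exposes the task as `context.task`
        # with a `task_key` attribute; fall back to "" if it does not.
        task_key = getattr(getattr(context, "task", None), "task_key", "")
        payload = json.dumps(
            {"task": task_key, "parameters": normalized},
            sort_keys=True,
            default=repr,  # non-JSON values fall back to their repr
        )
        return hashlib.sha256(payload.encode()).hexdigest()
    except OSError:
        # Never let an I/O problem escape from the cache key function.
        return None
```

You would pass it to the task with `@task(cache_key_fn=file_aware_cache_key)`. Note that two different files with identical content produce the same key here, which is usually what you want for a pure read. Prefect also ships a built-in `task_input_hash` in `prefect.tasks` if you would rather delegate the final hashing step to it after swapping the `Path` values for digests.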
-
Assume I have a task that takes as inputs an integer and a path to a file:
As I understand Prefect's cache key policies, by default the cache key for the `read_line` task is computed from the values of its input arguments. So if I call the task a second time with the same arguments, but the content of the file at `file_path` has changed in the meantime, the task will mistakenly not run again.

Does Prefect support generating cache keys based on the contents of files and directories referred to by `pathlib.Path` objects? If not, I guess I can write a cache key function that replaces the values of `pathlib.Path` arguments with a hash of their contents and then calls the original cache key function. Where can I find the latter? Are there any pitfalls I am unaware of?