In the current version, HDFSInputFormat reads only the top level of the given directory (path). For example, if the path is /data, it lists the /data directory and reads the items in it (each of which must be a file), such as /data/a and /data/b.
To be more flexible, it could support reading an organized path recursively (where all files live in the leaf directories). For example, if the data is stored under a time-based layout like /data/year/month/dates/FILES, it would be preferable to pass '/data' and have everything under it scanned, rather than having to give a concrete path such as '/data/year/month/dates'. Of course, we would need a configurable maximum recursion depth to avoid reading an excessive amount of data.
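
The depth-limited scan proposed above could look roughly like the sketch below. It is only an illustration of the idea, not HDFSInputFormat's code: it walks the local filesystem with `os.scandir` as a stand-in for HDFS, and the names `list_files_recursive` and `max_depth` are hypothetical.

```python
import os
from typing import List


def list_files_recursive(root: str, max_depth: int) -> List[str]:
    """Collect files under root, descending at most max_depth directory levels.

    Hypothetical helper: the local filesystem stands in for HDFS here.
    max_depth=0 matches the current behavior (only files directly in root).
    """
    if max_depth < 0:
        raise ValueError("max_depth must be >= 0")
    found: List[str] = []
    # Explicit stack of (directory, depth) pairs instead of recursive calls.
    stack = [(root, 0)]
    while stack:
        directory, depth = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_file(follow_symlinks=False):
                    found.append(entry.path)
                elif entry.is_dir(follow_symlinks=False) and depth < max_depth:
                    # Descend only while the depth limit has not been reached.
                    stack.append((entry.path, depth + 1))
    return sorted(found)
```

With a layout like /data/year/month/dates/FILES, calling `list_files_recursive("/data", 3)` would reach the files in the dates directories, while a smaller limit stops before them. Directories deeper than the limit are skipped silently in this sketch; a real implementation might prefer to warn or fail instead.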