Improve earthdata access #29
[1] remove n5eil01u.ecs.nsidc.org credentials from .netrc
[2] get urls to earthdatacloud for ICESat2 data
Don't merge this ... I will add this for other sensors as well, using the new unified metadata .json, once I figure out how to parse the damn thing.
I've also included the s3 paths for future transition to the cloud
Looking very interesting. My main question would be how we can still access the original (non-earthdata cloud) urls. Will there be an s3=true option?
I believe the url to the DAAC hosted copy is buried in the .umm_json file. Maybe we could have 3 fields in the Granule types:
What is your use-case here precisely, in terms of API calls and the expected result(s)? Before, we only stored 1 url (either https, or a local filepath). With this code we would store multiple ones, which raises the question of how you would choose the correct url when you call
I would like to eventually be able to run my code both on EC2 and on servers. When on EC2 I would like to give preference to S3. As for the two https paths, downloading from the DAAC is typically 2x faster than downloading from the cloud https, but the DAAC servers have been flaky lately (one was just down for 24 hrs), so I've been using the cloud https.
Ok, then I envisage the API as follows:
@evetion that makes sense to me. The DAAC and EarthDataCloud urls and the s3 paths are all included in the granules.umm_json, but @betolink mentioned: "You'll have to know the provider. If you want NSIDC's cloud collections then you use NSIDC_CPRD; if you want the DAAC hosted collections then NSIDC_ECS. But it's not intuitive for a new user. At the collection level you can use cloud_hosted and the short name; at the granule level you need the short name and the provider to differentiate cloud vs onprem."
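To make the provider distinction concrete, here is a minimal sketch of how a granule search URL could be assembled for either provider. The helper `cmr_granule_query` is hypothetical (it is not SpaceLiDAR.jl's implementation), the zero-padded three-digit version is an assumption based on the `005` in the ATL08 file names, and values are assumed to need no URL escaping.

```julia
# Hypothetical helper, not part of SpaceLiDAR.jl: builds a CMR granule search URL.
const CMR_GRANULES = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"

function cmr_granule_query(short_name::AbstractString;
                           provider=nothing, version=nothing, page_size::Integer=100)
    params = ["short_name" => String(short_name), "page_size" => string(page_size)]
    # The provider separates cloud (NSIDC_CPRD) from on-prem (NSIDC_ECS) granules.
    provider === nothing || push!(params, "provider" => String(provider))
    # Assumption: versions are zero-padded to three digits, e.g. 5 -> "005".
    version === nothing || push!(params, "version" => lpad(string(version), 3, '0'))
    # Values here are plain ASCII tokens, so no percent-encoding is applied.
    query = join(("$k=$v" for (k, v) in params), "&")
    return "$CMR_GRANULES?$query"
end

cloud_query  = cmr_granule_query("ATL08"; provider="NSIDC_CPRD", version=5)
onprem_query = cmr_granule_query("ATL08"; provider="NSIDC_ECS", version=5)
```

The two queries differ only in the `provider` parameter, which is the point made above: the short name alone does not say whether the granule links are cloud or on-prem.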
That's correct @alex-s-gardner. To complicate things a little further, cloud hosted collections come with 2 sets of links: the direct S3 links and the HTTPS links. HTTPS links are throttled and will be on average 2x slower than getting the same data from their DAACs. My advice is, if we are not running our code in
@betolink "if" the DAAC servers are up and running :-)
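The preference order discussed so far (S3 when running inside AWS, otherwise the DAAC https copy while it is reachable, otherwise the cloud https copy) could be sketched roughly as follows. Everything here is illustrative: `GranuleLinks`, `classify` and `choose_url` are hypothetical names, the `earthdatacloud` host test is a heuristic assumption, and the two https URLs use placeholder paths.

```julia
# Hypothetical container for the three locations of one granule.
struct GranuleLinks
    daac_https::Union{String,Nothing}
    cloud_https::Union{String,Nothing}
    s3::Union{String,Nothing}
end

# Sort raw URLs into the three slots.
# Assumption: cloud https links contain "earthdatacloud" in the host name.
function classify(urls)
    daac = cloud = s3 = nothing
    for url in urls
        if startswith(url, "s3://")
            s3 = url
        elseif startswith(url, "https://")
            if occursin("earthdatacloud", url)
                cloud = url
            else
                daac = url
            end
        end
    end
    return GranuleLinks(daac, cloud, s3)
end

# Pick one URL. S3 is only considered when on_aws is true; daac_up lets the
# caller fall back to the cloud https copy when the DAAC servers are down.
function choose_url(links::GranuleLinks; on_aws::Bool=false, daac_up::Bool=true)
    https = daac_up ? (links.daac_https, links.cloud_https) :
                      (links.cloud_https, links.daac_https)
    candidates = on_aws ? (links.s3, https...) : https
    for url in candidates
        url === nothing || return url
    end
    return nothing
end

links = classify([
    "s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/005/2018/11/01/ATL08_20181101194503_05220107_005_01.h5",
    "https://data.nsidc.earthdatacloud.nasa.gov/placeholder/ATL08.h5",  # placeholder path
    "https://n5eil01u.ecs.nsidc.org/placeholder/ATL08.h5",              # placeholder path
])
choose_url(links; on_aws=true)     # the s3:// path
choose_url(links)                  # the DAAC https URL
choose_url(links; daac_up=false)   # the cloud https URL
```

Keeping the choice in one function means the stored granule can carry all three links while callers only state where they run and whether the DAAC is reachable.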
Some small remarks: The UMM JSON doesn't contain Polygon bounds for a granule, whereas the normal json does. This would be blocking for #28. Mimetype of files (which I would filter on instead of "GET DATA") is
I've made some changes here, so this is fully backwards compatible:
However, some big changes:
The internal
I will still need to merge with the master branch (which has polygon support) and investigate why earthdata doesn't return all pages in some cases before we can merge this. I also hope to have an s3 download example working by then.
edit: Do you have a preference for s3 downloads? Call
Keeping this all within Julia would be nice, so AWSS3.jl
julia> vietnam = (min_x = 102., min_y = 8.0, max_x = 103.0, max_y = 9.0);
julia> granules = SpaceLiDAR.search(:ICESat2, :ATL08; bbox=vietnam, version=5, s3=true);
julia> g = granules[1]
ICESat2_Granule{:ATL08}("ATL08_20181101194503_05220107_005_01.h5", "s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/005/2018/11/01/ATL08_20181101194503_05220107_005_01.h5", NamedTuple(), (type = :ATL08, date = Dates.DateTime("2018-11-01T19:45:03"), rgt = 522, cycle = 1, segment = 7, version = 5, revision = 1, ascending = false, descending = true))
julia> download!(g) # takes some time...
ICESat2_Granule{:ATL08}("ATL08_20181101194503_05220107_005_01.h5", "/Users/evetion/code/SpaceLiDAR.jl/ATL08_20181101194503_05220107_005_01.h5", NamedTuple(), (type = :ATL08, date = Dates.DateTime("2018-11-01T19:45:03"), rgt = 522, cycle = 1, segment = 7, version = 5, revision = 1, ascending = false, descending = true))
edit: This now works without getting credentials manually.
I'm gonna go ahead and merge this; I will make documentation updates before a release. In the meantime, could you test the new functionality?