Improve earthdata access #29

Merged
merged 13 commits into from
Jan 7, 2023
Conversation

alex-s-gardner
Collaborator

[1] remove n5eil01u.ecs.nsidc.org credentials from .netrc
[2] get urls to earthdatacloud for ICESat2 data
@alex-s-gardner
Collaborator Author

Don't merge this ... I will add support for other sensors as well using the new unified metadata .json once I figure out how to parse the damn thing

@evetion
Owner

Looking very interesting. My main question would be how we can still access the original (non-earthdata cloud) urls. Will there be a s3=true option?

@alex-s-gardner
Collaborator Author

> Looking very interesting. My main question would be how we can still access the original (non-earthdata cloud) urls. Will there be a s3=true option?

I believe the url to the DAAC hosted copy is buried in the .umm_json file. Maybe we could have 3 fields in the Granule types:

https_cloud
s3
https_daac
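
The three-field idea could be sketched roughly as below. This is purely hypothetical: field names and layout follow the proposal in this comment, not the actual Granule types defined in SpaceLiDAR.jl.

```julia
# Hypothetical sketch of a granule type carrying all three url variants;
# not the real SpaceLiDAR.jl definition.
struct Granule{T}
    id::String
    https_cloud::Union{String,Nothing}  # EarthDataCloud HTTPS url
    s3::Union{String,Nothing}           # direct S3 url (usable from us-west-2)
    https_daac::Union{String,Nothing}   # on-prem DAAC HTTPS url
    info::NamedTuple
end
```

A download call could then pick whichever field is populated and preferred for the current environment.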

@evetion
Owner

evetion commented Oct 24, 2022

What is your use-case here precisely, in terms of API calls and the expected result(s)?

Before, we only stored 1 url (either https, or a local filepath). With this code we would store multiple ones, which raises the question of how you would choose the correct url when you call download. And at the moment download only works for https urls.

@alex-s-gardner
Collaborator Author

> What is your use-case here precisely, in terms of API calls and the expected result(s)?
>
> Before, we only stored 1 url (either https, or a local filepath). With this code we would store multiple ones, which raises the question of how you would choose the correct url when you call download. And at the moment download only works for https urls.

I would like to eventually be able to run my code on EC2 and on servers. When on EC2 I would like to give preference to S3. As for the two https paths, downloading from the DAAC is typically 2x faster than downloading from the cloud https, but the DAAC servers have been flaky lately (they were just down for 24 hrs), so I've been using the cloud https.

@evetion
Owner

evetion commented Oct 25, 2022

Ok, then I envisage the API as follows:

  • search takes a provider, either the DAAC (default) or the EarthDataCloud, as I don't see the urls to DAAC and EarthDataCloud in one response.
  • download keeps working normally, but can take an s3=true parameter, which requires a granule created from the correct provider, so it has an s3_url. search also optionally takes s3=true, which automatically switches the provider?
  • download(granule, s3=true) (or s3_download?) then also calls earthdata_cloud_s3 or even earthdata_s3_env! (already implemented) if it's not expired yet (not yet implemented), so any S3 download actually works (whether run from shell, or from within Julia with AWSS3).
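
Putting the envisaged calls together, usage could look like this. A hypothetical sketch only: the provider and s3 keyword names follow the discussion above, not a released SpaceLiDAR.jl API.

```julia
# Hypothetical usage of the API sketched above.
granules = search(:ICESat2, :ATL08)                  # DAAC provider by default
granules = search(:ICESat2, :ATL08; s3 = true)       # switches to the cloud provider

g = granules[1]
download(g)             # plain https download, as before
download(g, s3 = true)  # requires g to carry an s3 url; refreshes S3
                        # credentials first if they have expired
```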

@alex-s-gardner
Collaborator Author

alex-s-gardner commented Oct 25, 2022

@evetion that makes sense to me. DAAC and EarthDataCloud urls and s3 paths are all included in the granules.umm_json.

but @betolink mentioned: "you'll have to know the provider: if you want NSIDC's cloud collections you use NSIDC_CPRD; if you want the DAAC-hosted collections, NSIDC_ECS. It's not intuitive for a new user. At the collection level you can use cloud_hosted and the short name; at the granule level you need the short name and the provider to differentiate cloud vs on-prem."

@betolink

That's correct @alex-s-gardner. To complicate things a little further, cloud hosted collections come with 2 sets of links: the direct S3 links and HTTPS links. HTTPS links are throttled and will be on average 2x slower than getting the same data from their DAACs. My advice is: if we are not running our code in us-west-2, it's better to use the DAAC urls (NSIDC_ECS).

@alex-s-gardner
Collaborator Author

@betolink "if" the DAAC servers are up and running :-)

@evetion
Owner

evetion commented Oct 26, 2022

Some small remarks:

The UMM JSON doesn't contain Polygon bounds for a granule, whereas the normal json does. This would be blocking for #28.

Mimetype of files (which I would filter on instead of "GET DATA") is application/x-hdfeos for DAAC, but application/x-hdf5 for EarthDataCloud.

@evetion
Owner

evetion commented Jan 3, 2023

I've made some changes here, so this is fully backwards compatible:

  • Reverted the s3 fields on granules
  • Added the non-umm json back

However, some big changes:

  • Dropped JSON dep
  • Find is renamed to search
  • Version is now an int
  • Search now takes only mission + product arguments, the rest are kwargs
  • Kwargs can specify the provider (daac or cloud) and s3 (yes/no)

The internal earthdata_search is fully keywords only now and allows finer control over requesting number of items/pages. It also includes a umm option, as we can parse that, but it's not enabled by default, as it is missing polygons in the response.

I will still need to merge with the master branch (which has polygon support) and investigate why earthdata doesn't return all pages in some cases before we can merge this. I also hope to have an s3 download example working by then.

edit: Do you have a preference for s3 downloads? Call aws s3 externally from Julia, or go with AWSS3.jl?

@alex-s-gardner
Collaborator Author

> edit: Do you have a preference for s3 downloads? Call aws s3 externally from Julia, or go with AWSS3.jl?

Keeping this all within Julia would be nice, so AWSS3.jl
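
For reference, a pure-Julia S3 fetch with AWSS3.jl could be sketched as follows. This assumes temporary Earthdata S3 credentials have already been obtained (e.g. via the earthdata_s3_env! helper mentioned above); the credential variables and the example key are placeholders taken from this thread, not working values.

```julia
# Sketch of an S3 download via AWS.jl + AWSS3.jl, assuming temporary
# Earthdata credentials (access_key_id, secret_access_key, session_token)
# are already available in scope.
using AWS, AWSS3

creds = AWSCredentials(access_key_id, secret_access_key, session_token)
config = global_aws_config(; region = "us-west-2", creds = creds)

# Bucket and key as they appear in the ATL08 example later in this thread.
s3_get_file(config, "nsidc-cumulus-prod-protected",
    "ATLAS/ATL08/005/2018/11/01/ATL08_20181101194503_05220107_005_01.h5",
    "ATL08_20181101194503_05220107_005_01.h5")
```

Note the temporary Earthdata credentials expire after roughly an hour, so the config would need to be rebuilt before long-running downloads.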

@evetion
Owner

evetion commented Jan 4, 2023

> Keeping this all within Julia would be nice, so AWSS3.jl

julia> vietnam = (min_x = 102., min_y = 8.0, max_x = 103.0, max_y = 9.0);
julia> granules = SpaceLiDAR.search(:ICESat2, :ATL08; bbox=vietnam, version=5, s3=true);
julia> g = granules[1]
ICESat2_Granule{:ATL08}("ATL08_20181101194503_05220107_005_01.h5", "s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/005/2018/11/01/ATL08_20181101194503_05220107_005_01.h5", NamedTuple(), (type = :ATL08, date = Dates.DateTime("2018-11-01T19:45:03"), rgt = 522, cycle = 1, segment = 7, version = 5, revision = 1, ascending = false, descending = true))

julia> download!(g)  # takes some time...
ICESat2_Granule{:ATL08}("ATL08_20181101194503_05220107_005_01.h5", "/Users/evetion/code/SpaceLiDAR.jl/ATL08_20181101194503_05220107_005_01.h5", NamedTuple(), (type = :ATL08, date = Dates.DateTime("2018-11-01T19:45:03"), rgt = 522, cycle = 1, segment = 7, version = 5, revision = 1, ascending = false, descending = true))

edit: This now works without getting credentials manually.

@evetion
Owner

evetion commented Jan 7, 2023

I'm gonna go ahead and merge this; I will make documentation updates before a release. In the meantime, could you test the new functionality?

@evetion evetion merged commit 4695caf into evetion:master Jan 7, 2023
3 participants