-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bbox subsetting for ATL06 data #30
Changes from 1 commit
343dd88
19cd633
2ac4d73
1e17314
b2c0f46
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
""" | ||
points(g::ICESat2_Granule{:ATL06}, tracks=icesat2_tracks, step=1) | ||
points(g::ICESat2_Granule{:ATL06}, tracks=icesat2_tracks, step=1, bbox = (min_x = -Inf, min_y = -Inf, max_x = Inf, max_y = Inf)) | ||
|
||
Retrieve the points for a given ICESat-2 ATL06 (Land Ice) granule as a list of namedtuples, one for each beam. | ||
The names of the tuples are based on the following fields: | ||
|
@@ -25,37 +25,71 @@ function points( | |
granule::ICESat2_Granule{:ATL06}; | ||
tracks = icesat2_tracks, | ||
step = 1, | ||
bbox = (min_x = -Inf, min_y = -Inf, max_x = Inf, max_y = Inf), | ||
) | ||
dfs = Vector{NamedTuple}() | ||
HDF5.h5open(granule.url, "r") do file | ||
t_offset = read(file, "ancillary_data/atlas_sdp_gps_epoch")[1]::Float64 + gps_offset | ||
for (i, track) in enumerate(tracks) | ||
if in(track, keys(file)) && in("land_ice_segments", keys(file[track])) | ||
track_nt = points(granule, file, track, t_offset, step) | ||
track_nt.height[track_nt.height.==fill_value] .= NaN | ||
track_nt = points(granule, file, track, t_offset, step, bbox) | ||
if !isempty(track_nt.height) | ||
track_nt.height[track_nt.height.==fill_value] .= NaN | ||
end | ||
push!(dfs, track_nt) | ||
end | ||
end | ||
end | ||
return dfs | ||
end | ||
|
||
|
||
function points( | ||
::ICESat2_Granule{:ATL06}, | ||
file::HDF5.H5DataStore, | ||
track::AbstractString, | ||
t_offset::Float64, | ||
step = 1, | ||
bbox = (min_x = -Inf, min_y = -Inf, max_x = Inf, max_y = Inf), | ||
) | ||
z = file["$track/land_ice_segments/h_li"][1:step:end]::Vector{Float32} | ||
sigma_geo_h = file["$track/land_ice_segments/sigma_geo_h"][1:step:end]::Vector{Float32} | ||
h_li_sigma = file["$track/land_ice_segments/h_li_sigma"][1:step:end]::Vector{Float32} | ||
x = file["$track/land_ice_segments/longitude"][1:step:end]::Vector{Float64} | ||
y = file["$track/land_ice_segments/latitude"][1:step:end]::Vector{Float64} | ||
t = file["$track/land_ice_segments/delta_time"][1:step:end]::Vector{Float64} | ||
q = file["$track/land_ice_segments/atl06_quality_summary"][1:step:end]::Vector{Int8} | ||
dem = file["$track/land_ice_segments/dem/dem_h"][1:step:end]::Vector{Float32} | ||
x = file["$track/land_ice_segments/longitude"][:]::Vector{Float64} | ||
y = file["$track/land_ice_segments/latitude"][:]::Vector{Float64} | ||
|
||
# find index of points inside of bbox | ||
ind = (x .> bbox.min_x) .& (y .> bbox.min_y) .& (x .< bbox.max_x) .& (y .< bbox.max_y) | ||
start = findfirst(ind) | ||
stop = findlast(ind) | ||
|
||
if isnothing(start) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's better to just return There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. by returning empty arrays it lets one keep track of what files and tracks were unsuccessful. There are issues where some tracks have data and others do not from the same granule. If you return nothing then It becomes hard to keep things strait as to which data the granules belong. The returned data needs to nicely convert to a uniform data frame. I like to have an empty On that note I'm using a modification of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I recognize the need for empty granules (I have several zero size granules on disk, just to prevent repeat downloads), but I wonder about the correct approach. From what I gather, you iterate over each track and store the results, for which you want an empty table, rather than Btw, I think most of this code can be written as julia> [1,2][1:1:0]
Int64[] There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, that's exactly it. I'm building data cubes of all ICESat-2 tracks that fall within 2 degree by 2 degree tiles (geotiles):
I then save the df as an Arrow file.. I can then go back at a later date and see which granules have already been subset and only download and subset and append new granules to the returning There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Ohh.. I hadn't thought of that. Do you think it would be less efficient as it would then reside within the h5 call, possibly mapping the variable even though nothing is being returned? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I hadn't thought of that, it is indeed less efficient. julia> @time h[1:0] # h = file["$track/land_segments/terrain/h_te_mean"]
0.000072 seconds (14 allocations: 608 bytes)
Float32[]
julia> @time Vector{Float32}()
0.000002 seconds (1 allocation: 64 bytes)
Float32[] |
||
@warn "no data found within bbox: $(file.filename)" | ||
|
||
spot_number = attrs(file["$track"])["atlas_spot_number"]::String | ||
atlas_beam_type = attrs(file["$track"])["atlas_beam_type"]::String | ||
|
||
nt = (; | ||
longitude = Vector{Float64}[], | ||
latitude = Vector{Float64}[], | ||
height = Vector{Float32}[], | ||
height_error = Vector{Float64}[], | ||
datetime =Vector{Dates.DateTime}[], | ||
quality = BitVector[], | ||
track = Fill(track, 0), | ||
strong_beam = Fill(atlas_beam_type == "strong", 0), | ||
detector_id = Fill(parse(Int8, spot_number), 0), | ||
height_reference = Vector{Float32}[], | ||
) | ||
return nt | ||
end | ||
|
||
# only include x and y data within bbox | ||
x = x[start:step:stop] | ||
y = y[start:step:stop] | ||
|
||
z = file["$track/land_ice_segments/h_li"][start:step:stop]::Vector{Float32} | ||
sigma_geo_h = file["$track/land_ice_segments/sigma_geo_h"][start:step:stop]::Vector{Float32} | ||
h_li_sigma = file["$track/land_ice_segments/h_li_sigma"][start:step:stop]::Vector{Float32} | ||
t = file["$track/land_ice_segments/delta_time"][start:step:stop]::Vector{Float64} | ||
q = file["$track/land_ice_segments/atl06_quality_summary"][start:step:stop]::Vector{Int8} | ||
dem = file["$track/land_ice_segments/dem/dem_h"][start:step:stop]::Vector{Float32} | ||
spot_number = attrs(file["$track"])["atlas_spot_number"]::String | ||
atlas_beam_type = attrs(file["$track"])["atlas_beam_type"]::String | ||
times = unix2datetime.(t .+ t_offset) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means we will now always filter on bbox, even when nothing is passed. Might be better to pass nothing by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya that would make more sense. I don't think the search costs us much but if we used
bbox = nothing
we could just include an if statement to check if a search is needed. I'll add that.