Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add random sampling functionality (like R's terra::spatSample()) #771

Open
edwardlavender opened this issue Sep 30, 2024 · 5 comments
Open

Comments

@edwardlavender
Copy link

edwardlavender commented Sep 30, 2024

Thanks for a great package! This is a feature request for a function like R’s terra::spatSample() that enables random sampling of locations on a raster, optionally excluding NaN/missing values. Below is some quick-and-dirty example code which demonstrates the functionality, but is inefficient when NaNs should be ignored. Maybe this is already quick & easy to do in Julia? (I am coming from R.)

# Convert Raster to DataFrame, optionally dropping NaNs 
function asDataFrame(; x::Rasters.Raster, na_rm::Bool = true)
    # Get DataFrame
    xyz = DataFrame(x)
    rename!(xyz, ["x", "y", "z"])
    # Filter non NA cells 
    if na_rm
        xyz = xyz[.!isnan.(xyz.z), :]
    end
    return xyz
end 

# terra::spatSample() replacement, via asDataFrame()
function spatSample(; x::Rasters.Raster, size::Int64 = 1, na_rm::Bool = true)
    xyz = asDataFrame(x = x, na_rm = na_rm)
    xyz[sample(1:nrow(xyz), size, replace = true), :]
end 
@rafaqz
Copy link
Owner

rafaqz commented Sep 30, 2024

No need for anything that complicated ;)

StatsBase gives us the sample method that will work on any AbstractArray. But we need to remove missing values from it first. To do that on pretty much any kind of iterator in Julia we just skipmissing(obj). Then collect turns it into an array again as sampling over a lazy iterator like skipmissing returns cant work efficiently, and sample needs some AbstractArray input anyway. But this is it, here giving you 100 samples:

using StatsBase
sample(collect(skipmissing(raster)), 100)

To keep the missing values, its just

using StatsBase
sample(raster, 100)

But you might want to use replace_missing(rast) first to make sure the missingval is something sensible like missing that the rest of Julia will understand (this will be the default behaviour very soon).

The key message here is unlike in R, a Raster is just an AbstractArray. So most packages that accept AbstractArray will just work with a Rasters.Raster, there is no need for intentional integration. (except in edge cases like larger than memory lazy disk arrays, etc).

@edwardlavender
Copy link
Author

Thanks a lot for the quick response & detailed explanation! That is really helpful. That is a very nice solution for sampling values--but I was really hoping to sample both coordinates and values for non-NA cells, hence the use of DataFrame() above. Is there a simple way to do this that avoids converting the entire raster into a DataFrame? Thanks again!

@rafaqz
Copy link
Owner

rafaqz commented Sep 30, 2024

It's probably no problem to convert, it's just DataFrame(raster), just use replace_missing first for now.

DataFrame(replace_missing(raster)).

Then probably sample will work too. You may need to use dropmissings on the DataFrame, look at the ?dropmissings help for syntax.

@edwardlavender
Copy link
Author

Fantastic - thank you!

@tiemvanderdeure
Copy link
Contributor

tiemvanderdeure commented Oct 1, 2024

I've been thinking about adding some sample function that mirrors syntax and outputs generated by extract - which includes a keywords to return coordinates and to skip missing values. So good to know that more people want this and let's keep this issue open :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants