Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented directional variograms for unstructured #87

Merged
merged 15 commits into from
Nov 6, 2020

Conversation

TobiasGlaubach
Copy link
Contributor

@MuellerSeb @LSchueler I finished implementing the angle dependent variogram calculation and implemented a few test cases. Calculation is quasi pure C as soon as the calculation arrives at the calculation heavy loops. Defaults for parameters were chosen in order to preserve original behavior and the implementation should add no computational overhead if no angles are requested.

Can one of you do a code review and look at the test cases?

Tests passed on my system without a problem.

(base) C:\Users\tobia\source\repos\GSTools>python tests/test_variogram_unstructured.py TestVariogramUnstructured.test_angles
.
----------------------------------------------------------------------
Ran 1 test in 0.003s

OK

Test values were generated like this:

import numpy as np
import gstools as gs

x = np.random.RandomState(19970221).rand(20) * 10.0 - 5
y = np.zeros_like(x)
model = gs.Exponential(dim=2, var=2, len_scale=8)
srf = gs.SRF(model, mean=0, seed=19970221)
field = srf((x, y))

bins = np.arange(10)
bin_center, gamma = gs.vario_estimate_unstructured((x, ), field, bins)

idx = np.argsort(x)

(gamma, x[idx], field[idx])

The implemented test-cases correspond to these "fields":
image

with

  • left being the basis for case 1
  • middle being the basis for case 2 and 3
  • right being the basis for case 4 and 5.

@MuellerSeb MuellerSeb requested a review from LSchueler May 3, 2020 10:04
@MuellerSeb MuellerSeb added the enhancement New feature or request label May 3, 2020
@MuellerSeb
Copy link
Member

Hey @TobiasGlaubach
thanks for your PR! We will take a look at this ASAP.

@MuellerSeb MuellerSeb added this to the 1.3 milestone May 5, 2020
Copy link
Member

@LSchueler LSchueler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this PR! - I like the very straightforward way of enhancing the variogram estimation and the fact that the interface doesn't break backwards compatibility.

I've added some change requests into the code. Apart from those, here are a few more comments:

  • How did you come up with the default value of the angle_tol = 5°? I imagine that's a good value for numerical experiments, but in case of actual measurements, it might be far too small. I have no clue what a good default value would be, but I agree, that we should have one :-)
  • I think some tests for the 3d case should definitely be added.
  • I'd like to see some test cases, where the field is more "spread out", but still with very few data points. Then you can test the angle tolerances better and actually exclude some, but not all points from the variogram estimation.
  • And the most important thing: Please add yourself to the AUTHORS.md in the main directory. Of course only if you want to.

gstools/variogram/variogram.py Outdated Show resolved Hide resolved
gstools/variogram/variogram.py Outdated Show resolved Hide resolved
tests/test_variogram_unstructured.py Outdated Show resolved Hide resolved
tests/test_variogram_unstructured.py Outdated Show resolved Hide resolved
@TobiasGlaubach
Copy link
Contributor Author

TobiasGlaubach commented May 9, 2020

Thank you so much for this PR! - I like the very straightforward way of enhancing the variogram estimation and the fact that the interface doesn't break backwards compatibility.

I've added some change requests into the code. Apart from those, here are a few more comments:

  • How did you come up with the default value of the angle_tol = 5°? I imagine that's a good value for numerical experiments, but in case of actual measurements, it might be far too small. I have no clue what a good default value would be, but I agree, that we should have one :-)
  • I think some tests for the 3d case should definitely be added.
  • I'd like to see some test cases, where the field is more "spread out", but still with very few data points. Then you can test the angle tolerances better and actually exclude some, but not all points from the variogram estimation.
  • And the most important thing: Please add yourself to the AUTHORS.md in the main directory. Of course only if you want to.

Regarding default angle:
To tell the truth, I just came up with some number. I just looked at the GSLIB based Tutorials by Michael Pyrcz and saw, that he uses 25° and 45°. I would propose to use 25° as a default value.

regarding additional testcases:
Will do. Do you have any confirmed cases where we know what the results should be? Otherwise I ca see if I find some, or come up with some.

regarding AUTHORS.md:
Will do.

@LSchueler
Copy link
Member

default angles:
Without having any prior experience, I have a feeling, that 25° is a sensible value.

additional testcases:

  • You could generate an anisotropic field and see, if the estimated correlation lengths are in the expected range for the main directions.
  • You could combine the 3 fields you have shown in the graphs in this PR and count how many field points have been taken into account when setting the angles <45° or >=45°

I think when these test cases have been added, we are good to go.

@TobiasGlaubach
Copy link
Contributor Author

Hi,
Update:
I was quite busy lately so I hadn't had time to work on this much. However I found some time to start working on this again since yesterday. Currently I am implementing test cases and I am also using the prototype for a real problem of variogram fitting (as a more realistic testing scenario). Also I found a paper (https://doi.org/10.1016/j.cageo.2018.07.011), where directional variograms are calculated from open source available data using a different tool (SGeMs), which can give us some valid data for cross validation.

@MuellerSeb @LSchueler Most tools give a bin tolerance for overlapping lags/bins for estimating unstructured variograms. GSTools does not seem to have this, what do you think? Should we implement this?

@TobiasGlaubach
Copy link
Contributor Author

TobiasGlaubach commented Jun 3, 2020

@MuellerSeb : I checked the covariance model angles and if I checked correctly found them to be deviating from the "normal" right hand side rule (standard rotation matrices) which also align spherical coordinate definitions and standard intrinsic Tait-Brian-Angles.

Usually, if the Z+ axis points upwards, and a 'standard' rotation is used, the rotation is counter clockwise. This leads to points on the XY plane rotating from X axis towards Y axis. Same goes for the Tait-Brian-Angles. See here:

image Taitbrianangles.svg from wikipedia

However the definition of the Covariance model rotates the Variogram in clockwise and the major axis aligns with Y. I generated a model like this:

model = gs.Gaussian(dim=3, var=2.0, len_scale=[10, 2], angles=[np.deg2rad(azim), np.deg2rad(0)])

And plotted is in 3D as point cloud for azim in [0,10,50,80] and found that:

  1. rotation is clockwise
  2. Major axis aligns with Y

image

For the elevarion angle I found that it is also clockwise rotated
Is that intended? Or is this a bug?

@MuellerSeb
Copy link
Member

MuellerSeb commented Jun 3, 2020

@TobiasGlaubach :
These angles always give me a headache!
I just checked it with (np.pi/4 at different positions in angles):

import numpy as np
import gstools as gs
model = gs.Gaussian(dim=3, var=2.0, len_scale=[8, 4, 2], angles=[0, np.pi/4, 0])
model.plot("vario_spatial")

I found, that the rotation in the xz plane (around y-axis) is clockwise (both others are counterclock wise, as far as I can tell)... There is a bug here (actually this should be correct):

return np.array(((+cos, 0.0, sin), (+0.0, 1.0, +0.0), (-sin, 0.0, cos)))

@MuellerSeb
Copy link
Member

MuellerSeb commented Jun 3, 2020

But following this:
https://en.wikipedia.org/wiki/Rotation_matrix#General_rotations

it should be correct the way it is!

@MuellerSeb
Copy link
Member

@TobiasGlaubach
Copy link
Contributor Author

This example also shows, that the rotation is counter-clockwise:
https://geostat-framework.readthedocs.io/projects/gstools/en/stable/examples/01_random_field/02_fancier.html#sphx-glr-examples-01-random-field-02-fancier-py

This example is 2D not 3D. Can there be a difference?

@TobiasGlaubach
Copy link
Contributor Author

TobiasGlaubach commented Jun 3, 2020

But following this:
https://en.wikipedia.org/wiki/Rotation_matrix#General_rotations

it should be correct the way it is!

Let me check the rotations from Wikipedia and the implementation in your code. I will get back to you in a hour or so. I will probably be able t look at this tomorrow or so.

@MuellerSeb
Copy link
Member

This example is 2D not 3D. Can there be a difference?

Then have a look at this:
https://geostat-framework.readthedocs.io/projects/gstools/en/stable/examples/02_cov_model/02_aniso_rotation.html#rotation-angles

There you can check, that rotation in the xy and yz plane is counter-clockwise and in the xz plane it is clockwise.

@TobiasGlaubach
Copy link
Contributor Author

Hmm... it looks like you're not the only one having problems with the angles :-/

Sorry to be such a pain, but I get different results compared to the model.plot() function when I plot the variogram data as scatter plot in 3D.

e.G:

import gstools as gs
elev = 10
model = gs.Gaussian(dim=3, var=2.0, len_scale=[10, 2], angles=[np.deg2rad(0), np.deg2rad(elev)])

results in the following plot (after changing to y slice).

image

When I plot the same data in 3d using scatter plots I get this plot:
image

For me the 'major' axis is y aligned and the rotation in elevation (major-Z plane after first rotation = x') is CCW.

I generated the scatter plot with this script:
https://colab.research.google.com/drive/1SYXxmYzaGAK_j9raBStBG_DibnKXy78V?usp=sharing

@MuellerSeb : can you check if I am doing something wrong when plotting before we proceed any further?

@MuellerSeb
Copy link
Member

MuellerSeb commented Jun 5, 2020

Hey there,
I guess the problem in your code is in the line:

X,Y,Z = np.meshgrid(xgrid,ygrid,zgrid)

That should be:

X,Y,Z = np.meshgrid(xgrid,ygrid,zgrid, indexing="ij")

Like done here:

x_u, y_u, z_u = np.meshgrid(x, y, z, indexing=indexing)

This indexing is another source of headache!
Because of that, you are flipping axis and the rotation seems to be in the wrong direction.

@TobiasGlaubach
Copy link
Contributor Author

OK, thanks a lot and sorry for the inconvenience! I will check my angles vs the covariance model angles then. I am fairly certain, the implementations are equal.

Another question: Is GSTools for release on PyPy compiled with OpenMP support? If not, I have a implementation of estimate.pyx which implements caching for distance and angle data. In case of single thread, this can significantly speed up the calculation (especially for large bin vectors).

I could pack it into this PR as well.

@MuellerSeb
Copy link
Member

MuellerSeb commented Jun 5, 2020

OpenMP support can be enabled by the user:
https://geostat-framework.readthedocs.io/projects/gstools/en/stable/#pip

Most gstools routines in cython use prange which is using openmp (if present, with sequential execution as fallback).

On Conda, openmp is enabled by default. On pip we didn't want to force the user to provide openmp. What do you mean exactly by "release on pypy"?

@TobiasGlaubach
Copy link
Contributor Author

Sorry I ment PyPi --> pip install gstools

My version enables caching with an additional parameter otherwise it just uses the previous implementation with prange. Would you want that feature as well or keep it as simple as possible for better maintainability?

@MuellerSeb
Copy link
Member

My version enables caching with an additional parameter otherwise it just uses the previous implementation with prange. Would you want that feature as well or keep it as simple as possible for better maintainability?

I would need to have a look at it, to decide. But it sounds interesting! ;-)
Only condition is: it shoud also work without openmp and the user shouldn't need to care about parallelization.

@MuellerSeb MuellerSeb changed the base branch from master to develop June 7, 2020 18:54
@TobiasGlaubach
Copy link
Contributor Author

TobiasGlaubach commented Jun 8, 2020

I had to make a new branch and commit it first.

Here is the full branch:
https://github.com/TobiasGlaubach/GSTools/tree/dev_caching

Here is the caching implementation:
https://github.com/TobiasGlaubach/GSTools/blob/289f62fa64e70b8900fc202650ebc1ffe8764c7a/gstools/variogram/estimator.pyx#L243-L270

And here is an example script I wrote for all new functionalities:
https://github.com/TobiasGlaubach/GSTools/blob/dev_caching/examples/03_variogram/02_variogram_estimation_options.py

The caching speeds up the caculation considerably for long bins vectors. On my machine (without OpenMP) I got the following results:

len(bins)  2: time per execution: no_caching 0.123 | with_caching 0.258
len(bins)  4: time per execution: no_caching 0.621 | with_caching 0.404
len(bins) 10: time per execution: no_caching 1.054 | with_caching 0.432
len(bins) 20: time per execution: no_caching 3.283 | with_caching 0.740

Let me know, what you think.

I will implement the 3D test cases for the original PR today or tomorrow.

@TobiasGlaubach
Copy link
Contributor Author

I implemented some basic 3d testcases, as well as a fix for the estimator function. Waiting for review now.

Copy link
Member

@LSchueler LSchueler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Thanks again for all the effort you are putting into this!
I really like your example script! That's gonna make a nice addition to the examples.

First things first:
Following test cases in test_variogram_unstructured.py are not working:

  • test_uncorrelated_3d
  • test_sampling_3d
  • test_angles_3d_estim

You should apply black --line-length 79 to all the files you modified, otherwise, MuellerSeb will get grumpy ;-)

Caching

Here are the results on my machine with OpenMP parallelisation:

len(bins) 2: time per execution: no_caching 0.224 | with_caching 0.174
len(bins) 4: time per execution: no_caching 0.231 | with_caching 0.210
len(bins) 10: time per execution: no_caching 0.377 | with_caching 0.299
len(bins) 20: time per execution: no_caching 0.733 | with_caching 0.418

I've built the Cython code with following command in the main directory of gstools, in case you want to try it:
python setup.py --openmp build_ext --inplace

Unfortunately, the OpenMP support of Cython is still very limited. That's also the reason why the distance functions look so strange.

We have already been discussing some options of how to parallelise more complex loops. When we come to that point, it would be great to use your idea of caching the distances. But I think for now the slight speed up (which is impressive nevertheless) compared to the parallelised version does not justify the increase in maintenance.
I'll definitely keep you updated regarding the caching!

gstools/variogram/variogram.py Outdated Show resolved Hide resolved
@MuellerSeb
Copy link
Member

Is there still some progress here? :-)

@TobiasGlaubach
Copy link
Contributor Author

Hi,
A few weeks ago I did some testing with 3d cases. But it stagnant due to reliable expected results to compare against. Since then I hadn't had much time. From next week on I will have a bit more time and hope to get this ready.

Also if anyone has some reliable test cases for 3d data, It would be greatly appreciated.

@LSchueler
Copy link
Member

Hi @TobiasGlaubach,
do you think you will have some time to finish things up on this PR? - I think your almost there ;-)
Otherwise, we could think if either @MuellerSeb or I could pull your contributions together and merge it.

@TobiasGlaubach
Copy link
Contributor Author

Hi @LSchueler Thanks for checking in on this. I am lacking some verified test cases for 3D data, but I am fairly certain the algorithms are right and the angles correspond to the angle definitions generally used in GSTools. whats your opinion on this? Shall we add a warning somewhere?

@MuellerSeb
Copy link
Member

@TobiasGlaubach why not just generate a field with gstools, so we know the properties of the given field?

@MuellerSeb
Copy link
Member

@TobiasGlaubach going through your code, I noticed two problems:

  1. ATM your code only take a pair of points into account, when the first aims to the second with correct angles (example 2D):

    cdef double phi = atan2(dy,dx)

    But we would also need the other way around (that is skipped in the variogram loop):

    cdef double phi = atan2(-dy,-dx)

    Same for 3D with -dx, -dy and -dz.

  2. In 3D i wonder about your formula for theta

    cdef double theta = atan2(dz,sqrt(dx + dy))

    Shouldn't that be (acos present in libc.math):

    cdef double theta = acos(dz / sqrt(dx ** 2 + dy ** 2 + dz ** 2))

    as described here: https://en.wikipedia.org/wiki/Spherical_coordinate_system#Cartesian_coordinates
    (squares missing and atan2 doesn't map correctly into [0, pi) for elevation)

I can take over if you want. I just realized, we don't have to care about trait-bryan angles, since we only check the variogram in one given direction. Retrieving trait brian angles from sample data should be another step.

@MuellerSeb
Copy link
Member

I did some updates:

  • better pre-processing for given angles: all in the interval [0, pi) except azimuth in [0,2pi)
  • check both directions for point-pairs in empirical variogram (only one combination will occur)
  • use great-circle distance in 3D to check the given angle-tolerance (solves your problem at the poles)

I got some concerns/ideas:

  • It would be nice, if one also provide the direction as a vector (then angle tolerance comparison would also be easier)
  • it would be nice to also provide multiple directions to check (for example in 2D checking 4 directions [N-E-S-W])
  • in order to be ready for nD, vectors would be better to use as direction (rather then n-D spherical coordinates like now)
  • this would allow for example to check main axes from Tait-Bryan angles

But I am happy with this first implementation. Later on, we can implement a switch to use either angles for direction or vectors.

@MuellerSeb MuellerSeb changed the base branch from develop to direct-vario November 6, 2020 15:31
@MuellerSeb
Copy link
Member

I'll merge this into a new branch here to take over.
I got some further ideas regarding bandwith and vectors for direction.
Will keep your angles formulation.

Thanks for everything @TobiasGlaubach !

@MuellerSeb MuellerSeb merged commit 95b1431 into GeoStat-Framework:direct-vario Nov 6, 2020
@TobiasGlaubach
Copy link
Contributor Author

@MuellerSeb Thanks for taking over, I was on vacation for a few weeks and atm I am quite buisy with other projects at the moment. Glad to have been able to contribute to your project :-)

keep up the good work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants