-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
287 add local morans i #305
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, even if I put quite many comments, looks solid overall! Most of my comments relate to cosmetic stuff and are straightforward changes
w = libpysal.weights.KNN.from_dataframe(gdf, k=k) | ||
else: | ||
raise ValueError("Invalid weight_type. Use 'queen' or 'knn'.") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, we should row-standardize the weights to minimize bias. I am not sure if it is default/automatically done, but you could add this row to be sure:
weights.transform = 'R' |
elif weight_type == "knn": | ||
w = libpysal.weights.KNN.from_dataframe(gdf, k=k) | ||
else: | ||
raise ValueError("Invalid weight_type. Use 'queen' or 'knn'.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use InvalidParameterValueException
from our custom exceptions here.
gdf: gpd.GeoDataFrame, | ||
column: str, | ||
weight_type: Literal["queen", "knn"] = "queen", | ||
k: Union[int, None] = 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make remove the option to have None
here. If the user decides to use knn
, k
must be defined, and if they don't use knn
they can just ignore this parameter.
I am not entirely sure about this, but I would increase the default value of k
a bit, perhaps to 4. Looking at our example geochemical dataset, it would make sense to me that k
should be higher than 2, although I don't know how representative our example dataset is. Of course the users can increase k
as they wish, if they know what they're doing..
"""Execute Local Moran's I calculation for the area. | ||
|
||
Args: | ||
gdf: The geodataframe that contains the area to be examined with local morans I. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit nitpicky comment, but I would change the wording here from "area" to "data".
"""Execute Local Moran's I calculation for the area. | |
Args: | |
gdf: The geodataframe that contains the area to be examined with local morans I. | |
"""Calculate Local Moran's I statistics for input data. | |
Args: | |
gdf: Geodataframe containing the input data. |
Geodataframe containing the calculations. | ||
|
||
Raises: | ||
EmptyDataFrameException if input geodataframe is empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EmptyDataFrameException if input geodataframe is empty. | |
EmptyDataFrameException: The input geodataframe is empty. |
column: The column to be used in the analysis. | ||
weight_type: The type of spatial weights matrix to be used. Defaults to "queen". | ||
k: Number of nearest neighbors for the KNN weights matrix. Defaults to 2. | ||
permutations: Number of permutations for significance testing. Defaults to 999. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add checks for the column
, k
and permutations
inputs. So check that column
is in the input geodataframe, k
is at least 1, and permutations
is at least 100 (technically, it only needs to be a positive number, but I think we could set the limit somewhere around 100. smaller values probably lead to too inaccurate p-values)
moran_loc = Moran_Local(gdf[column], w, permutations=permutations) | ||
|
||
gdf[f"{column}_local_moran_I"] = moran_loc.Is | ||
gdf[f"{column}_p_value"] = moran_loc.p_sim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this column could be f"{column}_local_moran_I_p_value"
or f"{column}_local_moran_I_p"
, so that it is clear which statistic's p-value it is.
if len(gdf[column]) != len(w.weights): | ||
raise ValueError("Dimension mismatch between data and weights matrix.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this mismatch happen? Anyway, change the exception to some eis_toolkit
exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall correctly, we haven't made unit tests so far like they have been implemented here. Meaning that the target values are calculated during the test using the underlying library functions. You know more about testing @nialov , could you comment this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good that you added a notebook for example! However, I think we could use here our geochemical example data that is now in GitHub in the master branch. I think it's preferable to demonstrate using our geological data if convenient and possible, and I believe one of the primary uses for this Local Moran's I might be to inspect spatial autocorrelation of point geochemical data.
So the dataset is called "IOCG_CLB_Till_Geochem_reg_511p.gpkg" and should be found directly under the remote folder. For the target column, any of the mineral columns will probably do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The notebook's data is "geographical" as I was not comfortable with using example data I do not fully understand. Hence, I used GDP of African countries to demonstrate the correctness of the Local Moran's I.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's absolutely fine. I have also used similar example datasets for some functions. This improvement just came to my mind, since we added the geochem data recently to master and I have personally used Local Moran's I for similar data exploration in my previous job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in the context of EIS it would be better to use the geochem data. I just do not know what columns (maybe any) and which parameters for weight_type
and k
are feasible for this analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I think you could use for example "Li_ppm_551", so concentration of Lithium.
…ng/eis_toolkit into 287-Add-Local-Morans-I
…ng/eis_toolkit into 287-Add-Local-Morans-I
I guess you have done all the updates now? Looks good to me otherwise, but I still think it could be better to compare results with static values rather than use the underlying library within the test. @nialov do you have opinions regarding this? |
@tomiturunen1 are you still working on this? If not, let me know and I can do the test adjustments and we'll get to merge this. |
Static values yes! The current implementation after the recent changes looks about right. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep looks fine now, merging!
Added Local Moran's I function