Splink4: cumulative_comparisons_to_be_scored_from_blocking_rules_chart does not support salted or exploding BRs #2233

Open
RobinL opened this issue Jul 8, 2024 · 0 comments

For example, the following fails when the blocking rule is salted:

import splink.comparison_library as cl
from splink import DuckDBAPI, Linker, SettingsCreator, block_on, splink_datasets
from splink.blocking_analysis import (
    cumulative_comparisons_to_be_scored_from_blocking_rules_chart,
)
from splink.datasets import splink_dataset_labels

db_api = DuckDBAPI()


df = splink_datasets.fake_1000.head(50)


df_1 = df.loc[df.index % 2 == 0]
df_2 = df.loc[df.index % 2 != 0]

cumulative_comparisons_to_be_scored_from_blocking_rules_chart(
    table_or_tables=df,
    blocking_rules=[block_on("first_name", salting_partitions=2)],
    link_type="dedupe_only",
    db_api=db_api,
)


BinderException: Binder Error: Values list "l" does not have a column named "__splink_salt"