[DataFrame] Inconsistent null handling in DataFrameColumn arithmetic #5650

Closed
rhysparry opened this issue Jun 18, 2020 · 1 comment · Fixed by #6770
Labels
Microsoft.Data.Analysis All DataFrame related issues and PRs

Comments

@rhysparry
Contributor

Suppose that I have the following DataFrame:

var df = new DataFrame(
    new PrimitiveDataFrameColumn<int>("Foo", 10),
    new PrimitiveDataFrameColumn<int>("Bar", Enumerable.Range(1, 10))
);

When performing mathematical operations where either operand is null, I would expect the null to be propagated to the resulting column.

Indeed, that is what happens when the null value is the left-hand operand. E.g.

df.Columns["Foo"] + df.Columns["Bar"]

Here the result is a column of nulls, but if we reverse the operands:

df.Columns["Bar"] + df.Columns["Foo"]

The nulls in the Foo column are effectively treated as 0.
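One way to observe the asymmetry is to compare the null counts of the two results. The following is a minimal sketch, assuming the Microsoft.Data.Analysis package is referenced and that the resulting columns expose a NullCount property; the variable names are illustrative only, and the exact counts depend on the library version in use.

```csharp
using System;
using System.Linq;
using Microsoft.Data.Analysis;

// Same frame as in the report: "Foo" has 10 rows that are all null,
// "Bar" holds the values 1..10 with no nulls.
var df = new DataFrame(
    new PrimitiveDataFrameColumn<int>("Foo", 10),
    new PrimitiveDataFrameColumn<int>("Bar", Enumerable.Range(1, 10))
);

// Null column on the left-hand side.
DataFrameColumn nullOnLeft = df.Columns["Foo"] + df.Columns["Bar"];

// Null column on the right-hand side.
DataFrameColumn nullOnRight = df.Columns["Bar"] + df.Columns["Foo"];

// If nulls propagate consistently, both counts should match.
Console.WriteLine($"Foo + Bar null count: {nullOnLeft.NullCount}");
Console.WriteLine($"Bar + Foo null count: {nullOnRight.NullCount}");
```

On an affected version, the two counts would be expected to differ, which is the inconsistency described above.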

It looks like this occurs because the Arithmetic classes operate on the underlying value buffers, which don't keep track of null values (nulls appear to be tracked separately, in a NullBitMapBuffers property on the container).

@pgovind pgovind self-assigned this Jun 18, 2020
@pgovind
Contributor

pgovind commented Jun 20, 2020

Yup, you're right. I repro'd it with "Foo" having a null value and "Bar" having no null values. This is a bug in all our arithmetic methods. In the first case, we clone Foo first, so the null information gets carried over to the result correctly. In the reversed case, we clone Bar, but since it doesn't have any nulls, the null information from Foo doesn't make it through. The fix is to set the right bits in result.NullBitMapBuffers.
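The mechanism described above can be illustrated with a small stand-alone model. This is only a sketch and not the library's actual code: Col, NullModel, AddBuggy, and AddFixed are hypothetical names, and a bool array stands in for the real null bitmap. It assumes the result is built by cloning the left operand and adding the raw value buffers.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for a column: a value buffer plus a separate validity mask.
public sealed class Col
{
    public int[] Values;
    public bool[] Valid;   // true = value present, false = null

    public Col(int[] values, bool[] valid) { Values = values; Valid = valid; }

    public Col Clone() => new Col((int[])Values.Clone(), (bool[])Valid.Clone());

    public int NullCount => Valid.Count(v => !v);
}

public static class NullModel
{
    // Models the reported behavior: clone the left operand, add the raw buffers,
    // and never consult the right operand's validity mask.
    public static Col AddBuggy(Col left, Col right)
    {
        Col result = left.Clone();
        for (int i = 0; i < result.Values.Length; i++)
            result.Values[i] += right.Values[i];
        return result;
    }

    // Models the proposed fix: a slot is valid only if both operands are valid.
    public static Col AddFixed(Col left, Col right)
    {
        Col result = AddBuggy(left, right);
        for (int i = 0; i < result.Valid.Length; i++)
            result.Valid[i] = left.Valid[i] && right.Valid[i];
        return result;
    }

    public static void Main()
    {
        var foo = new Col(new int[3], new bool[3]);  // all null; buffer holds zeros
        var bar = new Col(new[] { 1, 2, 3 }, new[] { true, true, true });

        Console.WriteLine(AddBuggy(foo, bar).NullCount);  // 3: nulls come from the cloned left operand
        Console.WriteLine(AddBuggy(bar, foo).NullCount);  // 0: right-hand nulls are lost
        Console.WriteLine(AddFixed(bar, foo).NullCount);  // 3: masks are combined
    }
}
```

In this model the null slots of the right operand hold zeros in the value buffer, which is why they behave like 0 when only the buffers are added.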

@pgovind pgovind transferred this issue from dotnet/corefxlab Mar 6, 2021
@pgovind pgovind added the Microsoft.Data.Analysis All DataFrame related issues and PRs label Mar 6, 2021
@ghost ghost added the in-pr label Jul 20, 2023
@ghost ghost removed the in-pr label Aug 31, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Sep 30, 2023