Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RF] RooCurve::Average() overestimating average values of standard precision curves on small intervals #9838

Closed
MZBarel opened this issue Feb 8, 2022 · 0 comments · Fixed by #16099

Comments

@MZBarel
Copy link

MZBarel commented Feb 8, 2022

Describe the bug

When averaging a RooCurve generated with standard precision over intervals smaller than the distance between two subsequent points, the procedure for moving the xFirstPt and xLastPt back inside the interval [xFirst,xLast], can move xFirstPt and xLastPt outside of the requested interval. This leads to an increased value of the integral used for averaging, and a subsequently higher average than can be expected based on the interpolated values of the curve at the interval boundaries.

Expected behavior

RooCurve::average(Double_t xFirst, Double_t xLast) returing the average value of the curve in the requested interval

To Reproduce

Example code where this behaviour occurs:
ExampleMacro.zip

#include <iostream>
#include <RooAbsReal.h>
#include <RooRealVar.h>
#include <RooPlot.h>
#include <RooCmdArg.h>
#include <RooGenericPdf.h>



void ExampleMacro()
{
    RooRealVar x("x","x",0,50);
    RooGenericPdf func("func","Test Function","x",x);

    RooPlot* xframe = x.frame();

    func.plotOn(xframe, RooFit::Name("funcCurve"));

    RooCurve* funcCurve = xframe->getCurve("funcCurve");
    Double_t xFirst, xLast;
    std::cout << "\ni, f(i), f(i+0.1), avg[i,i+0.1]\n" << std::endl;

    for (Double_t i = 10; i < 11; i += 0.1)
    {
        Double_t avg = funcCurve->average(i,i+0.1);
        xFirst = funcCurve->interpolate(i, 1e-10);
        xLast = funcCurve->interpolate(i+0.1, 1e-10);
        std::cout << i << ", " << xFirst << ", " << xLast << ", " << avg << std::endl;
    }
}

Run with root -l ExampleMacro.C

Output, marking the higher than expected averages with <--:

root [0] 
Processing ExampleMacro.C...

RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby 
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt

[#1] INFO:NumericIntegration -- RooRealIntegral::init(func_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)

i, f(i), f(i+0.1), avg[i,i+0.1]

10, 0.004, 0.00404, 0.00402
10.1, 0.00404, 0.00408, 0.02456   <--
10.2, 0.00408, 0.00412, 0.0041
10.3, 0.00412, 0.00416, 0.02464   <--
10.4, 0.00416, 0.0042, 0.00418
10.5, 0.0042, 0.00424, 0.00422
10.6, 0.00424, 0.00428, 0.02576   <--
10.7, 0.00428, 0.00432, 0.0043
10.8, 0.00432, 0.00436, 0.02584   <--
10.9, 0.00436, 0.0044, 0.00438
11, 0.0044, 0.00444, 0.00442

Setup

ROOT Version: 6.24/06
Built for linuxx8664gcc
From tags/v6-24-06@v6-24-06
Installed manually within WSL Ubuntu 20.04

Additional context

I first encountered this behaviour when calculating the chi^2 between a fitted RooCurve and data, where the data had bin sizes of 0.1, and the RooCurve was taking steps of 1. The chi^2 value of the RooCurve far exceeded what could be expected based on the fit residuals with respect to the data. Increasing the amount of curve points by increasing the precision of the plotOn() call resolved this issue.

A potential fix for this bug could be to check to see if the xFirstPt and xLastPt used for averaging remain within the averaging interval when shifting them upwards or downwards, leading to inversion of the averaging boundaries. If this occurs taking the interpolated y-value at the violated boundary instead of the shifted value might resolve this issue.
Otherwise warning the user that this is occurring, and recommending increasing the precision used when calling plotOn() to prevent this scenario from occurring to begin with.

@MZBarel MZBarel added the bug label Feb 8, 2022
@guitargeek guitargeek self-assigned this Feb 9, 2022
@guitargeek guitargeek changed the title RooCurve::Average() overestimating average values of standard precision curves on small intervals [RF] RooCurve::Average() overestimating average values of standard precision curves on small intervals Jun 14, 2022
guitargeek added a commit to guitargeek/root that referenced this issue Jul 24, 2024
When integrating the discretely-sampled RooCurves, the algorithm
implemented in RooCurve::average() was unnecessarily complicated.

The existing midpoints were only considered for the trapezoidal rule if
they are away from the interval limits with an arbitrary tolerance,
which seems like a premature optimization to me.

In particular, the logic was not correct if all midpoints were close to
the limits without this tolerance, resulting in issue root-project#9838.

Instead of making that case work correctly by implementing more code
branches, this commit suggests to simply don't do this tolerace check
and use all available midpoints for the trapezoidal integration rule.

A unit test with the reproducer from root-project#9838 is also implemented.

Closes root-project#9838.
guitargeek added a commit that referenced this issue Jul 24, 2024
When integrating the discretely-sampled RooCurves, the algorithm
implemented in RooCurve::average() was unnecessarily complicated.

The existing midpoints were only considered for the trapezoidal rule if
they are away from the interval limits with an arbitrary tolerance,
which seems like a premature optimization to me.

In particular, the logic was not correct if all midpoints were close to
the limits without this tolerance, resulting in issue #9838.

Instead of making that case work correctly by implementing more code
branches, this commit suggests to simply don't do this tolerace check
and use all available midpoints for the trapezoidal integration rule.

A unit test with the reproducer from #9838 is also implemented.

Closes #9838.
guitargeek added a commit to guitargeek/root that referenced this issue Jul 24, 2024
When integrating the discretely-sampled RooCurves, the algorithm
implemented in RooCurve::average() was unnecessarily complicated.

The existing midpoints were only considered for the trapezoidal rule if
they are away from the interval limits with an arbitrary tolerance,
which seems like a premature optimization to me.

In particular, the logic was not correct if all midpoints were close to
the limits without this tolerance, resulting in issue root-project#9838.

Instead of making that case work correctly by implementing more code
branches, this commit suggests to simply don't do this tolerace check
and use all available midpoints for the trapezoidal integration rule.

A unit test with the reproducer from root-project#9838 is also implemented.

Closes root-project#9838.
guitargeek added a commit that referenced this issue Jul 24, 2024
When integrating the discretely-sampled RooCurves, the algorithm
implemented in RooCurve::average() was unnecessarily complicated.

The existing midpoints were only considered for the trapezoidal rule if
they are away from the interval limits with an arbitrary tolerance,
which seems like a premature optimization to me.

In particular, the logic was not correct if all midpoints were close to
the limits without this tolerance, resulting in issue #9838.

Instead of making that case work correctly by implementing more code
branches, this commit suggests to simply don't do this tolerace check
and use all available midpoints for the trapezoidal integration rule.

A unit test with the reproducer from #9838 is also implemented.

Closes #9838.
silverweed pushed a commit to silverweed/root that referenced this issue Aug 19, 2024
When integrating the discretely-sampled RooCurves, the algorithm
implemented in RooCurve::average() was unnecessarily complicated.

The existing midpoints were only considered for the trapezoidal rule if
they are away from the interval limits with an arbitrary tolerance,
which seems like a premature optimization to me.

In particular, the logic was not correct if all midpoints were close to
the limits without this tolerance, resulting in issue root-project#9838.

Instead of making that case work correctly by implementing more code
branches, this commit suggests to simply don't do this tolerace check
and use all available midpoints for the trapezoidal integration rule.

A unit test with the reproducer from root-project#9838 is also implemented.

Closes root-project#9838.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants