[RF] RooCurve::Average() overestimating average values of standard precision curves on small intervals #9838

MZBarel · 2022-02-08T16:10:58Z

Describe the bug

When averaging a RooCurve generated with standard precision over intervals smaller than the distance between two subsequent points, the procedure for moving the xFirstPt and xLastPt back inside the interval [xFirst,xLast], can move xFirstPt and xLastPt outside of the requested interval. This leads to an increased value of the integral used for averaging, and a subsequently higher average than can be expected based on the interpolated values of the curve at the interval boundaries.

Expected behavior

RooCurve::average(Double_t xFirst, Double_t xLast) returing the average value of the curve in the requested interval

To Reproduce

Example code where this behaviour occurs:
ExampleMacro.zip

#include <iostream>
#include <RooAbsReal.h>
#include <RooRealVar.h>
#include <RooPlot.h>
#include <RooCmdArg.h>
#include <RooGenericPdf.h>



void ExampleMacro()
{
    RooRealVar x("x","x",0,50);
    RooGenericPdf func("func","Test Function","x",x);

    RooPlot* xframe = x.frame();

    func.plotOn(xframe, RooFit::Name("funcCurve"));

    RooCurve* funcCurve = xframe->getCurve("funcCurve");
    Double_t xFirst, xLast;
    std::cout << "\ni, f(i), f(i+0.1), avg[i,i+0.1]\n" << std::endl;

    for (Double_t i = 10; i < 11; i += 0.1)
    {
        Double_t avg = funcCurve->average(i,i+0.1);
        xFirst = funcCurve->interpolate(i, 1e-10);
        xLast = funcCurve->interpolate(i+0.1, 1e-10);
        std::cout << i << ", " << xFirst << ", " << xLast << ", " << avg << std::endl;
    }
}

Run with root -l ExampleMacro.C

Output, marking the higher than expected averages with <--:

root [0] 
Processing ExampleMacro.C...

RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby 
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt

[#1] INFO:NumericIntegration -- RooRealIntegral::init(func_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)

i, f(i), f(i+0.1), avg[i,i+0.1]

10, 0.004, 0.00404, 0.00402
10.1, 0.00404, 0.00408, 0.02456   <--
10.2, 0.00408, 0.00412, 0.0041
10.3, 0.00412, 0.00416, 0.02464   <--
10.4, 0.00416, 0.0042, 0.00418
10.5, 0.0042, 0.00424, 0.00422
10.6, 0.00424, 0.00428, 0.02576   <--
10.7, 0.00428, 0.00432, 0.0043
10.8, 0.00432, 0.00436, 0.02584   <--
10.9, 0.00436, 0.0044, 0.00438
11, 0.0044, 0.00444, 0.00442

Setup

ROOT Version: 6.24/06
Built for linuxx8664gcc
From tags/v6-24-06@v6-24-06
Installed manually within WSL Ubuntu 20.04

Additional context

I first encountered this behaviour when calculating the chi^2 between a fitted RooCurve and data, where the data had bin sizes of 0.1, and the RooCurve was taking steps of 1. The chi^2 value of the RooCurve far exceeded what could be expected based on the fit residuals with respect to the data. Increasing the amount of curve points by increasing the precision of the plotOn() call resolved this issue.

A potential fix for this bug could be to check to see if the xFirstPt and xLastPt used for averaging remain within the averaging interval when shifting them upwards or downwards, leading to inversion of the averaging boundaries. If this occurs taking the interpolated y-value at the violated boundary instead of the shifted value might resolve this issue.
Otherwise warning the user that this is occurring, and recommending increasing the precision used when calling plotOn() to prevent this scenario from occurring to begin with.

The text was updated successfully, but these errors were encountered:

When integrating the discretely-sampled RooCurves, the algorithm implemented in RooCurve::average() was unnecessarily complicated. The existing midpoints were only considered for the trapezoidal rule if they are away from the interval limits with an arbitrary tolerance, which seems like a premature optimization to me. In particular, the logic was not correct if all midpoints were close to the limits without this tolerance, resulting in issue root-project#9838. Instead of making that case work correctly by implementing more code branches, this commit suggests to simply don't do this tolerace check and use all available midpoints for the trapezoidal integration rule. A unit test with the reproducer from root-project#9838 is also implemented. Closes root-project#9838.

When integrating the discretely-sampled RooCurves, the algorithm implemented in RooCurve::average() was unnecessarily complicated. The existing midpoints were only considered for the trapezoidal rule if they are away from the interval limits with an arbitrary tolerance, which seems like a premature optimization to me. In particular, the logic was not correct if all midpoints were close to the limits without this tolerance, resulting in issue #9838. Instead of making that case work correctly by implementing more code branches, this commit suggests to simply don't do this tolerace check and use all available midpoints for the trapezoidal integration rule. A unit test with the reproducer from #9838 is also implemented. Closes #9838.

When integrating the discretely-sampled RooCurves, the algorithm implemented in RooCurve::average() was unnecessarily complicated. The existing midpoints were only considered for the trapezoidal rule if they are away from the interval limits with an arbitrary tolerance, which seems like a premature optimization to me. In particular, the logic was not correct if all midpoints were close to the limits without this tolerance, resulting in issue root-project#9838. Instead of making that case work correctly by implementing more code branches, this commit suggests to simply don't do this tolerace check and use all available midpoints for the trapezoidal integration rule. A unit test with the reproducer from root-project#9838 is also implemented. Closes root-project#9838.

When integrating the discretely-sampled RooCurves, the algorithm implemented in RooCurve::average() was unnecessarily complicated. The existing midpoints were only considered for the trapezoidal rule if they are away from the interval limits with an arbitrary tolerance, which seems like a premature optimization to me. In particular, the logic was not correct if all midpoints were close to the limits without this tolerance, resulting in issue #9838. Instead of making that case work correctly by implementing more code branches, this commit suggests to simply don't do this tolerace check and use all available midpoints for the trapezoidal integration rule. A unit test with the reproducer from #9838 is also implemented. Closes #9838.

When integrating the discretely-sampled RooCurves, the algorithm implemented in RooCurve::average() was unnecessarily complicated. The existing midpoints were only considered for the trapezoidal rule if they are away from the interval limits with an arbitrary tolerance, which seems like a premature optimization to me. In particular, the logic was not correct if all midpoints were close to the limits without this tolerance, resulting in issue root-project#9838. Instead of making that case work correctly by implementing more code branches, this commit suggests to simply don't do this tolerace check and use all available midpoints for the trapezoidal integration rule. A unit test with the reproducer from root-project#9838 is also implemented. Closes root-project#9838.

MZBarel added the bug label Feb 8, 2022

guitargeek self-assigned this Feb 9, 2022

guitargeek added the in:RooFit/RooStats label Feb 9, 2022

guitargeek changed the title ~~RooCurve::Average() overestimating average values of standard precision curves on small intervals~~ [RF] RooCurve::Average() overestimating average values of standard precision curves on small intervals Jun 14, 2022

guitargeek mentioned this issue Jul 24, 2024

[RF] Improve implementation of RooCurve::average() #16099

Merged

guitargeek closed this as completed in #16099 Jul 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RF] RooCurve::Average() overestimating average values of standard precision curves on small intervals #9838

[RF] RooCurve::Average() overestimating average values of standard precision curves on small intervals #9838

MZBarel commented Feb 8, 2022 •

edited by guitargeek

Loading

[RF] RooCurve::Average() overestimating average values of standard precision curves on small intervals #9838

[RF] RooCurve::Average() overestimating average values of standard precision curves on small intervals #9838

Comments

MZBarel commented Feb 8, 2022 • edited by guitargeek Loading

Describe the bug

Expected behavior

To Reproduce

Setup

Additional context

MZBarel commented Feb 8, 2022 •

edited by guitargeek

Loading