Deterministic Algorithms #194
Comments
SpatialPooler test started in PR #164 |
I'd like to add 3 more of these tests.
|
I totally agree with this. Actually, we could use better unit test coverage on nearly all modules. I think it is not too hard to break something without our unit tests telling us. |
Can this be part of the Hotgym, MNIST tests? Those use all of the said components, so it's just a matter of recording a "gold standard" at some iteration and comparing against it.
Better unit tests, definitely yes. But be careful with the gold standard: there will be PRs that legitimately break the "gold" (improved code, even a new param, ...). It would be nice to add a flag "breaks_gold" and have all such changes recorded by the checks. |
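The "gold standard" comparison above could be kept cheap by folding an algorithm's output into a single checksum that is recorded once and compared on every CI run. A minimal sketch (the `checksum` helper and the FNV-1a choice are my own illustration, not part of the codebase):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: fold a list of active cell/column indices into a
// single 64-bit value using FNV-1a, so a whole run can be compared against
// one recorded "gold" number instead of a full output dump.
uint64_t checksum(const std::vector<uint32_t> &activeCells) {
  uint64_t h = 1469598103934665603ull; // FNV-1a offset basis
  for (uint32_t c : activeCells) {
    h ^= c;
    h *= 1099511628211ull; // FNV-1a prime
  }
  return h;
}
```

A test would then assert `checksum(output) == recordedGold`; when a PR legitimately changes the output, only the one recorded constant needs updating.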
TM output test failing on (and only) Windows CI, see #397 (comment) |
Btw, @dkeeney, when you get back, do you have an idea why this would be failing on Windows? #397 (comment) |
@breznak I will be out again for this weekend but I have not forgotten that I need to check into this problem. |
The test is looking for a TM output of {26, 75} and it is getting {26, 75, 525} on Windows. I displayed the last 100 outputs using Linux (Ubuntu) and Windows (MSVC). When you put them in separate files and diff them, there is a significant difference in the outputs. I need to reduce the test size to isolate the point where the two results diverge.
|
Yes, I'd suspect: Random, float rounding, tie-breakers in TM? |
I confirmed that all SP outputs are the same, so it is someplace in TM. Now I have to go inside the algorithm and see what is different at iteration 2424. |
I think I found the problem --- or at least a problem.
I changed overrun to be type … During iteration 2423, in the second call to growSynapses(), the random numbers are still matching but the candidates returned have different values. See the for loop at line 293. |
can you please submit a small PR with said fix?
|
I am still hunting this down. I have not figured out what causes it.... |
Some progress.
The output of this section diverges at EPOCH 2422. Just prior to this, the output of the nth_element() partial sort has the elements partitioned correctly, but the exact order among them differs slightly between Linux and Windows. So when we truncate, we just happen to get a different value for the 10th element. That difference propagates, and eventually everything is different between Linux and Windows.
The Problem:
The Fix:
becomes
|
Big KUDOs @dkeeney! This has really been a huge and lengthy investigation. Great job! 💯 |
If we add Classifier (+Predictor) to hotgym, would you consider this issue resolved? With hotgym being the single place to run tests of all (most) components and check that the outputs are deterministic. |
If you add determinism tests for the predictor then yes, I would close this issue. |
ARM platform on
|
NuPIC should be deterministic, and should yield precisely the same results across all supported platforms. When changes to the source code cause the output to change in any way, we should know about it.
Currently the unit tests do not systematically check this. The solution should contain hard-coded results that capture details about the inner workings of the algorithms, and it should also be easy to update when NuPIC's output is intentionally changed.
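One way to satisfy both requirements above: serialize the algorithm's output to a canonical, human-readable string and compare it against a hard-coded expectation; when the output changes intentionally, the failing assertion shows the new string, which makes the constant easy to update. A minimal sketch (the `serialize` helper is hypothetical, not from the codebase):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Illustrative canonical serialization of a set of active indices, e.g. a
// TM output, so it can be compared against a hard-coded "gold" string.
std::string serialize(const std::vector<unsigned> &active) {
  std::ostringstream os;
  for (size_t i = 0; i < active.size(); ++i)
    os << (i ? "," : "") << active[i];
  return os.str();
}
```

In a gtest-style unit test this would look like `EXPECT_EQ("26,75", serialize(tmOutput));` -- on failure the framework prints both strings, so updating the gold value is a copy-paste.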
EDIT: Progress:
I'd like to add 3 more of these tests.