Deterministic Algorithms #194

ctrl-z-9000-times · 2019-01-04T00:09:43Z

NuPIC should be deterministic, and should yield precisely the same results across all supported platforms. When changes to the source code cause the output to change in any way, we should know about it.

Currently the unit tests do not systemically check this. The solution to this problem should contain hard coded results which include details about the inner workings of the algorithms. The solution should also be easy to update when NuPIC's output is intentionally changed.

EDIT: Progress:

I'd like to add 3 more of these tests.

For TM, sdr classifier, and scalar encoder. Id like to make one big test for test coverage of all of these components.
For benchmark hotgym -- implemented in TM: cellsToColumns #397
- TM output test failing on Windows (ok on other platforms) -> fixed in Deterministic Algorithms #465
For MNIST SP
SP -- fixed in SP Exact Output Test #197

ctrl-z-9000-times · 2019-01-04T00:12:30Z

SpatialPooler test started in PR #164

ctrl-z-9000-times · 2019-01-09T17:59:48Z

I'd like to add 3 more of these tests.

For TM, sdr classifier, and scalar encoder. Id like to make one big test for test coverage of all of these components.
For benchmark hotgym
For MNIST SP

dkeeney · 2019-01-09T18:41:29Z

I totally agree with this. Actually we could use better unit test coverage as well on nearly all modules. I think is it not too hard to break something and our unit tests do not tell us.

breznak · 2019-02-27T12:35:57Z

For TM, sdr classifier, and scalar encoder. Id like to make one big test for test coverage of all of these components.

can this be part of the Hotgym, MNIST tests? As those use all of the said components, so it's just an issue of recording "gold standard" at some iteration and comparing that.

I totally agree with this. Actually we could use better unit test coverage as well on nearly all modules. I think is it not too hard to break something and our unit tests do not tell us.

Better unittests, def yes. But careful with the gold standard, there will be PRs that legitimely break the "gold" (improved code, even new param, ...). But it'll be nice to add a flag "breaks_gold" and have all of these recorded thanks to the checks.

breznak · 2019-04-23T15:23:47Z

TM output test failing on (and only) Windows CI, see #397 (comment)

breznak · 2019-04-24T17:19:54Z

Btw, @dkeeney , when you get back, do you have idea why this would be failing on Windows? #397 (comment)

dkeeney · 2019-04-27T13:10:49Z

@breznak I will be out again for this weekend but I have not forgotten that I need to check into this problem.

dkeeney · 2019-05-01T12:48:49Z

"Deterministic output of TM failed!"

The test is looking for a TM output of {26, 75} and it is getting {26, 75, 525} on Windows.
Debug mode and Release mode on Windows both produce the same output if given the same iteration count.

I displayed the last 100 outputs using Linux (Ubuntu) and Windows (MSVC). When you put them in separate files and diff them, there is a significant difference in the outputs.
I don't know what is causing it yet but I suspect there is some difference in floating point bit resolution or how something is truncated. Hard to tell at this point.

I need to reduce the test size to isolate the point where the two results diverge.

=============================================
Windows: Last 100 outputs:

e=4899 out: SDR( 2048 ) 102, 701, 1044, 1116, 1453, 1469
e=4900 out: SDR( 2048 ) 668, 701, 1044, 1116
e=4901 out: SDR( 2048 ) 102, 668, 1458
e=4902 out: SDR( 2048 ) 450, 1453, 1469
e=4903 out: SDR( 2048 ) 450, 1044, 1128, 1447, 1453, 1469
e=4904 out: SDR( 2048 ) 450, 1453
e=4905 out: SDR( 2048 ) 254, 450, 728, 1044, 1447, 1453, 1458, 1469, 1477, 1870
e=4906 out: SDR( 2048 ) 450, 1447, 1453
e=4907 out: SDR( 2048 ) 15, 254, 1069, 1447, 1870
e=4908 out: SDR( 2048 ) 15, 254, 1069
e=4909 out: SDR( 2048 ) 15, 254, 1069, 1447, 1477
e=4910 out: SDR( 2048 )
e=4911 out: SDR( 2048 ) 15, 1710, 1720, 2039
e=4912 out: SDR( 2048 ) 595, 1720, 1953
e=4913 out: SDR( 2048 ) 15, 1199, 1710, 1720, 2001, 2039
e=4914 out: SDR( 2048 ) 1635, 1953
e=4915 out: SDR( 2048 ) 327, 421, 1199, 1935, 2001
e=4916 out: SDR( 2048 ) 321, 456
e=4917 out: SDR( 2048 )
e=4918 out: SDR( 2048 ) 418, 490, 1069
e=4919 out: SDR( 2048 ) 418, 490, 1044, 1069, 1766
e=4920 out: SDR( 2048 )
e=4921 out: SDR( 2048 ) 39, 490, 1038, 1044, 1058, 1069
e=4922 out: SDR( 2048 ) 701
e=4923 out: SDR( 2048 )
e=4924 out: SDR( 2048 ) 194, 236, 701, 1710, 1724, 1940, 1960, 1980
e=4925 out: SDR( 2048 ) 1960
e=4926 out: SDR( 2048 ) 236, 701, 1710, 1940, 1960
e=4927 out: SDR( 2048 ) 1960
e=4928 out: SDR( 2048 ) 913, 923
e=4929 out: SDR( 2048 ) 335, 532, 795, 923, 926, 1239, 1670
e=4930 out: SDR( 2048 )
e=4931 out: SDR( 2048 ) 335, 532, 795, 1239
e=4932 out: SDR( 2048 ) 1038, 1044, 1048, 1064
e=4933 out: SDR( 2048 )
e=4934 out: SDR( 2048 ) 1038, 1044, 1048, 1064
e=4935 out: SDR( 2048 )
e=4936 out: SDR( 2048 ) 912, 1038, 1044, 1048, 1064
e=4937 out: SDR( 2048 ) 912, 1038
e=4938 out: SDR( 2048 )
e=4939 out: SDR( 2048 ) 1038, 1044, 1048, 1670, 1805, 1812, 1813, 1815, 1922
e=4940 out: SDR( 2048 ) 701
e=4941 out: SDR( 2048 )
e=4942 out: SDR( 2048 ) 194, 216, 1980
e=4943 out: SDR( 2048 ) 174, 561, 1980
e=4944 out: SDR( 2048 ) 1772, 1773, 2026
e=4945 out: SDR( 2048 ) 1772, 1773, 2026
e=4946 out: SDR( 2048 ) 7, 13, 1315, 1796
e=4947 out: SDR( 2048 ) 7, 13, 15, 17, 72, 1252, 1259, 1315, 1796, 1877
e=4948 out: SDR( 2048 ) 13, 17, 39, 1252, 1315, 1877
e=4949 out: SDR( 2048 )
e=4950 out: SDR( 2048 )
e=4951 out: SDR( 2048 )
e=4952 out: SDR( 2048 ) 160, 790, 859, 1867
e=4953 out: SDR( 2048 ) 160, 859, 982
e=4954 out: SDR( 2048 )
e=4955 out: SDR( 2048 ) 982, 1700, 1738, 1856
e=4956 out: SDR( 2048 ) 1700
e=4957 out: SDR( 2048 ) 961, 1700, 1738
e=4958 out: SDR( 2048 ) 230, 1107, 1685
e=4959 out: SDR( 2048 ) 124, 1107, 1946, 1976
e=4960 out: SDR( 2048 )
e=4961 out: SDR( 2048 ) 468
e=4962 out: SDR( 2048 ) 468, 923, 925, 1199, 1442, 1647, 1648, 1670
e=4963 out: SDR( 2048 )
e=4964 out: SDR( 2048 ) 879, 923, 1199, 1418, 1442
e=4965 out: SDR( 2048 ) 879, 1418, 1442
e=4966 out: SDR( 2048 )
e=4967 out: SDR( 2048 ) 348, 370, 1172, 1201, 1232, 1418
e=4968 out: SDR( 2048 )
e=4969 out: SDR( 2048 )
e=4970 out: SDR( 2048 ) 361, 733, 1257
e=4971 out: SDR( 2048 ) 361
e=4972 out: SDR( 2048 )
e=4973 out: SDR( 2048 )
e=4974 out: SDR( 2048 )
e=4975 out: SDR( 2048 ) 532, 560, 1471, 1486, 1786
e=4976 out: SDR( 2048 )
e=4977 out: SDR( 2048 ) 532, 560, 1366, 1486, 1786
e=4978 out: SDR( 2048 )
e=4979 out: SDR( 2048 ) 1085, 1366, 1591
e=4980 out: SDR( 2048 )
e=4981 out: SDR( 2048 )
e=4982 out: SDR( 2048 ) 1619, 1678
e=4983 out: SDR( 2048 )
e=4984 out: SDR( 2048 ) 963, 967, 1619, 2018
e=4985 out: SDR( 2048 ) 148, 167, 176
e=4986 out: SDR( 2048 )
e=4987 out: SDR( 2048 ) 144, 148, 167, 168, 174, 176, 197, 913, 1795
e=4988 out: SDR( 2048 )
e=4989 out: SDR( 2048 ) 1001
e=4990 out: SDR( 2048 ) 980
e=4991 out: SDR( 2048 ) 975, 980
e=4992 out: SDR( 2048 ) 975, 1813, 1815
e=4993 out: SDR( 2048 ) 975, 980
e=4994 out: SDR( 2048 )
e=4995 out: SDR( 2048 )
e=4996 out: SDR( 2048 )
e=4997 out: SDR( 2048 ) 75, 128
e=4998 out: SDR( 2048 ) 75, 128, 912
e=4999 out: SDR( 2048 ) 26, 75, 525

=============================================
Linux: Last 200 outputs:

e=4899 out: SDR( 2048 ) 102, 450, 1044, 1116, 1453, 1458, 1493, 1824
e=4900 out: SDR( 2048 )
e=4901 out: SDR( 2048 ) 102, 668, 701, 1044, 1116, 1453, 1469, 1824
e=4902 out: SDR( 2048 ) 1453
e=4903 out: SDR( 2048 ) 450, 1044, 1128, 1453, 1469
e=4904 out: SDR( 2048 ) 1453, 1469
e=4905 out: SDR( 2048 ) 254, 450, 728, 1044, 1116, 1447, 1453, 1458, 1469, 1477, 1493, 1870
e=4906 out: SDR( 2048 )
e=4907 out: SDR( 2048 ) 254, 1069, 1447, 1870
e=4908 out: SDR( 2048 ) 15, 254, 1069, 1447
e=4909 out: SDR( 2048 ) 254, 1069, 1447
e=4910 out: SDR( 2048 ) 1710, 1720
e=4911 out: SDR( 2048 )
e=4912 out: SDR( 2048 ) 15, 1710, 1720, 2013, 2039
e=4913 out: SDR( 2048 ) 1199, 2001
e=4914 out: SDR( 2048 ) 327, 1635, 1953
e=4915 out: SDR( 2048 )
e=4916 out: SDR( 2048 ) 321, 327
e=4917 out: SDR( 2048 )
e=4918 out: SDR( 2048 ) 418, 490, 1058, 1069
e=4919 out: SDR( 2048 ) 1766
e=4920 out: SDR( 2048 ) 39, 418, 490, 1049, 1058, 1068, 1069
e=4921 out: SDR( 2048 ) 1038
e=4922 out: SDR( 2048 ) 701
e=4923 out: SDR( 2048 ) 1710
e=4924 out: SDR( 2048 ) 194, 236, 701, 1710, 1724, 1940
e=4925 out: SDR( 2048 ) 1751, 1946, 1960
e=4926 out: SDR( 2048 ) 701, 1710, 1940, 1960
e=4927 out: SDR( 2048 ) 926
e=4928 out: SDR( 2048 ) 913, 923
e=4929 out: SDR( 2048 ) 335, 532, 795, 923, 926, 1239, 1670
e=4930 out: SDR( 2048 )
e=4931 out: SDR( 2048 ) 335, 532, 795, 921, 1239
e=4932 out: SDR( 2048 ) 1038, 1048, 1064
e=4933 out: SDR( 2048 )
e=4934 out: SDR( 2048 ) 1038, 1048, 1064, 1068, 1745
e=4935 out: SDR( 2048 )
e=4936 out: SDR( 2048 ) 1038, 1044, 1048
e=4937 out: SDR( 2048 ) 912, 1044
e=4938 out: SDR( 2048 )
e=4939 out: SDR( 2048 ) 1038, 1044, 1048, 1670, 1796, 1805, 1812, 1813, 1922
e=4940 out: SDR( 2048 ) 701
e=4941 out: SDR( 2048 )
e=4942 out: SDR( 2048 ) 194, 216, 1980
e=4943 out: SDR( 2048 ) 561, 1980
e=4944 out: SDR( 2048 ) 1772, 1773, 1796, 2026
e=4945 out: SDR( 2048 ) 1796, 2026
e=4946 out: SDR( 2048 ) 7, 13, 17, 1315, 1796
e=4947 out: SDR( 2048 ) 7, 13, 15, 17, 72, 1252, 1259, 1315, 1796, 1877
e=4948 out: SDR( 2048 ) 13, 17, 39, 1252, 1877
e=4949 out: SDR( 2048 )
e=4950 out: SDR( 2048 )
e=4951 out: SDR( 2048 )
e=4952 out: SDR( 2048 ) 160, 859, 1867
e=4953 out: SDR( 2048 ) 160, 859, 982, 1867
e=4954 out: SDR( 2048 )
e=4955 out: SDR( 2048 ) 982, 1700, 1738
e=4956 out: SDR( 2048 ) 961, 1700
e=4957 out: SDR( 2048 ) 961
e=4958 out: SDR( 2048 ) 230, 1107, 1685
e=4959 out: SDR( 2048 ) 1107, 1976
e=4960 out: SDR( 2048 ) 124, 1976
e=4961 out: SDR( 2048 ) 1648
e=4962 out: SDR( 2048 ) 468, 923, 925, 1199, 1442, 1670
e=4963 out: SDR( 2048 )
e=4964 out: SDR( 2048 ) 879, 923, 1199, 1418, 1442, 1458
e=4965 out: SDR( 2048 ) 879, 1442, 1458
e=4966 out: SDR( 2048 )
e=4967 out: SDR( 2048 ) 348, 370, 1201, 1232
e=4968 out: SDR( 2048 )
e=4969 out: SDR( 2048 )
e=4970 out: SDR( 2048 ) 361, 733, 1257
e=4971 out: SDR( 2048 ) 361
e=4972 out: SDR( 2048 )
e=4973 out: SDR( 2048 ) 42
e=4974 out: SDR( 2048 )
e=4975 out: SDR( 2048 ) 532, 1467, 1471, 1486, 1786
e=4976 out: SDR( 2048 )
e=4977 out: SDR( 2048 ) 532, 560, 1366, 1486, 1786
e=4978 out: SDR( 2048 )
e=4979 out: SDR( 2048 ) 1085, 1366, 1591
e=4980 out: SDR( 2048 )
e=4981 out: SDR( 2048 )
e=4982 out: SDR( 2048 ) 1619, 1670, 1678, 2018
e=4983 out: SDR( 2048 )
e=4984 out: SDR( 2048 ) 963, 967, 1619, 1678, 2018
e=4985 out: SDR( 2048 ) 148, 167
e=4986 out: SDR( 2048 )
e=4987 out: SDR( 2048 ) 144, 148, 157, 164, 167, 168, 176, 197, 913, 1795
e=4988 out: SDR( 2048 )
e=4989 out: SDR( 2048 ) 1001
e=4990 out: SDR( 2048 ) 975, 980
e=4991 out: SDR( 2048 )
e=4992 out: SDR( 2048 ) 80, 975, 1805, 1813
e=4993 out: SDR( 2048 )
e=4994 out: SDR( 2048 )
e=4995 out: SDR( 2048 )
e=4996 out: SDR( 2048 ) 42
e=4997 out: SDR( 2048 ) 75, 128
e=4998 out: SDR( 2048 ) 75, 305, 912
e=4999 out: SDR( 2048 ) 26, 75

breznak · 2019-05-01T13:56:00Z

I don't know what is causing it yet but I suspect there is some difference in floating point bit resolution or how something is truncated. Hard to tell at this point.

yes, I'd be suspecting: Random, float rounding, tie-breakers in TM?

dkeeney · 2019-05-01T16:17:20Z

I confirmed that all SP outputs are the same. So it is someplace in TM.
The TM outputs are not different until iteration 2424.

So now I have to go inside the algorithm and see what is different at iteration 2424.

dkeeney · 2019-05-02T00:09:12Z

I think I found the problem --- or at least a problem.
TemporalMemory.cpp, line 284 in growSynapses( ) .

  const size_t overrun = (connections.numSynapses(segment) + nActual - maxSynapsesPerSegment);
  if (overrun > 0) {
    destroyMinPermanenceSynapses(connections, rng, segment, static_cast<Int>(overrun), prevWinnerCells);
  }

overrun is unsigned long so the subtraction assigns a negative number which is interpreted as greater than 0. destroyMinPermanencesSynapses( ) is always called. But because it is cast to int, inside the function it is a negative number and nothing actually gets deleted. But its doing a lot of work that it does not need to do.

I changed overrun to be type Int. That stopped the unnecessary calls to destroyMinPermanencesSynapses( ) but we still diverge in the same place.

During iteration 2423 the second call to growSynapses( ); the random numbers are still matching but the candidates returned have different values. See for loop at line 293.

breznak · 2019-05-03T07:21:24Z

@dkeeney

I changed overrun to be type Int. That stopped the unnecessary calls to destroyMinPermanencesSynapses( ) but we still diverge in the same place.

can you please submit a small PR with said fix?

add to the top of destroyMinPermanenceSynapses(connections :

if(nDestroy < 1) return;

dkeeney · 2019-05-04T23:12:16Z

I am still hunting this down.
Windows and Linux deviate at the end of iteration 2422
While processing column 908, In ActivateCells( ) The Linux code calls BurstColumn( ) three times. Windows code calls BurstColumn( ) four times. That last time finds cell 7269 as the winner and it creates an extra segment.

I have not figured out what causes it....

dkeeney · 2019-05-11T02:59:56Z

Some progress.
In SpatialPooler.cpp, line 875 in inhibitColumnsGlobal_( )

  // Sort the columns by the amount of overlap.  First make a list of all of the
  // column indexes.
  activeColumns.reserve(numColumns_);
  for(UInt i = 0; i < numColumns_; i++)
    activeColumns.push_back(i);
  // Compare the column indexes by their overlap.
  auto compare = [&overlaps_](const UInt &a, const UInt &b) -> bool
    {return overlaps_[a] > overlaps_[b];};
  // Do a partial sort to divide the winners from the losers.  This sort is
  // faster than a regular sort because it stops after it partitions the
  // elements about the Nth element, with all elements on their correct side of
  // the Nth element.
  std::nth_element(
    activeColumns.begin(),
    activeColumns.begin() + numDesired,
    activeColumns.end(),
    compare);
  // Remove the columns which lost the competition.
  activeColumns.resize(numDesired);
  // Finish sorting the winner columns by their overlap.
  std::sort(activeColumns.begin(), activeColumns.end(), compare);
  // Remove sub-threshold winners
  while( !activeColumns.empty() &&
         overlaps[activeColumns.back()] < stimulusThreshold_)
      activeColumns.pop_back();

The output of this section on EPOCH 2422,
Linux: activeColumns out: 1038 1064 912 34 1095 1044 1449 40 1332 1051
Windows: activeColumns out: 1038 1064 912 34 1095 1044 1449 40 1332 908
Note the last element.

Just prior to this, the output of the nth_element( ) partial sort has the elements partitioned correctly but the exact order between Linux and Windows is slightly different. So when we truncate we just happen to get a different value for the 10th element. That difference propagates and eventually everything is different between Linux and Windows.

The Problem:
So, I think the problem is that the compare function does not uniquely specify the order. Items that compare equal can be in any order.

The Fix:

  // Compare the column indexes by their overlap.
  auto compare = [&overlaps_](const UInt &a, const UInt &b) -> bool
    {return overlaps_[a] > overlaps_[b];};

becomes

  // Compare the column indexes by their overlap.
  auto compare = [&overlaps_](const UInt &a, const UInt &b) -> bool
    {return (overlaps_[a] == overlaps_[b]) ? a > b : overlaps_[a] > overlaps_[b];};

breznak · 2019-05-12T14:04:19Z

Big KUDOs @dkeeney ! This has been really a huge and lengthy investigation. Great job! 💯

breznak · 2019-05-13T17:39:40Z

If we add Classifier (+Predictor) to hotgym, would you consider this issue resolved? With hotgym being the single place to run tests of all (most) components and check that the outputs are deterministic.

ctrl-z-9000-times · 2019-05-13T19:33:40Z

If you add determinism tests for the predictor then yes, i would close this issue.

breznak · 2020-06-02T12:31:36Z

ARM platform on muslC is not "deterministic.
TODO:

predictor
classifier ...add to HelloSPTP example cpp

ctrl-z-9000-times mentioned this issue Jan 5, 2019

SP Exact Output Test #197

Merged

breznak added enhancement New feature or request SP TM community tests platform labels Jan 14, 2019

breznak mentioned this issue Jan 17, 2019

Spatial Pooler Local Inhibition Rewrite - WIP #209

Open

7 tasks

breznak mentioned this issue Jan 31, 2019

Valgrind #238

Merged

6 tasks

breznak mentioned this issue Feb 28, 2019

Topological inhibition for SpatialPooler #287

Open

ctrl-z-9000-times added the newbie label Mar 16, 2019

breznak mentioned this issue Apr 17, 2019

TM: cellsToColumns #397

Merged

4 tasks

dkeeney self-assigned this Apr 27, 2019

This was referenced May 5, 2019

New method Connections::destroyMinPermanenceSynapses #446

Merged

Real-life benchmark: Hotgym example using C++ algorithms #30

Open

dkeeney mentioned this issue May 6, 2019

TMRegion new outputs: anomaly & predictiveCells #451

Merged

2 tasks

breznak mentioned this issue May 7, 2019

Hotgym: validate that avg anomaly improves #450

Merged

3 tasks

dkeeney mentioned this issue May 11, 2019

Deterministic Algorithms #465

Merged

breznak mentioned this issue Jul 10, 2019

SP replacec bumpUpWeakColumns with dendritic death/creation #551

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deterministic Algorithms #194

Deterministic Algorithms #194

ctrl-z-9000-times commented Jan 4, 2019 •

edited by breznak

Loading

ctrl-z-9000-times commented Jan 4, 2019

ctrl-z-9000-times commented Jan 9, 2019

dkeeney commented Jan 9, 2019

breznak commented Feb 27, 2019

breznak commented Apr 23, 2019

breznak commented Apr 24, 2019

dkeeney commented Apr 27, 2019

dkeeney commented May 1, 2019

breznak commented May 1, 2019

dkeeney commented May 1, 2019

dkeeney commented May 2, 2019

breznak commented May 3, 2019

dkeeney commented May 4, 2019

dkeeney commented May 11, 2019

breznak commented May 12, 2019

breznak commented May 13, 2019

ctrl-z-9000-times commented May 13, 2019

breznak commented Jun 2, 2020

Deterministic Algorithms #194

Deterministic Algorithms #194

Comments

ctrl-z-9000-times commented Jan 4, 2019 • edited by breznak Loading

ctrl-z-9000-times commented Jan 4, 2019

ctrl-z-9000-times commented Jan 9, 2019

dkeeney commented Jan 9, 2019

breznak commented Feb 27, 2019

breznak commented Apr 23, 2019

breznak commented Apr 24, 2019

dkeeney commented Apr 27, 2019

dkeeney commented May 1, 2019

============================================= Windows: Last 100 outputs:

============================================= Linux: Last 200 outputs:

breznak commented May 1, 2019

dkeeney commented May 1, 2019

dkeeney commented May 2, 2019

breznak commented May 3, 2019

dkeeney commented May 4, 2019

dkeeney commented May 11, 2019

breznak commented May 12, 2019

breznak commented May 13, 2019

ctrl-z-9000-times commented May 13, 2019

breznak commented Jun 2, 2020

ctrl-z-9000-times commented Jan 4, 2019 •

edited by breznak

Loading

=============================================
Windows: Last 100 outputs:

=============================================
Linux: Last 200 outputs: