
Performance Comparison to nupic #792

Open
psteinroe opened this issue Apr 13, 2020 · 70 comments
@psteinroe

Hi everyone,

thanks for the great work on htm.core! I am doing some research on leveraging HTM for anomaly detection and I am wondering whether I should use htm.core or nupic. Is there any comparison in terms of performance?

@breznak You described some issues in PR #15. What would you say - continue with htm.core, or rather get nupic running? It's just a master's thesis, so I won't be able to dive deep into htm to improve the implementation...

@dkeeney

dkeeney commented Apr 13, 2020

htm.core is a rework of NuPIC so that it will work. I think you will find that nearly everything you want to do with NuPIC should be covered by htm.core. NuPIC, on the other hand, is basically broken.

@breznak will have to address the performance of htm.core, but I expect it to be comparable to (or perhaps even better than) NuPIC.

@breznak
Member

breznak commented Apr 13, 2020

@steinroe glad for your interest!

I am wondering whether I should use htm.core or nupic

In terms of feature-completeness and active support, htm.core has now far surpassed its parent, Numenta's nupic(.core).

Is there any comparison in terms of performance? [in anomaly detection]

yes, this is an open problem. Theoretically htm.core should be better than, the same as, or slightly worse than Numenta's nupic. We made a couple of improvements, as well as a couple of "regressions" in the name of biological plausibility. You could start by looking at the Changelog.md; the git diff is already too wild.

Yet, the performance so far of htm.core on NAB is much worse. (Note, such regression is not observed on the "sine" or "hotgym" data.) My gut feeling is it's just "some parameter that is off".

I won't be able to dive deep into htm to improve the implementation...

you wouldn't have to dive deep, and we'd be here to help you. So my recommendation would be: give htm.core a trial period (say 1-5 weeks) and try to find where the culprit is. Doing so would help the community and would be a significant result for your thesis. I could help you in tracking down the process in NAB so we can locate the error.

@psteinroe
Author

@breznak and @dkeeney thanks for the quick replies!
I started by analysing the difference in the API and the respective parameters.

Regarding the Spatial Pooler, htm.core differs from nupic in three parameters: localAreaDensity, potentialRadius and stimulusThreshold. I tried some Bayesian Optimization to get an idea of what setting for these 3 might work well, but only got a score of around 31 with {"localAreaDensity": 0.02149599864266627, "potentialRadius": 4148629.7213459704, "stimulusThreshold": 3.640494932522697}. All other settings were the same as in nupic. Do you have a gut feeling on what ranges would make sense? From the logs I would say localAreaDensity should be <0.09 and stimulusThreshold <10, but still the score is very bad. Any idea on that?
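
For reference, this is roughly how such a loop can be set up with the bayesian-optimization package (a sketch only; run_nab_and_get_score is a hypothetical stand-in for writing the params, running the detector on NAB, and reading back the standard score, and the bounds are just the rough ranges mentioned here):

from bayes_opt import BayesianOptimization

def objective(localAreaDensity, potentialRadius, stimulusThreshold):
    # hypothetical helper: run NAB with these params and return the score
    return run_nab_and_get_score({
        "localAreaDensity": localAreaDensity,
        "potentialRadius": int(potentialRadius),
        "stimulusThreshold": int(stimulusThreshold),
    })

# rough bounds following the gut feeling above (density < 0.09, threshold < 10)
pbounds = {
    "localAreaDensity": (0.01, 0.09),
    "potentialRadius": (16, 4200000),
    "stimulusThreshold": (0, 10),
}

optimizer = BayesianOptimization(f=objective, pbounds=pbounds, random_state=1)
optimizer.maximize(init_points=5, n_iter=50)
print(optimizer.max)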

The second thing I did was a direct comparison of the anomaly scores yielded by nupic and by htm.core on the same datasets. You can find the output in the attached pdf. There are two things that aren’t right:

  • (1) It seems like the date encoding is not right or not used at all. The output shows no peak in e.g. the no jump dataset.
  • (2) The Scalar Encoding seems off, as e.g. in jumpsup a significant rise is not reflected at all. This could also be due to some parameters of the models I guess.

What would you say, is it rather because of the params of the encoders or params of the algorithm itself? Also, there are differences in the parameters of the RSDE encoder. Only the resolution parameter is also used in nupic. Or did I miss something here?

htm_impl_comparison.pdf

@breznak
Member

breznak commented Apr 15, 2020

Very nice analysis @steinroe !! 👍
Let me answer your ideas by parts.

started by analysing the difference in the API

first, let me correct my mistake, the file I wanted to point you to is API_CHANGELOG.md

About the API (and implementation differences):
Off the top of my head...

  • our RDSE has a slightly different representation (uses MurmurHash)
  • our DateTimeEncoder fixes some bugs in the original
    • there are pure python implementations of the encoders which are direct ports of the originals (just ported py2 to py3).
  • SP has the removed params as you discovered (there are a number of bugfixes and improvements), but the point is we're still passing the original test-suite (with known modifications), so the computations are 99% valid.
  • TM: ditto. We removed BacktrackingTM (which still has the best NAB scores), so results should be (at least) comparable with the numentaTM detector.
  • anomaly:
    • our TM now provides (raw) anomaly transparently.
    • Numenta's NAB detector uses AnomalyLikelihood (I think you can change it to raw). I'm quite sure our C++ AnomalyLikelihood is broken (well, untested at best); there's also the ./py/ Likelihood, which should also be just a py3 port of Nupic's.
    • NumentaDetector "cheats" by using/preprocessing "spatial anomaly". You can turn that off too. HtmcoreDetector does not use that.
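
To make the chain concrete, here is a minimal sketch of the htm.core pipeline (Encoder -> SP -> TM -> raw anomaly), modeled loosely on the hotgym example; the parameter values are illustrative placeholders, not tuned NAB settings:

from htm.bindings.sdr import SDR
from htm.bindings.algorithms import SpatialPooler, TemporalMemory
from htm.encoders.rdse import RDSE, RDSE_Parameters

# RDSE setup: placeholder values, not tuned
p = RDSE_Parameters()
p.size = 400
p.activeBits = 21
p.resolution = 0.9
enc = RDSE(p)

sp = SpatialPooler(inputDimensions=[enc.size],
                   columnDimensions=[2048],
                   globalInhibition=True,
                   localAreaDensity=0.05)
tm = TemporalMemory(columnDimensions=[2048], cellsPerColumn=32)

for value in (10.0, 11.0, 55.0):            # a toy scalar stream
    encoding = enc.encode(value)            # scalar -> SDR
    active = SDR(sp.getColumnDimensions())
    sp.compute(encoding, True, active)      # learn=True
    tm.compute(active, learn=True)
    print(value, tm.anomaly)                # raw anomaly straight from the TM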

tried some Bayesian Optimization to get an idea of what setting for these 3 might work well

cool. If you don't have your own framework for optimization, I suggest looking at ./py/htm/optimization/

Do you have a gut feeling on what ranges would make sense?

This is tricky, but can be computed rather precisely. I'll have to look at this deeper again...
You can see my trials at community/NAB on this very same problem:
htm-community/NAB#15

stimulusThreshold This is a number specifying the minimum
number of synapses that must be active in order for a column to
turn ON. The purpose of this is to prevent noisy input from
activating columns.

So this depends on the expected avg number of ON bits from the encoder, the number of synapses, the range of the input field each dendrite covers, and how noisy the problem is.
Anything >=1 is imho a good starting point.

potentialRadius This parameter determines the extent of the
input that each column can potentially be connected to. This
can be thought of as the input bits that are visible to each
column, or a 'receptive field' of the field of vision. A large
enough value will result in global coverage, meaning
that each column can potentially be connected to every input
bit. This parameter defines a square (or hyper square) area: a
column will have a max square potential pool with sides of
length (2 * potentialRadius + 1), rounded to fit into each dimension.

depends on the size of the encoder/encoding, and on whether you want the columns to act as local or global approximators.

potentialRadius = size (of the input) is effectively Inf, aka "act globally".

"potentialRadius": 4148629.7213459704

Seems your optimizer preferred that variant. (Note, it might also just have gotten stuck in a local optimum.) I'd try something like "global" and "25% of the input field" as reasonable defaults.

localAreaDensity The desired density of active columns within
a local inhibition area (the size of which is set by the
internally calculated inhibitionRadius, which is in turn
determined from the average size of the connected potential
pools of all columns). The inhibition logic will insure that at
most N columns remain ON within a local inhibition area, where
N = localAreaDensity * (total number of columns in inhibition
area)
Default: 0.05 (5%)

This is lower-bounded by the SP's size aka numColumns (density * numCols >= some meaningful min value).
Its value determines the TM's function and the TM's variant of "stimulusThreshold" (called something different there - activationThreshold and minThreshold).
2-15% seems a reasonable value to me.
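
A quick back-of-the-envelope check of that lower bound, using the NAB-scale numbers that appear later in this thread:

numColumns = 2048
localAreaDensity = 0.025
activeColumns = int(localAreaDensity * numColumns)   # ~51 active columns/step

# the TM's thresholds have to fit under that count, otherwise segments can
# never gather enough active presynaptic cells to activate or match
activationThreshold = 20
minThreshold = 13
assert minThreshold <= activationThreshold <= activeColumns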

The second thing I did was a direct comparison of the anomaly scores yielded by nupic and by htm.core on the same datasets.

great graphs! Some notes on that later.

This leads me to a decomposition. The chain is:
Encoder -> SP -> TM -> Anomaly
Ideally, we'd start from the top to verify all our implementations.

  • store Numenta TM's outputs, use our anomaly and compare, ...
  (1) It seems like the date encoding is not right or not used at all. The output shows no peak in e.g. the no jump dataset.

actually, there's a small drop at that time.
Are all the graphs on the same params? It looks like only the "flatmiddle" dataset's htmcore results detect anything meaningful, so the decision value of the others is questionable.
But it could be in the settings of the datetime encoder, the ratio of datetime / RDSE size, or insensitivity of the SP/TM.

(2) The Scalar Encoding seems off, as e.g. in jumpsup a significant rise is not reflected at all. This could also be due to some parameters of the models I guess.

The RDSE encoder is rather well tested, so I wouldn't expect a hidden bug there. Maybe unusable default params. It seems to me that the HTM didn't learn at all in most cases (except the "flatmiddle").

 What would you say, is it rather because of the params of the encoders or params of the algorithm itself? 

I think it'd be in the params, but I cannot tell whether enc/sp/tm. It's all tied together.

Great job investigating so far. I'll be looking at it tonight as well. Please let us know if you find something else or if we can help explain something.
Cheers,

@breznak
Member

breznak commented Apr 16, 2020

@steinroe
I've added a "dataset" with only the artificial data, the results look rather good.

Update:
When running on only the artificial/synthetic labels, our results are quite good:
python run.py -d htmcore --detect --score --optimize --normalize --windowsFile labels/synthetic.json -n 8

htmcore detector benchmark scores written to /mnt/store/devel/HTM/NAB/results/htmcore/htmcore_reward_low_FN_rate_scores.csv

Running score normalization step
Final score for 'htmcore' detector on 'standard' profile = 84.84
Final score for 'htmcore' detector on 'reward_low_FP_rate' profile = 84.36
Final score for 'htmcore' detector on 'reward_low_FN_rate' profile = 88.37
Final scores have been written to /mnt/store/devel/HTM/NAB/results/final_results.json.

PS: also I'd suggest using the following branch; it has some nice prints and comments in it.
htm-community/NAB#15

@breznak
Member

breznak commented Apr 16, 2020

CC @Zbysekz as the author of HTMpandaVis, do you think you could help us debug the issue? I'd really appreciate that!

TL;DR: minor parameter and implementation changes were surely made to htm.core. Compared to Numenta's Nupic, our results on NAB really suck now. I'm guessing it should be a matter of incorrect params.

A look into the representations with the visualizer would be really helpful.
More info here (and linked posts)
htm-community/NAB#15

@breznak
Member

breznak commented Apr 16, 2020

Btw, I've made fixes to the Jupyter plotter, see NAB/scripts/README.
These are some figures:
HTMcore:
[figure: htmcore_nab]

For result file : ../results/htmcore/artificialWithAnomaly/htmcore_art_daily_flatmiddle.csv
True Positive (Detected anomalies) : 403
True Negative (Detected non anomalies) : 0
False Positive (False alarms) : 2679
False Negative (Anomaly not detected) : 0
Total data points : 3428
S(t)_standard score : -90.17164198169499

Numenta:
[figure: numenta_nab]

For result file : ../results/numenta/artificialWithAnomaly/numenta_art_daily_flatmiddle.csv
True Positive (Detected anomalies) : 1
True Negative (Detected non anomalies) : 4027
False Positive (False alarms) : 0
False Negative (Anomaly not detected) : 0
Total data points : 4032
S(t)_standard score : 0.4999963147227

EDIT: fixed uploaded imgs

Note: both images look very similar. (I don't know why the numenta scores look that bad here, must be some bug.)

@psteinroe
Author

@breznak Thanks for the detailed review!

You can turn that off too. HtmcoreDetector does not use that.

Sorry, I forgot to mention that I turned that off for both detectors - HTMCore and Nupic.

cool. If you don't have own framework for optimization, I suggest looking at ./py/htm/optimization/

I saw that right after I was done with my optimization. Going to try yours too very soon!

Thanks for the information on the parameters. I guess from your explanation it is very likely some parameters being off. I will look into the default parameters of both to get a complete comparison of what could be different.

I set up a repo with the stuff I used to plot both detectors against each other, as I didn't want to bother with all the NAB stuff for that. For the HTMCore detector I actually used yours from htm-community/NAB#15. For nupic, I set up a little server with a (very, very) simple API to use the original nupic detector. I just removed the base class, so I put the min/max stuff directly into the detector and removed the spatial anomaly detector in both.
Feel free to play around: https://github.com/steinroe/htm.core-vs-nupic

When running on only the artificial/synthetic labels, our results are quite good:

That appears weird to me... I am going to look into the outputs of the scores in detail, thanks for that!!!!

A look into the representations with the visualizer would be really helpful.

True! I guess we would be able to find the differences in the param settings better.

@breznak
Member

breznak commented Apr 16, 2020

When running on only the artificial/synthetic labels, our results are quite good:

That appears weird to me... I am going to look into the outputs of the scores in detail, thanks for that!!!!

I've confirmed that on the "nojump" data, the error still persists. HTMcore does not detect anything; numenta does have a peak.

This could be 2 things:

  • low sensitivity (in params somewhere in encoder, SP, TM)
  • or a too quickly adapting TM (it learns the flat curve too fast, so it re-learns within the nojump region).

@breznak
Member

breznak commented Apr 16, 2020

For nupic, I setup a little server with a (very very) simple API to use the original nupic detector.

This looks good! I might be interested in that for community/NAB to provide the (old) numenta detectors. Numenta/NAB switched to docker for the old py2 support, and this seems a good way to interface that!

I'll try your repo, thanks 👍

@breznak
Member

breznak commented Apr 16, 2020

I don't understand the plotters/summary code:

if Error is None:
    TP,TN,FP,FN = 0,0,0,0
    print(standard_score)
    for x in standard_score:
        if x > 0:
            TP +=1
        elif x == 0:
            TN +=1
        elif x == -0.11:
            FP +=1
        elif x == -1:
            FN +=1
    print("For result file : " + result_file)
    print("True Positive (Detected anomalies) : " + str(TP))
    print("True Negative (Detected non anomalies) : " + str(TN))
    print("False Positive (False alarms) : " + str(FP))
    print("False Negative (Anomaly not detected) : " + str(FN))
    print("Total data points : " + str(total_Count))
    print(detector_profile+" score : "+str(np.sum(standard_score)))
else:
    print(Error)
    print("Run from beginng to clear Error")

which gives

[-0.11       -0.11       -0.11       ... -0.10999362 -0.1099937
 -0.10999377]
For result file : ../results/htmcore/artificialWithAnomaly/htmcore_art_daily_nojump.csv
True Positive (Detected anomalies) : 403
True Negative (Detected non anomalies) : 0
False Positive (False alarms) : 2787
False Negative (Anomaly not detected) : 0
Total data points : 3428
S(t)_standard score : -90.17200961167651

But according to the img and raw_anomaly, there should be only a few FP!
[figure: htm_nab_nojump]

Bottom line, is our community/NAB (scorer) correct?
We could just copy htmcore results into numenta/NAB and have them re-scored.

@psteinroe
Author

Sorry for the late reply. I am currently in the process of doing an in-depth comparison between the parameters, and there are definitely some differences. You can find the table in the Readme here:
https://github.com/steinroe/htm.core-vs-nupic

While most differences could be easily resolved, I need your input on the params of the RDSE encoder. While HTMCore has size and sparsity, nupic has w, n and offset set with default parameters. The descriptions seem similar; however, I am not sure if e.g. size and w (which is probably short for width) mean the same. Do you have an idea here @breznak ?

This is the relevant section of the table, sorry for its size. The value columns show the values which are set by the respective detectors in NAB. If a cell is empty, the default value is used.

| HTMCore attribute | Description | Value | Default | Nupic CPP attribute | Description | Value | Default |
| --- | --- | --- | --- | --- | --- | --- | --- |
| size | Member "size" is the total number of bits in the encoded output SDR. | 400 | 0 | | | | |
| sparsity | Member "sparsity" is the fraction of bits in the encoded output which this encoder will activate. This is an alternative way to specify the member "activeBits". | 0.1 | 0 | | | | |
| resolution | Member "resolution": Two inputs separated by greater than, or equal to, the resolution are guaranteed to have different representations. | 0.9 | 0 | resolution | A floating point positive number denoting the resolution of the output representation. Numbers within [offset-resolution/2, offset+resolution/2] will fall into the same bucket and thus have an identical representation. Adjacent buckets will differ in one bit. resolution is a required parameter. | max(0.001, (maxVal - minVal) / numBuckets) | - |
| activeBits | Member "activeBits" is the number of true bits in the encoded output SDR. | | | | | | |
| radius | Member "radius": Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. You can think of this as the radius of the input. | | | | | | |
| category | Member "category" means that the inputs are enumerated categories. If true then this encoder will only encode unsigned integers, and all inputs will have unique / non-overlapping representations. | | FALSE | | | | |
| | | | | numBuckets | | 130 | |
| seed | Member "seed" forces different encoders to produce different outputs, even if the inputs and all other parameters are the same. Two encoders with the same seed, parameters, and input will produce identical outputs. The seed 0 is special; seed 0 is replaced with a random number. | | 0 (random) | seed | | 42 | 42 |
| | | | | w | Number of bits to set in output. w must be odd to avoid centering problems. w must be large enough that spatial pooler columns will have a sufficiently large overlap to avoid false matches. A value of w=21 is typical. | | 21 |
| | | | | n | Number of bits in the representation (must be > w). n must be large enough such that there is enough room to select new representations as the range grows. With w=21 a value of n=400 is typical. The class enforces n > 6*w. | | 400 |
| | | | | name | | | None |
| | | | | offset | A floating point offset used to map scalar inputs to bucket indices. The middle bucket will correspond to numbers in the range [offset - resolution/2, offset + resolution/2). If set to None, the very first input that is encoded will be used to determine the offset. | | None |
| | | | | verbosity | | | 0 |

@psteinroe
Author

We could just copy htmcore results into numenta/NAB and have them re-scored.

Nice analysis!! Sounds like a good idea to try that out. I will do that after I am done with the parameter comparison.

@breznak
Member

breznak commented Apr 17, 2020

That's a good idea to write such a comparison of params/API. I'd like the result to be published as a part of the repo here 👍

[RDSE Encoder] While HTMCore does have size and sparsity, nupic has w, n and offset

I can help on those:

activeBits = w
size = n 

I'm not 100% sure about offset without looking, but it'd be used in resolution imho.
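
So a nupic RDSE config translates almost mechanically. A small hypothetical helper (offset deliberately left out, per the caveat above):

def nupic_rdse_to_htmcore(w, n, resolution):
    """Map nupic RandomDistributedScalarEncoder args to htm.core RDSE params."""
    return {
        "activeBits": w,         # activeBits = w
        "size": n,               # size = n
        "resolution": resolution,
        # nupic's `offset` has no direct htm.core counterpart; it folds
        # into how resolution buckets the inputs (see the note above)
    }

htmcore_params = nupic_rdse_to_htmcore(w=21, n=400, resolution=0.9)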

@breznak
Member

breznak commented Apr 17, 2020

RDSE:

  • I think we could get Numenta's encoder running in py if needed. Either in the history of this repo, or in the community/nupic.py repo with the py3 port. Let me know if you think it'd help you and I can try to dig one up.
  • in the numenta detector, you see a "cheat" I've complained about: RDSE should encode an arbitrary range of numbers, unlike ScalarEncoder, which is simpler and is for limited ranges. Numenta is slightly biased in NAB and computes the global min/max of the dataset. This info is used to parametrize the encoder. I'm not sure if htmcore's RDSE has the ability to be constructed from a known range (?). If we could, it'd be good to compare (ideally that eventually should not be used, but we're pin-pointing now). Or try with the Scalar enc.
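
For clarity, the "cheat" in sketch form; minVal/maxVal are the dataset-wide extremes, which is exactly the information a truly online detector would not have upfront:

minVal, maxVal = 0.0, 100.0    # dataset-wide extremes (the "cheat")
numBuckets = 130.0             # constant used by the Numenta NAB detector
resolution = max(0.001, (maxVal - minVal) / numBuckets)   # ~0.77 here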

@psteinroe
Author

I can help on those:

Thanks!

Let me know if you think it'd help you and I can try to dig one up.

Let me try with the new parameter settings first. If that does not help, that might be another way to check whether it's the encoder.

in the numenta detector, you see a "cheat" I've complained about:

Yes, I saw that, and I am calculating the resolution for the htmcore encoder the same way to have a fair comparison.

The second parameter that is new for htm.core is the localAreaDensity. Nupic also has that one but uses another param numActiveColumnsPerInhArea instead to control the density of the active columns:

When using this method, as columns learn and grow their effective receptive fields, the inhibitionRadius will grow, and hence the net density of the active columns will decrease. This is in contrast to the localAreaDensity method, which keeps the density of active columns the same regardless of the size of their receptive fields.

Why was this removed from htm.core?

@breznak
Member

breznak commented Apr 17, 2020

We could just copy htmcore results into numenta/NAB and have them re-scored.

Nice analysis!! Sounds like a good idea to try that out. I will do that after I am done with the parameter comparison.

I got the htmcore detector running with numenta/NAB.

  • results are slightly different
  • but still bad / does not solve the issue.

...

@psteinroe
Author

That's a good idea to write such a comparison of params/API. I'd like the result to be published as a part of the repo here 👍

I will create a PR once I am done :) Is the table format readable or should I rather make it textual?

@breznak
Member

breznak commented Apr 17, 2020

Is the table format readable or should I rather make it textual?

the table is good! We might decide to drop the unimportant ones (verbosity, name) for clarity, but that's just a detail.

@breznak
Member

breznak commented Apr 17, 2020

The second parameter that is new for htm.core is the localAreaDensity. Nupic also has that one but uses another param numActiveColumnsPerInhArea instead to control the density of the active

yes, I proposed the removal. The reasons were nice, but not crucial, and now I'm suspecting this could be a lead..

#549

The motivation for localAreaDensity is HTM's presumption that layers produce output (SDR) with a relatively constant sparsity (+ just code cleanup).

On the other hand, "a 'strong' column's receptive field grows" is also a good biological concept.

  • could we (temporarily) hack in the logic for numActiveColumnsPerInhArea?
    • if not, I'd bring it back again with some more work.

This would be a significant result, if one can be proven "better" (dominating) over the other.

@psteinroe
Author

This would be a significant result, if one can be proven "better" (dominating) over the other.

As the original SP also has localAreaDensity, I would propose to first try out the original nupic detector with localAreaDensity instead of numActiveColumnsPerInhArea to see whether it has such an impact. I guess that would be faster than bringing it back.

@psteinroe
Author

Good news! Using the Numenta parameters with a localAreaDensity of 0.1, I achieve a score of 49.9 on NAB. At least some improvement. Going to try out the swarm algorithm to optimise localAreaDensity now.

  "htmcore": {
       "reward_low_FN_rate": 54.3433173159184,
       "reward_low_FP_rate": 42.32359518695501,
       "standard": 49.96056087185252
   },

These are the params:

params_numenta_comparable = {
  "enc": {
    "value": {
        #"resolution": 0.9, calculate by max(0.001, (maxVal - minVal) / numBuckets) where numBuckets = 130
        "size": 400,
        "activeBits": 21
      },
    "time": {
        "timeOfDay": (21, 9.49),
      }
  },
  "sp": {
    # inputDimensions: use width of encoding
    "columnDimensions": 2048,
    # "potentialRadius": 999999, use width of encoding
    "potentialPct": 0.8,
    "globalInhibition": True,
    "localAreaDensity": 0.1,  # optimize this one
    "stimulusThreshold": 0,
    "synPermInactiveDec": 0.0005,
    "synPermActiveInc": 0.003,
    "synPermConnected": 0.2,
    "boostStrength": 0.0,
    "wrapAround": True,
    "minPctOverlapDutyCycle": 0.001,
    "dutyCyclePeriod": 1000,
  },
  "tm": {
    "columnDimensions": 2048,
    "cellsPerColumn": 32,
    "activationThreshold": 20,
    "initialPermanence": 0.24,
    "connectedPermanence": 0.5,
    "minThreshold": 13,
    "maxNewSynapseCount": 31,
    "permanenceIncrement": 0.04,
    "permanenceDecrement": 0.008,
    "predictedSegmentDecrement": 0.001,
    "maxSegmentsPerCell": 128,
    "maxSynapsesPerSegment": 128,
  },
  "anomaly": {
    "likelihood": {
      "probationaryPct": 0.1,
      "reestimationPeriod": 100
    }
  }
}

@psteinroe
Author

Another thing that I wondered about:

In the detector code there is a fixed param 999999999 defined when setting the infos for tm and sp. Shouldn't this be encodingWidth? Or is it the potentialRadius?

self.tm_info = Metrics([self.tm.numberOfCells()], 999999999)

@breznak
Member

breznak commented Apr 17, 2020

param 999999999 defined when setting the infos for tm and sp. Shouldn't this be encodingWidth? Or is it the potentialRadius?

no, this is unimportant. The metric is only used for our info; it does not affect the computation.
It's not related to "width"/number of bits, but rather to the time/steps window used for the EMA in the metric.

@psteinroe
Author

As the original SP also has localAreaDensity, I would propose to first try out the original nupic detector with localAreaDensity instead of numActiveColumnsPerInhArea to see whether it has such an impact. I guess that would be faster than bringing it back.

Alright, it seems to have a significant impact. Running the Numenta detectors with a localAreaDensity of 0.1 instead of numActiveColumnsPerInhArea results in basically the same scores as with the htmcore detector.

The only question remaining now is whether tuning localAreaDensity increases the score or whether numActiveColumnsPerInhArea is superior in general. How would you suggest we proceed @breznak ?

Here is the code: https://github.com/steinroe/NAB/tree/test_numenta_localAreaDensity

  "numenta": {
       "reward_low_FN_rate": 52.56449422487971,
       "reward_low_FP_rate": 49.94586314087259,
       "standard": 50.82949995800923
   },
   "numentaTM": {
       "reward_low_FN_rate": 52.56449422487971,
       "reward_low_FP_rate": 49.94586314087259,
       "standard": 50.82949995800923
   },

@psteinroe
Author

psteinroe commented Apr 19, 2020

Going to try out the swarm algorithm to optimise localAreaDensity now.

I used Bayesian optimization instead, but nevertheless these are the results:

    "htmcore": {
        "reward_low_FN_rate": 60.852121191220256,
        "reward_low_FP_rate": 45.428862226866734,
        "standard": 55.50231971786488
    },

for the following params:

parameters_numenta_comparable = {
        "enc": {
            "value": {
                # "resolution": 0.9, calculate by max(0.001, (maxVal - minVal) / numBuckets) where numBuckets = 130
                "size": 400,
                "activeBits": 21,
                "seed": 5,  # ignored for the final run
            },
            "time": {
                "timeOfDay": (21, 9.49),
            }
        },
        "sp": {
            # inputDimensions: use width of encoding
            "columnDimensions": 2048,
            # "potentialRadius": use width of encoding
            "potentialPct": 0.8,
            "globalInhibition": True,
            "localAreaDensity": 0.025049634479368352,  # optimize this one
            "stimulusThreshold": 0,
            "synPermInactiveDec": 0.0005,
            "synPermActiveInc": 0.003,
            "synPermConnected": 0.2,
            "boostStrength": 0.0,
            "wrapAround": True,
            "minPctOverlapDutyCycle": 0.001,
            "dutyCyclePeriod": 1000,
            "seed": 5, # ignored for the final run
        },
        "tm": {
            "columnDimensions": 2048,
            "cellsPerColumn": 32,
            "activationThreshold": 20,
            "initialPermanence": 0.24,
            "connectedPermanence": 0.5,
            "minThreshold": 13,
            "maxNewSynapseCount": 31,
            "permanenceIncrement": 0.04,
            "permanenceDecrement": 0.008,
            "predictedSegmentDecrement": 0.001,
            "maxSegmentsPerCell": 128,
            "maxSynapsesPerSegment": 128,
            "seed": 5,  # ignored for the final run
        },
        "anomaly": {
            "likelihood": {
                "probationaryPct": 0.1,
                "reestimationPeriod": 100
            }
        }
    }

I created a PR htm-community/NAB/pull/25 for the updated params.

These are the logs for the seed fixed to 5, where I achieved a standard score of 60 at maximum.
[figure: optimization_logs]

@breznak How would you suggest we proceed from here?

@breznak
Member

breznak commented Apr 19, 2020

I used Bayesian optimization instead, but nevertheless these are the results:

    "htmcore": {
        "reward_low_FN_rate": 60.852121191220256,
        "reward_low_FP_rate": 45.428862226866734,
        "standard": 55.50231971786488
    },

Wow, these are very nice results! I'm going to merge the NAB PR.

"localAreaDensity": 0.025049634479368352, # optimize this one

Interestingly, this is what's claimed by the HTM theory (2%) as observed in the cortex.

How would you suggest we proceed from here?

compared to the Numenta results:

  • this means we "beat" Numenta under the same conditions now, right?
  • we have a worse FP ratio (too many detections); we should work on that.
  • is your Bayesian opt multi-parametric? or just considering the selected (localArea) param?

I'd suggest:

  • you try to tune the current score wrt the other params (there's lots of local optima and parameters are interleaved)
  • I'll revert and reintroduce the numActiveColsPerInhArea param (alternative to localArea)
    • ideally we achieve similar scores with localArea, but performance is paramount
    • if not, it's proof that the numActiveCols... is a crucial functionality
  • we should get NAB operational with our optimization framework for multi-param opt. So we can brute-force the parameter space.

@breznak
Member

breznak commented Apr 19, 2020

I think we should review whether NAB is conceptually correct (as to the metric and methodology)!

See our current results (I made a tiny change to your recently updated params in the NAB/fixing_anomaly branch):
[figure: htmcore_nab_good]

That is almost perfect!
Yet in the plot:

For result file : ../results/htmcore/artificialWithAnomaly/htmcore_art_daily_flatmiddle.csv
True Positive (Detected anomalies) : 403
True Negative (Detected non anomalies) : 0
False Positive (False alarms) : 2679
False Negative (Anomaly not detected) : 0
Total data points : 3428
S(t)_standard score : -90.17164198169499

  • either just the Plot summary FP/FN is wrong?
  • or even the scorer in NAB? (but the score is quite good, ~90% on the "synthetic" data)
  • still, the NAB windows (where the anomaly is expected) are incorrect, (at least/even for) the Artificial anomalies. See below (professionally drawn )

[figure: nab_methodology_window]

@breznak
Member

breznak commented Apr 19, 2020

Q: do we want to keep comparable params & scores, or just aim for the best score? Or both, separately?

@psteinroe
Author

this means we "beat" Numenta under the same conditions now, right?

Yes, but they still win with numActiveColumnsPerInhArea set.

is your Bayesian opt multi-parametric? or just considering the selected (localArea) param?

For this test I just optimised localAreaDensity, keeping the others constant, but it can be multi-parametric.

you try to tune the current score wrt the other params (there's lots of local optima and parameters are interleaved)

Alright, I will work on that. Do you have a feeling about which params may be important / optimizable?

I'll revert and reintroduce the numActiveColsPerInhArea param (alternative to localArea)

Perfect, thanks for your work!

we should get NAB operational with our optimization framework for multi-param opt. So we can brute-force the parameter space.

The "problem" is that your framework calls the optimization function in parallel, so my current setup with bayesian optimisation where I simply write to and read from a params.json file won't work. I will think about how to set that up that and come back to you as soon as I have a solution that works.

either just the Plot summary FP/FN is wrong ?

My gut says this may be the problem, as the scoring results seem fine. We should debug this.

Q: do we want to keep comparable params & scores. Or just aim for the best score? Or both, separately?

I would say we just aim for the best score, as even the bug fixes in this fork may influence the best param setting. I don't know if it's useful to keep a second set of params, as a comparison between the two would only be fair if both use the best possible params.

@breznak
Member

breznak commented Apr 20, 2020

I like this, this is a really nice trick to get it done with NAB.

The optimization framework runs multiple scorings in parallel, so we would have conflicting results when running on our host system. With docker, we are able to prevent that.

good, bcs I thought it didn't run in parallel. We could do something like writing to /tmp/$PID/results.json, but this already works well enough, so 👍
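
For the record, that /tmp/$PID idea would look roughly like this (a sketch with hypothetical file names; the docker approach already solves the interference problem, so this is just for illustration):

import json
import os

params = {"localAreaDensity": 0.025}    # candidate params for this worker

# each parallel worker writes under its own PID, so concurrent NAB runs
# never read or overwrite each other's files
workdir = os.path.join("/tmp", str(os.getpid()))
os.makedirs(workdir, exist_ok=True)
with open(os.path.join(workdir, "params.json"), "w") as f:
    json.dump(params, f)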

Sorry, I forgot to add that the PSO script above runs on your framework, so you run python -m htm.optimization.ae

great. I was thinking you were developing from scratch. It's better that you could be using our existing tools!

Sure. With Bayesian Optimization you need to set bounds for each parameter, but you can use as many as you want, e.g. pbounds = {'x': (2, 4), 'y': (-3, 3)} to optimise x and y

even better! this looks really convenient.

If we can decide on useful bounds, I can set up a script for optimising all / a larger subset of the params

I'll try working on constraining the params, and along with that we can come up with a subset and its ranges.
For some params, we'd need some control over the granularity etc.; e.g. bounds for numColumns would be something like [1024, 8192], but we're not interested in 1023 etc., rather in strides of 1024. But we can hack such cases in the detector manually, as

if x not in {1024, 2048, 4096, 8192}:  # restrict to the allowed stride
  raise ValueError("numColumns must be one of the allowed strides")

I did a minor change to the htmcore detector (the same can be done for any detector) where I read the params from a json file and set the bool params to True (as e.g. the swarm algorithm cannot deal with bools).

  • could you make a PR with this to our NAB? I.e. a toggle useOptimization and, if set, it reads the params from the file. To eliminate one more step for using your idea
  • I'll get to testing this workflow/guide a bit later. Do you think it could be further automated? say a script, or a docker (with docker)?

@psteinroe
Author

psteinroe commented Apr 21, 2020

good, bcs I thought it didn't run in parallel. We could do something like writing to /tmp/$PID/results.json, but this already works well enough, so

We would probably still get interference when the detector reads the params file. I think locking the parallel runs of NAB into a container is the cleanest and safest method. The only downside is probably performance, as we have to give the docker host a fair share of our host resources.

For some params, we'd need some control over the granularity etc.; e.g. bounds for numColumns would be something like [1024, 8192], but we're not interested in 1023 etc., rather in strides of 1024. But we can hack such cases in the detector manually, as

Bayesian optimization does that automatically by randomly choosing some parameters in the beginning and from time to time. Additionally, we can probe some settings (let's say 1024, 2048, ... 8192) so the algorithm knows the effect these settings have. It's quite good at avoiding local maxima out of the box (at least in my experience).
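
With the bayesian-optimization package, that probing is one call per setting (a sketch; optimizer is the BayesianOptimization object from the earlier sketch, with columnDimensions assumed to be one of its bounded params):

# hint known-interesting settings so the optimizer sees their effect early
for cols in (1024, 2048, 4096, 8192):
    optimizer.probe(params={"columnDimensions": cols}, lazy=True)

# lazy probes are evaluated at the start of the next maximize() call
optimizer.maximize(init_points=0, n_iter=20)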

could you make a PR with this to our NAB? I.e. a toggle useOptimization and, if set, it reads the params from the file. To eliminate one more step for using your idea

Sure, will do.

I'll get to testing this workflow/guide a bit later. Do you think it could be further automated? say a script, or a docker (with docker)?

I think we could put the image on Docker Hub so the user does not have to build the image. With a minor change to the script we could pull the image from Docker Hub. Then the workflow would be as follows:

For using the htm.core optimization framework

Requirements: Docker Desktop, htmcore, requirements of script such as docker

  • In the same folder of the script, run python -m htm.optimization.ae -n 3 --memory_limit 4 -v --swarming 100 optimize_anomaly_swarm.py

For using the pure python script with bayesian opt

Requirements: Docker Desktop, requirements of script such as docker and bayesian opt

  • Run the script.

EDIT: Where to put the scripts? Another repo? Or just as branch?

@breznak
Member

breznak commented Apr 21, 2020

We would probably still get interference when the detector reads the params file. I think locking the parallel runs of NAB into a container is the cleanest and safest method

ok, I agree. We need to get there with minimal man-hours, so this works as intended.

Additionally, we can probe some settings (let's say 1024, 2048, ... 8192) so the algorithm knows the effect these settings have.

interesting. Nice if we can "hint" it to try these datapoints.

I think we could put the image on Docker Hub so the user does not have to build the image. With a minor change to the script we could pull the image from Docker Hub.

maybe not necessarily Docker Hub, but a Dockerfile in the repo (NAB) would be great 👍

htmcore is already a dependency of (our) NAB, so that's no problem.
My (optimal) idea was a script/Dockerfile that the user can run (modifying which params should be probed, and providing data, as in NAB) and that runs the optimization and outputs the found set of params & score.

EDIT: Where to put the scripts? Another repo? Or just as branch?

I'd make this part of community/NAB. So submit as PR/branch, and we'll merge directly to master (this enhancement is unrelated to "fixing htmcore scores")

@psteinroe
Author

Sounds good to me! Do we want to enable optimisation on NAB in general or only for htmcore?

@breznak
Member

breznak commented Apr 21, 2020

Do we want to enable optimisation on NAB in general or only for htmcore?

I'd say minimal changes first - we need to solve the topic of this issue. So just HTMcore for now.

Because I'm still unsure how to proceed with
htm-community/NAB#21

  • numenta/NAB is now (finally) py3 compatible
  • their numenta detectors already run in Docker
  • it's "the official" repo

but

  • our code offers the same or better functionality (but the diff is now quite large due to the different conversion/rewrite to py3)
  • couple of nice changes and improvements done to our community/NAB
  • Numenta has some bad rep for being slow implementing/wanting changes in the NAB repo. So our own gives us free rein.

@psteinroe
Author

I'd say minimal changes first - we need to solve the topic of this issue. So just HTMcore for now.

Alright, I will update the optimisation PR with a proposal on how to make it as user friendly as possible.

@Zbysekz

Zbysekz commented Apr 21, 2020

CC @Zbysekz as the author of HTMpandaVis, do you think you could help us debug the issue? I'd really appreciate that!

TL;DR: minor parameter and implementation changes were surely made to htm.core. Compared to Numenta's Nupic, our results on NAB really suck now. I'm guessing it should be a matter of incorrect params.

A look into the representations with the visualizer would be really helpful.
More info here (and linked posts)
htm-community/NAB#15

Hello, sorry for the delay.
Would it help if I modify the HTM detector in steinroe's repo to use PandaVis and see the particular steps live?

I am not sure if that is the script that you guys are using to run it.

@breznak
Member

breznak commented Apr 21, 2020

hi Zbysekz!

HTM detector in steinroe's repo to use PandaVis and see the particular steps live?

that'd be a great step!
Actually we're merging, and currently we run community/NAB. The file is in nab/detectors/htmcore/htmcore_detector.py

EDIT:

  • @steinroe is now finishing parameter optimization for NAB
  • so the ability to replay, visualize and study the internal representations for a certain parameter set would be useful altogether for understanding NAB.

@Zbysekz

Zbysekz commented Apr 23, 2020

OK, in this PR is the modified htm.core detector.
I don't know if this is the best setup; at the end of the script is the CSV loading, so there you specify what you want to load from the data folder.

@Zbysekz

Zbysekz commented Apr 23, 2020

About the jupyter notebook "Plot Result - numenta.ipynb"

Like you guys, I also don't get the FP/TP... score. It is something written by https://github.com/pasindubawantha
I tried to understand the standard score, but just from the code it is not easy at all. There is the "Sweeper", which is also used by the optimizer - it searches for the best params by changing the threshold...

Just remove this piece of code and that's ALL. It gets the TP, TN, FP, FN from the resulting file htm_core_standard_scores.csv (why calculate that again?)

TP,TN,FP,FN = 0,0,0,0
print(standard_score)
for x in standard_score:
    if x > 0:
        TP +=1
    elif x == 0:
        TN +=1
    elif x == -0.11:
        FP +=1
    elif x == -1:
        FN +=1

@breznak
Member

breznak commented Apr 23, 2020

Just remove this piece of code and that's ALL. It gets the TP, TN, FP, FN from the resulting file htm_core_standard_scores.csv (why calculate that again?)

true that, the code seems really hacky, and as you say, the scores are there already. Off with it.
Btw, once this is more or less done, I'll want to go stricter on the NAB paper and its methodology and implementation. Some ground-truth data seems weird to me: #792 (comment)

@breznak
Member

breznak commented Apr 23, 2020

I'm really happy with how this is coming together: fixes, research, optimization, visualization,... Thanks a lot guys!! 🍾

@Zbysekz

Zbysekz commented Apr 23, 2020

For the third point of #792 (comment)

still, the NAB windows (where the anomaly is expected) are incorrect, (at least/even for) the Artificial anomalies. See below (professionally drawn )

I agree, the expected anomaly windows (aka labels) have wrong values.
We can change this simply in NAB/labels/synthetic.json; each data series has its pair "start" and "end" for this window. It's just values given by the user, if I am not wrong.

EDIT: also, I am a little bit afraid that this whole evaluation/optimization also counts the anomaly scores at the beginning... it seems that it does. IMHO, evaluating whether the HTM system gives me an anomaly on data that it has never seen is fundamentally wrong. Is there some learning period at the beginning? Do you guys know something about this?

@psteinroe
Author

Is there some learning period at the beginning? Do you guys know something about this?

NAB has some probationary period which is used by the anomaly likelihood. It basically returns a score of 0.5 until this probationary period is over. Is that what you mean?
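
In sketch form (names hypothetical, behavior as described):

def final_score(step, raw_anomaly, likelihood_fn, probationary_period):
    # NAB-style detectors emit a fixed 0.5 until the probationary period
    # is over; only then does the anomaly likelihood take over
    if step < probationary_period:
        return 0.5
    return likelihood_fn(raw_anomaly)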

@breznak
Member

breznak commented Apr 24, 2020

I agree, the expected anomaly windows (aka labels) have wrong values.
We can change this simply in NAB/labels/synthetic.json; each data series has its pair "start" and "end" for this window. It's just values given by the user, if I am not wrong.

We'll have to do it manually. First, we should see if it's a concern for other datasets (realWorld anomalies etc) and/or if we can distinguish a "proper window".
Next,

  • do we reannotate it just by hand?
  • or develop/find some annotation tool (to be able to drag the "anomaly window" borders)?

But I guess by hand is feasible.

@breznak
Member

breznak commented Apr 24, 2020

Is there some learning period at the beginning?

as Phillip says, the anomaly is set to some predefined value (0.5) for the probatoryPeriod (10% in most cases).
The thing is whether the Scorer (Sweeper, whatever) takes that into account, and

  • whether that period is basically ignored in the anomaly score computation?
  • at least there mustn't be any anomaly windows there.

But to be fair, this might not be such an issue:

  • it's fair to all detectors.
  • surely you cannot detect a contextual pattern in a new/unknown sequence,
  • but it allows penalizing algos that take longer to pick up the trend

@Zbysekz

Zbysekz commented Apr 24, 2020

The thing is whether the Scorer (Sweeper, whatever) takes that into account, and

* whether that period is basically ignored in the anomaly score computation?

* at least there mustn't be any anomaly windows there.

The probationary percent starts in runner.py as a hardcoded parameter of value 0.15.
Then it's handed over to the scorer and finally to the sweeper. The sweeper ignores the data according to that;
there is:

def _getProbationaryLength(self, numRows):
    return min(
      math.floor(self.probationPercent * numRows),
      self.probationPercent * 5000
    )

Where that weird constant 5000 comes from is unclear; probably to cap the probationary length on very long files (e.g. a 10,000-row file gets min(0.15 * 10000, 0.15 * 5000) = 750 probationary rows instead of 1500).
But it seems that it takes it into account... so fine.
We have two probationary percent parameters, one in runner.py and one in htmcore_detector, but I guess that is not a problem.

@breznak
Member

breznak commented Apr 24, 2020

The probationary percent starts in runner.py as a hardcoded parameter of value 0.15.

thanks for pointing to the code in question.

We have two probationary percent parameters, one in runner.py and one in htmcore_detector, but I guess that is not a problem.

it shouldn't be. The one in the detector is basically cosmetic; it switches when we think anomaly scores start to make sense. We should set it to the 15% as in NAB.

@steinroe I recall that in some example you use that "probatory period" as a param to be optimized. Basically that should be unimportant, and it could be removed from the (optimized) params.

Off-Topic: @Zbysekz, since you've tapped into the NAB code, I'd welcome another review/opinion on htm-community/NAB#21 (NAB from numenta vs. community)

@psteinroe
Author

Hi everyone,

sorry for my inactivity. I had / have to work on other stuff to get going with my thesis.

What are the open tasks at the moment?

@breznak
Member

breznak commented Apr 30, 2020

@steinroe no problem,

What are the open tasks at the moment?

  • @Zbysekz just now merged the visualization for the htmcore detector. I think this can be pretty useful for us for insight into the detector, as well as for cool images for your thesis 👍
  • your code for param optimization is merged
  • still WIP is figuring/tuning the params
    • AFAIK, "with the same conditions as Numenta" (= fake spatial anomaly, the numActiveCols.. param, ...) we gain pretty much the same score.
    • we'd still need to tune our params for the "bio plausible" combination

Other relevant tasks (where any help is welcome, hoping to write papers):

  • I'm working on the "smart params" (more checked parameter choices)
  • a cool idea would be considering time (execution speed) for the detectors, as it makes sense for real-time, online systems. One of my paper ideas is to include that param (along with the standard score) in the optimization criterion, so we can control the speed-vs-accuracy tradeoff.
  • thanks to the now-established "NAB is competitive, a good real-world measure", we can proceed with validating more concept-changing PRs, such as synaptic competition (SP, TM use synapseCompetition() #558) and energy-based starve-off (Replace Boosting with cell death (energy) #288)

@breznak
Member

breznak commented Jun 2, 2020

FYI: We now have working PyPI releases, https://github.com/htm-community/htm.core/releases/tag/v2.1.15

Is there any progress on the NAB results? @steinroe

@gotham29

gotham29 commented Oct 18, 2022

Hey @breznak, @psteinroe, @dkeeney and @Zbysekz, I hope you're doing well and thanks so much for your work on this thread!!

It has helped me validate my 'htm_streamer' module (which I showed you, @dkeeney and @breznak, earlier this year).
I'm now getting a NAB standard score around 50, as you did @psteinroe.

I'm curious - is anyone still looking into why htm.core scores lower on NAB than the other HTM implementations (htm.java and Numenta)?
I know htm.core doesn't use BacktrackingTM, but neither does NumentaTM, and it still scores 64.5.

My friend and I would be extremely curious to hear anyone's intuitions on this, and I'd gladly arrange a quick call if anyone's game.
Best regards,

Sam
sheiser1@binghamton.edu

[figure: Screen Shot 2022-10-18 at 1 47 47 PM]

@Thanh-Binh

@gotham29 you should use TM, which worked well for us over 1 year ago

@gotham29

Hi @Thanh-Binh, thanks!!

When you say 'TM', do you mean the NumentaTM detector?
I'm trying to productionalize HTM for frictionless generic use on any data, and I figure htm.core is more robust long term - since NuPIC is in maintenance mode and only works with Python 2.
But NumentaTM does perform clearly better on NAB, as does htm.java - so I assume there must be some hidden shortcoming to htm.core, right?
Do you think so? Are you still using a NuPIC-based implementation now?

Thanks again for your thoughts!!

@Thanh-Binh

Hi @gotham29,
yes, I mean the Numenta TM or ApicalTieBreakTemporalMemory.
At the time, it worked well for me.
For over 1.5 years I have not experimented with HTM.core.

@gotham29

Gotcha @Thanh-Binh, I'm very glad to hear it worked well for you!
Maybe it's worth it to deal with Python 2 if NumentaTM is notably better than htm.core -- though I'm still curious why!
Thanks again for your thoughts 👍

@Thanh-Binh

@gotham29 as far as I know HTM.core should be the best now. Maybe we need to find an optimal parameter set for its TM!

@gotham29

@Thanh-Binh yes I agree!

Though I wonder why HTM.core's TM would need any different param set than NuPIC's TM?
Assuming HTM.core's TM works the same way as Numenta's, they should get the same result using the same params on the same data, right?!
It seems like HTM.core and NuPIC must diverge in functionality somewhere, I just don't know where!
