Example implemented using NetworkAPI #760

Merged
merged 11 commits into from
Nov 27, 2019

Conversation

@dkeeney dkeeney commented Nov 20, 2019

This is a new example program.

This seems very slow. Can someone check my parameters?

Member

@breznak breznak left a comment

Thanks David, it's great to have this example covered in NetworkAPI 👍

  • I think I got the reason for "slowness"
  • bonus: would be great if this example can read from a CSV, and use real hotgym data file (see hotgym.py)

src/examples/hotgym_NetworkAPI/hotgym_napi.cpp (outdated, resolved)
src/examples/hotgym_NetworkAPI/hotgym_napi.cpp (outdated, resolved)
std::string encoder_parameters = "{size: " + std::to_string(DIM_INPUT) + ", activeBits: 4, radius: 0.5, seed: 2019 }";
std::shared_ptr<Region> region1 = net.addRegion("region1", "RDSERegion", encoder_parameters);
std::shared_ptr<Region> region2a = net.addRegion("region2a", "SPRegion", "{columnCount: " + std::to_string(COLS) + "}");
std::shared_ptr<Region> region2b = net.addRegion("region2b", "SPRegion", "{columnCount: " + std::to_string(COLS) + ", globalInhibition: true}");
Member

Regarding parameters: these settings use the default params of the SP etc. I think it'd be nice to use:

  • I think the best would be if you also add SensorRegion to show reading a CSV file, so to replicate hotgym.py (uses the real hotgym data, also then use its params)

Author

Ok, I did not even think to pattern it after hotgym.py. I assumed it was the same as HelloSPTP.cpp. I will check that out.

src/examples/hotgym_NetworkAPI/README.md (outdated, resolved)
@dkeeney
Author

dkeeney commented Nov 20, 2019

It looks like I need to make some more Regions.

  • Predictor
  • AnomalyLikelihood
  • Date Encoder
  • maybe Plot (using matplotlib perhaps)

And I like the idea of setting up the parameters in a table. What if I used a Yaml table?
For the plot we can call out to Python but I cannot be sure that it will be available. But if it is, it would be nice to have a region that runs it.

There is also a problem in knowing where the .csv file exists. The program cannot assume it is in the same folder as the executable...most likely not. Have to think about that.

@breznak
Member

breznak commented Nov 20, 2019

> It looks like I need to make some more Regions.
> Predictor

probably yes, also SDRClassifier.

> AnomalyLikelihood

this should be already available in TM.anomaly (does the TM region expose that field?)

> Date Encoder
> maybe Plot (using matplotlib perhaps)

would be nice to have. Date enc might see some larger changes, but who knows when.. (there's an issue for that)

> And I like the idea of setting up the parameters in a table. What if I used a Yaml table?

do you mean a YAML file with the params? My ideal goal would be that Algorithms, and Regions support additional constructors with a (yaml/json) config file, so you can change settings w/o recompiling.

> For the plot we can call out to Python but I cannot be sure that it will be available. But if it is, it would be nice to have a region that runs it.

What about a py Region for plotting if python is avail, and we can sort out plotting with no py later?

> There is also a problem in knowing where the .csv file exists. The program cannot assume it is in the same folder as the executable...most likely not. Have to think about that.

This should be a minor issue. Expect the file path to be given as an arg. And by default the data can be copied by cmake to the bin/ folder (?)
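
The suggestion above (take the file path as an argument, otherwise fall back to a default location) can be sketched roughly as follows. This is only an illustration: `resolveDataPath` and the default file name used in the usage note are hypothetical and are not part of htm.core or of this PR.

```cpp
#include <string>

// Hypothetical helper: returns the first command-line argument if one was
// given and is non-empty, otherwise the supplied default path.
std::string resolveDataPath(int argc, const char *const argv[],
                            const std::string &defaultPath) {
  if (argc > 1 && argv[1] != nullptr && argv[1][0] != '\0') {
    return argv[1];
  }
  return defaultPath;
}
```

From `main(int argc, char *argv[])` this could be called as `resolveDataPath(argc, argv, "gymdata.csv")`, where the default would be whatever name and location the cmake copy step ends up using. It does not solve the problem of the working directory differing from the executable's folder, which is the concern raised in the thread.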

@dkeeney
Author

dkeeney commented Nov 20, 2019

> available in TM.anomaly (does the TM region expose that field?)

Yes, anomaly is there but how do I get AnomalyLikelihood from it? I will probably need to add some code someplace to get that.

> do you mean a YAML file with the params?

yes. I could pass a Yaml File into the addRegion call, or by expanding this a little, pass one yaml file to Network and it could define the regions to add as well as the parameters to apply to them.
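
One way to picture the idea without committing to a design: read flat `key: value` lines from a file and assemble the inline parameter string that `addRegion` already receives in this PR's code (for example `{size: ..., activeBits: 4}`). A minimal sketch, assuming a flat file with `#` comments. `paramsFromStream` and `trim` are hypothetical helpers, this is not a YAML parser, and nothing here is existing htm.core API.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: strips leading/trailing spaces, tabs and CRs.
static std::string trim(const std::string &s) {
  const auto first = s.find_first_not_of(" \t\r");
  if (first == std::string::npos) return "";
  const auto last = s.find_last_not_of(" \t\r");
  return s.substr(first, last - first + 1);
}

// Hypothetical helper: reads flat "key: value" lines (with '#' comments)
// and returns an inline parameter string such as "{size: 100, activeBits: 4}".
// Lines without a colon, or with an empty key or value, are skipped.
std::string paramsFromStream(std::istream &in) {
  std::string line;
  std::string out = "{";
  bool first = true;
  while (std::getline(in, line)) {
    const auto hash = line.find('#');
    if (hash != std::string::npos) line.erase(hash);  // drop comment
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;
    const std::string key = trim(line.substr(0, colon));
    const std::string value = trim(line.substr(colon + 1));
    if (key.empty() || value.empty()) continue;
    if (!first) out += ", ";
    out += key + ": " + value;
    first = false;
  }
  return out + "}";
}
```

The returned string could then be passed as the third argument of `net.addRegion(...)`, in place of the hand-built strings in the review snippet. Passing one file to Network that also defines which regions to add, as proposed above, would need a real YAML parser and is not covered by this sketch.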

> Expect the file path given as arg. And default the data can be copied by cmake to the bin/ folder (?)

That was my thinking. Although when the program runs it knows only the default folder and that can be any place. It does not know where the repository is or even if one exists. In linux, you use cmake to do an install step before running the programs so more than likely, if you built from sources, the executable will be in the build/Release/bin folder. At least on Windows, if you do a debug build within MSVC it does not do the install step first so it executes it from someplace down in build/scripts/?? so in this case it will not know where to find the csv file. anyway, I will think of something.

@dkeeney
Author

dkeeney commented Nov 20, 2019

So, it looks like I need to set this PR aside until I can do the following:

  1. Predictor; port algorithm and create Region
  2. SDRClassifier; create Region
  3. Date Encoder; port algorithm and create Region
  4. AnomalyLikelihood; (modify TMRegion to provide)
  5. Parameters in yaml that can be passed to Network to configure all regions
  6. A new .py Region for plots callable from C++ and C# apps

@dkeeney
Author

dkeeney commented Nov 20, 2019

Oh, the SDRClassifier is actually the Predictor. Very confusing.

@breznak
Member

breznak commented Nov 21, 2019

> Yes, anomaly is there but how do I get AnomalyLikelihood from it?

TM has a param anomalyMode (or sth) that sets the type of anomaly you get from the field TM.anomaly.

> So, it looks like I need to set this PR aside until I can do the following

How about wrapping this up to make it a HelloSPTP alternative, as originally intended, merging, and then extending it to the hotgym.py functionality?

steps 1,4 are imho incorrect, 5,6 optional (would suit a separate PR each).

> Oh, the SDRClassifier is actually the Predictor. Very confusing.

it's in the same file, if you split it into standalone, it'd be fine.
Predictor uses Classifiers and builds atop them.

@dkeeney
Author

dkeeney commented Nov 21, 2019

It appears that the HelloSPTP.cpp example has nothing to do with hotgym so the hotgym.cpp is a bad name. Perhaps it should be called sine_wave or something.

So you think I should also create a HelloSPTP version using NetworkAPI in addition to the real hotgym. I guess I could.

@breznak
Member

breznak commented Nov 21, 2019

> It appears that the HelloSPTP.cpp example has nothing to do with hotgym

well, it does. It uses the same processing pipeline: SP-TM-Anomaly/prediction. ATM it's just not yet as complete as the py version.
The c++ is also used to benchmark our performance in a "typical workload".

> So you think I should also create a HelloSPTP version using NetworkAPI in addition to the real hotgym

no, I was suggesting to merge this as-is (=similar to HelloSPTP) for now, and then build other NetworkAPI additions on it, to get closer to the hotgym.py functionality

@dkeeney
Author

dkeeney commented Nov 21, 2019

> no, I was suggesting to merge this as-is (=similar to HelloSPTP) for now, and then build other NetworkAPI additions on it, to get closer to the hotgym.py functionality

They are so different, I think it better to have two separate examples.
I will merge the one I have with some adjustments.

@dkeeney dkeeney changed the title hotgym implemented using NetworkAPI Example implemented using NetworkAPI Nov 21, 2019
@dkeeney
Author

dkeeney commented Nov 21, 2019

The anomaly always comes out as 1.
It runs so slow that I had to cut back on the iterations and length of SDR to make it a reasonable demo. But this probably does not iterate enough times to learn anything.

I still have the SP (local) in the program. Since that is broken, let me try taking that out and see how much faster it runs.

@dkeeney
Author

dkeeney commented Nov 21, 2019

yes, it runs much better without SP (local). SP (global) takes about 10 seconds to initialize (Windows, debug mode) but I guess that is ok.
I am going to leave it with SP(local) commented out rather than removing it because I assume that it will eventually be fixed.

@dkeeney dkeeney added ready and removed on_hold labels Nov 21, 2019
@breznak
Member

breznak commented Nov 22, 2019

> I still have the SP (local) in the program. Since that is broken,

SP with local inhibition is not broken, it's just slow. Its usefulness is now questionable, but in a way the produced SDRs are of a better quality. We should try speeding it up.

src/examples/napi_sine/README.md (outdated, resolved)
src/examples/napi_sine/napi_sine.cpp (outdated, resolved)
breznak previously approved these changes Nov 23, 2019
Member

@breznak breznak left a comment

Apart from the last question about why the results are bad/"bad", this looks fine to me!
Thank you David

PS: I think the follow up PRs could build/modify these files and turn them into the "real" hotgym.py alike.

@dkeeney
Author

dkeeney commented Nov 24, 2019

Thanks for approving this, but I think this should remain as a Hello World type example and I should build a second example that matches the hotgym.py example. Both would have value.

I was not trying to match the results of HelloSPTP in this PR but perhaps I should. I am computing the sine differently and I may have different parameters. So let me see if I can make them more the same.

The fact that both are bad is also a problem. This is probably a user's first exposure to htm.core, at least that is its purpose, and it should have at least reasonable results. We control the example and we should pick an example that shows what htm can do. To not do so is embarrassing. That is bigger than what should be done in this PR, so let's put that in another PR, but it should be done.

@breznak
Member

breznak commented Nov 25, 2019

> I was not trying to match the results of HelloSPTP in this PR but perhaps I should.

if not 1:1 deterministic results, similar quality of results would be good!

> The fact that both are bad is also a problem. This is probably a user's first exposure to htm.core, at least that is its purpose and it should have at least reasonable results

ok, as a separate issue, we should look at the results & params, it's true that avg 0.4 anomaly on a sine wave in 5000 steps isn't stellar. (I don't know if the random noise isn't too strong there etc)

This PR, feel free to merge as is 👍

@dkeeney
Author

dkeeney commented Nov 25, 2019

> if not 1:1 deterministic results, similar quality of results would be good!

At the moment I am not even close.
The input from the encoder is now the same, but SP and TM are very different, and anomaly and likelihood are way off. These should all match exactly since I am calling the same functions with the same parameters. So I have something not quite right. I will keep working on it.

@dkeeney
Author

dkeeney commented Nov 25, 2019

Found some things that needed correcting:

  • Found that some of the defaults in the SP and TM algorithms were not the same as the defaults in the Regions. Remember that if you change the defaults in an algorithm be sure to also change it in its corresponding Region.

  • found that computation of sine starts with 0.01 in HelloSPTP and not 0.0. Changed napi_sine to match.

  • found that in HelloSPTP, anLikelihood.anomalyProbability(an) was not being assigned to anlikely. That is why it was always 0.

  • SPRegion needed a way to add noise, so there is a new parameter 'noise'.

  • TMRegion now applies orColumnOutputs to predictiveCells as well as BottomUpOut.

@dkeeney
Copy link
Author

dkeeney commented Nov 25, 2019

Additional things we might want to consider:

  • adding a reset at the beginning of each sine wave cycle so the algorithm knows when it's the beginning of the pattern.
  • iterate the sine wave using degrees rather than radians so the cycle begins on an even sample boundary.
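
To illustrate the second point: because 2π is irrational, no whole number of fixed-size radian steps spans exactly one cycle, whereas stepping in whole degrees puts a sample exactly at the start of every cycle. The sketch below is only an illustration under that assumption. `sineAtStep` and `isCycleStart` are hypothetical names, not code from napi_sine or HelloSPTP, and how a reset would actually be issued is left out.

```cpp
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Sine sampled on a whole-degree grid: step i is at (i * degPerStep) degrees.
// The angle is wrapped to [0, 360) before conversion to radians.
// Note: step * degPerStep is computed in unsigned arithmetic; it is assumed
// not to overflow for the step counts of interest (a few thousand).
double sineAtStep(unsigned step, unsigned degPerStep) {
  const unsigned deg = (step * degPerStep) % 360u;
  return std::sin(deg * kPi / 180.0);
}

// True when this step is the first sample of a cycle, which is where a
// reset could be issued.
bool isCycleStart(unsigned step, unsigned degPerStep) {
  return (step * degPerStep) % 360u == 0u;
}
```

With `degPerStep = 1`, steps 0, 360, 720, ... are cycle starts, so a loop could check `isCycleStart(i, 1)` before each iteration and trigger the reset there.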

@dkeeney
Author

dkeeney commented Nov 25, 2019

@breznak I will need another review when you find time.

I will be traveling starting tomorrow, returning probably Sat. Happy Thanksgiving everyone.

@dkeeney
Author

dkeeney commented Nov 26, 2019

benchmark_hotgym
    Epoch = 5000
    Anomaly = 0.77451
    Anomaly (avg) = 0.411894
    Anomaly (Likelihood) = 0.88831

napi_sine
     Result after 5000 iterations.
      Anomaly             = 0.77451
      Anomaly(avg)        = 0.411894
      Anomaly(Likelihood) = 0.88831

Member

@breznak breznak left a comment

Great to have the matching results now, thanks Dave 👍
Happy thanksgiving everyone! 🦃

Please see small changes below,

src/examples/hotgym/HelloSPTP.cpp (outdated, resolved)
src/examples/hotgym/HelloSPTP.hpp (outdated, resolved)
@@ -47,18 +48,23 @@ namespace htm {
resolution: {type: Real32, default: "0.0"},
category: {type: Bool, default: "false"},
seed: {type: UInt32, default: "0"},
sensedValue: {type: Real64, default: "0.0", access: ReadWrite }},
noise: {description: "amount of noise to add to the output SDR. 0.01 is 1%",
Member

nice idea to have the noise included in the encoder 👍
Maybe for later, it'd be useful to have a "NoisyRegion", so that it can be applied to/after any layer. That is similar to dropout PR #535

Author

Good idea.
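
As a rough picture of what applying noise to an SDR "after any layer" could look like, the sketch below moves a fraction of the active bits of a sparse index list to positions that were inactive, keeping the number of active bits unchanged. This is an assumption about the intended semantics, not the implementation behind the `noise` parameter in this PR. `addNoise` is a hypothetical free function operating on plain `std::vector` indices rather than on htm.core types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical sketch: relocates round(fraction * active.size()) active bits
// to positions that were inactive in the input. `active` holds distinct
// indices in [0, size). Returns the new active indices, sorted.
std::vector<unsigned> addNoise(std::vector<unsigned> active, unsigned size,
                               double fraction, std::mt19937 &rng) {
  assert(fraction >= 0.0 && fraction <= 1.0);
  std::set<unsigned> used(active.begin(), active.end());
  assert(used.size() == active.size());  // indices must be distinct
  const std::size_t nMove =
      static_cast<std::size_t>(fraction * active.size() + 0.5);
  assert(size >= active.size() + nMove);  // enough free positions exist

  std::shuffle(active.begin(), active.end(), rng);  // choose bits to move
  std::uniform_int_distribution<unsigned> pick(0, size - 1);
  for (std::size_t i = 0; i < nMove; ++i) {
    unsigned candidate;
    do {
      candidate = pick(rng);
    } while (used.count(candidate) != 0);  // skip old and newly used bits
    used.insert(candidate);
    active[i] = candidate;
  }
  std::sort(active.begin(), active.end());
  return active;
}
```

With `fraction = 0.01` this matches the "0.01 is 1%" wording of the parameter description, in the sense that about 1% of the active bits change position. A region wrapping such a function could sit between any two regions, which is the "NoisyRegion" idea floated above.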

@@ -336,7 +343,7 @@ Spec *TMRegion::createSpec() {
NTA_BasicType_UInt32, // type
1, // elementCount
"", // constraints
- "8", // defaultValue
+ "10", // defaultValue
Member

so the defaults now match the Algo version (?) It's good that thanks to this example, we've validated that people using NetworkAPI get the same quality out of the box 👍

Author

Yes, I changed all NetworkAPI defaults to match the Algorithm defaults. This required a manual eye-ball compare. We don't have a way to write a test that can check if the defaults are the same in all cases.
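
For what such a check might look like if the defaults on both sides could ever be exported as name/value tables (which, per the comment above, is not possible today), here is a minimal sketch. `Defaults` and `mismatchedDefaults` are hypothetical, and the parameter names in the usage note are placeholders rather than real htm.core parameters.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical table of parameter name -> default value (as text).
using Defaults = std::map<std::string, std::string>;

// Returns the names of parameters that appear in both tables but whose
// default values differ. Parameters present on only one side are ignored.
std::vector<std::string> mismatchedDefaults(const Defaults &algorithm,
                                            const Defaults &region) {
  std::vector<std::string> diff;
  for (const auto &kv : algorithm) {
    const auto it = region.find(kv.first);
    if (it != region.end() && it->second != kv.second) {
      diff.push_back(kv.first);
    }
  }
  return diff;
}
```

Given an algorithm table `{paramA: "10", paramB: "0.5"}` and a region table `{paramA: "8", paramB: "0.5"}`, the function would report `paramA`, which is the kind of drift the manual comparison caught. Comparing values as text is a simplification; `"0"` and `"0.0"` would count as different.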

src/test/unit/regions/TMRegionTest.cpp (resolved)
Member

@breznak breznak left a comment

👍 👍

@dkeeney dkeeney merged commit abe0a39 into master Nov 27, 2019
@dkeeney dkeeney deleted the hotgym_napi branch November 27, 2019 18:28