
Dropout WIP #535

Open · wants to merge 3 commits into master from dropout

Conversation

@breznak (Member) commented on Jul 3, 2019

WIP dropout implementation

  • dropout works
  • move to connections?
  • use for SP, TM
  • doc

EDIT:
Motivation: I believe this change can be considered biological (noise on the signal during transfer) and is also in line with deep learning practice (more robust representations). It should be supported by measurable SDR quality metrics, see #155.

@breznak added the SP and research (new functionality of HTM theory, research idea) labels on Jul 3, 2019
@breznak self-assigned this on Jul 3, 2019
@@ -471,10 +471,16 @@ class SparseDistributedRepresentation : public Serializable
* @param rng The random number generator to draw from. If not given, this
* makes one using the magic seed 0.
*/
void addNoise(Real fractionNoise);
void addNoise(Real fractionNoise); //TODO the name is confusing, rename to shuffle ?
@breznak (Member Author):
is it OK to rename this to shuffle() and have addNoise for the new fn? @ctrl-z-9000-times

@ctrl-z-9000-times (Collaborator):
I looked at your new addNoise function and I think it will have issues with keeping the sparsity at a reasonable level. I think that the sparsity of an SDR after this method is called on it will always tend towards 50%.
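For context on why repeated flipping would drift toward 50% (an added note, not from the thread): if the same SDR were re-flipped every step with per-bit probability p, its density would follow s_{t+1} = s_t * (1 - p) + (1 - s_t) * p, whose only fixed point is s = 0.5. The reply below clarifies that the flips are instead applied to a fresh input SDR each step, so they do not accumulate.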

@breznak (Member Author):
That would indeed be wrong. What I intended:

  • take the SDR of the current input
    • flip 0.01% of its bits
  • take the next, new SDR
    • flip 0.01% of its bits

So the sparsity would remain roughly the same (actually it grows slightly, because there are far more off bits, so flipping a bit on is more probable than flipping one off). But it should stay around the x% (2%) + 0.001%.
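For illustration, a minimal standalone sketch of that intent: symmetric bit flips applied to a fresh dense vector each time step, so the flips never accumulate on one SDR. It uses plain std::vector<char> and std::mt19937 instead of the library's SDR and Random classes, so the names here are illustrative, not the PR's actual API.

#include <random>
#include <vector>

// Flip each bit of a fresh (per-time-step) dense vector with probability p.
// Because a new vector is built from the raw input every step, the flips do
// not accumulate; expected density moves from s to s + p*(1 - 2*s), i.e. it
// grows only slightly for sparse s and small p.
void flipBits(std::vector<char> &dense, float p, std::mt19937 &rng) {
  std::bernoulli_distribution flip(p);
  for (auto &bit : dense) {
    if (flip(rng)) bit = !bit; // 1 -> 0 or 0 -> 1
  }
}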

void SparseDistributedRepresentation::addNoise2(const Real probability, Random& rng) {
  NTA_ASSERT( probability >= 0.0f and probability <= 1.0f );
  const ElemSparse numFlip = static_cast<ElemSparse>(size * probability);
  if (numFlip == 0) return;
@breznak (Member Author):
I'm trying to write an efficient implementation, but this has a problem when probability < 1/size, so size * probability truncates to zero flips. Should we bother with such cases? Return early, or assert?
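One possible way to handle that case, as an assumption on my part rather than what the PR does: draw the flip count from a binomial distribution instead of truncating size * probability, so an expected count below one still produces a flip occasionally instead of never.

#include <cstddef>
#include <random>

// Sample how many bits to flip when size * probability may be below 1.
// Binomial(size, probability) has the same mean as per-bit Bernoulli flips,
// so tiny probabilities yield 0 most of the time, but not always.
std::size_t sampleNumFlips(std::size_t size, float probability, std::mt19937 &rng) {
  std::binomial_distribution<std::size_t> dist(size, probability);
  return dist(rng);
}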

input.addNoise2(0.01f, rng_); //TODO apply at synapse level in Conn?
//TODO fix for probability << input.size
//TODO apply killCells to active output?
//TODO apply dropout to segments? (so all are: synapse, segment, cell/column)
@breznak (Member Author):

Proof of concept: dropout applied to the input (as noise) and to the output (as killCells).

  • I'd prefer this to be applied in Connections (in adaptSegment?); a rough sketch follows this list
  • where to apply it?
    • ideally at all of: SP, TM & synapse, segment, cell, column
    • but that would be computationally infeasible, so..?
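A rough, hypothetical sketch of what per-synapse dropout inside an adaptSegment-style update could look like; the names, permanence update, and clamp logic below are generic placeholders under my own assumptions, not the actual Connections interface.

#include <cstddef>
#include <random>
#include <vector>

using Permanence = float;

// Per-synapse dropout during learning: each synapse on the segment is skipped
// with probability pDrop for this update only; nothing is removed permanently.
void adaptSegmentWithDropout(std::vector<Permanence> &permanences,
                             const std::vector<bool> &presynapticActive,
                             Permanence increment, Permanence decrement,
                             float pDrop, std::mt19937 &rng) {
  std::bernoulli_distribution drop(pDrop);
  for (std::size_t i = 0; i < permanences.size(); ++i) {
    if (drop(rng)) continue; // this synapse sits out the current update
    permanences[i] += presynapticActive[i] ? increment : -decrement;
    if (permanences[i] > 1.0f) permanences[i] = 1.0f; // clamp to [0, 1]
    if (permanences[i] < 0.0f) permanences[i] = 0.0f;
  }
}

Segment-, cell-, or column-level dropout would follow the same pattern, just sampling the drop decision once per segment, cell, or column instead of once per synapse.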

@breznak (Member Author) commented on Jul 3, 2019

Deterministic tests are still expected to fail until we decide on the values and update the exact outputs.

@ctrl-z-9000-times (Collaborator):

Maybe I don't understand this change, but it seems this will make the HTM perform worse. While it's interesting that the HTM keeps working even when some of its components are disabled, I don't think this belongs in the mainline. Maybe instead you could make an example/demonstration of these fault-tolerance properties (like numenta did in their SP paper).

@breznak (Member Author) commented on Jul 3, 2019

but it seems this will make the HTM perform worse.

It's commonly used in deep learning, where it improves results a lot; to be exact, dropout helps prevent overfitting.

While HTM is already more robust to that (sparse SDR output, stimulus threshold on input segments), I want to see if this helps and by how much.
For now, on MNIST this gave slightly (1-2%) better results, which is similar to the impact of boosting.

I am looking for biological confirmation and for datasets to prove whether this works better. (It does slow things down a bit, but that is an implementation detail.)

  • I'd say it's biologically possible that the signal gets corrupted while being transferred over the dendrite.
  • maybe this would help with a badly tuned parameter configuration; for a well-tuned system dropout indeed had little effect.

HTM keeps working even when some of its components are disabled

Umm, no components are disabled permanently; this temporarily flips bits, adding noise to the input.

@breznak closed this on Jul 6, 2019
@breznak reopened this on Jul 6, 2019
@breznak closed this on Jul 6, 2019
@breznak reopened this on Jul 6, 2019
@breznak (Member Author) commented on Sep 18, 2019

The hotgym example internally uses dropout via input.addNoise(); when testing, disable the explicit dropout there and use the one from this PR (and also test the results without dropout).

@Zbysekz closed this on Jun 26, 2020
@Zbysekz deleted the dropout branch on Jun 26, 2020 at 06:57
@breznak restored the dropout branch on Jun 26, 2020 at 07:07
@breznak reopened this on Jun 26, 2020
Labels
in_progress · research (new functionality of HTM theory, research idea) · SP