Issues with MacroBase streaming mode explanations #263

Open
ganesh-srinivas opened this issue Apr 30, 2018 · 0 comments


ganesh-srinivas commented Apr 30, 2018

I'm seeing very different explanations from bin/streaming.sh compared to bin/batch.sh and bin/frontend.sh. In addition, the values of ratio, records, and support are incorrect for streaming-mode explanations.

For any explanation in the report, the following relationship should hold true:

ratio = (outliers_with_attr / outliers) / (inliers_with_attr / inliers)
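A minimal Python sketch of this relationship, computed from raw counts. The function name `risk_ratio` and its arguments are illustrative only, not MacroBase's API; returning `math.inf` when no inlier matches mirrors the "Infinity" shown in the reports below.

```python
import math


def risk_ratio(outliers_with_attr: int, outliers: int,
               inliers_with_attr: int, inliers: int) -> float:
    """Support among outliers divided by support among inliers.

    Hypothetical helper for illustration; not part of MacroBase.
    """
    if outliers <= 0 or inliers <= 0:
        raise ValueError("need at least one outlier and one inlier")
    outlier_support = outliers_with_attr / outliers
    inlier_support = inliers_with_attr / inliers
    if inlier_support == 0:
        # Pattern never occurs among inliers: the ratio is unbounded,
        # or undefined if the pattern never occurs among outliers either.
        return math.inf if outlier_support > 0 else math.nan
    return outlier_support / inlier_support
```

For example, a pattern seen in 10 of 100 outliers and 5 of 1000 inliers gives 0.1 / 0.005 = 20.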

Support is the proportion of records marked as outliers that contain this attribute combination. The theoretical minimum is 0 (no outliers had this pattern) and the maximum is 1 (all outlier records matched).

Ratio Out/In is the proportion of outlier records containing this attribute combination compared to the proportion of inlier records containing this attribute combination (i.e., support in outliers divided by support in inliers). A ratio of 1 means that this pattern appeared equally frequently in inliers and outliers. A ratio of infinity means this pattern was not present in the inliers.

Records is the actual number of outlier records matching this pattern (i.e., support * number of outliers).
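Taken together, these definitions mean every reported explanation can be sanity-checked from the summary counts alone: records / outliers must equal support, and records must be a whole number. A sketch of such a check, assuming the values are read straight from the report; `check_explanation` and its tolerance (chosen to absorb support being printed with six decimals) are hypothetical, not part of MacroBase.

```python
import math
from typing import List


def check_explanation(support: float, records: float,
                      outliers: float) -> List[str]:
    """Return the invariant violations found in one reported explanation.

    Hypothetical helper: it checks only the relationships described in
    this issue and is not part of MacroBase.
    """
    if outliers <= 0:
        raise ValueError("an explanation needs at least one outlier")
    problems = []
    if not 0.0 <= support <= 1.0:
        problems.append(f"support {support} is outside [0, 1]")
    if not float(records).is_integer():
        # records counts outlier rows, so it should be a whole number
        problems.append(f"records {records} is not an integer")
    implied_support = records / outliers
    # support is printed with six decimals, so allow for that rounding
    if not math.isclose(support, implied_support, rel_tol=0.0, abs_tol=1e-6):
        problems.append(
            f"support {support} != records / outliers = {implied_support:.6f}")
    return problems
```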

Data

sensor_data_demo_db_version.txt

Here is a result from bin/batch.sh:

INFO  [2018-04-30 12:14:32,989] macrobase.runtime.command.MacroBasePipelineCommand: Result: [outliers: 1012.000000
inliers: 100241.000000
load time 1032ms
execution time: 713ms
summarization time: 366ms

-----

support: 1.000000
records: 1012.000000
ratio: Infinity

Columns:
	device_id: 2040
	model: M606
	firmware_version: 0.3.2
	state: MA

-----

]

The values of support, ratio, and records are consistent with each other (UPDATE: the risk ratio shouldn't be Infinity. There are inlier records with this attribute-value combination! I'm going through the source code and learning about the FPGrowth algorithm to find the mistake.)
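The arithmetic behind the UPDATE: in this batch result every one of the 1012 outliers matches (support 1.0), so the ratio is (1012 / 1012) / (k / 100241) = 100241 / k, where k is the number of matching inlier records. For any k ≥ 1 the ratio is finite, so by the definitions above, Infinity means the summarizer found no matching inliers. The values of k in this sketch are hypothetical, since the report does not print the matching inlier count.

```python
OUTLIERS = 1012
INLIERS = 100241
OUTLIERS_WITH_ATTR = 1012  # support 1.000000 in the batch report


def batch_ratio(inliers_with_attr: int) -> float:
    """Ratio for the batch explanation, given a hypothetical matching-inlier count."""
    if inliers_with_attr == 0:
        return float("inf")  # what the batch report shows
    return (OUTLIERS_WITH_ATTR / OUTLIERS) / (inliers_with_attr / INLIERS)


for k in (0, 1, 10, 100):
    print(f"k={k:>3}: ratio={batch_ratio(k):.1f}")
```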

Here is the result from bin/streaming.sh:

INFO  [2018-04-30 12:13:41,512] macrobase.runtime.command.MacroBasePipelineCommand: Result: [outliers: 3837.000000
inliers: 97416.000000
load time 1124ms
execution time: 1312ms
summarization time: 134ms

-----

support: 0.097909
records: 1492.580000
ratio: 29.048441

Columns:
	device_id: 2040

-----

]

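The values of support, records, and ratio are inconsistent with each other:

  • support = records / outliers does not hold: 1492.58 / 3837 ≈ 0.389, but the reported support is 0.097909 (which would correspond to about 375.7 records).
  • records is not an integer (1492.58) even though it should be a count of outlier records, and since support and records disagree, it is unclear which of them the ratio of 29.048441 was derived from.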
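The mismatch can be reproduced with a few lines of arithmetic on the numbers printed by both runs. The values are copied from the two logs above; `implied_values` is just an illustrative helper name.

```python
def implied_values(outliers: float, support: float, records: float):
    """Return (records / outliers, support * outliers) for one explanation."""
    return records / outliers, support * outliers


reports = {
    "batch": (1012.0, 1.000000, 1012.0),
    "streaming": (3837.0, 0.097909, 1492.58),
}

for name, (outliers, support, records) in reports.items():
    implied_support, implied_records = implied_values(outliers, support, records)
    print(f"{name}: support={support:.6f} vs records/outliers={implied_support:.6f}; "
          f"records={records:.2f} vs support*outliers={implied_records:.2f}")
```

For the batch report both identities hold exactly; for the streaming report records / outliers ≈ 0.389 against a reported support of 0.097909, and support × outliers ≈ 375.68 against 1492.58 reported records.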

Cause of issue

I believe that this issue is due to a bug in the code for streaming explanations.
