optimize booleans #790

oli-obk · 2016-03-23T14:37:32Z

TL;DR: a || (a && b) => a, !!a => a, (a == b) && !(a == b) => false. There are still some quirks; feedback requested.
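The three rewrites above can be sanity-checked by brute force over every assignment of the terminals. A minimal sketch; the helper `equivalent2` is hypothetical (not part of clippy), and `a`/`b` are treated as plain `bool` terminals:

```rust
/// Returns true if two functions of two boolean terminals agree on all four assignments.
fn equivalent2(f: impl Fn(bool, bool) -> bool, g: impl Fn(bool, bool) -> bool) -> bool {
    [false, true]
        .iter()
        .all(|&a| [false, true].iter().all(|&b| f(a, b) == g(a, b)))
}

fn main() {
    // a || (a && b)  =>  a   (absorption)
    assert!(equivalent2(|a, b| a || (a && b), |a, _| a));
    // !!a  =>  a   (double negation)
    assert!(equivalent2(|a, _| !!a, |a, _| a));
    // (a == b) && !(a == b)  =>  false   (a term and its negation)
    assert!(equivalent2(|a, b| (a == b) && !(a == b), |_, _| false));
    println!("all three rewrites hold");
}
```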

cc #590

So... I implemented the Quine–McCluskey algorithm and Petrick's method, which together allow an algorithmic minimization of arbitrary boolean expressions (with fewer than 32 terminals). The full algorithm runs twice: once to get the smallest "sum of products" (a && b && c || d && e || f) and once to get the smallest "product of sums" ((a || b || c) && (d || e) && f). Then I check the list of smallest representations against the actual boolean expression. If the actual expression is the same as (or a permutation of) one of the best ones, there is nothing to report. Otherwise I dump all the better expressions as suggestions.
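For readers unfamiliar with the method, here is a minimal sketch of the first Quine–McCluskey phase (repeatedly merging terms that differ in one terminal until only prime implicants remain). It is not the code in this PR; the `Implicant` type and `prime_implicants` function are hypothetical, and Petrick's method (picking a smallest set of primes that covers every minterm) is left out:

```rust
use std::collections::BTreeSet;

/// One product term. Bit i of `bits` is the required value of terminal i;
/// bit i of `dash` set means terminal i has been eliminated from the term.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Implicant {
    bits: u32,
    dash: u32,
}

impl Implicant {
    /// Merges two terms that eliminate the same terminals and differ in exactly
    /// one remaining terminal, e.g. (a && b) and (a && !b) become a.
    fn combine(self, other: Implicant) -> Option<Implicant> {
        let diff = self.bits ^ other.bits;
        if self.dash == other.dash && diff.count_ones() == 1 {
            Some(Implicant { bits: self.bits & !diff, dash: self.dash | diff })
        } else {
            None
        }
    }
}

/// Prime implicants of the function that is true exactly on `minterms`
/// (each minterm is an assignment with terminal i stored in bit i).
fn prime_implicants(minterms: &[u32]) -> BTreeSet<Implicant> {
    let mut current: BTreeSet<Implicant> =
        minterms.iter().map(|&m| Implicant { bits: m, dash: 0 }).collect();
    let mut primes = BTreeSet::new();
    while !current.is_empty() {
        let items: Vec<Implicant> = current.into_iter().collect();
        let mut next = BTreeSet::new();
        let mut merged = BTreeSet::new();
        for (i, &a) in items.iter().enumerate() {
            for &b in &items[i + 1..] {
                if let Some(c) = a.combine(b) {
                    next.insert(c);
                    merged.insert(a);
                    merged.insert(b);
                }
            }
        }
        // Terms that could not be merged with anything are prime.
        primes.extend(items.into_iter().filter(|t| !merged.contains(t)));
        current = next;
    }
    primes
}

fn main() {
    // Terminal 0 = b, terminal 1 = a. `a || (a && b)` is true on 0b10 and 0b11.
    for term in prime_implicants(&[0b10, 0b11]) {
        // A single term: `a` fixed to true (bits 0b10), `b` eliminated (dash 0b01).
        println!("{:?}", term);
    }
}
```

Running the product-of-sums pass would amount to doing the same on the negated function and negating the result back.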

This is not perfect yet. The algorithms assume mathematical boolean expressions. So they don't know about short circuiting, xor or comparison operations. Also every boolean expression with more than 2 levels gets flattened to 2 levels. I plan on tackling those issues in future improvements. I made the lint allow-by-default because of this.

Open Questions:

Should I detect a == b and a != b in the same boolean expression and treat them (during optimization) as a == b and !(a == b)? This has some advantages (see below).
Should I turn !(a == b) in the optimized expression into a != b or should this be a separate lint?
~~Should stdlib comparison functions with inverses be treated like the builtin comparison functions for the two previous points?~~ Uncommon
~~Should I keep ignoring short circuiting behavior? Are there any situations where an optimized form of an expression would break?~~ allow by default, except for logic bugs
~~changing (!a && b) || (a && !b) to a ^ b is fine? there's no short circuiting possible anyway, separate lint?~~ Uncommon
~~unwrapping a ^ b to the expanded form for optimization might yield better results in a larger expression, should this be done?~~ Uncommon
Should I treat bitwise & and | 100% like the regular boolean ops && and ||? This basically assumes that anyone using the non-short-circuiting versions doesn't know what they are doing.
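The last question hinges on one difference: on `bool`, `&` computes the same value as `&&` but always evaluates both operands. A small sketch; the counting helper `check` and `operand_counts` are made up for the illustration:

```rust
use std::cell::Cell;

/// Returns `value` after recording one evaluation in `calls`.
fn check(calls: &Cell<u32>, value: bool) -> bool {
    calls.set(calls.get() + 1);
    value
}

/// Evaluates `false && true` and `false & true` with counted operands and
/// returns how many operands each form evaluated.
fn operand_counts() -> (u32, u32) {
    let lazy = Cell::new(0);
    let _ = check(&lazy, false) && check(&lazy, true); // right operand is skipped
    let eager = Cell::new(0);
    let _ = check(&eager, false) & check(&eager, true); // both operands run
    (lazy.get(), eager.get())
}

fn main() {
    // Same value on bool...
    assert_eq!(false && true, false & true);
    // ...but `&` always evaluates both sides.
    let (lazy, eager) = operand_counts();
    println!("&& evaluated {lazy} operand(s), & evaluated {eager}");
}
```

So treating them as equal is only safe for the value; if an operand has side effects, rewriting `&` into `&&` can skip it.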

Some benchmarking on cargo, with clippy compiled in release mode, says it has (nearly?) no impact on lint-checking time:

with nonminimal_bool:

time: 0.951; rss: 275MB lint checking

just clippy:

time: 0.936; rss: 274MB lint checking
time: 0.929; rss: 275MB lint checking
time: 0.950; rss: 274MB lint checking

no clippy:

time: 0.255; rss: 271MB lint checking

How annoying is this lint?

cargo:

src/cargo/core/dependency.rs:207         self.name == id.name() &&
src/cargo/core/dependency.rs:208             (self.only_match_name || (self.req.matches(id.version()) &&
src/cargo/core/dependency.rs:209                                       &self.source_id == id.source_id()))

suggested to change to

(self.name == id.name() && self.req.matches(id.version()) && &self.source_id == id.source_id()) || (self.name == id.name() && self.only_match_name)

or

(self.only_match_name || &self.source_id == id.source_id()) && (self.only_match_name || self.req.matches(id.version())) && self.name == id.name()

The previous expression was fine imo, but had 3 levels of and/or operations
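Abbreviating the four terminals as n = `self.name == id.name()`, o = `self.only_match_name`, r = `self.req.matches(id.version())` and s = `&self.source_id == id.source_id()` (single-letter names chosen only for this sketch), an exhaustive check over all 16 assignments confirms that both suggestions are equivalent to the original, so the objection is about readability, not correctness:

```rust
/// Compares two functions of four boolean terminals on all 16 assignments.
fn equivalent4(
    f: impl Fn(bool, bool, bool, bool) -> bool,
    g: impl Fn(bool, bool, bool, bool) -> bool,
) -> bool {
    (0u32..16).all(|m| {
        let (n, o, r, s) = ((m & 1) != 0, (m & 2) != 0, (m & 4) != 0, (m & 8) != 0);
        f(n, o, r, s) == g(n, o, r, s)
    })
}

fn main() {
    // n = name matches, o = only_match_name, r = version req matches, s = source id matches
    let original = |n: bool, o: bool, r: bool, s: bool| n && (o || (r && s));
    let sum_of_products = |n: bool, o: bool, r: bool, s: bool| (n && r && s) || (n && o);
    let product_of_sums = |n: bool, o: bool, r: bool, s: bool| (o || s) && (o || r) && n;
    assert!(equivalent4(original, sum_of_products));
    assert!(equivalent4(original, product_of_sums));
    println!("both suggestions are equivalent to the original");
}
```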

One bug, must be in SpanlessEq::expr_eq:

src/cargo/util/config.rs:205             let is_path = val.val.contains("/") ||
src/cargo/util/config.rs:206                           (cfg!(windows) && val.val.contains("\\"));
src/cargo/util/config.rs:205:27: 206:68 help: try
src/cargo/util/config.rs:                let is_path = val.val.contains("/");

on racer:

src/racer/matchers.rs:417     if (blob.starts_with("pub enum") || (local && blob.starts_with("enum"))) &&
src/racer/matchers.rs:418        txt_matches(search_type, searchstr, blob) {

suggests

if (local && blob.starts_with("enum") && txt_matches(search_type, searchstr, blob)) || (blob.starts_with("pub enum") && txt_matches(search_type, searchstr, blob)) {

and

if (blob.starts_with("pub enum") || blob.starts_with("enum")) && (blob.starts_with("pub enum") || local) && txt_matches(search_type, searchstr, blob) {
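Both racer suggestions are again equivalent to the original, but each repeats a terminal. A minimal sketch that counts terminal occurrences on a toy expression tree; the `Expr` type is illustrative (not clippy's representation), and the letters p = `blob.starts_with("pub enum")`, l = `local`, e = `blob.starts_with("enum")`, t = `txt_matches(..)` are chosen just for this example:

```rust
use std::collections::BTreeMap;

/// A toy boolean expression tree.
enum Expr {
    Var(&'static str),
    And(Vec<Expr>),
    Or(Vec<Expr>),
}

fn var(name: &'static str) -> Expr {
    Expr::Var(name)
}

/// Adds one to `counts[name]` for every occurrence of a terminal in `e`.
fn count_into(e: &Expr, counts: &mut BTreeMap<&'static str, usize>) {
    match e {
        Expr::Var(name) => *counts.entry(*name).or_insert(0) += 1,
        Expr::And(xs) | Expr::Or(xs) => {
            for x in xs {
                count_into(x, counts);
            }
        }
    }
}

/// Occurrence count of each terminal in `e`.
fn counts(e: &Expr) -> BTreeMap<&'static str, usize> {
    let mut map = BTreeMap::new();
    count_into(e, &mut map);
    map
}

fn main() {
    use Expr::{And, Or};
    // (p || (l && e)) && t
    let original = And(vec![Or(vec![var("p"), And(vec![var("l"), var("e")])]), var("t")]);
    // (l && e && t) || (p && t)
    let sum_of_products = Or(vec![And(vec![var("l"), var("e"), var("t")]), And(vec![var("p"), var("t")])]);
    // (p || e) && (p || l) && t
    let product_of_sums = And(vec![Or(vec![var("p"), var("e")]), Or(vec![var("p"), var("l")]), var("t")]);
    for (label, expr) in [("original", &original), ("sum of products", &sum_of_products), ("product of sums", &product_of_sums)] {
        println!("{label}: {:?}", counts(expr));
    }
}
```

The original uses every terminal once; the first suggestion uses `t` twice and the second uses `p` twice.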

oli-obk · 2016-03-23T15:04:59Z

One thing we could do to make this really applicable is to only lint when we manage to eliminate a term (a || (a && b) => a eliminates b) or when the number of uses of a term doesn't increase (this prevents the cases shown in racer and cargo where the expression got more complex). (a term is anything producing a bool, but not because of a boolean operation)

mcarton · 2016-03-23T15:49:54Z

> One bug, must be in SpanlessEq::expr_eq:
>
> src/cargo/util/config.rs:205             let is_path = val.val.contains("/") ||
> src/cargo/util/config.rs:206                           (cfg!(windows) && val.val.contains("\\"));
> src/cargo/util/config.rs:205:27: 206:68 help: try
> src/cargo/util/config.rs:                let is_path = val.val.contains("/");

Nope, just checked, string literals compare correctly. Most likely cfg!(windows) == false, then the second part of the || is useless and your algo wants to drop it.

I’ll look at the rest later.

mcarton · 2016-03-23T15:58:54Z

on racer:

> src/racer/matchers.rs:417     if (blob.starts_with("pub enum") || (local && blob.starts_with("enum"))) &&
> src/racer/matchers.rs:418        txt_matches(search_type, searchstr, blob) {
>
> suggests
>
> if (local && blob.starts_with("enum") && txt_matches(search_type, searchstr, blob)) || (blob.starts_with("pub enum") && txt_matches(search_type, searchstr, blob)) {
>
> and
>
> if (blob.starts_with("pub enum") || blob.starts_with("enum")) && (blob.starts_with("pub enum") || local) && txt_matches(search_type, searchstr, blob) {

This is bad: (p || (l && e)) && t is more concise/factored than (l && e && t) || (p && t) or (p || e) && (p || l) && t.

oli-obk · 2016-03-23T16:08:34Z

> This is bad, (p || (l && e)) && t is more concise/factored than (l && e && t) || (p && t) and (p || e) && (p || l) && t.

yea :( As I noted, these algorithms cut everything down to 2 levels of bool-ops. That's why I suggested

> only lint when we manage to eliminate a term (a || (a && b) => a eliminates b) or when the number of uses of a term doesn't increase

The number of uses of either t or p increases in the racer case, while none of the other terminals get used less. So we have a few cases:

a terminal disappears => logic bug found in the code => lint
no increase in occurrence of any terminals, but a decrease of occurrence of some terminals => improvement => suggestion lint
no increase or decrease => not sure if that can happen
increase in occurrence of some terminals, and decrease in occurrence of others => no idea
increase in occurrence of some terminals, and no decrease in occurrence of any terminals => un-optimization of 3 or more level bool op => don't lint
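The case analysis above could be expressed as a comparison of per-terminal occurrence counts before and after minimization. A minimal sketch, assuming the counts have already been collected into maps keyed by terminal; `Verdict` and `classify` are hypothetical names, and the "not sure" and "no idea" cases are lumped together as `Unclear`:

```rust
use std::collections::BTreeMap;

/// Outcome of comparing terminal counts before and after minimization.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// A terminal vanished: the original expression has a logic bug.
    LogicBug,
    /// Nothing got more frequent and something got less frequent.
    Suggest,
    /// Only increases: a 3+ level expression was flattened.
    DontLint,
    /// No change, or a mix of increases and decreases.
    Unclear,
}

fn classify(before: &BTreeMap<&str, usize>, after: &BTreeMap<&str, usize>) -> Verdict {
    let (mut increased, mut decreased) = (false, false);
    for (name, &old) in before {
        let new = after.get(*name).copied().unwrap_or(0);
        if new == 0 {
            return Verdict::LogicBug;
        }
        increased |= new > old;
        decreased |= new < old;
    }
    match (increased, decreased) {
        (false, true) => Verdict::Suggest,
        (true, false) => Verdict::DontLint,
        _ => Verdict::Unclear,
    }
}

fn main() {
    // racer: (p || (l && e)) && t  vs  (l && e && t) || (p && t)
    let before = BTreeMap::from([("p", 1), ("l", 1), ("e", 1), ("t", 1)]);
    let after = BTreeMap::from([("p", 1), ("l", 1), ("e", 1), ("t", 2)]);
    println!("{:?}", classify(&before, &after));
}
```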

> Most likely cfg!(windows) == false, then the second part of the || is useless and your algo wants to drop it.

Ah, yea, we should treat macro calls yielding bools as terminals... I'll add a bugfix.
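Why this matters: `cfg!(windows)` expands to a plain `true` or `false` literal at compile time, so a minimizer that looks through the macro sees a constant and, when compiling for a non-Windows target, drops the whole right-hand side. Keeping the macro call as an opaque terminal avoids that platform-specific advice. The function below is a reduced version of the cargo snippet; the name `is_path` and the `&str` parameter are simplifications:

```rust
/// Reduced from cargo's config.rs: a value counts as a path if it contains a
/// forward slash, or (only when compiled for Windows) a backslash.
fn is_path(val: &str) -> bool {
    val.contains('/') || (cfg!(windows) && val.contains('\\'))
}

fn main() {
    // Forward slashes are recognized on every target.
    assert!(is_path("a/b"));
    // Backslashes only count when compiled for Windows.
    assert_eq!(is_path(r"a\b"), cfg!(windows));
    println!("cfg!(windows) = {}", cfg!(windows));
}
```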

oli-obk · 2016-03-24T10:00:48Z

no more hits in cargo and racer

I split the lint into logic_bug and nonminimal_bool. The first one lints when a terminal is dropped entirely by a minimization. The second one only lints when the number of occurrences of a terminal doesn't increase.

I also made sure that !(a && b) doesn't get "optimized" to !a || !b

mcarton · 2016-03-24T12:08:00Z

src/booleans.rs

+
+struct NonminimalBoolVisitor<'a, 'tcx: 'a>(&'a LateContext<'a, 'tcx>);
+
+use quine_mc_cluskey::Bool;


Why is that use here?

to annoy enough so someone writes a lint warning about uses in odd places?

Seems legit 😄

mcarton · 2016-03-24T12:30:15Z

> Should I detect a == b and a != b in the same boolean expression and use them (during optimization) as a == b and !(a == b). This has some advantages (see below)

That would be nice 😄

> Should I turn !(a == b) in the optimized expression into a != b or should this be a separate lint?

I wouldn’t say that’s more optimized, but it’s surely clea{n,r}er. Except if a == b comes from a macro, I wouldn’t expect anyone to ever write !(a == b).

> Should stdlib comparison functions with inverses be treated like the builtin comparison functions for the two previous points?

I’m not sure what you mean here.

> Should I keep ignoring short circuiting behavior? Are there any situations where an optimized form of an expression would break?

Yes, that’s probably important. When I made the lints about copy&paste errors, I was told that servo (IIRC) had an instance of foo.pop() && foo.pop() which was intentional. What does your lint do here?

> changing (!a && b) || (a && !b) to a ^ b is fine? there's no short circuiting possible anyway, separate lint?
> unwrapping a ^ b to the expanded form for optimization might yield better results in a larger expression, should this be done?

a ^ b (or a != b) is probably clea{n,r}er, but that’s probably not very common.

llogiq · 2016-03-24T12:56:53Z

Yeah, once any of the sub-expressions calls a method or macro, all bets are off – we should at least be wary to reorder code then, because Rust doesn't save us from side effects (x.pop() != x.pop() being the prime example). Perhaps report under a maybe_... lint which is allow by default and add a warning that reordering may change the semantics due to side effects.

oli-obk · 2016-03-24T12:58:04Z

> I’m not sure what you mean here.

methods like Iterator::eq can be detected and also inverted: !v.iter().eq(&w) could be v.iter().ne(&w)
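In Rust, `Iterator::eq` and `Iterator::ne` both compare two sequences element by element, so `!it.eq(other)` and `it.ne(other)` always agree; that is what would make such an inversion safe. A quick check with made-up helper names:

```rust
/// The original spelling: negate the result of `Iterator::eq`.
fn negated_eq(v: &[i32], w: &[i32]) -> bool {
    !v.iter().eq(w)
}

/// The suggested spelling: call `Iterator::ne` directly.
fn via_ne(v: &[i32], w: &[i32]) -> bool {
    v.iter().ne(w)
}

fn main() {
    let v = [1, 2, 3];
    for w in [[1, 2, 3], [1, 2, 4], [0, 2, 3]] {
        assert_eq!(negated_eq(&v, &w), via_ne(&v, &w));
    }
    println!("!eq and ne agree");
}
```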

> Yes, that’s probably important. When I made the lints about copy&paste errors, I was told that servo (IIRC) had an instance of foo.pop() && foo.pop() which was intentional. What does your lint do here?

Well, since I'm using the util::SpanlessEq thing with ignore_fn, these are treated as "different" and each becomes its own terminal. I might accidentally swap them, which only becomes an issue if the boolean expression is simplifiable, which it is not. (In this case it's not even an issue since the order doesn't matter, but it might be foo.pop() && bar.pop().)

an example would be !(!foo.pop() && bar.pop()) which should be foo.pop() || !bar.pop(), but might end up getting suggested as !bar.pop() || foo.pop().
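The example above can be made concrete with two stacks of `bool`s; `pop_bool` and `run` are hypothetical helpers standing in for `foo.pop()` and `bar.pop()`. With one `true` on each stack all three forms yield `true`, but in the reordered suggestion `bar` is evaluated first and is therefore always popped:

```rust
/// Pops the top of a stack of bools, treating an empty stack as `false`.
fn pop_bool(stack: &mut Vec<bool>) -> bool {
    stack.pop().unwrap_or(false)
}

/// Evaluates `form` on fresh stacks `foo = [true]`, `bar = [true]` and returns
/// (result, elements left in foo, elements left in bar).
fn run(form: impl Fn(&mut Vec<bool>, &mut Vec<bool>) -> bool) -> (bool, usize, usize) {
    let mut foo = vec![true];
    let mut bar = vec![true];
    let result = form(&mut foo, &mut bar);
    (result, foo.len(), bar.len())
}

fn main() {
    let original = run(|foo, bar| !(!pop_bool(foo) && pop_bool(bar)));
    let same_order = run(|foo, bar| pop_bool(foo) || !pop_bool(bar));
    let reordered = run(|foo, bar| !pop_bool(bar) || pop_bool(foo));
    // All three agree on the value...
    assert_eq!(original.0, reordered.0);
    // ...but only the reordered form also pops `bar`.
    assert_eq!(original, same_order);
    assert_ne!(original, reordered);
    println!("{original:?} {same_order:?} {reordered:?}");
}
```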

oli-obk · 2016-03-24T13:00:21Z

> Rust doesn't save us from side effects

could we somehow abuse marker traits to detect &mut or Cell/RefCell that might get modified by a function/method call? (then we just have things like file-system access and statics and raw pointers)...

mcarton · 2016-03-24T13:07:45Z

> methods like Iterator::eq can be detected and also inverted: !v.iter().eq(&w) could be v.iter().ne(&w)

This is probably uncommon enough not to bother.

oli-obk · 2016-03-24T15:36:12Z

> Should I detect a == b and a != b in the same boolean expression and use them (during optimization) as a == b and !(a == b). This has some advantages (see below)
>
> That would be nice 😄

done

> I wouldn’t say that’s more optimized, but it’s surely clea{n,r}er. Except if a == b comes from a macro, I wouldn’t expect anyone to ever write !(a == b).

yea, currently the optimized form might end up suggesting !(a == b) instead of a != b, I'll look at it after the holidays


oli-obk · 2016-03-29T15:23:23Z

!(a == b) is now suggested as a != b, the same optimization is done for all other comparison operators.
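The negation table behind this is small enough to state directly. A sketch (the `Cmp` enum and its methods are illustrative, not clippy's code) that checks `!(a op b)` against `a negate(op) b` on small integers. One caveat worth keeping in mind: for `f64` the ordering pairs are not exact inverses, because every ordered comparison involving `NaN` is `false`:

```rust
/// The six built-in comparison operators.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Cmp {
    Eq,
    Ne,
    Lt,
    Le,
    Gt,
    Ge,
}

impl Cmp {
    /// The operator `op2` such that `!(a op b)` equals `a op2 b` for totally ordered types.
    fn negate(self) -> Cmp {
        match self {
            Cmp::Eq => Cmp::Ne,
            Cmp::Ne => Cmp::Eq,
            Cmp::Lt => Cmp::Ge,
            Cmp::Ge => Cmp::Lt,
            Cmp::Gt => Cmp::Le,
            Cmp::Le => Cmp::Gt,
        }
    }

    /// Evaluates `a op b`.
    fn apply<T: PartialOrd>(self, a: T, b: T) -> bool {
        match self {
            Cmp::Eq => a == b,
            Cmp::Ne => a != b,
            Cmp::Lt => a < b,
            Cmp::Le => a <= b,
            Cmp::Gt => a > b,
            Cmp::Ge => a >= b,
        }
    }
}

fn main() {
    for op in [Cmp::Eq, Cmp::Ne, Cmp::Lt, Cmp::Le, Cmp::Gt, Cmp::Ge] {
        for a in -1..=1 {
            for b in -1..=1 {
                assert_eq!(!op.apply(a, b), op.negate().apply(a, b), "{op:?} on {a}, {b}");
            }
        }
    }
    // With NaN, !(x < y) is true but x >= y is false.
    let (x, y) = (f64::NAN, 0.0);
    assert_ne!(!Cmp::Lt.apply(x, y), Cmp::Lt.negate().apply(x, y));
    println!("negation table holds for integers");
}
```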

oli-obk · 2016-03-30T10:58:20Z

> Perhaps report under a maybe_... lint which is allow by default and add a warning that reordering may change the semantics due to side effects.

done.

I don't think there are any more open questions.

llogiq · 2016-03-30T13:10:29Z

This is a great addition to our bag of tricks 😄

mcarton reviewed Mar 24, 2016

oli-obk added 17 commits March 29, 2016 10:45

better simplification

93d097e

improve bracket display

57faa5a

also compute minimal product of sum form

1f1f09b

improve lint attribute detail

25ed62f

merge multiple equal terminals into one

5911cca

fallout and tests

050d7fd

bugfix in quine-mc_cluskey 0.2.1

0a78a79

treat macros as terminals to prevent cfg! from giving platform specific hints

288ea79

differentiate between logic bugs and optimizable expressions

03833f6

negations around expressions can make things simpler

37cee84

if a < b { ... } if a >= b { ... } what am I doing?

76ab801

update lints

e7013a3

String::extend -> String::push_str

0f92f84

collect stats on bool ops and negations in an expression

dd6bee3

add tests showing the current level of minimization with ==

6904fd5

a small refactoring for readability

25bbde0

make sure a < b and a >= b are considered equal by SpanlessEq

3a0791e

oli-obk added 6 commits March 29, 2016 10:45

detect negations of terminals like a != b vs a == b

96be287

more tests

be72883

accidentally forgot about improvements if there were multiplie candidates

216edba

added brackets and fixed compiler comments

b05dd13

!(a == b) --> a != b

e9c87c7

dogfood

fa48ee6

make nonminimal_bool allow-by-default

2917484

llogiq merged commit e878ab4 into rust-lang:master Mar 30, 2016

oli-obk deleted the bool_opt branch March 30, 2016 13:11

est31 mentioned this pull request May 4, 2023

doc(cfg(a or b or c) and a) should cut unnecessary conditions rust-lang/rust#104991

Open
