Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize booleans #790

Merged
merged 24 commits into from
Mar 30, 2016
Merged

optimize booleans #790

merged 24 commits into from
Mar 30, 2016

Conversation

oli-obk
Copy link
Contributor

@oli-obk oli-obk commented Mar 23, 2016

TLDR: a || (a && b) => a, !!a => a, (a == b) && !(a == b) => false, but still some quirks, feedback requested

cc #590


So... I implemented the Quine–McCluskey algorithm and Petrick's Method which together allow an algorithmic minimization of arbitrary boolean expressions (with fewer than 32 terminals). The full algorithm runs twice. Once for getting the smallest "sum of products" (a && b && c || d && e || f) and once for getting the smallest "product of sums" ((a || b || c) && (d || e) && f). Then I check the list of smallest representations against the actual boolean expression. If the actual expression is the same (or some permutation of) one of the best ones, then I continue. Otherwise I dump all the better expressions as suggestions.

This is not perfect yet. The algorithms assume mathematical boolean expressions. So they don't know about short circuiting, xor or comparison operations. Also every boolean expression with more than 2 levels gets flattened to 2 levels. I plan on tackling those issues in future improvements. I made the lint allow-by-default because of this.

Open Questions:

  • Should I detect a == b and a != b in the same boolean expression and use them (during optimization) as a == b and !(a == b). This has some advantages (see below)
  • Should I turn !(a == b) in the optimized expression into a != b or should this be a separate lint?
  • Should stdlib comparison functions with inverses be treated like the builtin comparisons functions for the two previos points? Uncommon
  • Should I keep ignoring short circuiting behavior? Are there any situations where an optimized form of an expression would break? allow by default, except for logic bugs
  • changing (!a && b) || (a && !b) to a ^ b is fine? there's no short circuiting possible anyway, separate lint? Uncommon
  • unwrapping a ^ b to the expanded form for optimization might yield better results in a larger expression, should this be done? Uncommon
  • Should I treat bitwise and and bitwise or 100% equal to the regular boolean ops? This basically assumes that anyone using the non-fast-forwarding versions doesn't know what they are doing.

some benchmarking on cargo with clippy compiled in release mode says it has (nearly?) no impact:

with nonminimal_bool:

time: 0.951; rss: 275MB lint checking

just clippy:

time: 0.936; rss: 274MB lint checking
time: 0.929; rss: 275MB lint checking
time: 0.950; rss: 274MB lint checking

no clippy:

time: 0.255; rss: 271MB lint checking

How annoying is this lint?

cargo:

src/cargo/core/dependency.rs:207         self.name == id.name() &&
src/cargo/core/dependency.rs:208             (self.only_match_name || (self.req.matches(id.version()) &&
src/cargo/core/dependency.rs:209                                       &self.source_id == id.source_id()))

suggested to change to

(self.name == id.name() && self.req.matches(id.version()) && &self.source_id == id.source_id()) || (self.name == id.name() && self.only_match_name)

or

(self.only_match_name || &self.source_id == id.source_id()) && (self.only_match_name || self.req.matches(id.version())) && self.name == id.name()

The previous expression was fine imo, but had 3 levels of and/or operations

One bug, must be in SpanlessEq::expr_eq:

src/cargo/util/config.rs:205             let is_path = val.val.contains("/") ||
src/cargo/util/config.rs:206                           (cfg!(windows) && val.val.contains("\\"));
src/cargo/util/config.rs:205:27: 206:68 help: try
src/cargo/util/config.rs:                let is_path = val.val.contains("/");

on racer:

src/racer/matchers.rs:417     if (blob.starts_with("pub enum") || (local && blob.starts_with("enum"))) &&
src/racer/matchers.rs:418        txt_matches(search_type, searchstr, blob) {

suggests

if (local && blob.starts_with("enum") && txt_matches(search_type, searchstr, blob)) || (blob.starts_with("pub enum") && txt_matches(search_type, searchstr, blob)) {

and

if (blob.starts_with("pub enum") || blob.starts_with("enum")) && (blob.starts_with("pub enum") || local) && txt_matches(search_type, searchstr, blob) {

@oli-obk
Copy link
Contributor Author

oli-obk commented Mar 23, 2016

One thing we could do to make this really applicable is to only lint when we manage to eliminate a term (a || (a && b) => a eliminates b) or when the number of uses of a term doesn't increase (this prevents the cases shown in racer and cargo where the expression got more complex). (a term is anything producing a bool, but not because of a boolean operation)

@mcarton
Copy link
Member

mcarton commented Mar 23, 2016

One bug, must be in SpanlessEq::expr_eq:

src/cargo/util/config.rs:205             let is_path = val.val.contains("/") ||
src/cargo/util/config.rs:206                           (cfg!(windows) && val.val.contains("\\"));
src/cargo/util/config.rs:205:27: 206:68 help: try
src/cargo/util/config.rs:                let is_path = val.val.contains("/");

Nope, just checked, strings literal compare correctly. Most likely cfg!(windows) == false, then the second part of the || is useless and your algo wants to drop it.

I’ll look at the rest later.

@mcarton
Copy link
Member

mcarton commented Mar 23, 2016

on racer:

src/racer/matchers.rs:417     if (blob.starts_with("pub enum") || (local && blob.starts_with("enum"))) &&
src/racer/matchers.rs:418        txt_matches(search_type, searchstr, blob) {

suggests

if (local && blob.starts_with("enum") && txt_matches(search_type, searchstr, blob)) || (blob.starts_with("pub enum") && txt_matches(search_type, searchstr, blob)) {

and

if (blob.starts_with("pub enum") || blob.starts_with("enum")) && (blob.starts_with("pub enum") || local) && txt_matches(search_type, searchstr, blob) {

This is bad, (p || (l && e)) && t is more concise/factored than (l && e && t) || (p && t) and (p || e) && (p || l) && t.

@oli-obk
Copy link
Contributor Author

oli-obk commented Mar 23, 2016

This is bad, (p || (l && e)) && t is more concise/factored than (l && e && t) || (p && t) and (p || e) && (p || l) && t.

yea :( As I noted, these algorithms cut everything down to 2 levels of bool-ops. That's why I suggested

only lint when we manage to eliminate a term (a || (a && b) => a eliminates b) or when the number of uses of a term doesn't increase

The number of uses of either t or p increases in the racer case, while none of the other terminals get used less. So we have a few cases:

  1. a terminal disappears => logic bug found in the code => lint
  2. no increase in occurrence of any terminals, but a decrease of occurrence of some terminals => improvement => suggestion lint
  3. no increase or decrease => not sure if that can happen
  4. increase in occurrence of some terminals, and decrease in occurrence of others => no idea
  5. increase in occurrence of some terminals, and no decrease in occurrence of any terminals => un-optimization of 3 or more level bool op => don't lint

Most likely cfg!(windows) == false, then the second part of the || is useless and you algo wants to drop it.

ah, yea, we should treat macro calls yielding bools as terminals... i'll add a bugfix

@oli-obk
Copy link
Contributor Author

oli-obk commented Mar 24, 2016

no more hits in cargo and racer

I split the lint into logic_bug and nonminimal_bool. The first one lints when a terminal is dropped entirely by a minimization. The second one only lints when the number of occurrences of a terminal doesn't increase.

I also made sure that !(a && b) doesn't get "optimized" to !a || !b


struct NonminimalBoolVisitor<'a, 'tcx: 'a>(&'a LateContext<'a, 'tcx>);

use quine_mc_cluskey::Bool;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that use here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to annoy enough so someone writes a lint warning about uses in odd places?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems legit 😄

@mcarton
Copy link
Member

mcarton commented Mar 24, 2016

Should I detect a == b and a != b in the same boolean expression and use them (during optimization) as a == b and !(a == b). This has some advantages (see below)

That would be nice 😄

Should I turn !(a == b) in the optimized expression into a != b or should this be a separate lint?

I would say that’s more optimized, but it’s surely clea{n,r}er. Except if a == b comes from a macro, I wouldn’t expect anyone to ever write !(a == b).

Should stdlib comparison functions with inverses be treated like the builtin comparisons functions for the two previos points?

I’m not sure what you mean here.

Should I keep ignoring short circuiting behavior? Are there any situations where an optimized form of an expression would break?

Yes, that’s probably important. When I made the lints about copy&paste errors, I was told that servo (IIRC) had an instance of foo.pop() && foo.pop() which was intentional. What does your lint do here?

changing (!a && b) || (a && !b) to a ^ b is fine? there's no short circuiting possible anyway, separate lint?
unwrapping a ^ b to the expanded form for optimization might yield better results in a larger expression, should this be done?

a ^ b (or a != b) is probably clea{n,r}er, but that’s probably not very common.

@llogiq
Copy link
Contributor

llogiq commented Mar 24, 2016

Yeah, once any of the sub-expressions calls a method or macro, all bets are off – we should at least be wary to reorder code then, because Rust doesn't save us from side effects (x.pop() != x.pop() being the prime example). Perhaps report under a maybe_... lint which is allow by default and add a warning that reordering may change the semantics due to side effects.

@oli-obk
Copy link
Contributor Author

oli-obk commented Mar 24, 2016

I’m not sure what you mean here.

methods like Iterator::eq can be detected and also inverted: !v.iter().eq(&w) could be v.iter().ne(&w)

Yes, that’s probably important. When I made the lints about copy&paste errors, I was told that servo (IIRC) had an instance of foo.pop() && foo.pop() which was intentional. What does your lint do here?

well since I'm using the util::SpanlessEq thing with ignore_fn, these are parsed as "different" and are then their own Terminal. I might accidentally swap them, which becomes an issue only if the boolean expression is simplifyable, which it is not. (in this case it's not even an issue since the order doesn't matter, but it might be foo.pop() && bar.pop())

an example would be !(!foo.pop() && bar.pop()) which should be foo.pop() || !bar.pop(), but might end up getting suggested as !bar.pop() || foo.pop().

@oli-obk
Copy link
Contributor Author

oli-obk commented Mar 24, 2016

Rust doesn't save us from side effects

could we somehow abuse marker traits to detect &mut or Cell/RefCell that might get modified by a function/method call? (then we just have things like file-system access and statics and raw pointers)...

@mcarton
Copy link
Member

mcarton commented Mar 24, 2016

methods like Iterator::eq can be detected and also inverted: !v.iter().eq(&w) could be v.iter().ne(&w)

This is probably uncommon enough not to bother.

@oli-obk
Copy link
Contributor Author

oli-obk commented Mar 24, 2016

Should I detect a == b and a != b in the same boolean expression and use them (during optimization) as a == b and !(a == b). This has some advantages (see below)

That would be nice 😄

done

I would say that’s more optimized, but it’s surely clea{n,r}er. Except if a == b comes from a macro, I wouldn’t expect anyone to ever write !(a == b).

yea, currently the optimized form might end up suggesting !(a == b) instead of a != b, I'll look at it after the holidays

@oli-obk
Copy link
Contributor Author

oli-obk commented Mar 29, 2016

!(a == b) is now suggested as a != b, the same optimization is done for all other comparison operators.

@oli-obk
Copy link
Contributor Author

oli-obk commented Mar 30, 2016

Perhaps report under a maybe_... lint which is allow by default and add a warning that reordering may change the semantics due to side effects.

done.

I don't think there are any more open questions.

@llogiq llogiq merged commit e878ab4 into rust-lang:master Mar 30, 2016
@llogiq
Copy link
Contributor

llogiq commented Mar 30, 2016

This is a great addition to our bag of tricks 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants