# Type safety
<a id="c-newtype"></a>
## Newtypes provide static distinctions (C-NEWTYPE)
Newtypes can statically distinguish between different interpretations of an
underlying type.
For example, a `f64` value might be used to represent a quantity in miles or in
kilometers. Using newtypes, we can keep track of the intended interpretation:
```rust
struct Miles(pub f64);
struct Kilometers(pub f64);
impl Miles {
fn to_kilometers(self) -> Kilometers { /* ... */ }
}
impl Kilometers {
fn to_miles(self) -> Miles { /* ... */ }
}
```
Once we have separated these two types, we can statically ensure that we do not
confuse them. For example, the function
```rust
fn are_we_there_yet(distance_travelled: Miles) -> bool { /* ... */ }
```
cannot accidentally be called with a `Kilometers` value. The compiler will
remind us to perform the conversion, thus averting certain [catastrophic bugs].
[catastrophic bugs]: http://en.wikipedia.org/wiki/Mars_Climate_Orbiter
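To make the idea concrete, here is a minimal sketch with the conversion bodies filled in. The conversion factor is the standard definition of the international mile. The derives, the `TRIP_LENGTH` constant, and the body of `are_we_there_yet` are assumptions made only for illustration.

```rust
/// One international mile is exactly 1.609344 kilometers.
const KM_PER_MILE: f64 = 1.609344;

#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct Miles(pub f64);

#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct Kilometers(pub f64);

impl Miles {
    pub fn to_kilometers(self) -> Kilometers {
        Kilometers(self.0 * KM_PER_MILE)
    }
}

impl Kilometers {
    pub fn to_miles(self) -> Miles {
        Miles(self.0 / KM_PER_MILE)
    }
}

/// Hypothetical trip length, used only for this illustration.
const TRIP_LENGTH: Miles = Miles(100.0);

/// Accepts only `Miles`; a `Kilometers` value must be converted first.
pub fn are_we_there_yet(distance_travelled: Miles) -> bool {
    distance_travelled >= TRIP_LENGTH
}
```

With these definitions, `are_we_there_yet(Kilometers(120.0))` is rejected at compile time with a mismatched-types error, while `are_we_there_yet(Kilometers(120.0).to_miles())` compiles and makes the conversion explicit.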
<a id="c-custom-type"></a>
## Arguments convey meaning through types, not `bool` or `Option` (C-CUSTOM-TYPE)
Prefer
```rust
let w = Widget::new(Small, Round);
```
over
```rust
let w = Widget::new(true, false);
```
Core types like `bool`, `u8` and `Option` have many possible interpretations.
Use a deliberate type (whether enum, struct, or tuple) to convey interpretation
and invariants. In the above example, it is not immediately clear what `true`
and `false` are conveying without looking up the argument names, but `Small` and
`Round` are more suggestive.
Using custom types makes it easier to expand the options later on, for example
by adding an `ExtraLarge` variant.
See the newtype pattern ([C-NEWTYPE]) for a no-cost way to wrap existing types
with a distinguished name.
[C-NEWTYPE]: #c-newtype
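As a rough sketch of what such a `Widget` API might look like: the enum names `Size` and `Shape`, their extra variants, and the `describe` method are hypothetical, chosen only to make the example self-contained.

```rust
/// Hypothetical enum replacing a `bool` such as `is_small`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Size {
    Small,
    Medium,
    Large,
}

/// Hypothetical enum replacing a `bool` such as `is_round`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Shape {
    Round,
    Square,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Widget {
    size: Size,
    shape: Shape,
}

impl Widget {
    /// Each argument has its own type, so swapping them is a compile error.
    pub fn new(size: Size, shape: Shape) -> Widget {
        Widget { size, shape }
    }

    /// Returns a short human-readable description of the widget.
    pub fn describe(&self) -> String {
        format!("{:?} {:?} widget", self.size, self.shape)
    }
}
```

A call such as `Widget::new(Size::Small, Shape::Round)` documents itself at the call site, and `Widget::new(Shape::Round, Size::Small)` does not compile. Adding an `ExtraLarge` variant to `Size` later would not change the signature of `new`.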
<a id="c-bitflag"></a>
## Types for a set of flags are `bitflags`, not enums (C-BITFLAG)
Rust supports `enum` types with explicitly specified discriminants:
```rust
enum Color {
Red = 0xff0000,
Green = 0x00ff00,
Blue = 0x0000ff,
}
```
Custom discriminants are useful when an `enum` type needs to be serialized to an
integer value compatibly with some other system/language. They support
"typesafe" APIs: by taking a `Color`, rather than an integer, a function is
guaranteed to get well-formed inputs, even if it later views those inputs as
integers.
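The sketch below shows one way the integer view might be exposed in both directions. The `#[repr(u32)]` attribute, the `to_rgb` method, and the `TryFrom<u32>` implementation are assumptions added for illustration and are not part of the example above.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)] // assumption: pin the discriminant type for interop
pub enum Color {
    Red = 0xff0000,
    Green = 0x00ff00,
    Blue = 0x0000ff,
}

impl Color {
    /// Views a well-formed `Color` as its integer discriminant.
    pub fn to_rgb(self) -> u32 {
        self as u32
    }
}

impl TryFrom<u32> for Color {
    /// The rejected integer is handed back to the caller.
    type Error = u32;

    /// Integers from the outside world are validated exactly once, here.
    fn try_from(value: u32) -> Result<Color, u32> {
        match value {
            0xff0000 => Ok(Color::Red),
            0x00ff00 => Ok(Color::Green),
            0x0000ff => Ok(Color::Blue),
            other => Err(other),
        }
    }
}
```

Going from `Color` to an integer always succeeds, while the reverse direction is fallible. That asymmetry is why a function taking a `Color` never needs to handle malformed input.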
An `enum` allows an API to request exactly one choice from among many. Sometimes
an API's input is instead the presence or absence of a set of flags. In C code,
this is often done by having each flag correspond to a particular bit, allowing
a single integer to represent, say, 32 or 64 flags. Rust's [`bitflags`] crate
provides a typesafe representation of this pattern.
[`bitflags`]: https://github.com/bitflags/bitflags
```rust
use bitflags::bitflags;
bitflags! {
struct Flags: u32 {
const FLAG_A = 0b00000001;
const FLAG_B = 0b00000010;
const FLAG_C = 0b00000100;
}
}
fn f(settings: Flags) {
if settings.contains(Flags::FLAG_A) {
println!("doing thing A");
}
if settings.contains(Flags::FLAG_B) {
println!("doing thing B");
}
if settings.contains(Flags::FLAG_C) {
println!("doing thing C");
}
}
fn main() {
f(Flags::FLAG_A | Flags::FLAG_C);
}
```
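For intuition, the following standard-library-only sketch hand-rolls a small part of what such a flags type offers. It is not how the `bitflags` crate is implemented, and the `enabled_things` helper is hypothetical. In real code, prefer the crate.

```rust
use std::ops::BitOr;

/// A set of flags stored as individual bits of a `u32`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Flags(u32);

impl Flags {
    pub const FLAG_A: Flags = Flags(0b0000_0001);
    pub const FLAG_B: Flags = Flags(0b0000_0010);
    pub const FLAG_C: Flags = Flags(0b0000_0100);

    /// The set with no flags present.
    pub const fn empty() -> Flags {
        Flags(0)
    }

    /// The raw bits, e.g. for passing to a C API.
    pub const fn bits(self) -> u32 {
        self.0
    }

    /// True if every flag in `other` is also present in `self`.
    pub const fn contains(self, other: Flags) -> bool {
        (self.0 & other.0) == other.0
    }
}

impl BitOr for Flags {
    type Output = Flags;

    /// Set union: a flag is present if it is present in either operand.
    fn bitor(self, rhs: Flags) -> Flags {
        Flags(self.0 | rhs.0)
    }
}

/// Hypothetical helper listing which things a caller asked for.
pub fn enabled_things(settings: Flags) -> Vec<&'static str> {
    let mut things = Vec::new();
    if settings.contains(Flags::FLAG_A) {
        things.push("A");
    }
    if settings.contains(Flags::FLAG_B) {
        things.push("B");
    }
    if settings.contains(Flags::FLAG_C) {
        things.push("C");
    }
    things
}
```

Because the inner `u32` is private, callers can only combine the named constants, so a function taking `Flags` cannot receive bits that have no meaning.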
<a id="c-builder"></a>
## Builders enable construction of complex values (C-BUILDER)
Some data structures are complicated to construct because their construction
needs:

* a large number of inputs
* compound data (e.g. slices)
* optional configuration data
* choice between several flavors

Requirements like these can easily lead to a large number of distinct
constructors with many arguments each.
If `T` is such a data structure, consider introducing a `T` _builder_:
1. Introduce a separate data type `TBuilder` for incrementally configuring a `T`
value. When possible, choose a better name: e.g. [`Command`] is the builder
for a [child process], [`Url`] can be created from a [`ParseOptions`].
2. The builder constructor should take as parameters only the data _required_ to
make a `T`.
3. The builder should offer a suite of convenient methods for configuration,
including setting up compound inputs (like slices) incrementally. These
methods should return `self` to allow chaining.
4. The builder should provide one or more "_terminal_" methods for actually
building a `T`.
[`Command`]: https://doc.rust-lang.org/std/process/struct.Command.html
[child process]: https://doc.rust-lang.org/std/process/struct.Child.html
[`Url`]: https://docs.rs/url/1.4.0/url/struct.Url.html
[`ParseOptions`]: https://docs.rs/url/1.4.0/url/struct.ParseOptions.html
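The four steps above can be sketched as follows. `Report` and `ReportBuilder` are hypothetical types invented for this illustration: the title is the only required input, the lines are compound data added incrementally, and the footer is optional.

```rust
/// The value that is complicated to construct directly (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Report {
    title: String,
    lines: Vec<String>,
    footer: Option<String>,
}

/// Step 1: a separate type for incrementally configuring a `Report`.
#[derive(Debug, Clone)]
pub struct ReportBuilder {
    title: String,
    lines: Vec<String>,
    footer: Option<String>,
}

impl ReportBuilder {
    /// Step 2: the constructor takes only the required data.
    pub fn new(title: &str) -> ReportBuilder {
        ReportBuilder {
            title: title.to_string(),
            lines: Vec::new(),
            footer: None,
        }
    }

    /// Step 3: compound input is accumulated one element at a time.
    pub fn line(&mut self, line: &str) -> &mut ReportBuilder {
        self.lines.push(line.to_string());
        self
    }

    /// Step 3: optional configuration, returning `self` for chaining.
    pub fn footer(&mut self, footer: &str) -> &mut ReportBuilder {
        self.footer = Some(footer.to_string());
        self
    }

    /// Step 4: the terminal method that actually produces a `Report`.
    pub fn build(&self) -> Report {
        Report {
            title: self.title.clone(),
            lines: self.lines.clone(),
            footer: self.footer.clone(),
        }
    }
}

impl Report {
    /// Renders the report as plain text, one item per line.
    pub fn render(&self) -> String {
        let mut out = format!("# {}\n", self.title);
        for line in &self.lines {
            out.push_str(line);
            out.push('\n');
        }
        if let Some(footer) = &self.footer {
            out.push_str("-- ");
            out.push_str(footer);
            out.push('\n');
        }
        out
    }
}
```

A caller can then write `ReportBuilder::new("Status").line("all good").footer("ops").build()`, supplying only the configuration it needs.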
The builder pattern is especially appropriate when building a `T` involves side
effects, such as spawning a task or launching a process.
In Rust, there are two variants of the builder pattern, differing in the
treatment of ownership, as described below.
### Non-consuming builders (preferred)
In some cases, constructing the final `T` does not require the builder itself to
be consumed. The following variant on [`std::process::Command`] is one example:
[`std::process::Command`]: https://doc.rust-lang.org/std/process/struct.Command.html
```rust
// NOTE: the actual Command API does not use owned Strings;
// this is a simplified version.
use std::io;
use std::process::Child;
pub struct Command {
program: String,
args: Vec<String>,
cwd: Option<String>,
// etc
}
impl Command {
pub fn new(program: String) -> Command {
Command {
program,
args: Vec::new(),
cwd: None,
}
}
/// Add an argument to pass to the program.
pub fn arg(&mut self, arg: String) -> &mut Command {
self.args.push(arg);
self
}
/// Add multiple arguments to pass to the program.
pub fn args(&mut self, args: &[String]) -> &mut Command {
self.args.extend_from_slice(args);
self
}
/// Set the working directory for the child process.
pub fn current_dir(&mut self, dir: String) -> &mut Command {
self.cwd = Some(dir);
self
}
/// Executes the command as a child process, which is returned.
pub fn spawn(&self) -> io::Result<Child> {
/* ... */
}
}
```
Note that the `spawn` method, which actually uses the builder configuration to
spawn a process, takes the builder by shared reference. This is possible because
spawning the process does not require ownership of the configuration data.
Because the terminal `spawn` method only needs a reference, the configuration
methods take and return a mutable borrow of `self`.
#### The benefit
By using borrows throughout, `Command` can be used conveniently for both
one-liner and more complex constructions:
```rust
// One-liners
Command::new("/bin/cat").arg("file.txt").spawn();
// Complex configuration
let mut cmd = Command::new("/bin/ls");
if size_sorted {
cmd.arg("-S");
}
cmd.arg(".");
cmd.spawn();
```
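The same shape can be tried against the real [`std::process::Command`] without launching anything. In this sketch, `list_command` and `arg_strings` are hypothetical helpers; `Command::get_args` is the standard accessor (stable since Rust 1.57) used to inspect what has been configured so far.

```rust
use std::process::Command;

/// Builds (but does not run) an `ls` invocation, step by step.
pub fn list_command(size_sorted: bool) -> Command {
    let mut cmd = Command::new("ls");
    if size_sorted {
        // No re-assignment needed: `arg` borrows `cmd` mutably.
        cmd.arg("-S");
    }
    cmd.arg(".");
    cmd
}

/// Collects the configured arguments as strings for inspection.
pub fn arg_strings(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|arg| arg.to_string_lossy().into_owned())
        .collect()
}
```

One caveat of the borrowing style: `let cmd = Command::new("ls").arg("-S");` does not compile, because `arg` returns a borrow of a temporary that is dropped at the end of the statement. A chain must either end in a terminal method such as `spawn` or start from a named `let mut` binding as above.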
### Consuming builders
Sometimes builders must transfer ownership when constructing the final type `T`,
meaning that the terminal methods must take `self` rather than `&self`.
```rust
impl TaskBuilder {
/// Name the task-to-be.
pub fn named(mut self, name: String) -> TaskBuilder {
self.name = Some(name);
self
}
/// Redirect task-local stdout.
pub fn stdout(mut self, stdout: Box<dyn io::Write + Send>) -> TaskBuilder {
self.stdout = Some(stdout);
self
}
/// Creates and executes a new child task.
pub fn spawn<F>(self, f: F) where F: FnOnce() + Send {
/* ... */
}
}
```
Here, the `stdout` configuration involves passing ownership of an `io::Write`,
which must be transferred to the task upon construction (in `spawn`).
When the terminal methods of the builder require ownership, there is a basic
tradeoff:
* If the other builder methods take/return a mutable borrow, the complex
configuration case will work well, but one-liner configuration becomes
impossible.
* If the other builder methods take/return an owned `self`, one-liners continue
to work well but complex configuration is less convenient.
Under the rubric of making easy things easy and hard things possible, all
builder methods for a consuming builder should take and return an owned
`self`. Then client code works as follows:
```rust
// One-liners
TaskBuilder::new().named("my_task").spawn(|| { /* ... */ });
// Complex configuration
let mut task = TaskBuilder::new();
task = task.named("my_task_2"); // must re-assign to retain ownership
if reroute {
task = task.stdout(mywriter);
}
task.spawn(|| { /* ... */ });
```
One-liners work as before, because ownership is threaded through each of the
builder methods until being consumed by `spawn`. Complex configuration, however,
is more verbose: it requires re-assigning the builder at each step.
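The standard library's `std::thread::Builder` is a consuming builder of this kind: `name`, `stack_size`, and `spawn` all take `self` by value. The sketch below uses it in place of the `TaskBuilder` above to show the re-assignment style in runnable form. The `run_task` helper and the 4 MiB stack size are assumptions made for illustration.

```rust
use std::io;
use std::thread;

/// Spawns a thread, optionally named, and returns the name the thread
/// observes for itself.
pub fn run_task(name: Option<&str>, big_stack: bool) -> io::Result<Option<String>> {
    let mut builder = thread::Builder::new();
    if let Some(name) = name {
        // Must re-assign: `name` consumes the builder and returns a new one.
        builder = builder.name(name.to_string());
    }
    if big_stack {
        builder = builder.stack_size(4 * 1024 * 1024);
    }
    // The terminal method consumes the builder for good.
    let handle = builder.spawn(|| thread::current().name().map(String::from))?;
    Ok(handle.join().expect("task thread panicked"))
}
```

Forgetting the re-assignment is caught by the compiler: after a bare `builder.name(...)` the original binding has been moved, so any later use of `builder` is a compile error.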