Proposal for new approach with nanoseconds integers #107

Merged 18 commits on Feb 15, 2022

@pwnorbitals pwnorbitals commented Dec 26, 2021

Did some light refactoring and added benchmarks (these will be deleted before merging). What are your thoughts on this approach?

Note: this currently relies on bigint_helper_methods (rust-lang/rust#85532), so it compiles on nightly only.

@ChristopherRabotin
Member

Thanks for the PR!

If I read the benchmarks correctly, using nanoseconds and centuries is about 3 times faster than two floats for the "TBD seconds and JDE ET" test, 8 times slower for "Duration to f64 seconds", about 40% faster for "Duration add and assert day hour", marginally slower for "Duration add and assert minute second", and about 20% slower for "Duration add and assert subseconds".

Also, for the record, the conversion was about whether an FPU was available on embedded platforms, right? And that, typically, an FPU is slower than an ALU. Was there another reason?

@pwnorbitals
Author

Just updated the PR. Indeed, going to f64 is slower, but all other operations should end up faster. For now, I use a 128-bit type as an intermediary for the addition operation. That should not be needed after further optimization, since we could switch to a carrying_add with a bit of additional logic, which should gain some performance. The same goes for subtraction.
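As a rough illustration of the widening approach described above, here is a minimal sketch of addition through a 128-bit intermediate. The struct layout, the field names, the `NS_PER_CENTURY` constant and the `add` helper are assumptions made for illustration, not the code in this PR.

```rust
use std::convert::TryFrom;

/// Nanoseconds in one Julian century of 36525 days (assumed constant name).
const NS_PER_CENTURY: u64 = 3_155_760_000_000_000_000;

/// Hypothetical layout: whole centuries plus nanoseconds into that century.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Duration {
    centuries: i16,
    nanoseconds: u64,
}

/// Adds two durations by widening both to i128 nanoseconds, then splitting back.
/// Returns None if the resulting century count does not fit in an i16.
fn add(a: Duration, b: Duration) -> Option<Duration> {
    let per_century = i128::from(NS_PER_CENTURY);
    let total = i128::from(a.centuries) * per_century
        + i128::from(a.nanoseconds)
        + i128::from(b.centuries) * per_century
        + i128::from(b.nanoseconds);
    // Euclidean division keeps the nanosecond remainder non-negative.
    let centuries = i16::try_from(total.div_euclid(per_century)).ok()?;
    let nanoseconds = u64::try_from(total.rem_euclid(per_century)).ok()?;
    Some(Duration { centuries, nanoseconds })
}
```

In this sketch, adding one nanosecond to a duration that sits one nanosecond short of a century rolls over to exactly one century with zero nanoseconds.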

Apart from performance, the advantages are compatibility with platforms without an FPU and guaranteed nanosecond precision anywhere in the possible Duration range (floating-point implementations have variable time precision across the range, with very low precision at the extremes).
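The precision argument can be illustrated with a small sketch. It uses a plain f64 count of seconds rather than TwoFloat (which is far more precise but has the same magnitude-dependent behaviour), and the constant and function names are assumptions made for this example only.

```rust
/// Seconds in one Julian century of 36525 days (assumed constant name).
const SECONDS_PER_CENTURY: f64 = 3_155_760_000.0;

/// Nanoseconds in one Julian century (assumed constant name).
const NS_PER_CENTURY: u64 = 3_155_760_000_000_000_000;

/// True if adding one nanosecond to an f64 number of seconds changes the value.
fn float_resolves_one_ns(seconds: f64) -> bool {
    seconds + 1e-9 != seconds
}

/// True if adding one nanosecond to an integer nanosecond count changes the value by exactly 1.
fn int_resolves_one_ns(ns: u64) -> bool {
    ns.checked_add(1).map_or(false, |next| next - ns == 1)
}
```

Near one second an f64 still resolves a nanosecond, but one century away from zero the spacing between adjacent f64 values is already larger than a nanosecond, while the integer count keeps the same resolution.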

@pwnorbitals
Author

pwnorbitals commented Jan 3, 2022

Cleaned up the branch a bit; almost all tests are passing now. The printing side still needs some work, and it looks like there might be some nasty rounding errors I need to track down, but I'm positive this gets finalized soon™

@pwnorbitals pwnorbitals marked this pull request as ready for review January 3, 2022 21:15
@pwnorbitals
Author

r? @ChristopherRabotin

Member

@ChristopherRabotin ChristopherRabotin left a comment

Very good job! I think we (you) should change centuries to a smaller representation because we don't need that many bits there. Also, where possible, avoid overflows by using the ::from() call instead of coercing the conversion with an as cast.

src/duration.rs Outdated
pub struct Duration(TwoFloat);
pub struct Duration {
ns : u64, // 1 century is about 3.1e18 ns, and max value of u64 is about 1e19.26
centuries : i64 // +- 9.22e18 centuries is the possible range for a Duration
Member

Yes, I think that an i16 would be sufficient: that still covers 32,767 centuries before and after J1900, roughly 50 times further back than the oldest cave paintings in the world (about 660 centuries ago).

Also, in doing so, we're moving from 2x64 bits to 64+16=80 bits, thereby saving 48 bits.
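A quick way to sanity-check the i16 range is a checked conversion. This is only a sketch: the constant and the helper name are made up for the example, and the 660-century figure is the one quoted above.

```rust
use std::convert::TryFrom;

/// Age of the oldest cave paintings in centuries, as quoted in the comment above.
const CAVE_PAINTINGS_CENTURIES_AGO: i64 = 660;

/// Returns the century offset as an i16 if it is representable, None otherwise.
fn century_as_i16(centuries: i64) -> Option<i16> {
    i16::try_from(centuries).ok()
}
```

Calling `century_as_i16(-CAVE_PAINTINGS_CENTURIES_AGO)` succeeds, while an offset of 40,000 centuries is rejected because it exceeds `i16::MAX`.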

Author

Agreed, will edit

src/duration.rs Outdated
Self {
0: days * TwoFloat::from(SECONDS_PER_DAY_U),
}
pub fn new(ns : u64, centuries : i64) -> Self {
Member

Suggested change
pub fn new(ns : u64, centuries : i64) -> Self {
pub fn new(centuries : i64, ns : u64) -> Self {

It makes more sense to me to have the larger representation first.

src/duration.rs Outdated
Self {
0: hours * TwoFloat::from(SECONDS_PER_HOUR_U),

pub fn total_ns(self) -> i128 {
Member

From my rough math (below), 2^64 nanoseconds only covers about 5.8 centuries, so I think there should be several functions instead of just total_ns. total_ns should carry a warning: "Will panic if more than 5.8 centuries away from J1900". Then, add a try_total_ns that can't panic (it returns a Result<i128, Error>, with an Overflow error variant or something like that). And a third function called try_ns_from(baseline_century: i16) that shifts the reference century.

Question: should the base century be J2000? All astro stuff uses J2000, only Network Time Protocol uses J1900, but all of our applications so far are for astro...

>>> (2**64 / (1e9)) / (86400.*365.25*100)
5.845420460906264
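As a cross-check of the rough math above, here is a minimal Rust sketch. The Python expression uses 2**64, so the 5.8-century figure is the span of a 64-bit nanosecond count; a signed 128-bit count spans far more. The constant, the error enum and the `try_total_ns` shape are assumptions made for illustration, not the merged API.

```rust
/// Nanoseconds in one Julian century of 36525 days (assumed constant name).
const NS_PER_CENTURY: i128 = 36_525 * 86_400 * 1_000_000_000;

/// Hypothetical error type for the non-panicking accessor.
#[derive(Debug, PartialEq, Eq)]
enum DurationError {
    Overflow,
}

/// Whole centuries that fit in an unsigned 64-bit nanosecond count.
fn centuries_in_u64() -> i128 {
    i128::from(u64::MAX) / NS_PER_CENTURY
}

/// Whole centuries that fit in a signed 128-bit nanosecond count.
fn centuries_in_i128() -> i128 {
    i128::MAX / NS_PER_CENTURY
}

/// Hypothetical try_total_ns: checked arithmetic reports Overflow instead of panicking.
fn try_total_ns(centuries: i16, nanoseconds: u64) -> Result<i128, DurationError> {
    i128::from(centuries)
        .checked_mul(NS_PER_CENTURY)
        .and_then(|c| c.checked_add(i128::from(nanoseconds)))
        .ok_or(DurationError::Overflow)
}
```

With an i16 century count, the checked operations in this sketch never actually fail, because the largest possible total is many orders of magnitude below `i128::MAX`; the checked form is shown only to illustrate the proposed non-panicking signature.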

Member

Maybe the switch to J2000 should be done in another PR to merge this one in faster.

src/duration.rs Outdated

fn normalize(&mut self) {
if self.ns > NS_PER_CENTURY_U as u64 {
let carry = self.ns / NS_PER_CENTURY_U as u64;
Member

I think that where possible, the code should be u64::from(...) because that forces the compiler to check that the operation is lossless and can't panic. This would give me some reassurance in the conversions.
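The difference between the two conversion styles can be sketched as follows. The helper names are hypothetical, and since the type of NS_PER_CENTURY_U is not shown here, the example uses generic integer inputs instead.

```rust
use std::convert::TryFrom;

/// Lossless widening: u64::from only compiles for source types that always fit.
fn widen(value: u32) -> u64 {
    u64::from(value)
}

/// An `as` cast always compiles, but silently reinterprets negative values.
fn cast_with_as(value: i64) -> u64 {
    value as u64
}

/// When the conversion can fail, try_from makes the failure explicit.
fn checked_convert(value: i64) -> Option<u64> {
    u64::try_from(value).ok()
}
```

Here `cast_with_as(-1)` yields `u64::MAX`, whereas `checked_convert(-1)` returns `None`, and `u64::from` cannot even be written for an i64 input because that conversion is not guaranteed to be lossless.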

src/duration.rs Outdated
let centuries = (value.div_euclid(century_divider as f64)) as i64;
value = value.rem_euclid(century_divider as f64);

// Risks : Overflow, loss of precision, unexpected roundings
Member

Yeah, that's fair. We should add tests for that and try to de-risk it, maybe by doing lots of conversions manually or the u64::from method.
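One possible way to de-risk the float-to-integer path is to reject out-of-range inputs explicitly before casting. The sketch below is an assumption about how that could look, not the PR's code: the constants, the function name and the choice of returning an Option are all hypothetical.

```rust
/// Seconds in one Julian century of 36525 days (assumed constant name).
const SECONDS_PER_CENTURY: f64 = 3_155_760_000.0;

/// Nanoseconds per second, as a float scale factor.
const NS_PER_SECOND: f64 = 1e9;

/// Splits an f64 number of seconds into (centuries, nanoseconds into the century).
/// Returns None for non-finite input or when the century count does not fit in an i16.
fn split_seconds(seconds: f64) -> Option<(i16, u64)> {
    if !seconds.is_finite() {
        return None;
    }
    // Euclidean division keeps the remainder non-negative, even for negative input.
    let centuries = seconds.div_euclid(SECONDS_PER_CENTURY);
    let remainder = seconds.rem_euclid(SECONDS_PER_CENTURY);
    if centuries < f64::from(i16::MIN) || centuries > f64::from(i16::MAX) {
        return None;
    }
    // Rounding risk: for a tiny negative input the remainder can round up to a
    // full century, so a real implementation would normalize the result here.
    let nanoseconds = (remainder * NS_PER_SECOND).round() as u64;
    Some((centuries as i16, nanoseconds))
}
```

Tests for this kind of helper would typically cover negative inputs, values just below a century boundary, and non-finite values, which are the cases flagged as risky in the comment above.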

src/duration.rs Outdated
assert!((sum.in_unit_f64(TimeUnit::Minute) + 35.0).abs() < EPSILON);
}

#[test]
fn duration_print() {
// Check printing adds precision
assert_eq!(
format!("{}", TimeUnit::Day * 10.0 + TimeUnit::Hour * 5),
"10 days 5 h 0 min 0 s"
format!("{}", TimeUnit::Day * 10.0 + TimeUnit::Hour * 5).trim(),
Member

Why trim()?

Author

Because my Display implementation adds a space at the end of the string and I thought I'd rather trim() the tests than add complexity to the impl just for a single space
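For reference, one way to avoid the trailing space without much extra complexity is to collect the parts and join them. This is a hedged sketch, not the Display implementation in this PR: the `format_duration` function, the unit table and the rule of skipping leading zero units are assumptions made for illustration.

```rust
/// Formats a signed nanosecond count as "D days H h M min S s MS ms US us NS ns",
/// skipping leading zero units and joining parts so there is no trailing space.
fn format_duration(total_ns: i128) -> String {
    // Unit names with their size in nanoseconds, largest first.
    const UNITS: [(&str, u128); 7] = [
        ("days", 86_400_000_000_000),
        ("h", 3_600_000_000_000),
        ("min", 60_000_000_000),
        ("s", 1_000_000_000),
        ("ms", 1_000_000),
        ("us", 1_000),
        ("ns", 1),
    ];
    let mut remaining = total_ns.unsigned_abs();
    let mut parts: Vec<String> = Vec::new();
    for &(name, size) in UNITS.iter() {
        let count = remaining / size;
        remaining %= size;
        // Skip leading zero units, but always print the nanosecond unit.
        if count > 0 || !parts.is_empty() || size == 1 {
            parts.push(format!("{} {}", count, name));
        }
    }
    let sign = if total_ns < 0 { "-" } else { "" };
    format!("{}{}", sign, parts.join(" "))
}
```

With this approach the tests could compare against the formatted string directly, without calling trim().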

src/duration.rs Outdated
"-5 h 0 min 0 s 256 ms 3.5 ns"
TimeUnit::Hour * -5 + TimeUnit::Millisecond * -256 + dbg!(TimeUnit::Nanosecond * -3)
).trim(),
"-5 h 0 min 0 s 256 ms 0 us 3 ns"
Member

Nice! This is better I think

src/duration.rs Outdated
format!("{}", now),
"14889 days 23 h 47 min 34 s 0 ms 203.125 ns"
format!("{}", now).trim(),
"40 years 289 days 23 h 47 min 34 s 0 ms 0 us 203 ns"
Member

Hmm, why is this test different? Actually, shouldn't the original test ending with 123ns actually display 203 nanoseconds as you do? Seems like a bug in the previous version

src/duration.rs Outdated
format!("{}", arbitrary),
"14889 days 23 h 47 min 34 s 0 ms 123 ns"
format!("{}", arbitrary).trim(),
"40 years 289 days 23 h 47 min 34 s 0 ms 0 us 123 ns"
Member

Are you sure this test is correct? Can we add more years tests because I'm just worried we'll get the conversion wrong.

Author

I assumed 1 year = 365 days, as this is only used for Display. It's easy to roll back the years part of the decomposition if you're not comfortable with it; I just thought I'd roll with it as it was basically free in the new Display impl. We could indeed add more years tests, though.

Member

I think that the IETF or someone defines a year as 365.25 days (since one century is 36525 days). But the problem is that it's an approximate definition, and that's why lots of implementations don't allow for year based computations: it's confusing.

Author

Understood, it makes sense, I will remove this part of the decomposition then

@ChristopherRabotin
Member

Okay, I made a few changes to this branch. It doesn't work yet, but there's progress: I unified the ops macro so that we don't have to specify which type is needed. Tests fail pretty badly, but the fact that it compiles is a start!

test duration::deser_test ... ok
test epoch::deser_test ... ok
test epoch::datetime_invalid_dates ... ok
test epoch::leap_year ... ok
test epoch::quorem_nominal_test ... ok
test duration::duration_print ... FAILED
test duration::time_unit ... FAILED
test epoch::quorem_nil_den_test - should panic ... ok
test epoch::gpst ... FAILED
test epoch::julian_epoch ... FAILED
test epoch::regression_test_gh_85 ... FAILED
test epoch::test_range ... ok
test epoch::spice_et_tdb ... FAILED
test error_unittest ... ok
test epoch::utc_tai ... FAILED
test epoch::ops ... FAILED
test epoch::utc_epochs ... FAILED
test timeseries::test_timeseries ... FAILED
test sim::clock_noise_up ... ok
test sim::clock_sample ... ok
test epoch::test_from_str ... FAILED

@ChristopherRabotin
Member

ChristopherRabotin commented Feb 8, 2022

Just posting some temporary benchmarks from this PR after I started working on it last week.

Summary:

  • Converting a duration to f64 is about three times slower: 10.48 ns vs 3.40 ns
  • Every other operation, including converting an epoch to its JDE seconds or TDB seconds (used for ephemeris querying), is roughly 1.5 to 2.8 times as fast, except subsecond additions, which take about equal time
| Benchmark | Fixed nanosecond precision | TwoFloat |
|---|---|---|
| TBD seconds and JDE ET | 430.39 ns | 1.0321 us |
| Duration to f64 seconds | 10.480 ns | 3.4000 ns |
| Duration add and assert day hour | 24.747 ns | 68.897 ns |
| Duration add and assert minute second | 22.536 ns | 32.867 ns |
| Duration add and assert subseconds | 34.316 ns | 32.319 ns |

With nanosecond precision

     Running unittests (target/release/deps/bench_epoch-683b87af3f4470f4)
TBD seconds and JDE ET  time:   [424.62 ns 430.39 ns 437.20 ns]                                   
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Duration to f64 seconds time:   [10.404 ns 10.480 ns 10.569 ns]                                     
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

Duration add and assert day hour                                                                             
                        time:   [24.605 ns 24.747 ns 24.917 ns]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

Duration add and assert minute second                                                                             
                        time:   [22.437 ns 22.536 ns 22.636 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

Duration add and assert subsecons                                                                             
                        time:   [34.173 ns 34.316 ns 34.464 ns]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

With TwoFloat

Note: this was executed after the nanosecond precision so the "performance" increase/decrease metrics are compared to the nanosecond branch

test result: ok. 0 passed; 0 failed; 21 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests (target/release/deps/bench_epoch-826101cd40d0f022)
TBD seconds and JDE ET  time:   [1.0265 us 1.0321 us 1.0376 us]                                    
                        change: [+138.56% +140.95% +143.07%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Duration to f64 seconds time:   [3.3858 ns 3.4000 ns 3.4155 ns]                                     
                        change: [-68.371% -67.860% -67.461%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Duration add and assert day hour                                                                            
                        time:   [68.197 ns 68.897 ns 69.607 ns]
                        change: [+178.35% +181.05% +183.91%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild

Duration add and assert minute second                                                                             
                        time:   [32.746 ns 32.867 ns 32.982 ns]
                        change: [+44.195% +45.015% +45.828%] (p = 0.00 < 0.05)
                        Performance has regressed.

Duration add and assert subsecons                                                                             
                        time:   [32.172 ns 32.319 ns 32.462 ns]
                        change: [-7.0698% -6.4217% -5.7948%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Signed-off-by: Christopher Rabotin <christopher.rabotin@gmail.com>
Remove lots of impl_ops for Duration because rustc isn't happy

Signed-off-by: Christopher Rabotin <christopher.rabotin@gmail.com>
@ChristopherRabotin
Member

ChristopherRabotin commented Feb 14, 2022

Crap... I just implemented a bunch of code fixes and now the performance is quite bad.

Edit: I think the issue is the creation of a Duration from a TimeUnit because it converts the input data to i128 and that's just slow in a lot of cases.

@ChristopherRabotin
Member

ChristopherRabotin commented Feb 15, 2022

Ha! Switching the creation of a Duration to initialize from an i64 instead of an i128 where possible drastically decreases execution times.

test result: ok. 0 passed; 0 failed; 24 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests (target/release/deps/bench_epoch-1b18a789a5c33cd2)
TBD seconds and JDE ET  time:   [388.17 ns 390.82 ns 393.62 ns]                                   
                        change: [-53.202% -52.825% -52.433%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  9 (9.00%) low mild
  7 (7.00%) high mild
  4 (4.00%) high severe

Duration to f64 seconds time:   [5.9943 ns 6.0285 ns 6.0645 ns]                                     
                        change: [-69.589% -69.319% -69.059%] (p = 0.00 < 0.05)
                        Performance has improved.

Duration add and assert day hour                                                                             
                        time:   [15.324 ns 15.394 ns 15.469 ns]
                        change: [-62.021% -61.810% -61.603%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

Duration add and assert minute second                                                                             
                        time:   [16.018 ns 16.227 ns 16.493 ns]
                        change: [-62.648% -62.124% -61.571%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe

Duration add and assert subsecons                                                                             
                        time:   [15.179 ns 15.317 ns 15.474 ns]
                        change: [-61.607% -60.981% -60.272%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
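The i64-first construction mentioned at the top of this comment could look roughly like the sketch below. It is an assumption about the idea, not the merged code: the function name and the example constant are hypothetical, and whether the narrow path is actually faster depends on the target.

```rust
/// Nanoseconds per hour, used here as an example unit size.
const NS_PER_HOUR: i64 = 3_600_000_000_000;

/// Converts `quantity` units of `ns_per_unit` nanoseconds into a total nanosecond
/// count, staying in 64-bit arithmetic when the product fits and widening to
/// 128 bits only on overflow.
fn total_ns_from_unit(quantity: i64, ns_per_unit: i64) -> i128 {
    match quantity.checked_mul(ns_per_unit) {
        // Common case: the product fits in an i64, so no 128-bit multiply is needed.
        Some(ns) => i128::from(ns),
        // Rare case: fall back to the wide multiplication, which cannot overflow
        // for two i64 operands.
        None => i128::from(quantity) * i128::from(ns_per_unit),
    }
}
```

For example, `total_ns_from_unit(5, NS_PER_HOUR)` stays entirely in 64-bit arithmetic, while a quantity near `i64::MAX` takes the wide fallback.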


@ChristopherRabotin
Member

ChristopherRabotin commented Feb 15, 2022

Final benchmarks

| Benchmark | Fixed nanosecond precision | TwoFloat |
|---|---|---|
| TBD seconds and JDE ET | 378.07 ns | 1.0321 us |
| Duration to f64 seconds | 5.7677 ns | 3.4000 ns |
| Duration add and assert day hour | 15.992 ns | 68.897 ns |
| Duration add and assert minute second | 15.689 ns | 32.867 ns |
| Duration add and assert subseconds | 14.341 ns | 32.319 ns |

@ChristopherRabotin ChristopherRabotin merged commit ce69083 into nyx-space:master Feb 15, 2022
@pwnorbitals pwnorbitals deleted the to_integers branch February 20, 2022 11:45