add reader for dnest4 #391

qacwnfq · 2024-07-16T17:20:27Z

Description

This is a draft and first attempt at adding a reader for diffusive nested samples.
I've opened the PR to facilitate easier discussion of the required changes.

While making this, I was contemplating what would be required to replay a diffusive nested sampling run.
I've added example output to tests/example_data/dnest4.
From the dnest4 output we can easily get the likelihood levels and replay which sample led to the construction of each level, and at which iteration.
Additionally, we already have the prior compression X and the likelihoods available.
Because we don't really track dead and live particles, I think the best approach would be a specialized class DiffusiveNestedSamples.

Hopefully, we could still achieve a similar interface to NestedSamples, in order to reuse the gui.

The only issue I think we cannot solve within anesthetic is that diffusive nested sampling is allowed to correct the level spacing in the second phase.
Therefore, we would not be able to "exactly" replay what the algorithm did.
Here, the solution would be to store the phases of diffusive nested sampling separately. I'm not sure this is absolutely required.

Fixes # (issue)

Checklist:

I have performed a self-review of my own code
My code is PEP8 compliant (flake8 anesthetic tests)
My code contains compliant docstrings (pydocstyle --convention=numpy anesthetic)
New and existing unit tests pass locally with my changes (python -m pytest)
I have added tests that prove my fix is effective or that my feature works
I have appropriately incremented the semantic version number in both README.rst and anesthetic/_version.py

qacwnfq · 2024-07-19T09:05:31Z

@williamjameshandley Hey Will, I've added the complete output from DNest4 for a 2D Gaussian with zero mean and unit variance. The prior is uniform on a box from -10 to 10.

In addition to the raw files from DNest4 (levels.txt, sample.txt, sample_info.txt),
I've added the additional files that are produced by the postprocessing of DNest4.
They are log_prior_weights.txt, posterior_sample.txt and weights.txt (probably the posterior weights).
Additionally, there is a file called sampler_state.txt. I would have to check what it does.

For the visualization, I think levels.txt, sample.txt, sample_info.txt + posterior_sample.txt should be enough, because then it would be possible to show "live" points or the posterior.

williamjameshandley · 2024-07-19T14:55:41Z

As a first pass, here is one (non-dynamic) way to visualise a dnest run:

import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from anesthetic.plot import basic_cmap

root = 'tests/example_data/dnest4/'
levels_file = 'levels.txt'
sample_file = 'sample.txt'
sample_info_file = 'sample_info.txt'
weights_file = 'weights.txt'

# Load the DNest4 output (levels.txt and weights.txt are read but not used in the plot below)
levels = np.loadtxt(os.path.join(root, levels_file), dtype=float, delimiter=' ', comments='#')
samples = np.genfromtxt(os.path.join(root, sample_file), dtype=float, delimiter=' ', comments='#', skip_header=1)
sample_info = np.loadtxt(os.path.join(root, sample_info_file), dtype=float, delimiter=' ', comments='#')
weights = np.loadtxt(os.path.join(root, weights_file), dtype=float, delimiter=' ', comments='#')
n_params = samples.shape[1]

# One row per sample: the two parameters followed by the sample_info columns
df = pd.DataFrame(np.concatenate([samples, sample_info], axis=1), columns=['x0', 'x1', 'level', 'log likelihood', 'tiebreaker', 'ID'])
df.ID = df.ID.astype(int)
df.level = df.level.astype(int)
levels = np.sort(df.level.unique())  # levels that actually contain samples

cmap = basic_cmap('C0')
fig, axes = plt.subplots(3, 3, sharex=True, sharey=True)
# Panel j shows the samples assigned to the first j+1 levels
for j, ax in enumerate(axes.ravel()):
    ls = levels[:j+1]
    for i, l in enumerate(ls):
        color = cmap((i+1)/len(ls))  # colour depends on the level index
        ax.plot(*df[df.level == l][['x0', 'x1']].to_numpy().T, '.', color=color)
    # Per-panel settings only need to be applied once per axes
    ax.set_xticks([])
    ax.set_yticks([])
    ax.text(-10, 10, f'{j+1}', va='top')
    ax.set_xlim(-10, 10)
    ax.set_ylim(-10, 10)

fig.tight_layout()
fig.set_size_inches(7, 7)
fig.savefig('dnest.png')

qacwnfq · 2024-07-27T17:39:41Z

That example code is very useful. With it, the replay of samples with a colormap is almost done.

For the LX over log(X) curve, the easiest way right now is to add a function that stores the posterior weights and log(X) from DNest4.
Right now the log(X) is not written to file and redoing the computation in anesthetic should not be part of this PR.
I'll continue making the prototype and once it's done the code will need some refactoring to fit better into the existing architecture.

qacwnfq · 2024-07-30T13:06:03Z

Here is a prototype

Some bugs are still there, e.g., the Higson plot is not correct. It should look more like the output from dnest4.

qacwnfq · 2024-08-26T17:57:06Z

Hi,

I've made progress and the PR is ready for review.

Since NestedSamples and DiffusiveNestedSamples support different types of plots, I decided that these classes should each provide their own methods that return the supported plot types and the corresponding points.
Furthermore, I changed the TrianglePlot.update method. It now takes a list of sets of samples and a list of colors. The update method tries to reuse existing lines, but if new ones are added, it has to redraw all axes.
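
To illustrate the reuse-or-redraw idea described above, here is a minimal sketch on a single matplotlib axes. The helper name update_lines and its signature are assumptions made for this illustration; it is not the code of TrianglePlot.update in this PR.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def update_lines(ax, lines, point_sets, colors):
    """Draw one line per point set, reusing Line2D objects where possible.

    Hypothetical helper for illustration. Returns the current list of lines.
    """
    if len(point_sets) != len(lines):
        # The number of sets changed: remove the old lines and redraw all.
        for line in lines:
            line.remove()
        lines = [ax.plot(x, y, ".", color=c)[0]
                 for (x, y), c in zip(point_sets, colors)]
    else:
        # Same number of sets: update data and colour in place.
        for line, (x, y), c in zip(lines, point_sets, colors):
            line.set_data(x, y)
            line.set_color(c)
    return lines


fig, ax = plt.subplots()
rng = np.random.default_rng(0)
set_a = (rng.normal(size=5), rng.normal(size=5))
set_b = (rng.normal(size=3), rng.normal(size=3))

lines = update_lines(ax, [], [set_a], ["C0"])                    # first draw
lines = update_lines(ax, lines, [set_b], ["C1"])                 # reuses the line
lines = update_lines(ax, lines, [set_a, set_b], ["C0", "C1"])    # redraws
```

Updating the data in place avoids recreating artists on every slider move; the full redraw is only paid when the number of sample sets changes.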

codecov · 2024-08-28T08:12:48Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (d367a4b) to head (38576fb).

Additional details and impacted files

@@            Coverage Diff            @@
##            master      #391   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           36        37    +1     
  Lines         3058      3142   +84     
=========================================
+ Hits          3058      3142   +84


qacwnfq · 2024-09-10T15:39:48Z

@williamjameshandley This PR is ready for review :)

lukashergt

Hi @qacwnfq, thanks for contributing to anesthetic!

I'll leave a more detailed review of the diffusive nested sampling stuff to @williamjameshandley, but I've left some inline comments about integration into the anesthetic API.

lukashergt · 2024-09-26T21:59:38Z

anesthetic/samples.py

+    def n_live(self, i):
+        """
+        Get live points at iteration i.
+
+        Parameters
+        ----------
+        i: i
+            nested sampling iteration
+
+        Returns
+        -------
+        live points at teration i
+
+        """
+        return self.nlive.iloc[i]


Why the need for this function? Why not directly use self.nlive.iloc[i]?
I don't like how similarly self.n_live is spelled compared to self.nlive without any indication why/how they behave differently.

If a function is indeed necessary, then we should think about something along the lines of a more general get_nlive() method with an optional kwarg iteration (not i) or item (it might be good to take a brief look at the standard naming for similar things used in numpy and/or pandas).
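
A minimal sketch of what such a get_nlive() could look like. The class SamplesSketch is a stand-in invented for this example; it only assumes that the samples object carries a pandas Series called nlive, as in the quoted diff.

```python
import pandas as pd


class SamplesSketch:
    """Stand-in for a samples class with an `nlive` Series (assumption)."""

    def __init__(self, nlive):
        self.nlive = pd.Series(nlive)

    def get_nlive(self, iteration=None):
        """Return the number of live points.

        With iteration=None the full Series is returned; otherwise the
        value at that positional nested sampling iteration.
        """
        if iteration is None:
            return self.nlive
        return self.nlive.iloc[iteration]


samples = SamplesSketch([5, 5, 4, 3])
all_nlive = samples.get_nlive()           # whole Series
nlive_at_2 = samples.get_nlive(iteration=2)  # single value
```

The optional keyword keeps a single entry point and avoids having both n_live and nlive on the same object.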

lukashergt · 2024-09-26T22:01:29Z

anesthetic/samples.py

+    def LX(self, beta, logX):
+        """
+        Get LX, e.g., for Higson plot.
+
+        Parameters
+        ----------
+        beta: float
+            temperature
+        logX: np.ndarray
+            prior volumes
+
+        Returns
+        -------
+         LX: np.ndarray
+        """
+        LX = self.logL*beta + logX
+        return LX


I don't like the naming of this function, it's misleading. Makes me think that it returns L * X, which is not the case.
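
The quoted method returns logL*beta + logX, which is the logarithm of L**beta * X rather than the product itself. A small numerical check, using a hypothetical, more explicit name (log_L_beta_X is only a suggestion for illustration):

```python
import numpy as np


def log_L_beta_X(logL, logX, beta=1.0):
    """Return log(L**beta * X) = beta*logL + logX (hypothetical name)."""
    return beta * np.asarray(logL) + np.asarray(logX)


logL = np.array([-3.0, -1.0, -0.5])
logX = np.array([0.0, -1.0, -2.0])
beta = 0.5

out = log_L_beta_X(logL, logX, beta)
# Exponentiating the result recovers the product L**beta * X.
product = np.exp(logL) ** beta * np.exp(logX)
```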

lukashergt · 2024-09-26T22:06:31Z

anesthetic/samples.py

+    def plot_types(self):
+        """
+        Get types of plots supported by this class.
+
+        Returns
+        -------
+        tuple[str]
+        """
+        return 'live', 'posterior'


In the bigger scheme of anesthetic, this is misleading. We have used the kwargs types and plot_type in the past to indicate KDE plots, histograms, scatter plots, etc. These kwargs have been renamed to kind, to unify with naming conventions in pandas and matplotlib.

I don't think this should be a method of NestedSamples. This seems to be a more GUI-specific thing, so a simple list there probably makes more sense.

lukashergt · 2024-09-26T22:07:25Z

anesthetic/samples.py

+    def points_to_plot(self, plot_type, label, evolution, beta, base_color):
+        """
+        Get samples for plotting.
+
+        Parameters
+        ----------
+        plot_type: str
+            see plot_types() for supported types.
+        label: str
+            column to plot
+        evolution: int
+            iteration to plot
+        beta: float
+            temperature
+        base_color:
+            base_color used to create color palette
+
+        Returns
+        -------
+        List[array-like]: list of points to plot
+        List[tuple[float]: colors to use
+        """
+        if plot_type == 'posterior':
+            return [self.posterior_points(beta)[label]], [base_color]
+        elif plot_type == 'live':
+            logL = self.logL.iloc[evolution]
+            return [self.live_points(logL)[label]], [base_color]
+        else:
+            raise ValueError("plot_type not supported")


This, too, does not feel like it should be a method of NestedSamples. Too GUI specific.
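
One possible shape for this, sketched under assumptions: the supported types live in a GUI-level constant and the selection logic becomes a plain function that takes the samples as an argument. PLOT_TYPES, the function placement and StubSamples are invented for this illustration; only the posterior_points(beta) and live_points(logL) calls are taken from the quoted diff.

```python
import pandas as pd

PLOT_TYPES = ("live", "posterior")  # hypothetical GUI-level constant


def points_to_plot(samples, plot_type, label, evolution, beta, base_color):
    """Hypothetical GUI-side helper: return ([points], [colors])."""
    if plot_type == "posterior":
        points = samples.posterior_points(beta)[label]
    elif plot_type == "live":
        logL = samples.logL.iloc[evolution]
        points = samples.live_points(logL)[label]
    else:
        raise ValueError(f"plot_type must be one of {PLOT_TYPES}")
    return [points], [base_color]


class StubSamples:
    """Minimal stand-in mimicking only the calls used above (not anesthetic)."""

    def __init__(self):
        self.data = pd.DataFrame({"x1": [0.1, 0.2, 0.3]})
        self.logL = pd.Series([-3.0, -2.0, -1.0])

    def posterior_points(self, beta):
        return self.data  # toy behaviour: no reweighting

    def live_points(self, logL):
        return self.data[self.logL >= logL]  # toy rule for illustration only


points, colors = points_to_plot(StubSamples(), "live", "x1", 1, 1.0, "C0")
```

Keeping the function outside the samples class means NestedSamples does not need to know which plot types the GUI offers.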

lukashergt · 2024-09-26T22:10:50Z

tests/example_data/dnest4_no_column_names/README.md

I think it would be cleaner to have a single subfolder dnest4 in the example_data folder. Can we merge everything from dnest4_no_column_names into dnest4? dnest4 itself can have subfolders...

lukashergt · 2024-09-26T22:16:11Z

tests/test_gui.py

-    plotter.type.buttons.set_active(1)
-    assert plotter.type() == 'posterior'
-    plotter.type.buttons.set_active(0)
-    assert plotter.type() == 'live'
+    for i, plot_type in enumerate(samples.plot_types()):
+        plotter.type.buttons.set_active(i)
+        assert plotter.type() == plot_type


This change, although one line shorter, is not actually better. For unit tests it is better to repeat and make sure any issue can be pinned to a specific line. Loops leave it unclear at which iteration an issue occurred. So better to repeat in tests.

That said, excessive repeats are of course annoying to maintain. But the better way of handling that is with pytest's parametrize options.
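
A sketch of the parametrize approach. FakeTypeSelector is a stand-in for the GUI widget written for this example, so the test runs without anesthetic; the real test would use plotter.type instead.

```python
import pytest


class FakeTypeSelector:
    """Stand-in for the GUI's radio buttons (assumption, not anesthetic API)."""

    labels = ("live", "posterior")

    def __init__(self):
        self.active = 0

    def set_active(self, i):
        self.active = i

    def __call__(self):
        return self.labels[self.active]


@pytest.mark.parametrize("index, expected", [(0, "live"), (1, "posterior")])
def test_type_selector(index, expected):
    # Each (index, expected) pair is collected and reported as its own test.
    selector = FakeTypeSelector()
    selector.set_active(index)
    assert selector() == expected
```

With this, a failure is reported against the specific parameter set, which keeps the traceability of repeated asserts without the duplication.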

lukashergt · 2024-09-26T22:33:26Z

tests/test_reader.py

+    ns.points_to_plot('visited points',
+                      label='x1',
+                      evolution=0,
+                      beta=1,
+                      base_color='C0')


test_reader.py should be about testing the reading of files, so this and the following plotting calls are a bit out of place.

lukashergt · 2024-09-26T22:37:10Z

tests/test_reader.py

+        ns.points_to_plot('visited points',
+                          label='x1',
+                          evolution=0,
+                          beta=1,
+                          base_color='C0')


Does not belong in test_reader.py.

Add reader and visualization for DNest4 output

8bdb0f2

qacwnfq force-pushed the master branch from 2bbf491 to 8bdb0f2 Compare August 26, 2024 16:51

qacwnfq added 2 commits August 26, 2024 19:14

complete checklist for PR

4867214

fix warning

19354ca

qacwnfq added 4 commits September 3, 2024 23:04

improve code coverage

d649ea3

PEP fix

1df5756

increase test coverage

2b15e66

improve test coverage

38576fb

qacwnfq requested a review from williamjameshandley September 10, 2024 15:39

lukashergt requested changes Sep 26, 2024
