add reader for dnest4 #391

Open
wants to merge 7 commits into
base: master

qacwnfq
Collaborator

@qacwnfq qacwnfq commented Jul 16, 2024

Description

This is a draft and a first attempt at adding a reader for diffusive nested samples.
I've opened the PR to make it easier to discuss the required changes.

While making this, I was contemplating what would be required to replay a diffusive nested sampling run.
I've added example output to tests/example_data/dnest4.
From the dnest4 output we can easily get the likelihood levels and replay which sample led to the construction of each level, and at which iteration.
Additionally, the prior compression X and the likelihoods are already available.
Because we don't really track dead and live particles, I think the best approach would be a specialized class DiffusiveNestedSamples.

Hopefully, we can still achieve an interface similar to NestedSamples, so that the GUI can be reused.

The only issue I think we cannot solve within anesthetic is that diffusive nested sampling is allowed to correct the level spacing in the second phase.
Therefore, we would not be able to replay "exactly" what the algorithm did.
One solution would be to store the phases of diffusive nested sampling separately, but I'm not sure this is absolutely required.

Fixes # (issue)

Checklist:

  • I have performed a self-review of my own code
  • My code is PEP8 compliant (flake8 anesthetic tests)
  • My code contains compliant docstrings (pydocstyle --convention=numpy anesthetic)
  • New and existing unit tests pass locally with my changes (python -m pytest)
  • I have added tests that prove my fix is effective or that my feature works
  • I have appropriately incremented the semantic version number in both README.rst and anesthetic/_version.py

@qacwnfq
Collaborator Author

qacwnfq commented Jul 19, 2024

@williamjameshandley Hey Will, I've added the complete output from DNest4 for a 2D Gaussian with mean 0 and unit variance. The prior is uniform on a box from -10 to 10.

In addition to the raw files from DNest4 (levels.txt, sample.txt, sample_info.txt),
I've added the files that are produced by the DNest4 postprocessing.
They are log_prior_weights.txt, posterior_sample.txt and weights.txt (probably the posterior weights).
There is also a file called sampler_state.txt. I would have to check what it contains.

For the visualization, I think levels.txt, sample.txt, sample_info.txt + posterior_sample.txt should be enough, because then it would be possible to show "live" points or the posterior.

@williamjameshandley
Collaborator

As a first pass, here is one (non-dynamic) way to visualise a dnest run:

import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from anesthetic.plot import basic_cmap

root = 'tests/example_data/dnest4/'
levels_file = 'levels.txt'
sample_file = 'sample.txt'
sample_info_file = 'sample_info.txt'
weights_file = 'weights.txt'

# Load the raw DNest4 output (levels and weights are read but not plotted here).
levels = np.loadtxt(os.path.join(root, levels_file), dtype=float, delimiter=' ', comments='#')
samples = np.genfromtxt(os.path.join(root, sample_file), dtype=float, delimiter=' ', comments='#', skip_header=1)
sample_info = np.loadtxt(os.path.join(root, sample_info_file), dtype=float, delimiter=' ', comments='#')
weights = np.loadtxt(os.path.join(root, weights_file), dtype=float, delimiter=' ', comments='#')
n_params = samples.shape[1]

# One row per sample: the two parameters followed by the sample_info columns.
df = pd.DataFrame(np.concatenate([samples, sample_info], axis=1),
                  columns=['x0', 'x1', 'level', 'log likelihood', 'tiebreaker', 'ID'])
df.ID = df.ID.astype(int)
df.level = df.level.astype(int)
level_ids = np.sort(df.level.unique())

cmap = basic_cmap('C0')
fig, axes = plt.subplots(3, 3, sharex=True, sharey=True)
for j, ax in enumerate(axes.ravel()):
    # Panel j+1 shows the samples assigned to the first j+1 levels.
    ls = level_ids[:j+1]
    for i, l in enumerate(ls):
        color = cmap((i+1)/len(ls))
        ax.plot(*df[df.level == l][['x0', 'x1']].to_numpy().T, '.', color=color)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.text(-10, 10, f'{j+1}', va='top')
    ax.set_xlim(-10, 10)
    ax.set_ylim(-10, 10)

fig.set_size_inches(7, 7)
fig.tight_layout()
fig.savefig('dnest.png')

[Image: dnest]

@qacwnfq
Collaborator Author

qacwnfq commented Jul 27, 2024

That example code is very useful. With it, the replay of samples with a colormap is almost done.

For the LX over log(X) curve, the easiest way right now is to add a function that stores the posterior weights and log(X) from DNest4.
Right now, log(X) is not written to file, and redoing the computation in anesthetic should not be part of this PR.
I'll continue with the prototype. Once it's done, the code will need some refactoring to fit better into the existing architecture.

@qacwnfq
Collaborator Author

qacwnfq commented Jul 30, 2024

Here is a prototype:
[Image: prototype]

Some bugs are still there, e.g., the Higson plot is not correct. It should look more like the output from dnest4.

@qacwnfq
Collaborator Author

qacwnfq commented Aug 26, 2024

Hi,

I've made progress and the PR is ready for review.

Since NestedSamples and DiffusiveNestedSamples support different types of plots, I decided that each class should provide its own methods that return the supported plot types and the corresponding points.
Furthermore, I changed the TrianglePlot.update method. It now takes a list of sets of samples and a list of colors. The update method tries to reuse existing lines, but if new ones are added, it has to redraw all axes.
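
The reuse-or-redraw behaviour described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code in this PR: `update_lines` is a hypothetical helper working on a single matplotlib axis, whereas the real TrianglePlot.update handles many axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def update_lines(ax, lines, point_sets, colors):
    """Draw one line per (x, y) set on ax, reusing existing Line2D artists.

    Hypothetical helper (not anesthetic API). Returns the current lines.
    """
    if len(point_sets) != len(colors):
        raise ValueError("need exactly one color per set of points")
    if len(point_sets) == len(lines):
        # Same number of sets as before: update the artists in place.
        for line, (x, y), color in zip(lines, point_sets, colors):
            line.set_data(x, y)
            line.set_color(color)
        return lines
    # The number of sets changed: remove the old artists and redraw all.
    for line in lines:
        line.remove()
    return [ax.plot(x, y, '.', color=color)[0]
            for (x, y), color in zip(point_sets, colors)]


fig, ax = plt.subplots()
lines = update_lines(ax, [], [([0, 1], [0, 1])], ['C0'])
first = lines[0]
lines = update_lines(ax, lines, [([2, 3], [2, 3])], ['C1'])
reused = lines[0] is first  # same artist, new data
lines = update_lines(ax, lines, [([0], [0]), ([1], [1])], ['C0', 'C1'])
```

Updating artists in place avoids a full redraw in the common case where only the selected iteration or temperature changes.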

codecov bot commented Aug 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (d367a4b) to head (38576fb).

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #391   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           36        37    +1     
  Lines         3058      3142   +84     
=========================================
+ Hits          3058      3142   +84     


@qacwnfq
Collaborator Author

qacwnfq commented Sep 10, 2024

@williamjameshandley This PR is ready for review :)

Collaborator

@lukashergt lukashergt left a comment

Hi @qacwnfq, thanks for contributing to anesthetic!

I'll leave a more detailed review of the diffusive nested sampling stuff to @williamjameshandley. But I've left some inline comments about integration into the anesthetic API.

Comment on lines +1171 to +1185
def n_live(self, i):
    """
    Get live points at iteration i.

    Parameters
    ----------
    i: int
        nested sampling iteration

    Returns
    -------
    live points at iteration i

    """
    return self.nlive.iloc[i]

Why the need for this function? Why not directly use self.nlive.iloc[i]?
I don't like how similarly self.n_live is spelled compared to self.nlive, without any indication of why or how they behave differently.

If a function is indeed necessary, then we should think about something along the lines of a more general get_nlive() method with an optional kwarg iteration (not i) or item (it might be good to take a brief look at the standard naming for similar things in numpy and/or pandas).
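
A minimal sketch of what such an accessor could look like is shown below. The class `Samples` is a stand-in written for this illustration, and both the method name get_nlive and the kwarg iteration are only the suggestion from the comment above, not existing anesthetic API.

```python
import pandas as pd


class Samples:
    """Stand-in for a samples class holding a per-iteration nlive Series."""

    def __init__(self, nlive):
        self.nlive = pd.Series(nlive)

    def get_nlive(self, iteration=None):
        """Return the number of live points.

        With iteration=None the whole Series is returned; otherwise the
        value at that positional nested sampling iteration.
        """
        if iteration is None:
            return self.nlive
        return self.nlive.iloc[iteration]


samples = Samples([5, 5, 4, 3, 2, 1])
nlive_at_2 = samples.get_nlive(iteration=2)
```

A descriptive keyword makes the difference to the plain nlive attribute visible at the call site.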

Comment on lines +1187 to +1203
def LX(self, beta, logX):
    """
    Get LX, e.g., for Higson plot.

    Parameters
    ----------
    beta: float
        temperature
    logX: np.ndarray
        prior volumes

    Returns
    -------
    LX: np.ndarray
    """
    LX = self.logL*beta + logX
    return LX

I don't like the naming of this function, it's misleading. Makes me think that it returns L * X, which is not the case.
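
To make the distinction concrete: the quoted method computes beta*logL + logX, which is the logarithm of L**beta * X and not the product itself. A small sketch, where the name `log_LbetaX` is just one hypothetical alternative and the numbers are made up for illustration:

```python
import numpy as np


def log_LbetaX(logL, logX, beta=1.0):
    """Return log(L**beta * X) = beta*logL + logX (hypothetical name)."""
    return beta * np.asarray(logL) + np.asarray(logX)


# Made-up values, only to show the relation between the two quantities.
logL = np.array([-3.0, -1.0, -0.5])
logX = np.array([0.0, -1.0, -2.0])
value = log_LbetaX(logL, logX, beta=1.0)
# Exponentiating recovers the product L * X itself.
product = np.exp(logL) * np.exp(logX)
```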

Comment on lines +1375 to +1383
def plot_types(self):
    """
    Get types of plots supported by this class.

    Returns
    -------
    tuple[str]
    """
    return 'live', 'posterior'

In the bigger scheme of anesthetic, this is misleading. We have used the kwargs types and plot_type in the past to indicate KDE plots, histograms, scatter plots, etc. These kwargs have been renamed to kind, to unify with naming conventions in pandas and matplotlib.

I don't think this should be a method of NestedSamples. This seems to be a more GUI-specific thing, so a simple list there probably makes more sense.

Comment on lines +1385 to +1413
def points_to_plot(self, plot_type, label, evolution, beta, base_color):
    """
    Get samples for plotting.

    Parameters
    ----------
    plot_type: str
        see plot_types() for supported types.
    label: str
        column to plot
    evolution: int
        iteration to plot
    beta: float
        temperature
    base_color:
        base_color used to create color palette

    Returns
    -------
    List[array-like]: list of points to plot
    List[tuple[float]]: colors to use
    """
    if plot_type == 'posterior':
        return [self.posterior_points(beta)[label]], [base_color]
    elif plot_type == 'live':
        logL = self.logL.iloc[evolution]
        return [self.live_points(logL)[label]], [base_color]
    else:
        raise ValueError("plot_type not supported")

This, too, does not feel like it should be a method of NestedSamples. Too GUI specific.
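
One way to read this suggestion, together with the earlier one about plot_types, is a plain function and a simple list on the GUI side. The sketch below is an assumption about how that could look, not code from the PR: `PLOT_KINDS`, the free function `points_to_plot` and the stand-in class `FakeSamples` are all hypothetical, and only logL, live_points and posterior_points are taken from the quoted method.

```python
import pandas as pd

PLOT_KINDS = ('live', 'posterior')  # hypothetical GUI-level list


def points_to_plot(samples, kind, label, evolution, beta):
    """Select the points the GUI should draw for one column.

    `samples` only needs logL, live_points(logL) and posterior_points(beta).
    """
    if kind == 'posterior':
        return samples.posterior_points(beta)[label]
    if kind == 'live':
        logL = samples.logL.iloc[evolution]
        return samples.live_points(logL)[label]
    raise ValueError(f"kind must be one of {PLOT_KINDS}, got {kind!r}")


class FakeSamples:
    """Stand-in with simplified selection rules, for illustration only."""

    def __init__(self):
        self.data = pd.DataFrame({'x0': [0.0, 1.0, 2.0],
                                  'logL': [-3.0, -2.0, -1.0]})
        self.logL = self.data['logL']

    def live_points(self, logL):
        # Simplification: keep points at or above the likelihood threshold.
        return self.data[self.data.logL >= logL]

    def posterior_points(self, beta):
        # Simplification: return everything, ignoring the temperature.
        return self.data


live_x0 = points_to_plot(FakeSamples(), 'live', 'x0', evolution=1, beta=1.0)
```

Keeping the selection logic outside the samples classes leaves their public interface free of GUI concerns.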


I think it would be cleaner to have a single subfolder dnest4 in the example_data folder. Can we merge everything from dnest4_no_column_names into dnest4? dnest4 itself can have subfolders...

Comment on lines -23 to +27
-    plotter.type.buttons.set_active(1)
-    assert plotter.type() == 'posterior'
-    plotter.type.buttons.set_active(0)
-    assert plotter.type() == 'live'
+    for i, plot_type in enumerate(samples.plot_types()):
+        plotter.type.buttons.set_active(i)
+        assert plotter.type() == plot_type

This change, although one line shorter, is not actually better. For unit tests it is better to repeat and make sure any issue can be pinned to a specific line. Loops leave it unclear at which iteration an issue occurred. So it is better to repeat in tests.

That said, excessive repeats are of course annoying to maintain. But the better way of handling that is with pytest's parametrize options.
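
For illustration, a parametrized version of such a test might look like the sketch below. `FakeButtons` and `FakeType` are stand-ins written for this example so that it runs on its own; they are not the anesthetic GUI widgets, and the test name is hypothetical.

```python
import pytest


class FakeButtons:
    """Stand-in for the radio buttons of the GUI."""

    def __init__(self):
        self.active = 0

    def set_active(self, index):
        self.active = index


class FakeType:
    """Stand-in for plotter.type: buttons plus a call returning the label."""

    labels = ('live', 'posterior')

    def __init__(self):
        self.buttons = FakeButtons()

    def __call__(self):
        return self.labels[self.buttons.active]


@pytest.mark.parametrize('index, expected',
                         [(1, 'posterior'), (0, 'live')])
def test_type_buttons(index, expected):
    plot_type = FakeType()
    plot_type.buttons.set_active(index)
    assert plot_type() == expected
```

When run under pytest, each parameter tuple is collected and reported as its own test case, so a failure points directly at the offending combination.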

Comment on lines +362 to +366
ns.points_to_plot('visited points',
                  label='x1',
                  evolution=0,
                  beta=1,
                  base_color='C0')

test_reader.py should be about testing the reading of files, so this and the following plotting calls are a bit out of place.

Comment on lines +266 to +270
ns.points_to_plot('visited points',
                  label='x1',
                  evolution=0,
                  beta=1,
                  base_color='C0')

Does not belong in test_reader.py.

3 participants