bayesim package

Submodules

bayesim.model module

class bayesim.model.Model(**argv)[source]

Bases: object

The main workhorse class of bayesim. Stores the modeled and observed data as well as a Pmf object which maintains the current probability distribution and grid subdivisions.

Attributes: [update this]

attach_ecs(**argv)[source]

Define parameters for experimental conditions.

Parameters:
  • ec_list (list of str) – names of experimental conditions
  • ec_tols (dict) – dict of form {ec_name_1:tolerance_1, ec_name_2:tolerance_2, …}, will supersede ec_list
  • ec_units (dict) – dict of form {ec_name_1:units_1, ec_name_2:units_2, …}, optional
attach_fit_params(params)[source]

Attach list of parameters to fit.

Parameters:params – list of Fit_param objects
attach_model(**argv)[source]

Attach the model for the data, either by feeding in a file of precomputed data or a function that does the computing.

Parameters:
  • mode (str) – ‘file’ or ‘function’
  • model_data_func (callable) – if mode=’function’, provide function here
  • model_data_path (str) – if mode=’file’, provide path to file
  • output_column (str) – optional, header of column containing output data (required if different from self.output_var)
  • calc_model_unc (bool) – whether to calculate model uncertainties as well, defaults to False
  • verbose (bool) – flag for verbosity, defaults to False
attach_observations(**argv)[source]

Attach measured dataset.

Parameters:
  • obs_data_path (str) – path to HDF5 file containing observed data
  • keep_all (bool) – whether to keep all the data in the file (longer simulation times) or to clip out data points that are close to each other (defaults to False)
  • ec_x_var (str) – required if keep_all is False, the experimental condition over which to measure differences (e.g. V for JV(Ti) curves in PV). It will also be used in plotting later.
  • max_ec_x_step (float) – used if keep_all is False, largest step to take in the ec_x_var before keeping a point even if the curve is “flat” (defaults to 0.05 * range of ec_x_var)
  • thresh_dif_frac (float) – used if keep_all is False, threshold (as a fraction of the range of values, defaults to 0.01)
  • fixed_unc (float) – required if running in function mode or if file doesn’t have an ‘uncertainty’ column, value to use as uncertainty in measurement
  • output_column (str) – optional, header of column containing output data (required if different from self.output_var)
  • verbose (bool) – flag for verbosity, defaults to False
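
The clipping controlled by keep_all, max_ec_x_step, and thresh_dif_frac can be pictured with the sketch below. It is an illustration under stated assumptions, not bayesim’s implementation: the function name thin_points is hypothetical, and the rule (keep a point when the output has changed by at least thresh_dif_frac of its range since the last kept point, or when the step in ec_x_var has reached max_ec_x_step) is one plausible reading of the parameter descriptions above.

```python
def thin_points(x, y, thresh_dif_frac=0.01, max_ec_x_step=None):
    """Return indices of points to keep (hypothetical helper, not bayesim API).

    x: values of the x-axis experimental condition, in measurement order
    y: measured output values at those conditions
    """
    x_range = max(x) - min(x)
    y_range = max(y) - min(y)
    if max_ec_x_step is None:
        # Default taken from the parameter description: 0.05 * range of x.
        max_ec_x_step = 0.05 * x_range

    kept = [0]  # always keep the first point
    for i in range(1, len(x)):
        last = kept[-1]
        # Assumed rule: keep if the output moved enough since the last kept point...
        big_change = abs(y[i] - y[last]) >= thresh_dif_frac * y_range
        # ...or if we have travelled far enough along x, even on a flat curve.
        big_step = abs(x[i] - x[last]) >= max_ec_x_step
        if big_change or big_step:
            kept.append(i)
    return kept


x = list(range(11))
y = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
kept = thin_points(x, y, thresh_dif_frac=0.01, max_ec_x_step=3.0)
```

On the flat part of this curve only every third point survives, while every point on the rising part is kept.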
attach_params(params)[source]

Attach a param_list object.

calc_indices()[source]

Compute starting and ending indices in self.model_data for each point in self.probs.

calc_model_unc(**argv)[source]

Calculates largest difference in modeled output along any parameter direction for each experimental condition, to be used for uncertainty in calculating likelihoods. Currently only works if data is on a grid.

(also assumes it’s sorted by param names and then EC’s)

Parameters:
  • verbose (bool) – flag for verbosity, defaults to False
  • model_unc_factor (float) – multiplier on deltas to give uncertainty, defaults to 0.5 - smaller probably means faster convergence, but also higher chance to miss “hot spots”
  • take_average (bool) – flag for whether to use average model uncertainty at each measurement condition or the parameter-resolved version. Defaults to True if any parameters are logarithmically spaced, and False otherwise.
  • min_unc_frac (float) – minimum uncertainty as a fraction of the output variable value, defaults to 0.01
  • min_unc_val (float) – minimum uncertainty as an absolute number, defaults to 0.0

Note

If both min_unc_frac and min_unc_val are specified, the uncertainty will be set to the larger of the two in each case
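
As a rough illustration of the description above, the following sketch computes, on a small two-parameter grid, the largest difference in modeled output to any neighboring grid point, scales it by model_unc_factor, and applies the floor from the note. The function name model_uncertainty and the nested-list grid layout are assumptions made for this example; how bayesim stores the grid and combines the floor with the scaled delta may differ.

```python
def model_uncertainty(grid, model_unc_factor=0.5,
                      min_unc_frac=0.01, min_unc_val=0.0):
    """grid[i][j] is the modeled output at parameter indices (i, j),
    for one experimental condition (hypothetical layout)."""
    n_i, n_j = len(grid), len(grid[0])
    unc = [[0.0] * n_j for _ in range(n_i)]
    for i in range(n_i):
        for j in range(n_j):
            here = grid[i][j]
            # Neighbors one grid step away along each parameter direction.
            neighbors = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
            deltas = [abs(grid[a][b] - here)
                      for a, b in neighbors
                      if 0 <= a < n_i and 0 <= b < n_j]
            largest = max(deltas) if deltas else 0.0
            # Per the note: the floor is the larger of the two minimums.
            floor = max(min_unc_frac * abs(here), min_unc_val)
            # Assumption: the floor is applied as a lower bound on the result.
            unc[i][j] = max(model_unc_factor * largest, floor)
    return unc


unc = model_uncertainty([[1.0, 2.0], [4.0, 8.0]])
flat = model_uncertainty([[5.0, 5.0]])
```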

check_data_columns(**argv)[source]

Make sure the columns in imported data make sense.

Parameters:
  • model_data (DataFrame) – dataset to check
  • output_column (str) – optional, header of column containing output data (required if different from self.output_var)
check_ecs(**argv)[source]

Check that all experimental conditions are present at each parameter point in modeled data.

Parameters:
  • gb (groupby) – Pandas groupby object of model data grouped by parameter points
  • verbose (bool) – flag for verbosity, defaults to False
comparison_plot(**argv)[source]

Plot observed data vs. highest-probability modeled data.

Parameters:
  • ec_vals (dict) – optional, dict of EC values at which to plot. If not provided, they will be chosen randomly. This can also be a list of dicts for multiple points.
  • num_ecs (int) – number of EC values to plot, defaults to 1 (ignored if ec_vals is provided)
  • num_param_pts (int) – number of the most probable parameter space points to plot (defaults to 1)
  • ec_x_var (str) – one of self.ec_names; required if it was not provided in attach_observations, and overrides that value if it was. If ec was provided, this will supersede that
  • return_avg_err (bool) – whether or not to return average (absolute) error for highest probability model over all EC’s plotted (defaults to False)
  • fpath (str) – optional, path to save image to if desired
ec_names()[source]

Return list of experimental condition names.

fit_param_names()[source]

Return list of fitting parameter names.

list_model_pts_to_run(fpath, **argv)[source]

Generate full list of model points that need to be run (not just parameter points but also all experimental conditions). Saves to HDF5 at fpath.

Note that this could be very slow if used on the initial grid (i.e. for potentially millions of points) - it’s better for after a subdivide call.

Parameters:
  • fpath (str) – path to save the list to (HDF5)
  • verbose (bool) – flag for verbosity, defaults to False
run(**argv)[source]

Do Bayes! Will stop iterating through observations if/when >= th_pm of probability mass is concentrated in <= th_pv of boxes and decide it’s time to subdivide. (completely arbitrary thresholding for now)

Parameters:
  • save_step (int) – interval (number of data points) at which to save intermediate PMF’s (defaults to 10, 0 to save only final, <0 to save none)
  • th_pm (float) – threshold quantity of probability mass to be concentrated in th_pv fraction of parameter space to trigger the run to stop (defaults to 0.9)
  • th_pv (float) – threshold fraction of parameter space volume for th_pm fraction of probability to be concentrated into to trigger the run to stop (defaults to 0.05)
  • min_num_pts (int) – minimum number of observation points to use - if threshold is reached before this number of points has been used, it will start over and the final PMF will be the average of the number of runs needed to use sufficient points (defaults to 0.7 * the number of experimental measurements)
  • prob_relax (float) – number from 0 to 1.0, fraction of PMF from previous step to mix into prior for this step (defaults to 0) - higher values will likely converge faster but possibly have larger errors, especially if min_num_pts is small
  • verbose (bool) – flag for verbosity, defaults to False
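
The stopping criterion set by th_pm and th_pv can be sketched as follows. This is a minimal illustration, not the code bayesim runs: the function name is_concentrated is hypothetical, boxes are ranked by probability (ranking by density would be another reasonable choice), and equal box volumes are assumed unless volumes is given.

```python
def is_concentrated(probs, th_pm=0.9, th_pv=0.05, volumes=None):
    """Return True if at least th_pm of the probability mass sits in at
    most th_pv of the parameter-space volume (hypothetical helper)."""
    if volumes is None:
        volumes = [1.0] * len(probs)  # assume equal-volume boxes
    total_volume = sum(volumes)

    # Visit boxes from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    mass = 0.0
    volume = 0.0
    for i in order:
        mass += probs[i]
        volume += volumes[i]
        if mass >= th_pm:
            return volume / total_volume <= th_pv
    return False


peaked = [0.91] + [0.09 / 99] * 99   # one box holds 91% of the mass
uniform = [0.01] * 100               # mass spread evenly over 100 boxes
```

With the defaults, the peaked distribution would trigger a stop (1% of the volume holds 91% of the mass), while the uniform one would not.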
save_state(filename='bayesim_state.h5')[source]

Save the entire state of this model object to an HDF5 file so that work can be resumed later.

set_param_info(param_name, **argv)[source]

Set additional info for parameter param_name (any type).

Parameters:
  • param_name (str) – name of parameter to modify
  • units (str) – units of parameter
  • min_width (float) – minimum width of parameter (only for fitting params)
  • display_name (str) – name to use on plots (can include TeX)
  • tolerance (float) – tolerance for this parameter
subdivide(**argv)[source]

Subdivide the probability distribution and save the list of new sims to run to a file.

Parameters:
  • threshold_prob (float) – minimum probability of box to (keep and) subdivide (default value is the uniform distribution probability)
  • new_sim_list_fpath (str) – filename for file containing list of new simulations to be run (optional)
top_probs(num)[source]

Return a DataFrame with the ‘num’ most probable points and some of the less interesting columns hidden.

visualize_grid(**argv)[source]

Visualize the current state of the grid.

Parameters:same as pmf.visualize()
visualize_probs(**argv)[source]

Visualize the PMF with a corner plot.

Parameters:same as pmf.visualize()

bayesim.params module

class bayesim.params.Fit_param(**argv)[source]

Bases: bayesim.params.Param

A bayesim fitting parameter. Because they will be initialized on a grid, each fitting parameter stores its full list of values as well as some other information such as the spacing between them and the minimum width of a box (used during grid subdivisions).

get_closest_val(val)[source]

Return closest value to val in this parameter’s current set of vals.

get_tol_digits(**argv)[source]

Compute number of digits to round to. ‘val’ must be provided if logspaced.

get_val_str(val)[source]

Return a string with this parameter’s value, reasonably formatted.

class bayesim.params.Measured_param(**argv)[source]

Bases: bayesim.params.Param

A bayesim measured parameter. Can be experimental input or output.

class bayesim.params.Param(**argv)[source]

Bases: object

A parameter in a bayesim analysis. Can be a fitting parameter or an experimental condition.

get_val_str(val)[source]
set_tolerance(tol, islog=False)[source]

Set the tolerance for this parameter.

class bayesim.params.Param_list(**argv)[source]

Bases: object

Small class to facilitate listing and comparison of bayesim parameters.

add_ec(**argv)[source]

Add an experimental condition.

Parameters:
  • name (str) – name of the parameter, required
  • units (str) – units in which parameter is measured (defaults to ‘unitless’)
  • tolerance (float) – smallest difference between two values of this parameter to consider “real,” defaults to 1E-6
  • is_x (bool) – set this to be the x-axis variable when plotting data, defaults to False
add_fit_param(**argv)[source]

Add a fitting parameter to the list.

Parameters:
  • param (Fit_param) – A Fit_param object to add to the list
  • name (str) – name of the parameter, required if param object not passed
  • units (str) – units in which parameter is measured (defaults to ‘unitless’)
  • tolerance (float) – smallest difference between two values of this parameter to consider “real”
  • val_range (:obj:`list` of float) – [min, max] (either this or vals is required)
  • vals (list of float) – full list of vals for this param
  • length (int) – initial length of this parameter (defaults to 10)
  • min_width (float) – minimum box width for this parameter - subtractive if linear spacing and divisive if logarithmic (defaults to 0.01 of total range, required if providing val_range)
  • spacing (str) – ‘linear’ or ‘log’ (defaults to linear)
  • verbose (bool) – verbosity flag
add_output(**argv)[source]

Add an output variable.

Parameters:
  • name (str) – name of the parameter, required
  • units (str) – units in which parameter is measured (defaults to ‘unitless’)
  • tolerance (float) – smallest difference between two values of this parameter to consider “real,” defaults to 1E-6
all_params()[source]

Return a flat list of all parameters of any type.

as_dict()[source]
find_param(name)[source]

Return the Param object with the given name.

Parameters:name (str) – name to search for
get_ec_x()[source]
is_empty()[source]
param_names(param_type=None)[source]

Return parameter names. If no arguments are provided, the output is a dict; if a type is provided, the output is a list of just the parameter names of that type.

param_present(name)[source]

Check that the param name isn’t already present in a list.

Parameters:name (str) – name to check for
set_ec_x(param_name, verbose=False)[source]

Set the x-variable for experimental conditions.

set_tolerance(param_name, tol)[source]

Set the tolerance value for the given parameter.

vals_equal(param_name, val1, val2)[source]

Compare two values of a given param.

Parameters:
  • param_name (str) – name of parameter, must be in one of the lists
  • val1, val2 – values to be compared
Returns:

True if abs(val1-val2) < tolerance of param_name

bayesim.pmf module

class bayesim.pmf.Pmf(**argv)[source]

Bases: object

Class that stores a PMF capable of nested sampling / “adaptive mesh refinement”.

Stores probabilities in a DataFrame which associates regions of parameter space with probability values.

all_current_values(param)[source]

List all values currently being considered for param.

as_dict()[source]

Return this Pmf object in (readable) dictionary form.

find_neighbor_boxes(index)[source]

Find and return all boxes neighboring the box at index.

likelihood(**argv)[source]

Compute likelihood over this Pmf’s parameter space given modeled data at the given EC’s for every parameter space point and a measurement at the same EC’s.

Parameters:
  • meas (float) – one output value, e.g. J
  • model_at_ec (DataFrame) – DataFrame containing model data at the experimental condition of the measurement and uncertainty values in a column called ‘uncertainty’ for every point in parameter space
  • output_col (str) – name of column with output variable
  • ec – dict with keys of condition names and values
  • unc – uncertainty in measured value (stdev of a Gaussian)
  • model_func – should accept one dict of params and one of conditions and output measurement (might deprecate)
  • verbose (bool) – be verbose or not
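
Since unc is described as the standard deviation of a Gaussian, the likelihood at each parameter-space point can be sketched as a normal density evaluated at the measurement. The function below is an illustration only: gaussian_likelihood is a hypothetical name, and it takes one combined sigma per point because this page does not say how measurement and model uncertainties are combined.

```python
import math


def gaussian_likelihood(meas, model_vals, sigmas):
    """Likelihood of one measurement at each parameter-space point.

    meas:       measured output value
    model_vals: modeled output at the same experimental condition,
                one entry per parameter-space point
    sigmas:     total uncertainty (Gaussian standard deviation) per point
    """
    likes = []
    for model, sigma in zip(model_vals, sigmas):
        z = (meas - model) / sigma
        likes.append(math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi)))
    return likes


likes = gaussian_likelihood(1.0, [1.0, 2.0], [1.0, 1.0])
```

Points whose modeled output lies far from the measurement, relative to sigma, receive a correspondingly small likelihood.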
make_points_list(params, total_prob=1.0)[source]

Helper function for Pmf.__init__ as well as Pmf.subdivide. Given names and values for parameters, generate DataFrame listing values, bounds, and probabilities.

Parameters:
  • params (Param_list) –
  • total_prob (float) – total probability to divide among points in parameter space - for an initialization, this is 1.0, for a subdivide call will be less.
Returns:

DataFrame with columns for each parameter’s value, min, and max as well as a probability associated with that point in parameter space

most_probable(n)[source]

Return the n largest probabilities in a new DataFrame.

multiply(other_pmf, **argv)[source]

Compute and store renormalized product of this Pmf with other_pmf.

Parameters:other_pmf (Pmf) – PMF to multiply by
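
A renormalized product of two PMFs is the core of a Bayesian update (prior times likelihood, rescaled to sum to one). The sketch below shows the arithmetic on plain lists; it assumes both PMFs are defined over the same boxes in the same order, and the name multiply_and_normalize is hypothetical rather than part of bayesim.

```python
def multiply_and_normalize(probs, other_probs):
    """Elementwise product of two PMFs over the same boxes, renormalized."""
    product = [p * q for p, q in zip(probs, other_probs)]
    total = sum(product)
    if total == 0:
        raise ValueError("PMFs have no overlapping probability mass")
    return [v / total for v in product]


prior = [0.5, 0.5]
likelihood = [0.9, 0.1]
posterior = multiply_and_normalize(prior, likelihood)
```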
normalize(**argv)[source]

Normalize overall PMF.

param_names()[source]

Return list of parameter names of this PMF.

pick_ticks(plot_ranges)[source]

Helper function for visualize.

populate_dense_grid(**argv)[source]

Populate a grid such as the one created by make_dense_grid.

Parameters:
  • df (DataFrame) – DataFrame to populate from (should have columns for every param)
  • col_to_pull (str) – name of the column to use when populating grid points
  • make_ind_lists (bool) – whether to return a list of indices corresponding to the first in every slice (used by bayesim.model.calc_model_gradients)
Returns:

a dict with keys for each thing requested

project_1D(param, dense_grid=[])[source]

Project down to a one-dimensional PMF over the given parameter. Used by the visualize() method.

Parameters:
  • param (Fit_param) – one of self.params
  • dense_grid (matrix) – optionally, pass precomputed dense grid to save time
Returns:
  • bins (list of float) – bin edges for plotting with matplotlib.pyplot.hist (has length one more than probs)
  • probs (list of float) – probability values for histogram-style plot. Note that these technically have units of the inverse of whatever the parameter being plotted is (that is, they’re probability densities)
project_2D(x_param, y_param, no_probs=False, dense_grid=[])[source]

Project down to two dimensions over the two parameters. This one doesn’t actually need to sum, it just draws a bunch of (potentially overlapping) rectangles with transparencies according to their probability densities (as a fraction of the normalized area). Used by the visualize() method.

Parameters:
  • x_param (dict) – one of self.params, to be the x-axis of the 2D joint plot
  • y_param (dict) – one of self.params, to be the y-axis of the 2D joint plot
  • no_probs (bool) – whether to just show the grid boxes or include probability as determiner of transparency
  • dense_grid (matrix) – optionally, pass precomputed dense grid to save time
Returns:

patches for plotting the 2D joint probability distribution

Return type:

(list of matplotlib.patches.Rectangle)

subdivide(threshold_prob, include_neighbors=True, **argv)[source]

Subdivide all boxes with P > threshold_prob and assign “locally uniform” probabilities within each box. If include_neighbors is true, also subdivide all boxes neighboring those boxes.

Boxes with P < threshold_prob are deleted.

Parameters:
  • threshold_prob (float) – probability above which a box should be retained
  • include_neighbors (bool) – whether to also subdivide all immediate neighbors to boxes meeting the threshold
  • verbose (bool) – Verbosity, default false
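
A one-dimensional sketch of the subdivision step described above is given here. It is a simplified illustration, not bayesim’s implementation: boxes are plain (lo, hi, prob) tuples, each retained box is split into two halves that share its probability equally (“locally uniform”), and the result is not renormalized. The name subdivide_1d is hypothetical.

```python
def subdivide_1d(boxes, threshold_prob, include_neighbors=True):
    """boxes: list of (lo, hi, prob) tuples, sorted by lo and contiguous."""
    # Boxes whose probability exceeds the threshold are retained.
    keep = {i for i, (_, _, p) in enumerate(boxes) if p > threshold_prob}
    if include_neighbors:
        neighbors = set()
        for i in keep:
            # In 1D the immediate neighbors are the boxes on either side.
            neighbors.update(j for j in (i - 1, i + 1) if 0 <= j < len(boxes))
        keep |= neighbors

    new_boxes = []
    for i in sorted(keep):
        lo, hi, p = boxes[i]
        mid = 0.5 * (lo + hi)
        # Split in half and share the probability equally.
        new_boxes.append((lo, mid, p / 2))
        new_boxes.append((mid, hi, p / 2))
    return new_boxes  # all other boxes are dropped


boxes = [(0, 1, 0.05), (1, 2, 0.1), (2, 3, 0.7), (3, 4, 0.1), (4, 5, 0.05)]
with_neighbors = subdivide_1d(boxes, threshold_prob=0.2)
without_neighbors = subdivide_1d(boxes, threshold_prob=0.2, include_neighbors=False)
```

In this example only the middle box exceeds the threshold; with include_neighbors its two neighbors are subdivided as well, and the two outermost boxes are dropped.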
uniformize()[source]

Keep PMF shape and subdivisions but make every probability equal. Useful for rerunning whole inference after subdividing.

Note that because subdivisions are not uniform, this is NOT a uniform prior anymore.
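
The point of the note can be seen in a small one-dimensional sketch: once boxes have different widths, giving each the same probability yields different probability densities. The helper names and the (lo, hi, prob) tuple layout are assumptions made for this illustration, not bayesim’s data structures.

```python
def uniformize(boxes):
    """boxes: list of (lo, hi, prob). Give every box the same probability."""
    p = 1.0 / len(boxes)
    return [(lo, hi, p) for lo, hi, _ in boxes]


def densities(boxes):
    """Probability density of each box: probability divided by width."""
    return [p / (hi - lo) for lo, hi, p in boxes]


# One coarse box and two boxes produced by a subdivision.
boxes = [(0.0, 1.0, 0.2), (1.0, 1.5, 0.5), (1.5, 2.0, 0.3)]
flat = uniformize(boxes)
dens = densities(flat)
```

Each box now has probability 1/3, but the two narrow boxes carry twice the density of the wide one, so the subdivided region is still favored.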

visualize(**argv)[source]

Make histogram matrix to visualize the PMF.

Parameters:
  • frac_points (float) – number >0 and <=1 indicating fraction of total points to visualize (will take the most probable, defaults to 1.0)
  • just_grid (bool) – whether to show only the grid (i.e. visualize subdivisions) or the whole PMF (defaults to False)
  • fpath (str) – optional, path to save image to
  • true_vals (dict) – optional, set of param values to highlight on PMF
  • return_plots (bool) – whether to return figure and axes (used by visualize_PMF_sequence in utils), default False
  • color_index (int) – from 0 to 4, choose from blue, green, red, purple, orange (defaults to 0)
weighted_avgs()[source]

Returns a dict with the weighted average for each parameter.
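
A probability-weighted average per parameter can be sketched as below. The list-of-dicts layout, the 'prob' key, and the parameter names a and b are hypothetical stand-ins for the DataFrame that Pmf stores, and the sketch averages in linear space (whether bayesim treats logarithmically spaced parameters differently is not stated here).

```python
def weighted_avgs(points):
    """points: list of dicts holding parameter values plus a 'prob' key."""
    total = sum(pt["prob"] for pt in points)
    names = [key for key in points[0] if key != "prob"]
    return {
        name: sum(pt[name] * pt["prob"] for pt in points) / total
        for name in names
    }


points = [
    {"a": 1.0, "b": 10.0, "prob": 0.25},
    {"a": 3.0, "b": 20.0, "prob": 0.75},
]
avgs = weighted_avgs(points)
```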

bayesim.utils module

bayesim.utils.calc_deltas(grp, inds, param_lengths, model_data, fit_param_names, probs, output_var, take_average)[source]
bayesim.utils.get_closest_val(val, val_list)[source]

Return closest value to val in a list.

bayesim.utils.visualize_PMF_sequence(statefile_list, **argv)[source]

Create plot akin to that produced by pmf.visualize() but with data from multiple PMF’s overlaid. All should have the same set of fitting parameters. For now assumes that first statefile has the largest axes bounds, will add automated check later.

Parameters:
  • statefile_list (list of str) – list of paths to statefiles (saved by Model.save_state function storing PMF’s to be visualized), in order from least to most subdivided
  • name_list (list of str) – optional, list of names for legend, defaults to filenames
  • true_vals (dict) – optional, set of param values to highlight on PMF
  • fpath (str) – optional, path to save image to

Module contents