adopy.base
This module contains the three basic classes of ADOpy: Task, Model, and Engine. These classes provide the built-in functionality for Adaptive Design Optimization.
Note
The three basic classes are defined in the adopy.base module (i.e., adopy.base.Task, adopy.base.Model, and adopy.base.Engine). However, for convenience, users can import them directly as adopy.Task, adopy.Model, and adopy.Engine.
from adopy import Task, Model, Engine
# works the same as
from adopy.base import Task, Model, Engine
Task
- class adopy.base.Task(designs, responses, name=None)
Bases: object
A task object stores information for a specific experimental task, including labels of design variables (designs), labels of possible responses (responses), and the task name (name).
Changed in version 0.4.0: The responses argument is changed to the labels of response variables, instead of possible values of a response variable.
- Parameters:
designs – Labels of design variables in the task.
responses – Labels of response variables in the task (e.g., choice, rt).
name – Name of the task.
Examples
>>> task = Task(name='Task A',
...             designs=['d1', 'd2'],
...             responses=['choice'])
>>> task
Task('Task A', designs=['d1', 'd2'], responses=['choice'])
>>> task.name
'Task A'
>>> task.designs
['d1', 'd2']
>>> task.responses
['choice']
- property name
Name of the task. If it has no name, returns None.
- property designs
Labels for design variables of the task.
- property responses
Labels of response variables in the task.
- extract_responses(data)
Extract response grids from the given data.
- Parameters:
data – A data object that contains key-value pairs or columns corresponding to response variables.
- Returns:
ret – An ordered dictionary of grids for response variables.
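As an illustration, the method pulls out only the response columns from trial-by-trial data. A minimal sketch, assuming a plain dict of lists is acceptable as the data argument (the values below are arbitrary):
task = Task(name='Task A', designs=['d1', 'd2'], responses=['choice'])

# Trial-by-trial data containing both design and response columns;
# only the 'choice' column is returned as a response grid
data = {'d1': [0.1, 0.2, 0.3], 'd2': [1, 2, 3], 'choice': [0, 1, 1]}
response_grids = task.extract_responses(data)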
Model
- class adopy.base.Model(task, params, func=None, name=None)
Bases: object
A base class for a model in the ADOpy package.
Its initialization requires up to 4 arguments: task, params, func (optional), and name (optional). task is an instance of the adopy.base.Task class that this model is for. params is a list of model parameters, given as a list of their labels, e.g., ['alpha', 'beta']. name is the name of this model, which is optional for its functioning.
The most important argument is func, which calculates the log likelihood given design values, parameter values, and response values as its input. The arguments of the function should include the design and response variables (defined in the task instance) and the model parameters (given as params). The order of the arguments does not matter. If func is not given, the model provides the log likelihood of a random noise. A simple example is given as follows:
def compute_log_lik(design1, design2,
                    param1, param2, param3,
                    response1, response2):
    # ... calculating the log likelihood ...
    return log_lik
Warning
Since version 0.4.0, the func argument should be defined to compute the log likelihood, instead of the probability of a binary response variable. Also, it should include the response variables as arguments. These changes might break existing code written for previous versions of ADOpy.
Changed in version 0.4.0: The func argument is changed to the log likelihood function, instead of the probability function of a single binary response.
- Parameters:
task (Task) – Task object that this model object is for.
params – Labels of model parameters in the model.
func (function, optional) – A function to compute the log likelihood given a model, denoted as \(L(\mathbf{x} | \mathbf{d}, \mathbf{\theta})\), where \(\mathbf{x}\) is a response vector, \(\mathbf{d}\) is a design vector, and \(\mathbf{\theta}\) is a parameter vector. Note that the function arguments should include all design, parameter, and response variables.
name (Optional[str]) – Name of the model.
Examples
>>> task = Task(name='Task A', designs=['x1', 'x2'], responses=['y'])
>>> def calculate_log_lik(y, x1, x2, b0, b1, b2):
...     import numpy as np
...     from scipy.stats import bernoulli
...     logit = b0 + b1 * x1 + b2 * x2
...     p = np.divide(1, 1 + np.exp(-logit))
...     return bernoulli.logpmf(y, p)
>>> model = Model(name='Model X', task=task, params=['b0', 'b1', 'b2'],
...               func=calculate_log_lik)
>>> model.name
'Model X'
>>> model.task
Task('Task A', designs=['x1', 'x2'], responses=['y'])
>>> model.params
['b0', 'b1', 'b2']
>>> model.compute(y=1, x1=1, x2=-1, b0=1, b1=0.5, b2=0.25)
-0.251929081345373
>>> calculate_log_lik(y=1, x1=1, x2=-1, b0=1, b1=0.5, b2=0.25)
-0.251929081345373
- property name
Name of the model. If it has no name, returns None.
- property task
Task instance for the model.
- property params
Labels for model parameters of the model.
- compute(*args, **kargs)
Compute the log likelihood of obtaining the given responses with the given designs and model parameters. The function provides the same result as the func argument given at initialization. If the likelihood function is not given for the model, it returns the log probability of a random noise.
Warning
Since version 0.4.0, the compute() function should compute the log likelihood, instead of the probability of a binary response variable. Also, it should include the response variables as arguments. These changes might break existing code written for previous versions of ADOpy.
Changed in version 0.4.0: Provide the log likelihood instead of the probability of a binary response.
Engine
- class adopy.base.Engine(task, model, grid_design, grid_param, grid_response, noise_ratio=1e-07, dtype=<class 'numpy.float32'>)
Bases: object
A base class for an ADO engine to compute optimal designs.
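A minimal sketch of constructing an engine, reusing the Task and Model from the example above and assuming dictionaries that map variable labels to candidate values for the grid definitions; the grid values below are arbitrary and purely illustrative:
import numpy as np
from scipy.stats import bernoulli
from adopy import Task, Model, Engine

task = Task(name='Task A', designs=['x1', 'x2'], responses=['y'])

def calculate_log_lik(y, x1, x2, b0, b1, b2):
    logit = b0 + b1 * x1 + b2 * x2
    p = np.divide(1, 1 + np.exp(-logit))
    return bernoulli.logpmf(y, p)

model = Model(name='Model X', task=task, params=['b0', 'b1', 'b2'],
              func=calculate_log_lik)

# Grid definitions: candidate values for each design variable,
# model parameter, and response variable (illustrative choices)
grid_design = {'x1': np.linspace(0, 1, 11), 'x2': np.linspace(-1, 1, 11)}
grid_param = {'b0': np.linspace(-2, 2, 11),
              'b1': np.linspace(-2, 2, 11),
              'b2': np.linspace(-2, 2, 11)}
grid_response = {'y': [0, 1]}

engine = Engine(task=task, model=model, grid_design=grid_design,
                grid_param=grid_param, grid_response=grid_response)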
- property task
Task instance for the engine.
- property model
Model instance for the engine.
- property grid_design
Grid space for design variables, generated from the grid definition given as grid_design at initialization.
- property grid_param
Grid space for model parameters, generated from the grid definition given as grid_param at initialization.
- property grid_response
Grid space for response variables, generated from the grid definition given as grid_response at initialization.
- property log_prior
Log prior probabilities on the grid space of model parameters, \(\log p_0(\theta)\). These log probabilities correspond to the grid points defined in grid_param.
- property log_post
Log posterior probabilities on the grid space of model parameters, \(\log p(\theta)\). These log probabilities correspond to the grid points defined in grid_param.
- property prior
Prior probabilities on the grid space of model parameters, \(p_0(\theta)\). These probabilities correspond to the grid points defined in grid_param.
- property post
Posterior probabilities on the grid space of model parameters, \(p(\theta)\). These probabilities correspond to the grid points defined in grid_param.
- property marg_post
Marginal posterior distributions for each model parameter.
- property log_lik
Log likelihood \(\log p(y | d, \theta)\) for all discretized values of \(y\), \(d\), and \(\theta\).
- property marg_log_lik
Marginal log likelihood \(\log p(y | d)\) for all discretized values for \(y\) and \(d\).
- property ent
Entropy \(H(Y(d) | \theta) = -\sum_y p(y | d, \theta) \log p(y | d, \theta)\) for all discretized values for \(d\) and \(\theta\).
- property ent_marg
Marginal entropy \(H(Y(d)) = -\sum_y p(y | d) \log p(y | d)\) for all discretized values for \(d\), where \(p(y | d)\) indicates the marginal likelihood.
- property ent_cond
Conditional entropy \(H(Y(d) | \Theta) = \sum_\theta p(\theta) H(Y(d) | \theta)\) for all discretized values for \(d\), where \(p(\theta)\) indicates the posterior distribution for model parameters.
- property mutual_info
Mutual information \(I(Y(d); \Theta) = H(Y(d)) - H(Y(d) | \Theta)\), where \(H(Y(d))\) indicates the marginal entropy and \(H(Y(d) | \Theta)\) indicates the conditional entropy.
- property post_mean
A vector of estimated means for the posterior distribution. Its length is num_params.
- property post_cov
An estimated covariance matrix for the posterior distribution. Its shape is (num_params, num_params).
- property post_sd
A vector of estimated standard deviations for the posterior distribution. Its length is num_params.
- property dtype
The desired data type for the internal vectors and matrices, e.g., numpy.float64. Default is numpy.float32.
Added in version 0.4.0.
- reset()
Reset the engine to its initial state.
- get_design(kind='optimal')
Choose a design of one of the following types:
'optimal' (default): an optimal design \(d^*\) that maximizes the mutual information.
'random': a design chosen at random from the design grid.
- Parameters:
kind ({‘optimal’, ‘random’}, optional) – Type of design to choose. Default is 'optimal'.
- Returns:
design (Dict[str, any] or None) – A chosen design vector to use for the next trial. Returns None if there is no design available.
- update(design, response)
Update the posterior probabilities \(p(\theta | y, d^*)\) for all discretized values of \(\theta\).
\[p(\theta | y, d^*) \sim p(y | \theta, d^*) p(\theta)\]
# Given a design and a response as `design` and `response`,
# the engine can update the posterior with the following line:
engine.update(design, response)
The method can also take multiple observations to update the posterior probabilities. Multiple pairs of design and response should be given as a list of designs and a list of responses, passed to the design and response arguments, respectively.
\[\begin{split}\begin{aligned} p\big(\theta | y_1, \ldots, y_n, d_1^*, \ldots, d_n^*\big) &\sim p\big( y_1, \ldots, y_n | \theta, d_1^*, \ldots, d_n^* \big) p(\theta) \\ &= p(y_1 | \theta, d_1^*) \cdot \ldots \cdot p(y_n | \theta, d_n^*) p(\theta) \end{aligned}\end{split}\]
# Given a list of designs and corresponding responses as below:
designs = [design1, design2, design3]
responses = [response1, response2, response3]

# the engine can update with multiple observations:
engine.update(designs, responses)
- Parameters:
design (dict or pandas.Series or list of designs) – Design vector for the given response.
response (dict or pandas.Series or list of responses) – Any kind of observed response.
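Putting get_design() and update() together, a typical ADO session alternates between choosing a design and updating the posterior. A minimal sketch of such a loop, assuming the engine constructed earlier and a hypothetical, experiment-specific function run_trial() that presents a design and returns the observed response (not part of ADOpy):
num_trials = 50  # illustrative number of trials

for _ in range(num_trials):
    # Compute the optimal design for the next trial
    design = engine.get_design('optimal')

    # run_trial() is a hypothetical function that presents the design
    # to a participant and returns a response, e.g., {'y': 1}
    response = run_trial(design)

    # Update the posterior over model parameters with the new observation
    engine.update(design, response)

# Posterior summaries after the session
print(engine.post_mean)
print(engine.post_sd)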