adopy.base

This module contains the three basic classes of ADOpy: Task, Model, and Engine. These classes provide built-in functions for Adaptive Design Optimization.

Note

The three basic classes are defined in the adopy.base module (i.e., adopy.base.Task, adopy.base.Model, and adopy.base.Engine). However, for convenience, users can import them directly as adopy.Task, adopy.Model, and adopy.Engine.

from adopy import Task, Model, Engine
# works the same as
from adopy.base import Task, Model, Engine

Task

class adopy.base.Task(designs, responses, name=None)

Bases: object

A task object stores information for a specific experimental task, including labels of design variables (designs), labels of possible responses (responses), and the task name (name).

Changed in version 0.4.0: The responses argument is changed to the labels of response variables, instead of the possible values of a single response variable.

Parameters:
  • designs – Labels of design variables in the task.

  • responses – Labels of response variables in the task (e.g., choice, rt).

  • name – Name of the task.

Examples

>>> task = Task(name='Task A',
...             designs=['d1', 'd2'],
...             responses=['choice'])
>>> task
Task('Task A', designs=['d1', 'd2'], responses=['choice'])
>>> task.name
'Task A'
>>> task.designs
['d1', 'd2']
>>> task.responses
['choice']

property name

Name of the task. If it has no name, returns None.

property designs

Labels for design variables of the task.

property responses

Labels of response variables in the task.

extract_responses(data)

Extract response grids from the given data.

Parameters:

data – A data object that contains key-value pairs or columns corresponding to response variables.

Returns:

ret – An ordered dictionary of grids for response variables.
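The behavior can be sketched as follows. This is a hypothetical pure-Python illustration, not ADOpy's actual implementation: given the task's response labels and a dict-like data object, it pulls out the matching columns into an ordered dictionary.

```python
from collections import OrderedDict

def extract_responses_sketch(response_labels, data):
    # Keep only the columns named by the response labels,
    # preserving the order of the labels.
    return OrderedDict((label, list(data[label]))
                       for label in response_labels)

data = {'d1': [0.1, 0.2], 'd2': [1.0, 2.0], 'choice': [0, 1]}
print(extract_responses_sketch(['choice'], data))
# OrderedDict([('choice', [0, 1])])
```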

Model

class adopy.base.Model(task, params, func=None, name=None)

Bases: object

A base class for a model in the ADOpy package.

Its initialization requires up to four arguments: task, params, func (optional), and name (optional).

task is an instance of the adopy.base.Task class that this model is for. params is a list of model parameters, given as a list of their labels, e.g., ['alpha', 'beta']. name is the name of this model, which is optional.

The most important argument is func, which computes the log likelihood given design values, parameter values, and response values as its input. The arguments of the function should include the design and response variables (defined in the task instance) and the model parameters (given as params). The order of arguments does not matter. If func is not given, the model provides the log likelihood of random noise. A simple example is given as follows:

def compute_log_lik(design1, design2,
                    param1, param2, param3,
                    response1, response2):
    # ... calculating the log likelihood ...
    return log_lik

Warning

Since version 0.4.0, the func argument should be defined to compute the log likelihood, instead of the probability of a binary response variable. It should also include the response variables as arguments. These changes may break existing code written for previous versions of ADOpy.

Changed in version 0.4.0: The func argument is changed to the log likelihood function, instead of the probability function of a single binary response.

Parameters:
  • task (Task) – Task object that this model object is for.

  • params – Labels of model parameters in the model.

  • func (function, optional) – A function to compute the log likelihood of the model, denoted as \(\log L(\mathbf{x} | \mathbf{d}, \boldsymbol{\theta})\), where \(\mathbf{x}\) is a response vector, \(\mathbf{d}\) is a design vector, and \(\boldsymbol{\theta}\) is a parameter vector. Note that the function arguments should include all design, parameter, and response variables.

  • name (Optional[str]) – Name of the model.

Examples

>>> task = Task(name='Task A', designs=['x1', 'x2'], responses=['y'])
>>> def calculate_log_lik(y, x1, x2, b0, b1, b2):
...     import numpy as np
...     from scipy.stats import bernoulli
...     logit = b0 + b1 * x1 + b2 * x2
...     p = np.divide(1, 1 + np.exp(-logit))
...     return bernoulli.logpmf(y, p)
>>> model = Model(name='Model X', task=task, params=['b0', 'b1', 'b2'],
...               func=calculate_log_lik)
>>> model.name
'Model X'
>>> model.task
Task('Task A', designs=['x1', 'x2'], responses=['y'])
>>> model.params
['b0', 'b1', 'b2']
>>> model.compute(y=1, x1=1, x2=-1, b0=1, b1=0.5, b2=0.25)
-0.251929081345373
>>> compute_log_lik(y=1, x1=1, x2=-1, b0=1, b1=0.5, b2=0.25)
-0.251929081345373

property name

Name of the model. If it has no name, returns None.

property task

Task instance for the model.

property params

Labels for model parameters of the model.

compute(*args, **kargs)

Compute the log likelihood of obtaining the given responses with the given designs and model parameters. This function provides the same result as the func argument given at initialization. If no likelihood function was given for the model, it returns the log probability of random noise.

Warning

Since version 0.4.0, the compute() function should compute the log likelihood, instead of the probability of a binary response variable. It should also include the response variables as arguments. These changes may break existing code written for previous versions of ADOpy.

Changed in version 0.4.0: Provide the log likelihood instead of the probability of a binary response.

Engine

class adopy.base.Engine(task, model, grid_design, grid_param, grid_response, noise_ratio=1e-07, dtype=<class 'numpy.float32'>)

Bases: object

A base class for an ADO engine to compute optimal designs.

property task

Task instance for the engine.

property model

Model instance for the engine.

property grid_design

Grid space for design variables, generated from the grid definition, given as grid_design with initialization.

property grid_param

Grid space for model parameters, generated from the grid definition, given as grid_param with initialization.

property grid_response

Grid space for response variables, generated from the grid definition, given as grid_response with initialization.

property log_prior

Log prior probabilities on the grid space of model parameters, \(\log p_0(\theta)\). These log probabilities correspond to the grid points defined in grid_param.

property log_post

Log posterior probabilities on the grid space of model parameters, \(\log p(\theta)\). These log probabilities correspond to the grid points defined in grid_param.

property prior

Prior probabilities on the grid space of model parameters, \(p_0(\theta)\). These probabilities correspond to the grid points defined in grid_param.

property post

Posterior probabilities on the grid space of model parameters, \(p(\theta)\). These probabilities correspond to the grid points defined in grid_param.

property marg_post

Marginal posterior distributions for each model parameter.

property log_lik

Log likelihood \(p(y | d, \theta)\) for all discretized values of \(y\), \(d\), and \(\theta\).

property marg_log_lik

Marginal log likelihood \(\log p(y | d)\) for all discretized values for \(y\) and \(d\).

property ent

Entropy \(H(Y(d) | \theta) = -\sum_y p(y | d, \theta) \log p(y | d, \theta)\) for all discretized values for \(d\) and \(\theta\).

property ent_marg

Marginal entropy \(H(Y(d)) = -\sum_y p(y | d) \log p(y | d)\) for all discretized values for \(d\), where \(p(y | d)\) indicates the marginal likelihood.

property ent_cond

Conditional entropy \(H(Y(d) | \Theta) = \sum_\theta p(\theta) H(Y(d) | \theta)\) for all discretized values for \(d\), where \(p(\theta)\) indicates the posterior distribution for model parameters.

property mutual_info

Mutual information \(I(Y(d); \Theta) = H(Y(d)) - H(Y(d) | \Theta)\), where \(H(Y(d))\) indicates the marginal entropy and \(H(Y(d) | \Theta)\) indicates the conditional entropy.
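The entropy quantities above can be sketched with NumPy on a toy grid. This is an illustrative computation under stated assumptions (2 designs, 3 parameter points, a binary response, and a uniform prior), not ADOpy's internal code:

```python
import numpy as np

# lik[d, t, y] = p(y | d, theta): 2 designs x 3 parameter points x binary y.
lik = np.array([
    [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]],   # design 0: varies strongly with theta
    [[0.6, 0.4], [0.5, 0.5], [0.4, 0.6]],   # design 1: nearly flat across theta
])
prior = np.full(3, 1 / 3)                   # uniform p(theta)

# H(Y(d) | theta) = -sum_y p(y|d,theta) log p(y|d,theta)
ent = -np.sum(lik * np.log(lik), axis=-1)            # shape (2, 3)

# p(y | d) = sum_theta p(theta) p(y|d,theta)
marg_lik = np.einsum('t,dty->dy', prior, lik)

# Marginal entropy H(Y(d))
ent_marg = -np.sum(marg_lik * np.log(marg_lik), axis=-1)   # shape (2,)

# Conditional entropy H(Y(d) | Theta) = sum_theta p(theta) H(Y(d)|theta)
ent_cond = ent @ prior                                     # shape (2,)

# Mutual information I(Y(d); Theta) = H(Y(d)) - H(Y(d) | Theta)
mutual_info = ent_marg - ent_cond
print(mutual_info)
```

Design 0, whose likelihood varies more across the parameter grid, yields the larger mutual information, so it would be preferred by an ADO step.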

property post_mean

A vector of estimated means for the posterior distribution. Its length is num_params.

property post_cov

An estimated covariance matrix for the posterior distribution. Its shape is (num_params, num_params).

property post_sd

A vector of estimated standard deviations for the posterior distribution. Its length is num_params.

property dtype

The desired data-type for the internal vectors and matrices, e.g., numpy.float64. Default is numpy.float32.

Added in version 0.4.0.

reset()

Reset the engine to its initial state.

get_design(kind='optimal')

Choose a design of one of the following kinds:

  • 'optimal' (default): an optimal design \(d^*\) that maximizes the mutual information.

  • 'random': a design randomly chosen.

Parameters:

kind ({‘optimal’, ‘random’}, optional) – Type of a design to choose. Default is 'optimal'.

Returns:

design (Dict[str, Any] or None) – A chosen design vector to use for the next trial. Returns None if no design is available.
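The two kinds can be sketched as follows. This is a hypothetical illustration, not ADOpy's implementation: grid_design and the mutual-information values are made-up stand-ins for the engine's internal state.

```python
import numpy as np

# One row per candidate design, with a precomputed I(Y(d); Theta) per row.
grid_design = [{'d1': 0.0}, {'d1': 0.5}, {'d1': 1.0}]
mutual_info = np.array([0.02, 0.11, 0.07])

def get_design_sketch(kind='optimal'):
    if kind == 'optimal':
        # d* maximizes the mutual information over the design grid.
        return grid_design[int(np.argmax(mutual_info))]
    # 'random': pick any design uniformly at random.
    rng = np.random.default_rng()
    return grid_design[int(rng.integers(len(grid_design)))]

print(get_design_sketch())  # {'d1': 0.5}
```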

update(design, response)

Update the posterior probabilities \(p(\theta | y, d^*)\) for all discretized values of \(\theta\).

\[p(\theta | y, d^*) \propto p( y | \theta, d^*) p(\theta)\]
# Given a design and a response as `design` and `response`,
# the engine can update the posterior with the following line:
engine.update(design, response)

It can also take multiple observations to update the posterior probabilities. Multiple pairs of designs and responses should be given as a list of designs and a list of responses, passed to the design and response arguments, respectively.

\[\begin{split}\begin{aligned} p\big(\theta | y_1, \ldots, y_n, d_1^*, \ldots, d_n^*\big) &\propto p\big( y_1, \ldots, y_n | \theta, d_1^*, \ldots, d_n^* \big) p(\theta) \\ &= p(y_1 | \theta, d_1^*) \cdots p(y_n | \theta, d_n^*) \, p(\theta) \end{aligned}\end{split}\]
# Given a list of designs and corresponding responses as below:
designs = [design1, design2, design3]
responses = [response1, response2, response3]

# the engine can update with multiple observations:
engine.update(designs, responses)
Parameters:
  • design (dict or pandas.Series or list of designs) – Design vector(s) for the given response(s)

  • response (dict or pandas.Series or list of responses) – Observed response(s) corresponding to the given design(s)
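The update rule above can be sketched with NumPy on a toy grid. This is an assumption-laden illustration, not ADOpy's implementation: a Bernoulli model with no design dependence, showing that two one-at-a-time updates match a single batched update.

```python
import numpy as np

theta = np.linspace(0.1, 0.9, 9)        # grid over a Bernoulli rate parameter
log_post = np.log(np.full(9, 1 / 9))    # uniform log prior on the grid

def update(log_post, responses):
    # log p(theta | y_1..y_n) ∝ log p(theta) + sum_i log p(y_i | theta)
    for y in np.atleast_1d(responses):
        log_post = log_post + np.log(theta * y + (1 - theta) * (1 - y))
    # Renormalize over the grid so probabilities sum to 1.
    log_post = log_post - np.log(np.sum(np.exp(log_post)))
    return log_post

one_by_one = update(update(log_post, 1), 0)   # update(design, response) twice
batched = update(log_post, [1, 0])            # update(designs, responses) once
print(np.allclose(one_by_one, batched))       # True
```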