GridWorld

class safemdp.GridWorld(gp, world_shape, step_size, beta, altitudes, h, S0, S_hat0, L, update_dist=0)

Grid world with safe exploration.

Parameters:

gp: GPy.core.GP

Gaussian process that expresses our current belief over the safety feature

world_shape: shape

Tuple (n, m) that contains the shape of the grid world

step_size: tuple of floats

Tuple that contains the step sizes along each direction to create a linearly spaced grid

beta: float

Scaling factor to determine the amplitude of the confidence intervals

altitudes: np.array

Flattened n x m matrix that stores the altitudes of all the points in the map

h: float

Safety threshold

S0: np.array

n_states x (n_actions + 1) array of booleans that indicates which states (first column) and which state-action pairs (remaining columns) belong to the initial safe seed. Note that, by convention, all the states are initialized as safe

S_hat0: np.array or nan

n_states x (n_actions + 1) array of booleans that indicates which states (first column) and which state-action pairs belong to the initial safe seed and satisfy recovery and reachability properties. If it is nan, such a boolean matrix is computed during initialization

noise: float

Standard deviation of the measurement noise

L: float

Lipschitz constant to compute expanders

update_dist: int

Distance in unweighted graph used for confidence interval update. A sample will only influence other nodes within this distance.
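The layout implied by world_shape, step_size and S0 can be sketched with plain NumPy. This is only an illustration under stated assumptions: the grid origin at 0, the row-major ordering of the states, the choice of four actions and the seeded state-action pair are all made up here and are not taken from the library.

```python
import numpy as np

world_shape = (3, 4)      # n x m grid
step_size = (0.5, 0.5)    # spacing along each direction

n, m = world_shape
# Linearly spaced coordinates along each axis (origin at 0 is an assumption)
xs = np.arange(n) * step_size[0]
ys = np.arange(m) * step_size[1]
# One row per state, flattened in row-major order (an assumption)
coord = np.array([(x, y) for x in xs for y in ys])

n_states = n * m
n_actions = 4             # assumed: one action per grid direction

# Column 0 marks safe states, columns 1..n_actions mark safe state-action pairs
S0 = np.zeros((n_states, n_actions + 1), dtype=bool)
S0[:, 0] = True           # by convention all states start as safe
S0[5, 1] = True           # hypothetical seed: action 1 from state 5 is safe
```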

Methods

add_gp_observations(x_new, y_new) Add observations to the GP model.
add_observation(node, action) Add an observation of the given state-action pair.
compute_S_hat() Compute the safely reachable set given the current safe_set.
compute_expanders() Compute the expanders based on the current estimate of S_hat.
plot_S(safe_set[, action]) Plot the set of safe states
target_sample() Compute the next target (s, a) to sample (highest uncertainty within G or S_hat)
update_confidence_interval([jacobian]) Updates the lower and the upper bound of the confidence intervals
update_sets() Update the sets S, S_hat and G using the available observations

add_gp_observations(x_new, y_new)

Add observations to the GP model.

add_observation(node, action)

Add an observation of the given state-action pair.

Observing the pair (s, a) means adding an observation of the altitude at s and an observation of the altitude at f(s, a), the state reached by taking action a in state s.
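A minimal sketch of what observing a pair involves, assuming a deterministic transition on a row-major grid. The helper next_state, the action numbering and the altitude values are hypothetical and only stand in for f(s, a); they are not the library's implementation.

```python
import numpy as np

def next_state(node, action, world_shape):
    """Hypothetical f(s, a): move one cell, stay put at the border."""
    n, m = world_shape
    row, col = divmod(node, m)
    # Assumed action numbering, for illustration only
    moves = {1: (0, 1), 2: (1, 0), 3: (0, -1), 4: (-1, 0)}
    d_row, d_col = moves[action]
    new_row = min(max(row + d_row, 0), n - 1)
    new_col = min(max(col + d_col, 0), m - 1)
    return new_row * m + new_col

world_shape = (3, 4)
altitudes = np.arange(12, dtype=float)  # made-up flattened altitude map
node, action = 5, 1
target = next_state(node, action, world_shape)
# Observing (s, a) yields two altitude readings: one at s, one at f(s, a)
observed = {node: altitudes[node], target: altitudes[target]}
```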

Parameters:

node: int

Node index

action: int

Action index

compute_S_hat()

Compute the safely reachable set given the current safe_set.

compute_expanders()

Compute the expanders based on the current estimate of S_hat.

plot_S(safe_set, action=0)

Plot the set of safe states

Parameters:

safe_set: np.array(dtype=bool)

n_states x (n_actions + 1) array of boolean values that indicates the safe set

action: int

The action for which we want to plot the safe set.
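To illustrate how one column of safe_set maps onto the grid, the sketch below reshapes it to world_shape. It assumes row-major flattening of the states, and the safe states are made up for the example.

```python
import numpy as np

world_shape = (3, 4)
n_states, n_actions = 12, 4
safe_set = np.zeros((n_states, n_actions + 1), dtype=bool)
safe_set[[0, 1, 4, 5], 0] = True   # made-up safe states

action = 0  # column 0 holds the states, columns 1.. hold state-action pairs
# Reshape the selected column onto the grid (row-major order is an assumption)
safe_grid = safe_set[:, action].reshape(world_shape)
```

The resulting boolean array can be passed to an image plotting routine such as matplotlib's imshow.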

target_sample()

Compute the next target (s, a) to sample (highest uncertainty within G or S_hat)

Returns:

node: int

The next node to sample

action: int

The next action to sample
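The selection rule can be sketched as an argmax of the uncertainty over a boolean mask. This is an assumption-laden illustration: the helper pick_target, the per-pair width array and the rule of preferring G whenever it is non-empty are hypothetical and are not confirmed by this page.

```python
import numpy as np

def pick_target(width, G, S_hat):
    """Return the (node, action) with the largest width inside G, else S_hat."""
    # Column 0 describes states only, so restrict to the action columns
    candidates = G[:, 1:] if G[:, 1:].any() else S_hat[:, 1:]
    if not candidates.any():
        raise ValueError("no candidate state-action pairs")
    masked = np.where(candidates, width[:, 1:], -np.inf)
    node, action_idx = np.unravel_index(np.argmax(masked), masked.shape)
    return int(node), int(action_idx) + 1

# Made-up uncertainty widths for 3 states x (1 + 2 actions)
width = np.array([[0.0, 0.3, 0.7],
                  [0.0, 2.0, 0.1],
                  [0.0, 2.9, 0.2]])
S_hat = np.zeros((3, 3), dtype=bool)
S_hat[0, 1] = S_hat[0, 2] = S_hat[1, 1] = True
G = np.zeros((3, 3), dtype=bool)

target_without_expanders = pick_target(width, G, S_hat)
G[0, 2] = True
target_with_expanders = pick_target(width, G, S_hat)
```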

update_confidence_interval(jacobian=False)

Updates the lower and the upper bound of the confidence intervals using the posterior distribution over the gradients of the altitudes

Returns:

l: np.array

lower bound of the safety feature (mean - beta*std)

u: np.array

upper bound of the safety feature (mean + beta*std)
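As a worked example, the bounds are the posterior mean minus and plus beta times the posterior standard deviation. The numbers below are made up, and treating a point as safe when its lower bound is at or above h is an assumption made for illustration.

```python
import numpy as np

beta = 2.0
mean = np.array([1.0, 0.5, -0.2])   # made-up posterior means
std = np.array([0.1, 0.3, 0.05])    # made-up posterior standard deviations

l = mean - beta * std   # lower bound of the safety feature
u = mean + beta * std   # upper bound of the safety feature

h = 0.0
# Assumed safety check: the lower bound must not fall below the threshold
safe = l >= h
```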

update_sets()

Update the sets S, S_hat and G using the available observations