\( \newcommand{\TODO}[1]{{\color{red}TODO: {#1}}} \renewcommand{\vec}[1]{\mathbf{#1}} \newcommand{\state}{\vec{x}} \def\statet{\state_t} \def\statetp{\state_{t-1}} \def\statehist{\state_{1:t-1}} \def\statetn{\state_{t+1}} \def\obs{\meas} \def\obst{\obs_t} \def\act{a} \def\actt{\act_t} \def\acttp{\act_{t-1}} \def\acttn{\act_{t+1}} \def\Obs{\mathcal{O}} \def\ObsEnc{\Phi_o} \def\ObsProb{P_o} \def\ObsFunc{C} \def\ObsFuncFull{\ObsFunc(\statet, \actt) \rightarrow \obst} \def\ObsFuncInv{\ObsFunc^{-1}} \def\ObsFuncInvFull{\ObsFuncInv(\obst, \statetp, \actt) \rightarrow \statet} \def\StateSp{\mathcal{X}} \def\Action{\mathcal{A}} \def\TransP{P_{T}} \def\Trans{T} \def\TransFull{\Trans(\statet, \actt) \rightarrow \statetn} \def\TransObs{T_c} \def\Rew{R} \def\rew{r} \def\rewards{\vec{r}_{1:t}} \def\rewt{\rew_t} \def\rewtp{\rew_{t-1}} \def\rewtn{\rew_{t+1}} \def\RewFull{\Rew(\statet, \actt) \rightarrow \rewtn} \def\TransObsFull{\TransObs(\statet, \obst, \actt, \rewt; \theta_T) \rightarrow \statetn} \def\Value{V} \def\pit{\pi_t} \def\piDef{\pi(\acttn|\statet, \obst, \actt, \rewt; \theta_\pi) \rightarrow \pit(\acttn ; \theta_\pi)} \def\Valuet{\Value_t} \def\ValueDef{\Value(\statet, \obst, \actt, \rewt; \theta_\Value) \rightarrow \Valuet(\theta_\Value)} \def\R{\mathbb{R}} \def\E{\mathbb{E}} \newcommand{\Goal}{\mathcal{G}} \newcommand{\goalRV}{G} \newcommand{\meas}{z} \newcommand{\measurements}{\vec{\meas}_{1:t}} \newcommand{\meast}[1][t]{\meas_{#1}} \newcommand{\param}{\theta} \newcommand{\policy}{\pi} \newcommand{\graph}{G} \newcommand{\vtces}{V} \newcommand{\edges}{E} \newcommand{\st}{\state} \newcommand{\stn}{\st_{t+1}} \newcommand{\stt}{\st_t} \newcommand{\stk}{\st_k} \newcommand{\stj}{\st_j} \newcommand{\sti}{\st_i} \newcommand{\St}{\mathcal{S}} \newcommand{\Act}{\mathcal{A}} \newcommand{\acti}{\act_i} \newcommand{\lpt}{\delta} \newcommand{\trans}{P_T} \newcommand{\Q}{\qValue} \newcommand{\fwcost}{Q} \newcommand{\fw}{\fwcost} \newcommand{\qValue}{Q} 
\newcommand{\prew}{\Upsilon} \newcommand{\epiT}{T} \newcommand{\vma}{\alpha_\Value} \newcommand{\qma}{\alpha_\qValue} \newcommand{\prewma}{\alpha_\prew} \newcommand{\fwma}{\alpha_\fwcost} \newcommand{\maxValueBeam}{\vec{\state}_{\Value:\text{max}(m)}} \newcommand{\nil}{\emptyset} \newcommand{\discount}{\gamma} \newcommand{\minedgecost}{\fwcost_0} \newcommand{\goal}{g} \newcommand{\pos}{x} %\newcommand{\fwargs}[5]{\fw_{#4}^{#5}\left({#3}\middle|{#1}, {#2}\right)} \newcommand{\fwargs}[5]{\fw_{#4}^{#5}\left({#1}, {#2}, {#3}\right)} \newcommand{\Rgoal}{R_{\text{goal}}} \newcommand{\Loo}{Latency-1:\textgreater1} \newcommand{\Loss}{\mathcal{L}} \newcommand{\LossText}[1]{\Loss_{\text{#1}}} \newcommand{\LossDDPG}{\LossText{ddpg}} \newcommand{\LossStep}{\LossText{step}} \newcommand{\LossLo}{\LossText{lo}} \newcommand{\LossUp}{\LossText{up}} \newcommand{\LossTrieq}{\LossText{trieq}} \newcommand{\tgt}{\text{tgt}} \newcommand{\Qstar}{\Q_{*}} \newcommand{\Qtgt}{\Q_{\text{tgt}}} \newcommand{\ytgt}{y_t} % Symbols \newcommand{\ctrl}{\vec{u}} \newcommand{\Ctrl}{\mathcal{U}} \newcommand{\Data}{\mathcal{D}} \newcommand{\stdt}{\dot{\state}} \newcommand{\StDt}{\dot{\StateSp}} \newcommand{\dynSt}{f} \newcommand{\dynCt}{g} \newcommand{\bDynSt}{\bar{\dynSt}} \newcommand{\bDynCt}{\bar{\dynCt}} \newcommand{\dynAff}{F} \newcommand{\bDynAff}{\bar{\dynAff}} \newcommand{\ctrlaff}{\underline{\mathbf{\ctrl}}} \newcommand{\smallbmat}[1]{\left[\begin{smallmatrix}#1\end{smallmatrix}\right]} \newcommand{\Knl}{K} \newcommand{\knl}{\kappa} \newcommand{\bKx}{k_\state} \newcommand{\bKF}{k_\dynAff} \newcommand{\bKFu}{k_{\dynAff\ctrl}} \newcommand{\bKFx}{k_{\dynAff\state}} \newcommand{\bKFux}{k_{\dynAff\ctrl\state}} \newcommand{\covf}{\text{cov}} \newcommand{\dt}{\delta t} \newcommand{\dSt}{\stdt} \newcommand{\N}{\mathcal{N}} \newcommand{\StDat}{\mathbf{X}} \newcommand{\StDtDat}{\dot{\mathbf{X}}} \newcommand{\CtDat}{\underline{\boldsymbol{\mathcal{U}}}_{1:k}} \newcommand{\mat}[1]{{#1}} 
\newcommand{\Y}{\mat{Y}} \newcommand{\bY}{\bar{\Y}} \newcommand{\W}{\mat{W}} \newcommand{\V}{\mat{V}} \newcommand{\mH}{\mat{H}} \newcommand{\KH}{\Knl^\mH} \newcommand{\kH}{\knl^\mH} \newcommand{\GP}{\mathcal{GP}} \newcommand{\kDA}{\knl^\dynAff} \newcommand{\KDA}{\Knl^\dynAff} %\newcommand{\M}{\mathcal{M}} \newcommand{\kh}{\knl^{\dynAff\ctrlaff}} \newcommand{\KDat}{\mathfrak{K}} \newcommand{\kDat}{\bm{\knl}} \newcommand{\KhDat}{\KDat^{\dynAff\ctrlaff}} \newcommand{\khDADat}{\kDat^{\dynAff\ctrlaff\dynAff}} \newcommand{\khDA}{\knl^{\dynAff\ctrlaff\dynAff}} \newcommand{\dynAffDat}{\mathbf{\dynAff}} \newcommand{\grad}{\nabla} \newcommand{\Lie}{\mathcal{L}} \newcommand{\tdf}{\tilde{f}} \newcommand{\tdg}{\tilde{g}} \newcommand{\barf}{\bar{f}} \newcommand{\barg}{\bar{g}} \newcommand{\erf}{\textit{erf}} \newcommand{\etal}{et~al.} \newcommand{\CBC}{\mbox{CBC}} \newcommand{\CBCtwo}{\CBC^{(2)}} \newcommand{\CBCr}{\CBC^{(r)}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\tdbff}{\bff^*_k} \newcommand{\mDynAffs}{\bfM_k} \newcommand{\bfBs}{\bfB_k} \DeclareMathOperator{\vect}{\textit{vec}} \DeclareMathOperator{\diag}{\mathbf{diag}} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\Cov}{\mathbf{Cov}} \DeclareMathOperator{\Var}{Var} % Calligraphic fonts \newcommand{\calA}{{\cal A}} \newcommand{\calB}{{\cal B}} \newcommand{\calC}{{\cal C}} \newcommand{\calD}{{\cal D}} \newcommand{\calE}{{\cal E}} \newcommand{\calF}{{\cal F}} \newcommand{\calG}{{\cal G}} \newcommand{\calH}{{\cal H}} \newcommand{\calI}{{\cal I}} \newcommand{\calJ}{{\cal J}} \newcommand{\calK}{{\cal K}} \newcommand{\calL}{{\cal L}} \newcommand{\calM}{{\cal M}} \newcommand{\calN}{{\cal N}} \newcommand{\calO}{{\cal O}} \newcommand{\calP}{{\cal P}} \newcommand{\calQ}{{\cal Q}} \newcommand{\calR}{{\cal R}} \newcommand{\calS}{{\cal S}} \newcommand{\calT}{{\cal T}} \newcommand{\calU}{{\cal U}} \newcommand{\calV}{{\cal V}} \newcommand{\calW}{{\cal W}} \newcommand{\calX}{{\cal X}} \newcommand{\calY}{{\cal Y}} 
\newcommand{\calZ}{{\cal Z}} % Sets: \newcommand{\setA}{\textsf{A}} \newcommand{\setB}{\textsf{B}} \newcommand{\setC}{\textsf{C}} \newcommand{\setD}{\textsf{D}} \newcommand{\setE}{\textsf{E}} \newcommand{\setF}{\textsf{F}} \newcommand{\setG}{\textsf{G}} \newcommand{\setH}{\textsf{H}} \newcommand{\setI}{\textsf{I}} \newcommand{\setJ}{\textsf{J}} \newcommand{\setK}{\textsf{K}} \newcommand{\setL}{\textsf{L}} \newcommand{\setM}{\textsf{M}} \newcommand{\setN}{\textsf{N}} \newcommand{\setO}{\textsf{O}} \newcommand{\setP}{\textsf{P}} \newcommand{\setQ}{\textsf{Q}} \newcommand{\setR}{\textsf{R}} \newcommand{\setS}{\textsf{S}} \newcommand{\setT}{\textsf{T}} \newcommand{\setU}{\textsf{U}} \newcommand{\setV}{\textsf{V}} \newcommand{\setW}{\textsf{W}} \newcommand{\setX}{\textsf{X}} \newcommand{\setY}{\textsf{Y}} \newcommand{\setZ}{\textsf{Z}} % Vectors \newcommand{\bfa}{\mathbf{a}} \newcommand{\bfb}{\mathbf{b}} \newcommand{\bfc}{\mathbf{c}} \newcommand{\bfd}{\mathbf{d}} \newcommand{\bfe}{\mathbf{e}} \newcommand{\bff}{\mathbf{f}} \newcommand{\bfg}{\mathbf{g}} \newcommand{\bfh}{\mathbf{h}} \newcommand{\bfi}{\mathbf{i}} \newcommand{\bfj}{\mathbf{j}} \newcommand{\bfk}{\mathbf{k}} \newcommand{\bfl}{\mathbf{l}} \newcommand{\bfm}{\mathbf{m}} \newcommand{\bfn}{\mathbf{n}} \newcommand{\bfo}{\mathbf{o}} \newcommand{\bfp}{\mathbf{p}} \newcommand{\bfq}{\mathbf{q}} \newcommand{\bfr}{\mathbf{r}} \newcommand{\bfs}{\mathbf{s}} \newcommand{\bft}{\mathbf{t}} \newcommand{\bfu}{\mathbf{u}} \newcommand{\bfv}{\mathbf{v}} \newcommand{\bfw}{\mathbf{w}} \newcommand{\bfx}{\mathbf{x}} \newcommand{\bfy}{\mathbf{y}} \newcommand{\bfz}{\mathbf{z}} \newcommand{\bfalpha}{\boldsymbol{\alpha}} \newcommand{\bfbeta}{\boldsymbol{\beta}} \newcommand{\bfgamma}{\boldsymbol{\gamma}} \newcommand{\bfdelta}{\boldsymbol{\delta}} \newcommand{\bfepsilon}{\boldsymbol{\epsilon}} \newcommand{\bfzeta}{\boldsymbol{\zeta}} \newcommand{\bfeta}{\boldsymbol{\eta}} \newcommand{\bftheta}{\boldsymbol{\theta}} 
\newcommand{\bfiota}{\boldsymbol{\iota}} \newcommand{\bfkappa}{\boldsymbol{\kappa}} \newcommand{\bflambda}{\boldsymbol{\lambda}} \newcommand{\bfmu}{\boldsymbol{\mu}} \newcommand{\bfnu}{\boldsymbol{\nu}} \newcommand{\bfomicron}{\boldsymbol{\omicron}} \newcommand{\bfpi}{\boldsymbol{\pi}} \newcommand{\bfrho}{\boldsymbol{\rho}} \newcommand{\bfsigma}{\boldsymbol{\sigma}} \newcommand{\bftau}{\boldsymbol{\tau}} \newcommand{\bfupsilon}{\boldsymbol{\upsilon}} \newcommand{\bfphi}{\boldsymbol{\phi}} \newcommand{\bfchi}{\boldsymbol{\chi}} \newcommand{\bfpsi}{\boldsymbol{\psi}} \newcommand{\bfomega}{\boldsymbol{\omega}} \newcommand{\bfxi}{\boldsymbol{\xi}} \newcommand{\bfell}{\boldsymbol{\ell}} % Matrices \newcommand{\bfA}{\mathbf{A}} \newcommand{\bfB}{\mathbf{B}} \newcommand{\bfC}{\mathbf{C}} \newcommand{\bfD}{\mathbf{D}} \newcommand{\bfE}{\mathbf{E}} \newcommand{\bfF}{\mathbf{F}} \newcommand{\bfG}{\mathbf{G}} \newcommand{\bfH}{\mathbf{H}} \newcommand{\bfI}{\mathbf{I}} \newcommand{\bfJ}{\mathbf{J}} \newcommand{\bfK}{\mathbf{K}} \newcommand{\bfL}{\mathbf{L}} \newcommand{\bfM}{\mathbf{M}} \newcommand{\bfN}{\mathbf{N}} \newcommand{\bfO}{\mathbf{O}} \newcommand{\bfP}{\mathbf{P}} \newcommand{\bfQ}{\mathbf{Q}} \newcommand{\bfR}{\mathbf{R}} \newcommand{\bfS}{\mathbf{S}} \newcommand{\bfT}{\mathbf{T}} \newcommand{\bfU}{\mathbf{U}} \newcommand{\bfV}{\mathbf{V}} \newcommand{\bfW}{\mathbf{W}} \newcommand{\bfX}{\mathbf{X}} \newcommand{\bfY}{\mathbf{Y}} \newcommand{\bfZ}{\mathbf{Z}} \newcommand{\bfGamma}{\boldsymbol{\Gamma}} \newcommand{\bfDelta}{\boldsymbol{\Delta}} \newcommand{\bfTheta}{\boldsymbol{\Theta}} \newcommand{\bfLambda}{\boldsymbol{\Lambda}} \newcommand{\bfPi}{\boldsymbol{\Pi}} \newcommand{\bfSigma}{\boldsymbol{\Sigma}} \newcommand{\bfUpsilon}{\boldsymbol{\Upsilon}} \newcommand{\bfPhi}{\boldsymbol{\Phi}} \newcommand{\bfPsi}{\boldsymbol{\Psi}} \newcommand{\bfOmega}{\boldsymbol{\Omega}} % Blackboard Bold: \newcommand{\bbA}{\mathbb{A}} \newcommand{\bbB}{\mathbb{B}} 
\newcommand{\bbC}{\mathbb{C}} \newcommand{\bbD}{\mathbb{D}} \newcommand{\bbE}{\mathbb{E}} \newcommand{\bbF}{\mathbb{F}} \newcommand{\bbG}{\mathbb{G}} \newcommand{\bbH}{\mathbb{H}} \newcommand{\bbI}{\mathbb{I}} \newcommand{\bbJ}{\mathbb{J}} \newcommand{\bbK}{\mathbb{K}} \newcommand{\bbL}{\mathbb{L}} \newcommand{\bbM}{\mathbb{M}} \newcommand{\bbN}{\mathbb{N}} \newcommand{\bbO}{\mathbb{O}} \newcommand{\bbP}{\mathbb{P}} \newcommand{\bbQ}{\mathbb{Q}} \newcommand{\bbR}{\mathbb{R}} \newcommand{\bbS}{\mathbb{S}} \newcommand{\bbT}{\mathbb{T}} \newcommand{\bbU}{\mathbb{U}} \newcommand{\bbV}{\mathbb{V}} \newcommand{\bbW}{\mathbb{W}} \newcommand{\bbX}{\mathbb{X}} \newcommand{\bbY}{\mathbb{Y}} \newcommand{\bbZ}{\mathbb{Z}} \newcommand{\CBCr}{\mbox{CBC}^{(r)}} \) \( \newenvironment{proof}{\paragraph{Proof:}}{\hfill$\square$} %\newtheorem{theorem}{Theorem} %\theoremstyle{remark} %\newtheorem{lemma}{Lemma} %\newtheorem{remark}{Remark} %\theoremstyle{definition} \newtheorem{defn}{Definition} %\theoremstyle{definition} \newtheorem{exmp}{Example} \newtheorem{conj}{Conjecture} %\newtheorem{corollary}{Corollary} \newtheorem{Proposition}{Proposition} \newtheorem{ansatz}{Assumption} \newtheorem{problem}{Problem} \newcommand{\oprocendsymbol}{\hbox{$\bullet$}} \newcommand{\oprocend}{\relax\ifmmode\else\unskip\hfill\fi\oprocendsymbol} \def\eqoprocend{\tag*{$\bullet$}} \newcommand{\blue}[1]{\color{blue}{#1}} %% math functions \newcommand{\modulo}{\text{mod}} %% symbols \newcommand{\real}{\mathbb{R}} \newcommand{\integers}{\mathbb{N}} \newcommand{\complex}{\mathbb{C}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\softmax}{softmax} \DeclareMathOperator*{\Tr}{Tr} \DeclareMathOperator*{\RE}{Re} \DeclareMathOperator*{\IM}{Im} \newcommand{\trc}{\mathbf{trc}} \newcommand{\Cov}{\mathbf{Cov}} \newcommand{\floor}[1]{\lfloor #1 \rfloor} \newcommand{\ceil}[1]{\lceil #1 \rceil} 
\newcommand{\scaleMathLine}[2][1]{\resizebox{#1\linewidth}{!}{$\displaystyle{#2}$}} \)

Towards safe robots that learn

Vikas Dhiman
Postdoc at UCSD

Success of Reinforcement Learning

We want autonomous cars

Google trends for 'Autonomous cars'

Why?

Big Data is not enough.

Data brings uncertainty.

How to handle uncertainty safely?

My Background

Today's focus

Given:
  • Map and localization (Full observability)
  • Desired trajectory as a plan
  • Unsafe regions
Unknown (to learn from samples):
  • Robot system dynamics
Want:
  • Follow trajectory avoiding unsafe actions

Problem formulation

  • \begin{align} \label{eq:system_dyanmics} \dot{\bfx} = f(\bfx) + g(\bfx)\bfu = \begin{bmatrix} f(\bfx) & g(\bfx)\end{bmatrix} \begin{bmatrix}1\\\bfu\end{bmatrix} =: F(\bfx) \ctrlaff \end{align}
  • \[ \vect(F(\bfx)) \sim \GP(\vect(\bfM_0(.)), \bfK_0(.,.)) \]
  • \begin{align} \min_{\bfu_k \in \mathcal{U}}& \text{ Task cost function } \\ \qquad\text{s.t.}&~~\bbP\bigl( \text{ Safety constraint } \mid \bfx_k,\bfu_k \bigr) \ge \tilde{p}_k, \end{align}
    \begin{align} \min_{\bfu_k \in \mathcal{U}}& \|\bfu_k - \cssId{highlight-border-red-1}{\class{fragment}{\pi_\epsilon(\bfx_k)}} \|_Q \\ \qquad\text{s.t.}&~~\bbP\bigl( \cssId{highlight-border-red-1}{\class{fragment}{h(\bfx) > \zeta_h > 0}} \mid \bfx_k,\bfu_k \bigr) \ge \tilde{p}_k, \end{align}

Approach

  • Estimate \(F(\bfx)\) with uncertainty.
  • Propagate uncertainty to the Safety condition.
  • Extension to continuous time using Lipschitz continuity assumptions.
  • Extension to higher relative degree systems.

Matrix Variate Gaussian Processes

\[ \vect(F(\bfx)) \sim \GP(\vect(\bfM_0(.)), \bfK_0(.,.)) \]
Option 1: Learn each matrix element independently \[ \bfK_0(\bfx, \bfx')_{i, j} = \kappa(\bfx, \bfx') \] No correlation across dimensions
Option 2: Alvarez et al (FTML 2012): \[ \bfK_0(\bfx, \bfx') = \kappa(\bfx, \bfx') \boldsymbol{\Sigma} \] \(\boldsymbol{\Sigma} \in \R^{n(1+m) \times (1+m)n}\) has too many parameters to learn
Option 3: Sun et al (AISTATS 2017)
\[ F \sim \mathcal{MVG}(\bfM, \bfA, \bfB) \Leftrightarrow \vect(F) \sim \calN(\vect(M), \bfB \otimes \bfA) \]
\[ \bfK_0(\bfx, \bfx') = \bfB_0(\bfx, \bfx') \otimes \bfA \]

Factorization assumption: \[ \vect(F(\bfx)) \sim \GP(\vect(\bfM_0(.)), \bfB_0(.,.) \otimes \bfA) \]
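
The Kronecker structure above can be checked numerically. The following is a minimal sketch, assuming column-stacking \(\vect(\cdot)\) and arbitrary illustrative matrices `A`, `B`, `M` (none of these values come from the talk): a draw \(F = M + L_A Z L_B^\top\) with i.i.d. standard normal \(Z\) satisfies \(\vect(F - M) = (L_B \otimes L_A)\vect(Z)\), so its covariance is \(\bfB \otimes \bfA\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 1                      # illustrative state and control dimensions

def random_spd(d):
    """Random symmetric positive-definite matrix (illustration only)."""
    R = rng.normal(size=(d, d))
    return R @ R.T + d * np.eye(d)

def vec(X):
    """Column-stacking vectorization."""
    return X.reshape(-1, order="F")

A = random_spd(n)                # row covariance, n x n
B = random_spd(1 + m)            # column covariance, (1+m) x (1+m)
M = rng.normal(size=(n, 1 + m))  # mean matrix

# Draw one matrix-variate Gaussian sample F = M + L_A Z L_B^T.
L_A, L_B = np.linalg.cholesky(A), np.linalg.cholesky(B)
Z = rng.normal(size=(n, 1 + m))
F = M + L_A @ Z @ L_B.T

# vec(L_A Z L_B^T) = (L_B kron L_A) vec(Z), hence Cov(vec(F)) = B kron A.
lhs = vec(F - M)
rhs = np.kron(L_B, L_A) @ vec(Z)
cov_from_factors = np.kron(L_B, L_A) @ np.kron(L_B, L_A).T
```

The practical benefit of the factorization is the parameter count: \(\bfA\) and \(\bfB\) together have far fewer free entries than a dense covariance over all of \(\vect(F)\).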

Matrix variate Gaussian Process

\( \newcommand{\prl}[1]{\left(#1\right)} \newcommand{\brl}[1]{\left[#1\right]} \newcommand{\crl}[1]{\left\{#1\right\}} \) \begin{equation} \begin{aligned} \vect(F(\bfx)) &\sim \mathcal{GP}(\vect(\bfM_0(\bfx)), \bfB_0(\bfx,\bfx') \otimes \bfA) %F(\bfx)\underline{\bfu} &\sim \mathcal{GP}(\bfM_0(\bfx)\underline{\bfu}, \underline{\bfu}^\top \bfB_0(\bfx,\bfx') \underline{\bfu}' \otimes \bfA) \end{aligned} \end{equation}
Given data \(\StDat_{1:k} := [\bfx(t_1), \dots, \bfx(t_k)]\), \(\StDtDat_{1:k}=[\dot{\bfx}(t_1), \dots, \dot{\bfx}(t_k)] \), and \( \underline{\boldsymbol{\mathcal{U}}}_{1:k}:= \diag(\ctrlaff_1, \dots, \ctrlaff_k) \).
\begin{equation*} \begin{aligned} \bfM_k(\bfx_*) &:= \bfM_0(\bfx_*) + \prl{ \dot{\bfX}_{1:k} - \boldsymbol{\mathcal{M}}_{1:k}\underline{\boldsymbol{\mathcal{U}}}_{1:k}} \prl{\underline{\boldsymbol{\mathcal{U}}}_{1:k}^\top\bfB_0(\bfX_{1:k},\bfX_{1:k})\underline{\boldsymbol{\mathcal{U}}}_{1:k}}^{-1}\underline{\boldsymbol{\mathcal{U}}}_{1:k}^\top\bfB_0(\bfX_{1:k},\bfx_*)\\ \bfB_k(\bfx_*,\bfx_*') &:= \bfB_0(\bfx_*,\bfx_*') - \bfB_0(\bfx_*,\bfX_{1:k})\underline{\boldsymbol{\mathcal{U}}}_{1:k}\prl{\underline{\boldsymbol{\mathcal{U}}}_{1:k}^\top\bfB_0(\bfX_{1:k},\bfX_{1:k})\underline{\boldsymbol{\mathcal{U}}}_{1:k}}^{-1}\underline{\boldsymbol{\mathcal{U}}}_{1:k}^\top\bfB_0(\bfX_{1:k},\bfx_*') \label{eq:mvg-posterior} \end{aligned} \end{equation*}
Inference on MVGP: \begin{align} \vect(F_k(\bfx_*)) &\sim \mathcal{GP}(\vect(\bfM_k(\bfx_*)), \; \bfB_k(\bfx_*,\bfx_*') \otimes \bfA). \\ F_k(\bfx_*)\underline{\bfu}_* &\sim \mathcal{GP}(\bfM_k(\bfx_*)\underline{\bfu}_*, \; \underline{\bfu}_*^\top\bfB_k(\bfx_*,\bfx_*')\underline{\bfu}_*\otimes\bfA). \end{align}
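
A minimal sketch of this posterior computation follows. It makes several assumptions that are not in the talk: a zero prior mean \(\bfM_0\), the choice \(\bfB_0(\bfx,\bfx') = \kappa(\bfx,\bfx')\,\bfI\) with an RBF kernel \(\kappa\), a small `jitter` term for numerical stability, and made-up training data. It uses the standard GP conditioning form, in which the data-dependent term is subtracted from the prior covariance. The function name `mvgp_posterior` is hypothetical.

```python
import numpy as np

def rbf(x, y, ell=1.0):
    """RBF kernel kappa(x, y); the kernel choice is an assumption."""
    d = np.asarray(x) - np.asarray(y)
    return float(np.exp(-0.5 * (d @ d) / ell**2))

def mvgp_posterior(X, Xdot, U, x_star, ell=1.0, jitter=1e-8):
    """Posterior M_k(x_*) and B_k(x_*, x_*) with zero prior mean.

    X: (k, n) states, Xdot: (k, n) state derivatives, U: (k, m) controls.
    Assumes B_0(x, x') = kappa(x, x') * I_{1+m}.
    """
    k = X.shape[0]
    Ubar = np.hstack([np.ones((k, 1)), U])            # rows are [1, u_i]
    Kxx = np.array([[rbf(a, b, ell) for b in X] for a in X])
    # G = U^T B_0(X, X) U has entries kappa(x_i, x_j) * <ubar_i, ubar_j>.
    G = Kxx * (Ubar @ Ubar.T) + jitter * np.eye(k)
    kxs = np.array([rbf(a, x_star, ell) for a in X])
    UtB = kxs[:, None] * Ubar                         # U^T B_0(X, x_*), (k, 1+m)
    Mk = Xdot.T @ np.linalg.solve(G, UtB)             # (n, 1+m)
    Bk = rbf(x_star, x_star, ell) * np.eye(Ubar.shape[1]) \
        - UtB.T @ np.linalg.solve(G, UtB)             # (1+m, 1+m)
    return Mk, Bk

# Made-up data: four well-separated states, scalar control.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
U = np.array([[0.5], [-1.0], [1.5], [0.0]])
Xdot = np.array([[0.1, 1.0], [0.3, -0.4], [-0.2, 0.8], [0.0, 0.5]])

# Query at a training state with its own control input.
Mk, Bk = mvgp_posterior(X, Xdot, U, X[1])
ubar = np.array([1.0, U[1, 0]])
pred = Mk @ ubar            # posterior mean of F(x) ubar
pred_var = ubar @ Bk @ ubar # posterior variance factor along ubar
```

At a training pair \((\bfx_i, \ctrlaff_i)\) the predicted \(F\ctrlaff\) reproduces the observed \(\dot{\bfx}_i\) and the variance factor along \(\ctrlaff_i\) collapses to roughly the jitter level, which is the behaviour expected from noise-free conditioning.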

Approach

  • Estimate \(F(\bfx)\) with Matrix-Variate Gaussian Process
  • Propagate uncertainty to the Safety condition
  • Extension to continuous time using Lipschitz continuity assumptions.
  • Extension to higher relative degree systems.

Control Barrier Functions

  • For differentiable \( h(\bfx) \),
    safe set is \( \calC = \{ \bfx \in \calX : h(\bfx) > 0 \} \)
  • Assume \( \grad_\bfx h(\bfx) \ne 0 \quad \forall \bfx \in \partial \calC \)
  • Assume system starts in safe state \( \bfx(0) \in \calC \)
  • Ames et al (ECC 2019): \begin{multline} \text{ System stays safe } \Leftrightarrow~~\exists~\bfu = \pi(\bfx)~~\text{s.t.}\\ \mbox{CBC}(\bfx,\bfu) := \Lie_f h(\bfx) + \Lie_g h(\bfx)\bfu + \alpha(h(\bfx)) \ge 0 \;~ \forall \bfx \in \calX. \end{multline} where \( \alpha(y) \) is some extended class \( \calK_\infty \) function

Uncertainty propagation to CBC

  • \begin{align} \mbox{CBC}(\bfx, \bfu) &:= \Lie_{f}h(\bfx) + \Lie_{g}h(\bfx)\bfu + \alpha(h(\bfx)) \end{align}
  • \[ \mbox{CBC}(\bfx, \bfu)= \grad_\bfx h(\bfx)F_k(\bfx)\ctrlaff + \alpha(h(\bfx)) \]
  • Recall: \begin{equation} F_k(\bfx_*)\underline{\bfu}_* \sim \mathcal{GP}(\bfM_k(\bfx_*)\underline{\bfu}_*, \underline{\bfu}_*^\top\bfB_k(\bfx_*,\bfx_*')\underline{\bfu}_*\otimes\bfA). \end{equation}
  • Lemma: \[ \mbox{CBC}(\bfx, \bfu) \sim \GP(\E[\mbox{CBC}], \Var(\mbox{CBC})) \] \begin{align} \label{eq:parametofpi5543} \E[\mbox{CBC}_k](\bfx, \bfu) &= \nabla_\bfx h(\bfx)^\top \bfM_k(\bfx)\underline{\bfu} + \alpha(h(\bfx)),\\ \Var[\mbox{CBC}_k](\bfx, \bfx'; \bfu) &= \underline{\bfu}^\top\bfB_k(\bfx,\bfx')\underline{\bfu} \nabla_\bfx h(\bfx)^{\top}\bfA\nabla_\bfx h(\bfx') \end{align} Note: the mean is affine in \( \bfu \) and the variance is quadratic in \( \bfu \).
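
The lemma can be illustrated with a small sketch that evaluates the mean and the pointwise variance (\(\bfx = \bfx'\)) of the CBC for given posterior quantities. The function name `cbc_mean_var` and all numerical values are hypothetical; `alpha_h` stands for the already evaluated scalar \(\alpha(h(\bfx))\).

```python
import numpy as np

def cbc_mean_var(grad_h, Mk, Bk, A, alpha_h, u):
    """Mean and variance of CBC(x, u) at a single state x.

    grad_h: (n,) gradient of h at x
    Mk:     (n, 1+m) posterior mean of F at x
    Bk:     (1+m, 1+m) posterior column covariance at (x, x)
    A:      (n, n) row covariance
    """
    ubar = np.concatenate(([1.0], np.atleast_1d(u)))   # [1, u]
    mean = grad_h @ Mk @ ubar + alpha_h                # affine in u
    var = (ubar @ Bk @ ubar) * (grad_h @ A @ grad_h)   # quadratic in u
    return float(mean), float(var)

# Illustrative numbers only (n = 2 states, m = 1 control).
grad_h = np.array([1.0, 2.0])
Mk = np.array([[0.0, 0.0], [-1.0, 0.5]])
Bk = np.array([[0.3, 0.1], [0.1, 0.2]])
A = np.eye(2)
alpha_h = 0.5

mean_1, var_1 = cbc_mean_var(grad_h, Mk, Bk, A, alpha_h, 1.0)
```

Because the mean is affine and the standard deviation is the norm of a linear function of \(\underline{\bfu}\), a constraint of the form "mean minus a multiple of the standard deviation is nonnegative" is a second-order cone constraint in \(\bfu\).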

Deterministic condition for controller

  • \begin{align} \min_{\bfu_k \in \mathcal{U}}& \text{ Task cost function } \\ \qquad\text{s.t.}&~~\bbP\bigl( \text{ Safety constraint } \mid \bfx_k,\bfu_k \bigr) \ge \tilde{p}_k, \end{align}
    \begin{align} \min_{\bfu_k \in \mathcal{U}}& \|\bfu_k - \pi_\epsilon(\bfx_k) \|_Q \\ \qquad\text{s.t.}&~~\bbP\bigl( \style{color:red}{\mbox{CBC}(\bfx_k, \bfu_k) > \zeta > 0} \mid \bfx_k,\bfu_k \bigr) \ge \tilde{p}_k, \end{align}
  • \[ \newcommand{\CBC}{\mbox{CBC}} \bbP\bigl(\mbox{CBC}(\bfx_k, \bfu_k) > \zeta \mid \bfx_k,\bfu_k \bigr) \ge \tilde{p}_k \\ \Leftrightarrow \frac{1}{2}-\frac{1}{2} \erf\left( \frac{\zeta - \E[\CBC] }{\sqrt{2\Var(\CBC)}} \right) \ge \tilde{p}_k \] where \( \erf(y) \) is the error function.
  • Safe controller (an SOCP): \begin{align} \min_{\bfu_k \in \mathcal{U}}& \|\bfu_k - \pi_\epsilon(\bfx_k) \|_Q \\ \qquad\text{s.t.}\qquad& \cssId{highlight-current-red-1}{\class{fragment}{ \E[\CBC] - \zeta \ge \sqrt{2\Var(\CBC)(\erf^{-1}(1-2\tilde{p}_k))^2} }} \end{align}
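
The equivalence between the probabilistic and the deterministic constraint can be sanity-checked with the standard library alone. This is a sketch under the assumption \(\tilde{p}_k \ge 1/2\); it uses the identity \(\sqrt{2}\,|\erf^{-1}(1-2p)| = |\Phi^{-1}(1-p)|\), where \(\Phi\) is the standard normal CDF. The function names and numbers are illustrative.

```python
import math
from statistics import NormalDist

def prob_cbc_exceeds(mean, var, zeta):
    """P(CBC > zeta) for a Gaussian CBC with the given mean and variance."""
    return 0.5 - 0.5 * math.erf((zeta - mean) / math.sqrt(2.0 * var))

def gaussian_margin(mean, var, zeta, p):
    """Left side minus right side of the deterministic constraint.

    A nonnegative value means the chance constraint holds (for p >= 1/2).
    """
    c = abs(NormalDist().inv_cdf(1.0 - p))   # = sqrt(2) * |erfinv(1 - 2p)|
    return (mean - zeta) - c * math.sqrt(var)

# Illustrative values: place the mean exactly on the constraint boundary.
p, zeta, var = 0.95, 0.1, 0.04
c = abs(NormalDist().inv_cdf(1.0 - p))
mean_on_boundary = zeta + c * math.sqrt(var)

margin = gaussian_margin(mean_on_boundary, var, zeta, p)
prob = prob_cbc_exceeds(mean_on_boundary, var, zeta)
```

On the boundary the margin is zero and the probability equals \(\tilde{p}_k\); increasing the mean makes both strictly larger.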

Approach

  • Estimate \(F(\bfx)\) with Matrix-Variate Gaussian Process
  • Propagate uncertainty to the Control Barrier condition.
  • Extension to continuous time using Lipschitz continuity assumptions.
  • Extension to higher relative degree systems.

Safety beyond triggering times

  • So far: \begin{align} \min_{\bfu_k \in \mathcal{U}}& \|\bfu_k - \pi_\epsilon(\bfx_k) \|_Q \\ \qquad\text{s.t.}&~~ \bbP\bigl( \mbox{CBC}(\style{color:red}{\bfx_k}, \bfu_k) > \style{color:red}{\zeta} \mid \bfx_k,\bfu_k \bigr) \ge \style{color:red}{\tilde{p}_k}, \end{align}
  • Next: \begin{align} \min_{\bfu_k \in \mathcal{U}}& \|\bfu_k - \pi_\epsilon(\bfx_k) \|_Q \\ \qquad\text{s.t.}&~~ \bbP\bigl( \mbox{CBC}(\style{color:red}{\bfx(t)}, \bfu_k) > \style{color:red}{0} \mid \bfx_k,\bfu_k \bigr) \ge \style{color:red}{p_k}, \qquad \style{color:red}{\forall t \in [t_k, \tau_k)} \end{align}

Safety beyond triggering times

  • Assume Lipschitz continuity of dynamics: \begin{align} \textstyle \label{eq:smoth23} \bbP\left( \sup_{s \in [0, \tau_k)}\|F(\bfx(t_k+s))\ctrlaff_k -F(\bfx(t_k))\ctrlaff_k\| \le L_k \|\bfx(t_k+s)-\bfx_k\| \right) \ge q_k:=1-e^{-b_kL_k}. \end{align}
  • Assume Lipschitz continuity of \( \alpha(h(\bfx)) \): \begin{align} \label{htym6!7uytf} |\alpha \circ h(\bfx(t_k+s))-\alpha \circ h(\bfx_k)| \le L_{\alpha \circ h} \|\bfx(t_k+s)-\bfx_k\|. \end{align}
Theorem: \[ \bbP\bigl( \mbox{CBC}(\bfx_k, \bfu_k) > \zeta \mid \bfx_k,\bfu_k \bigr) \ge \tilde{p}_k \quad\Rightarrow\quad \bbP\bigl( \mbox{CBC}(\bfx(t), \bfu_k) > 0 \mid \bfx_k,\bfu_k \bigr) \ge p_k, \; \forall t \in [t_k, \tau_k) \] holds with \( p_k = \tilde{p}_k q_k \) and \( \tau_k \le \frac{1}{L_k}\ln\left(1+\frac{L_k\zeta}{(\chi_kL_k+L_{\alpha \circ h})\|\dot{\bfx}_k\|}\right) \)
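
The two quantities in the theorem are simple to evaluate once the constants are known. The sketch below takes them as plain inputs; the function names and the sample numbers are hypothetical, and \(\chi_k\) is treated as a given constant because the slide does not define it.

```python
import math

def triggering_bound(L_k, zeta, chi_k, L_alpha_h, xdot_norm):
    """Upper bound on the inter-triggering time tau_k from the theorem."""
    return (1.0 / L_k) * math.log(
        1.0 + L_k * zeta / ((chi_k * L_k + L_alpha_h) * xdot_norm)
    )

def continuous_time_prob(p_tilde, b_k, L_k):
    """p_k = p_tilde * q_k with q_k = 1 - exp(-b_k * L_k)."""
    q_k = 1.0 - math.exp(-b_k * L_k)
    return p_tilde * q_k

# Illustrative constants only.
tau_max = triggering_bound(L_k=1.0, zeta=1.0, chi_k=1.0,
                           L_alpha_h=1.0, xdot_norm=0.5)
p_k = continuous_time_prob(p_tilde=0.95, b_k=2.0, L_k=1.5)
```

The bound shows the trade-off in the margin \(\zeta\): a larger margin at the triggering time buys a longer interval over which the constraint is guaranteed, while a faster-moving state (larger \(\|\dot{\bfx}_k\|\)) shortens it.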

Approach

  • Estimate \(F(\bfx)\) with Matrix-Variate Gaussian Process
  • Propagate uncertainty to the Control Barrier condition.
  • Extension to continuous time using Lipschitz continuity assumptions.
  • Extension to higher relative degree systems.

Higher relative degree CBFs

  • \begin{align} \begin{bmatrix} \dot{\theta} \\ \dot{\omega} \end{bmatrix} = \underbrace{\begin{bmatrix} \omega \\ -\frac{g}{l} \sin(\theta) \end{bmatrix}}_{f(\bfx)} + \underbrace{\begin{bmatrix} 0 \\ \frac{1}{ml} \end{bmatrix}}_{g(\bfx)} u \end{align}
  • \begin{align} h\left(\begin{bmatrix} \theta \\ \omega \end{bmatrix} \right) = \cos(\Delta_{col}) - \cos(\theta - \theta_c) \end{align}
  • Note that \( \Lie_g h(\bfx) = \grad h(\bfx) g(\bfx) = 0 \)
  • Thus \( \CBC(\bfx, \bfu) \) is independent of \( u \), so it cannot be used to constrain the control input.
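
The vanishing Lie derivative can be verified directly for the pendulum. This sketch hard-codes the gradients of \(h\) and of \(\Lie_f h\) derived by hand from the expressions above; the parameter values (`mass`, `length`, `grav`) and the function name are illustrative assumptions.

```python
import math

def pendulum_lie_terms(theta, omega, theta_c, mass=1.0, length=1.0, grav=9.81):
    """Return (L_g h, L_f h, L_g L_f h) for the pendulum barrier function."""
    s = math.sin(theta - theta_c)
    c = math.cos(theta - theta_c)
    grad_h = (s, 0.0)                       # d/d[theta, omega] of h
    f = (omega, -(grav / length) * math.sin(theta))
    g = (0.0, 1.0 / (mass * length))
    Lg_h = grad_h[0] * g[0] + grad_h[1] * g[1]
    Lf_h = grad_h[0] * f[0] + grad_h[1] * f[1]          # = sin(theta - theta_c) * omega
    grad_Lf_h = (c * omega, s)              # gradient of L_f h
    LgLf_h = grad_Lf_h[0] * g[0] + grad_Lf_h[1] * g[1]  # = sin(theta - theta_c) / (m l)
    return Lg_h, Lf_h, LgLf_h

Lg_h, Lf_h, LgLf_h = pendulum_lie_terms(theta=0.5, omega=0.2, theta_c=0.0)
```

Here \(\Lie_g h\) is exactly zero while \(\Lie_g \Lie_f h\) is nonzero whenever \(\sin(\theta - \theta_c) \ne 0\), so the control first appears in the second derivative of \(h\).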

Exponential Control Barrier Functions (ECBF)

  • \[ \CBCr(\bfx, \bfu) := \Lie_f^{(r)} h(\bfx) + \Lie_g \Lie_f^{(r-1)} h(\bfx) \bfu + K_\alpha \begin{bmatrix} h(\bfx) \\ \Lie_f h(\bfx) \\ \vdots \\ \Lie_f^{(r-1)} h(\bfx) \end{bmatrix} \]
  • If \( r \ge 1 \) is the relative degree of the CBF \( h(\bfx) \), then \( \Lie_g \Lie_f^{k} h(\bfx) = 0, \; \forall k \in \{0, \dots, r-2 \} \) and \( \Lie_g \Lie_f^{(r-1)} h(\bfx) \ne 0 \).

Propagating uncertainty to \( \CBCtwo \)

  • \[ \CBCtwo(\bfx, \bfu) = [\grad_\bfx \Lie_f h(\bfx)]^\top F(\bfx)\ctrlaff + K_\alpha \begin{bmatrix} h(\bfx) & \Lie_f h(\bfx) \end{bmatrix}^\top \]
  • \( \Lie_f h(\bfx) = \grad_\bfx h(\bfx) f(\bfx) \) is a Gaussian process
  • \( \grad_\bfx \Lie_f h(\bfx) \) is a Gaussian process
    • If \( p(\bfx) \sim \GP(\mu(\bfx), \kappa(\bfx, \bfx'))\), then
      \( \grad_\bfx p(\bfx) \sim \GP(\grad_\bfx \mu(\bfx), H_\bfx \kappa(\bfx, \bfx')) \)
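
For a concrete kernel the covariance of the gradient process can be written out and checked numerically. The sketch below assumes an RBF kernel and interprets \(H_\bfx \kappa(\bfx, \bfx')\) as the mixed second derivative \(\partial^2 \kappa / \partial \bfx\, \partial \bfx'^\top\); both are assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def rbf(x, y, ell=1.0):
    """RBF kernel kappa(x, y)."""
    d = x - y
    return np.exp(-0.5 * (d @ d) / ell**2)

def grad_kernel(x, y, ell=1.0):
    """Closed form of d^2 kappa / (dx dy^T) for the RBF kernel."""
    d = x - y
    return rbf(x, y, ell) * (np.eye(len(x)) / ell**2 - np.outer(d, d) / ell**4)

def grad_kernel_fd(x, y, ell=1.0, eps=1e-4):
    """Same quantity by central finite differences, as a cross-check."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            H[i, j] = (rbf(x + ei, y + ej, ell) - rbf(x + ei, y - ej, ell)
                       - rbf(x - ei, y + ej, ell) + rbf(x - ei, y - ej, ell)) / (4 * eps**2)
    return H

x = np.array([0.3, -0.2])
y = np.array([0.1, 0.4])
H_exact = grad_kernel(x, y, ell=0.8)
H_fd = grad_kernel_fd(x, y, ell=0.8)
```

At \(\bfx = \bfx'\) the closed form reduces to \(\bfI/\ell^2\), which shows that a shorter length scale makes the gradient process more uncertain.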

Propagating uncertainty to \( \CBCtwo \)

  • \[ \CBCtwo(\bfx, \bfu) = [\grad_\bfx \Lie_f h(\bfx)]^\top F(\bfx)\ctrlaff + K_\alpha \begin{bmatrix} h(\bfx) & \Lie_f h(\bfx) \end{bmatrix}^\top \]
  • \( \Lie_f h(\bfx) = \grad_\bfx h(\bfx) f(\bfx) \) is a Gaussian process
  • \( \grad_\bfx \Lie_f h(\bfx) \) is a Gaussian process
  • \( [\grad_\bfx \Lie_f h(\bfx)]^\top F(\bfx)\ctrlaff \) is a quadratic form of GP (not a GP )
    • \(\newcommand{\trc}{\text{tr}}\) If \(p(\bfx)\) and \(q(\bfx)\) are GPs, then \(p(\bfx)^\top q(\bfx)\) is not a GP in general, but its mean and covariance are available in closed form:
      \begin{multline} \E[p(\bfx)^\top q(\bfx)] = \mu_p(\bfx)^\top \mu_q(\bfx) + \trc(\Cov_{p,q}(\bfx, \bfx)), \\ \Cov\bigl(p(\bfx)^\top q(\bfx),\, p(\bfx')^\top q(\bfx')\bigr) = 2\trc(\Cov_{p,q}(\bfx, \bfx'))^2 + p(\bfx)^\top \kappa_q(\bfx, \bfx') p(\bfx') \\ + q(\bfx)^\top \kappa_p(\bfx, \bfx') q(\bfx') + 2 q(\bfx)^\top \Cov_{p,q}(\bfx, \bfx') p(\bfx') \end{multline}
  • \( \CBCtwo(\bfx, \bfu) \) is a quadratic form of GP.
    \( \E[\CBCtwo](\bfx, \bfu) \) is still affine in \( \bfu \).
    \( \Var[\CBCtwo](\bfx, \bfx'; \bfu) \) is still quadratic in \( \bfu \).

Extending to \(\CBCr\)

  • \[ \CBCr(\bfx, \bfu) = [\grad_\bfx \Lie_f^{(r-1)} h(\bfx)]^\top F(\bfx)\ctrlaff + K_\alpha \begin{bmatrix} h(\bfx) & \Lie_f h(\bfx) & \dots & \Lie_f^{(r-1)} h(\bfx) \end{bmatrix}^\top \]
  • \( \CBCr(\bfx, \bfu) \) is not a GP
    \( \E[\CBCr](\bfx, \bfu) \) is still affine in \( \bfu \).
    \( \Var[\CBCr](\bfx, \bfx'; \bfu) \) is still quadratic in \( \bfu \).
  • For \( r \ge 3 \), \(\CBCr\) statistics can be estimated by Monte Carlo methods.
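
As an illustration of the Monte Carlo route, the sketch below estimates the mean and variance of an inner product \(p^\top q\) of two jointly Gaussian vectors by sampling and compares them with the standard closed-form moments of a Gaussian quadratic form. The joint mean `mu` and covariance `Sigma` are made-up values, and the closed-form expressions are the textbook ones rather than a transcription of the formula shown in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
# Joint Gaussian over z = [p; q]; numbers are illustrative only.
mu = np.array([1.0, -0.5, 0.3, 2.0])
Sigma = np.array([[1.0, 0.2, 0.3, 0.0],
                  [0.2, 1.0, 0.0, 0.1],
                  [0.3, 0.0, 1.0, 0.2],
                  [0.0, 0.1, 0.2, 1.0]])

# Monte Carlo estimate of the first two moments of p^T q.
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
p, q = samples[:, :d], samples[:, d:]
prod = np.sum(p * q, axis=1)
mc_mean, mc_var = prod.mean(), prod.var(ddof=1)

# Closed-form moments of the Gaussian quadratic form p^T q.
mu_p, mu_q = mu[:d], mu[d:]
S_p, S_q, S_pq = Sigma[:d, :d], Sigma[d:, d:], Sigma[:d, d:]
closed_mean = mu_p @ mu_q + np.trace(S_pq)
closed_var = (np.trace(S_p @ S_q) + np.trace(S_pq @ S_pq)
              + mu_q @ S_p @ mu_q + mu_p @ S_q @ mu_p
              + 2.0 * mu_p @ S_pq @ mu_q)
```

For higher relative degree the same sampling loop applies, with the product replaced by the corresponding nested expression; only the sample mean and variance are needed for the one-sided Chebyshev-type constraint.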

Safe controller using ECBF

  • \begin{align} \min_{\bfu_k \in \mathcal{U}}& \|\bfu_k - \pi_\epsilon(\bfx_k) \|_Q \\ \qquad\text{s.t.}&~~ \bbP\bigl( \CBCr(\bfx_k, \bfu_k) > \zeta \mid \bfx_k,\bfu_k \bigr) \ge \tilde{p}_k \end{align}
  • Using Cantelli's (Chebyshev's one-sided) inequality
  • Safe controller (an SOCP) \begin{align} \min_{\bfu_k \in \mathcal{U}}& \|\bfu_k - \pi_\epsilon(\bfx_k) \|_Q \\ \qquad\text{s.t.}\qquad &\E[\mbox{CBC}_k^{(r)}]-\zeta \ge \sqrt{\frac{\tilde{p}_k}{1-\tilde{p}_k}\Var[\mbox{CBC}_k^{(r)}]} \end{align}
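
The constraint above can be recovered in two lines from Cantelli's inequality, which needs only the mean and variance of \(\CBCr\), not Gaussianity. A short derivation, assuming \(\E[\CBCr] > \zeta\) and \(\tilde{p}_k < 1\):
\begin{align} \bbP\bigl(\CBCr > \zeta\bigr) &\ge 1 - \frac{\Var[\CBCr]}{\Var[\CBCr] + (\E[\CBCr] - \zeta)^2} = \frac{(\E[\CBCr] - \zeta)^2}{\Var[\CBCr] + (\E[\CBCr] - \zeta)^2}. \end{align}
Requiring the right-hand side to be at least \(\tilde{p}_k\) gives \((1-\tilde{p}_k)(\E[\CBCr] - \zeta)^2 \ge \tilde{p}_k \Var[\CBCr]\), and taking square roots yields \(\E[\CBCr] - \zeta \ge \sqrt{\frac{\tilde{p}_k}{1-\tilde{p}_k}\Var[\CBCr]}\). For example, \(\tilde{p}_k = 0.95\) gives a multiplier of \(\sqrt{19} \approx 4.36\) standard deviations, compared with about \(1.64\) in the Gaussian case, so the distribution-free bound is noticeably more conservative.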

Learning Experiments

  • \begin{align} \begin{bmatrix} \dot{\theta} \\ \dot{\omega} \end{bmatrix} = \underbrace{\begin{bmatrix} \omega \\ -\frac{g}{l} \sin(\theta) \end{bmatrix}}_{f(\bfx)} + \underbrace{\begin{bmatrix} 0 \\ \frac{1}{ml} \end{bmatrix}}_{g(\bfx)} u \end{align}
  • \begin{align} h\left(\begin{bmatrix} \theta \\ \omega \end{bmatrix} \right) = \cos(\Delta_{col}) - \cos(\theta - \theta_c) \end{align}

Safe controller using ECBF Experiments

  • \begin{align} \begin{bmatrix} \dot{\theta} \\ \dot{\omega} \end{bmatrix} = \underbrace{\begin{bmatrix} \omega \\ -\frac{g}{l} \sin(\theta) \end{bmatrix}}_{f(\bfx)} + \underbrace{\begin{bmatrix} 0 \\ \frac{1}{ml} \end{bmatrix}}_{g(\bfx)} u \end{align}
  • \begin{align} h\left(\begin{bmatrix} \theta \\ \omega \end{bmatrix} \right) = \cos(\Delta_{col}) - \cos(\theta - \theta_c) \end{align}


Take away

  • Safety guarantees in stochastic control-affine systems were formulated as quadratic constraints on the control signal using Exponential Control Barrier Functions.

Ongoing work

  • More experiments (closer to the Motivation).
  • Entropy objective to pick optimal actions for reducing uncertainty.
  • Application of Hanson-Wright-like inequalities for tighter bounds on \( \CBCr \).

Thank you. Questions?

Paper URL: arxiv.org/abs/1912.10116

\(^*\) These authors contributed equally.