$\require{mathtools} % %%% GENERIC MATH %%% % % Environments \newcommand{\al}[1]{\begin{align}#1\end{align}} % need this for \tag{} to work \renewcommand{\r}{\mathrm} % % Greek \newcommand{\eps}{\epsilon} \newcommand{\veps}{\varepsilon} \newcommand{\Om}{\Omega} \newcommand{\om}{\omega} \newcommand{\Th}{\Theta} \let\fi\phi % because it looks like an f \let\phi\varphi % because it looks like a p % % Miscellaneous shortcuts % .. over and under \newcommand{\ss}[1]{_{\substack{#1}}} \newcommand{\ob}{\overbrace} \newcommand{\ub}{\underbrace} \newcommand{\ol}{\overline} \newcommand{\tld}{\widetilde} \newcommand{\HAT}{\widehat} \newcommand{\f}{\frac} \newcommand{\s}[2]{#1 /\mathopen{}#2} \newcommand{\rt}[1]{ {\sqrt{#1}}} % .. relations \newcommand{\sr}{\stackrel} \newcommand{\sse}{\subseteq} \newcommand{\ce}{\coloneqq} \newcommand{\ec}{\eqqcolon} \newcommand{\ap}{\approx} \newcommand{\ls}{\lesssim} \newcommand{\gs}{\gtrsim} % .. miscer \newcommand{\q}{\quad} \newcommand{\qq}{\qquad} \newcommand{\heart}{\heartsuit} % % Delimiters % (I needed to create my own because the MathJax version of \DeclarePairedDelimiter doesn't have \mathopen{} and that messes up the spacing) % .. one-part \newcommand{\p}[1]{\mathopen{}\left( #1 \right)} \newcommand{\b}[1]{\mathopen{}\left[ #1 \right]} \newcommand{\set}[1]{\mathopen{}\left\{ #1 \right\}} \newcommand{\abs}[1]{\mathopen{}\left\lvert #1 \right\rvert} \newcommand{\floor}[1]{\mathopen{}\left\lfloor #1 \right\rfloor} \newcommand{\ceil}[1]{\mathopen{}\left\lceil #1 \right\rceil} \newcommand{\inner}[1]{\mathopen{}\left\langle #1 \right\rangle} % .... (use phantom to force at least the standard height of double bars) \newcommand{\norm}[1]{\mathopen{}\left\lVert #1 \vphantom{f} \right\rVert} \newcommand{\frob}[1]{\norm{#1}_\mathrm{F}} %% .. two-part \newcommand{\incond}[2]{#1 \mathop{}\middle|\mathop{} #2} \newcommand{\cond}[2]{ {\left.\incond{#1}{#2}\right.}} \newcommand{\pco}[2]{\p{\incond{#1}{#2}}} \newcommand{\bco}[2]{\b{\incond{#1}{#2}}} \newcommand{\setco}[2]{\set{\incond{#1}{#2}}} \newcommand{\at}[2]{ {\left.#1\right|_{#2}}} \newcommand{\pat}[2]{\p{\at{#1}{#2}}} \newcommand{\bat}[2]{\b{\at{#1}{#2}}} % ..... (use phantom to force at least the standard height of double bar) \newcommand{\oldpara}[2]{#1\vphantom{f} \mathop{}\middle\|\mathop{} #2} %\newcommand{\para}[2]{#1\vphantom{f} \mathop{}\middle\|\mathop{} #2} \newcommand{\para}[2]{\mathchoice{\begin{matrix}#1\\\hdashline#2\end{matrix}}{\begin{smallmatrix}#1\\\hdashline#2\end{smallmatrix}}{\begin{smallmatrix}#1\\\hdashline#2\end{smallmatrix}}{\begin{smallmatrix}#1\\\hdashline#2\end{smallmatrix}}} \newcommand{\ppa}[2]{\p{\para{#1}{#2}}} \newcommand{\bpa}[2]{\b{\para{#1}{#2}}} %\newcommand{\bpaco}[4]{\bpa{\incond{#1}{#2}}{\incond{#3}{#4}}} \newcommand{\bpaco}[4]{\bpa{\cond{#1}{#2}}{\cond{#3}{#4}}} % % Levels of closeness \newcommand{\scirc}[1]{\sr{\circ}{#1}} \newcommand{\sdot}[1]{\sr{.}{#1}} \newcommand{\slog}[1]{\sr{\log}{#1}} \newcommand{\createClosenessLevels}[7]{ \newcommand{#2}{\mathrel{(#1)}} \newcommand{#3}{\mathrel{#1}} \newcommand{#4}{\mathrel{#1\!\!#1}} \newcommand{#5}{\mathrel{#1\!\!#1\!\!#1}} \newcommand{#6}{\mathrel{(\sdot{#1})}} \newcommand{#7}{\mathrel{(\slog{#1})}} } \let\lt\undefined \let\gt\undefined % .. vanilla versions (is it within a constant?) \newcommand{\ez}{\scirc=} \newcommand{\eq}{\simeq} \newcommand{\eqq}{\mathrel{\eq\!\!\eq}} \newcommand{\eqqq}{\mathrel{\eq\!\!\eq\!\!\eq}} \newcommand{\lez}{\scirc\le} \newcommand{\lq}{\preceq} \newcommand{\lqq}{\mathrel{\lq\!\!\lq}} \newcommand{\lqqq}{\mathrel{\lq\!\!\lq\!\!\lq}} \newcommand{\gez}{\scirc\ge} \newcommand{\gq}{\succeq} \newcommand{\gqq}{\mathrel{\gq\!\!\gq}} \newcommand{\gqqq}{\mathrel{\gq\!\!\gq\!\!\gq}} \newcommand{\lz}{\scirc<} \newcommand{\lt}{\prec} \newcommand{\ltt}{\mathrel{\lt\!\!\lt}} \newcommand{\lttt}{\mathrel{\lt\!\!\lt\!\!\lt}} \newcommand{\gz}{\scirc>} \newcommand{\gt}{\succ} \newcommand{\gtt}{\mathrel{\gt\!\!\gt}} \newcommand{\gttt}{\mathrel{\gt\!\!\gt\!\!\gt}} % .. dotted versions (is it equal in the limit?) \newcommand{\ed}{\sdot=} \newcommand{\eqd}{\sdot\eq} \newcommand{\eqqd}{\sdot\eqq} \newcommand{\eqqqd}{\sdot\eqqq} \newcommand{\led}{\sdot\le} \newcommand{\lqd}{\sdot\lq} \newcommand{\lqqd}{\sdot\lqq} \newcommand{\lqqqd}{\sdot\lqqq} \newcommand{\ged}{\sdot\ge} \newcommand{\gqd}{\sdot\gq} \newcommand{\gqqd}{\sdot\gqq} \newcommand{\gqqqd}{\sdot\gqqq} \newcommand{\ld}{\sdot<} \newcommand{\ltd}{\sdot\lt} \newcommand{\lttd}{\sdot\ltt} \newcommand{\ltttd}{\sdot\lttt} \newcommand{\gd}{\sdot>} \newcommand{\gtd}{\sdot\gt} \newcommand{\gttd}{\sdot\gtt} \newcommand{\gtttd}{\sdot\gttt} % .. log versions (is it equal up to log?) \newcommand{\elog}{\slog=} \newcommand{\eqlog}{\slog\eq} \newcommand{\eqqlog}{\slog\eqq} \newcommand{\eqqqlog}{\slog\eqqq} \newcommand{\lelog}{\slog\le} \newcommand{\lqlog}{\slog\lq} \newcommand{\lqqlog}{\slog\lqq} \newcommand{\lqqqlog}{\slog\lqqq} \newcommand{\gelog}{\slog\ge} \newcommand{\gqlog}{\slog\gq} \newcommand{\gqqlog}{\slog\gqq} \newcommand{\gqqqlog}{\slog\gqqq} \newcommand{\llog}{\slog<} \newcommand{\ltlog}{\slog\lt} \newcommand{\lttlog}{\slog\ltt} \newcommand{\ltttlog}{\slog\lttt} \newcommand{\glog}{\slog>} \newcommand{\gtlog}{\slog\gt} \newcommand{\gttlog}{\slog\gtt} \newcommand{\gtttlog}{\slog\gttt} % % Miscellaneous \newcommand{\LHS}{\mathrm{LHS}} \newcommand{\RHS}{\mathrm{RHS}} % .. operators \DeclareMathOperator{\poly}{poly} \DeclareMathOperator{\polylog}{polylog} \DeclareMathOperator{\quasipoly}{quasipoly} \DeclareMathOperator{\negl}{negl} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\argmax}{arg\,max} % .. functions \DeclareMathOperator{\id}{id} \DeclareMathOperator{\sign}{sign} \DeclareMathOperator{\err}{err} \DeclareMathOperator{\ReLU}{ReLU} % .. analysis \let\d\undefined \newcommand{\d}{\operatorname{d}\mathopen{}} \newcommand{\df}[2]{ {\f{\d #1}{\d #2}}} \newcommand{\ds}[2]{ {\s{\d #1}{\d #2}}} \newcommand{\part}{\partial} \newcommand{\partf}[2]{\f{\part #1}{\part #2}} \newcommand{\parts}[2]{\s{\part #1}{\part #2}} \newcommand{\grad}[1]{\mathop{\nabla\!_{#1}}} % .. sets of numbers \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\F}{\mathbb{F}} % %%% SPECIALIZED MATH %%% % % Logic \renewcommand{\and}{\wedge} \newcommand{\AND}{\bigwedge} \newcommand{\or}{\vee} \newcommand{\OR}{\bigvee} \newcommand{\xor}{\oplus} \newcommand{\XOR}{\bigoplus} \newcommand{\union}{\cup} \newcommand{\inter}{\cap} \newcommand{\UNION}{\bigcup} \newcommand{\INTER}{\bigcap} \newcommand{\comp}{\overline} \newcommand{\true}{\r{true}} \newcommand{\false}{\r{false}} \newcommand{\tf}{\set{\true,\false}} \DeclareMathOperator{\One}{\mathbb{1}} \DeclareMathOperator{\1}{\mathbb{1}} % % Linear algebra \renewcommand{\span}{\mathrm{span}} \DeclareMathOperator{\rank}{rank} \DeclareMathOperator{\proj}{proj} \DeclareMathOperator{\dom}{dom} \DeclareMathOperator{\Img}{Im} \newcommand{\transp}{\mathsf{T}} \renewcommand{\t}{^\transp} % ... named tensors \newcommand{\namedtensorstrut}{\vphantom{fg}} % milder than \mathstrut \newcommand{\name}[1]{\mathsf{\namedtensorstrut #1}} \newcommand{\nbin}[2]{\mathbin{\underset{\substack{#1}}{\namedtensorstrut #2}}} \newcommand{\ndot}[1]{\nbin{#1}{\odot}} \newcommand{\ncat}[1]{\nbin{#1}{\oplus}} \newcommand{\nsum}[1]{\sum\limits_{\substack{#1}}} \newcommand{\nfun}[2]{\mathop{\underset{\substack{#1}}{\namedtensorstrut\mathrm{#2}}}} \newcommand{\ndef}[2]{\newcommand{#1}{\name{#2}}} \newcommand{\nt}[1]{^{\transp(#1)}} % % Probability \newcommand{\Normal}{\mathcal{N}} \let\Pr\undefined \DeclareMathOperator*{\Pr}{Pr} \DeclareMathOperator*{\G}{\mathbb{G}} \DeclareMathOperator*{\Odds}{Od} \DeclareMathOperator*{\E}{E} \DeclareMathOperator*{\Var}{Var} \DeclareMathOperator*{\Cov}{Cov} \DeclareMathOperator*{\corr}{corr} \DeclareMathOperator*{\median}{median} \newcommand{\dTV}{d_{\mathrm{TV}}} \newcommand{\dHel}{d_{\mathrm{Hel}}} \newcommand{\dJS}{d_{\mathrm{JS}}} % ... information theory \let\H\undefined \DeclareMathOperator*{\H}{H} \DeclareMathOperator*{\I}{I} \DeclareMathOperator*{\D}{D} % %%% SPECIALIZED COMPUTER SCIENCE %%% % % Complexity classes % .. classical \newcommand{\Poly}{\mathsf{P}} \newcommand{\NP}{\mathsf{NP}} \newcommand{\PH}{\mathsf{PH}} \newcommand{\PSPACE}{\mathsf{PSPACE}} \renewcommand{\L}{\mathsf{L}} % .. probabilistic \newcommand{\formost}{\mathsf{Я}} \newcommand{\RP}{\mathsf{RP}} \newcommand{\BPP}{\mathsf{BPP}} \newcommand{\MA}{\mathsf{MA}} \newcommand{\AM}{\mathsf{AM}} \newcommand{\IP}{\mathsf{IP}} \newcommand{\RL}{\mathsf{RL}} % .. circuits \newcommand{\NC}{\mathsf{NC}} \newcommand{\AC}{\mathsf{AC}} \newcommand{\ACC}{\mathsf{ACC}} \newcommand{\TC}{\mathsf{TC}} \newcommand{\Ppoly}{\mathsf{P}/\poly} \newcommand{\Lpoly}{\mathsf{L}/\poly} % .. resources \newcommand{\TIME}{\mathsf{TIME}} \newcommand{\SPACE}{\mathsf{SPACE}} \newcommand{\TISP}{\mathsf{TISP}} \newcommand{\SIZE}{\mathsf{SIZE}} % .. keywords \newcommand{\co}{\mathsf{co}} \newcommand{\Prom}{\mathsf{Promise}} % % Boolean analysis \newcommand{\zo}{\set{0,1}} \newcommand{\pmo}{\set{\pm 1}} \newcommand{\zpmo}{\set{0,\pm 1}} \newcommand{\harpoon}{\!\upharpoonright\!} \newcommand{\rr}[2]{#1\harpoon_{#2}} \newcommand{\Fou}[1]{\widehat{#1}} \DeclareMathOperator{\Ind}{\mathrm{Ind}} \DeclareMathOperator{\Inf}{\mathrm{Inf}} \newcommand{\Der}[1]{\operatorname{D}_{#1}\mathopen{}} \newcommand{\Exp}[1]{\operatorname{E}_{#1}\mathopen{}} \DeclareMathOperator{\Stab}{\mathrm{Stab}} \DeclareMathOperator{\T}{T} \DeclareMathOperator{\sens}{\mathrm{s}} \DeclareMathOperator{\bsens}{\mathrm{bs}} \DeclareMathOperator{\fbsens}{\mathrm{fbs}} \DeclareMathOperator{\Cert}{\mathrm{C}} \DeclareMathOperator{\DT}{\mathrm{DT}} \DeclareMathOperator{\CDT}{\mathrm{CDT}} % canonical \DeclareMathOperator{\ECDT}{\mathrm{ECDT}} \DeclareMathOperator{\CDTv}{\mathrm{CDT_{vars}}} \DeclareMathOperator{\ECDTv}{\mathrm{ECDT_{vars}}} \DeclareMathOperator{\CDTt}{\mathrm{CDT_{terms}}} \DeclareMathOperator{\ECDTt}{\mathrm{ECDT_{terms}}} \DeclareMathOperator{\CDTw}{\mathrm{CDT_{weighted}}} \DeclareMathOperator{\ECDTw}{\mathrm{ECDT_{weighted}}} \DeclareMathOperator{\AvgDT}{\mathrm{AvgDT}} \DeclareMathOperator{\PDT}{\mathrm{PDT}} % partial decision tree \DeclareMathOperator{\DTsize}{\mathrm{DT_{size}}} \DeclareMathOperator{\W}{\mathbf{W}} % .. functions (small caps sadly doesn't work) \DeclareMathOperator{\Par}{\mathrm{Par}} \DeclareMathOperator{\Maj}{\mathrm{Maj}} \DeclareMathOperator{\HW}{\mathrm{HW}} \DeclareMathOperator{\Thr}{\mathrm{Thr}} \DeclareMathOperator{\Tribes}{\mathrm{Tribes}} \DeclareMathOperator{\RotTribes}{\mathrm{RotTribes}} \DeclareMathOperator{\CycleRun}{\mathrm{CycleRun}} \DeclareMathOperator{\SAT}{\mathrm{SAT}} \DeclareMathOperator{\UniqueSAT}{\mathrm{UniqueSAT}} % % Dynamic optimality \newcommand{\OPT}{\mathsf{OPT}} \newcommand{\Alt}{\mathsf{Alt}} \newcommand{\Funnel}{\mathsf{Funnel}} % % Alignment \DeclareMathOperator{\Amp}{\mathrm{Amp}} % %%% TYPESETTING %%% % % In text \renewcommand{\th}{^{\mathrm{th}}} \newcommand{\degree}{^\circ} % % Fonts % .. bold \newcommand{\BA}{\boldsymbol{A}} \newcommand{\BB}{\boldsymbol{B}} \newcommand{\BC}{\boldsymbol{C}} \newcommand{\BD}{\boldsymbol{D}} \newcommand{\BE}{\boldsymbol{E}} \newcommand{\BF}{\boldsymbol{F}} \newcommand{\BG}{\boldsymbol{G}} \newcommand{\BH}{\boldsymbol{H}} \newcommand{\BI}{\boldsymbol{I}} \newcommand{\BJ}{\boldsymbol{J}} \newcommand{\BK}{\boldsymbol{K}} \newcommand{\BL}{\boldsymbol{L}} \newcommand{\BM}{\boldsymbol{M}} \newcommand{\BN}{\boldsymbol{N}} \newcommand{\BO}{\boldsymbol{O}} \newcommand{\BP}{\boldsymbol{P}} \newcommand{\BQ}{\boldsymbol{Q}} \newcommand{\BR}{\boldsymbol{R}} \newcommand{\BS}{\boldsymbol{S}} \newcommand{\BT}{\boldsymbol{T}} \newcommand{\BU}{\boldsymbol{U}} \newcommand{\BV}{\boldsymbol{V}} \newcommand{\BW}{\boldsymbol{W}} \newcommand{\BX}{\boldsymbol{X}} \newcommand{\BY}{\boldsymbol{Y}} \newcommand{\BZ}{\boldsymbol{Z}} \newcommand{\Ba}{\boldsymbol{a}} \newcommand{\Bb}{\boldsymbol{b}} \newcommand{\Bc}{\boldsymbol{c}} \newcommand{\Bd}{\boldsymbol{d}} \newcommand{\Be}{\boldsymbol{e}} \newcommand{\Bf}{\boldsymbol{f}} \newcommand{\Bg}{\boldsymbol{g}} \newcommand{\Bh}{\boldsymbol{h}} \newcommand{\Bi}{\boldsymbol{i}} \newcommand{\Bj}{\boldsymbol{j}} \newcommand{\Bk}{\boldsymbol{k}} \newcommand{\Bp}{\boldsymbol{p}} \newcommand{\Bq}{\boldsymbol{q}} \newcommand{\Br}{\boldsymbol{r}} \newcommand{\Bs}{\boldsymbol{s}} \newcommand{\Bt}{\boldsymbol{t}} \newcommand{\Bu}{\boldsymbol{u}} \newcommand{\Bv}{\boldsymbol{v}} \newcommand{\Bw}{\boldsymbol{w}} \newcommand{\Bx}{\boldsymbol{x}} \newcommand{\By}{\boldsymbol{y}} \newcommand{\Bz}{\boldsymbol{z}} \newcommand{\Balpha}{\boldsymbol{\alpha}} \newcommand{\Bbeta}{\boldsymbol{\beta}} \newcommand{\Bgamma}{\boldsymbol{\gamma}} \newcommand{\Bdelta}{\boldsymbol{\delta}} \newcommand{\Beps}{\boldsymbol{\eps}} \newcommand{\Bveps}{\boldsymbol{\veps}} \newcommand{\Bzeta}{\boldsymbol{\zeta}} \newcommand{\Beta}{\boldsymbol{\eta}} \newcommand{\Btheta}{\boldsymbol{\theta}} \newcommand{\Biota}{\boldsymbol{\iota}} \newcommand{\Bkappa}{\boldsymbol{\kappa}} \newcommand{\Blambda}{\boldsymbol{\lambda}} \newcommand{\Bmu}{\boldsymbol{\mu}} \newcommand{\Bnu}{\boldsymbol{\nu}} \newcommand{\Bxi}{\boldsymbol{\xi}} \newcommand{\Bomicron}{\boldsymbol{\omicron}} \newcommand{\Bpi}{\boldsymbol{\pi}} \newcommand{\Brho}{\boldsymbol{\rho}} \newcommand{\Bsigma}{\boldsymbol{\sigma}} \newcommand{\Btau}{\boldsymbol{\tau}} \newcommand{\Bupsilon}{\boldsymbol{\upsilon}} \newcommand{\Bphi}{\boldsymbol{\phi}} \newcommand{\Bfi}{\boldsymbol{\fi}} \newcommand{\Bchi}{\boldsymbol{\chi}} \newcommand{\Bpsi}{\boldsymbol{\psi}} \newcommand{\Bomega}{\boldsymbol{\omega}} % .. calligraphic \newcommand{\CA}{\mathcal{A}} \newcommand{\CB}{\mathcal{B}} \newcommand{\CC}{\mathcal{C}} \newcommand{\CD}{\mathcal{D}} \newcommand{\CE}{\mathcal{E}} \newcommand{\CF}{\mathcal{F}} \newcommand{\CG}{\mathcal{G}} \newcommand{\CH}{\mathcal{H}} \newcommand{\CI}{\mathcal{I}} \newcommand{\CJ}{\mathcal{J}} \newcommand{\CK}{\mathcal{K}} \newcommand{\CL}{\mathcal{L}} \newcommand{\CM}{\mathcal{M}} \newcommand{\CN}{\mathcal{N}} \newcommand{\CO}{\mathcal{O}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CQ}{\mathcal{Q}} \newcommand{\CR}{\mathcal{R}} \newcommand{\CS}{\mathcal{S}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CU}{\mathcal{U}} \newcommand{\CV}{\mathcal{V}} \newcommand{\CW}{\mathcal{W}} \newcommand{\CX}{\mathcal{X}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\CZ}{\mathcal{Z}} % .. typewriter \newcommand{\TA}{\mathtt{A}} \newcommand{\TB}{\mathtt{B}} \newcommand{\TC}{\mathtt{C}} \newcommand{\TD}{\mathtt{D}} \newcommand{\TE}{\mathtt{E}} \newcommand{\TF}{\mathtt{F}} \newcommand{\TG}{\mathtt{G}} \newcommand{\TH}{\mathtt{H}} \newcommand{\TI}{\mathtt{I}} \newcommand{\TJ}{\mathtt{J}} \newcommand{\TK}{\mathtt{K}} \newcommand{\TL}{\mathtt{L}} \newcommand{\TM}{\mathtt{M}} \newcommand{\TN}{\mathtt{N}} \newcommand{\TO}{\mathtt{O}} \newcommand{\TP}{\mathtt{P}} \newcommand{\TQ}{\mathtt{Q}} \newcommand{\TR}{\mathtt{R}} \newcommand{\TS}{\mathtt{S}} \newcommand{\TT}{\mathtt{T}} \newcommand{\TU}{\mathtt{U}} \newcommand{\TV}{\mathtt{V}} \newcommand{\TW}{\mathtt{W}} \newcommand{\TX}{\mathtt{X}} \newcommand{\TY}{\mathtt{Y}} \newcommand{\TZ}{\mathtt{Z}}$

Personal summary of the Pragmatic AI safety sequence by Dan Hendrycks and Thomas Woodside.

Because

  • pre-paradigmatic research (e.g. MIRI/ARC stuff) can’t be easily scaled
  • research with capabilities externalities (e.g. RLHF) shouldn’t be scaled

we should put more emphasis on research with

  • ML research predecents
    • the ML community is successful in ways we should emulate
  • minimal capabilities externalities
  • a sociotechnical systems view
    • solving the technical problem is not enough

ML research precedents

Main article: A bird’s eye view of the ML field

The ML community is good as solving ML problems, partially because of these aspects of their culture:

  • long term goals are broken down into empirical simplified microcosmic problems
  • subproblems can be worked on iteratively, collectively, and scalably
  • contributions are objectively measured
  • the set of research priorities is a portfolio
  • researchers must convince anonymous reviewers of the value of their work
  • highly competitive, pragmatic, no-nonsense culture
  • long-run research track records are necessary for success

Examples of ML-flavored safety problems this could apply to:

  • honest AI
  • power-averseness
  • implementing moral decision making
  • value clarification
  • adversarial robustness
  • anomaly detection
  • interpretable uncertainty
  • detection of emergent behavior
  • transparency
  • ML for cyberdefense
  • ML for improved epistemics

Minimal capabilities externalities

Some research produces safety gains through capabilities gain. AI safety researchers should let the ML community take care of those, and only produce research that demonstrably doesn’t improve general capabilities.

Sociotechnical systems view

AI risk hinges on a sociotechnical system with “feedback loops, multiple causes, circular causation, self-reinforcing processes, butterfly effects, microscale-macroscale dynamics, and so on”.

Rather than focusing only the operating process (in this case, a particular AI system’s technical implementation), hazard analysis tells us that we need to focus on systemic factors like

  • social pressures
  • regulations
  • (perhaps most importantly) safety culture
    • which is why we need to engage with the broader ML community, including China