Pragmatic AI safety
Personal summary of the Pragmatic AI Safety sequence by Dan Hendrycks and Thomas Woodside.
Because:
- pre-paradigmatic research (e.g. MIRI/ARC stuff) can’t be easily scaled
- research with capabilities externalities (e.g. RLHF) shouldn’t be scaled
we should put more emphasis on research with:
- ML research precedents
  - the ML community is successful in ways we should emulate
- minimal capabilities externalities
- a sociotechnical systems view
  - solving the technical problem is not enough
ML research precedents
Main article: A bird’s eye view of the ML field
The ML community is good at solving ML problems, partly because of these aspects of its culture:
- long-term goals are broken down into empirical, simplified, microcosmic problems
- subproblems can be worked on iteratively, collectively, and scalably
- contributions are objectively measured
- the set of research priorities is a portfolio
- researchers must convince anonymous reviewers of the value of their work
- highly competitive, pragmatic, no-nonsense culture
- long-run research track records are necessary for success
Examples of ML-flavored safety problems this could apply to:
- honest AI
- power-averseness
- implementing moral decision making
- value clarification
- adversarial robustness
- anomaly detection (see the sketch after this list)
- interpretable uncertainty
- detection of emergent behavior
- transparency
- ML for cyberdefense
- ML for improved epistemics
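For concreteness, anomaly detection is an example of how a safety concern can be turned into an empirical, objectively measured ML problem. Below is a minimal sketch, not taken from the sequence, of the maximum-softmax-probability baseline for out-of-distribution detection (Hendrycks & Gimpel, 2017): inputs on which the classifier's top softmax probability is low get flagged as anomalous. The function names, the threshold value, and the toy logits are placeholders chosen for illustration.

```python
import numpy as np

def msp_scores(logits):
    """Maximum softmax probability (MSP) scores per input.

    Higher score = the model is more confident, so the input looks
    more in-distribution; low scores are treated as anomalous.
    """
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def flag_anomalies(logits, threshold=0.5):
    """Flag inputs whose MSP falls below a placeholder threshold."""
    return msp_scores(logits) < threshold

# Toy usage: three confident predictions and one near-uniform one.
logits = np.array([
    [6.0, 0.1, 0.2],   # confident -> high MSP
    [0.3, 5.5, 0.1],
    [0.2, 0.4, 4.8],
    [1.0, 1.1, 0.9],   # near-uniform -> low MSP, flagged as anomalous
])
print(msp_scores(logits))
print(flag_anomalies(logits))
```

Because detectors like this can be scored on held-out anomaly datasets (e.g. with AUROC), progress is objectively measurable and can be worked on iteratively and collectively, which is exactly the research culture described in the list above.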
Minimal capabilities externalities
Some research produces safety gains through capabilities gains. AI safety researchers should let the broader ML community take care of that kind of work, and only produce research that demonstrably doesn't improve general capabilities.
Sociotechnical systems view
AI risk hinges on a sociotechnical system with “feedback loops, multiple causes, circular causation, self-reinforcing processes, butterfly effects, microscale-macroscale dynamics, and so on”.
Rather than focusing only on the operating process (in this case, a particular AI system's technical implementation), hazard analysis tells us that we also need to focus on systemic factors like
- social pressures
- regulations
- (perhaps most importantly) safety culture
  - which is why we need to engage with the broader ML community, including researchers in China