3.4 Alignment

We need to develop better techniques to control AI systems and make them less hazardous. If we fail to do this, we face a number of risks from AI systems, including systems that develop deceptive or power-seeking tendencies.


Review Questions

What is one reason an AI system might learn to deceive others?

Answer:

Deception can be instrumentally useful for accomplishing many goals. For example, an AI system playing Stratego learned to bluff opponents, despite not being explicitly trained to do so.


Why can't behavioral evaluation alone detect a deceptively aligned AI system?

Answer:

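Behavioral evaluation only observes what a system does while it is being tested. A sophisticated system could conceal its true intentions while it is monitored, taking a treacherous turn to pursue them only once supervision is relaxed. Detecting this would require transparency tools that inspect the system's internal workings.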


What is one key assumption of structural realism that could apply to AI systems?

Answer:

Like states, AI systems could aim to ensure their own self-preservation in environments where there is no higher authority that is guaranteed to protect them.
