3.3 Robustness

AI systems are vulnerable to adversarial examples, Trojan attacks, and other forms of attack. We need progress on improving the robustness of models to ensure that powerful AI systems cannot be subverted, misused, or stolen.
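To make "adversarial examples" concrete, here is a minimal sketch of the fast gradient sign method (FGSM) applied to a toy logistic-regression classifier. The weights, input, and step size are illustrative values chosen for this sketch, not from the text; the point is only that a tiny, gradient-directed perturbation of the input increases the model's loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def loss(w, x, y):
    # Binary cross-entropy for a single example.
    p = sigmoid(dot(w, x))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def fgsm(w, x, y, eps):
    # FGSM: step the input in the direction of the sign of the loss gradient.
    # For logistic regression, d(loss)/dx = (p - y) * w.
    p = sigmoid(dot(w, x))
    return [xi + eps * math.copysign(1.0, (p - y) * wi)
            for xi, wi in zip(x, w)]

w = [2.0, -1.0]   # toy classifier weights (illustrative)
x = [1.0, 0.5]    # an input the model classifies correctly
y = 1.0           # its true label

x_adv = fgsm(w, x, y, eps=0.5)
print(round(loss(w, x, y), 3), round(loss(w, x_adv, y), 3))  # → 0.201 0.693
```

The perturbation is small in each coordinate, yet the loss more than triples; with larger models the same mechanism can flip predictions outright.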


Review Questions

How can the use of inaccurate proxies lead AI systems to cause harm?

Answer:

Proxies that fail to capture the values we actually care about leave gaps that models can exploit, leading them to take undesired actions that harm people even while scoring well on the proxy.


What is an example that illustrates Goodhart's law in proxies?

Answer:

In colonial Hanoi, a bounty paid per rat tail led people to cut off rats' tails and release the animals to keep breeding, so the rat population grew over time even as tails were collected: the proxy (tails) and the goal (fewer rats) became inversely related.

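The Hanoi example can be sketched as a toy optimization problem (with made-up numbers): an agent that maximizes the proxy picks an action that makes the true goal worse.

```python
# Hypothetical payoffs per day for two actions available to a rat catcher:
# (tails_collected, change_in_rat_population).  Numbers are illustrative only.
actions = {
    "kill_rats":        (10, -10),   # fewer tails, but population shrinks
    "clip_and_release": (30, +20),   # more tails; released rats keep breeding
}

proxy_best = max(actions, key=lambda a: actions[a][0])  # maximize tails
goal_best = min(actions, key=lambda a: actions[a][1])   # minimize rats

print(proxy_best, goal_best)  # → clip_and_release kill_rats
```

Optimizing the measurable proxy selects "clip_and_release", while the action that actually serves the goal is "kill_rats", which is exactly the inversion Goodhart's law warns about.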

Why could reliance on AI systems to evaluate other AIs be risky?

Answer:

AI evaluators could themselves be vulnerable to adversarial attacks or proxy gaming, undermining their ability to accurately assess other AI systems.
