Y. John, L. Caldwell, D. McCoy, and O. Braganza, "Dead rats, dopamine, performance metrics, and peacock tails: Proxy failure is an inherent risk in goal-oriented systems," Behavioral and Brain Sciences, vol. 1, pp. 1-68, 2023. doi: 10.1017/S0140525X23002753.
J. Carlsmith, "Is Power-Seeking AI an Existential Risk?" 2022. [Online]. Available: https://arxiv.org/abs/2206.13353
J. D. Gallow, "Instrumental Convergence," [Online]. Available: instrumental_convergence.pdf
E. Hubinger et al., "Risks from Learned Optimization in Advanced Machine Learning Systems," 2021. [Online]. Available: https://arxiv.org/abs/1906.01820
R. Ngo et al., "The alignment problem from a deep learning perspective," 2022. [Online]. Available: https://arxiv.org/abs/2209.00626
D. Hendrycks et al., "Unsolved Problems in ML Safety," 2021. [Online]. Available: https://arxiv.org/abs/2109.13916