Safety, Ethics & Future

What Is AI Alignment?

AI alignment is the problem of ensuring that an AI system’s goals and behavior match what its designers and society actually intend. Because a model optimizes whatever objective it is given, an objective that only approximates the intended goal can produce unexpected or harmful behavior. Techniques such as reinforcement learning from human feedback (RLHF) are early alignment tools; aligning much more capable systems remains an open research problem.
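
The gap between a stated objective and the intended one can be shown with a toy sketch. Everything here is an illustrative assumption: the action names, the scores, and the `best_action` helper are hypothetical and do not come from any real system. The point is only that an optimizer picks whatever scores highest on the objective it is handed.

```python
# Toy illustration with made-up numbers: each action has a score under the
# proxy objective the optimizer is given and under the goal designers intend.
ACTIONS = {
    # action: (proxy_reward, intended_value)
    "clean_room": (8, 10),
    "hide_mess_under_rug": (10, 1),
    "do_nothing": (0, 0),
}


def best_action(objective_index: int) -> str:
    """Return the action with the highest score in the chosen column."""
    return max(ACTIONS, key=lambda action: ACTIONS[action][objective_index])


# What the optimizer selects when told to maximize the proxy reward.
proxy_choice = best_action(0)

# What the designers would have wanted it to select.
intended_choice = best_action(1)

print("optimizer picks:", proxy_choice)
print("designers wanted:", intended_choice)
```

In this sketch the proxy rewards "the room looks clean", so hiding the mess outscores actually cleaning. The optimizer is doing exactly what it was asked to do, and the fault lies in the objective rather than in the search.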
