Safety, Ethics & Future
What Is AI Alignment?
AI alignment is the problem of ensuring that an AI system’s goals and behavior match what its designers and society actually intend. Because models optimize whatever objective they are given, misaligned incentives can produce unexpected or harmful behavior. Techniques like RLHF are early alignment tools; aligning much more capable systems is an open research problem.
Further reading
Read more about AI alignment — articles and blogs from around the web: