OpenAI has introduced CriticGPT, an AI model designed to identify errors in responses generated by ChatGPT, with a focus on its code output. The model is based on GPT-4 and is meant to assist human trainers during the Reinforcement Learning from Human Feedback (RLHF) process. With CriticGPT, trainers can more effectively detect subtle mistakes in ChatGPT’s outputs, which become harder to spot as the model’s accuracy improves.
CriticGPT was itself trained with RLHF, on a large number of inputs containing deliberate mistakes. Trainers manually inserted these errors into ChatGPT’s answers and then wrote feedback as if they had discovered them organically, which exposed the model to a broad range of potential inaccuracies. Experiments showed that CriticGPT helps identify not only these inserted mistakes but also naturally occurring errors in ChatGPT’s responses.
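To make the data-construction step concrete, here is a minimal sketch of how one such training record could be assembled. It is an illustration only, not OpenAI's pipeline: the `TamperedExample` record, the `make_tampered_example` helper, and the sample question and critique are all hypothetical names and data invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TamperedExample:
    """Hypothetical training record: a deliberately broken answer plus its critique."""
    question: str
    original_answer: str
    tampered_answer: str
    critique: str


def make_tampered_example(question, answer, correct_snippet, buggy_snippet, critique):
    """Swap one correct snippet for a buggy one and attach the trainer's critique.

    Assumption: the trainer edits exactly one place in the answer, so the
    snippet being replaced must occur exactly once.
    """
    if answer.count(correct_snippet) != 1:
        raise ValueError("snippet must occur exactly once in the answer")
    tampered = answer.replace(correct_snippet, buggy_snippet)
    return TamperedExample(question, answer, tampered, critique)


# Invented sample data: a trainer breaks a correct answer, then writes the
# critique as if the bug had been found during an ordinary review.
example = make_tampered_example(
    question="Write a function that returns the largest item in a list.",
    answer="def largest(items):\n    return max(items)\n",
    correct_snippet="max(items)",
    buggy_snippet="min(items)",
    critique="The function returns the smallest item: it calls min() where max() is needed.",
)
```

A critic model would then be shown `question` and `tampered_answer` and rewarded for producing feedback like `critique`. Keeping `original_answer` in the record is a design choice of this sketch, so the inserted error can always be checked against the untouched text.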
What sets CriticGPT apart is its ability to augment human trainers rather than replace them. Critiques written with CriticGPT’s help are reportedly more comprehensive than those written by humans alone. In evaluations, a second trainer preferred the critiques from the human-plus-CriticGPT team over those from an unassisted trainer more than 60% of the time.
However, CriticGPT isn’t without its limitations. It was trained on relatively short answers from ChatGPT, so longer and more complex responses may still pose a challenge. Additionally, while the model produces fewer “nitpicks” and hallucinated problems, it does not eliminate them, and it is not reliable at recognizing intricate errors that span multiple parts of an answer.
Looking forward, OpenAI plans to extend RLHF by integrating CriticGPT-like models into its training processes. The aim is more precise and reliable AI outputs, which matters increasingly as these models evolve and take on more complex tasks.
For more detail on this approach to improving AI reliability and accuracy, see the full OpenAI announcement.