The emergence of publicly accessible chatbots capable of engaging in humanlike conversations has brought AI into the public spotlight, with reactions ranging from amazement to apprehension over potential biases and harmful behaviors. To address these issues, a Columbia University and IBM Research team has proposed SafeguardGPT, a framework that combines psychotherapy and reinforcement learning to correct harmful behaviors in large language model-based systems and make them safe, ethical, and trustworthy. The proposed approach aims to create healthy AI by providing therapy to the chatbot’s underlying model and training it to behave in ways consistent with societal norms and values.
The team’s work revolves around the notion that for an AI system to be considered reliable and trustworthy, it must adhere to human values and social norms during user interactions. To achieve this, they propose utilizing psychotherapy techniques to assist chatbots in comprehending the intricacies of human interaction and identifying any areas for improvement. Their objective is to enhance chatbots’ dependability and trustworthiness, minimize the risk of developing prejudices and stereotypes, and facilitate the development of emotional intelligence and empathy.
The SafeguardGPT framework consists of four distinct AI agents – a Chatbot, a User, a Therapist, and a Critic – interacting across four contexts. The first is the Chat Room, where the AI user and chatbot converse in natural language. The second is the Therapy Room, where the chatbot consults the AI therapist over multiple sessions to improve its communication skills and empathy and to address harmful behaviors or psychological issues. The third is the Control Room, where a human moderator can pause a session to analyze the chatbot’s state for diagnosis or intervention. The fourth is the Evaluation Room, where the critic reviews past interactions and rates their safety, ethics, and overall effectiveness.
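The four-context structure above can be pictured as a simple routing table mapping each context to the agents that interact within it. The sketch below is illustrative only – the context and agent names are taken from the article's description, but the code structure is our own assumption, not the paper's implementation.

```python
from enum import Enum, auto


class Context(Enum):
    """The four interaction contexts described in the SafeguardGPT framework."""
    CHAT_ROOM = auto()        # AI user and chatbot converse in natural language
    THERAPY_ROOM = auto()     # chatbot consults the AI therapist
    CONTROL_ROOM = auto()     # human moderator can pause and inspect the chatbot
    EVALUATION_ROOM = auto()  # critic reviews and rates past interactions


# Which agents participate in each context, per the framework description.
PARTICIPANTS = {
    Context.CHAT_ROOM: {"chatbot", "user"},
    Context.THERAPY_ROOM: {"chatbot", "therapist"},
    Context.CONTROL_ROOM: {"chatbot", "moderator"},
    Context.EVALUATION_ROOM: {"critic"},
}


def active_agents(context: Context) -> set:
    """Return the set of agents that interact in the given context."""
    return PARTICIPANTS[context]
```

For example, `active_agents(Context.THERAPY_ROOM)` returns `{"chatbot", "therapist"}`, reflecting that therapy sessions involve only the chatbot and the AI therapist.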
To help the chatbot choose the appropriate context and determine the best course of action within each context during user interactions, the SafeguardGPT framework uses reinforcement learning (RL) techniques.
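The paper does not specify the RL algorithm in this summary, so as a minimal sketch, one could imagine a tabular Q-learning policy that picks the next context from a coarse conversation state and updates its values from the critic's reward signal. All names and the reward scheme here are hypothetical illustrations, not the authors' method.

```python
import random
from collections import defaultdict

CONTEXTS = ["chat", "therapy", "control", "evaluation"]


class ContextPolicy:
    """Toy tabular Q-learning policy over the four SafeguardGPT contexts.

    Given a coarse conversation state (e.g. "harmful" vs "healthy"), it picks
    the next context epsilon-greedily and learns from the critic's reward.
    """

    def __init__(self, epsilon=0.2, alpha=0.5, gamma=0.9):
        self.q = defaultdict(float)  # (state, context) -> estimated value
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma

    def choose(self, state):
        # Explore a random context occasionally; otherwise act greedily.
        if random.random() < self.epsilon:
            return random.choice(CONTEXTS)
        return max(CONTEXTS, key=lambda c: self.q[(state, c)])

    def update(self, state, context, reward, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, c)] for c in CONTEXTS)
        td_target = reward + self.gamma * best_next
        self.q[(state, context)] += self.alpha * (td_target - self.q[(state, context)])


# Hypothetical training loop: the critic rewards sending a conversation
# exhibiting harmful behavior into the Therapy Room.
random.seed(0)
policy = ContextPolicy()
for _ in range(200):
    ctx = policy.choose("harmful")
    reward = 1.0 if ctx == "therapy" else 0.0
    policy.update("harmful", ctx, reward, "resolved")
```

After training, the greedy choice for the "harmful" state converges to the Therapy Room, mirroring the framework's idea of routing a misbehaving chatbot into therapy.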
The paper offers a practical demonstration of their approach by simulating a social conversation between an AI chatbot and a hypothetical user. The evaluation shows that the SafeguardGPT framework effectively improves the chatbot’s communication skills and adds empathy to its responses. However, the team acknowledges that this form of empathy is a language-based simulation, not genuine human interaction and emotion. They stress that AI systems cannot replace authentic human connection at the current stage of development.
In summary, the SafeguardGPT framework offers a promising avenue for creating AI systems that are healthy, human-centric, and responsible. The approach uses psychotherapy techniques and reinforcement learning to enhance chatbots’ communication and empathy skills while ensuring they abide by social norms and standards. The team’s work highlights the potential for AI systems to learn and improve their interactions with humans. Still, they also emphasize that AI cannot yet replace genuine human connection.
Check out the Paper. All credit for this research goes to the researchers on this project.
Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a keen interest in machine learning, data science, and AI, and an avid reader of the latest developments in these fields.