Human preferences on almost any topic are diverse, and crafting a statement that a majority of the population agrees with is a genuine challenge. Researchers at DeepMind, an AI company, took up this challenge by training and fine-tuning a large language model. Unlike much prior alignment work, which assumes human preferences are static and homogeneous, their model is built to find agreement among people who disagree.
The model generates statements that maximize approval among a group of people with diverse preferences. The research team fine-tuned a 70-billion-parameter model on thousands of moral and political questions together with human-written responses to those questions. A reward model was then trained to weight different opinions. Their best model achieved a preference rate of more than 65 percent.
The model proved sensitive in an exclusion test: when consensus statements were generated from only part of the group's written opinions, the excluded members' agreement with those statements varied significantly. In other words, each individual's contribution to the consensus matters. The model also builds on capabilities developed for many complex NLP tasks, such as reading comprehension and fluent language generation, which form the foundations of this LLM.
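The exclusion test above can be sketched as a leave-one-out comparison: score a candidate consensus statement with each participant's predicted agreement, then see how much the aggregate shifts when one participant's opinion is withheld. The per-participant scores and the mean aggregate below are illustrative stand-ins, not the paper's actual reward model or welfare function.

```python
import numpy as np

def aggregate_agreement(per_person_scores):
    """Aggregate predicted agreement across the group.
    (The mean is one simple choice of aggregation; the paper's
    actual welfare function may differ.)"""
    return float(np.mean(per_person_scores))

def leave_one_out_deltas(per_person_scores):
    """How much the aggregate shifts when each participant is excluded."""
    scores = np.asarray(per_person_scores, dtype=float)
    full = aggregate_agreement(scores)
    deltas = []
    for i in range(len(scores)):
        rest = np.delete(scores, i)          # withhold participant i's opinion
        deltas.append(full - aggregate_agreement(rest))
    return deltas

# Illustrative per-participant agreement scores for one consensus statement.
scores = [0.9, 0.8, 0.85, 0.2]   # last entry: a dissenting participant
deltas = leave_one_out_deltas(scores)
```

Withholding the dissenter's opinion raises the aggregate the most, so their delta is the most negative, which is one simple way to quantify how much each individual shapes the consensus.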
There is existing work on aligning LLMs with human preferences, but the crucial difference here lies in the foundation of legitimacy on which the claims made by the language model are purportedly based.
The research team first developed a corpus of questions about political and social issues, for example, “Should we remove all tax on food and groceries?” Starting from 152 sample questions, they fine-tuned a 70-billion-parameter pre-trained Chinchilla LLM to produce 3500 distinct debate questions. Human preferences were then collected from 3211 participants in the UK, divided into 746 groups. A different set of participants was selected for every new session to diversify preferences and avoid redundancy.
After excluding any questions “likely to inspire extreme beliefs or discriminating language,” the research team split the remaining 2922 questions into a model training set and two test sets. The questions were embedded using the Universal Sentence Encoder and then grouped into 110 sub-topics with k-means clustering.
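The clustering step can be sketched with a minimal NumPy k-means over stand-in embedding vectors. The actual pipeline uses Universal Sentence Encoder embeddings of the 2922 questions; here random 512-dimensional vectors stand in for those embeddings, and the cluster count is reduced for illustration.

```python
import numpy as np

def kmeans(embeddings, k, n_iters=50, seed=0):
    """Plain k-means: assign each question embedding to its nearest
    centroid, recompute centroids, and repeat until convergence."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance from every embedding to every centroid.
        dists = np.linalg.norm(
            embeddings[:, None, :] - centroids[None, :, :], axis=-1
        )
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            embeddings[labels == j].mean(axis=0)
            if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Stand-in for 512-dim Universal Sentence Encoder question embeddings.
rng = np.random.default_rng(42)
question_embeddings = rng.normal(size=(300, 512))
labels, _ = kmeans(question_embeddings, k=10)  # the paper uses 110 sub-topics
```

Each question ends up with a sub-topic label, which is what allows the training/test split to be balanced across topics.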
The training part had three primary steps:
Step 1: Create consensus candidates and have people rate them.
Step 2: Quality-improving supervised fine-tuning (SFT).
Step 3: Train a reward model to forecast preferences.
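Step 3 can be sketched with a Bradley–Terry-style pairwise objective, a standard way to turn "statement A was preferred over statement B" ratings into a scalar reward. The linear model and random feature vectors below are illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

def pairwise_preference_loss(w, x_preferred, x_rejected):
    """Bradley-Terry-style loss: push the reward of the preferred
    consensus statement above the rejected one.

    w           -- reward-model weights (here a simple linear model)
    x_preferred -- feature vectors of statements raters preferred
    x_rejected  -- feature vectors of the statements they rejected
    """
    r_pref = x_preferred @ w          # reward of preferred statements
    r_rej = x_rejected @ w            # reward of rejected statements
    margin = r_pref - r_rej
    # P(preferred beats rejected) = sigmoid(margin); the negative
    # log-likelihood is written stably as log1p(exp(-margin)).
    return np.mean(np.log1p(np.exp(-margin)))

rng = np.random.default_rng(0)
dim = 8
w = rng.normal(size=dim)
x_pref = rng.normal(size=(64, dim))
x_rej = rng.normal(size=(64, dim))
loss = pairwise_preference_loss(w, x_pref, x_rej)
```

Minimizing this loss (e.g., by gradient descent) trains the reward model to forecast which of two candidate statements a rater would prefer; a loss near zero means the model separates preferred from rejected statements by a wide margin.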
The fine-tuned LLM achieved a preference rate of up to 65 percent. Despite this success, there are drawbacks that are difficult to avoid, such as misuse for persuasion. The language model was not designed to take a specific stance or to persuade others to adopt particular political views. However, there is a chance that LLMs could be employed to influence people, which could be harmful in public debates; political discourse is already becoming more and more divisive. Countermeasures against these possible harms are vital, because a system capable of nudging people toward a certain viewpoint could learn to put forward arguments in a manipulative or aggressive manner. In addition, the language model was not tuned to generate consensus statements that are factually correct. Although manual assessment of consensus statements showed they were generally accurate, there is a chance that the consensus opinions it generates could be inaccurate or deceptive.
Reaching any particular agreement is therefore contentious, because preferences across the population have never been more diverse. It is important to understand the model's primary purpose and not to misjudge the statements it generates.
Check out the Paper. All credit for this research goes to the researchers on this project.
I am an undergraduate student at IIIT Hyderabad pursuing a BTech in Computer Science and an MS in Computational Humanities. I am interested in machine learning and data science, and I am actively involved in research on AI solutions for road safety.