OpenAI has announced the creation of GPT-4, a large multimodal model capable of accepting image and text inputs while emitting text outputs. The model exhibits human-level performance on various professional and academic benchmarks, though it is less capable than humans in many real-world scenarios. For instance, GPT-4’s simulated bar exam score is around the top 10% of test takers, compared to GPT-3.5’s score, which was around the bottom 10%. OpenAI spent 6 months iteratively aligning GPT-4 using lessons from their adversarial testing program and other sources. As a result, the model performs better than previous versions in areas such as factuality, steerability, and staying within guardrails, but there is still room for improvement.
The difference between GPT-3.5 and GPT-4 may be subtle in casual conversation, but it becomes apparent once a task is sufficiently complex. GPT-4 is more reliable, more creative, and better at handling nuanced instructions than GPT-3.5. To compare the two models, OpenAI ran them on a variety of benchmarks, including simulated exams originally designed for humans. The exams were either the most recent publicly available tests or 2022-2023 editions of practice exams purchased for this purpose. The model was not specifically trained for these exams, although it had seen a small portion of the problems during training. OpenAI believes the results are representative; the full details are in the technical report.
[Figure: selected results from the GPT-3.5 and GPT-4 comparisons]
GPT-4 can process text and image inputs, allowing users to specify any language or vision task. It can generate text outputs such as natural language and code based on inputs that include text and images in various domains, such as documents with text, photographs, diagrams, or screenshots. GPT-4 displays similar capabilities on text-only and mixed inputs. It can also be enhanced with techniques developed for text-only language models like few-shot and chain-of-thought prompting. However, the image input feature is still in the research phase and is not publicly available.
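To make the prompting techniques mentioned above concrete, here is a minimal sketch of how a few-shot, chain-of-thought prompt can be assembled as plain text. It makes no API calls. The `Example` class, the `build_prompt` helper, the `Q:` / `Reasoning:` / `A:` layout, and the sample questions are all illustrative assumptions rather than anything prescribed by OpenAI.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked demonstration shown to the model (hypothetical structure)."""
    question: str
    reasoning: str  # the intermediate steps, i.e. the "chain of thought"
    answer: str


def build_prompt(examples: list[Example], question: str) -> str:
    """Assemble a few-shot, chain-of-thought prompt as a single string."""
    parts = []
    for ex in examples:
        parts.append(
            f"Q: {ex.question}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"A: {ex.answer}"
        )
    # The new question stops at "Reasoning:" so the model is nudged to
    # write out its own steps before giving a final answer.
    parts.append(f"Q: {question}\nReasoning:")
    return "\n\n".join(parts)


# Made-up demonstration and query, for illustration only.
examples = [
    Example(
        question="A pen costs $2 and a notebook costs $3. "
                 "What do 2 pens and 1 notebook cost?",
        reasoning="Two pens cost 2 * 2 = 4 dollars. "
                  "Adding one notebook gives 4 + 3 = 7 dollars.",
        answer="$7",
    ),
]
prompt = build_prompt(
    examples,
    "A train travels 60 km in 1.5 hours. What is its average speed?",
)
print(prompt)
```

The resulting string would then be sent to whichever model endpoint is in use. Adding more entries to `examples` turns a one-shot prompt into a few-shot one.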
Despite its impressive capabilities, GPT-4 shares the limitations of its predecessors. Most importantly, it is still not fully reliable: it can "hallucinate" facts and make reasoning errors. Language model outputs should therefore be used with caution, especially in high-stakes situations. The appropriate safeguard depends on the use case and may include human review, grounding with additional context, or avoiding high-stakes uses altogether.
Although reliability remains a challenge, GPT-4 hallucinates significantly less than previous models. On OpenAI's internal adversarial factuality evaluations, GPT-4 scores 40% higher than the latest GPT-3.5 model, which had itself improved with each iteration.
GPT-4 may still exhibit biases in its outputs despite efforts to reduce them. Its knowledge is largely limited to events before September 2021, and it does not learn from experience. It can sometimes make simple reasoning errors, be overly gullible in accepting false statements from a user, and fail at hard problems in the same way humans do. GPT-4 can also be confidently wrong in its predictions, and the current post-training process reduces its calibration. OpenAI says it is working to give the model reasonable default behaviors that reflect a wide range of user values, and to let it be customized within certain bounds, with public input on what those bounds should be.
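Calibration here means that a model's stated confidence matches how often it is actually right. One common way to quantify it is the expected calibration error (ECE). The sketch below is a generic illustration with made-up numbers; it is not necessarily the metric OpenAI used, and the function name, bin count, and sample data are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, then average the gap between
    accuracy and mean confidence in each bin, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Hypothetical data: both "models" claim 75% confidence on four answers.
calibrated = expected_calibration_error(
    [0.75] * 4, [True, True, True, False]      # right 3 times out of 4
)
overconfident = expected_calibration_error(
    [0.75] * 4, [True, False, False, False]    # right only 1 time out of 4
)
print(calibrated, overconfident)
```

A lower value means stated confidence tracks accuracy more closely. In this toy data the first set of answers is perfectly calibrated, while the second is overconfident by 50 percentage points.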
Check out the Technical Paper and OpenAI Article. All credit for this research goes to the researchers on this project.
Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She has a keen interest in machine learning, data science, and AI, and is an avid reader of the latest developments in these fields.