Anthropic Released Their Claude 2 Model With Significant Improvements In Coding, Math, And Reasoning Compared To Previous Models

Anthropic launched its new model called Claude 2, boasting improved performance, longer responses, and accessibility through an API and a public beta website. Users have praised Claude’s conversational abilities, clear explanations, reduced likelihood of generating harmful outputs, and improved memory compared to previous models. Notably, Claude 2 exhibited better performance in coding, math, and reasoning tasks. For instance, it scored 76.5% on the multiple-choice section of the Bar exam, surpassing its predecessor’s score of 73.0%. Compared to college students applying to graduate school, Claude 2 performed above the 90th percentile in GRE reading and writing exams and performed similarly to the median applicant in quantitative reasoning.

The developers envision Claude as a friendly and enthusiastic virtual colleague or personal assistant capable of understanding natural language instructions to assist with various tasks. The Claude 2 API for businesses is available at the same price as its predecessor, Claude 1.3. Moreover, individuals in the United States and the United Kingdom can already utilize the beta chat experience.

Efforts have been made to enhance the performance and safety of Claude models. Input and output lengths have been increased, allowing users to input up to 100K tokens per prompt. This enables Claude to process extensive technical documentation and books and generate longer documents such as memos, letters, and stories comprising thousands of tokens.
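As a rough illustration of what a 100K-token prompt limit means in practice, the sketch below estimates whether a document would fit. The four-characters-per-token ratio is an assumed rule of thumb for English text, not Claude's actual tokenizer, and the function names and the reserved-instructions budget are hypothetical.

```python
CONTEXT_LIMIT_TOKENS = 100_000  # prompt limit cited in the article
CHARS_PER_TOKEN = 4             # assumption: rough heuristic, not Claude's tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_prompt(text: str, reserved_for_instructions: int = 1_000) -> bool:
    """Check whether the text plus a hypothetical instruction budget fits the limit."""
    return estimate_tokens(text) + reserved_for_instructions <= CONTEXT_LIMIT_TOKENS


if __name__ == "__main__":
    document = "word " * 50_000  # 250,000 characters of filler text
    print(estimate_tokens(document), fits_in_prompt(document))
```

Under this heuristic, 100K tokens corresponds to roughly 400,000 characters of English text; real token counts vary with the content and the tokenizer.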

The latest model, Claude 2, has significantly improved coding skills, achieving a score of 71.2% on the Codex HumanEval Python coding test compared to Claude 1.3’s score of 56.0%. In the GSM8k math problem set, Claude 2 scored 88.0% compared to 85.2% for its predecessor. Future plans include the gradual deployment of capability improvements for Claude 2.
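To put the reported scores side by side, the short sketch below computes the absolute gain (in percentage points) and the relative gain for each benchmark mentioned in the article. The scores are the ones quoted above; the function and variable names are illustrative only.

```python
# Scores in percent as reported in the article: (Claude 1.3, Claude 2)
SCORES = {
    "Bar exam (multiple choice)": (73.0, 76.5),
    "Codex HumanEval": (56.0, 71.2),
    "GSM8k": (85.2, 88.0),
}


def improvement(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - old
    relative = absolute / old * 100
    return round(absolute, 1), round(relative, 1)


def summarize(scores: dict) -> dict:
    """Map each benchmark name to its (absolute, relative) improvement."""
    return {name: improvement(old, new) for name, (old, new) in scores.items()}


if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in summarize(SCORES).items():
        print(f"{name}: +{abs_gain} points ({rel_gain}% relative)")
```

The coding benchmark shows by far the largest jump: 15.2 percentage points, against 3.5 for the bar exam and 2.8 for GSM8k.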

Safety has been a development focus, with the aim of reducing harmful and offensive outputs. An internal red-teaming evaluation scores Claude models against a representative set of harmful prompts, combining automated testing with manual checks. In this evaluation, Claude 2 was twice as effective as Claude 1.3 at giving harmless responses. While no model is completely immune to undesirable outputs, safety techniques and extensive red-teaming have been employed to improve the overall quality of outputs.

Several businesses have already adopted the Claude API, with partners such as Jasper and Sourcegraph building on Claude 2’s capabilities. Jasper, a generative AI platform, said Claude 2 compares well with other state-of-the-art models across diverse use cases and highlighted its strength in long-form, low-latency applications. Sourcegraph, a code AI platform, incorporates Claude 2’s improved reasoning ability into its coding assistant, Cody. Cody can provide more accurate answers to user queries while passing along more codebase context through context windows of up to 100K tokens. Because Claude 2 was trained on more recent data, Cody also knows about newer frameworks and libraries, helping developers build software more efficiently.

Overall, the release of Claude 2 signifies advancements in performance, safety, and versatility, enabling users to leverage its capabilities in various domains.



Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a keen interest in machine learning, data science, and AI, and an avid reader of the latest developments in these fields.
