Microsoft Researchers Introduce ‘Jigsaw’: An AI Tool To Augment Large Language Models (GPT-3, Codex, etc.) By Deploying Post-Processing Techniques That Understand The Programs’ Syntax And Semantics

This research summary is based on the paper 'Jigsaw: Large Language Models meet Program Synthesis'.


GPT-3, Codex, and other large pre-trained language models can be tuned to generate code from natural language descriptions of programmer intent. Such automated models have the potential to increase the productivity of developers everywhere. However, because the models may fail to understand program semantics, the quality of the generated code cannot be guaranteed.

Microsoft researchers have introduced Jigsaw, a new tool that can help these large language models perform better. Jigsaw is a code generator for the Python Pandas API that accepts multi-modal inputs. It applies post-processing techniques that understand the syntax and semantics of programs, and it uses user feedback to improve future performance.


While working on a programming task, a software developer can describe an intended code fragment in English, and Codex will synthesize the code in a language such as Python or JavaScript. However, there is a real chance that the synthesized code will fail to compile or will not run correctly. Project Jigsaw attempts to automate the vetting of such code to increase the productivity of developers who use large language models, such as Codex, for code synthesis.

Today a developer vets generated code by hand. The first, rudimentary check is whether the code compiles. If it does not, the developer attempts to correct the problem. Once the code compiles, the developer typically runs it on an input to see whether it produces the desired result. The code could fail again, and the developer would have to fix it again. The goal is to demonstrate that this entire procedure can be automated. Jigsaw accepts an English description of the required code together with an I/O example as input. The I/O example pairs an input with its expected output, and Jigsaw uses it to ensure that the Python code it returns compiles and produces the desired output on the specified input.

The approach has been implemented for Python Pandas. Pandas is a popular data science API for manipulating data frames. To have Jigsaw synthesize the desired code, the user supplies an English description of the intended transformation together with an input data frame and the corresponding output data frame. Using Jigsaw is arguably a better strategy than forcing a developer to memorize the API's many functions.
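To make the role of the I/O example concrete, here is a minimal sketch of how a candidate snippet could be checked against one input/output pair. It is not Jigsaw's implementation: the function name `satisfies_example` and the variable names `dfin` and `dfout` are assumptions made for this illustration, and the query and data are invented.

```python
import pandas as pd


def satisfies_example(code: str, dfin: pd.DataFrame, expected: pd.DataFrame) -> bool:
    """Return True if `code` runs and turns `dfin` into `expected`.

    Assumption for this sketch: the candidate reads a variable named `dfin`
    and writes its result to a variable named `dfout`.
    """
    namespace = {"pd": pd, "dfin": dfin.copy()}
    try:
        # Never exec untrusted code like this outside a sandbox.
        exec(code, namespace)
    except Exception:
        # Covers code that does not compile as well as runtime errors.
        return False
    result = namespace.get("dfout")
    return isinstance(result, pd.DataFrame) and bool(result.equals(expected))


# Invented I/O example for the query "drop rows where score is missing".
dfin = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.0, None, 3.0]})
expected = pd.DataFrame({"name": ["a", "c"], "score": [1.0, 3.0]}, index=[0, 2])

good = "dfout = dfin.dropna(subset=['score'])"
bad_column = "dfout = dfin.dropna(subset=['scores'])"  # misspelled column name
bad_syntax = "dfout = dfin.dropna("  # does not compile

print(satisfies_example(good, dfin, expected))
print(satisfies_example(bad_column, dfin, expected))
print(satisfies_example(bad_syntax, dfin, expected))
```

`DataFrame.equals` also compares the index and column dtypes, so this check is strict; a more forgiving comparison would be a design choice of its own.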

Functioning of Jigsaw

Jigsaw pre-processes the English query, adding the appropriate context, to create an input that can be sent to a large language model, which is treated as a black box. Jigsaw then checks whether the code generated by the model satisfies the I/O example. If it does, Jigsaw is done. If it does not, repair begins in the post-processing step. During post-processing, Jigsaw applies three types of transformations to correct the code. Each of these transformations is motivated by failure modes observed in the underlying language models.
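The control flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `build_prompt`, `synthesize`, and the callable types are hypothetical names, the "model" is a stub rather than a real language model, and the toy check and repair stand in for the I/O check and the post-processing transformations.

```python
from typing import Callable, Iterable, Optional, Sequence

Model = Callable[[str], str]             # black box: prompt -> code
Check = Callable[[str], bool]            # does the code satisfy the I/O example?
Repair = Callable[[str], Iterable[str]]  # code -> candidate fixes


def build_prompt(query: str, context: str) -> str:
    """Pre-processing: attach context to the English query."""
    return f"{context}\n# Task: {query}\n"


def synthesize(query: str, context: str, model: Model,
               check: Check, repairs: Sequence[Repair]) -> Optional[str]:
    """Return the first code that passes `check`, or None if nothing does."""
    code = model(build_prompt(query, context))
    if check(code):
        return code
    # Post-processing: try each repair strategy on the model's output.
    for repair in repairs:
        for candidate in repair(code):
            if check(candidate):
                return candidate
    return None


# Toy stand-ins so the sketch runs without a real model.
def fake_model(prompt: str) -> str:
    return "result = value + 2"  # plausible but wrong for the example below


def check(code: str) -> bool:
    namespace = {"value": 3}     # example input
    try:
        exec(code, namespace)
    except Exception:
        return False
    return namespace.get("result") == 6  # example output


def swap_operator(code: str) -> Iterable[str]:
    yield code.replace("+", "*")


fixed = synthesize("double the value", "# value is an int",
                   fake_model, check, [swap_operator])
print(fixed)
```

Keeping the model behind a plain callable mirrors the black-box treatment in the text: the checking and repair logic does not depend on which model produced the code.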

Helpful transformations:

  • Variable Transformations: Codex tends to mix up the names of the variables it is given. Jigsaw corrects such mistakes by replacing names in the Codex-generated code with all potential names in scope until it finds a program that satisfies the I/O example.
  • Argument Transformations: Codex sometimes generates code that calls the API functions with incorrect arguments. To correct such problems, Jigsaw starts from the function and argument sequences provided by Codex and iterates through possible arguments until it finds a program that satisfies the I/O example.
  • AST-to-AST transformations: Models like Codex may output code that is syntactically very close to the desired program but contains some erroneous characters. In that case, the user fixes the code themselves, after which the Jigsaw UI captures the change, generalizes it into a reusable transformation, and learns it. Jigsaw applies the AST-to-AST transformations it has learned over time to fix this failure mode.
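The first two transformations amount to a search over small edits to the generated code, with the I/O example as the stopping condition. The sketch below illustrates that idea with Python's `ast` module (it needs Python 3.9+ for `ast.unparse`). The helper names are hypothetical, the search is a naive enumeration rather than Jigsaw's actual procedure, and plain lists with `sorted` stand in for data frames and Pandas calls to keep the example dependency-free.

```python
import ast
import itertools
from typing import Any, Dict, Iterable, Iterator, Optional, Sequence


class _Rename(ast.NodeTransformer):
    """Rename variables that are read (not assigned) according to a mapping."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self.mapping = mapping

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if isinstance(node.ctx, ast.Load) and node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node


class _SetKeyword(ast.NodeTransformer):
    """Set one keyword argument to a constant in every call that uses it."""

    def __init__(self, keyword: str, value: Any) -> None:
        self.keyword, self.value = keyword, value

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        for kw in node.keywords:
            if kw.arg == self.keyword:
                kw.value = ast.Constant(value=self.value, kind=None)
        return node


def variable_candidates(code: str, replaceable: Sequence[str],
                        domain: Sequence[str]) -> Iterator[str]:
    """Yield variants with each replaceable name mapped to a name in scope."""
    for choice in itertools.product(domain, repeat=len(replaceable)):
        tree = _Rename(dict(zip(replaceable, choice))).visit(ast.parse(code))
        yield ast.unparse(tree)


def argument_candidates(code: str, keyword: str,
                        values: Sequence[Any]) -> Iterator[str]:
    """Yield variants with `keyword` set to each candidate value."""
    for value in values:
        tree = _SetKeyword(keyword, value).visit(ast.parse(code))
        yield ast.unparse(tree)


def first_passing(candidates: Iterable[str], env: Dict[str, Any],
                  expected: Any) -> Optional[str]:
    """Return the first candidate whose `out` equals the expected output."""
    for candidate in candidates:
        namespace = dict(env)
        try:
            exec(candidate, namespace)  # sandbox this in any real setting
        except Exception:
            continue
        if namespace.get("out") == expected:
            return candidate
    return None


env = {"left": [3, 1, 2], "right": [9, 8]}

# The generated code refers to `data`, which is not a name in scope.
fix_names = first_passing(
    variable_candidates("out = sorted(data)", ["data"], ["left", "right"]),
    env, [1, 2, 3])

# The generated code calls the right function with the wrong argument.
fix_args = first_passing(
    argument_candidates("out = sorted(left, reverse=False)", "reverse", [False, True]),
    env, [3, 2, 1])

print(fix_names)
print(fix_args)
```

Exhaustive enumeration grows quickly with the number of names and arguments, so any practical version would need to bound or prioritize the candidates; the article does not describe how Jigsaw does this.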

The researchers believe they have solved only a fraction of the existing problems. They want to improve the pre-processing and post-processing stages that Jigsaw uses. The research team also believes that the developer overhead of supplying an I/O example, rather than only a natural language query, should be investigated further.



