Text2Code: A Jupyter Extension to Convert English Text to Python Code

Source: https://github.com/deepklarity/jupyter-text2code

Kartik Godawat and Deepak Rawat have developed Text2Code, a ready-to-install Project Jupyter extension that converts English queries into relevant Python code. The project was inspired by OpenAI’s GPT-3, which, beyond its general natural language processing capabilities, can also generate React code and simplify command-line commands. These demonstrations prompted the idea of a tool that produces ready-to-execute code for common human queries.

Text2Code is a supervised model, so it handles only the kinds of queries covered by its predefined training pipeline. The pipeline has the following components:

  • Collecting training data: The authors wrote a set of general English commands and then produced variations of them with an elementary generator.
  • Intent matching: This step identifies what the user wants from the query. The query is embedded with the Universal Sentence Encoder and compared, by cosine similarity, against the predefined intent queries.
  • NER (Named Entity Recognition): This step identifies the variables (entities) in the sentence. A custom entity recognition model was trained with spaCy on the previously generated data.
  • Fill template: The extracted entities are inserted into a fixed template to generate the code.
  • Wrap inside a Jupyter extension: Everything is packaged as a single Python package with a frontend and a server extension, both loaded when the Jupyter notebook is opened. The frontend sends the query to the server, fetches the generated template code, inserts it into a cell, and executes it.
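
The intent matching, entity extraction, and template filling steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a bag-of-words vector stands in for the Universal Sentence Encoder, a pair of regular expressions stands in for the trained spaCy NER model, and the intent names, example queries, templates, and default values are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical intents: example queries plus a fixed code template each.
INTENTS = {
    "import_csv": {
        "examples": ["import file.csv as a dataframe", "load file.csv into df"],
        "template": "{varname} = pd.read_csv('{fname}')",
    },
    "describe": {
        "examples": ["show summary statistics of df", "describe the dataframe df"],
        "template": "{varname}.describe()",
    },
}


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_intent(query):
    """Pick the intent whose best example query is most similar to the query."""
    q = embed(query)
    scores = {
        name: max(cosine(q, embed(example)) for example in spec["examples"])
        for name, spec in INTENTS.items()
    }
    return max(scores, key=scores.get)


def extract_entities(query):
    """Regex stand-in for the NER step: find a CSV file name and a variable name."""
    entities = {}
    fname = re.search(r"[\w./-]+\.csv", query)
    if fname:
        entities["fname"] = fname.group(0)
    varname = re.search(r"\b(?:into|as|of)\s+([A-Za-z_]\w*)\s*$", query)
    if varname:
        entities["varname"] = varname.group(1)
    return entities


def text_to_code(query):
    """Match the intent, extract entities, and fill the intent's template."""
    intent = match_intent(query)
    # Hypothetical defaults, used when an entity is missing from the query.
    entities = {"varname": "df", "fname": "data.csv", **extract_entities(query)}
    return INTENTS[intent]["template"].format(**entities)


print(text_to_code("load data/sales.csv into sales_df"))
print(text_to_code("show summary statistics of sales_df"))
```

Taking the maximum similarity over an intent's examples means one close paraphrase is enough to trigger the intent. A learned sentence embedding would presumably tolerate rewording far better than the word-overlap vectors used here.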

Scope for improvement:

  • It cannot reliably understand two differently worded sentences that convey the same meaning. More training data and paraphrasing can help to overcome this limitation.
  • Variable names are generated randomly. Using real-world variable and library names would be more useful.
  • Named Entity Recognition (NER) could be tried with a transformer-based model to improve performance.
  • With enough data, a language model can be trained to directly convert English to code just like GPT-3, instead of going through separate stages.
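
The first two points concern the training data, so a small generator shows what they could look like in practice. The sketch below is an assumption-laden illustration, not the project's generator: the paraphrase templates, file names, and variable names are hypothetical, and the `(start, end, label)` character-offset format is simply a shape commonly used for NER training data.

```python
import itertools
import random

# Hypothetical paraphrases of one intent; {fname} and {varname} are entity slots.
TEMPLATES = [
    "import {fname} as {varname}",
    "load {fname} into {varname}",
    "read the file {fname} and store it in {varname}",
]
# Hypothetical real-world-style names, used instead of random strings.
FILENAMES = ["sales.csv", "customers.csv", "orders_2020.csv"]
VARNAMES = ["df", "sales_df", "customers"]


def slot_span(template, values, slot):
    """Return the character span that one slot's value occupies in the sentence."""
    partial = template
    # Fill every other slot first so the offset accounts for their lengths.
    for other, value in values.items():
        if other != slot:
            partial = partial.replace("{" + other + "}", value)
    start = partial.index("{" + slot + "}")
    return start, start + len(values[slot])


def generate_examples(n, seed=0):
    """Sample up to n (sentence, entity_spans) pairs from the template grid."""
    rng = random.Random(seed)
    grid = list(itertools.product(TEMPLATES, FILENAMES, VARNAMES))
    examples = []
    for template, fname, varname in rng.sample(grid, min(n, len(grid))):
        values = {"fname": fname, "varname": varname}
        sentence = template.format(**values)
        spans = [slot_span(template, values, slot) + (slot.upper(),) for slot in values]
        examples.append((sentence, spans))
    return examples


for sentence, spans in generate_examples(3):
    print(sentence, spans)
```

Computing offsets from the template rather than searching the finished sentence avoids mislabeling when a variable name such as `customers` also appears inside a file name such as `customers.csv`.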

GitHub: https://github.com/deepklarity/jupyter-text2code


