Meet FauxPilot, An Attempt To Build A Locally Hosted Version of GitHub Copilot

GitHub Copilot is one of several new tools that use AI models to suggest programming code. However, some users still have concerns about its licensing and about the telemetry the program sends to GitHub, which Microsoft owns. In response, a team of academics from NYU Tandon’s Computer Science and Engineering department has developed FauxPilot, a locally hosted Copilot substitute that does not communicate with Microsoft. Copilot uses OpenAI Codex, a GPT-3-based natural-language-to-code system trained on billions of lines of open-source code from GitHub projects. Because Microsoft and GitHub have not stated which repositories were used to train Codex, the tool has caused discomfort among proponents of free and open-source software (FOSS).

Critics have argued that GitHub Copilot users may face growing legal liability as the tool improves. Users currently have no way to determine whether Copilot’s output is protected by copyright other than chance and informed guesswork. Shortly after GitHub Copilot became commercially available, the Software Freedom Conservancy (SFC) even advised open-source maintainers not to use GitHub because of GitHub’s refusal to address the issues with Copilot. FauxPilot does not use Codex; it relies instead on Salesforce’s CodeGen model. However, CodeGen was likewise trained on open-source software without regard for the subtleties of the various licenses, so it is unlikely to satisfy FOSS proponents.

FauxPilot’s main objective is to give people with privacy concerns a way to deploy AI coding assistance on-premises. The models that FauxPilot currently uses were trained by Salesforce on public code from GitHub, so the project does not fix every problem, and licensing-related ones may remain. Running the model locally may nonetheless help businesses whose policies forbid them from submitting their code to a third party. GitHub offers the option to turn off the collection of Code Snippets Data, its term for the code data that Copilot gathers. However, doing so does not appear to stop the collection of User Engagement Data, which may include personal information such as pseudonymous identifiers.

For now, FauxPilot is more of a research platform, according to the team. The researchers want to train code models that, in theory, produce more secure code, and then test those models, perhaps even with real users working in a Copilot-like tool backed by the team’s own models. Doing so presents several difficulties. Because current models are so data hungry, the team believes it is challenging to assemble a dataset with no security flaws: training requires a large amount of code, but there are few effective ways to ensure that code is free of errors. Curating a dataset without security flaws therefore takes a lot of work.

The team believes that Copilot is quite helpful, even though its results sometimes need to be verified. Rather than writing code from scratch, users get a starting point that they can then edit into correctness. When used with FauxPilot, the official Visual Studio Code Copilot extension will still send telemetry to GitHub and Microsoft. This problem will be resolved once the group makes its own VS Code extension available. Until then, users seeking a wholly non-Microsoft experience will have to wait, even though FauxPilot already avoids handing everything over to Microsoft.



Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of Machine Learning, Natural Language Processing, and Web Development. She enjoys learning more about the technical field by participating in several challenges.