What is Generative AI? Concept and Applications Explained

The term “generative AI” is used to describe AI systems that can create new information from scratch, as opposed to merely evaluating or acting on preexisting data. Avatars on social media sites and text-to-image converters have both made generative AI more accessible to the general public in recent weeks. 

The widespread implementation of AI will have far-reaching consequences for the future of business, affecting everything from daily operations to product development to worldwide expansion.

Generative AI has impressive capabilities and a wide range of possible implementations. Blog entries, code, poetry, FAQ responses, sentiment analysis, artwork, and even films are just some of the textual and visual outputs of generative AI models.

Many businesses could benefit from generative AI in the future as it opens new possibilities made for organizations and professionals:

  • Create original work, whether a piece of writing, a painting, or a song. And produce information useful for educating AI systems.
  • Automating content generation has applications ranging from journalism and content creation to data annotation and analysis, which could benefit from increased efficiency and productivity.
  • Producing work of such high quality that people would have difficulty telling it apart from the actual thing.
  • Enable novel contexts and uses. Gen-capacity AI for original content generation paves the way for a wide range of novel uses and applications.

For a generative model to generate content, a human must first input a prompt into the model. In most cases, when allowed to express oneself creatively, people rise to the occasion. Then, once the model has generated the information, it will require extensive human evaluation and editing. Multiple possible prompt results can be merged into one file. 

Generative AI Networks (GAN) have two sub-models:

  1. The generator in a GAN is a neural network whose job is to generate fake input or samples from a random input vector.
  2. The discriminator is another neural network whose job is to take a given sample and decide whether it is a fake sample from the generator or a real sample from the domain.

In many applications, including those involving images, CNNs (Convolutional Neural Networks) serve as both the generator and the discriminator.

For the first time, a Google research paper from 2017 explained the concept of a “transformer,” a strong deep neural network that learns context. By extension, it means following relationships in sequential input, such as the words in this sentence. So, it’s no surprise that Natural Language Processing (NLP) applications heavily use this technology.

The GPT-3 and the LaMDA are two of the most well-known transformers.

Artificial intelligence researchers at San Francisco’s OpenAI have developed a set of language models called GPT-3 using deep learning techniques. Generative pre-trained transformer model, or GPT-3 for short. Poems, emails, and even jokes can be generated by the model, giving them the appearance that a human authored them.

The LaMDA series of conversational neural language models are based on Google Transformer, an open-source neural network architecture for NLP, and is designed specifically for use in dialogue applications.

In 2017, Google Brain was the first to employ LLMs for context-aware text translation. Since then, many huge language and text-to-image models have been developed by industry leaders like as Google (BERT and LaMDA), Facebook (OPT-175B, BlenderBot), and OpenAI, a nonprofit in which Microsoft is the largest investor (GPT-3 for text, DALL-E2 for images, and Whisper for speech). The development of generative models has also come from online communities like Midjourney (which played a role in the competition’s victory) and open-source sources like HuggingFace.

Due to the vast quantity of information and processing power needed to train these models, their application has been restricted to large tech corporations. For instance, GPT-3 used 45 terabytes of data during its initial training and uses 175 billion parameters or coefficients to produce predictions, with a single training session for GPT-3 costing $12 million. The Chinese model Wu Dao 2.0 includes 1.75 trillion variables. Most businesses lack the resources, either in terms of data center capacity or cloud computing funds, to train their models of this kind from scratch.

The difference is that once a generative model has been trained, it may be “fine-tuned” for a specific content domain with considerably less input. 

This has resulted in numerous domain-specific variants of BERT and GPT-3, such as those tailored to the biomedical field (BioBERT), the legal sector (Legal-BERT), and the French language (CamemBERT). For generative chemistry, proteomics, and DNA/RNA analysis, NVIDIA’s BioNeMo provides a platform for training, developing, and deploying massive language models at a supercomputing scale. OpenAI discovered that just 100 samples of domain-specific data significantly improved the accuracy and relevancy of GPT-3’s outputs. Human input is still needed at the start and finish of the generative AI process for it to succeed.

Applications of Generative AI

Arts and Music

There are various ways that generative artificial intelligence (Gen-AI) is being applied in the creative industries, particularly in art and music. Generative models are frequently used to generate new art and music from scratch or by building upon preexisting pieces. A generative model might be trained on a vast collection of paintings and then be used to create new paintings with characteristics similar to those in the dataset but with their distinct style.

Gaming

Generative artificial intelligence is being applied in video games in various ways, such as the generation of new levels or maps, the generation of new dialogue or plot lines, and the generation of new virtual worlds. A Gen-AI model could be used in a game in several ways, including generating procedurally generated content (like a new level) or dialogue options for NPCs that change in response to the player’s choices. Gen-AI can also build immersive new worlds for players to explore, such as towns, forests, or even alien planets. 

Instance creation of an image

The most common application of generative AI is the generation of realistic-looking false images. Such is the case with the 2017 publication “Progressive Growing of GANs for Improved Quality, Stability, and Variation.”

Generated realistic images of people that don’t exist. Source: Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017

Within the scope of this study, the team showed how to generate photorealistic images of people’s faces. The model was trained on input data consisting of photographs of famous individuals. It then generated fresh photographs of real people’s faces that resembled famous people in some ways.

Translation from/to images

This is an example of generative artificial intelligence at work, changing one kind of image into another. There is a plethora of alternatives when translating from one image to another.

Style Change: Achieving this goal requires copying the aesthetic of a well-known painting. An actual photograph I took in Cologne, Germany, for instance, can be altered to resemble a Van Gogh painting.

A photo in the Van Gogh painting style using GoArt from Fotor

From rough sketches to finished paintings: In this scenario, the user provides a rough drawing and selects an object category; the network then suggests feasible completions and displays a synthesized image.

Sketch-to-image example. Source: DeepFaceDrawing: Deep Generation of Face Images from Sketches

MRI to CT Scans: Transforming a magnetic resonance imaging (MRI) scan into a computed tomography (CT) scan is one example of this in medicine, where both types of images are needed for specific treatments.

Machine translation of a text into pictures

Using this method, users may create many visuals (photorealistic, painted, etc.) from verbal descriptions of relatively simple things. For example, Midjourney, Dall-e from OpenAI, and Stable Diffusion are three of the most well-known examples of generative AI-based software.

Text-to-speech

Researchers have utilized GANs to create synthetic speech from text input. Amazon Polly and DeepMind, two examples of cutting-edge deep learning technology, can simulate human speech with an almost uncanny degree of realism. These models use character or phoneme sequences as inputs and output unprocessed voice audio.

Sound Generation

Generative artificial intelligence can audio data. This method can alter the sound of human voices or the musical genre of an existing recording. A piece of music can be “transferred” from one genre to another, from classical to jazz.

Video Generation

NVIDIA’s Deep Learning Supercomputer System (DLSS) was a groundbreaking advancement in generative artificial intelligence (Deep Learning Super Sampling). Reconstructing images using neural graphics technology.

Generating Synthetic Information

NVIDIA is at the forefront of several developments in generative AI technologies. An example is a neural network taught to create cityscapes using videos of real cities. Self-driving cars, for example, can benefit from synthetically manufactured data by using generated virtual world training datasets for pedestrian identification.

As one of the most significant and rapidly developing technologies, generative AI is featured in Gartner’s Emerging Technologies and Trends Impact Radar for 2022 study as a driver of a revolutionary shift in workplace efficiency. Some of the most important forecasts from Gartner regarding generative AI are as follows:

  • Ten percent of all data (up from less than one percent presently) and twenty percent of all test data for consumer-facing use cases will be generated by generative AI by 2025.
  • Approximately half of all drug discovery and development efforts will use generative AI by 2025.
  • There will be 30% more manufacturers using generative AI to improve their product development efficiency by 2027.

CONCERNS

Generative AI raises various moral and ethical questions. One is the simplicity with which “deepfakes” can be produced, that is, artificially-generated visual content that gives the impression of being real but is fabricated.

The concept of original and proprietary work is further complicated by generative AI. The companies who sell these tools often claim ownership of the content generated by their users on the grounds that it is unique and hence theirs to keep.


Don’t forget to join our Reddit PageDiscord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

References:

  • https://hbr.org/2022/11/how-generative-ai-is-changing-creative-work
  • https://www.fastcompany.com/90826178/generative-ai
  • https://www.cnbc.com/2022/10/08/generative-ai-silicon-valleys-next-trillion-dollar-companies.html
  • https://www.altexsoft.com/blog/generative-ai/
  • https://www.antler.co/blog/generative-ai

Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology(IIT), Bhubaneswar. She is a Data Science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring the new advancements in technologies and their real-life application.

🐝 Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...