In this article, we look at notable 2022 research from leading industry labs in machine learning. Spanning natural language processing, computer vision, generative models, and reinforcement learning, this curated list of cutting-edge work offers insight into where AI is heading.
PaLM (Pathways Language Model) is a 540-billion-parameter language model trained across multiple TPU v4 Pods using the Pathways system; each pod delivers more than 1 exaflop/s of compute. PaLM performs well on difficult tasks such as language understanding and generation, reasoning, and code generation, and in few-shot evaluations it outperforms other large models, including GLaM, GPT-3, Megatron-Turing NLG, Gopher, Chinchilla, and LaMDA.
SegCLR (Segmentation-Guided Contrastive Learning of Representations) is a self-supervised technique for training rich, generic representations of a cell’s shape and internal structure from microscopy data. It converts raw images and segmentations into compact embeddings, which are far easier to analyze and greatly simplify downstream processing. SegCLR opens new opportunities for biological research and could serve as a bridge to other methods that characterize cells and their subcomponents in high dimensions.
FindIt is a general-purpose visual grounding model that answers a wide range of queries about finding and localizing objects in images. It is efficient and easy to use, outperforms state-of-the-art models on referring expression comprehension and text-based localization, and performs competitively on object detection.
Language models have traditionally struggled with quantitative reasoning. Google’s Minerva addresses this gap: it solves mathematical, scientific, and other quantitative reasoning problems step by step, using techniques such as few-shot prompting, scratchpad (chain-of-thought) prompting, and majority voting. Minerva builds on the Pathways Language Model (PaLM) and was further trained on a 118 GB dataset of scientific papers from arXiv and web pages containing mathematical expressions.
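Majority voting is simple to state: sample several independent solutions to the same problem, extract each one’s final answer, and report the answer that appears most often. The sketch below shows only that last step and assumes answer extraction and normalization (e.g. turning "42.0" and "42" into the same string) have already happened; the sample answers are made up.

```python
from collections import Counter


def majority_vote(final_answers):
    """Return the most common final answer among sampled solutions.

    Ties go to the answer seen first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    answer, _count = Counter(final_answers).most_common(1)[0]
    return answer


# Hypothetical final answers extracted from five sampled solutions.
samples = ["42", "41", "42", "42", "7"]
print(majority_vote(samples))  # → 42
```

Voting over final answers rather than full solutions lets different reasoning paths that reach the same result reinforce each other.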
CALM (Confident Adaptive Language Modeling) is a technique for speeding up text generation in language models (LMs) at inference time. It rests on the observation that some next-word predictions are easier to make than others. Whereas a standard LM spends the same amount of computation on every prediction, CALM lets easy tokens exit the decoder early, after only a few layers, and reserves the full depth for harder ones. This allows CALM to generate text faster while maintaining high output quality.
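The core of this kind of adaptive computation is an early-exit rule applied at each decoding step. Below is a minimal sketch under stated assumptions: the per-layer logits are hypothetical inputs, and confidence is measured as the gap between the two largest softmax probabilities (a "softmax response" style measure; CALM also studies other measures and calibrates its thresholds, which this sketch does not do). It also ignores how a real model fills in hidden states for the layers it skipped.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_token(per_layer_logits, threshold):
    """Pick the next token, exiting at the first sufficiently confident layer.

    per_layer_logits[i] holds (hypothetical) vocabulary logits read out after
    decoder layer i + 1. Confidence is the gap between the two largest softmax
    probabilities. Returns (token_id, number_of_layers_used).
    """
    if not per_layer_logits:
        raise ValueError("need logits from at least one layer")
    last = len(per_layer_logits)
    for depth, logits in enumerate(per_layer_logits, start=1):
        probs = softmax(logits)
        top1, top2 = sorted(probs, reverse=True)[:2]
        # Exit early if confident enough; the final layer always decides.
        if top1 - top2 >= threshold or depth == last:
            token = max(range(len(probs)), key=probs.__getitem__)
            return token, depth


easy = [[5.0, 0.0, 0.0], [6.0, 0.0, 0.0]]
hard = [[1.0, 0.9, 0.0], [1.0, 0.5, 0.0], [0.0, 4.0, 0.0]]
print(early_exit_token(easy, threshold=0.9))  # → (0, 1)
print(early_exit_token(hard, threshold=0.9))  # → (1, 3)
```

The threshold trades speed for fidelity: lowering it lets more tokens exit early, at a higher risk of diverging from what the full model would have produced.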
MLGO is a machine learning framework for compiler optimization that aims to reduce the cost of running large data center applications. It uses reinforcement learning to train neural networks whose decisions replace hand-written heuristics in LLVM (a widely used open-source compiler infrastructure for building high-performance software), for example in inlining to shrink code size and in register allocation to improve performance. Because LLVM underpins many critical applications, better decisions in the code it generates can have wide impact.
NVIDIA Omniverse Cloud is a comprehensive suite of cloud services that lets developers, artists, and enterprise teams design, publish, and experience metaverse applications from anywhere. It accelerates complex 3D workflows and enables new ways to visualize, simulate, and program concepts and ideas.
NVIDIA has introduced the IGX edge AI computing platform for secure autonomous systems. This all-in-one platform enhances safety, security, and perception for healthcare and industrial AI applications. IGX combines hardware with programmable safety features, commercial operating-system support, and AI software, allowing organizations to safely and securely use AI in collaboration with humans.
Dynamic programming underpins many optimization, data-processing, and genomics algorithms and has traditionally run on CPUs or FPGAs. The NVIDIA Hopper GPU architecture introduces new DPX instructions that, according to NVIDIA, speed up dynamic programming algorithms by up to 40 times.
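To make "dynamic programming" concrete, here is a plain-Python sketch of Smith-Waterman local sequence alignment, a genomics algorithm commonly cited as a DPX use case. The scoring values (match 2, mismatch -1, linear gap -1) are illustrative assumptions. Each cell is an add-then-max over its neighbors, the kind of fused operation DPX instructions are designed to accelerate in hardware; this CPU sketch uses no GPU features.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of sequences a and b (linear gap penalty).

    Recurrence for cell (i, j):
        H[i][j] = max(0,
                      H[i-1][j-1] + s(a[i-1], b[j-1]),
                      H[i-1][j] + gap,
                      H[i][j-1] + gap)
    Only two rows are kept in memory, since each row depends on the previous one.
    """
    cols = len(b) + 1
    prev = [0] * cols  # row i-1 of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("GATTACA", "TTAC"))  # → 8
```

The table has len(a) × len(b) cells, which is why long genomic sequences make this workload expensive and why hardware support for the inner update matters.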
A group of researchers from NVIDIA, Stanford University, Oxford Nanopore Technologies, the University of California, Santa Cruz, and Google has created a new DNA sequencing method that can produce results in just over 7 hours. The technique can quickly identify the genetic causes of diseases and match patients with appropriate treatments. By combining Oxford Nanopore sequencing, NVIDIA Clara Parabricks, and an UltraRapid Whole Genome Sequencing pipeline container, the team simplified and streamlined the process, cutting computational costs by 50%.
Optimizing wind farm layouts is important for companies such as Siemens Gamesa Renewable Energy to get the most out of their investment and lower costs for consumers. Minimizing the interference between turbines requires accurately modeling the wakes they create, which calls for high-fidelity simulations. Large eddy simulation (LES) is the gold standard for generating this data, but a single run for one turbine can take 40 days on a 100-core CPU. Using NVIDIA Modulus and NVIDIA Omniverse, Siemens Gamesa reduced this to just 15 minutes. Since 40 days is 57,600 minutes, that is a speedup of about 3,840X, or roughly 4,000X.
data2vec is a self-supervised algorithm from Meta AI that uses the same learning method for speech, vision, and text. Instead of predicting modality-specific targets such as words or pixels, it trains a model to predict latent representations of the full input from a masked view of it. Evaluated on each modality separately, data2vec outperformed previous algorithms in computer vision and speech and was competitive on natural language processing tasks. This general approach could pave the way for AI systems that learn more flexibly across many kinds of tasks.
NLLB-200 is the first single model to offer high-quality translation across 200 languages, including ones that existing tools supported poorly or not at all, such as Kamba and Lao. It also delivers high-quality translation for 55 African languages, a significant improvement over existing tools, which often performed poorly on them. This one model can translate languages spoken by billions of people worldwide.
Meta’s AI agent CICERO has achieved human-level performance in the strategy game Diplomacy. Playing on webDiplomacy.net, CICERO scored more than double the average score of human players and ranked in the top 10% of participants who played more than one game. Diplomacy has traditionally been difficult for AI because it requires understanding and predicting other players’ motivations and perspectives, making intricate plans, and using natural language to negotiate and form alliances. CICERO used natural language so effectively that some players preferred working with it over other human participants.
Meta AI has built and publicly released BlenderBot 3, the first publicly available chatbot with 175B parameters. BlenderBot 3 can search the internet to converse about a wide array of topics, and it is designed to improve its skills and safety through natural conversations and feedback from real users.
SEER is a self-supervised computer vision model developed by Meta AI Research that can learn from any random collection of images on the internet, without labels or curation, and outputs an image embedding. According to Meta, this approach yields more powerful, fairer, and more robust models that capture valuable information in images. Traditional computer vision systems often perform poorly on photos from regions with different socioeconomic characteristics because they are trained mostly on examples from the US and Europe. SEER, by contrast, performs well on images from around the world, including from communities across a wide range of income levels.
AV-HuBERT is a self-supervised framework for understanding speech that learns by both seeing and hearing people speak. It is the first system to jointly model speech and lip movements from raw, untranscribed video. Given the same amount of transcribed data, AV-HuBERT is 75% more accurate than the best previous audio-visual speech recognition systems.
Meta AI has built the ESM Metagenomic Atlas, the first database to reveal the predicted structures of more than 600 million metagenomic proteins. These proteins, found in soil microbes, deep in the ocean, and even inside our bodies, vastly outnumber those from animals and plants, yet they are among the least understood proteins on Earth. Analyzing their structures can help solve evolutionary mysteries and identify proteins that could benefit health, the environment, and energy production.
BLIP (Bootstrapping Language-Image Pre-training) is a pre-training framework for unified vision-language understanding and generation. It achieves state-of-the-art results on a wide range of vision-language tasks, including image-text retrieval, image captioning, visual question answering, visual reasoning, and visual dialog, as well as zero-shot text-video retrieval and zero-shot video question answering. BLIP can strengthen vision-language capabilities in downstream applications such as product recommendation and classification on e-commerce platforms.
WarpDrive is a lightweight, flexible, and easy-to-use end-to-end reinforcement learning (RL) framework that runs both environment simulation and agent training on the GPU, enabling orders-of-magnitude faster multi-agent training on a single GPU. Its integration with PyTorch Lightning, which lets users modularize experimental code and quickly build production-ready workloads, further accelerates multi-agent RL research and development.
CodeRL is a program synthesis framework that combines pretrained language models with deep reinforcement learning. It uses unit-test feedback both during training and at inference time and, built on an enhanced CodeT5 model, achieves state-of-the-art results on competitive programming benchmarks such as APPS.
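Unit-test feedback becomes an RL signal by mapping each generated program’s test outcome to a scalar reward. The sketch below is a hypothetical illustration, not CodeRL’s code: it distinguishes compile errors, runtime errors, failed tests, and full passes, and the specific reward values (-1.0, -0.6, -0.3, +1.0) are assumptions modeled on that categorization. It calls `exec` on the candidate, so it must only be used on trusted code or inside a sandbox.

```python
def unit_test_reward(source, func_name, tests):
    """Map a candidate program's unit-test outcome to a scalar reward.

    source: candidate program text; func_name: function under test;
    tests: list of (args_tuple, expected_result) pairs.
    WARNING: exec() runs arbitrary code; use a sandbox for untrusted input.
    """
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError:
        return -1.0  # program does not compile
    namespace = {}
    try:
        exec(code, namespace)
        func = namespace[func_name]
        results = [func(*args) == expected for args, expected in tests]
    except Exception:
        return -0.6  # runtime error, including a missing function
    return 1.0 if all(results) else -0.3  # all tests pass vs. some fail


cases = [((2, 3), 5), ((0, 0), 0)]
print(unit_test_reward("def add(a, b):\n    return a + b\n", "add", cases))  # → 1.0
```

Graded penalties give the model a denser signal than pass/fail alone: a program that runs but is wrong is treated as closer to correct than one that crashes.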
ETSformer is a transformer adapted for time-series forecasting that combines the strengths of classical exponential smoothing methods with transformers. It produces interpretable forecasts decomposed into seasonal and trend components and achieves state-of-the-art results across a range of time-series forecasting applications and datasets.
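For readers unfamiliar with the classical side, this is simple exponential smoothing, the most basic member of the family ETSformer draws on: each new level is a weighted blend of the latest observation and the previous level, so the weight on past observations decays exponentially. This sketch is the textbook method only, not ETSformer’s attention mechanism, and the series values are made up.

```python
def simple_exponential_smoothing(series, alpha):
    """Classical simple exponential smoothing.

    level_t = alpha * x_t + (1 - alpha) * level_{t-1}, with level_0 = x_0.
    Returns all smoothed levels; the last one is the one-step-ahead forecast.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    levels = [series[0]]
    for x in series[1:]:
        levels.append(alpha * x + (1 - alpha) * levels[-1])
    return levels


print(simple_exponential_smoothing([10.0, 12.0, 11.0, 13.0], alpha=0.5))
# → [10.0, 11.0, 11.0, 12.0]
```

Unrolling the recurrence shows an observation k steps in the past gets weight alpha·(1 − alpha)^k, which is the exponentially decaying weighting that a smoothing-based attention can build on.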
LAVIS is an open-source library for language-vision research and applications. It supports a wide variety of tasks, datasets, and state-of-the-art models, and its unified interface and modular design make it easy to use and extend. Together, these features make language-vision AI capabilities accessible to a broad audience of researchers and practitioners.
FedNLP is a benchmarking framework for evaluating federated learning (FL) methods on four common NLP tasks: text classification, sequence tagging, question answering, and sequence-to-sequence generation.
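As background on what such a framework evaluates, the sketch below shows the aggregation step of FedAvg (federated averaging), a standard baseline in federated learning: each client trains locally, and the server averages the resulting parameters, weighting each client by its number of local examples. Parameters are plain lists of floats here for illustration; this is not FedNLP’s code.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of client parameter vectors (the FedAvg aggregation step).

    client_weights: list of equal-length parameter lists, one per client.
    client_sizes: number of local training examples per client.
    """
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("clients must have at least one example in total")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for k, w in enumerate(weights):
            averaged[k] += (n / total) * w
    return averaged


# Two hypothetical clients: the larger one pulls the average toward itself.
print(federated_average([[1.0, 0.0], [3.0, 4.0]], [1, 3]))  # → [2.5, 3.0]
```

Only parameters leave each client, never raw text, which is what makes FL attractive for privacy-sensitive NLP data.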
Earthformer is a space-time transformer for Earth system forecasting. It is built on Cuboid Attention, a generic, efficient, and flexible space-time attention block that decomposes the data into cuboids and applies self-attention within each cuboid in parallel. Experiments on two real-world benchmarks, precipitation nowcasting and El Niño/Southern Oscillation (ENSO) forecasting, show that Earthformer achieves state-of-the-art performance.
RING-Net is a deep image segmentation network that infers roads from GPS trajectories. It is flexible enough to combine multiple data sources, such as GPS trajectories and satellite images, and it converts raw GPS trajectories into raster images with trip-related features to infer roads accurately. Experiments on public data showed that RING-Net improves the completeness of inferred road networks.
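Turning trajectories into images is what lets a segmentation network consume GPS data. Here is a minimal sketch, assuming a regular latitude/longitude grid and two hypothetical per-cell features (point count and number of distinct trips passing through); RING-Net’s actual feature channels may differ.

```python
def rasterize_trajectories(trajectories, min_lat, min_lon, cell_deg, rows, cols):
    """Rasterize GPS trips into two feature grids.

    trajectories: list of trips, each a list of (lat, lon) points.
    Returns (points, trips): points[r][c] counts GPS fixes in the cell,
    trips[r][c] counts distinct trips visiting it. Row 0 is min_lat.
    Points outside the grid are ignored.
    """
    points = [[0] * cols for _ in range(rows)]
    trips = [[0] * cols for _ in range(rows)]
    for trip in trajectories:
        visited = set()  # cells this trip touches, counted once per trip
        for lat, lon in trip:
            r = int((lat - min_lat) // cell_deg)
            c = int((lon - min_lon) // cell_deg)
            if 0 <= r < rows and 0 <= c < cols:
                points[r][c] += 1
                visited.add((r, c))
        for r, c in visited:
            trips[r][c] += 1
    return points, trips


demo = [[(0.5, 0.5), (1.5, 0.5), (1.5, 0.6)], [(1.2, 0.2), (5.0, 5.0)]]
print(rasterize_trajectories(demo, 0.0, 0.0, 1.0, 2, 2))
# → ([[1, 0], [3, 0]], [[1, 0], [2, 0]])
```

Counting distinct trips separately from raw points helps distinguish a road many vehicles use from a spot where one slow vehicle logged many fixes.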
MEMENTO is a methodology for estimating individual treatment effects in multi-treatment scenarios where the set of treatments is discrete and finite. In experiments on real and semi-synthetic datasets, it outperformed other multi-treatment techniques, by nearly 10% in some cases.
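DIVA is a method for computing the derivative of a learning task with respect to its dataset, that is, how the validation error changes as the weight of each training sample is perturbed. It relies on a closed-form, differentiable expression of the leave-one-out cross-validation error around a pre-trained network, so it can optimize the dataset and model parameters as part of training without a separate validation dataset, unlike traditional AutoML methods. This makes it useful for dataset curation, such as removing incorrect annotations, adding relevant samples, or rebalancing.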
PAVE is a reinforcement learning model based on the Lazy-MDP formalism that addresses the low recall of product attribute extraction by combining information from multiple neighboring products. It outperforms simple aggregation methods such as nearest neighbor, majority vote, and binary classifier ensembles, and for closed attributes it even outperforms attribute extraction (AE) models. PAVE is scalable, robust to noisy product neighbors, and performs well on unseen attributes.
PASHA is a method for efficiently tuning machine learning models trained on large datasets with limited computational resources. Rather than fixing the maximum training budget up front, it increases the resources allocated to tuning progressively, only as needed. Compared with ASHA (Asynchronous Successive Halving Algorithm), PASHA identifies similarly good hyperparameter configurations and architectures while using fewer computational resources.
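ASHA and PASHA both build on successive halving: evaluate many configurations on a small budget, keep the best 1/eta, multiply the budget by eta, and repeat. The sketch below is plain synchronous successive halving, not PASHA itself; PASHA’s distinguishing idea, as we understand it, is to grow the maximum budget only while the ranking of top configurations is still unstable. The toy scoring function is hypothetical and ignores the budget.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Synchronous successive halving over a list of configurations.

    evaluate(config, budget) -> validation score (higher is better).
    Each rung keeps the top 1/eta configurations and multiplies the budget by eta.
    """
    if eta < 2:
        raise ValueError("eta must be at least 2")
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(
            ((evaluate(c, budget), i) for i, c in enumerate(survivors)),
            reverse=True,
        )
        keep = max(1, len(survivors) // eta)
        survivors = [survivors[i] for _, i in ranked[:keep]]
        budget *= eta  # promoted configurations train longer at the next rung
    return survivors[0]


# Hypothetical search over a single hyperparameter whose best value is 0.6.
candidates = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
toy_score = lambda c, budget: -abs(c - 0.6)
print(successive_halving(candidates, toy_score))  # → 0.6
```

Most of the compute goes to the few configurations that survive early rungs, which is where the savings come from; the risk is discarding a configuration that only shines at large budgets.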
AI2 (Allen Institute for AI)
MemPrompt pairs a large language model (GPT-3) with a growing memory of user feedback, allowing users to clarify tasks and improve the model’s accuracy after deployment. When the model misunderstands a user’s intent, the user can give feedback, which is stored and retrieved for similar future queries, helping the model better understand and respond to their input without retraining.
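A feedback memory of this kind can be sketched as a store of (past query, clarification) pairs plus a retrieval step that prepends the closest clarification to a new prompt. Everything here is a hypothetical stand-in: the class and method names are invented, and word-overlap (Jaccard) similarity replaces whatever retrieval a real system would use.

```python
def tokenize(text):
    """Lowercased bag of words; a deliberately crude stand-in for embeddings."""
    return set(text.lower().split())


class FeedbackMemory:
    """Hypothetical memory of user clarifications, keyed by past queries."""

    def __init__(self, min_similarity=0.5):
        self.entries = []  # list of (query_tokens, feedback)
        self.min_similarity = min_similarity

    def add(self, query, feedback):
        self.entries.append((tokenize(query), feedback))

    def lookup(self, query):
        """Return feedback for the most similar stored query, if similar enough."""
        q = tokenize(query)
        best, best_sim = None, 0.0
        for keys, feedback in self.entries:
            union = q | keys
            sim = len(q & keys) / len(union) if union else 0.0
            if sim > best_sim:
                best, best_sim = feedback, sim
        return best if best_sim >= self.min_similarity else None

    def build_prompt(self, query):
        """Append retrieved feedback to the query; leave it unchanged otherwise."""
        feedback = self.lookup(query)
        return query if feedback is None else f"{query}\nClarification: {feedback}"


memory = FeedbackMemory()
memory.add("what sounds like good", "'sounds like' asks for a homophone")
print(memory.build_prompt("what sounds like sea"))
```

Because the clarification is injected into the prompt, the model itself never changes; the system improves simply by accumulating feedback.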
ACCoRD is a system for generating diverse descriptions of scientific concepts from multiple documents. It exploits the many ways a concept is discussed in the scientific literature to describe a target concept in relation to different types of reference concepts.
Līla is a benchmark for comprehensively evaluating the mathematical reasoning skills of AI systems. It comprises 140,000 questions across 23 tasks, organized along four dimensions: mathematical ability, language complexity, external knowledge requirements, and question format.
Unified-IO is a single neural model that can perform a wide range of AI tasks, including:
- Classical computer vision tasks: object detection, segmentation, and depth estimation
- Image synthesis tasks: image generation and in-painting
- Tasks that combine vision and language: visual question answering, image captioning, and referring expression comprehension
- Natural language processing tasks: question answering and paraphrasing
Apple presents a hybrid machine learning model that combines a physiological model of heart-rate response to exercise demand with neural network embeddings to learn personalized fitness parameters. Applied to a large dataset of workouts recorded with wearables, the model accurately predicts heart-rate response to exercise demand in new workouts, and its learned embeddings correlate with established metrics of cardiorespiratory fitness.
DeSTSeg is an anomaly detection framework that combines a pre-trained teacher network, a denoising student encoder-decoder, and a segmentation network. On an industrial inspection benchmark dataset, it achieved state-of-the-art results: 98.6% image-level AUC (area under the ROC curve), 75.8% pixel-level average precision, and 76.4% instance-level average precision.
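The image-level "ROC" number is an area under the ROC curve (AUROC), not an accuracy: it is the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it directly from that definition (the Mann-Whitney formulation), counting ties as one half; the scores are made up, and the quadratic loop is for clarity rather than speed.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparisons.

    labels: 1 for anomalous, 0 for normal. Equals the fraction of
    (anomalous, normal) pairs ranked correctly, with ties counting 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0.9, 0.3, 0.4, 0.1], [1, 1, 0, 0]))  # → 0.75
```

Because AUROC depends only on the ranking of scores, it needs no decision threshold, which is why it is the usual image-level metric for anomaly detection.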
MAEEG is a self-supervised learning model that uses a transformer architecture to learn EEG representations by reconstructing masked EEG features. When only a small number of labels are available, it has been shown to improve sleep stage classification accuracy by about 5%.
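The masked-reconstruction objective can be sketched independently of the model: hide a random subset of time steps, ask the model to fill in the full sequence, and compute the error only on the hidden steps. In this minimal sketch, `reconstruct` is a hypothetical stand-in for MAEEG’s transformer, masked steps are represented as `None`, and the feature values are made up.

```python
import random


def masked_reconstruction_loss(features, reconstruct, mask_ratio=0.5, seed=0):
    """Mean squared error on randomly masked time steps only.

    features: list of per-time-step feature vectors (lists of floats).
    reconstruct: hypothetical model; receives the sequence with masked steps
        replaced by None and returns a full sequence of predicted vectors.
    """
    rng = random.Random(seed)
    n = len(features)
    n_mask = max(1, int(n * mask_ratio))
    masked_idx = set(rng.sample(range(n), n_mask))
    masked_input = [None if t in masked_idx else f for t, f in enumerate(features)]
    predictions = reconstruct(masked_input)
    total, count = 0.0, 0
    for t in masked_idx:  # unmasked steps do not contribute to the loss
        for target, pred in zip(features[t], predictions[t]):
            total += (target - pred) ** 2
            count += 1
    return total / count


eeg = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(masked_reconstruction_loss(eeg, lambda seq: eeg))  # oracle model → 0.0
```

Scoring only masked positions keeps the model from earning credit by copying visible inputs, forcing it to learn temporal structure in the signal.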
Latent Temporal Flows is a machine learning method for modeling high-dimensional, dependent time-series data from sensors. It can be used in health-related applications such as early abnormality detection, fertility tracking, and adverse drug effect prediction. It consistently outperforms the state of the art on multi-step forecasting benchmarks, improving performance by at least 10% on several real-world datasets while also being more computationally efficient.
MobileViT is a lightweight, general-purpose vision transformer designed for mobile devices. It offers a new approach to global information processing with transformers by treating them as convolutions. Across various tasks and datasets, MobileViT consistently outperforms comparable CNN- and ViT-based networks.
ARtonomous is a cost-effective virtual platform for programming robotics. It allows students to use reinforcement learning (RL) and code to train and customize virtual autonomous robots. A study of ARtonomous found that middle school students gained an understanding of RL, were highly engaged, and expressed interest in further learning about machine learning. The platform provides an alternative to traditional, programming-only robotics kits.
GAUDI is a generative model that produces complex, realistic 3D scenes that can be rendered immersively from a moving camera. It achieves state-of-the-art performance on multiple datasets in the unconditional generative setting and can also generate 3D scenes conditioned on variables such as sparse images or text descriptions.
I am a Civil Engineering Graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in Data Science, especially Neural Networks and their application in various areas.