Meta Open Sources ‘Velox’: A C++ Vectorized Database Acceleration Library That Optimizes Query Engines And Data Processing

Velox, a unified execution engine, was recently developed and open-sourced by Meta in collaboration with Intel, ByteDance, and Ahana. The library speeds up the development of data management systems and makes them easier to build. Although Velox is still under active development, several experimental evaluations suggest that it can improve the performance of data management systems. Meta runs an extensive infrastructure ecosystem of hundreds of data computing engines that support its wide range of products and services. These engines handle a variety of workloads, including SQL analytics, stream processing, and data ingestion. With the rapid growth of artificial intelligence and machine learning, Meta has also built a growing number of engines and libraries for feature engineering, data preprocessing, and other machine learning tasks and services.

Researchers at Meta also noted that although many of these engines share similar characteristics, most of them were developed independently. This fragmentation makes the systems hard to maintain and improve: as workloads shift and hardware evolves, keeping each engine up to date drains engineering productivity and eventually produces systems with divergent feature sets and inconsistent semantics. Meta built Velox as a first step toward a more efficient infrastructure. Velox is a unified execution engine designed to accelerate data management systems and simplify their development. It consolidates the common data-intensive components of data computing engines while remaining extensible enough to adapt to different engines, and this reuse improves both efficiency and consistency. Data computing engines are typically assembled from similar logical components, such as a language front end, an intermediate representation, an optimizer, a runtime, and an execution engine. Velox provides the modules needed to build the execution engine, and it runs them on a single host, leaving query parsing, optimization, and distribution to the engine that embeds it. Its focus is the data plane, which covers data-intensive operations such as expression evaluation, aggregation, sorting, and joins.

By consolidating the execution engines of data computing systems into a single shared library, Velox can make those systems more adaptable. Meta has integrated Velox with more than a dozen data systems, including Presto, Spark, TorchArrow, and internal stream processing platforms. Velox is also used in other platforms, such as data ingestion systems and machine learning systems for feature engineering. By building an open source community around the project and breaking down the silos between data computing engines, Meta aims to grow Velox and blur the line between machine learning infrastructure and traditional data management systems. The researchers are confident that Velox can help integrate and unify data management systems, and they hope the open source community will support the project and help drive the library's development and adoption.

Project: https://velox-lib.io/

Paper: https://research.facebook.com/publications/velox-metas-unified-execution-engine/

References:

  • https://engineering.fb.com/2022/08/31/open-source/velox/
  • https://www.ithome.com.tw/news/152868

Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about Machine Learning, Natural Language Processing, and Web Development, and enjoys deepening her technical knowledge by taking part in coding challenges.