If data is the fuel for Artificial Intelligence (AI), the computation is the engine. The ever-increasing computational requirements of modern AI systems have prompted investments and R&D in specialized hardware, as well as the development and support of AI runtimes and compilers, with both leading industry players and open-source communities devoting significant resources to developing software for AI workloads.
TQP is the world’s first query processor that runs on Tensor Computation Runtimes, delivering up to 20x faster performance than CPU-only systems.
Tensor Query Processor (TQP) is a query processor that runs atop tensor computation runtimes (TCRs) such as PyTorch, TVM, and ONNX Runtime. It was prototyped by a research team from the University of Washington, UC San Diego, and Microsoft and is presented in the new paper Query Processing on Tensor Computation Runtimes.
TQP is the first query processor to operate on TCRs, according to the researchers, and it has been shown to speed up query execution by up to 20x over CPU-only systems and up to 5x over specialized GPU solutions.
TCRs’ tensor interface is expressive enough to accommodate all commonly used relational operators. The researchers contribute a set of algorithms and a compiler stack for converting relational operators into tensor computations, and they evaluate TQP against state-of-the-art baselines. TCRs such as PyTorch and TensorFlow already let data scientists easily build and deploy deep neural networks (DNNs), tapping the potential offered by new hardware.
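To make the idea concrete, here is a minimal sketch of how a relational selection can be expressed with tensor primitives. NumPy stands in for a TCR such as PyTorch, and the toy query and column names are illustrative, not TQP's actual operators.

```python
import numpy as np

# Hypothetical toy example: evaluating
#   SELECT price FROM items WHERE price > 100 AND qty >= 2
# with tensor primitives (NumPy as a stand-in for a TCR).
price = np.array([50.0, 120.0, 300.0, 80.0, 150.0])
qty = np.array([1, 3, 2, 5, 1])

# Predicates become elementwise comparisons producing boolean masks...
mask = (price > 100.0) & (qty >= 2)
# ...and selection becomes a masked gather over the column.
result = price[mask]
print(result)  # → [120. 300.]
```

The same comparison-and-mask pattern runs unchanged on a GPU when the arrays are device tensors, which is what makes tensor runtimes attractive as a query-execution substrate.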
The rising demand for TCRs suggests that hardware tailored to data-hungry ML workloads is becoming increasingly common, raising the question of how databases could benefit from these advances. According to the researchers, TQP was designed to achieve three goals:
- Performance: the query processor should be comparable to specialized engines (e.g., as performant as GPU databases on GPU devices).
- Portability: the query processor should run on a variety of hardware devices, from bespoke ASICs to CPUs and GPUs, working across generations and manufacturers.
- Minimal engineering effort: building high-performance bespoke operators for each device backend is a huge undertaking, so the researchers aimed for a strategy that is O(1) rather than O(n) in the number of supported hardware backends.
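The O(1) goal can be sketched as writing each relational operator once against an abstract array interface and letting the runtime's backends supply the hardware support. The names below are hypothetical, purely to illustrate the idea; TQP achieves this by targeting the TCR's tensor API directly.

```python
import numpy as np

def selection(backend, column, threshold):
    # One generic operator implementation; the backend decides
    # where (and on what hardware) the comparison actually runs.
    return column[backend.greater(column, threshold)]

class NumpyBackend:
    # Stand-in for one TCR backend; a GPU backend would expose
    # the same interface, so `selection` needs no changes.
    @staticmethod
    def greater(a, b):
        return np.greater(a, b)

col = np.array([1.0, 7.0, 3.0, 9.0])
filtered = selection(NumpyBackend, col, 4.0)
print(filtered)  # → [7. 9.]
```

Adding a new hardware target then means adding a backend, not rewriting every operator: constant work per device rather than linear in the operator count.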
TQP uses a layered architecture to compile relational operators and machine learning models into tensor programs. The workflow has two phases:
1) Input queries are translated into an executable tensor program in the compilation step;
2) In the execution phase, input data is first transformed into tensors, then fed into the compiled program to produce the final query result.
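The two phases above can be sketched as a compile-then-execute pattern. This is a hypothetical simplification in which the "compiled" program is just a closure over tensor operations; real TQP compiles SQL through an intermediate representation, as described next.

```python
import numpy as np

def compile_query():
    # Compilation phase: produce an executable tensor program for a
    # toy query such as
    #   SELECT SUM(amount) FROM sales WHERE region_id = 2
    def program(region_id, amount):
        mask = region_id == 2      # predicate as an elementwise compare
        return amount[mask].sum()  # aggregation as a tensor reduction
    return program

executor = compile_query()

# Execution phase: input data is first transformed into tensors...
region_id = np.array([1, 2, 2, 3])
amount = np.array([10.0, 20.0, 30.0, 40.0])
# ...then fed into the compiled program.
total = executor(region_id, amount)
print(total)  # → 50.0
```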
There are four primary layers in the compilation phase:
The parsing layer converts an input SQL statement into an internal intermediate representation (IR) graph depicting the query’s physical plan. Then the canonicalization and optimization layer performs IR-to-IR transformations; the planning layer translates the IR graph generated in the previous layer into an operator plan, and the execution layer generates an executor from the operator plan.
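The four layers can be pictured as a pipeline of transformations. The sketch below is purely illustrative (every name is hypothetical, and the "IR" is a toy dictionary), but it mirrors the parse → optimize → plan → execute structure described above.

```python
def parse(sql):
    # Parsing layer: SQL -> IR graph (a toy physical plan here).
    return {"op": "filter", "pred": ("x", ">", 5), "child": {"op": "scan"}}

def optimize(ir):
    # Canonicalization/optimization layer: IR -> IR transformations.
    return ir  # identity here; real passes rewrite the graph

def plan(ir):
    # Planning layer: IR graph -> ordered operator plan.
    return [("scan",), ("filter", ir["pred"])]

def build_executor(ops):
    # Execution layer: operator plan -> executor.
    def run(rows):
        out = rows
        for op in ops:
            if op[0] == "filter":
                col, _, v = op[1]
                out = [r for r in out if r[col] > v]
        return out
    return run

executor = build_executor(plan(optimize(parse("SELECT ..."))))
rows_out = executor([{"x": 3}, {"x": 8}])
print(rows_out)  # → [{'x': 8}]
```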
In the execution phase, the executor produced by the compilation phase invokes the feeder operator to convert input data into the tensor format, manages data transfers to and from device memory, and schedules operators on the target device.
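A minimal sketch of the feeder step, assuming columnar input: each column is converted into a tensor before the compiled program runs. With a GPU runtime this step would also copy the tensors into device memory (e.g., `.to("cuda")` in PyTorch); NumPy and the `feed` name here are illustrative stand-ins.

```python
import numpy as np

def feed(columns):
    # Hypothetical feeder: convert each input column (a Python list)
    # into a contiguous tensor, keyed by column name.
    return {name: np.asarray(values) for name, values in columns.items()}

tensors = feed({"id": [1, 2, 3], "price": [9.5, 1.0, 4.0]})
print(tensors["price"].dtype)  # → float64
```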
TQP achieves query execution time speedups of up to 20x over CPU-only systems and up to 5x over specialized GPU solutions, according to the findings. TQP further accelerates queries that combine ML prediction with SQL end to end, offering up to 5x faster performance than CPU baselines.
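Because both the relational operators and the model run in the same tensor runtime, an ML prediction can be fused directly into a query. Below is a hedged toy sketch of that idea: a tiny linear model stands in for a real DNN, and all names and data are illustrative.

```python
import numpy as np

# "Trained" model parameters for a toy linear model (stand-in for a DNN).
w, b = np.array([0.5, 2.0]), 1.0

def query_with_prediction(features, region_id):
    # Relational filter as boolean masking...
    rows = features[region_id == 1]
    # ...then model inference as a matrix-vector product, in one program.
    return rows @ w + b

features = np.array([[2.0, 1.0], [4.0, 0.5], [6.0, 1.0]])
region_id = np.array([1, 2, 1])
preds = query_with_prediction(features, region_id)
print(preds)  # → [4. 6.]
```

Running the filter and the inference in one tensor program avoids moving intermediate results between a database engine and a separate ML runtime, which is one source of the reported end-to-end speedups.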
Overall, this demonstrates that the proposed TQP can leverage TCR advances and run efficiently on all supported hardware platforms. The paper Query Processing on Tensor Computation Runtimes is available on arXiv.