Striking Performance: Large Language Models up to 4x Faster on RTX With TensorRT-LLM for Windows 

Generative AI is a crucial trend in personal computing, impacting gaming, creativity, video, productivity, and development. And GeForce RTX and NVIDIA RTX GPUs, which are packed with dedicated AI processors called Tensor Cores, are bringing the power of generative AI natively to more than 100 million Windows PCs and workstations.

NVIDIA RTX GPUs Power New AI Capabilities – TensorRT-LLM for Windows, RTX VSR 1.5 Update, and Stable Diffusion TensorRT Accelerations

Generative AI on PC is now up to four times faster with TensorRT-LLM for Windows, an open-source library that accelerates inference for advanced AI language models such as Llama 2 and Code Llama. This follows the earlier announcement of TensorRT-LLM for data centers.

NVIDIA has provided tools to help developers accelerate their LLMs. These include scripts for optimizing custom models with TensorRT-LLM, open-source models pre-optimized with TensorRT, and a developer reference project demonstrating the speed and quality of LLM responses.

The Automatic1111 distribution’s Web UI now supports TensorRT acceleration for Stable Diffusion. This feature improves the speed of the generative AI diffusion model by up to 2 times compared to the previous fastest implementation.

NVIDIA has released version 1.5 of RTX Video Super Resolution (VSR) as part of today's Game Ready Driver. It will also be included in the upcoming NVIDIA Studio Driver, set to release early next month.

Supercharging LLMs

LLMs play a crucial role in boosting productivity by facilitating various tasks such as chat interaction, document summarization, email and blog drafting. They are also essential components in the development of AI and other software that can automatically analyze data and generate a wide range of content.

TensorRT-LLM, NVIDIA’s library for accelerating LLM inference, gives developers and end users the benefit of LLMs that now can operate up to 4x faster on RTX-powered Windows PCs. 

At larger batch sizes, this acceleration greatly enhances the experience for advanced LLM applications, such as writing and coding assistants, which can provide multiple distinct auto-complete suggestions simultaneously. This leads to faster performance and better quality, enabling users to choose the optimal suggestion among them.
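The benefit of larger batch sizes can be sketched in a few lines. The example below is a toy illustration, not TensorRT-LLM code: a single made-up weight matrix stands in for a model layer, and the point is that one batched matrix multiply serves several requests in the time a GPU would otherwise spend on repeated per-request products.

```python
import numpy as np

# Toy illustration of batched inference. W is a stand-in for one
# layer's weights; real LLM batching applies the same idea per layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
prompts = rng.standard_normal((4, 8))    # 4 concurrent auto-complete requests

# Sequential: one matrix-vector product per request.
sequential = np.stack([W @ p for p in prompts])

# Batched: a single matrix-matrix product serves all requests at once,
# keeping the GPU's compute units busy.
batched = prompts @ W.T

assert np.allclose(sequential, batched)
```

The two paths produce identical results; batching only changes how the work is scheduled, which is why assistants can surface several distinct suggestions at little extra cost.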

TensorRT-LLM acceleration is advantageous when combining LLM capabilities with other technologies, like retrieval-augmented generation (RAG), which pairs an LLM with a vector library or database. RAG allows the LLM to provide tailored responses using specific datasets, such as a user’s email or website articles, for more precise answers.

When asked about NVIDIA technology integrations in Alan Wake 2, the base Llama 2 model gave an unhelpful response, stating that they had not yet been announced.

With RAG, and with recent GeForce news articles loaded into a vector library connected to the Llama 2 model, the correct answer came back promptly. Combined with TensorRT-LLM acceleration, this approach gives users faster and smarter responses.
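The retrieval step of a RAG pipeline can be sketched without any GPU at all. The snippet below is a deliberately minimal stand-in: a bag-of-words counter replaces a neural embedding model, the document list is invented, and the function names are hypothetical. Real pipelines swap in a proper embedding model and a vector database, but the shape is the same: embed, retrieve the closest document, and prepend it to the prompt.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; real RAG uses a neural embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A tiny "vector library" of news snippets (invented for illustration).
docs = [
    "Alan Wake 2 ships with DLSS 3.5 and full ray tracing on GeForce RTX",
    "RTX Video Super Resolution 1.5 improves streamed video quality",
]

def retrieve(query):
    # Return the document most similar to the query.
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

def build_prompt(query):
    # The retrieved context is prepended so the LLM can ground its answer.
    return f"Context: {retrieve(query)}\nQuestion: {query}\nAnswer:"

print(build_prompt("What NVIDIA technology is in Alan Wake 2?"))
```

With the relevant snippet in context, even a model whose training data predates the game can answer accurately, which is exactly the effect described above.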

TensorRT-LLM will soon be available to download from the NVIDIA Developer website. TensorRT-optimized open source models and the RAG demo with GeForce news as a sample project are available at ngc.nvidia.com and GitHub.com/NVIDIA. 

Automatic Acceleration 

Diffusion models, such as Stable Diffusion, are employed for envisioning and producing impressive and original artwork. The generation of images involves iterations, which may require numerous cycles to achieve the desired output. However, when performed on a low-powered computer, this iterative process can result in hours of waiting time.

TensorRT is a specialized tool that enhances AI models by combining layers, optimizing precision, tuning kernels, and more, leading to improved efficiency and speed during inference. It is crucial for real-time applications and demanding computational tasks.
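One of the fusions mentioned above can be shown in miniature. The sketch below is not TensorRT itself, just the underlying algebra: two back-to-back linear layers with no nonlinearity between them collapse into a single layer whose weights are precomputed ahead of time, halving the work done per inference.

```python
import numpy as np

# Illustrative layer fusion: inference compilers such as TensorRT apply
# many fusions in this spirit (e.g. convolution + bias + activation).
rng = np.random.default_rng(1)
W1 = rng.standard_normal((16, 16))
W2 = rng.standard_normal((16, 16))
x = rng.standard_normal(16)

two_layers = W2 @ (W1 @ x)   # two passes over the data at inference time
W_fused = W2 @ W1            # folded once, ahead of time
one_layer = W_fused @ x      # a single pass at inference time

assert np.allclose(two_layers, one_layer)
```

The fused path gives bit-for-bit-equivalent results here; in practice TensorRT combines such fusions with reduced-precision arithmetic and kernel tuning for further gains.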

TensorRT now doubles the speed of Stable Diffusion

Stable Diffusion with TensorRT acceleration, compatible with WebUI from Automatic1111 and available for download, enables faster iterations and reduced waiting time, resulting in quicker image processing. On a GeForce RTX 4090, it outperforms the top Mac implementation using Apple M2 Ultra by 7x.

The TensorRT demo shows developers how to prepare and accelerate diffusion models using TensorRT. It serves as a starting point for turbocharging diffusion pipelines and enabling fast inferencing in applications.
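Why per-step speed matters for diffusion can be seen in a toy loop. The example below is a schematic stand-in, not Stable Diffusion: a simple blend toward a target replaces the U-Net's noise prediction. The point is structural: total generation time is steps multiplied by cost per step, so accelerating each step compounds across the whole loop.

```python
import numpy as np

# Toy "diffusion-style" loop: start from noise and iteratively denoise
# toward a target. A real model predicts the noise with a U-Net at each
# step; here (x - target) stands in for that prediction.
rng = np.random.default_rng(2)
target = np.zeros((8, 8))            # stand-in for the "clean" image
x = rng.standard_normal((8, 8))      # pure noise

for step in range(20):
    predicted_noise = x - target     # a real model would estimate this
    x = x - 0.2 * predicted_noise    # remove a fraction of the noise

# After 20 steps the sample has converged close to the target.
assert np.abs(x - target).max() < 0.1
```

Since each real denoising step is a full network forward pass, a 2x-faster step from TensorRT translates directly into a roughly 2x-faster image.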

Video That’s Super 

AI is enhancing various PC experiences for users, and streaming video from platforms like YouTube, Twitch, Prime Video, Disney+, and more is a widespread activity. Thanks to AI and RTX, there are improvements in the image quality of streaming videos.

RTX VSR is an innovative AI pixel processing technology that enhances the quality of streamed video content. It effectively reduces or eliminates artifacts caused by video compression, while simultaneously improving edge sharpness and detail.
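For contrast with AI upscaling, here is the naive baseline it improves on. This sketch implements only nearest-neighbor upscaling (the VSR model itself is proprietary and not reproduced here): repeated pixels add no new detail, which is precisely the gap a trained super-resolution network fills by inferring detail and suppressing compression artifacts.

```python
import numpy as np

# Non-AI baseline: nearest-neighbor upscaling simply repeats pixels,
# so a 1080p -> 4K upscale of this kind creates no new information.
frame = np.arange(12, dtype=float).reshape(3, 4)   # tiny stand-in "frame"

def nearest_neighbor_2x(img):
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

up = nearest_neighbor_2x(frame)
assert up.shape == (6, 8)
# Every output pixel equals some input pixel: no detail was created.
assert set(up.ravel()) == set(frame.ravel())
```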

The latest update, RTX VSR version 1.5, enhances visual quality by improving models, removing visual disturbances in native resolution playback, and supporting both professional RTX and GeForce RTX 20 Series GPUs based on the Turing architecture.

Retraining the VSR AI model helped it learn to accurately distinguish between subtle details and compression artifacts. As a result, AI-enhanced images more accurately preserve details during the upscale process. Finer details are more visible, and the overall image looks sharper and crisper.

Version 1.5 also improves the quality of video played at the display's native resolution. Unlike the previous release, it now enhances video even when it is not being upscaled. For instance, when streaming 1080p video to a 1080p display, VSR 1.5 still reduces compression artifacts even though no upscaling takes place.

RTX VSR 1.5 is available now in the latest Game Ready Driver for all RTX users. It will also be included in the forthcoming NVIDIA Studio Driver, set to release early next month.

RTX VSR, along with other NVIDIA software and tools like DLSS, Omniverse, and AI Workbench, has played a significant role in bringing more than 400 AI-enabled apps and games to the market.

The AI era is here, and RTX is accelerating it.
