GPU-Accelerated Terminal Rendering in Rust

February 22, 2026 · 10 min read

Rust · WebGPU

Many terminals still render text on the CPU, often at 30-60fps with visible flicker during rapid output. I wanted to see how fast we could go by moving the entire rendering pipeline to the GPU. The answer: 120fps on integrated graphics, with smooth scrolling through megabytes of output. But text on the GPU is surprisingly hard.

Architecture overview

The terminal is split into two processes: the PTY manager, which runs the shell, parses ANSI escape sequences, and maintains the grid state; and the renderer, which reads that grid state and draws it via WebGPU. The two communicate through a ring buffer in shared memory.

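```rust
use std::collections::VecDeque;

// Hypothetical stand-in: the post does not show its Color type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
}

/// One character cell plus its display attributes.
#[derive(Clone, Copy, Debug)]
struct GridCell {
    character: char,
    fg_color: Color,
    bg_color: Color,
    bold: bool,
    italic: bool,
    underline: bool,
}

/// Screen state maintained by the PTY manager.
struct Grid {
    cells: Vec<GridCell>, // visible screen, rows * cols cells
    cursor_x: usize,
    cursor_y: usize,
    cols: usize,
    rows: usize,
    scrollback: VecDeque<Vec<GridCell>>, // lines scrolled off the top
}
```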
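The post doesn't show the ring buffer itself. As a rough sketch of the idea, here is a single-producer, single-consumer byte ring built on atomics. It runs between two threads in one process; the real design places the buffer in memory shared by two processes, and that mapping (plus the actual message format) is omitted here. `SpscRing`, `push`, and `pop` are hypothetical names, not the author's API.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical SPSC byte ring. `head` counts bytes written and `tail` counts
/// bytes read; both only grow, and the capacity is a power of two so that
/// `counter & mask` wraps to a slot index cheaply.
struct SpscRing {
    buf: Box<[UnsafeCell<u8>]>,
    mask: usize,
    head: AtomicUsize, // advanced only by the producer
    tail: AtomicUsize, // advanced only by the consumer
}

// Safety (sketch-level): correct only with exactly one producer thread and one
// consumer thread; head/tail ordering keeps them off the same slot.
unsafe impl Sync for SpscRing {}

impl SpscRing {
    fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        let buf: Box<[UnsafeCell<u8>]> = (0..capacity).map(|_| UnsafeCell::new(0u8)).collect();
        Self { buf, mask: capacity - 1, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) }
    }

    /// Producer side: returns false if the ring is full.
    fn push(&self, byte: u8) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire); // see slots the consumer has freed
        if head.wrapping_sub(tail) == self.buf.len() {
            return false;
        }
        unsafe { *self.buf[head & self.mask].get() = byte };
        self.head.store(head.wrapping_add(1), Ordering::Release); // publish the byte
        true
    }

    /// Consumer side: returns None if the ring is empty.
    fn pop(&self) -> Option<u8> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire); // see bytes the producer published
        if tail == head {
            return None;
        }
        let byte = unsafe { *self.buf[tail & self.mask].get() };
        self.tail.store(tail.wrapping_add(1), Ordering::Release); // hand the slot back
        Some(byte)
    }
}

fn main() {
    const MSG: &[u8] = b"hello from the PTY side";
    let ring = Arc::new(SpscRing::new(8)); // smaller than MSG, so the producer must wait
    let producer = {
        let ring = Arc::clone(&ring);
        thread::spawn(move || {
            for &b in MSG {
                while !ring.push(b) {
                    std::hint::spin_loop(); // full: wait for the consumer to drain
                }
            }
        })
    };
    let mut received = Vec::new();
    while received.len() < MSG.len() {
        if let Some(b) = ring.pop() {
            received.push(b);
        }
    }
    producer.join().unwrap();
    println!("{}", String::from_utf8(received).unwrap());
}
```

Because `head` and `tail` only ever increase, `head - tail` is the fill level, so a full ring and an empty ring are distinguishable without sacrificing a slot.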

Glyph caching with a texture atlas

The naive approach, uploading each character as a separate texture, would be catastrophically slow: every cell would need its own texture binding and its own draw call. Instead, we maintain a texture atlas: a single large texture containing the rasterized glyph for every character-and-style combination we've seen so far. When a new glyph is needed, we rasterize it with a CPU-side font rasterizer (FreeType, via the freetype-rs crate) and upload just that glyph's rectangle to the atlas.
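The post doesn't say how free space in the atlas is managed. One simple option, assumed here rather than taken from the post, is a shelf packer: glyphs are placed left to right in horizontal rows ("shelves"), and a new shelf opens below when the current one fills up. The sketch keys glyphs by a hypothetical `GlyphKey` (character plus bold/italic; font size omitted), takes the glyph's pixel size as input instead of calling FreeType, and skips the GPU upload. The `[u0, v0, u1, v1]` layout returned by `tex_coords` is likewise one possible encoding of the instance's "atlas rectangle".

```rust
use std::collections::HashMap;

/// Hypothetical cache key: which character, in which style.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct GlyphKey {
    ch: char,
    bold: bool,
    italic: bool,
}

/// Pixel rectangle of a glyph inside the atlas texture.
#[derive(Clone, Copy, Debug, PartialEq)]
struct AtlasRect {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
}

/// Minimal shelf packer over a fixed-size atlas texture.
struct GlyphAtlas {
    width: u32,
    height: u32,
    cursor_x: u32,     // next free x on the current shelf
    shelf_y: u32,      // top edge of the current shelf
    shelf_height: u32, // tallest glyph placed on the current shelf so far
    glyphs: HashMap<GlyphKey, AtlasRect>,
}

impl GlyphAtlas {
    fn new(width: u32, height: u32) -> Self {
        Self { width, height, cursor_x: 0, shelf_y: 0, shelf_height: 0, glyphs: HashMap::new() }
    }

    /// Returns the glyph's rectangle, allocating space on first use.
    /// `None` means the atlas is full (a real renderer would grow or evict).
    fn get_or_insert(&mut self, key: GlyphKey, w: u32, h: u32) -> Option<AtlasRect> {
        if let Some(rect) = self.glyphs.get(&key) {
            return Some(*rect); // hit: already rasterized and uploaded
        }
        if self.cursor_x + w > self.width {
            // Current shelf is full: start a new one below it.
            self.shelf_y += self.shelf_height;
            self.cursor_x = 0;
            self.shelf_height = 0;
        }
        if w > self.width || self.shelf_y + h > self.height {
            return None;
        }
        let rect = AtlasRect { x: self.cursor_x, y: self.shelf_y, w, h };
        self.cursor_x += w;
        self.shelf_height = self.shelf_height.max(h);
        // A real renderer would rasterize the glyph here and upload just `rect`.
        self.glyphs.insert(key, rect);
        Some(rect)
    }

    /// Normalized [u0, v0, u1, v1] texture coordinates for a rectangle.
    fn tex_coords(&self, r: AtlasRect) -> [f32; 4] {
        let (aw, ah) = (self.width as f32, self.height as f32);
        [r.x as f32 / aw, r.y as f32 / ah, (r.x + r.w) as f32 / aw, (r.y + r.h) as f32 / ah]
    }
}

fn main() {
    let mut atlas = GlyphAtlas::new(32, 32);
    let a = GlyphKey { ch: 'a', bold: false, italic: false };
    let first = atlas.get_or_insert(a, 10, 16).unwrap();
    let again = atlas.get_or_insert(a, 10, 16).unwrap();
    assert_eq!(first, again); // second lookup reuses the cached slot
    println!("{:?} -> {:?}", first, atlas.tex_coords(first));
}
```

Shelf packing wastes some space when glyph heights vary, but terminal glyphs from a monospace font are nearly uniform in height, which is the case where it works well.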

For a typical terminal session, the atlas stabilizes quickly — most people use at most a few hundred distinct characters. The GPU then draws each cell as a textured quad, with instanced rendering to batch all cells in a single draw call.

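```rust
// Instance data for each grid cell, uploaded as a GPU vertex buffer.
// Assumes the bytemuck crate (with its "derive" feature) for the byte cast.
#[repr(C)]
#[derive(Clone, Copy, Debug, bytemuck::Pod, bytemuck::Zeroable)]
struct CellInstance {
    position: [f32; 2],   // screen position
    tex_coords: [f32; 4], // atlas rectangle
    fg_color: [f32; 4],   // foreground color
    bg_color: [f32; 4],   // background color
    flags: u32,           // bold, italic, underline bits
}

// Single draw call renders all visible cells
// (vertex, instance, and index buffers are assumed to be bound already).
fn draw_cells(render_pass: &mut wgpu::RenderPass<'_>, visible_cells: usize) {
    render_pass.draw_indexed(
        0..6,                    // quad indices
        0,                       // base vertex
        0..visible_cells as u32, // instance range
    );
}
```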
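To show how grid state could feed that instance buffer, here is a hedged sketch that flattens a row-major slice of cells into `CellInstance` values. The pixel cell size, the bit assignments for `flags`, the float RGBA colors, and the atlas lookup (stubbed as a closure) are assumptions for illustration, and the simplified `Cell` type stands in for `GridCell`. `CellInstance` is redefined without the bytemuck derives so the example stands alone. It also checks the instance stride: with `#[repr(C)]` and only 4-byte fields, the struct has no padding and occupies 15 × 4 = 60 bytes.

```rust
use std::mem::size_of;

// Hypothetical bit assignments for the `flags` field.
const FLAG_BOLD: u32 = 1 << 0;
const FLAG_ITALIC: u32 = 1 << 1;
const FLAG_UNDERLINE: u32 = 1 << 2;

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct CellInstance {
    position: [f32; 2],
    tex_coords: [f32; 4],
    fg_color: [f32; 4],
    bg_color: [f32; 4],
    flags: u32,
}

/// Simplified stand-in for `GridCell`, with colors already as RGBA floats.
#[derive(Clone, Copy)]
struct Cell {
    character: char,
    fg: [f32; 4],
    bg: [f32; 4],
    bold: bool,
    italic: bool,
    underline: bool,
}

/// Flatten a row-major grid into one instance per cell.
/// `atlas_lookup` stands in for the glyph-atlas query (char, bold, italic).
fn build_instances(
    cells: &[Cell],
    cols: usize,
    cell_w: f32,
    cell_h: f32,
    atlas_lookup: impl Fn(char, bool, bool) -> [f32; 4],
) -> Vec<CellInstance> {
    cells
        .iter()
        .enumerate()
        .map(|(i, c)| {
            let (col, row) = (i % cols, i / cols);
            let mut flags = 0;
            if c.bold { flags |= FLAG_BOLD; }
            if c.italic { flags |= FLAG_ITALIC; }
            if c.underline { flags |= FLAG_UNDERLINE; }
            CellInstance {
                position: [col as f32 * cell_w, row as f32 * cell_h], // top-left, in pixels
                tex_coords: atlas_lookup(c.character, c.bold, c.italic),
                fg_color: c.fg,
                bg_color: c.bg,
                flags,
            }
        })
        .collect()
}

fn main() {
    // 2 + 4 + 4 + 4 floats plus one u32, all 4-byte aligned: no padding.
    assert_eq!(size_of::<CellInstance>(), 60);

    let white = [1.0, 1.0, 1.0, 1.0];
    let black = [0.0, 0.0, 0.0, 1.0];
    let cell = |ch: char| Cell { character: ch, fg: white, bg: black, bold: false, italic: false, underline: false };
    let mut cells = vec![cell('h'), cell('i'), cell('!'), cell(' ')];
    cells[1].bold = true;

    let instances = build_instances(&cells, 2, 9.0, 18.0, |_, _, _| [0.0; 4]);
    println!("{} instances; last cell at {:?}", instances.len(), instances[3].position);
}
```

Rebuilding the whole vector every frame is the simplest approach; since most cells don't change between frames, a renderer could instead rewrite only the dirty ranges of the instance buffer.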

The text shaping surprise

I expected GPU rasterization to be the bottleneck. It wasn't. The bottleneck was text shaping — converting a sequence of characters into positioned glyphs, accounting for kerning, ligatures, and bidirectional text. This is inherently CPU-bound and sequential. For a 200×80 terminal (16,000 cells), running HarfBuzz shaping on every frame at 120fps means processing nearly 2 million cells per second.

The solution was a shaping cache keyed by (font, characters, features). Most cells don't change between frames, so most shaping queries are repeats that hit the cache. With a 10,000-entry LRU cache, the hit rate exceeds 95% for typical workloads, bringing CPU time down to under 5% of each frame's budget.
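A minimal sketch of such a cache, using only the standard library: a `HashMap` holds the shaped results, and a `BTreeMap` keyed by a monotonically increasing access counter tracks recency, so the least recently used entry is always the map's first entry. The key mirrors the (font, characters, features) tuple from the text, but `ShapeKey`, `ShapedRun`, and the `shape_fn` closure standing in for a HarfBuzz call are hypothetical. The demo uses a capacity of 2 rather than 10,000 so that eviction is visible.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical shaping-cache key: (font, characters, features).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ShapeKey {
    font_id: u32,
    text: String,
    features: Vec<String>, // e.g. "liga", "kern"
}

/// Hypothetical shaped output: one (glyph id, x advance) pair per glyph.
type ShapedRun = Vec<(u32, f32)>;

struct ShapeCache {
    capacity: usize,
    tick: u64,                                    // increases on every lookup
    entries: HashMap<ShapeKey, (ShapedRun, u64)>, // value plus its last-access tick
    recency: BTreeMap<u64, ShapeKey>,             // tick -> key; first entry is the LRU
    hits: u64,
    misses: u64,
}

impl ShapeCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self { capacity, tick: 0, entries: HashMap::new(), recency: BTreeMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached run for `key`, calling `shape_fn` only on a miss.
    fn get_or_shape(&mut self, key: &ShapeKey, shape_fn: impl FnOnce(&ShapeKey) -> ShapedRun) -> ShapedRun {
        self.tick += 1;
        if let Some((run, last)) = self.entries.get_mut(key) {
            // Hit: move the entry to the most-recently-used position.
            let old = *last;
            self.recency.remove(&old);
            *last = self.tick;
            self.recency.insert(self.tick, key.clone());
            self.hits += 1;
            return run.clone();
        }
        self.misses += 1;
        if self.entries.len() == self.capacity {
            // Full: evict the least recently used entry.
            if let Some((_, lru_key)) = self.recency.pop_first() {
                self.entries.remove(&lru_key);
            }
        }
        let run = shape_fn(key);
        self.entries.insert(key.clone(), (run.clone(), self.tick));
        self.recency.insert(self.tick, key.clone());
        run
    }

    fn hit_rate(&self) -> f64 {
        self.hits as f64 / (self.hits + self.misses).max(1) as f64
    }
}

fn main() {
    let mut cache = ShapeCache::new(2);
    // Stand-in for a HarfBuzz call: one glyph per char with a fixed advance.
    let fake_shape = |k: &ShapeKey| k.text.chars().map(|c| (c as u32, 9.0)).collect::<ShapedRun>();
    let key = |s: &str| ShapeKey { font_id: 0, text: s.to_string(), features: vec!["liga".into()] };

    for word in ["ls", "-la", "ls", "ls", "cd", "-la"] {
        cache.get_or_shape(&key(word), fake_shape);
    }
    println!("hit rate: {:.2}", cache.hit_rate());
}
```

Lookups here cost O(log n) because of the `BTreeMap`; a production cache would more likely use a linked-list LRU (for example the `lru` crate) for O(1) updates.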

GPU rendering achieved 120fps on integrated graphics with 200×80 cells
Text shaping, not rasterization, was the actual bottleneck
A shaping cache with 10k entries achieves >95% hit rate in practice
ANSI escape sequence parsing is a second CPU bottleneck under rapid output
The ring buffer between PTY manager and renderer keeps latency under 1ms

Is this practical?

For most users, the difference between 60fps and 120fps terminal rendering is subtle. But for specific workloads — watching build output scroll by, tailing high-throughput logs, running terminal-based visualizations — the smoothness is genuinely noticeable. More importantly, offloading rendering to the GPU frees the CPU for the shell and applications you actually care about.
