Rafał Kamiński


WebGPU Introduction

WebGPU is an API that exposes the GPU to the Web. It is the successor to WebGL. WebGPU is currently supported in Chromium and the browsers based on it (Chrome, Microsoft Edge, Opera etc.) as well as in Firefox Nightly, although outside of Windows you might have some issues getting it set up properly. See the current implementation status. You can use WebGPU from standard browser JavaScript, but there is also a popular Rust implementation (used by Firefox) named wgpu. This article uses JavaScript, but wgpu is similar in nature.

Code used in this tutorial: https://github.com/akmere/webgpu-tutorial

Infrastructure Overview

GPUAdapter is the first thing you create. It corresponds to a physical GPU.

if (navigator.gpu === undefined) {
  alert("WebGPU is not supported.");
  return;
}
console.log("WebGPU is supported.");
const adapter = await navigator.gpu.requestAdapter();
if (adapter === null) {
  alert("No GPU adapter found.");
  return;
}
const adapterInfo = await adapter.requestAdapterInfo();
console.log(adapterInfo);

GPUDevice is the main interface you use to access GPU functionality. The features and limits you request here determine what you can use later and how. For most purposes they can be left empty.

const device = await adapter.requestDevice({
  label: "device",
  requiredFeatures: [],
  requiredLimits: {},
});
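
As an aside, if your workload did need extra capabilities, you could inspect what the adapter offers and request it explicitly. A minimal sketch, where "timestamp-query" stands in for any optional feature:

console.log(adapter.limits.maxBufferSize); // the largest buffer size this adapter allows
const requiredFeatures = [];
// Only request a feature when the adapter actually supports it.
if (adapter.features.has("timestamp-query")) {
  requiredFeatures.push("timestamp-query");
}
const deviceWithFeatures = await adapter.requestDevice({ requiredFeatures });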

Pipelines define the structure on which work will be performed. There is GPURenderPipeline to render graphics and GPUComputePipeline to perform general computations, some of which may be more efficient on the GPU than on the CPU. This structure includes defining the required inputs, which are transferred to the shader functions through GPUBuffers or GPUTextures. Related inputs can be logically grouped in GPUBindGroups. Render pipelines use the given data to render graphics onto some output texture, e.g. an HTML canvas element. Compute pipelines just take the data from buffers, do some math, and return the results to us through buffers.
Shaders, named so for historical reasons, are the functions that run on the GPU. Here comes one complication: there is a dedicated language to write them in, WGSL (WebGPU Shading Language, not the most creative name), which has an uncanny resemblance to Rust. To provide our WGSL code to a pipeline we wrap it in a GPUShaderModule.
There are 3 kinds of shaders:

  • vertex shader runs for each vertex, returns its position and can pass data to the fragment shader. Used in render pipeline
  • fragment shader runs for every pixel between the vertices and returns the color. Used in render pipeline
  • compute shader performs calculations and returns data to us by writing to buffers. Used in compute pipeline

Passes are responsible for putting things together. They specify which pipelines will be used and actually provide the data for them. Corresponding to the two kinds of pipelines, there are render passes and compute passes, recorded through GPURenderPassEncoder and GPUComputePassEncoder.
GPUCommandEncoder is the master of the passes: it encodes them, and once it is finished it can be submitted to the GPUQueue to finally be sent for execution by the GPU.
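
To see how these pieces fit together, here is a rough sketch of the overall flow, assuming a device, a pipeline, a vertex buffer and a render pass descriptor already exist; the render example below builds each of them for real.

const encoder = device.createCommandEncoder();
const pass = encoder.beginRenderPass(renderPassDescriptor); // or beginComputePass()
pass.setPipeline(pipeline);            // the structure: which pipeline to run
pass.setVertexBuffer(0, vertexBuffer); // the data it will consume
pass.draw(vertexCount);                // record the actual work
pass.end();
device.queue.submit([encoder.finish()]); // hand the encoded commands to the GPU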


Render Example

Let our aim be to draw a red triangle. We need a texture to draw it onto so we will use an HTML canvas element.

const canvas = document.createElement("canvas");
canvas.width = 640;
canvas.height = 480;
document.body.appendChild(canvas);
const context = canvas.getContext("webgpu");
if (context === null) {
  console.log("Failed to get rendering context.");
  return;
}
context.configure({
  device,
  format: "bgra8unorm",
});

In modern graphics the triangle is the most popular primitive, a universal building unit of any shape. Every triangle is described by 3 vertices, and every vertex is described by its coordinates. We can build a circle from triangles: the more triangles we use, the harder they are to spot. So to draw our triangle we need to specify 3 vertices and their coordinates. WebGPU uses a coordinate system called normalized device coordinates (NDC).

x \in (-1,1) \; y \in (-1,1) \; z \in (0,1)

However, the output of a vertex shader is a position in so-called clip space. There are 4 clip space coordinates: x, y, z, w. The relation between clip space coordinates and NDC is as follows, where p is a clip space position.

p = (p.x, p.y, p.z, p.w)
ndc = (\frac{p.x}{p.w}, \frac{p.y}{p.w}, \frac{p.z}{p.w})

If the clip space w coordinate is set to 1, the transformation to NDC does not change anything. If the clip space coordinates output by our vertex shader, transformed to NDC as above, fall outside the given ranges, the vertex will not be rendered. Let's look at some examples.

p_1 = (0,0,0,1)
ndc_1 = (\frac{0}{1},\frac{0}{1},\frac{0}{1}) = (0,0,0)
p_2 = (0.5,1.5,0,1)
ndc_2 = (\frac{0.5}{1},\frac{1.5}{1},\frac{0}{1}) = (0.5,1.5,0)
p_3 = (2.0,3.0,1,10)
ndc_3 = (\frac{2.0}{10},\frac{3.0}{10},\frac{1}{10}) = (0.2,0.3,0.1)
ndc_1 and ndc_3 are visible and ndc_2 is not, because its y coordinate is above 1.
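
The same check can be written in a few lines of JavaScript. clipToNdc and isVisible are purely illustrative helpers, not part of the tutorial code.

// Illustrative helpers: convert a clip space position to NDC
// and check whether it lands inside the visible ranges.
function clipToNdc([x, y, z, w]) {
  return [x / w, y / w, z / w];
}
function isVisible([x, y, z]) {
  return x >= -1 && x <= 1 && y >= -1 && y <= 1 && z >= 0 && z <= 1;
}
console.log(isVisible(clipToNdc([0, 0, 0, 1])));      // true
console.log(isVisible(clipToNdc([0.5, 1.5, 0, 1])));  // false, y = 1.5 is out of range
console.log(isVisible(clipToNdc([2.0, 3.0, 1, 10]))); // true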

We want to draw one simple triangle; 2D is enough for this, so we can ignore the z coordinate. We will also always set w to 1. The standard input for a vertex shader is a vertex buffer with the vertices inside it. Both x and y coordinates need to be between -1 and 1. Let us set:
v_0 = (-0.75, -0.75, 0, 1)
v_1 = (0, 0.75, 0, 1)
v_2 = (0.75, -0.75, 0, 1)
To transmit our vertex data correctly we need to put it into a Float32Array.

const VERTICES = Float32Array.from([
    -0.75, -0.75, 0, 1,
    0, 0.75, 0, 1,
    0.75, -0.75, 0, 1,
]);

Now, all data that goes into and out of shaders lives in buffers. We must create a buffer and write our data into it!

const vertexBuffer = device.createBuffer({
  size: VERTICES.byteLength,
  // VERTEX so it can feed the vertex shader, COPY_DST so the queue can write into it
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});

We are working with pretty low-level stuff here: every buffer must have its size specified when it is created. There is also usage, which specifies what the buffer can actually be used for. More than one usage can be specified, though some combinations are not allowed; because we write data into this buffer from the CPU, it needs COPY_DST in addition to VERTEX. After creation we write data into it using the GPUQueue, a special structure that moves data across the CPU-GPU boundary.

device.queue.writeBuffer(vertexBuffer, 0, VERTICES);

You can imagine that when you use vertexBuffer later, you are just passing around a memory address; you cannot read its data directly, you must go through an intermediary.
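
If you did want to read the data back, the usual pattern is to copy it into a separate buffer created for mapping. A minimal sketch, not needed for this example, and assuming vertexBuffer was additionally created with the COPY_SRC usage:

const readBuffer = device.createBuffer({
  size: VERTICES.byteLength,
  usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST, // mappable on the CPU, copy target on the GPU
});
const copyEncoder = device.createCommandEncoder();
copyEncoder.copyBufferToBuffer(vertexBuffer, 0, readBuffer, 0, VERTICES.byteLength);
device.queue.submit([copyEncoder.finish()]);
await readBuffer.mapAsync(GPUMapMode.READ); // resolves once the GPU is done and the buffer is mapped
const data = new Float32Array(readBuffer.getMappedRange().slice(0));
readBuffer.unmap();
console.log(data); // the same values we wrote in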
For this example we will use only this one buffer; it will be the only data transmitted between the CPU and the GPU. Now we know what our vertex shader's input will be and what its output ought to be (clip space coordinates). Create a shader.wgsl file in the same folder as your JS file.

@vertex
fn vs(@location(0) position: vec4<f32>) -> @builtin(position)  vec4<f32> {
    return position;
}

@fragment
fn fs(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> {
    return vec4<f32>(1.0, 0.0, 0.0, 1.0);
}

Sometimes, as in this example, the vertex and fragment shaders are located in the same file; the vertex shader is annotated with @vertex and the fragment shader with @fragment. The names themselves, here vs and fs, are not important. We perform no transformations in our vertex shader, we just output our input! Why we annotate the position parameter with @location(0) will become clear soon: it is something we specify in our CPU code. The fragment shader outputs an rgba color on a 0-1 scale, so (1, 0, 0, 1) should give us red. We pack our shader code into a GPUShaderModule.

const shaderCode = await fetch("./shader.wgsl").then((res) => res.text());
const shaderModule = device.createShaderModule({
  code: shaderCode,
});

GPU workload is structured by pipelines. We want to render graphics, therefore a GPURenderPipeline will be used, and somewhere in it we must mention our cute little buffer and the shader code that should process it. Let's create our pipeline's descriptor. Descriptors are used throughout WebGPU as the only input to the functions that create WebGPU objects.

const pipelineDescriptor: GPURenderPipelineDescriptor = {
  vertex: {
    module: shaderModule,
    entryPoint: "vs",
    buffers: [
      {
        arrayStride: 16,
        stepMode: "vertex",
        attributes: [
          {
            shaderLocation: 0,
            offset: 0,
            format: "float32x4",
          },
        ],
      },
    ],
  },
  fragment: {
    module: shaderModule,
    entryPoint: "fs",
    targets: [
      {
        format: navigator.gpu.getPreferredCanvasFormat(),
      },
    ],
  },
  primitive: {
    topology: "triangle-list",
  },
  layout: "auto",
};

Our pipeline descriptor has a vertex attribute that specifies our vertex buffers and where they will be used. The "where" is given by module, which contains the vertex shader code, and entryPoint, the vertex shader's name. Our buffers list has only one element. arrayStride is in bytes and specifies how long the data of a single vertex in our buffer is: we use a Float32Array (32 bits per value), every vertex is specified by 4 values, and 32 bits * 4 = 128 bits = 16 bytes. stepMode can take two values: "vertex" and "instance". Vertex step mode means that the shader reads the next piece of data from the buffer for every vertex. Instance step mode means that every vertex inside one instance (in our case a triangle) uses the same data, and the next piece of data is read for the next instance.
In the buffer's attributes we specify the parts of each vertex's data; we have only one, the position. We assign it location 0, which is where @location(0) in our shader input comes from. offset is in bytes from the beginning of a single vertex's data to this attribute; we have only one attribute with nothing before it, so the offset is 0. format corresponds to the format of the array our buffer was written from (Float32Array, 4 values per vertex, hence "float32x4"). We will leave the rest of the descriptor for later; look only at primitive and its topology, where we specify that we want to operate on triangles, so every instance means 3 vertices.
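
As a hypothetical illustration, not used in this example: if each vertex carried a position followed by a color, interleaved in one buffer, the layout could look like this.

const positionAndColorLayout = {
  arrayStride: 32, // 16 bytes of position + 16 bytes of color per vertex
  stepMode: "vertex",
  attributes: [
    { shaderLocation: 0, offset: 0, format: "float32x4" },  // position, at the start of each vertex
    { shaderLocation: 1, offset: 16, format: "float32x4" }, // color, 16 bytes after the position
  ],
};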
We have our pipeline descriptor so let's create a pipeline!

const renderPipeline = device.createRenderPipeline(pipelineDescriptor);

Now let's put our structure and filling together. Create command encoder!

const commandEncoder = device.createCommandEncoder();

We will record our instructions in it and later send them via the GPUQueue to the GPU. The hierarchy is a bit more extensive: GPUCommandEncoder manages the render passes and compute passes. We are only interested in rendering, so let's begin a render pass, but first... a descriptor.

const renderPassDescriptor: GPURenderPassDescriptor = {
  colorAttachments: [
    {
      clearValue: { r: 0, g: 0.5, b: 0, a: 1 },
      loadOp: "clear",
      storeOp: "store",
      view: context.getCurrentTexture().createView(),
    },
  ],
};

GPURenderPassDescriptor has more attributes, but these basics are enough for now. In clearValue we specify the color with which our texture will be filled right before drawing other things on it; loadOp: "clear" performs that fill at the start of the pass, and storeOp: "store" keeps the results in the texture afterwards. In view we at last tell the GPU where things should be drawn. Now we create a GPURenderPassEncoder from the descriptor.

const renderPass = commandEncoder.beginRenderPass(renderPassDescriptor);

We use this GPURenderPassEncoder to pass our structure and data to it.

renderPass.setPipeline(renderPipeline);
renderPass.setVertexBuffer(0, vertexBuffer);
renderPass.draw(3);

We set the previously prepared pipeline, assign our vertexBuffer to the slot specified earlier in the pipeline's buffers list, and draw 3 vertices. If we tried to draw more, we would try to access nonexistent data and get an error.
But nothing happens yet. Our render pass began but never ended.

renderPass.end();

Still nothing. We still have to put the GPU to work through our favorite intermediary, the GPUQueue!

device.queue.submit([commandEncoder.finish()]);
[Screenshot: the red triangle rendered on the canvas]
Success!