dusk/.claude/display-refactor.md
2026-06-18 14:59:21 -05:00

Display Layer Refactor

Vision

The goal is to remove the implicit assumption that all platforms render through a GL-like API, and replace it with a system where each platform owns its rendering stack completely. The scene describes what to draw in platform-neutral terms; the platform decides how to draw it.

This unlocks:

  • Saturn (VDP1/VDP2 command-list, no Z-buffer, affine-only)
  • PlayStation 1 (ordering table, affine textures, GTE fixed-point, CMake SDK)
  • Nintendo 64 (RSP display list, hardware Z-buffer, perspective-correct, real FPU -- closer to modern GL than to Saturn)
  • SNES (PPU tile engine, Mode7 for overworld, no real 3D)
  • Vulkan (explicit, modern, no legacy GL baggage)
  • Native PSP GU (drop PSPGL which is just a compatibility shim)
  • Legacy fixed-function GL as its own standalone target
  • A real first-class 2D UI system not bolted onto 3D space

Why

The current abstraction assumes GPU-style rendering

The current display layer was designed around a GL-like mental model: vertex buffers, shaders, Z-buffered triangle rasterization, and texture objects. duskgl implements this with real OpenGL. duskdolphin does its own GX thing but still matches the same interface (mesh, shader, texture, framebuffer). PSP uses PSPGL -- a library that emulates GL on top of the PSP's native GE/GU hardware, which is entirely different underneath.

Problems this creates:

PSPGL is a lie. The PSP has a native graphics engine (GE/GU) with its own command list, its own vertex formats, and its own display list model. PSPGL translates GL calls into GU calls, but imperfectly -- and we end up paying the abstraction cost without getting GL correctness. Writing directly to GU gives better performance, access to native formats, and correct behavior on edge cases that PSPGL gets wrong.

Legacy GL should not share code with modern GL. The fixed-function pipeline (no shaders, matrix stacks via glMatrixMode, glTexEnv) is meaningfully different from modern GL (VAO/VBO, GLSL, explicit uniform locations). Treating them as "the same thing with a flag" creates a tangle of #ifdef DUSK_OPENGL_LEGACY guards throughout the rendering code. They are separate targets and should be separate platform directories.

Saturn cannot fit the model at all. VDP1 is a command-list processor: you write 32-byte command structs (sprites, quads, lines) into VRAM, then poke a register to trigger execution. There are no vertex buffers, no shaders, no Z-buffer. Depth is pure painter's algorithm -- command order IS the depth. VDP2 composites up to 6 background planes at scanline time; these are tile maps and rotation parameter tables, not meshes. Nothing about the current API maps onto this hardware.

SNES is even further removed. The PPU renders tiles. VRAM holds 8x8 or 16x16 pixel tiles and tile maps; the PPU references these during scanline rendering. There are no draw calls. Mode7 is an affine transform applied to a single background layer (the basis for the overworld map and road perspective effects). Sprites are entries in OAM (Object Attribute Memory). The 65816 CPU writes to memory-mapped registers and VRAM; the PPU does the rest. The concept of "mesh" or "shader" is meaningless here.

Textures loaded as RGBA waste memory and exclude platforms. Loading every texture as 32-bit RGBA and converting at runtime is expensive on memory-constrained platforms (Saturn has 2 MB of main work RAM; SNES has 64 KB VRAM) and simply wrong for platforms that have native formats incompatible with RGBA (e.g., PSP's ABGR8888 / BGR5650, Saturn's RGB555 / CI4 / CI8, SNES's 2bpp/4bpp/8bpp indexed). The asset pipeline must compile textures to platform-native formats at build time.

UI in 3D space is wasteful and limiting. Currently UI elements are rendered as geometry projected into screen space, going through the full 3D pipeline. On platforms with dedicated 2D hardware (Saturn VDP2, SNES BG layers), this is actively wrong -- UI should map to a hardware plane, not a 3D draw call. On modern platforms it should be a clean screen-space pass that never touches the 3D depth buffer.


Current Model (Summary)

Scene
  -> shaderBind(shader)
  -> textureBind(texture)
  -> meshDraw(mesh)          <-- immediate draw call per object
  -> meshDraw(mesh)
  -> ...
Platform receives each draw call immediately.
Depth is handled by Z-buffer hardware.
All textures live in GPU memory as RGBA (or Dolphin's tiled RGBA).
UI is rendered as 3D geometry with an orthographic projection.

Key current concepts:

  • mesh_t -- vertex array (triangles/quads), in GPU VBO (GL) or CPU memory (Dolphin)
  • shader_t -- GLSL program (modern GL), GL fixed-function state (legacy GL), or GX matrix + TEV config (Dolphin)
  • texture_t -- GPU texture handle (GL) or tiled CPU buffer (Dolphin); always RGBA at the engine level
  • framebuffer_t -- FBO (GL) or fixed hardware XFB (Dolphin)
  • spritebatch_t -- accumulates 2D quads and flushes in batches of 32; the only existing deferred-submission system in the engine

The spritebatch hints at the right model. Everything needs to work this way.


The Core Shift: Platform-Native Rendering

Before

src/dusk/       Core engine + GL-like rendering API definition
src/duskgl/     OpenGL implementation
src/dusksdl2/   SDL2 window/input (shared)
src/duskpsp/    PSP via PSPGL (shim over GU)
src/duskvita/   Vita via GL ES (similar path to duskgl)
src/duskdolphin/ GameCube/Wii via GX (already custom)
src/dusklinux/  Linux (uses dusksdl2 + duskgl)

After

src/dusk/          Core engine logic + render intent API ONLY
src/dusksdl2/      SDL2 window/input (unchanged)
src/duskgl/        Modern OpenGL (Linux, Vita modern path)
src/duskgllegacy/  Fixed-function OpenGL (older hardware, PSP with PSPGL
                   as a last resort)
src/duskvulkan/    Vulkan (Linux modern, future)
src/duskpsp/       PSP native GU (no PSPGL, direct command lists)
src/duskvita/      Vita native GXM (TBD)
src/duskdolphin/   GameCube/Wii GX (already custom, mostly kept)
src/dusksaturn/    Saturn VDP1/VDP2 (new)
src/duskps1/       PlayStation 1 ordering table + GTE (new)
src/duskn64/       Nintendo 64 RSP/RDP display list (new)
src/dusksnes/      SNES PPU/Mode7 (new, extremely constrained)

src/dusk/ no longer knows about meshes, shaders, or framebuffers. It defines the render intent system: what the scene wants to draw. Each platform directory is entirely self-contained and responsible for translating intents to its native API.


Render Intent System (new)

Instead of the scene calling meshDraw() or shaderBind(), it submits render intents into a renderqueue_t. An intent describes what should appear on screen without prescribing how to draw it.

Primitive intents (3D world)

RENDER_INTENT_QUAD      -- textured quad, 4 vertices or transform + size
RENDER_INTENT_POLYGON   -- filled polygon (convex, up to N vertices)
RENDER_INTENT_LINE      -- line segment or polyline
RENDER_INTENT_SPRITE    -- 2D billboard (always faces camera)
RENDER_INTENT_MESH      -- arbitrary vertex array (GL/GX only; degraded
                           on command-list platforms)

Each intent carries: texture reference, color/tint, depth hint (for painter's algorithm sorting), blend mode, and cull flags.
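
As a minimal sketch of what the queue could look like, the fields listed above map onto a small struct that the scene appends to a fixed-capacity array. Everything except the RENDER_INTENT_* names is an assumption made for illustration: the field names, field widths, RENDERQUEUE_CAPACITY, and the renderQueueReset / renderQueueSubmit functions are hypothetical and not part of the engine today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Intent kinds come from the list above; all other names are hypothetical. */
typedef enum {
  RENDER_INTENT_QUAD,
  RENDER_INTENT_POLYGON,
  RENDER_INTENT_LINE,
  RENDER_INTENT_SPRITE,
  RENDER_INTENT_MESH
} renderintenttype_t;

typedef struct {
  renderintenttype_t type;
  uint16_t textureId;  /* opaque handle, resolved by the platform */
  uint32_t color;      /* tint */
  int32_t depthHint;   /* assumed: larger = farther from the camera */
  uint8_t blendMode;
  uint8_t cullFlags;
} renderintent_t;

#define RENDERQUEUE_CAPACITY 256

typedef struct {
  renderintent_t intents[RENDERQUEUE_CAPACITY];
  size_t count;
} renderqueue_t;

/* Empty the queue at the start of a frame. */
void renderQueueReset(renderqueue_t *queue) {
  assert(queue != NULL);
  queue->count = 0;
}

/* Append a copy of the intent. Returns 1 on success, 0 if the queue is full. */
int renderQueueSubmit(renderqueue_t *queue, const renderintent_t *intent) {
  assert(queue != NULL && intent != NULL);
  if (queue->count >= RENDERQUEUE_CAPACITY) return 0;
  queue->intents[queue->count++] = *intent;
  return 1;
}
```

A fixed array keeps the queue usable on targets without a general-purpose allocator; the platform flush would walk intents[0..count) once per frame.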

Background plane intents (2D layers)

RENDER_INTENT_BGPLANE   -- configure a background/tilemap layer

Carries: layer index, tile map data reference, scroll offset, palette, and transform (for Mode7-style affine).

UI intents (screen space)

RENDER_INTENT_UI_RECT   -- solid colored rectangle
RENDER_INTENT_UI_SPRITE -- textured rectangle (UI image)
RENDER_INTENT_UI_TEXT   -- text string at screen position

UI intents are always screen-space. They are never mixed into the 3D world queue. See UI System section below.

Platform translation

| Intent | Modern GL | PSP GU | Saturn VDP1 | PS1 OT | N64 RSP | SNES PPU |
|---|---|---|---|---|---|---|
| QUAD | VAO + glDraw | GU display list | distorted-sprite cmd | GPU quad packet | RSP display list | OAM + BG tile |
| POLYGON | VAO + glDraw | GU display list | polygon cmd | GPU poly packet | RSP display list | OAM |
| BGPLANE | fullscreen quad | fullscreen quad | VDP2 config | fullscreen quad | fullscreen quad | BG layer config |
| UI_SPRITE | 2D ortho quad | 2D GU quad | VDP2 BG plane | GPU rect packet | RDP rectangle | BG layer tile |
| MESH | VAO/VBO | GU buffers | (degrade: quads) | (degrade: tris/quads) | RSP display list | (not supported) |

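Note: N64 supports both triangles and axis-aligned rectangles natively via RDP. PS1 supports triangles and quads (4-vertex) natively. Neither needs the dead-vertex trick that Saturn requires, where a triangle is submitted as a quad with two coincident vertices because VDP1 only draws four-cornered primitives.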


Asset Pipeline: Platform-Native Formats

The problem

All textures currently enter the engine as RGBA and are converted at runtime by each platform (Dolphin retiles to 4x4 blocks; GL uploads as-is). This wastes memory and CPU time, and is impossible for platforms where RGBA is not a valid intermediate format at all.

The solution

The asset compiler (offline, run at build time) produces platform-specific binary bundles. A texture asset has one source (PNG or similar) but N compiled outputs, one per target.
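
To illustrate the kind of conversion the offline compiler would run, here is a minimal sketch that packs RGBA8 pixels into 15-bit color. The function names are hypothetical, and the bit order (red in the low five bits) and the handling of bit 15 are assumptions that have to be checked against each target's hardware documentation; byte order on output is also left to the tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit channels into a 15-bit texel with red in the low 5 bits.
 * Bit 15 is left clear; its meaning differs per platform (assumption). */
uint16_t texturePack555(uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((b >> 3) << 10) | ((g >> 3) << 5) | (r >> 3));
}

/* Convert a tightly packed RGBA8 image to 16-bit texels. Alpha is ignored. */
void textureConvertRgba8To555(const uint8_t *rgba, uint16_t *out, size_t pixelCount) {
  assert(rgba != NULL && out != NULL);
  for (size_t i = 0; i < pixelCount; i++) {
    out[i] = texturePack555(rgba[i * 4 + 0], rgba[i * 4 + 1], rgba[i * 4 + 2]);
  }
}
```

Running this at build time halves the texture footprint relative to RGBA8 and removes the per-load conversion cost on the target.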

Texture formats by platform

| Platform | Native Formats | Notes |
|---|---|---|
| Modern GL | RGBA8, RGB8, BC1-BC7 (compressed) | Upload directly, GPU handles |
| Legacy GL | RGBA8, RGB8, CI8 (palette via extension) | No compressed formats |
| Vulkan | VkFormat variants (RGBA8, BC, ASTC) | Chosen at compile time |
| PSP GU | ABGR8888, BGR5650, ABGR1555, ABGR4444, CI4, CI8 | Native swizzled format |
| Saturn VDP1/VDP2 | RGB555, CI4, CI8 (15-bit palette in CRAM) | Big-endian, packed |
| PlayStation 1 | RGB555 / CI4 / CI8 (CLUT in VRAM) | Little-endian; VRAM flat; CLUT at coord |
| Nintendo 64 | RGBA16, RGBA32, IA4-IA16, I4-I8, CI4, CI8 | 4 KB TMEM; tiles must fit in TMEM banks |
| GameCube/Wii GX | I4, I8, IA4, IA8, RGB565, RGB5A3, RGBA8, CMPR | 4x4 tiled, big-endian |
| SNES PPU | 2bpp, 4bpp, 8bpp indexed (CGRAM palette) | Tile-packed, no direct access |

Asset bundle structure

The .dsk bundle gains a platform tag. The loader picks the right section at runtime (or the build produces a single-platform bundle for constrained targets like SNES/Saturn where there is no spare storage for unused data).
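
One possible shape for the runtime side of this is a small section table at the start of the bundle that the loader scans for its own platform tag. The layout below is purely an assumption for illustration: the tag values, the bundlesection_t fields, and bundleFindSection are hypothetical and do not describe the current .dsk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform tags; the real values are not defined yet. */
enum {
  BUNDLE_PLATFORM_GL = 1,
  BUNDLE_PLATFORM_PSP = 2,
  BUNDLE_PLATFORM_SATURN = 3
};

/* One entry per compiled output of an asset (assumed layout). */
typedef struct {
  uint32_t platformTag;
  uint32_t offset; /* byte offset of the section inside the bundle */
  uint32_t size;   /* section length in bytes */
} bundlesection_t;

/* Return the section compiled for platformTag, or NULL if the bundle has none. */
const bundlesection_t *bundleFindSection(
  const bundlesection_t *sections, size_t count, uint32_t platformTag
) {
  assert(sections != NULL || count == 0);
  for (size_t i = 0; i < count; i++) {
    if (sections[i].platformTag == platformTag) return &sections[i];
  }
  return NULL;
}
```

A single-platform bundle for SNES or Saturn would simply carry one entry, so the same lookup works in both cases.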


UI System (first-class)

Current problem

UI elements go through the 3D pipeline: they are meshes with an orthographic shader, rendered in the same pass as the world. This means:

  • UI competes for Z-buffer depth with world geometry
  • On Saturn/SNES, UI cannot use dedicated hardware planes
  • Text rendering is tied to the sprite batch which is tied to the 3D pass
  • No separation between "draw the world" and "draw the HUD"

New model

UI is a completely separate rendering context. The world renders first, then the UI renders on top. They share no state.

UI coordinates are always in screen space (pixels or a logical resolution that the platform scales to its native display size). No camera matrix, no projection, no depth buffer involvement.

Platform mapping

| Platform | UI implementation |
|---|---|
| Modern GL | Separate 2D ortho pass, screen-space quads, no depth test |
| Legacy GL | Same, using fixed-function |
| PSP GU | Separate GU display list, 2D mode |
| Saturn | VDP2 background plane(s) dedicated to UI |
| PlayStation 1 | Separate GPU packet chain, no Z; ordered after world OT |
| Nintendo 64 | RDP rectangle commands in a separate display list segment |
| GameCube/Wii | GX 2D mode or dedicated GX pass |
| SNES | Dedicated BG layer(s) for HUD tiles |

On Saturn, the UI occupying VDP2 planes is a genuine hardware win -- VDP2 composites it for free at scanline time, costing zero VDP1 commands. On SNES, the HUD must live in a BG layer because there is no alternative.

UI API (proposed)

uiBegin();
  uiDrawRect(x, y, w, h, color);
  uiDrawSprite(x, y, w, h, texture, uvMin, uvMax);
  uiDrawText(x, y, font, string);
uiEnd();   // platform flushes UI to hardware

The uiBegin/uiEnd block collects intents; the platform submits them at frame end in whatever way is appropriate.
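
A minimal sketch of that collect-then-flush behavior, assuming a fixed-size buffer and a platform-supplied flush callback. Only uiBegin, uiDrawRect, uiEnd, and RENDER_INTENT_UI_* come from this document; uiintent_t, UI_INTENT_MAX, uiSetPlatformFlush, and the stub backend are hypothetical names used for illustration, and the coordinate type is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
  RENDER_INTENT_UI_RECT,
  RENDER_INTENT_UI_SPRITE,
  RENDER_INTENT_UI_TEXT
} uiintenttype_t;

typedef struct {
  uiintenttype_t type;
  int16_t x, y, w, h; /* screen-space, logical pixels */
  uint32_t color;
} uiintent_t;

#define UI_INTENT_MAX 128

/* Hypothetical hook: each platform registers how it submits UI to hardware. */
typedef void (*uiflushfn_t)(const uiintent_t *intents, size_t count);

static uiintent_t uiIntents[UI_INTENT_MAX];
static size_t uiIntentCount = 0;
static int uiActive = 0;
static uiflushfn_t uiPlatformFlush = NULL;

void uiSetPlatformFlush(uiflushfn_t fn) { uiPlatformFlush = fn; }

void uiBegin(void) {
  assert(!uiActive);
  uiActive = 1;
  uiIntentCount = 0;
}

void uiDrawRect(int16_t x, int16_t y, int16_t w, int16_t h, uint32_t color) {
  assert(uiActive);
  if (uiIntentCount >= UI_INTENT_MAX) return; /* drop when full */
  uiintent_t *intent = &uiIntents[uiIntentCount++];
  intent->type = RENDER_INTENT_UI_RECT;
  intent->x = x; intent->y = y; intent->w = w; intent->h = h;
  intent->color = color;
}

void uiEnd(void) {
  assert(uiActive);
  if (uiPlatformFlush != NULL) uiPlatformFlush(uiIntents, uiIntentCount);
  uiActive = 0;
}

/* Stub backend for illustration: records how many intents were flushed. */
static size_t uiStubLastCount = 0;
void uiFlushStub(const uiintent_t *intents, size_t count) {
  (void)intents;
  uiStubLastCount = count;
}
```

Because the UI buffer is separate from the world queue, a Saturn backend could turn the same intents into VDP2 tile writes while a GL backend draws screen-space quads.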


SNES / Mode7

SNES is the most constrained platform the engine will ever support and needs its own section because it breaks assumptions that even Saturn keeps.

Hardware

  • CPU: 65816 @ ~3.58 MHz (16-bit, no FPU, no cache)
  • PPU: Tile-based scanline renderer. VRAM holds tile graphics and tile maps. BG layers reference tiles by index.
  • Mode7: A single BG layer with a 2D affine matrix applied per scanline. Used for overworld maps, road perspective (F-Zero), rotation effects. The matrix is set via HDMA (scanline DMA) for per-scanline variation, enabling horizon-perspective effects.
  • Sprites/OAM: Up to 128 sprites (8x8, 16x16, 32x32, 64x64 pixels), 4bpp indexed, up to 32 per scanline.
  • Palette: CGRAM holds 256 entries of 15-bit RGB (512 bytes total). BG layers use sub-palettes of 4/16/256 colors depending on bit depth.
  • VRAM: 64 KB (tiles + tile maps)
  • WRAM: 128 KB work RAM + usually 8 KB SRAM on cart for saves
  • No frame buffer. The PPU renders scanlines directly. You cannot read back what was drawn.
  • No general-purpose draw calls. You configure registers and VRAM before the frame and the PPU does the rest.

What "3D" means on SNES

True 3D is not possible. What can be approximated:

  • Overworld map: Mode7 with a flat texture and HDMA scroll gives a top-down perspective with a horizon line (the classic JRPG overworld).
  • Depth illusion: Mode7 matrix manipulation can simulate a moving camera over flat terrain. Objects are sprites placed at screen positions calculated by software perspective projection.
  • Sprite scaling: Software-scaled sprites using pre-rendered frames, or the coprocessor-driven rendering used in Super FX games (Star Fox). Super FX is a co-processor on the cartridge -- base SNES cannot do this.
  • Basic 3D effects: Some games use HDMA color gradient + Mode7 floor with overlaid sprites to create a pseudo-3D look.

The engine plan for SNES: Mode7 overworld (confirmed), sprite-based world objects, BG layer UI. "Basic 3D effects" (pseudo-perspective with sprites) is aspirational -- implementation complexity TBD.

SNES constraints on the engine

  • No dynamic allocation. With 128 KB WRAM, a general-purpose allocator is risky. The engine memory system may need a static pool mode for SNES.
  • No floating point. float_t must resolve to integer or fixed-point.
  • No scripting (JerryScript). The JS engine requires far more than 128 KB RAM. SNES scenes must be compiled C.
  • Asset data in ROM, not a .dsk bundle. SNES loads from cartridge ROM mapped into the address space. The asset system needs a ROM-mapped loader.
  • Tile pipeline. Textures must be pre-converted to SNES tile format (2bpp/4bpp/8bpp, 8x8 pixel tiles, CGRAM palette) at build time. This is a completely different asset output from every other platform.
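
As an illustration of the tile pipeline, the sketch below encodes one 8x8 tile of palette indices into the planar 4bpp layout: 32 bytes, with bitplanes 0 and 1 interleaved row by row in the first 16 bytes and bitplanes 2 and 3 in the last 16. The function name snesTileEncode4bpp is hypothetical; palette assignment and tile-map generation are separate build steps not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode 64 palette indices (0-15, row-major) as one SNES 4bpp tile.
 * The leftmost pixel of each row maps to bit 7 of the plane byte. */
void snesTileEncode4bpp(const uint8_t pixels[64], uint8_t out[32]) {
  assert(pixels != NULL && out != NULL);
  memset(out, 0, 32);
  for (int row = 0; row < 8; row++) {
    for (int col = 0; col < 8; col++) {
      uint8_t index = pixels[row * 8 + col] & 0x0F;
      uint8_t bit = (uint8_t)(0x80 >> col);
      if (index & 1) out[row * 2 + 0] |= bit;      /* bitplane 0 */
      if (index & 2) out[row * 2 + 1] |= bit;      /* bitplane 1 */
      if (index & 4) out[16 + row * 2 + 0] |= bit; /* bitplane 2 */
      if (index & 8) out[16 + row * 2 + 1] |= bit; /* bitplane 3 */
    }
  }
}
```

The asset compiler would run this over every tile of a source image and emit the result straight into the ROM image, so the runtime only ever copies finished tiles to VRAM.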

Platform Inventory

A summary of what each platform's native rendering looks like after the refactor, for reference when designing the intent API.

Modern OpenGL (duskgl)

VAO + VBO mesh storage, GLSL shaders, FBO render targets, Z-buffer. No fixed-function. Targets: Linux, and possibly Vita as a fallback (native GXM is preferred there).

Legacy OpenGL (duskgllegacy)

Fixed-function pipeline: glMatrixMode, glTexEnv, client-side vertex arrays. No VAO/VBO. Used for very old desktop hardware, and possibly PSP as a last resort (PSPGL exposes this same fixed-function API). Targets: legacy desktop, embedded Linux.

Vulkan (duskvulkan)

Explicit pipeline state objects, render passes, descriptor sets, command buffers. Highest ceiling for performance and control. Targets: Linux (modern), future platforms. Not immediate priority but the architecture should not block it.

PSP native GU (duskpsp)

The GE/GU is a display-list GPU. You build a command list in memory and the GU DMA engine processes it asynchronously. Native vertex formats are PSP-specific (ABGR byte order, swizzled textures for cache efficiency). No PSPGL. Targets: PSP hardware and emulators.

Vita (duskvita)

GXM is Sony's Vita GPU API -- closer to modern GL than GU, with explicit shader binaries (.gxp), ring buffers, and GPU sync primitives.

GameCube/Wii GX (duskdolphin)

Already a custom renderer. GX uses immediate-mode vertex submission (GX_Begin / GX_Position1x16 loops), TEV for texture compositing, and hardware XFB double-buffering. Big-endian. Mostly kept as-is; may benefit from being expressed in terms of render intents for consistency.

Saturn VDP1/VDP2 (dusksaturn)

VDP1: command-list (32-byte structs), quad-based, affine texture mapping, no Z-buffer (painter's algorithm). VDP2: up to 6 background planes composited at scanline time. Big-endian dual SH-2, no FPU. Fixed-point math required throughout.
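
Because command order is depth on VDP1, the backend has to sort queued items far-to-near before emitting commands. A minimal sketch using the standard qsort, assuming an integer depth hint where larger means farther; drawitem_t and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
  int32_t depthHint;  /* assumed: larger = farther from the camera */
  uint16_t commandId; /* stand-in for whatever the backend will emit */
} drawitem_t;

/* qsort comparator: farthest first, so nearer items are drawn later
 * and overwrite what is behind them. */
int drawItemCompareBackToFront(const void *a, const void *b) {
  const drawitem_t *ia = (const drawitem_t *)a;
  const drawitem_t *ib = (const drawitem_t *)b;
  if (ia->depthHint > ib->depthHint) return -1;
  if (ia->depthHint < ib->depthHint) return 1;
  return 0;
}

void drawItemsSortBackToFront(drawitem_t *items, size_t count) {
  assert(items != NULL || count == 0);
  qsort(items, count, sizeof(drawitem_t), drawItemCompareBackToFront);
}
```

qsort is not guaranteed to be stable, so items with equal depth may swap order between frames; a production backend would probably want a stable sort or a secondary key to avoid flicker.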

PlayStation 1 (duskps1)

MIPS R3000A @ 33.87 MHz, little-endian, no FPU. GTE (coprocessor 2) handles fixed-point matrix math, perspective divide, and lighting. GPU receives packets via DMA linked-list (the Ordering Table). Primitives: triangles and quads natively (no dead-vertex needed). Texture mapping: affine, same limitation as Saturn. No Z-buffer; depth is OT slot order. VRAM is 1 MB flat (frame buffers + textures + CLUTs share it). SDK: PSn00bSDK, which is CMake-native -- a direct fit for the dusk build system.
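
The ordering-table idea can be modeled on the CPU as an array of linked-list heads indexed by quantized depth. This is only a sketch of the bucketing logic under assumed names (otprim_t, orderingtable_t, otInsert, otCollectDrawOrder); real PS1 code links GPU packets in memory for DMA traversal rather than walking the table in C, and would avoid the 64-bit division used here for clarity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OT_LENGTH 8 /* tiny for illustration; real tables are far larger */

typedef struct otprim_s {
  struct otprim_s *next;
  uint16_t id; /* stand-in for a GPU packet */
} otprim_t;

typedef struct {
  otprim_t *slots[OT_LENGTH];
} orderingtable_t;

void otClear(orderingtable_t *ot) {
  assert(ot != NULL);
  for (size_t i = 0; i < OT_LENGTH; i++) ot->slots[i] = NULL;
}

/* Map depth in [0, maxDepth] to a slot and push the primitive onto it. */
void otInsert(orderingtable_t *ot, otprim_t *prim, uint32_t depth, uint32_t maxDepth) {
  assert(ot != NULL && prim != NULL);
  if (depth > maxDepth) depth = maxDepth;
  size_t slot = (size_t)(((uint64_t)depth * OT_LENGTH) / ((uint64_t)maxDepth + 1));
  prim->next = ot->slots[slot];
  ot->slots[slot] = prim;
}

/* Walk far-to-near and write primitive ids in draw order. Returns the count. */
size_t otCollectDrawOrder(const orderingtable_t *ot, uint16_t *outIds, size_t maxIds) {
  size_t count = 0;
  assert(ot != NULL && outIds != NULL);
  for (size_t slot = OT_LENGTH; slot-- > 0;) {
    for (const otprim_t *p = ot->slots[slot]; p != NULL; p = p->next) {
      if (count >= maxIds) return count;
      outIds[count++] = p->id;
    }
  }
  return count;
}
```

Insertion is constant time, which is the point of the structure: depth sorting costs one divide (or shift) per primitive instead of a full sort, at the price of precision limited by the slot count.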

Nintendo 64 (duskn64)

VR4300 @ 93.75 MHz, big-endian, real IEEE 754 FPU. Rendering is split between the RSP (geometry: programmable MIPS SIMD, runs microcode up to ~1000 instructions in 4 KB IMEM) and the RDP (rasterization: fixed hardware). RSP produces triangle commands from a CPU-built display list in RDRAM. RDP features: perspective-correct texture mapping, bilinear filtering, hardware Z-buffer. Primitives: triangles and axis-aligned rects. TMEM is 4 KB on-chip texture cache; textures must be loaded into tiles before drawing -- a significant memory management constraint. SDK: libdragon (Unlicense, GCC 14, Makefile-based -- not CMake; this requires a wrapper toolchain file for dusk's build system).
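
The TMEM constraint can be checked at asset-compile time with a simple footprint calculation. The sketch below is a simplification under stated assumptions: rows are padded to 8 bytes, and paletted (CI) textures only get half of TMEM because the palette occupies the other half. The function names are hypothetical, and format-specific rules beyond these two are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define N64_TMEM_BYTES 4096u
#define N64_TMEM_PALETTED_BYTES 2048u /* assumes the palette takes the upper half */

/* Bytes one texture occupies in TMEM, with each row padded to 8 bytes. */
uint32_t n64TmemFootprint(uint32_t width, uint32_t height, uint32_t bitsPerTexel) {
  assert(bitsPerTexel > 0);
  uint32_t rowBytes = ((width * bitsPerTexel + 63u) / 64u) * 8u;
  return rowBytes * height;
}

/* Returns 1 if the texture can be loaded as a single tile, 0 otherwise. */
int n64TextureFitsTmem(uint32_t width, uint32_t height, uint32_t bitsPerTexel, int usesPalette) {
  uint32_t limit = usesPalette ? N64_TMEM_PALETTED_BYTES : N64_TMEM_BYTES;
  return n64TmemFootprint(width, height, bitsPerTexel) <= limit;
}
```

An asset compiler could use a check like this to reject oversized textures or split them into multiple tiles before they ever reach the console.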

SNES PPU/Mode7 (dusksnes)

Tile-based. VRAM holds tiles and tile maps. Mode7 provides affine transform for one BG layer. Sprites via OAM. No frame buffer. All configuration is memory-mapped registers. 65816 CPU, no FPU, extremely limited RAM.


Threading Model

Current model

The engine uses OS threads for async asset loading (assetXxxLoaderAsync). Platforms that have pthreads or an equivalent RTOS (Linux, PSP, Vita) run worker threads that load data in the background while the game loop runs. The main thread polls or blocks on completion.

The problem

Several target platforms have no OS threading whatsoever, and others have hardware-specific async mechanisms that are nothing like pthreads.

Per-platform reality

| Platform | Threading | Async mechanism |
|---|---|---|
| Linux | pthreads | Worker threads (current) |
| Vita | SceKernelThread | Per-SDK threads |
| PSP | SceKernelThread | Per-SDK threads |
| GameCube/Wii | libogc LWP | Lightweight processes |
| Saturn | None (OS) | Slave SH-2 for fixed jobs; CD-ROM via interrupt/callback |
| PlayStation 1 | None (OS) | V-blank ISR, 7 DMA channels, CD-ROM callbacks |
| Nintendo 64 | libdragon preview only | PI DMA for cartridge; RSP for parallel compute |
| SNES | None | DMA (GPDMA/HDMA); NMI V-blank; SPC700 audio is a separate CPU |

Saturn slave SH-2: The second SH-2 is not a general-purpose thread. It runs a fixed subroutine you hand-load. The typical use is offloading heavy per-frame computation (geometry transforms, depth sort) while the master SH-2 handles game logic. Communication is via shared WRAM with cache-through addresses to avoid coherency bugs. There is no scheduler and no yield -- it runs to completion.

SNES DMA: GPDMA copies blocks of data (ROM to WRAM, WRAM to VRAM) and halts the CPU for the duration -- it is synchronous from the game's perspective. HDMA runs per-scanline during H-blank, writing to PPU registers without CPU involvement; this is how Mode7 perspective is achieved. Neither is "async" in the programming sense.

SNES NMI: The V-blank NMI fires at the start of every V-blank period. This is the only safe window to write to VRAM and PPU registers. All critical PPU updates must complete within the V-blank window, which is roughly 2.3 ms on NTSC with 224 visible lines (about 37 scanlines at ~63.5 us each).

Proposed model

Introduce a compile-time threading capability flag:

DUSK_THREAD_PTHREAD    -- Linux, maybe Vita
DUSK_THREAD_SCEKERNEL  -- PSP, Vita SDK
DUSK_THREAD_LWP        -- GameCube/Wii libogc
DUSK_THREAD_SLAVE_SH2  -- Saturn slave CPU (job dispatch only)
DUSK_THREAD_NONE       -- SNES (and Saturn master thread view)

The asset loader's async path is gated on having a threading capability. When DUSK_THREAD_NONE is defined, assetXxxLoaderAsync either does not exist or is an alias for the synchronous version. On Saturn, the slave SH-2 is exposed as a distinct API (sh2JobDispatch, sh2JobWait) used only for compute-heavy work, not for I/O.

Asset loading without threads

Saturn: CD-ROM access is initiated via SBL/CDC routines and completes via interrupt callback. The engine's asset loading loop can poll the callback flag in the main loop rather than blocking a thread. This is interrupt-driven cooperative async, not preemptive.

SNES: There is no loading. Assets live in ROM, mapped directly into the 65816 address space. "Loading a texture" means computing a pointer into ROM and copying the tile data to VRAM during V-blank via GPDMA. The asset system on SNES is essentially a VRAM/CGRAM allocator and a DMA scheduler, not a file loader.

Asset system changes

The asset pipeline needs to accommodate three loading models:

  1. File-based (Linux, PSP, Vita, Saturn CD): open file, read bytes, close. Can be sync or thread-async.
  2. DMA/interrupt (Saturn CD-ROM, GC DVD): initiate transfer, poll or callback on completion, no thread blocked.
  3. ROM-mapped (SNES): data is already in the address space; "loading" is a VRAM DMA copy scheduled for V-blank, not file I/O.

The assetstream_t abstraction that currently wraps file I/O needs a third backend for ROM-mapped data, and the async path needs to support callback-based completion as an alternative to thread-based blocking.
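
A minimal sketch of a completion handle that works for all three loading models: a worker thread, an interrupt callback, or a ROM-mapped loader that completes immediately can each call the same completion function, and the game loop polls. The type and function names (assetrequest_t, assetRequestBegin, assetRequestComplete, assetRequestIsFinished) are hypothetical. Note that volatile alone is not sufficient synchronization for real threads; a threaded backend would need atomics or a lock around the state change.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  ASSET_REQUEST_PENDING,
  ASSET_REQUEST_DONE,
  ASSET_REQUEST_FAILED
} assetrequeststate_t;

typedef struct {
  volatile assetrequeststate_t state; /* written by the completer, read by the poller */
  const void *data;
  size_t size;
} assetrequest_t;

/* Mark the request as in flight before starting the transfer. */
void assetRequestBegin(assetrequest_t *request) {
  assert(request != NULL);
  request->data = NULL;
  request->size = 0;
  request->state = ASSET_REQUEST_PENDING;
}

/* Called by whichever mechanism finished the load. NULL data means failure. */
void assetRequestComplete(assetrequest_t *request, const void *data, size_t size) {
  assert(request != NULL);
  request->data = data;
  request->size = size;
  request->state = (data != NULL) ? ASSET_REQUEST_DONE : ASSET_REQUEST_FAILED;
}

/* Non-blocking check for the main loop. */
int assetRequestIsFinished(const assetrequest_t *request) {
  assert(request != NULL);
  return request->state != ASSET_REQUEST_PENDING;
}
```

On a ROM-mapped target, the loader would call assetRequestBegin and assetRequestComplete back to back with a pointer into ROM, so call sites written against the polling contract behave the same everywhere.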


What Needs to Change

1. Render intent API (new, in src/dusk/)

Replace mesh_t / shader_t / meshDraw() as scene-facing APIs with renderqueue_t and intent submission functions. src/dusk/ defines the intent types and submission API; platforms implement the flush.

2. Platform renderer directories

Move rendering implementations out of duskgl/ as a shared layer and into fully self-contained platform directories. duskgl/ becomes the modern GL platform only. Add duskgllegacy/, duskvulkan/ as peers.

3. Asset pipeline: platform-native texture formats

The offline asset compiler must produce per-platform texture bundles in native formats. The runtime texture loader expects pre-converted data, not RGBA. textureformat_t grows to cover all platform formats but each platform only ever sees the formats it natively supports.

4. UI system (first-class, separate from 3D)

New src/dusk/ui/ subsystem with uiBegin / uiEnd and intent types for rects, sprites, and text. Platforms implement the flush independently. The 3D spritebatch is retired or scoped to world-space billboards only.

5. Fixed-point / no-FPU math

float_t needs a fixed-point mode. Proposed: define fixed_t as a 16.16 signed integer; define DUSK_MATH_FIXED for platforms that require it (Saturn, PS1, SNES). Engine math utilities (mathSin, mathCos, etc.) have fixed-point implementations selected by this flag. float_t on FPU-less platforms becomes a typedef for fixed_t.
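
A minimal sketch of the 16.16 arithmetic this implies. fixed_t and the 16.16 layout come from the proposal above; the helper names (fixedFromInt, fixedToInt, fixedMul, fixedDiv) are hypothetical. The 64-bit intermediates are an assumption that keeps the example simple; on some targets they would be replaced with platform-specific multiply instructions or narrower formats.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_t; /* 16.16 signed fixed-point */

#define FIXED_SHIFT 16
#define FIXED_ONE ((fixed_t)1 << FIXED_SHIFT)

fixed_t fixedFromInt(int32_t value) {
  return (fixed_t)(value * FIXED_ONE);
}

/* Truncates toward zero. */
int32_t fixedToInt(fixed_t value) {
  return value / FIXED_ONE;
}

/* Multiply in 64 bits, then drop the extra 16 fractional bits.
 * Right-shifting a negative value is implementation-defined in C;
 * common compilers perform an arithmetic shift. */
fixed_t fixedMul(fixed_t a, fixed_t b) {
  return (fixed_t)(((int64_t)a * (int64_t)b) >> FIXED_SHIFT);
}

/* Pre-scale the numerator so the quotient keeps 16 fractional bits. */
fixed_t fixedDiv(fixed_t a, fixed_t b) {
  assert(b != 0);
  return (fixed_t)(((int64_t)a * FIXED_ONE) / b);
}
```

For example, fixedMul(fixedFromInt(2), fixedFromInt(3)) yields fixedFromInt(6), and fixedDiv(fixedFromInt(1), fixedFromInt(2)) yields FIXED_ONE / 2.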

6. Background plane abstraction (bgplane_t)

New concept in src/dusk/display/bgplane/. A BG plane has a tile map or bitmap source, scroll offsets, a palette reference, and optional affine parameters (for Mode7-style use). On GL platforms: rendered as a fullscreen textured quad or shader pass. On Saturn: VDP2 config. On SNES: PPU BG layer config.

7. Memory system: static pool mode

For SNES (and possibly Saturn), the general-purpose allocator may be unviable. The proposal is a compile-time static pool mode (DUSK_MEMORY_STATIC) that uses a fixed-size arena instead of dynamic allocation. All memoryAllocate calls hit the pool; memoryFree is a no-op or a stack pop.
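
A minimal sketch of such a pool as a bump allocator with mark/release, which is one way to realize the "stack pop" variant. memoryAllocate and memoryFree are the names used above; the pool size, the 4-byte alignment, and memoryPoolMark / memoryPoolRelease are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMORY_POOL_SIZE 4096 /* hypothetical; sized per platform */
#define MEMORY_POOL_ALIGN 4

/* Backed by 32-bit words so the pool base is at least 4-byte aligned. */
static uint32_t memoryPoolWords[MEMORY_POOL_SIZE / 4];
static size_t memoryPoolTop = 0;

/* Bump-allocate from the arena. Returns NULL when the pool is exhausted. */
void *memoryAllocate(size_t size) {
  size_t aligned = (size + (MEMORY_POOL_ALIGN - 1)) & ~(size_t)(MEMORY_POOL_ALIGN - 1);
  if (aligned < size) return NULL; /* overflow while rounding up */
  if (aligned > MEMORY_POOL_SIZE - memoryPoolTop) return NULL;
  void *ptr = (uint8_t *)memoryPoolWords + memoryPoolTop;
  memoryPoolTop += aligned;
  return ptr;
}

/* Individual frees are a no-op in this mode. */
void memoryFree(void *ptr) {
  (void)ptr;
}

/* Remember the current top, e.g. at scene load. */
size_t memoryPoolMark(void) {
  return memoryPoolTop;
}

/* Pop everything allocated since the mark, e.g. at scene unload. */
void memoryPoolRelease(size_t mark) {
  assert(mark <= memoryPoolTop);
  memoryPoolTop = mark;
}
```

Taking a mark at scene load and releasing it at scene unload gives deterministic memory use with no fragmentation, at the cost of not being able to free individual allocations.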

8. Script runtime: optional

JerryScript requires too much RAM for SNES and is marginal on Saturn. The scripting system should be compile-time optional (DUSK_SCRIPTING), not assumed present. SNES/Saturn scenes would be compiled C.


What to Keep

  • Platform macro abstraction pattern (displayplatform.h, etc.) -- works, no reason to change.
  • Directory structure convention for platform directories.
  • Entity-component system -- platform-agnostic, unaffected.
  • Asset loading + .dsk bundle concept (extended for platform formats).
  • The broad subsystem layout: asset, input, display, log, network, save, system, time.

Open Questions

  1. Render intent granularity: How much does the intent API need to express? A MESH intent works on GL/N64 but degrades poorly on Saturn (must split into quads) and is impossible on SNES. Should MESH be a valid intent with a "best effort" contract, or excluded from the portable API entirely?

  2. Threading abstraction depth: Should DUSK_THREAD_SLAVE_SH2 be a first-class concept in the engine's job system, or a Saturn-internal implementation detail the core never sees? Same question applies to N64's RSP as a compute co-processor.

  3. Asset loading async contract: When a platform has no threads, should assetLoadAsync be a no-op alias for assetLoadSync, or return immediately with a completion flag to poll? The polling model is more honest but requires all call sites to handle it.

  4. N64 build system: libdragon uses GNU Make, not CMake. Options are: (a) write a CMake toolchain file that wraps n64.mk, (b) maintain a parallel Makefile just for N64, or (c) wait for upstream CMake support. Which is acceptable long-term?

  5. N64 RSP microcode: Standard libdragon microcodes (Fast3D/F3DEX2) or Tiny3D (community microcode with full T&L + skinning)? Writing custom microcode is powerful but limited to ~1000 MIPS SIMD instructions. This decision gates what 3D features the N64 port can support.

  6. PSPGL fate: Drop immediately in favor of native GU, or keep as a fallback (duskgllegacy) while native GU is built? The two can coexist during transition.

  7. Vulkan priority: Design the intent API with Vulkan in mind from the start, or add it later? Vulkan's explicit pipeline state model may conflict with how stateful platforms (Saturn, SNES) expect things to work.

  8. Background planes on modern platforms: Does bgplane_t degrade to a fullscreen textured quad on GL/Vulkan/N64, or should modern platforms support actual background scene rendering (3D world behind the foreground)?

  9. PS1 ordering table depth: The OT is a fixed-size array (e.g. 4096 slots). Depth precision = number of slots. How deep should the engine's default OT be, and should this be configurable per-scene?

  10. Fixed-point strategy: Does float_t transparently become fixed_t on FPU-less platforms (Saturn, PS1, SNES), or do we require explicit fixed_t in math-heavy paths? Transparent is easiest to port; explicit is faster.

  11. SNES V-blank budget: All VRAM writes must finish within the V-blank window (roughly 2.3 ms on NTSC with 224 visible lines). Does the engine need a V-blank work queue with a budget checker, or is this left to the game to manage manually?

  12. SNES scripting: JerryScript is out. Pure compiled C, or a lighter scripting layer (Lua is ~100 KB -- tight but possible)?

  13. Asset compiler: New standalone tool, or an extension of the existing asset pipeline? Part of the CMake build or a separate pre-build step?


Proposed Sequence (Draft)

Phase 1 -- Intent API (no behavior change)

  1. Design and stabilize renderqueue_t and intent types
  2. Refactor modern GL path to submit through render intents (same output, new plumbing)
  3. Refactor Dolphin path the same way
  4. Validate no regressions on Linux + GameCube

Phase 2 -- UI system

  1. Extract UI rendering from the 3D path into src/dusk/ui/
  2. Implement UI flush for GL and Dolphin
  3. Wire existing UI elements through the new system

Phase 3 -- Platform splits

  1. Split duskgl/ into duskgl/ (modern) and duskgllegacy/ (fixed-func)
  2. Port PSP to native GU (duskpsp/display/ rewrite, drop PSPGL dependency)
  3. Stub duskvulkan/ structure for future implementation

Phase 4 -- Asset pipeline

  1. Design platform-native texture format system
  2. Extend asset compiler for per-platform output
  3. Update texture loader to expect pre-converted data

Phase 5 -- Saturn

  1. CMake toolchain for SH-2 cross-compile (yaul / libyaul toolchain)
  2. src/dusksaturn/ -- input (SMPC), asset (CD-ROM), log, system
  3. VDP1 backend for render queue (quads, polygons, painter's sort)
  4. VDP2 backend for bgplane_t (tile maps, scroll, palette)
  5. Fixed-point math mode (DUSK_MATH_FIXED)
  6. UI backend (VDP2 plane(s))

Phase 6 -- PlayStation 1

  1. CMake toolchain wrapping PSn00bSDK (already CMake-native)
  2. src/duskps1/ -- input (BIOS pad), asset (CD-ROM libpsxcd), log, system
  3. GTE integration for fixed-point math (reuse DUSK_MATH_FIXED path)
  4. Ordering table builder for render queue (painter's sort, DMA linked-list)
  5. GPU packet backend for intents (tris, quads, rects)
  6. UI backend (separate GPU packet chain after world OT)

Phase 7 -- Nintendo 64

  1. CMake toolchain wrapping libdragon (n64.mk wrapper or toolchain file)
  2. src/duskn64/ -- input (N64 controller via PIF), asset (PI DMA / DragonFS), log, system
  3. RSP display list builder for render queue (Z-buffer path, no sorting)
  4. TMEM tile management for textures
  5. RDP rectangle backend for UI
  6. Decide on RSP microcode (Tiny3D vs standard F3DEX2)

Phase 8 -- SNES

  1. SNES toolchain (cc65 or llvm-mos 65816 target)
  2. Static memory pool mode (DUSK_MEMORY_STATIC)
  3. PPU tile pipeline + VRAM management
  4. Mode7 overworld implementation
  5. OAM sprite system
  6. BG layer UI
  7. Scripting-optional build (DUSK_SCRIPTING off)