Display Layer Refactor
Vision
The goal is to remove the implicit assumption that all platforms render through a GL-like API, and replace it with a system where each platform owns its rendering stack completely. The scene describes what to draw in platform-neutral terms; the platform decides how to draw it.
This unlocks:
- Saturn (VDP1/VDP2 command-list, no Z-buffer, affine-only)
- PlayStation 1 (ordering table, affine textures, GTE fixed-point, CMake SDK)
- Nintendo 64 (RSP display list, hardware Z-buffer, perspective-correct, real FPU -- closer to modern GL than to Saturn)
- SNES (PPU tile engine, Mode7 for overworld, no real 3D)
- Vulkan (explicit, modern, no legacy GL baggage)
- Native PSP GU (drop PSPGL which is just a compatibility shim)
- Legacy fixed-function GL as its own standalone target
- A real first-class 2D UI system not bolted onto 3D space
Why
The current abstraction assumes GPU-style rendering
The current display layer was designed around a GL-like mental model:
vertex buffers, shaders, Z-buffered triangle rasterization, and texture
objects. duskgl implements this with real OpenGL. duskdolphin does its
own GX thing but still matches the same interface (mesh, shader, texture,
framebuffer). PSP uses PSPGL -- a library that emulates GL on top of
the PSP's native GE/GU hardware, which is entirely different underneath.
Problems this creates:
PSPGL is a lie. The PSP has a native graphics engine (GE/GU) with its own command list, its own vertex formats, and its own display list model. PSPGL translates GL calls into GU calls, but imperfectly -- and we end up paying the abstraction cost without getting GL correctness. Writing directly to GU gives better performance, access to native formats, and correct behavior on edge cases that PSPGL gets wrong.
Legacy GL should not share code with modern GL. The fixed-function
pipeline (no shaders, matrix stacks via glMatrixMode, glTexEnv) is
meaningfully different from modern GL (VAO/VBO, GLSL, explicit uniform
locations). Treating them as "the same thing with a flag" creates a tangle
of #ifdef DUSK_OPENGL_LEGACY guards throughout the rendering code.
They are separate targets and should be separate platform directories.
Saturn cannot fit the model at all. VDP1 is a command-list processor: you write 32-byte command structs (sprites, quads, lines) into VRAM, then poke a register to trigger execution. There are no vertex buffers, no shaders, no Z-buffer. Depth is pure painter's algorithm -- command order IS the depth. VDP2 composites up to 6 background planes at scanline time; these are tile maps and rotation parameter tables, not meshes. Nothing about the current API maps onto this hardware.
SNES is even further removed. The PPU renders tiles. VRAM holds 8x8 or 16x16 pixel tiles and tile maps; the PPU references these during scanline rendering. There are no draw calls. Mode7 is an affine transform applied to a single background layer (the basis for the overworld map and road perspective effects). Sprites are entries in OAM (Object Attribute Memory). The 65816 CPU writes to memory-mapped registers and VRAM; the PPU does the rest. The concept of "mesh" or "shader" is meaningless here.
Textures loaded as RGBA waste memory and exclude platforms. Loading every texture as 32-bit RGBA and converting at runtime is expensive on memory-constrained platforms (Saturn has about 2 MB of main work RAM; SNES has 64 KB VRAM) and simply wrong for platforms that have native formats incompatible with RGBA (e.g., PSP's ABGR8888 / BGR5650, Saturn's RGB555 / CI4 / CI8, SNES's 2bpp/4bpp/8bpp indexed). The asset pipeline must compile textures to platform-native formats at build time.
UI in 3D space is wasteful and limiting. Currently UI elements are rendered as geometry projected into screen space, going through the full 3D pipeline. On platforms with dedicated 2D hardware (Saturn VDP2, SNES BG layers), this is actively wrong -- UI should map to a hardware plane, not a 3D draw call. On modern platforms it should be a clean screen-space pass that never touches the 3D depth buffer.
Current Model (Summary)
Scene
-> shaderBind(shader)
-> textureBind(texture)
-> meshDraw(mesh) <-- immediate draw call per object
-> meshDraw(mesh)
-> ...
Platform receives each draw call immediately.
Depth is handled by Z-buffer hardware.
All textures live in GPU memory as RGBA (or Dolphin's tiled RGBA).
UI is rendered as 3D geometry with an orthographic projection.
Key current concepts:
- mesh_t -- vertex array (triangles/quads), in GPU VBO (GL) or CPU memory (Dolphin)
- shader_t -- GLSL program (modern GL), GL fixed-function state (legacy GL), or GX matrix + TEV config (Dolphin)
- texture_t -- GPU texture handle (GL) or tiled CPU buffer (Dolphin); always RGBA at the engine level
- framebuffer_t -- FBO (GL) or fixed hardware XFB (Dolphin)
- spritebatch_t -- accumulates 2D quads and flushes in batches of 32; the only existing deferred-submission system in the engine
The spritebatch hints at the right model. Everything needs to work this way.
The Core Shift: Platform-Native Rendering
Before
src/dusk/ Core engine + GL-like rendering API definition
src/duskgl/ OpenGL implementation
src/dusksdl2/ SDL2 window/input (shared)
src/duskpsp/ PSP via PSPGL (shim over GU)
src/duskvita/ Vita via GL ES (similar path to duskgl)
src/duskdolphin/ GameCube/Wii via GX (already custom)
src/dusklinux/ Linux (uses dusksdl2 + duskgl)
After
src/dusk/ Core engine logic + render intent API ONLY
src/dusksdl2/ SDL2 window/input (unchanged)
src/duskgl/ Modern OpenGL (Linux, Vita modern path)
src/duskgllegacy/ Fixed-function OpenGL (older hardware, PSP with PSPGL
as a last resort)
src/duskvulkan/ Vulkan (Linux modern, future)
src/duskpsp/ PSP native GU (no PSPGL, direct command lists)
src/duskvita/ Vita native GXM (TBD)
src/duskdolphin/ GameCube/Wii GX (already custom, mostly kept)
src/dusksaturn/ Saturn VDP1/VDP2 (new)
src/duskps1/ PlayStation 1 ordering table + GTE (new)
src/duskn64/ Nintendo 64 RSP/RDP display list (new)
src/dusksnes/ SNES PPU/Mode7 (new, extremely constrained)
src/dusk/ no longer knows about meshes, shaders, or framebuffers.
It defines the render intent system: what the scene wants to draw.
Each platform directory is entirely self-contained and responsible for
translating intents to its native API.
Render Intent System (new)
Instead of the scene calling meshDraw() or shaderBind(), it submits
render intents into a renderqueue_t. An intent describes what should
appear on screen without prescribing how to draw it.
Primitive intents (3D world)
RENDER_INTENT_QUAD -- textured quad, 4 vertices or transform + size
RENDER_INTENT_POLYGON -- filled polygon (convex, up to N vertices)
RENDER_INTENT_LINE -- line segment or polyline
RENDER_INTENT_SPRITE -- 2D billboard (always faces camera)
RENDER_INTENT_MESH -- arbitrary vertex array (native on GL/GX/GU/N64;
degraded on Saturn and PS1; not supported on SNES)
Each intent carries: texture reference, color/tint, depth hint (for painter's algorithm sorting), blend mode, and cull flags.
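As a rough illustration of the shape this could take, the sketch below packs the attributes listed above into one struct and appends intents to a fixed-capacity queue. The field names, field widths, the 256-entry capacity, and the functions renderQueueReset and renderQueueSubmit are assumptions made for this example, not the engine's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Intent kinds, mirroring the RENDER_INTENT_* names listed above. */
typedef enum {
  RENDER_INTENT_QUAD,
  RENDER_INTENT_POLYGON,
  RENDER_INTENT_LINE,
  RENDER_INTENT_SPRITE,
  RENDER_INTENT_MESH
} renderintenttype_t;

/* Assumed field set: one member per attribute named in the text. */
typedef struct {
  renderintenttype_t type;
  uint16_t textureId; /* texture reference (an index, not a GPU handle) */
  uint32_t color;     /* colour / tint */
  int32_t depthHint;  /* larger = farther; only painter's-sort platforms read it */
  uint8_t blendMode;
  uint8_t cullFlags;
} renderintent_t;

#define RENDER_QUEUE_CAPACITY 256 /* hypothetical limit */

typedef struct {
  renderintent_t intents[RENDER_QUEUE_CAPACITY];
  size_t count;
} renderqueue_t;

void renderQueueReset(renderqueue_t *queue) {
  assert(queue != NULL);
  queue->count = 0;
}

/* Copies the intent into the queue. Returns 1 on success, 0 if the queue is full. */
int renderQueueSubmit(renderqueue_t *queue, const renderintent_t *intent) {
  assert(queue != NULL && intent != NULL);
  if (queue->count >= RENDER_QUEUE_CAPACITY) return 0;
  queue->intents[queue->count++] = *intent;
  return 1;
}
```

Nothing here touches a graphics API: the scene only records what it wants, and the platform walks intents[0..count) when it flushes.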
Background plane intents (2D layers)
RENDER_INTENT_BGPLANE -- configure a background/tilemap layer
Carries: layer index, tile map data reference, scroll offset, palette, and transform (for Mode7-style affine).
UI intents (screen space)
RENDER_INTENT_UI_RECT -- solid colored rectangle
RENDER_INTENT_UI_SPRITE -- textured rectangle (UI image)
RENDER_INTENT_UI_TEXT -- text string at screen position
UI intents are always screen-space. They are never mixed into the 3D world queue. See UI System section below.
Platform translation
| Intent | Modern GL | PSP GU | Saturn VDP1 | PS1 OT | N64 RSP | SNES PPU |
|---|---|---|---|---|---|---|
| QUAD | VAO + glDraw | GU display list | distorted-sprite cmd | GPU quad packet | RSP display list | OAM + BG tile |
| POLYGON | VAO + glDraw | GU display list | polygon cmd | GPU poly packet | RSP display list | OAM |
| BGPLANE | fullscreen quad | fullscreen quad | VDP2 config | fullscreen quad | fullscreen quad | BG layer config |
| UI_SPRITE | 2D ortho quad | 2D GU quad | VDP2 BG plane | GPU rect packet | RDP rectangle | BG layer tile |
| MESH | VAO/VBO | GU buffers | (degrade: quads) | (degrade: tris/quads) | RSP display list | (not supported) |
Note: N64 supports both triangles and axis-aligned rectangles natively via RDP. PS1 supports triangles and quads (4-vertex) natively, so neither needs the dead-vertex trick that Saturn requires.
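The "(degrade: quads)" entry for Saturn can be sketched as follows, assuming the dead-vertex trick refers to the usual approach of emitting a quad whose last two corners coincide so that it rasterizes as a triangle. The types point2_t, triangle2_t, quad2_t and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t x, y; } point2_t;
typedef struct { point2_t v[3]; } triangle2_t;
typedef struct { point2_t v[4]; } quad2_t;

/* Build a quad whose fourth corner repeats the third, so it covers the triangle. */
quad2_t quadFromTriangle(const triangle2_t *tri) {
  quad2_t quad;
  assert(tri != NULL);
  quad.v[0] = tri->v[0];
  quad.v[1] = tri->v[1];
  quad.v[2] = tri->v[2];
  quad.v[3] = tri->v[2]; /* the repeated ("dead") vertex */
  return quad;
}

/* Degrade a screen-space triangle list to quads.
 * Returns the number of quads written (limited by outCapacity). */
size_t meshDegradeToQuads(const triangle2_t *tris, size_t triCount,
                          quad2_t *out, size_t outCapacity) {
  size_t i;
  size_t n = triCount < outCapacity ? triCount : outCapacity;
  assert(tris != NULL && out != NULL);
  for (i = 0; i < n; i++) out[i] = quadFromTriangle(&tris[i]);
  return n;
}
```

A backend with native triangles (PS1, N64) would skip this step entirely.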
Asset Pipeline: Platform-Native Formats
The problem
All textures currently enter the engine as RGBA and are converted at runtime by each platform (Dolphin retiles to 4x4 blocks; GL uploads as-is). This wastes memory and CPU time, and is impossible for platforms where RGBA is not a valid intermediate format at all.
The solution
The asset compiler (offline, run at build time) produces platform-specific binary bundles. A texture asset has one source (PNG or similar) but N compiled outputs, one per target.
Texture formats by platform
| Platform | Native Formats | Notes |
|---|---|---|
| Modern GL | RGBA8, RGB8, BC1-BC7 (compressed) | Upload directly, GPU handles |
| Legacy GL | RGBA8, RGB8, CI8 (palette via extension) | No compressed formats |
| Vulkan | VkFormat variants (RGBA8, BC, ASTC) | Chosen at compile time |
| PSP GU | ABGR8888, BGR5650, ABGR1555, ABGR4444, CI4, CI8 | Native swizzled format |
| Saturn VDP1/VDP2 | RGB555, CI4, CI8 (15-bit palette in CRAM) | Big-endian, packed |
| PlayStation 1 | RGB555 / CI4 / CI8 (CLUT in VRAM) | Little-endian; VRAM flat; CLUT at coord |
| Nintendo 64 | RGBA16, RGBA32, IA4-IA16, I4-I8, CI4, CI8 | 4 KB TMEM; tiles must fit in TMEM banks |
| GameCube/Wii GX | I4, I8, IA4, IA8, RGB565, RGB5A3, RGBA8, CMPR | 4x4 tiled, big-endian |
| SNES PPU | 2bpp, 4bpp, 8bpp indexed (CGRAM palette) | Tile-packed, no direct access |
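To make the build-time conversion concrete, here is a minimal sketch of the step an offline compiler might run for a 15-bit target. It assumes red sits in the low five bits with green and blue above it, and it writes texels byte-by-byte so the output endianness follows the target rather than the build host. The function names and the bit layout are assumptions to check against each platform's hardware documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit channels into 15-bit colour: red in bits 0-4, green 5-9, blue 10-14.
 * Bit 15 is left clear. This layout is an assumption, not a verified spec. */
uint16_t texturePackRgb555(uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((uint16_t)(b >> 3) << 10) |
                    ((uint16_t)(g >> 3) << 5) |
                    (uint16_t)(r >> 3));
}

/* Convert RGBA8 pixels to 16-bit texels. Alpha is discarded.
 * out must hold pixelCount * 2 bytes; bigEndian selects the target byte order. */
void textureCompileRgb555(const uint8_t *rgba, size_t pixelCount,
                          uint8_t *out, int bigEndian) {
  size_t i;
  assert(rgba != NULL && out != NULL);
  for (i = 0; i < pixelCount; i++) {
    uint16_t texel = texturePackRgb555(rgba[i * 4], rgba[i * 4 + 1], rgba[i * 4 + 2]);
    uint8_t hi = (uint8_t)(texel >> 8);
    uint8_t lo = (uint8_t)(texel & 0xFF);
    out[i * 2 + 0] = bigEndian ? hi : lo;
    out[i * 2 + 1] = bigEndian ? lo : hi;
  }
}
```

The same source PNG would pass through one such routine per target, with bigEndian set for Saturn output and cleared for PS1 output according to the table above.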
Asset bundle structure
The .dsk bundle gains a platform tag. The loader picks the right section
at runtime (or the build produces a single-platform bundle for constrained
targets like SNES/Saturn where there is no spare storage for unused data).
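One way the platform tag could work is a small section table at the start of the bundle that the loader scans for its own tag. The sketch below assumes such a table; the dsksection_t layout, the tag values, and dskFindSection are invented for illustration and say nothing about the real .dsk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform tags stored in the bundle. */
typedef enum {
  DSK_PLATFORM_GL = 1,
  DSK_PLATFORM_PSP,
  DSK_PLATFORM_DOLPHIN,
  DSK_PLATFORM_SATURN
} dskplatform_t;

/* Hypothetical section header: where one platform's compiled data lives. */
typedef struct {
  uint32_t platformTag;
  uint32_t offset; /* byte offset from the start of the bundle */
  uint32_t size;   /* byte length of the section */
} dsksection_t;

/* Returns the section for platformTag, or NULL if the bundle has none. */
const dsksection_t *dskFindSection(const dsksection_t *sections, size_t count,
                                   uint32_t platformTag) {
  size_t i;
  assert(sections != NULL || count == 0);
  for (i = 0; i < count; i++) {
    if (sections[i].platformTag == platformTag) return &sections[i];
  }
  return NULL;
}
```

A single-platform bundle is then just the degenerate case of a table with one entry.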
UI System (first-class)
Current problem
UI elements go through the 3D pipeline: they are meshes with an orthographic shader, rendered in the same pass as the world. This means:
- UI competes for Z-buffer depth with world geometry
- On Saturn/SNES, UI cannot use dedicated hardware planes
- Text rendering is tied to the sprite batch which is tied to the 3D pass
- No separation between "draw the world" and "draw the HUD"
New model
UI is a completely separate rendering context. The world renders first, then the UI renders on top. They share no state.
UI coordinates are always in screen space (pixels or a logical resolution that the platform scales to its native display size). No camera matrix, no projection, no depth buffer involvement.
Platform mapping
| Platform | UI implementation |
|---|---|
| Modern GL | Separate 2D ortho pass, screen-space quads, no depth test |
| Legacy GL | Same, using fixed-function |
| PSP GU | Separate GU display list, 2D mode |
| Saturn | VDP2 background plane(s) dedicated to UI |
| PlayStation 1 | Separate GPU packet chain, no Z; ordered after world OT |
| Nintendo 64 | RDP rectangle commands in a separate display list segment |
| GameCube/Wii | GX 2D mode or dedicated GX pass |
| SNES | Dedicated BG layer(s) for HUD tiles |
On Saturn, the UI occupying VDP2 planes is a genuine hardware win -- VDP2 composites it for free at scanline time, costing zero VDP1 commands. On SNES, the HUD must live in a BG layer because there is no alternative.
UI API (proposed)
uiBegin();
uiDrawRect(x, y, w, h, color);
uiDrawSprite(x, y, w, h, texture, uvMin, uvMax);
uiDrawText(x, y, font, string);
uiEnd(); // platform flushes UI to hardware
The uiBegin/uiEnd block collects intents; the platform submits them
at frame end in whatever way is appropriate.
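A minimal sketch of the collect-then-flush behaviour, covering only uiDrawRect for brevity. The intent struct, the 64-entry capacity, and the stand-in uiPlatformFlush (which merely records how many intents it received) are assumptions; a real backend would translate the array into GL quads, VDP2 tiles, and so on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { UI_INTENT_RECT, UI_INTENT_SPRITE, UI_INTENT_TEXT } uiintenttype_t;

typedef struct {
  uiintenttype_t type;
  int16_t x, y, w, h; /* screen-space pixels */
  uint32_t color;
} uiintent_t;

#define UI_INTENT_CAPACITY 64 /* hypothetical limit */

static uiintent_t UI_INTENTS[UI_INTENT_CAPACITY];
static size_t UI_INTENT_COUNT = 0;
static int UI_RECORDING = 0;
static size_t UI_LAST_FLUSHED = 0;

/* Stand-in for the per-platform backend: only records the intent count. */
static void uiPlatformFlush(const uiintent_t *intents, size_t count) {
  (void)intents;
  UI_LAST_FLUSHED = count;
}

void uiBegin(void) {
  assert(!UI_RECORDING);
  UI_INTENT_COUNT = 0;
  UI_RECORDING = 1;
}

void uiDrawRect(int16_t x, int16_t y, int16_t w, int16_t h, uint32_t color) {
  uiintent_t *intent;
  assert(UI_RECORDING);
  if (UI_INTENT_COUNT >= UI_INTENT_CAPACITY) return; /* drop when full */
  intent = &UI_INTENTS[UI_INTENT_COUNT++];
  intent->type = UI_INTENT_RECT;
  intent->x = x;
  intent->y = y;
  intent->w = w;
  intent->h = h;
  intent->color = color;
}

void uiEnd(void) {
  assert(UI_RECORDING);
  uiPlatformFlush(UI_INTENTS, UI_INTENT_COUNT);
  UI_RECORDING = 0;
}
```

Because nothing is drawn until uiEnd, the UI pass never interleaves with world rendering, which is the separation the new model asks for.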
SNES / Mode7
SNES is the most constrained platform the engine will ever support and needs its own section because it breaks assumptions that even Saturn keeps.
Hardware
- CPU: 65816 @ ~3.58 MHz (16-bit, no FPU, no cache)
- PPU: Tile-based scanline renderer. VRAM holds tile graphics and tile maps. BG layers reference tiles by index.
- Mode7: A single BG layer with a 2D affine matrix applied per scanline. Used for overworld maps, road perspective (F-Zero), rotation effects. The matrix is set via HDMA (scanline DMA) for per-scanline variation, enabling horizon-perspective effects.
- Sprites/OAM: Up to 128 sprites (8x8, 16x16, 32x32, 64x64 pixels), 4bpp indexed, up to 32 per scanline.
- Palette: CGRAM holds 256 entries of 15-bit RGB (512 bytes total). BG layers use sub-palettes of 4/16/256 colors depending on bit depth.
- VRAM: 64 KB (tiles + tile maps)
- WRAM: 128 KB work RAM + usually 8 KB SRAM on cart for saves
- No frame buffer. The PPU renders scanlines directly. You cannot read back what was drawn.
- No general-purpose draw calls. You configure registers and VRAM before the frame and the PPU does the rest.
What "3D" means on SNES
True 3D is not possible. What can be approximated:
- Overworld map: Mode7 with a flat texture and HDMA scroll gives a top-down perspective with a horizon line (the classic JRPG overworld).
- Depth illusion: Mode7 matrix manipulation can simulate a moving camera over flat terrain. Objects are sprites placed at screen positions calculated by software perspective projection.
- Sprite scaling: Software-scaled sprites using pre-rendered frames, or the coprocessor-rendered approach used in Super FX games (Star Fox). Super FX is a co-processor on the cartridge -- base SNES cannot do this.
- Basic 3D effects: Some games use HDMA color gradient + Mode7 floor with overlaid sprites to create a pseudo-3D look.
The engine plan for SNES: Mode7 overworld (confirmed), sprite-based world objects, BG layer UI. "Basic 3D effects" (pseudo-perspective with sprites) is aspirational -- implementation complexity TBD.
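The Mode7 overworld effect comes from changing the affine matrix on every scanline. The sketch below builds such a per-scanline table on the host using a simplified model in which the scale factor is camera height divided by distance below the horizon. The struct, the function name, the 8.8 fixed-point convention, and the formula itself are illustrative assumptions; a real implementation would also have to pack the rows into whatever layout its HDMA setup expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One scanline's matrix parameters in signed 8.8 fixed point (256 == 1.0). */
typedef struct { int16_t a, b, c, d; } mode7row_t;

/* Fill rows[0..rowCount) with a rotation scaled per scanline.
 * cos88 / sin88 are the camera angle's cosine and sine in 8.8 fixed point.
 * Rows at or above the horizon keep a scale of 1.0 (normally hidden by sky). */
void mode7BuildTable(mode7row_t *rows, int rowCount, int horizon,
                     int camHeight, int16_t cos88, int16_t sin88) {
  int y;
  assert(rows != NULL);
  assert(camHeight > 0 && camHeight < 128); /* keeps the scale inside int16_t */
  assert(cos88 >= -256 && cos88 <= 256 && sin88 >= -256 && sin88 <= 256);
  for (y = 0; y < rowCount; y++) {
    int32_t scale = 256;
    if (y > horizon) {
      /* Nearer the horizon the divisor is small, so the floor is magnified less. */
      scale = ((int32_t)camHeight * 256) / (y - horizon);
    }
    rows[y].a = (int16_t)(((int32_t)cos88 * scale) / 256);
    rows[y].b = (int16_t)(((int32_t)sin88 * scale) / 256);
    rows[y].c = (int16_t)(-rows[y].b);
    rows[y].d = rows[y].a;
  }
}
```

Only integer multiplies and divides are used, which matters on a CPU with no FPU; in practice the table would likely be precomputed or updated incrementally rather than rebuilt every frame.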
SNES constraints on the engine
- No dynamic allocation. With 128 KB WRAM, a general-purpose allocator is risky. The engine memory system may need a static pool mode for SNES.
- No floating point. float_t must resolve to integer or fixed-point.
- No scripting (JerryScript). The JS engine requires far more than 128 KB RAM. SNES scenes must be compiled C.
- Asset data in ROM, not a .dsk bundle. SNES loads from cartridge ROM mapped into the address space. The asset system needs a ROM-mapped loader.
- Tile pipeline. Textures must be pre-converted to SNES tile format (2bpp/4bpp/8bpp, 8x8 pixel tiles, CGRAM palette) at build time. This is a completely different asset output from every other platform.
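The tile pipeline item above implies a bit-planar repack at build time. As a sketch, the routine below converts one 8x8 block of palette indices into a 32-byte 4bpp tile: planes 0 and 1 are interleaved row by row in the first 16 bytes, planes 2 and 3 in the last 16, and bit 7 of each byte is the leftmost pixel. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* pixels: 64 palette indices (0-15), row-major. out: 32 bytes of planar tile data. */
void snesTilePack4bpp(const uint8_t pixels[64], uint8_t out[32]) {
  int row, col;
  assert(pixels != NULL && out != NULL);
  for (row = 0; row < 8; row++) {
    uint8_t plane0 = 0, plane1 = 0, plane2 = 0, plane3 = 0;
    for (col = 0; col < 8; col++) {
      uint8_t index = (uint8_t)(pixels[row * 8 + col] & 0x0F);
      uint8_t bit = (uint8_t)(0x80 >> col); /* leftmost pixel -> bit 7 */
      if (index & 1) plane0 |= bit;
      if (index & 2) plane1 |= bit;
      if (index & 4) plane2 |= bit;
      if (index & 8) plane3 |= bit;
    }
    out[row * 2 + 0] = plane0;
    out[row * 2 + 1] = plane1;
    out[16 + row * 2 + 0] = plane2;
    out[16 + row * 2 + 1] = plane3;
  }
}
```

Palette quantization (reducing the source image to 16 colours per sub-palette) would have to happen before this step and is not shown.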
Platform Inventory
A summary of what each platform's native rendering looks like after the refactor, for reference when designing the intent API.
Modern OpenGL (duskgl)
VAO + VBO mesh storage, GLSL shaders, FBO render targets, Z-buffer. No fixed-function. Targets: Linux, possibly Vita (GXM is preferred).
Legacy OpenGL (duskgllegacy)
Fixed-function pipeline: glMatrixMode, glTexEnv, client-side vertex
arrays. No VAO/VBO. Used for: very old desktop hardware, maybe PSP as
last resort (PSPGL is this). Targets: legacy desktop, embedded Linux.
Vulkan (duskvulkan)
Explicit pipeline state objects, render passes, descriptor sets, command buffers. Highest ceiling for performance and control. Targets: Linux (modern), future platforms. Not immediate priority but the architecture should not block it.
PSP native GU (duskpsp)
The GE/GU is a display-list GPU. You build a command list in memory and the GU DMA engine processes it asynchronously. Native vertex formats are PSP-specific (ABGR byte order, swizzled textures for cache efficiency). No PSPGL. Targets: PSP hardware and emulators.
Vita (duskvita)
GXM is Sony's Vita GPU API -- closer to modern GL than GU, with explicit shader binaries (.gxp), ring buffers, and GPU sync primitives.
GameCube/Wii GX (duskdolphin)
Already a custom renderer. GX uses immediate-mode vertex submission
(GX_Begin / GX_Position1x16 loops), TEV for texture compositing, and
hardware XFB double-buffering. Big-endian. Mostly kept as-is; may benefit
from being expressed in terms of render intents for consistency.
Saturn VDP1/VDP2 (dusksaturn)
VDP1: command-list (32-byte structs), quad-based, affine texture mapping, no Z-buffer (painter's algorithm). VDP2: up to 6 background planes composited at scanline time. Big-endian dual SH-2, no FPU. Fixed-point math required throughout.
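Since command order is the depth on VDP1, a backend has to sort before it writes its command list. A minimal sketch using the standard library's qsort follows; the paintkey_t struct and function names are assumptions, and the submission index doubles as a tie-breaker because qsort is not guaranteed to be stable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sort key: depth hint plus the intent's position in the submission order. */
typedef struct {
  int32_t depthHint;    /* larger = farther from the camera */
  uint16_t intentIndex;
} paintkey_t;

/* Farther entries sort first; equal depths keep submission order. */
static int paintKeyCompare(const void *left, const void *right) {
  const paintkey_t *l = (const paintkey_t *)left;
  const paintkey_t *r = (const paintkey_t *)right;
  if (l->depthHint != r->depthHint) {
    return (l->depthHint < r->depthHint) ? 1 : -1;
  }
  return (l->intentIndex > r->intentIndex) - (l->intentIndex < r->intentIndex);
}

/* Order keys back-to-front so later commands overdraw earlier ones. */
void painterSort(paintkey_t *keys, size_t count) {
  assert(keys != NULL || count == 0);
  if (count > 1) qsort(keys, count, sizeof(keys[0]), paintKeyCompare);
}
```

On the Saturn itself this is the kind of per-frame job that could be handed to the slave SH-2, and a fixed-bucket sort may be preferable to a comparison sort there.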
PlayStation 1 (duskps1)
MIPS R3000A @ 33.87 MHz, little-endian, no FPU. GTE (coprocessor 2) handles fixed-point matrix math, perspective divide, and lighting. GPU receives packets via DMA linked-list (the Ordering Table). Primitives: triangles and quads natively (no dead-vertex needed). Texture mapping: affine, same limitation as Saturn. No Z-buffer; depth is OT slot order. VRAM is 1 MB flat (frame buffers + textures + CLUTs share it). SDK: PSn00bSDK, which is CMake-native -- a direct fit for the dusk build system.
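The ordering table idea can be modelled on the host as an array of buckets, each heading a linked list of packets. The sketch below is that simplified model only: it uses array indices instead of hardware pointers, an 8-slot table for readability, and names (ordertable_t, otInsert, otTraverse) that are invented here and are not PSn00bSDK calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OT_LENGTH 8  /* hypothetical; real tables are far larger */
#define OT_END (-1)

typedef struct { int next; int primitiveId; } otpacket_t;
typedef struct { int heads[OT_LENGTH]; } ordertable_t;

void otClear(ordertable_t *ot) {
  int i;
  for (i = 0; i < OT_LENGTH; i++) ot->heads[i] = OT_END;
}

/* Map a depth in [0, maxDepth] to a slot; slot OT_LENGTH-1 is farthest. */
int otSlotForDepth(int32_t depth, int32_t maxDepth) {
  assert(maxDepth > 0);
  if (depth < 0) depth = 0;
  if (depth > maxDepth) depth = maxDepth;
  return (int)(((int64_t)depth * OT_LENGTH) / ((int64_t)maxDepth + 1));
}

/* Link a packet at the head of its slot's chain: O(1), no sorting. */
void otInsert(ordertable_t *ot, otpacket_t *packets, int packetIndex, int slot) {
  assert(slot >= 0 && slot < OT_LENGTH);
  packets[packetIndex].next = ot->heads[slot];
  ot->heads[slot] = packetIndex;
}

/* Walk far-to-near, writing packet indices in draw order. Returns the count. */
size_t otTraverse(const ordertable_t *ot, const otpacket_t *packets,
                  int *order, size_t capacity) {
  size_t n = 0;
  int slot, p;
  for (slot = OT_LENGTH - 1; slot >= 0; slot--) {
    for (p = ot->heads[slot]; p != OT_END && n < capacity; p = packets[p].next) {
      order[n++] = p;
    }
  }
  return n;
}
```

Depth resolution equals the slot count: two primitives that land in the same slot are drawn in reverse insertion order regardless of their true depth.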
Nintendo 64 (duskn64)
VR4300 @ 93.75 MHz, big-endian, real IEEE 754 FPU. Rendering is split between the RSP (geometry: programmable MIPS SIMD, runs microcode up to ~1000 instructions in 4 KB IMEM) and the RDP (rasterization: fixed hardware). RSP produces triangle commands from a CPU-built display list in RDRAM. RDP features: perspective-correct texture mapping, bilinear filtering, hardware Z-buffer. Primitives: triangles and axis-aligned rects. TMEM is 4 KB on-chip texture cache; textures must be loaded into tiles before drawing -- a significant memory management constraint. SDK: libdragon (Unlicense, GCC 14, Makefile-based -- not CMake; this requires a wrapper toolchain file for dusk's build system).
SNES PPU/Mode7 (dusksnes)
Tile-based. VRAM holds tiles and tile maps. Mode7 provides affine transform for one BG layer. Sprites via OAM. No frame buffer. All configuration is memory-mapped registers. 65816 CPU, no FPU, extremely limited RAM.
Threading Model
Current model
The engine uses OS threads for async asset loading (assetXxxLoaderAsync).
Platforms that have pthreads or an equivalent RTOS (Linux, PSP, Vita) run
worker threads that load data in the background while the game loop runs.
The main thread polls or blocks on completion.
The problem
Several target platforms have no OS threading whatsoever, and others have hardware-specific async mechanisms that are nothing like pthreads.
Per-platform reality
| Platform | Threading | Async mechanism |
|---|---|---|
| Linux | pthreads | Worker threads (current) |
| Vita | SceKernelThread | Per-SDK threads |
| PSP | SceKernelThread | Per-SDK threads |
| GameCube/Wii | libogc LWP | Lightweight processes |
| Saturn | None (OS) | Slave SH-2 for fixed jobs; CD-ROM via interrupt/callback |
| PlayStation 1 | None (OS) | V-blank ISR, 7 DMA channels, CD-ROM callbacks |
| Nintendo 64 | libdragon preview only | PI DMA for cartridge; RSP for parallel compute |
| SNES | None | DMA (GPDMA/HDMA); NMI V-blank; SPC700 audio is a separate CPU |
Saturn slave SH-2: The second SH-2 is not a general-purpose thread. It runs a fixed subroutine you hand-load. The typical use is offloading heavy per-frame computation (geometry transforms, depth sort) while the master SH-2 handles game logic. Communication is via shared WRAM with cache-through addresses to avoid coherency bugs. There is no scheduler and no yield -- it runs to completion.
SNES DMA: GPDMA copies blocks of data (ROM to WRAM, WRAM to VRAM) and halts the CPU for the duration -- it is synchronous from the game's perspective. HDMA runs per-scanline during H-blank, writing to PPU registers without CPU involvement; this is how Mode7 perspective is achieved. Neither is "async" in the programming sense.
SNES NMI: The V-blank NMI fires at the start of every V-blank period. This is the only safe window to write to VRAM and PPU registers. All critical PPU updates must complete within ~1.2ms (the V-blank window).
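One possible way to respect the V-blank window is to queue VRAM uploads during the frame and drain only as many as fit a per-frame byte budget. The sketch below is a host-side model of that idea; the queue type, the capacity, the notion of a byte budget, and the function names are all assumptions, and the actual DMA start is left as a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VBLANK_QUEUE_CAPACITY 16 /* hypothetical */

typedef struct {
  uint16_t vramAddress; /* destination in VRAM */
  uint16_t byteCount;   /* transfer length */
} vblankjob_t;

typedef struct {
  vblankjob_t jobs[VBLANK_QUEUE_CAPACITY];
  size_t head;
  size_t count;
} vblankqueue_t;

void vblankQueueInit(vblankqueue_t *queue) {
  queue->head = 0;
  queue->count = 0;
}

/* Called during the frame. Returns 1 if queued, 0 if the ring is full. */
int vblankQueuePush(vblankqueue_t *queue, vblankjob_t job) {
  if (queue->count >= VBLANK_QUEUE_CAPACITY) return 0;
  queue->jobs[(queue->head + queue->count) % VBLANK_QUEUE_CAPACITY] = job;
  queue->count++;
  return 1;
}

/* Called from the V-blank handler. Runs jobs in order until the next one
 * would exceed the budget; the remainder waits for the next frame.
 * Note: a job larger than the whole budget would never run, so callers
 * must split transfers accordingly. */
size_t vblankQueueDrain(vblankqueue_t *queue, uint32_t budgetBytes) {
  size_t done = 0;
  while (queue->count > 0) {
    const vblankjob_t *job = &queue->jobs[queue->head];
    if (job->byteCount > budgetBytes) break;
    /* A real backend would start the DMA transfer for this job here. */
    budgetBytes -= job->byteCount;
    queue->head = (queue->head + 1) % VBLANK_QUEUE_CAPACITY;
    queue->count--;
    done++;
  }
  return done;
}
```

Stopping at the first job that does not fit preserves upload order, at the cost of sometimes leaving budget unused.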
Proposed model
Introduce a compile-time threading capability flag:
DUSK_THREAD_PTHREAD -- Linux, maybe Vita
DUSK_THREAD_SCEKERNEL -- PSP, Vita SDK
DUSK_THREAD_LWP -- GameCube/Wii libogc
DUSK_THREAD_SLAVE_SH2 -- Saturn slave CPU (job dispatch only)
DUSK_THREAD_NONE -- SNES, PS1 (and Saturn master thread view)
The asset loader's async path is gated on having a threading capability.
When DUSK_THREAD_NONE is defined, assetXxxLoaderAsync either does not
exist or is an alias for the synchronous version. On Saturn, the slave SH-2
is exposed as a distinct API (sh2JobDispatch, sh2JobWait) used only for
compute-heavy work, not for I/O.
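The gating described above could be expressed with the preprocessor roughly as follows. The DUSK_THREAD_* names come from the list above; DUSK_ASSET_ASYNC_THREADED and the assetExampleLoader* functions are placeholders standing in for the real assetXxxLoaderAsync family, and the synchronous loader body is a dummy.

```c
#include <assert.h>
#include <string.h>

/* Default to "no threads" when the build defines no capability flag. */
#if !defined(DUSK_THREAD_PTHREAD) && !defined(DUSK_THREAD_SCEKERNEL) && \
    !defined(DUSK_THREAD_LWP) && !defined(DUSK_THREAD_SLAVE_SH2) && \
    !defined(DUSK_THREAD_NONE)
#define DUSK_THREAD_NONE 1
#endif

/* Placeholder synchronous loader: pretends to load and returns a byte count. */
int assetExampleLoaderSync(const char *path) {
  assert(path != NULL);
  return (int)strlen(path);
}

/* Only real worker-thread capabilities enable the threaded async path.
 * DUSK_THREAD_SLAVE_SH2 is deliberately excluded: it is compute-only. */
#if defined(DUSK_THREAD_PTHREAD) || defined(DUSK_THREAD_SCEKERNEL) || \
    defined(DUSK_THREAD_LWP)
#define DUSK_ASSET_ASYNC_THREADED 1
int assetExampleLoaderAsync(const char *path); /* provided by the platform layer */
#else
#define DUSK_ASSET_ASYNC_THREADED 0
#define assetExampleLoaderAsync(path) assetExampleLoaderSync(path)
#endif
```

This shows the "alias for the synchronous version" option; making the async symbol not exist at all would simply mean dropping the #define in the else branch.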
Asset loading without threads
Saturn: CD-ROM access is initiated via SBL/CDC routines and completes via interrupt callback. The engine's asset loading loop can poll the callback flag in the main loop rather than blocking a thread. This is interrupt-driven cooperative async, not preemptive.
SNES: There is no loading. Assets live in ROM, mapped directly into the 65816 address space. "Loading a texture" means computing a pointer into ROM and copying the tile data to VRAM during V-blank via GPDMA. The asset system on SNES is essentially a VRAM/CGRAM allocator and a DMA scheduler, not a file loader.
Asset system changes
The asset pipeline needs to accommodate three loading models:
- File-based (Linux, PSP, Vita, Saturn CD): open file, read bytes, close. Can be sync or thread-async.
- DMA/interrupt (Saturn CD-ROM, GC DVD): initiate transfer, poll or callback on completion, no thread blocked.
- ROM-mapped (SNES): data is already in the address space; "loading" is a VRAM DMA copy scheduled for V-blank, not file I/O.
The assetstream_t abstraction that currently wraps file I/O needs a third
backend for ROM-mapped data, and the async path needs to support
callback-based completion as an alternative to thread-based blocking.
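A ROM-mapped backend can be very thin, since a read is just a bounded memory copy. The sketch below shows only that third backend; the assetstream_t layout, the backend enum, and the function names are guesses at what such an abstraction might look like, and the file and DMA backends are left unimplemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
  ASSET_STREAM_FILE, /* not sketched here */
  ASSET_STREAM_DMA,  /* not sketched here */
  ASSET_STREAM_ROM
} assetstreambackend_t;

typedef struct {
  assetstreambackend_t backend;
  const uint8_t *romBase; /* start of the asset in the mapped address space */
  size_t size;
  size_t position;
} assetstream_t;

void assetStreamOpenRom(assetstream_t *stream, const uint8_t *base, size_t size) {
  assert(stream != NULL && base != NULL);
  stream->backend = ASSET_STREAM_ROM;
  stream->romBase = base;
  stream->size = size;
  stream->position = 0;
}

/* Copies up to byteCount bytes and advances. Returns the bytes copied. */
size_t assetStreamRead(assetstream_t *stream, void *dst, size_t byteCount) {
  size_t remaining;
  assert(stream != NULL && dst != NULL);
  if (stream->backend != ASSET_STREAM_ROM) return 0; /* other backends omitted */
  remaining = stream->size - stream->position;
  if (byteCount > remaining) byteCount = remaining;
  memcpy(dst, stream->romBase + stream->position, byteCount);
  stream->position += byteCount;
  return byteCount;
}

/* Zero-copy access: consumers that can use data in place skip the read. */
const uint8_t *assetStreamRomPointer(const assetstream_t *stream) {
  assert(stream != NULL && stream->backend == ASSET_STREAM_ROM);
  return stream->romBase;
}
```

The zero-copy accessor is the part with no file-I/O equivalent: on a ROM-mapped target the source pointer for a VRAM transfer can be the asset itself.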
What Needs to Change
1. Render intent API (new, in src/dusk/)
Replace mesh_t / shader_t / meshDraw() as scene-facing APIs with
renderqueue_t and intent submission functions. src/dusk/ defines the
intent types and submission API; platforms implement the flush.
2. Platform renderer directories
Move rendering implementations out of duskgl/ as a shared layer and
into fully self-contained platform directories. duskgl/ becomes the
modern GL platform only. Add duskgllegacy/, duskvulkan/ as peers.
3. Asset pipeline: platform-native texture formats
The offline asset compiler must produce per-platform texture bundles in
native formats. The runtime texture loader expects pre-converted data,
not RGBA. textureformat_t grows to cover all platform formats but each
platform only ever sees the formats it natively supports.
4. UI system (first-class, separate from 3D)
New src/dusk/ui/ subsystem with uiBegin / uiEnd and intent types
for rects, sprites, and text. Platforms implement the flush independently.
The 3D spritebatch is retired or scoped to world-space billboards only.
5. Fixed-point / no-FPU math
float_t needs a fixed-point mode. Proposed: define fixed_t as a
16.16 signed integer; define DUSK_MATH_FIXED for platforms that require
it (Saturn, PS1, SNES). Engine math utilities (mathSin, mathCos, etc.)
have fixed-point implementations selected by this flag. float_t on
FPU-less platforms becomes a typedef for fixed_t.
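A minimal sketch of what the 16.16 type and its core operations might look like. Only fixed_t and the 16.16 layout come from the proposal above; the helper names are illustrative. The 64-bit intermediates keep the example short and correct on a host compiler, but they are costly on the target CPUs, where a port would probably substitute hardware multiply/divide units or hand-written routines.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_t; /* 16.16 signed fixed point */

#define FIXED_ONE ((fixed_t)65536) /* 1.0 */

/* Integer to fixed. Multiplication avoids left-shifting a negative value. */
fixed_t fixedFromInt(int32_t value) {
  assert(value >= -32768 && value <= 32767);
  return (fixed_t)(value * FIXED_ONE);
}

/* Fixed to integer, truncating toward zero. */
int32_t fixedToInt(fixed_t value) {
  return value / FIXED_ONE;
}

/* (a * b) with the extra 16 fractional bits removed. */
fixed_t fixedMul(fixed_t a, fixed_t b) {
  return (fixed_t)(((int64_t)a * (int64_t)b) / FIXED_ONE);
}

/* (a / b) with the dividend pre-scaled to keep 16 fractional bits. */
fixed_t fixedDiv(fixed_t a, fixed_t b) {
  assert(b != 0);
  return (fixed_t)(((int64_t)a * FIXED_ONE) / (int64_t)b);
}
```

For example, fixedMul(fixedFromInt(3), FIXED_ONE / 2) yields the representation of 1.5. Results that exceed the 16-bit integer range overflow silently, which is one reason the transparent float_t typedef needs care in math-heavy paths.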
6. Background plane abstraction (bgplane_t)
New concept in src/dusk/display/bgplane/. A BG plane has a tile map or
bitmap source, scroll offsets, a palette reference, and optional affine
parameters (for Mode7-style use). On GL platforms: rendered as a
fullscreen textured quad or shader pass. On Saturn: VDP2 config. On SNES:
PPU BG layer config.
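For illustration, the fields listed above might be grouped as in the sketch below, together with a small reference lookup that a software or GL fallback could use to find the tile under a screen pixel. The struct layout, the fixed 8x8 tile size, the wrap-around behaviour, and bgplaneTileAt are assumptions, and the affine parameters are carried but not applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint8_t layerIndex;
  const uint16_t *tileMap;      /* tile indices, row-major */
  uint16_t mapWidth, mapHeight; /* size in tiles */
  int16_t scrollX, scrollY;     /* scroll offset in pixels */
  uint8_t paletteIndex;
  uint8_t hasAffine;            /* nonzero when affine[] should be used */
  int16_t affine[4];            /* a, b, c, d in 8.8 fixed point */
} bgplane_t;

/* Wrap value into [0, limit) even when value is negative. */
static int bgplaneWrap(int value, int limit) {
  return ((value % limit) + limit) % limit;
}

/* Tile index under a screen pixel for a non-affine plane with 8x8 tiles.
 * The map wraps in both directions. */
uint16_t bgplaneTileAt(const bgplane_t *plane, int screenX, int screenY) {
  int px, py;
  assert(plane != NULL && plane->tileMap != NULL);
  assert(plane->mapWidth > 0 && plane->mapHeight > 0);
  px = bgplaneWrap(screenX + plane->scrollX, plane->mapWidth * 8);
  py = bgplaneWrap(screenY + plane->scrollY, plane->mapHeight * 8);
  return plane->tileMap[(size_t)(py / 8) * plane->mapWidth + (size_t)(px / 8)];
}
```

On Saturn and SNES none of this lookup code would run: the same struct would be translated into register and VRAM writes, and the hardware would do the sampling.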
7. Memory system: static pool mode
For SNES (and possibly Saturn), the general-purpose allocator may be
unviable. A compile-time static pool mode (DUSK_MEMORY_STATIC) that uses
a fixed-size arena instead of dynamic allocation. All memoryAllocate
calls hit the pool; memoryFree is a no-op or a stack pop.
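A minimal sketch of the static pool mode, showing the "stack pop" variant through a mark/release pair. memoryAllocate and memoryFree are the names used above; the 4 KB pool size, the 8-byte alignment, and the memoryPool* helpers are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMORY_POOL_SIZE 4096 /* hypothetical; sized per platform in practice */
#define MEMORY_POOL_ALIGN 8

/* The union forces 8-byte alignment of the backing storage. */
static union {
  uint8_t bytes[MEMORY_POOL_SIZE];
  uint64_t forceAlign;
} MEMORY_POOL;
static size_t MEMORY_POOL_USED = 0;

void memoryPoolReset(void) { MEMORY_POOL_USED = 0; }

/* Bump allocation. Returns NULL when the pool is exhausted. */
void *memoryAllocate(size_t size) {
  size_t padded;
  void *result;
  if (size > MEMORY_POOL_SIZE) return NULL;
  padded = (size + (MEMORY_POOL_ALIGN - 1)) & ~(size_t)(MEMORY_POOL_ALIGN - 1);
  if (padded > MEMORY_POOL_SIZE - MEMORY_POOL_USED) return NULL;
  result = &MEMORY_POOL.bytes[MEMORY_POOL_USED];
  MEMORY_POOL_USED += padded;
  return result;
}

/* Individual frees do nothing in this mode. */
void memoryFree(void *pointer) { (void)pointer; }

/* Remember the current top of the pool... */
size_t memoryPoolMark(void) { return MEMORY_POOL_USED; }

/* ...and pop everything allocated since that mark. */
void memoryPoolRelease(size_t mark) {
  assert(mark <= MEMORY_POOL_USED);
  MEMORY_POOL_USED = mark;
}
```

A typical pattern would be to take a mark when a scene loads and release it when the scene unloads, so per-scene allocations are reclaimed in one step.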
8. Script runtime: optional
JerryScript requires too much RAM for SNES and is marginal on Saturn.
The scripting system should be compile-time optional (DUSK_SCRIPTING),
not assumed present. SNES/Saturn scenes would be compiled C.
What to Keep
- Platform macro abstraction pattern (displayplatform.h, etc.) -- works, no reason to change.
- Directory structure convention for platform directories.
- Entity-component system -- platform-agnostic, unaffected.
- Asset loading + .dsk bundle concept (extended for platform formats).
- The broad subsystem layout: asset, input, display, log, network, save, system, time.
Open Questions
- Render intent granularity: How much does the intent API need to express? A MESH intent works on GL/N64 but degrades poorly on Saturn (must split into quads) and is impossible on SNES. Should MESH be a valid intent with a "best effort" contract, or excluded from the portable API entirely?
- Threading abstraction depth: Should DUSK_THREAD_SLAVE_SH2 be a first-class concept in the engine's job system, or a Saturn-internal implementation detail the core never sees? Same question applies to N64's RSP as a compute co-processor.
- Asset loading async contract: When a platform has no threads, should assetLoadAsync be a no-op alias for assetLoadSync, or return immediately with a completion flag to poll? The polling model is more honest but requires all call sites to handle it.
- N64 build system: libdragon uses GNU Make, not CMake. Options are: (a) write a CMake toolchain file that wraps n64.mk, (b) maintain a parallel Makefile just for N64, or (c) wait for upstream CMake support. Which is acceptable long-term?
- N64 RSP microcode: Standard libdragon microcodes (Fast3D/F3DEX2) or Tiny3D (community microcode with full T&L + skinning)? Writing custom microcode is powerful but limited to ~1000 MIPS SIMD instructions. This decision gates what 3D features the N64 port can support.
- PSPGL fate: Drop immediately in favor of native GU, or keep as a fallback (duskgllegacy) while native GU is built? The two can coexist during transition.
- Vulkan priority: Design the intent API with Vulkan in mind from the start, or add it later? Vulkan's explicit pipeline state model may conflict with how stateful platforms (Saturn, SNES) expect things to work.
- Background planes on modern platforms: Does bgplane_t degrade to a fullscreen textured quad on GL/Vulkan/N64, or should modern platforms support actual background scene rendering (3D world behind the foreground)?
- PS1 ordering table depth: The OT is a fixed-size array (e.g. 4096 slots). Depth precision = number of slots. How deep should the engine's default OT be, and should this be configurable per-scene?
- Fixed-point strategy: Does float_t transparently become fixed_t on FPU-less platforms (Saturn, PS1, SNES), or do we require explicit fixed_t in math-heavy paths? Transparent is easiest to port; explicit is faster.
- SNES V-blank budget: All VRAM writes must finish within ~1.2ms. Does the engine need a V-blank work queue with a budget checker, or is this left to the game to manage manually?
- SNES scripting: JerryScript is out. Pure compiled C, or a lighter scripting layer (Lua is ~100 KB -- tight but possible)?
- Asset compiler: New standalone tool, or an extension of the existing asset pipeline? Part of the CMake build or a separate pre-build step?
Proposed Sequence (Draft)
Phase 1 -- Intent API (no behavior change)
- Design and stabilize renderqueue_t and intent types
- Refactor modern GL path to submit through render intents (same output, new plumbing)
- Refactor Dolphin path the same way
- Validate no regressions on Linux + GameCube
Phase 2 -- UI system
- Extract UI rendering from the 3D path into src/dusk/ui/
- Implement UI flush for GL and Dolphin
- Wire existing UI elements through the new system
Phase 3 -- Platform splits
- Split duskgl/ into duskgl/ (modern) and duskgllegacy/ (fixed-func)
- Port PSP to native GU (duskpsp/display/ rewrite, drop PSPGL dependency)
- Stub duskvulkan/ structure for future implementation
Phase 4 -- Asset pipeline
- Design platform-native texture format system
- Extend asset compiler for per-platform output
- Update texture loader to expect pre-converted data
Phase 5 -- Saturn
- CMake toolchain for SH-2 cross-compile (yaul / libyaul toolchain)
- src/dusksaturn/ -- input (SMPC), asset (CD-ROM), log, system
- VDP1 backend for render queue (quads, polygons, painter's sort)
- VDP2 backend for bgplane_t (tile maps, scroll, palette)
- Fixed-point math mode (DUSK_MATH_FIXED)
- UI backend (VDP2 plane(s))
Phase 6 -- PlayStation 1
- CMake toolchain wrapping PSn00bSDK (already CMake-native)
- src/duskps1/ -- input (BIOS pad), asset (CD-ROM libpsxcd), log, system
- GTE integration for fixed-point math (reuse DUSK_MATH_FIXED path)
- Ordering table builder for render queue (painter's sort, DMA linked-list)
- GPU packet backend for intents (tris, quads, rects)
- UI backend (separate GPU packet chain after world OT)
Phase 7 -- Nintendo 64
- CMake toolchain wrapping libdragon (n64.mk wrapper or toolchain file)
- src/duskn64/ -- input (N64 controller via PIF), asset (PI DMA / DragonFS), log, system
- RSP display list builder for render queue (Z-buffer path, no sorting)
- TMEM tile management for textures
- RDP rectangle backend for UI
- Decide on RSP microcode (Tiny3D vs standard F3DEX2)
Phase 8 -- SNES
- SNES toolchain (cc65 or llvm-mos 65816 target)
- Static memory pool mode (DUSK_MEMORY_STATIC)
- PPU tile pipeline + VRAM management
- Mode7 overworld implementation
- OAM sprite system
- BG layer UI
- Scripting-optional build (DUSK_SCRIPTING off)