This commit is contained in:
2026-06-18 20:25:54 -05:00
parent 730a5b2b10
commit 57b2cdb9d1
111 changed files with 865 additions and 3328 deletions
@@ -0,0 +1,92 @@
# Display Refactor Progress
## Immediate Goal
Render a 32x32 white square through the new render opcode stack on Linux.
## Architecture (summary)
See `.claude/display-refactor.md` for the full design.
- `src/dusk/render/` -- opcode format + buffer + submission API (the *contract*).
- Platform backends (e.g. `src/duskgl/`) consume the buffer and translate to native API calls.
- `src/dusk/display/` -- orchestration shell only: `displayInit`, `displayUpdate`, `displayDispose`.
- Scenes call `renderSprite(...)`, `renderClear(...)`. The backend executes the intent.
## Opcode format (32 bytes)
Every command starts with a 4-byte `ropheader_t` (opcode, flags, depth). Two commands are defined:
- `ROP_CLEAR` (32 bytes) -- clear with a color.
- `ROP_DRAW_SPRITE` (32 bytes) -- screen-space int16 x/y/w/h + tint color.
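The sketch below shows one way to lay out the three structs so that every command fills exactly `ROP_SIZE` bytes. Only these parts come from the notes: the 4-byte header (opcode, flags, depth), the int16 x/y/w/h, the tint color, and the `texture=0` convention. The field widths, field order, padding, and opcode values are assumptions, and a local 4-byte RGBA struct stands in for the generated `color_t`.

```c
#include <assert.h>
#include <stdint.h>

#define ROP_SIZE 32

/* Hypothetical opcode values; the real enum lives in src/dusk/render/rop.h. */
enum { ROP_CLEAR = 1, ROP_DRAW_SPRITE = 2 };

/* Stand-in for the generated color4b_t (4 bytes, RGBA). */
typedef struct { uint8_t r, g, b, a; } color_t;

/* 4-byte header shared by every command (field widths assumed). */
typedef struct {
  uint8_t opcode;
  uint8_t flags;
  uint16_t depth;
} ropheader_t;

typedef struct {
  ropheader_t header;            /* 4 bytes */
  color_t color;                 /* 4 bytes */
  uint8_t pad[ROP_SIZE - 8];     /* pad to one fixed 32-byte slot */
} ropclear_t;

typedef struct {
  ropheader_t header;            /* 4 bytes */
  int16_t x, y, w, h;            /* 8 bytes, screen-space pixels */
  color_t tint;                  /* 4 bytes */
  uint32_t texture;              /* 4 bytes, 0 = solid tint, no sampler */
  uint8_t pad[ROP_SIZE - 20];    /* pad to one fixed 32-byte slot */
} ropsprite_t;

/* Catch layout drift at compile time rather than at flush time. */
static_assert(sizeof(ropheader_t) == 4, "header must be 4 bytes");
static_assert(sizeof(ropclear_t) == ROP_SIZE, "ROP_CLEAR must fill one slot");
static_assert(sizeof(ropsprite_t) == ROP_SIZE, "ROP_DRAW_SPRITE must fill one slot");
```

The `static_assert`s catch size drift at build time. This matters more once a 64-byte 3D quad opcode is added alongside these.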
## Milestone 1 -- Archive + strip existing display deps ✓
- [x] Old `src/dusk/display/` archived (now deleted from working tree via git).
- [x] Old `src/duskgl/display/` removed (new GL renderer replaces it).
- [x] `engine.c` stripped to minimal subsystems, set to `SCENE_TYPE_TEST`.
- [x] `scene.c` stripped of old display/shader/screen references.
- [x] `console.c` stripped of display deps.
- [x] `ui/CMakeLists.txt` gutted (re-implementation deferred).
- [x] `asset/loader/CMakeLists.txt` -- display loaders disabled.
- [x] `asset/loader/assetloader.h` -- display loader types removed.
- [x] `rpg/overworld/chunk.h` -- mesh_t / meshvertex_t removed.
- [x] `rpg/overworld/map.c` -- mesh/spritebatch calls removed.
- [x] `scene/overworld/sceneoverworld.c` -- stubbed to empty callbacks.
- [x] Test suite display tests disabled.
## Milestone 2 -- Render opcode system ✓
- [x] `src/dusk/render/rop.h` -- `ropheader_t`, `ropclear_t`, `ropsprite_t`.
- [x] `src/dusk/render/ropbuffer.h/.c` -- `ROPBUFFER` global, reset, alloc.
- [x] `src/dusk/render/render.h/.c` -- `renderClear()`, `renderSprite()`.
- [x] `src/dusk/render/CMakeLists.txt`.
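This sketch shows the buffer plus the two submission calls. The slot count (4096) and slot size (32 bytes) come from the decisions log. The names `ropBufferReset`/`ropBufferAlloc`, their signatures, the union-based slot, and the choice to silently drop commands when the buffer is full are hypothetical, not what `ropbuffer.h` necessarily exposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROP_SIZE 32
#define ROPBUFFER_SLOTS 4096

enum { ROP_CLEAR = 1, ROP_DRAW_SPRITE = 2 };   /* hypothetical values */

typedef struct { uint8_t r, g, b, a; } color_t;
typedef struct { uint8_t opcode, flags; uint16_t depth; } ropheader_t;
typedef struct { ropheader_t header; color_t color; } ropclear_t;
typedef struct {
  ropheader_t header;
  int16_t x, y, w, h;
  color_t tint;
  uint32_t texture;
} ropsprite_t;

/* One fixed 32-byte slot; every command type starts with the header. */
typedef union {
  ropheader_t header;
  ropclear_t clear;
  ropsprite_t sprite;
  uint8_t bytes[ROP_SIZE];
} ropslot_t;

typedef struct {
  ropslot_t slots[ROPBUFFER_SLOTS];   /* 4096 x 32 B = 128 KB */
  size_t count;
} ropbuffer_t;

ropbuffer_t ROPBUFFER;

/* Called at the start of each frame (from displayUpdate in the notes). */
void ropBufferReset(ropbuffer_t *buf) { buf->count = 0; }

/* Returns a zeroed slot, or NULL when this frame's buffer is full. */
ropslot_t *ropBufferAlloc(ropbuffer_t *buf) {
  if(buf->count >= ROPBUFFER_SLOTS) return NULL;
  ropslot_t *slot = &buf->slots[buf->count++];
  memset(slot, 0, sizeof(*slot));
  return slot;
}

void renderClear(color_t color) {
  ropslot_t *slot = ropBufferAlloc(&ROPBUFFER);
  if(!slot) return;
  slot->clear.header.opcode = ROP_CLEAR;
  slot->clear.color = color;
}

void renderSprite(int16_t x, int16_t y, int16_t w, int16_t h, color_t tint) {
  ropslot_t *slot = ropBufferAlloc(&ROPBUFFER);
  if(!slot) return;
  slot->sprite.header.opcode = ROP_DRAW_SPRITE;
  slot->sprite.x = x; slot->sprite.y = y;
  slot->sprite.w = w; slot->sprite.h = h;
  slot->sprite.tint = tint;
  slot->sprite.texture = 0;   /* solid tint only until textures are wired in */
}

static_assert(sizeof(ropslot_t) == ROP_SIZE, "slot must be 32 bytes");
```

With this design, a scene never sees the buffer at all. It only calls `renderClear`/`renderSprite`, and the backend later walks `ROPBUFFER.slots[0..count)`.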
## Milestone 3 -- New minimal display shell ✓
- [x] `src/dusk/display/display.h/.c` -- init/update/dispose, calls platform hooks.
- [x] `src/dusk/display/displaystate.h` -- cull/depth/blend flags.
- [x] `src/dusk/display/color.csv` + `CMakeLists.txt` -- color generation kept.
## Milestone 4 -- GL backend ✓
- [x] `src/duskgl/render/rendergl.h/.c`:
  - GL 3.3 core shader (ortho projection, solid color, no texture yet).
  - `renderGLInit` -- creates the VAO, VBO, and shader.
  - `renderGLFlush(buf, w, h)` -- walks `ROPBUFFER` and issues GL calls per opcode:
    - `ROP_CLEAR` → `glClearColor` + `glClear`.
    - `ROP_DRAW_SPRITE` → 6-vertex quad, `glDrawArrays`.
- [x] `src/duskgl/error/errorgl.h/.c` -- `errorGLCheck`.
- [x] `src/duskgl/CMakeLists.txt`.
- [x] `src/dusksdl2/display/displaysdl2.h/.c` updated:
  - `displaySDL2Init` -- SDL2 window + GL 3.3 context + `renderGLInit`.
  - `displaySDL2Flush(ropbuffer_t *)` -- `SDL_GL_MakeCurrent` + `renderGLFlush`.
  - `displaySDL2Swap` -- `SDL_GL_SwapWindow`.
- [x] `src/dusklinux/display/displayplatform.h` updated with new macros.
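`ROP_DRAW_SPRITE` becomes two triangles (6 vertices) before the `glDrawArrays` call. The sketch below shows that expansion. The vertex struct and vertex order are assumptions, not taken from `rendergl.c`. Positions stay in screen-space pixels because the shader handles the clip-space conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vertex layout: pixel-space position only (solid-color shader). */
typedef struct { float x, y; } quadvertex_t;

/* Expand a sprite rectangle (top-left origin, y down) into two triangles,
 * ready for glDrawArrays(GL_TRIANGLES, 0, 6). Both triangles share the same
 * winding; this assumes face culling is disabled for 2D sprites. */
void quadBuild(int16_t x, int16_t y, int16_t w, int16_t h, quadvertex_t out[6]) {
  assert(w >= 0 && h >= 0);
  float x0 = x, y0 = y;
  float x1 = (float)x + w, y1 = (float)y + h;
  /* Triangle 1: top-left, bottom-left, bottom-right. */
  out[0] = (quadvertex_t){ x0, y0 };
  out[1] = (quadvertex_t){ x0, y1 };
  out[2] = (quadvertex_t){ x1, y1 };
  /* Triangle 2: top-left, bottom-right, top-right. */
  out[3] = (quadvertex_t){ x0, y0 };
  out[4] = (quadvertex_t){ x1, y1 };
  out[5] = (quadvertex_t){ x1, y0 };
}
```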
## Milestone 5 -- Test scene ✓
- [x] `SCENE_TYPE_TEST` added to `scenetype.h/.c`.
- [x] `src/dusk/scene/test/scenetest.h/.c`:
  - `renderClear(color(32, 32, 48, 255))` -- dark blue-grey background.
  - `renderSprite(100, 100, 32, 32, COLOR_WHITE)` -- 32x32 white square.
- [x] `engine.c` starts with `SCENE_TYPE_TEST`.
## Milestone 6 -- Verified ✓
- [x] Build succeeds with no errors (2026-06-18).
- [x] Engine initializes: SDL window + GL context + shader + test scene.
- [x] No crashes running for 5+ seconds.
- [ ] 32x32 white square visually confirmed on screen.
---
## Status: BUILD PASSING -- awaiting visual confirmation
---
## Decisions log
**2026-06-18** -- `color_t = color4b_t` (from generated `display/color.h`). The color generation pipeline (color.csv + Python tool) is kept in the new minimal `src/dusk/display/CMakeLists.txt`.
**2026-06-18** -- `ROP_SIZE = 32`. All opcodes are a fixed 32 bytes. 3D quads will be 64 bytes when added later.
**2026-06-18** -- Depth sort deferred. The buffer stores commands unsorted; painter's-algorithm platforms sort on flush, while GL relies on the Z-buffer.
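On a painter's-algorithm backend, the flush could order the unsorted buffer far-to-near before emitting commands. The sketch sorts an index array by header `depth` and rests on two assumptions the notes do not fix. First, a larger depth means farther away. Second, commands with equal depth keep their submission order; a stable insertion sort guarantees this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t opcode, flags; uint16_t depth; } ropheader_t;

/* Fill order[] with indices into cmds[] sorted far-to-near (descending depth).
 * Insertion sort is stable, so equal depths keep submission order. */
void ropSortPainter(const ropheader_t *cmds, size_t count, uint16_t *order) {
  assert(count <= UINT16_MAX + 1u);
  for(size_t i = 0; i < count; i++) {
    uint16_t idx = (uint16_t)i;
    size_t j = i;
    /* Shift strictly-nearer entries right; equal depths stay in place. */
    while(j > 0 && cmds[order[j - 1]].depth < cmds[idx].depth) {
      order[j] = order[j - 1];
      j--;
    }
    order[j] = idx;
  }
}
```

Insertion sort is O(n²), which is fine for small frames. A full 4096-slot buffer would favor a bucket or radix sort keyed on `depth`.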
**2026-06-18** -- Texture system not yet wired into the opcode pipeline. `ROP_DRAW_SPRITE` with `texture=0` uses the solid tint color only (no sampler). The texture handle system comes next.
**2026-06-18** -- GL backend uses the GL 3.3 Core profile. The shader takes screen-space pixel coordinates and converts them to clip space using the window size queried from SDL each frame.
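The conversion the shader performs comes down to a scale and an offset per axis, equivalent to an orthographic projection of (0, W, H, 0). The C mirror below assumes a top-left pixel origin with y growing downward, so y must be flipped: GL clip space has y pointing up.

```c
#include <assert.h>

typedef struct { float x, y; } vec2_t;

/* Map a pixel coordinate (origin top-left, y down) into GL clip space
 * ([-1, 1] on both axes, y up) for a window of width w and height h. */
vec2_t pixelToClip(float px, float py, float w, float h) {
  assert(w > 0.0f && h > 0.0f);
  vec2_t out;
  out.x = 2.0f * px / w - 1.0f;
  out.y = 1.0f - 2.0f * py / h;
  return out;
}
```

Because `w` and `h` are re-queried every frame, a window resize needs no extra handling: the same pixel coordinates simply map to new clip positions.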
**2026-06-18** -- `ROPBUFFER` is a global (4096 slots × 32 bytes = 128 KB). It is reset at the start of each frame in `displayUpdate`.
**2026-06-18** -- `ui/`, the `rpg/overworld` display code, and the asset display loaders are all temporarily stubbed or disabled. They will be rewritten against the new render API.
@@ -1,714 +0,0 @@
# Display Layer Refactor
## Vision
The goal is to remove the implicit assumption that all platforms render
through a GL-like API, and replace it with a system where each platform
owns its rendering stack completely. The scene describes *what* to draw
in platform-neutral terms; the platform decides *how* to draw it.
This unlocks:
- Saturn (VDP1/VDP2 command-list, no Z-buffer, affine-only)
- PlayStation 1 (ordering table, affine textures, GTE fixed-point, CMake SDK)
- Nintendo 64 (RSP display list, hardware Z-buffer, perspective-correct,
real FPU -- closer to modern GL than to Saturn)
- SNES (PPU tile engine, Mode7 for overworld, no real 3D)
- Vulkan (explicit, modern, no legacy GL baggage)
- Native PSP GU (drop PSPGL, which is just a compatibility shim)
- Legacy fixed-function GL as its own standalone target
- A real first-class 2D UI system not bolted onto 3D space
---
## Why
### The current abstraction assumes GPU-style rendering
The current display layer was designed around a GL-like mental model:
vertex buffers, shaders, Z-buffered triangle rasterization, and texture
objects. `duskgl` implements this with real OpenGL. `duskdolphin` does its
own GX thing but still matches the same interface (mesh, shader, texture,
framebuffer). PSP uses PSPGL -- a library that *emulates* GL on top of
the PSP's native GE/GU hardware, which is entirely different underneath.
Problems this creates:
**PSPGL is a lie.** The PSP has a native graphics engine (GE/GU) with its
own command list, its own vertex formats, and its own display list model.
PSPGL translates GL calls into GU calls, but imperfectly -- and we end up
paying the abstraction cost without getting GL correctness. Writing directly
to GU gives better performance, access to native formats, and correct
behavior on edge cases that PSPGL gets wrong.
**Legacy GL should not share code with modern GL.** The fixed-function
pipeline (no shaders, matrix stacks via glMatrixMode, glTexEnv) is
meaningfully different from modern GL (VAO/VBO, GLSL, explicit uniform
locations). Treating them as "the same thing with a flag" creates a tangle
of `#ifdef DUSK_OPENGL_LEGACY` guards throughout the rendering code.
They are separate targets and should be separate platform directories.
**Saturn cannot fit the model at all.** VDP1 is a command-list processor:
you write 32-byte command structs (sprites, quads, lines) into VRAM, then
poke a register to trigger execution. There are no vertex buffers, no
shaders, no Z-buffer. Depth is pure painter's algorithm -- command order
IS the depth. VDP2 composites up to 6 background planes at scanline time;
these are tile maps and rotation parameter tables, not meshes. Nothing
about the current API maps onto this hardware.
**SNES is even further removed.** The PPU renders tiles. VRAM holds 8x8
or 16x16 pixel tiles and tile maps; the PPU references these during
scanline rendering. There are no draw calls. Mode7 is an affine transform
applied to a single background layer (the basis for the overworld map and
road perspective effects). Sprites are entries in OAM (Object Attribute
Memory). The 65816 CPU writes to memory-mapped registers and VRAM; the
PPU does the rest. The concept of "mesh" or "shader" is meaningless here.
**Textures loaded as RGBA waste memory and exclude platforms.** Loading
every texture as 32-bit RGBA and converting at runtime is expensive on
memory-constrained platforms (Saturn has 2 MB of work RAM; SNES has 64 KB
VRAM) and simply wrong for platforms that have native formats incompatible
with RGBA (e.g., PSP's ABGR8888 / BGR5650, Saturn's RGB555 / CI4 / CI8,
SNES's 2bpp/4bpp/8bpp indexed). The asset pipeline must compile textures
to platform-native formats at build time.
**UI in 3D space is wasteful and limiting.** Currently UI elements are
rendered as geometry projected into screen space, going through the full
3D pipeline. On platforms with dedicated 2D hardware (Saturn VDP2,
SNES BG layers), this is actively wrong -- UI should map to a hardware
plane, not a 3D draw call. On modern platforms it should be a clean
screen-space pass that never touches the 3D depth buffer.
---
## Current Model (Summary)
```
Scene
-> shaderBind(shader)
-> textureBind(texture)
-> meshDraw(mesh) <-- immediate draw call per object
-> meshDraw(mesh)
-> ...
Platform receives each draw call immediately.
Depth is handled by Z-buffer hardware.
All textures live in GPU memory as RGBA (or Dolphin's tiled RGBA).
UI is rendered as 3D geometry with an orthographic projection.
```
Key current concepts:
- `mesh_t` -- vertex array (triangles/quads), in GPU VBO (GL) or CPU
memory (Dolphin)
- `shader_t` -- GLSL program (modern GL), GL fixed-function state
(legacy GL), or GX matrix + TEV config (Dolphin)
- `texture_t` -- GPU texture handle (GL) or tiled CPU buffer (Dolphin);
always RGBA at the engine level
- `framebuffer_t` -- FBO (GL) or fixed hardware XFB (Dolphin)
- `spritebatch_t` -- accumulates 2D quads and flushes in batches of 32;
the only existing deferred-submission system in the engine
The spritebatch hints at the right model. Everything needs to work this way.
---
## The Core Shift: Platform-Native Rendering
### Before
```
src/dusk/ Core engine + GL-like rendering API definition
src/duskgl/ OpenGL implementation
src/dusksdl2/ SDL2 window/input (shared)
src/duskpsp/ PSP via PSPGL (shim over GU)
src/duskvita/ Vita via GL ES (similar path to duskgl)
src/duskdolphin/ GameCube/Wii via GX (already custom)
src/dusklinux/ Linux (uses dusksdl2 + duskgl)
```
### After
```
src/dusk/ Core engine logic + render intent API ONLY
src/dusksdl2/ SDL2 window/input (unchanged)
src/duskgl/ Modern OpenGL (Linux, Vita modern path)
src/duskgllegacy/ Fixed-function OpenGL (older hardware, PSP with PSPGL
as a last resort)
src/duskvulkan/ Vulkan (Linux modern, future)
src/duskpsp/ PSP native GU (no PSPGL, direct command lists)
src/duskvita/ Vita native GXM (TBD)
src/duskdolphin/ GameCube/Wii GX (already custom, mostly kept)
src/dusksaturn/ Saturn VDP1/VDP2 (new)
src/duskps1/ PlayStation 1 ordering table + GTE (new)
src/duskn64/ Nintendo 64 RSP/RDP display list (new)
src/dusksnes/ SNES PPU/Mode7 (new, extremely constrained)
```
`src/dusk/` no longer knows about meshes, shaders, or framebuffers.
It defines the *render intent* system: what the scene wants to draw.
Each platform directory is entirely self-contained and responsible for
translating intents to its native API.
---
## Render Intent System (new)
Instead of the scene calling `meshDraw()` or `shaderBind()`, it submits
render intents into a `renderqueue_t`. An intent describes what should
appear on screen without prescribing how to draw it.
### Primitive intents (3D world)
```
RENDER_INTENT_QUAD -- textured quad, 4 vertices or transform + size
RENDER_INTENT_POLYGON -- filled polygon (convex, up to N vertices)
RENDER_INTENT_LINE -- line segment or polyline
RENDER_INTENT_SPRITE -- 2D billboard (always faces camera)
RENDER_INTENT_MESH -- arbitrary vertex array (GL/GX only; degraded
on command-list platforms)
```
Each intent carries: texture reference, color/tint, depth hint (for
painter's algorithm sorting), blend mode, and cull flags.
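One way to represent an intent is as a tagged union over the intent kinds. The common fields listed above (texture reference, color/tint, depth hint, blend mode, cull flags) sit in a shared prefix. Every name in this sketch is hypothetical: `renderintent_t`, `renderQueuePush`, the integer positions (a nod to fixed-point targets), and the fixed queue capacity. It is not the proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
  RENDER_INTENT_QUAD,
  RENDER_INTENT_POLYGON,
  RENDER_INTENT_LINE,
  RENDER_INTENT_SPRITE,
  RENDER_INTENT_MESH
} renderintenttype_t;

typedef enum { BLEND_NONE, BLEND_ALPHA, BLEND_ADD } blendmode_t;

typedef struct {
  renderintenttype_t type;
  uint32_t texture;      /* opaque platform texture handle, 0 = untextured */
  uint8_t tint[4];       /* RGBA */
  int32_t depthHint;     /* painter's sort key; Z-buffer platforms may ignore it */
  blendmode_t blend;
  uint8_t cullFlags;
  union {
    struct { int32_t pos[4][3]; } quad;               /* 4 world-space vertices */
    struct { int32_t pos[3]; int32_t size; } sprite;  /* billboard center + size */
  } data;
} renderintent_t;

#define RENDERQUEUE_CAPACITY 1024

typedef struct {
  renderintent_t items[RENDERQUEUE_CAPACITY];
  size_t count;
} renderqueue_t;

/* Copy an intent into the queue; returns false if this frame's queue is full.
 * The queue only records intent; the platform flush decides how to draw it. */
bool renderQueuePush(renderqueue_t *q, const renderintent_t *intent) {
  assert(q != NULL && intent != NULL);
  if(q->count >= RENDERQUEUE_CAPACITY) return false;
  q->items[q->count++] = *intent;
  return true;
}
```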
### Background plane intents (2D layers)
```
RENDER_INTENT_BGPLANE -- configure a background/tilemap layer
```
Carries: layer index, tile map data reference, scroll offset, palette,
and transform (for Mode7-style affine).
### UI intents (screen space)
```
RENDER_INTENT_UI_RECT -- solid colored rectangle
RENDER_INTENT_UI_SPRITE -- textured rectangle (UI image)
RENDER_INTENT_UI_TEXT -- text string at screen position
```
UI intents are always screen-space. They are never mixed into the 3D
world queue. See UI System section below.
### Platform translation
| Intent | Modern GL | PSP GU | Saturn VDP1 | PS1 OT | N64 RSP | SNES PPU |
|---|---|---|---|---|---|---|
| QUAD | VAO + glDraw | GU display list | distorted-sprite cmd | GPU quad packet | RSP display list | OAM + BG tile |
| POLYGON | VAO + glDraw | GU display list | polygon cmd | GPU poly packet | RSP display list | OAM |
| BGPLANE | fullscreen quad | fullscreen quad | VDP2 config | fullscreen quad | fullscreen quad | BG layer config |
| UI_SPRITE | 2D ortho quad | 2D GU quad | VDP2 BG plane | GPU rect packet | RDP rectangle | BG layer tile |
| MESH | VAO/VBO | GU buffers | (degrade: quads) | (degrade: tris/quads) | RSP display list | (not supported) |
Note: N64 supports both triangles and axis-aligned rectangles natively via
the RDP. PS1 supports triangles and quads (4-vertex) natively. Neither needs
the dead-vertex trick Saturn requires, where VDP1 draws a triangle as a quad
with one vertex repeated.
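On Saturn, VDP1 polygons are four-cornered, so the MESH fallback has to turn triangles into quads. The sketch below does this degradation in the simplest way, by repeating each triangle's third vertex as the fourth corner. The vertex and quad types are hypothetical stand-ins for projected screen-space data, and the choice of which vertex to repeat is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t x, y; } vtx2_t;        /* screen-space vertex after projection */
typedef struct { vtx2_t corner[4]; } quad4_t;   /* VDP1-style four-corner primitive */

/* Convert triCount triangles (3 vertices each) into quads by repeating
 * each triangle's third vertex as the fourth ("dead") corner. */
size_t trianglesToQuads(const vtx2_t *tris, size_t triCount, quad4_t *out) {
  assert(tris != NULL && out != NULL);
  for(size_t t = 0; t < triCount; t++) {
    const vtx2_t *v = &tris[t * 3];
    out[t].corner[0] = v[0];
    out[t].corner[1] = v[1];
    out[t].corner[2] = v[2];
    out[t].corner[3] = v[2];   /* degenerate corner: zero-length last edge */
  }
  return triCount;
}
```

This doubles the command count relative to a native-quad mesh, which is part of why MESH "degrades poorly" on command-list platforms.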
---
## Asset Pipeline: Platform-Native Formats
### The problem
All textures currently enter the engine as RGBA and are converted at
runtime by each platform (Dolphin retiles to 4x4 blocks; GL uploads as-is).
This wastes memory and CPU time, and is impossible for platforms where RGBA
is not a valid intermediate format at all.
### The solution
The asset compiler (offline, run at build time) produces platform-specific
binary bundles. A texture asset has one source (PNG or similar) but N
compiled outputs, one per target.
### Texture formats by platform
| Platform | Native Formats | Notes |
|---|---|---|
| Modern GL | RGBA8, RGB8, BC1-BC7 (compressed) | Upload directly, GPU handles |
| Legacy GL | RGBA8, RGB8, CI8 (palette via extension) | No compressed formats |
| Vulkan | VkFormat variants (RGBA8, BC, ASTC) | Chosen at compile time |
| PSP GU | ABGR8888, BGR5650, ABGR1555, ABGR4444, CI4, CI8 | Native swizzled format |
| Saturn VDP1/VDP2 | RGB555, CI4, CI8 (15-bit palette in CRAM) | Big-endian, packed |
| PlayStation 1 | RGB555 / CI4 / CI8 (CLUT in VRAM) | Little-endian; VRAM flat; CLUT at coord |
| Nintendo 64 | RGBA16, RGBA32, IA4-IA16, I4-I8, CI4, CI8 | 4 KB TMEM; tiles must fit in TMEM banks |
| GameCube/Wii GX | I4, I8, IA4, IA8, RGB565, RGB5A3, RGBA8, CMPR | 4x4 tiled, big-endian |
| SNES PPU | 2bpp, 4bpp, 8bpp indexed (CGRAM palette) | Tile-packed, no direct access |
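The Saturn and PlayStation 1 rows share a 15-bit layout: red in bits 0-4, green in 5-9, blue in 10-14, and a flag in bit 15. They differ in byte order. The meaning of bit 15 is platform-specific (on the PS1 GPU it is the semi-transparency bit), so this sketch leaves it to the caller. The sketch is a per-pixel step the offline compiler might run; `packBGR555` and `writeTexel16` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit RGB into 15-bit BGR (red bits 0-4, green 5-9, blue 10-14)
 * plus a caller-chosen bit 15. The low three bits of each channel are dropped. */
uint16_t packBGR555(uint8_t r, uint8_t g, uint8_t b, int flag) {
  return (uint16_t)(((flag ? 1u : 0u) << 15) |
                    ((unsigned)(b >> 3) << 10) |
                    ((unsigned)(g >> 3) << 5) |
                    (unsigned)(r >> 3));
}

/* Write a 16-bit texel in the target's byte order (Saturn: big-endian,
 * PS1: little-endian) so the runtime can copy bytes straight to VRAM. */
void writeTexel16(uint8_t *dst, uint16_t texel, int bigEndian) {
  assert(dst != NULL);
  if(bigEndian) {
    dst[0] = (uint8_t)(texel >> 8);
    dst[1] = (uint8_t)(texel & 0xFF);
  } else {
    dst[0] = (uint8_t)(texel & 0xFF);
    dst[1] = (uint8_t)(texel >> 8);
  }
}
```

Doing this at build time means the runtime loader never touches RGBA. It streams the converted bytes to VRAM as-is.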
### Asset bundle structure
The `.dsk` bundle gains a platform tag. The loader picks the right section
at runtime (or the build produces a single-platform bundle for constrained
targets like SNES/Saturn where there is no spare storage for unused data).
---
## UI System (first-class)
### Current problem
UI elements go through the 3D pipeline: they are meshes with an orthographic
shader, rendered in the same pass as the world. This means:
- UI competes for Z-buffer depth with world geometry
- On Saturn/SNES, UI cannot use dedicated hardware planes
- Text rendering is tied to the sprite batch which is tied to the 3D pass
- No separation between "draw the world" and "draw the HUD"
### New model
UI is a completely separate rendering context. The world renders first,
then the UI renders on top. They share no state.
UI coordinates are always in screen space (pixels or a logical resolution
that the platform scales to its native display size). No camera matrix,
no projection, no depth buffer involvement.
### Platform mapping
| Platform | UI implementation |
|---|---|
| Modern GL | Separate 2D ortho pass, screen-space quads, no depth test |
| Legacy GL | Same, using fixed-function |
| PSP GU | Separate GU display list, 2D mode |
| Saturn | VDP2 background plane(s) dedicated to UI |
| PlayStation 1 | Separate GPU packet chain, no Z; ordered after world OT |
| Nintendo 64 | RDP rectangle commands in a separate display list segment |
| GameCube/Wii | GX 2D mode or dedicated GX pass |
| SNES | Dedicated BG layer(s) for HUD tiles |
On Saturn, the UI occupying VDP2 planes is a genuine hardware win -- VDP2
composites it for free at scanline time, costing zero VDP1 commands.
On SNES, the HUD must live in a BG layer because there is no alternative.
### UI API (proposed)
```c
uiBegin();
uiDrawRect(x, y, w, h, color);
uiDrawSprite(x, y, w, h, texture, uvMin, uvMax);
uiDrawText(x, y, font, string);
uiEnd(); // platform flushes UI to hardware
```
The `uiBegin`/`uiEnd` block collects intents; the platform submits them
at frame end in whatever way is appropriate.
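The sketch below shows the collect-then-flush contract with only rectangles implemented. It assumes a fixed-size per-frame array and a platform hook called from `uiEnd`. The element struct, the capacity, and `uiPlatformFlush` are all hypothetical, and the hook here only records the element count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { UI_RECT, UI_SPRITE, UI_TEXT } uielementtype_t;

typedef struct {
  uielementtype_t type;
  int16_t x, y, w, h;   /* screen space (pixels or logical resolution) */
  uint8_t color[4];     /* RGBA */
} uielement_t;

#define UI_MAX_ELEMENTS 256

static uielement_t UI_ELEMENTS[UI_MAX_ELEMENTS];
static size_t UI_COUNT = 0;
static int UI_OPEN = 0;
static size_t UI_LAST_FLUSHED = 0;

/* Hypothetical platform hook: GL would draw an ortho pass with depth testing
 * off, Saturn would write VDP2 plane data, SNES a BG tilemap. This stand-in
 * only records how many elements were handed over. */
static void uiPlatformFlush(const uielement_t *elements, size_t count) {
  (void)elements;
  UI_LAST_FLUSHED = count;
}

void uiBegin(void) {
  assert(!UI_OPEN && "uiBegin called twice without uiEnd");
  UI_COUNT = 0;
  UI_OPEN = 1;
}

void uiDrawRect(int16_t x, int16_t y, int16_t w, int16_t h, const uint8_t color[4]) {
  assert(UI_OPEN);
  if(UI_COUNT >= UI_MAX_ELEMENTS) return;   /* drop when full */
  uielement_t *e = &UI_ELEMENTS[UI_COUNT++];
  e->type = UI_RECT;
  e->x = x; e->y = y; e->w = w; e->h = h;
  for(int i = 0; i < 4; i++) e->color[i] = color[i];
}

void uiEnd(void) {
  assert(UI_OPEN);
  uiPlatformFlush(UI_ELEMENTS, UI_COUNT);
  UI_OPEN = 0;
}
```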
---
## SNES / Mode7
SNES is the most constrained platform the engine will ever support and
needs its own section because it breaks assumptions that even Saturn keeps.
### Hardware
- **CPU**: 65816 @ ~3.58 MHz (16-bit, no FPU, no cache)
- **PPU**: Tile-based scanline renderer. VRAM holds tile graphics and
tile maps. BG layers reference tiles by index.
- **Mode7**: A single BG layer transformed by a 2D affine matrix. Used for
overworld maps, road perspective (F-Zero), and rotation effects. The matrix
can be rewritten per scanline via HDMA (scanline DMA), enabling
horizon-perspective effects.
- **Sprites/OAM**: Up to 128 sprites (8x8, 16x16, 32x32, 64x64 pixels),
4bpp indexed, with at most 32 sprites (34 8-pixel sprite slices) per scanline.
- **Palette**: CGRAM holds 256 entries of 15-bit RGB (512 bytes total).
BG layers use sub-palettes of 4/16/256 colors depending on bit depth.
- **VRAM**: 64 KB (tiles + tile maps)
- **WRAM**: 128 KB work RAM + usually 8 KB SRAM on cart for saves
- **No frame buffer.** The PPU renders scanlines directly. You cannot
read back what was drawn.
- **No general-purpose draw calls.** You configure registers and VRAM
before the frame and the PPU does the rest.
### What "3D" means on SNES
True 3D is not possible. What can be approximated:
- **Overworld map**: Mode7 with a flat texture and HDMA scroll gives a
top-down perspective with a horizon line (the classic JRPG overworld).
- **Depth illusion**: Mode7 matrix manipulation can simulate a moving
camera over flat terrain. Objects are sprites placed at screen positions
calculated by software perspective projection.
- **Sprite scaling**: Software-scaled sprites using pre-rendered frames,
or co-processor rendering as in Super FX games (Star Fox). Super FX
is a co-processor on the cartridge -- base SNES cannot do this.
- **Basic 3D effects**: Some games use HDMA color gradient + Mode7 floor
with overlaid sprites to create a pseudo-3D look.
The engine plan for SNES: Mode7 overworld (confirmed), sprite-based world
objects, BG layer UI. "Basic 3D effects" (pseudo-perspective with sprites)
is aspirational -- implementation complexity TBD.
### SNES constraints on the engine
- **No dynamic allocation.** With 128 KB WRAM, a general-purpose allocator
is risky. The engine memory system may need a static pool mode for SNES.
- **No floating point.** `float_t` must resolve to integer or fixed-point.
- **No scripting (JerryScript).** The JS engine requires far more than
128 KB RAM. SNES scenes must be compiled C.
- **Asset data in ROM, not a .dsk bundle.** SNES loads from cartridge ROM
mapped into the address space. The asset system needs a ROM-mapped loader.
- **Tile pipeline.** Textures must be pre-converted to SNES tile format
(2bpp/4bpp/8bpp, 8x8 pixel tiles, CGRAM palette) at build time. This
is a completely different asset output from every other platform.
---
## Platform Inventory
A summary of what each platform's native rendering looks like after the
refactor, for reference when designing the intent API.
### Modern OpenGL (duskgl)
VAO + VBO mesh storage, GLSL shaders, FBO render targets, Z-buffer.
No fixed-function. Targets: Linux, possibly Vita (GXM is preferred).
### Legacy OpenGL (duskgllegacy)
Fixed-function pipeline: `glMatrixMode`, `glTexEnv`, client-side vertex
arrays. No VAO/VBO. Used for: very old desktop hardware, maybe PSP as
last resort (PSPGL is this). Targets: legacy desktop, embedded Linux.
### Vulkan (duskvulkan)
Explicit pipeline state objects, render passes, descriptor sets, command
buffers. Highest ceiling for performance and control. Targets: Linux
(modern), future platforms. Not immediate priority but the architecture
should not block it.
### PSP native GU (duskpsp)
The GE/GU is a display-list GPU. You build a command list in memory and
the GU DMA engine processes it asynchronously. Native vertex and texture
formats are PSP-specific (ABGR color byte order, swizzled textures for cache efficiency).
No PSPGL. Targets: PSP hardware and emulators.
### Vita (duskvita)
GXM is Sony's Vita GPU API -- closer to modern GL than GU, with explicit
shader binaries (.gxp), ring buffers, and GPU sync primitives.
### GameCube/Wii GX (duskdolphin)
Already a custom renderer. GX uses immediate-mode vertex submission
(`GX_Begin` / `GX_Position1x16` loops), TEV for texture compositing, and
hardware XFB double-buffering. Big-endian. Mostly kept as-is; may benefit
from being expressed in terms of render intents for consistency.
### Saturn VDP1/VDP2 (dusksaturn)
VDP1: command-list (32-byte structs), quad-based, affine texture mapping,
no Z-buffer (painter's algorithm). VDP2: up to 6 background planes
composited at scanline time. Big-endian dual SH-2, no FPU. Fixed-point
math required throughout.
### PlayStation 1 (duskps1)
MIPS R3000A @ 33.87 MHz, little-endian, no FPU. GTE (coprocessor 2)
handles fixed-point matrix math, perspective divide, and lighting.
GPU receives packets via DMA linked-list (the Ordering Table). Primitives:
triangles and quads natively (no dead-vertex needed). Texture mapping:
affine, same limitation as Saturn. No Z-buffer; depth is OT slot order.
VRAM is 1 MB flat (frame buffers + textures + CLUTs share it). SDK:
PSn00bSDK, which is CMake-native -- a direct fit for the dusk build system.
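An OT is an array of list heads indexed by quantized depth, so building one is bucket insertion. The CPU-side sketch below maps a depth to a slot, prepends the primitive, and walks the table far-to-near. It makes no PSn00bSDK calls; the `prim_t` node, the tiny slot count, and the traversal direction are assumptions. Prepending means that within one slot, the last primitive inserted is drawn first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OT_LEN 8   /* tiny for illustration; real tables are often thousands of slots */

typedef struct prim {
  struct prim *next;
  int id;            /* stand-in for a GPU packet */
} prim_t;

typedef struct { prim_t *slot[OT_LEN]; } ordertable_t;

void otClear(ordertable_t *ot) {
  for(size_t i = 0; i < OT_LEN; i++) ot->slot[i] = NULL;
}

/* Quantize depth (0 = nearest, maxDepth = farthest) to a slot, then prepend. */
void otInsert(ordertable_t *ot, prim_t *p, uint32_t depth, uint32_t maxDepth) {
  assert(maxDepth > 0 && depth <= maxDepth);
  size_t s = (size_t)((uint64_t)depth * (OT_LEN - 1) / maxDepth);
  p->next = ot->slot[s];
  ot->slot[s] = p;
}

/* Visit primitives far-to-near, the order a painter's renderer draws them. */
size_t otWalk(const ordertable_t *ot, int *outIds, size_t max) {
  size_t n = 0;
  for(size_t s = OT_LEN; s-- > 0;)
    for(const prim_t *p = ot->slot[s]; p != NULL && n < max; p = p->next)
      outIds[n++] = p->id;
  return n;
}
```

Two depths that land in the same slot lose their relative order. That quantization is exactly the "depth precision = number of slots" trade-off.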
### Nintendo 64 (duskn64)
VR4300 @ 93.75 MHz, big-endian, real IEEE 754 FPU. Rendering is split
between the RSP (geometry: programmable MIPS SIMD, runs microcode up to
~1000 instructions in 4 KB IMEM) and the RDP (rasterization: fixed
hardware). RSP produces triangle commands from a CPU-built display list
in RDRAM. RDP features: perspective-correct texture mapping, bilinear
filtering, hardware Z-buffer. Primitives: triangles and axis-aligned rects.
TMEM is 4 KB on-chip texture cache; textures must be loaded into tiles
before drawing -- a significant memory management constraint.
SDK: libdragon (Unlicense, GCC 14, Makefile-based -- not CMake; this
requires a wrapper toolchain file for dusk's build system).
### SNES PPU/Mode7 (dusksnes)
Tile-based. VRAM holds tiles and tile maps. Mode7 provides affine transform
for one BG layer. Sprites via OAM. No frame buffer. All configuration is
memory-mapped registers. 65816 CPU, no FPU, extremely limited RAM.
---
## Threading Model
### Current model
The engine uses OS threads for async asset loading (`assetXxxLoaderAsync`).
Platforms that have pthreads or an equivalent RTOS (Linux, PSP, Vita) run
worker threads that load data in the background while the game loop runs.
The main thread polls or blocks on completion.
### The problem
Several target platforms have no OS threading whatsoever, and others have
hardware-specific async mechanisms that are nothing like pthreads.
### Per-platform reality
| Platform | Threading | Async mechanism |
|---|---|---|
| Linux | pthreads | Worker threads (current) |
| Vita | SceKernelThread | Per-SDK threads |
| PSP | SceKernelThread | Per-SDK threads |
| GameCube/Wii | libogc LWP | Lightweight processes |
| Saturn | None (OS) | Slave SH-2 for fixed jobs; CD-ROM via interrupt/callback |
| PlayStation 1 | None (OS) | V-blank ISR, 7 DMA channels, CD-ROM callbacks |
| Nintendo 64 | libdragon preview only | PI DMA for cartridge; RSP for parallel compute |
| SNES | None | DMA (GPDMA/HDMA); NMI V-blank; SPC700 audio is a separate CPU |
**Saturn slave SH-2**: The second SH-2 is not a general-purpose thread.
It runs a fixed subroutine you hand-load. The typical use is offloading
heavy per-frame computation (geometry transforms, depth sort) while the
master SH-2 handles game logic. Communication is via shared WRAM with
cache-through addresses to avoid coherency bugs. There is no scheduler
and no yield -- it runs to completion.
**SNES DMA**: GPDMA copies blocks of data (ROM to WRAM, WRAM to VRAM)
and halts the CPU for the duration -- it is synchronous from the game's
perspective. HDMA runs per-scanline during H-blank, writing to PPU
registers without CPU involvement; this is how Mode7 perspective is
achieved. Neither is "async" in the programming sense.
**SNES NMI**: The V-blank NMI fires at the start of every V-blank period.
This is the only safe window to write to VRAM and PPU registers. All
critical PPU updates must complete within ~1.2ms (the V-blank window).
### Proposed model
Introduce a compile-time threading capability flag:
```
DUSK_THREAD_PTHREAD -- Linux, maybe Vita
DUSK_THREAD_SCEKERNEL -- PSP, Vita SDK
DUSK_THREAD_LWP -- GameCube/Wii libogc
DUSK_THREAD_SLAVE_SH2 -- Saturn slave CPU (job dispatch only)
DUSK_THREAD_NONE -- SNES (and Saturn master thread view)
```
The asset loader's async path is gated on having a threading capability.
When `DUSK_THREAD_NONE` is defined, `assetXxxLoaderAsync` either does not
exist or is an alias for the synchronous version. On Saturn, the slave SH-2
is exposed as a distinct API (`sh2JobDispatch`, `sh2JobWait`) used only for
compute-heavy work, not for I/O.
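The sketch below shows the compile-time gating. It assumes hypothetical `assetLoad`/`assetLoadAsync` functions and a handle with a pollable `done` flag. Under `DUSK_THREAD_NONE`, the async entry point does the work synchronously and reports completion before returning, so call sites written against a polling model keep working unchanged. The threaded branch is only declared, because its body depends on each platform's thread API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* For this sketch, pretend we are building a thread-less target such as SNES. */
#define DUSK_THREAD_NONE

typedef struct {
  const char *path;
  volatile bool done;   /* polled by the game loop */
  int result;           /* 0 on success */
} assethandle_t;

/* Hypothetical synchronous loader; a real one would read and parse data. */
int assetLoad(const char *path) {
  return path != NULL ? 0 : -1;
}

#if defined(DUSK_THREAD_NONE)
/* No threads: do the work now and mark the handle complete immediately. */
void assetLoadAsync(assethandle_t *h, const char *path) {
  assert(h != NULL);
  h->path = path;
  h->result = assetLoad(path);
  h->done = true;
}
#else
/* Threaded platforms would queue the job for a worker (not shown). */
void assetLoadAsync(assethandle_t *h, const char *path);
#endif
```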
### Asset loading without threads
**Saturn**: CD-ROM access is initiated via SBL/CDC routines and completes
via interrupt callback. The engine's asset loading loop can poll the
callback flag in the main loop rather than blocking a thread. This is
interrupt-driven cooperative async, not preemptive.
**SNES**: There is no loading. Assets live in ROM, mapped directly into the
65816 address space. "Loading a texture" means computing a pointer into ROM
and copying the tile data to VRAM during V-blank via GPDMA. The asset system
on SNES is essentially a VRAM/CGRAM allocator and a DMA scheduler, not a
file loader.
### Asset system changes
The asset pipeline needs to accommodate three loading models:
1. **File-based** (Linux, PSP, Vita, Saturn CD): open file, read bytes,
close. Can be sync or thread-async.
2. **DMA/interrupt** (Saturn CD-ROM, GC DVD): initiate transfer, poll or
callback on completion, no thread blocked.
3. **ROM-mapped** (SNES): data is already in the address space; "loading"
is a VRAM DMA copy scheduled for V-blank, not file I/O.
The `assetstream_t` abstraction that currently wraps file I/O needs a third
backend for ROM-mapped data, and the async path needs to support
callback-based completion as an alternative to thread-based blocking.
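The sketch below shows an `assetstream_t` with swappable backends. The real struct's fields and function names are not given here, so everything below is hypothetical. The point is that a ROM-mapped backend is just a bounded cursor over bytes that are already addressable, and a file backend would expose the same `read` signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct assetstream assetstream_t;

/* Backend table: file, DMA/callback, and ROM backends would each supply one. */
typedef struct {
  size_t (*read)(assetstream_t *s, void *dst, size_t len);
} assetstreamops_t;

struct assetstream {
  const assetstreamops_t *ops;
  const uint8_t *base;   /* ROM backend: data already in the address space */
  size_t size;
  size_t pos;
};

/* ROM-mapped read: a bounded memcpy from the mapped region, no I/O. */
static size_t romRead(assetstream_t *s, void *dst, size_t len) {
  size_t left = s->size - s->pos;
  if(len > left) len = left;
  if(len == 0) return 0;
  memcpy(dst, s->base + s->pos, len);
  s->pos += len;
  return len;
}

static const assetstreamops_t ROM_OPS = { romRead };

void assetStreamOpenRom(assetstream_t *s, const uint8_t *base, size_t size) {
  assert(s != NULL && (base != NULL || size == 0));
  s->ops = &ROM_OPS;
  s->base = base;
  s->size = size;
  s->pos = 0;
}

size_t assetStreamRead(assetstream_t *s, void *dst, size_t len) {
  return s->ops->read(s, dst, len);
}
```

On SNES, a texture would more likely skip the copy entirely: the loader could hand `base + pos` straight to a V-blank GPDMA transfer into VRAM.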
---
## What Needs to Change
### 1. Render intent API (new, in src/dusk/)
Replace `mesh_t` / `shader_t` / `meshDraw()` as scene-facing APIs with
`renderqueue_t` and intent submission functions. `src/dusk/` defines the
intent types and submission API; platforms implement the flush.
### 2. Platform renderer directories
Move rendering implementations out of `duskgl/` as a shared layer and
into fully self-contained platform directories. `duskgl/` becomes the
*modern GL* platform only. Add `duskgllegacy/`, `duskvulkan/` as peers.
### 3. Asset pipeline: platform-native texture formats
The offline asset compiler must produce per-platform texture bundles in
native formats. The runtime texture loader expects pre-converted data,
not RGBA. `textureformat_t` grows to cover all platform formats but each
platform only ever sees the formats it natively supports.
### 4. UI system (first-class, separate from 3D)
New `src/dusk/ui/` subsystem with `uiBegin` / `uiEnd` and intent types
for rects, sprites, and text. Platforms implement the flush independently.
The 3D spritebatch is retired or scoped to world-space billboards only.
### 5. Fixed-point / no-FPU math
`float_t` needs a fixed-point mode. Proposed: define `fixed_t` as a
16.16 signed integer; define `DUSK_MATH_FIXED` for platforms that require
it (Saturn, PS1, SNES). Engine math utilities (`mathSin`, `mathCos`, etc.)
have fixed-point implementations selected by this flag. `float_t` on
FPU-less platforms becomes a typedef for `fixed_t`.
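The sketch below shows the core 16.16 operations. The helper names are hypothetical; the engine's math API is not specified here. Multiplication and division widen to 64 bits so the intermediate value keeps its integer and fractional bits.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_t;               /* signed 16.16 */
#define FIXED_SHIFT 16
#define FIXED_ONE   ((fixed_t)1 << FIXED_SHIFT)

/* Whole number to 16.16 (valid for |i| <= 32767). */
static inline fixed_t fixedFromInt(int32_t i) { return (fixed_t)(i * FIXED_ONE); }

/* 64-bit product, then drop 16 fractional bits
 * (arithmetic right shift assumed for negatives, as on mainstream compilers). */
static inline fixed_t fixedMul(fixed_t a, fixed_t b) {
  return (fixed_t)(((int64_t)a * b) >> FIXED_SHIFT);
}

/* Pre-scale the dividend so the quotient keeps 16 fractional bits. */
static inline fixed_t fixedDiv(fixed_t a, fixed_t b) {
  assert(b != 0);
  return (fixed_t)(((int64_t)a * FIXED_ONE) / b);
}
```

The 64-bit intermediate is the cost of 16.16. On a 16-bit 65816, every such multiply has to be built from smaller operations in software, which is one argument for explicit `fixed_t` in hot paths rather than transparent `float_t` substitution.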
### 6. Background plane abstraction (bgplane_t)
New concept in `src/dusk/display/bgplane/`. A BG plane has a tile map or
bitmap source, scroll offsets, a palette reference, and optional affine
parameters (for Mode7-style use). On GL platforms: rendered as a
fullscreen textured quad or shader pass. On Saturn: VDP2 config. On SNES:
PPU BG layer config.
### 7. Memory system: static pool mode
For SNES (and possibly Saturn), the general-purpose allocator may be
unviable. Add a compile-time static pool mode (`DUSK_MEMORY_STATIC`) that uses
a fixed-size arena instead of dynamic allocation. All `memoryAllocate`
calls hit the pool; `memoryFree` is a no-op or a stack pop.
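The sketch below shows the "stack pop" variant as a bump allocator with mark/release. The pool size, alignment, and function names are hypothetical; a real SNES build would size the arena from its WRAM budget.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMORY_POOL_SIZE 4096   /* hypothetical arena size */
#define POOL_ALIGN 4            /* enough for 32-bit fields on SH-2 */

static _Alignas(POOL_ALIGN) uint8_t MEMORY_POOL[MEMORY_POOL_SIZE];
static size_t MEMORY_TOP = 0;

/* Bump allocation: round the top up to POOL_ALIGN, return NULL when full. */
void *poolAllocate(size_t size) {
  size_t start = (MEMORY_TOP + (POOL_ALIGN - 1)) & ~(size_t)(POOL_ALIGN - 1);
  if(start > MEMORY_POOL_SIZE || size > MEMORY_POOL_SIZE - start) return NULL;
  MEMORY_TOP = start + size;
  return &MEMORY_POOL[start];
}

/* Remember the current top (e.g. at scene load)... */
size_t poolMark(void) { return MEMORY_TOP; }

/* ...and pop everything allocated since, in one step (e.g. at scene unload). */
void poolRelease(size_t mark) {
  assert(mark <= MEMORY_TOP);
  MEMORY_TOP = mark;
}
```

Individual frees become no-ops. Lifetime is managed in whole scopes (per scene or per frame), which fits how a cartridge game typically loads and discards data.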
### 8. Script runtime: optional
JerryScript requires too much RAM for SNES and is marginal on Saturn.
The scripting system should be compile-time optional (`DUSK_SCRIPTING`),
not assumed present. SNES/Saturn scenes would be compiled C.
---
## What to Keep
- Platform macro abstraction pattern (`displayplatform.h`, etc.) -- works,
no reason to change.
- Directory structure convention for platform directories.
- Entity-component system -- platform-agnostic, unaffected.
- Asset loading + `.dsk` bundle concept (extended for platform formats).
- The broad subsystem layout: asset, input, display, log, network, save,
system, time.
---
## Open Questions
1. **Render intent granularity**: How much does the intent API need to
express? A MESH intent works on GL/N64 but degrades poorly on Saturn
(must split into quads) and is impossible on SNES. Should MESH be a
valid intent with a "best effort" contract, or excluded from the portable
API entirely?
2. **Threading abstraction depth**: Should `DUSK_THREAD_SLAVE_SH2` be a
first-class concept in the engine's job system, or a Saturn-internal
implementation detail the core never sees? Same question applies to N64's
RSP as a compute co-processor.
3. **Asset loading async contract**: When a platform has no threads, should
`assetLoadAsync` be a no-op alias for `assetLoadSync`, or return
immediately with a completion flag to poll? The polling model is more
honest but requires all call sites to handle it.
4. **N64 build system**: libdragon uses GNU Make, not CMake. Options are:
(a) write a CMake toolchain file that wraps n64.mk, (b) maintain a
parallel Makefile just for N64, or (c) wait for upstream CMake support.
Which is acceptable long-term?
5. **N64 RSP microcode**: Standard microcodes (Fast3D/F3DEX2) or
Tiny3D (community microcode with full T&L + skinning)? Writing custom
microcode is powerful but limited to ~1000 MIPS SIMD instructions.
This decision gates what 3D features the N64 port can support.
6. **PSPGL fate**: Drop immediately in favor of native GU, or keep as a
fallback (`duskgllegacy`) while native GU is built? The two can coexist
during transition.
7. **Vulkan priority**: Design the intent API with Vulkan in mind from the
start, or add it later? Vulkan's explicit pipeline state model may
conflict with how stateful platforms (Saturn, SNES) expect things to work.
8. **Background planes on modern platforms**: Does `bgplane_t` degrade to a
fullscreen textured quad on GL/Vulkan/N64, or should modern platforms
support actual background scene rendering (3D world behind the foreground)?
9. **PS1 ordering table depth**: The OT is a fixed-size array (e.g. 4096
slots). Depth precision = number of slots. How deep should the engine's
default OT be, and should this be configurable per-scene?
10. **Fixed-point strategy**: Does `float_t` transparently become `fixed_t`
on FPU-less platforms (Saturn, PS1, SNES), or do we require explicit
`fixed_t` in math-heavy paths? Transparent is easiest to port; explicit
is faster.
11. **SNES V-blank budget**: All VRAM writes must finish within ~1.2ms.
Does the engine need a V-blank work queue with a budget checker, or is
this left to the game to manage manually?
12. **SNES scripting**: JerryScript is out. Pure compiled C, or a lighter
scripting layer (Lua is ~100 KB -- tight but possible)?
13. **Asset compiler**: New standalone tool, or an extension of the existing
asset pipeline? Part of the CMake build or a separate pre-build step?
---
## Proposed Sequence (Draft)
### Phase 1 -- Intent API (no behavior change)
1. Design and stabilize `renderqueue_t` and intent types
2. Refactor modern GL path to submit through render intents (same output,
new plumbing)
3. Refactor Dolphin path the same way
4. Validate no regressions on Linux + GameCube
### Phase 2 -- UI system
5. Extract UI rendering from the 3D path into `src/dusk/ui/`
6. Implement UI flush for GL and Dolphin
7. Wire existing UI elements through the new system
### Phase 3 -- Platform splits
8. Split `duskgl/` into `duskgl/` (modern) and `duskgllegacy/` (fixed-func)
9. Port PSP to native GU (`duskpsp/display/` rewrite, drop PSPGL dependency)
10. Stub `duskvulkan/` structure for future implementation
### Phase 4 -- Asset pipeline
11. Design platform-native texture format system
12. Extend asset compiler for per-platform output
13. Update texture loader to expect pre-converted data
### Phase 5 -- Saturn
14. CMake toolchain for SH-2 cross-compile (yaul / libyaul toolchain)
15. `src/dusksaturn/` -- input (SMPC), asset (CD-ROM), log, system
16. VDP1 backend for render queue (quads, polygons, painter's sort)
17. VDP2 backend for bgplane_t (tile maps, scroll, palette)
18. Fixed-point math mode (`DUSK_MATH_FIXED`)
19. UI backend (VDP2 plane(s))
### Phase 6 -- PlayStation 1
20. CMake toolchain wrapping PSn00bSDK (already CMake-native)
21. `src/duskps1/` -- input (BIOS pad), asset (CD-ROM libpsxcd), log, system
22. GTE integration for fixed-point math (reuse `DUSK_MATH_FIXED` path)
23. Ordering table builder for render queue (painter's sort, DMA linked-list)
24. GPU packet backend for intents (tris, quads, rects)
25. UI backend (separate GPU packet chain after world OT)
### Phase 7 -- Nintendo 64
26. CMake toolchain wrapping libdragon (n64.mk wrapper or toolchain file)
27. `src/duskn64/` -- input (N64 controller via PIF), asset (PI DMA /
DragonFS), log, system
28. RSP display list builder for render queue (Z-buffer path, no sorting)
29. TMEM tile management for textures
30. RDP rectangle backend for UI
31. Decide on RSP microcode (Tiny3D vs standard F3DEX2)
### Phase 8 -- SNES
32. SNES toolchain (cc65 or llvm-mos 65816 target)
33. Static memory pool mode (`DUSK_MEMORY_STATIC`)
34. PPU tile pipeline + VRAM management
35. Mode7 overworld implementation
36. OAM sprite system
37. BG layer UI
38. Scripting-optional build (`DUSK_SCRIPTING` off)