# Display Layer Refactor

## Vision

The goal is to remove the implicit assumption that all platforms render
through a GL-like API, and replace it with a system where each platform
owns its rendering stack completely. The scene describes *what* to draw
in platform-neutral terms; the platform decides *how* to draw it.

This unlocks:
- Saturn (VDP1/VDP2 command-list, no Z-buffer, affine-only)
- PlayStation 1 (ordering table, affine textures, GTE fixed-point, CMake SDK)
- Nintendo 64 (RSP display list, hardware Z-buffer, perspective-correct,
  real FPU -- closer to modern GL than to Saturn)
- SNES (PPU tile engine, Mode7 for overworld, no real 3D)
- Vulkan (explicit, modern, no legacy GL baggage)
- Native PSP GU (drop PSPGL which is just a compatibility shim)
- Legacy fixed-function GL as its own standalone target
- A real first-class 2D UI system not bolted onto 3D space

---

## Why

### The current abstraction assumes GPU-style rendering

The current display layer was designed around a GL-like mental model:
vertex buffers, shaders, Z-buffered triangle rasterization, and texture
objects. `duskgl` implements this with real OpenGL. `duskdolphin` does its
own GX thing but still matches the same interface (mesh, shader, texture,
framebuffer). PSP uses PSPGL -- a library that *emulates* GL on top of
the PSP's native GE/GU hardware, which is entirely different underneath.

Problems this creates:

**PSPGL is a lie.** The PSP has a native graphics engine (GE/GU) with its
own command list, its own vertex formats, and its own display list model.
PSPGL translates GL calls into GU calls, but imperfectly -- and we end up
paying the abstraction cost without getting GL correctness. Writing directly
to GU gives better performance, access to native formats, and correct
behavior on edge cases that PSPGL gets wrong.

**Legacy GL should not share code with modern GL.** The fixed-function
pipeline (no shaders, matrix stacks via glMatrixMode, glTexEnv) is
meaningfully different from modern GL (VAO/VBO, GLSL, explicit uniform
locations). Treating them as "the same thing with a flag" creates a tangle
of `#ifdef DUSK_OPENGL_LEGACY` guards throughout the rendering code.
They are separate targets and should be separate platform directories.

**Saturn cannot fit the model at all.** VDP1 is a command-list processor:
you write 32-byte command structs (sprites, quads, lines) into VRAM, then
poke a register to trigger execution. There are no vertex buffers, no
shaders, no Z-buffer. Depth is pure painter's algorithm -- command order
IS the depth. VDP2 composites up to 6 background planes at scanline time;
these are tile maps and rotation parameter tables, not meshes. Nothing
about the current API maps onto this hardware.

**SNES is even further removed.** The PPU renders tiles. VRAM holds 8x8
or 16x16 pixel tiles and tile maps; the PPU references these during
scanline rendering. There are no draw calls. Mode7 is an affine transform
applied to a single background layer (the basis for the overworld map and
road perspective effects). Sprites are entries in OAM (Object Attribute
Memory). The 65816 CPU writes to memory-mapped registers and VRAM; the
PPU does the rest. The concept of "mesh" or "shader" is meaningless here.

**Textures loaded as RGBA waste memory and exclude platforms.** Loading
every texture as 32-bit RGBA and converting at runtime is expensive on
memory-constrained platforms (Saturn has about 2 MB of main work RAM; SNES
has 64 KB VRAM) and simply wrong for platforms that have native formats
incompatible with RGBA (e.g., PSP's ABGR8888 / BGR5650, Saturn's RGB555 /
CI4 / CI8, SNES's 2bpp/4bpp/8bpp indexed). The asset pipeline must compile
textures to platform-native formats at build time.

**UI in 3D space is wasteful and limiting.** Currently UI elements are
rendered as geometry projected into screen space, going through the full
3D pipeline. On platforms with dedicated 2D hardware (Saturn VDP2,
SNES BG layers), this is actively wrong -- UI should map to a hardware
plane, not a 3D draw call. On modern platforms it should be a clean
screen-space pass that never touches the 3D depth buffer.

---

## Current Model (Summary)

```
Scene
  -> shaderBind(shader)
  -> textureBind(texture)
  -> meshDraw(mesh)    <-- immediate draw call per object
  -> meshDraw(mesh)
  -> ...
Platform receives each draw call immediately.
Depth is handled by Z-buffer hardware.
All textures live in GPU memory as RGBA (or Dolphin's tiled RGBA).
UI is rendered as 3D geometry with an orthographic projection.
```

Key current concepts:
- `mesh_t` -- vertex array (triangles/quads), in GPU VBO (GL) or CPU
  memory (Dolphin)
- `shader_t` -- GLSL program (modern GL), GL fixed-function state
  (legacy GL), or GX matrix + TEV config (Dolphin)
- `texture_t` -- GPU texture handle (GL) or tiled CPU buffer (Dolphin);
  always RGBA at the engine level
- `framebuffer_t` -- FBO (GL) or fixed hardware XFB (Dolphin)
- `spritebatch_t` -- accumulates 2D quads and flushes in batches of 32;
  the only existing deferred-submission system in the engine

The spritebatch hints at the right model. Everything needs to work this way.

---

## The Core Shift: Platform-Native Rendering

### Before

```
src/dusk/          Core engine + GL-like rendering API definition
src/duskgl/        OpenGL implementation
src/dusksdl2/      SDL2 window/input (shared)
src/duskpsp/       PSP via PSPGL (shim over GU)
src/duskvita/      Vita via GL ES (similar path to duskgl)
src/duskdolphin/   GameCube/Wii via GX (already custom)
src/dusklinux/     Linux (uses dusksdl2 + duskgl)
```

### After

```
src/dusk/          Core engine logic + render intent API ONLY
src/dusksdl2/      SDL2 window/input (unchanged)
src/duskgl/        Modern OpenGL (Linux, Vita modern path)
src/duskgllegacy/  Fixed-function OpenGL (older hardware, PSP with PSPGL
                   as a last resort)
src/duskvulkan/    Vulkan (Linux modern, future)
src/duskpsp/       PSP native GU (no PSPGL, direct command lists)
src/duskvita/      Vita native GXM (TBD)
src/duskdolphin/   GameCube/Wii GX (already custom, mostly kept)
src/dusksaturn/    Saturn VDP1/VDP2 (new)
src/duskps1/       PlayStation 1 ordering table + GTE (new)
src/duskn64/       Nintendo 64 RSP/RDP display list (new)
src/dusksnes/      SNES PPU/Mode7 (new, extremely constrained)
```

`src/dusk/` no longer knows about meshes, shaders, or framebuffers.
It defines the *render intent* system: what the scene wants to draw.
Each platform directory is entirely self-contained and responsible for
translating intents to its native API.

---

## Render Intent System (new)

Instead of the scene calling `meshDraw()` or `shaderBind()`, it submits
render intents into a `renderqueue_t`. An intent describes what should
appear on screen without prescribing how to draw it.

### Primitive intents (3D world)

```
RENDER_INTENT_QUAD     -- textured quad, 4 vertices or transform + size
RENDER_INTENT_POLYGON  -- filled polygon (convex, up to N vertices)
RENDER_INTENT_LINE     -- line segment or polyline
RENDER_INTENT_SPRITE   -- 2D billboard (always faces camera)
RENDER_INTENT_MESH     -- arbitrary vertex array (GL/GX only; degraded
                          on command-list platforms)
```

Each intent carries: texture reference, color/tint, depth hint (for
painter's algorithm sorting), blend mode, and cull flags.

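
As a rough illustration of the fields listed above, here is a minimal sketch of an intent record, a fixed-capacity queue, and a far-to-near sort for platforms without a Z-buffer. The struct layout, the field widths, `RENDERQUEUE_CAPACITY`, and the function names are all assumptions made for this example; only `renderqueue_t` and the `RENDER_INTENT_*` names come from this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: the document names the fields but not their types. */
typedef enum {
  RENDER_INTENT_QUAD,
  RENDER_INTENT_POLYGON,
  RENDER_INTENT_LINE,
  RENDER_INTENT_SPRITE,
  RENDER_INTENT_MESH
} renderintenttype_t;

typedef struct {
  renderintenttype_t type;
  uint16_t texture; /* texture reference (assumed to be an index) */
  uint32_t color;   /* color/tint */
  int32_t depth;    /* depth hint: larger means farther from the camera */
  uint8_t blend;    /* blend mode id */
  uint8_t cull;     /* cull flags */
} renderintent_t;

#define RENDERQUEUE_CAPACITY 256 /* arbitrary size for the sketch */

typedef struct {
  renderintent_t intents[RENDERQUEUE_CAPACITY];
  size_t count;
} renderqueue_t;

/* Appends an intent. Returns 1 on success, 0 if the queue is full. */
static int renderQueuePush(renderqueue_t *queue, renderintent_t intent) {
  assert(queue != NULL);
  if (queue->count >= RENDERQUEUE_CAPACITY) return 0;
  queue->intents[queue->count++] = intent;
  return 1;
}

/* qsort comparator: farther intents (larger depth) sort first. */
static int renderIntentCompareFarFirst(const void *a, const void *b) {
  const renderintent_t *ia = a;
  const renderintent_t *ib = b;
  return (ia->depth < ib->depth) - (ia->depth > ib->depth);
}

/* Painter's ordering: draw far intents first so near ones overwrite them. */
static void renderQueueSortPainter(renderqueue_t *queue) {
  assert(queue != NULL);
  qsort(queue->intents, queue->count, sizeof(renderintent_t),
        renderIntentCompareFarFirst);
}
```

A Z-buffered backend could skip `renderQueueSortPainter` entirely and flush in submission order. Note that `qsort` is not guaranteed to be stable, so intents with equal depth hints may be reordered.
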
### Background plane intents (2D layers)

```
RENDER_INTENT_BGPLANE  -- configure a background/tilemap layer
```

Carries: layer index, tile map data reference, scroll offset, palette,
and transform (for Mode7-style affine).

### UI intents (screen space)

```
RENDER_INTENT_UI_RECT    -- solid colored rectangle
RENDER_INTENT_UI_SPRITE  -- textured rectangle (UI image)
RENDER_INTENT_UI_TEXT    -- text string at screen position
```

UI intents are always screen-space. They are never mixed into the 3D
world queue. See UI System section below.

### Platform translation

| Intent | Modern GL | PSP GU | Saturn VDP1 | PS1 OT | N64 RSP | SNES PPU |
|---|---|---|---|---|---|---|
| QUAD | VAO + glDraw | GU display list | distorted-sprite cmd | GPU quad packet | RSP display list | OAM + BG tile |
| POLYGON | VAO + glDraw | GU display list | polygon cmd | GPU poly packet | RSP display list | OAM |
| BGPLANE | fullscreen quad | fullscreen quad | VDP2 config | fullscreen quad | fullscreen quad | BG layer config |
| UI_SPRITE | 2D ortho quad | 2D GU quad | VDP2 BG plane | GPU rect packet | RDP rectangle | BG layer tile |
| MESH | VAO/VBO | GU buffers | (degrade: quads) | (degrade: tris/quads) | RSP display list | (not supported) |

Note: N64 supports both triangles and axis-aligned rectangles natively via
RDP. PS1 supports triangles and quads (4-vertex) natively, so neither needs
the dead-vertex trick that Saturn requires: VDP1 draws only quads, so a
triangle has to be submitted as a quad with two coincident vertices.

---

## Asset Pipeline: Platform-Native Formats

### The problem

All textures currently enter the engine as RGBA and are converted at
runtime by each platform (Dolphin retiles to 4x4 blocks; GL uploads as-is).
This wastes memory and CPU time, and is impossible for platforms where RGBA
is not a valid intermediate format at all.

### The solution

The asset compiler (offline, run at build time) produces platform-specific
binary bundles. A texture asset has one source (PNG or similar) but N
compiled outputs, one per target.

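
To make the offline conversion step concrete, the sketch below packs RGBA8 pixels into 15-bit color with red in the low bits, which is the bit layout SNES CGRAM uses. The function names are hypothetical, and a real compiler step would also need to handle each target's top bit, byte order, and palette quantization, none of which is shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs 8-bit channels into 15-bit color laid out as 0BBBBBGG GGGRRRRR.
 * The low three bits of each channel are simply truncated. */
static uint16_t texturePackBGR555(uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((b >> 3) << 10) | ((g >> 3) << 5) | (r >> 3));
}

/* Converts `count` RGBA8 pixels (4 bytes each, R first) to 15-bit colors.
 * Alpha is dropped; this sketch does no dithering or palette building. */
static void textureConvertRGBA8ToBGR555(const uint8_t *rgba, uint16_t *out,
                                        size_t count) {
  assert(rgba != NULL && out != NULL);
  for (size_t i = 0; i < count; i++) {
    out[i] = texturePackBGR555(rgba[i * 4 + 0], rgba[i * 4 + 1],
                               rgba[i * 4 + 2]);
  }
}
```

Because the output is a host-side `uint16_t`, a tool built on this would still have to write the bytes in the target's endianness when emitting the bundle.
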
### Texture formats by platform

| Platform | Native Formats | Notes |
|---|---|---|
| Modern GL | RGBA8, RGB8, BC1-BC7 (compressed) | Upload directly, GPU handles |
| Legacy GL | RGBA8, RGB8, CI8 (palette via extension) | No compressed formats |
| Vulkan | VkFormat variants (RGBA8, BC, ASTC) | Chosen at compile time |
| PSP GU | ABGR8888, BGR5650, ABGR1555, ABGR4444, CI4, CI8 | Native swizzled format |
| Saturn VDP1/VDP2 | RGB555, CI4, CI8 (15-bit palette in CRAM) | Big-endian, packed |
| PlayStation 1 | RGB555 / CI4 / CI8 (CLUT in VRAM) | Little-endian; VRAM flat; CLUT at coord |
| Nintendo 64 | RGBA16, RGBA32, IA4-IA16, I4-I8, CI4, CI8 | 4 KB TMEM; tiles must fit in TMEM banks |
| GameCube/Wii GX | I4, I8, IA4, IA8, RGB565, RGB5A3, RGBA8, CMPR | 4x4 tiled, big-endian |
| SNES PPU | 2bpp, 4bpp, 8bpp indexed (CGRAM palette) | Tile-packed, no direct access |

### Asset bundle structure

The `.dsk` bundle gains a platform tag. The loader picks the right section
at runtime (or the build produces a single-platform bundle for constrained
targets like SNES/Saturn where there is no spare storage for unused data).

---

## UI System (first-class)

### Current problem

UI elements go through the 3D pipeline: they are meshes with an orthographic
shader, rendered in the same pass as the world. This means:
- UI competes for Z-buffer depth with world geometry
- On Saturn/SNES, UI cannot use dedicated hardware planes
- Text rendering is tied to the sprite batch which is tied to the 3D pass
- No separation between "draw the world" and "draw the HUD"

### New model

UI is a completely separate rendering context. The world renders first,
then the UI renders on top. They share no state.

UI coordinates are always in screen space (pixels or a logical resolution
that the platform scales to its native display size). No camera matrix,
no projection, no depth buffer involvement.

### Platform mapping

| Platform | UI implementation |
|---|---|
| Modern GL | Separate 2D ortho pass, screen-space quads, no depth test |
| Legacy GL | Same, using fixed-function |
| PSP GU | Separate GU display list, 2D mode |
| Saturn | VDP2 background plane(s) dedicated to UI |
| PlayStation 1 | Separate GPU packet chain, no Z; ordered after world OT |
| Nintendo 64 | RDP rectangle commands in a separate display list segment |
| GameCube/Wii | GX 2D mode or dedicated GX pass |
| SNES | Dedicated BG layer(s) for HUD tiles |

On Saturn, the UI occupying VDP2 planes is a genuine hardware win -- VDP2
composites it for free at scanline time, costing zero VDP1 commands.
On SNES, the HUD must live in a BG layer because there is no alternative.

### UI API (proposed)

```c
uiBegin();
uiDrawRect(x, y, w, h, color);
uiDrawSprite(x, y, w, h, texture, uvMin, uvMax);
uiDrawText(x, y, font, string);
uiEnd(); // platform flushes UI to hardware
```

The `uiBegin`/`uiEnd` block collects intents; the platform submits them
at frame end in whatever way is appropriate.

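
One way the collect-then-flush behaviour described above might look is sketched below. Only the `uiBegin`, `uiDrawRect`, `uiDrawText`, and `uiEnd` names come from the proposal; the `uiintent_t` layout, the buffer size, the parameter types, and the `uiPlatformFlush` hook are assumptions for illustration, and `uiDrawSprite` is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { UI_INTENT_RECT, UI_INTENT_SPRITE, UI_INTENT_TEXT } uiintenttype_t;

/* Hypothetical intent record; all coordinates are screen-space pixels. */
typedef struct {
  uiintenttype_t type;
  int16_t x, y, w, h;
  uint32_t color;
  const void *font; /* opaque font handle, only used by text intents */
  const char *text; /* only used by text intents */
} uiintent_t;

#define UI_INTENT_MAX 128 /* arbitrary size for the sketch */

static uiintent_t uiIntents[UI_INTENT_MAX];
static size_t uiIntentCount = 0;
static int uiRecording = 0;

/* Assumed platform hook: receives the collected list when uiEnd runs. */
typedef void (*uiflushfn_t)(const uiintent_t *intents, size_t count);
static uiflushfn_t uiPlatformFlush = NULL;

void uiBegin(void) {
  assert(!uiRecording);
  uiIntentCount = 0;
  uiRecording = 1;
}

void uiDrawRect(int16_t x, int16_t y, int16_t w, int16_t h, uint32_t color) {
  assert(uiRecording);
  if (uiIntentCount >= UI_INTENT_MAX) return; /* drop on overflow */
  uiIntents[uiIntentCount++] = (uiintent_t){
    .type = UI_INTENT_RECT, .x = x, .y = y, .w = w, .h = h, .color = color
  };
}

void uiDrawText(int16_t x, int16_t y, const void *font, const char *string) {
  assert(uiRecording);
  if (uiIntentCount >= UI_INTENT_MAX) return; /* drop on overflow */
  uiIntents[uiIntentCount++] = (uiintent_t){
    .type = UI_INTENT_TEXT, .x = x, .y = y, .font = font, .text = string
  };
}

void uiEnd(void) {
  assert(uiRecording);
  uiRecording = 0;
  /* The platform decides how to submit: GL quads, VDP2 tiles, BG layer... */
  if (uiPlatformFlush != NULL) uiPlatformFlush(uiIntents, uiIntentCount);
}
```
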
---

## SNES / Mode7

SNES is the most constrained platform the engine will ever support and
needs its own section because it breaks assumptions that even Saturn keeps.

### Hardware

- **CPU**: 65816 @ ~3.58 MHz (16-bit, no FPU, no cache)
- **PPU**: Tile-based scanline renderer. VRAM holds tile graphics and
  tile maps. BG layers reference tiles by index.
- **Mode7**: A single BG layer with a 2D affine matrix applied to it.
  Used for overworld maps, road perspective (F-Zero), rotation
  effects. The matrix can be rewritten via HDMA (scanline DMA) for
  per-scanline variation, enabling horizon-perspective effects.
- **Sprites/OAM**: Up to 128 sprites (8x8, 16x16, 32x32, 64x64 pixels),
  4bpp indexed, up to 32 per scanline.
- **Palette**: CGRAM holds 256 entries of 15-bit RGB (512 bytes total).
  BG layers use sub-palettes of 4/16/256 colors depending on bit depth.
- **VRAM**: 64 KB (tiles + tile maps)
- **WRAM**: 128 KB work RAM + usually 8 KB SRAM on cart for saves
- **No frame buffer.** The PPU renders scanlines directly. You cannot
  read back what was drawn.
- **No general-purpose draw calls.** You configure registers and VRAM
  before the frame and the PPU does the rest.

### What "3D" means on SNES

True 3D is not possible. What can be approximated:
- **Overworld map**: Mode7 with a flat texture and HDMA scroll gives a
  top-down perspective with a horizon line (the classic JRPG overworld).
- **Depth illusion**: Mode7 matrix manipulation can simulate a moving
  camera over flat terrain. Objects are sprites placed at screen positions
  calculated by software perspective projection.
- **Sprite scaling**: Software-scaled sprites using pre-rendered frames,
  or the co-processor rendering used in Super FX games (Star Fox). Super FX
  is a co-processor on the cartridge -- base SNES cannot do this.
- **Basic 3D effects**: Some games use HDMA color gradient + Mode7 floor
  with overlaid sprites to create a pseudo-3D look.

The engine plan for SNES: Mode7 overworld (confirmed), sprite-based world
objects, BG layer UI. "Basic 3D effects" (pseudo-perspective with sprites)
is aspirational -- implementation complexity TBD.

### SNES constraints on the engine

- **No dynamic allocation.** With 128 KB WRAM, a general-purpose allocator
  is risky. The engine memory system may need a static pool mode for SNES.
- **No floating point.** `float_t` must resolve to integer or fixed-point.
- **No scripting (JerryScript).** The JS engine requires far more than
  128 KB RAM. SNES scenes must be compiled C.
- **Asset data in ROM, not a .dsk bundle.** SNES loads from cartridge ROM
  mapped into the address space. The asset system needs a ROM-mapped loader.
- **Tile pipeline.** Textures must be pre-converted to SNES tile format
  (2bpp/4bpp/8bpp, 8x8 pixel tiles, CGRAM palette) at build time. This
  is a completely different asset output from every other platform.

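
The tile pipeline bullet above can be illustrated with a small build-time packer. This sketch converts one 8x8 tile of palette indices into the 32-byte planar 4bpp layout (bitplanes 0 and 1 interleaved per row in the first 16 bytes, bitplanes 2 and 3 in the second 16). The function name and the one-byte-per-pixel input convention are assumptions for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Packs one 8x8 tile into SNES 4bpp planar format.
 * `pixels` holds 64 palette indices (0..15), row-major, one per byte.
 * `out` receives 32 bytes. The leftmost pixel of a row maps to bit 7. */
static void snesPackTile4bpp(const uint8_t pixels[64], uint8_t out[32]) {
  assert(pixels != 0 && out != 0);
  for (int y = 0; y < 8; y++) {
    uint8_t plane0 = 0, plane1 = 0, plane2 = 0, plane3 = 0;
    for (int x = 0; x < 8; x++) {
      uint8_t index = pixels[y * 8 + x] & 0x0F;
      int bit = 7 - x;
      plane0 |= (uint8_t)(((index >> 0) & 1) << bit);
      plane1 |= (uint8_t)(((index >> 1) & 1) << bit);
      plane2 |= (uint8_t)(((index >> 2) & 1) << bit);
      plane3 |= (uint8_t)(((index >> 3) & 1) << bit);
    }
    out[y * 2 + 0] = plane0;      /* rows of planes 0/1: bytes 0..15 */
    out[y * 2 + 1] = plane1;
    out[16 + y * 2 + 0] = plane2; /* rows of planes 2/3: bytes 16..31 */
    out[16 + y * 2 + 1] = plane3;
  }
}
```
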
---

## Platform Inventory

A summary of what each platform's native rendering looks like after the
refactor, for reference when designing the intent API.

### Modern OpenGL (duskgl)

VAO + VBO mesh storage, GLSL shaders, FBO render targets, Z-buffer.
No fixed-function. Targets: Linux, possibly Vita (GXM is preferred).

### Legacy OpenGL (duskgllegacy)

Fixed-function pipeline: `glMatrixMode`, `glTexEnv`, client-side vertex
arrays. No VAO/VBO. Targets: very old desktop hardware and embedded Linux,
plus PSP as a last resort (PSPGL exposes this style of GL).

### Vulkan (duskvulkan)

Explicit pipeline state objects, render passes, descriptor sets, command
buffers. Highest ceiling for performance and control. Targets: Linux
(modern), future platforms. Not immediate priority but the architecture
should not block it.

### PSP native GU (duskpsp)

The GE/GU is a display-list GPU. You build a command list in memory and
the GU DMA engine processes it asynchronously. Native vertex formats are
PSP-specific (ABGR byte order, swizzled textures for cache efficiency).
No PSPGL. Targets: PSP hardware and emulators.

### Vita (duskvita)

GXM is Sony's Vita GPU API -- closer to modern GL than GU, with explicit
shader binaries (.gxp), ring buffers, and GPU sync primitives.

### GameCube/Wii GX (duskdolphin)

Already a custom renderer. GX uses immediate-mode vertex submission
(`GX_Begin` / `GX_Position1x16` loops), TEV for texture compositing, and
hardware XFB double-buffering. Big-endian. Mostly kept as-is; may benefit
from being expressed in terms of render intents for consistency.

### Saturn VDP1/VDP2 (dusksaturn)

VDP1: command-list (32-byte structs), quad-based, affine texture mapping,
no Z-buffer (painter's algorithm). VDP2: up to 6 background planes
composited at scanline time. Big-endian dual SH-2, no FPU. Fixed-point
math required throughout.

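
Because VDP1 is quad-based, a backend that receives triangles has to express each one as a quad with a repeated vertex. The sketch below shows only that geometric step. `quadcmd_t` is a stand-in for illustration and not the real 32-byte VDP1 command layout, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t x, y; } vec2i16_t;

/* Stand-in for a quad command: four screen-space corners only. */
typedef struct { vec2i16_t v[4]; } quadcmd_t;

/* Builds a quad from a triangle by repeating the last vertex, so the
 * fourth corner collapses onto the third. */
static quadcmd_t quadFromTriangle(vec2i16_t a, vec2i16_t b, vec2i16_t c) {
  quadcmd_t quad;
  quad.v[0] = a;
  quad.v[1] = b;
  quad.v[2] = c;
  quad.v[3] = c;
  return quad;
}

/* Converts `triCount` triangles (3 vertices each) into quads.
 * Returns the number of quads written, limited by `outCapacity`. */
static size_t quadsFromTriangles(const vec2i16_t *verts, size_t triCount,
                                 quadcmd_t *out, size_t outCapacity) {
  assert(verts != NULL && out != NULL);
  size_t count = triCount < outCapacity ? triCount : outCapacity;
  for (size_t i = 0; i < count; i++) {
    out[i] = quadFromTriangle(verts[i * 3 + 0], verts[i * 3 + 1],
                              verts[i * 3 + 2]);
  }
  return count;
}
```

This doubles as a picture of what "degrade: quads" could mean for a `RENDER_INTENT_MESH` on Saturn, though texture coordinates would need separate handling that this sketch ignores.
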
### PlayStation 1 (duskps1)

MIPS R3000A @ 33.87 MHz, little-endian, no FPU. GTE (coprocessor 2)
handles fixed-point matrix math, perspective divide, and lighting.
GPU receives packets via DMA linked-list (the Ordering Table). Primitives:
triangles and quads natively (no dead-vertex needed). Texture mapping:
affine, same limitation as Saturn. No Z-buffer; depth is OT slot order.
VRAM is 1 MB flat (frame buffers + textures + CLUTs share it). SDK:
PSn00bSDK, which is CMake-native -- a direct fit for the dusk build system.

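
Since "depth is OT slot order" is the key idea here, a plain-C model of an ordering table may help. This is a software sketch only: it does not use the PSn00bSDK API or the real GPU packet format, `OT_LENGTH` borrows the 4096 figure used as an example elsewhere in this document, and every name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OT_LENGTH 4096 /* example size, not a decided value */

typedef struct otprim_s {
  struct otprim_s *next;
  uint32_t id; /* stand-in for the GPU packet payload */
} otprim_t;

typedef struct { otprim_t *slots[OT_LENGTH]; } ordertable_t;

static void otClear(ordertable_t *ot) {
  assert(ot != NULL);
  for (size_t i = 0; i < OT_LENGTH; i++) ot->slots[i] = NULL;
}

/* Maps a depth in [0, farPlane) to a slot; out-of-range depths clamp.
 * Depth precision is therefore farPlane / OT_LENGTH per slot. */
static size_t otSlotForDepth(int32_t depth, int32_t farPlane) {
  assert(farPlane > 0);
  if (depth < 0) depth = 0;
  if (depth >= farPlane) depth = farPlane - 1;
  return (size_t)(((int64_t)depth * OT_LENGTH) / farPlane);
}

/* Links a primitive at the head of its depth slot. */
static void otInsert(ordertable_t *ot, otprim_t *prim, int32_t depth,
                     int32_t farPlane) {
  assert(ot != NULL && prim != NULL);
  size_t slot = otSlotForDepth(depth, farPlane);
  prim->next = ot->slots[slot];
  ot->slots[slot] = prim;
}

/* Walks the table from the farthest slot to the nearest, which is the
 * order a painter's-algorithm backend would draw in. */
static size_t otCollectFarToNear(const ordertable_t *ot, uint32_t *outIds,
                                 size_t capacity) {
  assert(ot != NULL && outIds != NULL);
  size_t count = 0;
  for (size_t slot = OT_LENGTH; slot-- > 0;) {
    for (const otprim_t *p = ot->slots[slot]; p != NULL; p = p->next) {
      if (count < capacity) outIds[count++] = p->id;
    }
  }
  return count;
}
```

Primitives that land in the same slot have no defined relative order beyond insertion order, which is why slot count and far-plane range together set the effective depth resolution.
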
### Nintendo 64 (duskn64)

VR4300 @ 93.75 MHz, big-endian, real IEEE 754 FPU. Rendering is split
between the RSP (geometry: programmable MIPS SIMD, runs microcode up to
~1000 instructions in 4 KB IMEM) and the RDP (rasterization: fixed
hardware). RSP produces triangle commands from a CPU-built display list
in RDRAM. RDP features: perspective-correct texture mapping, bilinear
filtering, hardware Z-buffer. Primitives: triangles and axis-aligned rects.
TMEM is 4 KB on-chip texture cache; textures must be loaded into tiles
before drawing -- a significant memory management constraint.
SDK: libdragon (Unlicense, GCC 14, Makefile-based -- not CMake; this
requires a wrapper toolchain file for dusk's build system).

### SNES PPU/Mode7 (dusksnes)

Tile-based. VRAM holds tiles and tile maps. Mode7 provides affine transform
for one BG layer. Sprites via OAM. No frame buffer. All configuration is
memory-mapped registers. 65816 CPU, no FPU, extremely limited RAM.

---

## Threading Model

### Current model

The engine uses OS threads for async asset loading (`assetXxxLoaderAsync`).
Platforms that have pthreads or an equivalent RTOS (Linux, PSP, Vita) run
worker threads that load data in the background while the game loop runs.
The main thread polls or blocks on completion.

### The problem

Several target platforms have no OS threading whatsoever, and others have
hardware-specific async mechanisms that are nothing like pthreads.

### Per-platform reality

| Platform | Threading | Async mechanism |
|---|---|---|
| Linux | pthreads | Worker threads (current) |
| Vita | SceKernelThread | Per-SDK threads |
| PSP | SceKernelThread | Per-SDK threads |
| GameCube/Wii | libogc LWP | Lightweight processes |
| Saturn | None (OS) | Slave SH-2 for fixed jobs; CD-ROM via interrupt/callback |
| PlayStation 1 | None (OS) | V-blank ISR, 7 DMA channels, CD-ROM callbacks |
| Nintendo 64 | libdragon preview only | PI DMA for cartridge; RSP for parallel compute |
| SNES | None | DMA (GPDMA/HDMA); NMI V-blank; SPC700 audio is a separate CPU |

**Saturn slave SH-2**: The second SH-2 is not a general-purpose thread.
It runs a fixed subroutine you hand-load. The typical use is offloading
heavy per-frame computation (geometry transforms, depth sort) while the
master SH-2 handles game logic. Communication is via shared WRAM with
cache-through addresses to avoid coherency bugs. There is no scheduler
and no yield -- it runs to completion.

**SNES DMA**: GPDMA copies blocks of data (ROM to WRAM, WRAM to VRAM)
and halts the CPU for the duration -- it is synchronous from the game's
perspective. HDMA runs per-scanline during H-blank, writing to PPU
registers without CPU involvement; this is how Mode7 perspective is
achieved. Neither is "async" in the programming sense.

**SNES NMI**: The V-blank NMI fires at the start of every V-blank period.
Outside of forced blank, this is the only safe window to write to VRAM and
PPU registers. All critical PPU updates must complete within the V-blank
window (roughly 2.3 ms on NTSC with 224 visible lines).

### Proposed model

Introduce a compile-time threading capability flag:

```
DUSK_THREAD_PTHREAD    -- Linux, maybe Vita
DUSK_THREAD_SCEKERNEL  -- PSP, Vita SDK
DUSK_THREAD_LWP        -- GameCube/Wii libogc
DUSK_THREAD_SLAVE_SH2  -- Saturn slave CPU (job dispatch only)
DUSK_THREAD_NONE       -- SNES (and Saturn master thread view)
```

The asset loader's async path is gated on having a threading capability.
When `DUSK_THREAD_NONE` is defined, `assetXxxLoaderAsync` either does not
exist or is an alias for the synchronous version. On Saturn, the slave SH-2
is exposed as a distinct API (`sh2JobDispatch`, `sh2JobWait`) used only for
compute-heavy work, not for I/O.

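
A completion flag that the main loop polls is one shape the thread-less path could take. The sketch below is an assumption-laden illustration: `assetcompletion_t`, the function names, and the example loader are all hypothetical, and the "async" call simply runs the load inline, as an alias-to-sync build would.

```c
#include <assert.h>
#include <stddef.h>

/* Completion record shared between whoever finishes the load (worker
 * thread, interrupt callback, or the caller itself) and the main loop. */
typedef struct {
  volatile int done; /* 0 while pending, 1 once the load has finished */
  int status;        /* result code, valid only when done is 1 */
} assetcompletion_t;

typedef int (*assetloadfn_t)(void *user);

/* Called by the platform when the transfer finishes. */
static void assetCompletionSignal(assetcompletion_t *completion, int status) {
  assert(completion != NULL);
  completion->status = status;
  completion->done = 1; /* written last so status is set before done */
}

/* Non-blocking check, intended to be called once per frame. */
static int assetCompletionPoll(const assetcompletion_t *completion) {
  assert(completion != NULL);
  return completion->done;
}

/* No-thread fallback: runs the load inline and signals before returning,
 * so callers can use the same polling code path on every platform. */
static void assetLoadAsyncFallback(assetcompletion_t *completion,
                                   assetloadfn_t load, void *user) {
  assert(completion != NULL && load != NULL);
  completion->done = 0;
  assetCompletionSignal(completion, load(user));
}

/* Hypothetical loader used only to exercise the sketch. */
static int assetExampleLoad(void *user) {
  *(int *)user = 42;
  return 0;
}
```

With real threads or interrupt handlers, `volatile` alone would not be enough for correct ordering; a production version would need the platform's barriers or atomics.
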
### Asset loading without threads

**Saturn**: CD-ROM access is initiated via SBL/CDC routines and completes
via interrupt callback. The engine's asset loading loop can poll the
callback flag in the main loop rather than blocking a thread. This is
interrupt-driven cooperative async, not preemptive.

**SNES**: There is no loading. Assets live in ROM, mapped directly into the
65816 address space. "Loading a texture" means computing a pointer into ROM
and copying the tile data to VRAM during V-blank via GPDMA. The asset system
on SNES is essentially a VRAM/CGRAM allocator and a DMA scheduler, not a
file loader.

### Asset system changes

The asset pipeline needs to accommodate three loading models:

1. **File-based** (Linux, PSP, Vita, Saturn CD): open file, read bytes,
   close. Can be sync or thread-async.
2. **DMA/interrupt** (Saturn CD-ROM, GC DVD): initiate transfer, poll or
   callback on completion, no thread blocked.
3. **ROM-mapped** (SNES): data is already in the address space; "loading"
   is a VRAM DMA copy scheduled for V-blank, not file I/O.

The `assetstream_t` abstraction that currently wraps file I/O needs a third
backend for ROM-mapped data, and the async path needs to support
callback-based completion as an alternative to thread-based blocking.

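
A ROM-mapped backend could be as thin as a bounds-checked `memcpy` behind a function pointer. The real `assetstream_t` layout is not shown in this document, so the struct fields, the backend table, and the function names below are all assumptions made to illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct assetstream_s assetstream_t;

/* Hypothetical backend table: one read entry point per loading model. */
typedef struct {
  size_t (*read)(assetstream_t *stream, void *dest, size_t size);
} assetstreambackend_t;

struct assetstream_s {
  const assetstreambackend_t *backend;
  const uint8_t *romBase; /* start of the asset inside mapped ROM */
  size_t romSize;         /* asset length in bytes */
  size_t position;        /* current read offset */
};

/* ROM-mapped read: no file I/O, just a clamped copy out of the mapping. */
static size_t assetStreamRomRead(assetstream_t *stream, void *dest,
                                 size_t size) {
  size_t remaining = stream->romSize - stream->position;
  if (size > remaining) size = remaining;
  memcpy(dest, stream->romBase + stream->position, size);
  stream->position += size;
  return size;
}

static const assetstreambackend_t ASSET_STREAM_BACKEND_ROM = {
  assetStreamRomRead
};

static void assetStreamOpenRom(assetstream_t *stream, const uint8_t *base,
                               size_t size) {
  assert(stream != NULL && base != NULL);
  stream->backend = &ASSET_STREAM_BACKEND_ROM;
  stream->romBase = base;
  stream->romSize = size;
  stream->position = 0;
}

/* Backend-agnostic read used by the rest of the asset system. */
static size_t assetStreamRead(assetstream_t *stream, void *dest, size_t size) {
  assert(stream != NULL && stream->backend != NULL && dest != NULL);
  return stream->backend->read(stream, dest, size);
}
```
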
---

## What Needs to Change

### 1. Render intent API (new, in src/dusk/)

Replace `mesh_t` / `shader_t` / `meshDraw()` as scene-facing APIs with
`renderqueue_t` and intent submission functions. `src/dusk/` defines the
intent types and submission API; platforms implement the flush.

### 2. Platform renderer directories

Move rendering implementations out of `duskgl/` as a shared layer and
into fully self-contained platform directories. `duskgl/` becomes the
*modern GL* platform only. Add `duskgllegacy/`, `duskvulkan/` as peers.

### 3. Asset pipeline: platform-native texture formats

The offline asset compiler must produce per-platform texture bundles in
native formats. The runtime texture loader expects pre-converted data,
not RGBA. `textureformat_t` grows to cover all platform formats but each
platform only ever sees the formats it natively supports.

### 4. UI system (first-class, separate from 3D)

New `src/dusk/ui/` subsystem with `uiBegin` / `uiEnd` and intent types
for rects, sprites, and text. Platforms implement the flush independently.
The 3D spritebatch is retired or scoped to world-space billboards only.

### 5. Fixed-point / no-FPU math

`float_t` needs a fixed-point mode. Proposed: define `fixed_t` as a
16.16 signed integer; define `DUSK_MATH_FIXED` for platforms that require
it (Saturn, PS1, SNES). Engine math utilities (`mathSin`, `mathCos`, etc.)
have fixed-point implementations selected by this flag. `float_t` on
FPU-less platforms becomes a typedef for `fixed_t`.

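
For reference, a 16.16 `fixed_t` with the basic operations might look like the sketch below. Only the `fixed_t` name and the 16.16 format come from the proposal; the helper names are hypothetical. The code uses a 64-bit intermediate and division instead of shifts to stay portable for negative values, which a tuned platform version would likely replace.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_t; /* 16 integer bits, 16 fractional bits */

#define FIXED_SHIFT 16
#define FIXED_ONE ((fixed_t)1 << FIXED_SHIFT)

/* Integer to fixed. Valid for values in roughly [-32768, 32767]. */
static fixed_t fixedFromInt(int32_t value) {
  return (fixed_t)(value * FIXED_ONE);
}

/* Fixed to integer, truncating toward zero. */
static int32_t fixedToInt(fixed_t value) {
  return value / FIXED_ONE;
}

/* Multiply: the raw product has 32 fractional bits, so scale back by 2^16. */
static fixed_t fixedMul(fixed_t a, fixed_t b) {
  return (fixed_t)(((int64_t)a * (int64_t)b) / FIXED_ONE);
}

/* Divide: pre-scale the numerator by 2^16 to keep the fractional bits. */
static fixed_t fixedDiv(fixed_t a, fixed_t b) {
  assert(b != 0);
  return (fixed_t)(((int64_t)a * FIXED_ONE) / b);
}
```

As a usage example, `fixedMul(fixedFromInt(5) / 2, fixedFromInt(4))` multiplies 2.5 by 4 and yields the same bits as `fixedFromInt(10)`. Addition and subtraction need no helpers because the scale factor is shared.
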
### 6. Background plane abstraction (bgplane_t)

New concept in `src/dusk/display/bgplane/`. A BG plane has a tile map or
bitmap source, scroll offsets, a palette reference, and optional affine
parameters (for Mode7-style use). On GL platforms: rendered as a
fullscreen textured quad or shader pass. On Saturn: VDP2 config. On SNES:
PPU BG layer config.

### 7. Memory system: static pool mode

For SNES (and possibly Saturn), the general-purpose allocator may be
unviable. The proposal is a compile-time static pool mode
(`DUSK_MEMORY_STATIC`) that uses a fixed-size arena instead of dynamic
allocation. All `memoryAllocate` calls hit the pool; `memoryFree` is a
no-op or a stack pop.

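
The stack-pop variant of the static pool can be sketched in a few lines. The `memoryAllocate` and `memoryFree` names come from this document, but their signatures, the pool size, and the 4-byte alignment are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMORY_POOL_SIZE 4096 /* arbitrary size for the sketch */
#define MEMORY_ALIGN 4

/* Backed by uint32_t so the pool start is at least 4-byte aligned. */
static uint32_t memoryPoolWords[MEMORY_POOL_SIZE / sizeof(uint32_t)];
static size_t memoryPoolUsed = 0;

/* Bump allocation: returns NULL when the arena cannot fit the request. */
void *memoryAllocate(size_t size) {
  uint8_t *pool = (uint8_t *)memoryPoolWords;
  if (size > MEMORY_POOL_SIZE) return NULL;
  size_t aligned = (size + (MEMORY_ALIGN - 1)) & ~(size_t)(MEMORY_ALIGN - 1);
  if (aligned > MEMORY_POOL_SIZE - memoryPoolUsed) return NULL;
  void *ptr = pool + memoryPoolUsed;
  memoryPoolUsed += aligned;
  return ptr;
}

/* Stack pop: rewinds the arena to `ptr`. Anything allocated after `ptr`
 * is released too, so frees must happen in reverse allocation order. */
void memoryFree(void *ptr) {
  uint8_t *pool = (uint8_t *)memoryPoolWords;
  uint8_t *p = (uint8_t *)ptr;
  if (ptr == NULL) return;
  assert(p >= pool && p < pool + memoryPoolUsed);
  memoryPoolUsed = (size_t)(p - pool);
}
```

The no-op alternative mentioned above would drop the body of `memoryFree` and reset `memoryPoolUsed` at well-defined points such as scene changes.
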
### 8. Script runtime: optional

JerryScript requires too much RAM for SNES and is marginal on Saturn.
The scripting system should be compile-time optional (`DUSK_SCRIPTING`),
not assumed present. SNES/Saturn scenes would be compiled C.

---

## What to Keep

- Platform macro abstraction pattern (`displayplatform.h`, etc.) -- works,
  no reason to change.
- Directory structure convention for platform directories.
- Entity-component system -- platform-agnostic, unaffected.
- Asset loading + `.dsk` bundle concept (extended for platform formats).
- The broad subsystem layout: asset, input, display, log, network, save,
  system, time.

---

## Open Questions

1. **Render intent granularity**: How much does the intent API need to
   express? A MESH intent works on GL/N64 but degrades poorly on Saturn
   (must split into quads) and is impossible on SNES. Should MESH be a
   valid intent with a "best effort" contract, or excluded from the portable
   API entirely?

2. **Threading abstraction depth**: Should `DUSK_THREAD_SLAVE_SH2` be a
   first-class concept in the engine's job system, or a Saturn-internal
   implementation detail the core never sees? Same question applies to N64's
   RSP as a compute co-processor.

3. **Asset loading async contract**: When a platform has no threads, should
   `assetLoadAsync` be a no-op alias for `assetLoadSync`, or return
   immediately with a completion flag to poll? The polling model is more
   honest but requires all call sites to handle it.

4. **N64 build system**: libdragon uses GNU Make, not CMake. Options are:
   (a) write a CMake toolchain file that wraps n64.mk, (b) maintain a
   parallel Makefile just for N64, or (c) wait for upstream CMake support.
   Which is acceptable long-term?

5. **N64 RSP microcode**: Standard libdragon microcodes (Fast3D/F3DEX2) or
   Tiny3D (community microcode with full T&L + skinning)? Writing custom
   microcode is powerful but limited to ~1000 MIPS SIMD instructions.
   This decision gates what 3D features the N64 port can support.

6. **PSPGL fate**: Drop immediately in favor of native GU, or keep as a
   fallback (`duskgllegacy`) while native GU is built? The two can coexist
   during transition.

7. **Vulkan priority**: Design the intent API with Vulkan in mind from the
   start, or add it later? Vulkan's explicit pipeline state model may
   conflict with how stateful platforms (Saturn, SNES) expect things to work.

8. **Background planes on modern platforms**: Does `bgplane_t` degrade to a
   fullscreen textured quad on GL/Vulkan/N64, or should modern platforms
   support actual background scene rendering (3D world behind the foreground)?

9. **PS1 ordering table depth**: The OT is a fixed-size array (e.g. 4096
   slots). Depth precision = number of slots. How deep should the engine's
   default OT be, and should this be configurable per-scene?

10. **Fixed-point strategy**: Does `float_t` transparently become `fixed_t`
    on FPU-less platforms (Saturn, PS1, SNES), or do we require explicit
    `fixed_t` in math-heavy paths? Transparent is easiest to port; explicit
    is faster.

11. **SNES V-blank budget**: All VRAM writes must finish within the V-blank
    window (roughly 2.3 ms on NTSC). Does the engine need a V-blank work
    queue with a budget checker, or is this left to the game to manage
    manually?

12. **SNES scripting**: JerryScript is out. Pure compiled C, or a lighter
    scripting layer (Lua is ~100 KB -- tight but possible)?

13. **Asset compiler**: New standalone tool, or an extension of the existing
    asset pipeline? Part of the CMake build or a separate pre-build step?

---

## Proposed Sequence (Draft)

### Phase 1 -- Intent API (no behavior change)
1. Design and stabilize `renderqueue_t` and intent types
2. Refactor modern GL path to submit through render intents (same output,
   new plumbing)
3. Refactor Dolphin path the same way
4. Validate no regressions on Linux + GameCube

### Phase 2 -- UI system
5. Extract UI rendering from the 3D path into `src/dusk/ui/`
6. Implement UI flush for GL and Dolphin
7. Wire existing UI elements through the new system

### Phase 3 -- Platform splits
8. Split `duskgl/` into `duskgl/` (modern) and `duskgllegacy/` (fixed-func)
9. Port PSP to native GU (`duskpsp/display/` rewrite, drop PSPGL dependency)
10. Stub `duskvulkan/` structure for future implementation

### Phase 4 -- Asset pipeline
11. Design platform-native texture format system
12. Extend asset compiler for per-platform output
13. Update texture loader to expect pre-converted data

### Phase 5 -- Saturn
14. CMake toolchain for SH-2 cross-compile (yaul / libyaul toolchain)
15. `src/dusksaturn/` -- input (SMPC), asset (CD-ROM), log, system
16. VDP1 backend for render queue (quads, polygons, painter's sort)
17. VDP2 backend for bgplane_t (tile maps, scroll, palette)
18. Fixed-point math mode (`DUSK_MATH_FIXED`)
19. UI backend (VDP2 plane(s))

### Phase 6 -- PlayStation 1
20. CMake toolchain wrapping PSn00bSDK (already CMake-native)
21. `src/duskps1/` -- input (BIOS pad), asset (CD-ROM libpsxcd), log, system
22. GTE integration for fixed-point math (reuse `DUSK_MATH_FIXED` path)
23. Ordering table builder for render queue (painter's sort, DMA linked-list)
24. GPU packet backend for intents (tris, quads, rects)
25. UI backend (separate GPU packet chain after world OT)

### Phase 7 -- Nintendo 64
26. CMake toolchain wrapping libdragon (n64.mk wrapper or toolchain file)
27. `src/duskn64/` -- input (N64 controller via PIF), asset (PI DMA /
    DragonFS), log, system
28. RSP display list builder for render queue (Z-buffer path, no sorting)
29. TMEM tile management for textures
30. RDP rectangle backend for UI
31. Decide on RSP microcode (Tiny3D vs standard F3DEX2)

### Phase 8 -- SNES
32. SNES toolchain (cc65 or llvm-mos 65816 target)
33. Static memory pool mode (`DUSK_MEMORY_STATIC`)
34. PPU tile pipeline + VRAM management
35. Mode7 overworld implementation
36. OAM sprite system
37. BG layer UI
38. Scripting-optional build (`DUSK_SCRIPTING` off)