# Display Layer Refactor

## Vision

The goal is to remove the implicit assumption that all platforms render
through a GL-like API, and replace it with a system where each platform
owns its rendering stack completely. The scene describes *what* to draw
in platform-neutral terms; the platform decides *how* to draw it.

This unlocks:
- Saturn (VDP1/VDP2 command-list, no Z-buffer, affine-only)
- PlayStation 1 (ordering table, affine textures, GTE fixed-point, CMake SDK)
- Nintendo 64 (RSP display list, hardware Z-buffer, perspective-correct,
  real FPU -- closer to modern GL than to Saturn)
- SNES (PPU tile engine, Mode7 for overworld, no real 3D)
- Vulkan (explicit, modern, no legacy GL baggage)
- Native PSP GU (drop PSPGL which is just a compatibility shim)
- Legacy fixed-function GL as its own standalone target
- A real first-class 2D UI system not bolted onto 3D space

---
## Why

### The current abstraction assumes GPU-style rendering

The current display layer was designed around a GL-like mental model:
vertex buffers, shaders, Z-buffered triangle rasterization, and texture
objects. `duskgl` implements this with real OpenGL. `duskdolphin` does its
own GX thing but still matches the same interface (mesh, shader, texture,
framebuffer). PSP uses PSPGL -- a library that *emulates* GL on top of
the PSP's native GE/GU hardware, which is entirely different underneath.

Problems this creates:

**PSPGL is a lie.** The PSP has a native graphics engine (GE/GU) with its
own command list, its own vertex formats, and its own display list model.
PSPGL translates GL calls into GU calls, but imperfectly -- and we end up
paying the abstraction cost without getting GL correctness. Writing directly
to GU gives better performance, access to native formats, and correct
behavior on edge cases that PSPGL gets wrong.

**Legacy GL should not share code with modern GL.** The fixed-function
pipeline (no shaders, matrix stacks via glMatrixMode, glTexEnv) is
meaningfully different from modern GL (VAO/VBO, GLSL, explicit uniform
locations). Treating them as "the same thing with a flag" creates a tangle
of `#ifdef DUSK_OPENGL_LEGACY` guards throughout the rendering code.
They are separate targets and should be separate platform directories.

**Saturn cannot fit the model at all.** VDP1 is a command-list processor:
you write 32-byte command structs (sprites, quads, lines) into VRAM, then
poke a register to trigger execution. There are no vertex buffers, no
shaders, no Z-buffer. Depth is pure painter's algorithm -- command order
IS the depth. VDP2 composites up to 6 background planes at scanline time;
these are tile maps and rotation parameter tables, not meshes. Nothing
about the current API maps onto this hardware.

**SNES is even further removed.** The PPU renders tiles. VRAM holds 8x8
or 16x16 pixel tiles and tile maps; the PPU references these during
scanline rendering. There are no draw calls. Mode7 is an affine transform
applied to a single background layer (the basis for the overworld map and
road perspective effects). Sprites are entries in OAM (Object Attribute
Memory). The 65816 CPU writes to memory-mapped registers and VRAM; the
PPU does the rest. The concept of "mesh" or "shader" is meaningless here.

**Textures loaded as RGBA waste memory and exclude platforms.** Loading
every texture as 32-bit RGBA and converting at runtime is expensive on
memory-constrained platforms (Saturn has 2 MB of main work RAM; SNES has
64 KB VRAM) and simply wrong for platforms that have native formats
incompatible with RGBA (e.g., PSP's ABGR8888 / BGR5650, Saturn's RGB555 /
CI4 / CI8, SNES's 2bpp/4bpp/8bpp indexed). The asset pipeline must compile
textures to platform-native formats at build time.

**UI in 3D space is wasteful and limiting.** Currently UI elements are
rendered as geometry projected into screen space, going through the full
3D pipeline. On platforms with dedicated 2D hardware (Saturn VDP2,
SNES BG layers), this is actively wrong -- UI should map to a hardware
plane, not a 3D draw call. On modern platforms it should be a clean
screen-space pass that never touches the 3D depth buffer.

---
## Current Model (Summary)

```
Scene
  -> shaderBind(shader)
  -> textureBind(texture)
  -> meshDraw(mesh)       <-- immediate draw call per object
  -> meshDraw(mesh)
  -> ...
Platform receives each draw call immediately.
Depth is handled by Z-buffer hardware.
All textures live in GPU memory as RGBA (or Dolphin's tiled RGBA).
UI is rendered as 3D geometry with an orthographic projection.
```

Key current concepts:
- `mesh_t` -- vertex array (triangles/quads), in GPU VBO (GL) or CPU
  memory (Dolphin)
- `shader_t` -- GLSL program (modern GL), GL fixed-function state
  (legacy GL), or GX matrix + TEV config (Dolphin)
- `texture_t` -- GPU texture handle (GL) or tiled CPU buffer (Dolphin);
  always RGBA at the engine level
- `framebuffer_t` -- FBO (GL) or fixed hardware XFB (Dolphin)
- `spritebatch_t` -- accumulates 2D quads and flushes in batches of 32;
  the only existing deferred-submission system in the engine

The spritebatch hints at the right model. Everything needs to work this way.

---
## The Core Shift: Platform-Native Rendering

### Before

```
src/dusk/         Core engine + GL-like rendering API definition
src/duskgl/       OpenGL implementation
src/dusksdl2/     SDL2 window/input (shared)
src/duskpsp/      PSP via PSPGL (shim over GU)
src/duskvita/     Vita via GL ES (similar path to duskgl)
src/duskdolphin/  GameCube/Wii via GX (already custom)
src/dusklinux/    Linux (uses dusksdl2 + duskgl)
```

### After

```
src/dusk/          Core engine logic + render intent API ONLY
src/dusksdl2/      SDL2 window/input (unchanged)
src/duskgl/        Modern OpenGL (Linux, Vita modern path)
src/duskgllegacy/  Fixed-function OpenGL (older hardware, PSP with PSPGL
                   as a last resort)
src/duskvulkan/    Vulkan (Linux modern, future)
src/duskpsp/       PSP native GU (no PSPGL, direct command lists)
src/duskvita/      Vita native GXM (TBD)
src/duskdolphin/   GameCube/Wii GX (already custom, mostly kept)
src/dusksaturn/    Saturn VDP1/VDP2 (new)
src/duskps1/       PlayStation 1 ordering table + GTE (new)
src/duskn64/       Nintendo 64 RSP/RDP display list (new)
src/dusksnes/      SNES PPU/Mode7 (new, extremely constrained)
```

`src/dusk/` no longer knows about meshes, shaders, or framebuffers.
It defines the *render intent* system: what the scene wants to draw.
Each platform directory is entirely self-contained and responsible for
translating intents to its native API.

---
## Render Intent System (new)

Instead of the scene calling `meshDraw()` or `shaderBind()`, it submits
render intents into a `renderqueue_t`. An intent describes what should
appear on screen without prescribing how to draw it.

### Primitive intents (3D world)

```
RENDER_INTENT_QUAD     -- textured quad, 4 vertices or transform + size
RENDER_INTENT_POLYGON  -- filled polygon (convex, up to N vertices)
RENDER_INTENT_LINE     -- line segment or polyline
RENDER_INTENT_SPRITE   -- 2D billboard (always faces camera)
RENDER_INTENT_MESH     -- arbitrary vertex array (GL/GX only; degraded
                          on command-list platforms)
```

Each intent carries: texture reference, color/tint, depth hint (for
painter's algorithm sorting), blend mode, and cull flags.

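
As a rough illustration of the fields listed above, a queue entry and its submission function might look like the following C sketch. Apart from the `RENDER_INTENT_*` constants and `renderqueue_t`, every name here is hypothetical, and the fixed capacity and the "larger depth hint means farther away" convention are assumptions made only for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the real intent types are still to be designed. */
typedef enum {
  RENDER_INTENT_QUAD,
  RENDER_INTENT_POLYGON,
  RENDER_INTENT_LINE,
  RENDER_INTENT_SPRITE,
  RENDER_INTENT_MESH
} renderintenttype_t;

typedef struct {
  renderintenttype_t type;
  uint16_t textureId; /* opaque handle, resolved by the platform */
  uint32_t tint;      /* packed color; byte order left to the platform */
  int32_t depthHint;  /* assumed: larger = farther from the camera */
  uint8_t blendMode;
  uint8_t cullFlags;
} renderintent_t;

#define RENDERQUEUE_CAPACITY 256 /* assumed fixed size, no allocation */

typedef struct {
  renderintent_t intents[RENDERQUEUE_CAPACITY];
  size_t count;
} renderqueue_t;

/* Empties the queue at the start of a frame. */
static void renderQueueReset(renderqueue_t *queue) {
  queue->count = 0;
}

/* Copies the intent into the queue. Returns 1 on success, 0 if full. */
static int renderQueueSubmit(renderqueue_t *queue, const renderintent_t *intent) {
  if (queue->count >= RENDERQUEUE_CAPACITY) return 0;
  queue->intents[queue->count++] = *intent;
  return 1;
}
```

The scene only ever appends to the queue; what happens to the entries (sorting, batching, translation to native commands) stays entirely inside the platform's flush.
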
### Background plane intents (2D layers)

```
RENDER_INTENT_BGPLANE -- configure a background/tilemap layer
```

Carries: layer index, tile map data reference, scroll offset, palette,
and transform (for Mode7-style affine).

### UI intents (screen space)

```
RENDER_INTENT_UI_RECT    -- solid colored rectangle
RENDER_INTENT_UI_SPRITE  -- textured rectangle (UI image)
RENDER_INTENT_UI_TEXT    -- text string at screen position
```

UI intents are always screen-space. They are never mixed into the 3D
world queue. See UI System section below.

### Platform translation

| Intent | Modern GL | PSP GU | Saturn VDP1 | PS1 OT | N64 RSP | SNES PPU |
|---|---|---|---|---|---|---|
| QUAD | VAO + glDraw | GU display list | distorted-sprite cmd | GPU quad packet | RSP display list | OAM + BG tile |
| POLYGON | VAO + glDraw | GU display list | polygon cmd | GPU poly packet | RSP display list | OAM |
| BGPLANE | fullscreen quad | fullscreen quad | VDP2 config | fullscreen quad | fullscreen quad | BG layer config |
| UI_SPRITE | 2D ortho quad | 2D GU quad | VDP2 BG plane | GPU rect packet | RDP rectangle | BG layer tile |
| MESH | VAO/VBO | GU buffers | (degrade: quads) | (degrade: tris/quads) | RSP display list | (not supported) |

Note: N64 supports both triangles and axis-aligned rectangles natively via
RDP. PS1 supports triangles and quads (4-vertex) natively, so neither needs
the dead-vertex trick that Saturn requires.

---
## Asset Pipeline: Platform-Native Formats

### The problem

All textures currently enter the engine as RGBA and are converted at
runtime by each platform (Dolphin retiles to 4x4 blocks; GL uploads as-is).
This wastes memory and CPU time, and is impossible for platforms where RGBA
is not a valid intermediate format at all.

### The solution

The asset compiler (offline, run at build time) produces platform-specific
binary bundles. A texture asset has one source (PNG or similar) but N
compiled outputs, one per target.

### Texture formats by platform

| Platform | Native Formats | Notes |
|---|---|---|
| Modern GL | RGBA8, RGB8, BC1-BC7 (compressed) | Upload directly, GPU handles |
| Legacy GL | RGBA8, RGB8, CI8 (palette via extension) | No compressed formats |
| Vulkan | VkFormat variants (RGBA8, BC, ASTC) | Chosen at compile time |
| PSP GU | ABGR8888, BGR5650, ABGR1555, ABGR4444, CI4, CI8 | Native swizzled format |
| Saturn VDP1/VDP2 | RGB555, CI4, CI8 (15-bit palette in CRAM) | Big-endian, packed |
| PlayStation 1 | RGB555 / CI4 / CI8 (CLUT in VRAM) | Little-endian; VRAM flat; CLUT at coord |
| Nintendo 64 | RGBA16, RGBA32, IA4-IA16, I4-I8, CI4, CI8 | 4 KB TMEM; tiles must fit in TMEM banks |
| GameCube/Wii GX | I4, I8, IA4, IA8, RGB565, RGB5A3, RGBA8, CMPR | 4x4 tiled, big-endian |
| SNES PPU | 2bpp, 4bpp, 8bpp indexed (CGRAM palette) | Tile-packed, no direct access |

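
To make the build-time conversion concrete, here is a minimal C sketch of one step the asset compiler could perform: reducing RGBA8 source pixels to 15-bit color. The function names are hypothetical, and the bit layout (red in the low five bits, bit 15 left clear) is an assumption chosen for the example; each target defines its own channel order, top-bit meaning, and byte order, which the real compiler would have to apply when writing the output.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs 8-bit channels into a 15-bit word by dropping the low 3 bits.
 * Assumed layout: R in bits 0-4, G in bits 5-9, B in bits 10-14. */
static uint16_t texturePackRgb555(uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)((r >> 3) | ((g >> 3) << 5) | ((b >> 3) << 10));
}

/* Converts tightly packed RGBA8 (4 bytes per pixel) to RGB555 words.
 * Alpha is discarded in this sketch. */
static void textureConvertRgba8ToRgb555(
  const uint8_t *rgba, uint16_t *out, size_t pixelCount
) {
  for (size_t i = 0; i < pixelCount; i++) {
    out[i] = texturePackRgb555(rgba[i * 4 + 0], rgba[i * 4 + 1], rgba[i * 4 + 2]);
  }
}
```

Doing this offline means the runtime loader copies the 16-bit words straight into video memory, at half the size of the RGBA source.
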
### Asset bundle structure

The `.dsk` bundle gains a platform tag. The loader picks the right section
at runtime (or the build produces a single-platform bundle for constrained
targets like SNES/Saturn where there is no spare storage for unused data).

---
## UI System (first-class)

### Current problem

UI elements go through the 3D pipeline: they are meshes with an orthographic
shader, rendered in the same pass as the world. This means:
- UI competes for Z-buffer depth with world geometry
- On Saturn/SNES, UI cannot use dedicated hardware planes
- Text rendering is tied to the sprite batch which is tied to the 3D pass
- No separation between "draw the world" and "draw the HUD"

### New model

UI is a completely separate rendering context. The world renders first,
then the UI renders on top. They share no state.

UI coordinates are always in screen space (pixels or a logical resolution
that the platform scales to its native display size). No camera matrix,
no projection, no depth buffer involvement.

### Platform mapping

| Platform | UI implementation |
|---|---|
| Modern GL | Separate 2D ortho pass, screen-space quads, no depth test |
| Legacy GL | Same, using fixed-function |
| PSP GU | Separate GU display list, 2D mode |
| Saturn | VDP2 background plane(s) dedicated to UI |
| PlayStation 1 | Separate GPU packet chain, no Z; ordered after world OT |
| Nintendo 64 | RDP rectangle commands in a separate display list segment |
| GameCube/Wii | GX 2D mode or dedicated GX pass |
| SNES | Dedicated BG layer(s) for HUD tiles |

On Saturn, the UI occupying VDP2 planes is a genuine hardware win -- VDP2
composites it for free at scanline time, costing zero VDP1 commands.
On SNES, the HUD must live in a BG layer because there is no alternative.

### UI API (proposed)

```c
uiBegin();
uiDrawRect(x, y, w, h, color);
uiDrawSprite(x, y, w, h, texture, uvMin, uvMax);
uiDrawText(x, y, font, string);
uiEnd(); // platform flushes UI to hardware
```

The `uiBegin`/`uiEnd` block collects intents; the platform submits them
at frame end in whatever way is appropriate.

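
One way the collect-then-flush behavior could be structured is sketched below for the rectangle case only. The function names come from the proposal above; the parameter types, the `uiintent_t` record, the fixed buffer size, and the `uiPlatformFlush` hook are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { UI_INTENT_RECT, UI_INTENT_SPRITE, UI_INTENT_TEXT } uiintenttype_t;

/* Hypothetical record for one collected UI intent. */
typedef struct {
  uiintenttype_t type;
  int16_t x, y, w, h; /* screen space, logical pixels */
  uint32_t color;
} uiintent_t;

#define UI_INTENT_MAX 64 /* assumed fixed buffer, no allocation */

static uiintent_t uiIntents[UI_INTENT_MAX];
static size_t uiIntentCount = 0;
static int uiOpen = 0;

/* Hypothetical hook that each platform directory would provide. */
typedef void (*uiflush_t)(const uiintent_t *intents, size_t count);
static uiflush_t uiPlatformFlush = NULL;

/* Starts a new UI block and discards last frame's intents. */
void uiBegin(void) {
  uiIntentCount = 0;
  uiOpen = 1;
}

/* Records a rectangle; ignored outside uiBegin/uiEnd or when full. */
void uiDrawRect(int16_t x, int16_t y, int16_t w, int16_t h, uint32_t color) {
  if (!uiOpen || uiIntentCount >= UI_INTENT_MAX) return;
  uiintent_t *intent = &uiIntents[uiIntentCount++];
  intent->type = UI_INTENT_RECT;
  intent->x = x;
  intent->y = y;
  intent->w = w;
  intent->h = h;
  intent->color = color;
}

/* Hands the collected intents to the platform, if a hook is installed. */
void uiEnd(void) {
  if (uiOpen && uiPlatformFlush != NULL) {
    uiPlatformFlush(uiIntents, uiIntentCount);
  }
  uiOpen = 0;
}
```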

---

## SNES / Mode7

SNES is the most constrained platform the engine will ever support and
needs its own section because it breaks assumptions that even Saturn keeps.

### Hardware

- **CPU**: 65816 @ ~3.58 MHz (16-bit, no FPU, no cache)
- **PPU**: Tile-based scanline renderer. VRAM holds tile graphics and
  tile maps. BG layers reference tiles by index.
- **Mode7**: A single BG layer with a 2D affine matrix applied per
  scanline. Used for overworld maps, road perspective (F-Zero), rotation
  effects. The matrix is set via HDMA (scanline DMA) for per-scanline
  variation, enabling horizon-perspective effects.
- **Sprites/OAM**: Up to 128 sprites (8x8, 16x16, 32x32, 64x64 pixels),
  4bpp indexed, up to 32 per scanline.
- **Palette**: CGRAM holds 256 entries of 15-bit RGB (512 bytes total).
  BG layers use sub-palettes of 4/16/256 colors depending on bit depth.
- **VRAM**: 64 KB (tiles + tile maps)
- **WRAM**: 128 KB work RAM + usually 8 KB SRAM on cart for saves
- **No frame buffer.** The PPU renders scanlines directly. You cannot
  read back what was drawn.
- **No general-purpose draw calls.** You configure registers and VRAM
  before the frame and the PPU does the rest.

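
The tile graphics held in VRAM are stored as bitplanes rather than as one index per pixel, which is why textures for this target need their own build step. The C sketch below packs one 8x8 tile of 4-bit palette indices into the 32-byte planar 4bpp layout as it is commonly documented (planes 0 and 1 interleaved per row, followed by planes 2 and 3). The function name is hypothetical and the routine is meant for the offline asset tool, not for the console.

```c
#include <stdint.h>
#include <string.h>

/* Packs an 8x8 tile of palette indices (0-15, row-major) into 32 bytes:
 * bytes 0-15 hold bitplanes 0/1 interleaved per row,
 * bytes 16-31 hold bitplanes 2/3 interleaved per row. */
static void snesTilePack4bpp(const uint8_t pixels[64], uint8_t out[32]) {
  memset(out, 0, 32);
  for (int row = 0; row < 8; row++) {
    for (int col = 0; col < 8; col++) {
      uint8_t index = pixels[row * 8 + col] & 0x0F;
      uint8_t bit = (uint8_t)(0x80 >> col); /* leftmost pixel = high bit */
      if (index & 1) out[row * 2 + 0] |= bit;
      if (index & 2) out[row * 2 + 1] |= bit;
      if (index & 4) out[16 + row * 2 + 0] |= bit;
      if (index & 8) out[16 + row * 2 + 1] |= bit;
    }
  }
}
```

A 2bpp tile would use only the first 16 bytes of this scheme, and an 8bpp tile would add two more plane pairs.
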
### What "3D" means on SNES

True 3D is not possible. What can be approximated:
- **Overworld map**: Mode7 with a flat texture and HDMA scroll gives a
  top-down perspective with a horizon line (the classic JRPG overworld).
- **Depth illusion**: Mode7 matrix manipulation can simulate a moving
  camera over flat terrain. Objects are sprites placed at screen positions
  calculated by software perspective projection.
- **Sprite scaling**: Software-scaled sprites using pre-rendered frames
  or the coprocessor-assisted rendering used in Super FX games (Star Fox).
  Super FX is a co-processor on the cartridge -- base SNES cannot do this.
- **Basic 3D effects**: Some games use HDMA color gradient + Mode7 floor
  with overlaid sprites to create a pseudo-3D look.

The engine plan for SNES: Mode7 overworld (confirmed), sprite-based world
objects, BG layer UI. "Basic 3D effects" (pseudo-perspective with sprites)
is aspirational -- implementation complexity TBD.

### SNES constraints on the engine

- **No dynamic allocation.** With 128 KB WRAM, a general-purpose allocator
  is risky. The engine memory system may need a static pool mode for SNES.
- **No floating point.** `float_t` must resolve to integer or fixed-point.
- **No scripting (JerryScript).** The JS engine requires far more than
  128 KB RAM. SNES scenes must be compiled C.
- **Asset data in ROM, not a .dsk bundle.** SNES loads from cartridge ROM
  mapped into the address space. The asset system needs a ROM-mapped loader.
- **Tile pipeline.** Textures must be pre-converted to SNES tile format
  (2bpp/4bpp/8bpp, 8x8 pixel tiles, CGRAM palette) at build time. This
  is a completely different asset output from every other platform.

---
## Platform Inventory

A summary of what each platform's native rendering looks like after the
refactor, for reference when designing the intent API.

### Modern OpenGL (duskgl)

VAO + VBO mesh storage, GLSL shaders, FBO render targets, Z-buffer.
No fixed-function. Targets: Linux, possibly Vita (GXM is preferred).

### Legacy OpenGL (duskgllegacy)

Fixed-function pipeline: `glMatrixMode`, `glTexEnv`, client-side vertex
arrays. No VAO/VBO. Used for: very old desktop hardware, maybe PSP as a
last resort (PSPGL implements this model). Targets: legacy desktop,
embedded Linux.

### Vulkan (duskvulkan)

Explicit pipeline state objects, render passes, descriptor sets, command
buffers. Highest ceiling for performance and control. Targets: Linux
(modern), future platforms. Not immediate priority but the architecture
should not block it.

### PSP native GU (duskpsp)

The GE/GU is a display-list GPU. You build a command list in memory and
the GU DMA engine processes it asynchronously. Native vertex formats are
PSP-specific (ABGR byte order, swizzled textures for cache efficiency).
No PSPGL. Targets: PSP hardware and emulators.

### Vita (duskvita)

GXM is Sony's Vita GPU API -- closer to modern GL than GU, with explicit
shader binaries (.gxp), ring buffers, and GPU sync primitives.

### GameCube/Wii GX (duskdolphin)

Already a custom renderer. GX uses immediate-mode vertex submission
(`GX_Begin` / `GX_Position1x16` loops), TEV for texture compositing, and
hardware XFB double-buffering. Big-endian. Mostly kept as-is; may benefit
from being expressed in terms of render intents for consistency.

### Saturn VDP1/VDP2 (dusksaturn)

VDP1: command-list (32-byte structs), quad-based, affine texture mapping,
no Z-buffer (painter's algorithm). VDP2: up to 6 background planes
composited at scanline time. Big-endian dual SH-2, no FPU. Fixed-point
math required throughout.

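
Because VDP1 draws commands in list order, the Saturn flush has to sort by depth before emitting anything. The following C sketch shows that step in isolation using the standard library's `qsort`. The `saturndraw_t` record and function names are hypothetical, and "larger depth means farther away" is an assumed convention.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal draw record; only the depth matters for sorting. */
typedef struct {
  int32_t depth;      /* assumed: larger = farther from the camera */
  uint16_t commandId; /* stand-in for the data a VDP1 command would need */
} saturndraw_t;

/* Comparator for qsort: farthest first, so nearer draws land on top. */
static int saturnDrawCompare(const void *a, const void *b) {
  const saturndraw_t *da = (const saturndraw_t *)a;
  const saturndraw_t *db = (const saturndraw_t *)b;
  if (da->depth > db->depth) return -1;
  if (da->depth < db->depth) return 1;
  return 0;
}

/* Sorts draws back-to-front, ready to be written out as a command list. */
static void saturnDrawSort(saturndraw_t *draws, size_t count) {
  qsort(draws, count, sizeof(saturndraw_t), saturnDrawCompare);
}
```

`qsort` is not guaranteed to be stable, so draws with equal depth may swap order between frames; a real backend would likely want a stable sort or a tie-breaking key to avoid flicker.
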
### PlayStation 1 (duskps1)

MIPS R3000A @ 33.87 MHz, little-endian, no FPU. GTE (coprocessor 2)
handles fixed-point matrix math, perspective divide, and lighting.
GPU receives packets via DMA linked-list (the Ordering Table). Primitives:
triangles and quads natively (no dead-vertex needed). Texture mapping:
affine, same limitation as Saturn. No Z-buffer; depth is OT slot order.
VRAM is 1 MB flat (frame buffers + textures + CLUTs share it). SDK:
PSn00bSDK, which is CMake-native -- a direct fit for the dusk build system.

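
The idea behind "depth is OT slot order" can be modeled in plain C without any SDK calls. In the sketch below each depth is quantized to a slot, primitives are pushed onto that slot's linked list, and the table is walked from the farthest slot to the nearest. All names are hypothetical, the table is kept tiny for readability, and on hardware the walk is performed by the GPU's DMA rather than by a loop like `otFlatten`.

```c
#include <stddef.h>
#include <stdint.h>

#define OT_LENGTH 8 /* tiny for illustration only */

/* Hypothetical packet header: each primitive links to the next one. */
typedef struct otpacket_s {
  struct otpacket_s *next;
  int id;
} otpacket_t;

typedef struct {
  otpacket_t *slots[OT_LENGTH]; /* slot 0 = nearest, last slot = farthest */
} ordertable_t;

/* Empties every slot at the start of a frame. */
static void otClear(ordertable_t *ot) {
  for (size_t i = 0; i < OT_LENGTH; i++) ot->slots[i] = NULL;
}

/* Quantizes depth in [0, maxDepth] to a slot and pushes the packet there.
 * maxDepth must be greater than zero. */
static void otInsert(ordertable_t *ot, otpacket_t *packet, int32_t depth, int32_t maxDepth) {
  if (depth < 0) depth = 0;
  if (depth > maxDepth) depth = maxDepth;
  size_t slot = (size_t)(((int64_t)depth * (OT_LENGTH - 1)) / maxDepth);
  packet->next = ot->slots[slot];
  ot->slots[slot] = packet;
}

/* Walks far-to-near and writes packet ids in draw order. Returns the count. */
static size_t otFlatten(const ordertable_t *ot, int *out, size_t outMax) {
  size_t n = 0;
  for (size_t i = OT_LENGTH; i-- > 0;) {
    for (const otpacket_t *p = ot->slots[i]; p != NULL && n < outMax; p = p->next) {
      out[n++] = p->id;
    }
  }
  return n;
}
```

Two primitives that quantize to the same slot are drawn in reverse insertion order, which is where the table length starts to matter for depth precision.
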
### Nintendo 64 (duskn64)

VR4300 @ 93.75 MHz, big-endian, real IEEE 754 FPU. Rendering is split
between the RSP (geometry: programmable MIPS SIMD, runs microcode up to
~1000 instructions in 4 KB IMEM) and the RDP (rasterization: fixed
hardware). RSP produces triangle commands from a CPU-built display list
in RDRAM. RDP features: perspective-correct texture mapping, bilinear
filtering, hardware Z-buffer. Primitives: triangles and axis-aligned rects.
TMEM is 4 KB on-chip texture cache; textures must be loaded into tiles
before drawing -- a significant memory management constraint.
SDK: libdragon (Unlicense, GCC 14, Makefile-based -- not CMake; this
requires a wrapper toolchain file for dusk's build system).

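
The 4 KB TMEM limit is something the asset compiler can check before a texture ever reaches the console. The C sketch below is a deliberately simplified size check with hypothetical function names. It ignores row alignment inside TMEM and the space that color-indexed formats give up to their palette, so a real check would be stricter than this one.

```c
#include <stdint.h>

#define N64_TMEM_BYTES 4096u

/* Bytes needed for a width x height tile at 4, 8, 16, or 32 bits per pixel.
 * Each row is rounded up to a whole byte. */
static uint32_t n64TileBytes(uint32_t width, uint32_t height, uint32_t bitsPerPixel) {
  uint32_t rowBytes = (width * bitsPerPixel + 7u) / 8u;
  return rowBytes * height;
}

/* Optimistic fit test: returns 1 if the raw texel data is within 4 KB. */
static int n64TileFitsTmem(uint32_t width, uint32_t height, uint32_t bitsPerPixel) {
  return n64TileBytes(width, height, bitsPerPixel) <= N64_TMEM_BYTES;
}
```

Textures that fail the check would have to be split into several tiles at build time or stored in a smaller format.
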
### SNES PPU/Mode7 (dusksnes)

Tile-based. VRAM holds tiles and tile maps. Mode7 provides affine transform
for one BG layer. Sprites via OAM. No frame buffer. All configuration is
memory-mapped registers. 65816 CPU, no FPU, extremely limited RAM.

---
## Threading Model

### Current model

The engine uses OS threads for async asset loading (`assetXxxLoaderAsync`).
Platforms that have pthreads or an equivalent RTOS (Linux, PSP, Vita) run
worker threads that load data in the background while the game loop runs.
The main thread polls or blocks on completion.

### The problem

Several target platforms have no OS threading whatsoever, and others have
hardware-specific async mechanisms that are nothing like pthreads.

### Per-platform reality

| Platform | Threading | Async mechanism |
|---|---|---|
| Linux | pthreads | Worker threads (current) |
| Vita | SceKernelThread | Per-SDK threads |
| PSP | SceKernelThread | Per-SDK threads |
| GameCube/Wii | libogc LWP | Lightweight processes |
| Saturn | None (OS) | Slave SH-2 for fixed jobs; CD-ROM via interrupt/callback |
| PlayStation 1 | None (OS) | V-blank ISR, 7 DMA channels, CD-ROM callbacks |
| Nintendo 64 | libdragon preview only | PI DMA for cartridge; RSP for parallel compute |
| SNES | None | DMA (GPDMA/HDMA); NMI V-blank; SPC700 audio is a separate CPU |

**Saturn slave SH-2**: The second SH-2 is not a general-purpose thread.
It runs a fixed subroutine you hand-load. The typical use is offloading
heavy per-frame computation (geometry transforms, depth sort) while the
master SH-2 handles game logic. Communication is via shared WRAM with
cache-through addresses to avoid coherency bugs. There is no scheduler
and no yield -- it runs to completion.

**SNES DMA**: GPDMA copies blocks of data (ROM to WRAM, WRAM to VRAM)
and halts the CPU for the duration -- it is synchronous from the game's
perspective. HDMA runs per-scanline during H-blank, writing to PPU
registers without CPU involvement; this is how Mode7 perspective is
achieved. Neither is "async" in the programming sense.

**SNES NMI**: The V-blank NMI fires at the start of every V-blank period.
This is the only safe window to write to VRAM and PPU registers. All
critical PPU updates must complete within the V-blank window (roughly
2.3 ms on NTSC in the standard 224-line mode).

### Proposed model

Introduce a compile-time threading capability flag:

```
DUSK_THREAD_PTHREAD    -- Linux, maybe Vita
DUSK_THREAD_SCEKERNEL  -- PSP, Vita SDK
DUSK_THREAD_LWP        -- GameCube/Wii libogc
DUSK_THREAD_SLAVE_SH2  -- Saturn slave CPU (job dispatch only)
DUSK_THREAD_NONE       -- SNES (and Saturn master thread view)
```

The asset loader's async path is gated on having a threading capability.
When `DUSK_THREAD_NONE` is defined, `assetXxxLoaderAsync` either does not
exist or is an alias for the synchronous version. On Saturn, the slave SH-2
is exposed as a distinct API (`sh2JobDispatch`, `sh2JobWait`) used only for
compute-heavy work, not for I/O.

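
The "alias for the synchronous version" option could be expressed with the preprocessor roughly as follows. This is a sketch under several assumptions: `asset_t` and the function bodies are placeholders, the `assetLoadSync` / `assetLoadAsync` names are used only as stand-ins for the real loader entry points, and treating `DUSK_THREAD_SLAVE_SH2` like `DUSK_THREAD_NONE` for I/O is an interpretation of the paragraph above rather than a settled decision.

```c
#include <stddef.h>

/* For this sketch only: fall back to DUSK_THREAD_NONE if the build
 * system did not select a threading capability. */
#if !defined(DUSK_THREAD_PTHREAD) && !defined(DUSK_THREAD_SCEKERNEL) && \
    !defined(DUSK_THREAD_LWP) && !defined(DUSK_THREAD_SLAVE_SH2) && \
    !defined(DUSK_THREAD_NONE)
#define DUSK_THREAD_NONE 1
#endif

/* Placeholder asset record. */
typedef struct {
  int loaded;
} asset_t;

/* Placeholder synchronous load; a real one would read and decode data. */
static int assetLoadSync(asset_t *asset) {
  asset->loaded = 1;
  return 1;
}

#if defined(DUSK_THREAD_NONE) || defined(DUSK_THREAD_SLAVE_SH2)
/* No OS threads available for I/O: "async" simply runs the load inline. */
static int assetLoadAsync(asset_t *asset) {
  return assetLoadSync(asset);
}
#else
/* Threaded platforms supply their own worker-backed implementation. */
int assetLoadAsync(asset_t *asset);
#endif
```

With this shape, call sites compile unchanged on every platform, at the cost of hiding the fact that the call may block on thread-less targets.
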
### Asset loading without threads

**Saturn**: CD-ROM access is initiated via SBL/CDC routines and completes
via interrupt callback. The engine's asset loading loop can poll the
callback flag in the main loop rather than blocking a thread. This is
interrupt-driven cooperative async, not preemptive.

**SNES**: There is no loading. Assets live in ROM, mapped directly into the
65816 address space. "Loading a texture" means computing a pointer into ROM
and copying the tile data to VRAM during V-blank via GPDMA. The asset system
on SNES is essentially a VRAM/CGRAM allocator and a DMA scheduler, not a
file loader.

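
The polling pattern described for Saturn can be sketched as a small state machine that an interrupt handler completes and the main loop checks. All names below are hypothetical, and the actual CD-ROM calls are deliberately left out, since only the completion contract is being illustrated.

```c
#include <stdint.h>

typedef enum {
  ASSET_REQUEST_IDLE,
  ASSET_REQUEST_PENDING,
  ASSET_REQUEST_DONE
} assetrequeststate_t;

/* Hypothetical request record shared between main loop and interrupt. */
typedef struct {
  volatile assetrequeststate_t state; /* written from interrupt context */
  uint32_t bytesRead;
} assetrequest_t;

/* Marks the request as in flight. A real backend would start the read here. */
static void assetRequestBegin(assetrequest_t *request) {
  request->bytesRead = 0;
  request->state = ASSET_REQUEST_PENDING;
}

/* Meant to be called from the platform's completion interrupt or callback. */
static void assetRequestOnComplete(assetrequest_t *request, uint32_t bytesRead) {
  request->bytesRead = bytesRead;
  request->state = ASSET_REQUEST_DONE; /* set last, after the payload */
}

/* Non-blocking check for the main loop. Returns 1 once the data is ready. */
static int assetRequestPoll(const assetrequest_t *request) {
  return request->state == ASSET_REQUEST_DONE;
}
```

The main loop calls `assetRequestPoll` once per frame and keeps running game logic until it returns 1, so no thread is ever blocked.
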
### Asset system changes

The asset pipeline needs to accommodate three loading models:

1. **File-based** (Linux, PSP, Vita, Saturn CD): open file, read bytes,
   close. Can be sync or thread-async.
2. **DMA/interrupt** (Saturn CD-ROM, GC DVD): initiate transfer, poll or
   callback on completion, no thread blocked.
3. **ROM-mapped** (SNES): data is already in the address space; "loading"
   is a VRAM DMA copy scheduled for V-blank, not file I/O.

The `assetstream_t` abstraction that currently wraps file I/O needs a third
backend for ROM-mapped data, and the async path needs to support
callback-based completion as an alternative to thread-based blocking.

---
## What Needs to Change

### 1. Render intent API (new, in src/dusk/)

Replace `mesh_t` / `shader_t` / `meshDraw()` as scene-facing APIs with
`renderqueue_t` and intent submission functions. `src/dusk/` defines the
intent types and submission API; platforms implement the flush.

### 2. Platform renderer directories

Move rendering implementations out of `duskgl/` as a shared layer and
into fully self-contained platform directories. `duskgl/` becomes the
*modern GL* platform only. Add `duskgllegacy/`, `duskvulkan/` as peers.

### 3. Asset pipeline: platform-native texture formats

The offline asset compiler must produce per-platform texture bundles in
native formats. The runtime texture loader expects pre-converted data,
not RGBA. `textureformat_t` grows to cover all platform formats but each
platform only ever sees the formats it natively supports.

### 4. UI system (first-class, separate from 3D)

New `src/dusk/ui/` subsystem with `uiBegin` / `uiEnd` and intent types
for rects, sprites, and text. Platforms implement the flush independently.
The 3D spritebatch is retired or scoped to world-space billboards only.

### 5. Fixed-point / no-FPU math

`float_t` needs a fixed-point mode. Proposed: define `fixed_t` as a
16.16 signed integer; define `DUSK_MATH_FIXED` for platforms that require
it (Saturn, PS1, SNES). Engine math utilities (`mathSin`, `mathCos`, etc.)
have fixed-point implementations selected by this flag. `float_t` on
FPU-less platforms becomes a typedef for `fixed_t`.

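
A minimal C sketch of the proposed 16.16 type is shown below. Only `fixed_t` is named in this document; the helper names are hypothetical. The multiply and divide go through a 64-bit intermediate, which is correct but may be too slow on the smallest targets, where a platform-specific routine would likely replace it.

```c
#include <stdint.h>

typedef int32_t fixed_t; /* 16.16 signed fixed-point */

#define FIXED_SHIFT 16
#define FIXED_ONE ((fixed_t)1 << FIXED_SHIFT)

/* Converts an integer to fixed-point. Valid for values in -32768..32767. */
static fixed_t fixedFromInt(int32_t value) {
  return (fixed_t)(value * FIXED_ONE);
}

/* Converts back to an integer, truncating toward zero. */
static int32_t fixedToInt(fixed_t value) {
  return value / FIXED_ONE;
}

/* Multiplies two 16.16 values using a 64-bit intermediate. */
static fixed_t fixedMul(fixed_t a, fixed_t b) {
  return (fixed_t)(((int64_t)a * (int64_t)b) / FIXED_ONE);
}

/* Divides two 16.16 values. The divisor must not be zero. */
static fixed_t fixedDiv(fixed_t a, fixed_t b) {
  return (fixed_t)(((int64_t)a * FIXED_ONE) / b);
}
```

For example, `fixedMul(fixedFromInt(3), FIXED_ONE / 2)` yields the fixed-point representation of 1.5.
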
### 6. Background plane abstraction (bgplane_t)

New concept in `src/dusk/display/bgplane/`. A BG plane has a tile map or
bitmap source, scroll offsets, a palette reference, and optional affine
parameters (for Mode7-style use). On GL platforms: rendered as a
fullscreen textured quad or shader pass. On Saturn: VDP2 config. On SNES:
PPU BG layer config.

### 7. Memory system: static pool mode

For SNES (and possibly Saturn), the general-purpose allocator may be
unviable. Proposed: a compile-time static pool mode (`DUSK_MEMORY_STATIC`)
that uses a fixed-size arena instead of dynamic allocation. All
`memoryAllocate` calls hit the pool; `memoryFree` is a no-op or a stack pop.

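
The no-op-free variant of the static pool can be sketched as a bump allocator over a fixed array. `memoryAllocate` and `memoryFree` are the names used above, but their signatures here are assumed, and `memoryPoolReset`, the pool size, and the 4-byte alignment are hypothetical choices made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define MEMORY_POOL_BYTES 1024 /* assumed size, tuned per platform */
#define MEMORY_ALIGN 4

/* Backing storage declared as words so the base is 4-byte aligned. */
static uint32_t memoryPoolWords[MEMORY_POOL_BYTES / 4];
static size_t memoryPoolUsed = 0;

/* Bump-allocates from the pool. Returns NULL when the pool is exhausted. */
void *memoryAllocate(size_t size) {
  size_t aligned = (size + (MEMORY_ALIGN - 1)) & ~(size_t)(MEMORY_ALIGN - 1);
  if (aligned < size) return NULL; /* size overflowed during rounding */
  if (aligned > MEMORY_POOL_BYTES - memoryPoolUsed) return NULL;
  void *result = (uint8_t *)memoryPoolWords + memoryPoolUsed;
  memoryPoolUsed += aligned;
  return result;
}

/* Individual frees do nothing in this mode. */
void memoryFree(void *pointer) {
  (void)pointer;
}

/* Releases everything at once, for example on a scene change. */
void memoryPoolReset(void) {
  memoryPoolUsed = 0;
}
```

This trades per-allocation reuse for predictability: memory only comes back at reset points, which suits scene-based games but needs care around long-lived allocations.
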
### 8. Script runtime: optional

JerryScript requires too much RAM for SNES and is marginal on Saturn.
The scripting system should be compile-time optional (`DUSK_SCRIPTING`),
not assumed present. SNES/Saturn scenes would be compiled C.

---
## What to Keep

- Platform macro abstraction pattern (`displayplatform.h`, etc.) -- works,
  no reason to change.
- Directory structure convention for platform directories.
- Entity-component system -- platform-agnostic, unaffected.
- Asset loading + `.dsk` bundle concept (extended for platform formats).
- The broad subsystem layout: asset, input, display, log, network, save,
  system, time.

---
## Open Questions

1. **Render intent granularity**: How much does the intent API need to
   express? A MESH intent works on GL/N64 but degrades poorly on Saturn
   (must split into quads) and is impossible on SNES. Should MESH be a
   valid intent with a "best effort" contract, or excluded from the portable
   API entirely?

2. **Threading abstraction depth**: Should `DUSK_THREAD_SLAVE_SH2` be a
   first-class concept in the engine's job system, or a Saturn-internal
   implementation detail the core never sees? Same question applies to N64's
   RSP as a compute co-processor.

3. **Asset loading async contract**: When a platform has no threads, should
   `assetLoadAsync` be a no-op alias for `assetLoadSync`, or return
   immediately with a completion flag to poll? The polling model is more
   honest but requires all call sites to handle it.

4. **N64 build system**: libdragon uses GNU Make, not CMake. Options are:
   (a) write a CMake toolchain file that wraps n64.mk, (b) maintain a
   parallel Makefile just for N64, or (c) wait for upstream CMake support.
   Which is acceptable long-term?

5. **N64 RSP microcode**: Standard libdragon microcodes (Fast3D/F3DEX2) or
   Tiny3D (community microcode with full T&L + skinning)? Writing custom
   microcode is powerful but limited to ~1000 MIPS SIMD instructions.
   This decision gates what 3D features the N64 port can support.

6. **PSPGL fate**: Drop immediately in favor of native GU, or keep as a
   fallback (`duskgllegacy`) while native GU is built? The two can coexist
   during transition.

7. **Vulkan priority**: Design the intent API with Vulkan in mind from the
   start, or add it later? Vulkan's explicit pipeline state model may
   conflict with how stateful platforms (Saturn, SNES) expect things to work.

8. **Background planes on modern platforms**: Does `bgplane_t` degrade to a
   fullscreen textured quad on GL/Vulkan/N64, or should modern platforms
   support actual background scene rendering (3D world behind the foreground)?

9. **PS1 ordering table depth**: The OT is a fixed-size array (e.g. 4096
   slots). Depth precision = number of slots. How deep should the engine's
   default OT be, and should this be configurable per-scene?

10. **Fixed-point strategy**: Does `float_t` transparently become `fixed_t`
    on FPU-less platforms (Saturn, PS1, SNES), or do we require explicit
    `fixed_t` in math-heavy paths? Transparent is easiest to port; explicit
    is faster.

11. **SNES V-blank budget**: All VRAM writes must finish within the V-blank
    window (roughly 2.3 ms on NTSC in 224-line mode). Does the engine need
    a V-blank work queue with a budget checker, or is this left to the game
    to manage manually?

12. **SNES scripting**: JerryScript is out. Pure compiled C, or a lighter
    scripting layer (Lua is ~100 KB -- tight but possible)?

13. **Asset compiler**: New standalone tool, or an extension of the existing
    asset pipeline? Part of the CMake build or a separate pre-build step?

---
## Proposed Sequence (Draft)

### Phase 1 -- Intent API (no behavior change)
1. Design and stabilize `renderqueue_t` and intent types
2. Refactor modern GL path to submit through render intents (same output,
   new plumbing)
3. Refactor Dolphin path the same way
4. Validate no regressions on Linux + GameCube

### Phase 2 -- UI system
5. Extract UI rendering from the 3D path into `src/dusk/ui/`
6. Implement UI flush for GL and Dolphin
7. Wire existing UI elements through the new system

### Phase 3 -- Platform splits
8. Split `duskgl/` into `duskgl/` (modern) and `duskgllegacy/` (fixed-func)
9. Port PSP to native GU (`duskpsp/display/` rewrite, drop PSPGL dependency)
10. Stub `duskvulkan/` structure for future implementation

### Phase 4 -- Asset pipeline
11. Design platform-native texture format system
12. Extend asset compiler for per-platform output
13. Update texture loader to expect pre-converted data

### Phase 5 -- Saturn
14. CMake toolchain for SH-2 cross-compile (yaul / libyaul toolchain)
15. `src/dusksaturn/` -- input (SMPC), asset (CD-ROM), log, system
16. VDP1 backend for render queue (quads, polygons, painter's sort)
17. VDP2 backend for bgplane_t (tile maps, scroll, palette)
18. Fixed-point math mode (`DUSK_MATH_FIXED`)
19. UI backend (VDP2 plane(s))

### Phase 6 -- PlayStation 1
20. CMake toolchain wrapping PSn00bSDK (already CMake-native)
21. `src/duskps1/` -- input (BIOS pad), asset (CD-ROM libpsxcd), log, system
22. GTE integration for fixed-point math (reuse `DUSK_MATH_FIXED` path)
23. Ordering table builder for render queue (painter's sort, DMA linked-list)
24. GPU packet backend for intents (tris, quads, rects)
25. UI backend (separate GPU packet chain after world OT)

### Phase 7 -- Nintendo 64
26. CMake toolchain wrapping libdragon (n64.mk wrapper or toolchain file)
27. `src/duskn64/` -- input (N64 controller via PIF), asset (PI DMA /
    DragonFS), log, system
28. RSP display list builder for render queue (Z-buffer path, no sorting)
29. TMEM tile management for textures
30. RDP rectangle backend for UI
31. Decide on RSP microcode (Tiny3D vs standard F3DEX2)

### Phase 8 -- SNES
32. SNES toolchain (cc65 or llvm-mos 65816 target)
33. Static memory pool mode (`DUSK_MEMORY_STATIC`)
34. PPU tile pipeline + VRAM management
35. Mode7 overworld implementation
36. OAM sprite system
37. BG layer UI
38. Scripting-optional build (`DUSK_SCRIPTING` off)