TODO:
- [ ] Primitives
    - [ ] 2D Points
    - [x] 2D Lines
    - [ ] Spheres
    - [ ] Cylinders (with/without caps)
    - [-] Meshes
        - [ ] Handle optional params for:
            - normals -> fallback to flat shading with shader derivatives
            - uvs -> fallback to flat color
            - tangents and normal maps (?)
            - joint indices, weights, joints
        - [ ] Fixup when correctly handling materials and
    - [x] 2D Images
    - [x] 3D Images (e.g. textured quads)
- [ ] Renderer architecture / path tracer support
    - [ ] Single object vs batching -> especially relevant for path tracing where you need to batch things in a tlas
        - [ ] Lights
        - [ ] Materials
        - [ ] Instancing
        - [ ] Support both raster and path tracing with accumulation
- [ ] Buffer and image management:
    - [ ] UploadableBuffer and UploadableImage are currently only used
          by renderer for bulk uniforms. I think the main issue with this
          is that it only solves the startup / first time upload issue, which
          can already be solved by .from_data.
          It does handle DEVICE_MAPPED_WITH_FALLBACK but .from_data also does.
          The fact that it keeps the staging buffer around is not that useful because
          we can re-utilize it during rendering only if we alloc 1 per frame.
          Shapes of upload:
          - Sync upload once at creation                               -> .from_data
          - Sync upload once after creation (batched)                  -> bulk_upload
          - Per frame upload with one GPU buffer per sequence frame    -> preupload GpuProperty (lazily allocates staging buffers, shared across frames)
          - Per frame upload with single GPU buffer for whole sequence -> streaming GpuProperty (preallocated staging buffers, shared across frames, also supports prefetching with additional staging and GPU buffers)
          Technically still missing, though we might not need it:
          - Per frame upload without a sequence (e.g. bulk uniforms) -> this is in spirit the same as streaming GpuProperty but without relying on it (maybe GpuProperty should rely on this?)
            - For GPU resources (non-mappable buffers or images)
            - For buffers we should check if BAR memory is available and if we want to use and alloc per-frame mappable buffers, otherwise alloc per-frame staging and 1 device resource.
    - [ ] Support buffer suballocation in GpuProperty for preupload and image arrays, to reduce the number of resources that we need for a sequence
    - [ ] Implement invalidation of preuploaded properties. If using mapped buffers (CPU or BAR) we can just alloc the remaining frames_in_flight - 1 buffers and switch to ring buffer.
    - [ ] Implement smarter logic to pick which upload mode to use. For smaller properties we want bar if available I guess.
    - [ ] Prefetching is currently assuming linearly increasing frame indices
- [-] Server
    - [ ] Test design and finish implementing
    - [ ] Client python package
    - [ ] C/C++ and rust lib
- [x] Frame helpers (ideally at object level, shared across all primitives)
    - Fixed framerate
    - Variable rate
    - Missing frames / holes in data (different from hold last pose)
    - Repeat vs hold vs disappear at end of sequence
    -> think about how this fits together with streaming/prefetching/animated properties
    -> also think about how to visualize this in a timeline viewer
- [x] Transform properties, kinematic trees and node descriptor for constants
    - [x] Transform 2D
    - [x] Cleanup constants dup
- [x] Uniform pool helper
- [x] Images and upload
- [x] .create currently has to be called manually on primitives
- [x] Streaming properties
    - [x] lines with list instead of np array
    - [x] heterogeneous size with max_size
    - [x] streaming prop
    - [x] trigger all uploads before blocking for rendering -> made properties managed by renderer
    - [x] prefetching
            -> likely make a simple mesh class and recreate sequence
            -> this means also porting some GUI, likely making some helpers, ideally also adding the profiler
    - [x] unify GpuBufferProperty and GpuImageProperty
    - [ ] More viewer playback controls (e.g. timeline view, property inspectors)
        - Look at imguizmo sequencer, but likely re-implement with custom use-case in mind
        - Likely want to do most of the ui work in C++ and just offer callbacks for input like click/over/tooltip etc..
        - Look at rerun timeline for feature ideas
    - [ ] Think about getter/setters to update properties and explicit redraw to update properties
- [x] Camera movement
    - [x] Define interface / config / keybindings
    - [x] Initialize camera controls with up / distance from config
    - [x] Implement getters for front, up, right vectors from camera
    - [x] Drag start / end detection
    - [x] Camera mode dependent rotate/pan
    - [x] Scroll zoom (how to handle moving the target (or reducing the distance)?)
    Extra:
    - [ ] 2D controls -> pan only and change zoom mode to modify ortho size?
- [ ] Viewport:
    - [x] resize
    - [ ] multiple viewports
    - [ ] related to UI if we decide to do viewports in ImGui windows, not clear how to do default placement (check docs)
- [ ] UI: -> likely in common ui place that can be customized / modified (helpers for things like default layout as well, likely configurable)
    - [x] Scene tree and property view -> custom widget callback per property?
    - [x] Playback UI
    - [x] Fps display for debug
    - [x] Configurable keybindings
    - [x] NEXT: Expose imgui.get_io for things like want_capture_mouse (or add helper for that)
    - [x] Show memory usage in stats
    - [ ] Better/configurable handling of imgui.ini file
    - [ ] Port built-in profiler
    - [ ] Memory breakdown view -> also plot of usage per-frame to spot mid-frame allocs
- [ ] Shaders
    - [ ] cache
    - [ ] export to spirv during package
    - [ ] hot reloading
- [x] Config:
    - [x] Improve camera initialization (especially ortho for 2D case)
    - [x] vulkan validation options
    - [x] add a way to set viewer position in addition to size? (expose glfw for this)
    - [ ] load from disk / yaml + default locations (e.g. home? cwd?)
    - [ ] Think about a clean way to handle handedness / Z-up vs Y-up conventions.
        -> wrap things that care in a separate module, make different default symbols based on config ?
        -> maybe math and camera utils should expose both, and viewer use config to pick
- [x] Hook pyxpg logging into python logging
- [ ] Cleanup before release:
    - [ ] Public vs private API (hide / protect internal stuff)
    - [ ] Unit tests on all python versions (likely with lavapipe on CI)
    - [ ] Rename some badly named things:
        - UploadMethod constants
    - [ ] Is StreamingProperty useful at all? Should users instead just subclass Property?
    - [ ] Re-organize utils directory, what really is a util anyway?
- [ ] Extra Features (likely at viewer level with pyxpg helpers / wrappers (optional xpg features enabled on python release)):
    - [ ] Meshoptimizer + meshlets
    - [ ] Gaussian splats
    - [ ] Ray marching / octrees
    - [ ] Marching cubes
    - [ ] Pointclouds
    - [ ] Framegraph
- [ ] Why does @cache on shader break refcounting on context, especially for an object that does not have any reference to ctx
      -> investigate further and maybe make a minimal repro if it's actually a nanobind issue
- [x] separate concepts of frames in flight and number of swapchain images -> needs xpg changes too
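
Sketch for the "Repeat vs hold vs disappear at end of sequence" item under Frame helpers. This is only an illustration under assumptions: `EndBehavior` and `resolve_frame` are hypothetical names, not existing API, and missing frames / holes are not handled here.

```python
from enum import Enum
from typing import Optional


class EndBehavior(Enum):
    REPEAT = "repeat"        # loop back to the first frame
    HOLD = "hold"            # keep showing the last frame
    DISAPPEAR = "disappear"  # draw nothing after the last frame


def resolve_frame(playback_frame: int, num_frames: int, end: EndBehavior) -> Optional[int]:
    """Map a playback frame to a data frame index, or None if nothing should be drawn."""
    if num_frames <= 0 or playback_frame < 0:
        return None
    if playback_frame < num_frames:
        return playback_frame
    if end is EndBehavior.REPEAT:
        return playback_frame % num_frames
    if end is EndBehavior.HOLD:
        return num_frames - 1
    return None
```

Returning None for "disappear" keeps it distinct from "hold last pose", which is the same distinction the missing frames item asks for.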


Tools:
- [x] mypy:
    # configured in tool.mypy
    pip install mypy
    mypy ambra
- [x] ruff:
    # configured in tool.ruff.*
    pip install ruff
    # Sort imports
    ruff check --select I --fix
    # Format
    ruff format
    # Lint (not including examples)
    ruff check --exclude examples
    ruff check --exclude examples --fix
- [ ] run on CI once we have automated build / tests


Notes:

Rendering features:
    - Raster (normal viewer):
        - 2D and 3D viewports
        - multi-layer OIT (start from a PoC with depth peeling)
    - Path tracer:
        - Objects that could potentially be path traced:
            - Meshes
            - Other primitives -> should just be meshed or use intersection shaders? depends on complexity of primitive maybe?
            - Need light info too
        - Support for more complex camera models and effects
            - Lens distortion (could be useful even for visualizing different camera models)
            - DoF
            - Vignetting
        - Accumulation settings
        - Light sampling with alias tables
        - Raytracer design:
            - wavefront -> multi-pass, more complex
            - compute -> easiest
            - ray-tracing pipeline ->
        - Ray tracing pipeline with callable shaders
    - Plans:
        - Lights
            - Point, Directional and area first (maybe also leverage slang type, for path tracer)?
            - Environment -> only for path tracer or also do prefiltered IBL?
        - Materials
            - start with common material model for all objects, or maybe just a few materials with dynamic dispatch (leverage slang types)
        - Emissive:
            - Not sure if we need this or if better to just do mesh lights at first, we are not building a general purpose path tracer (or are we?)
        - Volumes (?)
- Ideas:
    - Should rendering code be per object or per-renderer?
        -> Per object has a clear path to user extensions, a bit more unclear for
        -> Per renderer can unlock more "batched" logic
            -> Can we still get the batch logic with a per-renderer approach?
    - Can we batch barriers for GPU uploads?
        - e.g.: instead of copy, barrier, copy, barrier, copy, barrier do copy, copy, copy, (barrier, barrier, barrier)
    - Generic viewer type with just basics + specialized viewer types for specific use cases (careful about composition vs specialization)
        - Scene viewer -> classic scene graph
            - Single frame viewer
            - Sequence viewer
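
Sketch for the "batch barriers for GPU uploads" idea above. It only models the recorded command order with plain tuples (no GPU calls); `record_interleaved` and `record_batched` are hypothetical names. In Vulkan a single `vkCmdPipelineBarrier` call can carry an array of buffer memory barriers, which is what the single trailing entry stands for.

```python
from typing import List, Sequence, Tuple

Command = Tuple[str, ...]


def record_interleaved(buffers: Sequence[str]) -> List[Command]:
    """Current shape: copy, barrier, copy, barrier, ..."""
    cmds: List[Command] = []
    for b in buffers:
        cmds.append(("copy", b))
        cmds.append(("barrier", b))
    return cmds


def record_batched(buffers: Sequence[str]) -> List[Command]:
    """Batched shape: all copies first, then one barrier covering every buffer."""
    cmds: List[Command] = [("copy", b) for b in buffers]
    if buffers:
        cmds.append(("barrier", *buffers))
    return cmds
```

For N uploads the batched form records N + 1 commands instead of 2N, assuming nothing reads a buffer between its copy and the final barrier.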

Server:
-> 3 layers
    -> raw message (format, type, length, data)
    -> per format parser
    -> parsed message
    Protocols: tcp, http, websockets
    Formats: binary, json, msgpack, pickle (maybe behind off-by-default flag for better security?)
    Builtin-messages:
    - Frame / playback control
    - camera control
    - create, update, delete objects
-> what about REST? it might be convenient to speak with the viewer directly in REST, different API?
   wrap this API into a REST API? e.g. JSON for body is same as this, and type encoded in endpoint? seems doable
    -> what we have now will be a TcpServer, can also have an HttpServer and maybe others too?, basically different ways to produce a RawMessage
-> Handle shutdown of TcpServer
    -> exceptions in parsing raw messages should be handled gracefully and log (wrap async callback and print info)
    -> exceptions in main thread should still have the http server exit -> this does not seem to happen correctly atm (maybe connections are keeping this alive in read_exact?) need to switch to async?
-> try small http server and port of websockets server as PoC
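
Sketch of the 3 layers above (raw message -> per format parser -> parsed message). The wire layout is an assumption made for illustration (1-byte format, 1-byte type, 4-byte little-endian length, then data), and `FORMAT_JSON`, `encode`, `decode` and `parse` are hypothetical names; only `RawMessage` comes from the notes.

```python
import json
import struct
from dataclasses import dataclass
from typing import Tuple

# Assumed header layout: format (u8), type (u8), payload length (u32, little-endian)
HEADER = struct.Struct("<BBI")
FORMAT_JSON = 1  # hypothetical format id


@dataclass
class RawMessage:
    format: int
    type: int
    data: bytes


def encode(msg: RawMessage) -> bytes:
    """Serialize a raw message as header + payload."""
    return HEADER.pack(msg.format, msg.type, len(msg.data)) + msg.data


def decode(buf: bytes) -> Tuple[RawMessage, bytes]:
    """Decode one message from the front of buf, returning it and the remaining bytes."""
    if len(buf) < HEADER.size:
        raise ValueError("incomplete header")
    fmt, typ, length = HEADER.unpack_from(buf)
    end = HEADER.size + length
    if len(buf) < end:
        raise ValueError("incomplete payload")
    return RawMessage(fmt, typ, buf[HEADER.size:end]), buf[end:]


def parse(msg: RawMessage) -> object:
    """Per-format parser layer; only JSON is sketched here."""
    if msg.format == FORMAT_JSON:
        return json.loads(msg.data.decode("utf-8"))
    raise ValueError(f"unsupported format {msg.format}")
```

A TcpServer or HttpServer would then only differ in how they obtain the bytes handed to `decode`; parse errors surface as exceptions that the server can catch and log.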

Renderer:
-> think about what is the best way to support different types of rendering, and how to not duplicate a huge amount of code
-> 2D vs 3D, raster vs raytrace vs path trace (e.g. accumulation), quality mode (e.g. depth peeling, MSAA, etc..)
-> how does this play out with implicit prefetching / scene stepping? ideally orthogonal?

Thoughts:
-> Let's have the interface always be CPU objects, the distinction between data and streaming properties makes sense to
   me and allows users to customize how the data is loaded while still giving the easy interface with implicit conversion for arrays
-> Renderables should be able to constru
-> Later we can maybe provide some kind of escape hatch for giving gpu buffers directly for these properties, another option
   would be to have renderables that can
-> Big questions that remain:
    -> can properties be shared across objects? since animation is on the property, I don't see any issue with this
    -> If properties are shared, how to handle their GPU counterparts? Are those owned by objects?
       Are they part of the property itself but optional? How does the user configure prefetching vs preload?

Issues found while implementing skeletal animation:
[ ] if passing [N, 1] instead of [N] we can end up creating LOTS of frames (and therefore LOTS of buffers)
    -> we can likely check if last dimension is 1 in something that usually is not (e.g. indices) and warn / raise
    -> We can anyways try to optimize the number of buffers that we actually create by suballocation
[ ] easy to pass an image as np.float64 if you are doing ops (e.g. dstacking alpha), warn / raise if dtype does not match format
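
Sketch of the two warn / raise checks above. To stay dependency-free it takes a shape tuple and a dtype name (with numpy these would be `arr.shape` and `arr.dtype.name`). `EXPECTED_DTYPE`, `check_flat_shape` and `check_image_dtype` are hypothetical, and the format table is just a two-entry example.

```python
import warnings
from typing import Tuple

# Hypothetical mapping from image format name to the expected array dtype name
EXPECTED_DTYPE = {"R8G8B8A8_UNORM": "uint8", "R32G32B32A32_SFLOAT": "float32"}


def check_flat_shape(name: str, shape: Tuple[int, ...]) -> None:
    """Warn when a property that is usually [N] per frame is passed as [N, 1]."""
    if len(shape) >= 2 and shape[-1] == 1:
        warnings.warn(
            f"{name}: shape {shape} has a trailing dimension of 1, "
            f"this would be treated as {shape[0]} frames; did you mean {shape[:-1]}?"
        )


def check_image_dtype(format_name: str, dtype_name: str) -> None:
    """Raise when the array dtype does not match what the image format expects."""
    expected = EXPECTED_DTYPE[format_name]
    if dtype_name != expected:
        raise TypeError(f"{format_name} expects {expected}, got {dtype_name}")
```

For example `check_image_dtype("R8G8B8A8_UNORM", "float64")` would catch the dstacked-alpha case before any upload happens.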

[ ] per frame uploadable property:
    -> per frame bar memory
    -> per frame cpu buffer backed by a single GPU buffer
        -> for some things this is similar to non-preuploaded resources
        -> similarities:
            - we already have CPU buffers or BAR buffers for each frame
        -> differences:
            - no need for async CPU load logic, data is provided every frame (or same as previous frame)
            - can be thought as keyed not by animation frame but by a monotonically increasing global frame counter
            - prefetching does not make sense because we cannot predict the future
        Questions:
        -> can we create a sort of LiveProperty that implements this?
        -> or should we instead build the non-preuploaded resources on top of something that supports this?
    -> I think we are in for a big rewrite here:
        -> we don't actually need more than 1 GPU buffer unless prefetching:
            -> BAR + CPU is anyways on CPU buffers that are per-frame buffered (currently BAR is on GPU buffer but maybe should be moved)
            -> GFX + TRANSFER non-prefetch buffers do not need double buffering because we are not overlapping frames like that
            -> we can keep treating prefetch buffers separate how we have been doing
        -> live data is not keyed to a specific frame. If we have invalidation we can have properties with 1 frame (always at frame 0) and update it.
        -> we actually want a way to invalidate CPU and GPU buffers if the data has changed.
            -> this will trigger a new upload using the same mechanisms and will cover the
        -> for non-streaming upload, do we lazily alloc upload cpu buffers, can we still opt in when creating custom properties?
            -> also think about easy way to customize upload settings, e.g. if we don't yet know the wanted dtype shape
            -> somehow related is also that maybe for some properties multiple dtypes are allowed, and sometimes even shapes:
                - textures
                - vertex attribute dtypes
                - vertex attribute number of joint indices / weights
                -> do we have a cleaner way to handle those other than (-1, -1, -1) as we do with images now?
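
Sketch of the LiveProperty question above: data keyed by a monotonically increasing global frame counter, one CPU staging slot per frame in flight, and an explicit invalidation flag that triggers a new upload. Everything here is hypothetical (class name, methods, and `bytes` standing in for CPU/BAR buffers); it only shows the bookkeeping, not the actual upload.

```python
from typing import List, Optional


class LiveProperty:
    """Hypothetical per-frame uploadable property with one staging slot per frame in flight."""

    def __init__(self, frames_in_flight: int) -> None:
        self.slots: List[Optional[bytes]] = [None] * frames_in_flight
        self.current: Optional[bytes] = None
        self.dirty = False

    def update(self, data: bytes) -> None:
        """Provide new data and invalidate whatever was uploaded before."""
        self.current = data
        self.dirty = True

    def prepare(self, global_frame: int) -> Optional[int]:
        """Return the staging slot to upload from this frame, or None if nothing changed."""
        if not self.dirty:
            return None
        slot = global_frame % len(self.slots)
        self.slots[slot] = self.current
        self.dirty = False
        return slot
```

With this shape a property with 1 animation frame (always at frame 0) can still change over time: `update` marks it dirty and the next `prepare` picks the slot for the current global frame, while unchanged frames skip the upload.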

