In router mode, when --sleep-idle-seconds triggers, the child subprocess unloads the model from VRAM, but the process itself stays alive and attached to the GPU, so each idle subprocess still consumes ~600MiB: # ...
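One way to check the per-process footprint described above is to ask `nvidia-smi` for its compute-app list and read the memory column. The following is a minimal sketch, not taken from the original report: it assumes an NVIDIA GPU with `nvidia-smi` on the PATH, and the helper names `parse_compute_apps` and `query_compute_apps` are hypothetical.

```python
import csv
import io
import subprocess

# Real nvidia-smi query: one CSV row per compute process, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse 'pid, name, used_mib' CSV rows into a list of dicts.

    Rows whose pid or memory field is not a plain integer (for example
    '[N/A]') are skipped rather than guessed at.
    """
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(rec) != 3:
            continue  # blank or malformed line
        pid, name, used = (field.strip() for field in rec)
        if not (pid.isdigit() and used.isdigit()):
            continue
        rows.append({"pid": int(pid), "name": name, "used_mib": int(used)})
    return rows


def query_compute_apps():
    """Run nvidia-smi and return the parsed per-process memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `query_compute_apps()` before and after the idle timeout fires, and comparing `used_mib` for the child PIDs, would show whether a subprocess still holds GPU memory after its model has been unloaded.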