Dynamic ZeroGPU Duration
Hi everyone, I want to share my code for requesting a dynamic GPU duration on ZeroGPU.
I am happy to contribute this code to the spaces package, but I can't find its repository. (The repository link on PyPI points to the huggingface_hub repo, and I can't find the relevant code there.) Does Hugging Face plan to open-source the repo for spaces?
from typing import Callable
from functools import partial

import gradio as gr
import spaces
import spaces.config
from spaces.zero.decorator import P, R


def _dynGPU(
    fn: Callable[P, R] | None, duration: Callable[P, int], min=30, max=300, step=10
) -> Callable[P, R]:
    if not spaces.config.Config.zero_gpu:
        return fn

    funcs = [
        (t, spaces.GPU(duration=t)(lambda *args, **kwargs: fn(*args, **kwargs)))
        for t in range(min, max + 1, step)
    ]

    def wrapper(*args, **kwargs):
        requirement = duration(*args, **kwargs)

        # find the function that satisfies the duration requirement
        for t, func in funcs:
            if t >= requirement:
                gr.Info(f"Acquiring ZeroGPU for {t} seconds")
                return func(*args, **kwargs)

        # if no function is found, use the last one
        gr.Info(f"Acquiring ZeroGPU for {funcs[-1][0]} seconds")
        return funcs[-1][1](*args, **kwargs)

    return wrapper


def dynGPU(
    fn: Callable[P, R] | None = None,
    duration: Callable[P, int] = lambda: 60,
    min=30,
    max=300,
    step=10,
) -> Callable[P, R]:
    if fn is None:
        return partial(_dynGPU, duration=duration, min=min, max=max, step=step)
    return _dynGPU(fn, duration, min, max, step)
It's very similar to the @spaces.GPU decorator but accepts duration as a function that shares the same parameters as the decorated one and returns the desired GPU time in seconds.
I have tested it in my space: https://huggingface.co/spaces/JacobLinCool/vocal-separation
The usage in my space requests GPU time based on the audio length:
import os
import tempfile
from typing import Tuple

import gradio as gr
import librosa


def measure_duration(audio: str, model: str) -> int:
    y, sr = librosa.load(audio, sr=44100)
    return int(librosa.get_duration(y=y, sr=sr) / 3.0)


# `separators` and `merge` are defined elsewhere in the space
@dynGPU(duration=measure_duration)
def separate(audio: str, model: str) -> Tuple[str, str]:
    separator = separators[model]
    outs = separator.separate(audio)
    outs = [os.path.join(tempfile.gettempdir(), out) for out in outs]

    # roformers
    if len(outs) == 2:
        return outs[1], outs[0]

    # demucs
    if len(outs) == 4:
        bgm = merge(outs[:3])
        return outs[3], bgm

    raise gr.Error("Unknown output format")
This works well for me, and I think others may be interested in it.
This looks cool!
How would you measure the duration for text generation, say with llama-cpp-python? Basically by the size of the model file?
Just curious...
Thank you for sharing!
I haven't tried it on text-generation tasks yet. I think it needs experimentation and depends largely on prior experience.
The estimate would rest on two aspects: model size and user input (e.g. audio duration for audio tasks, prompt length for text generation).
Theoretically, you could calculate the FLOPs the model requires during computation, but hardware performance varies.
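For example, a duration estimator for text generation could start from a per-token cost measured on the target hardware. This is only a hypothetical sketch: measure_llm_duration, the constants, and the generate signature are placeholder assumptions, not tested values.

# Hypothetical duration estimator for text generation: assumes runtime is
# dominated by the number of new tokens requested. The constants below are
# placeholders to be tuned empirically per model and GPU.
def measure_llm_duration(prompt: str, max_new_tokens: int) -> int:
    warmup_seconds = 10           # headroom for model warm-up and prompt processing
    seconds_per_token = 0.05      # measured empirically on the target hardware
    return warmup_seconds + int(max_new_tokens * seconds_per_token)


@dynGPU(duration=measure_llm_duration)
def generate(prompt: str, max_new_tokens: int) -> str:
    ...  # run llama-cpp-python (or another backend) here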
Very interesting! This would improve how GPU time is consumed and give a better experience to users of ZeroGPU.
You can download the source distribution from the Download Files section on PyPI: https://pypi.org/project/spaces/#files
Hi @JacobLinCool , thanks for your contribution!
The spaces package (and more specifically the spaces.zero sub-package) is not (yet?) open-sourced, but I'm happy to integrate "dynamic duration" into @spaces.GPU.
Technically speaking, I think we should be able to do it without wrapping one function per duration (so we'll benefit from idle-reuse whatever the duration).
(If interested, you can take a look at spaces.zero.client to see how duration ends up being used.)
API-wise, I was thinking of something like:
def get_duration(prompt, steps):
    return steps // 7

@spaces.GPU(duration=get_duration)
def generate(prompt, steps):
    return pipe(prompt, num_inference_steps=steps)
The rule would be pretty simple:
If the duration kwarg is callable, it will be called with the same *args and **kwargs as the current @spaces.GPU-decorated function call (just like in your dynGPU version), and it should return a duration.
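For illustration, here is a rough sketch of that rule with a hypothetical helper (not the actual spaces internals): the decorator would resolve the duration at call time, so a single wrapped function works for any value.

from typing import Callable, Union

# Hypothetical helper, not the actual spaces internals: resolve the duration
# kwarg at call time so one wrapper covers every duration value.
def resolve_duration(duration: Union[int, Callable[..., int]], *args, **kwargs) -> int:
    # If duration is callable, call it with the decorated function's arguments;
    # otherwise it is already a fixed number of seconds.
    return duration(*args, **kwargs) if callable(duration) else duration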
I agree that creating a function for each duration feels like a monkey-patching workaround.
After digging into the code inside the spaces package, as you said, one approach could be to compute the timedelta from the user function before calling client.schedule in generator_function_wrapper.
Looking forward to it being integrated!
Dynamic duration is now available. Feel free to test it out!
(it's a power-user feature for now, but it will be in the README at some point)
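For example, the vocal-separation space above could presumably drop the custom dynGPU wrapper and pass the same estimator straight to the built-in decorator. This is an untested sketch based on the API proposed earlier in this thread:

# Untested sketch: the earlier audio example using the built-in callable
# duration instead of the custom dynGPU wrapper.
@spaces.GPU(duration=measure_duration)
def separate(audio: str, model: str) -> Tuple[str, str]:
    ...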