Memory Requirements to run `Qwen/Qwen3.5-397B-A17B`

#20
by alvarobartt - opened

Hey all,

See below the visual output of hf-mem with the estimated memory required to load Qwen/Qwen3.5-397B-A17B and run inference, including the KV-cache estimate.

```shell
uvx hf-mem --model-id Qwen/Qwen3.5-397B-A17B --experimental --kv-cache-dtype fp8
```
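As a rough sanity check on the tool's output, the weight memory alone can be approximated from the parameter count. The sketch below assumes 397B parameters stored uniformly in bf16 (2 bytes each); hf-mem reads the real per-tensor dtypes from the checkpoint, so this is only a back-of-the-envelope figure:

```python
# Back-of-the-envelope weight memory for a 397B-parameter model,
# assuming every parameter is bf16 (2 bytes). The real checkpoint may
# mix dtypes, which hf-mem accounts for and this sketch does not.
total_params = 397e9
bytes_per_param = 2  # bf16

weight_gib = total_params * bytes_per_param / 1024**3
print(f"~{weight_gib:.0f} GiB for weights alone")  # ~739 GiB
```

Anything on top of that (KV cache, activations, runtime buffers) comes in addition, which is why the KV-cache estimate matters for long contexts.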


Let me know if that's useful! 🤗

@alvarobartt Can you please help us by running the same for the FP8 variant?

Hey @saireddy, I just did! But note that https://github.com/alvarobartt/hf-mem is open source, so you can run it yourself, e.g. `uvx hf-mem --model-id Qwen/Qwen3.5-397B-A17B-FP8 --experimental --kv-cache-dtype fp8`. Let me know if you run into any issues 🤗

https://huggingface.co/Qwen/Qwen3.5-397B-A17B-FP8/discussions/6

15 full-attention layers with GQA (2 KV heads, head_dim 256) and an FP8 KV cache should use 15 kB/token, but your result is 30 kB/token.
I checked your code (https://github.com/alvarobartt/hf-mem) and found that it makes two incorrect assumptions:

  1. All hidden layers are assumed to be full attention, while Qwen3.5-397B-A17B actually has only 15 full-attention layers out of 60 in total. This multiplies the result by 4.
  2. head_dim is assumed to be hidden_size // num_attention_heads, while Qwen3.5-397B-A17B actually sets head_dim = 256 explicitly (with hidden_size = 4096 and num_attention_heads = 32, the derived value would be 128). This multiplies the result by 0.5.
    :D
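The two corrections above combine to ×4 × ×0.5 = ×2, which is exactly the gap between the reported 30 kB/token and the expected 15 kB/token. A minimal sketch of the per-token KV-cache estimate, using the config values quoted in this thread (taken as assumptions here, not read from the actual model config):

```python
# Per-token KV-cache size for Qwen3.5-397B-A17B, using the values
# quoted in this discussion (assumed, not loaded from the config):
full_attention_layers = 15  # only 15 of the 60 hidden layers are full attention
num_kv_heads = 2            # GQA with 2 KV heads
head_dim = 256              # set explicitly, NOT hidden_size // num_attention_heads
bytes_per_value = 1         # FP8 KV cache

# Each token stores one K and one V vector per KV head per attention layer.
kv_bytes_per_token = (
    full_attention_layers * num_kv_heads * head_dim * 2 * bytes_per_value
)
print(kv_bytes_per_token)  # 15360 bytes ≈ 15 kB/token
```

Applying the two buggy assumptions (60 layers, head_dim 128) to the same formula yields 30,720 bytes, matching the 30 kB/token the tool originally reported.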

Yes @YouJiacheng there are some known issues, thanks for reporting those clearly! Would you mind opening an issue in https://github.com/alvarobartt/hf-mem/issues?

Thanks for taking the time to respond! 🤗

Hey again @YouJiacheng, thanks a lot for the report! I've already fixed both issues and mentioned you in the release notes at https://github.com/alvarobartt/hf-mem/releases/tag/0.5.0. See the corrected output below (still behind the --experimental flag) 🤗

```shell
uvx hf-mem --model-id Qwen/Qwen3.5-397B-A17B --experimental --kv-cache-dtype fp8
```

[Screenshot: corrected hf-mem output, 2026-03-13]
