Commit 1efb585 by Promise Emmanuel Oluwadare
Parent(s): 266954d

Add Qwen3.5-0.8B OpenAI-compatible server
Files changed (5)

  1. .dockerignore +11 -0
  2. Dockerfile +32 -0
  3. README.md +58 -4
  4. requirements.txt +2 -0
  5. start_server.py +61 -0
.dockerignore ADDED
@@ -0,0 +1,11 @@
+.git
+.gitignore
+__pycache__
+*.pyc
+*.pyo
+*.pyd
+.Python
+venv
+.venv
+dist
+build
Dockerfile ADDED
@@ -0,0 +1,32 @@
+FROM python:3.11-slim
+
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    PIP_NO_CACHE_DIR=1 \
+    CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS"
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    cmake \
+    pkg-config \
+    libopenblas-dev \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+
+COPY requirements.txt /app/requirements.txt
+RUN pip install --upgrade pip && pip install -r /app/requirements.txt
+
+COPY start_server.py /app/start_server.py
+
+ENV PORT=7860 \
+    MODEL_REPO=unsloth/Qwen3.5-0.8B-GGUF \
+    MODEL_FILE=Qwen3.5-0.8B-Q4_K_M.gguf \
+    MODEL_DIR=/tmp/models \
+    N_CTX=4096 \
+    N_THREADS=4 \
+    CHAT_FORMAT=chatml
+
+EXPOSE 7860
+
+CMD ["python", "/app/start_server.py"]
README.md CHANGED
@@ -1,10 +1,64 @@
 ---
-title: Qwency
-emoji: 🚀
-colorFrom: gray
+title: Qwen3.5 0.8B OpenAI API
+emoji: "🧠"
+colorFrom: amber
 colorTo: gray
 sdk: docker
+app_port: 7860
 pinned: false
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Hugging Face Space Template (OpenAI-Compatible Qwen 0.8B)
+
+This folder is ready to be used as a Docker Space that serves `Qwen3.5-0.8B` behind OpenAI-style endpoints:
+
+- `GET /v1/models`
+- `POST /v1/chat/completions` (with streaming)
+
+## 1) Create the Space
+
+1. Go to Hugging Face -> **New Space**.
+2. Select the **Docker** SDK.
+3. Choose hardware:
+   - For free testing: **CPU Basic**.
+4. Create the Space.
+
+## 2) Upload these files
+
+Upload all files from this folder to the root of the Space repository:
+
+- `Dockerfile`
+- `requirements.txt`
+- `start_server.py`
+- `.dockerignore`
+- `README.md` (this file)
+
+## 3) Set Space variables (Settings -> Variables and secrets)
+
+Recommended defaults:
+
+- `MODEL_REPO=unsloth/Qwen3.5-0.8B-GGUF`
+- `MODEL_FILE=Qwen3.5-0.8B-Q4_K_M.gguf`
+- `N_CTX=4096`
+- `N_THREADS=4`
+- `CHAT_FORMAT=chatml`
+
+Optional:
+
+- `API_KEY=<your-secret>` to require bearer auth.
+- `HF_TOKEN=<token>` if your model repo is private.
+
+## 4) Connect the frontend
+
+In the app's settings:
+
+- Preset: `Hugging Face Space`
+- Base URL: `https://<your-space-name>.hf.space/v1`
+- Model Name: `Qwen3.5-0.8B-Q4_K_M.gguf`
+- API Key: only if you set `API_KEY` in the Space
+
+## Notes
+
+- Free CPU Spaces can sleep when idle and cold-start slowly.
+- The first boot includes the model download, so startup may take a few minutes.
+- If you hit memory pressure, use a smaller GGUF quantization file.
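Once the Space is up, the endpoints listed in the README can be exercised from any OpenAI-compatible client. As a minimal sketch using only the standard library — the base URL and Space name below are placeholders for your own deployment, and `build_chat_request` is a hypothetical helper, not part of this repo:

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, api_key=None):
    """Assemble an OpenAI-style /chat/completions POST for the Space."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when API_KEY is set in the Space settings.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_chat_request(
    "https://your-space-name.hf.space/v1",  # placeholder: your Space URL
    "Qwen3.5-0.8B-Q4_K_M.gguf",
    [{"role": "user", "content": "Say hello."}],
)
print(req.full_url)  # https://your-space-name.hf.space/v1/chat/completions
```

Sending the request is then `urllib.request.urlopen(req)`; the JSON response should carry the reply under `choices[0].message.content`, matching the OpenAI schema that `llama_cpp.server` emulates.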
requirements.txt ADDED
@@ -0,0 +1,2 @@
+llama-cpp-python[server]>=0.2.90
+huggingface_hub>=0.25.0
start_server.py ADDED
@@ -0,0 +1,61 @@
+import os
+import subprocess
+import sys
+from pathlib import Path
+
+from huggingface_hub import hf_hub_download
+
+
+def read_env(name: str, default: str) -> str:
+    value = os.getenv(name, default).strip()
+    return value or default
+
+
+def main() -> None:
+    repo_id = read_env("MODEL_REPO", "unsloth/Qwen3.5-0.8B-GGUF")
+    filename = read_env("MODEL_FILE", "Qwen3.5-0.8B-Q4_K_M.gguf")
+    model_dir = Path(read_env("MODEL_DIR", "/tmp/models"))
+    port = read_env("PORT", "7860")
+    n_ctx = read_env("N_CTX", "4096")
+    n_threads = read_env("N_THREADS", "4")
+    chat_format = read_env("CHAT_FORMAT", "chatml")
+    api_key = os.getenv("API_KEY", "").strip()
+
+    model_dir.mkdir(parents=True, exist_ok=True)
+
+    token = os.getenv("HF_TOKEN", "").strip() or os.getenv("HUGGING_FACE_HUB_TOKEN", "").strip() or None
+    model_path = hf_hub_download(
+        repo_id=repo_id,
+        filename=filename,
+        token=token,
+        local_dir=str(model_dir),
+    )
+
+    command = [
+        sys.executable,
+        "-m",
+        "llama_cpp.server",
+        "--model",
+        model_path,
+        "--host",
+        "0.0.0.0",
+        "--port",
+        port,
+        "--n_ctx",
+        n_ctx,
+        "--n_threads",
+        n_threads,
+        "--chat_format",
+        chat_format,
+    ]
+
+    if api_key:
+        command.extend(["--api_key", api_key])
+
+    print("Starting OpenAI-compatible model server:")
+    print(" ".join(command))
+    subprocess.run(command, check=True)
+
+
+if __name__ == "__main__":
+    main()
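One detail of `read_env` above worth noting: it falls back to the default not only when a variable is unset but also when it is set to a blank string, which is easy to do accidentally in the Space settings UI. A quick standalone check of that behavior (same helper, copied here for illustration):

```python
import os


def read_env(name: str, default: str) -> str:
    # Same helper as in start_server.py: the `or default` catches
    # variables that are set but blank after stripping whitespace.
    value = os.getenv(name, default).strip()
    return value or default


os.environ["N_CTX"] = "   "          # set, but blank
os.environ.pop("N_THREADS", None)    # unset entirely

print(read_env("N_CTX", "4096"))     # -> 4096 (blank falls back)
print(read_env("N_THREADS", "4"))    # -> 4 (unset falls back)
```

Without the `or default`, a blank `N_CTX` would be passed to `llama_cpp.server` as an empty `--n_ctx` argument and fail at startup.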