---
base_model:
- katanemo/Arch-Router-1.5B
language:
- en
- id
library_name: transformers
license: other
license_name: katanemo-research
license_link: https://huggingface.co/katanemo/Arch-Router-1.5B/blob/main/LICENSE
pipeline_tag: text-generation
tags:
- routing
- preference
- llm
- qwen2.5
- reasoning
- Indonesian
- SKK
- internal-routing
- question-complexity
paper: https://arxiv.org/abs/2506.16655
---

# vidavox/SKK-Router-1.5B

**Version:** v1.0 – SKK Router for internal routing

**Base model:** [katanemo/Arch-Router-1.5B](https://huggingface.co/katanemo/Arch-Router-1.5B) (itself built on Qwen2.5-1.5B-Instruct)

SKK-Router-1.5B is a domain-specialized router model fine-tuned from Arch-Router-1.5B for **question complexity routing** inside an internal SKK agent system.

Instead of routing across many domains and actions, this model focuses on a **single domain** (SKK upstream oil & gas and related KSMI regulations) and chooses between:

- a **non-reasoning model** for **basic** questions
- a **reasoning model** for **complex** questions

The model outputs a minimal JSON object:

```json
{"route": "basic"}
```

or

```json
{"route": "complex"}
```

It is designed for **internal orchestration**, not for direct end-user text generation.

---

## 1. Intended Use

### Primary use case

* **Task:** Route incoming questions to either a *basic* or *complex* LLM path based on question difficulty and reasoning requirements.
* **Domain:** SKK internal agent system, with content grounded in **KSMI** and related SKK upstream O&G documents.
* **Users:** Internal systems and engineers building the SKK agent stack. Not intended for general public use.

### What the routes mean

* **`"basic"` route**

  * Short, direct, or factoid-style questions.
  * Queries that can be answered with light or no multi-step reasoning.
  * Good for low-latency, low-cost non-reasoning models.

* **`"complex"` route**

  * Multi-step reasoning, multi-constraint, or ambiguous questions.
  * Questions that require combining multiple facts, interpreting regulations, or deeper analysis.
  * Intended for slower, more capable reasoning models.

### Out of scope

* General conversational use outside SKK / KSMI context.
* Safety-critical routing (e.g., medical, legal, or financial decisions).
* Direct Q&A: this router only **selects** models; it does not itself produce the final answer.

---

## 2. How It Relates to Arch-Router

Arch-Router-1.5B is a 1.5B-parameter preference-aligned router that maps queries to user-defined domains and actions for flexible multi-model routing. ([Hugging Face][1])

SKK-Router-1.5B:

* keeps the **same routing prompt format** as the original Arch-Router model (including the JSON route output).
* narrows the routing space to **question complexity** within the SKK domain.
* is trained on a bilingual (Indonesian/English) mix of **synthetic and manually-written Q&A** tailored to SKK’s internal use.

If you are already familiar with Arch-Router, you can plug this model in as a **drop-in replacement** for the router, as long as your route configuration reflects the `"basic"` and `"complex"` choices used during fine-tuning.

---

## 3. Model Architecture

* **Backbone:** Qwen2.5-1.5B-Instruct via Arch-Router-1.5B ([Hugging Face][2])
* **Parameters:** ≈1.5B (same as base router) ([Hugging Face][1])
* **Tokenizer & chat template:** inherited from Arch-Router-1.5B.
* **Fine-tune type:** PEFT/LoRA fine-tune on Arch-Router-1.5B, followed by **merging the adapter into the base weights** to form a standalone checkpoint (`vidavox/SKK-Router-1.5B`).

Languages:

* **Indonesian** (Bahasa Indonesia)
* **English**

---

## 4. Training Data

The fine-tune uses a private, domain-specific dataset:

```text
DatasetDict({
    train: 3096 samples
    val:   884 samples
    test:  443 samples
})
```

Each split has the following fields:

* `instruction`: the main user question / request.
* `input`: optional auxiliary context (may be empty).
* `route`: original label in the data pipeline.
* `output_route`: JSON string used as the target, e.g. `{"route": "basic"}`.

### Data sources

* Synthetic conversations and prompts generated to reflect SKK’s internal workflows.
* Manually authored Q&A examples capturing realistic SKK / KSMI questions.
* All data is **private** and not released with this model.
* Domain focus: questions grounded in **KSMI** and related SKK upstream O&G regulations.

### Label space

For this fine-tune, the router is effectively binary:

* `basic` – non-reasoning route
* `complex` – reasoning route

The original Arch-Router `"other"` route still appears in the **base model's** predictions during evaluation, but it is never a target label in the test set (see evaluation below).

---

## 5. Training Details

* **Framework:** [TRL](https://github.com/huggingface/trl) `SFTTrainer` with `SFTConfig` (supervised fine-tuning).
* **Adapter:** PEFT / LoRA attached to Arch-Router-1.5B; final model created by merging adapters into base.
* **Hardware:** single **NVIDIA GeForce RTX 3090** GPU.

Key training configuration (high-level):

* `per_device_train_batch_size = 2`
* `per_device_eval_batch_size = 4`
* `gradient_accumulation_steps = 8` → effective batch size of 2 × 8 = 16 sequences per optimizer step on the single GPU
* Early stopping with patience = 1 based on validation loss.
* Train/val splits above; `test` used only for the final benchmark.

For full configuration details, see the `Router-SFTTrainer.ipynb` notebook in this repository.

---

## 6. Evaluation

The model was evaluated on a held-out **test set of 443 samples**, containing only `basic` and `complex` routes as the target labels.
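
The comparison against the `output_route` labels can be sketched as follows. This is a minimal illustration, not the evaluation code from the notebook: the helper names `parse_route` and `route_accuracy` are hypothetical, the example pairs are made up, and treating unparseable model output as a wrong prediction is an assumption.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def parse_route(raw: str) -> Optional[str]:
    """Extract the route name from a JSON string like '{"route": "basic"}'.

    Returns None when the text is not a JSON object with a string "route".
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    route = obj.get("route")
    return route if isinstance(route, str) else None


def route_accuracy(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute accuracy per target route plus an 'overall' score.

    Each pair is (model_output, output_route), both JSON strings.
    Assumption: unparseable model output counts as an incorrect prediction.
    """
    totals: Counter = Counter()
    correct: Counter = Counter()
    for predicted_raw, target_raw in pairs:
        target = parse_route(target_raw)
        if target is None:
            raise ValueError(f"Malformed target label: {target_raw!r}")
        totals[target] += 1
        if parse_route(predicted_raw) == target:
            correct[target] += 1

    scores = {route: correct[route] / totals[route] for route in totals}
    n_samples = sum(totals.values())
    scores["overall"] = sum(correct.values()) / n_samples if n_samples else 0.0
    return scores


# Made-up toy pairs for illustration only (not taken from the private test set).
toy_pairs = [
    ('{"route": "basic"}', '{"route": "basic"}'),
    ('{"route": "complex"}', '{"route": "basic"}'),
    ('{"route": "complex"}', '{"route": "complex"}'),
    ("not json", '{"route": "complex"}'),
]
print(route_accuracy(toy_pairs))
```

Grouping by the target route keeps the `basic` and `complex` scores separate, so a model that over-predicts one route cannot hide behind a good overall number.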

### 6.1 Route distribution

Comparison of how often each model predicts each route:

| Route   | Target test data | Fine-tuned model | Base Arch-Router |
| ------- | ---------------- | ---------------- | ---------------- |
| Basic   | 147              | 160              | 201              |
| Complex | 296              | 283              | 156              |
| Other   | 0                | 0                | 86               |

Observations:

* The **fine-tuned model** routes every test query to `basic` or `complex`, matching the target distribution closely.
* The **base Arch-Router** tends to:

  * over-predict `basic`, and
  * send many SKK-style queries to the generic `other` route.

### 6.2 Routing accuracy

A prediction counts as correct if the chosen `"route"` matches the `output_route` label for that sample.

| Metric                 | Fine-tuned model | Base Arch-Router |
| ---------------------- | ---------------- | ---------------- |
| Basic route accuracy   | **91.50%**       | 74.83%           |
| Complex route accuracy | **93.10%**       | 45.27%           |
| **Overall accuracy**   | **92.55%**       | 55.08%           |

Improvements (absolute percentage points):

* **Basic route:** +16.67 pp
* **Complex route:** +47.83 pp
* **Overall:** +37.47 pp

In practice, this means:

* The router is **much more reliable** at distinguishing between simple and complex SKK queries.
* Mis-routing complex questions to the non-reasoning path is drastically reduced compared to the base Arch-Router.

> Note: These metrics are computed on private, synthetic + manually-authored data tailored to the SKK domain. Performance on other domains may be substantially different.

---

## 7. How to Use

> ⚠️ **Important:** This model assumes the same overall routing prompt structure as `katanemo/Arch-Router-1.5B`. For best results, follow the upstream Arch-Router prompt format and simply adapt the `route_config` to your use case.
([Hugging Face][1])

### 7.1 Minimal example

```python
import json
from typing import Any, Dict, List

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "vidavox/SKK-Router-1.5B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Please use our provided prompt for best performance
TASK_INSTRUCTION = """
You are a helpful assistant designed to find the best suited route.
You are provided with route description within <routes></routes> XML tags:
<routes>

{routes}

</routes>

<conversation>

{conversation}

</conversation>
"""

FORMAT_PROMPT = """
Your task is to decide which route is best suit with user intent on the conversation in <conversation></conversation> XML tags.  Follow the instruction:
1. If the latest intent from user is irrelevant or user intent is full filled, response with other route {"route": "other"}.
2. You must analyze the route descriptions and find the best match route for user latest intent.
3. You only response the name of the route that best matches the user's request, use the exact name in the <routes></routes>.

Based on your analysis, provide your response in the following JSON formats if you decide to match any route:
{"route": "route_name"}
"""

# Define route config (the two routes used during fine-tuning)
route_config = [
    {
        "name": "basic",
        "description": (
            "Answering simple questions that ask for factual information, "
            "term meanings, or general knowledge."
        ),
    },
    {
        "name": "complex",
        "description": (
            "Handling specific, complex, or multi (more than one task) questions "
            "that require multi-step reasoning and interaction with databases to "
            "fetch and process data. For example, answering questions that need "
            "calculations, data analysis, or synthesis of information from "
            "multiple sources."
        ),
    },
]


# Helper function to create the system prompt for our model
def format_prompt(
    route_config: List[Dict[str, Any]], conversation: List[Dict[str, Any]]
) -> str:
    # Only TASK_INSTRUCTION is formatted; FORMAT_PROMPT contains literal JSON braces.
    return (
        TASK_INSTRUCTION.format(
            routes=json.dumps(route_config), conversation=json.dumps(conversation)
        )
        + FORMAT_PROMPT
    )


# Define conversations
conversation = [
    {
        "role": "user",
        "content": "Apa pengertian dari Cadangan A dan berapa jumlahnya untuk Lapangan X?",
    }
]

route_prompt = format_prompt(route_config, conversation)

messages = [
    {"role": "user", "content": route_prompt},
]

# 1. Build the chat-formatted input
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# 2. Generate
generated_ids = model.generate(
    input_ids=input_ids,  # or just positional: model.generate(input_ids, …)
    max_new_tokens=32768,
)

# 3. Strip the prompt from each sequence
prompt_lengths = input_ids.shape[1]  # same length for every row here
generated_only = [
    output_ids[prompt_lengths:]  # slice off the prompt tokens
    for output_ids in generated_ids
]

# 4. Decode if you want text
response = tokenizer.batch_decode(generated_only, skip_special_tokens=True)[0]
print(response)
```

In the actual SKK agent system, this `"route"` is then used to decide whether to call the **basic** or **reasoning** LLM.

---

## 8. Limitations & Known Failure Modes

### Limitations

* **Multi-turn conversations:** The model may be less reliable on very long, multi-turn chats with shifting intent. It was primarily trained on shorter, focused interactions.
* **Ambiguous queries:** If the question does not clearly indicate complexity (e.g., vague or underspecified prompts), the router may pick an unintuitive route.
* **Out-of-domain content:** Questions unrelated to SKK / KSMI / upstream O&G may be routed unpredictably, since the training data is domain-specific.
* **Binary perspective:** The router assumes a simple **basic vs complex** split; if you need multiple levels of reasoning or different tools, you may need to extend the label space and re-train.

### Safety considerations

* Not designed for **medical, legal, or financial** decision-making.
* Should not be used in settings where an incorrect routing decision can cause **harm or safety-critical failures**.
* Outputs are **not** explanations; they are discrete labels used for orchestration.

---

## 9. Bias & Data Caveats

* Training data is heavily skewed toward:

  * SKK upstream petroleum / regulatory topics.
  * Text derived from or inspired by **KSMI** and related technical documents.

* Language mix:

  * Bilingual Indonesian/English, but primarily focused on expert / technical wording typical for this domain.

* As a result:

  * The model may **over-assume** that questions with regulatory or technical phrasing are “complex”.
  * It may not behave sensibly on informal, social-media style data or on domains very different from SKK.

Because the underlying data is private and internal, users **cannot** independently audit its biases or coverage. Treat this model as **highly specialized** rather than general-purpose.

---

## 10. License & Usage

This model is a fine-tuned derivative of **katanemo/Arch-Router-1.5B**, which is distributed under the **Katanemo research license**. ([Hugging Face][1])

* **License on this repo:** `other` – `katanemo-research`.
* By using this model, you must comply with:

  * the original **Katanemo license** for Arch-Router, and
  * any additional internal policies that apply to SKK data and systems.

### Intended usage policy

* **Allowed / intended:**

  * Research and experimentation on routing for question complexity.
  * Internal use as part of the **SKK Internal Agent System**.
  * Exploration of routing strategies in similar regulatory or technical domains, provided you have rights to the underlying data.

* **Not recommended / discouraged:**

  * Exposing this router directly to end users as a chatbot.
  * Using it as a general-purpose router outside its domain without additional evaluation.
  * Using the model, or any system built with it, as the sole basis for safety-critical decisions.

This description is **not legal advice**. For any production or commercial deployment, please review the **Katanemo research license** and your own organizational policies with qualified counsel.

---

## 11. Citation

If you use this model or build upon it in academic or technical work, please consider citing the Arch-Router paper:

```bibtex
@article{tran2025archrouter,
  title   = {Arch-Router: Aligning LLM Routing with Human Preferences},
  author  = {Tran, Co and Paracha, Salman and Hafeez, Adil and Chen, Shuguang},
  journal = {arXiv preprint arXiv:2506.16655},
  year    = {2025}
}
```

You may also reference this checkpoint as:

> vidavox/SKK-Router-1.5B (v1.0 – SKK Router for internal routing), fine-tuned from katanemo/Arch-Router-1.5B on SKK-specific synthetic and manually curated routing data for basic vs complex question routing.

[1]: https://huggingface.co/katanemo/Arch-Router-1.5B "katanemo/Arch-Router-1.5B"
[2]: https://huggingface.co/katanemo/Arch-Router-1.5B/commit/c3a3b356644a64c519091e56d1a19d013eb5290e "Upload folder using huggingface_hub · katanemo/Arch- ..."