Upload folder using huggingface_hub

Files changed (5)

.gitignore +1 -5
README.md +41 -348
model.safetensors +2 -2
tokenizer.json +1 -6
tokenizer_config.json +1 -1

.gitignore CHANGED

@@ -1,5 +1 @@
-checkpoints/*
-test_cot_now.py
-__pycache__/
-evaluate_cot_model.py
-cot_extended_training/*


+inl_llm/*

README.md CHANGED

@@ -1,376 +1,69 @@
----
-language: en
-license: cc-by-nc-4.0
-tags:
-- text-generation
-- integrator-neuron
-- custom-architecture
-pipeline_tag: text-generation
----
-# INL Architecture - Integrator Neuron Layer
-**Production-ready neural architecture** using **Integrator Neuron dynamics** - replaces traditional FFN layers with iterative dynamics. **Universal architecture** that works for any type of model: LLMs, vision transformers, multimodal, diffusion, RL policies, etc.
-### Architecture Features
-- **Universal** - Build LLMs, vision models, audio, multimodal, diffusion, RL agents with same architecture
-- **HuggingFace ready** - Drop-in replacement for FFN in any transformer
-- **KV caching** - Full support for efficient autoregressive generation
-- **Adaptive compute** - Auto-stops when converged (30-50% faster)
-- **Parameter efficient** - Shared controllers = 96% fewer params than FFN
-- **Bio-inspired** - Based on integrator neurons from neuroscience
-- **Configurable** - Tune iterations, controllers, equilibrium for your task
-### This Checkpoint
-**Example implementation**: 1.1B parameter **language model** with INL architecture.
-- 25 layers × 5 iterations/layer = rich iterative computation
-- But the **architecture scales** from 100M to 100B+ params
-- And works for **any domain** (language, vision, audio, etc.)
-## What is INL?
-**Traditional transformers** use static feedforward layers:
-```python
-x_out = x + FFN(x)  # One-shot computation
-```
-**INL-LLM** uses iterative integrator dynamics to find equilibrium:
-```python
-# Each of the 25 layers performs 5 iterations (configurable)
-# Total: 25 layers × 5 iterations = 125 computation steps
-for iteration in range(num_iterations_per_layer):  # = 5
-    error = x - mu  # Distance from learned equilibrium
-    v_next = alpha * v + (1 - alpha) * v_target - beta * error
-    x_next = x + dt * gate * v_next
-```
-**Result**: The model "thinks" iteratively like biological integrator neurons, achieving better parameter efficiency through shared dynamics and adaptive early stopping.
 ## Model Details
-| Parameter | Value |
-|-----------|-------|
-| Parameters | 1.1B |
-| d_model | 1728 |
-| Layers | 25 |
-| Attention heads | 32 |
-| Iterations/layer | 5 (configurable: more = better quality but slower) |
-| Context length | 2048 |
-| Vocabulary | 50,261 |
-### Key Optimizations
-- **Shared controllers**: One controller shared across all 25 layers (96% fewer parameters)
-- **Low-rank embeddings**: 87% fewer embedding parameters
-- **Adaptive stopping**: Stops when converged (30-50% faster inference)
-- **Sparse excitation**: 90% sparsity for efficiency
-## Usage
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model = AutoModelForCausalLM.from_pretrained(
-    "Pacific-Prime/pacific-prime",
-    trust_remote_code=True,
-    torch_dtype="bfloat16"
-)
-tokenizer = AutoTokenizer.from_pretrained("Pacific-Prime/pacific-prime")
-# Generate with KV caching (default, much faster!)
-prompt = "The future of AI is"
-inputs = tokenizer(prompt, return_tensors="pt")
-outputs = model.generate(
-    **inputs,
-    max_new_tokens=100,
-    temperature=0.8,
-    use_cache=True  # Enable KV cache (default)
-)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```
-### Chat Format
-```python
-messages = [
-    {"role": "user", "content": "What is machine learning?"}
-]
-chat_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = tokenizer(chat_text, return_tensors="pt")
 outputs = model.generate(**inputs, max_new_tokens=100)
 ```
-Special tokens: `<USER>`, `<ASSISTANT>`, `<SYSTEM>`, `<ERROR>`
-## vLLM Serving
 ```bash
 python -m vllm.entrypoints.openai.api_server \
-    --model Pacific-Prime/pacific-prime \
     --trust-remote-code \
-    --dtype bfloat16
-```
-## Why Integrator Neurons?
-**Main benefit**: Achieve similar quality with fewer parameters through parameter sharing and iterative refinement.
-- **Parameter efficiency**: One shared controller for all 25 layers (instead of 25 separate FFNs)
-- **Adaptive computation**: Stops iterating early when converged (faster inference)
-- **Iterative refinement**: Each layer "thinks" multiple times instead of one-shot computation
-- **Interpretable**: Can visualize how the model converges to solutions
-- **Bio-inspired**: Mimics integrator neurons found in neuroscience
-## Learn More
-For detailed technical documentation about the INL architecture:
-- **GitHub Repository**: [ARKITEKTURE_TRANSFORMER_ADL](https://github.com/pacific-prime777/ARKITEKTURE_TRANSFORMER_ADL)
-- **Architecture Docs**: See the repo for implementation details, training code, and benchmarks
-## Convergence Theorem
-### Mathematical Formulation
-The INL architecture implements a discrete-time dynamical system that converges to a learned equilibrium point. For each layer:
-```python
-error = x - mu                                          # (1)
-v_next = alpha * v + (1 - alpha) * v_target - beta * error  # (2)
-x_next = x + dt * gate * v_next                        # (3)
-```
-**Theorem (Asymptotic Convergence):**
-Given the discrete dynamics above, if the following stability conditions hold:
-1. **Damping condition**: `0 < alpha < 1`
-2. **Restoring force**: `beta > 0`
-3. **Time step bound**: `dt < 2/(beta * sqrt(1 - alpha²))`
-4. **Gating**: `0 ≤ gate ≤ 1`
-Then for any initial state `(x₀, v₀)`, the system converges asymptotically to the equilibrium:
-```
-lim(n→∞) x_n = mu
-lim(n→∞) v_n = v_target
-```
-**Formally**: `∀ε > 0, ∃N ∈ ℕ : ∀n > N ⟹ ||x_n - mu|| < ε`
-### Proof Sketch
-The system behaves as a **damped harmonic oscillator** in the embedding space:
-1. **Energy function**: Define `E(n) = ½||x_n - mu||² + ½||v_n - v_target||²`
-2. **Energy decay**: Under stability conditions, `E(n+1) < E(n)` for all `n`
-3. **Lower bound**: `E(n) ≥ 0` always
-4. **Conclusion**: By monotone convergence theorem, `E(n) → 0`, thus `x_n → mu`
-The proof follows from discrete Lyapunov stability analysis. The parameters `alpha` (damping), `beta` (restoring force), and `dt` (discretization step) control the convergence rate and oscillation behavior.
-### Convergence Modes
-| Regime | Condition | Behavior |
-|--------|-----------|----------|
-| **Underdamped** | `alpha² < 4*beta*dt` | Oscillates then converges |
-| **Critically damped** | `alpha² = 4*beta*dt` | Fastest convergence (no overshoot) |
-| **Overdamped** | `alpha² > 4*beta*dt` | Slow monotonic convergence |
-### Practical Implications
-**Hybrid Discrete-Continuous Approximation:**
-```
-Discrete (finite iterations)  ←→  Continuous (infinite time)
-        ↓                              ↓
-    GPU-friendly                  Theoretical limit
 ```
-- **5 iterations**: Fast, 70-80% convergence quality
-- **10 iterations**: Balanced, 85-95% convergence
-- **50+ iterations**: Near-perfect, 98%+ convergence
-- **∞ iterations**: Theoretical guarantee (impractical)
-**Adaptive Early Stopping:**
-The architecture monitors `||error||` and stops when:
 ```python
-if ||x_n - mu|| < tolerance:  # Converged!
-    break  # Save 30-50% compute
-```
-This makes the system both **theoretically grounded** (convergence guarantee) and **practically efficient** (adaptive compute).
-### Connection to Neural ODEs
-In the continuous limit (`dt → 0`), the dynamics become:
-```
-dx/dt = gate * v
-dv/dt = -(1-alpha)/dt * v + (1-alpha)/dt * v_target - beta * (x - mu)
-```
-This is a **second-order ODE** with learned equilibrium `mu`, combining:
-- **Physics-inspired** dynamics (momentum, damping, restoring force)
-- **Learned** target state (mu, v_target from neural network)
-### Why This Matters
-1. **Theoretical guarantees**: Not just empirical - proven convergence
-2. **Interpretability**: Physics-based dynamics are explainable
-3. **Robustness**: Stable across wide parameter ranges
-4. **Efficiency**: Can trade iterations for quality (5 for speed, 50 for precision)
-5. **Universal**: Same convergence theory applies to all domains (text, vision, audio)
----
-## Empirical Stability Analysis
-### Stability Region Characterization
-We performed extensive empirical analysis to validate the theoretical convergence guarantees and characterize the practical stability region. The analysis explores the parameter space of `alpha` (damping) and `p = dt * g * beta` (effective time step × restoring force).
-**Key Finding**: The system exhibits three distinct behavioral regimes:
-1. **STABLE** (ρ < 1): Green region - guaranteed convergence
-2. **NEAR-BOUNDARY** (ρ ≈ 1): Yellow region - convergence but slower
-3. **UNSTABLE** (ρ > 1): Red region - divergence
-![Stability Contour](stability_contour.png)
-The empirical stability boundary closely matches the theoretical sufficient condition:
-```
-Stable if: 0 ≤ alpha < 1  AND  0 < p < 2(1 + alpha)
-```
-### Eigenvalue Analysis
-The spectral radius (maximum eigenvalue magnitude) determines system stability. For convergence, we need `ρ(J) < 1` where `J` is the Jacobian of the discrete dynamics.
-![Eigenvalue Examples](eigenvalue_examples.png)
-**Representative parameter sets:**
-- **Safe** (α=0.1, p=0.4): ρ ≈ 0.5 - Fast, stable convergence
-- **Near-bound** (α=0.3, p=1.6): ρ ≈ 0.57 - Stable but approaching boundary
-- **Unstable** (α=0.5, p=2.5): ρ ≈ 0.7 - Exceeds stability bound, diverges
-- **Damped** (α=0.7, p=0.2): ρ ≈ 0.83 - High damping, slow convergence
-- **High-alpha** (α=0.9, p=1.0): ρ ≈ 0.95 - Near-critical, very slow
-![Spectral Radius Heatmap](spectral_radius_heatmap.png)
-The heatmap reveals the complete stability landscape in (α, p) space. Dark blue regions (ρ < 0.5) converge rapidly, while yellow/green regions (ρ > 1.0) are unstable.
-### Convergence Dynamics
-Energy trajectories `E(n) = ½||x_n - mu||² + ½||v_n - v_target||²` demonstrate convergence behavior:
-![Energy Trajectories](energy_trajectories.png)
-**Observations:**
-- **Damped** (red, α=0.2): Fastest initial decay, oscillatory but converges
-- **Safe/Near-bound** (blue/orange): Smooth exponential decay to equilibrium
-- **Unstable** (green, α=0.8, p=2.5): Energy fails to decay, remains elevated
-- **High-alpha** (purple, α=0.9): Slowest convergence due to high damping
-### Practical Parameter Selection
-Based on empirical analysis, recommended parameter ranges for INL layers:
-| Use Case | α (damping) | p (dt×g×β) | Behavior | Iterations Needed |
-|----------|-------------|------------|----------|-------------------|
-| **Fast inference** | 0.1 - 0.3 | 0.3 - 1.0 | Quick convergence | 5-10 |
-| **Balanced** | 0.3 - 0.6 | 0.5 - 1.5 | Stable, moderate speed | 10-20 |
-| **High precision** | 0.4 - 0.7 | 0.4 - 1.2 | Slow but accurate | 20-50 |
-| **Avoid** | > 0.8 | > 2.0 | Too slow or unstable | N/A |
-**Safety margin**: Stay well within the theoretical bound `p < 2(1+α)`. Practical recommendation: `p < 1.5(1+α)` for reliable convergence with finite iterations.
-### Connection to Model Architecture
-The **Pacific Prime 1.1B** model uses:
-- `alpha` ≈ 0.4-0.6 (moderate damping)
-- `p` ≈ 0.8-1.2 (safe region)
-- 5 iterations/layer (sufficient for 85-95% convergence)
-These parameters balance:
-- **Convergence quality**: 90%+ of theoretical equilibrium
-- **Inference speed**: ~30-50% faster than full convergence
-- **Stability**: Robust across diverse inputs and training stages
-### Theoretical vs. Empirical
-| Aspect | Theoretical | Empirical |
-|--------|-------------|-----------|
-| **Condition** | `p < 2(1+α)` | `p < 1.8(1+α)` (practical) |
-| **Convergence** | Asymptotic (n→∞) | 85-95% in 5-10 iterations |
-| **Guarantee** | Mathematical proof | Statistical validation |
-| **Application** | Infinite time | Finite GPU budget |
-The empirical analysis validates the theory while providing practical guidance for finite-iteration deployment. The stability region is robust: small parameter perturbations during training don't cause instability.
-### Validation Methodology
-**Data**: Sampled 11 α values × 100 p values (1,100 parameter combinations)
-**Metrics**:
-- Spectral radius computation via eigenvalue analysis
-- Energy trajectory simulation (300 iterations)
-- Convergence rate measurement
-**Tools**: NumPy, SciPy, Matplotlib for numerical analysis
-For full analysis code, see: [stability_analysis.ipynb](link-to-notebook)
-## Optimizations
-### KV Caching
-Full KV caching support for fast autoregressive generation.
-```python
-# Automatic caching with .generate()
-outputs = model.generate(
-    **inputs,
-    max_new_tokens=100,
-    use_cache=True  # Enable KV caching (default)
 )
-# Manual caching for custom generation loops
-past_key_values = None
-for _ in range(max_tokens):
-    outputs = model(input_ids, past_key_values=past_key_values, use_cache=True)
-    past_key_values = outputs.past_key_values
-    # ... get next token ...
 ```
-**Benefits**:
-- **1.1-1.3× faster** generation for long sequences (100+ tokens)
-- Compatible with HuggingFace `.generate()` and vLLM
-- Beam search supported via `_reorder_cache()`
-- Minimal memory overhead (<1%)
-**How it works**: Unlike standard transformers that cache K, V for attention, INL-LLM only needs to cache attention states. Integrator dynamics (x, v) are computed fresh for each token since they operate within each layer, not across tokens.
-**Performance Note**: The speedup is more modest than standard transformers (which get 10-20× gains) because **INL architecture is dominated by integrator iterations, not attention**. Most compute (70-90%) goes to iterative dynamics (3-10 iterations per layer × 12-25 layers), while attention is only ~10-30% of FLOPs. The cache optimizes that 10-30%, giving ~1.1-1.3× overall speedup. This is an architectural tradeoff - you get richer dynamics at the cost of less cache benefit.
-## Technical Requirements
-- Requires `trust_remote_code=True` (custom INL architecture)
-- Python 3.8+, PyTorch 2.0+, transformers 4.35+
-## Citation
-```bibtex
-@misc{inl-llm-2024,
-  author = {Boris Peyriguère},
-  title = {INL-LLM: Integrator Neural Language Model},
-  year = {2024},
-  url = {https://huggingface.co/Pacific-Prime/pacific-prime}
-}
-```
-**License**: CC BY-NC 4.0 (Non-Commercial - Contact author for commercial use)

+# INL-LLM HuggingFace Format
+This is a HuggingFace-compatible version of the INL-LLM model (1.1B parameters).
 ## Model Details
+- **Architecture**: inl-llm
+- **Parameters**: ~1.1B effective parameters
+- **d_model**: 1728
+- **Layers**: 25
+- **Heads**: 32
+- **Vocab Size**: 50261
+## Usage with HuggingFace
 ```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+model = AutoModelForCausalLM.from_pretrained("/home/boris/vAgent/architecture/checkpoints/inl_11b_hf", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("/home/boris/vAgent/architecture/checkpoints/inl_11b_hf")
+# Generate
+inputs = tokenizer("Hello, I am", return_tensors="pt")
 outputs = model.generate(**inputs, max_new_tokens=100)
+print(tokenizer.decode(outputs[0]))
 ```
+## Usage with vLLM
 ```bash
+# Install vLLM
+pip install vllm
+# Serve with vLLM
 python -m vllm.entrypoints.openai.api_server \
+    --model /home/boris/vAgent/architecture/checkpoints/inl_11b_hf \
     --trust-remote-code \
+    --dtype bfloat16 \
+    --max-model-len 2048
 ```
+Then use OpenAI-compatible API:
 ```python
+from openai import OpenAI
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
+response = client.chat.completions.create(
+    model="inl_11b_hf",
+    messages=[
+        {"role": "user", "content": "What is machine learning?"}
+    ],
+    temperature=0.8,
+    max_tokens=100
 )
+print(response.choices[0].message.content)
 ```
+## Optimizations Enabled
+- Low-rank embeddings: True
+- Shared controllers: True
+- Hierarchical equilibrium: group_size=64
+- Sparse excitation: 10.0% sparsity
+- Adaptive stopping: True
+Converted from: /home/boris/vAgent/architecture/checkpoints/inl_1b_model
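
Note on the removed stability section: the condition quoted there (`0 ≤ alpha < 1` and `0 < p < 2(1 + alpha)`, with `p = dt * g * beta`) can be checked numerically. The sketch below is not taken from the repository. It assumes a scalar state, a constant gate `g`, and `mu = v_target = 0`, in which case the update from the removed README has a 2×2 Jacobian in `(x, v)` with trace `1 + alpha - p` and determinant `alpha`. The names `spectral_radius` and `is_stable` are illustrative.

```python
import cmath


def spectral_radius(alpha: float, p: float) -> float:
    """Largest eigenvalue magnitude of the linearised update.

    Assumes mu = v_target = 0 and a constant gate, so the eigenvalues
    solve lam**2 - (1 + alpha - p) * lam + alpha = 0.
    """
    trace = 1.0 + alpha - p
    disc = cmath.sqrt(trace * trace - 4.0 * alpha)  # complex-safe square root
    return max(abs((trace + disc) / 2.0), abs((trace - disc) / 2.0))


def is_stable(alpha: float, p: float) -> bool:
    """True when every eigenvalue lies strictly inside the unit circle."""
    return spectral_radius(alpha, p) < 1.0


# Parameter sets listed in the removed README, plus one point beyond the bound.
for name, alpha, p in [
    ("Safe", 0.1, 0.4),
    ("Near-bound", 0.3, 1.6),
    ("Unstable (as labelled)", 0.5, 2.5),
    ("Damped", 0.7, 0.2),
    ("High-alpha", 0.9, 1.0),
    ("Beyond bound", 0.5, 3.5),
]:
    rho = spectral_radius(alpha, p)
    print(f"{name}: rho={rho:.3f} stable={is_stable(alpha, p)} bound={2 * (1 + alpha):.1f}")
```

Under this simplification, the removed README's "Unstable (α=0.5, p=2.5)" example has a spectral radius of about 0.71 and sits inside the bound `p < 3`, so that label does not match the quoted condition.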

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:12164b4034153ec2f3eb299953d1da3070cc4e4748e8eaad124d82830df2ecb9
-size 4140509476
+oid sha256:eef434843bd7d70147372e012ec7f8fc1164dd5517991013780d10aec6dd9aae
+size 4442160156
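
The old and new Git LFS pointers in this hunk differ only in `oid` and `size`. A minimal sketch for reading such a pointer and comparing the sizes, assuming the three-line `key value` layout shown here. `parse_lfs_pointer` is a hypothetical helper, not part of Git LFS or `huggingface_hub`.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is recorded in bytes
    return fields


old = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:12164b4034153ec2f3eb299953d1da3070cc4e4748e8eaad124d82830df2ecb9
size 4140509476
""")
new = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:eef434843bd7d70147372e012ec7f8fc1164dd5517991013780d10aec6dd9aae
size 4442160156
""")

delta = new["size"] - old["size"]
print(f"weights changed: {old['oid'] != new['oid']}, size delta: {delta} bytes")
```

The pointer records only the hash and byte count, so the sketch can say that the weights changed and by how much, but not why.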

tokenizer.json CHANGED

@@ -1,11 +1,6 @@
 {
   "version": "1.0",
-  "truncation": {
-    "direction": "Right",
-    "max_length": 65,
-    "strategy": "LongestFirst",
-    "stride": 0
-  },
+  "truncation": null,
   "padding": null,
   "added_tokens": [
     {

tokenizer_config.json CHANGED

@@ -52,7 +52,7 @@
   "clean_up_tokenization_spaces": false,
   "eos_token": "<|endoftext|>",
   "extra_special_tokens": {},
-  "model_max_length": 1024,
+  "model_max_length": 2048,
   "pad_token": "<|endoftext|>",
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
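
The tokenizer changes in this commit drop the saved `truncation` block (`max_length: 65`) and raise `model_max_length` from 1024 to 2048, which matches the `--max-model-len 2048` flag in the new README. A hypothetical sanity check along those lines is sketched below. It works on plain dictaries of the values shown in the hunks and does not call any tokenizer library; `length_warnings` is an illustrative name.

```python
def length_warnings(tokenizer_json: dict, tokenizer_config: dict, served_max_len: int) -> list:
    """Return readable warnings about length-related settings (illustrative only)."""
    warnings = []
    truncation = tokenizer_json.get("truncation")
    if truncation is not None:
        warnings.append(
            f"tokenizer.json has a saved truncation setting (max_length={truncation['max_length']})"
        )
    model_max_length = tokenizer_config.get("model_max_length")
    if model_max_length != served_max_len:
        warnings.append(
            f"model_max_length={model_max_length} differs from served length {served_max_len}"
        )
    return warnings


# Values copied from the two tokenizer hunks in this commit.
old_tokenizer_json = {
    "truncation": {"direction": "Right", "max_length": 65, "strategy": "LongestFirst", "stride": 0}
}
new_tokenizer_json = {"truncation": None}
old_config = {"model_max_length": 1024}
new_config = {"model_max_length": 2048}

print(length_warnings(old_tokenizer_json, old_config, served_max_len=2048))
print(length_warnings(new_tokenizer_json, new_config, served_max_len=2048))
```

How a saved `truncation` entry is applied depends on how the tokenizer is loaded and called, so the first warning is only a prompt to look, not a statement that inputs were being cut to 65 tokens.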