---
license: apache-2.0
datasets:
- Skylion007/openwebtext
language:
- en
pipeline_tag: text-generation
tags:
- research
- convolutional
- fft
- transformer-alternative
- causal-lm
---

# GCLM — Global Convolutional Language Model

## Model Summary

**GCLM (Global Convolutional Language Model)** is an experimental causal language model that replaces traditional self-attention with a hybrid **local + global convolutional architecture**.

Instead of attention heads, GCLM uses:

- **Local depthwise convolutions** for short-range context
- **FFT-based global convolutions** for long-range sequence modeling

This design explores whether **global receptive fields** can be achieved efficiently *without* quadratic attention, while remaining compatible with standard autoregressive language modeling.

> GCLM is a transformer alternative, not a transformer replacement.

---

## Architecture Overview

- Token + learned positional embeddings
- Stacked convolutional blocks:
  - Local depthwise + pointwise convolution
  - Optional global FFT convolution every *N* layers
  - Feedforward MLP
- Residual connections + LayerNorm
- Causal language modeling head

**Key properties:**

- No attention mechanism
- No KV cache
- Linear memory scaling with sequence length
- Long-context friendly (tested up to 8k+ tokens)

An illustrative sketch of a block of this kind is included in the appendix at the end of this card.

---

## Training Data

The model was trained on:

- **Skylion007/openwebtext**

This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.

---

## Intended Use

**Primary use cases:**

- Research into transformer alternatives
- Long-context modeling experiments
- Architectural ablation studies
- Educational exploration of non-attention sequence models

**Not intended for:**

- Safety-critical applications
- Medical, legal, or financial advice
- Deployment as a production chatbot without additional alignment work

---

## Limitations

- This model is **research-grade**, not instruction-tuned
- Outputs may be:
  - Incoherent
  - Factually incorrect
  - Biased or unsafe
- Performance characteristics differ significantly from transformer LMs
- No reinforcement learning or alignment tuning has been applied

---

## Ethical Considerations

GCLM was trained on publicly available web data and may reflect societal biases present in that data.

Users are responsible for:

- Applying appropriate filtering
- Avoiding harmful or misleading use cases
- Evaluating outputs critically

---

## License

This model is released under the **Apache License 2.0**. You are free to:

- Use
- Modify
- Distribute
- Use commercially

Attribution and license preservation are required. Patent rights are explicitly granted under this license.

---

## Citation

If you use GCLM in your research, please cite or reference the project.

---

## Important

The model weights will not be uploaded to this repository until training has finished.
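
---

## Appendix: Illustrative Block Sketch

The snippet below is a minimal, hypothetical PyTorch sketch of the block structure described in the Architecture Overview: a causal local depthwise + pointwise convolution, an optional global convolution applied with a zero-padded FFT, and a feedforward MLP with residual connections and LayerNorm. It is **not** the actual GCLM implementation; names such as `GCLMBlock`, `causal_fft_conv`, `d_model`, `seq_len`, and `kernel_size`, and the choice of a learned per-position global filter, are assumptions made for illustration only.

```python
# Hypothetical sketch of a GCLM-style block. All names and design details
# here are illustrative assumptions, not the repository's implementation.
import torch
import torch.nn as nn


def causal_fft_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal global convolution computed in the frequency domain.

    u: (batch, seq_len, d_model) activations
    k: (filter_len, d_model) learned global filter, one tap per position
    Zero-padding to 2 * seq_len before the FFT avoids circular wraparound,
    so position t only mixes in positions <= t.
    """
    L = u.shape[1]
    n = 2 * L
    u_f = torch.fft.rfft(u, n=n, dim=1)
    k_f = torch.fft.rfft(k, n=n, dim=0)
    y = torch.fft.irfft(u_f * k_f.unsqueeze(0), n=n, dim=1)
    return y[:, :L]


class GCLMBlock(nn.Module):
    """Local depthwise + pointwise conv, optional global FFT conv, and MLP,
    each wrapped in pre-LayerNorm residual branches."""

    def __init__(self, d_model: int, seq_len: int, kernel_size: int = 7,
                 use_global: bool = True):
        super().__init__()
        self.use_global = use_global
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Depthwise conv; left-only effective padding keeps the mixing causal.
        self.local = nn.Conv1d(d_model, d_model, kernel_size,
                               groups=d_model, padding=kernel_size - 1)
        self.point = nn.Conv1d(d_model, d_model, 1)
        # One learned filter tap per position (illustrative parameterization).
        self.global_filter = nn.Parameter(torch.randn(seq_len, d_model) * 0.02)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        L = x.shape[1]
        h = self.norm1(x).transpose(1, 2)        # (B, D, L) for Conv1d
        h = self.point(self.local(h)[..., :L])   # trim right overhang -> causal
        h = h.transpose(1, 2)
        if self.use_global:
            h = h + causal_fft_conv(h, self.global_filter[:L])
        x = x + h
        x = x + self.mlp(self.norm2(x))
        return x
```

As a rough usage picture, stacking such blocks on top of token + learned positional embeddings and finishing with a linear language modeling head yields a model with no attention and no KV cache: the FFT convolution runs in O(L log L) time with memory that grows linearly in the sequence length.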