GCLM: Global Convolutional Language Model
Model Summary
GCLM (Global Convolutional Language Model) is an experimental causal language model that replaces traditional self-attention with a hybrid local + global convolutional architecture.
Instead of attention heads, GCLM uses:
- Local depthwise convolutions for short-range context
- FFT-based global convolutions for long-range sequence modeling
This design explores whether global receptive fields can be achieved efficiently without quadratic attention, while remaining compatible with standard autoregressive language modeling.
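The FFT path is the core long-range operator. Below is a minimal sketch of what such an operator can look like in PyTorch; the function name, tensor layout, and the choice of one learned filter tap per position are illustrative assumptions, not the actual GCLM code.

```python
import torch

def causal_global_conv(x, k):
    """Causal global convolution of x with kernel k via FFT.

    x: (batch, length, channels) input sequence
    k: (length, channels) learned filter, one tap per time step
    Returns y where y[t] depends only on x[0..t].
    """
    B, L, C = x.shape
    # Zero-pad to 2L so the circular convolution implied by the FFT
    # becomes an ordinary (linear) convolution, preserving causality.
    n = 2 * L
    X = torch.fft.rfft(x, n=n, dim=1)        # (B, n//2 + 1, C)
    K = torch.fft.rfft(k, n=n, dim=0)        # (n//2 + 1, C)
    y = torch.fft.irfft(X * K, n=n, dim=1)   # (B, n, C)
    return y[:, :L]                          # discard the padded tail
```

This runs in O(L log L) time, which is what lets a single layer see the entire sequence without the quadratic cost of attention.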
GCLM is a transformer alternative, not a transformer replacement.
Architecture Overview
- Token + learned positional embeddings
- Stacked convolutional blocks (see the sketch after this list), each containing:
  - Local depthwise + pointwise convolution
  - Optional global FFT convolution every N layers
  - Feedforward MLP
  - Residual connections + LayerNorm
- Causal language modeling head
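A schematic of one block, assuming pre-norm residuals; the class name, kernel width, and MLP expansion factor are hypothetical, and the global FFT convolution is left as a placeholder comment:

```python
import torch.nn as nn

class GCLMBlock(nn.Module):
    """One convolutional block: causal local conv + MLP, each wrapped
    in a pre-LayerNorm and a residual connection. Hyperparameters are
    illustrative, not the trained model's configuration."""

    def __init__(self, d_model, kernel_size=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        # Depthwise conv mixes each channel over time; pointwise mixes channels.
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, 1)
        self.kernel_size = kernel_size
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                      # x: (B, L, d_model)
        h = self.norm1(x).transpose(1, 2)      # (B, d_model, L) for Conv1d
        # Left-pad so position t only sees positions <= t (causal).
        h = nn.functional.pad(h, (self.kernel_size - 1, 0))
        h = self.pointwise(self.depthwise(h)).transpose(1, 2)
        x = x + h                              # residual around the local conv
        # Every Nth block would apply the global FFT convolution here.
        return x + self.mlp(self.norm2(x))     # residual around the MLP
```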
Key properties:
- No attention mechanism
- No KV cache (see the decoding sketch after this list)
- Linear memory scaling with sequence length
- Long-context friendly (tested at sequence lengths of 8k+ tokens)
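Because there is no attention state to cache, decoding simply re-runs the full forward pass over the growing sequence. A minimal greedy-decoding sketch, assuming the model maps token IDs to logits of shape (batch, length, vocab):

```python
import torch

@torch.no_grad()
def greedy_generate(model, input_ids, max_new_tokens=50):
    """Greedy decoding without a KV cache: each step recomputes the
    convolutional forward pass over the whole sequence so far."""
    for _ in range(max_new_tokens):
        logits = model(input_ids)                        # (B, L, vocab)
        next_id = logits[:, -1].argmax(-1, keepdim=True) # (B, 1)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids
```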
Training Data
The model was trained on:
- Skylion007/openwebtext
This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.
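For reference, one way to inspect the corpus with the Hugging Face datasets library (streaming avoids downloading the full dump up front; depending on your datasets version, trust_remote_code=True may also be required):

```python
from datasets import load_dataset

# Stream OpenWebText rather than materializing the whole corpus on disk.
dataset = load_dataset("Skylion007/openwebtext", split="train", streaming=True)
print(next(iter(dataset))["text"][:200])
```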
Intended Use
Primary use cases:
- Research into transformer alternatives
- Long-context modeling experiments
- Architectural ablation studies
- Educational exploration of non-attention sequence models
Not intended for:
- Safety-critical applications
- Medical, legal, or financial advice
- Deployment as a production chatbot without additional alignment work
Limitations
- This model is research-grade, not instruction-tuned
- Outputs may be:
  - Incoherent
  - Factually incorrect
  - Biased or unsafe
- Performance characteristics differ significantly from those of transformer LMs
- No reinforcement learning or alignment tuning has been applied
Ethical Considerations
GCLM was trained on publicly available web data and may reflect societal biases present in that data.
Users are responsible for:
- Applying appropriate filtering
- Avoiding harmful or misleading use cases
- Evaluating outputs critically
License
This model is released under the Apache License 2.0.
You are free to:
- Use
- Modify
- Distribute
- Use commercially
Attribution and license preservation are required.
Patent rights are explicitly granted under this license.
Citation
If you use GCLM in your research, please cite or reference the project.
Important
The model weights will not be uploaded to this repository until training is complete.