---
license: apache-2.0
datasets:
- Skylion007/openwebtext
language:
- en
pipeline_tag: text-generation
tags:
- research
- convolutional
- fft
- transformer-alternative
- causal-lm
---
|
|
|
|
|
# GCLM — Global Convolutional Language Model |
|
|
|
|
|
## Model Summary |
|
|
|
|
|
**GCLM (Global Convolutional Language Model)** is an experimental causal language model that replaces traditional self-attention with a hybrid **local + global convolutional architecture**. |
|
|
|
|
|
Instead of attention heads, GCLM uses: |
|
|
- **Local depthwise convolutions** for short-range context (sketched below)
|
|
- **FFT-based global convolutions** for long-range sequence modeling |
|
|
|
|
|
This design explores whether **global receptive fields** can be achieved efficiently *without* quadratic attention, while remaining compatible with standard autoregressive language modeling. |
|
|
|
|
|
> GCLM is an alternative to the transformer architecture, not a drop-in replacement for it.
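For intuition, here is a minimal sketch of what the local path could look like: a causal depthwise-separable 1D convolution in PyTorch. The class name, kernel size, and tensor shapes are illustrative assumptions, not the actual GCLM implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalLocalConv(nn.Module):
    """Illustrative short-range mixer: causal depthwise conv followed by a pointwise conv."""

    def __init__(self, dim: int, kernel_size: int = 4):
        super().__init__()
        # Left-padding by (kernel_size - 1) keeps the convolution causal:
        # position t only sees positions <= t.
        self.pad = kernel_size - 1
        self.depthwise = nn.Conv1d(dim, dim, kernel_size, groups=dim)
        self.pointwise = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); Conv1d expects (batch, dim, seq_len)
        x = x.transpose(1, 2)
        x = F.pad(x, (self.pad, 0))  # pad on the left only
        x = self.pointwise(self.depthwise(x))
        return x.transpose(1, 2)
```

The global path swaps the fixed short kernel for a sequence-length kernel applied in the frequency domain; a sketch of that appears under the Architecture Overview below.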
|
|
|
|
|
--- |
|
|
|
|
|
## Architecture Overview |
|
|
|
|
|
- Token + learned positional embeddings |
|
|
- Stacked convolutional blocks: |
|
|
- Local depthwise + pointwise convolution |
|
|
  - Optional global FFT convolution every *N* layers (sketched below)
|
|
- Feedforward MLP |
|
|
- Residual connections + LayerNorm |
|
|
- Causal language modeling head |
|
|
|
|
|
**Key properties:** |
|
|
- No attention mechanism |
|
|
- No KV cache |
|
|
- Linear memory scaling with sequence length |
|
|
- Extremely long-context friendly (tested at sequence lengths of 8k+ tokens)
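To make the "no attention, linear memory" claims above concrete, below is a minimal sketch of an FFT-based causal global convolution, assuming one learned kernel per channel and zero-padding to avoid circular wrap-around. Names, initialization, and shapes are illustrative, not the actual GCLM code.

```python
import torch
import torch.nn as nn


class CausalFFTConv(nn.Module):
    """Illustrative global mixer: per-channel causal convolution computed via FFT."""

    def __init__(self, dim: int, max_len: int):
        super().__init__()
        # One learned kernel per channel, spanning the longest supported sequence.
        self.kernel = nn.Parameter(0.02 * torch.randn(dim, max_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        seq_len = x.size(1)
        k = self.kernel[:, :seq_len]                         # (dim, seq_len)
        # Zero-pad to 2 * seq_len so the circular FFT convolution acts as a
        # linear (and therefore causal) convolution.
        n = 2 * seq_len
        x_f = torch.fft.rfft(x.transpose(1, 2), n=n)         # (batch, dim, n // 2 + 1)
        k_f = torch.fft.rfft(k, n=n)                         # (dim, n // 2 + 1)
        y = torch.fft.irfft(x_f * k_f, n=n)[..., :seq_len]   # keep the causal part
        return y.transpose(1, 2)                             # (batch, seq_len, dim)
```

Because the convolution is applied in the frequency domain, compute grows as O(n log n) and activation memory roughly linearly in the sequence length n, which is what allows the long-context and no-KV-cache properties listed above without any attention mechanism.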
|
|
|
|
|
--- |
|
|
|
|
|
## Training Data |
|
|
|
|
|
The model was trained on: |
|
|
- **Skylion007/openwebtext** |
|
|
|
|
|
This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content. |
|
|
|
|
|
--- |
|
|
|
|
|
## Intended Use |
|
|
|
|
|
**Primary use cases:** |
|
|
- Research into transformer alternatives |
|
|
- Long-context modeling experiments |
|
|
- Architectural ablation studies |
|
|
- Educational exploration of non-attention sequence models |
|
|
|
|
|
**Not intended for:** |
|
|
- Safety-critical applications |
|
|
- Medical, legal, or financial advice |
|
|
- Deployment as a production chatbot without additional alignment work |
|
|
|
|
|
--- |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- This model is **research-grade**, not instruction-tuned |
|
|
- Outputs may be: |
|
|
- Incoherent |
|
|
- Factually incorrect |
|
|
- Biased or unsafe |
|
|
- Performance characteristics differ significantly from those of transformer-based LMs
|
|
- No reinforcement learning or alignment tuning applied |
|
|
|
|
|
--- |
|
|
|
|
|
## Ethical Considerations |
|
|
|
|
|
GCLM was trained on publicly available web data and may reflect societal biases present in that data. |
|
|
|
|
|
Users are responsible for: |
|
|
- Applying appropriate filtering |
|
|
- Avoiding harmful or misleading use cases |
|
|
- Evaluating outputs critically |
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
This model is released under the **Apache License 2.0**. |
|
|
|
|
|
You are free to: |
|
|
- Use |
|
|
- Modify |
|
|
- Distribute |
|
|
- Use commercially |
|
|
|
|
|
Attribution and license preservation are required. |
|
|
Patent rights are explicitly granted under this license. |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use GCLM in your research, please cite or reference the project. |
|
|
|
|
|
|
|
|
## Important |
|
|
|
|
|
The model weights will not be uploaded to this repository until training has finished.