---
license: apache-2.0
datasets:
- Skylion007/openwebtext
language:
- en
pipeline_tag: text-generation
tags:
- research
- convolutional
- fft
- transformer-alternative
- causal-lm
---
# GCLM — Global Convolutional Language Model
## Model Summary
**GCLM (Global Convolutional Language Model)** is an experimental causal language model that replaces traditional self-attention with a hybrid **local + global convolutional architecture**.
Instead of attention heads, GCLM uses:
- **Local depthwise convolutions** for short-range context
- **FFT-based global convolutions** for long-range sequence modeling
This design explores whether **global receptive fields** can be achieved efficiently *without* quadratic attention, while remaining compatible with standard autoregressive language modeling.
> GCLM is a transformer alternative — not a transformer replacement.
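The core mechanism can be illustrated with a minimal, hedged sketch (not the released implementation; the class name `CausalFFTConv`, the filter parameterization, and the `max_len` argument are assumptions): a learned per-channel filter as long as the sequence is applied via FFT, and zero padding plus truncation keeps the operation causal while running in O(L log L).

```python
import torch
import torch.nn as nn


class CausalFFTConv(nn.Module):
    """Illustrative causal global convolution via FFT (assumed design, not the released code)."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        # One learned filter per channel, spanning the maximum sequence length.
        self.filter = nn.Parameter(0.02 * torch.randn(d_model, max_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        k = self.filter[:, :seq_len]                    # (d_model, seq_len)
        n = 2 * seq_len                                 # zero-pad to avoid circular wrap-around
        X = torch.fft.rfft(x.transpose(1, 2), n=n)      # (batch, d_model, n // 2 + 1)
        K = torch.fft.rfft(k, n=n)                      # (d_model, n // 2 + 1)
        y = torch.fft.irfft(X * K, n=n)[..., :seq_len]  # truncation keeps the conv causal
        return y.transpose(1, 2)                        # (batch, seq_len, d_model)
```

Because position *t* of the truncated linear convolution only mixes inputs at positions ≤ *t*, the operation remains compatible with autoregressive training while giving every token a global receptive field.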
---
## Architecture Overview
- Token + learned positional embeddings
- Stacked convolutional blocks (a minimal sketch follows the property list below):
  - Local depthwise + pointwise convolution
  - Optional global FFT convolution every *N* layers
  - Feedforward MLP
- Residual connections + LayerNorm
- Causal language modeling head
**Key properties:**
- No attention mechanism
- No KV cache
- Linear memory scaling with sequence length
- Extremely long-context friendly (tested up to 8k+ tokens)
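The sketch below shows one such block under the same caveat: the class name, kernel size, activation, and the exact ordering of residuals and norms are assumptions rather than the released code. It combines a causal local depthwise + pointwise convolution, an optional global mixer (for example the `CausalFFTConv` sketch above, plugged in every *N* layers), and a feedforward MLP, each wrapped in a residual connection with LayerNorm.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class GCLMBlock(nn.Module):
    """Illustrative GCLM-style block (assumed structure): local conv -> optional global conv -> MLP."""

    def __init__(self, d_model: int, kernel_size: int = 7,
                 global_mixer: Optional[nn.Module] = None):
        super().__init__()
        self.kernel_size = kernel_size
        self.norm1 = nn.LayerNorm(d_model)
        # Depthwise conv (groups=d_model) for short-range context, then a pointwise channel mix.
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, 1)
        self.global_mixer = global_mixer  # present only every N layers
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        h = self.norm1(x).transpose(1, 2)          # (batch, d_model, seq_len)
        h = F.pad(h, (self.kernel_size - 1, 0))    # left padding keeps the local conv causal
        h = self.pointwise(F.gelu(self.depthwise(h)))
        x = x + h.transpose(1, 2)                  # residual around the local convolution
        if self.global_mixer is not None:
            x = x + self.global_mixer(x)           # residual around the FFT global convolution
        return x + self.mlp(self.norm2(x))         # residual around the feedforward MLP
```

Because nothing in the block stores per-token state beyond the activations themselves, there is no KV cache and memory grows linearly with sequence length.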
---
## Training Data
The model was trained on:
- **Skylion007/openwebtext**
This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.
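The corpus can be pulled with the Hugging Face `datasets` library; the streaming load shown below avoids materializing the full corpus on disk. Depending on your `datasets` version, this script-based dataset may additionally require `trust_remote_code=True`.

```python
from datasets import load_dataset

# Stream OpenWebText instead of downloading it in full; add
# trust_remote_code=True if your `datasets` version asks for it.
openwebtext = load_dataset("Skylion007/openwebtext", split="train", streaming=True)

# Each record is a dict with a single "text" field of raw web text.
print(next(iter(openwebtext))["text"][:200])
```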
---
## Intended Use
**Primary use cases:**
- Research into transformer alternatives
- Long-context modeling experiments
- Architectural ablation studies
- Educational exploration of non-attention sequence models
**Not intended for:**
- Safety-critical applications
- Medical, legal, or financial advice
- Deployment as a production chatbot without additional alignment work
---
## Limitations
- This model is **research-grade**, not instruction-tuned
- Outputs may be:
  - Incoherent
  - Factually incorrect
  - Biased or unsafe
- Performance characteristics differ significantly from transformer LMs
- No reinforcement learning or alignment tuning applied
---
## Ethical Considerations
GCLM was trained on publicly available web data and may reflect societal biases present in that data.
Users are responsible for:
- Applying appropriate filtering
- Avoiding harmful or misleading use cases
- Evaluating outputs critically
---
## License
This model is released under the **Apache License 2.0**.
You are free to:
- Use
- Modify
- Distribute
- Use commercially
Attribution and license preservation are required.
Patent rights are explicitly granted under this license.
---
## Citation
If you use GCLM in your research, please cite or reference the project.
## Important
Model weights will be added to this repository once training is complete.