GCLM: Global Convolutional Language Model
Model Summary
GCLM (Global Convolutional Language Model) is an experimental causal language model that replaces traditional self-attention with a hybrid local + global convolutional architecture.
Instead of attention heads, GCLM uses:
- Local depthwise convolutions for short-range context
- FFT-based global convolutions for long-range sequence modeling
This design explores whether global receptive fields can be achieved efficiently without quadratic attention, while remaining compatible with standard autoregressive language modeling.
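The FFT path is the core long-range operator. Below is a minimal sketch of what such an operator can look like in PyTorch; the function name, tensor layout, and the choice of one learned filter tap per position are illustrative assumptions, not the actual GCLM code.

```python
import torch

def causal_global_conv(x, k):
    """Causal global convolution of x with kernel k via FFT.

    x: (batch, length, channels) input sequence
    k: (length, channels) learned filter, one tap per time step
    Returns y where y[t] depends only on x[0..t].
    """
    B, L, C = x.shape
    # Zero-pad to 2L so the circular convolution implied by the FFT
    # becomes an ordinary (linear) convolution, preserving causality.
    n = 2 * L
    X = torch.fft.rfft(x, n=n, dim=1)        # (B, n//2 + 1, C)
    K = torch.fft.rfft(k, n=n, dim=0)        # (n//2 + 1, C)
    y = torch.fft.irfft(X * K, n=n, dim=1)   # (B, n, C)
    return y[:, :L]                          # discard the padded tail
```

This runs in O(L log L) time, which is what lets a single layer see the entire sequence without the quadratic cost of attention.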
GCLM is a transformer alternative, not a transformer replacement.
Architecture Overview
- Token + learned positional embeddings
- Stacked convolutional blocks (see the sketch after this list), each containing:
  - Local depthwise + pointwise convolution
  - Optional global FFT convolution every N layers
  - Feedforward MLP
  - Residual connections + LayerNorm
- Causal language modeling head
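A schematic of one block, assuming pre-norm residuals; the class name, kernel width, and MLP expansion factor are hypothetical, and the global FFT convolution is left as a placeholder comment:

```python
import torch.nn as nn

class GCLMBlock(nn.Module):
    """One convolutional block: causal local conv + MLP, each wrapped
    in a pre-LayerNorm and a residual connection. Hyperparameters are
    illustrative, not the trained model's configuration."""

    def __init__(self, d_model, kernel_size=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        # Depthwise conv mixes each channel over time; pointwise mixes channels.
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, 1)
        self.kernel_size = kernel_size
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                      # x: (B, L, d_model)
        h = self.norm1(x).transpose(1, 2)      # (B, d_model, L) for Conv1d
        # Left-pad so position t only sees positions <= t (causal).
        h = nn.functional.pad(h, (self.kernel_size - 1, 0))
        h = self.pointwise(self.depthwise(h)).transpose(1, 2)
        x = x + h                              # residual around the local conv
        # Every Nth block would apply the global FFT convolution here.
        return x + self.mlp(self.norm2(x))     # residual around the MLP
```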
Key properties:
- No attention mechanism
- No KV cache (see the decoding sketch after this list)
- Linear memory scaling with sequence length
- Long-context friendly (tested at sequence lengths of 8k+ tokens)
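Because there is no attention state to cache, decoding simply re-runs the full forward pass over the growing sequence. A minimal greedy-decoding sketch, assuming the model maps token IDs to logits of shape (batch, length, vocab):

```python
import torch

@torch.no_grad()
def greedy_generate(model, input_ids, max_new_tokens=50):
    """Greedy decoding without a KV cache: each step recomputes the
    convolutional forward pass over the whole sequence so far."""
    for _ in range(max_new_tokens):
        logits = model(input_ids)                        # (B, L, vocab)
        next_id = logits[:, -1].argmax(-1, keepdim=True) # (B, 1)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids
```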
Training Data
The model was trained on:
- Skylion007/openwebtext
This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.
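For reference, one way to inspect the corpus with the Hugging Face datasets library (streaming avoids downloading the full dump up front; depending on your datasets version, trust_remote_code=True may also be required):

```python
from datasets import load_dataset

# Stream OpenWebText rather than materializing the whole corpus on disk.
dataset = load_dataset("Skylion007/openwebtext", split="train", streaming=True)
print(next(iter(dataset))["text"][:200])
```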
Intended Use
Primary use cases:
- Research into transformer alternatives
- Long-context modeling experiments
- Architectural ablation studies
- Educational exploration of non-attention sequence models
Not intended for:
- Safety-critical applications
- Medical, legal, or financial advice
- Deployment as a production chatbot without additional alignment work
Limitations
- This model is research-grade, not instruction-tuned
- Outputs may be:
  - Incoherent
  - Factually incorrect
  - Biased or unsafe
- Performance characteristics differ significantly from those of transformer LMs
- No reinforcement learning or alignment tuning has been applied
Ethical Considerations
GCLM was trained on publicly available web data and may reflect societal biases present in that data.
Users are responsible for:
- Applying appropriate filtering
- Avoiding harmful or misleading use cases
- Evaluating outputs critically
License
This model is released under the Apache License 2.0.
You are free to:
- Use
- Modify
- Distribute
- Use commercially
Attribution and license preservation are required.
Patent rights are explicitly granted under this license.
Citation
If you use GCLM in your research, please cite or reference the project.
Important
The model weights will not be uploaded to this repository until training is complete.