GCLM: Global Convolutional Language Model

Model Summary

GCLM (Global Convolutional Language Model) is an experimental causal language model that replaces traditional self-attention with a hybrid local + global convolutional architecture.

Instead of attention heads, GCLM uses:

  • Local depthwise convolutions for short-range context
  • FFT-based global convolutions for long-range sequence modeling

This design explores whether global receptive fields can be achieved efficiently without quadratic attention, while remaining compatible with standard autoregressive language modeling.
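
As a rough illustration of the global path, here is a minimal PyTorch sketch of a causal global convolution computed with FFTs. The class and parameter names (GlobalFFTConv, dim, max_len) are illustrative, not the project's actual code. Zero-padding to twice the sequence length makes the circular FFT convolution equal a linear one, and keeping only the first L outputs makes it causal.

```python
import torch
import torch.nn as nn


class GlobalFFTConv(nn.Module):
    """Causal global convolution via FFT (illustrative sketch).

    Convolving a length-L sequence with a length-L learned kernel costs
    O(L log L) with FFTs instead of O(L^2) computed directly.
    """

    def __init__(self, dim: int, max_len: int):
        super().__init__()
        # One learned kernel per channel, as long as the longest sequence.
        self.kernel = nn.Parameter(torch.randn(dim, max_len) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        _, seq_len, _ = x.shape
        k = self.kernel[:, :seq_len]                    # (dim, L)
        # Zero-pad to 2L so the circular FFT convolution equals a linear one.
        n = 2 * seq_len
        x_f = torch.fft.rfft(x.transpose(1, 2), n=n)    # (B, dim, n//2 + 1)
        k_f = torch.fft.rfft(k, n=n)                    # (dim, n//2 + 1)
        y = torch.fft.irfft(x_f * k_f, n=n)[..., :seq_len]
        # Keeping only the first L outputs makes the operation causal:
        # position t mixes inputs 0..t only.
        return y.transpose(1, 2)
```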

GCLM is an alternative to the transformer, not a drop-in replacement for it.


Architecture Overview

  • Token + learned positional embeddings
  • Stacked convolutional blocks (sketched in code after this list):
    • Local depthwise + pointwise convolution
    • Optional global FFT convolution every N layers
    • Feedforward MLP
    • Residual connections + LayerNorm
  • Causal language modeling head
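
The block structure above could be assembled as follows. This is a hypothetical pre-norm composition reusing the GlobalFFTConv sketch from earlier; the repository's actual layer ordering, kernel size, and MLP width may differ.

```python
import torch.nn as nn


class GCLMBlock(nn.Module):
    """One convolutional block, following the outline above (names hypothetical)."""

    def __init__(self, dim: int, kernel_size: int = 7,
                 use_global: bool = False, max_len: int = 8192):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Depthwise conv for short-range context; left-side trimming below
        # keeps it causal.
        self.local = nn.Conv1d(dim, dim, kernel_size, groups=dim,
                               padding=kernel_size - 1)
        self.point = nn.Conv1d(dim, dim, 1)  # pointwise channel mixing
        # Global FFT convolution only in some layers ("every N layers").
        self.global_conv = GlobalFFTConv(dim, max_len) if use_global else None
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):
        # x: (batch, seq_len, dim)
        h = self.norm1(x).transpose(1, 2)                # (B, dim, L)
        # Trim the extra right-side outputs so position t sees only t-k+1..t.
        h = self.point(self.local(h)[..., :x.size(1)])
        h = h.transpose(1, 2)
        if self.global_conv is not None:
            h = h + self.global_conv(h)
        x = x + h                                        # residual
        return x + self.mlp(self.norm2(x))               # residual MLP
```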

Key properties:

  • No attention mechanism
  • No KV cache (see the decoding sketch after this list)
  • Linear memory scaling with sequence length
  • Very long-context friendly (tested at 8k+ tokens)
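
Because there is no KV cache, the simplest decoding loop re-runs the model over the whole prefix at each step. A minimal sketch, where `model` is a placeholder for any GCLM-style module mapping (batch, seq_len) token ids to (batch, seq_len, vocab) logits and `input_ids` is a tokenized prompt:

```python
import torch


@torch.no_grad()
def greedy_generate(model, input_ids, max_new_tokens=50):
    """Greedy decoding without a KV cache: the full prefix is re-encoded
    at every step, so memory grows linearly with prefix length."""
    for _ in range(max_new_tokens):
        logits = model(input_ids)                        # (B, L, vocab)
        next_id = logits[:, -1].argmax(-1, keepdim=True) # most likely token
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids
```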

Training Data

The model was trained on:

  • Skylion007/openwebtext

This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.
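
For reference, the dataset can be inspected with the Hugging Face `datasets` library; streaming avoids downloading the full corpus up front:

```python
from itertools import islice
from datasets import load_dataset

# Stream OpenWebText rather than materializing the whole corpus on disk.
ds = load_dataset("Skylion007/openwebtext", split="train", streaming=True)

for example in islice(ds, 3):
    print(example["text"][:80])
```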


Intended Use

Primary use cases:

  • Research into transformer alternatives
  • Long-context modeling experiments
  • Architectural ablation studies
  • Educational exploration of non-attention sequence models

Not intended for:

  • Safety-critical applications
  • Medical, legal, or financial advice
  • Deployment as a production chatbot without additional alignment work

Limitations

  • This model is research-grade, not instruction-tuned
  • Outputs may be:
    • Incoherent
    • Factually incorrect
    • Biased or unsafe
  • Performance characteristics differ significantly from transformer LMs
  • No reinforcement learning or alignment tuning applied

Ethical Considerations

GCLM was trained on publicly available web data and may reflect societal biases present in that data.

Users are responsible for:

  • Applying appropriate filtering
  • Avoiding harmful or misleading use cases
  • Evaluating outputs critically

License

This model is released under the Apache License 2.0.

You are free to:

  • Use
  • Modify
  • Distribute
  • Use commercially

Attribution and license preservation are required.
Patent rights are explicitly granted under this license.


Citation

If you use GCLM in your research, please cite or reference the project.

Important

The model weights will not be uploaded to this repository until training has finished.
