PLDR-LLM-v52-110M-1

Model Description

PLDR-LLM-v52-110M-1 is a large language model from power law decoder representations (PLDR-LLM) with KV-cache and G-cache support. PLDR-LLM is a foundational language model architecture that uses power law graph attention to generate deductive and inductive outputs. This model has 110M parameters. It is similar to PLDRv51-110M-1, whose architecture and training details are provided in Table 1 of the research paper titled PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference.

  • The difference between the PLDR-LLM-v52-* models and the PLDR-LLM-v51-* models is that the rotary positional embedding (RoPE) implementation uses the GPT-NeoX style approach, which is also used for Llama in the Huggingface Transformers library. In the GPT-NeoX style, the two contiguous halves of the hidden dims are rotated against each other, whereas the GPT-J style RoPE implementation rotates interleaved pairs of adjacent hidden dims. This approach makes the PLDR-LLM implementation more compatible with the rest of the Transformers library.
  • The GPT-J style approach was used in the original implementation of PLDR-LLM as well as in the official implementation of Llama. More details can be found here. The paper introducing rotary positional embeddings can be found here. A minimal sketch contrasting the two conventions is shown below.
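The sketch below is a minimal illustration of the two rotation conventions and is not the model's actual code: the GPT-NeoX style rotates the two contiguous halves of the head dimension, while the GPT-J style rotates interleaved pairs of adjacent dimensions.

import torch

def rotate_half(x):
    # GPT-NeoX / Huggingface Llama convention: rotate the two contiguous halves of the head dim.
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def rotate_every_two(x):
    # GPT-J convention (original PLDR-LLM and official Llama code): rotate adjacent pairs of dims.
    x1, x2 = x[..., ::2], x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)

# In both conventions the rotated query/key is q * cos + rotate(q) * sin; the cos/sin
# tables are laid out to match the chosen rotation (concatenated halves for the GPT-NeoX
# style, repeated/interleaved pairs for the GPT-J style).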

Training data

PLDR-LLM-v52-110M-1 was pretrained on the RefinedWeb, a publicly available English web dataset with extensive filtering and deduplication.

Training procedure

This model was trained on ~8B tokens of RefinedWeb over 250k steps per rank. It was trained autoregressively with a cross-entropy loss, using the custom model implementation of PLDR-LLM for the Huggingface Transformers library. Training parameters were similar to those of PLDRv51-110M-1 from the research paper; the learning rate was set to 1.2x10^-3 and the number of warm-up steps to 2000.
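As a hedged sketch only (the actual pretraining code and schedule are described in the paper), the snippet below shows where these reported hyperparameters would plug into the standard Transformers Trainer API; the scheduler type, batch size, and precision are assumptions, not values taken from the paper.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pldr-llm-v52-110m-pretrain",
    max_steps=250_000,               # 250k steps per rank (~8B tokens)
    learning_rate=1.2e-3,            # reported learning rate
    warmup_steps=2_000,              # reported number of warm-up steps
    lr_scheduler_type="cosine",      # assumption: schedule not stated in this card
    per_device_train_batch_size=16,  # assumption: batch size not stated in this card
    bf16=True,                       # assumption: precision not stated in this card
)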

Intended Use and Limitations

This model is intended to be used for research purposes. Given a text prompt as input, it carries out next-token prediction to generate continuation text. The context length for this model is 1024 tokens.
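The snippet below is a minimal sketch of keeping a prompt within the 1024-token context window before generation; it uses the standard Transformers truncation arguments and leaves room for 100 generated tokens.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("fromthesky/PLDR-LLM-v52-110M-1", trust_remote_code=True)
long_prompt = "The quick brown fox jumps over the lazy dog. " * 500  # deliberately longer than the context
inputs = tokenizer(long_prompt, truncation=True, max_length=1024 - 100, return_tensors="pt")
print(inputs["input_ids"].shape)  # prompt capped so that prompt + 100 generated tokens fit in 1024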

How to Use

Via Huggingface Transformers Library

PLDR-LLM has custom model support for the Huggingface Transformers library. The PLDR-LLM custom code was evaluated on Transformers 4.56.1, the version available at the time.

Using pipeline:

from transformers import pipeline

text_generator = pipeline(
                          task="text-generation",
                          model="fromthesky/PLDR-LLM-v52-110M-1",
                          device="cuda", # or "cpu"
                          trust_remote_code=True
                         )

prompt="The quick brown fox jumps over the lazy dog."

output=text_generator(prompt, top_p=0.6, top_k=0, temperature=1, do_sample=True,               
                      tokenizer_encode_kwargs={"add_special_tokens":False}, 
                      use_cache=True, max_new_tokens=100)
print(output[0]["generated_text"])

Using AutoModel:

from transformers import AutoModelForCausalLM, AutoTokenizer
device="cuda" # or "cpu"
model=AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path="fromthesky/PLDR-LLM-v52-110M-1",
                                           device_map=device,
                                           trust_remote_code=True
                                          )
tokenizer=AutoTokenizer.from_pretrained(pretrained_model_name_or_path="fromthesky/PLDR-LLM-v52-110M-1",
                                        add_eos_token=False,
                                        legacy=False,
                                        trust_remote_code=True
                                       )
                                       
prompt="The quick brown fox jumps over the lazy dog."
        
inputs = tokenizer([prompt], return_tensors="pt").to(device=device)
generated_ids = model.generate(**inputs,
                               max_new_tokens=100, 
                               top_p=0.6,
                               top_k=0, 
                               temperature=1, 
                               do_sample=True,
                               use_cache=True
                              )
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

PLDR-LLM specific configurations:

  • custom_G_type: None for G values learned during pretraining, 'identity' for an LLM equivalent to one with SDPA, 'random' for G values drawn from a random normal distribution, 'external' for custom G values that can be assigned after model initialization. This setting is mainly relevant for training; for inference it is set in the model's config.json file.

  • cache_first_G: For batched inference, if set to True, the G values from the first sample prompt in the batch are cached and reused for all samples. If set to False, G values are cached separately for each sample prompt in the batch. For contrastive generation with custom_G_type=None, this needs to be set to True.

  • reference_rope: If set to True, the RoPE implementation from the original paper is used; this is the case for the model pretrained in this repo. If set to False, the RoPE implementation from the Huggingface Transformers library is used.

  • output_pldr_attentions: If set to True, returns the deductive outputs and learnable parameters of the power law graph attention module as a tuple containing: the output of the residual metric learner (metric tensor, A), the output (ALM) after application of iSwiGLU on the metric tensor, the learned exponents of the potential tensor, the learned weights for the energy-curvature tensor, the learned bias for the energy-curvature tensor, the energy-curvature tensor (GLM), and the attention weights.

See config.json for other model configuration details. A short sketch of setting these options through the model config is shown below.
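The snippet below is a hedged sketch of overriding these options through the standard config route; the attribute names are taken from the list above, and whether overriding a given option is meaningful for this pretrained checkpoint depends on the custom model code.

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("fromthesky/PLDR-LLM-v52-110M-1", trust_remote_code=True)
config.cache_first_G = True           # cache G from the first prompt in a batch
config.output_pldr_attentions = True  # return the deductive outputs of the power law graph attention module
# config.custom_G_type = "identity"   # SDPA-equivalent LLM; mainly a training-time option (see above)

model = AutoModelForCausalLM.from_pretrained(
    "fromthesky/PLDR-LLM-v52-110M-1",
    config=config,
    trust_remote_code=True,
)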

Notes:

  • This implementation of the PLDR-LLM custom code was evaluated on Transformers 4.56.1 and PyTorch 2.6.0.
  • We also maintain a fork of the Transformers library with PLDR-LLM model support for future development. The PLDR-LLM model files are added to the library, so the custom model files are not necessary:
      git clone https://github.com/burcgokden/transformers
      cd transformers
      git checkout add_PLDR_LLM
      pip install -e ".[dev]"
  • Static cache is not supported for models with custom_G_type=None.
  • PLDR-LLM uses the EOS token "[END]" during pretraining to indicate the end of a sequence. For text generation, the EOS token does not need to be added to the prompt. To achieve this, add_eos_token=False can be set in the tokenizer_config.json file or while initializing the tokenizer. For the text generation pipeline call, tokenizer_encode_kwargs={"add_special_tokens":False} can be used.
  • When add_bos_token=False and add_eos_token=False are set for the tokenizer, the prompt "" is an invalid input for single-batch inference because it does not contain any tokens. When padding is enabled, batched inference with the prompt "" as one of the samples causes its input_ids to be all pad tokens and its attention_mask to be all zeros. This edge case is handled differently for _attn_implementation='eager' and 'sdpa', resulting in different generation outputs for this prompt. Setting add_bos_token=True or add_eos_token=True, or explicitly providing the prompt as "[PAD]", "[START]", or "[END]", gives the same output for either implementation. This issue does not affect the KV-cache and G-cache. A minimal sketch of batched inference with padding is shown after these notes.
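
Following the padding notes above, the snippet below is a minimal sketch of batched generation with non-empty prompts. It assumes the tokenizer defines the "[PAD]" token mentioned above, and it uses left-padding, which is the usual choice for decoder-only generation; it is not the repository's reference recipe.

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "cpu"
model = AutoModelForCausalLM.from_pretrained("fromthesky/PLDR-LLM-v52-110M-1",
                                             device_map=device, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("fromthesky/PLDR-LLM-v52-110M-1",
                                          add_eos_token=False, legacy=False, trust_remote_code=True)
tokenizer.padding_side = "left"  # left-pad so generation continues from the real tokens

prompts = ["The quick brown fox jumps over the lazy dog.",
           "Rotary positional embeddings encode positions by"]
batch = tokenizer(prompts, padding=True, return_tensors="pt").to(device)
generated_ids = model.generate(**batch, max_new_tokens=50, do_sample=True,
                               top_p=0.6, top_k=0, temperature=1, use_cache=True)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))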

LM Evaluation Harness Support
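
As a hedged sketch (assuming the LM Evaluation Harness's standard simple_evaluate API, v0.4+, with illustrative task names and batch size), a zero-shot evaluation of this model might look like the following; trust_remote_code is passed so the custom model files are loaded.

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=fromthesky/PLDR-LLM-v52-110M-1,trust_remote_code=True",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "piqa", "winogrande"],
    num_fewshot=0,   # zero-shot, as in the Eval results section below
    batch_size=8,    # illustrative
)
print(results["results"])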

Limitations and Biases

Large language models may generate text that is profane, lewd, socially unacceptable or offensive depending on the contents of the dataset they were pretrained on. RefinedWeb is reported to be approximately as toxic and biased as the Pile; please see the RefinedWeb and Pile papers for more information. Moreover, large language models are susceptible to hallucinations and may generate text that contains incorrect, irrelevant or misleading information. Since it is very hard to anticipate the contents of generated text ahead of time, the output of large language models needs to be heavily moderated and curated to prevent undesired content from appearing without warning.

Eval results

  • The model was evaluated on the benchmarks below in a zero-shot setting, in a manner similar to the evaluation presented in the research paper.
| Benchmark  | Score |
|------------|-------|
| ARC-c      | 22.53 |
| ARC-e      | 36.49 |
| HellaSwag  | 29.20 |
| OpenBookQA | 27.00 |
| PIQA       | 63.00 |
| SIQA       | 41.81 |
| Winogrande | 49.96 |
| Average-1  | 38.19 |
| TruthfulQA | 45.00 |
| Average-2  | 38.95 |

BibTeX entry and citation info

Please cite this model as:

@misc{gokden2025pldrllmkvgcache,
      title={PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference}, 
      author={Burc Gokden},
      year={2025},
      eprint={2502.13502},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.13502}, 
}

@misc{gokden2024pldrllm,
      title={PLDR-LLM: Large Language Model from Power Law Decoder Representations}, 
      author={Burc Gokden},
      year={2024},
      eprint={2410.16703},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.16703}, 
}