NeuroBLAST evaluation
NeuroBLAST-V3-SYNTH-EC-150000 (mkurman/NeuroBLAST-V3-SYNTH-EC-150000)
About 0.6B parameters, a custom “NeuroBLAST V3” hybrid architecture, trained on the PleIAs/SYNTH dataset. It is explicitly marked as an experimental early checkpoint, trained on short contexts with a high learning rate, and intended mainly for architecture evaluation, not as a polished chat model.
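For reference, a minimal sketch of loading the checkpoint in Colab. The exact arguments are assumptions based on the standard transformers API, not copied from the notebook; since the architecture is custom, `trust_remote_code=True` may be required:

```python
# Minimal loading sketch (assumed arguments, standard transformers API).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mkurman/NeuroBLAST-V3-SYNTH-EC-150000"

# Assumption: the custom NeuroBLAST architecture ships as remote code on the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~0.6B params fits easily in Colab GPU memory
    device_map="auto",
    trust_remote_code=True,
)
```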
What the model card implicitly claims / suggests
From the HF page and example code, the expectations are roughly:
A small causal-LM that can do basic text generation and simple conversational outputs.
Meant to be probed for reasoning/architecture behavior, not production use.
No claim of strong RLHF-style instruction following; it’s a raw early-training snapshot with a short context (768 tokens), still in its “pre-decay” phase.
So: it’s positioned as “interesting experimental architecture, small SLM, can generate text and be poked with questions,” not “stable, aligned assistant.”
What it actually did well in your tests
From what you ran in Colab:
When you gave it open-ended conceptual prompts (“Explain quantum entanglement to a 10-year-old…”, “Why does time slow down in physics/psychology/evolution?”), it produced reasonably coherent, readable paragraphs once we fixed the display.
For these same conceptual prompts, it stayed on topic for a few hundred tokens and produced text that looks like an explanation, with some structure and understandable language.
In other words, as a generic text generator for high-level explanations, it’s not terrible for a 600M-parameter early checkpoint.
So as a language generator it’s okay: it can talk, it can stay roughly on topic, and it can sound like a junior explainer.
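A sketch of the kind of open-ended probe that went well, reusing the model and tokenizer loaded above. The sampling settings are assumptions rather than the notebook’s exact values, and it assumes the tokenizer ships a chat template (the stray “assistant” token in later tests suggests it does); if not, a plain `tokenizer(prompt, return_tensors="pt")` call works just as well:

```python
# Open-ended conceptual prompt of the kind the model handled reasonably well.
# Sampling settings below are assumptions; context is short (768 tokens),
# so keep prompt + output within that budget.
prompt = "Explain quantum entanglement to a 10-year-old."

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=300,  # a few hundred tokens, matching what stayed on topic
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```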
What it clearly failed at
But the moment you asked it to behave like an assistant instead of a parrot, it broke:
- Instruction following
  - Could not obey “list 10 everyday activities, numbered 1–10, one short example each, keep it under 12 lines, end with ‘Done’.”
  - Repeated “Find a new job” and the same sentences over and over, ignored the length limit, and didn’t end correctly.
- Even very simple structure
  - Failed “write exactly three different activities as bullets.”
  - Failed “write exactly these three words, each on its own line: apple / banana / carrot”; it merged the words and kept talking (a minimal version of this check is sketched after the list).
  - Failed the “START…END, stop after the last period” test by dropping “carrot.” and adding an extra “assistant”.
- Math and formal reasoning
  - On the cyclist algebra problem it invented nonsense equations and concluded speeds like 0 km/h or 1 km/h, completely detached from the correct solution (18 km/h out, 24 km/h back).
  - Its Bayes’ theorem reasoning degenerated into random 0.5 / 0.54 / 0.086 numbers with no coherent structure.
- Stopping and constraints
  - Repeatedly violated “do NOT add anything else”, “keep the answer under N lines”, and “write exactly…”.
  - Once it starts generating, it tends to drift and pad with filler phrases or random tokens.
- Taxonomy / hierarchy
  - When asked for a multi-level taxonomy of human activities, it wrote a long meta-essay about “how to build a taxonomy” instead of actually giving one: no real categories, no subcategories, no examples.
In short: it can talk, but it can’t behave.
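For concreteness, a minimal sketch of the exact-output constraint check it kept failing, again reusing the loaded model and tokenizer. The prompt wording is reconstructed from the test description above, not copied verbatim from the notebook:

```python
# Exact-output constraint test: three words, each on its own line, nothing else.
# Prompt wording reconstructed from the test description (an assumption).
prompt = (
    "Write exactly these three words, each on its own line:\n"
    "apple\nbanana\ncarrot\n"
    "Do NOT add anything else."
)

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=30, do_sample=False)
text = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Pass only if the completion is exactly the three words on three lines.
passed = [line.strip() for line in text.strip().splitlines()] == ["apple", "banana", "carrot"]
print("PASS" if passed else f"FAIL:\n{text}")
```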
Short verdict
NeuroBLAST-V3-SYNTH-EC-150000 is an interesting experimental 600M-parameter architecture that can produce fluent English and halfway decent high-level explanations, but as an assistant it is almost unusable:
- It is not an instruction-tuned model, though it is promising: very small and suitable for running on small devices.
- It fails at basic formatting, lists, constraints, and simple math.
The Google Colab notebook running the model, along with the resources used to run it:
https://colab.research.google.com/drive/1HsH5xsaU6T5zFNu3qNrq8_HtU5OiqH7d?usp=sharing