Video Analysis - Action Recognition & Understanding - a mindchain Collection

updated 25 days ago

Video analysis models for action recognition, temporal understanding, and video content classification

allenai/Molmo2-8B

Image-Text-to-Text • 9B • Updated 19 days ago • 73.4k downloads • 149 likes