# ONNX Runtime C++ Example for Nomic-embed-text-v1.5 on Ryzen AI NPU

This project demonstrates how to run ONNX models using ONNX Runtime with C++ on AMD Ryzen AI NPU hardware. The application compares performance between CPU and NPU execution when the system configuration supports it.
## Prerequisites

### Software Requirements

- **Ryzen AI 1.4** - AMD's AI acceleration software stack
- **CMake** (version 3.15 or higher)
- **Visual Studio 2022** with C++ development tools
- **Python/Conda** environment with Ryzen AI 1.4 installed

### Hardware Requirements

- AMD Ryzen processor with integrated NPU (Phoenix or Hawk Point architecture)
### Environment Variables

Before building and running the application, ensure the following environment variables are properly configured:

- **`XLNX_VART_FIRMWARE`**: Path to the Xilinx VART firmware directory
- **`RYZEN_AI_INSTALLATION_PATH`**: Path to your Ryzen AI 1.4 installation directory

These variables are typically set during the Ryzen AI 1.4 installation process. If they are not set, NPU execution will fail.
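An application can verify these variables up front rather than failing deep inside NPU session creation. A minimal, stdlib-only sketch (the variable names come from this README; the helper itself is illustrative, not the project's actual startup code):

```cpp
#include <cstdlib>
#include <iostream>

// True when an environment variable's value is present and non-empty.
bool env_value_ok(const char* value) {
    return value != nullptr && *value != '\0';
}

// Checks the two variables this README requires before NPU execution.
// Illustrative only; the project's real startup code may differ.
bool ryzen_ai_env_ok() {
    bool ok = true;
    if (!env_value_ok(std::getenv("XLNX_VART_FIRMWARE"))) {
        std::cerr << "XLNX_VART_FIRMWARE is not set; NPU execution will fail.\n";
        ok = false;
    }
    if (!env_value_ok(std::getenv("RYZEN_AI_INSTALLATION_PATH"))) {
        std::cerr << "RYZEN_AI_INSTALLATION_PATH is not set; NPU execution will fail.\n";
        ok = false;
    }
    return ok;
}
```

Calling a check like this at the top of `main` gives a clear error message instead of an opaque failure from the VitisAI execution provider.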
## Build Instructions

1. **Activate the Ryzen AI environment:**

   ```bash
   conda activate <your-rai-environment-name>
   ```

2. **Build the project:**

   ```bash
   compile.bat
   ```

The build process will generate the executable in the `build\Release` directory along with all necessary dependencies.
## Usage

By default, the model runs first on the CPU and then on the NPU. Navigate to the build output directory and run the application:

### Basic Example

```bash
cd build\Release
quicktest.exe -m <model_name> -c <configuration_file_name> --cache_dir <directory_containing_model_cache> --cache_key <name_of_cache_directory> -i <number_of_iters>
```
### Running NOMIC with the Pre-built Model Cache

Using the pre-built cache eliminates model compilation, which can take several minutes. To use the existing `nomic_model_cache` directory for faster startup, run:

```bash
cd build\Release
quicktest.exe -m ..\..\nomic_bf16.onnx -c vaiml_config.json --cache_dir . --cache_key modelcachekey -i 5
```

This example:

- Uses the pre-compiled model cache in `nomic_model_cache` for faster inference initialization
- Runs 5 iterations to better demonstrate the performance difference between CPU and NPU
## Command Line Options

| Option | Long Form | Description |
|--------|-----------|-------------|
| `-m` | `--model` | Path to the ONNX model file |
| `-c` | `--config` | Path to the VitisAI configuration JSON file |
| `-d` | `--cache_dir` | Directory path for model cache storage |
| `-k` | `--cache_key` | Name of the cache directory inside the cache storage directory |
| `-i` | `--iters` | Number of inference iterations to execute |
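The project parses these flags with the bundled `cxxopts` library. The sketch below is a simplified stdlib-only stand-in that illustrates how the options map to a settings struct; the struct and function names are hypothetical, and real `cxxopts`-based code would also provide help text and error handling:

```cpp
#include <string>
#include <vector>

// Options recognized by quicktest.exe, per the table above.
// Field names are illustrative, not the project's actual struct.
struct Options {
    std::string model;      // -m / --model
    std::string config;     // -c / --config
    std::string cache_dir;  // -d / --cache_dir
    std::string cache_key;  // -k / --cache_key
    int iters = 1;          // -i / --iters
};

// Minimal flag/value parser for illustration only.
Options parse_args(const std::vector<std::string>& args) {
    Options opt;
    for (std::size_t i = 0; i + 1 < args.size(); i += 2) {
        const std::string& flag  = args[i];
        const std::string& value = args[i + 1];
        if (flag == "-m" || flag == "--model")          opt.model = value;
        else if (flag == "-c" || flag == "--config")    opt.config = value;
        else if (flag == "-d" || flag == "--cache_dir") opt.cache_dir = value;
        else if (flag == "-k" || flag == "--cache_key") opt.cache_key = value;
        else if (flag == "-i" || flag == "--iters")     opt.iters = std::stoi(value);
    }
    return opt;
}
```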
## Project Structure

- `src/main.cpp` - Main application entry point
- `src/npu_util.cpp/h` - NPU utility functions and helpers
- `src/cxxopts.hpp` - Command-line argument parsing library
- `nomic_bf16.onnx` - Sample ONNX model (bf16 precision)
- `vaiml_config.json` - VitisAI EP configuration file
- `CMakeLists.txt` - CMake build configuration
## Notes

- The application automatically detects NPU availability and falls back to CPU execution if the NPU is not accessible
- Model caching is used to improve subsequent inference performance
- The included `cxxopts` header library provides robust command-line argument parsing
- Ensure your conda environment is activated before building to access the necessary Ryzen AI libraries
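The CPU-vs-NPU comparison above ultimately reduces to averaging per-iteration latency over the `-i` iterations. A minimal sketch of that measurement, using a stand-in callable rather than a real ONNX Runtime session (the project's actual timing code may differ):

```cpp
#include <chrono>
#include <functional>

// Times `iters` calls of `run_inference` and returns the mean latency in
// milliseconds. `run_inference` stands in for a real session Run() call.
double mean_latency_ms(const std::function<void()>& run_inference, int iters) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iters; ++i) {
        run_inference();
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / iters;
}
```

Running the same loop once with a CPU-backed session and once with an NPU-backed session, and comparing the two means, is the essence of the benchmark this application performs. Note that the first NPU iteration may include one-time compilation cost unless the pre-built model cache is used.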