# ONNX Runtime C++ Example for Nomic-embed-text-v1.5 on Ryzen AI NPU

This project demonstrates how to run ONNX models using ONNX Runtime with C++ on AMD Ryzen AI NPU hardware. The application compares performance between CPU and NPU execution when the system configuration supports it.
## Prerequisites

### Software Requirements

- **Ryzen AI 1.4** - AMD's AI acceleration software stack
- **CMake** (version 3.15 or higher)
- **Visual Studio 2022** with C++ development tools
- **Python/Conda** environment with Ryzen AI 1.4 installed

### Hardware Requirements

- AMD Ryzen processor with integrated NPU (Phoenix or Hawk Point architecture)
### Environment Variables

Before building and running the application, ensure the following environment variables are properly configured:

- **`XLNX_VART_FIRMWARE`**: Path to the Xilinx VART firmware directory
- **`RYZEN_AI_INSTALLATION_PATH`**: Path to your Ryzen AI 1.4 installation directory

These variables are typically set during the Ryzen AI 1.4 installation process. If they are not set, NPU execution will fail.
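An application can verify these variables up front rather than failing deep inside NPU session creation. A minimal, stdlib-only sketch (the variable names come from this README; the helper itself is illustrative, not the project's actual startup code):

```cpp
#include <cstdlib>
#include <iostream>

// True when an environment variable's value is present and non-empty.
bool env_value_ok(const char* value) {
    return value != nullptr && *value != '\0';
}

// Checks the two variables this README requires before NPU execution.
// Illustrative only; the project's real startup code may differ.
bool ryzen_ai_env_ok() {
    bool ok = true;
    if (!env_value_ok(std::getenv("XLNX_VART_FIRMWARE"))) {
        std::cerr << "XLNX_VART_FIRMWARE is not set; NPU execution will fail.\n";
        ok = false;
    }
    if (!env_value_ok(std::getenv("RYZEN_AI_INSTALLATION_PATH"))) {
        std::cerr << "RYZEN_AI_INSTALLATION_PATH is not set; NPU execution will fail.\n";
        ok = false;
    }
    return ok;
}
```

Calling a check like this at the top of `main` gives a clear error message instead of an opaque failure from the VitisAI execution provider.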
## Build Instructions

1. **Activate the Ryzen AI environment:**

   ```bash
   conda activate <your-rai-environment-name>
   ```

2. **Build the project:**

   ```bash
   compile.bat
   ```

The build process will generate the executable in the `build\Release` directory along with all necessary dependencies.
## Usage

By default, the model runs first on the CPU and then on the NPU. Navigate to the build output directory and run the application:

### Basic Example

```bash
cd build\Release
quicktest.exe -m <model_name> -c <configuration_file_name> --cache_dir <directory_containing_model_cache> --cache_key <name_of_cache_directory> -i <number_of_iters>
```
### Running NOMIC with the Pre-built Model Cache

Using the pre-built cache eliminates model compilation, which can take several minutes. To use the existing `nomic_model_cache` directory for faster startup, run:

```bash
cd build\Release
quicktest.exe -m ..\..\nomic_bf16.onnx -c vaiml_config.json --cache_dir . --cache_key modelcachekey -i 5
```

This example:

- Uses the pre-compiled model cache in `nomic_model_cache` for faster inference initialization
- Runs 5 iterations to better demonstrate the performance difference between CPU and NPU
## Command Line Options

| Option | Long Form | Description |
|--------|-----------|-------------|
| `-m` | `--model` | Path to the ONNX model file |
| `-c` | `--config` | Path to the VitisAI configuration JSON file |
| `-d` | `--cache_dir` | Directory path for model cache storage |
| `-k` | `--cache_key` | Name of the cache directory inside the cache storage directory |
| `-i` | `--iters` | Number of inference iterations to execute |
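The project parses these flags with the bundled `cxxopts` library. The sketch below is a simplified stdlib-only stand-in that illustrates how the options map to a settings struct; the struct and function names are hypothetical, and real `cxxopts`-based code would also provide help text and error handling:

```cpp
#include <string>
#include <vector>

// Options recognized by quicktest.exe, per the table above.
// Field names are illustrative, not the project's actual struct.
struct Options {
    std::string model;      // -m / --model
    std::string config;     // -c / --config
    std::string cache_dir;  // -d / --cache_dir
    std::string cache_key;  // -k / --cache_key
    int iters = 1;          // -i / --iters
};

// Minimal flag/value parser for illustration only.
Options parse_args(const std::vector<std::string>& args) {
    Options opt;
    for (std::size_t i = 0; i + 1 < args.size(); i += 2) {
        const std::string& flag  = args[i];
        const std::string& value = args[i + 1];
        if (flag == "-m" || flag == "--model")          opt.model = value;
        else if (flag == "-c" || flag == "--config")    opt.config = value;
        else if (flag == "-d" || flag == "--cache_dir") opt.cache_dir = value;
        else if (flag == "-k" || flag == "--cache_key") opt.cache_key = value;
        else if (flag == "-i" || flag == "--iters")     opt.iters = std::stoi(value);
    }
    return opt;
}
```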
## Project Structure

- `src/main.cpp` - Main application entry point
- `src/npu_util.cpp/h` - NPU utility functions and helpers
- `src/cxxopts.hpp` - Command-line argument parsing library
- `nomic_bf16.onnx` - Sample ONNX model (bf16 precision)
- `vaiml_config.json` - VitisAI EP configuration file
- `CMakeLists.txt` - CMake build configuration
## Notes

- The application automatically detects NPU availability and falls back to CPU execution if the NPU is not accessible
- Model caching is used to improve subsequent inference performance
- The included `cxxopts` header library provides robust command-line argument parsing
- Ensure your conda environment is activated before building to access the necessary Ryzen AI libraries
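The CPU-vs-NPU comparison above ultimately reduces to averaging per-iteration latency over the `-i` iterations. A minimal sketch of that measurement, using a stand-in callable rather than a real ONNX Runtime session (the project's actual timing code may differ):

```cpp
#include <chrono>
#include <functional>

// Times `iters` calls of `run_inference` and returns the mean latency in
// milliseconds. `run_inference` stands in for a real session Run() call.
double mean_latency_ms(const std::function<void()>& run_inference, int iters) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iters; ++i) {
        run_inference();
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / iters;
}
```

Running the same loop once with a CPU-backed session and once with an NPU-backed session, and comparing the two means, is the essence of the benchmark this application performs. Note that the first NPU iteration may include one-time compilation cost unless the pre-built model cache is used.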