In some regions, UL Procyon cannot automatically download the required AI models. In these cases, users will have to manually download the models.
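One way to fetch a model repository by hand is with the huggingface_hub Python package; this is only an assumption for illustration, not something Procyon requires, and any method that puts the same files on disk works. A minimal sketch, using the HFIDs listed in the tables that follow:

```python
# Minimal sketch: manually download a full model repository from Hugging Face.
# Assumes the huggingface_hub package is installed (pip install huggingface_hub);
# Procyon itself only needs the resulting files on disk.
from huggingface_hub import snapshot_download

# Repository ID as listed in the HFID rows below.
local_path = snapshot_download(
    repo_id="stabilityai/stable-diffusion-xl-base-1.0",
    local_dir="downloads/stabilityai/stable-diffusion-xl-base-1.0",
)
print(f"Model files downloaded to: {local_path}")
```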
Pytorch Models
Stable Diffusion 1.5
HFID | nmkd/stable-diffusion-1.5-fp16 |
Link | https://huggingface.co/nmkd/stable-diffusion-1.5-fp16/tree/main |
Variant | |
Note | Used for TensorRT, Olive-optimized ONNX Runtime with DirectML, OpenVINO and Core ML. Conversion is run locally. |
Stable Diffusion XL
HFID | stabilityai/stable-diffusion-xl-base-1.0 |
Link | https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main |
Variant | Pytorch fp16 (safetensors) |
Note | Used for TensorRT, OpenVINO and Core ML. Conversion is run locally. Only the .safetensors variants of the models are needed. |
HFID | madebyollin/sdxl-vae-fp16-fix |
Link | https://huggingface.co/madebyollin/sdxl-vae-fp16-fix |
Variant | fp16 (safetensors) |
Note | Used for TensorRT, Olive Optimized model for ONNX Runtime with DirectML, OpenVINO and Core ML. Conversion is run locally. |
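Because only the .safetensors variants of the SDXL models are needed for local conversion, the download can be limited with a file-pattern filter. A hedged sketch with huggingface_hub; the exact pattern list is an assumption and may need adjusting to the files your conversion actually requires:

```python
# Sketch: download only the .safetensors weights plus the JSON/text configuration
# files, skipping other weight formats in the repository.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="stabilityai/stable-diffusion-xl-base-1.0",
    local_dir="downloads/stabilityai/stable-diffusion-xl-base-1.0",
    # Assumption: config and tokenizer files are also needed alongside the weights.
    allow_patterns=["*.safetensors", "*.json", "*.txt"],
)
```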
Converted Olive-optimized ONNX models
Stable Diffusion XL
HFID | greentree/SDXL-olive-optimized |
Link | https://huggingface.co/greentree/SDXL-olive-optimized/tree/main |
Variant | ONNX Olive Optimized (ONNX) |
Note | Used for ONNX Runtime with DirectML. No conversion is run. |
Converted AMD-optimized ONNX models
Stable Diffusion 1.5
HFID | amd/stable-diffusion-1.5_io16_amdgpu |
Link | https://huggingface.co/amd/stable-diffusion-1.5_io16_amdgpu |
Variant | AMD-optimized (ONNX) |
Use | Used for ONNX Runtime with DirectML. No conversion is run. |
Stable Diffusion XL
HFID | amd/stable-diffusion-xl-1.0_io16_amdgpu |
Link | https://huggingface.co/amd/stable-diffusion-xl-1.0_io16_amdgpu
Variant | AMD-optimized (ONNX) |
Use | Used for ONNX Runtime with DirectML. No conversion is run. |
Quantized OpenVINO models
Stable Diffusion 1.5
HFID | intel/sd-1.5-square-quantized |
Link | https://huggingface.co/Intel/sd-1.5-square-quantized/tree/main/INT8 |
Variant | Int8 Quantized OVIR |
Use | Used for OpenVINO Runtime with int8 precision. No conversion is run for these models. Requires the full SD15 fp16 pytorch models for converting the Text Encoder and VAE. |
Files | INT8/time_proj_constants.npy INT8/time_proj_constants.raw INT8/unet_int8.bin INT8/unet_int8.xml INT8/unet_time_proj.bin INT8/unet_time_proj.xml |
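Since only the files listed above are needed from the Intel repository, they can be fetched individually instead of downloading the whole repository. A sketch (again, huggingface_hub is just one possible tool):

```python
# Sketch: fetch only the INT8 UNet files listed above from the Intel repository.
from huggingface_hub import hf_hub_download

int8_files = [
    "INT8/time_proj_constants.npy",
    "INT8/time_proj_constants.raw",
    "INT8/unet_int8.bin",
    "INT8/unet_int8.xml",
    "INT8/unet_time_proj.bin",
    "INT8/unet_time_proj.xml",
]
for filename in int8_files:
    path = hf_hub_download(
        repo_id="Intel/sd-1.5-square-quantized",
        filename=filename,
        local_dir="downloads/intel/sd-1.5-square-quantized",
    )
    print(path)
```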
Quantized Qualcomm QNN models
Stable Diffusion 1.5
HFID | qualcomm/Stable-Diffusion-v1.5 |
Link | https://huggingface.co/qualcomm/Stable-Diffusion-v1.5 |
Variant | w8a16 Quantized QNN |
Note | Used for QNN Runtime with int8 precision. No conversion is run. Requires the UNET, tokenizer and scheduler of the original SD15 fp16 pytorch model to be placed on disk as well. |
Files | TextEncoder_Quantized.bin UNet_Quantized.bin VAEDecoder_Quantized.bin |
Installing the models
For Windows
By default, the benchmark is installed in
%ProgramData%\UL\Procyon\chops\dlc\ai-imagegeneration-benchmark\
- If it does not exist, create a subfolder named ‘models’ in this directory:
%ProgramData%\UL\Procyon\chops\dlc\ai-imagegeneration-benchmark\models
- In this ‘models’ folder, create the following subfolders based on the tests you are looking to run:
- For non-converted Pytorch models:
Create a subfolder 'pytorch' and place each full Pytorch model in it with the model's HF ID in the folder structure, e.g. ...\ai-imagegeneration-benchmark\models\pytorch\nmkd\stable-diffusion-1.5-fp16\<each subfolder of the model> (see the Python sketch after this list).
Please note:
The first run of benchmarks using these models can take significantly longer, as the models need to be converted.
- For converted Olive Optimized ONNX models for ONNX Runtime with DirectML:
Create a subfolder ‘onnx_olive_optimized’ and place each full model in it with the model’s HF ID in the folder structure, e.g. ...\ai-imagegeneration-benchmark\models\onnx_olive_optimized\greentree\SDXL-olive-optimized\<each subfolder of the model>
- For converted AMD Optimized ONNX models for ONNX Runtime with DirectML:
Create a subfolder ‘onnx_amd_optimized’ and place each full model in it with the model’s HF ID in the folder structure, e.g. ...\ai-imagegeneration-benchmark\models\onnx_amd_optimized\amd\stable-diffusion-1.5_io16_amdgpu\<each subfolder of the model>
- For quantized OVIR models for OpenVINO Runtime:
Create a directory ‘ovir\<HF ID>\unet’ and place each part of the int8 model in it, e.g. ...\ai-imagegeneration-benchmark\models\ovir\intel\sd-1.5-square-quantized\unet\<each required unet part>
- For quantized QNN models for QNN Runtime:
Create a directory ‘qnn\<HF ID>\<submodel>’ and place each model in it: ...\ai-imagegeneration-benchmark\models\qnn\qualcomm\Stable-Diffusion-v1.5\<submodel>\<submodel>.bin, keeping the original names of the files:
...\text_encoder\TextEncoder_Quantized.bin
...\unet\UNet_Quantized.bin
...\vae_decoder\VAEDecoder_Quantized.bin
Follow the instructions for non-converted Pytorch models above for the required Pytorch model files.
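As a sketch of the layout described above, the following places a non-converted Pytorch model directly into the expected pytorch\<HF ID> folder under the default Windows installation path. The use of huggingface_hub is an assumption; any tool that produces the same folder structure works equally well:

```python
# Sketch: download the SD 1.5 Pytorch model into the folder structure the
# benchmark expects on Windows. Assumes huggingface_hub is installed.
import os
from huggingface_hub import snapshot_download

models_dir = os.path.expandvars(
    r"%ProgramData%\UL\Procyon\chops\dlc\ai-imagegeneration-benchmark\models"
)
# The HF ID becomes part of the path: models\pytorch\nmkd\stable-diffusion-1.5-fp16
target = os.path.join(models_dir, "pytorch", "nmkd", "stable-diffusion-1.5-fp16")
os.makedirs(target, exist_ok=True)

snapshot_download(
    repo_id="nmkd/stable-diffusion-1.5-fp16",
    local_dir=target,
)
print(f"Model placed in {target}")
```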
Note:
Not all models for all engines need to be present in the installation directory at all times.
- For OpenVINO, only the OVIR models must exist.
- For ONNX Runtime-DirectML, only the Olive-optimized ONNX models must exist.
- For TensorRT, only the Engine created for the current settings (batch size, resolution) and hardware must exist. If the settings or hardware change, the Engine is regenerated from the CUDA-optimized ONNX models.
For macOS
The benchmark models are installed in one of two directories, depending on whether the AI Image Generation Benchmark is being run as the root user or not.
- When using the .pkg installed version of Procyon Image Generation as a non-root user (default), the benchmark models are installed into the following directory:
/Users/Shared/Library/UL/Procyon/mac-ai-imagegeneration-benchmark/models
- When run as root, the models are instead installed into the AI Image Generation Benchmark for macOS installation directory:
/Library/UL/Procyon/AIImageGeneration/chops/dlc/mac-ai-imagegeneration-benchmark/models
- When extracting the models from a .zip package, the models are installed into the extracted zip:
<path-to-extracted-zip>/AIImageGeneration/chops/dlc/mac-ai-imagegeneration-benchmark/models
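A small sketch that resolves the matching macOS models directory depending on whether the benchmark is run as root; the zip-extracted case is omitted because its path depends on where the archive was extracted:

```python
# Sketch: pick the macOS models directory used by the benchmark,
# based on whether it is being run as the root user.
import os

if os.geteuid() == 0:
    # Root: models live under the installation directory.
    models_dir = "/Library/UL/Procyon/AIImageGeneration/chops/dlc/mac-ai-imagegeneration-benchmark/models"
else:
    # Non-root (default .pkg install): models live under the shared user library.
    models_dir = "/Users/Shared/Library/UL/Procyon/mac-ai-imagegeneration-benchmark/models"

print(f"Place the downloaded models under: {models_dir}")
```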