A Complete Introduction to Polygraphy

Post author: darwin
Post published: March 21, 2026
Post category: Tool Tutorials

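Polygraphy is an open-source toolkit developed by NVIDIA for debugging and analyzing deep learning models. It is part of the NVIDIA TensorRT ecosystem, and its main purpose is to help developers quickly verify inference results, analyze performance, and debug accuracy before or after deploying a model to TensorRT.
Put simply, Polygraphy is a "lie detector" and a "stethoscope" for deep learning models: it checks whether numerical errors or performance bottlenecks appear when a model is converted between frameworks (such as ONNX Runtime and TensorRT).
With Polygraphy you can achieve the following core goals:
A. Verify that a model conversion is correct (Accuracy Verification)
- Goal: Ensure that after a model is converted from PyTorch/TensorFlow to ONNX or to a TensorRT engine, its outputs match those of the original model.
- How: Automatically compare the inference results of different backends (for example ONNX Runtime vs. TensorRT) on the same input, and report the error (absolute and relative).
B. Isolate and locate errors (Error Isolation)
- Goal: When accuracy drops after conversion, find exactly which layer or which subgraph is responsible.
- How: Use bisection or other strategies to narrow the search automatically and pin down the layer that breaks accuracy.
C. Profile and benchmark performance (Performance Profiling)
- Goal: Evaluate the model's inference speed on TensorRT.
- How: Run benchmarks that measure latency, throughput, and memory usage.
D. Simplify the debugging workflow (Debugging Workflow)
- Goal: Quickly inspect a model's architecture or clean up a model.
- How: View the model structure and modify ONNX models (for example by folding constants or extracting subgraphs).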

Installation
- pip install nvidia-pyindex
- pip install polygraphy
Polygraphy is organized into the six tools below, each suited to a different task. The most commonly used are run and inspect.
- run: Load/convert, run inference, compare accuracy across backends
- convert: Convert model to a specified format (e.g. TRT engine)
- inspect:
  - model: Show text representation of a model
  - data: Show details about pickled input/output data
  - tactics: Show contents of a Polygraphy tactic replay file
- surgeon:
  - extract: Extract a subgraph from an ONNX model
  - sanitize: Simplify a graph by removing dead layers, folding constants
  - insert: Insert node in ONNX model, optionally replacing subgraphs
- template:
  - trt-network: Generate template script to manually define TRT network
- debug:
  - build: Repeatedly build TensorRT engines
  - precision: Run in higher precision to preserve accuracy
  - diff-tactics: Figure out differences between multiple tactic replays
  - reduce: Reduce a failing model to minimal failing case

Analyzing a model: inspect

polygraphy inspect model $onnx_path \
--show layers --display-as trt > save.txt

--display-as trt: Display the layers as TensorRT sees them, with names generated by TensorRT. By default the layers are displayed from the ONNX model.
Output: the dump looks like the text below. The name to the right of the arrow is the node's output.
Node 1270 | /model.30/heads.2/anc2vec/anc2vec/Conv [Op: Conv]
    {/model.30/heads.2/anc2vec/Softmax_output_0,
     model.30.heads.2.anc2vec.anc2vec.weight}
     -> {/model.30/heads.2/anc2vec/anc2vec/Conv_output_0}
Node 1271 | /model.30/heads.2/anc2vec/Gather_4 [Op: Gather]
    {/model.30/heads.2/anc2vec/anc2vec/Conv_output_0,
     /model.22/heads.0/anc2vec/Constant_output_0}
     -> {output_17 [dtype=float32, shape=('batch', 'Gatheroutput_2_dim_1', 'Gatheroutput_2_dim_2', 'Gatheroutput_2_dim_3')]}
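When the dump is long, it can help to pull the output tensor names out of it programmatically, for instance to pick a name to pass to --trt-outputs. The following is only a minimal sketch: `parse_node_outputs` is a hypothetical helper (not part of Polygraphy), and it assumes the text layout shown above, which may differ between Polygraphy versions.

```python
import re

# Hypothetical helper, not part of Polygraphy. It assumes the dump layout
# shown above: a "Node N | name [Op: Type]" line followed by a "-> {outputs}" line.
NODE_RE = re.compile(r"Node\s+\d+\s+\|\s+(\S+)\s+\[Op:\s*(\w+)\]")
OUTPUT_RE = re.compile(r"-\s*>\s*\{(.*)\}")


def parse_node_outputs(text):
    """Return {node name: [output tensor names]} parsed from an inspect dump."""
    outputs = {}
    current = None
    for line in text.splitlines():
        node = NODE_RE.search(line)
        if node:
            current = node.group(1)
            outputs[current] = []
            continue
        out = OUTPUT_RE.search(line)
        if out and current is not None:
            # Drop the optional "[dtype=..., shape=...]" annotation before splitting.
            body = re.sub(r"\s*\[.*?\]", "", out.group(1))
            outputs[current] = [name.strip() for name in body.split(",") if name.strip()]
    return outputs


sample = """
Node 1270 | /model.30/heads.2/anc2vec/anc2vec/Conv [Op: Conv]
    {/model.30/heads.2/anc2vec/Softmax_output_0,
     model.30.heads.2.anc2vec.anc2vec.weight}
     -> {/model.30/heads.2/anc2vec/anc2vec/Conv_output_0}
"""
result = parse_node_outputs(sample)
print(result)
```

In practice the text would come from the saved file, for example `parse_node_outputs(open("save.txt").read())`.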

Comparing models: run

For example, to find where an ONNX model and the TensorRT engine built from it diverge, run the following command:

polygraphy run $onnx_path \
    --onnxrt \
    --trt \
    --int8 \
    --data-loader-script $DATA_PROVIDER \
    --calibration-cache polygraphy_dataprovider_500.cache \
    --trt-outputs $layer_name \
    --onnx-outputs $layer_name \
    --atol 0.05 --rtol 0.05 \
    --fail-fast

Accuracy alignment with run

polygraphy run demo_simplify.onnx \
        --trt --onnxrt \
        --trt-outputs mark all \
        --onnx-outputs mark all \
        --atol 1e-2 --rtol 1e-3 \
        --fail-fast \
        --val-range [0,1]

To check a model converted to INT8, add the calibration step:

polygraphy run $onnx_path \
    --onnxrt \
    --trt \
    --int8 \
    --data-loader-script $DATA_PROVIDER \
    --calibration-cache polygraphy_dataprovider_500.cache \
    --trt-outputs $layer_name \
    --onnx-outputs $layer_name \
    --atol 0.05 --rtol 0.05 \
    --fail-fast

# calibration.py
import os
import glob
import cv2
import numpy as np

# Path to your calibration dataset
DATA_DIR = ".../images"
HEIGHT = 640
WIDTH = 640
BATCH_SIZE = 1
MAX_IMAGES = 500 # Only need a few images to debug accuracy

def preprocess_image(img, target_size=(640, 640), background_color=(114, 114, 114)):
    target_w, target_h = target_size
    h, w = img.shape[:2]
    scale = min(target_w / w, target_h / h)
    new_w = int(w * scale)
    new_h = int(h * scale)
    resized_img = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LANCZOS4)
    padded_img = np.full((target_h, target_w, 3), background_color, dtype=np.uint8)
    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    padded_img[pad_top:pad_top+new_h, pad_left:pad_left+new_w] = resized_img
    return padded_img

def load_data():
    """
    Generator function for Polygraphy CLI.
    Yields a dictionary: {'input_name': numpy_array}
    """
    img_paths = glob.glob(os.path.join(DATA_DIR, "*.jpg")) + \
                glob.glob(os.path.join(DATA_DIR, "*.png"))
    img_paths = sorted(img_paths)[:MAX_IMAGES]

    print(f"[DataProvider] Loading {len(img_paths)} images from {DATA_DIR}")

    batch_data = []
    
    # NOTE: You might need to change 'images' to the actual input name of your model.
    # Check it with Netron or polygraphy inspect.
    input_name = "images" 

    for path in img_paths:
        img = cv2.imread(path)
        if img is None: continue
        
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = preprocess_image(img, (WIDTH, HEIGHT))
        img = img.astype(np.float32) / 255.0
        img = np.transpose(img, (2, 0, 1)) # CHW
        
        batch_data.append(img)

        if len(batch_data) == BATCH_SIZE:
            # Yield batch
            batch_np = np.ascontiguousarray(np.stack(batch_data, axis=0))
            yield {input_name: batch_np}
            batch_data = []

    if batch_data:
        batch_np = np.ascontiguousarray(np.stack(batch_data, axis=0))
        yield {input_name: batch_np}
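The letterbox arithmetic in preprocess_image is easy to get wrong, so it is worth checking on paper for one image size. The sketch below repeats only the geometry (no OpenCV needed). `letterbox_geometry` is a hypothetical helper introduced for illustration, and the 1280x720 input is an assumed example.

```python
def letterbox_geometry(w, h, target_w=640, target_h=640):
    """Mirror the scale and padding computed in preprocess_image."""
    # Use the smaller ratio so the whole image fits inside the target.
    scale = min(target_w / w, target_h / h)
    new_w, new_h = int(w * scale), int(h * scale)
    # Center the resized image; the rest is filled with the background color.
    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    return scale, (new_w, new_h), (pad_left, pad_top)


geom = letterbox_geometry(1280, 720)
print(geom)  # (0.5, (640, 360), (0, 140))
```

A 1280x720 frame is halved to 640x360 and padded with 140 rows above (and 140 below) to reach 640x640.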

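Parameters
- --onnxrt: Use ONNX Runtime as the reference (the expected answer).
- --trt: Use TensorRT as the backend under test.
- --int8: Enable INT8 quantization (which reduces precision).
- --data-loader-script: Supply real data from an external script (used for calibration) instead of random data.
- --calibration-cache: Path of the quantization calibration file, which records each layer's value range (scale). When this option is given, the full calibration process runs and the resulting cache is saved to this file. The next time the same command runs, it first looks for the file and loads it directly if it exists.
- --trt-outputs / --onnx-outputs: Force the given layers to be emitted as outputs so their values can be monitored. You can also pass mark all to compare every layer rather than a specific one, although in my experience this mode runs out of memory (OOM) easily.
- --atol / --rtol: Set the error tolerances (here an absolute error of 0.05 and a relative error of 5%).
- --fail-fast: Stop the program as soon as an error (a value outside the tolerance) is found.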
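To make the two tolerances concrete, here is a minimal sketch of an absolute/relative error check in plain Python. It uses the combined rule `|ref - out| <= atol + rtol * |ref|`, which is the convention of `numpy.allclose`. That rule is an assumption chosen for illustration, and Polygraphy's own pass/fail rule may combine the two tolerances differently. `error_report` is a hypothetical helper.

```python
def error_report(reference, candidate, atol=0.05, rtol=0.05):
    """Summarize absolute/relative error between two flat lists of floats."""
    abs_errs = [abs(r - c) for r, c in zip(reference, candidate)]
    # Relative error is undefined where the reference is zero, so skip those.
    rel_errs = [e / abs(r) for e, r in zip(abs_errs, reference) if r != 0]
    # numpy.allclose-style rule (an assumption, not necessarily Polygraphy's).
    within = all(e <= atol + rtol * abs(r) for e, r in zip(abs_errs, reference))
    return {
        "max_abs": max(abs_errs),
        "max_rel": max(rel_errs, default=0.0),
        "within_tolerance": within,
    }


reference = [1.0, 2.0, 4.0]          # e.g. ONNX Runtime outputs
report = error_report(reference, [1.02, 1.9, 4.5])
print(report["within_tolerance"])    # False: the last element is off by 0.5
close = error_report(reference, [1.01, 2.01, 4.01])
print(close["within_tolerance"])     # True
```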

Finding the best configuration with bisection

First run the two models separately to capture each one's outputs, and save them as JSON files.

polygraphy run $engine_path \
     --trt \
     --save-inputs golden_inputs_fp16.json \
     --save-outputs golden_outputs_fp16.json \
     --silent
     
polygraphy run $ONNX_MODEL \
    --onnxrt \
    --load-inputs golden_inputs_fp16.json \
    --save-outputs golden_outputs_onnx.json \
    --silent

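Create the comparison script (saved as scripts/check_engine.sh, the script that the --check command calls):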

polygraphy run $engine_path \
    --trt \
    --load-inputs golden_inputs_fp16.json \
    --load-outputs golden_outputs_onnx.json \
    --atol 0.1 --rtol 0.01 \
    --check-error-stat mean \
    --fail-fast > check_engine_run.log 2>&1
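The choice of --check-error-stat mean matters: comparing the mean error is far more forgiving than comparing the worst element. The sketch below only illustrates the difference between the two statistics on made-up numbers. It is not Polygraphy's comparison code, and the values are assumptions.

```python
from statistics import mean


def abs_errors(reference, candidate):
    """Element-wise absolute error between two flat lists."""
    return [abs(r - c) for r, c in zip(reference, candidate)]


reference = [1.0] * 100
candidate = [1.0] * 99 + [6.0]   # a single badly wrong element
errs = abs_errors(reference, candidate)

atol = 0.1
mean_ok = mean(errs) <= atol     # mean error is 0.05
max_ok = max(errs) <= atol       # worst error is 5.0
print(mean_ok, max_ok)           # True False
```

With a mean-based check, a single outlier like this passes, so the statistic should match what "acceptable accuracy" means for the model being debugged.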

Finally, run the bisection:

polygraphy debug precision $ONNX_MODEL \
    --int8 \
    --fp16 \
    --precision-constraints obey \
    --calibration-cache polygraphy_calibration.cache \
    --data-loader-script $DATA_PROVIDER \
    --data-loader-func-name load_data \
    --mode bisect \
    --log-file debug_precision.log \
    --check 'n=1; while [ -f polygraphy_debug_$n.engine ]; do n=$((n+1)); done; cp polygraphy_debug.engine polygraphy_debug_$n.engine; ./scripts/check_engine.sh'

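Parameters
- --int8 / --fp16: Enable both INT8 and FP16. INT8 is the precision under test (the one that may go wrong), and the layers the tool marks fall back to a higher precision.
- --precision-constraints obey: Force TensorRT to strictly obey the precision assigned to each layer and forbid automatic optimizations from changing it, so that the debugging result is reliable.
- --calibration-cache: Load the calibration data needed for INT8 quantization (it records each layer's dynamic range).
- --data-loader-script: The Python script that supplies real input data (used while building the engine).
- --data-loader-func-name: The name of the function in that script that loads the data.
- --check '…': Defines the validation logic. Here the command backs up the engine automatically and then runs check_engine.sh to decide whether the accuracy is acceptable.
- --mode bisect: Use bisection to search for the problem layers (faster than checking layer by layer), automatically narrowing down to the layers that cause the accuracy drop.
- --log-file: Write a detailed record of the debugging session to the given file.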
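The idea behind --mode bisect can be sketched in a few lines. This is a simplified illustration, not Polygraphy's implementation. It assumes that running more layers in higher precision never makes the check fail again (monotonic behavior). `min_layers_in_high_precision` and `fake_check` are hypothetical names, and `fake_check` stands in for "build the engine and run the check command".

```python
def min_layers_in_high_precision(num_layers, passes):
    """Smallest k such that running the first k layers in higher precision passes.

    `passes(k)` stands in for building an engine with k layers marked and
    running the check. Assumes that if k passes, every larger k passes too.
    """
    lo, hi = 0, num_layers
    if not passes(hi):
        return None  # even with every layer in higher precision the check fails
    while lo < hi:
        mid = (lo + hi) // 2
        if passes(mid):
            hi = mid        # mid is enough, try marking fewer layers
        else:
            lo = mid + 1    # mid is not enough, more layers are needed
    return lo


calls = 0


def fake_check(k):
    """Pretend the model only passes once at least 37 layers are marked."""
    global calls
    calls += 1
    return k >= 37


result = min_layers_in_high_precision(100, fake_check)
print(result)  # 37
```

For 100 layers the search needs only a handful of engine builds instead of up to 100, which is why bisection is the faster mode when each build takes minutes.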