Benchmark · February 2026

Person Detection on Snapdragon 8 Gen 3

FP32 vs INT8 model validation with automated quality gates and Ed25519-signed evidence bundles. Tested on real hardware via Qualcomm AI Hub.

Inference Time: FP32 0.1760 ms • INT8 0.1870 ms • Gate ≤ 1.0 ms

Peak Memory: FP32 121.51 MB • INT8 124.66 MB • Gate ≤ 150 MB

Model Size: FP32 1.07 MB • INT8 0.32 MB

Both models passed all quality gates

2/2 gates passed for FP32 • 2/2 gates passed for INT8 • Results are Ed25519-signed and tamper-evident

Executive Summary

EdgeGate automatically validated both FP32 and INT8 variants of a person-detection model on a real Snapdragon 8 Gen 3 (sm8650) device via Qualcomm AI Hub. Both models achieved sub-millisecond inference and passed all quality gates. The results are captured in Ed25519-signed evidence reports that provide tamper-evident proof of compliance.

The INT8 model is 70.5% smaller than FP32 (322 KB vs 1.07 MB) while maintaining comparable sub-millisecond inference. Both pass identical quality gates automatically.

Test Setup

Model Architecture

MobileNet-style depthwise separable CNN for binary person detection. Accepts 224×224 RGB images normalized to [0,1]. Outputs class logits for [no_person, person].

FP32: 270,146 parameters • 1.07 MB ONNX

INT8: 71,074 parameters • 322 KB ONNX
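The small on-disk footprint follows from the depthwise separable design: each standard convolution is replaced by a per-channel depthwise filter plus a 1×1 pointwise mix. A quick parameter count makes the savings concrete. The layer sizes below (3×3 kernel, 32→64 channels) are illustrative only, not the actual model's dimensions:

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Parameters in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Parameters in a depthwise separable replacement (no bias)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv mixes channels
    return depthwise + pointwise

# Illustrative layer: 3x3 kernel, 32 input channels, 64 output channels.
std = standard_conv_params(3, 32, 64)        # 18,432 parameters
sep = depthwise_separable_params(3, 32, 64)  # 2,336 parameters (~7.9x fewer)
print(std, sep)
```

The same substitution applied across every layer is what keeps a 270k-parameter detector around 1 MB in ONNX form.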

Target Device

Qualcomm Snapdragon 8 Gen 3 (sm8650), Samsung Galaxy S24 family. Tested via Qualcomm AI Hub cloud device farm.

Config: "Snapdragon 8 gen 3 multi"

Access: Qualcomm AI Hub API

Detailed Results

Metric             FP32          INT8          Delta
Inference time     0.1760 ms     0.1870 ms     +6.2%
Peak memory        121.51 MB     124.66 MB     +2.6%
Model file size    1.07 MB       322 KB        -70.5%
Parameters         270,146       71,074        -73.7%
Gate pass rate     2/2 (100%)    2/2 (100%)

Gate Evaluation

Gate                 Threshold    FP32                               INT8
inference_time_ms    ≤ 1.0 ms     ✓ PASS (0.176 ms, 82.4% margin)    ✓ PASS (0.187 ms, 81.3% margin)
peak_memory_mb       ≤ 150 MB     ✓ PASS (121.51 MB, 19.0% margin)   ✓ PASS (124.66 MB, 16.9% margin)
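The margin figures in the table reduce to a one-line headroom computation against each threshold. A minimal sketch (thresholds taken from the gate table above; not EdgeGate's actual implementation):

```python
def evaluate_gate(value: float, threshold: float) -> tuple[bool, float]:
    """Return (passed, margin): margin is the fraction of the threshold
    left unused, e.g. 0.176 ms against a 1.0 ms gate leaves 82.4%."""
    passed = value <= threshold
    margin = (threshold - value) / threshold
    return passed, margin

gates = {"inference_time_ms": 1.0, "peak_memory_mb": 150.0}
metrics_fp32 = {"inference_time_ms": 0.176, "peak_memory_mb": 121.51}

for name, threshold in gates.items():
    passed, margin = evaluate_gate(metrics_fp32[name], threshold)
    print(f"{name}: {'PASS' if passed else 'FAIL'} ({margin:.1%} margin)")
```

Run against the FP32 metrics this reproduces the 82.4% and 19.0% margins shown in the table.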

Key Observations

Both models are production-ready. Sub-millisecond inference with 80%+ margin against the gate threshold means there is significant headroom for more complex models or tighter latency budgets.

Model size is the primary INT8 benefit. The INT8 variant is 70.5% smaller on disk, which matters for OTA updates, storage-constrained devices, and download times. Inference performance is comparable.

Runtime memory is similar. Despite having 73.7% fewer parameters, the INT8 model uses slightly more peak memory (124.66 MB vs 121.51 MB). This is expected — the Snapdragon runtime allocates device-level resources that aren't proportional to model size alone.

Automated gating works. Both models were automatically evaluated against quality gates with clear PASS/FAIL verdicts. No manual interpretation needed.

Evidence & Auditability

Each benchmark run produced a signed evidence report containing model identity (SHA-256 hash), device attestation (hardware ID, firmware version, runtime configuration), test configuration, raw metrics, gate verdicts, and an Ed25519 cryptographic signature that makes any tampering with the results detectable.
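Signing and verifying such a report can be sketched with the `cryptography` package's Ed25519 primitives. This is an illustrative sketch, not EdgeGate's actual report format; the canonical-JSON serialization is an assumption made so that signer and verifier hash identical bytes:

```python
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

def _canonical(report: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte encoding.
    return json.dumps(report, sort_keys=True, separators=(",", ":")).encode()

def sign_report(report: dict, key: Ed25519PrivateKey) -> bytes:
    return key.sign(_canonical(report))

def verify_report(report: dict, sig: bytes, pub: Ed25519PublicKey) -> bool:
    try:
        pub.verify(sig, _canonical(report))
        return True
    except InvalidSignature:
        return False

key = Ed25519PrivateKey.generate()
report = {"model_sha256": "0a1baffb1197", "inference_time_ms": 0.176}
sig = sign_report(report, key)
print(verify_report(report, sig, key.public_key()))  # True
```

Any edit to the report body (say, lowering the reported latency) invalidates the signature, which is what makes the evidence tamper-evident rather than merely logged.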

FP32 Evidence Report

dc2e9f67

SHA-256: 0a1baffb1197...

INT8 Evidence Report

875d3c6f

SHA-256: ef1d360ccc60...

Methodology

Models were uploaded to Qualcomm AI Hub as ONNX format and compiled for the target Snapdragon 8 Gen 3 chipset. AI Hub ran on-device profiling and returned inference time and peak memory metrics. EdgeGate consumed these metrics, evaluated them against configured gates, and generated signed evidence reports.
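The upload → compile → profile flow above can be sketched with Qualcomm's `qai_hub` Python client. Treat the function below as a sketch: the device name, job arguments, and profile-JSON field names are assumptions to check against the AI Hub documentation for your SDK version, and the call requires an AI Hub account, so it is not runnable offline.

```python
def us_to_ms(microseconds: float) -> float:
    """AI Hub reports latency in microseconds; the gates here are in ms."""
    return microseconds / 1000.0

def profile_on_snapdragon(onnx_path: str) -> dict:
    """Upload, compile, and profile an ONNX model on a real device.
    Requires AI Hub credentials; field names below are assumptions."""
    import qai_hub as hub  # third-party client: pip install qai-hub

    device = hub.Device("Samsung Galaxy S24 (Family)")  # Snapdragon 8 Gen 3
    compile_job = hub.submit_compile_job(model=onnx_path, device=device)
    profile_job = hub.submit_profile_job(
        model=compile_job.get_target_model(), device=device
    )
    profile = profile_job.download_profile()  # dict of on-device metrics
    summary = profile["execution_summary"]    # assumed key; check SDK docs
    return {
        "inference_time_ms": us_to_ms(summary["estimated_inference_time"]),
        "peak_memory_mb": summary["estimated_inference_peak_memory"] / 2**20,
    }
```

The returned dict is the shape of input a gate evaluator would consume before signing the evidence report.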

Limitations: This benchmark covers a single model architecture (a person-detection CNN) on one device family (Snapdragon 8 Gen 3). The INT8 variant was produced by channel reduction rather than true post-training quantization, so the size and latency deltas reported here may not transfer to a genuinely quantized model. Only a single test configuration was used.

Run your own benchmark

EdgeGate tests your models on real Snapdragon devices with automated quality gates and signed evidence. Free tier includes 10 runs/month.