Your edge AI passed in the cloud. Can you prove it on the device?
Quantize a model for the NPU and it can silently lose accuracy, fall back to CPU, or burn 4× the battery. When a safety assessor asks what shipped, a green checkmark isn't evidence. EdgeGate gates every model on real Snapdragon hardware in CI and signs an ISO 26262-traceable evidence bundle for every result.
Your model never leaves your AWS account.
From model upload to hardware-validated CI
Watch how EdgeGate catches on-device regressions before they reach production — in under 60 seconds.
Edge AI breaks in ways you can't test for — until now
Emulators and cloud GPUs can't replicate what happens on a real Snapdragon device in the field.
Cloud tests lie
Your model scores 95% accuracy on cloud GPUs. On a Snapdragon chipset running at 40°C? It drops to 71%. Cloud benchmarks don’t predict on-device behavior.
Hardware is unpredictable
Thermal throttling, firmware updates, and power states change how your model runs. These variables don’t exist in simulation — only on real silicon.
Regressions ship silently
A weight update that looks fine in your training pipeline quietly triples latency on device. Without hardware-in-the-loop CI, you won’t know until users complain.
From git push to hardware-validated in minutes
Add hardware regression testing to your existing CI/CD pipeline. No infrastructure to manage.
Push your model
Upload your ONNX model and create a pipeline in the dashboard. Set pass/fail gates for inference time and peak memory on your target device.
# Pipeline config (via Dashboard or API)
model: resnet18_fp32.onnx
format: onnx # embedded weights
gates:
inference_time_ms: "<=1.0"
peak_memory_mb: "<=150"
device: sm8650 # Samsung Galaxy S24
Test on real hardware
EdgeGate runs your model on physical Snapdragon devices through Qualcomm AI Hub. No emulators. Median-of-N measurements with warmup exclusion for deterministic results.
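As a rough illustration of what "median-of-N with warmup exclusion" means, the sketch below drops the first few timings and takes the median of the rest. The function name, the `warmup` parameter, and the sample timings are assumptions made for this example, not EdgeGate's actual implementation.

```python
from statistics import median


def median_latency_ms(samples_ms, warmup=3):
    """Drop the first `warmup` samples, then return the median of the rest."""
    if warmup < 0:
        raise ValueError("warmup must be non-negative")
    steady = list(samples_ms)[warmup:]
    if not steady:
        raise ValueError("need at least one sample after warmup")
    return median(steady)


# Hypothetical timings: two slow warmup runs, then steady state with one outlier.
timings = [5.0, 2.0, 0.18, 0.17, 0.90, 0.18, 0.17]
result = median_latency_ms(timings, warmup=2)
```

The median is used rather than the mean so that a single outlier, such as the 0.90 ms sample above, does not move the reported figure.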
[EdgeGate] Device: Samsung Galaxy S24 (SM8650)
[EdgeGate] Compiling via Qualcomm AI Hub...
[EdgeGate] Profiling on-device (median-of-N)
[EdgeGate] Inference: 0.176ms ✓ (gate: ≤1.0ms)
[EdgeGate] Peak memory: 121.51 MB ✓ (gate: ≤150 MB)
[EdgeGate] Model size: 1.07 MB (270,146 params)
Gate your PR
Results flow back to your CI pipeline as a pass/fail gate. Failed gates block the merge. Every run produces a signed evidence bundle with SHA-256 hashes for auditability.
✓ 2/2 GATES PASSED — PR #247 can merge
Evidence bundle: dc2e9f67
model_hash: sha256:4f8a2c...
signed: Ed25519 (workspace key)
device: SM8650 (Samsung Galaxy S24)
inference: 0.176ms | memory: 121.51 MB
Set up edge-AI CI gates by talking to Claude.
EdgeGate's MCP server turns 10 minutes of dashboard clicks into one prompt. Works with Claude Code, Cursor, and Claude Desktop.
You
Use the edgegate MCP to set up a CI gate for my MobileNet ONNX model. Gates: inference_time_ms ≤ 10, peak_memory_mb ≤ 150. Devices: Galaxy S24, Galaxy S23.
{ "name": "Mobile Team" }
✓ workspace_id: ws_9f2a1c
{ "path": "./mobilenet_v2.onnx" }
✓ model_id: mdl_3e8b07
{ "name": "MobileNet Gates", "devices": ["sm8650","sm8550"],
  "gates": { "inference_time_ms": "<=10", "peak_memory_mb": "<=150" } }
✓ pipeline_id: pip_7c4d22
{ "pipeline_id": "pip_7c4d22" }
✓ run_id: jgomvedq5 (in flight)
{ "workspace_id": "ws_9f2a1c" }
✓ .github/workflows/edgegate.yml ready to commit
Claude
Done. Workspace Mobile Team, pipeline MobileNet Gates created, run jgomvedq5 in flight, and .github/workflows/edgegate.yml is ready to commit. Want me to push it?
What you would have done manually
1. Log into EdgeGate dashboard
2. Navigate to Settings → AI Hub Integration
3. Paste Qualcomm AI Hub API token
4. Create new workspace for your team
5. Click Upload Model → pick your ONNX file
6. Wait for upload (varies by model size)
7. Create pipeline → set name and description
8. Add devices: Galaxy S24, Galaxy S23
9. Set gates: inference_time_ms ≤ 10, peak_memory_mb ≤ 150
10. Settings → CI/CD → Generate HMAC key
11. Copy workflow YAML → add to .github/workflows/
12. Commit and push to verify CI wires up
30 seconds, not 10 minutes
Total wall clock from prompt to wired CI. Includes workspace creation, model upload, pipeline config, and the gh commands.
Stays in your editor
Never context-switch to the browser. The MCP server is a real MCP tool running locally — not a chat widget or browser extension.
Same signed evidence bundles
MCP calls hit the same API as the dashboard. Audit trails, Ed25519 signatures, and SHA-256 manifests are identical.
Get started in 2 steps
# 1. Generate an API key in Settings → API Keys
# 2. Install in Claude Code (MCP + 6 slash commands):
/plugin install https://github.com/frozo-ai/edgegate-mcp
# Cursor / Claude Desktop? Use the interactive installer:
npx edgegate-mcp-install
# That's it. Restart your MCP client and start prompting.
Everything you need to ship edge AI with confidence
Purpose-built for teams deploying AI models to Snapdragon-powered devices.
Quantized models silently fall back to CPU — 4× the battery, half the speed
Silent NPU Fallback Detection
EdgeGate measures npu_compute_percent on every run and blocks merges when execution drifts off the NPU. Catches the failure mode that latency-only gates miss — inference passes, but the model is now running on a path that drains battery in hours, not days.
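To make the failure mode concrete, here is a minimal sketch of how gate expressions in the `"<=1.0"` style could be evaluated, with an NPU residency gate next to a latency gate. The `>=` operator, the 90% threshold, and the helper names are assumptions for illustration; only the `"<=..."` string form and the metric names come from this page.

```python
import operator
import re

# Comparison operators a gate expression may start with (assumed set).
_OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}
_GATE_RE = re.compile(r"^\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def check_gate(expr, measured):
    """Return True if `measured` satisfies an expression such as '<=1.0'."""
    match = _GATE_RE.match(expr)
    if match is None:
        raise ValueError(f"unrecognized gate expression: {expr!r}")
    op, threshold = match.group(1), float(match.group(2))
    return _OPS[op](measured, threshold)


def evaluate(gates, metrics):
    """Map each gate name to pass/fail for the measured metrics."""
    return {name: check_gate(expr, metrics[name]) for name, expr in gates.items()}


# Hypothetical run: latency is fine, but most compute has left the NPU.
gates = {"inference_time_ms": "<=1.0", "npu_compute_percent": ">=90"}
metrics = {"inference_time_ms": 0.176, "npu_compute_percent": 41.0}
results = evaluate(gates, metrics)
merge_allowed = all(results.values())
```

In this hypothetical run a latency-only gate would pass, while the residency gate fails and blocks the merge.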
Testing model variants across devices is a serial, slow process
Multi-Model × Multi-Device Matrix
Compare FP32, INT8, pruned, distilled — across Snapdragon 8 Gen 3, X Elite, and your full target fleet — in one run. Up to 25 cells, executed in parallel via Celery chord fan-out. One gate blocks the merge if any cell fails.
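The matrix idea can be sketched in a few lines: take the cross product of model variants and devices, enforce the 25-cell cap, run the cells concurrently, and pass only if every cell passes. A thread pool stands in for the Celery chord here, and `fake_run_cell` is a hypothetical placeholder for a real on-device run.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

MAX_CELLS = 25  # cap stated in the text above


def build_matrix(models, devices):
    """Return every (model, device) cell, refusing oversized matrices."""
    cells = list(product(models, devices))
    if len(cells) > MAX_CELLS:
        raise ValueError(f"{len(cells)} cells exceeds the limit of {MAX_CELLS}")
    return cells


def run_matrix(cells, run_cell):
    """Run all cells concurrently and return {cell: passed}."""
    with ThreadPoolExecutor(max_workers=len(cells) or 1) as pool:
        outcomes = list(pool.map(run_cell, cells))
    return dict(zip(cells, outcomes))


def fake_run_cell(cell):
    """Hypothetical stand-in: pretend INT8 regresses on sm8550 only."""
    model, device = cell
    return not (model == "int8" and device == "sm8550")


cells = build_matrix(["fp32", "int8"], ["sm8650", "sm8550"])
results = run_matrix(cells, fake_run_cell)
merge_allowed = all(results.values())  # one failing cell blocks the merge
```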
You can see the new run’s metrics, but not what changed since main
Run-Over-Run Signed Diffs
Every run computes a structured diff against its baseline: metric deltas, gate flips, per-device breakdown. The diff is folded into the signed evidence bundle, so the commit→commit comparison is cryptographically anchored — not just a UI rendering.
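A structured diff of this kind might look like the sketch below, which compares one run against its baseline for a single device. The dictionary layout and field names are assumptions for illustration and do not reflect EdgeGate's bundle schema; the second run's latency is made up.

```python
def diff_runs(baseline, current):
    """Compute metric deltas and gate flips between two runs."""
    metric_deltas = {}
    for name, new_value in current["metrics"].items():
        old_value = baseline["metrics"].get(name)
        if old_value is not None:
            metric_deltas[name] = round(new_value - old_value, 6)

    gate_flips = {}
    for name, new_passed in current["gates"].items():
        old_passed = baseline["gates"].get(name)
        if old_passed is not None and old_passed != new_passed:
            gate_flips[name] = {"from": old_passed, "to": new_passed}

    return {"metric_deltas": metric_deltas, "gate_flips": gate_flips}


# Hypothetical runs: latency regresses past its gate, memory is unchanged.
baseline = {
    "metrics": {"inference_time_ms": 0.176, "peak_memory_mb": 121.51},
    "gates": {"inference_time_ms": True, "peak_memory_mb": True},
}
current = {
    "metrics": {"inference_time_ms": 1.2, "peak_memory_mb": 121.51},
    "gates": {"inference_time_ms": False, "peak_memory_mb": True},
}
diff = diff_runs(baseline, current)
```

Because the diff is plain data, it can be serialized and hashed alongside the other bundle contents before signing.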
Vision gates don’t catch LLM regressions — latency means nothing without throughput
LLM Gating (TTFT · TPS · NPU%)
First-class gates for time-to-first-token, tokens/sec, and NPU residency on small LLMs (TinyLlama, Phi-3-mini, Llama 3.2 1B/3B). Catch the case where a quantization tweak halves tokens/sec while leaving time-to-first-token unchanged.
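For readers new to these metrics, the sketch below derives TTFT and decode throughput from token arrival timestamps. It assumes throughput is measured over the tokens after the first one; other definitions exist, and the function name and timestamps are hypothetical.

```python
def llm_metrics(request_start_s, token_times_s):
    """Return TTFT in ms and decode tokens/sec from token arrival times (seconds)."""
    if not token_times_s:
        raise ValueError("need at least one generated token")
    ttft_ms = (token_times_s[0] - request_start_s) * 1000.0
    decode_span_s = token_times_s[-1] - token_times_s[0]
    if len(token_times_s) < 2 or decode_span_s <= 0:
        tokens_per_sec = None  # throughput is undefined for a single token
    else:
        tokens_per_sec = (len(token_times_s) - 1) / decode_span_s
    return {"ttft_ms": ttft_ms, "tokens_per_sec": tokens_per_sec}


# Hypothetical traces: same first-token time, but decoding is twice as slow.
before = llm_metrics(0.0, [0.25, 0.5, 0.75, 1.0, 1.25])
after = llm_metrics(0.0, [0.25, 0.75, 1.25, 1.75, 2.25])
```

Here both traces report the same TTFT, so a TTFT-only gate would pass, while the tokens/sec gate sees throughput drop by half.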
Downstream systems trust your CI status, but can’t verify it themselves
Signed Bundle Registry
Every run produces an Ed25519-signed evidence bundle exposed via a public registry. Fleet management, MLOps, and compliance tools fetch + verify bundles directly. Two-sided audit log makes tampering detectable from either end.
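On the consumer side, the hash half of verification can be sketched with the standard library alone. The manifest layout below is an assumption, borrowing only the `sha256:` prefix shown in the evidence bundle output; a real consumer would first verify the Ed25519 signature over the manifest, which needs a third-party library and is left out here.

```python
import hashlib


def sha256_hex(data):
    """Return a digest string in the 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_manifest(artifacts):
    """Map each artifact name to its digest, in sorted name order."""
    return {name: sha256_hex(data) for name, data in sorted(artifacts.items())}


def verify_manifest(manifest, artifacts):
    """Return the names that are missing or whose content does not match."""
    return [
        name
        for name, digest in manifest.items()
        if name not in artifacts or sha256_hex(artifacts[name]) != digest
    ]


# Hypothetical bundle contents.
artifacts = {"model.onnx": b"model-bytes", "metrics.json": b'{"inference_ms": 0.176}'}
manifest = build_manifest(artifacts)
clean = verify_manifest(manifest, artifacts)

tampered = dict(artifacts, **{"model.onnx": b"different-bytes"})
mismatches = verify_manifest(manifest, tampered)
```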
Hardware testing is a manual, out-of-band process
CI/CD Native
Drop EdgeGate into GitHub Actions or GitLab CI. One workflow step. Results appear as PR checks. HMAC-signed CI requests with replay protection. Workspace concurrency = 1, so parallel CI jobs serialize cleanly without stomping on devices.
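One common way to combine HMAC signing with replay protection is to sign a timestamp and a nonce together with the request body, then reject stale timestamps and reused nonces. The sketch below follows that pattern; the message layout, the 300-second window, and all names are assumptions, not EdgeGate's wire format.

```python
import hashlib
import hmac
import time

MAX_SKEW_S = 300  # assumed freshness window


def sign(secret, timestamp, nonce, body):
    """HMAC-SHA256 over timestamp, nonce, and body."""
    message = f"{timestamp}.{nonce}.".encode() + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class Verifier:
    def __init__(self, secret, now=time.time):
        self._secret = secret
        self._now = now
        self._seen = {}  # nonce -> timestamp; a real service would evict old entries

    def verify(self, timestamp, nonce, body, signature):
        if abs(self._now() - timestamp) > MAX_SKEW_S:
            return False  # too old or too far in the future
        expected = sign(self._secret, timestamp, nonce, body)
        if not hmac.compare_digest(expected, signature):
            return False  # wrong key or tampered request
        if nonce in self._seen:
            return False  # replayed request
        self._seen[nonce] = timestamp
        return True


secret = b"hypothetical-workspace-hmac-key"
fixed_now = 1_700_000_000
verifier = Verifier(secret, now=lambda: fixed_now)
body = b'{"pipeline_id": "pip_7c4d22"}'
sig = sign(secret, fixed_now, "nonce-1", body)
first = verifier.verify(fixed_now, "nonce-1", body, sig)   # accepted
replay = verifier.verify(fixed_now, "nonce-1", body, sig)  # rejected: nonce reused
```

The nonce is recorded only after the signature checks out, so unauthenticated requests cannot fill the nonce cache.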
Three guarantees, by design.
Run gates on your proprietary models without handing them over.
Read-only IAM
s3:GetObject only, which also authorizes HeadObject requests. Never PutObject or Delete*.
External ID required
Workspace-scoped UUID stops confused-deputy attacks at STS.
1:1 CloudTrail correlation
Every aws_request_id in our audit appears in your CloudTrail.
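The first two guarantees map onto two standard IAM documents in your account: a permissions policy granting read-only object access, and a role trust policy that requires an external ID. The sketch below builds both as plain dictionaries. The bucket name, account ID, and external ID are placeholders, and the exact policy EdgeGate asks for may differ.

```python
import json


def read_only_policy(bucket, prefix=""):
    """Permissions policy: object reads only, no writes or deletes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # In IAM, HEAD requests on objects are authorized by s3:GetObject.
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


def trust_policy(vendor_account_id, external_id):
    """Trust policy: the vendor may assume the role only with the external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{vendor_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
permissions = read_only_policy("my-models-bucket", prefix="edgegate/")
trust = trust_policy("111122223333", "00000000-0000-0000-0000-000000000000")
permissions_json = json.dumps(permissions, indent=2)
```

With the `sts:ExternalId` condition in place, an AssumeRole call that omits or mismatches the external ID is denied at STS, which is what blocks the confused-deputy case.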
Unprecedented Performance Insights
Visualize your model's performance delta across different firmware versions, temperatures, and battery states. Our dashboard provides a granular view that emulators simply cannot match.
- Inference Latency (Median-of-N Gating)
- Peak Memory vs. Gate Threshold
- FP32 vs INT8 Model Comparison
For teams shipping AI to real devices
Whether you're building robots, drones, smart cameras, or mobile AI features — if it runs on Snapdragon, EdgeGate is your regression safety net.
ML Engineers
You train and optimize models for edge deployment. EdgeGate lets you validate that your INT8 quantization actually works on target hardware before merging.
- Model quantization validation
- Accuracy regression checks
- Cross-device compatibility testing
Embedded / IoT Engineers
You build firmware and applications for Snapdragon-powered devices. EdgeGate catches latency regressions and thermal issues your desktop benchmarks miss.
- Latency gate enforcement
- Thermal throttling detection
- Firmware update impact testing
DevOps / ML Platform Teams
You own the CI/CD pipeline. EdgeGate plugs into GitHub Actions or GitLab CI with one YAML file and gives you deterministic hardware gates.
- CI/CD integration in minutes
- HMAC-signed webhook triggers
- Signed evidence for audit trails
Industries using EdgeGate
Stop shipping blind to hardware
Join the waitlist for early access. Be the first to add hardware regression gates to your CI pipeline.