Construction safety technology vendors who process video in AWS or Azure are building a product that sounds useful but doesn't perform the most safety-critical function: real-time proximity alerting. They've optimized for the wrong constraint. Scalable cloud infrastructure is valuable for analytics, reporting, and model training. It is not appropriate for the detection-to-alert path that prevents struck-by fatalities.
This isn't a controversial claim — it's basic network physics. Here's the math that shows why.
The latency budget for a proximity alert
Consider the scenario that kills the most construction workers: a worker entering the swing radius of an operating tower crane. The crane operator may have limited sight lines; the site foreman may be on the far side of the building. The detection system is the only entity that sees the hazard in real time. How long does it have to alert the foreman before the worker is at genuine risk?
A tower crane slewing at 0.3 rpm covers roughly 1.8 degrees per second. At 40-meter radius, a point on the crane hook travels approximately 1.25 meters per second during a slew. A worker standing at the exclusion zone boundary who doesn't immediately retreat will be struck in 3-5 seconds if the crane continues the slew. The alert latency budget — from detection to corrective action — is realistically 2-3 seconds for a proximity alert to be operationally useful rather than a post-incident record.
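The kinematics above reduce to a two-line calculation. A quick sanity check, using the slew rate and radius from this scenario (illustrative figures, not universal constants):

```python
import math

# Figures from the scenario above: slew rate and jib radius (illustrative).
SLEW_RPM = 0.3          # tower crane slew speed, revolutions per minute
RADIUS_M = 40.0         # distance from mast to hook, meters

deg_per_s = SLEW_RPM * 360.0 / 60.0             # angular speed: 1.8 deg/s
tip_speed = math.radians(deg_per_s) * RADIUS_M  # linear hook speed, m/s

print(f"angular speed: {deg_per_s:.1f} deg/s")
print(f"hook speed:    {tip_speed:.2f} m/s")

# Time for the hook to cover a 5 m gap to a worker at the exclusion boundary
gap_m = 5.0
print(f"time to contact over {gap_m} m gap: {gap_m / tip_speed:.1f} s")
```

At roughly 1.26 m/s of hook travel, a 5-meter gap closes in about 4 seconds, which is where the 3-5 second figure comes from.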
Now map that budget to a cloud inference architecture.
Where cloud latency accumulates
Step one: video frame capture and encoding on the camera. An IP camera encoding at H.264 with a 500ms GOP adds 250-500ms of encoding latency before the frame is available for transmission. Some higher-end cameras process at lower latency, but $800 jobsite-grade IP cameras running RTSP streams do not.
Step two: network transmission from the jobsite to the cloud. Round-trip time to AWS us-east-1 from Houston measures 35-55ms under good conditions. However, construction site internet connectivity comes from LTE routers, CBRS fixed wireless, or whatever the site's ISP provides, and a 48-camera deployment at 720p needs roughly 150 Mbps of sustained upstream bandwidth. Congestion on shared LTE connections adds intermittent latency spikes of 200-500ms during peak usage periods like shift start and lunch breaks.
Step three: inference compute at the cloud endpoint. YOLOv8 inference on a single 720p frame takes approximately 15-25ms on an A10G GPU. But a 48-stream deployment isn't processing one frame at a time; it's processing 48 frames at 15fps — 720 inference requests per second. Queuing under that load adds variable latency depending on instance sizing. At minimum, assume 50-100ms for compute under load.
Step four: alert delivery. REST API call from cloud inference server to mobile push notification service (Firebase Cloud Messaging, Apple APNs) — another 80-150ms to the supervisor's phone app.
Total minimum latency, cloud path: 250ms (encoding) + 45ms (network) + 75ms (inference) + 100ms (alert delivery) = 470ms in a favorable scenario. In realistic conditions with encoding overhead and network congestion: 800ms-1.5 seconds for the alert to reach the supervisor. Add 2-3 seconds for the supervisor to register the notification and react. You're at 3-4.5 seconds total from event to human response.
For PPE compliance detection — a worker who's been on site all day without a hard hat — that latency is acceptable. For proximity alerts near moving equipment, it's not. At the hook speed worked out above, the crane has swept roughly 4-6 meters in the time it took the alert to arrive.
How edge inference changes the latency budget
With on-site edge inference, the architecture eliminates the network round-trip for the detection path. Cameras transmit RTSP streams over the local site network — typically wired Ethernet or high-bandwidth 5GHz Wi-Fi — to the edge server. Network hop from camera to edge server: 2-8ms. Inference on-device: 15-25ms. Alert dispatch to supervisor mobile via local Wi-Fi or on-site cellular: 80-150ms.
Total alert latency, edge path: 8ms (local network) + 20ms (inference) + 100ms (alert delivery) = approximately 130ms in typical conditions. Under load with 48 streams: 150-250ms. That's a 3-5x improvement over the cloud path under favorable conditions and a 6-10x improvement in real-world degraded conditions.
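The two budgets can be tallied directly from the step-by-step figures above. The values below are the illustrative midpoints used in this article, not field measurements:

```python
# Back-of-envelope latency budgets, in milliseconds (illustrative midpoints
# from the article's step-by-step breakdown, not measured data).
cloud_path = {
    "camera H.264 encoding":        250,
    "uplink to cloud region":        45,
    "cloud inference under load":    75,
    "push-notification delivery":   100,
}
edge_path = {
    "local network hop":              8,
    "on-device inference":           20,
    "push-notification delivery":   100,
}

cloud_total = sum(cloud_path.values())
edge_total = sum(edge_path.values())
print(f"cloud best-case total: {cloud_total} ms")
print(f"edge typical total:    {edge_total} ms")
print(f"speedup:               {cloud_total / edge_total:.1f}x")
```

Note that alert delivery to the supervisor's phone is the same 100ms in both paths; the edge architecture wins by deleting the encoding and wide-area network terms, not by delivering notifications faster.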
The 1.2-second alert latency we report in our product specifications is our 95th percentile measured figure at the Houston pilot site across all 48 camera streams. Average latency was 780ms. The worst 5% of alerts — typically during periods when the edge server was processing model update batches — took up to 2.1 seconds. Even the worst-case figure is within the safety-critical window for proximity alerts. The cloud architecture's best-case is not.
What the cloud handles well
Edge inference for the detection-to-alert path doesn't mean cloud is irrelevant to the architecture. Our platform sends detection event logs, BLE sensor telemetry, and summarized alert records to cloud storage after the fact — typically with 60-120 second delay. The cloud handles everything that doesn't require sub-second latency: OSHA 300 log generation, analytics dashboards, model retraining, multi-site reporting, and Procore/Autodesk ACC data synchronization.
The clean architectural boundary: real-time detection and alerting runs on edge hardware. Everything downstream of the detection event runs in the cloud. Vendors who run real-time inference in the cloud have made an architectural choice that compromises the primary safety function to reduce their hardware costs and capital requirements for customers. That tradeoff may be acceptable for non-safety applications; it's not acceptable for proximity alerting around moving equipment.
The bandwidth constraint also rules out pure cloud
Independent of latency, a 48-camera deployment at 720p/15fps generates approximately 144 Mbps of video data. Transmitting that continuously to the cloud requires a dedicated 150+ Mbps uplink — a connectivity requirement that most construction sites don't have and that costs $800-1,500 per month in enterprise cellular or fixed wireless service. Our edge architecture transmits only detection events and compressed thumbnail frames — approximately 2-5 Mbps of cloud-bound traffic for a fully loaded 48-camera site. The bandwidth reduction pays for the edge server hardware within 18-24 months at typical enterprise connectivity pricing.
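The uplink arithmetic, assuming a typical ~3 Mbps bitrate per 720p/15fps H.264 stream (an assumption; actual bitrate varies with scene complexity and encoder settings):

```python
# Aggregate video bandwidth for a 48-camera deployment, assuming ~3 Mbps per
# 720p/15fps H.264 stream. The per-stream bitrate is an assumption.
CAMERAS = 48
MBPS_PER_STREAM = 3.0

raw_uplink = CAMERAS * MBPS_PER_STREAM   # continuous streaming to the cloud
edge_uplink = (2.0, 5.0)                 # events + thumbnails only, per article

print(f"cloud-streaming uplink needed: {raw_uplink:.0f} Mbps")
print(f"edge cloud-bound traffic:      {edge_uplink[0]:.0f}-{edge_uplink[1]:.0f} Mbps")
print(f"reduction: ~{raw_uplink / edge_uplink[1]:.0f}x to {raw_uplink / edge_uplink[0]:.0f}x")
```

That is roughly a 30-70x reduction in cloud-bound traffic, which is what moves the site off enterprise-grade connectivity requirements.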
Site connectivity failure modes
Construction sites lose internet connectivity. LTE routers experience outages when the carrier has coverage gaps or tower maintenance. Scheduled maintenance on site wireless infrastructure happens. Fiber cuts occur during excavation work. A cloud-dependent architecture with no local fallback stops functioning during connectivity outages. Our edge architecture continues detecting and alerting via local site network during internet outages — detection events are queued locally and uploaded when connectivity resumes. For a safety system, degraded-but-functional behavior during connectivity failures is a hard requirement, not a nice-to-have.
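A minimal sketch of the store-and-forward pattern described above, using a local SQLite table as the durable queue. All names here are illustrative, not our production code; `upload_batch` stands in for whatever cloud client the real system uses:

```python
import json
import sqlite3
import time

class EventQueue:
    """Durable local queue: detections persist through internet outages
    and are drained to the cloud when connectivity resumes."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events "
            "(id INTEGER PRIMARY KEY, ts REAL, payload TEXT, uploaded INTEGER DEFAULT 0)"
        )

    def enqueue(self, event: dict) -> None:
        # Called on every detection, regardless of connectivity state.
        self.db.execute(
            "INSERT INTO events (ts, payload) VALUES (?, ?)",
            (time.time(), json.dumps(event)),
        )
        self.db.commit()

    def drain(self, upload_batch, batch_size=100) -> int:
        # Ship pending events; on failure, leave them queued for the next cycle.
        rows = self.db.execute(
            "SELECT id, payload FROM events WHERE uploaded = 0 LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return 0
        try:
            upload_batch([json.loads(p) for _, p in rows])
        except ConnectionError:
            return 0  # connectivity still down; events stay queued
        self.db.executemany(
            "UPDATE events SET uploaded = 1 WHERE id = ?",
            [(r[0],) for r in rows],
        )
        self.db.commit()
        return len(rows)
```

The key property is that `enqueue` never touches the network: local alerting and logging stay on the fast path, and `drain` runs on a background timer with failures treated as normal operation rather than errors.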
The cost argument for cloud is weaker than it looks
The business case for cloud inference is that it eliminates the need for on-site edge hardware. An edge server capable of running 48 RTSP streams costs $10,000-15,000. A cloud-only architecture eliminates that capital expense. But cloud compute for 720 inference requests per second — 48 streams at 15fps — costs approximately $3,200-4,800 per month on AWS (A10G instance at $0.76/hr, sized for peak load). That's $38,000-57,000 per year in cloud compute cost that the edge architecture doesn't incur. The edge server pays for itself in 3-4 months for a full 48-camera deployment.
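The payback arithmetic, using the cost ranges quoted above (illustrative figures from this article, not a vendor quote):

```python
# Payback arithmetic from the cost ranges above (illustrative, not a quote).
edge_capex = (10_000, 15_000)     # one-time edge server cost, USD
cloud_monthly = (3_200, 4_800)    # recurring cloud inference cost, USD/month

annual_cloud = (12 * cloud_monthly[0], 12 * cloud_monthly[1])
# Best case pairs the cheap server with the expensive cloud bill; worst case
# pairs the expensive server with the cheap cloud bill.
payback_months = (edge_capex[0] / cloud_monthly[1],
                  edge_capex[1] / cloud_monthly[0])

print(f"annual cloud compute: ${annual_cloud[0]:,}-${annual_cloud[1]:,}")
print(f"edge payback window:  {payback_months[0]:.1f}-{payback_months[1]:.1f} months")
```

The 3-4 month payback figure in the text sits inside this roughly 2-5 month bracket.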
If you're evaluating cloud-inference construction safety vendors, ask them what their per-site monthly cloud compute cost is at full camera deployment. Then multiply by 12 and add the annual figure to the capital comparison with edge deployments. The cloud option is not lower total cost of ownership; it's lower upfront capital with higher ongoing operational cost and worse safety performance on the most critical detection scenarios.
For questions about our edge architecture or to request latency measurements from the Houston pilot, contact us at contact@hardhatpulse.com.