Azure Container Registry Geo-replication Q&A

We’ve had a few questions about how ACR geo-replication works and how it can be used.

Q: “How long does it take to geo-replicate an image across replicated regions?”

A: It depends on the image size and on which regions are synchronizing.
Larger layers take longer, and synchronization between regions travels over public internet paths with varying response times, so we have no good way to commit to a specific SLA.
On the positive side, ACR geo-replication uses a multi-master sync topology: you can push to any region, and the remaining regions become eventually consistent. Using regional webhooks, you can get a notification when an image arrives in a specific region.
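As a rough illustration of consuming such a notification, the sketch below pulls the repository, tag, and digest out of a push event body. The field names (`action`, `target.repository`, `target.tag`, `target.digest`) follow the push event payload described in the ACR webhook reference, but verify them against the current schema. The function name `parse_push_event` and every value in the sample payload are made up for this example.

```python
import json


def parse_push_event(body: str):
    """Return (repository, tag, digest) for a push event, or None for any other action."""
    event = json.loads(body)
    if event.get("action") != "push":
        return None
    target = event.get("target", {})
    return target.get("repository"), target.get("tag"), target.get("digest")


# Sample payload trimmed to the fields used here; all values are illustrative.
sample = json.dumps({
    "action": "push",
    "target": {
        "repository": "hello-world",
        "tag": "v1",
        "digest": "sha256:abc123",
    },
    "request": {"host": "myregistry.azurecr.io"},
})

print(parse_push_event(sample))
```

A receiver built this way could start a regional deployment only once the webhook scoped to that region has reported the expected tag.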
Since Docker layers are immutable, the majority of image pushes are non-conflicting. The one exception is reusing a tag, which should be avoided for deployed images but is important for base images. When two regions update the same tag, ACR follows “last writer wins” logic.
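To make “last writer wins” concrete, here is a toy model of two regions updating the same tag. It is only a sketch of the general idea, not ACR's implementation: the post does not say how ACR orders competing writes, so the per-write `timestamp` and the names `TagWrite` and `resolve_tag` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagWrite:
    region: str
    digest: str
    timestamp: float  # hypothetical write time in seconds; an assumption of this sketch


def resolve_tag(writes):
    """Return the write with the latest timestamp: the last writer wins."""
    return max(writes, key=lambda w: w.timestamp)


# Two regions push different content under the same tag.
writes = [
    TagWrite("westus", "sha256:aaa", 100.0),
    TagWrite("eastus", "sha256:bbb", 105.0),
]

winner = resolve_tag(writes)
print(winner.region, winner.digest)
```

In this model the earlier digest is simply no longer what the tag points to, which is why reusing tags for deployed images is risky.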

Q: “Does geo-replication handle disaster recovery scenarios?”

A: Yes, mostly. We have some more work to fully support storage outages.
ACR geo-replication was initially intended to support network-close deployments. Each replicated region is placed in an Azure Traffic Manager pool. Using the Performance traffic-routing method, requests are routed to the closest replicated region, meaning the one with the lowest network latency. If health checks fail, the failing region is pulled from the Traffic Manager pool, and requests flow to the next closest region.
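The routing behavior described above can be approximated with a small simulation: choose the healthy region with the lowest latency, and fall through to the next one when a health check fails. This is a simplified model and does not call Traffic Manager. The function name `pick_region`, the region names, and the latency figures are all made up for this example.

```python
def pick_region(latency_ms, healthy):
    """Return the healthy region with the lowest latency, or None if no region is healthy."""
    candidates = {
        region: ms for region, ms in latency_ms.items() if healthy.get(region, False)
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical latencies from one client to each replicated region.
latency_ms = {"westus": 20, "eastus": 70, "westeurope": 140}

all_healthy = {"westus": True, "eastus": True, "westeurope": True}
westus_down = {"westus": False, "eastus": True, "westeurope": True}

print(pick_region(latency_ms, all_healthy))
print(pick_region(latency_ms, westus_down))
```

The model also shows the limit of this approach: a region stays in the pool as long as its health check passes, so a failure the health check cannot see is never routed around.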
In some recent storage outages, we realized that our performance optimizations left gaps where storage outages are not detected by the health checks. We have additional work to complete before we can claim an SLA for geo-replication in disaster recovery scenarios.

Additional Links: