As you consider which container registry to use for storing container images, there are a number of important factors to weigh beyond the basic features.
Data at the Speed of Light
Images can be, and typically are, fairly large. There are optimizations to be had in the total size and layering of your images, but the content of an image must still travel from the registry to the node running it. For the purpose of choosing a registry, whatever the size is, the larger the image and the farther away the registry, the longer the pull takes.
Related to the size factor: when a node pulls an image, how far does the content have to travel? How many networks must it traverse? Are you paying egress fees from a cloud? Even between regions of the same cloud vendor, you'll most likely pay egress fees, and the traffic may cross the public internet to get between regions.
When developing internet-based applications, you've already accepted risks far outside your control. How can you minimize the points of failure? Consider the network, but also consider DNS issues, like the DNS attack several years ago that took GitHub and many other services offline. Unless you knew which IP addresses to put in your local hosts file, connectivity was down. And what did you do for PaaS/SaaS systems where you couldn't hack the local hosts file?
Even if you allow public/anonymous pulls of some images, you at least need permissions to push an image. Consider what authentication barriers you must traverse. If the person with push/pull rights wins the lottery and leaves the company, how easy is it to disable their account?
A while back, we had someone leave Microsoft for AWS. (Yes, people do a lot of swapping around, and I'm happy to say we have many people coming from AWS and Google to Microsoft, or back to Microsoft, but I digress…) The person contacted us after being gone for a few months to let us know they still had access to the microsoft/ org on Docker Hub. With no federation between the Docker Hub auth model and our corporate AD accounts, there's no reasonable way to track these accounts. Be sure to choose a system that federates your corporate identity.
Choose the registry offered by your cloud provider
For the reasons noted above, this should be obvious. Each major cloud vendor has a container registry offering. If you're in AWS, choose ECR. Google? Choose GCR. And if you're in Azure, choose ACR. Each cloud does its best to make sure images are pulled quickly across its internal networks, shares a common auth model across all its cloud resources, and is backed by reliable, secure, and highly scalable infrastructure. If pulling images across clouds is faster or more reliable than the cloud provider's registry, something is very, very wrong and one of us has some work to do…
When running nodes in multiple regions, a subset of the issues still arises. While you may have a common auth model across the multiple regions, and each region has its reliability SLAs, you're still faced with keeping operations network-close. If you have anything more than a minimal workload, you'll likely want a registry local to that region.
Consider 5 nodes that must pull a 1 GB image. Each time a pull is required, each node must pull the delta layers across the internet. You face egress fees, latency, and reliability risks. If the image is pushed to a local replica, the content only crosses the internet once; all the nodes then benefit from local pulls.
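The back-of-the-envelope math here can be sketched in a few lines. The egress price and bandwidth figures below are illustrative assumptions, not any provider's actual rates; plug in your own numbers:

```python
# Rough cost/time comparison: 5 nodes pulling a 1 GB image across the
# internet vs. from a local registry replica. All prices and bandwidth
# figures are illustrative assumptions, not any provider's actual rates.

NODES = 5
IMAGE_GB = 1.0
EGRESS_PER_GB = 0.09   # hypothetical cross-region egress price, $/GB
WAN_MBPS = 100         # hypothetical effective cross-region bandwidth
LAN_MBPS = 1000        # hypothetical in-region bandwidth

def transfer_seconds(size_gb: float, mbps: float) -> float:
    """Time to move size_gb at mbps (1 GB = 8000 megabits)."""
    return size_gb * 8000 / mbps

# Remote registry: every node pays the cross-internet trip and egress fee.
remote_cost = NODES * IMAGE_GB * EGRESS_PER_GB
remote_time_per_node = transfer_seconds(IMAGE_GB, WAN_MBPS)

# Local replica: the image crosses the internet once (the replication push),
# then each node pulls over the local network with no egress fee.
local_cost = 1 * IMAGE_GB * EGRESS_PER_GB
local_time_per_node = transfer_seconds(IMAGE_GB, LAN_MBPS)

print(f"remote:        ${remote_cost:.2f}, {remote_time_per_node:.0f}s per node pull")
print(f"local replica: ${local_cost:.2f}, {local_time_per_node:.0f}s per node pull")
```

Under these assumed numbers, the local replica is 5x cheaper on egress and an order of magnitude faster per pull, and the gap widens with every additional node and every re-pull.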
A CDN can help here, as the image layers are by far the most problematic: they are the large elements. However, a registry is a combination of a REST API and the serving of storage layers. If a registry supports a CDN for relatively local content, how fast will a newly built image be available? And in how many locations is the registry's REST endpoint served?
This is where I can pitch the value of ACR and its geo-replication capabilities. With ACR geo-replication, you push to a single registry, are served by the network-closest region, and all the other regions receive the newly pushed image. With regionalized webhooks, you can receive notifications when the image arrives in each region, allowing local deployments. Here's another post on the topic: Working with ACR Geo-replication notifications.
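As a sketch of how you might consume those regional webhook notifications, here's a minimal parser for a push-event payload. The shape below is modeled on the Docker registry v2 event style (repository/tag/digest under `target`), but treat the exact field names and the sample payload as illustrative assumptions; verify them against your registry's webhook reference:

```python
import json

def parse_push_event(body: str):
    """Extract the image reference from a registry push webhook payload.

    Assumes an ACR-style push event with `action` and `target` fields;
    field names here are illustrative -- verify against your registry's docs.
    """
    event = json.loads(body)
    if event.get("action") != "push":
        return None  # ignore other event types
    target = event["target"]
    host = event.get("request", {}).get("host", "")
    repo = target["repository"]
    tag = target.get("tag", "latest")
    return f"{host}/{repo}:{tag}"

# Illustrative payload, as a regional replica might send it once the image
# has arrived (myregistry.azurecr.io is a placeholder registry name).
sample = json.dumps({
    "action": "push",
    "target": {"repository": "web/frontend", "tag": "v42",
               "digest": "sha256:deadbeef"},
    "request": {"host": "myregistry.azurecr.io"},
})

print(parse_push_event(sample))  # the image to roll out in that region
```

A per-region deployment job can subscribe its own webhook, wait for this event, and only then trigger the rollout, so nodes always pull from the replica that already has the bits.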
What about vendor offerings?
Using ECR, GCR, or ACR doesn't preclude using a vendor's registry. But I will suggest asking this question: when was the last time you asked your cloud vendor to use your hardware in their racks? Each cloud has invested in its infrastructure, including the underlying registries. For the same reasons, I'd ask your vendor if they leverage the registry provided by each cloud. If not, why? How are they "better"? How do they handle all the issues above?

Vendors/partners like Codefresh build atop the cloud vendors' registry infrastructure. When running Codefresh against resources in Google Cloud, you should choose a Google-backed registry. Likewise, we've recently announced Codefresh integrates with Azure Container Registry (ACR). Codefresh has a lot of great value to add atop the registry. From an Azure perspective, we see it as a win/win: the customer gets the integrated experience we offer with ACR and the various container hosts in Azure, plus the added benefits Codefresh offers, while we still see the revenue to justify the investments in running ACR, and we all benefit from partner sales and support.
So, choose the registry of your cloud, and if you're using a vendor solution that adds value, leverage the cloud's registry so you get the best of both.
Using Docker Hub for production deployments
Docker Hub offers a balance of simplicity and core capabilities. While it appears easy to use Docker Hub for your images, does it meet the requirements above? Is it network-close? Does it share your auth model? Does it run in the region of your deployment?
Docker Hub runs on AWS, so if you're using AWS services, you might think it's at least network-close. But do you have a common auth model? Are you in control of the base images you reference? I would absolutely recommend pushing publicly accessible images to Docker Hub, as it's the primary place people look for images. While Microsoft is transitioning all Microsoft software distribution to the Microsoft Container Registry, we still syndicate the catalog information to Docker Hub.
However, even if you're running in AWS, keeping all images used for development and deployment in your own private registry provides the security, governance, and control required. This includes maintaining a corporate cache of the base images used in your Dockerfiles.
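One lightweight way to enforce that corporate cache is to rewrite the `FROM` lines in your Dockerfiles so builds reference your private mirror instead of Docker Hub. A minimal sketch, where the mirror path `myregistry.azurecr.io/mirror` is a placeholder for your own registry:

```python
import re

MIRROR = "myregistry.azurecr.io/mirror"  # placeholder private registry path

def _is_qualified(image: str) -> bool:
    """True if the image already names a registry (gcr.io/..., host:port/...)."""
    if "/" not in image:
        return False  # official Docker Hub image, e.g. python:3.11
    first = image.split("/", 1)[0]
    return "." in first or ":" in first

def rewrite_from_lines(dockerfile: str) -> str:
    """Point Docker Hub base images in FROM lines at the corporate mirror.

    Fully qualified images and `scratch` are left alone; multi-stage
    aliases (e.g. `AS build`) are preserved.
    """
    def rewrite(match):
        image = match.group("image")
        if image == "scratch" or _is_qualified(image):
            return match.group(0)
        return f"{match.group('prefix')}{MIRROR}/{image}{match.group('rest')}"

    pattern = re.compile(
        r"(?P<prefix>^\s*FROM\s+)(?P<image>\S+)(?P<rest>.*)$",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.sub(rewrite, dockerfile)

print(rewrite_from_lines("FROM python:3.11-slim AS build"))
# prints "FROM myregistry.azurecr.io/mirror/python:3.11-slim AS build"
```

Paired with a job that periodically pulls the upstream bases and pushes them to the mirror, your builds stay reproducible and governed even when the public registry is slow or down.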
What about On-Prem offerings?
This is where it does get a little tricky. For the same reasons you want a local registry in each cloud region, you'll also want a local replica on-premises. If you run a large number of small stores or remote offices that need to pull images, you can also benefit from ACR's geo-replication capabilities, as ACR will handle all the big issues. But if you have a decent-sized workload, you'll want a local replica. This is where the vendors clearly have additional value: they do want and need to run their own instance of docker/distribution to host images locally, while providing a means to replicate to your primary cloud.
What about multi-cloud?
You could argue multi-cloud is just another instance of the hybrid on-prem and cloud scenario. If you run any reasonable workload on multiple clouds, you'll want a registry local to each cloud; yes, run multiple registries. You've made a decision to run resources in multiple clouds, and a registry is part of your production deployment: production nodes that scale or upgrade need to re-pull the images they must run.
Summing it up:
- Choose a registry offered by your cloud provider
- Keep the registry in the same local network as each deployment
- When using one of the great vendor offerings, ask if they're using, or provide integration with, the cloud's core registry
- When on-prem, or widely dispersed, consider a geo-replicated, or at least CDN-backed, registry for close-enough deployments. If each on-prem location is big or important enough not to be dependent on the internet, run a local replica.
I hope this helps, and I suspect this will inspire a number of interesting conversations…