As a developer and cloud architect, I found that networking took me the longest to master. After working on multiple projects in companies ranging from small to very large, I wanted to document the different approaches to building, managing and connecting VPCs, along with their pros and cons. The idea is to build a kind of decision tree to help select the best solution.
Don't forget that there is no one-size-fits-all solution; there are only better designs for specific needs.
The designs we'll cover are:
- Avoid cross communication
- VPC peering
- AWS Transit Gateway
- VPC endpoint services
- VPC sharing
If you're reading this article, I assume you already know the basics, so I'll keep this refresher short.
VPC stands for Virtual Private Cloud: an isolated network within which all your resources can communicate with each other. The AWS default VPC uses the CIDR block 172.31.0.0/16, which provides 65,536 private IPv4 addresses.
A subnet is a section of the VPC allocated in a single Availability Zone. AWS default subnets are /20 blocks (172.31.0.0/20, 172.31.16.0/20, 172.31.32.0/20, and so on), so each one holds 4,096 private IPv4 addresses, 5 of which AWS reserves. A subnet is public when its route table has a route to an Internet Gateway; otherwise it is considered private. This means resources in a private subnet can't reach the internet directly (no yum update, for example) unless you route their traffic through a NAT gateway.
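The default VPC numbers above can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the helper names `default_subnets` and `usable_addresses` are illustrative, and it only assumes the documented AWS rule that 5 addresses per subnet are reserved.

```python
import ipaddress
from itertools import islice

# The default VPC CIDR block mentioned above.
DEFAULT_VPC = ipaddress.ip_network("172.31.0.0/16")

# AWS reserves 5 addresses in every subnet: the network address, the next
# three (VPC router, DNS, future use) and the last (broadcast) address.
AWS_RESERVED_PER_SUBNET = 5


def default_subnets(count: int) -> list:
    """Return the first `count` /20 blocks carved out of the default VPC."""
    return list(islice(DEFAULT_VPC.subnets(new_prefix=20), count))


def usable_addresses(subnet) -> int:
    """Addresses left for your resources once AWS takes its reserved ones."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    print(DEFAULT_VPC.num_addresses)  # 65536
    for subnet in default_subnets(3):
        # e.g. 172.31.0.0/20 4096 4091
        print(subnet, subnet.num_addresses, usable_addresses(subnet))
```

A /16 splits into sixteen /20 blocks, which is why the default subnets step through 172.31.0.0, 172.31.16.0, 172.31.32.0 and so on.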
Nowadays, AWS recommends a multi-account architecture that isolates each of your services in its own dedicated account.
This design raises a question: since VPCs are isolated, how can services running in different VPCs communicate with each other?
When I'm working on a design or an account reorganization, I first challenge the need for cross-VPC communication. Do your VPCs really need to communicate with each other, or can you design your architecture to avoid it? This question really matters, because if you can avoid "bridging" your VPCs, you will drastically simplify network management and security in your accounts. It also has some advantages we'll see later.
Use case: instead of exposing an endpoint that serves some statistics, you could push your stats to an accessible shared location. Basically, can you push your data instead of pulling it?
VPC peering is the easiest way to create a bridge between two VPCs.
VPC A asks VPC B for peering. VPC B accepts the connection. Both VPCs update their route tables, and the connection is established.
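The three steps above can be modeled in a few lines. This is a hypothetical in-memory sketch, not the AWS API: the classes and functions are illustrative names, and in real AWS the steps correspond to the EC2 actions CreateVpcPeeringConnection, AcceptVpcPeeringConnection and CreateRoute. It also encodes the rule that peered VPCs can't have overlapping CIDR blocks.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Vpc:
    name: str
    cidr: ipaddress.IPv4Network
    # Route table: destination CIDR -> target ("local" or a peering id).
    routes: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every VPC route table starts with a local route for its own CIDR.
        self.routes[str(self.cidr)] = "local"


@dataclass
class PeeringConnection:
    pcx_id: str
    requester: Vpc
    accepter: Vpc
    status: str = "pending-acceptance"

    def accept(self) -> None:
        # Step 2: VPC B's owner accepts the request.
        self.status = "active"


def request_peering(pcx_id: str, requester: Vpc, accepter: Vpc) -> PeeringConnection:
    # Step 1: VPC A asks VPC B for peering. Overlapping CIDRs are refused.
    if requester.cidr.overlaps(accepter.cidr):
        raise ValueError("peered VPCs must not have overlapping CIDR blocks")
    return PeeringConnection(pcx_id, requester, accepter)


def update_route_tables(pcx: PeeringConnection) -> None:
    # Step 3: each side routes the other VPC's CIDR through the peering.
    if pcx.status != "active":
        raise RuntimeError("the peering request has not been accepted yet")
    pcx.requester.routes[str(pcx.accepter.cidr)] = pcx.pcx_id
    pcx.accepter.routes[str(pcx.requester.cidr)] = pcx.pcx_id


if __name__ == "__main__":
    vpc_a = Vpc("vpc-a", ipaddress.ip_network("10.0.0.0/16"))
    vpc_b = Vpc("vpc-b", ipaddress.ip_network("10.1.0.0/16"))
    pcx = request_peering("pcx-example", vpc_a, vpc_b)
    pcx.accept()
    update_route_tables(pcx)
    print(vpc_a.routes)  # {'10.0.0.0/16': 'local', '10.1.0.0/16': 'pcx-example'}
```

Note that accepting the request does not route anything by itself: until both route tables point at the peering connection, traffic still has nowhere to go.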
This pattern is simple because it adds no components, it works across regions, and it's cost effective: the peering connection itself is free, and you only pay for data transfer. This design is ideal when you have only a few VPCs, because the CIDR blocks of peered VPCs can't overlap. It's harder to manage in an Account Factory pattern (where you create multiple accounts with the same process), since accounts created from the same template tend to reuse the same CIDR block. Peering works between VPCs in the same account or in different accounts, within the same region or across regions. So it's useful when you want to connect EC2 instances, RDS databases or Lambda functions running in different VPCs.
VPC peering is a great solution, but it becomes hard to manage when your network scales to hundreds or thousands of VPCs. Managing route tables and avoiding IP overlap isn't easy. This is where AWS Transit Gateway comes in.
Without and with AWS Transit Gateway
We can see that Transit Gateway centralizes the management of all VPC connections through a single hub-and-spoke component. It also lets you connect your on-premises network (through a VPN or AWS Direct Connect). It's easy to see how it simplifies the management of hundreds of VPCs. Some companies with large networks create a dedicated AWS account for network management, and that's where the Transit Gateway lives.
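A quick count shows why the hub-and-spoke model scales better. VPC peering is not transitive (if A peers with B and B peers with C, A still can't reach C), so letting every VPC talk to every other one requires a full mesh of n(n-1)/2 peering connections, while a Transit Gateway needs one attachment per VPC. The sketch below assumes full any-to-any connectivity, which is the worst case.

```python
def peering_connections(vpcs: int) -> int:
    """Full mesh: every pair of VPCs needs its own peering connection."""
    return vpcs * (vpcs - 1) // 2


def transit_gateway_attachments(vpcs: int) -> int:
    """Hub and spoke: each VPC attaches once to the Transit Gateway."""
    return vpcs


if __name__ == "__main__":
    for n in (3, 10, 100, 1000):
        print(n, peering_connections(n), transit_gateway_attachments(n))
```

With 100 VPCs, that is 4,950 peering connections (each with route entries on both sides) versus 100 attachments.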
So if Transit Gateway looks so perfect, should you forget about peering? Not at all. Transit Gateway has some drawbacks compared to peering. First, it's charged per VPC attachment: 730 hours × $0.05 = $36.50 per attached VPC per month, before traffic (data processing) charges, which can get expensive for mid-sized companies. Transit Gateway also adds another hop to your traffic, so if you need high performance, keep that in mind. Finally, Transit Gateway comes with a 50 Gbps bandwidth limit per VPC attachment (at the time of writing).
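To see how the attachment charge grows with the number of VPCs, here is a small cost estimator. It is a sketch that assumes the $0.05/hour rate and 730-hour month quoted above; real prices vary by region and change over time, and data processing charges are deliberately left out.

```python
# Assumption: the 730-hour month and $0.05/hour rate quoted in the text.
HOURS_PER_MONTH = 730
ATTACHMENT_HOURLY_RATE = 0.05  # USD per VPC attachment per hour


def tgw_monthly_attachment_cost(attachments: int,
                                hourly_rate: float = ATTACHMENT_HOURLY_RATE) -> float:
    """Monthly attachment charge in USD, excluding data processing charges."""
    return round(attachments * hourly_rate * HOURS_PER_MONTH, 2)


if __name__ == "__main__":
    for vpcs in (1, 20, 100):
        print(vpcs, tgw_monthly_attachment_cost(vpcs))
```

Twenty VPCs already come to about $730 per month before any traffic, whereas the equivalent peering connections have no hourly charge.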
Because you can't have overlapping IP ranges behind your Transit Gateway, you have to be very meticulous about how you allocate your CIDR blocks. Usually you want to give each VPC a small range, down to a /28 (16 addresses, the smallest block a VPC accepts).
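One way to stay meticulous is to hand out VPC blocks from a single private range and check for overlaps before attaching anything. This is a minimal sketch assuming you reserve one range (10.0.0.0/8 in the usage example) and give every VPC the same block size; `allocate_vpc_cidrs` and `find_overlaps` are hypothetical helpers, and in practice Amazon VPC IP Address Manager (IPAM) can take on this job.

```python
import ipaddress
from itertools import islice


def allocate_vpc_cidrs(pool: str, prefix: int, count: int) -> list:
    """Carve `count` non-overlapping /prefix VPC blocks out of `pool`, in order."""
    # AWS accepts VPC IPv4 blocks between /16 and /28.
    if not 16 <= prefix <= 28:
        raise ValueError("VPC CIDR blocks must be between /16 and /28")
    network = ipaddress.ip_network(pool)
    blocks = list(islice(network.subnets(new_prefix=prefix), count))
    if len(blocks) < count:
        raise ValueError(f"{pool} only holds {len(blocks)} /{prefix} blocks")
    return blocks


def find_overlaps(cidrs: list) -> list:
    """Return every pair of CIDR blocks that overlap each other."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for i, a in enumerate(nets)
        for b in nets[i + 1:]
        if a.overlaps(b)
    ]


if __name__ == "__main__":
    for block in allocate_vpc_cidrs("10.0.0.0/8", 24, 3):
        print(block)  # 10.0.0.0/24, then 10.0.1.0/24, then 10.0.2.0/24
    print(find_overlaps(["10.0.0.0/16", "10.1.0.0/16", "10.1.128.0/20"]))
```

Allocating sequentially from one pool guarantees non-overlapping blocks; `find_overlaps` is the safety net for ranges that were created by hand.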
Sometimes you don't need to expose a whole VPC; you simply need to expose one endpoint. Remember the statistics use case from the first approach: say you want to expose a /metrics endpoint for an internal monitoring tool like Prometheus. If it's your only endpoint, peering your VPCs is probably overkill. Another solution is to create a VPC endpoint service.
VPC endpoint service
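This solution works with a provider/consumer model. The provider exposes its service as a VPC endpoint service behind a Network Load Balancer (NLB), and the consumer reaches it through an interface VPC endpoint, which is an elastic network interface (ENI) in the consumer's VPC. Originally, this solution didn't work across regions (AWS PrivateLink has since added cross-region support for endpoint services). The connection is unidirectional, which means the provider can't access services in the consumer's VPC.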
Another key point is that the provider can only expose services through a Network Load Balancer. For security, you can define VPC endpoint policies, which are IAM resource policies. Finally, each interface endpoint supports a bandwidth of up to 10 Gbps per Availability Zone and automatically scales up to 40 Gbps.
Each interface endpoint costs $7.30 per month per AZ (730 hours × $0.01), before data processing charges.
According to AWS, "sharing VPCs is useful when network isolation between teams does not need to be strictly managed by the VPC owner, but the account level users and permissions must be."
This model works perfectly in an Account Factory pattern, where you create a new account for each new service or project. The cloud admin creates a VPC and shares its subnets with the other accounts through AWS Resource Access Manager (RAM). In this scenario, management is really simple because there are no inter-VPC connections: all services live in the same VPC, so they can communicate with each other. Security is managed with security groups.
AWS VPC Sharing
This design works well if all the services deployed in the shared VPC belong to the same owner (think about a private network in a bank for example), but I don't recommend it if you want to isolate some of your customers.
This architecture doesn't consume peering or Transit Gateway attachment quotas, and it lowers costs because there are no data transfer charges for traffic within the same AZ, even across accounts.
One drawback is that VPC sharing only works between accounts in the same AWS Organization, so if you plan to move an account out of your organization, it won't be straightforward.
Now that we have a better understanding of the different designs and the needs they address, let's summarize them.
Feel free to leave a comment, and I'll be happy to update this article.