When new cloud services emerge, user guides and blogs often prioritize application features and usage, leaving infrastructure engineers scrambling for crucial details about networking, security, and operational considerations. However, these aspects are vital for ensuring smooth deployments, maintenance, and support.
This blog bridges that gap by focusing on Amazon Bedrock from an infrastructure engineer's perspective, empowering you to quickly grasp the essentials, understand its impact on your infrastructure, and navigate key areas like networking, security, segmentation, and day-to-day operations.
Building Block Concepts
Before diving into Amazon Bedrock, let's quickly revisit the key concepts and terminology. Feel free to skip ahead if you are already familiar with them.
What is the difference between AI and ML?
AI is the goal: Imagine intelligence like a mountain. AI encompasses different approaches to reach that peak, mimicking human-like thinking.
ML is a path: Machine learning is one way to climb that mountain. It uses algorithms that learn from data to solve problems, improving over time.
Not all climbers are the same: While many AI systems use ML, some (like rule-based systems) don't. But all ML applications fall under the broader umbrella of AI.
What is Generative AI?
Generative AI is a type of AI that creates new content, like text, images, music, or even code.
It learns the patterns from existing data and uses them to generate something original.
What is a Foundational Model?
Think superpowered AI: Foundational models are trained on massive data and can perform many different tasks, like writing, translating, or generating images.
They're adaptable: Instead of needing specific training for each task, they can be fine-tuned for various uses, saving time and effort.
Building blocks for the future: They serve as a base for developing specialized AI applications, shaping diverse fields from healthcare to entertainment.
What is an LLM?
LLM stands for Large Language Model: It's an AI trained on massive amounts of text, allowing it to process and generate language in impressive ways.
Think advanced text ninja: It can understand language nuances, translate languages, write different kinds of creative content, and answer your questions in an informative way.
Still learning: While powerful, LLMs are constantly evolving and can sometimes make mistakes or produce biased outputs.
What is Amazon SageMaker?
Machine learning made easy: Amazon SageMaker is a cloud platform that simplifies building, training, and deploying machine learning models.
No ML headaches: It offers tools for data prep, training, hosting, and managing models, all in one place.
For all skill levels: Whether you're a beginner or an expert, SageMaker can help you unlock the power of ML.
What is Amazon Bedrock?
Lego set for AI: Amazon Bedrock offers access to powerful foundation models from various companies, like AI building blocks.
Easy AI integration: Use a single API to access, fine-tune, and use these models for creative text, image, and code generation.
Focus on your ideas: Leave the AI infrastructure to Amazon Bedrock, letting you concentrate on building groundbreaking applications.
With the basics out of the way, let's begin.
Introduction
Amazon Bedrock is a managed AWS service that simplifies the consumption of generative AI by making base models from Amazon and third-party providers available via API. The account and infrastructure for Amazon Bedrock are specific to each model provider and are hosted in escrow accounts owned and managed by AWS.
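As a concrete sketch, consuming a model comes down to a single runtime API call. The request-body shape below is illustrative (it resembles the schema used by some text models on Bedrock, but each provider defines its own); the model ID, region, and field names are assumptions to adapt to your setup:

```python
import json

# Illustrative request body for a text-generation model on Bedrock.
# The exact schema varies per model provider; this shape is an assumption.
body = json.dumps({
    "prompt": "\n\nHuman: Summarize VPC endpoints in one sentence.\n\nAssistant:",
    "max_tokens_to_sample": 200,
})

# With AWS credentials configured, the call would look roughly like:
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# resp = client.invoke_model(modelId="anthropic.claude-v2", body=body)
# print(json.loads(resp["body"].read()))
```

The key point for infrastructure teams: this is plain HTTPS to a regional API endpoint, so everything that follows about networking applies to the path this call takes.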
Amazon Bedrock Back-end Architecture
Amazon Bedrock offers foundation models with single-tenancy and multi-tenancy options, where customers can:
1. Use a base model in a multi-tenant setup in read-only mode, so no one can make changes to the model.
2. Make a copy of the base model and fine-tune it with their own customizations in a single-tenant setup; the copy is kept in a fine-tuned-model S3 bucket.
3. Train the model using their own training data. All customization and training are accessible only to the customer and are never fed back into the common base models.
In this model, Amazon SageMaker is used for training orchestration. An Amazon SageMaker training job is started in the model provider's account.
SageMaker then reaches out to the customer's S3 bucket for the training data, and the resulting fine-tuned model is placed in an S3 bucket in the model provider's account.
The customer data is never placed in the model provider's account.
Once trained, the model in the fine-tuned-model S3 bucket can also be served as a single-tenant endpoint.
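The fine-tuning flow described above is driven by a single control-plane call. The sketch below shows the rough shape of a Bedrock model-customization job; every name, ARN, base-model ID, and S3 URI is a placeholder, and the training data stays in the customer's bucket:

```python
# Sketch of kicking off a Bedrock fine-tuning (model customization) job.
# All identifiers below are placeholders, not real resources.
customization_job = {
    "jobName": "demo-fine-tune",
    "customModelName": "demo-custom-model",
    "roleArn": "arn:aws:iam::111122223333:role/BedrockFineTuneRole",  # placeholder
    "baseModelIdentifier": "amazon.titan-text-express-v1",            # example base model
    "trainingDataConfig": {"s3Uri": "s3://example-training-bucket/data.jsonl"},
    "outputDataConfig": {"s3Uri": "s3://example-output-bucket/"},
}
# With credentials configured:
# boto3.client("bedrock").create_model_customization_job(**customization_job)
```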
Amazon Bedrock Front-end Architecture
The Amazon Bedrock service is made accessible via an API endpoint, just like any other AWS service. This endpoint can be reached in several ways:
From a VPC, using a NAT Gateway (or, if Client-A has a public IP, directly over an Internet Gateway)
Using AWS PrivateLink
From on premises, using Direct Connect (via a Transit Gateway) or directly over the internet
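For the PrivateLink path, the interface endpoint is created against the Bedrock runtime service name. A minimal sketch of the request parameters, with placeholder VPC and subnet IDs (verify the service name for your region):

```python
region = "us-east-1"  # assumption: adjust to your region

# Parameters for an interface (PrivateLink) endpoint to the Bedrock runtime API.
endpoint_request = {
    "VpcEndpointType": "Interface",
    "ServiceName": f"com.amazonaws.{region}.bedrock-runtime",
    "VpcId": "vpc-0123456789abcdef0",   # placeholder
    "SubnetIds": ["subnet-aaaa1111"],   # placeholder
    "PrivateDnsEnabled": True,          # resolve the public API name to the endpoint ENIs
}
# With credentials configured:
# boto3.client("ec2").create_vpc_endpoint(**endpoint_request)
```

With private DNS enabled, clients keep calling the standard API hostname; resolution simply lands on the endpoint ENIs inside the VPC.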
End-to-end network walk-through
The following diagram pieces the front-end and back-end architectures together.
Walking from left to right in the diagram above, the Amazon Bedrock APIs are accessible in the ways described earlier. If you are using an existing model, or making a copy for your own use, there is little dependency on customer-managed infrastructure. However, if the model needs to be trained on your data using Amazon SageMaker, it will require access to your data in S3.
Let's look at the right side of the diagram, which shows how Amazon SageMaker gets access to your S3 buckets, especially when a bucket is not exposed to the internet, is only reachable from a VPC, and has bucket-specific policies. In this case, SageMaker must be configured to drop an ENI into a subnet in customer VPC-C, which is provisioned with an S3 Gateway Endpoint that has access to the appropriate S3 buckets.
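Concretely, the ENI placement comes from the VPC configuration attached to the training job. A minimal sketch, assuming placeholder subnet, security-group, and bucket names:

```python
# VpcConfig makes the SageMaker training job create its ENIs in the
# customer VPC (VPC-C), so S3 access flows through that VPC's S3 Gateway
# Endpoint. All IDs and URIs below are placeholders.
vpc_config = {
    "SecurityGroupIds": ["sg-0aaa1111bbbb2222c"],
    "Subnets": ["subnet-0123456789abcdef0"],
}
training_job = {
    "TrainingJobName": "bedrock-fine-tune-demo",
    "VpcConfig": vpc_config,
    "InputDataConfig": [{
        "ChannelName": "training",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-training-bucket/data/",  # placeholder
        }},
    }],
}
# With credentials configured (additional required fields omitted here):
# boto3.client("sagemaker").create_training_job(**training_job)
```

The bucket policy can then require that requests arrive via the Gateway Endpoint, keeping the data path entirely off the public internet.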
Connectivity, Network Security and Observability Considerations
a. NAT Gateway:
A NAT Gateway is a quick and easy way to get started; however, it provides unrestricted access to the internet and cannot be limited to specific destinations. A NAT Gateway also carries its own per-hour and per-GB costs that should be factored in. Lastly, if multiple VPCs need access to Amazon Bedrock, each VPC will require its own NAT Gateway.
b. AWS PrivateLink
AWS PrivateLink addresses some of the challenges with NAT Gateway by providing a secure, private path to the Amazon Bedrock API endpoint. However, infrastructure owners must consider the following:
i. Many customers deploy multiple applications in the same VPC but in different subnets. If only select applications or subnets should have access to Amazon Bedrock, Security Groups can be used to limit access to the endpoint. However, Security Groups in this case require more operational involvement.
ii. When multiple VPCs need access to Amazon Bedrock, AWS PrivateLink endpoints can become costly. In these instances, consolidating them into a centralized services VPC may be a solution.
iii. Use an S3 Gateway Endpoint to avoid the costs an interface (PrivateLink) endpoint would add for S3 traffic (per-hour instantiation and per-GB data processing).
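For point i above, scoping access at the endpoint itself can look like the following sketch: an ingress rule on the endpoint's security group that only admits HTTPS from the subnet that should reach Bedrock. The group ID and CIDR are placeholders:

```python
# Ingress rule for the security group attached to the Bedrock interface
# endpoint: only the application subnet's CIDR may reach it on 443.
# Group ID and CIDR below are placeholders.
ingress_rule = {
    "GroupId": "sg-0endpoint0000001",
    "IpPermissions": [{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{
            "CidrIp": "10.0.1.0/24",
            "Description": "app subnet allowed to call Bedrock",
        }],
    }],
}
# With credentials configured:
# boto3.client("ec2").authorize_security_group_ingress(**ingress_rule)
```

The operational cost mentioned above comes from keeping these CIDR lists in sync as subnets and applications change.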
c. Prod vs Non-Prod Segmentation
When a centralized VPC provides access to Amazon Bedrock over PrivateLink, it is not easy to limit access to specific VPCs only. Since many organizations keep separate training models for production versus non-production environments (such as test and dev), it becomes critical that network access is segmented as well.
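One complementary control is a VPC endpoint policy on the centralized endpoint, restricting which principals can invoke models through it. A minimal sketch; the account ID and action list are illustrative, and a real policy would likely be narrower:

```python
import json

# Endpoint policy for a centralized Bedrock interface endpoint: only
# principals from the (placeholder) production account may invoke models
# through it. Non-prod accounts would use a separate endpoint and policy.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # placeholder account
        "Action": ["bedrock:InvokeModel"],
        "Resource": "*",
    }],
}
# With credentials configured:
# boto3.client("ec2").modify_vpc_endpoint(
#     VpcEndpointId="vpce-0123456789abcdef0",  # placeholder
#     PolicyDocument=json.dumps(endpoint_policy),
# )
```

Pairing separate prod and non-prod endpoints with policies like this keeps the segmentation enforceable in the network layer rather than by convention.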
d. Network Observability
Since traffic to Bedrock may come from different VPCs, regions, and on-premises locations, across different accounts, operations teams need an easy, consolidated way to see the traffic (for compliance and support). This may combine VPC Flow Logs, NetFlow, AWS CloudTrail, and other data sources.
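On the CloudTrail side, one starting point is filtering events by source service. The sketch below shows the rough shape of such a lookup; the event-source value is an assumption to verify against your trail's records:

```python
# Filter CloudTrail for Bedrock API activity, one input into a
# consolidated traffic view. The EventSource value is an assumption;
# confirm it against actual records in your trail.
lookup_params = {
    "LookupAttributes": [{
        "AttributeKey": "EventSource",
        "Value": "bedrock.amazonaws.com",
    }],
    "MaxResults": 50,
}
# With credentials configured:
# events = boto3.client("cloudtrail").lookup_events(**lookup_params)["Events"]
# for ev in events:
#     print(ev["EventTime"], ev["EventName"])
```

CloudTrail answers "who called which API"; VPC Flow Logs and NetFlow answer "which network path the traffic took", so a consolidated view typically needs both.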
e. Hybrid access
If the traffic originates outside AWS, the Bedrock API can be accessed over the internet, or via an AWS PrivateLink endpoint reached over Direct Connect or an IPsec VPN.