Bastion Host & Networking
This document explains the architecture and usage of the AWS bastion host.
1. Architecture Overview
Why do we need a bastion?
We use OIDC with unprivileged runners in the central KNMI build account. In our view, the GitLab runners should not be
the entity on which we control permissions and access; they should be 'dumb'. Instead, we control permissions on the
short-lived identities used in OIDC. This does create difficulties when our pipelines need to access resources inside
our VPCs. Our core services (Grafana, Prometheus, etc.) are deployed in private subnets with no public internet access.
To interact with their APIs (e.g., from CI/CD runners or local developer machines), we must go through a secure entry
point. For this we chose a simple bastion host as the single entry point.
How does it work?
We use a "zero-trust" bastion model. Instead of relying on SSH keys, we use AWS Systems Manager (SSM) Session Manager to provide secure port forwarding.
- Infrastructure: A single EC2 instance (module.bastion_ec2_instance) is deployed by Terraform into the private subnets.
- No Public IP: The bastion has no public IP address and no open inbound ports.
- Access: All access is handled only through the AWS SSM API. This is more secure and fully auditable via CloudTrail.
A typical connection flow looks like this:
- A CI runner or local user (with valid AWS credentials) authenticates with the AWS SSM API.
- The user requests a StartSession for port forwarding.
- SSM securely connects to the bastion instance (via the SSM agent).
- The bastion opens a connection to the final destination (e.g., the Grafana ALB) on the private network.
- The user's local port is now securely tunneled to the private service.
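The flow above can be sketched as a single AWS CLI call. This is a minimal sketch, not the project's actual mise task: it assumes the tunnel uses the AWS-managed AWS-StartPortForwardingSessionToRemoteHost document, and the function names, instance ID, and host name are hypothetical placeholders. The functions only print the command, so nothing is started when they run.

```shell
# Sketch only: build (but do not run) an SSM port-forwarding command.
# Assumption: the AWS-managed document AWS-StartPortForwardingSessionToRemoteHost
# is used; the real mise task may do this differently.

# Print the JSON parameters for the port-forwarding document.
ssm_forward_params() {
  local remote_host="$1" remote_port="$2" local_port="$3"
  printf '{"host":["%s"],"portNumber":["%s"],"localPortNumber":["%s"]}' \
    "$remote_host" "$remote_port" "$local_port"
}

# Print the full AWS CLI command line for a given bastion instance.
ssm_forward_command() {
  local instance_id="$1" remote_host="$2" remote_port="$3" local_port="$4"
  printf "aws ssm start-session --target %s --document-name AWS-StartPortForwardingSessionToRemoteHost --parameters '%s'\n" \
    "$instance_id" "$(ssm_forward_params "$remote_host" "$remote_port" "$local_port")"
}

# Example (placeholder values):
#   ssm_forward_command i-0123456789abcdef0 grafana.internal.example 443 4380
```

Printing the command first makes it easy to inspect the target instance and ports before opening a session.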
2. CI/CD Integration (.connect-bastion)
In GitLab CI, any job that needs to talk to a private service (like Grafana) extends the .connect-bastion job
template. The template, defined in /.gitlab/ci/setup-bastion-connection.gitlab-ci.yml, performs a very specific setup.
The /etc/hosts Trick
The most critical piece of this job is the before_script. It performs two key actions:
- It modifies /etc/hosts so that sre.dev.knmi.cloud points at 127.0.0.1.
  Why? This tricks the runner's OS. When a tool like gcx tries to connect to https://sre.dev.knmi.cloud, the OS resolves this domain name to 127.0.0.1.
- It starts the tunnel on port 443.
  Why? The command (which calls the mise task) starts the SSM port forwarding. It maps the runner's local port 443 to the bastion, which then forwards the traffic to the real Grafana ALB.
The result: When gcx sends a request to https://sre.dev.knmi.cloud, the OS sends it to 127.0.0.1:443. The SSM
tunnel picks it up and forwards it. This also correctly preserves the Host: sre.dev.knmi.cloud header, which our AWS
ALB needs to route the request to the correct service.
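The hosts-file step can be sketched as follows. This is an illustrative helper, not the contents of the actual before_script: the function name add_loopback_host is hypothetical, and the hosts file path is a parameter so the function can be tried on a scratch file before touching /etc/hosts.

```shell
# Sketch only: map a hostname to 127.0.0.1 in a hosts file, idempotently.
# Usage: add_loopback_host <hostname> [hosts-file]   (default: /etc/hosts)
add_loopback_host() {
  local hostname="$1" hosts_file="${2:-/etc/hosts}"
  local entry="127.0.0.1 ${hostname}"
  # -x matches whole lines, -F treats the entry as a fixed string (no regex).
  if grep -qxF "$entry" "$hosts_file" 2>/dev/null; then
    return 0  # entry already present, nothing to do
  fi
  printf '%s\n' "$entry" >> "$hosts_file"
}

# Example (writing to /etc/hosts requires root):
#   add_loopback_host sre.dev.knmi.cloud
```

Checking for an existing entry first keeps repeated runs from piling up duplicate lines.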
3. Local Development Setup
Maintainers can use the same mise task to connect from their local machines.
Prerequisites
- mise installed and configured
- AWS CLI configured with appropriate permissions
- Session Manager plugin installed (automatically handled by the script)
How to Connect
- Run the mise task, connect-bastion. The task runs in the background; once it has started, you are connected.
- Access Grafana: You can now access Grafana by pointing your tools to https://localhost:4380.
  - Browser: Open https://localhost:4380 (you will need to bypass the SSL certificate warning, as the cert is for sre.dev.knmi.cloud, not localhost).
  - CLI: Point the tool at the same address, https://localhost:4380.
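For a quick command-line check through the local tunnel, something like the following may help. It is a sketch under assumptions: the function names are hypothetical, /api/health is Grafana's standard health endpoint, and whether it is reachable without authentication through this setup is not confirmed by this document. The --insecure flag is the CLI counterpart of clicking through the browser's certificate warning.

```shell
# Sketch only: query Grafana's health endpoint through the local tunnel.

# Print the health URL for a given local port (default: 4380).
grafana_health_url() {
  local local_port="${1:-4380}"
  printf 'https://localhost:%s/api/health\n' "$local_port"
}

# Call the endpoint. --insecure skips certificate verification, because the
# certificate is issued for sre.dev.knmi.cloud rather than localhost.
grafana_health() {
  curl --silent --show-error --insecure "$(grafana_health_url "${1:-}")"
}

# Example (requires the tunnel to be running):
#   grafana_health        # uses port 4380
#   grafana_health 8443   # uses a different local port
```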
4. Troubleshooting
Port already in use
This means another process is already using the local port.
Solution: Check what is using the port, or specify a different local port.
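One way to see which process holds the port is sketched below. The helper name show_port_owner is hypothetical; it uses lsof when available and falls back to ss, and it assumes at least one of the two is installed. Port 4380 in the example is the local port used elsewhere in this document.

```shell
# Sketch only: list the process listening on a local TCP port.
# Returns 2 if neither lsof nor ss is available.
show_port_owner() {
  local port="$1"
  if command -v lsof >/dev/null 2>&1; then
    # -nP: skip DNS and port-name lookups; only show listening TCP sockets.
    lsof -nP -iTCP:"$port" -sTCP:LISTEN
  elif command -v ss >/dev/null 2>&1; then
    # -l listening, -t TCP, -n numeric, -p show owning process.
    ss -ltnp "sport = :$port"
  else
    echo "neither lsof nor ss is available" >&2
    return 2
  fi
}

# Example:
#   show_port_owner 4380
```

Seeing processes owned by other users may require running the command with elevated privileges.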
Session Manager plugin missing
The script attempts to automatically download and install the plugin, but if you encounter issues, you can install it manually.
Access Denied / Instance Not Found
This is almost always an AWS authentication or permissions problem.
Solution: Check that you are authenticated to the correct AWS account (dev or ops).
- Ensure your assumed role has ssm:StartSession permissions on the bastion instance.
- Ensure the gotoaws tool is picking up the correct instance ID. You may need to be more specific if wildcards fail: mise run connect-bastion --instance <instance-id>
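To confirm which account the current credentials belong to, a small check along these lines can be used. It is a sketch: the function name check_aws_account and the account IDs in the example are placeholders, and the only AWS call involved is aws sts get-caller-identity.

```shell
# Sketch only: compare the account of the current credentials with the one
# you expect (the dev or ops account ID).
# Usage: check_aws_account <expected-account-id> [actual-account-id]
# When the second argument is omitted, the AWS CLI is asked for it.
check_aws_account() {
  local expected="$1"
  local actual="${2:-$(aws sts get-caller-identity --query Account --output text)}"
  if [ "$actual" = "$expected" ]; then
    echo "OK: authenticated to account $actual"
  else
    echo "MISMATCH: expected $expected but credentials belong to '$actual'" >&2
    return 1
  fi
}

# Example (placeholder account ID):
#   check_aws_account 111111111111
```

If the account matches and access is still denied, the role's ssm:StartSession permission on the bastion instance is the next thing to inspect.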