top of page

Have You Ever Tried SQLing Your Cloud? The Power of Steampipe



Infrastructure


We, the production engineers, are probably all very familiar with this scenario:


Your TL: "Can you tell me which instance type is used for our use1-consul-01 server?"


You: "Sure."

Logs into AWS

Opens EC2 dashboard

Goes to EC2 'Instances' page

Write 'use1-consul-01' into the search bar


You: "We use m6in.xlarge"



Consider this one as well:


Your TL: "Can you let me know what are our most used instance types? Let's say, top 15 ones?"


You

contemplating at a certain point in space

Thinking to yourself: "ohhhhhh I can use AI!!!!"


You: "Sure!"

Opening Claude

* Typing: "Write me a Python code that will fetch me the 15 most used instance types in my VPC"

Thinking to yourself: "oh wait, it should get the creds from env vars, and get the vpc-id as param"

A few iterations of prompts, runs, until you get the expected result


You: "OK, got it, sending it over"



Now what if:


Your TL: "Can you let me know what are our most used instance types? Let's say, top 15 ones?"


You: "Sure!"

Running 'steampipe query "select count(instance_id) as instance_count, instance_type from

aws_ec2_instance group by instance_type order by instance_count desc limit 15"'


You: "Got it, sending it over :)"



The efficiency is a bit exaggerated but believe me, it takes much less time (and my SQL skills improved), and since I heard about Steampipe, this tool has become a significant part of my daily workflow.


Steampipe merges the power of SQL queries with the ability to access cloud data effortlessly, making it an extra useful tool for anyone managing complex environments.



What is Steampipe?

An open-source zero-ETL tool fetching data directly from APIs and services. It uses plugins and each plugin is a service with a set of APIs for querying (I use AWS and Kubernetes).


The beauty of Steampipe lies in its ability to treat the cloud resources you have as tables in a SQL database (the plugins I use for example, are actually Postgres FDWs). This allows you to  leverage (postgres-flavoured) SQL syntax to gain insights without the need for cumbersome code.



Key Features of Steampipe:

  • Unified Querying: Use SQL to interact with multiple APIs seamlessly (e.g. AWS EC2 and S3 APIs).

  • Extensible: Support for various plugins allows cross-platform data interaction (AWS and K8s plugins).

  • Open Source: Community-driven development promotes innovation and collaboration.



Real-World Example

There was a security initiative to change a highly used IAM role to a different one with more granular permissions. First, we needed to evaluate the scope of the task. In other words, we needed DATA!!


The role (named insecure-role for the sake of example) was used only with instance-profiles, so what we need is to find the number of instances that use this IAM role.


Sounds pretty simple, right?


A possible course of action is trying to do it in the AWS console, which I am not even going to get into. Another one is trying to do it with Python, which is a viable option on all accounts.


For this you can use your favorite model to generate the code, and try to run it.

But in my experience, there are always tweaks and adjustments to make it work, which can be tedious if I have to make these tweaks every time I need to fetch some data.


The next option is, as expected, using Steampipe. I had it installed on my local machine (already configured with AWS profile and the list of our regions). Without too much complexity, I first generated the temp creds for the AWS profile, and then entered the interactive query shell of Steampipe.


Then I constructed the SQL statement. First I had to find out which columns I wanted to fetch. 

This can be done using:

Infrastructure

I found the column I want to filter by: iam_instance_profile_arn. I need to group it by region, and get the amount of instances that have this attached.

Infrastructure

As simple as that, I got the result:

SQL statement

Now, I can give a much better estimation for the effort to deprecate the insecure-role across all our regions.


That is one example of how to handle relatively simple, but otherwise tedious fetches of data (that I find myself doing pretty frequently).


I still use Python many times when I fetch data BTW, but it’s more useful in the cases where I need to perform some logic after I get the data.



Getting Started with Steampipe (AWS and Kubernetes Plugins)

To illustrate the power of Steampipe, let’s walk through a few examples until eventually you will see how both AWS and Kubernetes plugins are used together in harmony.


Step 0: Install Steampipe


Install Steampipe (of course there are installations for Mac, Linux and Windows).


Step 1: Install the Necessary Plugins

First, ensure you have Steampipe plugins installed:

SQL statement steampipe

Step 2: Getting Ready to Query

After installation of the plugins you have config files created for them under ~/.steampipe/config/.

In aws .spc you’ll find essential stuff like:

SQL statement steampipe

Where you can configure the AWS profile (assumed to be existing locally) and the regions you want to query.


In kubernetes .spc in the same directory:

SQL statement steampipe

So you can configure any kubeconfig you have present on your workstation.

When those are configured, you are good to go!!!


NOTE: I use the default single connection in the config files. When using more connections the name format of tables you query changes.


Step 3: Query!!

Starting with a simple query using the AWS plugin. All virtual tables that are under that plugin have the format aws_* (like aws_vpc, aws_vpc_security_group, aws_ec2_instance, etc).


To see what VPCs you have in your account in the regions you configured in aws .spc:


SQL statement VPC

  • There are many more columns than shown in the above query (it has been redacted for the sake of example).

  • Be advised, when querying large amounts of resources, this can be API intensive and even sometimes result in rate-limits by AWS. To avert that, add limit 10 or the like, at the end of the query.


A simple example of querying with the kubernetes plugin:

SQL Kubernetes

  • The ->> operator is used to extract a value of a certain key in a json object.The object looked like: {“a”: “b”, ..., “osImage”: “Amazon Linux 2”, ... } 

  • When pointing to the desired kubeconfig, steampipe will try querying all the contexts in there unless you configure a single context specifically.


There are more commands I curated that can be very useful for getting started.


Step 4: Taking it One Step Further

Let’s try to make it a bit more useful than just getting data (even though that in itself is pretty useful).


Say I want to find all the ingress (inbound) security group rules in a certain VPC vpc-1234567890abcdef that contain the range 192.168.10.0/24.


To do that, I can use something like this:


VPC

  • The <<= operator is a Postgres function on network addresses (the column type is cidr). 


Step 5: (Bonus) Taking it One Step Sideways

Now, say I want to find the tag with the key environment on the instance a pod is running on.

For example, let’s take the pod aws-node-aaabb, which is on a problematic node:


AWS Infra

Key points here:

  • The LEFT(TEXT(i.private_ip_address), -3) part is to convert the inet type to text and remove redundant chars to make the join possible.

  • This part: addresses->0 ->> 'address', is used to take the value of the key address from the first item in the addresses list of objects.

  • All these are postgres functions (docs).



Conclusion

Steampipe has simplified data access for me, allowing for cross-referencing between different sources and enabling me to derive insights with ease. Initially, it was a tool I had never heard of, but it quickly became essential for me. Now, it’s my go-to solution for fetching data from my AWS environment, especially when I need to cross-reference it with Kubernetes resources. 


If you haven’t explored Steampipe yet, I highly recommend giving it a try.



Useful Queries

  • Show the AWS security group inbound rules (that contain CIDRs as source) that are contained in another rule (e.g. 2 rules on port 8080, where the 1st allows 192.168.10.0/24 and the 2nd allows 192.168.0.0/16).

Useful Queries

Getting VPC CIDRs for a certain VPC.

VPC CIDR

NOTE: Since kubectl is a very flexible and robust tool, getting info in the Kubernetes context can be done with it on its own. But when you want to get some AWS-context data on the K8s resources, that’s when Steampipe can be really useful. 


Nevertheless, here are a few queries using the Kubernetes plugin:


  • Extracting the startup delay in seconds (the amount of seconds it took for pods since their creation time until the container is started)

Kubernetes AWS

  • Amount of failed scheduling events per namespace in the last 24 hours.

Kubernetes AWS scheduling
  • Amount of pending pods per namespace

pending pods per namespace

 
Omer Dolev

This post was written by Omer Dolev

 

More of Wix Engineering's updates and insights: 

Comments


bottom of page