Joe Duffy 3927943cb1
Filter blog tags to Top 40, and add back some metadata (#350)
* Only show the top 40 blog tags

In https://github.com/pulumi/pulumi-hugo/pull/215, I had suggested
that instead of physically deleting tags we didn't want to show, we
compute the visible set algorithmically, by only showing the "Top N"
tags. This commit introduces that functionality.

This has a few advantages:

* Preserves old metadata (the authors added the tags because they
  felt they were meaningful and captured information about the posts).

* Enables us to surface those tags differently in the future (who
  knows, maybe someday we'll want to run a "spinnaker" campaign).

* Notably, also keeps the tag index pages, which Google has indexed.

* Enables us to add a "View More ..." link at the bottom of the
  page if folks want to see the entire list.

* Perhaps most importantly, protects against future bloat. For
  example, since this tag cleanup happened, we have added top-level
  tags for "aliases", "app-runner", "iam", "open-source", and
  "refactoring", each of which has only a single post.

I chose 40 as the N in Top N, because that's how many we show today.
I could see an argument for filtering this based on post count
instead (e.g., only those with >3 posts).

* Add back some tags

Now that we filter out unpopular tags, we can add back some of the
ones previously removed.
2021-06-21 17:31:35 -07:00

---
title: "Observability with Infrastructure as Code"
date: 2021-02-26
meta_desc: "Andy Davies from Reaktor introduces observability into infrastructure as code with the Pulumi Automation API"
#meta_image: infra-observability.png
authors:
- andy-davies
tags:
- guest-post
- automation-api
- observability
- honeycomb
meta_image: "reaktor.png"
---
**Guest Article:** [Andy Davies](https://andydote.co.uk) is a Senior Software Developer at [Reaktor](https://reaktor.com),
a tech company based in Amsterdam, New York, and Finland. He writes here about using the Pulumi Automation API to add
observability to infrastructure provisioning.
<!--more-->

When using the [Pulumi Automation API](https://www.pulumi.com/blog/tag/automation-api/) to create applications that
provision infrastructure, it is very handy to use observability techniques to ensure the application functions
correctly and to find out where the performance bottlenecks are.

One of the applications I work on creates a VPC and a Bastion host and then stores the credentials in a Vault instance.
The problem is that the 'create infrastructure' part is an opaque blob: I can see that it takes 129 seconds to create,
but I can't see what it's doing or why it takes that long.

![honeycomb traces of one pulumi stack resource](before.png)
So can I do better?
### The Initial Application
In this example, I use [Honeycomb's](https://honeycomb.io/) [Go Beeline](https://github.com/honeycombio/beeline-go/) to
capture all the data I care about, namely durations, errors, and any context which is 'interesting':
```go
package main

import (
	"context"
	"os"

	"github.com/honeycombio/beeline-go"
	"github.com/pulumi/pulumi/sdk/v3/go/auto"
	"github.com/pulumi/pulumi/sdk/v3/go/auto/optup"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
	// The `vpc` package and `getAvailabilityZones` used below are the author's
	// own helpers and are not shown in this post.
)

func main() {
	// Initialize the Honeycomb Beeline and open a root span for the whole run.
	beeline.Init(beeline.Config{
		WriteKey: os.Getenv("HONEYCOMB_API_KEY"),
		Dataset:  "pulumi-demo",
	})
	defer beeline.Close()

	ctx, span := beeline.StartSpan(context.Background(), "basic-vpc")
	defer span.Send()

	name := auto.FullyQualifiedStackName(os.Getenv("PULUMI_USERNAME"), "basic-vpc", "dev")

	// Define the stack inline: the Pulumi program that creates the VPC.
	stack, err := auto.UpsertStackInlineSource(ctx, name, "basic-vpc", func(pc *pulumi.Context) error {
		azs, err := getAvailabilityZones(ctx)
		if err != nil {
			beeline.AddField(ctx, "err", err)
			return err
		}

		_, err = vpc.NewVpc(ctx, pc, "dev", &vpc.VpcArgs{
			Description:           "dev",
			BaseCidr:              "192.168.0.0/16",
			AvailabilityZoneNames: azs,
			S3Endpoint:            true,
			DynamoEndpoint:        true,
		})
		if err != nil {
			beeline.AddField(ctx, "err", err)
			return err
		}

		return nil
	})
	if err != nil {
		beeline.AddField(ctx, "err", err)
		os.Exit(1)
	}

	if err := stack.SetConfig(ctx, "aws:region", auto.ConfigValue{Value: os.Getenv("PULUMI_REGION")}); err != nil {
		beeline.AddField(ctx, "err", err)
		os.Exit(1)
	}

	ws := stack.Workspace()
	if err := ws.InstallPlugin(ctx, "aws", "v3.23.0"); err != nil {
		beeline.AddField(ctx, "err", err)
		os.Exit(1)
	}

	if _, err := stack.Refresh(ctx); err != nil {
		beeline.AddField(ctx, "err", err)
		os.Exit(1)
	}

	// Run the update, streaming progress output to stdout.
	stream := optup.ProgressStreams(os.Stdout)
	if _, err := stack.Up(ctx, stream); err != nil {
		beeline.AddField(ctx, "err", err)
		os.Exit(1)
	}

	//vault code
}
```
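
The `vpc` package and the `getAvailabilityZones` helper above are the author's own code and aren't shown in the post.
Purely as a hypothetical illustration (not the author's implementation), a minimal `getAvailabilityZones` might look
something like the following, using the AWS SDK for Go v1:

```go
// Hypothetical helper, not from the original post: list the current region's
// availability zone names. Requires github.com/aws/aws-sdk-go/aws,
// .../aws/session, and .../service/ec2.
func getAvailabilityZones(ctx context.Context) ([]string, error) {
	sess, err := session.NewSession()
	if err != nil {
		return nil, err
	}

	out, err := ec2.New(sess).DescribeAvailabilityZonesWithContext(ctx, &ec2.DescribeAvailabilityZonesInput{})
	if err != nil {
		return nil, err
	}

	names := make([]string, 0, len(out.AvailabilityZones))
	for _, az := range out.AvailabilityZones {
		names = append(names, aws.StringValue(az.ZoneName))
	}
	return names, nil
}
```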
### Adding Infrastructure Observability
To get a handle on what happens when `stack.Up()` runs, I have implemented a custom `io.Writer`, which is
passed to `optup.ProgressStreams`.

The custom progress stream's `Write` method is called once for each line emitted, which lets us start a new span
when a resource starts being constructed and send it when construction completes. This is currently achieved by
parsing the console output text, but I gather that in the future it will be possible to get streamed JSON blobs that
can be unmarshaled into Go structs (see the sketch toward the end of this post).
```go
type pulumiBeeline struct {
	ctx      context.Context
	contexts map[string]func()
}

func NewPulumiBeeline(ctx context.Context) *pulumiBeeline {
	return &pulumiBeeline{
		ctx:      ctx,
		contexts: map[string]func(){},
	}
}

func (cw *pulumiBeeline) Write(p []byte) (n int, err error) {
	// todo: make more robust, support modifications, deletions etc.
	line := strings.TrimSpace(string(p))
	parts := strings.Split(line, " ")

	if len(parts) < 5 {
		return len(p), nil
	}

	// + aws-vpc dev creating
	// + <type> <name> <action>
	resourceType := parts[2]
	resourceName := parts[3]
	resourceAction := parts[4]

	if resourceAction == "creating" {
		// A resource has started being created: open a span and remember how to send it.
		c, s := beeline.StartSpan(cw.ctx, resourceName)
		beeline.AddField(c, "type", resourceType)
		// add other things here

		cw.contexts[resourceName] = s.Send
	}

	if resourceAction == "created" {
		// The resource finished: send the span we opened for it.
		cw.contexts[resourceName]()
	}

	return len(p), nil
}
```
Modifying the `optup.ProgressStreams` call is the only change needed to the original application:
```go
stream := optup.ProgressStreams(os.Stdout, NewPulumiBeeline(ctx))
if _, err := stack.Up(ctx, stream); err != nil {
	beeline.AddField(ctx, "err", err)
	os.Exit(1)
}
```
When I rerun the program, I can see a lot more information in my Honeycomb traces, which shows me
that Pulumi is _highly_ parallelized and gives me a much better idea of where the time is spent when creating
infrastructure. In this example, it's the NAT Gateways:

![honeycomb traces of all infrastructure resources in the pulumi stack](after.png)

In the future, I want to expand this to cover far more detail, such as why resources were created, modified, or
destroyed, and to collect more information about what causes resources to fail.
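
The structured events mentioned earlier are a natural next step here: newer versions of the Automation API can stream
engine events directly, via `optup.EventStreams` and the types in `github.com/pulumi/pulumi/sdk/v3/go/auto/events`,
which would remove the console-text parsing entirely. The following is only a rough sketch, not part of the original
post, and the exact event field names should be checked against the SDK version you use:

```go
// Rough sketch only: trace resource creation from structured engine events
// instead of parsing console output. Uses ctx, stack, and beeline from the
// earlier program; event field names are assumptions to verify against the SDK.
eventCh := make(chan events.EngineEvent)

go func() {
	spans := map[string]func(){} // resource URN -> span sender

	for e := range eventCh {
		// A resource step is starting: open a span keyed by its URN.
		if pre := e.ResourcePreEvent; pre != nil && pre.Metadata.Op == "create" {
			c, s := beeline.StartSpan(ctx, pre.Metadata.URN)
			beeline.AddField(c, "type", pre.Metadata.Type)
			spans[pre.Metadata.URN] = s.Send
		}

		// The step finished and produced outputs: send the matching span.
		if out := e.ResOutputsEvent; out != nil {
			if send, ok := spans[out.Metadata.URN]; ok {
				send()
			}
		}
	}
}()

if _, err := stack.Up(ctx, optup.EventStreams(eventCh), optup.ProgressStreams(os.Stdout)); err != nil {
	beeline.AddField(ctx, "err", err)
	os.Exit(1)
}
```

Because the spans here are keyed by resource URN, the bookkeeping no longer depends on how the CLI formats its
progress output.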
## Wrapping Up
In the end, this turned out to be much easier to achieve than I had hoped. Using Pulumi programmatically was a huge
productivity boost compared to shelling out to the CLI via `os/exec`.

I am looking forward to all the new kinds of tooling I can build to solve my users' problems by continuing to use
Honeycomb for my observability and Pulumi for my infrastructure.