13. March 2024

An Intro to Data Build Tool (dbt)

dbt is a tool that simplifies populating tables and managing the relationships between them. With dbt, you specify the queries used to create your tables, and you can parameterize portions of those queries. You can also add data tests.

13. November 2023

Finding Sator Squares

I had a visit with my mother recently where she introduced me to the idea of Sator Squares. It’s a five-letter acrostic, popular in ancient Rome, and rediscovered during the excavation of Pompeii and Herculaneum. It has the interesting property that transposing the matrix is an identity operation: read across the rows or down the columns, you get the same words.
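The transposition property is easy to check in code. Here is a minimal sketch in Python; the helper names `transpose` and `is_word_square` are my own for illustration, and this only tests a given square rather than searching for new ones.

```python
# The classic Sator Square, one word per row.
SATOR = ["SATOR", "AREPO", "TENET", "OPERA", "ROTAS"]


def transpose(square):
    """Return the square with rows and columns swapped."""
    return ["".join(column) for column in zip(*square)]


def is_word_square(square):
    """True when the grid is square and transposing it changes nothing."""
    size = len(square)
    return all(len(row) == size for row in square) and transpose(square) == square


print(is_word_square(SATOR))
```

A grid like `["AB", "CD"]` fails the check, because its transpose is `["AC", "BD"]`.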

25. October 2023

An uninformative error in BigQuery

Just a quick note about an uninformative error I saw in BigQuery the other day and had trouble finding on Google. If you see the error:

MaterializedView is required for DerivationSpec

when trying to create a materialized view in BigQuery on Google Cloud Platform (GCP), it means you didn’t specify the query, or you included an empty query. It’s an easy mistake to make. Maybe this will help someone some day.

13. July 2023

Schemas and SqlTransform in Beam

Beam has support for working with collections of data that conform to a schema, and you can use SQL Transforms to transform this data. This feels a bit more like working with data in Spark, but Beam does not have the same level of optimization.

13. October 2022

Safe Navigation in Python

In a chain of method calls, what happens when one of the calls returns a null? The next chained call will throw some kind of reference error. But what if you don’t want to deal with a reference error? What if you want the null value to be the answer? That’s where a safe navigation operator is useful.
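Python has no built-in `?.` operator, so here is a minimal sketch of the idea using a helper function. The name `safe_get` and the sample objects are hypothetical, and the sketch covers attribute access rather than method calls. It also treats a missing attribute the same as a null, which is a design choice you may not want.

```python
from types import SimpleNamespace


def safe_get(obj, *names):
    """Follow a chain of attribute names, returning None once a link is None or missing."""
    for name in names:
        if obj is None:
            return None
        # getattr with a default turns a missing attribute into None as well.
        obj = getattr(obj, name, None)
    return obj


# Hypothetical data: the user has a profile, but the address is null.
user = SimpleNamespace(profile=SimpleNamespace(address=None))

# user.profile.address.city would raise AttributeError; this yields None instead.
print(safe_get(user, "profile", "address", "city"))
```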

21. September 2022

A Tale From Microservice Hell

Ages ago I was a contractor on a data engineering team, working on a knowledge graph platform that was built on a service architecture.
