Bug report: []map[string]any is not a subtype of []any

In my team we prefer to use Go when writing data ingestion pipelines consuming from JSON APIs 1. We have made a simple three-step design of extract->transform->load, with each step being loosely coupled and composable. For instance, we could have an extract step that fetches JSON responses from an endpoint with basic authentication, a transform step using DuckDB, and finally a load step that writes the transformed data into BigQuery. One of the tools we use in the transform step is Benthos’s Bloblang. Bloblang is a DSL for mapping and transforming data, particularly well suited for JSON manipulation. It is written in Go, is untyped and has a syntax that feels somewhat JavaScript-inspired 2. I am pretty new to Go, and I have yet to experience all its quirks and footguns, of which there are plenty; see for example this book on the subject. This post is about one such quirk, which I experienced as a nasty, hard-to-identify runtime bug (discovered in the development phase, though 😅).

A common design for the JSON payloads we fetch from our third-party data providers is to insert them unchanged into our data warehouse, or with only minor transformations (like converting Unix times to timestamps), following the EtLT pattern. This is often preferable because (1) when JSON fields are not explicitly mapped to data warehouse columns, we avoid schema migration issues completely 3, and (2) the cleaning and transformation all the way from raw data to data products is owned by data analysts with domain knowledge. Go is a statically typed language, and it provides three main ways of decoding JSON data:

  1. Create a struct type that matches the fields of interest and decode the JSON object into it, as exemplified here: https://go.dev/blog/json. This is basically the same functionality that pydantic provides for Python: each JSON field is validated against the Go type of the corresponding struct field.
  2. Use the type map[string]any, which is Go’s equivalent of a “JSON object”: a key-value map where keys must be strings and values can be anything. Every JSON object can be decoded into this type (but a top-level array or scalar, which is also valid JSON, cannot).
  3. Use the type any. any accepts arbitrary data, which obviously also includes all JSON payloads.
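To make the three options concrete, here is a minimal sketch using only the standard library’s encoding/json. The payload, the Result and Payload types, and the decode* helper names are made up for illustration. One detail is worth noticing for what follows: with the generic targets, encoding/json decodes JSON objects into map[string]any and JSON arrays into []any, never into []map[string]any.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result is a hypothetical struct for option 1: it names only the fields we care about.
type Result struct {
	Foo string   `json:"foo"`
	Baz []string `json:"baz"`
}

// Payload wraps the hypothetical "results" array.
type Payload struct {
	Results []Result `json:"results"`
}

const payload = `{"results": [{"foo": "bar", "baz": ["barbaz", "foobar"], "extra": 1}]}`

// Option 1: decode into a struct. Fields without a matching struct field ("extra") are dropped.
func decodeTyped(data []byte) (Payload, error) {
	var p Payload
	err := json.Unmarshal(data, &p)
	return p, err
}

// Option 2: decode into map[string]any. Fails unless the top-level value is a JSON object.
func decodeObject(data []byte) (map[string]any, error) {
	var m map[string]any
	err := json.Unmarshal(data, &m)
	return m, err
}

// Option 3: decode into any. Accepts every JSON value, including top-level arrays.
func decodeAny(data []byte) (any, error) {
	var v any
	err := json.Unmarshal(data, &v)
	return v, err
}

func main() {
	p, err := decodeTyped([]byte(payload))
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Results[0].Foo) // bar

	m, err := decodeObject([]byte(payload))
	if err != nil {
		panic(err)
	}
	// encoding/json turns JSON arrays into []any, never []map[string]any.
	fmt.Printf("%T\n", m["results"]) // []interface {}

	v, err := decodeAny([]byte(`[1, "two", null]`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", v) // []interface {}

	// A top-level array is valid JSON, but it does not fit map[string]any.
	_, err = decodeObject([]byte(`[1, 2]`))
	fmt.Println(err != nil) // true
}
```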

For the EtLT pattern of “forwarding” arbitrary JSON payloads, only options 2 and 3 are viable. I prefer to use map[string]any whenever possible, as it gives at least some static type validation; any provides none.
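As an illustration of the small “t” in EtLT, here is a sketch that decodes a payload into map[string]any, converts one field from Unix seconds to an RFC 3339 timestamp, and forwards everything else untouched. The field name created_at and the unixToTimestamp helper are hypothetical. Note that when decoding into map[string]any, encoding/json represents every JSON number as float64, so the helper has to assert float64 rather than int.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// unixToTimestamp replaces a Unix-seconds field with an RFC 3339 UTC timestamp.
// All other fields are left as they are. The key name is supplied by the caller.
func unixToTimestamp(record map[string]any, key string) error {
	raw, ok := record[key]
	if !ok {
		return nil // nothing to convert
	}
	// With a map[string]any target, encoding/json decodes JSON numbers as float64.
	secs, ok := raw.(float64)
	if !ok {
		return fmt.Errorf("field %q: expected a number, got %T", key, raw)
	}
	record[key] = time.Unix(int64(secs), 0).UTC().Format(time.RFC3339)
	return nil
}

func main() {
	var record map[string]any
	in := []byte(`{"id": "abc", "created_at": 1700000000, "nested": {"x": [1, 2]}}`)
	if err := json.Unmarshal(in, &record); err != nil {
		panic(err)
	}
	if err := unixToTimestamp(record, "created_at"); err != nil {
		panic(err)
	}
	// Everything except created_at is forwarded unchanged.
	out, err := json.Marshal(record)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```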

In the transform step we use bloblang#Executor.Query to transform the received JSON payloads according to a Bloblang mapping. And here is the crux: Bloblang is untyped (as a JSON mapping language should be), but Go is typed. So, in our case, we need to pass data of some specific type from Go to the Query method, whether we like it or not. Here is the documentation for Query:

func (e *Executor) Query(val any) (any, error)

Query executes a Bloblang mapping against a value and returns the result. The argument and return values can be structured using the same map[string]interface{} and []interface{} types as would be returned by the Go standard json package unmarshaler.

If the mapping results in the root of the new document being deleted then ErrRootDeleted is returned, which can be used as a signal to filter rather than fail the mapping.

When sending val to Query as a map[string]any whose results key held a value of type []map[string]any, the following error occurred:

execute bloblang failed: query bloblang failed: failed assignment (line 8): expected array value, got null from field this.results

This is the relevant part of the bloblang mapping that failed:

let filtered = this.results.map_each(
    ele -> ele.<some_filtering_logic>
)

and the payload looks something like this:

{
  "results": [
    {"foo": "bar", "baz": ["barbaz", "foobar"], ...},
    {"foo": "bar2", "baz": ["bazbar", "barfoo"], ...},
    ...
  ]
}

This is strange: results is clearly an array. I downloaded the payload and ran the transform locally via Benthos to make sure the payloads were valid JSON and that nothing else was wrong with them. Everything ran as expected. Could this be an issue related to Go’s types?

For the results field, I had intentionally used my preferred type for an array of arbitrary JSON objects, []map[string]any, and passed it as part of the input to the Query(val any) method. When I changed the type of results to []any (with no changes to the data itself), everything ran as expected! Problem solved, which made me happy, but this bug surprised me.
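The Go rule behind this is easy to reproduce without Bloblang. A []map[string]any stored in an any does not match a case []any branch in a type switch, and a .([]any) type assertion on it fails, because Go has no subtyping between slice types: []map[string]any and []any are simply different types. The describe function below is hypothetical and only mimics the kind of type switch a library working on encoding/json-shaped data might use; it is not Bloblang’s actual code, and I have not verified how Bloblang handles the unmatched case internally.

```go
package main

import "fmt"

// describe is a hypothetical classifier in the style of a dynamically typed library:
// it only recognises the shapes that encoding/json produces.
func describe(v any) string {
	switch v.(type) {
	case map[string]any:
		return "object"
	case []any:
		return "array"
	case nil:
		return "null"
	default:
		return fmt.Sprintf("unknown (%T)", v)
	}
}

func main() {
	typed := []map[string]any{{"foo": "bar"}}
	generic := []any{map[string]any{"foo": "bar"}}

	fmt.Println(describe(generic)) // array
	fmt.Println(describe(typed))   // unknown ([]map[string]interface {})

	// A type assertion fails in the same way, and only at runtime.
	_, ok := any(typed).([]any)
	fmt.Println(ok) // false

	// The compiler rejects a direct assignment, since the slice types differ:
	// var s []any = typed
}
```

This also matches the hint in the Query documentation, which talks about the same map[string]interface{} and []interface{} types that the standard json unmarshaler returns.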

So, how can this happen? I provided the text above from this post to ChatGPT (GPT-4) with this prompt:

I am writing the blog post below. Could you explain to me how a map[string]any with any being of type []map[string]any could produce a runtime error in bloblang#Executor.Query while the exact same data with any being []any works as expected?

<…the text above..>

Here’s the answer I got:

The issue you encountered stems from the way Go’s type system interacts with interfaces, particularly the any interface, and how type assertions or type checks are performed in Go when interfacing with dynamically typed systems or languages, such as Bloblang.

In Go, any is an alias for interface{}, which can hold values of any type. When you use map[string]any, you’re creating a map where the values can be of any type, and Go’s runtime will keep track of the actual types of those values. Similarly, []any is a slice where each element can be of any type.

When you pass a map[string]any to Bloblang’s Query method, and within that map, you have a key (e.g., results) pointing to a value of type []map[string]any, you’re in a situation where the Go type system’s static nature subtly conflicts with Bloblang’s dynamic expectations.

Bloblang, being dynamically typed and designed to work seamlessly with JSON data, expects to operate on structures that mirror the flexibility of JSON. When the Bloblang documentation mentions that Query can work with map[string]interface{} and []interface{}, it’s assuming that these structures will be as flexible as JSON, allowing for any type of nested data.

However, when you pass []map[string]any as part of the structure, you’re providing a slice of maps with a specific type (map[string]any). This is subtly different from []any in terms of how Go treats type assertions and type checks. The []map[string]any is more specific and does not seamlessly translate to []any without an explicit conversion or assertion, because Go’s type system does not automatically consider []map[string]any to be a subtype of []any due to its static typing rules.

When you changed the type of results to []any, you aligned more closely with the expectations of Bloblang (and dynamic typing in general). This change means that each element of the results slice is now considered an any type, offering the flexibility for those elements to be maps, slices, strings, or any other type, without the static type constraints imposed by []map[string]any.

This alignment allows Bloblang to process the data as it would dynamically typed JSON, performing its operations without encountering the type assertion issues that arose when the data was structured as []map[string]any.

The runtime error you encountered was due to the subtle differences in how Go’s static type system interacts with dynamically typed systems like Bloblang. By changing the type from []map[string]any to []any, you provided the flexibility needed for Bloblang to process the data as expected, highlighting an interesting intersection between static and dynamic typing paradigms.

Aha. This makes sense. I also liked Gemini Advanced’s response, so I have included it as well:

Absolutely! Let’s break down why the use of map[string]any vs. []any caused the runtime error within Bloblang, specifically the bloblang#Executor.Query method.

The Core Issue: Type Mismatch in Go

At the heart of this matter lies Go’s static typing. While bloblang can manipulate JSON with dynamic types, data passed to it must conform to Go’s specific type requirements. Let’s analyze the scenario:

  1. The Query Expectation: The bloblang#Executor.Query method’s signature is Query(val any) (any, error). The any type is Go’s equivalent of “anything goes.” It provides extreme flexibility but sacrifices type safety.
  2. map[string]any vs. []any:
    • map[string]any is a map with string keys and values that can be of any type. This is often used to represent JSON objects due to their similar structure.
    • []any is a slice (similar to an array) that can hold values of any type.
  3. The Type Discrepancy: Although you might intuitively expect that a []map[string]any (array of JSON-like objects) qualifies as a []any, Go’s type system does not consider them directly interchangeable. It’s a subtle but essential distinction.

Impact on Bloblang

Here’s how this type mismatch played out within Bloblang:

  1. Accessing results: In your Bloblang code, this.results was correctly recognized as an array.
  2. The map_each Problem: The map_each function in Bloblang iterates over arrays. Crucially, it likely expects elements within the array to be of a consistent type. When you used []map[string]any, each element was a map with potential type variability. This violated the assumption of map_each.
  3. The “got null” Error: The attempt to process a map[string]any element within a context expecting a uniform type likely led to an internal type mismatch or ’null’ being passed in, triggering the error message.

Solution

By switching to []any, you provided an array where elements could truly be anything. This satisfied the broader type expectation of bloblang#Executor.Query and the map_each function.

Key Takeaways

  • Go’s static typing ensures robust code but requires careful attention when interfacing with dynamic environments.
  • Even when working with JSON-like structures, be mindful of the precise Go types you are using to represent that data.
  • The flexibility of any comes with the trade-off of losing type information. Use it judiciously.

Nice, it even attempts to explain exactly what went wrong within Bloblang. The explanation sounds reasonable, but I will not spend time trying to verify it. So be careful: it might be a hallucination 😁

I will continue to use []map[string]any whenever I can for arbitrary JSON, but I’ll keep in mind that this type might be too strict for functions that expect arbitrary data. Go figure.
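For anyone who wants to keep []map[string]any in their own code, one option is to convert at the boundary, right before calling Query. Here are two sketches; toAnySlice and normalizeJSON are my own hypothetical helper names, not part of Bloblang or the standard library. The first copies a typed slice element by element into []any; the second round-trips a whole value through encoding/json so that every nested object and array ends up as map[string]any and []any.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toAnySlice copies a typed slice into a []any. Go does not allow a direct
// conversion, so each element is boxed in an interface value one by one.
// Note: only the outer slice is converted, not slices nested inside elements.
func toAnySlice[T any](in []T) []any {
	out := make([]any, len(in))
	for i, v := range in {
		out[i] = v
	}
	return out
}

// normalizeJSON round-trips v through encoding/json so that every nested
// object becomes map[string]any and every array becomes []any.
// Caveat: numbers come back as float64, and values without a JSON encoding fail.
func normalizeJSON(v any) (any, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var out any
	if err := json.Unmarshal(b, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	results := []map[string]any{{"foo": "bar"}, {"foo": "bar2"}}

	payload := map[string]any{"results": toAnySlice(results)}
	fmt.Printf("%T\n", payload["results"]) // []interface {}

	normalized, err := normalizeJSON(map[string]any{"results": results})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", normalized.(map[string]any)["results"]) // []interface {}
}
```

The element-wise copy is cheap but only fixes the outer slice; the round-trip handles arbitrary nesting at the cost of an extra encode and decode, and it turns all numbers into float64.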


  1. Commonly referred to as REST(ful) APIs. I deliberately try to avoid the term, after having recently learned that a RESTful service was originally meant to be something quite different. See for example this post for more about this.

  2. But hang on, why don’t we use Benthos for the entire pipeline, you might ask. The short answer is that we tried, but we found that the http_client input in Benthos was not flexible enough. We could not get it to behave the way we wanted for our batch pipelines; in particular, we failed to combine a dynamic input (a frequently varying URL target) with full control over retry and logging behaviour. We could have tried to implement a Benthos plugin for our needs, but opted to write everything in Go ourselves.

  3. Which can be a huge time saver for a small data engineering team.