Bug report: []map[string]any is not a subtype of []any
Background
In my team we prefer to use Go when writing data ingestion pipelines that consume from JSON APIs [1]. We have settled on a simple three-step design of extract->transform->load, with each step being loosely coupled and composable. For instance, we could have an extract step that fetches JSON responses from an endpoint with basic authentication, a transform step using DuckDB, and finally a load step that writes the transformed data into BigQuery. One of the tools we use in the transform step is Benthos’s Bloblang. Bloblang is a DSL for mapping and transforming data, particularly well suited for JSON manipulation. It is written in Go, is untyped and has a syntax that feels somewhat JavaScript-inspired [2]. I am pretty new to Go, and I have yet to experience all its quirks and footguns, of which there are plenty; see for example this book on the subject. This post is about one such quirk, which I experienced as a nasty, tricky-to-identify runtime bug (discovered in the development phase, though).
A common design for JSON payloads we fetch from our third-party data providers is to insert them unchanged into our data warehouse, or with only minor transformations (like transforming unix times to timestamp), following the EtLT pattern. This is often preferable for two reasons: 1. when JSON fields are not explicitly mapped to data warehouse columns, we skip schema migration issues completely [3], and 2. the cleaning and data transformation all the way from raw data to data products is owned by data analysts with domain knowledge. Go is a statically typed language, and it provides three ways of decoding JSON data:
- Create a struct type that matches the fields of interest and decode the JSON object into it, as exemplified here: https://go.dev/blog/json. This is basically the same functionality that pydantic provides for Python: every JSON field is validated against a corresponding Go type.
- Use the type map[string]any, which is Go’s equivalent of a “JSON data structure”: a key-value map where keys must be string and values can be anything. Every JSON object complies with this type (a top-level array is also valid JSON, but it does not fit this type).
- Use the type any. any accepts arbitrary data, which, obviously, also comprises all JSON payloads.
For the EtLT pattern of “forwarding” arbitrary JSON payloads, only the second and third options are viable. I prefer to use map[string]any whenever possible, as it gives at least some static type validation; any provides none.
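To make the second and third options concrete, here is a minimal sketch of what the standard encoding/json package produces when decoding into map[string]any. The helper names decodeObject and describe and the sample payload are made up for illustration. Note that a nested JSON array comes back as []any, not as []map[string]any.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeObject decodes a JSON object into map[string]any.
// (Hypothetical helper, used only for this illustration.)
func decodeObject(payload []byte) (map[string]any, error) {
	var doc map[string]any
	if err := json.Unmarshal(payload, &doc); err != nil {
		return nil, err
	}
	return doc, nil
}

// describe returns the dynamic Go type of a decoded value as a string.
func describe(v any) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	payload := []byte(`{"results": [{"foo": "bar"}, {"foo": "bar2"}]}`)

	doc, err := decodeObject(payload)
	if err != nil {
		panic(err)
	}

	fmt.Println(describe(doc))            // map[string]interface {}
	fmt.Println(describe(doc["results"])) // []interface {}

	// Each array element is itself stored as any; here it holds a map.
	first := doc["results"].([]any)[0]
	fmt.Println(describe(first)) // map[string]interface {}
}
```

So data that has only ever passed through json.Unmarshal already has the map[string]any and []any shapes; a []map[string]any only appears when the Go code declares it explicitly.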
The bug
In the transform step we use bloblang#Executor.Query to transform the received JSON payloads according to a bloblang mapping. And here is the crux: bloblang is untyped (as a JSON mapping language should be) but Go is typed. So, in our case, we need to pass data of some specific type from Go to the Query method, whether we like it or not. Here is the documentation for Query:
func (*Executor) Query
func (e *Executor) Query(val any) (any, error)
Query executes a Bloblang mapping against a value and returns the result. The argument and return values can be structured using the same map[string]interface{} and []interface{} types as would be returned by the Go standard json package unmarshaler.
If the mapping results in the root of the new document being deleted then ErrRootDeleted is returned, which can be used as a signal to filter rather than fail the mapping.
When sending val to Query as a map[string]any with the results key being a []map[string]any, the following error occurred:
execute bloblang failed: query bloblang failed: failed assignment (line 8): expected array value, got null from field this.results
This is the relevant part of the bloblang mapping that failed:
let filtered = this.results.map_each(
ele -> ele.<some_filtering_logic>
)
and the payload looks something like this:
{
"results": [
{"foo": "bar", "baz": ["barbaz", "foobar"], ...},
{"foo": "bar2", "baz": ["bazbar", "barfoo"], ...},
...
]
}
This is strange; results is clearly an array. I downloaded the payload and tried to run the transform locally via benthos to make sure that the payloads were valid JSON, and everything ran as expected. Could this be an issue related to Go’s types?
[]map[string]any is not a subtype of []any
For the results field, I had intentionally used my preferred type for arbitrary JSON payloads in an array, []map[string]any, to pass as input to the Query(val any) method. When I changed the type of results to []any (with no changes to the data itself), everything ran as expected! Problem solved, and I was happy about that, but this was a surprising bug to me.
So, how can this happen? I provided the text above from this post to ChatGPT (GPT-4) with this prompt:
I am writing the blog post below. Could you explain to me how a map[string]any with any being of type []map[string]any could produce a runtime error in bloblang#Executor.Query while the exact same data with any being []any works as expected? <…the text above..>
Here’s the answer I got:
The issue you encountered stems from the way Go’s type system interacts with interfaces, particularly the any interface, and how type assertions or type checks are performed in Go when interfacing with dynamically typed systems or languages, such as Bloblang.
In Go, any is an alias for interface{}, which can hold values of any type. When you use map[string]any, you’re creating a map where the values can be of any type, and Go’s runtime will keep track of the actual types of those values. Similarly, []any is a slice where each element can be of any type.
The Core of the Issue
When you pass a map[string]any to Bloblang’s Query method, and within that map, you have a key (e.g., results) pointing to a value of type []map[string]any, you’re in a situation where the Go type system’s static nature subtly conflicts with Bloblang’s dynamic expectations.
Bloblang, being dynamically typed and designed to work seamlessly with JSON data, expects to operate on structures that mirror the flexibility of JSON. When the Bloblang documentation mentions that Query can work with map[string]interface{} and []interface{}, it’s assuming that these structures will be as flexible as JSON, allowing for any type of nested data.
However, when you pass []map[string]any as part of the structure, you’re providing a slice of maps with a specific type (map[string]any). This is subtly different from []any in terms of how Go treats type assertions and type checks. The []map[string]any is more specific and does not seamlessly translate to []any without an explicit conversion or assertion, because Go’s type system does not automatically consider []map[string]any to be a subtype of []any due to its static typing rules.
Why Changing to []any Worked
When you changed the type of results to []any, you aligned more closely with the expectations of Bloblang (and dynamic typing in general). This change means that each element of the results slice is now considered an any type, offering the flexibility for those elements to be maps, slices, strings, or any other type, without the static type constraints imposed by []map[string]any.
This alignment allows Bloblang to process the data as it would dynamically typed JSON, performing its operations without encountering the type assertion issues that arose when the data was structured as []map[string]any.
Conclusion
The runtime error you encountered was due to the subtle differences in how Go’s static type system interacts with dynamically typed systems like Bloblang. By changing the type from []map[string]any to []any, you provided the flexibility needed for Bloblang to process the data as expected, highlighting an interesting intersection between static and dynamic typing paradigms.
Aha. This makes sense. I also liked Gemini Advanced’s response so I included that one as well:
Absolutely! Let’s break down why the use of map[string]any vs. []any caused the runtime error within Bloblang, specifically the bloblang#Executor.Query method.
The Core Issue: Type Mismatch in Go
At the heart of this matter lies Go’s static typing. While bloblang can manipulate JSON with dynamic types, data passed to it must conform to Go’s specific type requirements. Let’s analyze the scenario:
- The Query Expectation: The bloblang#Executor.Query method’s signature is Query(val any) (any, error). The any type is Go’s equivalent of “anything goes.” It provides extreme flexibility but sacrifices type safety.
- map[string]any vs. []any: map[string]any is a map with string keys and values that can be of any type. This is often used to represent JSON objects due to their similar structure. []any is a slice (similar to an array) that can hold values of any type.
- The Type Discrepancy: Although you might intuitively expect that a []map[string]any (array of JSON-like objects) qualifies as a []any, Go’s type system does not consider them directly interchangeable. It’s a subtle but essential distinction.
Impact on Bloblang
Here’s how this type mismatch played out within Bloblang:
- Accessing results: In your Bloblang code, this.results was correctly recognized as an array.
- The map_each Problem: The map_each function in Bloblang iterates over arrays. Crucially, it likely expects elements within the array to be of a consistent type. When you used []map[string]any, each element was a map with potential type variability. This violated the assumption of map_each.
- The “got null” Error: The attempt to process a map[string]any element within a context expecting a uniform type likely led to an internal type mismatch or ’null’ being passed in, triggering the error message.
Solution
By switching to []any, you provided an array where elements could truly be anything. This satisfied the broader type expectation of bloblang#Executor.Query and the map_each function.
Key Takeaways
- Go’s static typing ensures robust code but requires careful attention when interfacing with dynamic environments.
- Even when working with JSON-like structures, be mindful of the precise Go types you are using to represent that data.
- The flexibility of any comes with the trade-off of losing type information. Use it judiciously.
Nice, it even attempts to explain exactly what went wrong within Bloblang. The explanation sounds reasonable, but I will not spend time trying to verify it. So be careful, it might be a hallucination.
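The Go half of the story can at least be checked in isolation. The sketch below uses a hypothetical kindOf function as a stand-in for any dynamically typed consumer that inspects values with a type switch; whether Bloblang does something similar internally is an assumption, not something verified here. What it does show is that a []map[string]any stored in an any does not match a []any case, even though the data is the same.

```go
package main

import "fmt"

// kindOf is a hypothetical stand-in for a dynamically typed consumer.
// It only recognizes the container types that encoding/json produces.
// This is NOT Bloblang's actual implementation.
func kindOf(v any) string {
	switch v.(type) {
	case map[string]any:
		return "object"
	case []any:
		return "array"
	case nil:
		return "null"
	default:
		return "unknown"
	}
}

func main() {
	// Same data, two different static types for the "results" value.
	typed := map[string]any{"results": []map[string]any{{"foo": "bar"}}}
	loose := map[string]any{"results": []any{map[string]any{"foo": "bar"}}}

	fmt.Println(kindOf(typed["results"])) // unknown
	fmt.Println(kindOf(loose["results"])) // array

	// The same rule applies at compile time. This line would be rejected
	// by the compiler, because slice types are not convertible this way:
	// var xs []any = []map[string]any{}
}
```

A consumer written like this has no branch for []map[string]any, so what it reports for such a value depends entirely on its fallback path.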
Note to self
I will continue to use []map[string]any whenever I can for arbitrary JSON, but I’ll keep in mind that this type might be too strict for functions that expect arbitrary data. Go figure.
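One way to keep []map[string]any inside the pipeline and still hand over a []any at the boundary is an explicit element-by-element copy. This is a minimal sketch; toAnySlice is a hypothetical helper name, not part of Go or Benthos.

```go
package main

import "fmt"

// toAnySlice copies a typed slice of JSON objects into a []any.
// Hypothetical helper for illustration. The copy is shallow: the maps
// themselves are shared, and typed slices nested inside them
// (for example []string) are left as they are.
func toAnySlice(in []map[string]any) []any {
	out := make([]any, len(in))
	for i, m := range in {
		out[i] = m
	}
	return out
}

func main() {
	results := []map[string]any{{"foo": "bar"}, {"foo": "bar2"}}

	// Convert just before passing the value to a function taking any.
	val := map[string]any{"results": toAnySlice(results)}

	_, ok := val["results"].([]any)
	fmt.Println(ok) // true
}
```

The loop is needed because Go has no implicit or explicit conversion between the two slice types; each element has to be assigned to an any individually.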
[1] Commonly referred to as REST(ful) APIs. I deliberately try to avoid the term, after having learned recently that a RESTful service was originally meant to be something quite different. See for example this post for more about this.
[2] But hang on, why don’t we use Benthos for the entire pipeline, you might ask. The short answer is that we tried, but we found that the http_client input in Benthos was not flexible enough. We could not get it to behave the way we wanted for our batch pipelines; in particular we failed in making a dynamic input (frequently varying url target) together with full control over retry and logging behaviour. We could have tried to implement a Benthos plugin for our needs, but opted for writing everything in Go ourselves.
[3] Which can be a huge time saver for a small data engineering team.