Azure KQL: Access sub-columns using the bag_unpack plugin

Overview

When accessing information using KQL, sometimes you have a column that contains sub-columns that you want to access. There are a couple of different ways to obtain this information and I will show you two ways in this blog post.

The first way will be to extract each column individually. While this works and, at times, may be the easier way to go, if you have a lot of columns this can be tedious. You also have to consider what happens if the column has different sub-columns (as our example does)?

The second way is to use the bag_unpack plugin. This allows you to extract all the columns at one time. It should be noted, this will only work with proper JSON columns. There are other ways to access data that is just stored using a delimiter.

Accessing sub-columns

For this example, I am going to use the “Entities” column in the Microsoft Sentinel’s “SecurityAlert” table. If you do not have Microsoft Sentinel, why not? 🙂 You can use any column in a table that stores its data as JSON.

The query below shows is the basic KQL query I am starting with. I am only showing the “Entities” column at this point

SecurityAlert
// Convert the Entities column to a dynamic type and then
//expand each entry in the JSON array into its own column,
//converting the new column into a dynamic type automatically
| mv-expand todynamic(Entities) to typeof(dynamic)
| project Entities

This will show results similar to what is shown in the image below

Figure 1 – Basic results

We can see that the “Entity” column is made up of different columns. “$id” is common throughout but that is the only sub-column that is.

So now we can access the sub-columns. Since the “Entities” column is a dynamic type, we can just access the different sub-columns. The code below shows how to access some of the columns. You could also combine all the “extend” commands on one line if you desire.

SecurityAlert
// Convert the Entities column to a dynamic type and then
//expand each entry in the JSON array into its own column,
//converting the new column into a dynamic type automatically
| mv-expand todynamic(Entities) to typeof(dynamic)
| project Entities
| extend id = Entities.$id
| extend Name = Entities.Name
| extend Address = Entities.Address

This will produce results like what is shown below. Note that if the row does not have a value for the specified sub-column, it is just blank which is good.

Figure 2 – Results after extend command

This works pretty well and isn’t hard to do, although it may take a while to type everything in. You also need to know all the columns so that you can access them directly.

There is another way to do whit without knowing all the columns, and that is by using “bag_unpack”

bag_unpack

If you go to the documentation page for “bag_unpack”, you would see if it not listed as an operator, a statement, or a function. It is actually a plugin for KQL. You can still easily use it, you just need to use the “evaluate” operator with it for it to function correctly.

The KQL code below shows how to use it

SecurityAlert
// Convert the Entities column to a dynamic type and then
//expand each entry in the JSON array into its own column,
//converting the new column into a dynamic type automatically
| mv-expand todynamic(Entities) to typeof(dynamic)
| project Entities
| evaluate bag_unpack(Entities)

This will provide results much like what is shown below

Figure 3 – bag_unpack results

Notice that each column that was created has the name of the sub-column. Also, there is a column called “AadUserId” that we didn’t see in our results before. This is because the “bag_unpack” looks at all the results to see which columns there are.

Using a column that doesn’t exist

What happens if I use a column that “bag_unpack” created later in the query but, for whatever reason, when I run the query later, the results don’t have that sub-column in the “Entities” column? Turns out, it is just like trying to use any other column that doesn’t exist. The query will throw an error. This is different than using the “extend” command where it will just have an empty value. Keep that in mind when using “bag_unpack”

Creating a duplicate column

Another potential issue can be demonstrated by commenting out the “project” line. Running the code below will throw an error.

SecurityAlert
// Convert the Entities column to a dynamic type and then
//expand each entry in the JSON array into its own column,
//converting the new column into a dynamic type automatically
| mv-expand todynamic(Entities) to typeof(dynamic)
//| project Entities
| evaluate bag_unpack(Entities)

The error message states:

evaluate bag_unpack(): the following error(s) occurred while evaluating the output schema: evaluate bag_unpack(): cannot add column ‘Type’ as it already exists in expression source. Use optional argument columnsConflict to resolve the conflict: (error, replace_source, keep_source, default)

What this means is that the “bag_unpack” plugin is trying to create a column that already exists. There are a few ways to avoid this.

The first way it to use a column prefix that will get added to all the columns. The KQL code below shows how to make sure each column has “Entities_” as the prefix for each column that the “bag_unpack” creates.

SecurityAlert
// Convert the Entities column to a dynamic type and then
//expand each entry in the JSON array into its own column,
//converting the new column into a dynamic type automatically
| mv-expand todynamic(Entities) to typeof(dynamic)
//| project Entities
| evaluate bag_unpack(Entities, "Entities_")

The image below shows what the results would look like.

Figure 4 – Column prefix

The second way is to tell “bag_unpack” what to do when there is a conflict. This is done by using the “columnsConflict” parameter. It has three values:

  1. error – Allow the query to create the error. This is the default value for when this parameter is not specified.
  2. replace_source – Replace the existing column with the one from the “bag_unpack” plugin.
  3. keep_source” – Ignore the column from the “bag_unpack” plugin.

The KQL below shows how to use the “columnConflict” parameter to overwrite the original column.

SecurityAlert
// Convert the Entities column to a dynamic type and then
//expand each entry in the JSON array into its own column,
//converting the new column into a dynamic type automatically
| mv-expand todynamic(Entities) to typeof(dynamic)
//| project Entities
| evaluate bag_unpack(Entities, columnsConflict="replace_source")

The third way is to use the “ignoreProperties” parameter to list which sub-columns to ignore. This parameter expects a dynamic array with one or more strings.

The KQL below shows how to use this to ignore the “Entities” sub-columns called “Type” and “AadUserId”.

SecurityAlert
// Convert the Entities column to a dynamic type and then
//expand each entry in the JSON array into its own column,
//converting the new column into a dynamic type automatically
| mv-expand todynamic(Entities) to typeof(dynamic)
//| project Entities
| evaluate bag_unpack(Entities, ignoredProperties=dynamic(['Type', 'AadUserId']))

I can’t really show that these sub-columns never show up, but trust me that they don’t 🙂

Summary

This blog post shows you how to use the “bag_unpack” plugin to easily extract all the sub-columns of a JSON based column. It went over the use and some of the pitfalls to be aware of.

Leave a Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.