Power Query

What is the power query?

Power Query is a business intelligence tool available in Excel and Power BI that allows you to import data from many different sources and then clean, transform, and reshape your data as needed. Power Query also provides a Custom Connectors SDK so that third parties can create their data connectors and seamlessly plug them into Power Query.

It allows you to set up a query once and then reuse it with a simple refresh. It’s also pretty powerful. Power Query can import and clean millions of rows into the data model for analysis after. The user interface is intuitive and well laid out so it’s really easy to pick up.

If you want to edit or write your own M code, you certainly can, but you don’t need to. In upcoming posts, I will write about M code and the best way to use it !!!

Get the data used in this post to follow along.

superstore_sample_data

What Can Power Query Do?

Power Query is the data preparation tool allowing us to perform over 300 transformations. The most important feature of Power Query is its ability to automate the process and reusability of the underlying code.

Where is Power Query?

Power Query is available as an add-in to download and install for Excel 2010 and 2013 and will appear as a new tab in the ribbon labeled Power Query. In 2016 it was renamed to Get & Transform and appears in the Data tab without the need to install any add-in. Other tools that have natively integrated Power Query are

Microsoft Power BI Desktop (A free tool available for download), Microsoft SSDT for Visual Studio and Common Data Service

Importing Your Data with Power Query

Importing your data with Power Query is simple. The Get Data icon on the Power BI Desktop allows us to connect to various data sources. Broadly speaking the sources are classified into one of the following

  • File (local and online flat files)
  • Database (such as SQL, SAP HANA, etc)
  • Power Platform (CDS, Dataflows (Power BI and Power Platform)
  • Azure
  • Online Services
  • Other

Note: The available data connection options are always expanding.

An Example of Importing Data

Let’s take a look at importing some data from an Excel workbook in action. We’re going to import an Excel file called superstore_sample_data. It contains orders, returns, and users data on three different tabs.

Open the Power BI Desktop, In the Home tab and select the Get Data, click on the file on the navigator window. This will show all the current options on the right. Click on Excel and click on connect.

This will open a file picker menu where you can navigate to the file you want to import. Select the file and press the Open button.

After selecting the file you want to import, the data preview Navigator window will open. This will give you a list of all the objects available to import from the workbook.

Click on any one table (say Orders), this will show and give you a preview of the data that will be imported into Power BI. Check the box on the left of all three tables. This will enable three options on the bottom of the Navigator.

Load – This is used when you are confident that the data is already cleaned and would not require any further transformation

Transform Data – Choosing this option will take us to Power Query Editor allowing us to use Power Query for transformation purposes.

Cancel – This cancels the operations

When you click on either of the objects in the workbook, you can see a preview of the data for it on the right-hand side of the navigator window. This is great for a sense check to make sure you’ve got the correct file.

The Query Editor

After going through the guide to connecting your data and selecting the Edit option, you will be presented with the query editor. This is where any data transformation steps will be created or edited. There are 6 main areas in the editor to become familiar with.

  1. The Ribbon – The user interface for the editor is quite similar to Excel and uses a visual ribbon-style command center. It organizes data transformation commands and other power query options into 5 main tabs.
  2. Query List – This area lists all the queries in the current workbook. You can navigate to any query from this area to begin editing it.
  3. Data Preview – This area is where you will see a preview of the data with all the transformation steps currently applied. You can also access a lot of the transformation commands here either from the filter icons in the column headings or with a right-click on the column heading.
  4. Formula Bar – This is where you can see and edit the M code of the current transformation step. Each transformation you make on your data is recorded and appears as a step in the applied steps area.
  5. Properties – This is where you can name your query. When you close and load the query to an Excel table, power query will create a table with the same name as its source query if the table name isn’t already taken. The query name is also how the M code will reference this query if we need to query it in another query.
  6. Applied Steps – This area is a chronological list of all the transformation steps that have been applied to the data. You can move through the steps here and view the changes in the data preview area. You can also delete, modify, or reorder any steps in the query here.

The Query List

The Query List has other abilities other than just listing out all the current workbook’s queries. Out of all the functions that we see below, I want to focus on a few important ones that will help us in optimization

  • Reference – This creates another query referencing the original query. This is a useful feature as this will allow us to create a filtered dataset based on the original query
  • Enable load – This will load the query into the model. If a query is being referenced then we can uncheck enable load as the purpose for the original query was to fetch data and not to load during evaluation. I will write a blog about this in the future and explain how this consumes memory
  • Include in report refresh – When the refresh is clicked, this will allow the query to get updated with the latest data
  • Copy and Paste – Copy and paste a query to make another copy of it.
  • Delete – Delete the query. If you accidentally delete a query, there’s no undo button, but you can exit the query editor without saving via close and load to restore your query.
  • Rename – Rename your query. This is the same as renaming it from the properties section on the left-hand side of the editor.
  • Duplicate – Make another copy of the query. This is the same as copy and paste but turns the process into one step.
  • Move To Group – Place your queries into a folder like structure to keep them organized when the list gets large.
  • Move Up and Move Down – Rearrange the order your queries appear in the list or within the folder groups to add to your organizational efforts. This can also be done by dragging and dropping the query to a new location.
  • Create Function – Turn your query into a query function. They allow you to pass a parameter to the query and return results based on the parameter passed.
  • Convert To Parameter – Allows you to convert parameters to queries or queries to parameters.
  • Advanced Editor – Open the advanced editor to edit the M code for the query.
  • Properties – Allows you to change the query name, add a description text, and enable the Fast Data Load option for the query.

One of the primary functions of the query list is navigation. There’s no need to exit the query editor to switch which query you’re working on. You can left-click on any query to switch. The query you’re currently on will be highlighted.

You can hide the query list to create more room for the data preview. Left-click on the small arrow in the upper right corner to toggle the list between hidden and visible.

The Data Preview

The main job of the data preview area is to apply transformation steps to your data and show a preview of these steps you’re applying.

In the data preview area, you can select columns with a few different methods. A column will be highlighted when it’s selected.

  • Select a single column with a left-click on the column heading.
  • Select multiple adjacent columns with a left click on the first column heading, then hold Shift and left-click on the last column heading.
  • Select multiple non-adjacent columns by holding Ctrl then left click on any column headings you want to select.

You can then apply any relevant data transformation steps on selected columns from the ribbon or certain steps can be accessed with a right-click on the column heading. Commands that are not available to your selected column or columns will appear grayed out in the ribbon.

Each column has a data type icon on the left hand of the column heading. You can left-click on it to change the data type of the column.

You can choose from decimal numbers, fixed decimal numbers, whole number, currency, whole numbers, percentages, date and time, dates, times, timezone, duration, text, Boolean, and binary.

Using the Locale option allows you to set the data type format using the convention from different locations. For example, if you wanted to display the date in the American m/d/yyyy format instead of the usual dd/mm/yyyy then you could select the United States as the locale.

There’s a small table icon in the top left-hand corner of the data preview, you can right-click or left-click this to access various actions that affect the whole table.

Renaming any column heading is easy. Double left-click on any column heading then type your new name and press Enter when you’re done.

Each column also has a filter toggle on the right hand side. Left-click on this to sort and filter your data. This filter menu is very similar to the filters found in a regular spreadsheet and will work the same way.

The list of items shown is based on a sample of the data so may not contain all available items in the data. You can load more by clicking on the Load more text in blue.

Many transformations found in the ribbon menu are also accessible from the data preview area using a right-click on the column heading. Some of the action you select from this right-click menu will replace the current column. If you want to create a new column based, use a command from the Add Column tab instead.

The Applied Steps

Any transformation you make to your data will appear as a step in the Applied Steps area. It also allows you to navigate through your query. Left-click on any step and the data preview will update to show all transformations up to and including that step.

You can insert new steps into the query at any point by selecting the previous step and then creating the transformation in the data preview. Power Query will then ask if you want to insert this new step. Careful though, as this may break the following steps that refer to something you changed.

You can delete any steps that were applied using the X on the left-hand side of the step name in the Applied Steps area. Be cautious though, if any of the following steps depend on the step you’re trying to delete, you will break your query. This is where Delete Until End from the right-click menu can be handy.

You can rearrange the order the steps are performed in your query. Just left-click on any step and drag it to a new location. A green line between steps will indicate the new location. This is another one you’ll need to be careful with as a lot of steps will depend on previous steps, and changing ordering can create errors because of this.

Right-click on any step to access a menu of options.

  • Edit Settings – This allows you to edit the settings of the step similar to using the gear icon on the right-hand side of the step.
  • Rename – This allows you to rename the label of the step. Instead of displaying the generic name like “Filtered Rows“, you could have this display something like “Filtered Product Rows on Pens” so you can easily identify what the step is doing.
  • Delete – This deletes the current step similar to the X on the left-hand side of the step.
  • Delete Until End – This allows you to delete the current step plus all steps up until the end. Since steps can depend on previous steps, deleting all steps after a step is a good way to avoid any errors.
  • Insert Step After – This allows you to insert a new step after the current step.
  • Move Up and Move Down – This allows you to rearrange the query steps similar to the dragging and dropping method.
  • Extract Previous – This can be a really useful option. It allows you to create a new copy of the query up to the selected step.

The Formula Bar

When you click on different steps of the transformation process in the Applied Steps area, the formula bar updates to show the M code that was created for that step. If the M code generated is longer than the formula bar, you can expand the formula bar using the arrow toggle on the right-hand side.

Advanced editor

This is where we can write our own M code for all the transformation. Though this is not needed. All the steps that we apply are recorded and the underlying M code is constantly generated. Click on the advanced editor option to view the M code.

The Difference Between the Transform and Add Column Tabs

The bulk of all transformations available in power query can be accessed through either the Transform tab or the Add Column tab.

When you use a command from the Add Column tab that is found in both tabs, it will create a new column with the transformed data and the original column will stay intact. Whereas using the equivalent command from the Transform tab will change the original column and no new column is created.

This is a critical point to be aware of!

The Transform Tab

The Transform tab sections.

  1. Table – This section contains commands that will transform the entire table. You can group and aggregate your query, promote rows to headers, demote headers to rows, transpose your data, reverse row order, and count rows.
  2. Any Column – This section contains commands that will work on any column of data regardless of the data type. You can change the data type, automatically detect and change the data type, rename the column heading, find and replace values, fill values down (or up) a column to replace any blanks or nulls with the value above it (or below it), pivot or unpivot columns, move columns to a new location or convert a column to a list.
  3. Text Column – This section contains commands for text data. You can split columns with a delimiter, format the case, trim and clean, merge two or more columns, extract text, and parse XML or JSON objects.
  4. Number Column – This section contains commands for numerical data. You can perform various aggregations like sums and averages, perform standard algebra operations or trigonometry and round numbers up or down.
  5. Date & Time Column – This section contains commands for date and time data. You can extract information from your dates, times, and duration data.
  6. Structured Column – This section contains commands for working with nested data structures such as when your column contains tables.
  7. Scripts – This section contains the query pane to enter and write R and Python Scripts.

The Add Column Tab

The Add Column tab contains a lot of commands similar to the Transform tab, but the key difference is they will create a new column with the transformation.

  1. General – This section allows you to add new columns based on formulas or custom functions. You can also add index columns or duplicate a column from here.
  2. From Text – Very similar to the From Text section in the Transform tab, but these commands will create a new column with the transformation.
  3. From Number – Very similar to the From Number section in the Transform tab, but these commands will create a new column with the transformation.
  4. From Date & Time – Very similar to the From Date & Time section in the Transform tab, but these commands will create a new column with the transformation.

The View Tab

The View tab is quite sparse in comparison to the other tabs. There are no transformation commands to be found in it. Most Power Query users will rarely need to use this area, but there are still a few things worth knowing about.

  1. Layout – This section allows you to either show or hides the Query Setting pane (which contains the properties and applied steps) and the Formula Bar.
  2. Data Preview – This section allows you to show or hide white space characters or turn the font into a monospace font in the data preview area. This is handy when dealing with data delimited by a certain number of characters.
  3. Columns – This allows you to go to and select a certain column in the data preview. This command is also available in the Home tab.
  4. Parameters – This allows you to enable parameterize data sources and transformation steps.
  5. Advanced – This will open the advanced query editor which shows the M code for the query. This is also available from the Home tab.
  6. Dependencies – This will open a diagram view of the query dependencies in the workbook.

In particular, the Query Dependencies view is a useful resource that allows you to see a visual representation of the data transformation process flow.

Conclusion

Power Query can seem overwhelming at first to someone new to it all, but the UI is very well laid out and easy to catch on to.

While there is a lot to learn about Power Query, it is worth putting in the time to learn. There is massive potential to save time in repetitive data cleaning and formatting tasks with it. In my upcoming blogs, I will be writing in-depth about each command found in Power Query Editor.

%d bloggers like this: