Datatype Conversion in Power Query Affects Data Modeling in Power BI

In my consulting experience working with clients using Power BI, many challenges that Power BI developers face are due to negligence of data types. Here are some common challenges that are the direct or indirect results of inappropriate data types and data type conversion:

  • Getting incorrect results while all calculations in your data model are correct.
  • Poorly performing data model.
  • Bloated model size.
  • Difficulties in configuring user-defined aggregations (agg awareness).
  • Difficulties in setting up incremental data refresh.
  • Getting blank visuals after the first data refresh in the Power BI service.

In this blog post, I explain the common pitfalls so you can prevent future challenges that can be time-consuming to identify and fix.

Background

Before we dive into the topic of this blog post, I would like to start with a bit of background. We all know that Power BI is not only a reporting tool. It is indeed a data platform supporting various aspects of business intelligence, data engineering, and data science. There are two languages we must learn to be able to work with Power BI: Power Query (M) and DAX. The purposes of the two languages are quite different. We use Power Query for data transformation and data preparation, whereas DAX is used for data analysis in the Tabular data model. Here is the point: the two languages in Power BI have different data types.

The most common Power BI development scenarios start with connecting to the data source(s). Power BI supports hundreds of data sources. Most data source connections happen in Power Query (the data preparation layer in a Power BI solution) unless we connect live to a semantic layer such as an SSAS instance or a Power BI dataset. Many supported data sources have their own data types, and some don't. For instance, SQL Server has its own data types, but CSV doesn't. When the data source has data types, the mashup engine tries to map them to the closest data types available in Power Query. Even though the source system has data types, those data types might not be compatible with Power Query data types. For the data sources that don't support data types, the mashup engine tries to detect the data types based on the sample data loaded into the data preview pane in the Power Query Editor window. However, there is no guarantee that the detected data types are correct. So, it is best practice to validate the detected data types anyway.
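A minimal sketch of that validation, assuming a small made-up CSV sample held in a text value (a real query would read the file instead): because CSV carries no types, the query sets every column type explicitly with Table.TransformColumnTypes rather than keeping whatever was detected. The column names and the en-US culture are illustrative assumptions.

```powerquery
let
    // Hypothetical CSV content; a real solution would read the file, e.g. with File.Contents
    CsvText = Text.Combine({"OrderID,OrderDate,Amount", "1,2023-01-01,10.5", "2,2023-01-02,20.25"}, "#(cr)#(lf)"),
    // CSV carries no data types, so every parsed value arrives as text
    Parsed = Csv.Document(CsvText),
    Promoted = Table.PromoteHeaders(Parsed),
    // Set the types explicitly; the culture controls how dates and decimal separators are parsed
    Typed = Table.TransformColumnTypes(
        Promoted,
        {{"OrderID", Int64.Type}, {"OrderDate", type date}, {"Amount", Currency.Type}},
        "en-US"
    )
in
    Typed
```

Declaring the types in one explicit step also documents the intended model data types for whoever maintains the query later.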

Power BI uses the Tabular model data types when it loads the data into the data model. The data types in the data model may or may not be compatible with the data types defined in Power Query. For instance, Power Query has a Binary data type, but the Tabular model does not.
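Because the Tabular model has no Binary type, one way to keep the load predictable is to remove binary columns in Power Query. The sketch below uses a made-up table shaped like a Folder.Files result (the Name and Content columns are assumptions) and drops every column that Table.ColumnsOfType reports as binary.

```powerquery
let
    // Hypothetical table mixing a text column with a binary column
    Files = #table(
        type table [Name = text, Content = binary],
        {{"a.txt", Text.ToBinary("hello")}, {"b.txt", Text.ToBinary("world")}}
    ),
    // Names of the columns typed as binary
    BinaryColumns = Table.ColumnsOfType(Files, {type binary}),
    // Drop them so only types the data model supports are loaded
    Result = Table.RemoveColumns(Files, BinaryColumns)
in
    Result
```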

The following table shows Power Query's data types, their representations in the Power Query Editor's UI, their mapping data types in the data model (DAX), and the internal data types in the xVelocity (Tabular model) engine:

Power Query and DAX (data model) data type mapping

As the above table shows, Whole Number, Decimal, Fixed Decimal and Percentage in Power Query's UI are all of type number in the Power Query engine. The type names in the Power BI UI also differ from their equivalents in the xVelocity engine. Let us dig deeper.

Data Types in Power Query

As mentioned earlier, Power Query has only one numeric data type, number, whereas the Power Query Editor's UI has a Data Type drop-down button on the Transform tab showing four numeric data types, as the following image shows:

Data type representations in the Power Query Editor's UI
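A quick way to confirm this in the formula language is to ask Type.Is whether each of the types behind the four numeric options is compatible with number. In this sketch, the comparison is against nullable number so the check does not hinge on nullability, and Decimal Number is represented by type number; each field is expected to evaluate to true.

```powerquery
let
    // Each numeric option in the UI is backed by a type built on the primitive type number.
    // Comparing against nullable number keeps the check independent of nullability.
    Checks = [
        WholeNumber   = Type.Is(Int64.Type, type nullable number),
        DecimalNumber = Type.Is(type number, type nullable number),
        FixedDecimal  = Type.Is(Currency.Type, type nullable number),
        Percent       = Type.Is(Percentage.Type, type nullable number)
    ]
in
    Checks
```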

In the Power Query formula language, we specify a numeric data type as type number or Number.Type. Let us look at an example to see what this means.

The following expression creates a table with different values:

#table({"Value"}
	, {
		{100}
		, {65565}
		, {-100000}
		, {-999.9999}
		, {0.001}
		, {10000000.0000001}
		, {999999999999999999.999999999999999999}
		, {#datetimezone(2023,1,1,11,45,54,+12,0)}
		, {#datetime(2023,1,1,11,45,54)}
		, {#date(2023,1,1)}
		, {#time(11,45,54)}
		, {true}
		, {#duration(11,45,54,22)}
		, {"This is a text"}
	})

The results are shown in the following image:

Generating values in Power Query

Now we add a new column that shows the data type of the values. To do so, we use the Value.Type([Value]) function, which returns the type of each value in the Value column. The results are shown in the following image:

Getting a column's value types in Power Query
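The step above can be sketched as follows. The Source step here is a shortened, hypothetical copy of the earlier #table expression so that the query runs on its own; Table.AddColumn is given type type as the column type because Value.Type returns type values.

```powerquery
let
    // Hypothetical, shortened copy of the sample table above
    Source = #table(
        {"Value"},
        {{100}, {-999.9999}, {#date(2023, 1, 1)}, {true}, {"This is a text"}}
    ),
    // Value.Type returns a type value for each row, so the new column is typed as type type
    #"Added Value Type" = Table.AddColumn(Source, "Value Type", each Value.Type([Value]), type type)
in
    #"Added Value Type"
```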

To see the actual type, we must click on each cell (not on the value itself) of the Value Type column, as shown in the following image:

Click on a cell to see its type in Power Query Editor

With this method, we have to click each cell to see the data types of the values, which is not ideal. But there is currently no function available in Power Query to convert a Type value to Text. So, to show each type as text in a table, we use a simple trick. There is a function in Power Query that returns a table's metadata: Table.Schema(table as table). The function returns a table revealing useful information about the table passed to it, including the Name, TypeName, and Kind columns, and so on. We want to show the Kind of each value, so we only need to turn each value into a table using the Table.FromValue(value as any) function. We then get the value of the Kind column from the output of the Table.Schema() function.

To do so, we add a new column to get textual values from the Kind column. We named the new column Datatypes. The following expression caters to that:

Table.Schema(
      Table.FromValue([Value])
      )[Kind]{0}

The following image shows the results:

Getting type values as text in Power Query

As the results show, all numeric values are of type number, and the way they are represented in the Power Query Editor's UI does not affect how the Power Query engine treats those types. The data type representations in the Power Query UI are, in a sense, aligned with the type facets in Power Query. A facet adds details to a type kind; for instance, a facet can record the maximum length of a text value or the precision and scale of a number. We can define a value's type together with its facets using the FacetName.Type syntax, such as Int64.Type for a 64-bit integer number or Percentage.Type for a number shown as a percentage. However, to define only the value's type, we use the type typename syntax, such as type number for a number or type text for a text. The following table shows the Power Query types and the syntax to use to define them:

Defining types and facets in Power Query M
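To see types and facets side by side without clicking through cells, the following sketch ascribes facet types to an empty table and reads them back with Table.Schema. The column names are arbitrary. TypeName is expected to reflect the facet type (for example, Int64.Type or Currency.Type), while Kind reports the primitive type, which is number for all four numeric columns; NumericPrecision and NumericScale expose facet details where the type defines them.

```powerquery
let
    // Empty table whose columns use the types behind the UI's numeric options, plus a text column
    Typed = #table(
        type table [
            WholeNumber = Int64.Type,
            DecimalNumber = number,
            FixedDecimal = Currency.Type,
            Percent = Percentage.Type,
            Label = text
        ],
        {}
    ),
    // TypeName reflects the facet type; Kind reports the primitive type
    Schema = Table.SelectColumns(
        Table.Schema(Typed),
        {"Name", "TypeName", "Kind", "NumericPrecision", "NumericScale"}
    )
in
    Schema
```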

Unfortunately, the Power Query Language Specification documentation does not cover facets, and there are not many online resources or books that I can reference here other than Ben Gribaudo's blog, which explains facets thoroughly and in detail. I strongly recommend reading it.

While the Power Query engine treats values based on their types, not their facets, using facets is still useful because they affect the data when it is loaded into the data model. This raises a question: what happens when we load the data into the data model? That brings us to the next section of this blog post.

Data types in Power BI data model

Power BI uses the xVelocity in-memory data processing engine to process the data. The xVelocity engine uses columnstore indexing technology that compresses the data based on the cardinality of the column, which brings us to a critical point: although the Power Query engine treats all numeric values as type number, the facet decides which data type each column gets in the model, and that data type can change the column's cardinality and therefore how well it compresses. Therefore, setting the correct type facet for each column is important.

Numeric values are among the most common data types used in Power BI. Here is another example showing the differences between the numeric facets. Run the following expression in a new blank query in the Power Query Editor:

// Decimal Numbers with 6 Decimal Digits
let
    Source = List.Generate(()=> 0.000001, each _ <= 10, each _ + 0.000001 ),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "Source"}}),
    #"Duplicated Source Column as Decimal" = Table.DuplicateColumn(#"Renamed Columns", "Source", "Decimal", Decimal.Type),
    #"Duplicated Source Column as Fixed Decimal" = Table.DuplicateColumn(#"Duplicated Source Column as Decimal", "Source", "Fixed Decimal", Currency.Type),
    #"Duplicated Source Column as Percentage" = Table.DuplicateColumn(#"Duplicated Source Column as Fixed Decimal", "Source", "Percentage", Percentage.Type)
in
    #"Duplicated Source Column as Percentage"

The above expression creates 10 million rows of decimal values between 0 and 10. The resulting table has four columns containing the same data with different facets. The first column, Source, contains values of type any, which is loaded into the data model as text. The remaining three columns are duplicated from the Source column with different type facets, as follows:

  • Decimal
  • Fixed decimal
  • Percentage
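Generating 10 million rows takes a while. While experimenting, a scaled-down variant such as the following sketch (an assumption, not the original query) can confirm the column types quickly. It builds 1,000 values with List.Numbers, multiplying integers by 0.000001 instead of adding repeatedly, which avoids accumulating floating-point error, and then lists each column's TypeName and Kind with Table.Schema.

```powerquery
let
    // Hypothetical scaled-down sample: 1,000 values (0.000001 to 0.001) instead of 10 million
    Values = List.Transform(List.Numbers(1, 1000), each _ * 0.000001),
    // The Source column is left untyped (type any), as in the original query
    Source = Table.FromList(Values, Splitter.SplitByNothing(), {"Source"}),
    AsDecimal = Table.DuplicateColumn(Source, "Source", "Decimal", Decimal.Type),
    AsFixedDecimal = Table.DuplicateColumn(AsDecimal, "Source", "Fixed Decimal", Currency.Type),
    AsPercentage = Table.DuplicateColumn(AsFixedDecimal, "Source", "Percentage", Percentage.Type),
    // One row per column with its facet-aware TypeName and primitive Kind
    Schema = Table.SelectColumns(Table.Schema(AsPercentage), {"Name", "TypeName", "Kind"})
in
    Schema
```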

The following screenshot shows the resulting sample data of our expression in the Power Query Editor:

Generating 10 million numeric values and using different type facets in Power Query M

Now click Close & Apply on the Home tab of the Power Query Editor to import the data into the data model. At this point, we need a third-party community tool, DAX Studio, which can be downloaded from here.

After downloading and installing, DAX Studio registers itself as an External Tool in Power BI Desktop, as the following image shows:

External tools in Power BI Desktop

Click DAX Studio on the External Tools tab, which automatically connects it to the current Power BI Desktop model, and follow these steps:

  1. Click the Advanced tab
  2. Click the View Metrics button
  3. Click Columns in the VertiPaq Analyzer section
  4. Look at the Cardinality, Col Size, and % Table columns

The following image shows the preceding steps:

VertiPaq Analyzer Metrics in DAX Studio

The results show that the Decimal and Percentage columns consumed the most significant part of the table's size. Their cardinality is also much higher than that of the Fixed Decimal column. So it is now more obvious that using the Fixed Decimal data type (facet) for numeric values can help with data compression, reducing the data model size and increasing performance. Therefore, it is wise to use Fixed Decimal for decimal values whenever four decimal places of precision are enough. As Fixed Decimal values translate to the Currency data type in DAX, we must change the columns' format if the Currency format is unsuitable. As the name suggests, Fixed Decimal has a fixed four decimal places. Therefore, if the original value has more decimal digits, the digits after the fourth decimal place are dropped after conversion to Fixed Decimal.

That is why the Cardinality column in VertiPaq Analyzer in DAX Studio shows a much lower cardinality for the Fixed Decimal column (the column values only keep up to four decimal places, not more).
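The effect can be estimated before loading anything. The sketch below is only an indicator: it assumes a made-up sample of 100,000 values with six decimal digits and counts distinct values at full precision and after rounding to four decimal places with Number.Round, which mirrors the precision Fixed Decimal keeps. VertiPaq's actual compression depends on more than the distinct count, so DAX Studio remains the place to measure it.

```powerquery
let
    // Hypothetical sample: 100,000 values with six decimal digits (0.000001 to 0.1)
    Values = List.Transform(List.Numbers(1, 100000), each _ * 0.000001),
    // Distinct count at full precision, as a Decimal Number column keeps the values
    DecimalCardinality = List.Count(List.Distinct(Values)),
    // Distinct count after keeping only four decimal places, as Fixed Decimal does
    FixedDecimalCardinality = List.Count(
        List.Distinct(List.Transform(Values, each Number.Round(_, 4)))
    )
in
    [Decimal = DecimalCardinality, FixedDecimal = FixedDecimalCardinality]
```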

Download the sample file from here.

So, the message here is to always use the data type that makes sense to the business and is efficient in the data model. VertiPaq Analyzer in DAX Studio is a good way to understand the various aspects of the data model, including the column data types. As a data modeler, it is essential to understand how Power Query types and facets translate to DAX data types. As we saw in this blog post, data type conversion can affect the data model's compression rate and performance.
