Data Blending in Tableau
In Tableau, you may merge multiple data sets by using the data blending feature. Blending is an easy and efficient method for integrating data from various sources into a single visualization. Data blends are a powerful tool for combining datasets in Tableau, but they are often misused because of their perceived complexity.
Data blending is a source of aggravation for many Tableau developers. However, once the concept of data blending is grasped, Tableau’s data blending feature proves to be an invaluable asset. We’ll go over some of the restrictions associated with data merging below.
Data blending has the potential to solve many problems but also to cause new ones. When utilized properly, it simplifies the process of including new metrics on a dashboard. Use of this tool improperly can cause a Tableau Server to crash.
It is necessary to combine data sets when working with data extracts. Given that you can’t join Tableau data extracts to external sources, data blending is your only choice when working with extracts. Enroll in the best Tableau training in Chennai and learn latest data visualization techniques.
When making filled maps from a shape (kml) file, it’s also helpful. The kml file serves as the primary data source in this post displaying coronavirus in England by the local authority, whereas coronavirus counts are combined from a different data set.
Are a Data Blend And a Join The Same Thing?
The subject of whether or not a blend is the same as a join arises frequently in discussions on data blending. It is challenging to put a precise definition on the process of data blending. You could call it a left outer join, although it’s not really a join at all. Blending, which is analogous to a flexible left join, is typically used to bring in measurements from a different data source, as dimensions typically return an asterisk in place of the expected value.
The order in which operations are performed is the main distinction between a data mix and a join. When using a join, data is combined in one place.
Data blending involves combining sets of data together. Relationships between some fields are required to link the data sources together, much like in a database join. Comparable to combining TableA.Field ON TableB.Field.
Nonetheless, the worksheet does not strictly enforce the relationships between the fields. In the case of an active link, the chain icon appears in the spreadsheet. If orange, it’s linked; if gray, it’s inactive.
A relationship’s status in the worksheet is represented by a chain link. Ordinary joins also occur inside a data source. When compared to mixing, this is a significant change.
Worksheets are where data blending occurs, rather than the original data source. Each worksheet in a Tableau workbook can have its own set of connecting fields enabled, allowing for a variety of possible joins between the blended data sources on a variety of variables.
Data blending occurs at the aggregate level, after standard joins have been performed at the record level. One of the main distinctions between a join and a blending is this. Whereas a join comes before aggregation, a blend comes after.
Is A Data Blend Obsolete With The New Tableau Relationships?
A direct “no” is the shortest possible response. The use of Tableau Connections does not replace the need for blending data in any way. Blending is the only option available to bring in more data when using published data extracts, which is common in enterprise situations (it helps build a secure single source of the truth). Blending can be done in a number of different ways. (Except, of course, for changing the original source data that was made public.) Therefore, it is crucial for developers to have a firm grasp on data blending since it continues to play a critical role in the Tableau ecosystem.
The complexity of blending has increased, though, due to the relationships in Tableau. This won’t function in a data blend if the secondary data source contains relationships. If the pills in the workbook are suddenly turning red for no apparent reason, you may need to look at the usage relationships in the secondary source; otherwise, the blend will not be successful.
You should know that a Tableau data source with relationships still functions as the main data source, however, it fails when used as a secondary source. Empower your future by learning a Tableau course in Chennai.
Guidelines For Tableau Data Blending
Because data blending can yield results that aren’t always what was expected, it’s best to keep the following guidelines in mind.
- Know which source you’re using for primary data.
- Check that the worksheet contains the relevant data connections and that they are active.
- Prepare the joins at a high level, the smallest level that is achievable.
- Whenever applying filters to a view, Primary should be assigned to the data source containing the filter fields.
- Reduce as much as possible the number of data sources that are blended together. In this way, things are simpler and troubleshooting is less complicated.
- It is possible to exclude unnecessary information from the secondary source by applying filters to the incoming data.
- To avoid a catastrophic performance drop and possible Tableau server crash, keep supplementary data sources on the small side.
- A data blend’s joins may be affected by the case of text data due to the case sensitivity of joins.
- Apply a data blend to turn in either value in a single dimension (one per join field) OR metrics.
Instructions for constructing a data Blend
- To begin, the Tableau workbook must be linked to a minimum of two data collections.
- Select Data Edit Blend Relationships from the main menu to establish the connections.
- If Tableau determines that there is a connection between two fields, it will use the fields that have an Automatic relationship.
- Tableau’s “Automatic” associations may need adjustment when combining data sources. Simply select “Custom” and then “Modify” to make changes to the blend relationships manually.
- Changing the default relationships is a common necessity when combining date-based data. If you want a successful relationship based on dates, you need to start at the right “level” in the date hierarchy. Date components should be joined in the proper order.
- Just because a connection is defined doesn’t mean it’s actually being maintained. When you click the secondary source’s chain connection symbols, the worksheet’s “joins” will become active.
- Multiple worksheets in the same workbook can reference the same data sources through a variety of join fields.
- In this worksheet, active joins are shown by orange chain links. The broken linkage indicates that the join is disabled for this particular workbook.
- Select the chain icon to activate or deactivate the link.
Blending Tableau Data With A Calculated Field
Rarely will you need to blend on a computation or a measure, but when you do, you’ll want to be prepared. One possible factor in determining a bonus is the degree to which sales goals are being met; for instance, reaching 95% of a sales goal may earn a bronze medal, reaching 105% would earn silver, etc. Tableau’s built-in calculation functionality is required to compare the actual results to the set goals, which may vary depending on the view’s data.
When the outcome is dependent on the outcome of the computation, blending in the findings is the best approach. Data blending is not possible on measures, however, measures can be transformed into dimensions. A dimension can be converted from a measure computation, allowing it to be used as a blending field. Simply move the field from the Measures pane to the Dimensions pane, or select the field and choose “Convert to Dimension” from the context menu.
As job openings for Tableau are increasing day by day, learn a tableau course in Chennai.
Highlights On The Blending Of Data
Data blending is best practiced by constantly mixing at the least granular level possible, where the linkages are strongest. Blend on the year, for instance, if your file has yearly goals. If you’re reporting on a yearly basis, blend on the year rather than the day, even if your annual goals are subdivided daily.
When working with huge data sets, it is best to avoid blending on ID = ID. Your Tableau Server may crash as a result. A common join doesn’t have this feature, and it’s an important distinction. When performing a join, it is preferable to join at the finest possible level. Blending, on the other hand, is done at the finest possible grain size. If you can, avoid blending two massive datasets into one. Doing so will likely result in poor dashboard performance, if not render the dashboard useless.
In addition, case sensitivity applies to joins. Therefore, it may be preferable to mix text fields in the upper case before blending them using the UPPER method. Furthermore, a data mix is always performed on a worksheet. The blending is done exclusively in that one worksheet and isn’t carried over to other sheets or dashboards. Consequently, a data mix cannot be shared with the public as a publicly available data source. The mixture only affects the selected worksheet.
There is always the main source and a secondary source in Tableau blending. Knowing which is the major source will help you evaluate the information more critically.
There is a blue check next to the major source and an orange one next to the secondary.
Again, it is critical to distinguish between primary and secondary data sources. Since it’s a left join, all primary fields that are relevant to the secondary will be combined. An orange checkmark appears in fields that come from a different data source.
The first field entered into a worksheet is considered to be the worksheet’s primary source. The first field’s data becomes the major data, while the rest of the fields’ data becomes secondary. If a field was on a deleted worksheet, the original data source will still be used. If you need to switch the worksheet’s primary data source, you’ll need to start over with a whole new sheet.
An Illustration Of Data Blending
Using a straightforward case study, we can better illustrate how Tableau data mixing works. Feel free to try out this sample for yourself by downloading it from Tableau Public.
We may compare actual sales to projected sales using the sample superstore data source and a fake file including monthly and annual budgets for each country. Be aware that the budget figures make no sense because they were generated by a random number generator.
It’s a typical request, and a bullet chart is an excellent tool for presenting the results of an actual vs. budget comparison. To determine which joins, filters and algorithms are most effective, let’s first generate a bullet chart. Start by importing data from the supermarket and the financial plan files. Afterward, it’s time to establish the connections.
Tableau will make assumptions about the relationships, which may not always be accurate. Set appropriate boundaries, especially when combining dates.
As an illustration, assume that the orders are placed every day and the budget is allocated every month. As a result, the connection between the monthly and annual portions of the dates should be made. Include the MY date pieces as well for the fullness. It’s important to keep in mind that simply creating these connections won’t automatically reflect them in the dashboard’s presentation. The ability to blend is only available in the context of a single worksheet.
Here we have a bullet chart, with sales data serving as the primary source and projected budget numbers serving as the secondary source.
Only between the date fields do we need to activate the blend to compare monthly sales against projections. Take note that the association is only valid for the Month field. While turning on the blend between countries and regions doesn’t cause any issues, it also doesn’t provide much value, therefore turning it on is unnecessary from this perspective.
Make a change so that Year and Month are no longer measurements but rather Region sales versus budget. Either the join between Regions or the join between Countries (which rolls up to Regions) must be enabled for the right budget number to be displayed.
This connection totals up the budget figures to the Regional level without considering the dates involved at any point. This perspective doesn’t require joining on the date field. However, that connection needs to be enabled if date filters are available to users.
Join must be turned on against the Country if comparing to a national budget. In this case, we don’t want the budget to be returned at the regional level, against each individual country, which is what would happen if we activated merely against the region.
You can see what happens if you just join the Region in the picture below. As a result of the merger, the regional budget that appears next to each country is actually that of the region as a whole.
The numbers in the Tableau bullet chart are on the wrong level because of an error in the data mixing.
This perspective is inappropriate for the active blend relationship’s level.
There are 2 major reasons why the depth of the join fields matters:
- In order to provide the right answers
- If you’re looking to maximize performance, it’s best to join at a high level and only do so when absolutely essential.
The correct relationship for the perspective must be carefully defined. Assume a scenario in which a country’s budget exists (a secondary source) but the country is not present in the original data set since no sales have occurred. Because the country in question isn’t included in the primary data source, this budget won’t be reflected in regional budget totals even if the relationship is activated at the country level.
If the country is located in a region that is included in the core data set (because it has sales, for example), then activating the relationship at the Region level will display the right regional budget.
Even a basic join would have this issue. Tableau’s approach involves building a scaffold data source to ensure that all potential permutations of necessary data actually exist. Become a tableau developer by joining the top Tableau training institute in Chennai.
Constraints of Tableau Data Blending
The following are typical examples of when data mixing is needed. Frequently, these factors contribute to the complexity of situations:
- Ineffective calculations involving a data blend
- Presence of the asterisk for data blending
- The ineffectiveness of the filters when using data blending
- View Data only displays “all data” from the primary data source and nothing from the secondary
- Performance of the Dashboard when blending data
Performance of The Dashboard When Blending Data
There may be a performance hit when blending in larger data sources. In some cases, this will be an underestimate; for example, when combining enormous data sources at a granular level, the impact on performance can be substantial.
Even a Tableau Server can be brought down by blending data at a granular level from numerous huge data sources. It may also negatively impact the performance of ALL other dashboard users on the server, assuming it doesn’t bring the service down entirely.
As a result, handle joins and data sources with caution. The consequences of doing them incorrectly are significant, and as the developer, you may not even be aware of the widespread poor performance being experienced by dashboard users. As was said earlier in this post, it is best to combine data from the highest granularity feasible.
Utilize Data Blending To Provide A Fixed Value Without A Level Of Detail.
One more thing to consider. There are situations where a fixed value is required, but even then it’s possible that the fixed number will need to fluctuate. A good example of a constant is last year’s sales, but as of January 1st of each year, “last year” is no longer last year. Additionally, historical sales data may be revised for a variety of operational reasons.
The data source can be blended into a worksheet if the constant rate of change is known. Get the value from a secondary data source, but make sure there are no active connections between the original and secondary data sources, and the value won’t change unless the underlying data does. If there are no longer any connections being used, the total will remain unchanged. LOD in tableau shall cause issues in certain instances.
This is comparable to a FIXED computation, but it works faster on huge data sets and is thus preferable.
Data Blending Is Brilliant When Used Correctly
It’s helpful to have data blending as a tool in Tableau. Admittingly, it’s not a few developers’ favorite thing; it’s complicated, and the behavior is often frustrating. However, there are situations when this is the most effective strategy. Refer to our Tableau interview questions and win your job like a breeze.