What is staging area

 Data staging is one of the key concepts in data loading. Over the next few minutes, you will learn the basics of data staging, the reasons for its necessity and the advantages of using it in the overall ETL processes. To better understand the concept of staging, let's use in everyday life example. Imagine you've invited friends over for dinner and you've bought ingredients from the grocery store to prepare the meal. However, you will serve the ingredients as they are. You might marinate the meat in a pot, cut the vegetables and place them in a bowl for washing, and prepare other dishes like making a salad or putting appetizers on a plate. In this example, all the ingredients represent raw data while the processes of marinating, washing, cutting, and waiting correspond to ETL operations, the pots, bowls, and other utensils used before serving can be thought of as the Staging Area. Now, let's apply this everyday life example to data staging. A Staging Area serves as an intermediate storage location for raw or unprocessed data, allowing it to be temporarily stored and prepared for further processing. The Staging Area typically act as a bridge between the data sources and the data warehouse. A Staging Area simplifies the process of data cleansing and consolidation of operational data originating from multiple source systems, particularly for enterprise data warehouses that centralize an organization's critical data. Remember, a data Staging Area is not required for your ETL jobs. You can still execute ETL jobs without creating one. However, based on your need to consolidate data coming from multiple sources, it is recommended. Over at Adventure Works, the company receives feedback about its products from various channels, such as social media platforms and corporate websites. Your manager, Adio Queen, has tasked you to prepare a dataset by using these resources to consolidate and to prepare the data for use in reports and dashboards. None of the feedback can be used in its raw form as they have different formats. You must transform the data and then consolidate it in a unified list. Since you will only use this data in the ETL process, it is appropriate to use a Staging Area. Let's take a few moments to complete this task using Power Query. The first step is to import the two datasets. Adventure Works Social Media Feedbacks 1, and Adventure Works Social Media Feedbacks 2, to transform and consolidate in the Staging Area. To do this, navigate to the Home ribbon tab at the top of the Power BI window. Select the Excel workbook button inside the data group in the middle of the toolbar. Select your datasets and select Open. Then select your datasets and select transform data in the window that opened.


Now, you have two queries, Adventure Works Social Media Feedbacks 1, and Adventure Works Social Media Feedbacks 2 in the queries pane at the left menu of Power Query. To successfully complete your task, you have to consolidate these two queries into a single query and add an extra column to indicate where the feedback came from. To do this, you have to use these queries and integrate the data into a more defined, and optimized model. To do this, you need a Staging Area. As you have to consolidate these two tables into one, but also keep them separately, you have to create a new group called the Staging Area. In the queries pane at the left menu of Power Query, select New Group, type Staging Area in the name text box and select, Okay, now move both the datasets, Adventure Works Social Media Feedbacks 1, and Adventure Works Social Media Feedbacks 2, to the staging query's group. Your tables are now organized according to your need. Select the Adventure Works Social Media Feedbacks 1, and Adventure Works Social Media Feedbacks 2 tables respectively, and disable the load by clearing the checkbox enable load, you will keep the include and report refresh option. This way, both tables will still be used in queries, but will not be part of the data model. You are now familiar with the concept of a Staging Area and how it is implemented in Power BI.

Post a Comment

Previous Post Next Post