Introduction to schemas

 Generating business insights often means working through large amounts of data. And it's important that this data is stored and structured meaningfully. With Power BI, you can structure your data using a schema. In this video, you'll learn about different types of schemas and their advantages and disadvantages. Adventure Works wants to optimize its inventory and rework its sales strategy to sell more bicycles. But first, it needs to analyze the relevant data to determine the best way to approach this task. These data sources include customer, product, and sales data, along with information on other aspects of the business. Adventure Works can use a schema in Power BI to organize and build relationships between these different data sources. This way, the company can generate its required insights. Let's find out more about schemas and how Adventure Works can use one. A schema refers to a structure that defines the organization and relationships of tables within a dataset. It represents the logical framework of how the data is organized and connected. There are many benefits to using a schema in Power BI, which you'll explore over the course of this lesson. A schema plays a crucial role in defining the data structure. It also enables efficient data analysis, helps with the creation of visualizations, and assists with generating meaningful insights from your data. There are three different types of schema that can be used to organize and structure data, a flat schema, a star schema, and a snowflake schema. Let's review each of these schema types and find out how Adventure Works can use them. A flat schema is the simplest form of a data model. All attributes and fields related to the entity are stored in a single table. As you discovered in earlier courses, a table is a set of rows containing data, with each row divided into columns. Each column represents a piece of information with a specified data type. The required attributes and entities are stored in the rows and can be extracted as required from the columns. There are several advantages to a flat schema. It's easy to retrieve data from. It's less complex to analyze flat schema data, and it's a simpler way to visualize data. However, even though it's an easy approach to understand, the flat schema still has a few disadvantages. It requires large datasets, which are difficult to maintain and slow to query. It leads to data redundancy and inconsistency, so is more suited to smaller datasets. And it doesn't allow for complex datasets, which require more flexibility and detail. Next, let's explore the star schema data model. A star schema is a more advanced approach to structuring and organizing quantitative, or measurable data in Power BI. It allows for multiple tables to be connected through one central table. In a star schema, a central fact table connects to multiple dimension tables. You'll explore these concepts in a later lesson. These connections look like a star shape, so it's called a star schema. Adventure Works can build a star schema using a central fact table that contains sales transactions. The company can then link the fact table to dimension tables that contain records for customers, employees, dates, and marketing campaigns. Let's break down the components of the star schema using the example from Adventure Works database. First, there's the fact and dimension tables. You'll explore these further in a later lesson. And there are the table relationships. There are many different types of relationships, which you'll also explore in a later lesson. A star schema offers many advantages over a flat schema. By storing data in separate tables, star schemas help to reduce data redundancy and boost query performance. It also provides a clear, logical data model, which makes it easier to understand the data structure. However, it's also less flexible than other schema types. Adding or modifying tables can require extensive changes to the schema, and the star schema can struggle to manage complex relationships. Next is the third and final model, the snowflake schema. A snowflake schema is an extension of the star schema. It breaks down the dimension tables into multiple related tables. Existing tables in a star schema can be further denormalized into other tables, which creates a hierarchy. Yet these tables maintain a relationship with the dimension and central facts tables. For example, Adventure Works can further normalize its product data into supplier and category data tables. Don't worry about the terms normalize and denormalize for now. You'll learn more about these concepts later in the course. Extending a star schema into a snowflake schema offers several advantages. It provides more efficient data storage and retrieval. It improves data integrity and consistency, and it reduces data redundancy. It also offers scalability and flexibility by integrating new data tables as required. Yet there's also disadvantages to a snowflake schema. It's more difficult to perform data analysis because of the extra relationships. These new relationships also make the schema more challenging to understand and manage, and they result in slower queries. Finally, it's important to validate your schemas to make sure they're accurate. When validating a schema, you need to check for the following, make sure each table column has been assigned the correct data type, like text and numeric. Check that each column has the correct formatting applied. Confirm that all columns have clear descriptions with relevant context, and make sure all table and column properties are correctly configured. You should now be familiar with the different types of schemas in Power BI and their advantages and disadvantages. You can build on this knowledge to develop robust data models in Power BI. This way, you'll ensure that your data retains its integrity and simplicity, and can be used to generate insights.

Post a Comment

Previous Post Next Post