Ingesting and gathering data

Adio, the data analyst at AdventureWorks, needs to analyze sales data from multiple channels, including physical stores and e-commerce platforms. He asks the data analytics team to gather and ingest the data, a fundamental step before he can proceed with the later stages of the extract, transform, load, or ETL, process. In this video, you'll explore data gathering and ingestion, including different methods to gather and ingest data and their advantages and disadvantages.

Let's start by outlining data gathering and ingestion, which typically take place in the extract step of the ETL process. Data can come from a variety of sources, such as structured data from spreadsheets or databases, unstructured data from text files or social media posts, and streaming data from real-time data transmissions, such as webcams or satellite navigation systems. Data gathering involves collecting or acquiring data from these different sources. An example of gathering data is the data analytics team at AdventureWorks collecting all their sales data, ranging from spreadsheets to real-time streams.

Data ingestion starts with data gathering, and it encompasses the process of obtaining and importing data from various sources for immediate use or storage, such as in a database. For example, as part of data ingestion, the team at AdventureWorks can go on to extract relevant data from each source, such as customer data and sales metrics like revenue. They can then load it into a central database where it can be accessed for further processing and transformation.

The data gathering and ingestion process is beneficial for organizations for various reasons. With data volume, velocity (the speed of generation), and variety (in terms of types and sources) constantly increasing, it helps organizations consolidate their data. This unified view of their data facilitates comprehensive analysis, data-driven decision-making, and innovation. Data ingestion improves operational efficiency through process automation.
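The gather-then-ingest flow described above, collecting sales data from several channels and loading it into a central database, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample CSV text, the column names, the `sales` table, and the `gather` and `ingest` helpers are all hypothetical, and an in-memory SQLite database stands in for whatever central database a team would actually use.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for two sales channels.
STORE_CSV = """order_id,customer,revenue
1,Alice,120.50
2,Bob,80.00
"""
ONLINE_CSV = """order_id,customer,revenue
101,Carol,45.25
"""


def gather(sources):
    """Collect rows from each source, tagging every row with its channel."""
    for channel, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            yield (
                channel,
                int(row["order_id"]),
                row["customer"],
                float(row["revenue"]),
            )


def ingest(conn, rows):
    """Load the gathered rows into one central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(channel TEXT, order_id INTEGER, customer TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# An in-memory database is an assumption made to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
ingest(conn, gather({"store": STORE_CSV, "online": ONLINE_CSV}))
total_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total_revenue = conn.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]
print(total_rows, total_revenue)
```

Keeping gathering and ingestion as separate functions mirrors the distinction drawn in the video: `gather` only collects rows, while `ingest` imports them into storage for later transformation.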
Proper ingestion practices can also help organizations meet regulatory requirements, protect sensitive data, and ensure data integrity.

Now that you know more about data gathering and ingestion and their benefits, let's explore some common methods for gathering and ingesting data, as well as their advantages and limitations. These include manual data entry, file-based ingestion, database connections, web scraping, and data streaming.

Manual data entry is the most basic method of data gathering and ingestion, where data is manually entered into a system. For example, an employee at AdventureWorks may type data from a physical customer order form into a customer relationship management, or CRM, system. While manual data entry is straightforward and suitable for small amounts of data, it is time-consuming, prone to errors, and unsuitable for large-scale data ingestion.

Another method is file-based ingestion, the process of importing data from files such as spreadsheets. To illustrate, AdventureWorks might receive sales data from retail stores in Excel spreadsheets. These files can be imported into the ETL process using tools that read and parse, or interpret, the file contents. While file-based ingestion is common and requires less technical expertise than other methods, it can become cumbersome when dealing with large numbers of files or frequent updates.

With the database connection method, you access data directly from a database or data warehouse using tools that can connect to and query the source. For example, AdventureWorks can create a database connection to access data from its sales database using SQL queries. This connection enables the analytics team to extract the necessary data with SQL commands, and then transform and load it for further analysis later in the ETL process.
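As a rough sketch of the database connection method, the snippet below opens a connection and extracts data with a SQL query. The table layout, the sample rows, and the `revenue_by_channel` helper are assumptions made for illustration, and an in-memory SQLite database stands in for the sales database; a real connection would point at an existing database and usually involve credentials.

```python
import sqlite3


def revenue_by_channel(conn):
    """Extract total revenue per sales channel with a SQL query."""
    query = (
        "SELECT channel, SUM(revenue) FROM sales "
        "GROUP BY channel ORDER BY channel"
    )
    return conn.execute(query).fetchall()


# Hypothetical stand-in for the sales database, populated with sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (channel TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("online", 45.25), ("store", 120.5), ("store", 80.0)],
)

result = revenue_by_channel(conn)
print(result)
conn.close()
```

Pushing the aggregation into the SQL query means only the summarized rows leave the database, which is one reason this method suits larger data sets better than exporting files.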
While database connections offer real-time access to data, enabling instant insights and prompt decision-making, they do require knowledge of database languages like SQL and may involve complex configuration or authentication processes.

Web scraping is a method of extracting data from websites using automated methods or software tools. In the case of AdventureWorks, the analytics team can use web scraping to gather competitor pricing information or customer reviews. Web scraping is a powerful way to gather data from websites, but it can require legal permission and can be complex, as it involves a range of technologies.

Streaming data is continuous, real-time data generated by sensors or other sources. You can ingest streaming data using tools that connect to and process the data as it is generated. For instance, AdventureWorks could use data streaming to monitor factory equipment, track inventory levels, or analyze real-time sales data. Data streaming allows for immediate analysis and decision-making, but requires specialized tools and infrastructure to handle the continuous flow of data.

Each data ingestion method has its advantages and limitations, so it's essential to choose the appropriate method based on your specific use case and the nature of the data you're working with.

In summary, data gathering and ingestion involve obtaining and importing data from different sources, generally in the extract phase of the ETL process. Data gathering and ingestion have many benefits for businesses, from consolidating data to facilitating innovation. By mastering the data gathering and ingestion methods introduced in this video, you can help organizations like AdventureWorks optimize their data for analysis.
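To make the data streaming method a little more concrete, here is a minimal sketch of processing readings one at a time as they arrive. Everything in it is hypothetical: the `sensor_stream` generator stands in for a live source such as factory equipment, and the window size and alert threshold are arbitrary values chosen for illustration. Production streaming usually relies on dedicated tools and infrastructure rather than a simple loop.

```python
from collections import deque


def sensor_stream():
    """Hypothetical stand-in for a live source; yields one reading at a time."""
    for reading in [20.0, 22.0, 21.0, 35.0, 23.0]:
        yield reading


def ingest_stream(stream, window_size=3, threshold=30.0):
    """Process each reading as it arrives, keeping a rolling window."""
    window = deque(maxlen=window_size)  # oldest readings drop off automatically
    alerts = []
    for reading in stream:
        window.append(reading)
        rolling_avg = sum(window) / len(window)
        if reading > threshold:
            # Record the unusual reading with the rolling average at that moment.
            alerts.append((reading, rolling_avg))
    return alerts


alerts = ingest_stream(sensor_stream())
print(alerts)
```

Because the function only ever holds the last few readings in memory, the same pattern works whether the source produces five values or runs indefinitely.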
