
Big Data Architecture

Big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems.


Figure: Logical diagram of a big data architecture style


Most big data architectures include some or all of the following components:

Data sources: All big data solutions start with one or more data sources (see the parsing sketch after this list), for example:

- Application data stores, such as relational databases.

- Static files produced by applications, such as web server log files.

- Real-time data sources, such as IoT devices.
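
As a concrete illustration of the "static files" case, the sketch below parses one web server log line using only the Python standard library; the sample line and field names are made up for demonstration.

```python
import re

# Hypothetical sample line in the common log format produced by many web servers.
LOG_LINE = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'

# Capture client IP, timestamp, request line, HTTP status, and response size.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+)'
)

match = LOG_PATTERN.match(LOG_LINE)
if match:
    record = match.groupdict()
    print(record["ip"], record["status"])  # 203.0.113.7 200
```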

Data storage: Data for batch processing operations is stored in a distributed file store that can hold high volumes of large files in various formats, often called a data lake. Examples include Azure Data Lake Store and blob containers in Azure Storage.
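
As a minimal sketch of landing a raw file in such a store, the snippet below uses the azure-storage-blob SDK; the connection string, container name, and blob path are placeholders, not values from any real account.

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

# The connection string, container, and blob path below are placeholders.
CONN_STR = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;"

service = BlobServiceClient.from_connection_string(CONN_STR)
blob = service.get_blob_client(container="raw-data", blob="logs/2023/10/10/web.log")

# Upload a local log file into the raw landing zone of the data lake.
with open("web.log", "rb") as f:
    blob.upload_blob(f, overwrite=True)
```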

Batch processing: Because the data sets are so large, a big data solution usually must process data files using long-running batch jobs to filter, aggregate, and otherwise prepare the data for analysis.
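
A minimal PySpark sketch of such a batch job follows; the input path, column names (status, timestamp), and output location are assumptions made for illustration.

```python
# pip install pyspark
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-log-aggregation").getOrCreate()

# Read raw records from the distributed file store (path and columns are assumed).
logs = spark.read.json("abfss://raw-data@mydatalake.dfs.core.windows.net/logs/")

# Filter out server errors, then count requests per day and status code.
daily_counts = (
    logs.filter(F.col("status") < 500)
        .groupBy(F.to_date("timestamp").alias("day"), "status")
        .agg(F.count("*").alias("requests"))
)

# Persist the prepared data in a columnar format for downstream analysis.
daily_counts.write.mode("overwrite").parquet(
    "abfss://curated@mydatalake.dfs.core.windows.net/daily_counts/"
)
```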

Real-time message ingestion: If a solution includes real-time sources, the architecture must include a way to capture and store real-time messages for stream processing.
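
For example, with Azure Event Hubs the azure-eventhub SDK can publish messages for the ingestion layer to buffer. The sketch below plays the role of a real-time source, sending simulated IoT readings; the connection string, hub name, and payloads are placeholders.

```python
# pip install azure-eventhub
from azure.eventhub import EventHubProducerClient, EventData

# Connection string and event hub name are placeholders.
CONN_STR = "Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=..."

producer = EventHubProducerClient.from_connection_string(CONN_STR, eventhub_name="telemetry")

# Batch a few simulated IoT readings and send them in a single call.
with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"device": "sensor-1", "temp": 21.4}'))
    batch.add(EventData('{"device": "sensor-2", "temp": 19.8}'))
    producer.send_batch(batch)
```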

Stream processing: After capturing real-time messages, the solution must process them by filtering, aggregating, and preparing the data for analysis. The processed stream data is then written to an output sink. We can use open-source Apache streaming technologies like Storm and Spark Streaming for this.
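
Below is a minimal Spark Structured Streaming sketch under these assumptions: messages arrive on a Kafka-compatible endpoint (Azure Event Hubs exposes one), the payload is the JSON shape used above, and the console stands in for a real output sink.

```python
# Requires the spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("telemetry-stream").getOrCreate()

# Assumed shape of the incoming JSON messages.
schema = StructType().add("device", StringType()).add("temp", DoubleType())

# Read the message stream; broker address and topic are placeholders.
raw = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "telemetry")
         .load()
)

# Parse the JSON payload and keep a running average temperature per device.
readings = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("r")).select("r.*")
averages = readings.groupBy("device").agg(F.avg("temp").alias("avg_temp"))

# Write the processed stream to an output sink (the console, for demonstration).
query = averages.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```

In production the console sink would be replaced by a durable sink, such as files in the data lake or a downstream table.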

Analytical data store: Many big data solutions prepare data for analysis and then serve the processed data in a structured format that can be queried using analytical tools. Example: Azure Synapse Analytics provides a managed service for large-scale, cloud-based data warehousing.
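
Once the data lands in such a store, it can be queried with ordinary SQL tooling. The pyodbc sketch below assumes a hypothetical daily_counts table in a Synapse dedicated SQL pool; the server, database, and credentials are placeholders.

```python
# pip install pyodbc
import pyodbc

# Server, database, credentials, and table are placeholders for a Synapse SQL pool.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"
    "DATABASE=analytics;UID=analyst;PWD=<password>"
)

cursor = conn.cursor()
cursor.execute(
    "SELECT day, SUM(requests) AS requests FROM daily_counts GROUP BY day ORDER BY day"
)
for row in cursor.fetchall():
    print(row.day, row.requests)
```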

Analysis and reporting: The goal of most big data solutions is to provide insights into the data through analysis and reporting. To empower users to analyze the data, the architecture may include a data modelling layer. Analysis and reporting can also take the form of interactive data exploration by data scientists or data analysts.
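
As a small sketch of that kind of interactive exploration, the snippet below assumes a curated extract has been pulled down as a local Parquet file with hypothetical day, status, and requests columns; reading Parquet with pandas additionally requires pyarrow or fastparquet.

```python
# pip install pandas pyarrow
import pandas as pd

# Hypothetical local extract of the curated daily_counts data set.
df = pd.read_parquet("daily_counts.parquet")

# Quick interactive checks an analyst might run before building a report.
print(df.describe())
print(df.groupby("status")["requests"].sum().sort_values(ascending=False).head())
```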

Orchestration: Most big data solutions consist of repeated data processing operations, encapsulated in workflows, that transform source data, move data between multiple sources and sinks, load the processed data into an analytical data store, or push the results straight to a report. To automate these workflows, we can use an orchestration technology such as Azure Data Factory.
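
Azure Data Factory pipelines are usually authored in the portal or as JSON definitions rather than in code, so as an open-source stand-in the sketch below expresses an equivalent workflow as an Apache Airflow DAG; the task names and stub callables are hypothetical.

```python
# pip install apache-airflow  (sketch targets Airflow 2.4+)
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Stub callables standing in for the real ingestion, processing, and load steps.
def ingest():
    print("copy raw files into the data lake")

def batch_process():
    print("run the Spark batch job")

def load_warehouse():
    print("load curated output into the analytical store")

with DAG(
    dag_id="daily_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="ingest", python_callable=ingest)
    t2 = PythonOperator(task_id="batch_process", python_callable=batch_process)
    t3 = PythonOperator(task_id="load_warehouse", python_callable=load_warehouse)

    # Run the steps strictly in order.
    t1 >> t2 >> t3
```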
