
Data Ingestion

Data Ingestion :

  • Hadoop data ingestion is the beginning of your data pipeline in a data lake. 
  • It means taking data from various siloed databases and files and loading it into Hadoop. 
  • For many companies, this turns out to be an intricate task, which is why it can take more than a year to ingest all of their data into the Hadoop data lake.
  • Because Hadoop is open source, there are many different ways to ingest data into it, and every developer is free to use her/his favourite tool or language. 
  • Developers tend to choose a tool or technology based on performance alone, but this makes governance very complicated.
Sqoop :

  • Apache Sqoop (SQL-to-Hadoop) is a lifesaver for anyone who has struggled to move data from a data warehouse into the Hadoop environment.
  • Apache Sqoop is an effective Hadoop tool for importing data from RDBMSs such as MySQL and Oracle into HBase, Hive or HDFS. 
  • Sqoop can also be used to export data from HDFS back into an RDBMS.
  • Apache Sqoop is a command-line tool, i.e. Sqoop commands are executed one at a time by the interpreter; a sketch of a typical import and export follows below.
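
The snippet below is a minimal sketch of how Sqoop is usually driven: it shells out to the `sqoop import` and `sqoop export` commands from Python. The JDBC URL, credentials file, table names, and HDFS directories are all hypothetical placeholders, not values from this article, and would need to be replaced with your own.

```python
import subprocess

# Hypothetical connection details -- substitute your own MySQL host, database,
# credentials, tables, and HDFS directories.
import_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/sales",
    "--username", "etl_user",
    "--password-file", "hdfs:///user/etl_user/.db_password",  # keeps the password off the command line
    "--table", "orders",
    "--target-dir", "/data/raw/orders",
    "--num-mappers", "4",          # number of parallel map tasks doing the import
]

# Export goes the other way: an HDFS directory back into an RDBMS table.
export_cmd = [
    "sqoop", "export",
    "--connect", "jdbc:mysql://db.example.com/sales",
    "--username", "etl_user",
    "--password-file", "hdfs:///user/etl_user/.db_password",
    "--table", "orders_summary",
    "--export-dir", "/data/curated/orders_summary",
]

if __name__ == "__main__":
    # Each call runs one Sqoop command, mirroring the one-at-a-time
    # command-line usage described above.
    subprocess.run(import_cmd, check=True)
    subprocess.run(export_cmd, check=True)
```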
Flume :

  • Apache Flume is a service designed for streaming logs into the Hadoop environment. 
  • Flume is a distributed and reliable service for collecting and aggregating huge amounts of log data. 
  • Its simple, easy-to-use architecture is based on streaming data flows, with tunable reliability and several recovery and failover mechanisms; a sample agent configuration is sketched below.
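
A Flume agent is wired together through a properties file that declares its sources, channels and sinks. The sketch below writes one such minimal, hypothetical agent definition (an exec source tailing a local log, a durable file channel, and an HDFS sink) and starts it with the standard flume-ng launcher; the agent name, log path, HDFS path and channel directories are placeholder assumptions, not values from this article.

```python
import subprocess
import textwrap

# Minimal Flume agent: exec source -> file channel -> HDFS sink.
# All hostnames and paths below are placeholders.
FLUME_CONF = textwrap.dedent("""\
    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = sink1

    # Source: tail a local application log as a stream of events
    agent1.sources.src1.type     = exec
    agent1.sources.src1.command  = tail -F /var/log/app/app.log
    agent1.sources.src1.channels = ch1

    # Channel: file-backed buffer, the "tunable reliability" noted above
    agent1.channels.ch1.type          = file
    agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
    agent1.channels.ch1.dataDirs      = /var/flume/data

    # Sink: land the events in HDFS, partitioned by date
    agent1.sinks.sink1.type                   = hdfs
    agent1.sinks.sink1.hdfs.path              = hdfs://namenode:8020/data/logs/%Y-%m-%d
    agent1.sinks.sink1.hdfs.fileType          = DataStream
    agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
    agent1.sinks.sink1.channel                = ch1
    """)

if __name__ == "__main__":
    with open("agent1.conf", "w") as f:
        f.write(FLUME_CONF)

    # Start the agent with the standard flume-ng launcher.
    subprocess.run([
        "flume-ng", "agent",
        "--conf", "conf",
        "--conf-file", "agent1.conf",
        "--name", "agent1",
    ], check=True)
```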
