Data Ingestion :
- Data ingestion is the first stage of a data pipeline in a Hadoop data lake.
- It means taking data from various siloed databases and files and loading it into Hadoop.
- For many companies this turns out to be an intricate task, which is why it can take them more than a year to ingest all of their data into the Hadoop data lake.
- One reason is that, because Hadoop is open source, there are many different ways to ingest data into it.
- Every developer is free to use her/his favourite tool or language for ingestion.
- When choosing a tool or technology, developers tend to focus on performance, but this variety makes governance very complicated.
- Sqoop :
- Apache Sqoop (SQL-to-Hadoop) is a lifesaver for anyone who struggles to move data from a data warehouse into the Hadoop environment.
- Sqoop is an effective Hadoop tool for importing data from relational databases (RDBMSs) such as MySQL and Oracle into HBase, Hive, or HDFS.
- Sqoop can also be used to export data from HDFS back into an RDBMS.
- Sqoop is a command-line tool: Sqoop commands are interpreted and executed one at a time, as in the example commands sketched below.
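As a minimal sketch of how Sqoop is typically invoked (the connection string, database, table names, and user below are hypothetical placeholders, not values from this article), an import from MySQL into HDFS and an export back to the database look roughly like this:

```
# Import the "customers" table from a MySQL database into HDFS
# (hostname, database, table, and user are illustrative placeholders)
sqoop import \
  --connect jdbc:mysql://dbserver:3306/sales \
  --username etl_user -P \
  --table customers \
  --target-dir /data/raw/customers \
  --num-mappers 4

# Export processed results from HDFS back into an RDBMS table
sqoop export \
  --connect jdbc:mysql://dbserver:3306/sales \
  --username etl_user -P \
  --table daily_summary \
  --export-dir /data/processed/daily_summary
```

Adding `--hive-import` to the import command loads the data into a Hive table instead of a plain HDFS directory.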
- Flume :
- Apache Flume is a service designed for streaming logs into the Hadoop environment.
- Flume is a distributed and reliable service for collecting and aggregating huge amounts of log data.
- It has a simple, easy-to-use architecture based on streaming data flows, with tunable reliability, recovery, and failover mechanisms; a sample agent configuration is sketched after this list.
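As a rough sketch of such a streaming flow (the agent name, log path, and HDFS directory are assumptions for illustration), a Flume agent is described in a properties file that wires a source, a channel, and a sink together:

```
# flume.conf -- agent name "a1"; paths and sizes are illustrative placeholders
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: tail an application log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory (use a file channel for stronger durability)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events into date-partitioned HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /data/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

The agent is then started with `flume-ng agent --conf conf --conf-file flume.conf --name a1`.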