Data Ingestion :
- Data ingestion is the first stage of a data pipeline in a Hadoop data lake.
- It means taking data from various siloed databases and files and loading it into Hadoop.
- For many companies this turns out to be an intricate task, which is why it can take them more than a year to ingest all of their data into the Hadoop data lake.
- One reason is that, because Hadoop is open source, there are many different ways to ingest data into it.
- Every developer is free to use her/his favourite tool or language for ingestion.
- When choosing a tool or technology, developers tend to focus on performance, but this variety makes governance very complicated.
- Sqoop :
- Apache Sqoop (SQL-to-Hadoop) is a lifesaver for anyone who struggles to move data from a data warehouse into the Hadoop environment.
- Sqoop is an effective Hadoop tool for importing data from relational databases (RDBMSs) such as MySQL and Oracle into HBase, Hive, or HDFS.
- Sqoop can also be used to export data from HDFS back into an RDBMS.
- Sqoop is a command-line tool: Sqoop commands are interpreted and executed one at a time, as in the example commands sketched below.
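As a minimal sketch of how Sqoop is typically invoked (the connection string, database, table names, and user below are hypothetical placeholders, not values from this article), an import from MySQL into HDFS and an export back to the database look roughly like this:

```
# Import the "customers" table from a MySQL database into HDFS
# (hostname, database, table, and user are illustrative placeholders)
sqoop import \
  --connect jdbc:mysql://dbserver:3306/sales \
  --username etl_user -P \
  --table customers \
  --target-dir /data/raw/customers \
  --num-mappers 4

# Export processed results from HDFS back into an RDBMS table
sqoop export \
  --connect jdbc:mysql://dbserver:3306/sales \
  --username etl_user -P \
  --table daily_summary \
  --export-dir /data/processed/daily_summary
```

Adding `--hive-import` to the import command loads the data into a Hive table instead of a plain HDFS directory.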
- Flume :
- Apache Flume is a service designed for streaming logs into the Hadoop environment.
- Flume is a distributed and reliable service for collecting and aggregating huge amounts of log data.
- It has a simple, easy-to-use architecture based on streaming data flows, with tunable reliability, recovery, and failover mechanisms; a sample agent configuration is sketched after this list.
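As a rough sketch of such a streaming flow (the agent name, log path, and HDFS directory are assumptions for illustration), a Flume agent is described in a properties file that wires a source, a channel, and a sink together:

```
# flume.conf -- agent name "a1"; paths and sizes are illustrative placeholders
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: tail an application log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory (use a file channel for stronger durability)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events into date-partitioned HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /data/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

The agent is then started with `flume-ng agent --conf conf --conf-file flume.conf --name a1`.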