Previously we talked about sqoop import, now in this article, we will continue on the journey with sqoop export.

Sqoop Export Common Scenarios

1. export a Hive table to MySQL

1
2
3
4
5
6
7
sqoop export \
  --connect "jdbc:mysql://quickstart:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --table products_export \
  --export-dir "/user/hive/warehouse/cca175/ps19/products_export" \
  --batch

NOTE:

  1. The --table and --export-dir variables are required for sqoop export. These specify the table to populate in the database, and the directory in HDFS that contains the source data.

2. export a Hive table and only update data in MySQL table

1
2
3
4
5
6
7
8
9
sqoop export \
  --connect "jdbc:mysql://quickstart:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --table departments \
  --export-dir "/user/hive/warehouse/cca175/ps20/departments_new" \
  --update-mode updateonly \
  --update-key department_id \
  -m 5

NOTE:

  1. we can use --update-mode updateonly option to specify that we just want to update existing records and we don’t want insert any new data to the MySQL table. --update-key column_name variable is to specify the particular column_name that we use for updating existing data

3. export a Hive table and insert and update data in MySQL table

1
2
3
4
5
6
7
8
9
sqoop export \
  --connect "jdbc:mysql://quickstart:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --table departments \
  --export-dir "/user/hive/warehouse/cca175/ps20/departments_new" \
  --update-mode allowinsert \
  --update-key department_id \
  -m 5

NOTE

  1. Different with previous scenario, now we need to use --update-mode allowinsert to tell sqoop that we are not only want to update existing data, but inserting new data to the MySQL table

4. export a Hive table and replace null characters

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
sqoop export \
  --connect "jdbc:mysql://quickstart:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --table products \
  --export-dir "/user/hive/warehouse/cca175/" \
  --input-fields-terminated-by "\001" \
  --input-lines-terminated-by "\n" \
  --input-null-string "" \
  --input-null-non-string -999 \
  -m 1

NOTE:

  1. we can use --input-fields-terminated-by option to setup the input data delimiter in Hive. The default fields delimiter in Hive is \001. Similarly, --input-lines-terminated-by option is to setup the lines delimiter and the Hive default value is “\n”.
  2. --input-null-string option is to replace all the null string-typed value to the character we want. Whilst, --input-null-non-string option is to replace all the null non-string-typed value.