Introduction

Sqoop is a tool designed to transfer data between Hadoop and relational database servers.

Sqoop Workflow

Sqoop ships with a help tool. To display a list of all available tools, type the following command:

$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

Sqoop Eval

The sqoop eval tool allows users to quickly run simple SQL queries against a database and prints the results to the console. This lets users preview their import queries and confirm they will import the data they expect.
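
A minimal sketch of an eval invocation, assuming a MySQL database named testdb with an employees table (the connect string, credentials, and names here are placeholders):

# testdb and employees are placeholder names; -P prompts for the password
$ sqoop eval \
    --connect jdbc:mysql://localhost/testdb \
    --username root -P \
    --query "SELECT * FROM employees LIMIT 5"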

Sqoop List Databases

List database schemas present on a server.
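
For example, against a hypothetical MySQL server on localhost:

# host and credentials are placeholders; -P prompts for the password
$ sqoop list-databases \
    --connect jdbc:mysql://localhost/ \
    --username root -P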

Sqoop List Tables

List table schemas present on a database.
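
A sketch using the same placeholder database as above:

# testdb is a placeholder database name
$ sqoop list-tables \
    --connect jdbc:mysql://localhost/testdb \
    --username root -P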

Sqoop Import

The sqoop import tool imports individual tables from an RDBMS into HDFS. Each row in a table is treated as a record in HDFS. Records can be stored as text data in text files or as binary data in Avro or SequenceFiles; the default format is text.
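
A minimal import sketch, assuming the same placeholder testdb/employees setup; --target-dir names the HDFS output directory and -m 1 runs a single map task:

# testdb, employees, and the HDFS path are placeholders
$ sqoop import \
    --connect jdbc:mysql://localhost/testdb \
    --username root -P \
    --table employees \
    --target-dir /user/hadoop/employees \
    -m 1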

Sqoop Export

The sqoop export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which become rows in the target table. Those files are read and parsed into a set of records using a user-specified delimiter.
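
A minimal export sketch under the same assumptions; note that the target table must already exist in the database, and --input-fields-terminated-by must match the delimiter used in the HDFS files:

# testdb, employees, and the HDFS path are placeholders
$ sqoop export \
    --connect jdbc:mysql://localhost/testdb \
    --username root -P \
    --table employees \
    --export-dir /user/hadoop/employees \
    --input-fields-terminated-by ','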

Sqoop Job

The sqoop job tool creates and saves import and export commands as named jobs, recording the parameters needed to identify and re-run them later.
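
A sketch of saving, listing, and re-running a job (import-employees is a placeholder job name):

# the job wraps an ordinary import command after the standalone --
$ sqoop job --create import-employees -- import \
    --connect jdbc:mysql://localhost/testdb \
    --username root \
    --table employees

$ sqoop job --list
$ sqoop job --exec import-employees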

Sqoop Installation

For detailed environment setup steps, see [Big Data Experiment Environment Setup](Big Data Experiment Environment Setup).
