HBase Data Import with ImportTsv - 创新互联

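The ImportTsv tool runs as a MapReduce job, so YARN must be started before you use it. The tool depends on the HBase jars, so make sure the classpath is configured. By default, ImportTsv inserts data through the HBase API.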

[hadoop-user@rhel work]$ cat /home/hadoop-user/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
JAVA_HOME=/usr/java/jdk1.8.0_171-amd64
PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
HADOOP_HOME=/home/hadoop-user/hadoop-2.8.0
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
HBASE_HOME=/home/hadoop-user/hbase-2.0.0
PATH=$PATH:$HBASE_HOME/bin
CLASSPATH=$CLASSPATH:$HBASE_HOME/lib
ZOOKEEPER_HOME=/home/hadoop-user/zookeeper-3.4.12
PATH=$PATH:$ZOOKEEPER_HOME/bin
PHOENIX_HOME=/home/hadoop-user/apache-phoenix-5.0.0-alpha-HBase-2.0-bin
PATH=$PATH:$PHOENIX_HOME/bin
export PATH
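Because ImportTsv needs YARN, HDFS and the HBase jars, it can help to confirm that the command-line tools resolve from this profile before submitting a job. The following is a minimal sketch; the helper name missing_tools is hypothetical and not part of Hadoop or HBase.

```shell
# Sketch: report which of the required commands are not on PATH.
# missing_tools is a hypothetical helper, not an HBase or Hadoop command.
missing_tools() {
    # Print each named command that cannot be found on PATH.
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing="$(missing_tools hbase hdfs yarn)"
if [ -n "$missing" ]; then
    echo "not on PATH:" $missing
else
    echo "hbase, hdfs and yarn were all found"
fi
```

This only checks that the commands resolve; it does not verify that YARN or HBase is actually running.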
Create the table

hbase(main):033:0> create 'test','cf'
Create the file to import
[hadoop-user@rhel work]$ cat /home/hadoop-user/work/sample1.csv
row10,"mjj10"
row11,"mjj11"
row12,"mjj12"
row14,"mjj13"
Put the file into HDFS
[hadoop-user@rhel work]$ hdfs dfs -put /home/hadoop-user/work/sample1.csv /sample1.csv
Run the ImportTsv import
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf:a test /sample1.csv
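Note: HBASE_ROW_KEY marks the position of the row key within each line of the file; the entries after it define the columns. Here the second field is imported into column family cf with qualifier a. The file to import is /sample1.csv in HDFS.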
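To see how the column spec HBASE_ROW_KEY,cf:a lines up with the sample data, the split can be previewed locally with awk. This is only a sketch that mimics splitting on the separator and does not call HBase; the scratch file and the function name preview_mapping are hypothetical. As far as I can tell ImportTsv does not interpret CSV quoting, so the double quotes would probably be stored as part of the value; treat that as an assumption and confirm it with a scan of the table.

```shell
# Sketch: preview how each line maps onto HBASE_ROW_KEY,cf:a.
# Only mimics the split on ","; nothing is sent to HBase.
sample="$(mktemp)"   # hypothetical local scratch copy of sample1.csv
cat > "$sample" <<'EOF'
row10,"mjj10"
row11,"mjj11"
row12,"mjj12"
row14,"mjj13"
EOF

preview_mapping() {
    # $1: input file; fields are split on "," like -Dimporttsv.separator=","
    awk -F',' '{ printf "rowkey=%s cf:a=%s\n", $1, $2 }' "$1"
}

preview_mapping "$sample"
```

The first field of every line becomes the row key and the second becomes the value of cf:a, which is the mapping the column spec describes.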
Explanation from the help output
Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>
Imports the given input directory of TSV data into the specified table.
The column names of the TSV data must be specified using the -Dimporttsv.columns
option. This option takes the form of comma-separated column names, where each
column name is either a simple column family, or a columnfamily:qualifier. The special
column name HBASE_ROW_KEY is used to designate that this column should be used
as the row key for each imported record. You must specify exactly one column
to be the row key, and you must specify a column name for every column that exists in the
input data. Another special column HBASE_TS_KEY designates that this column should be
used as timestamp for each record. Unlike HBASE_ROW_KEY, HBASE_TS_KEY is optional.
You must specify at most one column as timestamp key for each imported record.
Record with invalid timestamps (blank, non-numeric) will be treated as bad record.
Note: if you use this option, then 'importtsv.timestamp' option will be ignored.
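The help text says that records with blank or non-numeric timestamps are treated as bad records, so it can be worth checking the timestamp column before importing. The sketch below assumes a three-field layout (row key, epoch-millisecond timestamp, value); the file contents, the HDFS path /sample_ts.csv and the helper count_bad_timestamps are all hypothetical.

```shell
# Sketch: check a timestamp column before importing with HBASE_TS_KEY.
ts_file="$(mktemp)"   # hypothetical input with a timestamp in field 2
cat > "$ts_file" <<'EOF'
row20,1530153600000,mjj20
row21,1530153601000,mjj21
row22,,mjj22
EOF

count_bad_timestamps() {
    # Count lines whose second field is blank or non-numeric.
    awk -F',' '$2 !~ /^[0-9]+$/ { bad++ } END { print bad + 0 }' "$1"
}

echo "lines with invalid timestamps: $(count_bad_timestamps "$ts_file")"

# The matching column spec puts HBASE_TS_KEY in the second position.
# /sample_ts.csv is a hypothetical HDFS path; the command is only printed.
ts_columns="HBASE_ROW_KEY,HBASE_TS_KEY,cf:a"
echo "hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=\",\" -Dimporttsv.columns=${ts_columns} test /sample_ts.csv"
```

The position of HBASE_TS_KEY in the column spec has to match the position of the timestamp field in the data, just as with HBASE_ROW_KEY.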
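Note: data imported with ImportTsv is not visible in Phoenix. In fact, tables created through HBase are not visible in Phoenix at all. Tables created through Phoenix are visible in HBase, but their contents appear in encoded form.
By default, the importtsv tool loads data through the HBase Put API. When the -Dimporttsv.bulk.output option is given, it instead first generates files in HBase's internal HFile format.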
The importtsv tool, by default, uses the HBase Put API to insert data into the HBase
table using TableOutputFormat in its map phase. But when the -Dimporttsv.bulk.output option is specified, it instead generates HBase internal format (HFile) files on HDFS
by using HFileOutputFormat. Therefore, we can then use the completebulkload tool to load the generated files into a running cluster. The following steps are to use the bulk output and load tools:
Command to generate files in HFile format
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=/hfiles_tsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:a test /sample1.csv
Note: the HFile-format files are written to the /hfiles_tsv directory in HDFS; the command creates the directory itself.
[hadoop-user@rhel work]$ hdfs dfs -ls /hfiles_tsv/cf
18/06/28 10:49:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--  1 hadoop-user supergroup    5125 2018-06-28 10:40 /hfiles_tsv/cf/0e466616d42a4a128fb60caa7dbe075a
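Note: the file name 0e466616d42a4a128fb60caa7dbe075a looks much like the region names shown in the web UI.
Loading the generated files with

hadoop jar hbase-server-2.0.0.jar completebulkload /hfiles_tsv 'test'
fails with an exception: Exception in thread "main" java.lang.ClassNotFoundException: completebulkload
The exception suggests that hadoop jar interpreted completebulkload as a class name, which is what it does when the jar's manifest declares no main class.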
The HBase documentation describes two ways to invoke the load:
There are two ways to invoke this utility, with explicit classname and via the driver:
Explicit Classname
$ bin/hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles
Driver
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar completebulkload
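Putting the pieces together, the bulk path can be sketched as two steps: generate the HFiles, then load them with the explicit class name. The script below only prints the commands unless RUN=1 is set; the run helper and the RUN variable are conventions assumed for this sketch, and the argument order for LoadIncrementalHFiles (HFile directory, then table name) mirrors the completebulkload call shown earlier. Note that a MapReduce job normally refuses to write into an output directory that already exists, so /hfiles_tsv may need to be removed or renamed before rerunning step 1.

```shell
# Sketch of the two-step bulk load; commands are printed, and executed
# only when RUN=1 is set (a convention assumed for this example).
TABLE="test"
HFILE_DIR="/hfiles_tsv"
INPUT="/sample1.csv"

run() {
    # Echo the command line; execute it only when RUN=1.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Step 1: write HFiles to HDFS instead of issuing Puts.
run hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.separator=, \
    -Dimporttsv.bulk.output="$HFILE_DIR" \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf:a "$TABLE" "$INPUT"

# Step 2: hand the generated HFiles to the running cluster.
run hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles "$HFILE_DIR" "$TABLE"
```

Running it once without RUN=1 gives a chance to review the exact command lines before anything touches the cluster.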


