
Big Data Technology: A Summary of HBase Operations

Summary of basic HBase commands (hands-on reference)

Enter HBase: hbase shell

Method 1: operating HBase from the command-line shell

1. General commands

version   — version information
status    — current cluster status
whoami    — identity of the logged-in user
help      — help

2. HBase DDL operations (object level)

2.1 namespace (a namespace is roughly equivalent to a database)

# 1. List all existing namespaces
list_namespace
---------------------------
NAMESPACE
default
hbase
hbase_test
test_hbase
4 row(s)
Took 0.0631 seconds
---------------------------

# 2. Create a namespace
create_namespace "test_hbase"

# 3. List the tables in a given namespace
list_namespace_tables "test_hbase"
---------------------------
TABLE
0 row(s)
Took 0.0301 seconds
=> []
---------------------------

# 4. Describe a namespace
describe_namespace "test_hbase"
---------------------------
DESCRIPTION
{NAME => 'test_hbase'}
Quota is disabled
---------------------------

# 5. Drop a namespace
drop_namespace "test_hbase"

2.2 Tables

# 1. List all tables
list
---------------------------
TABLE
hbase_test:student_info
1 row(s)
Took 0.0202 seconds
=> ["hbase_test:student_info"]
---------------------------

# 2. Check whether a table exists
exists "test_hbase:test_table"
---------------------------
Table test_hbase:test_table does exist
Took 0.0114 seconds
=> true
---------------------------

# 3. Create a table
1. Full form:
create "test_hbase:test_table",{NAME => 'base', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'TRUE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},{NAME => 'sources', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false', VERSIONS => '3', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '655360', REPLICATION_SCOPE => '0'}

Notes:
BLOOMFILTER takes one of three values: ROW, ROWCOL, NONE
  ROW: apply the Bloom filter to the row key only (the default)
  ROWCOL: apply the Bloom filter to both the row key and the column key
  NONE: no Bloom filter
TTL is given in seconds

2. Short form: ✔
create "test_hbase:test_table","base","sources"

# 4. Describe a table
desc "test_hbase:test_table"
---------------------------
Table test_hbase:test_table is ENABLED
test_hbase:test_table
COLUMN FAMILIES DESCRIPTION
{NAME => 'base', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'TRUE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'sources', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false', VERSIONS => '3', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '655360', REPLICATION_SCOPE => '0'}
---------------------------

# 5. Check and change table state
is_enabled "test_hbase:test_table"    # is the table enabled?
is_disabled "test_hbase:test_table"   # is the table disabled?
enable "test_hbase:test_table"        # enable the table
disable "test_hbase:test_table"       # disable the table

# 6. Drop a table (only a disabled table can be dropped)
disable "test_hbase:test_table"
drop "test_hbase:test_table"

3. HBase DML operations (data level)

# 1. Insert data, column by column (a single put writes a single cell)
Syntax: put "table_name","row_key","family:column","value"
Example (single-cell puts):
put "test_hbase:test_table","1","base:name","胡桃"
put "test_hbase:test_table","1","base:age",17
put "test_hbase:test_table","1","base:gender","女"
put "test_hbase:test_table","1","sources:English",82
put "test_hbase:test_table","1","sources:Math",90

# 2. Read the whole table (full scan)
scan "test_hbase:test_table"
---------------------------
ROW    COLUMN+CELL
1      column=base:age, timestamp=2024-03-07T15:07:10.339, value=17
1      column=base:gender, timestamp=2024-03-07T15:07:14.510, value=\xE5\xA5\xB3
1      column=base:name, timestamp=2024-03-07T15:07:06.009, value=\xE8\x83\xA1\xE6\xA1\x83
1      column=sources:English, timestamp=2024-03-07T15:07:17.987, value=86
1      column=sources:Math, timestamp=2024-03-07T15:07:21.874, value=97
---------------------------

# 3. Count the rows in a table
count "test_hbase:test_table"
---------------------------
1 row(s)
Took 0.0194 seconds
=> 1
---------------------------

# 4. Read specific values
4.1 One row
get "test_hbase:test_table","1"
---------------------------
COLUMN            CELL
base:age          timestamp=2024-03-07T15:36:03.061, value=17
base:gender       timestamp=2024-03-07T15:36:03.115, value=\xE5\xA5\xB3
base:name         timestamp=2024-03-07T15:36:03.001, value=\xE8\x83\xA1\xE6\xA1\x83
sources:English   timestamp=2024-03-07T15:36:03.156, value=82
sources:Math      timestamp=2024-03-07T15:36:03.192, value=90
---------------------------
4.2 One row, one column family
get "test_hbase:test_table","1","sources"
---------------------------
COLUMN            CELL
sources:English   timestamp=2024-03-07T15:36:03.156, value=82
sources:Math      timestamp=2024-03-07T15:36:03.192, value=90
---------------------------
4.3 One row, one column
get "test_hbase:test_table","1","sources:English"
---------------------------
COLUMN            CELL
sources:English   timestamp=2024-03-07T15:36:03.156, value=82
---------------------------

# 5. Delete data
5.1 Delete a single cell (delete or deleteall both work here)
delete "test_hbase:test_table","1","base:name"
5.2 Delete a whole row
deleteall "test_hbase:test_table","2"
5.3 ROWPREFIXFILTER: bulk delete by row-key prefix (a timestamp TS or a string STR); CACHE: batch size
deleteall "test_hbase:test_table",{ROWPREFIXFILTER=>"<prefix: timestamp TS or string STR>",CACHE=>100}
5.4 Delete all data in a table
disable "test_hbase:test_table"
truncate "test_hbase:test_table"

# 6. Counters
-- The first incr must target a column that does not exist yet; incrementing an ordinary existing column fails with: Field is not a long, it's 10 bytes wide
-- Subsequent increments keep targeting that newly added column
6.1 Syntax
Increment: incr "[namespace:]table","row_key","family:new_column",N
Read:      get_counter "[namespace:]table","row_key","family:new_column"
6.2 Example
scan "test_hbase:test_table"
---------------------------
ROW    COLUMN+CELL
1      column=base:age, timestamp=2024-03-07T15:36:03.061, value=17
1      column=base:gender, timestamp=2024-03-07T15:36:03.115, value=\xE5\xA5\xB3
1      column=base:name, timestamp=2024-03-07T15:36:03.001, value=\xE8\x83\xA1\xE6\xA1\x83
1      column=sources:English, timestamp=2024-03-07T15:36:03.156, value=82
1      column=sources:Math, timestamp=2024-03-07T15:36:03.192, value=90
---------------------------
incr "test_hbase:test_table","1","sources:count",2
scan "test_hbase:test_table"
---------------------------
ROW    COLUMN+CELL
1      column=base:age, timestamp=2024-03-07T15:36:03.061, value=17
1      column=base:gender, timestamp=2024-03-07T15:36:03.115, value=\xE5\xA5\xB3
1      column=base:name, timestamp=2024-03-07T15:36:03.001, value=\xE8\x83\xA1\xE6\xA1\x83
1      column=sources:English, timestamp=2024-03-07T15:36:03.156, value=82
1      column=sources:Math, timestamp=2024-03-07T15:36:03.192, value=90
1      column=sources:count, timestamp=2024-03-11T20:01:16.651, value=\x00\x00\x00\x00\x00\x00\x00\x02
---------------------------

# 7. Pre-splitting (an HBase optimization)
7.1 Pre-split a table at creation time
Strategy 1 (NUMREGIONS: number of regions; SPLITALGO: split algorithm):
create "test_hbase:test_split","t1","t2",{NUMREGIONS=>3,SPLITALGO=>"UniformSplit"}
Strategy 2 (SPLITS: explicit row-key boundaries, letters or digits):
### key ranges: 0~100, 101~200, 201~300, 301 and above
create "test_hbase:test_rowkey_split","cf1","cf2",SPLITS=>["100","200","300"]
7.2 Inspect the regions
scan "hbase:meta",{STARTROW=>"test_hbase:test_rowkey_split",LIMIT=>10}
---------------------------
# HDFS storage layout for the table:
# drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B .tabledesc
# drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B .tmp
# drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 28c38ce5ff401333122c00c05e521ae3
# drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 4493f765702cc8979678f14cbcff17ff
# drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 540c8c1f386356cab11f824e74d33fad
# drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 867157c4f6ab39ba52ac6b3b58e6cbf4
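
The same pre-splitting can also be done from code. Below is a minimal sketch using the Java client API (the hbase-client 2.x dependency configured later in this article); the ZooKeeper quorum address and the table name are assumptions to adapt to your cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTableDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost:2181"); // assumption: your ZooKeeper quorum

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // hypothetical table name; the namespace test_hbase must already exist
            TableName name = TableName.valueOf("test_hbase:test_rowkey_split_java");

            // Same boundaries as the shell SPLITS example: 4 regions split at 100/200/300
            byte[][] splitKeys = {
                Bytes.toBytes("100"),
                Bytes.toBytes("200"),
                Bytes.toBytes("300")
            };

            admin.createTable(
                TableDescriptorBuilder.newBuilder(name)
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf1"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf2"))
                    .build(),
                splitKeys);
        }
    }
}

Admin.createTable(TableDescriptor, byte[][]) takes the split keys directly, so the region boundaries match the SPLITS=>["100","200","300"] shell call above.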
---------------------------

4. TOOLS

## Merge a couple of small store files into one larger file
1. compact "[namespace:]table"
## Merge all small store files into one large file
2. major_compact "[namespace:]table"

Method 2: operating HBase through Hive (HBase data mapped into Hive)

1. Import data into HBase

## General form
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.separator="<separator>" \
-Dimporttsv.columns="HBASE_ROW_KEY,family:column..." \
"namespace:table" \
<file path>

## Example (run from the OS shell, not inside the hbase shell)
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.separator="|" \
-Dimporttsv.columns=HBASE_ROW_KEY,base:name,base:age,sources:English,sources:Math \
test_hbase:test_table \
file:///root/file/hbase_file/students_for_import_2.csv

2. Map an HBase table into a Hive table (run inside Hive)

# Create a Hive external table backed by the HBase table (HBase data mapped into Hive)
create external table yb12211.student_from_hbase(
    stu_id int,
    stu_name string,
    stu_age int,
    score_English int,
    score_Math int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping"=":key,base:name,base:age,sources:English,sources:Math")
tblproperties("hbase.table.name"="test_hbase:test_table");

Method 3: operating HBase from Java — data migration

1. Use cases

Java operates HBase through HBase's native API.

Its core use here is data migration.

1. Use the native HBase API together with the JDBC API to import data from a traditional relational database (MySQL) into HBase.
2. Use file streams to import data from plain files into HBase.
(Hedged sketches of both migration paths are given after the run-configuration steps below.)

2. Initial setup

2.1 Create the Maven project
Create the project from the Maven quickstart archetype.

2.2 Basic configuration
1) Delete the url element from the generated pom.
2) properties:
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>
3) Sanity check: make sure every JDK setting agrees on version 1.8 (Java 8).
4) Dependencies (replace the generated ones):
<!-- MySQL driver -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.29</version>
</dependency>
<!-- HBase client -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>2.3.5</version>
</dependency>
<!-- Hadoop -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.1.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>3.1.3</version>
</dependency>
<!-- ZooKeeper -->
<dependency>
    <groupId>org.apache.zookeeper</groupId>
    <artifactId>zookeeper</artifactId>
    <version>3.6.3</version>
</dependency>
<!-- log4j system logging -->
<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
</dependency>
<!-- JSON tool -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>2.0.47</version>
</dependency>

3. Final step: passing arguments at run time (verification)

Set up the run configuration to pass program arguments:

Step 1: click the green hammer (Build) icon, then open the Edit Configurations option.

Step 2: fill in the configuration details (program arguments and the like).
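
With the project configured, here is a minimal sketch of migration path 1 (MySQL into HBase). It assumes the shell examples' table test_hbase:test_table already exists; the JDBC URL, credentials, and the source table student with its columns are hypothetical placeholders passed in as program arguments, matching the run-configuration steps above:

import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MysqlToHBase {
    public static void main(String[] args) throws Exception {
        // program arguments, e.g.: jdbc:mysql://localhost:3306/school root 123456
        String jdbcUrl = args[0], user = args[1], password = args[2];

        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost:2181"); // assumption: your ZooKeeper quorum

        try (Connection hbase = ConnectionFactory.createConnection(conf);
             Table table = hbase.getTable(TableName.valueOf("test_hbase:test_table"));
             java.sql.Connection mysql = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = mysql.createStatement();
             // hypothetical source table and column names
             ResultSet rs = stmt.executeQuery(
                     "SELECT stu_id, stu_name, stu_age, english, math FROM student")) {

            List<Put> batch = new ArrayList<>();
            while (rs.next()) {
                Put put = new Put(Bytes.toBytes(rs.getString("stu_id"))); // stu_id becomes the row key
                put.addColumn(Bytes.toBytes("base"), Bytes.toBytes("name"), Bytes.toBytes(rs.getString("stu_name")));
                put.addColumn(Bytes.toBytes("base"), Bytes.toBytes("age"), Bytes.toBytes(rs.getString("stu_age")));
                put.addColumn(Bytes.toBytes("sources"), Bytes.toBytes("English"), Bytes.toBytes(rs.getString("english")));
                put.addColumn(Bytes.toBytes("sources"), Bytes.toBytes("Math"), Bytes.toBytes(rs.getString("math")));
                batch.add(put);
                if (batch.size() >= 1000) { // flush in batches to limit memory use
                    table.put(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                table.put(batch);
            }
        }
    }
}

Batched table.put(List<Put>) keeps round trips down; for very large migrations the ImportTsv bulk-load path shown earlier is usually the better fit.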
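
And a matching sketch of migration path 2 (plain file into HBase via a file stream), reading the same "|"-separated layout used in the ImportTsv example. The file path comes in as a program argument; the quorum address is again an assumption:

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FileToHBase {
    public static void main(String[] args) throws Exception {
        String path = args[0]; // e.g. /root/file/hbase_file/students_for_import_2.csv

        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost:2181"); // assumption: your ZooKeeper quorum

        try (Connection hbase = ConnectionFactory.createConnection(conf);
             Table table = hbase.getTable(TableName.valueOf("test_hbase:test_table"));
             BufferedReader reader = Files.newBufferedReader(Paths.get(path))) {

            List<Put> batch = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                // expected layout: rowkey|name|age|English|Math
                String[] f = line.split("\\|");
                if (f.length < 5) {
                    continue; // skip malformed lines
                }
                Put put = new Put(Bytes.toBytes(f[0]));
                put.addColumn(Bytes.toBytes("base"), Bytes.toBytes("name"), Bytes.toBytes(f[1]));
                put.addColumn(Bytes.toBytes("base"), Bytes.toBytes("age"), Bytes.toBytes(f[2]));
                put.addColumn(Bytes.toBytes("sources"), Bytes.toBytes("English"), Bytes.toBytes(f[3]));
                put.addColumn(Bytes.toBytes("sources"), Bytes.toBytes("Math"), Bytes.toBytes(f[4]));
                batch.add(put);
            }
            table.put(batch); // one flush is fine for small files; chunk the list for large ones
        }
    }
}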

