HBase Shell: Common Commands

HBase is a distributed, column-oriented, open-source database.

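Apache HBase is a key/value system that runs on top of HDFS. Unlike Hive, HBase serves reads and writes against its own storage in real time instead of running MapReduce jobs. Data in HBase is organized into tables, and each table is further divided into column families. Column families must be declared in the schema; a column family groups a set of related columns, and the columns themselves need no schema definition. Each key/value pair in HBase is called a cell, and each cell's key consists of the row key, the column family, the column (qualifier) and a timestamp. A row in HBase is a collection of key/value mappings uniquely identified by its row key. HBase builds on the Hadoop infrastructure and scales horizontally on commodity hardware.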
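The four-part cell key described above can be pictured as a nested map. The following is a minimal Python sketch, not HBase code: the class `ToyTable`, its methods and the sample timestamps and values are hypothetical. It only shows that a cell is addressed by row key, column family, qualifier and timestamp, that families are fixed while qualifiers are free-form, and that a plain read returns the newest version.

```python
from collections import defaultdict
from typing import Optional


class ToyTable:
    """Toy model of the HBase data model (hypothetical, not HBase code).

    A cell is addressed by (row key, column family, qualifier, timestamp).
    """

    def __init__(self, families):
        # Column families are fixed by the schema; qualifiers are free-form.
        self.families = set(families)
        # row key -> {(family, qualifier): {timestamp: value}}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, family: str, qualifier: str, ts: int, value: str) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][(family, qualifier)][ts] = value

    def get_latest(self, row: str, family: str, qualifier: str) -> Optional[str]:
        # .get() avoids creating empty entries for missing rows or columns.
        versions = self.rows.get(row, {}).get((family, qualifier), {})
        # A plain read returns the version with the highest timestamp.
        return versions[max(versions)] if versions else None


table = ToyTable(["info", "address"])
table.put("Sariel", "info", "age", 100, "25")  # hypothetical older version
table.put("Sariel", "info", "age", 200, "26")
print(table.get_latest("Sariel", "info", "age"))  # 26
```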

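Hive is a SQL-like engine that runs MapReduce jobs; it is suited to analytical queries over data accumulated over a period of time.
HBase is a NoSQL key/value database built on top of Hadoop; it is well suited to real-time queries over large datasets.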

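The HBase shell is HBase's command-line tool. It plays a role similar to SQL in a traditional database: you use shell commands to inspect and work with the data stored in HBase. After installing HBase, and provided its environment variables are configured so that the `hbase` command is on your PATH, run `hbase shell` to enter the command-line interface.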

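## Namespaces

A namespace groups tables logically, much like a database in a relational system. Tables are referenced as `namespace:table`; a table created without a namespace goes into `default`, and `hbase` holds HBase's own system tables.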

```
hbase(main):002:0> create_namespace 'ark'
0 row(s) in 0.6910 seconds
hbase(main):003:0> describe_namespace 'ark'
DESCRIPTION
{NAME => 'ark'}
1 row(s) in 0.0330 seconds
hbase(main):004:0> list_namespace
NAMESPACE
ark
default
hbase
3 row(s) in 0.0410 seconds
hbase(main):006:0> create 'ark:users','info','roles'
0 row(s) in 4.5260 seconds
=> Hbase::Table - ark:users
hbase(main):007:0> list_namespace_tables 'ark'
TABLE
users
1 row(s) in 0.0330 seconds
```
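The table name carries its namespace as a colon-separated prefix, which is why `list_namespace_tables 'ark'` shows `users` for `ark:users`. A small Python sketch of that naming rule; the helper `split_table_name` is hypothetical and only illustrates how a name resolves, including the fallback to `default` for names without a colon (such as the `member` table created later in this post).

```python
from typing import Tuple


def split_table_name(name: str) -> Tuple[str, str]:
    """Split 'namespace:table'; a name without ':' belongs to 'default'."""
    namespace, sep, table = name.partition(":")
    if not sep:
        # No namespace given: the table lives in the 'default' namespace.
        return "default", name
    return namespace, table


print(split_table_name("ark:users"))  # ('ark', 'users')
print(split_table_name("member"))     # ('default', 'member')
```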

## DDL

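The first command to know is `create`. Its first argument is the table name, followed by a list of column families. Each column family can independently set the number of versions it keeps (VERSIONS), how long its data is retained (TTL), whether the block cache is enabled (BLOCKCACHE), and so on.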

```
create 'ark:t1', {NAME => 'f1', VERSIONS => 1, BLOCKCACHE => true}, 'f2'
```
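The argument list of `create` mixes bare family names and `{NAME => ...}` hashes. This Python sketch mimics how such a list resolves into per-family settings. The defaults are taken from what `describe` prints later in this post for a family created without options (they can differ between HBase versions), and the function `resolve_families` is hypothetical, not part of HBase.

```python
from typing import Dict, Union

# Defaults as shown by `describe` in this post for a family created with no options.
DEFAULTS: Dict[str, object] = {
    "VERSIONS": 1,
    "BLOCKCACHE": True,
    "TTL": "FOREVER",
    "BLOOMFILTER": "ROW",
    "COMPRESSION": "NONE",
    "BLOCKSIZE": 65536,
}

Spec = Union[str, Dict[str, object]]


def resolve_families(*specs: Spec) -> Dict[str, Dict[str, object]]:
    """Turn create-style family specs into {family: settings}."""
    result: Dict[str, Dict[str, object]] = {}
    for spec in specs:
        if isinstance(spec, str):
            # A bare name such as 'f2' takes every default.
            name, overrides = spec, {}
        else:
            overrides = dict(spec)
            name = overrides.pop("NAME")
        # Explicit options override the defaults for that family only.
        result[name] = {**DEFAULTS, **overrides}
    return result


# Mirrors: create 'ark:t1', {NAME => 'f1', VERSIONS => 1, BLOCKCACHE => true}, 'f2'
families = resolve_families({"NAME": "f1", "VERSIONS": 1, "BLOCKCACHE": True}, "f2")
print(sorted(families))  # ['f1', 'f2']
```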

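A table can also be created with a number of pre-split regions and a split algorithm. When a table is first created, HBase assigns it a single region, so all access to the table goes through one region server and the cluster's resources cannot be fully used. HBase offers two ways to control the initial regions: the `org.apache.hadoop.hbase.util.RegionSplitter` utility, and the split options of the shell's `create` command (`NUMREGIONS` and `SPLITALGO`). For example: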

```
hbase(main):013:0> exists 'ark:t2'
Table ark:t2 does not exist
0 row(s) in 0.0320 seconds
hbase(main):014:0> create 'ark:t2', 'f1', {NUMREGIONS => 3, SPLITALGO => 'HexStringSplit'}
0 row(s) in 4.5200 seconds
```
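To see where the three regions begin and end, here is a Python sketch of how I understand `HexStringSplit` picks its boundaries: it treats row keys as 8-character hex strings from `00000000` to `ffffffff` and cuts that range into `NUMREGIONS` equal slices, with the last region absorbing any remainder. The function `hex_string_split` is hypothetical and mirrors only the arithmetic, not the RegionSplitter code itself; check your HBase version's source for exact behavior.

```python
from typing import List


def hex_string_split(num_regions: int, first: str = "00000000", last: str = "ffffffff") -> List[str]:
    """Return the num_regions - 1 split keys that divide [first, last] evenly."""
    lo, hi = int(first, 16), int(last, 16)
    # +1 because the last key is inclusive; integer division leaves the
    # remainder to the final region.
    size = (hi - lo + 1) // num_regions
    width = len(last)
    return [format(lo + size * i, "x").zfill(width) for i in range(1, num_regions)]


print(hex_string_split(3))  # ['55555555', 'aaaaaaaa']
```

With `NUMREGIONS => 3` the table would therefore start with the key ranges [start, 55555555), [55555555, aaaaaaaa) and [aaaaaaaa, end). Evenly spaced hex boundaries only balance load when row keys really are spread over that hex space, for example keys that begin with a hash.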

Use `enable` and `disable` to enable or disable a table; `is_enabled` and `is_disabled` check its current state.

```
hbase(main):017:0> is_enabled 'ark:t2'
true
0 row(s) in 0.0190 seconds
hbase(main):018:0> disable 'ark:t2'
0 row(s) in 4.5480 seconds
```

Use `alter` to change a table's properties, for example the attributes of a column family. The new schema has to be applied to every region of the table, which is why the shell reports progress region by region:

```
hbase(main):023:0> alter 'ark:t1', {NAME => 'f1', VERSIONS => 6}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 3.8640 seconds
```

```
hbase(main):024:0> describe 'ark:t1'
Table ark:t1 is ENABLED
ark:t1
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '6', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0260 seconds
```
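With `VERSIONS => '6'`, family f1 keeps up to six timestamped versions of each cell instead of one, and a read such as `get 't', 'row', {COLUMN => 'f1:c', VERSIONS => 3}` can ask for several of them. A Python sketch of that retention rule; the class `VersionedCell` and its sample values are hypothetical. Simplification: the sketch discards surplus versions immediately, whereas HBase hides them from reads and removes them physically during compaction.

```python
from typing import List, Tuple


class VersionedCell:
    """Keeps at most max_versions (timestamp, value) pairs, newest first."""

    def __init__(self, max_versions: int = 1):
        self.max_versions = max_versions
        self.versions: List[Tuple[int, str]] = []

    def put(self, ts: int, value: str) -> None:
        self.versions.append((ts, value))
        # Newest timestamp first, then drop anything beyond the limit.
        self.versions.sort(key=lambda v: v[0], reverse=True)
        del self.versions[self.max_versions:]

    def get(self, versions: int = 1) -> List[str]:
        # Return up to `versions` values, newest first.
        return [value for _, value in self.versions[:versions]]


f1_age = VersionedCell(max_versions=6)
for ts, value in enumerate(["24", "25", "26"], start=1):  # hypothetical values
    f1_age.put(ts, value)
print(f1_age.get())   # ['26']
print(f1_age.get(3))  # ['26', '25', '24']
```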

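Another very common operation is adding and deleting column families. `alter` with a bare family name adds the family; `METHOD => 'delete'` removes it: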

```
hbase(main):025:0> alter 'ark:t1','f3'
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 3.9290 seconds
hbase(main):026:0> describe 'ark:t1'
Table ark:t1 is ENABLED
ark:t1
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '6', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f3', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
3 row(s) in 0.0200 seconds
hbase(main):027:0> alter 'ark:t1' ,{NAME=>'f3',METHOD=>'delete'}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 4.2140 seconds
hbase(main):029:0> describe 'ark:t1'
Table ark:t1 is ENABLED
ark:t1
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '6', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0230 seconds
```
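Deleting a column family removes the data stored in it as well, not only the schema entry. A toy Python model of the two `alter` forms above; the helpers `add_family` and `delete_family` and the sample cells are hypothetical, not HBase API.

```python
from typing import Dict, Set

Rows = Dict[str, Dict[str, str]]  # row key -> {"family:qualifier": value}


def add_family(schema: Set[str], family: str) -> None:
    # Like alter 't', 'f3': the new family starts out empty.
    schema.add(family)


def delete_family(schema: Set[str], rows: Rows, family: str) -> None:
    """Like alter 't', {NAME => 'f3', METHOD => 'delete'}: schema entry and data both go."""
    schema.discard(family)
    prefix = family + ":"
    for cells in rows.values():
        # Collect first, then delete, so the dict is not mutated while iterating.
        for column in [c for c in cells if c.startswith(prefix)]:
            del cells[column]


schema = {"f1", "f2"}
rows: Rows = {"r1": {"f1:a": "1"}}
add_family(schema, "f3")
rows["r1"]["f3:x"] = "2"  # hypothetical cell written into the new family
delete_family(schema, rows, "f3")
print(sorted(schema), rows)  # ['f1', 'f2'] {'r1': {'f1:a': '1'}}
```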

To delete a table, you must disable it first:

```
hbase(main):032:0> disable 'ark:t1'
0 row(s) in 2.3810 seconds
hbase(main):033:0> drop 'ark:t1'
0 row(s) in 2.3310 seconds
```
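The disable-before-drop rule is a small state machine. A Python sketch of it; the class `TableState` is hypothetical, and its error text only imitates the kind of message the shell prints when you try to drop an enabled table.

```python
class TableState:
    """Toy model of the rule that a table must be disabled before it is dropped."""

    def __init__(self, name: str):
        self.name = name
        self.enabled = True  # a newly created table is enabled
        self.dropped = False

    def disable(self) -> None:
        self.enabled = False

    def enable(self) -> None:
        self.enabled = True

    def drop(self) -> None:
        if self.enabled:
            raise RuntimeError(f"Table {self.name} is enabled. Disable it first.")
        self.dropped = True


demo = TableState("ark:t1")
try:
    demo.drop()
except RuntimeError as err:
    print(err)  # Table ark:t1 is enabled. Disable it first.
demo.disable()
demo.drop()
print(demo.dropped)  # True
```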

## put and get

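In the HBase shell, data is inserted with the `put` command. For example, create a new table with three column families, id, address and info, and insert some data. Columns within a column family do not need to be created in advance; specify them when needed in the form `family:qualifier`.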

```
create 'member','id','address','info'
put 'member', 'debugo','id','11'
put 'member', 'debugo','info:age','27'
put 'member', 'debugo','info:birthday','1987-04-04'
put 'member', 'debugo','info:industry', 'it'
put 'member', 'debugo','address:city','beijing'
put 'member', 'debugo','address:country','china'
put 'member', 'Sariel', 'id', '21'
put 'member', 'Sariel','info:age', '26'
put 'member', 'Sariel','info:birthday', '1988-05-09'
put 'member', 'Sariel','info:industry', 'it'
put 'member', 'Sariel','address:city', 'beijing'
put 'member', 'Sariel','address:country', 'china'
put 'member', 'Elvis', 'id', '22'
put 'member', 'Elvis','info:age', '26'
put 'member', 'Elvis','info:birthday', '1988-09-14'
put 'member', 'Elvis','info:industry', 'it'
put 'member', 'Elvis','address:city', 'beijing'
put 'member', 'Elvis','address:country', 'china'
```
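Two details of these `put` calls are worth spelling out: a bare family such as `'id'` is stored with an empty qualifier (it shows up as `id:` in the scan output below), and writing to a family that was not declared in `create` fails, while new qualifiers are accepted freely. A Python sketch of those rules; the helpers `parse_column` and `put` are hypothetical, not the HBase client API.

```python
from typing import Dict, Set, Tuple

Rows = Dict[str, Dict[str, str]]  # row key -> {"family:qualifier": value}


def parse_column(column: str) -> Tuple[str, str]:
    """Split 'family:qualifier'; a bare family name gets an empty qualifier."""
    family, _, qualifier = column.partition(":")
    return family, qualifier


def put(rows: Rows, families: Set[str], row: str, column: str, value: str) -> None:
    family, qualifier = parse_column(column)
    if family not in families:
        # HBase rejects writes to undeclared families; only qualifiers are free.
        raise KeyError(f"no such column family: {family}")
    rows.setdefault(row, {})[f"{family}:{qualifier}"] = value


rows: Rows = {}
families = {"id", "address", "info"}
put(rows, families, "debugo", "id", "11")
put(rows, families, "debugo", "info:age", "27")
print(rows)  # {'debugo': {'id:': '11', 'info:age': '27'}}
```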

## Queries

1. Count the rows in a table with `count`:

```
count 'member'
```
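`count` counts rows, that is distinct row keys, not cells, and it scans the whole table to do so, which can take a while on a large table. The 18 `put` commands above write 6 cells into each of 3 row keys, so the expected count is 3. A quick Python check of that distinction, with row keys and columns copied from those commands:

```python
# (row key, column) pairs written by the put commands above
puts = [
    (row, col)
    for row in ("debugo", "Sariel", "Elvis")
    for col in ("id:", "info:age", "info:birthday", "info:industry",
                "address:city", "address:country")
]

row_count = len({row for row, _ in puts})  # distinct row keys
cell_count = len(puts)                     # individual cells
print(row_count, cell_count)  # 3 18
```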

2. get

1) Get all data for one row key:

```
get 'member','Sariel'
COLUMN CELL
address:city timestamp=1532076882889, value=beijing
address:country timestamp=1532076882925, value=china
id: timestamp=1532076882756, value=21
info:age timestamp=1532076882780, value=26
info:birthday timestamp=1532076882808, value=1988-05-09
info:industry timestamp=1532076882833, value=it
6 row(s) in 0.0570 seconds
```

2) Get all data for one row key within one column family (or one column):

```
hbase(main):001:0> get 'member' ,'Sariel','info'
COLUMN CELL
info:age timestamp=1532076882780, value=26
info:birthday timestamp=1532076882808, value=1988-05-09
info:industry timestamp=1532076882833, value=it
3 row(s) in 0.4630 seconds
```
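Note that the `row(s)` figure printed by `get` counts the cells shown (6 for Sariel's single row). A Python sketch of the three `get` forms: whole row, one family, and one column (the shell also accepts `get 'member', 'Sariel', 'info:age'`). The function `get` and the `SARIEL` dict are hypothetical stand-ins built from the output above, with timestamps omitted.

```python
from typing import Dict, Optional

# Row "Sariel" as returned by get 'member', 'Sariel' above (timestamps omitted)
SARIEL: Dict[str, str] = {
    "address:city": "beijing",
    "address:country": "china",
    "id:": "21",
    "info:age": "26",
    "info:birthday": "1988-05-09",
    "info:industry": "it",
}


def get(row: Dict[str, str], column: Optional[str] = None) -> Dict[str, str]:
    """No column -> whole row; 'info' -> one family; 'info:age' -> one column."""
    if column is None:
        return dict(row)
    if ":" in column:
        return {column: row[column]} if column in row else {}
    prefix = column + ":"
    return {c: v for c, v in row.items() if c.startswith(prefix)}


print(list(get(SARIEL, "info")))  # ['info:age', 'info:birthday', 'info:industry']
```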

3. scan

1) Scan the whole table:

```
hbase(main):002:0> scan 'member'
ROW COLUMN+CELL
Elvis column=address:city, timestamp=1532076883093, value=beijing
Elvis column=address:country, timestamp=1532076884185, value=china
Elvis column=id:, timestamp=1532076882949, value=22
Elvis column=info:age, timestamp=1532076882972, value=26
Elvis column=info:birthday, timestamp=1532076883041, value=1988-09-14
Elvis column=info:industry, timestamp=1532076883066, value=it
Sariel column=address:city, timestamp=1532076882889, value=beijing
Sariel column=address:country, timestamp=1532076882925, value=china
Sariel column=id:, timestamp=1532076882756, value=21
Sariel column=info:age, timestamp=1532076882780, value=26
Sariel column=info:birthday, timestamp=1532076882808, value=1988-05-09
Sariel column=info:industry, timestamp=1532076882833, value=it
debugo column=address:city, timestamp=1532076882699, value=beijing
debugo column=address:country, timestamp=1532076882725, value=china
debugo column=id:, timestamp=1532076882582, value=11
debugo column=info:age, timestamp=1532076882627, value=27
debugo column=info:birthday, timestamp=1532076882650, value=1987-04-04
debugo column=info:industry, timestamp=1532076882676, value=it
3 row(s) in 0.0870 seconds
```
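The scan returns Elvis, Sariel, debugo rather than insertion order because HBase keeps rows sorted by row key, compared as raw bytes, so uppercase ASCII letters sort before lowercase ones. (Here `3 row(s)` really does count rows.) A small Python sketch of the ordering rule; the helper name `byte_order` is hypothetical.

```python
from typing import List


def byte_order(keys: List[str]) -> List[str]:
    # HBase compares row keys lexicographically as bytes, so sort on the encoding.
    return sorted(keys, key=lambda k: k.encode("utf-8"))


print(byte_order(["debugo", "Sariel", "Elvis"]))  # ['Elvis', 'Sariel', 'debugo']

# The same rule puts '10' before '2', which is why numeric keys are often zero-padded.
print(byte_order(["2", "10", "9"]))     # ['10', '2', '9']
print(byte_order(["02", "10", "09"]))   # ['02', '09', '10']
```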

2) Scan an entire column family:

```
hbase(main):003:0> scan 'member',{COLUMN=>'info'}
ROW COLUMN+CELL
Elvis column=info:age, timestamp=1532076882972, value=26
Elvis column=info:birthday, timestamp=1532076883041, value=1988-09-14
Elvis column=info:industry, timestamp=1532076883066, value=it
Sariel column=info:age, timestamp=1532076882780, value=26
Sariel column=info:birthday, timestamp=1532076882808, value=1988-05-09
Sariel column=info:industry, timestamp=1532076882833, value=it
debugo column=info:age, timestamp=1532076882627, value=27
debugo column=info:birthday, timestamp=1532076882650, value=1987-04-04
debugo column=info:industry, timestamp=1532076882676, value=it
3 row(s) in 0.0600 seconds
```

3) Scan a specific column:

```
hbase(main):006:0* scan 'member', {COLUMNS=> 'info:birthday'}
ROW COLUMN+CELL
Elvis column=info:birthday, timestamp=1532076883041, value=1988-09-14
Sariel column=info:birthday, timestamp=1532076882808, value=1988-05-09
debugo column=info:birthday, timestamp=1532076882650, value=1987-04-04
3 row(s) in 0.0280 seconds
```
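Both column selectors above work per cell: a bare name such as `'info'` selects a whole family, while `'info:birthday'` selects one column, and the shell's COLUMNS option also accepts a list of such names (for example `{COLUMNS => ['info:birthday', 'address:city']}`). A Python sketch of that selection; the helpers `matches` and `scan` and the two-row sample table are hypothetical, and the column order inside a row is approximated by sorting the `family:qualifier` strings.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Table = Dict[str, Dict[str, str]]  # row key -> {"family:qualifier": value}


def matches(column: str, specs: Iterable[str]) -> bool:
    """A spec is either a family ('info') or a full column ('info:birthday')."""
    for spec in specs:
        if column == spec or (":" not in spec and column.startswith(spec + ":")):
            return True
    return False


def scan(table: Table, columns: List[str]) -> Iterator[Tuple[str, str, str]]:
    # Rows come back in byte order of the row key; columns sorted within a row.
    for row in sorted(table, key=lambda k: k.encode("utf-8")):
        for column in sorted(table[row]):
            if matches(column, columns):
                yield row, column, table[row][column]


sample: Table = {
    "debugo": {"info:birthday": "1987-04-04", "address:city": "beijing"},
    "Elvis": {"info:birthday": "1988-09-14", "info:age": "26"},
}
for cell in scan(sample, ["info:birthday"]):
    print(cell)
# ('Elvis', 'info:birthday', '1988-09-14')
# ('debugo', 'info:birthday', '1987-04-04')
```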

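4) Besides COLUMNS, scans support LIMIT (maximum number of rows returned), STARTROW (the row key to start from; HBase first locates the region containing this key and then scans forward), STOPROW (the row key to stop at, exclusive), TIMERANGE (a timestamp range), VERSIONS (number of versions returned) and FILTER (filter by conditions), among others. For example, to start at row key Sariel and return one row with only its latest version (STARTROW is inclusive, so the row returned is Sariel itself):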

```
hbase(main):007:0> scan 'member', { STARTROW => 'Sariel', LIMIT=>1, VERSIONS=>1}
ROW COLUMN+CELL
Sariel column=address:city, timestamp=1532076882889, value=beijing
Sariel column=address:country, timestamp=1532076882925, value=china
Sariel column=id:, timestamp=1532076882756, value=21
Sariel column=info:age, timestamp=1532076882780, value=26
Sariel column=info:birthday, timestamp=1532076882808, value=1988-05-09
Sariel column=info:industry, timestamp=1532076882833, value=it
1 row(s) in 0.0360 seconds
```
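The row-range options reduce to comparisons on byte-ordered keys: STARTROW is inclusive, STOPROW is exclusive, and LIMIT caps the number of rows. A Python sketch over the three row keys of the `member` table; the function `row_range` is hypothetical, and an empty start or stop stands for an unbounded end, as in the shell.

```python
from typing import List, Optional


def row_range(keys: List[str], start: str = "", stop: str = "",
              limit: Optional[int] = None) -> List[str]:
    """STARTROW is inclusive, STOPROW is exclusive; empty means unbounded."""
    ordered = sorted(keys, key=lambda k: k.encode("utf-8"))
    selected = [
        k for k in ordered
        if k.encode("utf-8") >= start.encode("utf-8")
        and (not stop or k.encode("utf-8") < stop.encode("utf-8"))
    ]
    return selected if limit is None else selected[:limit]


keys = ["debugo", "Sariel", "Elvis"]
print(row_range(keys, start="Sariel", limit=1))       # ['Sariel']
print(row_range(keys, start="Elvis", stop="debugo"))  # ['Elvis', 'Sariel']
```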

5) FILTER is a very powerful option that applies a set of conditions to the results. For example, to return only cells whose value equals 26:

```
# value equals 26 (binary comparator: exact match)
hbase(main):010:0> scan 'member', FILTER=>"ValueFilter(=,'binary:26')"
ROW COLUMN+CELL
Elvis column=info:age, timestamp=1532076882972, value=26
Sariel column=info:age, timestamp=1532076882780, value=26
2 row(s) in 0.0740 seconds
# value contains the substring '6'
hbase(main):013:0> scan 'member', FILTER=>"ValueFilter(=,'substring:6')"
ROW COLUMN+CELL
Elvis column=info:age, timestamp=1532076882972, value=26
Sariel column=info:age, timestamp=1532076882780, value=26
2 row(s) in 0.0280 seconds
# column qualifier starts with 'birth'
hbase(main):016:0> scan 'member', FILTER=>"ColumnPrefixFilter('birth')"
ROW COLUMN+CELL
Elvis column=info:birthday, timestamp=1532076883041, value=1988-09-14
Sariel column=info:birthday, timestamp=1532076882808, value=1988-05-09
debugo column=info:birthday, timestamp=1532076882650, value=1987-04-04
3 row(s) in 0.8280 seconds
# FILTER can combine several filters with parentheses, AND and OR
hbase(main):019:0> scan 'member',FILTER=>"ColumnPrefixFilter('birth') AND ValueFilter(=,'substring:1988')"
ROW COLUMN+CELL
Elvis column=info:birthday, timestamp=1532076883041, value=1988-09-14
Sariel column=info:birthday, timestamp=1532076882808, value=1988-05-09
2 row(s) in 0.0950 seconds
```
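The four filter expressions above can be reproduced with plain predicates over the `member` cells. This Python sketch assumes the comparators behave as I understand them: `binary` needs an exact match, `substring` is a case-insensitive containment test (HBase's SubstringComparator lowercases both sides), `ColumnPrefixFilter` tests the qualifier, and for these cell-level filters `AND` keeps a cell only if both filters accept it. The helper names are hypothetical; this is a model for reasoning about results, not the HBase filter engine.

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[str, str, str]  # (row, "family:qualifier", value)
Pred = Callable[[Cell], bool]

# The member table as written by the put commands in this post.
DATA: Dict[str, Dict[str, str]] = {
    name: {"id:": ident, "info:age": age, "info:birthday": birthday,
           "info:industry": "it", "address:city": "beijing", "address:country": "china"}
    for name, ident, age, birthday in [
        ("debugo", "11", "27", "1987-04-04"),
        ("Sariel", "21", "26", "1988-05-09"),
        ("Elvis", "22", "26", "1988-09-14"),
    ]
}
CELLS: List[Cell] = [(r, c, v) for r, cols in DATA.items() for c, v in cols.items()]


def value_binary(expected: str) -> Pred:
    # ValueFilter(=, 'binary:...'): the whole value must match exactly.
    return lambda cell: cell[2] == expected


def value_substring(part: str) -> Pred:
    # ValueFilter(=, 'substring:...'): case-insensitive containment.
    return lambda cell: part.lower() in cell[2].lower()


def column_prefix(prefix: str) -> Pred:
    # ColumnPrefixFilter: the qualifier (after 'family:') starts with prefix.
    return lambda cell: cell[1].split(":", 1)[1].startswith(prefix)


def both(a: Pred, b: Pred) -> Pred:
    return lambda cell: a(cell) and b(cell)


def run_filter(cells: List[Cell], pred: Pred) -> List[Cell]:
    return [c for c in cells if pred(c)]


print(len(run_filter(CELLS, value_binary("26"))),
      len(run_filter(CELLS, value_substring("6"))),
      len(run_filter(CELLS, column_prefix("birth"))),
      len(run_filter(CELLS, both(column_prefix("birth"), value_substring("1988")))))
# 2 2 3 2
```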

Similarly, `PrefixFilter` keeps only the rows whose row key starts with a given prefix:

```
scan 'db_demobank626:dim_p', FILTER=>"PrefixFilter('profile|xwhov')"
```
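`PrefixFilter` is checked against rows as the scan reads them, so unless a STARTROW is also given the scan may begin at the first row of the table. A common refinement is to bound the scan with the prefix as the start row and, as the stop row, the smallest key that sorts after every key with that prefix; to my understanding the Java client's `Scan.setRowPrefixFilter` computes the stop row this way. A Python sketch of that computation; the helper `prefix_stop_row` is hypothetical.

```python
from typing import Optional


def prefix_stop_row(prefix: bytes) -> Optional[bytes]:
    """Smallest key greater than every key that starts with prefix.

    Trailing 0xff bytes cannot be incremented, so they are dropped first;
    None means "scan to the end of the table".
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None
    # Increment the last remaining byte (it is below 0xff, so no overflow).
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


print(prefix_stop_row(b"profile|xwhov"))  # b'profile|xwhow'
```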
