Clickhouse Dictionaries 外部字典的数据源配置

Clickhouse Dictionaries - Sources of External Dictionaries

Clickhouse允许从不同的源构造外部字典,配置文件通常像这样:

1
2
3
4
5
6
7
8
9
10
11
12
<yandex>
<dictionary>
...
<source>
<source_type>
<!-- Source configuration -->
</source_type>
</source>
...
</dictionary>
...
</yandex>

数据源则是通过source项进行配置

其中支持的数据源的类型有(source_type):

  • Local file
  • Executable file
  • HTTP(s)
  • DBMS
    • MySQL
    • ClickHouse
    • MongoDB
    • ODBC

Local file 本地文件

配置

1
2
3
4
5
6
<source>
<file>
<path>/opt/dictionaries/os.tsv</path>
<format>TabSeparated</format>
</file>
</source>
  • path - 本地文件的绝对路径
  • format - The file format. All the formats described in “Formats” are supported

Executable File

ClickHouse通过向可执行文件的STDIN发送命令来获得所需的内容

1
2
3
4
5
6
<source>
<executable>
<command>cat /opt/dictionaries/os.tsv</command>
<format>TabSeparated</format>
</executable>
</source>
  • command: 可执行文件的绝对路径,或者文件名(可执行命令的目录已添加到PATH)
  • format - The file format. All the formats described in “Formats” are supported

HTTP(s)

ClickHouse通过发送POST请求来获得所需的内容

配置样例:

1
2
3
4
5
6
<source>
<http>
<url>http://[::1]/os.tsv</url>
<format>TabSeparated</format>
</http>
</source>

如果需要支持https,需要在服务端配置文件中配置好ssl 相关设置,具体参考

  • url – The source URL.
  • format – The file format. All the formats described in “Formats” are supported.

ODBC

通过ODBC驱动连接任意数据库表

1
2
3
4
5
6
<odbc>
<db>DatabaseName</db>
<table>ShemaName.TableName</table>
<connection_string>DSN=some_parameters</connection_string>
<invalidate_query>SQL_QUERY</invalidate_query>
</odbc>
  • db – 数据库名称. 如果在<connection_string>中,则可以忽略.
  • table – Name of the table and schema if exists.
  • connection_string – Connection string.
  • invalidate_query – 用于检测字典状态的QuerySQL. 可选参数. Read more in the section Updating dictionaries.

ClickHouse从ODBC驱动程序接收引用符号并将查询中的所有设置引用到驱动程序,因此必须根据数据库中的表名称大小写相应地设置表名。

注意:已知ODBC字典功能的漏洞

通过ODBC驱动程序连接参数连接到数据库时,可以替换Servername。 在这种情况下,来自odbc.ini的USERNAME和PASSWORD值将被发送到远程服务器并可能被泄露。

/etc/odbc.ini 示例(PostgreSQL)

1
2
3
4
5
6
7
8
[gregtest]
Driver = /usr/lib/psqlodbca.so
Servername = localhost
PORT = 5432
DATABASE = test_db
#OPTION = 3
USERNAME = test
PASSWORD = test

If you then make a query such as

1
SELECT * FROM odbc('DSN=gregtest;Servername=some-server.com', 'test_db');

ODBC driver will send values of USERNAME and PASSWORD from odbc.ini to some-server.com.

Example of Connecting PostgreSQL

一. 安装 unixODBC 和 the ODBC driver for PostgreSQL:

1
sudo apt-get install -y unixodbc odbcinst odbc-postgresql

二. 配置 /etc/odbc.ini (or ~/.odbc.ini):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
[DEFAULT]
Driver = myconnection
[myconnection]
Description = PostgreSQL connection to my_db
Driver = PostgreSQL Unicode
Database = my_db
Servername = 127.0.0.1
UserName = username
Password = password
Port = 5432
Protocol = 9.3
ReadOnly = No
RowVersioning = No
ShowSystemTables = No
ConnSettings =

You may need to edit odbc.ini to specify the full path to the library with the driver DRIVER=/usr/local/lib/psqlodbcw.so.

三. 在clickhouse中配置字典项

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
<yandex>
<dictionary>
<name>table_name</name>
<source>
<odbc>
<!-- You can specify the following parameters in connection_string: -->
<!-- DSN=myconnection;UID=username;PWD=password;HOST=127.0.0.1;PORT=5432;DATABASE=my_db -->
<connection_string>DSN=myconnection</connection_string>
<table>postgresql_table</table>
</odbc>
</source>
<lifetime>
<min>300</min>
<max>360</max>
</lifetime>
<layout>
<hashed/>
</layout>
<structure>
<id>
<name>id</name>
</id>
<attribute>
<name>some_column</name>
<type>UInt64</type>
<null_value>0</null_value>
</attribute>
</structure>
</dictionary>
</yandex>

Example of Connecting MS SQL Server

  1. 安装驱动
  2. 配置驱动
  3. 设置字典配置

DBMS

MySQL

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
<source>
<mysql>
<port>3306</port>
<user>clickhouse</user>
<password>qwerty</password>
<replica>
<host>example01-1</host>
<priority>1</priority>
</replica>
<replica>
<host>example01-2</host>
<priority>1</priority>
</replica>
<db>db_name</db>
<table>table_name</table>
<where>id=10</where>
<invalidate_query>SQL_QUERY</invalidate_query>
</mysql>
</source>

参数项:

  • port – The port on the MySQL server. You can specify it for all replicas, or for each one individually (inside ).
  • user – Name of the MySQL user. You can specify it for all replicas, or for each one individually (inside ).
  • password – Password of the MySQL user. You can specify it for all replicas, or for each one individually (inside ).
  • replica – Section of replica configurations. There can be multiple sections.
    • replica/host – The MySQL host.
    • replica/priority – The replica priority. When attempting to connect, ClickHouse traverses the replicas in order of priority. The lower the number, the higher the priority.
  • db – Name of the database.
  • table – Name of the table.
  • where – The selection criteria. Optional parameter.
  • invalidate_query – Query for checking the dictionary status. Optional parameter. Read more in the section Updating dictionaries.

MySQL can be connected on a local host via sockets. To do this, set host and socket

1
2
3
4
5
6
7
8
9
10
11
12
<source>
<mysql>
<host>localhost</host>
<socket>/path/to/socket/file.sock</socket>
<user>clickhouse</user>
<password>qwerty</password>
<db>db_name</db>
<table>table_name</table>
<where>id=10</where>
<invalidate_query>SQL_QUERY</invalidate_query>
</mysql>
</source>

ClickHouse

1
2
3
4
5
6
7
8
9
10
11
<source>
<clickhouse>
<host>example01-01-1</host>
<port>9000</port>
<user>default</user>
<password></password>
<db>default</db>
<table>ids</table>
<where>id=10</where>
</clickhouse>
</source>
  • host – The ClickHouse host. If it is a local host, the query is processed without any network activity. To improve fault tolerance, you can create a Distributed table and enter it in subsequent configurations.
  • port – The port on the ClickHouse server.
  • user – Name of the ClickHouse user.
  • password – Password of the ClickHouse user.
  • db – Name of the database.
  • table – Name of the table.
  • where – The selection criteria. May be omitted.
  • invalidate_query – Query for checking the dictionary status. Optional parameter. Read more in the section Updating dictionaries.

MongoDB

配置举例:

1
2
3
4
5
6
7
8
9
10
<source>
<mongodb>
<host>localhost</host>
<port>27017</port>
<user></user>
<password></password>
<db>test</db>
<collection>dictionary_source</collection>
</mongodb>
</source>

参数项

  • host – The MongoDB host.
  • port – The port on the MongoDB server.
  • user – Name of the MongoDB user.
  • password – Password of the MongoDB user.
  • db – Name of the database.
  • collection – Name of the collection.
坚持原创技术分享,您的支持将鼓励我继续创作!
分享