合 GreenPlum数据库日常维护运维(持续更新)
Tags: GreenPlum整理自网络脚本小麦苗常用DBA脚本脚本分享日常维护日常运维
- 日常运维查询命令
- 启动和关闭
- 数据库启动:gpstart
- 数据库停止:gpstop
- 总结
- 查看实例配置和状态
- 集群状态查询
- 查看集群主实例与镜像实例同步状态
- 负载查询
- 磁盘空间查询
- 查询数据库、表占用空间大小
- 查询数据库大小
- 表占用空间大小
- 索引占用空间大小
- 分区表大小
- 查询对象类型和个数
- 查看GreenPlum表的分布键和压缩类型
- 查看表数据分布情况(数据倾斜)
- 查看数据库数据分布情况(数据倾斜)
- 会话、锁信息管理
- 会话查询、慢查询SQL
- 锁查询
- 会话kill、停止会话、杀会话
- 登陆与退出Greenplum
- 参数查询
- 查询分布键
- 创建数据库
- 创建GP文件系统
- 创建GP表空间
- 删除GP数据库
- 查看GP日志
- OBJECT的DDL操作统计
- 队列查询
- gpfdist外部表
- gpload导入数据
- copy导出导入数据
- 执行sql文件
- 查询gp最后vacuum时间
- 磁盘空间不足
- 将查询结果导出到文件
- GreenPlum生成创建用户的DDL语句
- 生成创建约束和索引的DDL语句
日常运维查询命令
启动和关闭
数据库启动:gpstart
常用可选参数: -a : 直接启动,不提示终端用户输入确认
-m:只启动master 实例,主要在故障处理时使用
数据库停止:gpstop
常用可选参数:-a:直接停止,不提示终端用户输入确认
-m:只停止master 实例,与gpstart –m 对应使用
-M fast:停止数据库,中断所有数据库连接,回滚正在运行的事务
-u:不停止数据库,只加载pg_hba.conf 和postgresql.conf中运行时参数,当改动参数配置时候使用。
总结
1 2 3 4 5 6 7 8 9 10 11 12 | -- 启动 gpstart -a -- 关闭:默认情况下,如果有任何客户端连接存在,就不允许关闭Greenplum数据库。 使用-M fast选项可以在关闭前回滚所有正在进行中的事务并且中断所有连接。 gpstop -M fast -a -- 重载配置文件 gpstop -u -- 不用输入Y进行确定而直接重启 gpstop -M fast -ra |
查看实例配置和状态
1 | select * from gp_segment_configuration order by 1 ; |
主要字段说明:
Content:该字段相等的两个实例,是一对P(primary instance)和M(mirror Instance)
Isprimary:实例是否作为primary instance 运行
Valid:实例是否有效,如处于false 状态,则说明该实例已经down 掉。
Port:实例运行的端口
Datadir:实例对应的数据目录
集群状态查询
查询命令:gpstate
显示Greenplum数据库运行状态,详细配置等信息,该命令默认列出数据库运行状态汇总信息,常用于日常巡检。
常用可选参数:
-c:primary instance 和 mirror instance 的对应关系
-m:只列出mirror 实例的状态和配置信息
-f:显示standby master 的详细信息
-Q:显示状态综合信息
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | [gpadmin@mdw ~]$ gpstate -Q 20231010:14:52:52:3839029 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -Q 20231010:14:52:52:3839029 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.25.2 build commit:2ca3d98b43958f23744aa2023f3d87c249cb481c Open Source' 20231010:14:52:52:3839029 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.25.2 build commit:2ca3d98b43958f23744aa2023f3d87c249cb481c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Sep 12 2023 01:04:47' 20231010:14:52:52:3839029 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master... 20231010:14:52:52:3839029 gpstate:mdw:gpadmin-[INFO]:--Quick Greenplum database status from Master instance only 20231010:14:52:52:3839029 gpstate:mdw:gpadmin-[INFO]:----------------------------------------------------------- 20231010:14:52:52:3839029 gpstate:mdw:gpadmin-[INFO]:-# of up segments, from configuration table = 16 20231010:14:52:52:3839029 gpstate:mdw:gpadmin-[INFO]:----------------------------------------------------------- [gpadmin@mdw ~]$ gpstate -e 20231010:14:52:57:3839083 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -e 20231010:14:52:57:3839083 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.25.2 build commit:2ca3d98b43958f23744aa2023f3d87c249cb481c Open Source' 20231010:14:52:57:3839083 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.25.2 build commit:2ca3d98b43958f23744aa2023f3d87c249cb481c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Sep 12 2023 01:04:47' 20231010:14:52:57:3839083 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master... 20231010:14:52:57:3839083 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments... . 20231010:14:52:59:3839083 gpstate:mdw:gpadmin-[INFO]:----------------------------------------------------- 20231010:14:52:59:3839083 gpstate:mdw:gpadmin-[INFO]:-Segment Mirroring Status Report 20231010:14:52:59:3839083 gpstate:mdw:gpadmin-[INFO]:----------------------------------------------------- 20231010:14:52:59:3839083 gpstate:mdw:gpadmin-[INFO]:-All segments are running normally [gpadmin@mdw ~]$ |
查看集群主实例与镜像实例同步状态
查询命令:gpstate -m
Status均为Passive,Data Status均为为Synchronized则正常,否则异常
负载查询
使用vmstat命令可以监控 CPU 使用、进程状态、内存使用、虚拟内存使用、硬盘输入/输出状态等信息,常使用方法为:vmstat [刷新延时 刷新次数]
也可以使用gpcc历史视图查询,参考:https://www.xmmup.com/greenplumguanfangjiankonggongjugpccjianjiejishiyong.html
磁盘空间查询
数据库查看使用量
1 2 3 4 5 | select dfsegment,dfhostname,dfdevice,round(dfspace/1024/1024,2) as free_disk_GB from gp_toolkit.gp_disk_free ORDER BY dfsegment ; -- dfspace 实例所在的文件系统的空闲磁盘空闲空间(千字节为单位) |
系统命令行查看(在此例中“host_list”为所有服务器节点清单,“data”为数据存放目录,根据实际目录文件名查询即可)
1 | gpssh -f host_list -e "df -h |grep data" |
查询数据库、表占用空间大小
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | select pg_size_pretty(pg_relation_size('schema.tablename')); select pg_size_pretty(pg_database_size('databasename')); -- 1.查看所有库大小 select sodddatname,sodddatsize/1024/1024 as db_size_MB from gp_toolkit.gp_size_of_database order by sodddatname; -- 2.查看库下面的所有schema占用的磁盘空间 SELECT t.table_catalog as db, n.nspname AS schemaname, pg_size_pretty(sum(pg_table_size ( '"' || nspname || '"."' || relname || '"' ))) AS table_size FROM pg_class C LEFT JOIN pg_namespace N ON ( N.oid = C.relnamespace ) left join information_schema.tables t on (n.nspname= t.table_schema and c.relname=t."table_name" ) WHERE nspname NOT IN ( 'information_schema' ) AND relkind in ('r','p') group by table_catalog,nspname; select sosdnsp,round(sosdschematablesize/1024/1024,2) schema_tb_size_MB,round(sosdschemaidxsize /1024/1024,2) as schema_idx_size_MB from gp_toolkit.gp_size_of_schema_disk; -- 单个schema大小 SELECT pg_size_pretty(sum(pg_total_relation_size('"' || nspname || '".' || relname))) AS total_size FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE nspname = 'public'; -- 3.查看schema下所有表和索引占用的磁盘空间 SELECT soatioid,soatischemaname,soatitablename,round(soatisize/1024/1024,2) as INDEX_SIZE_MB FROM gp_toolkit.gp_size_of_all_table_indexes order by INDEX_SIZE_MB desc,soatischemaname,soatitablename ; -- 4.查看表占用空间(表大小) select (pg_relation_size('public.tb_name')); select sotuoid, sotuschemaname,sotutablename,round(cast(sotusize as integer )/1024/1024,2) as table_SIZE_MB from gp_toolkit.gp_size_of_table_uncompressed order by sotuschemaname, table_SIZE_MB desc,sotutablename -- 5.查看膨胀表 select * from gp_toolkit.gp_bloat_diag order by bdinspname,(bdirelpages-bdiexppages)desc,bdirelname; -- 没有统计信息且可能需要ANALYZE的表。 select * from gp_toolkit.gp_stats_missing order by smischema,smicols desc; |
必须在数据库所对应的存储系统里,至少保留30%的自由空间,日常巡检,要检查存储空间的剩余容量。
评:可以查看任何数据库对象的占用空间,pg_size_pretty可以显示如mb之类的易读数据,另外,可以与pg_tables,pg_indexes之类的系统表链接,统计出各类关于数据库对象的空间信息。
查询数据库大小
参考:https://www.xmmup.com/greenplumchaxunshujukudaxiao.html
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | SELECT d.datname as "Name", pg_catalog.pg_get_userbyid(d.datdba) as "Owner", pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding", d.datcollate as "Collate", d.datctype as "Ctype", pg_catalog.array_to_string(d.datacl, E'\n') AS "Access privileges", CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT') THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname)) ELSE 'No Access' END as "Size", t.spcname as "Tablespace", pg_catalog.shobj_description(d.oid, 'pg_database') as "Description" FROM pg_catalog.pg_database d JOIN pg_catalog.pg_tablespace t on d.dattablespace = t.oid UNION ALL SELECT '总计' as "Name", '' as "Owner", '' as "Encoding", '' as "Collate", '' as "Ctype", '' AS "Access privileges", pg_catalog.pg_size_pretty(SUM((pg_catalog.pg_database_size(d.datname)))) as "Size", '' as "Tablespace", '' as "Description" FROM pg_catalog.pg_database d JOIN pg_catalog.pg_tablespace t on d.dattablespace = t.oid ORDER BY 1; -- 或 select sodddatname,sodddatsize/1024/1024 as db_size_MB from gp_toolkit.gp_size_of_database d order by d.sodddatname ; -- 或 \l+ |
表占用空间大小
1 2 3 4 5 6 7 8 9 | select pg_size_pretty(pg_total_relation_size('test')); select pg_size_pretty(pg_relation_size('test')); select pg_size_pretty(pg_table_size('test')); select pg_size_pretty(pg_indexes_size('test')); SELECT relname as name, sotdsize/1024/1024 as size_MB, sotdtoastsize as toast, sotdadditionalsize as other FROM gp_toolkit.gp_size_of_table_disk as sotd, pg_class WHERE sotd.sotdoid = pg_class.oid ORDER BY relname; |
索引占用空间大小
1 2 3 4 5 6 7 8 9 | select pg_size_pretty(pg_total_relation_size('test')); select pg_size_pretty(pg_relation_size('test')); select pg_size_pretty(pg_table_size('test')); select pg_size_pretty(pg_indexes_size('test')); SELECT soisize/1024/1024 as size_MB, relname as indexname FROM pg_class, gp_toolkit.gp_size_of_index WHERE pg_class.oid = gp_size_of_index.soioid AND pg_class.relkind='i'; |
分区表大小
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | -- 分区表信息和大小 select * from ( SELECT pt.schemaname,pt.tablename,pt.partitiontype,pt.partitionschemaname,pt.partitiontablename,pt.partitionname,pt.partitionposition||'' partitionposition,pt.partitionlistvalues,pt.partitionlevel||'' partitionlevel,pt.partitionboundary, pg_size_pretty( pg_total_relation_size('"' || pt.partitionschemaname || '"."' || pt.partitiontablename || '"')) pt_tb_size, pg_size_pretty( pg_indexes_size('"' || pt.partitionschemaname || '"."' || pt.partitiontablename || '"')) pt_index_size FROM pg_partitions pt union ALL SELECT pt.schemaname,pt.tablename,pt.partitiontype,'','','','','','','', pg_size_pretty(sum(pg_total_relation_size('"' || pt.partitionschemaname || '"."' || pt.partitiontablename || '"'))) pt_tb_size, pg_size_pretty(sum(pg_indexes_size('"' || pt.partitionschemaname || '"."' || pt.partitiontablename || '"'))) pt_index_size FROM pg_partitions pt GROUP BY pt.schemaname,pt.tablename,pt.partitiontype ) aa WHERE aa.tablename='log_month' order by aa.partitionname ; |
查询对象类型和个数
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | select nsp.nspname as SchemaName ,case cls.relkind when 'r' then 'TABLE' when 'm' then 'MATERIALIZED_VIEW' when 'i' then 'INDEX' when 'S' then 'SEQUENCE' when 'v' then 'VIEW' when 'c' then 'composite type' when 't' then 'TOAST' when 'f' then 'foreign table' when 'p' then 'partitioned_table' when 'I' then 'partitioned_index' else cls.relkind::text end as ObjectType, COUNT(*) cnt from pg_class cls join pg_namespace nsp on nsp.oid = cls.relnamespace where nsp.nspname not in ('information_schema', 'pg_catalog','gp_toolkit') and nsp.nspname not like 'pg_toast%' GROUP BY nsp.nspname,cls.relkind UNION all SELECT n.nspname as "Schema", 'pg_proc' as "pg_proc", COUNT(*) cnt FROM pg_catalog.pg_proc p LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace WHERE pg_catalog.pg_function_is_visible(p.oid) AND n.nspname not in ('information_schema', 'pg_catalog','gp_toolkit') GROUP BY n.nspname order by SchemaName,ObjectType; |
查看GreenPlum表的分布键和压缩类型
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | SELECT n.nspname AS "schema", C.relname AS "tb_name", CASE WHEN P.policytype = 'p' THEN '分区策略' WHEN P.policytype = 'r' THEN '复制策略' END "policytype", P.numsegments, P.distkey, pg_get_table_distributedby ( C.oid ) "distributed_key", C.reloptions, case c.relstorage when 'a' then ' append-optimized' when 'c' then 'column-oriented' when 'h' then 'heap' when 'v' then 'virtual' when 'x' then 'external table' end as "data_storage_mode" FROM pg_class C LEFT JOIN gp_distribution_policy P ON C.oid = P.localoid LEFT JOIN pg_namespace n ON C.relnamespace = n.oid where n.nspname NOT IN ( 'pg_catalog', 'pg_toast','information_schema','gp_toolkit' ) and C.relkind in ('r','m') and P.policytype is not null -- and C.relname='inp_nter' order by n.nspname,C.relname; |
分布键的选择参数gp_create_table_random_default_distribution
参考:https://www.xmmup.com/greenplumchuangjianheguanlibiao.html#can_shugp_create_table_random_default_distribution
注意:若使用Navicat导出GreenPlum的表结构时,其建表语句会缺失最后的分布键DISTRIBUTED BY部分。若在新环境建表,则一般取第1列作为分布键,若该表最后有主键语句,容易导致“ERROR: PRIMARY KEY must contain all columns in the distribution key of relation "inp_aa"”的错误。解决办法:建表语句加上DISTRIBUTED BY语句,或使用pg_dump语句导出表结构。
1234567891011121314 DROP TABLE IF EXISTS "public"."inp_aa";CREATE TABLE "public"."inp_aa" ("admitted_at" timestamp(6),"enter_id" varchar(64) COLLATE "pg_catalog"."default" NOT NULL,"sys_id" varchar(64) COLLATE "pg_catalog"."default","hos_soid" varchar(64) COLLATE "pg_catalog"."default" NOT NULL,"encoupe_name" varchar(128) COLLATE "pg_catalog"."default" NOT NULL,"inpatg_seq_no" varchar(64) COLLATE "pg_catalog"."default","encoause" varchar(1024) COLLATE "pg_catalog"."default") -- DISTRIBUTED BY (enter_id, sys_id, hos_soid);ALTER TABLE "public"."inp_aa" ADD CONSTRAINT "pk_c11012001" PRIMARY KEY ("enter_id", "sys_id", "hos_soid");
查看表数据分布情况(数据倾斜)
参考:https://www.xmmup.com/greenplumxingnengdiaoyouzhishujuqingxie.html
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | -- 用count(*)方式计算每个segment上的记录数,慢,不建议 select gp_segment_id, count(*) from table_name group by 1 order by 1; -- 计算一张表在不同segment上所占空间来评估是否发生数据倾斜,推荐 select gp_segment_id, pg_size_pretty(pg_relation_size('table_name')) from gp_dist_random('gp_id') order by 1; -- 真实大小 select pg_size_pretty(sum(tbsize)) from ( select gp_segment_id, (pg_relation_size('table_name')) tbsize from gp_dist_random('gp_id') order by 1) a; -- 该视图通过计算存储在每个Segment上的数据的变异系数(CV)来显示数据分布倾斜。 select * from gp_toolkit.gp_skew_coefficients; -- 该视图通过计算在表扫描过程中系统空闲的百分比来显示数据分布倾斜,这是一种数据处理倾斜的指示器。 select * from gp_toolkit.gp_skew_idle_fractions; 在命令运行: gpskew -t public.ate -a postgres -- 如果表的分布键合理,直接重分布数据: alter table table_name set with(reorganize=true); -- 指定分布键重新平衡 alter table table_name set distributed by (col1,col2...); -- 随机分布 alter table bi.fat_zy_cwsyqk set distributed randomly; -- 复制表 CREATE TABLE t2 (id int) DISTRIBUTED REPLICATED; create table t1 as select * from xxxx DISTRIBUTED REPLICATED; |
GPDB要保障数据分布均匀。如数据分布不均匀,将发挥不了并行计算的优势,严重影响性能。
查看数据库数据分布情况(数据倾斜)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | -- 库是否有倾斜 select b.hostname, a.gp_segment_id,pg_size_pretty(pg_catalog.pg_database_size('postgres')) from gp_dist_random('gp_id') a, gp_segment_configuration b where role='p' and a.gp_segment_id=b.content order by 1 ,2; select gp_segment_id,pg_size_pretty(pg_catalog.pg_database_size('DW')) from gp_dist_random('gp_id') order by 1; -- 所有数据库的分步情况 DROP TABLE IF EXISTS temp_db_skew_sizes; CREATE TEMPORARY TABLE IF NOT EXISTS temp_db_skew_sizes ( db_name text, hostname text, segment_id int, db_size text ); DO $$ DECLARE db_rec RECORD; BEGIN FOR db_rec IN SELECT datname FROM pg_database WHERE datistemplate = false -- 遍历所有非模板数据库 LOOP INSERT INTO temp_db_skew_sizes (hostname, segment_id, db_name, db_size) SELECT b.hostname, a.gp_segment_id, db_rec.datname, pg_size_pretty(pg_catalog.pg_database_size(db_rec.datname)) FROM gp_dist_random('gp_id') a, gp_segment_configuration b WHERE role='p' AND a.gp_segment_id=b.content; END LOOP; END $$; -- 查询临时表中的结果,包括主机名、segment ID、数据库名和数据库大小 SELECT db_name,hostname, segment_id, db_size FROM temp_db_sizes ORDER BY db_name,hostname, segment_id; |