合 GreenPlum数据库日常维护运维(持续更新)
Tags: GreenPlum整理自网络脚本小麦苗常用DBA脚本脚本分享日常维护日常运维
释放高水位的堆表,也叫表膨胀
高水位定义:表中没有几行数据,但是表占用空间很大的表就是有高水位的表。对于DML操作很频繁的表,非常容易出现表膨胀。
例如,表只有10行数据,但是却占用了100MB大小,那就是高水位的表;
表有1000万的数据,但是占用了100GB大小,那就是有高水位的表。
确认办法:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | -- 查询表的大小 select pg_size_pretty(pg_total_relation_size('test')); -- 查询表的行数 select count(*) from test; -- 也可以通过如下SQL判断是否表膨胀 SELECT d.bdirelid 表OID, d.bdinspname 模式名, d.bdirelname 表名, d.bdiexppages 期望页数, d.bdirelpages 实际页数 , pg_size_pretty(d.bdiexppages * (current_setting('block_size')::bigint)) AS real_size, pg_size_pretty(d.bdirelpages * (current_setting('block_size')::bigint)) AS all_size, pg_size_pretty(pg_total_relation_size(bdinspname||'.'||bdirelname)) 表大小, CASE WHEN d.bdidiag = 'significant amount of bloat suspected' THEN'严重膨胀' WHEN d.bdidiag = 'moderate amount of bloat suspected' THEN'中度膨胀' END AS 膨胀状态 , 'vacuum full '||bdinspname||'.'||bdirelname||';' vacuums, 'analyze ' || d.bdinspname || '.' || d.bdirelname||';' anaylze_tb FROM gp_toolkit.gp_bloat_diag d WHERE d.bdirelid in (select oid from pg_class) and (d.bdirelpages * (current_setting('block_size')::bigint)) >=1*1024*1024*1024 -- and pg_total_relation_size(bdinspname||'.'||bdirelname) >=1*1024*1024*1024 ORDER BY bdinspname, ( bdirelpages - bdiexppages ) DESC, bdirelname; -- 统计信息需要准确 SELECT schemaname||'.'||relname as table_name, pg_size_pretty(pg_table_size(schemaname||'.'||relname)) as table_size, pg_size_pretty(pg_indexes_size(schemaname||'.'||relname)) as index_size, pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) as all_size, pg_size_pretty(pg_table_size(schemaname||'.'||relname) * (n_dead_tup * 100 / (n_live_tup + n_dead_tup))/100) as swell_size, n_dead_tup, n_live_tup, round(n_dead_tup * 100 / (n_live_tup + n_dead_tup),2) AS dead_tup_ratio, 'vacuum full '||schemaname||'.'||relname||';' vacuums, 'analyze ' || a.schemaname || '.' || a.relname||';' anaylze_tb FROM pg_stat_all_tables a WHERE n_dead_tup >= 1000 and pg_table_size(schemaname||'.'||relname) >=1*1024*1024*1024 and round(n_dead_tup * 100 / (n_live_tup + n_dead_tup),2) > 10 and pg_table_size(schemaname||'.'||relname) * (n_dead_tup * 100 / (n_live_tup + n_dead_tup))/100 > 1*1024*1024*1024 ORDER BY pg_table_size(schemaname||'.'||relname) * (n_dead_tup * 100 / (n_live_tup + n_dead_tup))/100 desc,dead_tup_ratio DESC LIMIT 50; |
释放高水位的方法,如果是空表,则直接做truncate操作。
1 | truncate table test; |
如果表中还有数据,则需要做vacuum操作:
1 | vacuum full test; |
注意:
1、执行 vacuum full会锁表,就连最基本的SELECT也会阻塞
2、对于大表不要随意使用 vacuum full,或者在业务空闲的时候使用
3、对系统表禁止做vacuum full,各个开发人员只做自己创建的表,没见过的表不要做
查询膨胀的AO表
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | -- AO表查询膨胀信息 select tb_name, pg_size_pretty(pg_table_size(tb_name)) as table_size, pg_size_pretty(pg_indexes_size(tb_name)) as index_size, pg_size_pretty(pg_total_relation_size(tb_name)) as all_size, pg_size_pretty(pg_table_size(tb_name)*avg(percent_hidden)/100) as swell_size, sum(hidden_tupcount) hidden_tupcount, sum(total_tupcount) total_tupcount, round(avg(percent_hidden),2) percent_hidden from ( select t2.nspname||'.'||t1.relname tb_name, ( gp_toolkit.__gp_aovisimap_compaction_info ( t1.oid ) ).* from pg_class t1, pg_namespace t2 where t1.relnamespace = t2.oid and relstorage in ( 'c', 'a' ) ) t where t.percent_hidden>10 and t.compaction_possible='t' GROUP BY tb_name order by pg_table_size(tb_name)*avg(percent_hidden)/100 desc limit 50; -- select * from gp_toolkit.__gp_aovisimap_compaction_info('bi.abc'::regclass); |
对于AO表,执行vacuum即可释放膨胀空间。
特大表需要创建索引
在GreenPlum中,使用如下SQL查询出来表很大且不存在高水位,这类表需要自己创建索引,不然每次全表扫,肯定慢,这不是数据库的问题。。。表多大才建立索引???没有统一规定,一般情况,超过10G,且查询只需要返回很小的一部分数据,则需要创建索引。
1 2 3 4 5 6 7 8 9 10 11 | -- 查询大小 select pg_size_pretty(pg_total_relation_size('test')); -- 收集统计信息 vacuum (verbose,ANALYZE) hlht_ba_syjbk; -- 查看表的列分步情况 select * from pg_stats a where a.tablename='test' order by a.n_distinct desc; -- 根据where条件来判断,(n_distinct列较多的值建议创建索引且作为引导列) create INDEX idx_xxx on test(列1,列2); |
创建索引的列遵循规则:
1、大表(大于10G)且不同值很多的列适合创建索引,例如,主键列
2、小表不要建立索引,会导致DML慢
3、 经常在where条件中的列适合索引
查询索引膨胀的表
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 | SELECT schemaname, tablename, iname, cols, ind_avg_len, ituples::bigint, ipages::bigint, iotta, internal, leaf, ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages, CASE WHEN ipages < iotta THEN pg_size_pretty(0::bigint) ELSE pg_size_pretty((bs*(ipages-iotta))::bigint) END AS wastedisize FROM (SELECT schemaname, tablename, bs, iname, cols, ind_avg_len, ituples, ipages, leaf + round(leaf/((bs::float-40)*0.9/(ind_avg_len+4))) AS leaf, round(leaf/((bs::float-40)*0.9/(ind_avg_len+4))) AS hk, ceil (leaf / ceil((bs::float-40)*0.7/(ind_avg_len + 4))) + 1 AS internal, leaf + round(leaf/((bs::float-40)*0.9/(ind_avg_len+4))) + ceil (leaf / ceil((bs::float-40)*0.7/(ind_avg_len + 4))) + 2 AS iotta FROM (SELECT schemaname, tablename, bs, iname, cols, ind_avg_len, ituples, ipages, CEIL((ituples*(ind_avg_len + 4))/((bs::float-40)*0.9)) AS leaf FROM (SELECT bs, rs.schemaname, rs.tablename, COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, rs.cols, CASE WHEN sum(s.avg_width+ma-s.avg_width%ma) < 16 THEN 16 ELSE sum(s.avg_width+ma-s.avg_width%ma) +8 END AS ind_avg_len FROM (SELECT (SELECT current_setting('block_size')::numeric) AS bs, 4 AS ma, nn.nspname AS schemaname, c1.relname AS tablename, i.indexrelid, array_agg(a.attname ORDER BY s.i) AS cols FROM pg_index i LEFT JOIN pg_class c1 ON i.indrelid = c1.oid JOIN LATERAL unnest(i.indkey) WITH ORDINALITY AS s(attnum, i) ON TRUE LEFT JOIN pg_attribute a ON a.attnum = s.attnum AND a.attrelid = i.indrelid LEFT JOIN pg_namespace nn ON c1.relnamespace = nn.oid WHERE i.indisvalid = TRUE AND nn.nspname NOT IN ('pg_toast', 'pg_catalog') GROUP BY bs, ma, nn.nspname, c1.relname, i.indexrelid) AS rs LEFT JOIN pg_class c2 ON c2.oid = rs.indexrelid JOIN pg_stats s ON s.tablename = rs.tablename AND s.attname = any(rs.cols) GROUP BY bs, rs.schemaname, rs.tablename, c2.relname, c2.reltuples, c2.relpages, rs.cols) AS smx) AS sml) RESULT ORDER BY wastedipages DESC; |