値のレンジ(最小~最大)を任意の区間数に等分割して各度数を出します。当初、範囲型を使わずウィンドウ関数の
create or replace function numarray_frq ( vals numeric[], num_levels int) returns table ( level int, range numrange, frequency int) language sql immutable as $$ with your_data (v) as ( -- select random() :: numeric from generate_series(1, 100) -- ex select unnest($1) ), your_num_levels (n) as ( -- select 10 -- ex select $2 ), min_max as ( select min(v), max(v) from your_data ), levels (level, range) as ( select level, case when level = n then numrange(break, max, '[]') else numrange(break, lead(break) over w) end as range from min_max, your_num_levels, cast(max - min as float8) as range, generate_series(1, n) as level, cast(min + range * (level - 1) / n as numeric) as break window w as (order by level) ) select level, range, count(v) :: int from your_data right join levels on v :: numeric <@ range group by level, range order by level; $$;
ストアドと言いながら、コメントの
↓ 簡単なテストで、1
-- usage example 1 select rec.* from ( select array_agg(v) from generate_series(1, 100) as v ) as x, numarray_frq(array_agg, 10) as rec; +-------+-------------+-----------+ | level | range | frequency | +-------+-------------+-----------+ | 1 | [1,10.9) | 10 | | 2 | [10.9,20.8) | 10 | | 3 | [20.8,30.7) | 10 | | 4 | [30.7,40.6) | 10 | | 5 | [40.6,50.5) | 10 | | 6 | [50.5,60.4) | 10 | | 7 | [60.4,70.3) | 10 | | 8 | [70.3,80.2) | 10 | | 9 | [80.2,90.1) | 10 | | 10 | [90.1,100] | 10 | +-------+-------------+-----------+ (10 rows)
↓ もう一つテスト。Box-Muller transform
-- usage example 2 select rec.* from ( select array_agg(sqrt(-2 * ln(random())) * sin(2 * pi() * random())) from generate_series(1, 1000) -- normally distributed random numbers -- based on Box-Muller transform. ) as x (val), numarray_frq(val :: numeric[], 10) as rec ; +-------+----------------------------------------+-----------+ | level | range | frequency | +-------+----------------------------------------+-----------+ | 1 | [-3.53367052356323,-2.88123888236791) | 2 | | 2 | [-2.88123888236791,-2.22880724117258) | 15 | | 3 | [-2.22880724117258,-1.57637559997726) | 51 | | 4 | [-1.57637559997726,-0.923943958781934) | 113 | | 5 | [-0.923943958781934,-0.27151231758661) | 205 | | 6 | [-0.27151231758661,0.380919323608714) | 251 | | 7 | [0.380919323608714,1.03335096480404) | 209 | | 8 | [1.03335096480404,1.68578260599936) | 105 | | 9 | [1.68578260599936,2.33821424719469) | 44 | | 10 | [2.33821424719469,2.99064588839001] | 5 | +-------+----------------------------------------+-----------+ (10 rows)
以下、備忘録です。
• 集約関数として作ることも考えたけど、複数の引数を受け取って(数値の列と区分数)面倒になるため、やめた
• 区分数のデフォルトか、自動的に決めるオプションを今後検討
• 度数の代わりに簡易なヒストグラム(ascii plot)にしてもいいかも。翌日の記事で追加しました。