
Contents
- ヒープ作成だけで時間かかり、ソートは取り止め
- 実行環境とストアド関数のソース
- 結果
- そもそもの動機と、今後
- 蛇足だけど
R で処理時間の増加を非線型回帰してみた
ヒープ作成だけで時間かかり、ソートは取り止め
とりあえずJavaScript
実行環境とストアド関数のソース
WindowsCREATE OR REPLACE FUNCTION test_array2heap(IN len int, IN print int) RETURNS setof void LANGUAGE plpgsql IMMUTABLE AS $BODY$ /* -- e.g. SELECT test_array2heap(10000, 10); */ DECLARE ord int; ord2 int; below int; below_r int; vc int; -- value current vb int; -- value below vr int; -- value below right ary int[]; heap int[]; ts timestamp[]; BEGIN ary = array_agg((random() * len) :: int) FROM generate_series(1, len); /* slow FOR i IN 1..len LOOP ary[i] = (random() * len) :: int; END LOOP; */ ts[1] = clock_timestamp(); heap = ary; ord = len / 2; WHILE ord > 0 LOOP ord2 = ord; WHILE ord2 <= len LOOP below = 2 * ord2; IF len < below THEN EXIT; END IF; below_r = below + 1; vc = heap[ord2]; vb = heap[below]; IF below < len THEN vr = heap[below_r]; IF vb < vr THEN below = below_r; vb = vr; END IF; END IF; IF vc < vb THEN heap[ord2] = vb; heap[below] = vc; END IF; ord2 = below; END LOOP; ord = ord - 1; END LOOP; ts[2] = clock_timestamp(); -- Check RAISE INFO 'array %,...', left(ary[1:print] :: text, -1); RAISE INFO 'heap %,...', left(heap[1:print] :: text, -1); RAISE INFO 'time % ms', extract(milliseconds FROM ts[2] - ts[1]); RAISE INFO ''; FOR i IN 1..len / 2 LOOP IF heap[i] < heap[2 * i] OR (heap[2 * i + 1] IS NOT NULL AND heap[i] < heap[2 * i + 1]) THEN RAISE 'at %', i; END IF; END LOOP; END; $BODY$;
今回、元の配列をいったんヒープ用の配列にコピーしてから入れ替えてますが、コピーせず元配列を直接入れ替えても、所要時間はほとんど変わりませんでした。
結果
まず配列の要素数が
agg_test=# SELECT test_array2heap(10000, 10); INFO: time 125 ms agg_test=# SELECT test_array2heap(10000, 10); INFO: time 109 ms agg_test=# SELECT test_array2heap(10000, 10); INFO: time 124 ms agg_test=# SELECT test_array2heap(10000, 10); INFO: time 109 ms agg_test=# SELECT test_array2heap(10000, 10); INFO: time 109 ms
↓ 要素数を

agg_test=# SELECT test_array2heap(25000, 10); INFO: time 1498 ms agg_test=# SELECT test_array2heap(25000, 10); INFO: time 1498 ms agg_test=# SELECT test_array2heap(25000, 10); INFO: time 1529 ms agg_test=# SELECT test_array2heap(25000, 10); INFO: time 1498 ms agg_test=# SELECT test_array2heap(25000, 10); INFO: time 1513 ms
↓ 要素数

agg_test=# SELECT test_array2heap(50000, 10); INFO: time 9547 ms agg_test=# SELECT test_array2heap(50000, 10); INFO: time 7660 ms agg_test=# SELECT test_array2heap(50000, 10); INFO: time 7691 ms agg_test=# SELECT test_array2heap(50000, 10); INFO: time 7706 ms agg_test=# SELECT test_array2heap(50000, 10); INFO: time 7707 ms
↓ 最後に要素数

agg_test=# SELECT test_array2heap(100000, 10); INFO: time 25132 ms agg_test=# SELECT test_array2heap(100000, 10); INFO: time 10312 ms agg_test=# SELECT test_array2heap(100000, 10); INFO: time 10437 ms agg_test=# SELECT test_array2heap(100000, 10); INFO: time 10327 ms agg_test=# SELECT test_array2heap(100000, 10); INFO: time 10358 ms
そもそもの動機と、今後
PostgreSQL 9.4蛇足だけど R で処理時間の増加を非線型回帰してみた
今回試した要素数
len = c(1, 2.5, 5, 10) * 10^4 dat_init = data.frame( ary_length = len, times = c(125, 1498, 9547, 25132)) dat = data.frame( ary_length = rep(len, each=4), times = c( 109, 124, 109, 109, 1498, 1529, 1498, 1513, 7660, 7691, 7706, 7707, 10312, 10437, 10327, 10358)) nls = nls(times ~ SSlogis(ary_length, Asym, xmid, scal), dat) prm = nls$m$getPars() fun = function(x) { prm[1] / (1 + exp((prm[2] - x) / prm[3])) } # plot points par(cex=0.8, plt=c(15, 98, 12, 98)/100) plot(dat_init$ary_length, dat_init$times, col='red', xlim=c(0, max(len)), xlab='ary_length', ylab='times (milliseconds)') points(dat$ary_length, dat$times) # plot logistic curve usr = par('usr') curve(add=T, col='red', expr=fun, from=usr[1], to=usr[2]) # plot summary options(show.signif.stars=F) str = paste(collapse='\n', capture.output(summary(nls))) text(adj=c(0, 1), x=0, y=max(dat_init$times), lab=str)