バイナリデータを読む時は仕様と突き合わせてコツコツやりますが、今日は息抜きを兼ね、ざっくり読む例のテンプレート。32×32ピクセルのicoファイルから16色ビットマップを抽出し、HTMLに出力して確認。使ったPostgreSQLは9.5.3です。

Contents

ファイルへのハードリンクをPGDATA下に置いて読み込み
16色のカラーテーブル、透過部分を抽出
ビットマップ、カラーテーブル、透過部分を結合してHTML出力

サンプルICOとして、EXEファイル等からアイコンを抜き出すフリーウェア ↓ のアイコンを使います。これ、コマンドラインでも使えて結構便利。

» IconsExtract - Extract icon-cursor stored in EXE, DLL, OCX, CPL files

↓ こんなアイコン。Windows 7のExplorerのプロパティでは16×16ピクセルと出ましたが、実際は32×32のビットマップもあり。今回はこちらを読みます。

ファイルへのハードリンクをPGDATA下に置いて読み込み

PostgreSQLにはバイナリを読めるpg_read_binary_file関数があり、DBのデータディレクトリ（PGDATA）にあるファイルのみ対象。で今回、アイコン自体は移さずハードリンクを作って読み込みます。シンボリックリンクでは中身を読めません。

Windowsネイティブのpsqlでやると ↓ こんな感じ。メタコマンドcdでPGDATAに移ったのは、作るハードリンク（ここではtesticon）のパスを省くため。またバイナリだけ表示するため、\tと\psetで表示設定を変更してます。

-- set current directory to postgres data
# \cd %PGDATA%

-- create hardlink to icon file (not symbolic link)
# \! mklink /h testicon "d:/windows_101.ico"
Hardlink created for testicon <<===>> d:/windows_101.ico 

-- output response only
# \t on \\ \pset format unaligned \\ \pset footer off
# select pg_read_binary_file('testicon');

\xに続いて16進数がずらずらと出てきました。これを適当な長さで1行ずつに区切り、ビットマップらしき部分を見つけます。バイナリを文字型にキャストし、substr関数で先頭の\xを除き、regexp_matchesで1行の長さをいろいろ変えて試した結果、ちょうど32字ずつで何か発見。↓

# select (regexp_matches(bin, '.{32}', 'g'))[1]
  from substr(pg_read_binary_file('testicon') :: text, 3) as bin;

ビットマップ先頭までのオフセットの見当を付けるのには、substr関数の第2引数を適当に変更。また第3引数がビットマップの長さになり、32×32と仮定して進めます。迷ったらウェブでicoの仕様を調査、例えば ↓ などで。ただICOは色々バリエーションがあるらしく、微妙に違うところもありました。

» www.daubnet.com/en/file-format-ico

試行錯誤の結果32×32のビットマップと思しき部分を決めたら、行番号を付けてテーブル化します。形を見ると上下が逆なので、行番号を逆順に。↓ クエリと、結果のテーブルを表示したところ。

# create table icon_pixels as
  select 32 + 1 - row_number() over() as nrow, reg[1] as pixel_lines 
  from pg_read_binary_file('testicon') as bin,
      substr(bin :: text, 1 + 2 * 143, 32 * 32) as str,
      regexp_matches(str, concat('.{32}'), 'g') as reg;

# \t off \\ \pset format aligned \\ \pset border 2
# select * from icon_pixels order by nrow;

16色のカラーテーブル、透過部分を抽出

上のとおり1ピクセルが1つの16進数なので、0～Fに対応するカラーテーブルがどこかにあるはず。ICOの仕様（先ほどのリンク等）を見ると、その部分は「色6バイト+00」が16個並んでいるらしい。で最終的に確定したのが ↓ こちら。なぜかRGBでなくBlue, Green, Redの順になっており、regexp_replace関数で置換しました。

# create table icon_colors as
  select to_hex(row_number() over() - 1) as hex,
      regexp_replace(reg[1], '(.{2})(.{2})(.{2})', '\3\2\1') as rgb 
      -- bgr -> rgb
  from pg_read_binary_file('testicon') as bin,
      right(substr(bin :: text, 3, 2 * 142), 16 * 8) as str,
      regexp_matches(str, '(.{6})00', 'g') as reg;

# select * from icon_colors order by hex;
+-----+--------+
| hex |  rgb   |
+-----+--------+
| 0   | 000000 |
| 1   | 800000 |
| 2   | 008000 |
| 3   | 808000 |
| 4   | 000080 |
| 5   | 800080 |
| 6   | 008080 |
| 7   | c0c0c0 |
| 8   | 808080 |
| 9   | ff0000 |
| a   | 00ff00 |
| b   | ffff00 |
| c   | 0000ff |
| d   | ff00ff |
| e   | 00ffff |
| f   | ffffff |
+-----+--------+

さらにアイコンの透過部分を抽出します。仕様によると2ビット（0か1）がピクセル数の分だけ並んでいるらしいので、それっぽい部分を探索。だいたい端の方（今回の場合は上と下）が透過になると見当を付け、結果は ↓ こう。1が透過みたいです。

# create table icon_trans as
  select 32 + 1 - row_number() over() as nrow,
      concat('x', reg[1]) :: bit(32) :: text as trans_lines 
  from regexp_matches(
       substr(pg_read_binary_file('testicon') :: text,
          1 + 2 * 143 + 32 * 32, 32 * 32 / 4),
      '.{8}', 'g') as reg;

# select * from icon_trans order by nrow;

ビットマップ、カラーテーブル、透過部分を結合してHTML出力

元アイコンを構成する3つの情報をテーブル化したので、同じ構造（32×32）のビットマップと透過部分を突き合わせ。行番号で結合して並べてみました。↓

# select * from icon_pixels natural join icon_trans;
+------+----------------------------------+----------------------------------+
| nrow |           pixel_lines            |           transp_lines           |
+------+----------------------------------+----------------------------------+
|    1 | 00000000000000000000000000000000 | 11111111111111111111111111111111 |
|    2 | 00000000000000000000000000000000 | 11111111111111111111111111111111 |
|    3 | 88888888888888888888888888888880 | 00000000000000000000000000000000 |
|    4 | 87777777777777777777777777777780 | 00000000000000000000000000000000 |
|    5 | 87444444444444444444444444444480 | 00000000000000000000000000000000 |
|    6 | 87444444444444444444477077077080 | 00000000000000000000000000000000 |
|    7 | 87444444444444444444477077077080 | 00000000000000000000000000000000 |
|    8 | 87444444444444444444400000000080 | 00000000000000000000000000000000 |
|    9 | 87777777777777777777777777777780 | 00000000000000000000000000000000 |
...

この各行を1字（=1ピクセル）ずつ結合し、カラーテーブルも結合すると ↓ こんな感じ。各行（32字）を1字ずつに分割する際にregexp_split_to_table関数を使い、第2引数は空文字。これで1ピクセルが1行ずつになり、row_numberを元の行番号（nrow）ずつ振って列番号（ncol）にします。

# with lines (nrow, hex, tr) as (
    select nrow,
        regexp_split_to_table(pixel_lines, ''),
        regexp_split_to_table(trans_lines, '')
    from icon_pixels natural join icon_trans
)
select *, row_number() over(partition by nrow) as ncol 
from lines natural join icon_colors;
+-----+------+----+--------+------+
| hex | nrow | tr |  rgb   | ncol |
+-----+------+----+--------+------+
| 0   |    1 | 1  | 000000 |    1 |
| 0   |    1 | 1  | 000000 |    2 |
| 0   |    1 | 1  | 000000 |    3 |
| 0   |    1 | 1  | 000000 |    4 |
| 0   |    1 | 1  | 000000 |    5 |
| 0   |    1 | 1  | 000000 |    6 |
...
| 0   |    6 | 0  | 000000 |   27 |
| 7   |    6 | 0  | c0c0c0 |   28 |
| 7   |    6 | 0  | c0c0c0 |   29 |
| 0   |    6 | 0  | 000000 |   30 |
| 8   |    6 | 0  | 808080 |   31 |
| 0   |    6 | 0  | 000000 |   32 |
| 8   |    7 | 0  | 808080 |    1 |
| 7   |    7 | 0  | c0c0c0 |    2 |
| 4   |    7 | 0  | 000080 |    3 |
| 4   |    7 | 0  | 000080 |    4 |
...

これで元アイコンの情報が一応そろいました。後はどういう形で可視化してもいいですが、今回は簡単にHTML出力します。上のクエリ結果を元に、透過（列tr）が1なら空、それ以外はrgbを文字色に指定し、ブロック要素（ █ ）を文字参照で。こうするとASCIIだけのクエリになり、クライアントエンコーディングを気にせず済みます。

↓ 最終形。クエリ前に表示設定を「データだけ」にし、クエリ後にメタコマンド「\gファイルパス」を打って出力してます。

# \t on \\ \pset format unaligned \\ \pset footer off
# with lines (nrow, hex, tr) as (
    select nrow,
        regexp_split_to_table(pixel_lines, ''),
        regexp_split_to_table(trans_lines, '') :: int
    from icon_pixels natural join icon_trans
    order by nrow
), pixels (nrow, ncol, tag) as (
    select nrow, row_number() over(partition by nrow),
        concat('<span style="',
        case tr when 0 then concat('color: #', rgb)
        else 'visibility: hidden'
        end, '">&#x2588;&#x2588;</span>')
    from lines natural join icon_colors
)
select text '<div style="border: dashed 1px gray; '
    || ' display: inline-block; font-size: 4px; '
    || ' margin: 1em; padding: 2em ">'
union all
select concat(tag, case ncol when 32 then '<br>' end) from pixels 
union all
select '</div>'
\g 'r:/icon_from_postgres.html'

↓ 結果がこちら（冒頭の再掲）。実際のHTMLもUPしました。ブラウザのフォントによって見え方が多少変わると思います。

今回は4bitカラー（1ピクセルが1つの16進数）と単純でした。実際あるアイコンはサイズ、カラー、ビットマップ形式（圧縮の有無など）が多様なので、この通り出来ない場合も多いかと。ただ最後の部分、「行番号、列番号、色、透過ありなし」で表したビットマップをHTML出力するのは、何か役立つかもしれません。