slow extraction of json array elements in postgresql

I have a json array and I want to expand each element to a new table. With the new json functions in postgresql 9.3, I expected this to be the best method:

create table p as select json_array_elements(json) foo from g

To my amazement, a lateral expansion is lot faster:

create table p as select json->x foo from g join lateral (select 
generate_series(0,json_array_length(g.json)-1) x ) xxx on true

Which is the problem with the first approach?

EDIT: a test case can be built for 20000 rows as

create table g as select (select json_agg(random()) json  from 
generate_series(0, (r1*4)::int))  from (select random() r1 from 
generate_series(1,20000)) aux;

Over a SSD storage, it takes 3 seconds against 0.2 seconds with lateral. For 40000 rows, the time increases to 12 seconds, while the lateral method just grows near linearly.

Answers


The test case is certainly conclusive, and perf top -p $the_backend_pid helps show why:

 96.92%  postgres      [.] MemoryContextReset
  0.15%  [kernel]      [k] cpuacct_account_field
  0.09%  [kernel]      [k] update_cfs_rq_blocked_load
  0.09%  postgres      [.] AllocSetAlloc
  0.09%  libc-2.17.so  [.] __memcpy_ssse3_back
  0.07%  postgres      [.] parse_array
  0.07%  [kernel]      [k] trigger_load_balance
  0.07%  [kernel]      [k] rcu_check_callbacks
  0.06%  [kernel]      [k] apic_timer_interrupt
  0.05%  [kernel]      [k] do_timer
  0.05%  [kernel]      [k] update_cfs_shares
  0.05%  libc-2.17.so  [.] malloc

It's spending a huge amount of time in MemoryContextReset. Especially given that the profile above was noted down at the 47 billion events (approx) mark.

Backtraces are always like:

#0  0x000000000072dd7d in MemoryContextReset (context=0x2a02dc90) at mcxt.c:130
#1  0x000000000072dd90 in MemoryContextResetChildren (context=<optimized out>) at mcxt.c:155
#2  MemoryContextReset (context=0x1651220) at mcxt.c:131
#3  0x00000000005817f9 in ExecScan (node=node@entry=0x164e1a0, accessMtd=accessMtd@entry=0x592040 <SeqNext>, recheckMtd=recheckMtd@entry=0x592030 <SeqRecheck>)
    at execScan.c:155

with varying locations within MemoryContextReset, usually at a branch.

Runtime was 836904.371, vs 903.202 for the lateral join at 200k input rows (10x your test).

So I'd say you've certainly found a performance problem that needs attention.

Update: here's a patch that will apply against git master, or against 9.3. It's pretty easy to grab the source package / srpm and rebuild it if you're using deb/rpm packages of PostgreSQL, no need to switch to unpackaged just to apply a patch.


Need Your Help

Is there a way to load data from a text file on the same server as the web page?

javascript

Recently I have started working on a simple top down tile based program where you can move a player character around the map and zoom in and out of the map.

OpenGL ES - How to Draw a filled Polygon?

iphone opengl-es polygon tessellation

I tried googling and searching on stack but I didn't find anything :-(