Slowdown between ERG 1214 and ERG 2018

I’m currently investigating the slowdown I discussed with Emily and @Dan for my master’s project. My work so far has focused on the following sentence:

On one side of this power struggle stand the forces in ascendency on Wall Street – the New Guard – consisting of high-tech computer wizards at the major brokerage firms, their pension fund clients with immense pools of money, and the traders at the fast-growing Chicago futures exchanges.

With 16GB of RAM, the 1214 version of the grammar produces 5,646 readings for this sentence in 66s, i.e. roughly 12ms per reading. The 2018 version, however, produces only 26 readings in the same 66s, roughly 2.5s per reading.

Increasing the RAM limit for the 2018 version yields the same 26 readings, but in a longer period of time, with parsing running until the RAM limit is reached. With 32GB this gives 26 readings in 155s, and with 50GB, 26 readings in 248s.

A further observation is that, according to ACE’s debug output, the 2018 version produces fewer hypotheses when more RAM is available.

2018 with 16GB RAM:

NOTE: loading frozen grammar ERG (2018)
NOTE: 10439 types, 40320 lexemes, 362 rules, 67 orules, 108 instances, 49510 strings, 233 features
permanent RAM: 3k

NOTE: hit RAM limit while unpacking
NOTE: 26 readings, added 117069 / 104559 edges to chart (21974 fully instantiated, 2262 actives used, 30807 passives used)      RAM: 16384002k
NOTE: parsed 1 / 1 sentences, avg 16019459k, time 66.89725s
40371819 total hypotheses generated
763 total nodes reconstructed
NOTE: glb hash: 0 direct hits, 0 collisions, 10661 misses
NOTE: 2502149 subsumption tests; qc filters 90.0% leaving 249043, of which ss passes 39.4% = 98059 ; 2.5% = 6243 generalizable
NOTE: unify filters: 21480481 total, 11778277 rf (54.8%), 488531 qc (2.3% / 4.1%), 310401 success (1.4% / 63.5%), 0 bad orth (0.0% / 0.0%)
NOTE: 575073 / 238636 (241.0%) passive edges were connected to roots

2018 with 32GB RAM:

NOTE: loading frozen grammar ERG (2018)
NOTE: 10439 types, 40320 lexemes, 362 rules, 67 orules, 108 instances, 49510 strings, 233 features
permanent RAM: 3k

NOTE: hit RAM limit while unpacking
NOTE: 26 readings, added 117069 / 104559 edges to chart (21974 fully instantiated, 2262 actives used, 30807 passives used)      RAM: 32767999k
NOTE: parsed 1 / 1 sentences, avg 31836161k, time 129.16637s
85684421 total hypotheses generated
763 total nodes reconstructed
NOTE: glb hash: 0 direct hits, 0 collisions, 10664 misses
NOTE: 2502149 subsumption tests; qc filters 90.0% leaving 249043, of which ss passes 39.4% = 98059 ; 2.5% = 6243 generalizable
NOTE: unify filters: 21480481 total, 11778277 rf (54.8%), 488531 qc (2.3% / 4.1%), 310401 success (1.4% / 63.5%), 0 bad orth (0.0% / 0.0%)
NOTE: 575073 / 238636 (241.0%) passive edges were connected to roots
TIMERS (260 calls = ~ 35.6µs overhead):

Note that the number of edges created is the same in both runs, so I believe the problem lies in the unpacking phase.

When unpacking, if I print the hypotheses being popped off the agenda and the edges being decomposed, I find that the same number of edges are decomposed under both RAM limits, but that more than 6x as many hypotheses are popped from the agenda with the 32GB RAM limit. In both cases, the last hypothesis popped involves the rule hdn-np_app-pr_c, and hypotheses are popped from the root hypothesis agenda more often with more RAM in the 2018 version: 6,848,016 times with 16GB versus 8,893,506 times with 32GB.
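To make that counting repeatable, a small script along these lines can tally pops per rule from a debug trace. The trace line format here is invented for illustration; adjust POP_RE to whatever your instrumented ACE actually prints:

```python
import re
from collections import Counter

# Hypothetical trace format, e.g. "popped hypothesis: rule=hdn-np_app-pr_c edge=#12".
# Adjust POP_RE to match the lines your instrumented ACE actually emits.
POP_RE = re.compile(r"popped hypothesis: rule=(\S+)")

def tally_pops(lines):
    """Count hypotheses popped from the agenda, per rule."""
    counts = Counter()
    for line in lines:
        m = POP_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

trace = [
    "popped hypothesis: rule=hdn-np_app-pr_c edge=#12",
    "popped hypothesis: rule=hd-aj_int-unsl_c edge=#13",
    "popped hypothesis: rule=hdn-np_app-pr_c edge=#14",
]
print(tally_pops(trace).most_common())  # hdn-np_app-pr_c leads with 2 pops
```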

I’ve also enabled some of ACE’s debug statements from the unpacking phase, specifically those that show when ACE asks for a new hypothesis on a given edge (“asked to hyp…”) and when already known or new hypotheses are returned.

I find that in both versions of the grammar, ACE requests approximately the same number of hypotheses overall. However, the number of requests on edges built by the rule hd-aj_int-unsl_c doubles in 2018. I counted the paths taken to reach hd-aj_int-unsl_c during hypothesizing, and found that the paths exhibiting a large increase from 1214 to 2018 involve hdn-np_app-pr_c:

Path, 1214 count, 2018 count
hd-cmp_u_c->sp-hd_n_c->hdn-aj_redrel_c->hd-cmp_u_c->hdn_bnp_c->hdn-aj_redrel_c->hd-cmp_u_c->hdn-np_app-pr_c->sp-hd_n_c->hdn-aj_rc-pr_c->vp_rc-redrel_c->hd-aj_int-unsl_c, 84833, 387717
hd-cmp_u_c->sp-hd_n_c->hdn-aj_redrel_c->hd-cmp_u_c->hdn_bnp_c->hdn-aj_redrel_c->hd-cmp_u_c->hdn-np_app-pr_c->sp-hd_n_c->hdn-aj_redrel_c->hd-aj_int-unsl_c, 55292, 263507
hd-cmp_u_c->sp-hd_n_c->hdn-aj_redrel_c->hd-cmp_u_c->hdn_bnp_c->hdn-aj_redrel_c->hd-cmp_u_c->hdn-np_app-pr_c->sp-hd_n_c->hdn-aj_rc-pr_c->vp_rc-redrel_c->hd-aj_int-unsl_c->hd-aj_int-unsl_c, 18430, 136016
hd-cmp_u_c->sp-hd_n_c->hdn-aj_redrel_c->hd-cmp_u_c->hdn_bnp_c->hdn-aj_redrel_c->hd-cmp_u_c->hdn-np_app-pr_c->sp-hd_n_c->hdn-aj_rc_c->cl_rc-fin-nwh_c->sb-hd_nmc_c->aj-hd_scp_c->hdn_bnp-vger_c->vp_np-ger_c->hd-aj_int-unsl_c, 0, 102788
hd-cmp_u_c->sp-hd_n_c->hdn-aj_redrel_c->hd-cmp_u_c->hdn_bnp_c->hdn-aj_redrel_c->hd-cmp_u_c->hdn-np_app-pr_c->sp-hd_n_c->hdn-aj_rc-pr_c->vp_rc-redrel_c->hd-cmp_u_c->hd-cmp_u_c->hdn_bnp_c->hdn-aj_redrel_c->hd-aj_int-unsl_c, 0, 70142
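Tallies like the ones above can be produced mechanically. A minimal sketch, assuming you can log the chain of rule names leading to each hypothesis request (the helper names here are mine, not ACE’s):

```python
from collections import Counter

def count_paths(paths):
    """Tally identical rule-name chains (root down to the requested edge)."""
    return Counter(tuple(p) for p in paths)

def biggest_increases(counts_old, counts_new, top=5):
    """Rank paths by absolute growth in count between two grammar versions."""
    keys = set(counts_old) | set(counts_new)
    deltas = {k: counts_new.get(k, 0) - counts_old.get(k, 0) for k in keys}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Toy data standing in for the logged 1214 and 2018 path traces:
old = count_paths([["hd-cmp_u_c", "sp-hd_n_c"], ["hd-cmp_u_c", "hdn_bnp_c"]])
new = count_paths([["hd-cmp_u_c", "sp-hd_n_c"]] * 5 + [["hd-cmp_u_c", "hdn_bnp_c"]])
print(biggest_increases(old, new))  # the sp-hd_n_c path grew by 4
```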

Right now, I’m exploring simpler sentences that use this apposition rule to see if I can determine what might be happening. Any ideas or comments would be greatly appreciated. Also, @sweaglesw, if you have any ideas for tracking down an issue like this within ACE, that would be super helpful. Thanks!

Hi Andrew,

It looks like you’ve dug into the unpacking code and algorithms in ACE some, which is great – not many are brave enough for that.

You said you saw fewer hypotheses with 32GB than with 16GB, but the numbers in the log you posted don’t seem to corroborate that – I see

40371819 total hypotheses generated

and

85684421 total hypotheses generated

… which to me looks like about 2x the number of hypotheses for 2x the memory. Is there something in that you find surprising?

Other than that quibble, it sort of looks to me as though you may have run into a sentence for which the vast majority of hypotheses fail to reconstruct. These happen sometimes, and they definitely merit closer inspection. The situations that give rise to that phenomenon are not yet well described, and I believe there is room for improving the unpacking algorithm to short-circuit a lot of the bad hypotheses.
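For readers unfamiliar with that failure mode: during selective unpacking, hypotheses are popped best-first from an agenda and must then be reconstructed by re-unifying their daughters, and that unification can fail even though the packed forest licensed the hypothesis. A toy sketch of the general idea (not ACE’s actual code, and omitting the decomposition step that pushes new hypotheses as others are popped):

```python
import heapq

def unpack(agenda, reconstruct, n_best):
    """Pop hypotheses best-first; keep only those that survive reconstruction."""
    readings, popped, failed = [], 0, 0
    heapq.heapify(agenda)
    while agenda and len(readings) < n_best:
        score, hyp = heapq.heappop(agenda)
        popped += 1
        tree = reconstruct(hyp)  # stands in for re-unification; may fail
        if tree is None:
            failed += 1          # the forest licensed this hypothesis, but it won't unify
        else:
            readings.append(tree)
    return readings, popped, failed

# Pathological case: almost every hypothesis fails to reconstruct, so finding
# even a few readings forces a huge number of pops.
agenda = [(i, "hyp%d" % i) for i in range(100)]
survivors = {"hyp97", "hyp98", "hyp99"}
readings, popped, failed = unpack(agenda, lambda h: h if h in survivors else None, 3)
print(len(readings), popped, failed)  # 3 readings, 100 pops, 97 failures
```

The pattern to look for is exactly this disproportion: a small, stable number of readings while pops (and hence time and RAM) balloon.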

I recommend also comparing with --disable-generalization on your ACE command line, and seeing if that changes the calculus of the situation. Packing under generalization is very likely to be implicated in those spurious hypotheses. On average, enabling generalization saves quite a bit of time, but there are instances where it has this kind of unfortunate hidden cost.

I also recommend hunting through your log files for “edge #12345 fail”, and then seeing if you can work out the configuration of rules that the packed forest implies is unifiable but in fact is not. The path data you have already collected may well bear on that question.
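Assuming the failure lines look like the quoted “edge #12345 fail” (the exact log format is a guess; adjust the regex as needed), a quick tally of which edges fail most often could narrow the hunt:

```python
import re
from collections import Counter

# Pattern based on the quoted failure message; adjust to your actual log lines.
FAIL_RE = re.compile(r"edge #(\d+) fail")

def failing_edges(lines):
    """Tally how often each edge id appears in a reconstruction-failure line."""
    counts = Counter()
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

log = [
    "unpacking: edge #101 fail",
    "unpacking: edge #202 fail",
    "26 readings",
    "unpacking: edge #101 fail",
]
print(failing_edges(log).most_common())  # edge #101 fails twice
```

The edges that fail most often point at the rule configurations worth inspecting by hand.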

It’s also possible my guess is wrong and something different is happening.