ACE ubertagger: how does it get initialized?

Hopefully this question is more meaningful than my previous one about the curly braces and indentation :woman_facepalming:

I figured out the answer while writing the question. I will post the answer right below.

Original issue:

I cannot figure out how the ubertagger gets initialized. The code that uses it is in parse.c:

	extern struct ubertagger	*the_ubertagger;
	extern int	enable_ubertagging;
	char *st_file = "supertags.txt";
	if(g_profiling)start_and_alloc_profiler(&ubertagging_profiler, "übertagging", parse_profiler, lexical_filtering_profiler);
	if(the_ubertagger && enable_ubertagging) 
	{
		printf("Lexical chart before ubertagging:\n");
		ubertag_lattice(the_ubertagger, lexical_chart, log(ubertagging_threshold));
		printf("Lexical chart after ubertagging:\n");
		print_lexical_chart(lexical_chart);
	}

I have found the place where the variable enable_ubertagging can be set to 1 but I cannot find where the ubertagger is set up. Or rather, I found this code but it doesn’t seem to be executed:

int	load_ubertagging()
{
	char	*expath = get_conf_file_path("übertag-emission-path")?:get_conf_file_path("ubertag-emission-path");
	char	*txpath = get_conf_file_path("übertag-transition-path")?:get_conf_file_path("ubertag-transition-path");
	char	*gmpath = get_conf_file_path("übertag-generic-map-path")?:get_conf_file_path("ubertag-generic-map-path");
	char	*wlpath = get_conf_file_path("übertag-whitelist-path")?:get_conf_file_path("ubertag-whitelist-path");

	if(!expath || !txpath || !gmpath)return -1;
	
	//the_ubertagger = load_ubertagger("/home/sweaglesw/cdev/erg-1214/ut/nanc_wsj_redwoods_noaffix.ex.gz","/home/sweaglesw/cdev/erg-1214/ut/nanc_wsj_redwoods_noaffix.tx.gz", "/home/sweaglesw/cdev/erg-1214/ut/generics.cfg");
	the_ubertagger = load_ubertagger(expath,txpath,gmpath);
	printf("loaded ubertagger\n");
	if(!the_ubertagger)return -1;
	the_ubertagger->whitelist = hash_new("ut-whitelist");
	if(wlpath)load_hashlist(the_ubertagger->whitelist, wlpath);
	return 0;
}

The above is from ubertag.c. I do not see this code being executed with any of the commands I try, e.g.

./ace -g ~/delphin/erg/trunk/ace/english-ut.dat sentences.txt --ubertagging=0.0001

– the ubertagging works but I don’t understand where and how the ubertagger gets set up/initialized. It is not the code from ubertag.c above.

The only call to load_ubertagging() is in tdl.c, in the function load_grammar():

	if(get_conf("ubertag-emission-path") || get_conf("übertag-emission-path"))
		run_task("loading übertagger", load_ubertagging);

but as far as I can tell, this code never gets called. How is the ubertagger initialized?

Answer: When you compile the grammar with ACE, it gets “frozen”. The ubertagger is created (and serialized) during grammar compilation, not at parse time. When you then run ACE to parse, it uses the frozen grammar and the frozen ubertagger that was stored with it. So the line of code that sets up the ubertagger in parsing mode is this one:

recover_ubertagger(G->ubertagger);
...
void	recover_ubertagger(void	*u)
{
	the_ubertagger = u;
}
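
To make the two phases concrete, here is a minimal toy sketch of the pattern, not ACE's actual code. The struct fields, the load_calls counter, compile_grammar() and load_frozen_grammar() are hypothetical names made up for illustration; only the_ubertagger, load_ubertagging() and recover_ubertagger() come from the snippets above, and the real freezing to a .dat file is replaced here by a plain in-memory struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins: the field layout is hypothetical, not ACE's real one. */
struct ubertagger { double dummy_model; };
struct grammar    { struct ubertagger *ubertagger; };

static struct ubertagger *the_ubertagger = NULL;
static int load_calls = 0;	/* counts how often the expensive load runs */

/* Compile time only: build the tagger (model files are simulated here). */
static int load_ubertagging(void)
{
	load_calls++;
	the_ubertagger = malloc(sizeof *the_ubertagger);
	if(!the_ubertagger)return -1;
	the_ubertagger->dummy_model = 1.0;
	return 0;
}

/* Compile time: build everything and store the tagger in the grammar image.
 * In ACE this image is what gets "frozen"; here it simply stays in memory. */
static struct grammar *compile_grammar(void)
{
	struct grammar *G = malloc(sizeof *G);
	if(!G)return NULL;
	G->ubertagger = (load_ubertagging() == 0) ? the_ubertagger : NULL;
	return G;
}

/* Parse time: no loading at all, just re-point the global at the stored object. */
static void recover_ubertagger(void *u)
{
	the_ubertagger = u;
}

/* Parse time: hypothetical entry point that restores state from the image. */
static void load_frozen_grammar(struct grammar *G)
{
	assert(G != NULL);
	recover_ubertagger(G->ubertagger);
}
```

Calling compile_grammar() once and load_frozen_grammar() afterwards leaves the_ubertagger set while load_ubertagging() has run only once. That matches what I saw: the printf in load_ubertagging() never fires while parsing, because that function only runs when the grammar is compiled.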

Just to elaborate, the ERG needs to be recompiled after making the following change to the standard ace/config.tdl file:

Remove the semicolon comment character from the beginning of the four lines toward the bottom of the file that appear right after the line “DPF 2019-11-20 - These next four should be uncommented for release”.
