Finalizing adjunct extraction rules hierarchy

What should the hierarchy for adjunct extraction actually look like? I don’t know all the conventions around basic- and -simple phrasal types, and so on.

There are two questions I wanted to discuss here: (1) The shape of the hierarchy; (2) the PLACEHOLDER feature. Let’s start with the PLACEHOLDER feature.

@ebender, you were keen on getting rid of the PLACEHOLDER feature, is that right? However, @guyemerson points out that, while we can of course get rid of it, it results in repeated constraints (provided we keep the same analysis overall):

extracted-adj-first-phrase := extracted-adj-phrase-simple &
  [ SYNSEM [ NON-LOCAL [ SLASH.APPEND < [ LIST < [ CAT
		                                            [ HEAD +rp &
		                                                 [ MOD < [ LOCAL intersective-mod &
                                                               [ CAT [ HEAD #head,
                                                                       POSTHEAD #ph,
                                                                       MC #mc ],
                                                                  CONT.HOOK #hook,
                                                                  CTXT #ctxt ] ] > ],
                                                      VAL [ SPEC < >,
                                                          SUBJ olist,
                                                          COMPS olist,
                                                          SPR olist ] ] ] > ], #slash > ] ],
    HEAD-DTR.SYNSEM canonical-synsem &
	   [ LOCAL local &
		   [ CAT [ HEAD #head,
               VAL.SUBJ cons,
		           POSTHEAD #ph,
                   MC #mc ],
                   CONT.HOOK #hook,
                   CTXT #ctxt ],
	     MODIFIED notmod,
       NON-LOCAL.SLASH #slash & [ LIST < > ] ] ].

extracted-adj-last-phrase := extracted-adj-phrase-simple &
  [ SYNSEM [ NON-LOCAL [ SLASH.APPEND < #slash, [ LIST < [ CAT
		                                            [ HEAD +rp &
		                                                 [ MOD < [ LOCAL intersective-mod &
                                                               [ CAT [ HEAD #head,
                                                                       POSTHEAD #ph,
                                                                       MC #mc ],
                                                                  CONT.HOOK #hook,
                                                                  CTXT #ctxt ] ] > ],
                                                      VAL [ SPEC < >,
                                                          SUBJ olist,
                                                          COMPS olist,
                                                          SPR olist ] ] ] > ] > ] ],
    HEAD-DTR.SYNSEM canonical-synsem &
	   [ LOCAL local &
		   [ CAT [ HEAD #head,
               VAL [ SUBJ < >, COMPS < > ],
		           POSTHEAD #ph,
                   MC #mc ],
                   CONT.HOOK #hook,
                   CTXT #ctxt ],
	     MODIFIED notmod,
       NON-LOCAL.SLASH #slash & [ LIST cons ] ] ].

Whereas with the PLACEHOLDER it was just:

extracted-adj-last-phrase := extracted-adj-phrase &
  [ SYNSEM [ NON-LOCAL.SLASH append-list &
		      [ APPEND < #slash, #placeholder >,
		      PLACEHOLDER #placeholder ] ],
    HEAD-DTR.SYNSEM.NON-LOCAL.SLASH #slash & [ LIST cons ] ].

; OZ 2020-07-01 This requires that any arguments are extracted after the adjunct.
extracted-adj-first-phrase := extracted-adj-phrase &
  [ SYNSEM [ NON-LOCAL.SLASH append-list &
		      [ APPEND < #placeholder, #slash >,
		      PLACEHOLDER #placeholder ] ],
    HEAD-DTR.SYNSEM.NON-LOCAL.SLASH #slash ].

extracted-adj-phrase := extracted-adj-phrase-simple &
  [ SYNSEM [ NON-LOCAL [ SLASH [ PLACEHOLDER.LIST < [ CAT
		                                            [ HEAD +rp &
		                                                 [ MOD < [ LOCAL intersective-mod &
                                                               [ CAT [ HEAD #head,
                                                                       POSTHEAD #ph,
                                                                       MC #mc ],
                                                                  CONT.HOOK #hook,
                                                                  CTXT #ctxt ] ] > ],
                                                      VAL [ SPEC < >,
                                                          SUBJ olist,
                                                          COMPS olist,
                                                          SPR olist ] ] ] > ] ] ],
    HEAD-DTR.SYNSEM canonical-synsem &
	   [ LOCAL local &
		   [ CAT [ HEAD #head,
		           POSTHEAD #ph,
                   MC #mc ],
                   CONT.HOOK #hook,
                   CTXT #ctxt ],
	     MODIFIED notmod ] ].

So, wouldn’t it be better to keep the PLACEHOLDER feature? Or were you in fact suggesting a change in the analysis?

Now, assuming we have decided what to do with the PLACEHOLDER, what should the hierarchy be? Right now it has sort of organically grown into the following:

basic-extracted-adj-phrase := head-mod-phrase & head-only & phrasal.

extracted-adj-phrase-simple := basic-extracted-adj-phrase &
  [ SYNSEM [ LOCAL.CAT [ WH #wh,
                         VAL.SPEC < >,
                         POSTHEAD #ph,
                         MC #mc ],
	     NON-LOCAL [ QUE #que ] ],
    HEAD-DTR.SYNSEM canonical-synsem &
	   [ LOCAL local &
		   [ CAT [ HEAD verb,
		           WH #wh,
		           VAL  [ SUBJ < synsem-min >, SPEC < > ],
			       POSTHEAD #ph,
                   MC #mc ],
                   CONT.HOOK #hook ],
	     MODIFIED notmod,
	     NON-LOCAL [ QUE #que ] ],
    C-CONT [ HOOK #hook,
         RELS.LIST < >,
	     HCONS.LIST < >,
	     ICONS.LIST < > ] ].

extracted-adj-phrase := extracted-adj-phrase-simple &
  [ SYNSEM [ NON-LOCAL [ SLASH append-list-with-placeholder &
	                        [ PLACEHOLDER.LIST < [ CAT
		                                            [ HEAD +rp &
		                                                 [ MOD < [ LOCAL intersective-mod &
                                                               [ CAT [ HEAD #head,
                                                                       POSTHEAD #ph,
                                                                       MC #mc ],
                                                                  CONT.HOOK #hook,
                                                                  CTXT #ctxt ] ] > ],
                                                      VAL [ SPEC < >,
                                                          SUBJ olist,
                                                          COMPS olist,
                                                          SPR olist ] ] ] > ] ] ],
    HEAD-DTR.SYNSEM canonical-synsem &
	   [ LOCAL local &
		   [ CAT [ HEAD #head,
		           POSTHEAD #ph,
                   MC #mc ],
                   CONT.HOOK #hook,
                   CTXT #ctxt ],
	     MODIFIED notmod ] ].

extracted-adj-only-phrase := extracted-adj-phrase-simple &
  [ SYNSEM [ NON-LOCAL [ SLASH append-list &
	                        [ LIST < [ CAT
		                                  [ HEAD +rp &
		                                         [ MOD < [ LOCAL intersective-mod &
                                                               [ CAT [ HEAD #head,
                                                                       POSTHEAD #ph,
                                                                       MC #mc ],
                                                                  CONT.HOOK #hook,
                                                                  CTXT #ctxt ] ] > ],
                                                      VAL [ SPEC < >,
                                                          SUBJ olist,
                                                          COMPS olist,
                                                          SPR olist ] ] ] > ] ] ],
    HEAD-DTR.SYNSEM canonical-synsem &
	   [ LOCAL local &
		   [ CAT [ HEAD #head,
		           POSTHEAD #ph,
                   MC #mc ],
                   CONT.HOOK #hook,
                   CTXT #ctxt ],
	     MODIFIED notmod,
	     NON-LOCAL.SLASH.LIST < > ] ].

; OZ 2020-07-01 This requires that any arguments are extracted prior to the adjunct.
extracted-adj-last-phrase := extracted-adj-phrase &
  [ SYNSEM [ NON-LOCAL.SLASH append-list &
		      [ APPEND < #slash, #placeholder >,
		      PLACEHOLDER #placeholder ] ],
    HEAD-DTR.SYNSEM.NON-LOCAL.SLASH #slash & [ LIST cons ] ].

; OZ 2020-07-01 This requires that any arguments are extracted after the adjunct.
extracted-adj-first-phrase := extracted-adj-phrase &
  [ SYNSEM [ NON-LOCAL.SLASH append-list &
		      [ APPEND < #placeholder, #slash >,
		      PLACEHOLDER #placeholder ] ],
    HEAD-DTR.SYNSEM.NON-LOCAL.SLASH #slash ].

I suspect that this is not necessarily optimal. Note for example that the constraints on the head daughter are ensuring identities with both the mother and the extracted adjunct’s MOD. Currently this is done by two different types. Should those types be collapsed into one? Anything else that should be done differently?

Thanks!!!

  1. I don’t think there’s much rhyme or reason about basic- and -simple, alas.

  2. My comment about PLACEHOLDER was that I think it becomes superfluous without lexical threading, because then the order in which the subj/comp/adj extraction rules apply ends up constraining the order of the SLASH list (not true with lexical threading).

I was also under the impression that it wasn’t possible to make the rules work without the PLACEHOLDER (with lexical threading), but @guyemerson doesn’t seem to think that?

But, in any case, PLACEHOLDER allows me to not repeat all those MOD constraints…

We might be talking at cross-purposes, but I think that without lexical threading, you only need one adjunct extraction rule (that can apply before or after subject extraction).

Ah, I see.

But I don’t know how to write that; APPEND has its elements in order, so I have to say whether the new element goes before or after the existing SLASH.

So, either multiple extraction rules or multiple filler-gap rules? I think.

You don’t need multiple of either: the point is that the extraction rules apply in some order, and if multiple orders are possible they will construct different lists!

Sorry, I don’t understand :). How would this be written in terms of the APPEND?

All rules (subj-extr, comps-extr, adj-extr) append something to the start of the SLASH list. If you extract the subject first, and then the adjunct, you get < adj-gap, subj-gap >. If the rules apply in the other order, you get < subj-gap, adj-gap >.
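
To make the ordering point concrete, here is a toy Python model (not TDL, and not the Matrix implementation). It assumes only what is said above: each extraction rule puts its gap at the start of the daughter's SLASH list. The function names `extract` and `apply_in_order` are made up for illustration.

```python
def extract(slash, gap):
    """Toy model of one extraction rule: the mother's SLASH is the
    new gap followed by the daughter's SLASH (i.e. prepend)."""
    return [gap] + slash


def apply_in_order(gaps):
    """Apply extraction rules bottom-up in the given order,
    starting from an empty SLASH list."""
    slash = []
    for gap in gaps:
        slash = extract(slash, gap)
    return slash


# Subject extracted first (lower in the tree), then the adjunct.
subj_then_adj = apply_in_order(["subj-gap", "adj-gap"])
# Adjunct extracted first, then the subject.
adj_then_subj = apply_in_order(["adj-gap", "subj-gap"])

print(subj_then_adj)  # ['adj-gap', 'subj-gap']
print(adj_then_subj)  # ['subj-gap', 'adj-gap']
```

The same rule produces both lists; only the order of application differs, so no second adjunct extraction rule is needed to get both orders in this toy setting.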

Aah. This sounds lovely.

I tried it, and it looks like it may be working fine, except that I get this sort of ambiguity (pseudolanguage):

[Screenshot, 2020-09-25: image not preserved]

But that’s either OK or avoidable, maybe?

I used to have a SUBJ constraint, but I had to remove it because it clashed with the extracted-subject rule, now that I want the extracted-adjunct rule to apply either below or above it.

Of course I have test failures because of this, and so after I either deal with or accept this particular ambiguity, I may discover other issues. I will inspect the failures on Monday and report back. The English ones (the ones that are more serious than the above) look like they mostly have to do with the auxiliary. (But I haven’t even looked at Russian…)

Well, in a language without multiple wh extraction, you could pin that down to only applying if the SUBJ is non-empty (or whatever). For languages that do allow both, it’ll require a bit more cleverness. One possibility is that adjunct extraction applies if the SUBJ is list(gap), meaning that adjunct extraction can apply before subject extraction (or after it), but only low with respect to subject realization.

This proposed analysis sounds good to me. I just wanted to summarise this discussion and consolidate it with what I’ve previously said to Olga (and what @sweaglesw said here).

If there’s a single head-filler rule, and a single analysis of the head daughter (everything in the sentence apart from the fronted wh-elements), then we would only get one possible order for the wh-elements. To allow multiple orders, there are a few options:

  1. Multiple head-filler rules.

  2. Multiple extraction rules.

  3. A shuffle operator.

  4. Extraction rules applying in different orders.

The last option doesn’t work with lexical threading, as discussed in this thread.

For the first two options, a PLACEHOLDER feature avoids repeating code. (Anything that can be done with a PLACEHOLDER can be done without it, but I think it’s justifiable from a software engineering point of view.)

The first three options could be implemented with a non-deterministic computation type, so that you don’t actually need to write multiple head-filler rules or multiple extraction rules. There would just be a single binary rule which triggers the unary “computation rules” (these rules would be defined once per operation, rather than once per phrase structure rule). If we want to consider arbitrarily long lists, this would also get around the problem of needing arbitrarily many rules.
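
As a rough picture of the non-determinism in option 3, here is a small Python sketch. The thread does not define the shuffle operator, so this assumes the usual interleaving sense: every way of merging two lists while keeping each list's internal order. The function name `shuffle` and the gap labels are illustrative only, and this says nothing about how a computation type would actually be written in TDL.

```python
def shuffle(xs, ys):
    """Return every interleaving of xs and ys that preserves the
    relative order of elements within each input list."""
    if not xs:
        return [list(ys)]
    if not ys:
        return [list(xs)]
    # Either the next element comes from xs, or it comes from ys.
    take_x = [[xs[0]] + rest for rest in shuffle(xs[1:], ys)]
    take_y = [[ys[0]] + rest for rest in shuffle(xs, ys[1:])]
    return take_x + take_y


# One adjunct gap shuffled into an existing two-element SLASH list.
results = shuffle(["adj-gap"], ["subj-gap", "comp-gap"])
for r in results:
    print(r)
```

Each result corresponds to one candidate SLASH order; a grammar would still need some way to avoid spurious ambiguity among them.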

Are there any downsides to option 4 then? Ambiguity? (But I don’t know what would happen in terms of ambiguity with 3).

All of the options would need some care to avoid spurious ambiguity (but in different ways). I don’t have a good feeling for which one would make it easiest to avoid spurious ambiguity.

Option 4 is perhaps more controversial, since there are arguments in favour of lexical threading. (I’m not trying to start a discussion on that here – just saying that this could be seen as a downside.)

Makes sense. I guess there are two separate scenarios for which this is relevant: our HPSG paper and the Matrix as a project. For the former, 4 may be more controversial, but for the latter, we had already decided to try to get rid of lexical threading, so 4 would be clearly preferable (unless it adds massive ambiguity, which it does not; I have the same ambiguity with option 2 once I remove lexical threading).