If I'm reading you right, you're saying that a simple way to do this would be to calculate logits for not just the next token (position n) but also the one after it (position n+1), all at the same time. If one of the n+1 logits is chosen, the model does an infill on the skipped token in the next step, then resumes as normal.
This could get us around the problem in the example that you gave, at the cost of only a linear increase in the vocabulary size: looking one extra token ahead increases the vocab size by a factor of 2, and looking at a third token makes it a total factor of 3.
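To check that I've understood, here's a rough sketch of the loop as I picture it. Everything in it is made up for illustration: `score_ahead` and `score_infill` are hypothetical stand-ins for the model, the toy scorers are hard-coded, and greedy argmax stands in for real sampling.

```python
from typing import Callable, List, Optional, Tuple

Sequence = List[Optional[int]]  # None marks a skipped position awaiting infill


def head_size(vocab_size: int, lookahead: int) -> int:
    """Output width when each of `lookahead` positions gets its own logits."""
    return vocab_size * lookahead  # linear in lookahead: 2x, 3x, ...


def split_index(flat_index: int, vocab_size: int) -> Tuple[int, int]:
    """Map a flat output index to (offset, token_id); offset 0 is position n."""
    return divmod(flat_index, vocab_size)


def argmax(values: List[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def decode(
    score_ahead: Callable[[Sequence], List[float]],
    score_infill: Callable[[Sequence, int], List[float]],
    vocab_size: int,
    length: int,
) -> Sequence:
    """Greedy decoding; may overshoot `length` if the last step skips ahead."""
    seq: Sequence = []
    while len(seq) < length or None in seq:
        if None in seq:
            # A position was skipped on an earlier step: infill it, then resume.
            hole = seq.index(None)
            seq[hole] = argmax(score_infill(seq, hole))
            continue
        offset, token = split_index(argmax(score_ahead(seq)), vocab_size)
        seq.extend([None] * offset)  # leave a gap for each skipped position
        seq.append(token)
    return seq


def one_hot(size: int, peak: int) -> List[float]:
    return [1.0 if i == peak else 0.0 for i in range(size)]


# Hypothetical toy scorers (vocab of 3 tokens, lookahead of 2 positions).
def toy_ahead(seq: Sequence) -> List[float]:
    # On the first step, prefer token 1 at position n+1; afterwards token 2 at n.
    return one_hot(head_size(3, 2), 4 if not seq else 2)


def toy_infill(seq: Sequence, hole: int) -> List[float]:
    return one_hot(3, 0)  # always fills a hole with token 0


result = decode(toy_ahead, toy_infill, vocab_size=3, length=3)
```

In this toy run the first step picks a token for position n+1, the second step infills position n, and the third step carries on as usual. The output layer only has `vocab_size * lookahead` entries, which is the linear growth I mean.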
This seems really promising!