The comba sqr code does not check the maximum bounds of fp_int; e.g.,
if you invoke fp_sqr_comba_20, it will write 40 digits to the
destination even if FP_SIZE < 40. This is deliberate, since skipping
the check keeps the code fast, but it makes it the caller's
responsibility to check for such overflows.
However, fp_sqr.c only checks for numeric overflow
(a->used * 2 >= FP_SIZE). This means that if you call fp_sqr() with a
small number (say 1), your FP_SIZE is 10, and you have enabled
fp_sqr_comba_8, it will overflow your buffer by writing 16 digits.
Since the exact subset of active comba multipliers/squarers is up to
the user (in tfm.h), we fix the code so that it never invokes them if
they can cause overflows.
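
The rule described above can be sketched as a small standalone model.
This is not the actual fp_sqr.c code: comba_sqr_fits() and
pick_comba_sqr() are hypothetical names, and the list of enabled sizes
stands in for whatever tfm.h turns on. It assumes, per the description,
that a comba squarer for n-digit inputs always writes 2*n digits, so it
is only safe when 2*n <= FP_SIZE.

```c
#include <stddef.h>

/* Hypothetical helper: a fixed-size comba squarer for n-digit inputs
 * writes 2*n digits no matter how small the operand is, so it is safe
 * only if the destination holds at least 2*n digits. */
static int comba_sqr_fits(int n, int fp_size)
{
    return 2 * n <= fp_size;
}

/* Hypothetical dispatcher: given the operand's digit count and the
 * enabled comba sizes (assumed sorted ascending), return the first size
 * that covers the operand AND cannot overflow the destination.
 * Returns 0 when no comba routine is safe, meaning the caller should
 * fall back to the generic squaring path. */
static int pick_comba_sqr(int used, const int *enabled, size_t count,
                          int fp_size)
{
    for (size_t i = 0; i < count; i++) {
        int n = enabled[i];
        if (used <= n && comba_sqr_fits(n, fp_size)) {
            return n;
        }
    }
    return 0;
}
```

With sizes 8 and 20 enabled and FP_SIZE 10, squaring a 1-digit number
selects no comba routine in this model, which is the case that used to
overflow. In the library itself the same condition could presumably be
decided at compile time, since both FP_SIZE and the enabled sizes are
known from tfm.h.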