The comba sqr code does not check the maximum bounds of fp_int; e.g.,
if you invoke fp_sqr_comba_20, it will write 40 digits to the
destination even if FP_SIZE < 40. This is intentional, since it keeps
the routines fast, but it means that it is the caller's responsibility
to check for such overflows.
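fp_sqr.c, however, only checks for numeric overflow (a->used * 2 >= FP_SIZE),
i.e. whether the square of the operand can fit; it does not check
whether the comba routine it selects writes more digits than the
destination can hold. This means that if you call fp_sqr() with a small
number (say 1), your FP_SIZE is 10, and you have enabled fp_sqr_comba_8,
it will overflow your buffer by writing 16 digits.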
Since the exact set of enabled comba multipliers/squarers is chosen by
the user (in tfm.h), we fix the code so that it never invokes one that
could cause such an overflow.
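One way to express "never invoke a comba routine that can overflow" is to make each dispatch branch conditional on the destination being large enough for that routine's 2N-digit output. The sketch below assumes this is done at preprocessing time; FP_SIZE, TFM_SQR8, pick_sqr and enum sqr_path are hypothetical stand-ins modeled on tfm.h and fp_sqr.c, not the actual patch.

```c
#include <assert.h>

/* Hypothetical build settings modeled on tfm.h: the destination holds
   FP_SIZE digits, and the user has enabled only the 8-digit squarer. */
#define FP_SIZE 10
#define TFM_SQR8

enum sqr_path { SQR_TOO_BIG, SQR_COMBA8, SQR_GENERIC };

/* Picks the squaring routine for an operand with `used` digits
   (hypothetical helper, not part of TomsFastMath). */
static enum sqr_path pick_sqr(int used)
{
    assert(used >= 0 && used <= FP_SIZE);
    if (used * 2 >= FP_SIZE)
        return SQR_TOO_BIG;   /* the numeric check fp_sqr.c already performs */
#if defined(TFM_SQR8) && FP_SIZE >= 16
    /* comba8 always writes 16 digits, so this branch is compiled in
       only when the destination can hold all of them */
    if (used <= 8)
        return SQR_COMBA8;
#endif
    return SQR_GENERIC;
}
```

With FP_SIZE at 10, pick_sqr(1) falls through to the generic path. Removing `&& FP_SIZE >= 16` would instead select comba8 for a 1-digit operand and reproduce the 16-digit write into a 10-digit buffer.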
Currently, the fp_sqr_comba_* functions also do not fully clear the
destination number; they only overwrite the digits they produce. E.g.,
if you call a comba4, it will overwrite the first 8 digits and leave
the others unchanged.
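To see the effect of the partial write, the following sketch models a fixed-size squarer that writes exactly 2*n digits. It uses a plain schoolbook product rather than a real comba column sweep (only the write pattern matters here). toy_int, toy_sqr_comba and the clear_tail flag are hypothetical; zeroing the remaining digits is one possible remedy, assumed here rather than taken from the source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FP_SIZE 16                        /* hypothetical digit capacity */
typedef uint32_t fp_digit;                /* 32-bit digits */
typedef struct { fp_digit dp[FP_SIZE]; int used; } toy_int; /* stand-in for fp_int */

/* Squares the low n digits of a into b and writes exactly 2*n digits,
   mimicking the behaviour described for fp_sqr_comba_*: b->dp[2*n..FP_SIZE)
   keeps whatever it held before. With clear_tail set, those digits are
   zeroed too, so every digit above b->used is guaranteed to be zero. */
static void toy_sqr_comba(const toy_int *a, toy_int *b, int n, int clear_tail)
{
    fp_digit out[FP_SIZE] = {0};          /* scratch, so a and b may alias */

    assert(n > 0 && 2 * n <= FP_SIZE);    /* caller guarantees the output fits */
    for (int i = 0; i < n; i++) {         /* schoolbook product, 64-bit intermediates */
        uint64_t carry = 0;
        for (int j = 0; j < n; j++) {
            uint64_t t = (uint64_t)a->dp[i] * a->dp[j] + out[i + j] + carry;
            out[i + j] = (fp_digit)t;
            carry = t >> 32;
        }
        out[i + n] = (fp_digit)carry;
    }
    memcpy(b->dp, out, 2 * (size_t)n * sizeof out[0]);
    if (clear_tail && 2 * n < FP_SIZE)
        memset(&b->dp[2 * n], 0, (FP_SIZE - 2 * (size_t)n) * sizeof b->dp[0]);

    b->used = 2 * n;                      /* clamp away leading zero digits */
    while (b->used > 0 && b->dp[b->used - 1] == 0)
        b->used--;
}
```

Squaring 1 with n = 3 into a destination pre-filled with 0xFFFFFFFF leaves dp[6] and above untouched unless clear_tail is set, even though used ends up as 1.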
The fp_mul_comba_* functions, on the other hand, do *not* check the
unused digits of their inputs (relying on the guarantee that those
digits are zero), so they will happily compute the wrong result if
those digits are not zero. Testcase for a 32-bit system:
#include <string.h>
#include "tfm.h"

void testcase(void)
{
    unsigned char buf[64];
    fp_int num, num2, d;
    memset(buf, 0xFF, sizeof(buf));
    fp_read_unsigned_bin(&num, buf, sizeof(buf));
    fp_set(&d, 1);
    fp_sqr_comba_3(&d, &num);
    // now num is { 0x1, 0x0, 0x0, 0x0, 0x0, 0x0,
    //   0xFFFFFFFF, 0xFFFFFFFF ... }
    // only the first 6 digits have been written; even
    // though num.used is set correctly, the stale digits
    // above it can trigger bugs.
    // Create a number larger than 6 digits: 2^(7*32+4)
    // has exactly 8 digits, so comba_8 reads all of them.
    fp_2expt(&num2, 7*32+4);
    fp_mul_comba_8(&num, &num2, &num2);
    // a wrong result has been computed, because all 8
    // digits of num have been read and multiplied
    // even though num.used < 8, relying on the fact that
    // the digits above num.used should be zero.
}
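The testcase can also be reproduced without TomsFastMath. In this hypothetical sketch, toy_mul_fixed stands in for fp_mul_comba_8: it multiplies the low n digits of both operands with a plain schoolbook loop (not a real comba sweep) and, like the routine described above, never consults used. Multiplying a num whose used is 1 but whose digits 6 and 7 still hold 0xFFFFFFFF by 2^(7*32+4) produces extra nonzero high digits that the correct product, 2^228, does not have.

```c
#include <assert.h>
#include <stdint.h>

#define FP_SIZE 16                        /* hypothetical digit capacity */
typedef uint32_t fp_digit;                /* 32-bit digits, as in the testcase */
typedef struct { fp_digit dp[FP_SIZE]; int used; } toy_int; /* stand-in for fp_int */

/* Multiplies the low n digits of a and b into out[0..2n). Like the
   fixed-size comba multipliers described above, it never looks at
   a->used or b->used, so any nonzero digit below index n is treated
   as part of the value. */
static void toy_mul_fixed(const toy_int *a, const toy_int *b, fp_digit *out, int n)
{
    assert(n > 0 && 2 * n <= FP_SIZE);    /* caller guarantees the output fits */
    for (int k = 0; k < 2 * n; k++)
        out[k] = 0;
    for (int i = 0; i < n; i++) {         /* schoolbook product, 64-bit intermediates */
        uint64_t carry = 0;
        for (int j = 0; j < n; j++) {
            uint64_t t = (uint64_t)a->dp[i] * b->dp[j] + out[i + j] + carry;
            out[i + j] = (fp_digit)t;
            carry = t >> 32;
        }
        out[i + n] = (fp_digit)carry;
    }
}
```

Writing into a separate out buffer sidesteps the aliasing question raised by passing num2 as both input and output; once the stale digits of num are zeroed, the same call yields only digit 7 = 16, i.e. 2^228.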