Currently, the fp_sqr_comba_* functions do not fully clear the destination
number; they only overwrite the digits they care about. E.g., if
you call a comba4, it will overwrite the first 8 digits and leave
the others unchanged.
On the other hand, fp_mul_comba_* functions do *not* check incoming
unused digits (relying on the guarantee that they must be zero),
so they will happily compute the wrong result if those digits
are nonzero. Testcase for a 32-bit system:
unsigned char buf[64];
fp_int num, num2, d;
memset(buf, 0xFF, sizeof(buf));
fp_read_unsigned_bin(&num, buf, sizeof(buf));
fp_set(&d, 1);
fp_sqr_comba_3(&d, &num);
// now num is { 0x1, 0x0, 0x0, 0x0, 0x0, 0x0,
// 0xFFFFFFFF, 0xFFFFFFFF ... }
// only first 6 digits have been written, but even
// if num.used is correctly set to 6, this can trigger
// bugs.
// Create a number larger than 6 digits
fp_2expt(&num2, 8*32+4);
fp_mul_comba_8(&num, &num2, &num2);
// wrong result has been computed, because the first 8
// digits of num have been read and multiplied
// even if num.used == 6, relying on the fact that
// they should be zero.
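
One possible way to restore the invariant is to zero every digit past
"used" after a partial write. The following is only a toy sketch of that
idea, not the library's code: toy_int, TOY_SIZE and the toy_* helpers are
hypothetical names made up for illustration, and the real fp_int layout
and the right place for the fix may differ.

```c
#include <stdint.h>
#include <string.h>

#define TOY_SIZE 16 /* hypothetical capacity in 32-bit digits */

/* Hypothetical stand-in for a fixed-size big integer. */
typedef struct {
    uint32_t dp[TOY_SIZE]; /* digits, least significant first */
    int used;              /* number of digits in use */
} toy_int;

/* Mimics a comba-style routine: stores only n result digits and
 * leaves dp[n..TOY_SIZE-1] exactly as they were. */
static void toy_partial_write(toy_int *dst, const uint32_t *src, int n)
{
    memcpy(dst->dp, src, (size_t)n * sizeof(uint32_t));
    dst->used = n;
}

/* Returns 1 if every digit at index >= used is zero, else 0. */
static int toy_tail_is_zero(const toy_int *a)
{
    int i;
    for (i = a->used; i < TOY_SIZE; i++) {
        if (a->dp[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Sketch of the fix: zero the digits the partial write did not touch. */
static void toy_clear_unused(toy_int *a)
{
    int i;
    for (i = a->used; i < TOY_SIZE; i++) {
        a->dp[i] = 0;
    }
}
```

Calling toy_clear_unused() on the destination right after the partial
write makes toy_tail_is_zero() hold again, which is the property the
multiply routines assume. Whether the clearing should live in the
squaring routines themselves or in their caller is left open here.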