---
0. IMPORTANT... why are you doubling the "even" terms individually? STUPID!
   - make it so you have four new macros that use an additional 3 carry variables
      - SQRADDSC - store first mult [ simple store, no carry ]
      - SQRADDAC - add subsequent mults [ 3n word add ]
      - SQRADDDB - double the carry [ 3n word add ]
      - SQRADDFC - forward the doubles into the main carry [ 3n word add, note, x86_32 may need "g" instead of "r" ]
   - only use the four macro pattern for rows with >= 3 "doubles"
   - otherwise use the existing SQRADD

1. Write more documentation ;-)
2. Ports to PPC and MIPS
3. Fix any lingering bugs, add additional requested functionality.
4. Unrolled copies of Montgomery reduction will speed it up a bit
5. NOTE: The library is still fairly new. I've tested it quite a bit but that doesn't mean surprises can't happen. Please test the results you get for correctness.
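The four-macro idea in item 0 can be sketched in portable C. In a Comba squaring column k, every cross product a[i]*a[j] with i < j appears twice. Rather than adding each one twice, you sum them once into a secondary 3-limb carry, double that sum a single time, and then add it into the main column accumulator. The code below is a minimal sketch under stated assumptions: 32-bit digits with a 64-bit product type and no inline assembly, so the "g" vs "r" constraint note does not come up. The type `acc3` and the functions `acc_add`, `sqraddsc`, `sqraddac`, `sqradddb`, `sqraddfc` and `comba_sqr` are hypothetical stand-ins for the macros named above, not the library's actual code. The ">= 3 doubles" threshold is taken from the note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t digit;   /* one limb (assumed 32-bit) */
typedef uint64_t word;    /* double-width product */

/* Three-limb column accumulator, w[0] least significant. */
typedef struct { digit w[3]; } acc3;

/* Add a double-width value into a three-limb accumulator (3-word add). */
static void acc_add(acc3 *a, word v)
{
    word s = (word)a->w[0] + (digit)v;
    a->w[0] = (digit)s;
    s = (word)a->w[1] + (digit)(v >> 32) + (s >> 32);
    a->w[1] = (digit)s;
    a->w[2] += (digit)(s >> 32);
}

/* SQRADDSC stand-in: store the first cross product, no carry. */
static void sqraddsc(acc3 *sc, digit x, digit y)
{
    word t = (word)x * y;
    sc->w[0] = (digit)t;
    sc->w[1] = (digit)(t >> 32);
    sc->w[2] = 0;
}

/* SQRADDAC stand-in: add a subsequent cross product. */
static void sqraddac(acc3 *sc, digit x, digit y) { acc_add(sc, (word)x * y); }

/* SQRADDDB stand-in: double the secondary accumulator once (sc += sc). */
static void sqradddb(acc3 *sc)
{
    sc->w[2] = (sc->w[2] << 1) | (sc->w[1] >> 31);
    sc->w[1] = (sc->w[1] << 1) | (sc->w[0] >> 31);
    sc->w[0] <<= 1;
}

/* SQRADDFC stand-in: forward the doubled sum into the main accumulator. */
static void sqraddfc(acc3 *c, const acc3 *sc)
{
    word s = (word)c->w[0] + sc->w[0];
    c->w[0] = (digit)s;
    s = (word)c->w[1] + sc->w[1] + (s >> 32);
    c->w[1] = (digit)s;
    c->w[2] += sc->w[2] + (digit)(s >> 32);
}

/* Comba squaring: out[0 .. 2n-1] = a[0 .. n-1]^2 (n >= 1).
   four_macro == 0: every cross product is added twice (per-term doubling).
   four_macro != 0: columns with >= 3 cross products use the four-step pattern. */
static void comba_sqr(const digit *a, size_t n, digit *out, int four_macro)
{
    acc3 c = {{0, 0, 0}};
    for (size_t k = 0; k + 1 < 2 * n; k++) {
        size_t lo = k < n ? 0 : k - (n - 1);  /* cross terms: lo <= i < hi, j = k - i */
        size_t hi = (k + 1) / 2;
        int use4 = four_macro && hi > lo && hi - lo >= 3;
        acc3 sc = {{0, 0, 0}};
        for (size_t i = lo; i < hi; i++) {
            digit x = a[i], y = a[k - i];
            if (!use4) {
                acc_add(&c, (word)x * y);     /* per-term doubling: add twice */
                acc_add(&c, (word)x * y);
            } else if (i == lo) {
                sqraddsc(&sc, x, y);
            } else {
                sqraddac(&sc, x, y);
            }
        }
        if (use4) {
            sqradddb(&sc);                    /* one doubling for the whole column */
            sqraddfc(&c, &sc);
        }
        if (k % 2 == 0)                       /* the single square term a[k/2]^2 */
            acc_add(&c, (word)a[k / 2] * a[k / 2]);
        out[k] = c.w[0];                      /* emit column digit, shift carries down */
        c.w[0] = c.w[1];
        c.w[1] = c.w[2];
        c.w[2] = 0;
    }
    out[2 * n - 1] = c.w[0];
}
```

Both modes should produce identical output. The four-step mode replaces one extra 3-word add per cross term with a single doubling and a single forward add per column, so it only pays off once a column has several cross products. That is why the note restricts it to rows with >= 3 doubles. In keeping with item 5, compare both modes against a plain schoolbook multiply before trusting either.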