tomsfastmath/changes.txt

0.04 -- Fixed bugs in the SSE2 squaring code
     -- Rewrote the multipliers to be optimized for small inputs 
     -- Nelson Bolyard of the NSS crew submitted [among other things] new faster Montgomery reduction
        code.  It brings the performance for small numbers on the AMD64 and all numbers on the P4
        to a new level.  Thanks!
     -- Added missing ARM support for fp_montgomery_reduce.c that the NSS folk left off.  Officially,
        the ARM code is for v4 and above WITH the "M" multiplier support (e.g. umlal instruction)
     -- Added PPC32 support; define TFM_PPC32 to enable it.  I used the "PowerPC 6xx" instruction
        databook for reference.  Does not require AltiVec.  Should be fairly portable to the other
        32-bit PPCs provided they have mullw and mulhwu instructions.
        [Note: porting the macros to PPC64 should be trivial, anyone with a shell to lend... email me!]
     -- Rewrote the config a bit in tfm.h so you can better choose which set of "oh my god that's huge" code to 
        enable for your task.  The "generic" functions are ALWAYS included; they are smaller and will cover the
        gaps in the coverage for ya.
     -- The PPC32 code has been verified to function on a Darwin box running GCC 2.95.2 
        [Thanks to the folk at PeerSec for lending me a shell to use]
     -- Fixed a bug in fp_exptmod() where if the exponent was negative AND the destination the output
        would have the sign set to FP_NEG.

March 1st, 2005
0.03 -- Optimized squaring
     -- Applied new license header to all files (still PD)

September 18th, 2004
0.02 -- Added TFM_LARGE to turn on/off 16x combas to save even more space.
        This also helps prevent killing the cache on smaller cpus.
     -- Cast memset to void in fp_init() to catch people who misuse the function (e.g. expect return)
        Thanks to Johan Lindh
     -- Cleaned up x86-64 support [faster montgomery reductions]
     -- Autodetects x86-32 and x86-64 and enables its asm now
     -- Made test demo build cleaner in multilib platforms [e.g. mixed 32/64 bits]
     -- Fix to fp_mod to ensure that remainder is of the same sign as the modulus.
     -- Fixed bug in fp_montgomery_calc_normalization for single digit moduli
     -- Cleaned up ISO C macros in comba/mont to avoid branches [works best with GCC 3.4.x branch]
     -- Added more testing to tfm.h to help detect misconfigured builds
     -- Added TFM_NO_ASM which forces ASM off [even if it was autodetected].
     -- Added fp_radix_size() to API
     -- Cleaned up demo/test.c to build with far fewer warnings (mostly %d => %lu fixes)
     -- fp_exptmod() now supports negative exponent and base>modulus cases
     -- Added fp_ident() which gives a string showing how TFM was configured.  Useful for debugging...
     -- Fixed gen.pl script so it includes the whole source tree now
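
The fp_mod fix above is about the sign convention: the remainder takes the sign of the modulus,
whereas C's % operator (C99 and later) gives the sign of the dividend.  A minimal sketch of that
convention on plain ints (NOT TFM's fp_int code; mod_sign_of_modulus is a hypothetical name used
only for illustration):

```c
#include <assert.h>

/* Remainder of a / b that carries the sign of b (or is zero).
   Hypothetical helper for illustration; not part of the TFM API. */
static int mod_sign_of_modulus(int a, int b)
{
    int r;

    assert(b != 0);             /* modulus must be non-zero */
    if (b == -1) {
        return 0;               /* also avoids INT_MIN % -1, which is undefined */
    }

    r = a % b;                  /* C99: r has the sign of a, or is zero */
    if (r != 0 && ((r < 0) != (b < 0))) {
        r += b;                 /* signs differ: shift into the range of b */
    }
    return r;
}
```

For example, mod_sign_of_modulus(-7, 3) gives 2 and mod_sign_of_modulus(7, -3) gives -2, while
plain -7 % 3 gives -1.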

August 25th, 2004
0.01 -- Initial Release