There is a lot of online material about these machines from the Second World War. Much of it discusses how big, heavy, and power hungry they were, the cloak and dagger intrigue, and feats of daring-do. For the daring-don't person, there is a lot of discussion about how the cryptographic weaknesses worked, and how they were exploited.
In brief, the Enigma and Tunny machines each implemented an encryption/decryption process. The machines that attacked their traffic were the Bombe and the Colossus, respectively.
The Enigma is a relatively simple machine to implement. One can buy modern-day replicas which perform the operation with a simple microcontroller and display the wheel positions using star-burst LED displays.
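To give a feel for how little logic is involved, here is a minimal sketch of a three-rotor Enigma in Python. It assumes the commonly published wirings for rotors I, II, III and reflector B, and it leaves out the plugboard and ring settings entirely, so it is an illustration rather than a faithful replica. The class and method names are my own.

```python
import string

ALPHABET = string.ascii_uppercase

# Commonly published wirings and turnover letters (assumed, not from the text).
ROTORS = {
    "I": ("EKMFLGDQVZNTOWYHXUSPAIBRCJ", "Q"),
    "II": ("AJDKSIRUXBLHWTMCQGZNPYFVOE", "E"),
    "III": ("BDFHJLCPRTXVZNYEIWGAKMUSQO", "V"),
}
REFLECTOR_B = "YRUHQSLDPXNGOKMIEBFZCWVJAT"


class Enigma:
    """Three-rotor Enigma without plugboard or ring settings (simplified)."""

    def __init__(self, rotor_names=("I", "II", "III"), start="AAA"):
        # Rotors are listed left to right, as seen by the operator.
        self.wirings = [ROTORS[n][0] for n in rotor_names]
        self.notches = [ALPHABET.index(ROTORS[n][1]) for n in rotor_names]
        self.pos = [ALPHABET.index(c) for c in start]

    def _step(self):
        # The rotors move before the key press is enciphered.
        if self.pos[1] == self.notches[1]:
            # Middle rotor at its notch: middle and left both advance.
            self.pos[0] = (self.pos[0] + 1) % 26
            self.pos[1] = (self.pos[1] + 1) % 26
        elif self.pos[2] == self.notches[2]:
            # Right rotor at its notch: middle advances.
            self.pos[1] = (self.pos[1] + 1) % 26
        self.pos[2] = (self.pos[2] + 1) % 26

    def _through(self, i, c, forward):
        # Pass contact c through rotor i, allowing for its rotation.
        shifted = (c + self.pos[i]) % 26
        if forward:
            out = ALPHABET.index(self.wirings[i][shifted])
        else:
            out = self.wirings[i].index(ALPHABET[shifted])
        return (out - self.pos[i]) % 26

    def press(self, letter):
        self._step()
        c = ALPHABET.index(letter)
        for i in (2, 1, 0):  # right to left
            c = self._through(i, c, True)
        c = ALPHABET.index(REFLECTOR_B[c])
        for i in (0, 1, 2):  # back, left to right
            c = self._through(i, c, False)
        return ALPHABET[c]

    def encrypt(self, text):
        return "".join(self.press(ch) for ch in text)
```

Because of the reflector, the same settings both encrypt and decrypt: feeding the output of `Enigma().encrypt("HELLO")` into a fresh `Enigma()` gives back `HELLO`.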
Tunny encrypted a stream of 5-bit teleprinter characters with a pseudo-random sequence of bits. These were produced by pegs set in a series of wheels. Like the Enigma, it should not be too hard to implement.
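As a rough illustration, the wheel-and-peg scheme can be sketched as below. The assumptions are mine: the key bits are combined with the text by XOR, the wheel sizes are the ones usually quoted for the Lorenz machine (five "chi", five "psi" and two motor wheels), the peg patterns are simply random, and the motor stepping rule is a simplification. The function names are hypothetical.

```python
import random

# Wheel sizes usually quoted for the Lorenz machine (assumed here).
CHI_SIZES = (41, 31, 29, 26, 23)   # one wheel per bit, step every character
PSI_SIZES = (43, 47, 51, 53, 59)   # one wheel per bit, step only sometimes
MOTOR_SIZES = (61, 37)             # decide when the psi wheels step


def random_wheels(sizes, rng):
    """Make up a peg pattern (list of 0/1) for each wheel size."""
    return [[rng.randint(0, 1) for _ in range(n)] for n in sizes]


def tunny_crypt(chars, chi, psi, motor):
    """XOR each 5-bit character with a key built from the wheel pegs.

    Encryption and decryption are the same operation.
    """
    chi_pos = [0] * 5
    psi_pos = [0] * 5
    m61_pos = m37_pos = 0
    out = []
    for c in chars:
        # Build the 5-bit key from the current peg on each wheel pair.
        key = 0
        for bit in range(5):
            k = chi[bit][chi_pos[bit]] ^ psi[bit][psi_pos[bit]]
            key |= k << bit
        out.append(c ^ key)

        # Simplified stepping: the 61 wheel always moves, it drives the
        # 37 wheel, and the 37 wheel decides whether the psi wheels move.
        psi_moves = motor[1][m37_pos]
        if motor[0][m61_pos]:
            m37_pos = (m37_pos + 1) % len(motor[1])
        m61_pos = (m61_pos + 1) % len(motor[0])
        chi_pos = [(p + 1) % len(w) for p, w in zip(chi_pos, chi)]
        if psi_moves:
            psi_pos = [(p + 1) % len(w) for p, w in zip(psi_pos, psi)]
    return out
```

With the same wheel patterns and start positions at both ends, running the ciphertext back through `tunny_crypt` recovers the original characters.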
It might be mildly interesting to see how fast one could encrypt characters with FPGA implementations of the machines above, but there really is no need to encrypt messages faster than the operator can type them in. The real need for speed is in the cryptanalytical machines, which have a lot more work to do.
Each Bombe had an array of rotors, which together tested Enigma wheel settings. The machine ran past the settings it could show to be impossible and halted when it reached one it could not rule out.
A large number of rotors attacked the problem, and someone has already explored using FPGA chips to implement them. Depending on various constraints like speed and simplicity, one can pack more or fewer rotors into each FPGA. Since that work has been done, I shall not be repeating the exercise.
Colossus is the most interesting candidate for FPGA implementation. Firstly, it was electronic and not mechanical like the Bombe rotor. Secondly, a lot has been said about how powerful it was, and how modern PCs have only recently been able to do the same job in a similar time. An article in the June 2004 issue of ELECTRONICS WORLD stated that Tony Sale (ex Museums Director of Bletchley Park) wrote a Colossus simulator for his 800 MHz laptop PC which ran at around half the speed of the real Colossus, whose clock speed was 5 kHz. That is a clock-speed ratio of 160,000 to 1.
A common FPGA can run at 50 MHz very easily, i.e. 10,000 times faster than Colossus. So a successful Colossus clone would need Tony's laptop to run at 16 terahertz to catch up. That is within the frequency range of infrared light. If it ran proportionately hotter as well, I would not want it to be on my lap at the time.
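The arithmetic behind those figures can be checked in a few lines. This assumes, as the argument above does, that the simulator's speed scales linearly with the laptop's clock, which is a crude approximation at best. The variable names are just for illustration.

```python
laptop_hz = 800_000_000    # 800 MHz laptop
colossus_hz = 5_000        # 5 kHz Colossus clock
fpga_hz = 50_000_000       # 50 MHz FPGA

# Ratio of the laptop clock to the Colossus clock.
clock_ratio = laptop_hz // colossus_hz

# How much faster the FPGA clock is than the Colossus clock.
fpga_speedup = fpga_hz // colossus_hz

# The simulator ran at half speed, so the laptop needs twice its clock
# just to match Colossus, then the FPGA factor on top of that.
laptop_needed_hz = laptop_hz * 2 * fpga_speedup

print(clock_ratio)               # 160000
print(fpga_speedup)              # 10000
print(laptop_needed_hz / 1e12)   # 16.0 (terahertz)
```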
I have not found much significant information about what Boolean operations Colossus actually performed upon the characters being pumped through it. Most sources just seem to cover superficial details such as how big, heavy, and power hungry the machines were. One exception to this trend is Tony Sale's pages on Lorenz, which have a lot to digest.
It is known that Colossus found the wheel patterns by looking for statistical deviations from random. Bit positions in the character stream were independent of each other, so only one or two bit streams were tested in a run.
Essentially the machine is cross-correlating bit streams and counting the correlations. Higher correlation scores suggest more likely wheel settings.
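The idea of counting correlations can be sketched as follows. This is only my reading of the principle, not the actual Colossus logic (which, as noted, I have not found documented in detail): one bit stream from the message is compared against one known wheel pattern at every possible start position, and the agreements are counted. The function names are hypothetical.

```python
def correlation_scores(stream, wheel):
    """For each wheel start position, count how many stream bits
    agree with the wheel pattern stepped along beside them."""
    n = len(wheel)
    scores = []
    for start in range(n):
        agree = sum(
            1 for i, bit in enumerate(stream)
            if bit == wheel[(start + i) % n]
        )
        scores.append(agree)
    return scores


def best_setting(stream, wheel):
    """Return the start position with the highest agreement count."""
    scores = correlation_scores(stream, wheel)
    return max(range(len(scores)), key=scores.__getitem__)
```

On random data every start position should score close to half the stream length; a setting that scores well above that stands out as the likely one.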
I downloaded Tony's Virtual Colossus and found that it is written in JavaScript. If this is the same simulator mentioned above, it would explain the relatively slow speed. JavaScript is an interpreted language, and so can be something like 100 times slower than a compiled language. However, it is highly portable, as the author intended.
It seems a good speed increase might be made simply by converting the workings to C. If the C code can run faster than a PC can send data to an external FPGA, then there would be no point in building the latter.
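One reason software might do well here is that the counting can be done many bits at a time: pack the bit streams into machine words, XOR them, and count the ones. The sketch below shows the trick in Python for brevity; it is an assumption on my part that a C version would do the same with 64-bit words and a population-count operation. The function names are hypothetical.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into one integer, bit i = bits[i]."""
    value = 0
    for i, b in enumerate(bits):
        value |= (b & 1) << i
    return value


def count_agreements(a_bits, b_bits):
    """Count positions where two equal-length bit streams agree.

    XOR leaves a 1 wherever the streams differ, so the agreements
    are the total length minus the number of ones in the result.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("streams must be the same length")
    diff = pack_bits(a_bits) ^ pack_bits(b_bits)
    return len(a_bits) - bin(diff).count("1")
```

In practice the packing would be done once per stream rather than on every comparison, so the inner loop reduces to an XOR and a bit count per word.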