Conversion of a simple Processor to asynchronous Logic
This paper discusses the conversion of a simple 16-bit synchronous RISC-based processor into asynchronous logic. The main goals were to keep the conversion simple, to see how the tools reacted to asynchronous elements, to increase the stability of the processor under different conditions, and to derive general guidelines for converting other processors.
The report first gives a short introduction to the design of asynchronous logic. It then discusses approaches to the conversion before walking through the actual implementations. The area, power and performance of the different implementations are also discussed.
Three different asynchronous implementations were built and tested: a simple sequential request-acknowledge version with no pipelining; a simple Muller-pipeline-based implementation; and a more advanced Muller-pipeline-based version with more pipeline stages, FIFOs to avoid hazards, and register feedback.
The simple Muller pipeline version has been verified on an FPGA, as has the original synchronous version.
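To make the Muller pipeline mentioned above more concrete, the following is a minimal behavioural sketch in Python of the textbook construction: a chain of Muller C-elements in which each stage takes the previous stage's output as its request and the inverted output of the next stage as its acknowledge. The abstract does not describe how the thesis implements its pipelines, so the function names (`c_element`, `step`, `settle`) and the simultaneous-update model are assumptions made for illustration only; real gate and wire delays are not modelled.

```python
def c_element(a, b, prev):
    """Muller C-element: the output follows the inputs when they agree,
    otherwise it holds its previous value."""
    if a == b:
        return a
    return prev


def step(stages, req_in, ack_out):
    """One simultaneous update of every stage in the chain.

    Stage i uses the output of stage i-1 as its request and the inverted
    output of stage i+1 as its acknowledge. req_in drives the first stage;
    ack_out is the acknowledge seen by the last stage.
    """
    n = len(stages)
    nxt = []
    for i in range(n):
        req = req_in if i == 0 else stages[i - 1]
        ack = ack_out if i == n - 1 else stages[i + 1]
        nxt.append(c_element(req, not ack, stages[i]))
    return nxt


def settle(stages, req_in, ack_out, limit=1000):
    """Repeat step() until nothing changes (hypothetical helper).

    The iteration limit is only a safety net for this sketch.
    """
    for _ in range(limit):
        nxt = step(stages, req_in, ack_out)
        if nxt == stages:
            return stages
        stages = nxt
    raise RuntimeError("pipeline did not settle")


if __name__ == "__main__":
    idle = [False, False, False]
    filled = settle(idle, req_in=True, ack_out=False)
    print(filled)  # the rising request ripples through all three stages
    drained = settle(filled, req_in=False, ack_out=False)
    print(drained)  # the last stage holds until it is acknowledged
```

In this model, a rising request propagates forward through every empty stage, while a falling request stops at the last stage until the environment acknowledges it. That is the handshake behaviour a request-acknowledge pipeline relies on instead of a clock edge.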
The main challenge was to decide how the conversion should be done and to get the synthesis tool to synthesize the design correctly. We got the asynchronous versions of the processor to work after synthesis by adding constraints on the paths that exploit propagation delays, which prevents the synthesis tool from optimizing those parts of the circuit. During synthesis and constraint generation it was necessary to analyse the synthesis output to verify that the generated netlist was as expected.
The asynchronous processor implemented on the FPGA is optimized for testability rather than performance: we have added programmable delays and the possibility of controlling the processor with either a clock or the synchronous controller. A weakness of the report is that we have not had time to manufacture an actual ASIC, so the simulations could not be compared against a real ASIC.
RTL simulations, netlist (NTL) simulations and FPGA tests all show that the implemented designs work as expected.
We have not extracted exact timings from the routed ASIC design, as the routing is still a work in progress. It is nevertheless easy to see that each extra pipeline stage on the critical path of the processor increases the area and the power and slows the processor down. This is because our processor is highly sequential, so additional pipelining yields no gain. Still, it was important to see how the tools handled more advanced implementations. As the processor is small (disregarding the RAM blocks), the relative increase in area and power for each pipeline stage is large.
A closer look at dynamic delay scaling, models for signal propagation under different conditions, and timing assumptions would be a natural direction for future work.
Place, publisher, year, edition, pages
Institutt for elektronikk og telekommunikasjon, 2014. 271 p.
Identifiers
URN: urn:nbn:no:ntnu:diva-26485
Local ID: ntnudaim:11568
OAI: oai:DiVA.org:ntnu-26485
DiVA: diva2:747986
Svarstad, Kjetil, Professor
Barzic, Ronan