FINN+ 1.0.0 Public Release
We are happy to announce the public release of FINN+ 1.0.0. FINN+ is an extended version of FINN that we created during our project so that we can integrate new features into FINN quickly. In FINN+ we completely removed the Docker setup: FINN+ can be installed simply via pip and used through an intuitive command-line interface. To stay on top of every update the original FINN repository receives, we regularly pull from FINN while adding our new features. Beyond new features, we also supply bug fixes and quality-of-life improvements, which may take a long time to be merged into the original FINN. This lets us move quickly and explore research ideas while keeping a close connection to the original FINN team. New features include support for Transformers, a high-performance C++ driver, multi-FPGA capabilities, improved streamlining, and much more. For the full list of upcoming and already integrated features, visit our GitHub repository at https://github.com/eki-project/finn-plus.
To give you a quick overview of our work, we describe some of the highlights in more detail:
Transformer Support: We implemented streaming dataflow scaled dot-product attention which, together with additional infrastructure and streamlining improvements introduced in FINN+, allows us to explore the deployment of quantized Transformer-style models.
C++ driver: This optimized driver implementation enables the deployment of FINN accelerators in the datacenter (Alveo platform) without unnecessary performance overhead.
Multi-FPGA: To run larger networks, we need to scale beyond a single FPGA. Connecting multiple FPGAs gives us access to a much larger pool of resources, making it possible to synthesize models that would not fit on a single device. On top of that, we can potentially unroll models further than would otherwise be possible. Currently, this is implemented using the AuroraFlow project, which adds support for low-latency communication over direct optical connections between devices, but extending it to other communication frameworks is possible.
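For readers less familiar with the operation behind the Transformer support, scaled dot-product attention computes softmax(Q K^T / sqrt(d_k)) V. The snippet below is a minimal floating-point reference sketch in plain Python, written only to illustrate that formula. It is not the FINN+ implementation, which works on quantized data as a streaming dataflow accelerator. The function names `softmax` and `scaled_dot_product_attention` are hypothetical and chosen for illustration.

```python
import math


def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Reference computation of softmax(Q K^T / sqrt(d_k)) V.

    q, k, v are lists of row vectors (lists of floats); k and v must have
    the same number of rows, and q and k rows must have the same length d_k.
    """
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    out = []
    for q_row in q:
        # Scaled dot product of this query with every key
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Normalize the scores into attention weights that sum to 1
        weights = softmax(scores)
        # Output row is the attention-weighted sum of the value rows
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    # The query matches the first key more closely, so the first value row dominates
    print(scaled_dot_product_attention(q, k, v))
```

A streaming hardware version has to compute the same quantities, but it processes Q, K and V as streams of fixed-point values instead of materializing whole floating-point matrices as this sketch does.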