The MAMBA model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).
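To make the weight tying concrete, here is a minimal dependency-free sketch. It is not the library's implementation: the class name `TiedLMHead` and its methods are hypothetical, and the Mamba mixer layers that would normally sit between the embedding lookup and the head are left out. It only shows that the output projection reuses the input embedding matrix instead of holding its own parameters.

```python
import random


class TiedLMHead:
    """Toy embedding table plus LM head that share one weight matrix (hypothetical)."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)
        # One matrix of shape (vocab_size, hidden_size), stored once.
        self.embedding = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            for _ in range(vocab_size)
        ]
        # Tying: the head keeps a reference to the same matrix, not a copy.
        self.lm_head_weight = self.embedding

    def embed(self, token_id):
        # Input side: look up the row for this token.
        return self.embedding[token_id]

    def logits(self, hidden):
        # Output side: one dot product per vocabulary row (hidden @ W^T).
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.lm_head_weight]


model = TiedLMHead(vocab_size=5, hidden_size=3)
hidden = model.embed(2)        # stand-in for the backbone's final hidden state
scores = model.logits(hidden)  # one score per vocabulary entry
```

Because both sides point at the same matrix, any update to the embedding is seen by the head as well, and the model stores `vocab_size * hidden_size` fewer parameters than it would with a separate output layer.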