Go to ORKG: http://orkg.org/orkg/predicate/P157002

attention

The main types of attention mechanisms used in SLMs include:

1) Multi-Head Attention (MHA), the standard mechanism in transformer models, with separate query, key, and value projections per head
2) Multi-Query Attention (MQA), which shares a single key and value head across all query heads, while each head keeps its own query projection
3) Group-Query Attention (GQA), which shares key and value heads within groups of query heads, interpolating between MHA and MQA
4) Multi-Head Latent Attention (MLA), which applies low-rank joint compression to keys and values, requiring a much smaller Key-Value (KV) cache
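The relationship between MHA, MQA, and GQA can be sketched as one function parameterized by the number of KV heads. This is a minimal numpy illustration, not an implementation from any particular model; the function name and shapes are my own choices for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_attention(q, k, v, n_q_heads, n_kv_heads):
    """Scaled dot-product attention with shared KV heads.

    n_kv_heads == n_q_heads      -> MHA (one KV head per query head)
    n_kv_heads == 1              -> MQA (single KV head shared by all)
    1 < n_kv_heads < n_q_heads   -> GQA (KV heads shared within groups)

    Shapes: q is (seq, n_q_heads, d); k and v are (seq, n_kv_heads, d).
    """
    group = n_q_heads // n_kv_heads
    # Broadcast each KV head to its group of query heads.
    k = np.repeat(k, group, axis=1)  # (seq, n_q_heads, d)
    v = np.repeat(v, group, axis=1)
    d = q.shape[-1]
    # Attention scores per head: (n_q_heads, seq, seq)
    scores = np.einsum('qhd,khd->hqk', q, k) / np.sqrt(d)
    w = softmax(scores, axis=-1)
    return np.einsum('hqk,khd->qhd', w, v)  # (seq, n_q_heads, d)

seq, d = 4, 8
rng = np.random.default_rng(0)
q = rng.standard_normal((seq, 8, d))
# GQA: 8 query heads, 2 KV heads -> each KV head serves 4 query heads,
# so the KV cache holds 2 heads instead of 8.
k = rng.standard_normal((seq, 2, d))
v = rng.standard_normal((seq, 2, d))
out = grouped_attention(q, k, v, n_q_heads=8, n_kv_heads=2)
print(out.shape)
```

Setting `n_kv_heads=1` here gives the MQA case, and `n_kv_heads=8` the MHA case; the KV-cache footprint shrinks in proportion to `n_kv_heads`. MLA is not shown, since it additionally compresses K and V into a shared low-rank latent before projection.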