Mistral 7B is a 7.3B-parameter model that uses grouped-query attention (GQA) for faster inference and sliding window attention (SWA) to handle longer sequences at reduced cost.
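To make the two mechanisms concrete, here is a minimal PyTorch sketch, not Mistral's actual implementation: GQA shares each key/value head across a group of query heads (Mistral 7B uses 32 query heads and 8 KV heads), and SWA restricts each position to attend only to the previous `window` tokens (4096 in Mistral 7B). The function and variable names are illustrative, and the toy shapes below are much smaller than the real model's.

```python
import torch
import torch.nn.functional as F

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # Position i may attend to positions j with i - window < j <= i.
    idx = torch.arange(seq_len)
    rel = idx[:, None] - idx[None, :]        # rel[i, j] = i - j
    return (rel >= 0) & (rel < window)       # True where attention is allowed

def gqa_swa_attention(q, k, v, window: int) -> torch.Tensor:
    # q: (batch, n_q_heads, seq, head_dim)
    # k, v: (batch, n_kv_heads, seq, head_dim), n_kv_heads divides n_q_heads
    b, hq, t, d = q.shape
    hkv = k.shape[1]
    # GQA: replicate each KV head so a group of query heads shares it.
    k = k.repeat_interleave(hq // hkv, dim=1)
    v = v.repeat_interleave(hq // hkv, dim=1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (b, hq, t, t)
    # SWA: mask out everything outside the causal sliding window.
    mask = sliding_window_causal_mask(t, window).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy shapes echoing Mistral 7B's 32:8 query-to-KV head ratio;
# the real model uses head_dim=128 and window=4096.
q = torch.randn(1, 32, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)
out = gqa_swa_attention(q, k, v, window=4)
print(out.shape)  # torch.Size([1, 32, 16, 64])
```

The payoff of GQA is a 4x smaller KV cache here (8 KV heads instead of 32), while SWA caps per-token attention cost at the window size rather than the full sequence length.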