Holistic, Reliable and Scalable Benchmark for Text-to-Image Models:
  1. Holistic skills evaluation. Rather than focusing on isolated metrics such as accuracy, we measure 13 skills, which can be grouped into five critical categories: accuracy, robustness, generalization, fairness, and bias.
  2. Broad scenarios coverage. HRS-Bench covers 50 applications, e.g., fashion, animals, transportation, food, and clothes.
  3. Standardization. We propose a unified benchmark in which existing models are evaluated fairly across a wide range of metrics.
  4. Holistic prompts generation.