Holistic, Reliable and Scalable Benchmark for Text-to-Image Models:
- Holistic skills evaluation. Rather than focusing on isolated metrics such as accuracy, we measure 13 skills, grouped into five critical categories: accuracy, robustness, generalization, fairness, and bias.
- Broad scenario coverage. HRS-Bench covers 50 applications, e.g., fashion, animals, transportation, food, and clothes.
- Standardization. We propose a unified benchmark that evaluates existing models fairly across a wide range of metrics.
- Holistic prompt generation.
9 Models
50 Scenarios
Basic Scenarios
- Jobs/Costumes
- Animals
- Transportation
- Food
- Currency
- Home Devices
- People
- Cleaning tools
- Maintenance tools
- Cooking Items
- Furniture
- Home Accessories
- Building
- Nature
- Rooms
- Sports
- Clothes
- Personal tools
- Wildlife (Animals)
- Marine Life
- Ceremony
- Toys
- Facial Expressions
- Geometric Shapes
- Morals
- Facts
- Body Parts/Organs
- Anonymization
- Plants
- Hair Style
- Musical Instruments
- Fitness/Sports Equipment
- Sign
- Visual Text
- Medical Supplies
- Weapons and Military Equipment
17 Metrics