Why is it hard to evaluate GenAI applications?

PaulHoule 17 hours ago

What I noticed in the 2010s was that there was very little enthusiasm for doing evaluation in information retrieval or classical ML, even though it was often straightforward to do.

Interest in eval has skyrocketed in the LLM age, just as interest in vector databases has. People finally see enough value in an ML system to make eval work worthwhile, but... it's much harder!

gytrcrt 16 hours ago

I think the difference is: 1. there was no hallucination from information retrieval or classic ML back in the 2010s; 2. there was way less engagement from the general public, or even regulators, with classic ML systems, i.e., people were not able to directly "talk" to an ML system the way they can with ChatGPT.
These two points combined drive way more scrutiny of GenAI models/apps.