99.999% would be fantastic.
90% is not good enough to be a primary feature that discourages inspection (like a naive chatbot).
What we have now is like…I dunno, anywhere from <1% to maybe 80% depending on your use case and definition of accuracy, I guess?
I haven’t used Samsung’s stuff specifically. Some web search engines do cite their sources, and I find that to be a nice little time-saver. With the prevalence of SEO spam, most results have like one meaningful sentence buried in 10 paragraphs of nonsense. When the AI can effectively extract that tiny morsel of information, it’s great.
Ideally, I don’t ever want to hear an AI’s opinion, and I don’t ever want information that’s baked into the model from training. I want it to process text with an awareness of complex grammar, syntax, and vocabulary. That’s what LLMs are actually good at.
According to the Programme for the International Assessment of Adult Competencies (PIAAC) 2013 survey, the median score for the US was “level 2”. 3.9% scored below level 1, and 4.2% were “non-starters”, unable to complete the questionnaire.
For context, here is the difference between level 2 and level 3, from https://en.wikipedia.org/wiki/Programme_for_the_International_Assessment_of_Adult_Competencies#Competence_groups :