Stanford University researchers unveiled an AI model that they say can analyze decades of property records in just a few days, at little expense, to weed out racist language, and they will offer the tool for free across the state and around the country.
Santa Clara County alone has 24 million property records, but the study team focused mostly on 5.2 million records from the period 1902 to 1980. The artificial intelligence model completed its review of those records in six days for $258, according to the Stanford study. A manual review would have taken five years at a cost of more than $1.4 million, the study estimated.
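For context, a back-of-the-envelope comparison using the figures quoted above (the per-record numbers are derived here, not taken from the study):

```python
# Rough cost comparison from the figures in the article above.
# Per-record figures are derived estimates, not from the study itself.
records = 5_200_000

ai_cost_usd = 258
ai_days = 6

manual_cost_usd = 1_400_000
manual_days = 5 * 365  # "five years", approximated as calendar days

print(f"AI review:     ${ai_cost_usd / records:.6f}/record over {ai_days} days")
print(f"Manual review: ${manual_cost_usd / records:.4f}/record over {manual_days:,} days")
print(f"Cost ratio:    about {manual_cost_usd / ai_cost_usd:,.0f}x cheaper")
```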
This is an awesome use of an LLM. Talk about the cost savings of automation, especially when the alternative was the reviews just not getting done.
Specialized LLMs trained for specific tasks can be immensely beneficial! I’m glad to see some of that happening instead of “Company XYZ is now needlessly adding AI to its products because buzzwords!”
Given the error rate of LLMs, it seems more like they wasted $258 and a week that could have been spent on a human review.
LLMs are bad for the uses they’ve been recently pushed for, yes. But this is legitimately a very good use of them. This is natural language processing, within a narrow scope with a specific intention. This is exactly what it can be good at. Even if it does have a high false negative rate, that’s still thousands and thousands of true positive cases that were addressed quickly and cheaply, and that a human auditor no longer needs to touch.
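To put rough numbers on that argument, here is a hypothetical illustration; the prevalence and recall figures below are assumptions for the sake of the example, not results from the Stanford study:

```python
# Hypothetical illustration of the trade-off described above.
# None of these rates come from the Stanford study.
records = 5_200_000
prevalence = 0.005   # assume 0.5% of records contain restrictive language
recall = 0.80        # assume the model misses 20% (a high false-negative rate)

actual_positives = records * prevalence
found = actual_positives * recall
missed = actual_positives - found

print(f"Covenants present (assumed): {actual_positives:,.0f}")
print(f"Flagged by the model:        {found:,.0f}")
print(f"Missed (false negatives):    {missed:,.0f}")
```

Even under those assumptions, the model surfaces roughly twenty thousand true positives without a human having to read all 5.2 million records.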
What do you believe would make this particular use prone to errors?
The use of LLMs instead of someone who can actually understand context.
Did you see something that said it was an LLM?
Edit: Indeed it’s an LLM. They published the model here: https://huggingface.co/reglab-rrc/mistral-rrc
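For anyone who wants to try it, a minimal sketch of loading that model with the Hugging Face transformers library; this assumes the repo exposes the standard causal-LM interface, and the deed excerpt and prompt here are placeholders (check the model card for the authors' actual template):

```python
# Minimal sketch: load the published reglab-rrc/mistral-rrc model.
# Assumes the standard Hugging Face causal-LM interface; untested here.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "reglab-rrc/mistral-rrc"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder deed excerpt; the authors' real prompt template may differ.
deed_text = "Said premises shall not be sold, leased, or occupied by ..."
inputs = tokenizer(deed_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```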