First of all, you could make “fact” a token value in an LLM if you had some pre-calculated truth value for everything in your data set.
But an extra bit of labeling on your training data set really doesn’t help you that much. LLMs already make up plausible-looking citations and website links (and other data types) that are complete garbage, even though their training data has valid citations and website links (and other data types). Labeling things as “fact” and forcing the LLM to output stuff with that “fact” label will get you output that looks (in terms of statistical structure) like valid labeled “facts” but has absolutely no guarantee of being true.
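To make the point concrete, here is a toy sketch. It is a bigram model, not an LLM, and the `<fact>` label token, the three training sentences, and the function names are all made up for illustration. It is trained only on true, labeled statements and then asked for everything it considers a valid continuation of the label.

```python
from collections import defaultdict

FACT = "<fact>"  # hypothetical label token prepended to every training item
END = "<end>"    # hypothetical end-of-sentence marker

# Toy training set: every statement here is true and carries the label.
training = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "rome is the capital of italy",
]

def train_bigrams(sentences):
    """Record which token has been seen following which token."""
    follows = defaultdict(set)
    for sentence in sentences:
        tokens = [FACT] + sentence.split() + [END]
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current].add(nxt)
    return follows

def generate_all(follows, max_len=10):
    """Enumerate every labeled sentence the bigram statistics allow."""
    results = set()
    stack = [[FACT]]
    while stack:
        path = stack.pop()
        if len(path) > max_len:
            continue  # guard against cycles in larger data sets
        for nxt in follows[path[-1]]:
            if nxt == END:
                results.add(" ".join(path))
            else:
                stack.append(path + [nxt])
    return results

model = train_bigrams(training)
outputs = generate_all(model)

true_labeled = {f"{FACT} {s}" for s in training}
false_labeled = outputs - true_labeled

print(len(outputs), "labeled outputs,", len(false_labeled), "of them false")
for sentence in sorted(false_labeled):
    print(sentence)
```

Every output starts with `<fact>` and has exactly the same structure as the training data, yet six of the nine possible outputs are wrong (for example `<fact> paris is the capital of germany`). A real LLM is vastly more capable than this, but the label is still just another token it learns to emit in the right places.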