• 1 Post
  • 19 Comments
Joined 1 year ago
cake
Cake day: June 1st, 2023

help-circle













  • Both are concerning, but as a former academic to me neither of them are as insidious as the harm that LLMs are already doing to training data. A lot of corpora depend on collecting public online data to construct data sets for research, and the assumption is that it’s largely human-generated. This balance is about to shift, and it’s going to cause significant damage to future research. Even if everyone agreed to make a change right now, the well is already poisoned. We’re talking the equivalent of the burning of Alexandria for linguistics research.