  • 2
    You should read the linked blog post about the deal. It says, "All products based on models that consume public Stack Overflow data are required to provide attribution back to the highest relevance posts that influenced the summary given by the model." There are a lot of problems with this, but attribution isn't one of them. A bigger problem is that, as far as I can tell, the system seems to remove any reason someone might have to interact with SO. I guess with fewer people asking questions, that will free everyone up to clean up the existing data with no compensation.
    – ColleenV
    Commented Mar 8 at 19:36
  • @ColleenV eh, that's not actually as clear-cut as you're stating. It also says Google will be able to use the SO data to train their models; code resulting from models that are now trained on SO data likely won't directly cite training material. I've only seen them state that they'd cite the sources that are being summarized.
    – Kevin B
    Commented Mar 8 at 19:40
  • @ColleenV - I am guessing you are not familiar with the way that the "AI" search was implemented at Stack Overflow. Attribution needs to be clear. It cannot be hidden in a click-to-discover zone where it only lists the "relevant" source as a whole without including the actual citation. Moreover, in situations that involve code, the relevance algorithm tends to weigh words higher than code, which is incorrect. I read the linked material. I am familiar with the situation. It is a problem.
    – Travis J
    Commented Mar 8 at 19:54
  • @KevinB The blog says all products based on models that consume SO data, not just summarizers. It doesn't mention details of how it will be attributed. I assume that since it will be done by Google, SO's stab at it is irrelevant. Don't get me wrong; I expect this is all going to go poorly. However, I don't think the post here clearly communicates why the assurance of attribution in the blog is insufficient.
    – ColleenV
    Commented Mar 8 at 20:10
  • eh, no. In every case where it says attribution is a requirement, it's tied to summaries. "All products based on models that consume public Stack Overflow data are required to provide attribution back to the highest relevance posts that influenced the summary given by the model."
    – Kevin B
    Commented Mar 8 at 20:18
  • @KevinB What other products produced from a model require attribution if they aren't a summary of information from a different source? I read "summary" as "result".
    – ColleenV
    Commented Mar 8 at 20:21
  • 1
    As far as I'm aware, no model is currently capable of citing source material; in every case where there are sources, the sources were decided before the prompt.
    – Kevin B
    Commented Mar 8 at 20:23
  • @KevinB That's irrelevant. The agreement (purportedly) says they have to provide attribution. If they can't, they just wasted a whole lot of money. Frankly, a contract that insists on AI attribution is better than the current state of copyright law around AI.
    – ColleenV
    Commented Mar 8 at 20:25
  • 1
    I don't think we'll ever see the actual agreement, so we can't know whether or not an exception was carved out for training.
    – Kevin B
    Commented Mar 8 at 20:38
  • @KevinB I don't understand what you mean. I can train all the models I want on SE data and use them to write software. Must people attribute the SO answers that helped them when they release their software? You only have to provide attribution if you're copying or sharing the information, not if you're learning from it. I suppose you could argue that a model using the data must be released under the same CC BY-SA license if it is released, but I doubt Google will release the model. They'll just sell access to it through an interface.
    – ColleenV
    Commented Mar 8 at 21:08
  • 4
    @ColleenV - Proper attribution is defined in the license agreement, and that is what must be provided. There needs to be direct citation for works used; that is the license. Without direct citation, it is plagiarism unless the entirety of the work is remixed solely from one source and the source is referenced (which isn't the case here). AI remixes from multiple sources; therefore, it must explicitly cite those sources discretely in order to abide by the license.
    – Travis J
    Commented Mar 8 at 22:51