-444

Today, we announced an exciting new partnership with Google to bring Google's Gemini to Stack Overflow and to provide Stack Overflow content directly within Google Cloud. The story of this partnership started about 16 years ago - with you, the community that made Stack Overflow what it is: the internet’s predominant resource for high-quality curated technical Q&A. This resource is ubiquitous and invaluable today, but it took countless hours of hard work and dedication from many community members to get it to this point.

In an ever-evolving world, we’re always looking for new ways to get this repository in front of the world’s technologists: those who can benefit from it and contribute back to it. And with this new joint initiative with Google, we’re laying the groundwork for this reality.

As Ryan said in our announcement today on the blog, with this and future partnerships, “Our mission is to set new standards with vetted, trusted, and accurate data that will be the foundation on which technology solutions are built and delivered to our users.”

You can read more details in the blog post.

53
  • 364
    Don't get me wrong, but I am sure this post would be a lot better received by the community if it kept things factual and refrained from repeating the over-exaggerated marketing buzzwords from the press blog. The style of this post alone gives me the impression the company has lost any sense of how it should communicate with us to be taken seriously.
    – Doc Brown
    Commented Feb 29 at 15:10
  • 110
    I agree with Doc Brown. If you want to put your business and marketing buzzwords in the blog posts and link to them, that's fine. But you should put the key content in community-friendly terms in the corresponding Meta post. Commented Feb 29 at 15:21
  • 82
    If this is about Stack Overflow, why was this posted on meta.stackexchange?
    – Dharman
    Commented Feb 29 at 16:11
  • 229
    So what does this partnership actually entail? This announcement fails to explain what it is.
    – Dharman
    Commented Feb 29 at 16:12
  • 68
    Yeah, to echo the above comments, I have no idea what this announcement actually means.
    – Stuart F
    Commented Feb 29 at 17:15
  • 89
    SE is selling what we voluntarily contributed to SE communities, and we do not even receive attribution or a mention. Now you are putting garbage (a.k.a. AI) on the site after years of filtering and analysis. This is inhumane and disrespectful to the entire community. Commented Feb 29 at 17:33
  • 28
    "to provide Stack Overflow content directly within Google Cloud" this is pretty clear. "to bring Google's Gemini to Stack Overflow" this is not clear at all, not even after reading the blog post. I guess you were forced to write something, trying to counter the sellout. Just "google will bring wine" would have been fine.
    – GrafiCode
    Commented Feb 29 at 20:12
  • 50
    Honestly, what's the point of this post? You haven't said anything meaningful, and the link to the blog post is already in the right-hand column on all the sites AFAICT...
    – Dan Mašek
    Commented Mar 1 at 3:52
  • 31
    "The story of this partnership started about 16 years ago - with you..." This reads strange. How does it come that nobody shared this particular story over the 16 years. And what does "we’re laying the groundwork for this reality" mean? It just sounds like some extra filling words to me. But especially: what does "provide Stack Overflow content directly within Google Cloud" exactly mean? Commented Mar 1 at 7:46
  • 21
    AI is the new "tech bubble" - crash is coming soon!
    – MT1
    Commented Mar 1 at 15:33
  • 13
    I know there are a lot of questions. I've been responding to what I can and will continue to keep an eye on this post and respond to questions that I can over the next week. As I mentioned, we can't talk about contracts or financial details. We will be sharing, as I mentioned, more details about upcoming product initiatives both for SO and the larger Stack Exchange Network in the upcoming weeks. Thank you for your patience.
    – Rosie StaffMod
    Commented Mar 1 at 17:35
  • 56
    It's been pretty evident for a while now that the commitment is indeed to AI, not so much to the community/ies.
    – Sayse
    Commented Mar 1 at 18:56
  • 38
    This assumes that Google is "Socially Responsible" simply because they say so. I see evidence that is contrary to this claim. Is our goal to be socially responsible (what exactly does that mean?) or is it to get our work done?
    – Jay Walks
    Commented Mar 5 at 4:01
  • 37
    "The story of this partnership started about 16 years ago - with you, the community that made Stack Overflow what it is". This makes no sense. The story of the partnership with Google began with...us? This is just empty verbage that is supposed to vaguely make us feel better about having our content mined for your profit. Commented Mar 7 at 1:07
  • 25
    I've read the post 5 times and I still don't get what's actually asked / communicated here. Commented Mar 7 at 8:15

30 Answers

307
+50

So what exactly do we get for curating a multi-billion dollar corporation's data set for them, other than they will make us something cool that we can pay them for?

As of March 2024, Alphabet (Google) has a market cap of $1.662 trillion. They could afford to pay people to curate their model data if they refocused their sprawling HR bureaucracy.

20
  • 107
    This. Especially since they've turned off access to OpenAI, Amazon, and appear to want to put restrictions on the use of the data dumps for others. We are literally curating Google's data set for free. Commented Feb 29 at 14:57
  • 2
    @ThomasOwens "they've turned off access to OpenAI, Amazon" Where did you read it? I thought OpenAI/Amazon/etc. could still use SE dumps (as many others are doing to train LLMs). Commented Feb 29 at 15:49
  • 23
    @FranckDernoncourt The access was turned off via robots.txt. GPTBot and Amazon's crawler (if they respect robots.txt) are prohibited. They may still use the data dumps, but the company has also mentioned "guardrails", which based on the context, alludes to trying to limit the use of the data by people training AI models. No additional information on the guardrails has been released since initially mentioned, though. Commented Feb 29 at 15:53
  • 7
    @ThomasOwens Thanks, got it. Indeed, idk how SE intends to limit the use of the SE dump by people training AI models: How can SE gate access to the Dump that will allow individuals access to the data while preventing "misuse" by for-profit organizations? Commented Feb 29 at 15:58
  • 6
    @FranckDernoncourt I would agree with that. But it still boils down to the "what's in it for me" question. As of now, we're doing curation on data that the company is selling to Google and is limiting (and wants to continue to limit) others from using it. We've said that we don't want these AI-enabled features and we have said what we want the company to implement, yet we aren't getting those. So what are we getting and what about the stuff we keep asking for that gets ignored? Commented Feb 29 at 16:01
  • 22
    Is there anyone who fails to recognise the company's modus operandi by now? First, post a "feel-good news/feature" on Meta. Gain upvotes. Gain some of that lost trust. Then post a possible future feature inviting discussions and feedback from the userbase. Wait a maximum of five weeks and drop a bombshell without any prior warning. Lather, wash, rinse and repeat. Commented Mar 3 at 11:45
  • 9
    @Mari-LouAСлаваУкраїні I fail to recognize any pattern or direction at all. They seem to just be running around aimlessly, chasing "AI-something-something". Because AI. Like any bloated hype before it, there's always a pyramid scheme where everyone is selling sand castles to those further down the pyramid, until the whole thing collapses and everyone still on the hype train loses.
    – Lundin
    Commented Mar 6 at 9:30
  • 2
    @Lundin this pattern of behaviour: positive news, asking feedback to make users believe their opinions are valued, followed by bombshell was happening before GPT appeared. Commented Mar 6 at 9:35
  • 2
    @Mari-LouAСлаваУкраїні Maybe, but they mindlessly tossed everything aside when it did, for the chance to become yet another tulip maniac partaking in the pyramid scheme. Maybe, if they are smart enough, they can grab what they can and then bail before the collapse. But historically, few manage to do that, since greed tends to tempt everyone into staying and investing even more.
    – Lundin
    Commented Mar 6 at 9:43
  • 6
    @Script47 Sometimes getting an answer isn't the point. Sometimes asking a question is an effective way to express something or provoke discussion about a topic.
    – ColleenV
    Commented Mar 7 at 20:50
  • 5
    A t-shirt with "I've done $100K of free work for billionaires and this makes me so cool and 1337" printed on it?
    – tell
    Commented Mar 10 at 19:22
  • 3
    @ColleenV I might - and used to - echo that same sentiment if something actually came from the discussion. Most of the time the discussion is had and SO just lets things blow over then moves onto doing what they originally planned to do.
    – Script47
    Commented Mar 11 at 4:10
  • 9
    Could we, at bare minimum, get access to any and all models that used our data without cost, indefinitely? This wouldn't even put a dent in the cash model they have set up, based on the number of contributing users who would also use said models. In short, give us all the Pro API stuff and anything else that comes up.
    – Organiccat
    Commented Mar 11 at 16:38
  • 7
    That's the galling part, isn't it @Organiccat. The people whose contributions get used don't get anything, not even an upvote. I don't count the attribution required by the licensing. People are entitled to that. The people who get help from the AI tools are going to think "Wow, this AI that Google wrote is awesome." not "Gee, I'm grateful Organiccat took the time to write such a detailed answer, and someone else tagged it properly, and someone else updated it after 5 years so it works for me." People should earn some sort of tool credits or something when their work is used.
    – ColleenV
    Commented Mar 11 at 17:01
  • 1
    Right on cue, the "feel good news" 52 upvotes against 7 downvotes so far... Commented Mar 30 at 12:30
221

I hate to be that guy but... most of the other pieces of news about social software companies selling data to AI companies talked about eye-wateringly high amounts of money getting paid.


I'm hoping there's a financial or other incentive here - and that some of it goes back to the community in terms of support.

And while I see someone has asked about money - I'll ask: does at least some of this include ensuring that the tech debt for the foundational software behind Stack Exchange gets paid off, and that resources are available for the community that generates the high-quality Q&A, considering the recent losses of key staff?

Practically - what's in it for us? What's here that the folks who create, curate and occasionally cheerlead have to be excited about this announcement?

18
  • 103
    I'm all for us getting financially compensated since we're the ones who authored (and curated) the content being sold.
    – TylerH
    Commented Feb 29 at 14:57
  • 19
    that's the real question... what's in it for us? Commented Feb 29 at 15:09
  • 62
    I accept the only way that I'll ever get financially compensated for this is to get hired. That's a whole different kettle of fish. I just want my communities to get the support they need, and hopefully have some of the technical and social debt paid off Commented Feb 29 at 15:13
  • 12
    "Practically - what's in it for us?" Well, at least in theory the idea seems to be that attribution is what's in it for us: instead of models that just munch together information from various posts and not giving any credit at all to the authors of those posts, they're trying to push for models that do give credit to the users who wrote the posts. How meaningful and realistic that goal is is doubtful, but I'd say in principle it is a good goal. Commented Feb 29 at 15:50
  • 6
    @leftaroundabout except... this "attribution" is just attributing the sources that are being fed in to be summarized. They aren't and can't attribute training material.
    – Kevin B
    Commented Feb 29 at 15:53
  • 1
    @KevinB again, I'm doubtful about the efficacy but "required to provide attribution back to the highest relevance posts that influenced the summary given by the model" does say that the authors of the posts that were important for a given AI-response are to receive credit for it. Commented Feb 29 at 15:59
  • 8
    It implies they can do things they literally can't do. They're creating a search engine that then sends the results off to an LLM to be summarized, that's it. They aren't doing some comparison across 35 million answers and citing the one that looks the most similar to the LLM output.
    – Kevin B
    Commented Feb 29 at 15:59
  • 3
    I mean, attribution is nice. It's also covered by other posts, and honestly, from the past few months, I'm more interested in practical benefits for the community, especially in these specific aspects. Commented Feb 29 at 16:06
  • 29
    The practical "benefits" will be less curation being done. Less voting happening to push bad content down and good content up. Less good useful content getting improved/created. A continued decline in content creation/voting activity. The most we're going to see out of this is an LLM Chatbot replacing the search bar.
    – Kevin B
    Commented Feb 29 at 16:34
  • 2
    Well, maybe, but I also want to see what the company has to say, or not if they stay quiet, and what the practical effects, or lack thereof, are. Commented Feb 29 at 16:35
  • 7
    Volunteers contributed 99.9% of what all SE communities are today, but we're not owners of our own content, and y'all know that SE is willing to sell what we made and give us NOTHING in return. Not even a mention or a "thank you". Commented Feb 29 at 17:30
  • 3
    I vaguely recall there being a post or comment by phillipe a while ago promising that selling SE data would mean investing it back in dev time, but I'll need to go dig it up (later, if I remember to).
    – starball
    Commented Feb 29 at 17:34
  • 9
    "does at least some of this include ensuring that tech debt for the foundational software behind Stack exchange gets paid off"... you will know when the next "Reducing workforce" posts hits us.
    – SPArcheon
    Commented Mar 1 at 9:58
  • 5
    The real question is why anyone would pay for anything that's free, as per the license. As long as the source is stated, all content can be used freely. If the source isn't stated, the content cannot be used at all. SO doesn't own the content; they own the right to publish it.
    – Lundin
    Commented Mar 6 at 9:20
  • 6
    I contribute to SE with the expectation that those contributions go into the creative commons to benefit all of humanity. I don't think I should get compensated for that—the data should be freely available to everyone, including for AI training. Luckily, I'm pretty sure it still is: data.stackexchange.com IMO, the real question is, what is Google even getting when our SE content is already free to everyone? Commented Mar 7 at 2:07
181
+50

I noticed that the robots.txt for all network sites has been updated. Google-Extended is no longer blocked. Looking at Internet Archive Wayback Machine captures, the last time Google-Extended was blocked on Stack Overflow was Tue, 02 Jan 2024 19:07:37 GMT. By Tue, 02 Jan 2024 21:41:45 GMT, it was no longer blocked in robots.txt.

Previously, I raised concerns about the process by which bots and crawlers are blocked and unblocked, especially since it's generally done silently - without discussion or announcement.

So, this brings up several questions:

  • Does this agreement with Google include terms preventing others from crawling SE data for AI training? Is this an exclusive deal or is the company also open to other agreements with other companies to give access to SE data for AI training?
  • If no, why are OpenAI and Amazon blocked from crawling and training on SE data? If yes, what is the duration of this agreement and why were all of these bots blocked much earlier? See the earlier question on the process for making decisions on blocking and unblocking crawlers and other bots.
  • Are there updates on the "guardrails" regarding the Data Explorer and the data dumps? This was announced previously and there were allusions to restrictions on who could use this data and how they could use it, especially in AI model development.
  • What is SE's commitment to ensuring our contributions are available to the broadest possible extent under the Creative Commons BY-SA license?

I do want to be clear: SE does have every right to control who can access their platform and how they can access it, including placing limits on crawlers, API usage, and more. However, the content being restricted is composed of the effort of countless people to create and curate. I firmly believe that contributors should have input into and visibility into how access to our contributions is controlled and how they are monetized by the company.
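As a concrete illustration of the crawler-blocking mechanism discussed above, here is a minimal Python sketch using the standard library's robots.txt parser. The robots.txt fragment is made up for illustration (it is not Stack Overflow's actual file): it blocks GPTBot entirely while leaving Google-Extended unrestricted, which is the kind of per-agent asymmetry being questioned.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt fragment, NOT the real Stack Overflow one:
# GPTBot is disallowed everywhere, Google-Extended is unrestricted
# (an empty Disallow means "allow everything"), everyone else is allowed.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow:

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

url = "https://stackoverflow.com/questions/1"

# A well-behaved crawler checks its own user-agent before fetching.
print(rp.can_fetch("GPTBot", url))           # False - blocked
print(rp.can_fetch("Google-Extended", url))  # True - permitted
print(rp.can_fetch("SomeOtherBot", url))     # True - falls under *
```

Note that this is purely advisory: robots.txt only restrains crawlers that choose to honor it, which is why the questions above about the data dumps and "guardrails" matter independently of what the file says.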

28
  • 74
    "SE does have every right to control who can access their platform and how they can access it." - SE does not, however, have a right to control who can access users' content. Creative Commons licenses are open to everyone. and also none of this licensing creates or implies any transfer of copyright. Commented Feb 29 at 16:15
  • 15
    Having a "right" to do something doesn't make doing it socially responsible.
    – Kevin B
    Commented Feb 29 at 16:17
  • 6
    @KevinB That is true. However, making decisions that affect millions of contributors and their work behind closed doors with no consultation also isn't socially responsible. Commented Feb 29 at 16:28
  • 18
    @KarlKnechtel That's not exactly true. To receive the CC license, you need to receive the work. Unless you post it elsewhere, SE can control who can read the platform and therefore receive the work under a CC license. So yes, they do have the right to control who can access user content via the platform. And I would expect that they block known bad actors from accessing the platform, especially if those bad actors degrade the experience for others. Commented Feb 29 at 16:29
  • 10
    @KarlKnechtel CC-BY-SA 4.0 places no requirements on access whatsoever. SE is perfectly free to stop sharing our content entirely.
    – OrangeDog
    Commented Mar 1 at 18:57
  • 2
    My point is: if you care about others having free and equal access to your content, repost it somewhere that you control. Commented Mar 1 at 20:02
  • 1
    @KarlKnechtel Okay but what about preventing access to people who violate the Creative Commons license, and don't provide proper credit to the original source (the users who created that content)? I'm not saying that anyone should prevent access but when one person violates a contract, the other end of the contract is no longer enforceable.
    – mchid
    Commented Mar 5 at 20:38
  • @mchid That part is not my area of expertise; presumably that's the point where lawyers could get involved. But I assume that the Creative Commons NPO has something to say about it. Commented Mar 5 at 20:54
  • 3
    @ThomasOwens True, but since they publish a data dump EvilCorp can just use that and not be bound by robots.txt. Commented Mar 6 at 20:42
  • 2
    @KarlKnechtel Yeah, I just feel like personally, I radically support free and open access but when people abuse that and don't credit the source, particularly for profit, they can get rekt.
    – mchid
    Commented Mar 6 at 21:27
    @OrangeDog yes, of course they can. But ONLY if they stop doing it entirely. As long as the site is up, and a Q&A is relevant to their use, they have an obligation to keep it up on the public site and/or dumps for the sake of attribution. Probably both, due to practical/technical limitations of both formats.
    – BryKKan
    Commented Mar 14 at 8:37
  • @FedericoPoloni that's a fair observation. But given the time between the dumps, it's hard to claim they FULLY replace the access they aim to block through robots.txt
    – BryKKan
    Commented Mar 14 at 8:41
  • @BryKKan no, they are completely free to only provide copies to whomever they choose. They could block half the world from half the content, and that wouldn't be in violation of any license. There is also no legal obligation whatsoever to produce those dumps, nor allow any automated access. The attribution is already available through the web interface to the people who are accessing the content.
    – OrangeDog
    Commented Mar 14 at 9:28
  • That's an overly simplified view, and ultimately just not accurate. They are the original and often sole publisher. Removing "bad" Q&A is one thing. But as long as some content is considered "useful", SE can't remove it from the public site without constructively violating attribution and "share-alike" provisions. The situation with the dumps is even more complicated. In that case you're partially right. They only have to upload the dumps to avoid other obligations. More here: meta.stackexchange.com/a/390156/395778
    – BryKKan
    Commented Mar 17 at 14:11
  • @BryKKan That is very wrong. Just because I give you a work licensed CC BY-SA doesn't mean you have to distribute it. It just means that if you do distribute it or create a derivative work, you have to share alike. There's nothing wrong with contributed content being removed at any point in time. But anyone that did receive it before it was removed can continue to share it under the terms. Commented Mar 17 at 20:11
147

The story of this partnership started about 16 years ago - with you, the community that made Stack Overflow what it is: the internet’s predominant resource for high-quality curated technical Q&A.

This is approximately as tone-deaf as rolling up to a meeting of Georgists and opening with "the story of Monopoly's commercial success started about 120 years ago, with Elizabeth Magie and her passionate critique of rent-seeking."

AI is antithetical to our understanding of quality, curation and Q&A. Nobody involved at the beginning could have predicted what AI would look like in 2024. Users didn't consent to this, either. You can say that the content license does not require such consent, but this misses the point entirely.

This resource is ubiquitous and invaluable today, but it took countless hours of hard work and dedication from many community members to get it to this point.

Then why is the company treating it so flippantly? Why is the company advertising our own content (as repackaged by this venture) back to us, while doing nothing to resolve or even acknowledge countless problems pointed out by the userbase that impede human interaction with the system - many of them nearing a decade old if not older?

What reason can you offer as to why a sensible programming expert should choose Stack Overflow as a platform for sharing knowledge in 2024, when personal blogging is easier than ever and numerous alternatives to the Stack Exchange Q&A model exist where the staff actually respect and listen to their users?

3
  • 24
    I agree with most of what you say, and then you get to codidact... while I support the goals of that project, I think it's a stretch to imagine it as an alternative at this point given that the staff there outnumber the users. It's not a viable competitor at this point. Commented Feb 29 at 16:36
  • 25
    @BryanKrause I try to put in a good word where I can. I believe strongly in the project, so I'm not willing to engage in nihilism about network effects. Things change when people change them. Commented Feb 29 at 19:53
  • 12
    @BryanKrause It’s certainly a stretch to say that the staff outnumber the users. That’s not true. :P But you’re right in spirit, though, more users are welcome. Especially so, more questions. High quality ones, preferred. Commented Mar 1 at 11:05
102

Does "strategic partnership" just mean cooperation, or is there actual money flowing from Google to Stack Overflow?

It was a bit hard extracting this information from the marketing fluff.

7
  • 12
    Not unlikely actual money. The Facebook-Stack Overflow partnership (announced in August 2011) famously allegedly involved boatloads of money (figuratively). Not surprisingly, it flooded Stack Overflow with very low quality off-topic content. Commented Mar 1 at 4:37
  • 14
    I'm probably going to be unpopular here saying this, but for me at least this is not unlike editing Wikipedia or something like that. (Except with a lot fewer edit wars.) Even if content gets resold by someone somewhere, I'm fine with that as long as it's also freely available. And someone has to somehow pay for the servers that get a lot of hits, after all. (As well as all the devs for the software that powers those.) Commented Mar 1 at 5:53
  • 3
    Reddit also partnered with Google for (allegedly) $60M a year, so I wouldn't be surprised here either.
    – GammaGames
    Commented Mar 1 at 20:22
  • 2
    @Dolphin613Motorboat: absolutely, I’m very happy with StackOverflow Inc getting revenue from the network. But as you say, the reason to accept that is that the company should use that money to support the site (servers, software, and all) — and at the moment, it seems to be putting a very minimal amount of resources into the aspects of the site we benefit from.
    – PLL
    Commented Mar 2 at 9:40
  • 1
    @GammaGames From Croesus to Prosus. Cha-ching! Commented Mar 6 at 18:43
  • 5
    "Does "strategic partnership" just mean cooperation" Ask yourself, realistically, why the hell would Google want to "partner" with StackExchange? For what? This is a silent merger; an acquistition of StackExchange by Google. Google has all the leverage, SE has user authored user content/data for Google to "train" it's "socially responsible" computer program. Nobody buys that... Commented Mar 7 at 1:50
  • 3
    @Dolphin613Motorboat I am more than fine with my content being used to train AIs. I posted it here so that people could learn from it, and if "people" now includes ChatGPT, so be it. As far as I'm concerned it's still serving its purpose. My objection is to the gatekeeping, to the "let's put up guardrails and block AIs unless the owners of said AIs pay us". That's when I start to raise the red flag. When it's no longer "publicly available content" but is instead "content viewable specifically on stackoverflow.com and by their sponsors and no one else". Commented Mar 8 at 21:25
78

As such, questions on Stack Overflow (whether by a community member or assisted and curated by AI) are posted only after human review.

The blog post mentions in parentheses here the possibility of AI-assisted or curated questions. Does this indicate a change in policy regarding the prohibition of AI-generated content that is currently in effect on Stack Overflow?

6
  • 29
    The API partnership does NOT change the ban on AI-generated content that some SE sites have put in place.
    – Rosie StaffMod
    Commented Feb 29 at 15:32
  • 58
    I'm confused about all this. So, ChatGPT = Bad & Evil, but Google Gemini = Great Business Opportunity? I believe that there is a conflict of interests somewhere, but, not really being a lawyer, I cannot understand where... Commented Feb 29 at 21:51
  • 10
    This, exactly, was my first thought. Does this mean that Google Gemini will be answering questions via AI as posted? If so, I've got points that I'll spend downvoting them, and I'll be flagging for deletion as AI generated since that's banned on most of the sites I frequent. Heck, we're in the process of getting the "AI generated answers aren't acceptable here" message enabled for at least a couple of sites. I'm guessing it'll take a loooong time for that to actually appear, now. :(
    – FreeMan
    Commented Mar 1 at 12:45
  • 7
    @Rosie Can you state explicitly that there are no plans to have AI-generated content in the Q&A of Stack Overflow? It would just give some peace of mind amid all this confusion. Commented Mar 6 at 20:21
  • 1
    @Rosie In short, the AI ban applies to all LLMs except Google's? Commented Mar 7 at 7:33
  • 2
    Human review, like when a human reviews the EULA on an iPhone before hitting "Accept", right? Commented Mar 13 at 4:58
65

https://stackoverflow.co/company/press/archive/google-cloud-strategic-gen-ai-partnership:

In addition, Stack Overflow plans to leverage Google Cloud’s state-of-the-art AI capabilities to improve their community engagement experiences and content curation processes.

What are the planned features for SO, and what's their release timeline?

9
  • 41
    "What are the planned features for SO, and what's their release timeline?" - ironically given the announcement, likely very irresponsible features launched irresponsibly without consideration for the consequences of their features
    – Zoe
    Commented Feb 29 at 14:16
  • 3
    We'll be sharing updates on planned features and their timeline in the near future.
    – Rosie StaffMod
    Commented Feb 29 at 14:23
  • 76
    You can tell times have changed when staff can’t even make the “6-8 weeks” joke. Commented Feb 29 at 14:25
  • 59
    Remember kids, this is what we're getting instead of onboarding features, because who needs to understand how the site works anyway?
    – Zoe
    Commented Feb 29 at 14:26
  • 2
    I've tried "Ask with AI" once or twice here and won't ever click it again. I am using google to find SO posts, crossing fingers if Google can fix SO search. Other than that I doubt we (users of SO) will get anything useful here, it's rather Google may add to its portfolio something cool.
    – Sinatr
    Commented Feb 29 at 17:17
  • 2
    @Zoeisonstrike how is this different from any of the other "features" launched and feature requests ignored over the lifetime of SE? (i.e., I agree with you 100%, but don't see anything different from past behavior.)
    – FreeMan
    Commented Mar 1 at 12:46
  • "Google may add to its portfolio something that used to be cool." <- fixed that for ya, @Sinatr
    – FreeMan
    Commented Mar 1 at 12:47
  • 4
    @FreeMan When they've launched other features, those features at least can't be repurposed to misinformation generators with a strategically crafted prompt
    – Zoe
    Commented Mar 1 at 12:51
  • @GeorgeStocker Spooky Commented Mar 6 at 18:38
55

The phrases "partnership with Google" and "socially responsible" are mutually exclusive.

Google (or Alphabet) is a corporation engaged in mass surveillance of the public, including of what should be private communications, not only for psychologically manipulating people into buying things, but also in the service of government intelligence agencies. It also engages in political censorship or semi-censorship of online content in web page, video, and other search results - by different rules in different countries, including the US of course.

Several other answers point out the details of how an SE-Inc-Google deal is indeed problematic and questionable, but one could have just safely assumed such details were to be found somewhere, given the general principle.

8
  • 1
    Well, I am glad the conversation has managed to remain grounded and without massive exaggerations!
    – Ant
    Commented Mar 9 at 16:38
  • 12
    @Ant: I'm sorry for veering away from the gritty realism of our 16-year-long partnership with Google described in the post.
    – einpoklum
    Commented Mar 9 at 16:44
  • 4
    Yeah, I came here to post something similar: "How can Google and socially responsible be used in the same sentence?"
    – Conrado
    Commented Mar 11 at 11:32
  • 2
    @Conrado: Well, you just managed to do it :-P
    – einpoklum
    Commented Mar 11 at 20:05
  • 3
    The question must be asked: Since Google Search worked for so many years without claiming "A.I." had anything to do with the algorithm, why is "A.I." all of a sudden in corporations' advertising slogans? Commented Mar 11 at 21:15
  • @einpoklum As a matter of fact, he didn’t. He mentioned it; he didn’t use it. Commented Mar 13 at 15:41
  • 2
    @Ant Exaggerations? Tell that to Amnesty International.
    – wizzwizz4
    Commented Mar 14 at 15:02
  • 3
    @Ant what is said in this post is known, provable and a completely reasonable summary. Google really does do these things, and really is that bad about them. (Although most of the "service [for] government intelligence agencies" is probably in the form of "political censorship".) Commented Mar 15 at 14:29
45

Our mission is to set new standards with vetted, trusted, and accurate data that will be the foundation on which technology solutions are built and delivered to our users.

Where will this "vetted, trusted, and accurate data" come from?

While that description does apply to some highly curated parts of SO, most parts are more accurately described as a garbage heap. I somehow doubt that the company is suddenly changing track from the "more users are more better, quality is an afterthought" mantra of the last years to "let's ensure we only have high-quality contributions", and filtering out all the accumulated garbage seems improbable as well (keep in mind lots of it has a positive score).

2
  • Re "most parts are more accurately described as a garbage heap": Indeed. The garbage started to accumulate in 2010 (the start of Stack Overflow's Eternal September) Commented Mar 1 at 6:03
  • 11
    @This_is_NOT_a_forum at least in 2010 the company was still seeing this as an issue and was trying to do something against it. IMO the tipping point was removing the too broad / too localized / minimal understanding close reasons (ca. 2014 IIRC). Before that, the company trusted their curation-focused users to identify garbage questions and provided the tools to make them go away without much hassle, accepting that some in the grey area would be lost ("optimize for pearls, not sand"). Afterwards they tried to optimize for "welcoming" and not hurting the feelings of people asking bad questions.
    – l4mpi
    Commented Mar 1 at 9:30
41

From here:

In addition, developers using Gemini for Google Cloud will be able to access Stack Overflow directly from the Google Cloud console, bringing them greater access to information so they can ask questions and get helpful answers from the Stack Overflow community in the same environment where they already access Google Cloud developer services and manage cloud applications and infrastructure.

Does that mean that you plan to allow posting questions through a 3rd party interface? Potentially bypassing the guidance and quality filters we have in the ask wizard on the website?

7
  • 20
    That isn't how I interpret it. I'm interpreting it as people can use the Gemini interface to ask questions, and data from Stack Overflow (and maybe other SE sites?) would be surfaced via that interface. But clarity here would be good, since it is ambiguous. Commented Feb 29 at 14:25
  • 12
    @ThomasOwens, you are correct. People can use the Gemini interface and the Duet interface to ask questions and data from the SE Network would be surfaced via that interface.
    – Rosie StaffMod
    Commented Feb 29 at 17:23
  • 18
    @Rosie You may want to surface that in a place that is more visible than a comment. Perhaps edit the original blog post to explain that there. Asking questions is something that you can do through both the SO user interface as well as the Gemini user interface, so that gets ambiguous. Being clear that users can't use this Gemini integration to create new content is vital. Commented Feb 29 at 17:25
  • 9
    so... What's confusing about how you're (the SO team) wording this is it says it will allow developers to see and ask questions. If these are both one action and you're not actually allowing this interface to result in asked questions, then the wording here is wrong/misleading. The same wording was used with the OverflowAI VSCode extension, btw.
    – Kevin B
    Commented Feb 29 at 17:29
  • 2
    The same goes for the OverflowAI assistant that's currently in beta; it was described as including a question-asking assistant that literally just makes up a question.
    – Kevin B
    Commented Feb 29 at 18:10
  • I would hope, for Gemini's sake, that it provides links to SE answers, and keeps the summary information it generates brief. It does not matter if the human Q&A are not perfect - that's to be expected and has not prevented SE/SO from being net-useful to date. The problem with any existing AI is that they cannot really add new knowledge, and they are imperfect at copying knowledge. In the realm of concept questions, what they are best at is linking to information sources. (However, for pattern matching tasks, they can be extremely useful, with an expert human in the loop as a controller.) Commented Mar 10 at 17:45
  • @Rosie there's a confusing grammar mistake or typo in your comment that completely minimizes the value of your comment. Pls fix.
    – Fattie
    Commented Mar 12 at 1:34
35

What is the "added value" of this, if any? I mean to the communities that actually authored the content, not to the receiver of the Google dollar?

The content of this site represents countless years of experience, condensed into carefully worded answers to specific questions. There is already a mechanism to search out appropriate answers to questions, and it works.

The considerable risk of this idea is that our content, correctly perceived to be of very high value, becomes cattle-feed for a badly thought out tech experiment, and might actually end up doing more harm than good due to the shortcomings of the experiment.

This is NOT why I chose to spend time writing answers that share my technical knowledge here. I suspect that I am not alone in that sentiment.

No doubt when I joined the site I ticked a box agreeing to some carefully worded legalese, and even if I didn't there would be little I could do. But this feels to me very much like an abuse of trust and good faith, and I suspect that I am also not alone in that.

4
  • I don't think much more than Our partnership with Google and commitment to socially responsible AI can be said about that.
    – dan1st
    Commented Mar 7 at 8:18
  • 2
    " resources will be put into the public platform." - that is the most that can be said? I disagree.
    – danmcb
    Commented Mar 7 at 9:15
  • 3
    Frankly, I wouldn't go so far as to say searching the site actually works.
    – tripleee
    Commented Mar 11 at 14:56
  • My last answer on Stack Overflow was on Aug 1, 2023 for these exact reasons (and my last self-answered canonical was months before that). Well, except for the actual details of the Google partnership, which weren't publicly known at the time. But fundamentally the situation has not changed. Going forward, I will be writing self-answered canonicals on Codidact corresponding to the questions I saved in order to close SO questions as duplicates; and I will start referring people there instead of SO when I participate in other forums. Commented Mar 15 at 14:21
29

Does this deal include the content of the public Q&A sites under a non-CC license?

10
  • 11
    All subscriber content, as defined in the Public Network Terms of Service, is still subject to Creative Commons License.
    – Rosie StaffMod
    Commented Feb 29 at 14:38
  • 10
    @Rosie just to clarify that I understood this correctly, SE did not provide the content to Google under a separate license? If Google accesses the content, it is still only available to them under the CC license? Commented Feb 29 at 15:03
  • 3
    @Rosie - Really?
    – Mithical
    Commented Feb 29 at 15:06
  • 9
    @MadScientist yes you are correct.
    – Rosie StaffMod
    Commented Feb 29 at 15:18
  • 19
    @Rosie I understand that the details of arrangements between companies are not often shared, but if they haven't purchased a different license to SE content it seems like the community would be interested in knowing what they did purchase. Commented Feb 29 at 15:39
  • 21
    @BryanKrause I agree. It looks like they purchased a change to robots.txt to let their crawler consume SE data for AI training. But what else did they purchase? Commented Feb 29 at 15:44
  • 1
    @Mithical presumably they don't consider that to be "subscriber content". Commented Feb 29 at 16:16
  • 3
    @KarlKnechtel Correct. "Subscriber content" is user-authored content like posts, comments, discussions, etc. Not site help center pages (even though some subscribers (moderators) can partially edit those pages).
    – TylerH
    Commented Mar 4 at 19:15
  • I think it would be useful if Google had access to History, since they clearly got something wrong about Nazis and Vikings, but they're probably not interested in getting things right. Commented Mar 13 at 15:02
  • There was a similar one between Google and Reddit at the same time (my emphasis)— "In February 2024, Reddit announced a partnership with Google in a deal worth about $60 million per year, to license its real-time user content to train Google's AI model. The partnership also lets Reddit get access to Google's "Vertex AI" service which would help improve search results on Reddit." Commented Apr 29 at 17:27
26

Too funny.

Duplicity:

Reminder: Answers generated by artificial intelligence tools are not allowed on Stack Overflow. Learn more

Stack Overflow plans to leverage Google Cloud’s state-of-the-art AI capabilities to improve their community engagement experiences and content curation processes.

Reminds me of the SO policy of no politics on SO, after which SO ownership and management decided to fly their political banner colors in the SO logo.

The model of "A.I.", as I see it, is a marketing racket.

People provide all of their own data to an algorithm humans create, then humans tailor the outcome to suit the consumer of their own data regurgitated back to them. Don't like the answer? No problem - the human will adjust the algorithm to output answers that suit your fancy.

0
24

The Register seems to think this means SE will be charging for API access.

They have a very sensationalist headline reading "Stack Overflow to charge LLM developers for access to its coding content", though there's not really much from the actual press release; it's mostly previous statements backing it up.

Assuming they're incorrect - since the article seems to be a mix of past statements and this, can we confirm/clarify that there's no change to access to APIs / Data dumps and if so, what any changes related to these might be?

And as much 'love' as we all have for the ol' Register, maybe get someone in touch with them to clarify.

20

It isn't clear what happened between now and 11 months ago. Specifically, what changed since the accepted answer (an official company post) to the question "Is SE [going to be] selling our content for AI model training? And what exactly does 'reinvest back into our communities' mean?" An excerpt:

The money that we raise from charging these huge companies that have billions of dollars on their balance sheet will be used for projects that directly benefit the community. ... We may need to tighten up access controls to prevent abuse, but there will be a method for community members to access the api and its data for their own use.

But the community is - you are - being denied your rightful attribution as it stands right now. Prashanth is saying that you should at least get to benefit from the financial impact. This is about protecting your interest in the content that you have created.

I thought that Stack Overflow content was sold to Google then, in April 2023. Note the caveat about there being

a method for community members to access the API and its data for their use

which wouldn't need to be mentioned if licensing and/or ownership rights had not changed in some manner.

Perhaps the financial and other terms of the transaction weren't completed until now, in early March 2024?

Perhaps access to Stack Overflow content was sold to Google in April 2023, but not exclusive access? This conjecture is supported by the plural form ("huge companies") used when stating that--as protection of our interest in the content we created--we will be benefiting as a community from:

money that we [Stack Exchange management] raise from charging these huge companies that have billions of dollars...

Perhaps the current "exciting new announcement" by OP is that Google now has exclusive access to the Stack Overflow "repository" of content. That seems necessary for the following, per OP:

to provide Stack Overflow content directly within Google Cloud.

If an additional granting of (exclusive or other) rights occurred, then a second financial transaction was likely completed between Google and Stack Exchange management. It remains unclear how the infusions of funds, whether in April 2023, March 2024, or both, are realized as benefits to us as the community.


An AI-related partnership with Google AND an independent commitment to socially responsible AI isn't possible. This isn't casting aspersions upon Google, but rather a consequence of the following:

  1. Alphabet has AI principles which are reflected in the business strategy and standards of its wholly-owned subsidiary, Google.
  2. A partnership between Google and Stack Exchange will result in Stack Overflow having de facto the same policies and level of commitment to socially responsible AI that Google does.

Alphabet makes AI policy recommendations for governments worldwide. Given that vast scope, I don't think that Stack Exchange is likely to be broken out separately.

4
  • 10
    There's also a clear difference between what many of us consider "the community" and what Stack seems to consider the community. The reality is the needs of the community that currently uses the sites doesn't necessarily align with the community they're intending to support.
    – Kevin B
    Commented Mar 6 at 15:28
  • Yes, @KevinB you're right. "The community" isn't the same as what it was in 2014 or even 2018. Commented Mar 8 at 2:20
  • 1
    I don't see how it can be exclusive. There may well be a clause that SE Inc. will not actively partner with Google's competitors; but that doesn't prevent them from accessing the public data in accordance with the existing license.
    – tripleee
    Commented Mar 11 at 15:05
  • @tripleee Yes, you're right about it probably not being exclusive. I thought about it some more, and figured out why but then forgot. In the interim, I read that Google is paying reddit to train its models on reddit's content repository... ungh. It coincides with reddit doing its first IPO, announced yesterday. All kind of creepy IMO. Commented Mar 12 at 7:52
18

For what it's worth: I've had enough. I'm deleting my profiles.

16

Edit (May 9) after the months that have passed, and in light of the announcement that SO has now also partnered with OpenAI, I don't fully stand behind this answer anymore. Scroll down for more.


Original post

I'm not happy about this announcement, particularly the fluffy buzzy optimism.

Yet I won't join the general criticism, and indeed say that the actual strategic decision Stack Overflow made here is sensible. Perhaps it is in fact the best decision available (albeit rather in a least-bad-option sense).

Why? Well, let's go through the alternatives. In hypothetical other universes:

  1. SO takes a stance to keep the platform itself AI-free but the data fully available for anybody.
    This is what I would personally find the most agreeable option, but a sober look at the actual ramifications makes it not so great. Namely, it would mean that every AI company would just feast on the data without any consideration for things like attribution for post authors at all. Users looking for help wouldn't care about that, and for them just using a 3rd-party AI would be the increasingly more convenient option. As a result, SO would end up being more or less a ghost town plundered for its past wealth but with little relevant new activity.
  2. SO stays out of AI and locks down access strictly to ensure the data stays in human-only hands.
    This is undesirable in many ways, both in principle due to things like privacy but also practically because this sort of gatekeeping would make it very unattractive for new users. It might in the short term succeed in making the platform more of a "clean, high-quality human content" place, but over time it would again just make it less and less relevant. Simultaneously it would make AIs worse for programming purposes (assuming the access-control mechanisms actually work in that way), because they couldn't access the SO data; opinions may vary on how good or bad that is, but it would stop mattering when an alternative platform pulls ahead of SO in terms of user- and post count and eventually also catches up on quality (perhaps not quality of the average post, but in the sense that far too many questions simply don't get asked on Stack Overflow anymore).
  3. SO opens the gates, encourages companies to train their AIs on the posts by SO users without consideration for original author attribution, and then attempts to get back the wins by integrating some of these AIs back into the platform.
    I suppose this might briefly make it more popular with newbies, but it would alienate experienced users, who are nothing but cows being milked for content in such a system and increasingly don't get to interact with other users at all, only with AI.

In the light of all that, revisit the actual decision in a (perhaps naïvely) amenable reading:

  4. SO stays mostly how it currently is in terms of user interactions, but makes it more cumbersome to mine data in real time. Looking forward, it cooperates with companies who will develop "socially responsible" AI that helps keep the platform attractive to new users (some of whom will hopefully eventually become experienced and valuable members of the community), whilst also respecting the users who answer questions both by making AI-generated help give attribution to the authors whose work was crucial there, and by keeping them in the loop with votes, future questions, etc.

Again: I'm not very optimistic that this will work out. Will the "responsible" AI be attractive enough to help-seekers (compared to independently trained ones)? Will it actually respect answer authors? Will the quality of information circulated on the site be, if not improved, then at least kept at current levels? - who knows. But it is better to try than not to, and of all the possible strategies, I would indeed assess 4. as the most promising.

Of course, one thing that really would be desirable is that SO discloses the economic side: how much does Google pay for being the "responsible partner"? Are there other contestants, and how do they compare in terms of both money and prospective social responsibility?


May 9 edit

By this point, a status update on what concretely SO is doing to ensure the supposedly non-negotiable "socially responsible" AI is overdue. Instead, what they have done is another corporate-speak "oh, we are so good" press announcement, and with OpenAI, of all companies. The same OpenAI that started the whole debacle by unleashing ChatGPT in a state that absolutely does not honour the relevant issues like attribution.

It does now look an awful lot like all this "responsible" stuff is just empty talking, written to distract from the cashing-in. I still have hope that SO will prove the critics wrong, but I'm no longer willing to place any trust in it.

I will not contribute to the platform until more concrete information is available. I see several users are self-vandalising their contributions now. This is perhaps a bit of a knee-jerk reaction, but as it stands now I am considering joining in. Too much is at stake to just let these companies get away with pretty words but no evident actions in the right direction.

15
  • 14
    I don't trust SE on this statement: "SO takes a stance to keep the platform itself AI-free but the data fully available for anybody." given they've literally experimented publicly with the opposite of this.
    – Kevin B
    Commented Feb 29 at 17:35
  • 1
    @KevinB 1.-3. are hypothetical alternative realities. I'll clarify the post. Commented Feb 29 at 17:36
  • 21
    3 already occurred. No amount of SO blocking it in the future will undo the fact that all of the best content has already been taken. This partnership literally gives all current and future content generated by the community to Google to be used for training, not just for OverflowAI, but for whatever they want to do with it. They're certainly not going to start citing questions from SO in their code completion features.
    – Kevin B
    Commented Feb 29 at 17:36
  • I somewhat agree that 3 already occurred, but that content will get stale. An AI that just bases its help on SO posts from 2022 will be pretty useless for many programming problems in 2027. Commented Feb 29 at 17:41
  • 10
    My point is your hypothetical options aren't hypothetical. They've happened, and are very real possibilities for things that are also possibly being launched in the near future. We're way past #2 being a reality at this point, the entire company has been reorganized toward the purpose of building and selling AI solutions.
    – Kevin B
    Commented Feb 29 at 17:52
  • Well, clearly the three of them can't be all non-hypothetical simultaneously, other than in a Schrödinger's Cat way... Commented Feb 29 at 17:54
  • 1
    That's exactly our criticism. they have a stance that their actions aren't supporting.
    – Kevin B
    Commented Feb 29 at 17:55
  • 1
    @KevinB fair enough, why don't you elaborate that in an answer? And also, what would you suggest SO should have done differently to not end up where we are now? Commented Feb 29 at 17:57
  • 8
    I don't feel it's worth the effort. They aren't listening anyway.
    – Kevin B
    Commented Feb 29 at 18:02
  • 5
    As a result, SO would end up being more or less a ghost town plundered for its past wealth but with little relevant new activity. Only if the AI tools are better. And if they are better then why should I care that SO is a ghost town? This answer shares a view with many of the angry delete-my-account rants in that it assumes that a world of superior AI is just around the corner and that, therefore, we should strive to keep SO relevant. I don't see that world just around the corner, but if it comes and obsoletes SO then why should I care? I don't own shares in the company. Commented Feb 29 at 23:45
  • 2
    @PresidentJamesK.Polk false dichotomy. We may neither have superior AI just around the corner, nor any guarantee that SO stays relevant because it's still the better option. Instead, what we have is AI that's good enough - and more convenient - for so many users (95%? 99%?) that it reduces SO to a niche site with too little traffic to really stay up to date, while we more experienced users still need it as a more robust platform / resource compared to the often-erratic AI alternatives. Commented Mar 1 at 8:02
  • 1
    Apart from that - well, I consider the current style of AI with their lack of attribution considerations or human peer-review ethically problematic, and even if they offered all help-seekers more satisfying solutions than SO does, this is not a future we should welcome. Again, I'm not convinced that what Google cooks up here will be ethical either, but I also can't see any better suggestions for how to take steps in the right direction. Commented Mar 1 at 8:11
  • 4
    The company owners expect to be billionaires or it's a disaster. The people who curate the site are volunteers. There is a lot of room between those two for a non-profit site that hews to the original vision of Stack Overflow without compromise. Attribution always has been and always will be an issue that needs attention. As an aside, it's interesting that when I asked Gemini a programming question, its answer included attribution. Commented Mar 1 at 13:51
  • 1
    You left out the part where the AI begins to self-train on its own output in all of this and takes a complete nosedive. What do you think is going to happen when clueless consumers begin to regurgitate remixed prompt outputs back into the space that AI trains from? It simply is not accurate enough to dogfood its own output. That said, if proper attribution were used... at least it could know not to eat its own crap.
    – Travis J
    Commented Mar 14 at 5:22
  • @TravisJ that's certainly something I'm concerned about too, but we should fear it from OpenAI etc. rather than from StackOverflow. SO's asset is specifically the human contributors and they'd be mad (not just in an idealistic-mission sense, also from a pure business perspective) to forgo that by letting such a feedback loop destroy the advantages it offers them over the pure AI companies. Commented Mar 14 at 9:56
15

In the "Defining socially responsible AI" blog post, the first part of the vision is about attribution being non-negotiable:

All products based on models that consume public Stack Overflow data are required to provide attribution back to the highest relevance posts that influenced the summary given by the model. With the lack of trust being felt in AI-generated content, it is critical to give credit to the author/subject matter expert and the larger community who created and curated the content being shared by an LLM. This also ensures LLMs use the most relevant and up-to-date information and content, ultimately presenting the Rosetta Stone needed by a model to build trust in sources and resulting decisions. Sharing credit ties to attribution and with attribution—trust is at the heart.

There are several problems with this.

It is not yet settled that training a model requires attribution. In the United States, there are court cases that may resolve this. However, until there is a more recent precedent from the courts, cases such as Authors Guild, Inc. v. Google, Inc. and Authors Guild, Inc. v. HathiTrust have found that text and data mining is fair use. There are cases where models have regurgitated protected text or images, but there are also pending court cases to address the question of responsibility.

SE has no right to take action to protect our intellectual property. The official stance of the company is that they cannot help us if our intellectual property is misused. Several YouTube channels are taking and using content from the SE sites with insufficient attribution, but the company isn't and can't take action. Should the courts decide that model training requires attribution, it would be our responsibility as the copyright holder to take legal action against the model developers.

Why is the company taking these actions? Why block AI model developers from a rich, robust source of training data and selectively try to "help" the intellectual property owners without consulting us first?

5
  • 2
    Note that this is about working "with Stack Overflow at an API level" so it could be something about restricting API access/who to give API tokens to/setting ratelimits.
    – dan1st
    Commented Feb 29 at 16:11
  • 5
    That there's legal precedent to do something doesn't support it being called "socially responsible".
    – Kevin B
    Commented Feb 29 at 18:05
  • 3
    @KevinB That is true. However, making decisions behind closed doors is also not socially responsible. It would be socially responsible to have discussions about how best to use the intellectual property freely provided by countless people under a free and open source license in a way that is both legal and ethical. Commented Feb 29 at 18:09
  • 14
    I still find it hilarious that they apparently think Google is somehow doing socially responsible AI
    – Flexo
    Commented Mar 1 at 7:32
  • 8
    @KevinB I think that the whole point is that claiming to be "socially responsible" means exactly nothing when existing laws are just vapor and smoke that allow you to be a horrible company as soon as money makes that likeable for you. Is a "tracking beacon pixel in ads" socially responsible? How many 6-8 units will it take for "non-attribution is the new normal, everyone does that" to become the "socially responsible" standard of SO?
    – SPArcheon
    Commented Mar 1 at 9:38
15

What assurances does the community have that this will not open the door for plagiarism via uncited works?

I assume some of us perhaps understand that our work benefits others, even corporate entities... but that was always supposed to come with citation.

Has Stack Overflow the company considered that there may be class action exposure from violating millions of users' content licensing?

11
  • 2
    You should read the linked blog post about the deal. It says All products based on models that consume public Stack Overflow data are required to provide attribution back to the highest relevance posts that influenced the summary given by the model. There are a lot of problems with this, but attribution isn't one of them. A bigger problem is that the system seems to remove any reason someone might have to interact with SO as far as I can tell. I guess with fewer people asking questions, that will free everyone up to clean up the existing data with no compensation.
    – ColleenV
    Commented Mar 8 at 19:36
  • @ColleenV eh, that's not actually as clear-cut as you're stating. It also says Google will be able to use the SO data to train their models; code resulting from their models that are now trained on SO data likely won't directly cite training material. I've only seen them state that they'd cite the sources that are being summarized.
    – Kevin B
    Commented Mar 8 at 19:40
  • @ColleenV - I am guessing you are not familiar with the way that the "AI" search was implemented at Stack Overflow. Attribution needs to be clear. It cannot be hidden in a click-to-discover zone where it only lists the "relevant" source as a whole without including the actual citation. Moreover, when looking at situations which involve code, the algorithm for relevance tends to weigh words higher than code, which is incorrect. I read the linked material. I am familiar with the situation. It is a problem.
    – Travis J
    Commented Mar 8 at 19:54
  • @KevinB The blog says all products based on models that consume SO data, not just summarizers. It doesn't mention details of how it will be attributed. I assume since it will be done by Google, SO's stab at it is irrelevant. Don't get me wrong; I expect this is all going to go poorly. However I don't think the post here clearly communicates why the assurance of attribution in the blog is insufficient.
    – ColleenV
    Commented Mar 8 at 20:10
  • eh, no, in every case where it says attribution is a requirement, it's tied to summaries. "All products based on models that consume public Stack Overflow data are required to provide attribution back to the highest relevance posts that influenced the summary given by the model."
    – Kevin B
    Commented Mar 8 at 20:18
  • @KevinB What other products produced from a model require attribution if they aren't a summary of information from a different source? I read "summary" as "result".
    – ColleenV
    Commented Mar 8 at 20:21
    As far as I'm aware, no model is currently capable of citing source material; in every case where there are sources, the sources were decided before the prompt.
    – Kevin B
    Commented Mar 8 at 20:23
  • @KevinB That's irrelevant. The agreement (purportedly) says they have to provide attribution. If they can't, they've just wasted a whole lot of money. Frankly, a contract that insists on AI attribution is better than the current state of copyright law around AI right now.
    – ColleenV
    Commented Mar 8 at 20:25
  • I don't think we'll ever see the actual agreement, so we can't know whether or not an exception was carved out for training.
    – Kevin B
    Commented Mar 8 at 20:38
  • @KevinB I don't understand what you mean. I can train all the models I want on SE data and use it to write software. Must people attribute the SO answers that helped them when they release their software? You only have to provide attribution if you're copying/sharing the information. Not if you're learning from it. I suppose you could argue that a model using the data must be released under the same CC BY-SA license if it is released, but I doubt Google will release the model. They'll just sell access to it through an interface.
    – ColleenV
    Commented Mar 8 at 21:08
  • 3
    @ColleenV - Proper attribution is defined in the license agreement, and that is what must be provided. There needs to be direct citation for works used; that is the license. Without direct citation, it is plagiarism, unless the entirety of the work is remixed solely from one source and that source is referenced (which isn't the case here). AI remixes from multiple sources; therefore it must explicitly cite those sources discretely in order to abide by the license.
    – Travis J
    Commented Mar 8 at 22:51
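As a concrete illustration of the per-post attribution the CC BY-SA license discussion above revolves around, here is a minimal sketch of what a compliant source-code attribution comment might look like. The post URL, author name, and `attribution` helper are hypothetical illustrations, not part of the agreement or blog post being discussed; the license URL points at CC BY-SA 4.0, the version currently used by the network.

```python
# Sketch: composing a one-line CC BY-SA attribution for code reused from a
# Stack Overflow answer. The post URL and author below are placeholders.

def attribution(post_url: str, author: str,
                license_url: str = "https://creativecommons.org/licenses/by-sa/4.0/") -> str:
    """Return a one-line attribution string suitable for a source-code comment."""
    return f"Adapted from {post_url} by {author}, licensed CC BY-SA ({license_url})"

# Example usage with placeholder values:
print(attribution("https://stackoverflow.com/a/1234567", "example_user"))
```

The point of contention in the thread is that a model output remixing many posts would need the equivalent of one such line per source post, rather than a single undifferentiated "relevant sources" panel.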
15

Google is a proprietary company which obsoleted their own question and answer platform.

Stack Exchange is a question-and-answer website, hosted on its own servers. It stores the emotional labour and research effort of countless users, for the public good.

Why do you think it is justified to give one company control over this huge volume of labour? Why should our contributions be used by the chatbot of only one company? It feels disgusting.

Update: Better to keep this site's content and moderation entirely manual/human.

4
  • 8
    When SE applied AI, things went downhill. IMO, SE went too far with AI. I'd suggest SE refrain from using AI, but that's almost impossible now... Commented Mar 9 at 7:04
  • 1
    It's possible. I don't use "A.I.". Google Search worked for many years without claiming "A.I." had anything to do with the algorithm. If the idea is really just searching, just use Google Search. However, the idea is to squeeze all they can out of the trending "A.I." nonsense. Commented Mar 11 at 21:17
  • 6
    Re "obsoleted their own": The list is long and distinguished! Commented Mar 11 at 21:48
  • @This_is_NOT_a_forum Nicely done. I felt the usual sorrow at your first link, but smiled a bit from the second. (I haven't seen the movie yet. There's already a sequel so I should do both as a double feature.) Commented Mar 12 at 7:56
13

I have an idea.

I suggest allowing AI to post answers and then training AI on all that AI-generated data.

9
  • 2
    No, it gives way better results if the AI gets trained exclusively on the AI-generated content. Commented Mar 8 at 13:55
  • 1
    That's what I meant. Updated answer to be more clear. Commented Mar 8 at 15:08
  • Please, don't. I need human insight. Keep it this way.
    – user1477137
    Commented Mar 8 at 18:13
  • 8
    Better yet, ban all those pesky humans and only let robots post questions and answers, so they would be able to quickly take over the world! Commented Mar 8 at 20:43
  • 3
    Was this sarcasm, or are you advocating for Garbage In, Garbage Out?
    – Criggie
    Commented Mar 11 at 21:14
  • 3
    @Criggie That's all SE/SO is doing. What's good for the goose is good for the gander. The question must be asked though, since Google Search worked so well for many years without ever claiming Google Search was "A.I."-powered or other such slogans, why not use Google Search - without "A.I."? The answer is corporations are trying to squeeze all they can from dullards out of the "A.I." artificial trend corporations created themselves. Commented Mar 11 at 21:19
  • 3
    @guest271314: Google Search doesn't work all that well. Commented Mar 11 at 22:32
  • @This_is_NOT_a_forum It works. Like any research, corroborating evidence is necessary. BTW, SE Web sites are forums. SE Web sites are literally social media. Commented Mar 12 at 0:01
  • 5
    @Criggie I wish it were sarcasm (and my answer is meant as such), but unfortunately this is the real direction in which SO/SE is heading. I'm just cutting to the chase here. Commented Mar 12 at 13:41
12

WIRED reports (my emphasis):

The Google deal will also test how users of the version of Gemini for Google Cloud integration can create new data for Stack Overflow. People who don’t get a satisfactory response from the chatbot will be able to submit their query to Stack Overflow, where once approved by moderators it will be available for the website’s community of users to answer. As the companies prepare for the demo in April, they also are talking about letting users submit improved answers back to Stack Overflow.

Are we stepping away from the volunteer moderator model and paying folks to check queries sent over from Gemini?

15
  • 12
    Oh I doubt anyone is going to get paid.
    – ColleenV
    Commented Mar 13 at 13:06
  • 13
    Also: Where did WIRED get that information from that we didn't get?
    – dan1st
    Commented Mar 13 at 13:07
  • @dan1stiscrying “Their AI may not have all the answers, and so we have a huge ability to help complete that loop,” Chandrasekar says. It looks like they're trying to get more participation by funneling questions and answers from people using Google's AI.
    – ColleenV
    Commented Mar 13 at 13:14
  • 2
    @ColleenV Maybe the claim about moderator involvement was a hallucination.
    – Karsten
    Commented Mar 13 at 15:36
  • 3
    @Karsten The point of my quote was that Wired interviewed Chandraseker and he was the source of the information. I wonder if he is using "moderator" to mean something different from the elected site moderators.
    – ColleenV
    Commented Mar 13 at 15:42
  • 8
    @Karsten Is it CEO-GPT hallucinating?
    – dan1st
    Commented Mar 13 at 16:56
  • 2
    @Karsten the intent isn't to create additional work for moderators. I think a more accurate way to explain how this will work is if a Gemini user doesn't receive a satisfactory response there, they'll be able to submit their question to Stack Overflow. Like any question on the platform, it needs to meet the same standards and guidelines and can be closed if it does not. The article uses "Moderators" when, in our model, everyone takes moderator actions.
    – Rosie StaffMod
    Commented Mar 13 at 20:24
  • 2
    @Rosie What does submitting a question to SO mean? Can they directly ask an SO question from inside GCP/the Gemini UI/whatever Google uses there? Will that include possible AI modifications (e.g. formatting assistants)? Do the Gemini-SO askers need to undergo the exact same process as other askers (including future changes like a possible Staging Ground; please add the Staging Ground, though that's another thing)?
    – dan1st
    Commented Mar 13 at 20:42
  • Also, how else will Gemini (and Gemini users) interact with SO in other ways you haven't told us about?
    – dan1st
    Commented Mar 13 at 20:47
  • 12
    @Rosie You do know that the lack of onboarding (I'm getting so tired of having to repeat "onboarding", but here we go again) has dwindled down the pool of curators too, right? We don't have enough diamond mods nor curators to keep up with the current influx of any singular type of post - adding genAI on top of that (which, by the way, is a blatant violation of the current genAI ban as it stands) still adds more work for curators, and it's still a group that can barely keep up with the current volume. The mod vs. curator distinction does not change the underlying concern in this case
    – Zoe
    Commented Mar 13 at 20:54
  • 9
    The fact you continue to dump more and more work on us (just like you're about to do to SO with that other thing I can't talk about because you lot delayed the announcement) while doing little to nothing to deal with our problems and our lack of resources is completely incompatible with a model where you continue using free labour so you don't have to put resources into dealing with quality. You can only strain your available resources so much before you see a collapse. Pushing a feature going against one of the community-decided policies is probably the single worst thing you can do right now
    – Zoe
    Commented Mar 13 at 20:58
  • 8
    @Rosie "if a Gemini user doesn't receive a satisfactory response there, they'll be able to submit their question to Stack Overflow" -- where will the searching for existing answers (duplicates) come in in that sequence of steps? Or are we just going to get more low-effort crap dumped on us to clean up (and inevitably get flak for doing so)?
    – Dan Mašek
    Commented Mar 13 at 21:09
  • 2
    @DanMašek i mean, in theory, if the answer exists surely gemini will find it, right? right?
    – Kevin B
    Commented Mar 13 at 21:17
  • @KevinB :D :D | However, now that I read it, somehow I wrote "answers", but I really had "questions" in mind. If it's something that has been asked many times before but never got any answers... well, there's probably no point in asking it yet again.
    – Dan Mašek
    Commented Mar 13 at 21:28
  • 8
    "I think a more accurate way to explain how this will work is if a Gemini user doesn't receive a satisfactory response there, they'll be able to submit their question to Stack Overflow. Like any question on the platform, it needs to meet the same standards and guidelines and can be closed if it does not." @Rosie Just to make sure: you do understand that probably less than one in a thousand questions that someone might prompt Gemini with, would actually be suitable for a SE site where they're on topic? At least if it's a remotely technical subject? Commented Mar 15 at 14:14
12

SE originally banned ChatGPT content, ostensibly because of concerns about accuracy. All my interactions with generative AI have convinced me that that concern was well founded. I never get code from ChatGPT or other LLMs that works out-of-the-box.

1
  • 1
    In addition, the post mentions feeding data to Gemini (a chat bot) en.wikipedia.org/wiki/Gemini_(chatbot) - This may be a personal opinion, but of all of the chatbots that I've had to interact with in the past ~5-10 years... none (NONE!) of them have actually been helpful, just a frustrating delay in trying to get to a real person for help/support/info. I really hope that this isn't the case here, but I'm not optimistic.
    – scunliffe
    Commented Mar 18 at 15:49
9

Since Google is blocked in China, does that mean SE will also be blocked in China now?

(It is vital to know if anything could affect users' social credit scores kept by a foreign power.)

3
  • 2
    I guess SE doesn't decide that.
    – dan1st
    Commented Mar 14 at 7:35
  • Or they don't know, or they didn't even cover it in their talks. We have a Chinese language site, so it certainly is relevant.
    – Jesse
    Commented Mar 14 at 22:58
  • 4
    While I don't have direct evidence of such a block already existing, I have seen multiple comments where SE users mention difficulty in circumventing the Great Firewall; and I have noticed that attempts to post in Japanese or even Korean on Stack Overflow happen every now and then while attempts to post in Chinese are vanishingly rare - despite the population ratio. Commented Mar 15 at 14:09
8

Stack Overflow will work with Google Cloud to bring new AI-powered features to its widely adopted developer knowledge platform.

And what does Google give back to the Stack Overflow users who put in the effort to solve technical issues or share their own?

5

I still feel that the best reward for the huge amount of voluntary work incorporated in this Q&A website would be to direct everyone who has a question precisely to this website, and not just to the bare answers, thus feeding back into its community. Anything that delivers the answers while bypassing actual participation is detrimental to the concept of collaborative Q&A.

It's like printing Wikipedia on a paper book and selling it; you can do it but it sucks.

4

janw, Journeyman Geek, ColleenV, and others have asked what the community is getting out of this. We can’t discuss financial and contract details, as you are all aware. What I can share with you is that resources will be put into the public platform.

We have some exciting updates to share in the upcoming weeks. Some initiatives that we know the community cares deeply about are being prioritized. Those announcements will be coming soon.

I know it’s frustrating that I can’t be more specific than that right now, but I do mean it when I say those updates will be coming in weeks, not months. In 6-8 weeks. ;)

14
  • 41
    I don’t exactly trust they’d be things we want.. with the track record of other things that the company couldn’t talk about yet but were announcing soon…
    – Kevin B
    Commented Feb 29 at 19:14
  • 7
    By "public platform", do you mean "Stack Overflow" or the rest of the network? Although there are plenty of things that I've seen SO (both mods and the community) ask for, and due to their size and scale, they have needs that other sites don't have, there are also things that would benefit the network as a whole. Where will the focus be? Commented Feb 29 at 19:49
  • 49
    Thanks - I'm trying to not be super cynical about this. I'll check back in May and reassess the situation, but it looks very bleak to me right now. Alphabet is the last company I would trust to be ethical with their use of AI. I have forbidden my financial advisor from investing my money in Alphabet; it's unlikely I'm going to be inclined to volunteer my time in exchange for what, chat upgrades? Integrating their tainted AI into SE so that not only am I curating the model data, I'm reinforcement training it for free too? Sorry. I'm taking a break until I can be more constructive.
    – ColleenV
    Commented Feb 29 at 19:54
  • 4
    I honestly think their money is as good as anyones if it goes into the long term good of the community. I am somewhat less than enthused at how things have gone post acquisition - so if they can do better than they did in the past year or two, it would be good. Commented Mar 1 at 6:35
  • 11
    It's good to know, @ColleenV, that I don't seem to be the only one who remembers Google's now failed promise to "do no evil" that they made when they launched their IPO.
    – FreeMan
    Commented Mar 1 at 12:52
  • 5
    "...and others have asked what the community is getting out of this. We can’t discuss financial and contract details..." I'm not interested in your financial details. I'm interested in vetting the claims made so far. Re "...to improve their community engagement experiences and content curation processes": how exactly is "improve" measured? Commented Mar 9 at 1:08
  • 5
    Can you update this answer to link to these updates as they are announced? I otherwise have little idea what specific things that get announced might be connected to this, when coming to this page for information. Commented Mar 13 at 5:00
  • 6
    A minor gripe, @Rosie, but please can things not be described as exciting, constantly? Just describe the thing. We'll decide whether it's exciting or not. The same with all other value-judgements. You don't have to insert editorial into your news announcements. Ta.
    – Rounin
    Commented Mar 17 at 10:55
  • 2
    As I promised the other week we would be posting about upcoming initiatives. meta.stackexchange.com/questions/398734/…
    – Rosie StaffMod
    Commented Mar 27 at 14:07
  • OK, it's May, I checked back, and it's not looking much better to me. Best of luck trying to ride out the AI hype-fueled data grabs. I've got to figure out how to be a tech optimist and luddite simultaneously lol. I still want my flying car but not if it's going to steal all my know-how for some giant corporation's AI model so they can sell it back to me in a tool that will make flying my car easier.
    – ColleenV
    Commented May 6 at 17:44
  • "those updates will be coming in weeks, not months". Well, it's been over 8 weeks now. Did I miss an update? Commented May 7 at 9:59
  • @ShadowWizardLoveZelda It's 6-8 weeks, not 8 weeks at maximum.
    – dan1st
    Commented May 7 at 10:01
  • @dan1st nope, from the way of writing sounds like Rosie actually meant it literally, making us hope it will really be under 8 weeks. Commented May 7 at 10:49
  • 1
    @ShadowWizardLoveZelda I think this was after the omission of the 6-8 weeks joke was noticed in a comment: meta.stackexchange.com/questions/398127/… I think the winky smile seals that it was a reference to the joke.
    – ColleenV
    Commented May 7 at 16:06
1

I saw this here. It just says that we need to trust AI; how does that more effectively support developer needs? How is "trust at the heart" of this?

Please don't announce something that leaves out the important points. It doesn't answer how we are supposed to trust AI. If SE uses AI, it will only ruin SE; AI is hard to control.

The reason I use SE is that I want to learn English and Mathematics from Stack Exchange, and I can get correct, trustworthy answers here. Why would I want to use SE if SE used AI? I think I would use other sites or communities instead.

Why would I need a "Gen AI Stack Exchange" if it has the same function as ChatGPT? (I mean for the future.)

I hope the Stack Exchange team and the community think wisely first, and I hope that SE keeps banning ChatGPT forever.

1
  • 2
    Moderators can't really do much about this, they're just unpaid users like us, with some more privileges (and responsibilities). This seems purely the doing of upper management, and the current bunch don't strike me as people interested in building something of lasting value.
    – Dan Mašek
    Commented Mar 22 at 12:02
1

A partnership with Alphabet/Google means I will probably not contribute here again. I just came from contributing a small answer.

Alphabet is a war-criminal company that happily partners with the huge US spy agencies. I'm surprised that this partnership fits with the Stack Overflow team's ethics. I read somewhere that Stack Overflow won't comment on the arrangements, particularly the financial side. That is anything but transparent and community-friendly.

I'm not a core user, but there are lots of people out there like me.

The initiative to make Stack Overflow and the Stack Exchange sites more accessible will just dumb the sites down and make them far less useful.

-1

In an ever-evolving world, we’re always looking for new ways to get this repository in front of the world’s technologists: those who can benefit from it and contribute back to it. And with this new joint initiative with Google, we’re laying the groundwork for this reality.

Given that there's no talking SE out of this, I hope this is just the start of the effort. Siloing Stack Overflow (and/or all Stack Exchange) data in one AI platform is a detriment to society at large. This data was generated by the Stack Exchange communities. It's understandable that Stack Exchange is a company and, as a company, needs to be profitable. That said, this data should be available for the benefit of all humankind, not just one AI system or company.

I implore Stack Overflow to ensure that the data remains democratized and available to all seekers, corporate or otherwise. Stack Exchange must not contribute to an "exclusivity" situation like the one streaming providers have created today, but this time with all of the data on the internet. Imagine one AI being completely oblivious to entire swaths of the internet just because of an exclusive agreement with one company or another. That would be exceptionally awful.

4
  • 8
    "Democratized"? That's funny. Nearly 300 downvotes so far, nobody clamouring for "A.I.", and SE is still moving forward with this nonsense. So the overwhelming sentiment of the people is that they don't want "A.I.", and SE ain't listening. SE management thinks they're barons ruling their fiefdom, and the serfs will learn to enjoy their cake, full of high-fructose corn syrup. SE ain't no democracy. It's an oligarchy run by management, to hell with the people. Commented Mar 11 at 21:21
  • 1
    To think that all this data isn't going to be fed into some AI or another is missing the point. All of this data has already been fed into Google's, Microsoft's, Facebook's, etc.'s systems. It's already there. It's already being used to mine for data on every person on the planet. Making sure that it stays available is what I'm asking for here. The internet would become WORSE if data sets get siloed up and sold off to individual companies, or permanently locked behind paywalls that only the most affluent companies can afford. Imagine the state of streaming, but with oblivious AIs due to exclusive agreements. Awful.
    – Robert P
    Commented Mar 11 at 22:41
  • 2
    What is the point? Can you explain precisely how "improve" in "...to improve their community engagement experiences and content curation processes." is measured? What you say is true. The solution is to turn off the Internet, turn off the machine, then go read a book at the local library, then put the book back on the shelf. "A.I.", as I see it, is just a marketing racket designed as consumption for lazy, inept people who have no clue about real primary source research. It's just the same programmers at corporations and government contractors writing programs to make money. Commented Mar 12 at 0:13
  • 5
    "Someone is going to steal it anyway, we might as well give it away" is not the enlightened position that you think it is. Commented Mar 12 at 20:11
