Stack Exchange restricts access to dump of user-contributed data, critics complain this contradicts license • DEVCLASS (2024)

Stack Exchange, whose best-known site is the developer question and answer resource Stack Overflow, will restrict access to its user-contributed data dump behind a login and agreement not to use the content to train AI models, despite the Creative Commons license that allows the public to “remix, transform, and build upon the material for any purpose, even commercially,” subject to attribution.

The company formerly posted user-contributed data to the Internet Archive every three months, most recently in April 2024. The data is free to download, and accompanying text notes that “all user content contributed to the Stack Exchange network is CC-by-SA 4.0 licensed, intended to be shared and remixed.” The text does note the requirement for attribution, including the author and a link back to the original question.

Now Stack Exchange has informed users about a change to its data dump process, with the policy updated late last week in partial response to contributor complaints. The key changes are that the data dump will be on the Stack Exchange site; accessed via the user profile, which requires login; and that downloaders must agree that “the file is being provided to me for my own use and for projects that do not include training a large language model [LLM].”

The update states that both the product and legal teams have signed off on this modified language.

Users who do not comply may have their access to future downloads removed.

A previous proposed version of the download agreement required agreement to “use this file for non-commercial use;” this has now been narrowed to specify LLM training.

The post has been received negatively by contributors, with one highly upvoted comment claiming that the policy does not comply with the CC-BY-SA 4.0 license, specifically the part that says “you may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.”

The wording of the download agreement also appears to be at odds with the rest of the announcement, which states that “We are requesting that if you intend to use the dump for a commercial purpose, you consider joining the socially responsible AI movement and giving back to the community.” That is a request, whereas the click-to-agree conditions are a requirement.

An FAQ on the matter says that “we are attempting to protect the long-term viability of the Stack Exchange network” and complains that “companies have scraped or otherwise ingested Stack Overflow and Stack Exchange data to train models without proper attribution.”

In February Stack Overflow agreed to integrate with Google’s Gemini for “new AI-powered features” both on Stack Overflow and in Gemini’s output.

A prominent Stack Exchange member set out a plea to “save the data dump,” stating that “they’re still selling the data dump to genAI – they just don’t want genAI companies to get it for free. They’re capitalising on the community’s data, while making it harder for the same community to own our own data.”

It is obvious that developers using AI to answer coding questions or to generate code are less likely to visit the Stack Overflow site, causing a decline in traffic. It is also obvious that the commercial value of Stack Overflow content is diminished if the same content is freely available, even for commercial use.

These are business concerns for Stack Exchange, but they do not change the license under which content is contributed, which is CC-BY-SA 4.0.

There are also unresolved questions throughout the industry regarding when AI-driven output is new content, and when it is more akin to search results that require attribution.

Another issue is that as fewer developers use the site directly, the number and quality of answers will diminish, reducing its value for LLM training and in general.

Might the data dump end up on the Internet Archive regardless? “You do realize that there is nothing you can do to prevent your community from simply maintaining the archive.org mirror for you?” asked a developer. Philippe Beaudette, VP of community, responded: “I do, yes. My hope is that over time, they see that there’s no need.”

We have asked Stack Exchange for further comment on the new policy.
