
What is Guttenberg?

Guttenberg is a bot that searches for plagiarism or duplicated answers on Stack Overflow. It's currently running in SOBotics under the user Guttenberg.

Implementation

Every 60 seconds, Guttenberg fetches the most recent answers on Stack Overflow (the "targets"). For each target, it collects possibly related posts (for example, answers to related questions). Each of those posts is then checked for several characteristics, such as the Jaro-Winkler distance between it and the target. If at least one of the characteristics meets the requirements, Guttenberg posts a report in chat.
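The pairwise check can be illustrated with a small Python sketch. It implements the standard Jaro-Winkler similarity (the usual prefix scale p = 0.1, with the common prefix capped at four characters) and an "at least one characteristic passes" rule. The function names and the 0.9 threshold are hypothetical and not taken from Guttenberg's source, and the real bot checks more characteristics than this single one.

```python
"""Sketch of a per-pair check: Jaro-Winkler similarity plus an 'any check passes' rule.

Function names and the 0.9 threshold are hypothetical, not Guttenberg's actual values.
"""


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they lie within this window of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both matched sequences in order; each out-of-order pair is half a transposition.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for a shared prefix of up to max_prefix characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def should_report(target: str, candidate: str, threshold: float = 0.9) -> bool:
    """Report if at least one characteristic meets its (hypothetical) threshold."""
    checks = [
        jaro_winkler(target, candidate) >= threshold,
        # Further characteristics would be appended to this list.
    ]
    return any(checks)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Higher scores mean more similar text here; the corresponding distance is 1 - similarity, so a threshold has to be read in the matching direction.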

Feedback

The feedback (k/tp or f/fp) will be stored on CopyPastor, where you can also compare the target and the original.

Accuracy

We are already collecting data with CopyPastor to provide statistics, but since there are not that many posts to report, it will take a while (approximately 6-8 weeks) until we have enough data.

Special feature for moderators on Stack Overflow

In one of the latest releases, Petter Friberg implemented the checkuser command, which is available to moderators and room owners. It checks all posts of a given user for plagiarism. In addition to the usual check against linked/related posts, this command uses Google to find other sources from which the user might have copied code or text (we have a limited quota of 100 requests per day).
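A fixed daily quota like the 100 Google requests mentioned above is commonly enforced with a counter that resets when the day changes. The sketch below assumes the limit applies per local calendar day and uses a hypothetical DailyQuota class; it is not taken from Guttenberg's code, and the search call itself is omitted.

```python
from datetime import date
from typing import Optional


class DailyQuota:
    """Hypothetical counter for a fixed number of external search requests per day."""

    def __init__(self, limit: int = 100) -> None:
        self.limit = limit
        self.used = 0
        self.day = date.today()

    def try_acquire(self, today: Optional[date] = None) -> bool:
        """Consume one request if any are left for the current day."""
        if today is None:
            today = date.today()
        if today != self.day:
            # A new calendar day started (assumed reset boundary): count from zero again.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


# Usage with a small limit and fixed dates so the result is deterministic.
quota = DailyQuota(limit=3)
results = [quota.try_acquire(date(2020, 5, 6)) for _ in range(4)]
print(results)  # [True, True, True, False]
print(quota.try_acquire(date(2020, 5, 7)))  # True: the counter was reset
```

When try_acquire returns False, a command like checkuser could still run the linked/related-post check and simply skip the web search for that user.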

Source

The code of the bot is available on GitHub. Feel free to open an issue if you have an idea to improve the bot or have found a bug. Please read this if you want to contribute.

Continuous integration

Every GitHub release/tag is automatically deployed to my server, and Guttenberg restarts to apply the update.

Current version: 1.0.0

  • Is this in the workshop only? cc:@BhargavRao
    – M--
    Commented May 6, 2020 at 2:22
  • @M-- No. Usually, it runs in the main room, but from time to time we have issues with Redunda. In that case, I switch to dev mode. In dev mode, Guttenberg does not try to contact Redunda and runs in the workshop only.
    – FelixSFD
    Commented May 6, 2020 at 4:12
  • Thanks for the info. Another question: is there a way to send a report to Guttenberg? Take this example (the answer from dcendents and this one on "From Jenkins, how do I get a list of the currently running jobs in JSON?"). The second is a poor rehashing attempt. I cannot find a way to report this to Guttenberg as a false negative.
    – M--
    Commented May 18, 2020 at 22:56
  • @M-- Unfortunately, this is not implemented yet. There is an open issue on GitHub, but this might take 6-8 weeks. :-(
    – FelixSFD
    Commented May 19, 2020 at 7:54
  • Any chance this would work for users on other SE sites besides SO? A certain user is currently suspended on four sites for plagiarism, and has plagiarized on at least four other sites.
    – LShaver
    Commented Aug 27, 2021 at 1:51
  • Plagiarizing earlier posts is also a trick used by some spammers. Perhaps these posts could be reported to SmokeDetector automatically if they meet some (probably simple) criteria?
    – tripleee
    Commented Oct 4, 2021 at 17:19
