54

UPDATE

We've built and will be releasing a plagiarism flag on 22 March 2023 (on SO only). For more info, see this MSO post - Plagiarism flag and moderator tooling launching to Stack Overflow


A while ago, one of the mods from SO requested help in dealing with plagiarism on SO. While we didn’t have an answer at the time, we have been working with the moderators to better understand this issue so that we could come up with a plan to help tackle this problem.

For more context, please see the linked question and Catija's response to it.

What we want

We need your feedback on our tentative plan for handling plagiarism on the network (described below).

Note: the content below is (mostly) duplicated from the answer. We're not putting it in quote markup, as both I and Catija worked on it, and the quote would be gigantic. Yes, we understand the irony here.


Our [tentative] plan

The solutions we’re working on are:

New plagiarism flag type for questions & answers

This new flag type will require flaggers to include a link to the original source and give them a space to add an explanation. This will also allow for these flags to be bucketed into the same category on the flags dashboard, separating them from other custom flags and giving us a better idea of the volume. This will only be enabled on Stack Overflow initially, but other sites will be able to request it if they have need.

[Note: this new flag type would initially be available only on SO, but depending on need, we could potentially roll it out on other sites.]

Allow mods to deny reputation for deleted plagiarized posts

Our system automatically allows the poster to keep reputation earned when a post is over 60 days old and has a score of 3 or higher. In the case of plagiarism, this means that moderators often have to request that CMs ‘disassociate’ older posts to ensure the user isn't earning reputation for content they didn't create. When posts are disassociated, they're no longer connected to the poster's account, meaning that the fact they've had posts deleted as plagiarism disappears. So we need a solution that removes reputation without disassociation. Because we want to allow users to fix their own posts and get them undeleted, this solution would only impact the user's reputation while the post is deleted.
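To make the rule above concrete, here is a minimal sketch of the retention check and of how the proposed override might sit on top of it. The thresholds (over 60 days old, score of 3 or higher) come from the paragraph above. The `DeletedPost` type, the `deleted_as_plagiarism` field, and the function name are hypothetical illustrations, not the actual Stack Exchange implementation.

```python
from dataclasses import dataclass

# Thresholds quoted in the post: "over 60 days old" and "a score of 3 or higher".
MIN_AGE_DAYS = 60
MIN_SCORE = 3


@dataclass
class DeletedPost:
    """Hypothetical stand-in for a deleted post; not a real Stack Exchange type."""
    age_days: int
    score: int
    deleted_as_plagiarism: bool = False  # assumed marker set by the proposed mod tool


def keeps_reputation(post: DeletedPost) -> bool:
    """Return True if the author keeps the reputation while the post is deleted."""
    # Proposed override: a plagiarism deletion denies reputation regardless of age/score.
    if post.deleted_as_plagiarism:
        return False
    # Existing behaviour as described: old, well-received posts keep their reputation.
    return post.age_days > MIN_AGE_DAYS and post.score >= MIN_SCORE
```

Because the check only looks at the deleted state, undeleting a fixed post would restore the normal reputation behaviour, which matches the intent that the penalty applies only while the post is deleted.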

Make it easier for mods to see when users have a history of plagiarism

Once flags have been handled, it becomes difficult for mods to see how often a user has had plagiarism flags validated on their posts. To simplify future investigation of repeated plagiarism, we want to ensure these handled flags are easy to see. We think one of the best ways to achieve this is by improving the "flagged posts for user" moderator page. This page shows every post of a user's that has ever been flagged, but it's not sortable and can't be filtered. We think having this page look more like the "flags raised by user" page will be more useful for moderators in many cases.

Create a post notice explaining why the post was deleted and guiding the author

Often, moderators leave comments linking to the original sources, both to notify the poster and to show high-reputation users why the post was deleted. We are going to investigate adding a post notice to answers deleted as plagiarism that will notify the poster and give them helpful information about how to properly attribute their post on our network and how to get attention to have it undeleted if the issues are fixed.


These are the initial changes that have been planned. We’ll be monitoring the following over the next couple of months:

  • Time that mods are spending on flags related to plagiarism
  • % of CM escalations related to plagiarism
  • Answers to this question

Feedback

If you have any thoughts on the proposals above, or any concerns about what we are planning, please let us know in the answers below. We want to ensure that we understand your line of thought, so please focus on your concerns or issues and why you are concerned, rather than on specific solutions to the problems.

7
  • 3
    Are there any plans to (attempt) an at least partial migration of current plagiarism flags?
    – Zoe
    Commented Nov 7, 2022 at 17:04
  • 5
    @ZoestandswithUkraine there are talks, and it is something that we are considering. However, there is concern about accuracy and that it could cause more harm than good. TL;DR: yes, there have been discussions, but there are a lot of factors to consider.
    – Bella_Blue StaffMod
    Commented Nov 7, 2022 at 17:14
  • 4
    @Bella_Blue Accuracy in identifying which of the existing flags are for plagiarism is something we can solve, even if you want 100% human review of the selection. Many moderators already go through the flag list multiple times picking out things that aren't plagiarism, or specifically looking for plagiarism flags to handle (finding one or a few plagiarism flags to handle is easy, but finding all of them isn't). While it would be some work, it wouldn't be that hard to have moderators identify which ones are plagiarism and which aren't.
    – Makyen
    Commented Nov 7, 2022 at 17:35
  • 2
    In conjunction with this effort, it might also be worthwhile to write up some tips for identifying posts that might be plagiarized, if we don't have that already.
    – ColleenV
    Commented Nov 7, 2022 at 18:15
  • 4
    @FranckDernoncourt if that is about handling plagiarism on Quora/external sites, then I don't think this will change anything, since SE also doesn't own the copyright and can't act on the owners' behalf. This is about handling plagiarism posted on SE. Commented Nov 9, 2022 at 3:06
  • @V2Blast and/or Bella: maybe feature this, now that we have only one featured MSE question? Thanks. :) Commented Nov 10, 2022 at 9:38

4 Answers

19

Multiple Sources

As someone who has spent a fair amount of time looking through posts of frequent plagiarists (as well as coordinating cleanup efforts of plagiarised posts) I have frequently run into situations where answers are plagiarised from multiple sources. It's a fairly common occurrence that someone will plagiarise from the top answers (multiple) on the same question. I've also seen multiple external sources used e.g. an article and an unrelated infographic from a different source.

How does this solution scale to multiple sources?

Is there going to be a built-in way to indicate there are multiple sources, or is it just going to be the case that you add one link to the link field and include the rest in the explanation?

Validation and Character Limits

Links are frequently very long, especially if you're trying to link to a specific part of the page and also provide an archive.org link because the page content has changed since the answer was posted.

Additionally, I've seen that users have a tendency to... uh... just put anything in boxes and click buttons. Genuinely, people using the wrong field for the wrong thing is a regular occurrence (on Stack Overflow at least).

I have two questions related to the input fields here.

  1. What are the character limits for each field? (please consider that a single off-site link can sometimes be over a hundred characters long and there are commonly multiple links required)
  2. Are we going to do any text validation to ensure that the source link field contains something that at a minimum is link-like?
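To illustrate what a "link-like at a minimum" check could look like, here is a rough sketch. It warns rather than rejects, so that multiple links and non-link text still pass through. The character limit, the pattern, and the function names are all assumptions made for illustration; the post does not state the real limits or describe any actual validation.

```python
import re

# Hypothetical limit: the real character limits are not stated in the post.
MAX_SOURCE_LENGTH = 500

# Deliberately loose: anything from http:// or https:// up to the next whitespace.
LINK_PATTERN = re.compile(r"https?://\S+")


def find_links(text: str) -> list[str]:
    """Return every link-like token in the field, so multiple sources are fine."""
    return LINK_PATTERN.findall(text)


def check_source_field(text: str, max_length: int = MAX_SOURCE_LENGTH) -> list[str]:
    """Return a list of warnings; an empty list means nothing looked wrong."""
    warnings = []
    if not text.strip():
        warnings.append("source field is empty")
    elif not find_links(text):
        warnings.append("no link-like text found")
    if len(text) > max_length:
        warnings.append("source field exceeds the character limit")
    return warnings
```

A field holding an original URL plus an archive.org copy would produce no warnings, while a plain description of a printed source would produce a "no link-like text found" warning that the flagger could choose to dismiss.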
5
  • 2
    "Are we going to do any text validation to ensure that the source link field contains something that at a minimum is link-like?" - how would this work if someone, say, copied something from a non-digitized source (incl. exams in some cases)? That's something that Puzzling sees sometimes and I wouldn't be surprised to see it on Literature. While this may be rolling out only on SO for now, I'd like to not exclude the possibility of it rolling out to the rest of the network.
    – Mithical
    Commented Nov 8, 2022 at 19:32
  • That's a good question @Mithical. I guess my follow-up question is: how is a mod confirming the source in those cases? Even when I'm flagging plagiarism where the original source is a printed textbook, I'm linking to an online copy/preview that shows the relevant section of the text (e.g. Google Books) so the mod can confirm it (without purchasing a textbook or tracking down some arbitrary piece of print media). Commented Nov 8, 2022 at 19:36
  • 1
    Though I do see now that restricting that field to only links may not be a one-size-fits-all solution network-wide. I still have concerns about the dedicated flag becoming ineffective if users are filling it out incorrectly. At the scale of Stack Overflow, minimising issues on the front-end is enormously helpful, and anything that can be done to address that is beneficial (even if it's not by doing link-like validation). Commented Nov 8, 2022 at 19:39
  • 3
    1. Probably similar to the current limits. We haven't defined them yet. 2. If we do something to validate the links, we'd do it very cautiously since some posts are from multiple sources (as you point out), so we don't want to (for example) reject spaces or non-link characters.
    – Catija
    Commented Nov 8, 2022 at 20:15
  • 1
    Maybe a warning is better then. Something like "Plagiarism flags generally contain a link to at least one online source. Are you sure you want to submit anyway?" Commented Nov 9, 2022 at 0:12
13

I'm really glad to see how seriously you guys are taking this plagiarism issue and how much you have planned to lessen the weight on mods' shoulders. The backlog of plagiarism flags is large, but so is the backlog of plagiarized posts that aren't yet discovered and flagged. I haven't been flagging many recently, but there really are a lot still to be found.

Let me get just one concern out of the way first:

Make sure that the mod-deletes-as-plagiarism-and-removes-reputation tool clearly indicates somewhere that it was deleted as such

Mods having a tool to bypass the post deletion reputation protection you mentioned is obviously needed even from my POV. I'm sure you guys have processed a lot of dissociation requests as a result of the inundation of plagiarism flags. For the purposes of transparency, I think it best to ensure that there's some recording on the post for a delete-as-plagiarism event that removes reputation in an event where it wouldn't ordinarily remove reputation. I know that the mods wouldn't use this functionality without merit, but I feel like the natural response to mods gaining this type of ability could make folks feel a bit uneasy.

Noting somewhere on the post/in the timeline that rep was removed because it was plagiarism ensures that the tool really is only being used for the purpose of removing truly ill-gotten reputation.

Can we improve the workflow for mods going through and sourcing other posts on a repeat-offending plagiarist?

When the going gets tough, the tough... Say "screw it" and delete everything.

One of the biggest time-sinks for mods with regards to plagiarism is going through each of that user's contributions and seeing if they're possibly plagiarized from somewhere else. In cases where the user has a lot of contributions and over X% of them are plagiarism, mods eventually get to the point where they delete everything that's likely to be plagiarized and prompt the user to give them a list of what's not plagiarized. In even more extreme cases, they escalate it to you guys for further review to determine whether the whole profile should just be deleted.

What I'm wondering is... Can we ease that process at all? I'm sure this particular pain point has been mentioned to you guys before. Has any discussion taken place regarding how to ease this particular workflow? I can't imagine it's an easy one to tackle, but I'm curious where you've landed so far.

3
  • 2
    Regarding your first point, isn't that covered in "We are going to investigate adding a post notice to answers deleted as plagiarism..."? Or are you asking for the post notice to explicitly mention that votes have been reversed?
    – 41686d6564
    Commented Nov 7, 2022 at 18:03
  • 2
    @41686d6564standsw.Palestine More to the latter point. I suppose if the tool is wrapped up in a one-shot event (i.e. Mod chooses "Mod->Delete as plagiarism", post is deleted, rep is revoked, and a post notice is added) it would naturally follow that the timeline would indicate all of those actions accurately. My only concern is that the deletion event doesn't mention that it's a special remove-rep-and-delete event, and that that particular part is unclear to other viewers.
    – Spevacus
    Commented Nov 7, 2022 at 18:06
  • 5
    Regarding the deleted as plagiarism - that's a fair point. We were focusing the tool on deleted posts only so that the likelihood of anyone running across the post was lower, but it does make sense to include that in the post history. We want to make it easier for mods to review posts in bulk, but that will likely require additional tooling that would automate the post review process - which is a bigger thing - whether that means building something from scratch or getting a third-party service (similar to what professors use for essays).
    – Catija
    Commented Nov 7, 2022 at 20:01
11

Would it be possible to have a built-in moderator tool for assessing whether a post is likely to be plagiarised or not?

I realise this is potentially a big ask, and I have no idea how easy it would be to coordinate or code, but there are such tools already in existence, such as Turnitin and iThenticate, which are used in academia and other settings. Maybe it could be possible for the SE company to make some agreement with a company like Turnitin, to integrate some of their software with our sites? Wild idea, but I thought it's worth asking and now might be exactly the right time to ask.

My experience with these tools in an academic setting leads me to caution against using them too blindly: for example, it would be a terrible idea to run every new post through such a test and automatically flag it if the similarity percentage is too high. But it could be very helpful for cases when moderators are already investigating potential plagiarism: to get a tentative estimate of similarity to existing sources (I wouldn't give Turnitin results any more credibility than that, just a tentative estimate) in order to figure out whether a post is worth further investigation or not.

In my experience of dealing with plagiarism as an SE mod (and I definitely agree it's one of the most time-consuming mod jobs, although I don't know if my workflow for it is anything like that of the SO mods), the most time-consuming part is checking through all of a user's posts and trying to figure out which ones are plagiarised. If we had a handy button like "get similarity percentage report list for all this user's posts", it could massively reduce the time spent on such cases. More investigation would still be needed to verify the automatically generated reports, but at least we'd have a clearer idea of where to look, which would make our work more efficient.
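As a very rough illustration of what such a "similarity percentage report" might compute, here is a sketch using Python's standard-library `difflib`. It is not how Turnitin or any SE tool works. The function names, the dictionary inputs, and the 60% default threshold are assumptions, and a real tool would need a way to find candidate sources in the first place.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> float:
    """Character-level similarity of two texts as a percentage (0 to 100)."""
    # autojunk=False avoids difflib's heuristic that discounts frequent
    # characters in sequences of 200 or more items.
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


def similarity_report(
    posts: dict[str, str],
    sources: dict[str, str],
    threshold: float = 60.0,  # hypothetical cut-off for "worth a closer look"
) -> list[tuple[str, str, float]]:
    """List (post_id, best_source_id, percent) for posts at or above the threshold."""
    rows = []
    for post_id, post_text in posts.items():
        best_id, best_pct = None, 0.0
        for source_id, source_text in sources.items():
            pct = similarity_percent(post_text, source_text)
            if pct > best_pct:
                best_id, best_pct = source_id, pct
        if best_id is not None and best_pct >= threshold:
            rows.append((post_id, best_id, round(best_pct, 1)))
    # Most similar first, so reviewers start with the likeliest cases.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

The output is only a tentative estimate to prioritise human review, in line with the caution above about not trusting similarity scores blindly.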

3
  • 7
    This is kinda what I'm getting at in the second part of my comment on Spevacus's answer. :) We definitely think it'd be valuable and time-saving but it's a big build and we'd probably want to prioritize it in the bigger ecosystem of tooling improvements rather than as part of this project, if that makes sense?
    – Catija
    Commented Nov 7, 2022 at 20:33
  • 1
    Even an internal (within site? Within network?) tool for detection, similar to but fuzzier than the identical posts one, might make sense. That said, I was recently bamboozled by a spammer who had copied another post so... Commented Nov 8, 2022 at 5:17
  • @Catija Fair enough, I just wanted to get the suggestion out there in an answer. IMHO, the current changes are all positive and improve various aspects of handling plagiarism but won't make a significant change in the time needed to handle plagiarism flags. Commented Nov 8, 2022 at 5:35
5

Here at MSE we have a definition of plagiarism, but individual sites vary in their requirements for attribution and in the length of included text they allow; some simply allow rewriting in your own words and treat that as sufficient.

We should adopt a universal standard for all our sites, so that the rule is enforced consistently.

Different academic institutions have slightly different rules. Should we adopt one of those, since the sites where plagiarism is most disliked are those where most users have an academic background?

Here are some instances where we differ from the academic standards of some institutions:

"We discussed this on meta.serverfault a short while ago and decided that you should Steal comments that answer the question and post them as an answer. You can always tick the Community Wiki box if you're not comfortable earning reputation for someone else's work.".

There is also this answer which @Makyen had a hand (no complaint, no foul) in:

"What should moderators (or passing editors) do about suspected plagiarized or copyrighted images in posts?":

"... Complaints of copyright violation must be sent to Stack Exchange's designated agent by the copyright holder or their designated agent — third parties are not competent to judge whether an image has been used by permission.

As a user, you may want to inform posters who reuse someone else's image that they may be in violation of copyright. Moderators will not handle copyright infringement complaints. This is outside their attributions. (If moderators handled copyright infringement complaints, that could make them legally responsible for any mistake that they make.)".

As you can see from the FAQ on plagiarism, it doesn't follow what is taught at some institutions, and users who follow the FAQ may run afoul of site policy:

"Paste the URL and point out who the author is.".

That won't be enough on many sites, nor does it address self-plagiarism.

Examples of the differences between plagiarism policies:

Avoiding Plagiarism - Paraphrasing

  • "In writing papers, you will paraphrase more than you will quote.
  • For a report or research paper, you may need to gather background information that is important to the paper but not worthy of direct quotation. Indeed, in technical writing direct quotation is rarely used.".

At the Massachusetts Institute of Technology plagiarism is avoided simply by citing the source and paraphrasing.

What is Plagiarism?

  • "For purposes of the Stanford University Honor Code, plagiarism is defined as the use, without giving reasonable and appropriate credit to or acknowledging the author or source, of another person's original work, whether such work is made up of code, formulas, ideas, language, research, strategies, writing or other form(s).
  • Moreover, verbatim text from another source must always be put in (or within) quotation marks.".

At Stanford you are allowed to copy verbatim, but need to use attribution and quotation marks.

Plagiarism

  • "Plagiarism is defined as use of intellectual material produced by another person without acknowledging its source, for example:
  • Wholesale copying of passages from works of others into your homework, essay, term paper, or dissertation without acknowledgment.
  • Use of the views, opinions, or insights of another without acknowledgment.
  • Paraphrasing of another person’s characteristic or original phraseology, metaphor, or other literary device without acknowledgment.".

Berkeley only requires acknowledgement, and doesn't focus on length.

Citing sources and Plagiarism

  • "Plagiarism is defined as any of the following:
  • The use of words or ideas of another with no credit to the original source.
  • Paraphrasing or restating the ideas of another without acknowledgment.
  • Presenting data or facts that have been borrowed without full citation to the original source. Fraud—a more serious offense—involves the presentation of fabricated data or facts.
  • Using a unique term or concept that one has read, without acknowledging its author or source.
  • Using the ideas of other students without giving proper credit. These ideas include those obtained through discussion groups, notes, the electronic transfer of notes, and the work of students who have participated in previous class discussions. While verbal plagiarism is more difficult to detect and enforce, the same standards and principles of credit and attribution apply.
  • Copying a computer program from another student or any other source, or deriving a program substantially from the work of another, without permission and acknowledgment.".

Self-Plagiarism

  • "It is expected that all work submitted for any HBS course has been completed solely for that course. Self-plagiarism includes the practice of submitting identical or very similar material for credit in two separate courses. While the school encourages you to continuously integrate your learning across courses, it is not acceptable to submit the same deliverable (or a very similar deliverable) to more than one course.".

At the Harvard Business School they seem to be the most strict:

  • You must give credit even when paraphrasing.
  • When using a unique term or concept you must give credit.
  • Work of another student (user) requires both permission and acknowledgement.
  • Self-plagiarism is to be avoided; you can't keep offering the same answer to different questions, even across different websites. I see a lot of that, without reference to previous usage.

Proposed answer: Stack Exchange should write a clear policy to be enforced for all sites.

People learn in school what is considered plagiarism, but the principles differ depending on the university; a single clear definition for our use is helpful to set out what our standards are.

3
  • 1
    Please explain how you think the existing all-Stack Exchange policy regarding plagiarism in the Help Center article "How to reference material written by others" is unclear and/or insufficient. The "How to reference material written by others" article exists in the Help Center on every site. You're asking for "Stack Exchange should write a clear policy to be enforced for all sites", but such a policy already exists, and has existed since at least 2014, so please explain how you feel it needs to change, or be made more specific/explicit/detailed.
    – Makyen
    Commented Nov 11, 2022 at 14:26
  • 1
    Rob, your first and last couple of paragraphs don't appear to clearly identify anything that needs to change in the Help Center article on SE that defines plagiarism. How is the existing definition in the help center article unclear? What issues do you feel are not covered in the existing Help Center article? None of the definitions which you've included here appear to address the "length of included text", which appears to be one of the major issues that you want defined. I'm not saying that the Help Center article can't be improved, but it's really unclear to me what you feel is deficient.
    – Makyen
    Commented Nov 11, 2022 at 15:17
  • @Makyen, thanks. I've updated my answer.
    – Rob
    Commented Nov 11, 2022 at 16:18
