32

If a web article (or even a whole website) has no date of publication, is there a way to determine the year of publication so that the reference is complete? Or should the year be omitted from the reference?

4 Answers

43

The Publication Date may be in the webpage's meta data or source

There may be date information contained in the meta tags in the page source. If you are not using a tool (see below) to extract this information, you can view the page source and attempt to interpret it. There are many different ways publication date information may be stored in meta tags. The most common ones will begin with <meta. The ways in which an appropriate date may be stored in these tags are too numerous to cover in a single post. If you are not familiar with what these might look like, a tool (see below) to extract the data will be quite helpful.

Extracting the date from meta tags, and picking the correct date (there may be several), can be complex. How, exactly, to do so varies from website to website. In addition, each site may change its format from time to time. If you are going to be referencing more than one or two webpages, a well-maintained tool to extract the reference information will be very helpful.

As an example, this page (randomly selected for testing some time ago), which does not display a date if you have JavaScript turned off (dates are displayed if JavaScript is turned on), contains appropriate publication dates in the following tags (there are more tags that contain dates that are not appropriate):

<meta name="parsely-pub-date" content="2014-10-14T23:45:00.011Z" data-ephemeral="true">
<meta name="date" content="2014-10-14T23:45:00.011Z" data-ephemeral="true">
<meta name="iso-8601-publish-date" content="2014-10-14T23:45:00.011Z" data-ephemeral="true">
<time class="published-at time-based" datetime="2014-10-14T23:45:00.011Z" itemprop="datePublished">
<time class="updated-at__time" datetime="2014-10-15T05:07:37.564Z">
<meta name="pubdate" content="Tue Oct 14 2014 19:45:00 GMT-0400" data-ephemeral="true">

That page also contains the publication and update dates located in multiple <script> tags, links, and various other tags.
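
To make the idea of reading dates out of the page source more concrete, here is a minimal sketch in plain JavaScript (runnable in Node.js, not a bookmarklet). The function name findDateCandidates and the rule used to label a tag as "published" or "modified" are assumptions made for illustration. Real pages vary widely, and the sketch only recognises values that start with an ISO 8601 style date, so a value written like the pubdate one above is skipped.

```javascript
// Sketch: list date-bearing <meta> and <time> tags found in an HTML string.
// The labelling rule (look for "pub", "updat" or "modif" in the tag's
// name/itemprop/class attribute) is an illustrative assumption, not a standard.
function findDateCandidates(html) {
  const candidates = [];
  for (const [tag] of html.matchAll(/<(?:meta|time)\b[^>]*>/gi)) {
    // <meta> carries the date in content="..."; <time> carries it in datetime="...".
    const value = /\s(?:content|datetime)="([^"]*)"/i.exec(tag);
    // Keep only values that begin with YYYY-MM-DD; everything else is ignored.
    if (!value || !/^\d{4}-\d{2}-\d{2}/.test(value[1])) continue;
    const label = /\s(?:name|itemprop|class)="([^"]*)"/i.exec(tag);
    const labelText = label ? label[1] : '';
    let kind = 'unknown';
    if (/updat|modif/i.test(labelText)) kind = 'modified';
    else if (/pub/i.test(labelText)) kind = 'published';
    candidates.push({ label: labelText, value: value[1], kind });
  }
  return candidates;
}
```

Each result still has to be judged by a person: a tag labelled "unknown" (for example name="date") may or may not be the publication date.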

While it is possible to extract this type of information by hand, it is usually much more effective to allow a tool to do so for you. The tools mentioned in the last section of this answer should do a reasonable job of extracting the publication date from most webpages.

However, if you don't have one of those available, here is a bookmarklet that toggles a display, at the top of the page, of all tags except <A> and <IMG> which contain a date in YYYY-[M]M-[D]D format, or in English-language Month [D]D, YYYY or [D]D Month YYYY format (dates that appear only in the page's displayed text are not shown):1,2

javascript:void((function(){var toRm=document.getElementById('showTagsWithDate');if(toRm){document.body.removeChild(toRm);return;}document.body.insertAdjacentHTML('afterbegin','<div id="showTagsWithDate" style="background-color:white;color:black;">Tags with a date in YYYY-[M]M-[D]D format; or in English (US, or non-US format):<ul/></div>');var myul=document.body.firstChild.lastChild;var tags=[];function addMoreDates(reg){var addTags=document.documentElement.innerHTML.match(reg);if(addTags){addTags.forEach(function(newTag){if((newTag.indexOf('<a ')===0)||(newTag.indexOf('<img '))===0){return;}if(tags.indexOf(newTag) === -1){tags.push(newTag);}});}}addMoreDates(/<[A-Z][^>]*\D(20\d\d|1\d\d\d)[\s\/\-.,]\s*([1-9]|0[1-9]|[1][012])[\s\/\-,.]\s*([1-9]|0[1-9]|[12]\d|3[01])\s*(st|nd|rd|th){0,1}\D[^>]*>/img);addMoreDates(/<[A-Z][^>]*\b([1-9]|0[1-9]|[12]\d|3[01])(st|nd|rd|th){0,1}[\/\-\s]\s*(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)[\s,.\/\-][\s,.\/\-]?\s*(20\d\d|1\d\d\d)\b[^>]*>/img);addMoreDates(/<[A-Z][^>]*\b(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)[\s,.\/\-][\s,.\/\-]?\s*([1-9]|0[1-9]|[12]\d|3[01])(st|nd|rd|th){0,1}[\s,.\-]+(20\d\d|1\d\d\d)\b[^>]*>/img);if(tags.length===0){tags=['No tags with dates.'];}tags.forEach(function(tag){myul.appendChild(document.createElement('LI')).appendChild(document.createTextNode(tag));});document.body.firstChild.appendChild(document.createElement('BR'));})())

Bookmarklet searching all text on the page for dates

In case you need to look through all the text, including displayed text, for dates, the following bookmarklet displays just the dates contained in the HTML, without any context. Be careful when using its results: you will need to determine why a particular date is in the text, and you will need to verify that each result is, in fact, a date, because the much broader regular expressions used here can match some strings which are not dates. Still, it may provide some hints as to what you should be looking for. The formats displayed include those in the previous bookmarklet plus YYYY month/season; month/season YYYY; [M]M-[D]D-YYYY; [D]D-[M]M-YYYY; and YYYYMMDD. Duplicates are not displayed, and the list is sorted from earliest to latest (except for some formats). This bookmarklet is quite long.1,2 It toggles showing/not showing all the dates it finds in the text.

javascript:void((function(){var toRm=document.getElementById('showTagsWithDate');if(toRm){document.body.removeChild(toRm);return;}document.body.insertAdjacentHTML('afterbegin','<div id="showTagsWithDate" style="background-color:white;color:black;">Dates in the HTML in multiple numeric and English language formats:<ul/></div>');var myul=document.body.firstChild.lastChild;var tags=[];function addMoreDates(reg){var addTags=document.documentElement.innerHTML.match(reg);if(addTags){addTags.forEach(function(newTag){if(tags.indexOf(newTag)===-1){tags.push(newTag);}});}}addMoreDates(/(20\d\d|1\d\d\d)[\s\/\-.,]\s*([1-9]|0[1-9]|[1][012])[\s\/\-,.]\s*([1-9]|0[1-9]|[12]\d|3[01])\s*(st|nd|rd|th){0,1}(?=\D)/img);addMoreDates(/([1-9]|0[1-9]|[12]\d|3[01])(st|nd|rd|th){0,1}[\/\-\s]\s*(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)[\s,.\/\-][\s,.\/\-]?\s*(20\d\d|1\d\d\d)/img);addMoreDates(/(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)[\s,.\/\-][\s,.\/\-]?\s*([1-9]|0[1-9]|[12]\d|3[01])(st|nd|rd|th){0,1}[\s,.\-]+(20\d\d|1\d\d\d)/img);addMoreDates(/\b([1-9]|0[1-9]|[1][012])[\s\/\-.,]\s*([1-9]|0[1-9]|[12]\d|3[01])[\s\/\-,.]\s*(20\d\d|1\d\d\d)\s*\b/img);addMoreDates(/\b([1-9]|0[1-9]|[12]\d|3[01])[\s\/\-.,]\s*([1-9]|0[1-9]|[1][012])[\s\/\-,.]\s*(20\d\d|1\d\d\d)\s*\b/img);addMoreDates(/\b(winter|spring|summer|fall|autumn|january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)[\s,.\/\-][\s,.\/\-]?\s*(20\d\d|1\d\d\d)\b/img);addMoreDates(/(20\d\d|1\d\d\d)[\s,.\/\-]\s*(winter|spring|summer|fall|autumn|january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)/img);addMoreDates(/\b(20\d\d|1\d\d\d)(0[1-9]|[1][012])(0[1-9]|[12]\d|3[01])\b/img);tags.sort(function(a,b){var aVal=Date.parse(a);var bVal=Date.parse(b);if(aVal===bVal){return 0;}if(aVal>bVal){return 1;}return -1;});if(tags.length===0){tags=['No dates detected in page.'];}tags.forEach(function(tag){myul.appendChild(document.createElement('LI')).appendChild(document.createTextNode(tag));});document.body.firstChild.appendChild(document.createElement('BR'));})())
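
Because broad regular expressions like these can match strings that are not dates, a small validity check can help when reviewing the results. The following is a hedged sketch that is separate from the bookmarklets: the helper names isRealCalendarDate and plausibleIsoDates are hypothetical, and it only handles the numeric YYYY-[M]M-[D]D form. It rejects impossible dates (month 13, 30 February) and returns the remaining ones, de-duplicated and sorted.

```javascript
// Sketch: keep only numeric YYYY-[M]M-[D]D matches that are real calendar dates.
// Helper names are hypothetical; other date formats are deliberately not handled.
function isRealCalendarDate(year, month, day) {
  // Build the date in UTC and check that nothing "rolled over"
  // (e.g. month 13 or 30 February silently becoming another date).
  const d = new Date(Date.UTC(year, month - 1, day));
  return d.getUTCFullYear() === year &&
         d.getUTCMonth() === month - 1 &&
         d.getUTCDate() === day;
}

function plausibleIsoDates(text) {
  const found = new Set();
  const pattern = /(?<!\d)(\d{4})-(\d{1,2})-(\d{1,2})(?!\d)/g;
  for (const [, y, m, d] of text.matchAll(pattern)) {
    if (isRealCalendarDate(Number(y), Number(m), Number(d))) {
      // Zero-pad so that plain string sorting is also chronological sorting.
      found.add(`${y}-${m.padStart(2, '0')}-${d.padStart(2, '0')}`);
    }
  }
  return [...found].sort();
}
```

A date that passes this check is only well-formed; you still have to work out why it appears in the page.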

The above bookmarklets are not the best tool for the task of finding the correct date of a page. A purpose-built referencing tool can devote significantly more logic to obtaining, formatting, and displaying such dates with the context needed for you to determine the correct one to use. In addition, for dates that include the month written out as a word, the above bookmarklets only look for English months.

Last resort: use the Last Modification Date

If the page does not contain an explicit publishing/creation date in either the displayed text or the meta data in the page source (viewed either by hand or using a referencing tool), you should include the Last Modification Date. The Last Modification Date is the date that the publisher (the company hosting the page) claims as the date the page was last modified (i.e. the date it was published). It should be used only as a last resort, if no other date is available. It should not be your first choice for a date.

The Last Modification Date is usually the date/time at which the primary file for the page was changed, which is determined by the modification timestamp stored with that file. This date may, or may not, be accurate. It may only represent the date and time to which the clock was set on the system where that file is stored at the last point the file was modified. It may, or may not, consider any dates on additional resources which are loaded onto the page from other locations (e.g. images, or dynamically by JavaScript). While it is not guaranteed to be accurate, it is the "publication date" provided by the publisher (the company hosting the webpage).

For dynamically generated webpages, the Last Modification Date may be the current day and time. While that might not be the information you desire, it is the date/time the page you are viewing was assembled from its base content and presented to you. The system serving the webpage to you may be composing the page from various different sources (e.g. an article fetched from a database which is combined with today's banner, a header, a footer and appropriate ads). While it would be preferable to have the date of the primary content you are referencing, the current date may be the only one the server can provide. Alternately, it might provide the date of the last modification of the primary content. What date is provided depends on how the server has been programmed. This is one of the reasons that using the Last Modification Date should be your last choice when no more authoritative date is available.

In addition, it should be noted that the Last Modification Date may be completely invalid. You should use your own judgement when looking at the date as to its validity.
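
One way to apply that judgement is to compare the reported date with the time at which you fetched the page. This is only a sketch: the function name judgeLastModified, the category labels, and the one-minute tolerance are all assumptions, not part of any standard. It expects an HTTP-style date string, such as the value of a Last-Modified response header.

```javascript
// Sketch: rough sanity check of a reported last-modification date.
// The labels and the default tolerance are illustrative assumptions.
function judgeLastModified(lastModified, fetchedAt, toleranceMs = 60 * 1000) {
  if (!lastModified) return 'missing';
  const modified = Date.parse(lastModified);
  if (Number.isNaN(modified)) return 'invalid';
  const ageMs = fetchedAt.getTime() - modified;
  // A date later than the fetch time points to a wrong server clock.
  if (ageMs < -toleranceMs) return 'in-the-future';
  // A date equal to "now" suggests the page was assembled on request.
  if (ageMs <= toleranceMs) return 'generated-on-request';
  return 'plausible';
}
```

For example, calling judgeLastModified with a header value that is within a minute of the fetch time yields 'generated-on-request', which is a hint that the date says nothing about the primary content.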

You can obtain the last modification date using a bookmarklet. One that will show the last modification date is:1,2

javascript:void(window.alert('The page was last modified on '+document.lastModified))

When I have used a last modification date in a reference I will usually include a note like (last modified: xxxx-xx-xx) which indicates how the date was obtained. This is needed because such dates are not normally displayed to the viewer of a webpage.

Bookmarklet with both last modification date and dates in the page

The following bookmarklet combines the above two bookmarklets to show both the last modified date and the dates in the page. It will toggle showing/not showing all the dates it finds.

javascript:void((function(){var toRm=document.getElementById('showTagsWithDate');if(toRm){document.body.removeChild(toRm);return;}var tags=[];function addMoreDates(reg){var addTags=document.documentElement.innerHTML.match(reg);if(addTags){addTags.forEach(function(newTag){if(tags.indexOf(newTag)===-1){tags.push(newTag);}});}}addMoreDates(/(20\d\d|1\d\d\d)[\s\/\-.,]\s*([1-9]|0[1-9]|[1][012])[\s\/\-,.]\s*([1-9]|0[1-9]|[12]\d|3[01])\s*(st|nd|rd|th){0,1}(?=\D)/img);addMoreDates(/([1-9]|0[1-9]|[12]\d|3[01])(st|nd|rd|th){0,1}[\/\-\s]\s*(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)[\s,.\/\-][\s,.\/\-]?\s*(20\d\d|1\d\d\d)/img);addMoreDates(/(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)[\s,.\/\-][\s,.\/\-]?\s*([1-9]|0[1-9]|[12]\d|3[01])(st|nd|rd|th){0,1}[\s,.\-]+(20\d\d|1\d\d\d)/img);addMoreDates(/\b([1-9]|0[1-9]|[1][012])[\s\/\-.,]\s*([1-9]|0[1-9]|[12]\d|3[01])[\s\/\-,.]\s*(20\d\d|1\d\d\d)\s*\b/img);addMoreDates(/\b([1-9]|0[1-9]|[12]\d|3[01])[\s\/\-.,]\s*([1-9]|0[1-9]|[1][012])[\s\/\-,.]\s*(20\d\d|1\d\d\d)\s*\b/img);addMoreDates(/\b(winter|spring|summer|fall|autumn|january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)[\s,.\/\-][\s,.\/\-]?\s*(20\d\d|1\d\d\d)\b/img);addMoreDates(/(20\d\d|1\d\d\d)[\s,.\/\-]\s*(winter|spring|summer|fall|autumn|january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)/img);addMoreDates(/\b(20\d\d|1\d\d\d)(0[1-9]|[1][012])(0[1-9]|[12]\d|3[01])\b/img);tags.sort(function(a,b){var aVal=Date.parse(a);var bVal=Date.parse(b);if(aVal===bVal){return 0;}if(aVal>bVal){return 1;}return -1;});if(tags.length===0){tags=['No dates were detected in the page.'];}document.body.insertAdjacentHTML('afterbegin','<div id="showTagsWithDate" style="background-color:white;color:black;">The page was last modified on '+document.lastModified+'<br>Dates in the HTML in multiple numeric and English language formats:<ul/></div>');var myul=document.body.firstChild.lastChild;tags.forEach(function(tag){myul.appendChild(document.createElement('LI')).appendChild(document.createTextNode(tag));});document.body.firstChild.appendChild(document.createElement('BR'));})())

Always include an Access Date for a webpage or web resource

Even if you include another date, you should always include the date on which you accessed the page: an Access Date. The Access Date is the only date which you know is correct, and it is critical information for anyone who needs to verify that they are seeing the same resource you did.

Referencing is about providing enough information such that a reader can verify the information being referenced with the source material, or read more detail if interested. Unlike paper publications, webpages are not static. At any time, they can be changed or removed by the person/company in control of the website, or disappear if the company/institution goes out of business. Thus, even if there is a "publication date" contained in the page, the date you accessed the page should always be provided. If you do not provide the access date, then there is no way for a reader to know exactly which version of a page you were viewing. While a particular webpage may not have changed from the time it was created to when you viewed it, there is no way for you to know that it will never change in the future.

The date you accessed the page can be included in the reference as a note (e.g. (accessed xxxx-xx-xx)).
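
The note conventions described in this answer can be sketched as a small helper. The function name formatWebReference and the overall layout of the string are assumptions for illustration only; they do not correspond to any particular citation style. The sketch prefers an explicit publication date, falls back to a labelled last-modified date, and always appends the access date.

```javascript
// Sketch: assemble a reference string with the date notes described above.
// The layout is hypothetical and not tied to any formal citation style.
function formatWebReference({ title, url, published, lastModified, accessed }) {
  if (!accessed) {
    throw new Error('An access date is always required.');
  }
  const parts = [`${title}, ${url}`];
  if (published) {
    parts.push(`(${published})`);
  } else if (lastModified) {
    // Last resort: say how the date was obtained.
    parts.push(`(last modified: ${lastModified})`);
  }
  parts.push(`(accessed ${accessed})`);
  return parts.join(' ');
}
```

Adapt the output to whatever reference format you are required to use; the point is only that the access date is never optional.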

Create an archive at the time you access the page

When using a webpage as a reference, I almost always cause at least one archive to be created at the time I view/reference the webpage. This helps ensure that readers wishing to view the source material are able to see exactly the same page I did, even if the webpage is changed or removed.

In addition, depending on the format you are using for the reference (e.g. online vs. paper), it may be helpful to include in the reference a direct link to the archive which was created of the page you are referencing.

To make an archive at the time I am viewing the page, I commonly use a bookmarklet to archive.org, or other archiving site, which causes them to create an archive at that time.

Bookmarklet to create an archive of the current page on archive.org

The bookmarklet to create an archive of the page you are currently viewing on archive.org is:1

javascript:void(window.open('https://web.archive.org/save/'+location.href))

In my opinion, archive.org is, by a significant margin, the most well established and stable archiving site. However, there are others. Below are bookmarklets for a couple of additional archiving sites:

Bookmarklet to create an archive of the current page on archive.is

Here is a bookmarklet to create an archive of the page you are currently viewing on archive.is. To actually create the archive, you will need to click on "Save the page" on the page which is opened when you use this bookmarklet:1

javascript:void(open('https://archive.is/?run=1&url='+encodeURIComponent(document.location)))
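
If you keep a list of pages to reference, the same two URL patterns used by the archiving bookmarklets can be built outside the browser. This is a minimal sketch: the function name buildArchiveUrls and the http/https check are assumptions, while the URL patterns themselves are copied from the bookmarklets in this answer.

```javascript
// Sketch: build the "save this page" URLs used by the archiving bookmarklets.
// buildArchiveUrls is a hypothetical helper; it only builds strings and
// does not contact either archiving site.
function buildArchiveUrls(pageUrl) {
  const parsed = new URL(pageUrl); // throws a TypeError if pageUrl is not a URL
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('Only http and https pages can be archived.');
  }
  return {
    // archive.org takes the page address appended as-is.
    archiveOrg: 'https://web.archive.org/save/' + pageUrl,
    // archive.is takes the page address as an encoded query parameter.
    archiveIs: 'https://archive.is/?run=1&url=' + encodeURIComponent(pageUrl),
  };
}
```

Opening either URL in a browser has the same effect as using the corresponding bookmarklet on that page.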

Tools for getting reference data, including dates, from webpages

There are many tools which can help you obtain reference information from webpages, including dates. If you are not already using such a tool, one place where you can find free tools to view the reference information contained within a webpage is Wikipedia. While the formats used for references on Wikipedia are probably not appropriate for most cases, Wikipedia has been dealing with the issue of obtaining reference information from webpages for quite some time. Their page "Help:Citation tools" lists multiple tools which will extract dates, and other information, from webpages and display it in a format which you can copy into whatever reference format you are using.

Any tool you use to extract reference information is just a tool. The quality of the information extracted will depend significantly on how up to date the tool is relative to any changes on a specific site. You will need to review for accuracy any information provided by such tools.

Notes:
1. For security reasons, StackExchange does not permit JavaScript links to be created in pages. Thus, if you want to use the above bookmarklet you will need to manually create it. The above text should go in the "location" area of the bookmark.

2. If you are using Firefox, when you save the bookmarklet, all of the spaces will be automatically translated to %20. If you entered it with %20s, Chrome would do the reverse (translate all %20s to spaces). IE leaves either the space, or the %20, as you entered it. As a result, if you are using Firefox, don't worry when you see the bookmarklet location looks something like: javascript:void(window.alert('The%20page%20was%20last%20modified%20on%20'+document.lastModified))

12
  • 1
    Fantastic answer! You do not need the %20's in the first bookmarklet as the text is enclosed in quotes which automatically escape the spaces.
    – Raydot
    Commented Aug 2, 2016 at 18:25
  • 2
    @DaveKaye, Thanks. Correct, using %20 is not needed for a string enclosed in quotes. They are automatically substituted for spaces by Firefox, the browser I normally use, when the bookmarklet is stored. I left them in there because I felt people would be a bit less confused to see %20 translated to a space (Chrome) than spaces translated to %20 (Firefox). IE leaves them as-is (either as a space, or %20 depending on what was entered). I will add an explanation in a footnote.
    – Makyen
    Commented Aug 2, 2016 at 18:41
  • Note: Last modified only works on static sites, dynamic sites with static content may return the current time as last modified, and the browser may actually increment the last-modified time if set to 'now'. See: dl.tyzoid.com/index.html vs dl.tyzoid.com/index.php
    – Tyzoid
    Commented Aug 2, 2016 at 19:14
  • 1
    @AndreaLazzarotto, There are a variety of other archiving sites. In my opinion, archive.org is the most well established and stable. But it may not be the most appropriate for any particular task/page. I have added bookmarklets for both WebCite [you have to edit in your email address] and archive.is. I agree that having a link to the archive in the reference is a good idea when the format is appropriate for it. I had mentioned so in the archiving section, but the paragraph was stuck at the end of the section and not as clear as it could be. I have moved it and reworded a bit to be more clear.
    – Makyen
    Commented Aug 4, 2016 at 10:15
  • 1
    @AndreaLazzarotto, Yeah, it looked like a big flap (4 long pages RFC, RFC2, RFC3, and RFC4, all linked from the top of each of those pages; no room to link here). It included some potentially inaccurate assumptions (including if the person involved actually was from archive.is). I was not involved at the time it went down. Aside from those issues, I still wonder about their financial stability (undisclosed, privately funded). I do use archive.is from time to time when they store the best representation of a page.
    – Makyen
    Commented Aug 4, 2016 at 11:39
36

As is common practice, mostly in Computer Science, when referring to a website you state the date on which you accessed the link. For example, if you want to refer to http://www.example.com on Absolute Random Topic, you could do it as follows:

[1]: Absolute Random Topic, http://www.example.com (Accessed on: 02/08/2016)

3
  • That's only half the answer - you should make sure the referred-to site will remain accessible e.g. via archive.org Commented Aug 3, 2016 at 5:59
  • @TobiasKienzler my understanding of archive.org is that it will have its own periodic snapshots of a website, which could be or not be updated. Other services exist for making snapshots of web pages in their current states. To me this is half of the answer because it misses the year field that is very common in citations (also in BibTeX). I tend to have both year and accessed when citing web pages.
    – user7112
    Commented Aug 3, 2016 at 7:16
  • @dgraziotin You can ask archive.org to save a current snapshot of a page. See Makyen's answer. Commented Aug 3, 2016 at 9:17
9

I usually use this site when citing websites:

https://archive.org/web/

It gives you information on when the page was last updated - that's about the best you can do I think (beats just putting it down as the current year anyway).

2
  • Yeah, Brewster Kahle did a great job. (Brewster, I'm to say hi from Missy.) The internet archive can also be used to establish a date after which the content in question was surely available (its first appearance in a version) and, similarly, a date before which it was not available (date of the last version without the content) if the site is archived often enough. Commented Aug 2, 2016 at 21:50
  • 1
    This is not entirely correct - the date is simply that of the last snapshot, not of actual update. However, +1 for pointing out archive.org, since that's one reliable way of permanently accessing old websites that may not be hosted anymore. Commented Aug 3, 2016 at 6:01
8

If the website does not provide a publication date, all you can do is provide the current year when citing the web page.

A plus would be to make the page permanent using a service like http://archive.is or https://perma.cc, so that your citation would contain an item that cannot (arguably) disappear or change over time.

7
  • 4
    Using a date from the metadata can often be worse than using no date at all, as when dates in metadata even exist they may well refer to the structure rather than the content.
    – Chris H
    Commented Aug 2, 2016 at 13:23
  • @ChrisH yeah, actually those who look for the metadata know what to look for. That part of my answer is misleading. I will delete it.
    – user7112
    Commented Aug 2, 2016 at 14:32
  • 1
    I don't suggest deleting, just perhaps adding a note to say use caution.
    – Chris H
    Commented Aug 2, 2016 at 14:56
  • @ChrisH: Depending on the citation format imposed (and, possibly, the BibTeX style or similar required to be used), anything can be better than using no date at all. Commented Apr 6, 2017 at 16:17
  • @O.R.Mapper I can't agree. My comments oppose the use of dates from metadata and refer to a now deleted part of the answer. Using the download date (or even no date at all) is better than suspect information. Metadata that reflects when a content management script was written and not when the content was written is quite plausible. It would play havoc with publication precedence if it made content appear earlier than it really was. The style is only a means of presentation and has no bearing whatsoever on whether incorrect or suspicious information should be presented.
    – Chris H
    Commented Apr 6, 2017 at 20:40
