Jump to content

API:Search and discovery

From mediawiki.org

MediaWiki, its extensions, and its sibling projects hold tremendous potential for knowledge discovery through search. The Search Platform team maintains the mechanisms, tools, and services for doing so.

Users can find information in MediaWiki by looking it up directly, and in Wikidata by reading Help:Navigating Wikidata.

MediaWiki

[edit]

The MediaWiki API has several search-related modules. You can make requests and view generated help at any wiki's /w/api.php entry point, or fill in API request parameters at Special:ApiSandbox.

Search modules

[edit]
action=opensearch
See API:Opensearch . Returns search results in OpenSearch format, each with Extension:TextExtracts on Wikimedia projects. View generated API help
action=languagesearch
Search for language names in any script. View generated API help

Query list submodules

[edit]

These Query submodules return a list of wiki pages matching the search criteria, and some return additional information about each page. Furthermore, you can use each as a generator to provide many other Properties of the set of returned pages, such as a lead image, snippet, and/or page description.

action=query list=prefixsearch
Retrieves wiki page titles with the given prefix. See the showcase article Page info in search results. See module documentation for API:Prefixsearch and View generated API help.
action=query list=search
Uses the wiki search engine to find matching pages. On Wikimedia wikis it provides search results from CirrusSearch, returning typical search result information such as text snippets and page size. See module documentation for API:Search and View generated API help
action=query list=geosearch
If the GeoData extension is installed on the wiki, then this returns wiki pages near a location, with their geographical information. See the showcase article Showing nearby wiki information, module documentation for geosearch, and View generated API help.

Clients

[edit]

Command line

[edit]

From the command line you can query the API using cURL to make the API request, then use jq to parse the JSON response.

For example, let's try looking up item Richard Feynman (Q39246) on Wikidata, and request its English-language label:

$ URL='https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q39246&format=json'
$ curl -s $URL | jq '.entities[].labels.en.value'
"Richard Feynman"

$ curl -s $URL | jq '.entities[].claims|length'
55

We find that Q39246 is the Wikidata identifier for the item with English label "Richard Feynman", and that there are 55 claims made about him.

JavaScript

[edit]

To write a MediaWiki API client in JavaScript, all that's needed is a JSONP handler. Many libraries (e.g. jQuery) include JSONP clients, or one can be written independently.

Within the MediaWiki ecosystem, jQuery can be used directly:

$.ajax({
  url: '//www.wikidata.org/w/api.php',
  data: { action: 'wbgetentities', ids: mw.config.get('wgWikibaseItemId'), format: 'json' },
  dataType: 'jsonp',
  success: function (x) {
    console.log('wb label', x.entities.Q39246.labels.en.value);
    console.log('wb description', x.entities.Q39246.descriptions.en.value);
  }
});

This uses jQuery's $.ajax() which is available in many interactive JavaScript coding environments and makes sense if your eventual goal is a separate standalone project.

If your eventual goal is code running on a wiki, e.g. as a Gadget, then you should use the higher-level mw.api() function provided by the 'mediawiki.api' ResourceLoader module.

In other environments, a simple JSONP handler can be written:

var mw;
(function (mw) {

  /**
   * Query a MediaWiki api.php instance with the given options
   */
  function mwQuery(endpoint, options) {

    /**
     * Create a uniquely-named callback that will process the JSONP results
     */
    var createCallback = function (k) {
      var i = 1;
      var callbackName;
      do {
        callbackName = 'callback' + i;
        i = i + 1;
      } while (window[callbackName])
      window[callbackName] = k;
      return callbackName;
    }

    /**
     * Flatten an object into a URL query string.
     * For example: { foo: 'bar', baz: 42 } becomes 'foo=bar&baz=42'
     */
    var queryStr = function (options) {
      var query = [];
      for (var i in options) {
        if (options.hasOwnProperty(i)) {
          query.push(encodeURIComponent(i) + '=' + encodeURIComponent(options[i]));
        }
      }
      return query.join('&');
    }

    /**
     * Build a function that can be applied to a callback.  The callback processes
     * the JSON results of the API call.
     */
    return function (k) {
      options.format = 'json';
      options.callback = createCallback(k);
      var script = document.createElement('script');
      script.src = endpoint + '?' + queryStr(options);
      var head = document.getElementsByTagName('head')[0];
      head.appendChild(script);
    };

  }

  mw.api = {
    query: mwQuery,
  };

})(mw || (mw = {}));

CirrusSearch

[edit]

CirrusSearch is a MediaWiki extension to enable Elastic-based search of MediaWiki content. It acts as a search back-end, so action=query&list=search is the main interface to this.

You can use the same Cirrus features in API queries that users can enter in the search box. For example, you can use the morelike: special prefix to find related pages.

Search for pages related to Marie Curie and radium.

Additional CirrusSearch API modules

[edit]

In addition, CirrusSearch can report its configuration and internal information. These APIs are probably only useful if you're familiar with Elasticsearch and want to see how CirrusSearch uses it. These are all considered internal debugging API's and no guarantees are made with regards to backwards compatability of changes to their output.

?action=cirrusdump page parameter
For example, https://en.wikipedia.org/wiki/2014?action=cirrusdump
?cirrusDumpQuery parameter to Special:Search queries
This is an action parameter to index.php, for example https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpQuery
?cirrusDumpResult parameter to Special:Search queries
This is an action parameter to index.php, for example https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpResult
An additional parameter, cirrusExplain, can be passed with cirrusDumpResult to have the lucene explanation of the score included with the result dump. For example https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpResult&cirrusExplain
API modules cirrus-config-dump, cirrus-settings-dump, cirrus-mapping-dump
These dump the CirrusSearch setup.
Here's a sample query

Wikidata

[edit]

Wikidata's API includes a few actions (wbgetentities, wbgetclaims, wbsearchentities) that can be used to search for information about entities, properties, statements, and claims.

Wikidata Query Service

[edit]

Wikidata Query Service performs graph-based searching of Wikidata via a SPARQL API. It's available at https://query.wikidata.org/

WDQS Explorer (demo) (source code) provides in-browser graph exploration using SPARQL queries against the Wikidata Query Service.

Interactive examples

[edit]

Wikipedia

[edit]

Browse to https://en.wikipedia.org/wiki/Main_Page, open up the JavaScript console, and run the following:

$.ajax({
  url: '//en.wikipedia.org/w/api.php',
  data: {
    action: 'query',
    list: 'search',
    srsearch: 'Richard Feynman',
    format: 'json',
    formatversion: 2
  },
  dataType: 'jsonp',
  success: function (x) {
    console.log('title', x.query.search[0].title);
  }
});

This logs the string Richard Feynman to the JavaScript console.

If the MediaWiki libraries and environment are unavailable, this can be done using the wmQuery() function above:

var queryWikipedia = mw.api.query('//en.wikipedia.org/w/api.php',
  { action: 'query', list: 'search', srsearch: 'Richard Feynman' });

queryWikipedia(function (x) {
  console.log('title', x.query.search[0].title);
});

Wikidata

[edit]

Using JSONP, we can perform the above steps right from the Web browser's JavaScript console. On Wikipedia, the Wikidata item identifier is available via the MediaWiki configuration value wgWikibaseItemId.

Browse to https://en.wikipedia.org/wiki/Richard_Feynman, open up the JavaScript console, and run the following:

$.ajax({
  url: '//www.wikidata.org/w/api.php',
  data: { action: 'wbgetentities', ids: mw.config.get('wgWikibaseItemId'), format: 'json' },
  dataType: 'jsonp',
  success: function (x) {
    console.log('wb label', x.entities.Q39246.labels.en.value);
    console.log('wb description', x.entities.Q39246.descriptions.en.value);
  }
});

This logs the string Richard Feynman and the Wikidata entry description string "American quantum physicist" to the JavaScript console.

If the MediaWiki libraries and environment are unavailable, this can be done using the wmQuery() function above:

var queryWikidata = mw.api.query('//www.wikidata.org/w/api.php',
  { action: 'wbgetentities', ids: 'Q39246' }); 

queryWikidata(function (x) {
  console.log('wb label', x.entities.Q39246.labels.en.value);
  console.log('wb description', x.entities.Q39246.descriptions.en.value);
});

Wiktionary

[edit]

Browse to https://en.wikipedia.org/wiki/Main_Page, open up the JavaScript console, and run the following:

$.ajax({
  url: '//en.wiktionary.org/w/api.php',
  data: { action: 'query', prop: 'revisions', rvprop: 'content', titles: 'Richard Feynman', format: 'json' },
  dataType: 'jsonp',
  success: function (x) {
    console.log('wiktionary title', x.query.pages['-1'].title);
  }
});

This logs the string Richard Feynman to the JavaScript console.

If the MediaWiki libraries and environment are unavailable, this can be done using the wmQuery() function above:

var queryWiktionary = mw.api.query('//en.wiktionary.org/w/api.php',
  { action: 'query', prop: 'revisions', rvprop: 'content', titles: 'Richard Feynman' });

queryWiktionary(function (x) {
  console.log('wiktionary title', x.query.pages['-1'].title);
});