Page view counts

wx_page_views(
  project,
  page_name,
  access_method = c("all", "desktop", "mobile web", "mobile app"),
  agent_type = c("all", "user", "bot", "spider", "automated"),
  granularity = c("daily", "monthly"),
  start_date = "20191101",
  end_date = "20191231",
  include_redirects = FALSE
)

Arguments

project

The name of any Wikimedia project formatted like {language code}.{project name}, for example en.wikipedia. You may pass en.wikipedia.org and the .org will be stripped off. For projects without language codes, like Wikimedia Commons or MediaWiki, use commons.wikimedia.org and mediawiki.org, respectively. Including .org is especially important when include_redirects = TRUE, because the project's MediaWiki API will need to be queried.

page_name

The title of any article in the specified project. The function takes care of replacing spaces with underscores and URI-encoding, so that non-URI-safe characters like %, / or ? are accepted -- e.g. "Are You the One?" becomes "Are_You_the_One%3F". Internally this is done with a non-exported wx_encode_page_name function. If you need to get the pageviews for multiple pages, you're encouraged to provide all the page names at once as this function has been optimized for that use-case.

access_method

If you want to filter by access method, use one of: "desktop", "mobile app", or "mobile web". If you are interested in pageviews regardless of access method, use "all" (default).

agent_type

If you want to filter by agent type, use "user", "bot"/"spider", or "automated" (refer to wikitech:Analytics/Data Lake/Traffic/BotDetection). If you are interested in pageviews regardless of agent type, use "all" (default).

granularity

The time unit for the response data. Supported values are "daily" (default) and "monthly".

start_date

The date of the first day to include, in YYYYMMDD format. Can also be a Date or a POSIXt object, which will be auto-formatted.

end_date

The date of the last day to include, in YYYYMMDD format. Can also be a Date or a POSIXt object, which will be auto-formatted.

include_redirects

Whether to include redirects to requested pages. Currently, only article (mainspace) redirects are supported. See "Redirects" section below for more details.
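
The input handling described above for page_name, start_date and end_date can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names encode_page_name_sketch and format_api_date_sketch are hypothetical, and the non-exported wx_encode_page_name may differ in detail.

```r
# Hypothetical helper: replace spaces with underscores, then percent-encode
# every character that is not URI-safe. reserved = TRUE also encodes
# characters such as "/" and "?"; repeated = TRUE makes sure a literal "%"
# in a title is always encoded.
encode_page_name_sketch <- function(page_name) {
  underscored <- gsub(" ", "_", page_name, fixed = TRUE)
  vapply(
    underscored,
    utils::URLencode,
    character(1),
    reserved = TRUE,
    repeated = TRUE,
    USE.NAMES = FALSE
  )
}

# Hypothetical helper: Date and POSIXt objects are formatted as YYYYMMDD;
# anything else is assumed to already be a YYYYMMDD string.
format_api_date_sketch <- function(x) {
  if (inherits(x, c("Date", "POSIXt"))) format(x, "%Y%m%d") else as.character(x)
}

encode_page_name_sketch("Are You the One?")
#> [1] "Are_You_the_One%3F"
format_api_date_sketch(as.Date("2019-11-01"))
#> [1] "20191101"
```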

Value

A tibble data frame with the following columns:

project

the Wikimedia project the page belongs to, e.g. en.wikipedia

page_name

the page_name provided by the user

redirect_name

the name of the redirect to the page if include_redirects = TRUE; NA for the page itself

date

Date; beginning of each month if granularity = "monthly"

views

total number of views for the page

Redirects

By default include_redirects = FALSE for performance reasons. The pageviews API does not roll up view counts for redirects into the total view count of the target page. Set include_redirects = TRUE to have this function automatically locate the redirects via the MediaWiki API and request their pageview counts. This makes the function much slower, especially when the page(s) have many redirects.

For example, if the user visits "2019-20 coronavirus pandemic" (with a minus) they will be redirected to the actual article "2019–20 coronavirus pandemic" (with an en-dash). Any visits to the redirect (the page with the minus sign instead of the en-dash) will not be counted toward the page view count of the redirected-to article, although once the client is taken to the target page that counts as a separate page view.

In some cases, you may want to include page views of the redirects in your total, or you may want the ability to separate the traffic from users arriving at a page via a redirect from the traffic of users arriving at the page directly.
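
As a rough sketch of that split, the rows of the returned data can be grouped by whether redirect_name is NA. The data frame below is made up to match the documented columns (the page titles and view counts are illustrative, not real data), and the source column is a hypothetical name introduced here.

```r
# Made-up rows shaped like the documented return value when
# include_redirects = TRUE; the counts are illustrative only.
page_views <- data.frame(
  project = "en.wikipedia",
  page_name = "Example page",
  redirect_name = c(NA, NA, "Example redirect", "Example redirect"),
  date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02")),
  views = c(100L, 120L, 10L, 15L),
  stringsAsFactors = FALSE
)

# Rows with NA in redirect_name are views of the page itself.
page_views$source <- ifelse(
  is.na(page_views$redirect_name), "direct", "via redirect"
)

# Total views per page, split by how readers arrived.
by_source <- aggregate(views ~ page_name + source, data = page_views, FUN = sum)
by_source
```

Summing the views column without the grouping gives the combined total for the page and its redirects.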

Note on performance: finding and fetching view counts for redirects is considerably slower. The function has been optimized for multiple pages, since the redirects API supports up to 50 pages per call. If you need traffic for multiple pages within the same project, provide the full vector of page names in a single call rather than retrieving one page at a time; this minimizes the burden on the MediaWiki API.
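
To illustrate why passing all page names at once helps, the sketch below groups a vector of titles into batches of at most 50, the per-call limit mentioned above. It is an assumption about how such batching could be done, not the package's actual code; batch_page_names_sketch and the placeholder titles are hypothetical.

```r
# Hypothetical helper: split a character vector into batches of at most
# batch_size elements, preserving the original order.
batch_page_names_sketch <- function(page_names, batch_size = 50L) {
  batch_id <- ceiling(seq_along(page_names) / batch_size)
  unname(split(page_names, batch_id))
}

page_names <- paste("Page", 1:120)  # placeholder titles
batches <- batch_page_names_sketch(page_names)
lengths(batches)
#> [1] 50 50 20
```

With 120 pages this amounts to three redirect lookups instead of 120 separate ones.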

License

Data retrieved from the API endpoint is available under the CC0 1.0 license.

Examples

wx_page_views(
  "en.wikipedia",
  c("New Year's Eve", "New Year's Day"),
  start_date = "20191231",
  end_date = "20200101"
)
#> # A tibble: 4 x 4
#>   project      page_name      date        views
#>   <chr>        <chr>          <date>      <int>
#> 1 en.wikipedia New Year's Day 2019-12-31  61379
#> 2 en.wikipedia New Year's Day 2020-01-01 281166
#> 3 en.wikipedia New Year's Eve 2019-12-31 202638
#> 4 en.wikipedia New Year's Eve 2020-01-01  51715

if (FALSE) {
wx_page_views(
  "en.wikipedia",
  "COVID-19 pandemic",
  start_date = "20200301",
  end_date = "20200501",
  include_redirects = TRUE
)
}