Meta-Issue: Usage Stats #792

Open
mjordan opened this issue Feb 1, 2018 · 23 comments
Labels
Type: Meta-issue (identifies multiple related tickets for ease)
Type: use case (proposes a new feature or function for the software using user-first language)

Comments

@mjordan
Contributor

mjordan commented Feb 1, 2018

Title (Goal): Gather usage stats on binary resources via Piwik
Primary Actor: Repository admin
Scope: Architecture
Level: High
Story: As a repository admin, I want to be able to gather usage/analytics data via Piwik. There is already a Piwik integration module for D8; this use case is more about being able to track usage of binary resources associated with an Islandora node.
@mjordan
Contributor Author

mjordan commented Feb 1, 2018

Incidentally, there is a Docker image for a Piwik server, which will lower the barrier to setting up a Piwik instance.

@bryjbrown
Member

bryjbrown commented Feb 1, 2018

Additional info: we discussed this at Islandoracon, and Piwik was specifically chosen because it's client-side (no caching issues) and conforms to European privacy laws, which is of interest to international projects.

@DiegoPino
Contributor

@mjordan @bryjbrown FYI, formerly known as PIWIK, now named Matomo!

@mjordan
Contributor Author

mjordan commented Feb 2, 2018

Groan. Their rationale for changing the name is that it "will allow users to take a fresh look at what we’ve become today and acknowledging all of the community’s hard work over the past 10 years." Srsly?

Awesome platform though.

@DiegoPino
Contributor

True. Name changes & market needs. But: today I deployed the latest version of Matomo over my old 2.7 PIWIK (full of data), and the migration was painless + the UX is super nice and fast. So I can't complain, really. Also, open source and naming stuff: maybe someone needs to organize a conference to discuss this for the future, so needed 🙄

@DiegoPino
Contributor

one collection

@bryjbrown
Member

@DiegoPino As a PiWik/Matomo user, how flexible is it in terms of sending custom variables? For instance, could you configure it to send parent collection or certain RDF values to it so that you could facet out certain data or create reports?

A good example of this would be creating a report of views/downloads for ETDs.

@DiegoPino
Contributor

@bryjbrown totally flexible. I made some slight modifications to the 7.x Piwik module/js to send custom dimensions instead. Using that, we manage a segmentation based on "collection owners" (we have many). Each collection owner has many Islandora collections, and Matomo/Piwik reports two custom dimensions directly: Islandora Object (separating the Drupal page and title from what an object is) and Collection membership. This allows us to have many PIWIK/Matomo users that see only their own stats, and also to generate separate reports. Dimensions are cool. https://matomo.org/docs/custom-dimensions/
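
For anyone who has not used custom dimensions, here is a rough sketch of what a tracking request carrying the two dimensions described above might look like when sent to Matomo's HTTP Tracking API. The dimension IDs (1 and 2), the site ID, and all hostnames are assumptions for illustration only; real dimension IDs are assigned when the dimensions are created in Matomo, and older Piwik installs expose `piwik.php` rather than `matomo.php`.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_tracking_url(matomo_base, site_id, page_url, title,
                       object_label, collection):
    """Build a tracking request URL carrying two custom dimensions."""
    params = {
        "idsite": site_id,           # Matomo site ID (assumed value in the demo)
        "rec": 1,                    # required for the request to be recorded
        "url": page_url,             # page being viewed
        "action_name": title,        # page title
        "dimension1": object_label,  # hypothetical ID 1: "Islandora Object"
        "dimension2": collection,    # hypothetical ID 2: "Collection membership"
    }
    return f"{matomo_base.rstrip('/')}/matomo.php?{urlencode(params)}"


# Demo with placeholder hosts and values.
url = build_tracking_url(
    "https://matomo.example.org/", 1,
    "https://repo.example.org/node/2", "Sample thesis",
    "islandora_object", "etd_collection",
)
print(parse_qs(urlparse(url).query)["dimension2"])
```

In practice the Drupal module's JavaScript tracker would send these values from the browser; the sketch only shows which parameters end up in the request.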

@bryjbrown
Member

Notes from 2/7/2018 CLAW call:

  • Setting up standardized custom variables for Islandora objects could correlate with standardized reports on the Piwik server. Pre-made out of the box reports could go a long way towards conveying the value of the repository to external stakeholders like authors/department heads/etc in an IR setting.
  • Setting up a Piwik API that takes an Islandora object ID and returns some data about its usage could allow you to display stats from inside a template, similar to what some are doing with Islandora Usage Stats Callbacks in D7.
  • Setting up user accounts on the Piwik server would be a quick way to allow certain people to see visualizations of relevant reports. You could also set up custom APIs to give that report data back to Drupal in some format (JSON?) and visualize it in Drupal as special user dashboards.
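
As a possible starting point for the second note, the sketch below builds a Matomo Reporting API request for the usage of one page. It assumes the stock `Actions.getPageUrl` method is available on the server, and that an Islandora object can be addressed as `<drupal base>/node/<id>`; the hostnames, token, and date range are placeholders, not values from this thread.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def page_stats_url(matomo_base, site_id, drupal_base, node_id, token,
                   date_range="2018-01-01,2018-12-31"):
    """Build a Reporting API URL asking for stats on one node's page."""
    # Assumption: the object's public page lives at /node/<id>.
    page_url = f"{drupal_base.rstrip('/')}/node/{node_id}"
    params = {
        "module": "API",
        "method": "Actions.getPageUrl",
        "idSite": site_id,
        "period": "range",
        "date": date_range,
        "pageUrl": page_url,
        "format": "JSON",
        "token_auth": token,  # placeholder; keep real tokens out of templates
    }
    return f"{matomo_base.rstrip('/')}/index.php?{urlencode(params)}"


# Demo with placeholder values.
demo = page_stats_url("https://matomo.example.org", 1,
                      "https://repo.example.org", 2, "TOKEN")
print(parse_qs(urlparse(demo).query)["pageUrl"])
```

A template or block could fetch this URL server-side and render the returned JSON, which keeps the auth token away from the browser.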

@bryjbrown
Member

bryjbrown commented Jun 25, 2018

Additional info: Implementing Piwik in CLAW would also go a long way towards satisfying behavior 10 in COAR's Next Generation Repositories report: http://ngr.coar-repositories.org/behaviour/exposing-standardized-usage-metrics/

Piwik would be harvesting the usage stats in a standardized and consistent way, and would allow you to set up an API that a third party could query to aggregate views/downloads.

@dannylamb I'm about to set up a few new use cases related to the COAR NGR report. Should we make a label for that?

@dannylamb
Contributor

@bryjbrown Little late returning to this, but I can def make a COAR NGR label and slap it on the issues you've made.

@dannylamb dannylamb changed the title from "Gather usage stats on binary resources via Piwik" to "Meta-Issue: Usage Stats" Sep 18, 2018
@dannylamb
Contributor

Converting this use case to a meta-issue to track usage stats development. Please refer to this issue in any subsequent issues to link them.

At this point in time, it looks like using https://www.drupal.org/project/matomo and providing an ansible installation of a Matomo server should fulfill the requirements of this use case.

@Natkeeran
Contributor

@dannylamb https://github.com/Natkeeran/ansible-role-matomo will install matomo server in claw vagrant. Needs some refinement, but works.

@dannylamb
Contributor

Simply amazing, @Natkeeran.

What's the best way to test roles that aren't up on ansible galaxy? I'd like to give it a run-through, and if it's ok with you, transfer that role to Islandora-Devops.

@Natkeeran
Contributor

@dannylamb You can pull roles from any git repo as here. I've created a branch: Natkeeran/claw-playbook@71fb8b5.

Sure, we can transfer it. I'll convert some of the config settings to variables and ping you. The one thing I am not sure about is how to configure Apache on RedHat.

@Natkeeran
Contributor

Natkeeran commented Sep 20, 2018

@dannylamb I've made it more configurable. If you are good, please pull into the Islandora-Devops namespace, and I'll do a PR to the claw-playbook.

One thing that remains to be done is configuring Apache on RedHat.

This matomo branch installs the Matomo server as well as the Drupal module, and does the basic configuration setup.

@dannylamb
Contributor

vagrant up

@dannylamb
Contributor

screenshot from 2018-09-25 15-53-47

@dannylamb
Contributor

@Natkeeran I've given you the permissions to transfer the repo using the Danger Zone ™️

Works as advertised!

@dannylamb
Contributor

dannylamb commented Oct 1, 2018

We now officially have Ubuntu support for including Matomo during the installation process. I've added another ticket to add CentOS support, and after that, it's maybe some configuration and we can consider this accomplished.

@bryjbrown
Member

bryjbrown commented May 10, 2019

So I've been working on Matomo quite a bit over the past few days and here are some outcomes worth sharing:

Configuration & Custom Variables

The Islandora 8 Ansible VM doesn't have much configuration in terms of the Matomo integration. It has the link to the Matomo server, which is the minimum you need for it to work, but I have a few recommendations.

First, the Drupal Matomo module has the ability to not track users of a certain role, and I think most of us would probably agree that administrators shouldn't count. This is pretty easy to set up through the Admin page.

Second, the current build isn't using any custom variables. I did a lot of thinking about what custom variables we should be using since any Matomo plugin that implements a custom API will be dependent on not only having the same custom variables, but having them named the same thing and in the same order. Matomo's custom variable API calls them by number, not by name, so if we are to share custom Matomo report APIs, we need to rally around a standard configuration of the Drupal Matomo module.

With regard to the custom variables we send to Matomo, I think all we really need to send is a unique ID. Node ID is good enough if you are only running one Drupal, but if you are running multiple Drupals that all phone home to the same Matomo server and you want to be able to create reports that aggregate data across all of them, something truly unique like a UUID or Fedora URI would be better. So long as we have a unique ID transmitted as a custom variable to Matomo, we can set up a custom API in Matomo that returns all the data available about that ID (similar to the API Islandora Usage Stats Callbacks had).

If we want to create other, more robust types of reports (e.g., "Sum all the downloads of ETDs" or "Sum the views of this collection and all its children recursively"), we could do that at the Drupal layer with Views. Create a view that results in a list of nodes, hit the Matomo API to get data for each node by unique ID, then process that data back in Drupal and display the final result. No need to duplicate all that node data in Matomo with custom variables as far as I can see.
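
To make the "list of nodes, then one API call per node, then sum" idea concrete, here is a minimal sketch in Python rather than Drupal code. The `fetch_stats` callable is hypothetical and stands in for whatever performs the HTTP request to the custom Matomo API; the field names are borrowed from the sample response shown further down in this comment.

```python
FIELDS = ("page_visits", "page_hits", "download_visits", "download_hits")


def sum_usage(unique_ids, fetch_stats):
    """Sum usage counters over a list of IDs.

    fetch_stats(unique_id) is a hypothetical callable returning a list of
    dicts shaped like the sample API response (one dict per matched object).
    """
    totals = {field: 0 for field in FIELDS}
    for uid in unique_ids:
        for row in fetch_stats(uid):
            for field in FIELDS:
                # Missing counters are treated as zero.
                totals[field] += int(row.get(field, 0))
    return totals


# Demo with canned data standing in for real API responses.
canned = {
    "2": [{"nid": "2", "page_visits": 5, "page_hits": 9,
           "download_visits": 4, "download_hits": 6}],
    "3": [{"nid": "3", "page_visits": 1, "page_hits": 2,
           "download_visits": 0, "download_hits": 0}],
}
print(sum_usage(["2", "3"], lambda uid: canned.get(uid, [])))
```

One request per node will get slow for large collections, so a real implementation would probably want caching or a bulk endpoint.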

Custom Matomo APIs

Matomo's internal API is kind of tricky and, like Drupal, it requires you to think about the data in a specific way. It is pretty flexible though, and it also allows you to very easily create custom plugins that can implement their own outwards-facing APIs. I've done just that with a Matomo module created for the LDbase project that can be modified to meet similar needs in the Islandora community. The module is at https://github.com/ldbase/LDbaseReports, but the real meat of what it is doing is in https://github.com/ldbase/LDbaseReports/blob/master/API.php#L22-L49. If you hit this API with a site ID and a node ID, you get a response like this:

[{
	"nid": "2",
	"page_visits": 5,
	"page_hits": 9,
	"download_visits": 4,
	"download_hits": 6
}]

As far as I can tell, "hits" are individual page loads and "visits" are hits deduplicated by unique visitors within 30 minutes of each other. I can see situations where you might want one or the other, so I provided both. This is an extremely simple module right now, but it could be expanded to provide more data on individual nodes, or even new types of reports if we decide we want them.
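
To illustrate the distinction as I understand it, the sketch below counts hits and visits from a list of (visitor, timestamp) events. It is a simplified model of the reading above, not Matomo's actual visit logic, which has more rules: every event is a hit, and a new visit starts when a visitor has been idle for more than 30 minutes.

```python
from datetime import datetime, timedelta


def count_hits_and_visits(events, window=timedelta(minutes=30)):
    """Return (hits, visits) for an iterable of (visitor_id, datetime)."""
    last_seen = {}
    hits = 0
    visits = 0
    for visitor, ts in sorted(events, key=lambda e: e[1]):
        hits += 1
        previous = last_seen.get(visitor)
        # First event for this visitor, or idle longer than the window.
        if previous is None or ts - previous > window:
            visits += 1
        last_seen[visitor] = ts
    return hits, visits


# Demo: visitor "a" returns after 50 minutes, so that counts as a new visit.
sample = [
    ("a", datetime(2019, 5, 10, 10, 0)),
    ("a", datetime(2019, 5, 10, 10, 10)),
    ("b", datetime(2019, 5, 10, 10, 5)),
    ("a", datetime(2019, 5, 10, 11, 0)),
]
print(count_hits_and_visits(sample))
```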

I've created a skeleton for an Islandora version at https://github.com/fsulib/IslandoraReports in case the community decides they like this direction. Right now it is the vanilla plugin & API skeleton generated by the Matomo console script, but I can update it to behave like LDbaseReports once the community decides on a unique ID to use as the custom variable.

@bryjbrown
Member

After reading #1073, it seems like the optimal ID to send to Matomo would be one that represents the object conceptually, uncoupled from any specific instance of that object. This would allow you to track an object that is represented in multiple Drupals, whereas the same object in two different Drupals would have two different UUIDs or Node IDs. The field that seems to fit this use case best in the current iteration of Islandora 8.x (at least to me) is PID, since that's identifying the same conceptual object regardless of whether it is represented in Islandora 7.x or 8.x. From this point of view, it seems like a good candidate for reuse across multiple Islandora 8.x instances, because it avoids trying to assign duplicate UUIDs or Fedora URIs to multiple nodes, which has a fair chance of causing all sorts of chaos.

@kstapelfeldt kstapelfeldt added Type: Meta-issue Identifies multiple related tickets for ease Type: use case proposes a new feature or function for the software using user-first language. and removed COAR NGR labels Sep 25, 2021
@amyrb
Contributor

amyrb commented Oct 29, 2021

The following are usage stat capabilities outlined in the Islandora 8 IR Delta. I don't know where all of these stand in current priorities, but they represent desired functionality for institutional repository usage stats, so I thought they should at least get mentioned here.

  • track if the page is an Islandora object, also the parent object, the type of object, and the UUID (in order to discern when the same object is loaded via different URL paths)
  • Extend Matomo module to include custom dimensions as a feature, or have a submodule that adds this feature, or rewrite a new updated version of the Matomo module
  • Create a custom Matomo API module that runs on the Matomo server that Islandora 8 features can query in an object-centric way (likely by UUID) and get a response with all the relevant information for that object
  • Allow for the inclusion of a CSV file with legacy usage data that can be dynamically included in the API response
  • Create a simple “Views and Downloads” block that can query this API and get the aggregate total of all views and downloads for a given Islandora object which can be included as part of that object’s display
  • Create a standardized set of custom dimensions for a standardized Islandora 8 configuration so that all Islandora 8 sites report similar data on usage to Matomo
  • View blocks that query Islandora Matomo API and display common data such as most popular objects, most popular files, most popular collections, views over time, summary analytics by current owner/user, collection, topic, etc; can export the analytics
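
As one way to picture the legacy-CSV and "Views and Downloads" items together, the sketch below merges live counts with counts loaded from a CSV file. The CSV layout (`uuid,views,downloads`) and the shape of the live data are assumptions made for illustration; nothing here reflects an existing Islandora or Matomo API.

```python
import csv
import io


def load_legacy_counts(csv_text):
    """Parse legacy usage data; assumed columns: uuid, views, downloads."""
    legacy = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        legacy[row["uuid"]] = {
            "views": int(row["views"]),
            "downloads": int(row["downloads"]),
        }
    return legacy


def views_and_downloads(uuid, live, legacy):
    """Combine live counts (hypothetical dict) with legacy counts for one object."""
    old = legacy.get(uuid, {"views": 0, "downloads": 0})
    return {
        "uuid": uuid,
        "views": live.get("views", 0) + old["views"],
        "downloads": live.get("downloads", 0) + old["downloads"],
    }


# Demo with made-up identifiers and numbers.
legacy = load_legacy_counts("uuid,views,downloads\nabc-123,100,40\n")
print(views_and_downloads("abc-123", {"views": 9, "downloads": 6}, legacy))
```

Keeping the legacy numbers in a separate lookup means the live Matomo data stays untouched, and the block can show either the combined total or the two sources side by side.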


7 participants