stats.kiwix.org lacks granularity AND is too granular #216

Open
Popolechien opened this issue Aug 9, 2023 · 7 comments

@Popolechien
Member

Looking at stats for download.kiwix.org I can kind of surmise that around 12,000,000 zim files were downloaded over the past year.

The tool, however, both fails to aggregate different versions of the same file (e.g. wikipedia_en_all_maxi_2022-05.zim and wikipedia_en_all_maxi_2023-05.zim) and shows no more than the top 500 rows.

We need either a better tool or to make sure this one provides actionable feedback.

@rgaudin
Member

rgaudin commented Aug 9, 2023

> fails to aggregate different versions of the same file (e.g. wikipedia_en_all_maxi_2022-05.zim and wikipedia_en_all_maxi_2023-05.zim)

That's because those are two different files. They are different versions of the same Book (CMS terminology). Only a custom tool could know that they are linked and produce an aggregated counter.
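To make the idea of an aggregated counter concrete, here is a minimal sketch. It assumes a Book can be identified by stripping a trailing `_YYYY-MM.zim` period from the filename, which matches the filenames quoted in this thread but may not hold for every ZIM. The `book_name` and `aggregate_by_book` helpers and the counts are made up for illustration.

```python
import re
from collections import Counter

# Assumption: versions of a Book differ only by a trailing "_YYYY-MM.zim" period.
PERIOD_SUFFIX = re.compile(r"_\d{4}-\d{2}\.zim$")


def book_name(filename: str) -> str:
    """Return the filename without its period suffix (hypothetical helper)."""
    return PERIOD_SUFFIX.sub("", filename)


def aggregate_by_book(counts: dict[str, int]) -> Counter:
    """Sum per-file download counts into per-Book totals."""
    totals: Counter = Counter()
    for filename, downloads in counts.items():
        totals[book_name(filename)] += downloads
    return totals


# Made-up counts, for illustration only.
sample = {
    "wikipedia_en_all_maxi_2022-05.zim": 10,
    "wikipedia_en_all_maxi_2023-05.zim": 5,
    "wikipedia_en_for_schools_maxi_2023-07.zim": 2,
}
totals = aggregate_by_book(sample)
print(totals)
```

A filename that does not end with a period is left unchanged, so it simply becomes its own Book in the totals.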

I don't think messing with source logs is a good idea, so you're probably left with creating or modifying a tool that works off the matomo API/data and produces this. It might be a matomo extension or something separate.
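As a starting point for such a tool, the sketch below only builds a Matomo Reporting API URL; it does not send any request. The base URL and the token are placeholders, `idSite=2` is taken from the SQL queries later in this thread, and whether the hits live in the `Actions.getDownloads` report or in `Actions.getPageUrls` on this instance is an assumption to verify. `filter_limit=-1` asks for all rows rather than a truncated top list, and `flat=1` flattens the report hierarchy.

```python
from urllib.parse import urlencode


def matomo_report_url(
    base_url: str,
    token_auth: str,
    method: str = "Actions.getDownloads",  # assumption: may need Actions.getPageUrls
    id_site: int = 2,
    date_range: str = "2021-08-01,2023-07-31",
) -> str:
    """Build a Reporting API URL for one report over a date range."""
    params = {
        "module": "API",
        "method": method,
        "idSite": id_site,
        "period": "range",
        "date": date_range,
        "format": "JSON",
        "flat": 1,           # one row per URL instead of a folder tree
        "filter_limit": -1,  # return every row
        "token_auth": token_auth,
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and token, not real credentials.
url = matomo_report_url("https://stats.kiwix.org/index.php", "YOUR_TOKEN")
print(url)
```

The JSON rows returned by such a call could then be grouped per Book by whatever naming rule the tool settles on.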

@Popolechien
Member Author

To clarify, I'm trying to get the number of downloads for wikipedia_en_for_schools_maxi.zim (and the Arabic version) over the past two years (1 August 2021 to 31 July 2023). Since the Zimfarm generates a new ZIM every month, I will also need the total across all of those monthly files.
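Spelled out as code, the request looks roughly like the sketch below. It assumes one record per download with a date and a filename, which is not necessarily how the data is exposed, and the `total_downloads` helper and sample records are invented for illustration.

```python
import re
from datetime import date


def total_downloads(records, book: str, start: date, end: date) -> int:
    """Count downloads of any monthly version of `book` between start and end (inclusive).

    `records` is an iterable of (download_date, filename) pairs (assumed shape).
    """
    # Match "<book>_YYYY-MM.zim" exactly, so longer Book names sharing a prefix are excluded.
    pattern = re.compile(re.escape(book) + r"_\d{4}-\d{2}\.zim")
    return sum(
        1
        for day, filename in records
        if start <= day <= end and pattern.fullmatch(filename)
    )


# Made-up records, for illustration only.
records = [
    (date(2021, 7, 31), "wikipedia_en_for_schools_maxi_2021-07.zim"),  # before the window
    (date(2021, 8, 1), "wikipedia_en_for_schools_maxi_2021-07.zim"),
    (date(2022, 3, 15), "wikipedia_en_for_schools_maxi_2022-03.zim"),
    (date(2022, 3, 15), "wikipedia_en_all_maxi_2022-05.zim"),          # another Book
    (date(2023, 7, 31), "wikipedia_en_for_schools_maxi_2023-07.zim"),
    (date(2023, 8, 1), "wikipedia_en_for_schools_maxi_2023-07.zim"),   # after the window
]
total = total_downloads(
    records, "wikipedia_en_for_schools_maxi", date(2021, 8, 1), date(2023, 7, 31)
)
print(total)
```

Note that the window applies to the download date, not to the period in the filename.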

@kelson42
Collaborator

If you put "wikipedia_en_for_schools_maxi" as the filter you should get your number. I see no result at all, so either nobody has downloaded it in the last 12 months, or we have a bug, or I don't understand how it works. Anyway, I have just downloaded it, so within an hour at worst it should show at least one download.

@Popolechien
Member Author

I've just checked and it did not pick it up as far as I can tell.

@kelson42
Collaborator

@rgaudin OK, then it looks like a bug, either in the log gathering part or in matomo.

@rgaudin
Member

rgaudin commented Aug 18, 2023

I am currently trying to find the record for this hit in the DB, if that's possible. It will then be easier to know what to look at next.

@rgaudin
Member

rgaudin commented Aug 18, 2023

I found the hit in matomo's DB so we can rule out a download log capture/upload issue.

Here's how I found it:

SELECT * FROM piwik_log_visit WHERE idsite=2 AND location_country="ch" AND visit_first_action_time >= "2023-08-11 21:00:00" AND visit_last_action_time <= "2023-08-11 23:00:00"

There were several records. I identified @Kelson by location, time and OS, which gave me idvisit=21264276

SELECT * FROM piwik_log_link_visit_action WHERE idvisit=21264276

There were a few results. I checked the values of the idaction_url column against the piwik_log_action table, which told me the one we are looking for is idaction=9162997

SELECT * FROM piwik_log_action WHERE idaction=9162997;
idaction | name | hash | type | url_prefix
9162997 | download.kiwix.org/zim/wikipedia/wikipedia_en_for_schools_maxi_2023-07.zim | 4067615011 | 1 | 2

So the hit was recorded by matomo.
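For reference, the three lookups above can be chained into a single join. The sketch below runs it against an in-memory SQLite database with cut-down stand-in tables: only the table and column names, idvisit=21264276 and idaction=9162997 come from the queries above, while the timestamps and the extra rows are made up. The real tables have many more columns and live in matomo's own database, so this shows the shape of the query rather than something to run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in tables holding only the columns used in this thread.
conn.executescript(
    """
    CREATE TABLE piwik_log_visit (
        idvisit INTEGER PRIMARY KEY, idsite INTEGER, location_country TEXT,
        visit_first_action_time TEXT, visit_last_action_time TEXT);
    CREATE TABLE piwik_log_link_visit_action (idvisit INTEGER, idaction_url INTEGER);
    CREATE TABLE piwik_log_action (idaction INTEGER PRIMARY KEY, name TEXT);
    """
)
# Made-up rows: one matching visit, one visit from another country, one unrelated URL.
conn.executemany(
    "INSERT INTO piwik_log_visit VALUES (?, ?, ?, ?, ?)",
    [
        (21264276, 2, "ch", "2023-08-11 21:30:00", "2023-08-11 21:45:00"),
        (1, 2, "fr", "2023-08-11 21:30:00", "2023-08-11 21:45:00"),
    ],
)
conn.executemany(
    "INSERT INTO piwik_log_link_visit_action VALUES (?, ?)",
    [(21264276, 9162997), (21264276, 1), (1, 9162997)],
)
conn.executemany(
    "INSERT INTO piwik_log_action VALUES (?, ?)",
    [
        (9162997, "download.kiwix.org/zim/wikipedia/wikipedia_en_for_schools_maxi_2023-07.zim"),
        (1, "download.kiwix.org/"),
    ],
)

# Visit -> visit actions -> action URL, filtered like the manual steps.
QUERY = """
SELECT v.idvisit, a.idaction, a.name
FROM piwik_log_visit AS v
JOIN piwik_log_link_visit_action AS l ON l.idvisit = v.idvisit
JOIN piwik_log_action AS a ON a.idaction = l.idaction_url
WHERE v.idsite = ? AND v.location_country = ?
  AND v.visit_first_action_time >= ? AND v.visit_last_action_time <= ?
  AND a.name LIKE ?
"""
rows = conn.execute(
    QUERY,
    (2, "ch", "2023-08-11 21:00:00", "2023-08-11 23:00:00", "%wikipedia_en_for_schools_maxi%"),
).fetchall()
print(rows)
```

In a LIKE pattern the underscores also act as single-character wildcards, which is harmless here but worth knowing when filtering on ZIM names.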

Out of curiosity (expensive query!)

SELECT COUNT(*) FROM piwik_log_link_visit_action WHERE idaction_url =9162997;
COUNT(*)
257

Not all rows in that table are individual downloads. There are many columns with non-obvious names, and there's this action concept that is mapped to other tables (and some things reference one another). But there are records for that ZIM.


My opinion is that matomo is a complex tool and we (well, you 😀) don't know exactly how to use it. I'd suggest you describe your use case on a matomo forum or to their support so we learn exactly how to get the information you're looking for.
Then we may come back to a configuration issue in our instance.

@rgaudin rgaudin removed the bug label Aug 18, 2023