Commons:Requests for comment/Technical needs survey
An editor has requested comment from other editors for this discussion. If you have an opinion regarding this issue, feel free to comment below.
This survey is not finalized yet; the method or timeline might still change.
Background
Commons faces many technical problems in the form of bugs, broken tools, and needed features that are missing. In September 2022, the Commons:WMF support for Commons started working on some of these. A recent discussion on the Village pump showed that we, as Commons users, never really decided what we need most. This survey should fill this gap and result in a priority list of the most urgent problems.
Much of this was already discussed in the 2022 open letter: Commons:Think big - open letter about Wikimedia Commons.
Method
This survey uses the same method as the WMF's annual m:Community Wishlist Survey on Meta and Wikimedia Germany's de:Wikipedia:Technische Wünsche (Technical Wishes).
Timeline
- Until 24 December 2023: discuss the procedure of this survey and change it if needed (proposals can already be made but might need to be adjusted later)
- Until 14 January 2024: submit and discuss proposals
- 15–21 January 2024: cluster and merge proposals if needed
- 22 January 2024 – 15 February 2024: vote on the proposals
Resulting list
During the proposal and voting phases, all proposals are treated the same, but after the vote there will be two separate lists: one for fixing existing functionality and tools, and one for requested new features. Please keep this in mind when creating proposals, and split a fix and a new-feature request for the same tool into two separate proposals.
Proposals
Checkbox to mark new files as current on upload
Description of the Problem
- Problem description:
As some of us know well, there has been some controversy about file overwrites on Commons. Files can no longer be overwritten by any user as before, but files carrying the "Current" template can be. The problem is that many users don't know about this template, and it can be very difficult for new users to use.
- Proposal type: feature request
- Proposed solution:
Add a checkbox on upload to mark a file as "Current" (or not). Creating a file redirect for a versioned file could also be made easier: for example, a "Versioned file" checkbox would append a date to the end of the file's name and create a file redirect under the original name pointing to it. A component on the file page could then make it easy to upload a new version as a separate file while updating the redirect so that it points to the new file.
- Phabricator ticket:
- Further remarks:
Some users complain that "Current" isn't the most intuitive name for such a template. Perhaps a new, better name should be agreed upon.
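A minimal sketch of the naming step behind the "Versioned file" idea, assuming an ISO date in parentheses is inserted before the extension (the proposal does not fix a naming convention, so this one is hypothetical). It only computes the new file name and the file-redirect wikitext for the original name; the upload itself and the edit to the redirect page are not shown.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Tuple


def versioned_name(filename: str, when: date) -> str:
    """Insert an ISO date before the extension: 'Map.svg' -> 'Map (2023-12-23).svg'."""
    path = PurePosixPath(filename)
    return f"{path.stem} ({when.isoformat()}){path.suffix}"


def redirect_wikitext(target: str) -> str:
    """Wikitext that makes a file page redirect to another file."""
    return f"#REDIRECT [[File:{target}]]"


def new_version(base_name: str, when: date) -> Tuple[str, str]:
    """Return (title for the new upload, new text for the redirect page at base_name)."""
    target = versioned_name(base_name, when)
    return target, redirect_wikitext(target)
```

For example, new_version("Population of Spain.svg", date(2024, 1, 5)) gives the title "Population of Spain (2024-01-05).svg" plus a redirect pointing to it, so existing uses of the original name automatically follow the newest version.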
Discussion
I fully supported restricting file overwrites, but I think things should be made easier for users on files that genuinely need overwrites. This will work in favor of the overwrite restriction, since it will face much less opposition. I fear that too many complaints could eventually roll back that good change. MGeog2022 (talk) 11:03, 23 December 2023 (UTC)
Priority thumbnail render queue
Description of the Problem
- Problem description: Pages with many new thumbnails, such as Special:ListFiles shortly after uploading, newly created galleries, or files viewed in MediaViewer on larger screens, run into the rate limit for thumbnail generation. This results in missing thumbnails or a missing image in MediaViewer.
- Proposal type: feature request
- Proposed solution: Currently the limits are the same for logged-in and logged-out users. There should be a second, priority thumbnail rendering queue for autoconfirmed users. If this is not sufficient, there could also be a third queue for admins and bots only.
- Phabricator ticket: phab:T266155
- Further remarks:
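A minimal sketch of the proposed tiering, assuming a single worker pool that pulls render jobs from one priority queue; the production thumbnailing stack and its rate limiter are not modelled, and the mapping of user groups to tiers is hypothetical.

```python
import heapq
import itertools
from typing import List, Optional, Set, Tuple

# Hypothetical tiers (lower number = rendered first). The names are MediaWiki
# user groups, but which group maps to which tier is an assumption.
TIER_BY_GROUP = {"sysop": 0, "bot": 0, "autoconfirmed": 1}
DEFAULT_TIER = 2  # logged-out and not-yet-autoconfirmed users


def tier_for(groups: Set[str]) -> int:
    """Best (lowest) tier among the requester's groups."""
    return min((TIER_BY_GROUP[g] for g in groups if g in TIER_BY_GROUP), default=DEFAULT_TIER)


class ThumbnailQueue:
    """Render jobs ordered by tier, then first-come, first-served within a tier."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()  # tie-breaker that preserves FIFO order

    def submit(self, thumb_url: str, groups: Set[str]) -> None:
        heapq.heappush(self._heap, (tier_for(groups), next(self._arrival), thumb_url))

    def next_job(self) -> Optional[str]:
        """Pop the highest-priority job, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A strict priority order like this can starve the lowest tier under sustained load, so a real implementation would more likely give each tier its own rate-limit budget, which is closer to the "separate queue" wording of the proposal.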
Discussion
- Support. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 09:49, 22 December 2023 (UTC)
Fix MediaViewer's inability to handle collaborations
Description of the Problem
- Problem description:
When a media work has two or more creators, MediaViewer always reduces the credit to the first person with a Creator template. It works perfectly well, however, when the names are given as plain text alone. For example:
This should list two creators, Humanité René Philastre and Charles-Antoine Cambon. Instead, it lists only one. There is, however, no single creator who is more "correct" than the other: this was a collaboration. This is, of course, misattribution, which is especially bad since MediaViewer presents itself as if it can provide an accurate credit line. This is not a new issue: the problem has been known for a decade now, and it really, really needs at least some resources thrown at it.
Alternatively, consider just defaulting to "See file description page" with a link there in any case where the conversion isn't trivial.
- Proposal type: bugfix
- Proposed solution:
I'd suggest the easiest way forward is to allow two or more Creator templates to be used. (A Creator template combined with additional plain-text names also breaks, but that, at least, can be worked around by making a Creator template, or Creator-like template, for each additional person.) Personally, I would suggest supporting at least four Creator templates, because I can easily point to cases with four names: File:Edward_Duncan_-_The_Explosion_of_the_United_States_Steam_Frigate_Missouri.jpg.
- Phabricator ticket: T68606 - (from 2014!!!)
- Further remarks:
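MediaViewer's actual parsing of Creator templates is not reproduced here. This minimal sketch assumes the creator names have already been extracted (with None signalling that extraction was not trivial) and only shows the intended output: every name is kept, and the "See file description page" fallback suggested above is used when no reliable names are available. The function name and signature are hypothetical.

```python
from typing import List, Optional


def credit_line(creators: Optional[List[str]], file_page_url: str) -> str:
    """Attribution that keeps every creator instead of only the first one.

    creators is None when the names could not be extracted reliably; in that case
    (and for an empty list) fall back to pointing at the file description page.
    """
    names = [c.strip() for c in (creators or []) if c.strip()]
    if not names:
        return f"See file description page: {file_page_url}"
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]
```

With the two painters from the example, this yields "Humanité René Philastre and Charles-Antoine Cambon" rather than only the first name.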
Discussion
- Support. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 09:51, 22 December 2023 (UTC)
Fix rsvg text alignment regression
Description of the Problem
- Problem description:
The latest thumbnail renderer (rsvg) unfortunately has a bug that misaligns centre- or right-aligned <text> tags containing <tspan> tags on the same line. Many existing files have been affected.
- Proposal type: bugfix
- Proposed solution:
Fix rsvg or use a version without the bug.
- Phabricator ticket: http://phabricator.wikimedia.org/T97233
- Further remarks:
@Glrx described the root cause as follows:
“The problem is computing the width of an SVG "text chunk". If the text chunk consists of multiple XML nodes, then librsvg is using the width of the last node as the width of the entire text chunk. (librsvg is correctly tossing out the initial and final whitespace for the text element.)”
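To make the root cause concrete: with text-anchor="middle" a text chunk is shifted left by half its total width, and with text-anchor="end" by its full width. A minimal sketch with made-up node widths (real widths come from font metrics), comparing the correct chunk width (sum of all nodes) with the behaviour Glrx describes (width of the last node only):

```python
from typing import List


def chunk_start_x(anchor_x: float, node_widths: List[float], text_anchor: str,
                  buggy: bool = False) -> float:
    """Left edge of an SVG text chunk made of several XML nodes (e.g. text + tspan).

    Correct: the chunk width is the sum of all node widths.
    buggy=True mimics the reported librsvg behaviour: only the last node's width counts.
    """
    width = node_widths[-1] if buggy else sum(node_widths)
    if text_anchor == "start":
        return anchor_x
    if text_anchor == "middle":
        return anchor_x - width / 2
    if text_anchor == "end":
        return anchor_x - width
    raise ValueError(f"unknown text-anchor: {text_anchor}")
```

With a 120-unit text node followed by an 80-unit tspan centred at x = 200, the correct chunk starts at 100, while the buggy width puts it at 160, i.e. 60 units too far right. That matches the reported symptom of centred text drifting toward the right margin.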
Discussion
- In practical terms: this bug causes <text> elements containing <tspan> sub-elements (for subscripting, italicizing, boldfacing, coloring, font-sizing, etc.) to be rendered in the wrong place in Wikipedia thumbnails and on Commons file description pages, despite rendering properly in browsers during development. Example: text that should be centered on the page runs off the right margin. Over the years, this bug has required me to revise a few dozen .svg images to specify all attributes within each <text> element, or even to compromise content to work around rendering problems. RCraig09 (talk) 06:32, 21 December 2023 (UTC)
- I wonder if this is the bug that affected the text alignment of BABT Green Dot label.svg and BABT Red Triangle label.svg? The old versions (before I had to improvise a fix) worked correctly until mid-2022. --Minoa (talk) 06:40, 22 December 2023 (UTC)
- Yes, User:Minoa, I think that is the same problem. Having <tspan font-size="18">APPROVED</tspan> embedded in a <text> specification falls prey to this bug. RCraig09 (talk) 07:12, 22 December 2023 (UTC)
- Strong support: this has to be fixed, because the problem affected two of my uploads and made centring text with different font sizes tedious. --Minoa (talk) 03:18, 24 December 2023 (UTC)
- Thanks, @Minoa: not just tedious but strictly impossible, if the specified font is unavailable and another is substituted. cmɢʟee ⋅τaʟκ 01:36, 25 December 2023 (UTC)
- I personally recommend switching to a faster and less buggy renderer (see phab:T40010); more details can be found at User:JoKalliauer/SVG_test_suites — Johannes Kalliauer - Talk | Contributions 14:16, 25 December 2023 (UTC)
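For anyone wanting to check a renderer against this regression, a minimal sketch that generates a test file with the pattern described above: a centred <text> element containing a <tspan> of a different font size on the same line, plus a red guide line at the horizontal centre. The label text, sizes, and function name are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG elements without an ns0: prefix


def make_test_svg(label: str = "BABT", emphasis: str = "APPROVED",
                  width: int = 400, height: int = 100) -> str:
    """Minimal SVG with a centred <text> that contains a <tspan> on the same line."""
    cx = width / 2
    svg = ET.Element(f"{{{SVG_NS}}}svg", width=str(width), height=str(height),
                     viewBox=f"0 0 {width} {height}")
    # Red guide line at the horizontal centre: the text should straddle it evenly.
    ET.SubElement(svg, f"{{{SVG_NS}}}line", x1=str(cx), y1="0", x2=str(cx),
                  y2=str(height), stroke="red")
    text = ET.SubElement(svg, f"{{{SVG_NS}}}text",
                         {"x": str(cx), "y": str(height / 2),
                          "text-anchor": "middle", "font-size": "24"})
    text.text = label + " "
    tspan = ET.SubElement(text, f"{{{SVG_NS}}}tspan", {"font-size": "18"})
    tspan.text = emphasis
    return ET.tostring(svg, encoding="unicode")
```

Writing the returned string to a .svg file and comparing the browser rendering with the Commons thumbnail should show whether the text stays centred on the red line.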
Video conversion support
Description of the Problem
- Problem description: New users who attempt to upload videos in MP4 format, a very common video format, are met with an error message that provides no guidance on how to convert the video to an acceptable format. The current video conversion tools, such as Video2commons, are frequently nonfunctional. This likely results in many users not uploading files that would be of use to Commons.
- Proposal type: feature request
- Proposed solution: The file upload wizard should offer to perform a conversion to an acceptable file type whenever a user attempts to upload an MP4 file.
- Phabricator ticket: phab:T353659
- Further remarks: See the recent prior discussion.
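A minimal sketch of the conversion step that the Upload Wizard (or, today, the user) would need to perform, assuming ffmpeg is installed with libvpx-vp9 and libopus support. WebM with VP9 video and Opus audio is one of the formats Commons accepts; the quality value is an assumption to tune, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def webm_command(src: str, crf: int = 32) -> List[str]:
    """ffmpeg arguments that convert a video (e.g. an MP4) to VP9 video + Opus audio in WebM.

    "-b:v 0" together with "-crf" selects libvpx-vp9's constant-quality mode;
    the default crf value here is an assumption, not a Commons requirement.
    """
    dst = str(Path(src).with_suffix(".webm"))
    return ["ffmpeg", "-i", src,
            "-c:v", "libvpx-vp9", "-b:v", "0", "-crf", str(crf),
            "-c:a", "libopus",
            dst]


def convert(src: str) -> str:
    """Run the conversion and return the output path (needs ffmpeg on PATH)."""
    cmd = webm_command(src)
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Keeping the command construction separate from running it makes it easy to show users the exact command, or to run the same step server-side.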
Discussion
Courtesy pinging Sannita (WMF), who has discussed this with us. Cheers, Sdkb talk 05:57, 19 December 2023 (UTC)
- Support. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 09:52, 22 December 2023 (UTC)
Description of the Problem
- Problem description: Each upload is currently post-processed by bots to add structured data according to Commons:Structured data/Modeling; see for example this diff: [1]
- Proposal type: feature request. The Upload Wizard should include some basic structured data (other than "depicts"), or prepopulate structured data (SDC) statements in the last step of the upload for the user to confirm.
- Proposed solution: After an upload with the Upload Wizard, all the information added in this diff [2] should already be included.
- Further remarks:
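Part of what bots add today can be derived from the file itself. A minimal sketch of those purely technical values, which the Upload Wizard could prefill for the user to confirm. The property IDs are given as I understand them and should be checked against Commons:Structured data/Modeling; qualifiers (such as the hash algorithm) are omitted, and creator, license, and source statements, which would come from the wizard's own form fields, are not covered.

```python
import hashlib
import mimetypes
from typing import Dict, Optional, Union


def technical_statements(filename: str, data: bytes) -> Dict[str, Optional[Union[str, int]]]:
    """Values derivable from the file alone, keyed by (assumed) Wikidata property ID.

    P4092 = checksum (hex SHA-1 here), P3575 = data size in bytes, P1163 = media type.
    """
    mime, _encoding = mimetypes.guess_type(filename)
    return {
        "P4092": hashlib.sha1(data).hexdigest(),
        "P3575": len(data),
        "P1163": mime,  # None if the extension is unknown
    }
```

Since the wizard already holds the file bytes before upload, these values cost nothing to compute client-side, leaving bots such as SchlurcherBot to handle the harder cases.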
Discussion
Supporting as proposer and also as operator of User:SchlurcherBot, which makes exactly these edits and could otherwise focus on the less well-understood SDC cases. --Schlurcher (talk) 08:19, 19 December 2023 (UTC)
- Support. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 09:53, 22 December 2023 (UTC)
Massive support for video (or not)
Description of the Problem
- Problem description: I believe we (Commons) need direction from the WMF as to whether it is even practical to support a large quantity of large video files. This question becomes more pressing as we start to see more and more commercially made films, many of which have been digitized, come into the public domain. I suspect that whether we can reasonably host (and stream on demand) a large number of such films is mainly a technical and budget consideration, and I'd like feedback from the WMF on what is feasible, so that we neither waste our time discussing proposals that would be impossible to implement nor (worse yet) add a lot of content that we cannot adequately support, frustrating users with a half-assed implementation. - Jmabel ! talk 20:05, 18 December 2023 (UTC)
- Proposal type: process request
- Proposed solution: clarity from WMF
- Phabricator ticket:
- Further remarks:
Discussion
In my opinion, historic movies (or other videos, such as documentaries or TV footage) that enter the public domain, provided they have enough value, are part of the kind of content that Commons should store. The Wikimedia Foundation had revenue of $154.7 million in 2022, while the Internet Archive, in 2019, had a budget of only $36 million. The Archive tries to host as much content as possible (probably a mistake, and very possibly a big one). Commons, on the other hand, stores only content that is deemed educational (this includes any historic content, including movies). Strictly speaking, Commons is not an archive, but insofar as it stores historic material, it can in fact be considered one: an archive that is part of something far greater, the sum of all human knowledge, which also has other archive-like components (such as Wikisource). I think that selected videos of high value should be stored on Commons, even if they take up some space. Especially if they are somewhat rare, and are likely to be lost. As I mentioned in another request in this same technical needs survey, the Internet Archive, a really, really great idea, currently stores only 2 copies of each archived item, both of them in the San Francisco area, which has a high seismic risk (in my opinion, a really, really bad idea). I doubt it has enough money to make more backups, given its relatively low budget and the fact that, as of 2021, it stored 99 petabytes (1 PB = 1024 TB) of unique data. Commons currently stores "only" 471.86 TB (only 22.64 TB of it video), all of it replicated in 2 datacenters (both in the USA, in Virginia and Texas, in areas with no particular natural risks), plus a complete backup in each (and probably additional copies). According to Wikitech, it also uses RAID (a multi-disk setup on each server). So even if Commons doubled in size, it would be about 1% of the Internet Archive's size, in an organization with a budget more than four times larger and with much greater guarantees of preservation.
Yes, Wikimedia must also store many other projects, but they take up much less disk space. It must also provide far more bandwidth, and handle a far larger load of requests, than the Internet Archive. But I think it can help preserve highly valuable content that might otherwise be lost. MGeog2022 (talk) 20:09, 19 December 2023 (UTC)
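The size and budget comparison above can be checked with a few lines of arithmetic, using only the figures quoted in that comment (the two budget figures are from different years, so the ratio is approximate).

```python
# Figures as quoted in the comment; 1 PB = 1024 TB, as stated there.
commons_tb = 471.86
internet_archive_tb = 99 * 1024   # 99 PB of unique data (2021)
wmf_revenue_musd = 154.7          # 2022
ia_budget_musd = 36.0             # 2019 (a different year)

doubled_share = 2 * commons_tb / internet_archive_tb
budget_ratio = wmf_revenue_musd / ia_budget_musd

print(f"Commons x2 as a share of the Internet Archive: {doubled_share:.2%}")  # 0.93%
print(f"WMF revenue / Internet Archive budget: {budget_ratio:.1f}x")          # 4.3x
```

So a doubled Commons would be just under 1% of the Internet Archive's stored data, with roughly 4.3 times the money behind it.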
- Only if it lets the videos upload successfully. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 00:15, 20 December 2023 (UTC)
- "...and are likely to be lost". To play devil's advocate for a minute, I wonder how often this is actually the case. Once a public domain video has been digitized, it seems to typically proliferate (as a free source of monetization) rather than disappear. Are there any known examples of a video that has been digitized and subsequently lost? While I do think Commons should definitely host educational videos, I think more effort should be focused on getting modern videos freely licensed (through content partnerships and video creation projects), rather than on archiving all old films (which I think archive.org, YouTube, and other platforms can do better). Plus video streaming is extremely expensive. $154 million may seem like a lot, but if Commons actually became known as a video platform that money would evaporate very quickly. I don't remember where, but I remember seeing a breakdown created by TheDJ that was pretty informative. Regardless, I do think having more guidance from the WMF on this would be very helpful. Nosferattus (talk) 17:37, 21 December 2023 (UTC)
- @Nosferattus, my complete sentence was: "Especially if they are somewhat rare, and are likely to be lost". Rare videos are more likely to be lost than other, more widely known ones.
- but if Commons actually became known as a video platform: that's why I said things such as "selected videos of high value"; I know that Commons can't accept every video that someone wants to upload (useless images take up a lot of space, and with videos it's much, much worse).
- I think more effort should be focused on getting modern videos freely licensed: again, with some strict criteria, I understand. Otherwise it could be the same problem you talked about, or even worse.
- Once a public domain video has been digitized, it seems to typically proliferate (as a free source of monetization) rather than disappear: probably yes, but this doesn't eliminate the need to have a copy stored indefinitely somewhere, to ensure its preservation.
- which I think archive.org, YouTube, and other platforms can do better: YouTube is a commercial platform, and archival or preservation are not part of its goals. Uploaders can delete the content they uploaded, and if their account gets closed, their videos can be deleted over time. Archive.org always seems to make the wrong decisions: as I said before, they store everything they can, so they can't afford more than 2 copies (and they don't even use RAID: when a disk fails, there is only 1 copy for a while, until the disk is replaced; at least once, they lost content due to a defective disk, which doesn't seem like best practice for an archive). As if this were not enough, both copies are located in San Francisco, where a strong earthquake could cause severe trouble at any moment. They also take legal risks that cost them money from a budget that is already small for what they are trying to do. Of course, archive.org would be the right place for video archival, and I hope that in the future they get more money or make wiser decisions, but for now I think they won't really achieve their goals (archival and preservation of content for an indefinite period), despite their good intentions and the great idea that the project is.
- Plus video streaming is extremely expensive: perhaps some big videos could be offered only for downloading and not streaming, for example. MGeog2022 (talk) 19:55, 21 December 2023 (UTC)
Okay, let me make a comparison. MGeog2022 said Commons is currently hosting 471.86 TB of data. Apple recently introduced an option for up to 12 TB of iCloud storage for every one of its almost one billion iCloud users. Just 40 iCloud users with 12 TB each have more storage available than Commons currently uses. The 12 TB iCloud plan comes with many other services and costs only 60 euros a month, so 40 users × 60 euros = 2,400 euros a month (28,800 euros a year), compared to the 180 million dollars of revenue Wikimedia has.
Digital storage is so cheap that it almost never matters on the balance sheet.
And by the way, regarding the Internet Archive: deleting things is easy; restoring lost things is not. Killarnee (talk) 08:25, 27 December 2023 (UTC)
Media dumps
Description of the Problem
- Problem description:
There are no Wikimedia Commons dumps that include any media. A Phabricator ticket has been open since 2021 (T298394), but there has been no major progress. The root of the problem seems to be the enormous total size of all media currently on Commons (almost 500 TB). Fortunately, thanks to the hard work of some people, Commons media now have 2 backups at very distant locations (https://phabricator.wikimedia.org/T262668, https://wikitech.wikimedia.org/wiki/Media_storage/Backups), although in the same data centers as the primary copies. Having copies in more locations would provide greater security, considering the value of some of the hosted content.
- Proposal type: process request
- Proposed solution:
There's no need at all to include ALL Commons media in dumps. The focus should be on images of special value, such as historical photographs or documents (here, historical does not necessarily mean old) or featured pictures. Using categories, it should be easy to select all pictures depicting paintings, books, documents, maps (with some kind of filter to exclude user-made or trivial maps, such as country location maps, which individually take very little space but exist in huge numbers), or photos of special historic value (again, they can be very recent, provided they depict something truly historic). Featured pictures are easy to select, since they belong to specific categories. This collection (a subset of Commons) could be split by topic to produce even smaller individual dumps. These dumps could then be distributed to mirrors around the world (for example, libraries or universities that volunteer to host them, using a model similar to Debian mirrors). The Internet Archive would be another place to host them, but since it stores only 2 copies of each item, both in the San Francisco area with its high seismic risk, it sadly probably can't be relied on for long-term preservation, unless they improve this in the future or the paid Archive-It service (https://support.archive-it.org/hc/en-us/articles/208117536-Archive-It-Storage-and-Preservation-Policy) is used (with this option they store more copies in other locations).
- Phabricator ticket:
- Further remarks:
The proposed solution consists only of general ideas that obviously need much more revision and elaboration, but the basic goal is to have dumps of at least the media deemed most important (criteria and technical aspects aside). Having backups of all media in other locations besides the 2 main datacenters would be another, perhaps even better, solution. It costs money, but it should be a budget priority, as the Wikimedia Foundation's mission states: "The Foundation will make and keep useful information from its projects available on the internet free of charge, in perpetuity."
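Selecting files by category is already possible through the MediaWiki Action API (list=categorymembers with cmtype=file). A minimal sketch of that first step of building a topical dump, assuming a dump builder would start from such title lists: it lists only files directly in one category (no subcategory recursion, no filtering of trivial maps), follows API continuation, and uses a placeholder User-Agent that should be replaced with real contact details.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_URL = "https://commons.wikimedia.org/w/api.php"
# Wikimedia asks API clients for a descriptive User-Agent; this one is a placeholder.
USER_AGENT = "MediaDumpSketch/0.1 (hypothetical; contact: example@example.org)"


def fetch_json(params: dict) -> dict:
    """GET the Action API with the given parameters and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def category_files(category: str,
                   fetch: Callable[[dict], dict] = fetch_json) -> Iterator[str]:
    """Yield titles of files directly in a category, following API continuation."""
    params = {
        "action": "query", "format": "json", "list": "categorymembers",
        "cmtitle": category, "cmtype": "file", "cmlimit": "500",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # The API returns the parameters needed for the next page under "continue".
        params = {**params, **data["continue"]}
```

For example, iterating over category_files("Category:Featured pictures on Wikimedia Commons") would yield file titles that a downloader could then fetch; the category name is only an example.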
Discussion
File verification
Description of the Problem
- Problem description:
Source websites from which content is uploaded to Commons may cease to exist over time. Once that happens, files that originate from them could easily (especially when certain conditions are met) be mistaken for copyright violations. Also, even when the source website still exists and still has the uploaded file available, mistakes can lead to a file being wrongly deleted (just have a look here). Another problem is vandalism: if the file page was vandalized, the file's source could be missing or changed (yes, file history should be reviewed before deletion, but work overload could mean it is not reviewed with due care).
- Proposal type: feature request
- Proposed solution:
Implement a mechanism to verify uploaded files. When a file uploaded to Commons is patrolled (by a user with the privileges to do so), it could also be publicly marked as verified (this could also be done for existing files over time). This proposal is similar to what is already done for images from sites such as Flickr, but extended to all files from external sources. A verification would be more than a simple verification or attribution template (for example, it couldn't be removed by a vandal, only by an administrator if needed). Of course, we can never be 100% sure, but once a file is verified, an exhaustive investigation would be required before considering it a copyright violation, so the risk of mistaken removal would be greatly reduced. Users could also rely on verified files with greater confidence.
- Phabricator ticket:
- Further remarks:
If this is not feasible, an intermediate solution could be to prevent unprivileged users from removing attribution templates (but this would only help for files to which an attribution template applies).
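A minimal sketch of the data a verification mark might carry and the permission rule the proposal describes (users with patrol-type rights can add it; only administrators can remove it). The field names, the group lists (group names as I understand them on Commons), and the archive link are assumptions for illustration, not an existing MediaWiki feature.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Set

# Hypothetical permission rules based on the proposal: patrollers and license
# reviewers may add a verification; only administrators may remove one.
VERIFIER_GROUPS = {"patroller", "Image-reviewer", "sysop"}
REMOVER_GROUPS = {"sysop"}


@dataclass(frozen=True)
class Verification:
    file_title: str
    source_url: str
    verified_by: str
    verified_at: datetime
    archive_url: Optional[str] = None  # e.g. a Wayback Machine snapshot of the source


def verify(file_title: str, source_url: str, user: str, groups: Set[str],
           archive_url: Optional[str] = None) -> Verification:
    """Create a verification record, or refuse if the user lacks the right."""
    if not groups & VERIFIER_GROUPS:
        raise PermissionError(f"{user} may not verify files")
    return Verification(file_title, source_url, user, datetime.now(timezone.utc), archive_url)


def can_remove_verification(groups: Set[str]) -> bool:
    """Unlike a template, the mark cannot be removed by ordinary (or vandal) accounts."""
    return bool(groups & REMOVER_GROUPS)
```

Storing the verifier, the timestamp, and an archived copy of the source page would give an administrator reviewing a later deletion request the evidence directly, even if the original site has gone.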
Discussion
- Does this amount to placing a request for license review on every upload that comes from a third-party site? That seems excessive. Consider especially material old enough to be out of copyright on that basis, or a PD-ineligible logo. Similarly, a U.S. government document with internal markings that show it to be one; I'm sure there are many other cases. You'd be taking "patroller" (presumably actually image-reviewer) time to verify something that has nothing to do with the source site. - Jmabel ! talk 19:33, 17 December 2023 (UTC)
- If the patroller/image reviewer has indeed verified that the image (or other media) has been published under a free license, I think it would be a very good thing if he/she could mark the file as verified, visibly to anyone. This would even save work in the future: the file is not a copyright violation, so if somebody tags it as such, the deletion request can be quickly dismissed unless significant new evidence has been found (this would happen very rarely, if things are done well). Many files are in fact verified (any reviewed media from third parties that is not found to be a Copyvio has been verified), but we can't tell which files have been reviewed. As an uploader of many files from Spain's National Geographic Institute (IGN), I know that most of these files include the text "© Instituto Geográfico Nacional. All rights reserved. Total or partial reproduction banned", because they were published before IGN released them under the CC-BY 4.0 license. I'm sure those maps (or at least most of them) were reviewed and everything was found to be OK. But if in the future the URL from which they were downloaded ceases to exist, someone could tag the file for deletion as a Copyvio. The administrator reviewing the deletion request would then see an "All rights reserved" text on the image, that it's only a few years old, and that no evidence of a CC-BY license can be found on the source website, because it no longer exists. I think that allowing files to be marked as "Verified" would solve this. On the other hand, as I also said, not allowing unprivileged users to remove attribution templates from files would be another way to prevent this kind of thing from happening. MGeog2022 (talk) 19:53, 17 December 2023 (UTC)
- You have given a problem description, but is this an actual problem? Sure, there are lots of things that can happen, and do happen in a small number of cases, but is it worth complicating everything else for such a case? The Flickr review is done because it is so easy to change licenses on material (in bulk), not because Flickr can disappear. Additionally, we have our upload dates/times and page history to deal with any age questions. And as far as I know, we have never had legal problems because of any of this. I think this is a LOT of overhead we are adding, for very little return. —TheDJ (talk • contribs) 12:14, 18 December 2023 (UTC)
- What about having a list of safe sources, to which only administrators can add sites after verifying them? Or, as I said, preventing unprivileged users from removing attribution templates from files uploaded by other users? Admittedly, I don't know of this ever happening, but I think it's sad to risk losing valuable material due to potential confusion. I think it's especially risky when the media includes a relatively recent copyright notice with an "All rights reserved" text, as in the case I mentioned. I think my proposal adds no complication for patrollers: if I understood this page correctly, most uploaded files are patrolled in search of possible copyright violations. If the file is found to really be under a free license, it would only take a click or two for the patroller to mark it as verified (much less work than requesting its deletion if it were a Copyvio). Older files could be verified on demand. On the other hand, perhaps I'm being a little paranoid here, and everything that can be found about the source (even in the Wayback Machine, if the site no longer exists), the file history, etc., is always carefully checked before file deletion. But even if this is the case, my proposal would greatly reduce research work for administrators, if we have files verified in advance. MGeog2022 (talk) 13:35, 18 December 2023 (UTC)
- The patrol user right is a specific user right that enables a user to mark edits, file uploads and page creations as patrolled
- What I propose is only a publicly visible way to "mark as patrolled", but at the file level (it could even be fully automatic when a patroller marks a file that is not an own work as patrolled). Once a file is marked this way, it should never be considered a Copyvio unless very clear evidence is found that it was wrongly verified. MGeog2022 (talk) 14:55, 18 December 2023 (UTC)
- It might be more useful to codify what the "certain conditions" are, in guidelines or policy. I tag a lot of files as copyright violations, and a very common scenario is that an image is clearly an old stock image, since it's being used on dozens of websites, and that usage predates whatever date the uploader claimed (i.e. the uploader says it was their work from 12/18/2023, but it shows up on the web as early as 2013), but the stock site no longer exists, so it may well have been under a free license. I would suspect that sites disappearing causes more false negatives than false positives (another common scenario is that an uploader has several files, a few of which appear online before the date they claimed and a few of which do not, and while the other images are probably copyvios, there's no proof). But I don't have any proof of that, and by nature it's probably impossible to get. At any rate, I think the most likely result of this would be creating another backlog of hundreds of thousands of files. Gnomingstuff (talk) 23:11, 18 December 2023 (UTC)
- @Gnomingstuff, the "certain conditions" I was referring to were, as I mentioned later, for example, CC-BY licensed maps that include a "© Instituto Geográfico Nacional. All rights reserved. Total or partial reproduction banned" text, because their initial publication predated their release under a free license. I hope that before a file is deleted, its history is carefully checked, other files from the same source on Commons are checked, etc. But the risk still exists, and administrators may have to do a lot of research in the case of such a deletion nomination.
- Also, regarding stock image sites, as you mentioned: it has happened that people uploaded public domain photos from Commons to stock photo sites without mentioning that they were public domain (this is absolutely legal: public domain imposes no obligations), and the photos were later deleted from Commons as Copyvio because they were present on a stock photo site (I read about this some time ago; sorry, but I can't find the source now). If they had been verified as public domain when they were added to Commons, this wouldn't have happened (their presence on a stock photo site should raise the alarm, but then who is right should be carefully investigated, rather than automatically assuming that whoever uploaded them to the stock photo site was right, without even consulting the site's owners). MGeog2022 (talk) 13:08, 19 December 2023 (UTC)
- "creating another backlog of hundreds of thousands of files": that is the last of my intentions. Having many unverified files is no problem, since all files are unverified now. The idea is to have as many verified files as possible (this would be easier for new files, which could be patrolled and verified at the same time), prioritizing those that may have more problems, those that are deemed most important, and those whose uploader (or another user) requests verification. MGeog2022 (talk) 13:13, 19 December 2023 (UTC)
- To clarify my example, I'm not saying that IGN will cease to exist tomorrow: I know this won't happen. But please think about the following scenarios:
- • The https://centrodedescargas.cnig.es/ URL is deemed too long and is changed to www.cnig.es, so the original URL no longer exists.
- • IGN decides that there's no need for a separate institution (CNIG) for cartography distribution. CNIG is merged into IGN, so https://centrodedescargas.cnig.es/ ceases to exist as well.
- • (I hope this never happens) The government considers the production of new maps too expensive, so it charges a fee for commercial use of new cartography, and new maps aren't CC-BY licensed. Only an obscure notice on the website says: "Maps published before 20XX are CC-BY licensed", while the "All rights reserved" text is clearly visible.
- • EU countries join their national mapping agencies into a unified European one. IGN ceases to exist as such.
- In any of these cases, an administrator who is not familiar with IGN and sees a map that includes "© 2011 IGN. All rights reserved" could delete the file (perhaps this could happen even now, though I hope due care is always taken). IGN maps are on Commons thanks to talks between Wikimedia Spain and IGN (see here; in Spanish), and thousands of maps have been uploaded since then by many users. I think we should avoid risking the loss of any of them.
- Apart from file verification for third-party works, I think that a notice such as "This image wasn't found in Google Images as of 22 December 2023" would be a good thing for user-created works, to prevent deletions in cases such as photos being "stolen" and uploaded to stock photo sites (with a user's own work we can never be 100% sure, but such a notice would indicate that a more detailed and careful investigation is needed before deletion). MGeog2022 (talk) 13:57, 22 December 2023 (UTC)
Bots[edit]
Description of the Problem[edit]
- Problem description:
Some bots don't do what they used to do.
- Proposal type: feature request
- Proposed solution:
Provide more support to the bot maintainers, add bot maintainers, or bring the bots into WMF management
- Phabricator ticket:
- T339145
- Further remarks:
List of such bots and undone tasks:
- User:SteinsplitterBot: Maintenance of reports like Commons:Database reports/Abuse filter effectiveness, which has not been updated since 00:43, 06 October 2020 (UTC). Updates were requested from User:Steinsplitter at 11:42, 25 September 2022 (UTC) in a post that went unanswered and was archived to User talk:Steinsplitter/Archive/2022#Commons:Database reports/Abuse filter effectiveness. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 01:05, 10 December 2023 (UTC)
- Commons deletion notification bot, which notifies talk pages on other WMF wikis about images that are up for deletion on Commons, has been broken since 2023-06-06. See T339145. Toohool (talk) 19:13, 10 December 2023 (UTC)
- @Toohool: MusikAnimal (WMF) started that task. It has needed discussion since Jun 21 2023, 10:07 AM. This is what can happen under WMF management. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 19:23, 10 December 2023 (UTC)
Drafted by — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 01:05, 10 December 2023 (UTC)
- "Some bots don't do what they used to do." This is a very bad problem description. It casts a very wide net, that has no defined boundaries to the problem to be solved. It is much better to have specific items for specific bots. —TheDJ (talk • contribs) 12:16, 18 December 2023 (UTC)
- @TheDJ: They are not maintaining the site in the manner to which we have become accustomed. I am sure that more will be listed. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 13:46, 18 December 2023 (UTC)
Discussion[edit]
- I want to run a bot, but there seem to be too many hurdles to clear. I would appreciate it if somebody created a friendlier tutorial for beginners. --トトト (talk) 13:06, 18 December 2023 (UTC)
File upload stability[edit]
Description of the Problem[edit]
- Problem description: When uploading files using the UploadWizard or the API, users very frequently experience problems resulting in aborted uploads or broken files. When an error goes unnoticed, broken files or file description page information may be lost to Commons. When errors are noticed, they are very inconvenient for uploaders, driving long-term contributors away and scaring off new ones.
- Proposal type: bugfix
- Proposed solution: Set the goal that only 1 in 10,000 uploads via the API should fail because of server-side problems, and only 1 in 1,000 uploads via the web browser should fail because of server or website errors.
- Phabricator ticket: There are multiple tickets on the various problems, e.g. "Commons: UploadChunkFileException: Error storing file: backend-fail-internal; local-swift-codfw" and "Files occasionally getting uploaded to Commons without file pages".
- Further remarks: Feel free to add other relevant tickets. GPSLeo (talk) 14:05, 9 December 2023 (UTC)
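To make the proposed targets concrete, here is a minimal sketch (not part of the proposal) of how a measured server-side failure rate could be compared against them. The channel labels, the `UploadStats` structure and the example counts are hypothetical; it assumes someone already has per-channel counts of attempted uploads and of failures attributed to server-side causes.

```python
from dataclasses import dataclass

# Targets from the proposal: at most 1 in 10,000 API uploads and
# at most 1 in 1,000 web-browser uploads may fail for server-side reasons.
TARGETS = {"api": 1 / 10_000, "web": 1 / 1_000}


@dataclass
class UploadStats:
    channel: str          # hypothetical label: "api" or "web"
    attempts: int         # uploads attempted during the measured period
    server_failures: int  # failures attributed to server/website problems


def failure_rate(stats: UploadStats) -> float:
    """Fraction of attempted uploads that failed for server-side reasons."""
    if stats.attempts == 0:
        return 0.0
    return stats.server_failures / stats.attempts


def meets_target(stats: UploadStats) -> bool:
    """True if the observed rate is at or below the channel's target."""
    return failure_rate(stats) <= TARGETS[stats.channel]


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for s in (UploadStats("api", 250_000, 20), UploadStats("web", 40_000, 55)):
        status = "OK" if meets_target(s) else "above target"
        print(s.channel, f"{failure_rate(s):.5f}", status)
```

The hard part in practice is the classification step (deciding which failures count as "server-side"), which this sketch simply takes as given.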
Discussion[edit]
- I wholeheartedly support this proposal. In particular, the UploadWizard could present server error messages much more understandably and also handle many of them better. I would also like to point once more to the Android tool Offroader, which shows how stable uploads to Commons can be with the existing server implementation, even under the most adverse conditions: an aborted upload can easily be resumed, even on another device and with a different internet connection; uploads can be verified to be error-free; duplicates can be detected and prevented before an upload even begins; and, as an aid to development, it can record the server messages during an upload for a post-mortem. --C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 18:41, 9 December 2023 (UTC)
- Yes, this is sorely needed! I like the idea of having target metrics especially. Nosferattus (talk) 17:40, 21 December 2023 (UTC)
Taking on certain upload tools[edit]
Description of the Problem[edit]
- Problem description: Certain tools, many of which are not part of MediaWiki itself, are nonetheless fundamental for people who upload files to Commons. Many of these are currently each maintained by a single individual. We need a plan for more robust maintenance of these over time.
- Proposal type: process request
- Proposed solution: a program manager at WMF should be responsible for a plan for maintenance (or replacement) of these tools going forward. I (Jmabel) am not trying to dictate a particular technical solution here, just to have some entity that is not "the community" take primary responsibility. If this is best done by a paid team at WMF, great. If this is best done by a better-organized and "deeper" pool of volunteers, great. And some might best be left to exactly whoever is doing them now, but if that is a single individual we need at least a plan as to what should happen if that individual becomes unavailable. If it's some mix of the above, or even third parties like the Flickr Foundation, great. And if individuals want to contribute on their own, and the community can adopt their tools or not, that's also great. But I think we need program management from within WMF so that someone has the job of making overall status visible and making sure the ball doesn't get dropped.
Initially, we need to identify what tools would have this status. People are welcome to add to this initial list (and/or clarify situations), but please stick to existing (or previously existing and now broken) tools used by contributors who upload content.
- Special:UploadWizard: as I understand it, this is part of mediawiki, and is already maintained by WMF staff
- Special:Upload: as I understand it, this is part of mediawiki, and is already maintained by WMF staff
- Uploading apps for mobile devices (I know nothing here, I never use them, can someone please fill this in?)
- Flickr2Commons: the Flickr Foundation has already taken on the task of replacing this with a more robust tool, which I think means this is well covered
- Batch uploader(s) (programs running on a PC): there have been several of these over the years, notably Commonist, which I believe is dead. I have no idea of the current status here
- Pattypan: for batch upload via spreadsheets, some issues but working, developed by Yarl and maintained by Abbe98
- Vicuna Uploader
- tool(s) for mass uploads from GLAMs or other databases of file content: I have no idea of the status of these
- Video2Commons: especially important because of its ability to convert file formats. This is often broken in one or another degree. See phab:T353659
- CropTool: (rotating and cropping, either for overwrite or for a new file). Currently in danger of breaking because the Grid Engine is about to go away and no one has dealt with this.
- Url2Commons: for direct upload from the given URL: written by Magnus Manske but not actively maintained (many unresolved issues)
- Commons:derivativeFX, tool at https://iw.toolforge.org/derivative: to easily upload derivative works
- IA-upload: used to upload PD works on the Internet Archive to Commons as DJVU files. Some common issues, e.g. phab:T300761.
- The API itself using pywikibot or custom scripts
- Phabricator ticket:
- Further remarks: I'm very open to "sympathetic edits" to the above proposal, but reserve the right to revert edits that I think hijack my proposal to be something else. - Jmabel ! talk 22:31, 6 December 2023 (UTC)
- I have added some tools. — Draceane talkcontrib. 09:31, 7 December 2023 (UTC)
- I added one too. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 00:39, 10 December 2023 (UTC)
- In thinking about uploads, it is worth considering various (overlapping, variously combined) groups of users. Some of the considerations include:
- Experienced or not
- PC vs. tablet vs. phone
- Uploading own photos vs. GLAM content vs. other third party
- Uploading photos where many photos share a description etc., vs. each being unique
- Jmabel ! talk 19:13, 17 December 2023 (UTC)
- @Jmabel: With the ideal tool, everything entered by the user should be sharable in the upload session: all or part of the description, source, author, templates, cats, freeform stuff after the description, freeform stuff before the cats... This could follow the model of the granularity of global preferences vs. local preferences. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 22:25, 17 December 2023 (UTC)
Discussion[edit]
@Jmabel: (or anyone else). Is there some reason why these things are done through third party solutions instead of just being integrated into the website to begin with? Like is there a reason it's better to have the WMF maintain the CropTool instead of them just making cropping an actual feature of mediawiki? --Adamant1 (talk) 11:15, 7 December 2023 (UTC)
- If these tools are also available via the API, then there is no reason not to make them a feature of MediaWiki. But batch uploading via a GUI-only tool is no fun. C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 16:33, 7 December 2023 (UTC)
- As I say, I'm not prejudging the technical solution here. Obviously, if something can be brought into mediawiki and provide essentially the existing capability, that's great, and also benefits other sites using mediawiki. What I am saying is that for Commons, all of the above constitute part of the core functionality that we provide to uploaders, and that this deserves the same level of program management and, ultimately, robustness as the content editing that is core functionality across the sister projects. - 18:46, 7 December 2023 (UTC)
- Thanks for the clarification. I'm certainly not against the proposal. I was just wondering about the trade-offs between having them manage the applications in-house versus just building similar features into MediaWiki. I guess they aren't mutually exclusive, though. --Adamant1 (talk) 13:27, 9 December 2023 (UTC)
- @Adamant1 Working tools the WMF deems useful for all MediaWiki installations are in Core. Working tools the WMF deems useful for some MediaWiki installations are in Extensions. Working tools the WMF deems useful for all WMF MediaWiki installations are in WMF Builds. Working tools developed by others who saw a need and filled it could be upgraded to any of the above. As far as I know. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 00:49, 10 December 2023 (UTC)