cff-version: 1.2.0
abstract: "

This dataset contains the following inside a tar.zst file:

  1. A list of all Java repositories on GitHub, in CSV format
  2. The POM.xml file from each of those repositories, if one exists at the root of the repository
  3. A sample of 500 000 repositories that have been searched recursively for POM.xml files
  4. For the sampled repositories that have a POM.xml file, a generated 'effective' POM.xml
  5. For those that have distribution repositories configured, their GitHub workflow files, if any exist
  6. A report.json file that contains aggregate information about the sample


The scraper written to retrieve this data is also included.


This dataset was created for a Computer Science Bachelor Research Project titled \"An analysis of Java release practices on GitHub\" by Vivian Roest.

"
authors:
  - family-names: Roest
    given-names: Vivian
    orcid: "https://orcid.org/0009-0005-2351-6602"
title: "Data underlying the BSc project: \"An analysis of Java release practices on GitHub\""
keywords:
version: 1
identifiers:
  - type: doi
    value: 10.4121/67a790fe-b65a-4c30-aae0-c5b2dc7e5d4d.v1
license: CC0-1.0
date-released: 2024-01-29