Data underlying the BSc project: "An analysis of Java release practices on GitHub"

doi:10.4121/67a790fe-b65a-4c30-aae0-c5b2dc7e5d4d.v1
The DOI above is for this specific version of the dataset, which is currently the latest. Newer versions may be published in the future. For a link that always points to the latest version, please use
doi:10.4121/67a790fe-b65a-4c30-aae0-c5b2dc7e5d4d
Datacite citation style:
Roest, Vivian (2024): Data underlying the BSc project: "An analysis of Java release practices on GitHub". Version 1. 4TU.ResearchData. software. https://doi.org/10.4121/67a790fe-b65a-4c30-aae0-c5b2dc7e5d4d.v1
Other citation styles (APA, Harvard, MLA, Vancouver, Chicago, IEEE) available at Datacite
Software
usage stats
115 views, 56 downloads
time coverage
2023
licence
CC0

This dataset contains the following inside a tar.zst file:

  1. A list of all Java repositories on GitHub, in CSV format
  2. The POM.xml file from each of those repositories, if there was one at the root of the repo
  3. A sample of 500 000 repositories that have been searched recursively for POM.xml files. For this sample:
     • for each repository that has a POM.xml file, an 'effective' POM.xml has been created
     • for each repository that has distribution repositories configured, its GitHub workflow files are included if they exist
  4. A report.json file that contains aggregate information about the sample
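
As an illustration of two steps named in the list above (searching a repository recursively for POM.xml files, and checking whether distribution repositories are configured), here is a minimal Python sketch. It is not the scraper included in this dataset. It assumes that "distribution repositories configured" corresponds to a Maven `<distributionManagement>` section containing a `<repository>` or `<snapshotRepository>` element; the function names `find_pom_files` and `has_distribution_repository` are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def _local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}' from a tag name."""
    return tag.rsplit("}", 1)[-1]


def has_distribution_repository(pom_text: str) -> bool:
    """Return True if the POM has a top-level <distributionManagement>
    with a <repository> or <snapshotRepository> child.

    Assumption: this is what "distribution repositories configured" means.
    """
    try:
        root = ET.fromstring(pom_text)
    except ET.ParseError:
        return False  # malformed XML is treated as "not configured"
    for child in root:
        if _local_name(child.tag) == "distributionManagement":
            return any(
                _local_name(grandchild.tag) in ("repository", "snapshotRepository")
                for grandchild in child
            )
    return False


def find_pom_files(repo_root: Path) -> list:
    """Recursively collect files named pom.xml (case-insensitive) under repo_root."""
    return sorted(
        path
        for path in repo_root.rglob("*")
        if path.is_file() and path.name.lower() == "pom.xml"
    )
```

A possible use, given a local checkout of a repository: call `find_pom_files(Path("some-repo"))` and pass the text of each returned file to `has_distribution_repository`. Note that this only inspects the POM as written; settings inherited from a parent POM would only be visible in an 'effective' POM.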


The scraper written to retrieve this data is also included.


This dataset was created for a Computer Science Bachelor Research Project titled "An analysis of Java release practices on GitHub" by Vivian Roest.

history
  • 2024-01-29 first online, published, posted
publisher
4TU.ResearchData
format
tar.zst of xml, csv and json files, and git repo of source code
organizations
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science

DATA

To access the source code, use the following command:

git clone https://data.4tu.nl/v3/datasets/f38d8fdd-fe95-4bd4-968e-71db238de45e.git

Or download the latest commit as a ZIP.

files (2)