Data underlying the BSc project: "An analysis of Java release practices on GitHub"
doi:10.4121/67a790fe-b65a-4c30-aae0-c5b2dc7e5d4d.v1
The DOI above refers to this specific version of the dataset, which is currently the latest. Newer versions may be published in the future.
For a link that always points to the latest version, please use:
doi: 10.4121/67a790fe-b65a-4c30-aae0-c5b2dc7e5d4d
DataCite citation style:
Roest, Vivian (2024): Data underlying the BSc project: "An analysis of Java release practices on GitHub". Version 1. 4TU.ResearchData. software. https://doi.org/10.4121/67a790fe-b65a-4c30-aae0-c5b2dc7e5d4d.v1
Other citation styles (APA, Harvard, MLA, Vancouver, Chicago, IEEE) are available at DataCite.
Software
This dataset contains the following, packaged in a tar.zst file:
- A list of all Java repositories on GitHub, in CSV format
- The POM.xml file from each of those repositories, if there was one at the root of the repo
- A sample of 500 000 repositories, for which:
  - Each repository has been searched recursively for POM.xml files
  - For those that have a POM.xml file, an 'effective' POM.xml has been created
  - For those that have distribution repositories configured, the GitHub workflow files are included if they exist
- A report.json file that contains aggregate information about the sample
The scraper written to retrieve this data is also included.
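The recursive POM.xml search described in the list above can be illustrated with a short Python sketch. This is not the included scraper: the function name `find_pom_files` is hypothetical, the case-insensitive file name match is an assumption, and the directory layout inside data.tar.zst is not described on this page.

```python
from pathlib import Path


def find_pom_files(repo_root):
    """Return every file named pom.xml (any letter case) below repo_root.

    Hypothetical helper for illustration only; the actual scraper may
    use a different matching rule.
    """
    root = Path(repo_root)
    return sorted(
        path
        for path in root.rglob("*")  # walk all subdirectories
        if path.is_file() and path.name.lower() == "pom.xml"
    )
```

Calling `find_pom_files("some/checked-out/repo")` returns a sorted list of `Path` objects, which is empty for repositories that do not use Maven.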
This dataset was created for a Computer Science Bachelor Research Project titled "An analysis of Java release practices on GitHub" by Vivian Roest.
history
- 2024-01-29 first online, published, posted
publisher
4TU.ResearchData
format
tar.zst of xml, csv and json files, and git repo of source code
organizations
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science
DATA
To access the source code, use the following command:
git clone https://data.4tu.nl/v3/datasets/f38d8fdd-fe95-4bd4-968e-71db238de45e.git
files (2)
- README.md, 2,075 bytes, MD5: d753f6159cf10988a8ba713a0240ab1f
- data.tar.zst, 3,539,608,057 bytes, MD5: 06c77a540a6f79956b0442b958ea2819
download all files (zip): 3,539,610,132 bytes unzipped
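Since data.tar.zst is about 3.5 GB, it may be worth checking a download against the MD5 values listed above before extracting it. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_expected` are hypothetical, and the file is read in chunks so that it does not have to fit in memory.

```python
import hashlib

# MD5 checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "README.md": "d753f6159cf10988a8ba713a0240ab1f",
    "data.tar.zst": "06c77a540a6f79956b0442b958ea2819",
}


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, name):
    """Return True if the file at path has the checksum listed for name."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `matches_expected("data.tar.zst", "data.tar.zst")` should return `True` for an intact download saved under that name in the current directory.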