1. Introductory Information:
o  Title of the dataset: Engineering Complexity Beyond the Surface - Discerning the Viewpoints, the Drivers, and the Challenges
Authors: G.A. Garza Morales, K. Nizamis, G.M. Bonnema
Systems Engineering and Multidisciplinary Design Group, Design, Production, and Management Department, Faculty of Engineering Technology, University of Twente
Corresponding author: K. Nizamis

Contact Information:
k.nizamis@utwente.nl
University of Twente - Faculty of Engineering Technology
Horst Complex building, number 20
PO Box 217
7500 AE Enschede
The Netherlands

*** General introduction ***
This dataset contains the data from the identification, screening, and selection of papers for a Systematic Mapping Study (SMS) of the coverage of complexity viewpoints, drivers, and challenges in the context of Systems Engineering and Engineering Design. These three concepts are defined in the main publication. The dataset also contains the coding scheme and the detailed code-document analysis reports.
o  Content: 13 files:
	a) Identification, screening, and selection of papers --> Since the SMS included six parallel literature searches, there are six individual files, one per solution direction, covering the first five steps of the process:
	1) finding papers in databases, 2) removing duplicates, 3) inspecting titles, 4) inspecting abstracts, and 5) determining accessibility and scanning the full text. The six files, corresponding to the six solution directions we explored, are listed below (the five tabs in each file correspond to the five steps above):
		1. DSM_Identification and Screening_Steps 1 To 5.xlsx 
		2. Model_Identification and Screening_Steps 1 To 5.xlsx
		3. Knowledge_Identification and Screening_Steps 1 To 5.xlsx
		4. Process_Identification and Screening_Steps 1 To 5.xlsx
		5. Product_Identification and Screening_Steps 1 To 5.xlsx
		6. Tool_Identification and Screening_Steps 1 To 5.xlsx
	The six subsets total 386 publications, referred to in the main publication as the screened pre-subset.

	b) Concentrate_FINAL_With Random.xlsx --> This file contains the final concentrate of publications. The publications from the six individual searches were combined (Tab: "Total386papers") and duplicates were removed (Tab: "NoDUPTotal373papers"), resulting in 373 unique titles (referred to as the screened subset in our publication). From these, papers were randomly selected per category, following the selection rule in item e) of Section 2, resulting in a final selection of 135 papers (Tab: "NoDUPRandomTot135papers", referred to as the random selection subset in our publication).
	The other tabs contain graphs of the 135 publications by year, type, country, and source.
	
	c) Tables and Figures_Final_Count Viewpoint,Drivers,Challenges.xlsx --> Contains several tabs with the processes used to analyze, count, and create the tables and figures of the main publication and the thesis. Each tab states explicitly which table or figure it is linked to.
	
	d) ECBS_3.docx --> Word file with supplementary information about how the mapping was done for the coverage analysis of the complexity drivers. The 135 references in this file constitute the random selection.
	
	e) ComplexityDrivers_RelationshipMap.pdf --> A network created in the ATLAS.ti software to show some of the relationships found between the various complexity drivers.
	
	f) Reference Model 2021-Final.pdf --> A reference model showing the many factors studied, which led to the creation of the SSPT framework (System, Social, Process, Tooling).
	
	g) Code-Document-Analysis-Viewpoints, Drivers, and Challenges.xlsx --> This Code-Document report lists the 72 documents and, in binary representation, the codes assigned to each one. This information was used to quantify the identified complexity taxonomy for the viewpoints and the challenges (for the drivers, the Word file described in d) was used). The individual codes are described in the main publication, Engineering complexity beyond the surface: discerning the viewpoints, the drivers, and the challenges (https://doi.org/10.1007/s00163-023-00411-9), and in detail in the sheet called "info".
	The other tabs contain all the quotations, grouped by viewpoint, driver, and challenge, together with information on the document of origin. These quotations constitute the binarized results shown in the Code-Document tab.
	
	h) SMS FINAL Engineering Complexity Beyond The Surface.qdpx --> Source file obtained from the ATLAS.ti QDA (qualitative data analysis) software. This file contains all the documents and codings used for our review. The .qdpx exchange format (REFI-QDA project exchange) allows entire projects to be moved from one QDA software package to another. Because the standard exchange file is XML-based, it can be opened not only by specialised QDA software but also by any software able to process XML.
	

2. Methodological Information:
	a) Review type selection: We conducted a systematic mapping study (SMS).
		Step 1: Definition of research questions. The three questions from the framework (see the main publication) were used.
		Step 2: Conducting the search. To avoid relying on the keyword “complexity”, which is too generic, we built on our previous work and used the six most relevant categories of complexity solution approaches:
			1. Model-based
			2. Knowledge-based
			3. Process-based
			4. Tool-based
			5. Matrix-based (e.g., DSMs)
			6. Product (lifecycle)-based
			Keywords were selected for each category. In addition to the main topic keywords, context keywords were selected to narrow down the expected several thousand hits. The context keywords reflect the main areas of study, namely “engineering design” and “systems engineering”. As these were not sufficient, a second group of context keywords related to the multidisciplinary/interdisciplinary aspect was used. The keywords are detailed in the main publication.

	b) Database selection: The queries were run in five relevant research databases: Wiley, IEEE Xplore Digital Library, ACM Digital Library, Springer Link digital library, and Scopus.
	The searches returned 5617 papers, which were then subjected to the screening process. This paper set was retrieved on 6 June 2020; any additions to the databases after this date are therefore excluded from our study.
	
	c) Basic inclusion criteria:
		i. Publication period: To find current challenges, we limited the search to the last six years (2014–2020).
		ii. Title, abstract, and keywords: Where possible, the search engines were configured to search titles, abstracts, and keywords.

	d) Paper screening. The screening of the papers comprised the sub-steps described below:
		i. Remove duplicates per category
		ii. Screen based on inclusion/exclusion criteria:
		(a) Contextual relevance: discussion of complexity topics in engineering design and/or systems engineering.
		(b) Relevance for research questions: Screening of the titles, the abstracts and full text to find:
			• Review/experience report: substantiated challenges, limitations, discussions of state of the art or practice.
			• For primary papers: complexity in the problem domain with sufficient quality (i.e., having related research, discussion, and conclusion sections).
		(c) Language: English
		(d) Peer reviewed
		(e) Evolving publications: If a paper has been published several times, we used the newer/longer version. 
	After these screening sub-steps, our result set contained 386 papers. Because the searches were conducted separately, some papers appear in more than one category; out of the 386 papers, 373 are unique titles.
	
	e) Random selection of the screened subset. Our SMS is more comprehensive than a purely quantitative SMS; to keep the workload manageable, we considered a random selection sufficient. For each category, the random selection comprised either 50% of the papers or 30 papers, whichever was lower. The final subset, which served as input to the SMS, contained 135 papers. These 135 papers correspond to those presented in ECBS_3.docx or Code-Document-Analysis-Viewpoints, Drivers, and Challenges.xlsx.

	f) Keywording using abstracts, introductions, discussions, and conclusions; data extraction; and mapping process. For the SMS, we extracted and coded the data from the random selection subset. To manage the effort as efficiently as possible, we limited the scanning of the papers to the introduction, background, discussion, and conclusion sections. The mapping was conducted using the software ATLAS.ti, version 22.4.
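	The per-category rule in e) (50% of a category's papers or 30 papers, whichever is lower) can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used for the study: the function names are hypothetical, rounding half of an odd count up is an assumption (the README does not say how fractions were handled), and simple random sampling without replacement via Python's random module is assumed.

```python
from __future__ import annotations

import random


def sample_size(n_papers: int) -> int:
    """50% of the category's papers or 30, whichever is lower.

    Assumption: half of an odd count is rounded up.
    """
    return min((n_papers + 1) // 2, 30)


def select_per_category(papers_by_category: dict[str, list[str]],
                        seed: int | None = None) -> dict[str, list[str]]:
    """Draw a simple random sample without replacement from each category."""
    rng = random.Random(seed)  # seeded generator so a draw can be repeated
    return {
        category: rng.sample(papers, sample_size(len(papers)))
        for category, papers in papers_by_category.items()
    }


# Hypothetical category sizes, for illustration only (not the study's counts).
demo = {
    "Model": [f"M{i}" for i in range(80)],
    "DSM": [f"D{i}" for i in range(21)],
}
selected = select_per_category(demo, seed=1)
print({category: len(papers) for category, papers in selected.items()})
# {'Model': 30, 'DSM': 11}
```

	A seeded generator is used only so that a draw can be reproduced; the README does not state whether a seed was used.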

	
	
3. Data-specific Information: 
	FILES: Identification, screening, and selection of papers Excel files (see item a) in the introduction)
	The five tabs in each file correspond to the following five steps in the process: 1) finding papers in databases, 2) removing duplicates, 3) inspecting titles, 4) inspecting abstracts, and 5) determining accessibility and scanning the full text.
	
	FILE: Code-Document-Analysis-Viewpoints, Drivers, and Challenges.xlsx
	All information about the coding and quantification of the data can be found in the "info" Tab.
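	As an illustration of how a binary code-document table can be quantified, the sketch below counts, for each code, the number of documents in which it is assigned (value 1) and the corresponding share of all documents. It assumes a hypothetical layout (one row per document, one 0/1 entry per code) and invented document and code names; the "info" tab remains the authoritative description of how the quantification was actually done.

```python
from __future__ import annotations


def code_frequencies(matrix: dict[str, dict[str, int]]) -> dict[str, tuple[int, float]]:
    """Per code: (number of documents with value 1, share of all documents).

    `matrix` maps a document name to {code: 0 or 1}; this layout is a
    hypothetical stand-in for the Code-Document tab.
    """
    n_docs = len(matrix)
    codes = sorted({code for row in matrix.values() for code in row})
    result = {}
    for code in codes:
        count = sum(row.get(code, 0) for row in matrix.values())
        result[code] = (count, count / n_docs if n_docs else 0.0)
    return result


# Hypothetical documents and codes, for illustration only.
example = {
    "Doc A": {"Viewpoint: X": 1, "Challenge: Y": 0},
    "Doc B": {"Viewpoint: X": 1, "Challenge: Y": 1},
    "Doc C": {"Viewpoint: X": 1, "Challenge: Y": 1},
    "Doc D": {"Viewpoint: X": 0, "Challenge: Y": 0},
}
print(code_frequencies(example))
# {'Challenge: Y': (2, 0.5), 'Viewpoint: X': (3, 0.75)}
```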

	FILE: Concentrate_FINAL_With Random.xlsx
	Tab: "Total386papers": The combined publications of the six individual searches (386 titles).
	Tab: "NoDUPTotal373papers": Results after removing duplicates (373 unique titles).
	Tab: "NoDUPRandomTot135papers": Papers randomly selected per category, resulting in a final selection of 135 papers.
	The other tabs contain graphs of the 135 publications by year, type, country, and source.
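	The step from "Total386papers" to "NoDUPTotal373papers" merges papers that were found by more than one search. The README does not say how duplicates were matched, so the sketch below assumes matching on a normalized title (case-folded, accents and punctuation removed, whitespace collapsed) and keeps the list of categories each unique title came from. Function names and example records are hypothetical.

```python
from __future__ import annotations

import re
import unicodedata


def normalize_title(title: str) -> str:
    """Case-fold, drop accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def deduplicate(records: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Merge (category, title) records into unique titles with their categories."""
    merged: dict[str, tuple[str, list[str]]] = {}
    for category, title in records:
        key = normalize_title(title)
        if key not in merged:
            merged[key] = (title, [])  # keep the first spelling encountered
        if category not in merged[key][1]:
            merged[key][1].append(category)
    return list(merged.values())


# Hypothetical records, for illustration only.
records = [
    ("Model", "A Hypothetical Paper on Complexity"),
    ("Process", "A hypothetical paper on complexity."),
    ("Tool", "Another Hypothetical Paper"),
]
unique = deduplicate(records)
print(len(records), "->", len(unique))  # 3 -> 2
print(unique[0])  # ('A Hypothetical Paper on Complexity', ['Model', 'Process'])
```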
	
	FILES: Tables and Figures_Final_Count Viewpoint,Drivers,Challenges.xlsx, ECBS_3.docx, Reference Model 2021-Final.pdf, and ComplexityDrivers_RelationshipMap.pdf are detailed in the introduction.
	
	FILE: SMS FINAL Engineering Complexity Beyond The Surface.qdpx
	The “.qdpx” exchange file is essentially a .zip archive.
	It contains a folder “sources” with the plain documents and one file called “project.qde”, which is an XML file.
	The XML file contains a <codebook> element that lists all the <codes> used for annotation.
	The <sources> element contains a list of the documents (the <TextSource> element) and the <codings> within each document.
	The codings are defined by start and end positions in the linked document.
	In addition, <note> elements are available for memos and <sets> elements for document groups and variables.
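	Since the .qdpx file is a .zip archive containing project.qde, it can be inspected with Python's standard zipfile and xml.etree.ElementTree modules. The sketch below is an illustration under assumptions: it assumes the element and attribute names of the REFI-QDA project exchange standard (Code, TextSource, PlainTextSelection with startPosition/endPosition, Coding, Note) and matches them by local name so that the XML namespace is ignored. The README writes the names in lower case (<codebook>, <codes>), which may differ in case from the actual file, so adjust the names if needed. The function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
import zipfile
from typing import IO


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{urn:QDA-XML:project:1.0}Code' -> 'Code'."""
    return tag.rsplit("}", 1)[-1]


def summarize_qdpx(path_or_file: str | IO[bytes]) -> dict:
    """Summarize the codebook, sources and codings stored in a .qdpx archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        names = archive.namelist()
        root = ET.fromstring(archive.read("project.qde"))  # the project XML

    # Plain documents live in the "sources" folder (directory entries skipped).
    source_files = [n for n in names
                    if n.lower().startswith("sources/") and not n.endswith("/")]

    counts: dict[str, int] = {}
    selections: list[tuple[int, int]] = []
    for element in root.iter():
        name = local_name(element.tag)
        counts[name] = counts.get(name, 0) + 1
        if name == "PlainTextSelection":
            # Assumed REFI-QDA attributes giving the coded span in the document.
            start = element.get("startPosition")
            end = element.get("endPosition")
            if start is not None and end is not None:
                selections.append((int(start), int(end)))

    return {
        "codes": counts.get("Code", 0),
        "text_sources": counts.get("TextSource", 0),
        "codings": counts.get("Coding", 0),
        "notes": counts.get("Note", 0),
        "selections": selections,
        "source_files": source_files,
    }


# Example (file name as given in this README):
# summarize_qdpx("SMS FINAL Engineering Complexity Beyond The Surface.qdpx")
```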
	
4. Sharing and Access information.
	Public Domain Dedication (CC0)
	CC0 (Creative Commons Zero) enables scientists and other creators and owners of copyright- or database-protected content to waive those interests in their works. This means that they place their work as completely as possible in the public domain, so that others may freely build upon, enhance and reuse the works for any purposes without restriction under copyright or database law. 4TU.ResearchData has adopted CC0 as the default means for researchers to share their datasets. In many cases, it can be difficult to ascertain whether a dataset is subject to copyright law, as many types of data aren’t copyrightable in many jurisdictions. Putting a dataset in the public domain under CC0 is a way to remove any legal doubt about whether researchers can use the data in their projects. This leads to the enrichment of open datasets and further dissemination of knowledge.
	Attribution: Although CC0 doesn’t legally require users of the data to cite the source, it is best practice and good science to give proper credit to the original creator(s). Be aware that not citing the research data you are using could be considered plagiarism, which would compromise your reputation and the credibility of your work.