DOI: 10.1145/3180155.3182515

Data scientists in software teams: state of the art and challenges

Published: 27 May 2018

Abstract

The demand for analyzing large-scale telemetry, machine, and quality data is rapidly increasing in the software industry. Data scientists are becoming popular within software teams. For example, Facebook, LinkedIn, and Microsoft are creating a new career path for data scientists. In this paper, we present a large-scale survey of 793 professional data scientists at Microsoft to understand their educational background, the problem topics they work on, their tool usage, and their activities. We cluster these data scientists based on the time spent on various activities and identify 9 distinct clusters of data scientists and their corresponding characteristics. We also discuss the challenges that they face and the best practices they share with other data scientists. Our study finds several trends about data scientists in the software engineering context at Microsoft and should inform managers on how to leverage data science capability effectively within their teams.



Published In

ICSE '18: Proceedings of the 40th International Conference on Software Engineering
May 2018
1307 pages
ISBN: 9781450356381
DOI: 10.1145/3180155
  • Conference Chair: Michel Chaudron
  • General Chair: Ivica Crnkovic
  • Program Chairs: Marsha Chechik, Mark Harman
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data science
  2. software productivity

Qualifiers

  • Abstract

Conference

ICSE '18
Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Article Metrics

  • Downloads (Last 12 months): 16
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 15 Sep 2024

Cited By

  • (2023) Run-Time Prevention of Software Integration Failures of Machine Learning APIs. Proceedings of the ACM on Programming Languages 7:OOPSLA2 (264-291). DOI: 10.1145/3622806. Online publication date: 16-Oct-2023
  • (2023) MELT: Mining Effective Lightweight Transformations from Pull Requests. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) (1516-1528). DOI: 10.1109/ASE56229.2023.00117. Online publication date: 11-Sep-2023
  • (2021) PATSQL. Proceedings of the VLDB Endowment 14:11 (1937-1949). DOI: 10.14778/3476249.3476253. Online publication date: 1-Jul-2021
  • (2021) How Teams Communicate about the Quality of ML Models: A Case Study at an International Technology Company. Proceedings of the ACM on Human-Computer Interaction 5:GROUP (1-24). DOI: 10.1145/3463934. Online publication date: 13-Jul-2021
  • (2021) Are Machine Learning Cloud APIs Used Correctly? Proceedings of the 43rd International Conference on Software Engineering (125-137). DOI: 10.1109/ICSE43902.2021.00024. Online publication date: 22-May-2021
  • (2020) Adopting Agile Software Development Methodologies in Big Data Projects – a Systematic Literature Review of Experience Reports. 2020 IEEE International Conference on Big Data (Big Data) (2028-2033). DOI: 10.1109/BigData50022.2020.9378118. Online publication date: 10-Dec-2020
  • (2019) Integrating runtime data with development data to monitor external quality: challenges from practice. Proceedings of the 2nd ACM SIGSOFT International Workshop on Software Qualities and Their Dependencies (20-26). DOI: 10.1145/3340495.3342752. Online publication date: 26-Aug-2019
  • (2019) Artificial intelligence meets software engineering in the classroom. Proceedings of the 1st ACM SIGSOFT International Workshop on Education through Advanced Software Engineering and Artificial Intelligence (35-38). DOI: 10.1145/3340435.3342718. Online publication date: 26-Aug-2019
  • (2019) Data quality in ETL process. Procedia Computer Science 159:C (676-687). DOI: 10.1016/j.procs.2019.09.223. Online publication date: 1-Jan-2019
