DOI: 10.1145/3531146.3533113
Research Article | Open Access

Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits

Published: 20 June 2022
Abstract

    Recent years have seen the development of many open-source ML fairness toolkits aimed at helping ML practitioners assess and address unfairness in their systems. However, there has been little research investigating how ML practitioners actually use these toolkits in practice. In this paper, we conducted the first in-depth empirical exploration of how industry practitioners (try to) work with existing fairness toolkits. In particular, we conducted think-aloud interviews to understand how participants learn about and use fairness toolkits, and explored the generality of our findings through an anonymous online survey. We identified several opportunities for fairness toolkits to better address practitioner needs and scaffold them in using toolkits effectively and responsibly. Based on these findings, we highlight implications for the design of future open-source fairness toolkits that can support practitioners in better contextualizing, communicating, and collaborating around ML fairness efforts.


        Published In

        FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
        June 2022, 2351 pages
        ISBN: 9781450393522
        DOI: 10.1145/3531146
        This work is licensed under a Creative Commons Attribution 4.0 International License.


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 20 June 2022


        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

        • Carnegie Mellon University Block Center for Technology and Society Award
        • Aviva and the UK Engineering and Physical Sciences Research Council
        • Jacob Foundation for CERES network
        • National Science Foundation

        Conference

        FAccT '22


        Article Metrics

        • Downloads (Last 12 months): 1,071
        • Downloads (Last 6 weeks): 92

        Cited By

        • (2024)"It's the most fair thing to do but it doesn't make any sense": Perceptions of Mathematical Fairness Notions by Hiring ProfessionalsProceedings of the ACM on Human-Computer Interaction10.1145/36373608:CSCW1(1-35)Online publication date: 26-Apr-2024
        • (2024)Interpretability Gone Bad: The Role of Bounded Rationality in How Practitioners Understand Machine LearningProceedings of the ACM on Human-Computer Interaction10.1145/36373548:CSCW1(1-34)Online publication date: 26-Apr-2024
        • (2024)Real Risks of Fake Data: Synthetic Data, Diversity-Washing and Consent CircumventionProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency10.1145/3630106.3659002(1733-1744)Online publication date: 3-Jun-2024
        • (2024)Learning about Responsible AI On-The-Job: Learning Pathways, Orientations, and AspirationsProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency10.1145/3630106.3658988(1544-1558)Online publication date: 3-Jun-2024
        • (2024)Impact Charts: A Tool for Identifying Systematic Bias in Social Systems and DataProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency10.1145/3630106.3658965(1187-1198)Online publication date: 3-Jun-2024
        • (2024)The Fall of an Algorithm: Characterizing the Dynamics Toward AbandonmentProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency10.1145/3630106.3658910(337-358)Online publication date: 3-Jun-2024
        • (2024)SuperNOVA: Design Strategies and Opportunities for Interactive Visualization in Computational NotebooksExtended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems10.1145/3613905.3650848(1-17)Online publication date: 11-May-2024
        • (2024)Human-Centered Evaluation and Auditing of Language ModelsExtended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems10.1145/3613905.3636302(1-6)Online publication date: 11-May-2024
        • (2024)JupyterLab in Retrograde: Contextual Notifications That Highlight Fairness and Bias Issues for Data ScientistsProceedings of the CHI Conference on Human Factors in Computing Systems10.1145/3613904.3642755(1-19)Online publication date: 11-May-2024
        • (2024)Towards a Non-Ideal Methodological Framework for Responsible MLProceedings of the CHI Conference on Human Factors in Computing Systems10.1145/3613904.3642501(1-17)Online publication date: 11-May-2024
