Surgical artificial intelligence (AI) is a nascent field with potential to improve patient safety and clinical outcomes. Current surgical AI models can identify surgical phases, critical events, and surgical anatomy1,2,3. Most of these models utilize supervised machine learning and require large amounts of annotated video data, typically annotated by domain experts. Crowdsourcing, which aggregates layperson annotations into consensus annotations, can scale and accelerate the acquisition of high-quality training data4.
Crowdsourced annotations of surgical video, however, have historically relied on unsophisticated crowdsourcing methodologies and have been limited to annotations of simple rigid surgical instruments and other non-tissue structures. Models trained to segment laparoscopic surgical instruments performed equally well when trained on non-expert crowdsourced annotations as when trained on expert annotations5. However, annotations of deformable and mobile surgical tissues are believed to require domain expertise due to their complexity and the need for accurate contextual knowledge of surgical anatomy4. The acquisition of expert-annotated training data is cost-prohibitive and time-consuming, and it slows the development and deployment of surgical AI models for clinical benefit.
Here we describe an application of gamified, continuous-performance-monitored crowdsourcing to obtain annotated training data of surgical tissues, which we used to train a soft tissue segmentation AI model. We validate this approach by training and deploying a highly accurate, real-time AI-assisted multimodal imaging platform to increase precision when assessing tissue perfusion, which may help reduce complications such as anastomotic leak in bowel surgery6,7.
All video data, composed of 95 de-identified colorectal procedures for benign and malignant indications (IRB #OSU2021H0218), were included for model training (train dataset) and testing (test dataset) (Supplementary Table 1, Methods). Crowdsourced annotations of the train and test dataset were obtained using a gamified crowdsourcing platform utilizing continuous performance monitoring and performance-based incentivization (Fig. 1a, Methods)8. Five crowdsourcing parameters were controlled: testing score (TS), running score (RS), minimum crowdsource annotations (n), majority vote (MV), and review threshold (RT) (Fig. 1b, Methods).
Due to the impractical time commitment required for experts to annotate the large train dataset (27,000 frames), a smaller test dataset (510 frames) was created. This dataset was annotated by crowdsourced workers, by the models trained on crowdsourced worker annotations, and by one of four surgical domain experts (Methods). The test dataset was then used to compare the annotations from crowdsourced workers, and the predictions of the models trained on them, to expert annotations. These comparisons used the standardized metrics of Intersection over Union (IoU) (Supplementary Fig. 1) and the harmonic mean of precision and recall (F1) (Methods, Supplementary Eq. (1)).
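For reference, the standard pixel-level definitions of these two metrics (the exact form of Supplementary Eq. (1) is not reproduced here), expressed in terms of true positives (TP), false positives (FP), and false negatives (FN), are:

```latex
\mathrm{IoU} = \frac{TP}{TP + FP + FN}, \qquad
F1 = \frac{2\,TP}{2\,TP + FP + FN}
   = 2\cdot\frac{\mathrm{precision}\cdot\mathrm{recall}}{\mathrm{precision}+\mathrm{recall}}
```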
Bowel.CSS (bowel crowdsourced segmentation) was trained to segment bowel and abdominal wall using crowdsourced annotations of the train dataset. Additionally, a streamlined model was optimized for real-time segmentation of bowel and deployed as part of an AI-assisted multimodal imaging platform (Methods).
We validate the use of non-expert crowdsourcing with the following primary endpoints:
1. Expertise level of crowdsource workers.
2. Expert hours saved.
3. Accuracy of the crowdsource annotations to expert annotations.
4. Accuracy of the Bowel.CSS model predictions to expert annotations.

Secondary endpoints were:

1. Difficulty level of the crowdsourced annotations in the train and test datasets.
2. Accuracy of real-time predictions of the deployed Bowel.CSS model to expert annotations.
The train dataset was annotated by 206 crowdsourced workers (CSW), yielding 250,000 individual annotations and 54,000 consensus annotations of bowel and abdominal wall. 3% (7/206) of CSW identified as MDs, and 1% (2/206) identified as surgical MDs. The test dataset was annotated by 48 CSW, yielding 5100 individual annotations and 1020 consensus annotations. 4% (2/48) of CSW identified as MDs, and 0% as surgical MDs (Fig. 1c, e, Supplementary Table 1, Methods).
These demographics indicate non-domain expertise of the CSW. Although demographic data is self-reported and not available for every CSW, the platform reports that the majority of the active CSW are health science students (59.7%) looking to improve their clinical skills (57.3%) (Supplementary Table 2).
On average, an expert spent 120.3 s annotating a frame for bowel and abdominal wall in the test dataset. This extrapolates to an estimated 902 expert hours saved during annotation of the train dataset by utilizing the crowdsourcing methodology, and an estimated 17 expert hours saved in the test dataset (had expert annotations of the test dataset not been required for this study). Assuming each of the four expert annotators annotated for one hour per day, this equates to roughly 120 frames annotated per day. In contrast, CSW annotated an average of 774 frames per day in the train dataset (Fig. 1d, Methods).
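These estimates follow directly from the measured per-frame annotation time; as a worked check:

```latex
27{,}000\ \mathrm{frames}\times 120.3\ \mathrm{s} \approx 3.25\times 10^{6}\ \mathrm{s} \approx 902\ \mathrm{h},\qquad
510\times 120.3\ \mathrm{s} \approx 17\ \mathrm{h},\qquad
4\ \mathrm{experts}\times\frac{3600\ \mathrm{s}}{120.3\ \mathrm{s/frame}} \approx 120\ \mathrm{frames/day}
```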
The difficulty of crowdsourced annotations was measured by Difficulty Index (DI) (Methods). The median difficulty of the crowdsourced annotations was 0.09 DI for bowel and 0.12 DI for abdominal wall in the train dataset, and 0.18 DI for bowel and 0.26 DI for abdominal wall in the test dataset, indicating a robust spectrum of task difficulty across the frame populations (Fig. 1f, Methods).
Compared to expert annotations of bowel and abdominal wall within the test dataset, crowdsource workers and Bowel.CSS were highly accurate, with F1 values of 0.86 ± 0.20 for bowel and 0.79 ± 0.26 for abdominal wall for crowdsource workers, and 0.89 ± 0.16 and 0.78 ± 0.28 for bowel and abdominal wall, respectively, for Bowel.CSS (Fig. 2a, b).
A streamlined version of Bowel.CSS optimized for real-time bowel segmentation was deployed to provide AI-assisted display of multimodal imaging and provided highly accurate segmentation of bowel tissue compared to expert annotation. This allowed surgeons to visualize physiologic perfusion of the colon and rectum that is normally invisible to the human eye (Fig. 2c, d, Supplementary Table 3, Methods).
Herein, we report the first complete and adaptable methodology for obtaining highly accurate segmentations of surgical tissues using non-expert crowdsourcing. We outline five crowdsourcing parameters (TS, RS, n, MV, and RT) that can be adjusted to fit a variety of segmentation tasks depending on task difficulty and application. We validated this methodology by showing that the crowdsourced annotations can be used to train a highly accurate surgical tissue segmentation model while greatly accelerating the speed of development, eliminating over 900 expert annotation hours. This study is limited by a lack of source video diversity, as all videos came from colorectal procedures at a single institution; performance may therefore suffer when the model is applied to other video datasets. Another limitation is the inability to train segmentation models using both crowdsourced and expert annotations, because expert time constraints prevented sourcing expert annotations for the 27,000 video frames in the train dataset. However, the crowdsourced annotations and the predictions of the crowdsource-trained model were shown to be highly accurate relative to expert annotations, and the inability to secure a high volume of expert annotations itself demonstrates the need for crowdsourcing.
While we demonstrated that crowdsourcing is viable for scaling these surgical tissue annotations, further work should be done to determine the limitations of this methodology when applied to increasingly complex anatomical structures. While we showed that the deployed AI model accurately segmented bowel as part of an AI-assisted multimodal imaging platform, future work should investigate clinical outcomes with the use of this technology. This accelerated model development using crowdsourced annotations will further enable additional applications of AI-assisted multimodal imaging for enhanced real-time clinical decision support, safer surgery, and improved outcomes.
Methods
This study was approved by The Ohio State University Institutional Review Board (IRB #OSU2021H0218). All patients provided written informed consent.
Video source and frame sampling
Surgical videos were obtained from a prospective clinical trial evaluating the utility of real-time laser speckle contrast imaging for perfusion assessment in colorectal surgery (IRB #OSU2021H0218). In the source material for the train dataset, video clips were not prefiltered, and frames were extracted at a regular interval (1 frame per second and 1 frame per 30 seconds) to create a diverse set of training data and eliminate frame selection bias. For the test dataset, clips were extracted when the surgeon was assessing perfusion of the colon. Frames were extracted at 1 frame per second to minimize frame selection bias. The final video and frame counts are represented in Supplementary Table 1.
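The extraction tooling is not specified in the paper; a minimal sketch of this kind of fixed-interval frame sampling, assuming OpenCV and a hypothetical extract_frames helper, could look like the following.

```python
# Illustrative only: sample frames from a surgical video at a fixed interval
# (e.g., 1 frame per second, or 1 frame per 30 s), as described for the train dataset.
import cv2

def extract_frames(video_path: str, out_prefix: str, seconds_between_frames: float = 1.0) -> int:
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0          # fall back if FPS metadata is missing
    step = max(1, round(fps * seconds_between_frames))
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:                        # keep every `step`-th frame
            cv2.imwrite(f"{out_prefix}_{saved:06d}.png", frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Example: 1 frame per second for test clips, 1 frame per 30 s for longer train videos.
# extract_frames("procedure_01.mp4", "train/proc01", seconds_between_frames=30.0)
```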
Crowdsourced annotations
Crowdsourced annotations of bowel and abdominal wall were obtained using a gamified crowdsourcing platform (Centaur Labs, Boston, MA) utilizing continuous performance monitoring and performance-based incentivization8. This methodology differs from standard crowdsourcing platforms such as Amazon's Mechanical Turk, which do not allow for such continuous performance monitoring and incentivization9. Previous implementations of crowdsourced annotations in surgical computer vision have typically only utilized the majority vote crowdsourcing parameter5.
Annotation instructions were developed utilizing as little specialized surgical knowledge as possible while following surgical data science best practices10. Crowdsourced annotation instructions given to the crowdsourced workers (CSW) included 13 training steps for each task, with 11 and 14 example annotations of abdominal wall and bowel, respectively (Fig. 1a). Four experts (two senior surgical trainees and two trained surgeons) provided expert annotations used to calculate testing (TS) and running (RS) scores. In our study, CSW were required to achieve a minimum testing score (TS), as measured by intersection over union (IoU) against 10 expert annotations, prior to performing any annotations. A running score (RS) was calculated by intermittently testing the CSW in the same fashion. Annotations from CSW with a sufficient TS and RS were used in consensus generation. A minimum of 5 annotations (n) was required to generate the consensus crowdsourced annotation, using the majority vote parameter (MV) to include only pixels annotated by at least 4 of the annotators for bowel and at least 2 for abdominal wall. A difficulty index (DI) was calculated for each frame using IoU, with values between 0 and 1 and higher values indicating increasing difficulty (Supplementary Eq. (2), Methods). Quality assurance (QA) was performed by experts (two surgical trainees) on randomly selected frames above the difficulty review threshold (RT) of 0.4 (Fig. 1b).
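As an illustration of how such a consensus could be formed (a sketch consistent with the parameters above, not the platform's internal implementation), pixel-wise majority voting over n = 5 qualified worker masks with a class-specific vote threshold might look like this:

```python
# Sketch of pixel-wise majority-vote consensus over crowdsourced binary masks.
# vote_threshold = 4 for bowel and 2 for abdominal wall, per the parameters above.
import numpy as np

def consensus_mask(worker_masks: np.ndarray, vote_threshold: int) -> np.ndarray:
    """worker_masks: (n_workers, H, W) array of 0/1 annotations from qualified workers."""
    votes = worker_masks.sum(axis=0)             # number of workers labeling each pixel
    return (votes >= vote_threshold).astype(np.uint8)

# Example with 5 worker masks: bowel keeps pixels chosen by >= 4 workers,
# abdominal wall keeps pixels chosen by >= 2 workers.
masks = np.random.randint(0, 2, size=(5, 512, 512), dtype=np.uint8)
bowel_consensus = consensus_mask(masks, vote_threshold=4)
wall_consensus = consensus_mask(masks, vote_threshold=2)
```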
SegFormer B3 framework and model training
SegFormer is a semantic segmentation framework developed in partnership with NVIDIA and Caltech. It was selected for the real-time implementation for its powerful yet efficient semantic segmentation capabilities, accomplished by unifying transformers with lightweight multilayer perceptron decoders11.
Using the SegFormer B3 framework, we trained two versions of Bowel.CSS. Bowel.CSS was trained on the entire crowdsource-annotated 27,000 frame dataset (78 surgical videos). A second model, Bowel.CSS-deployed, was trained on a subset of the train dataset (3500 frames from 11 surgical videos) and optimized for real-time segmentation of bowel. This model was deployed in real-time as a part of an AI-assisted multimodal imaging platform (Methods).
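The training framework is not detailed in the paper; a minimal fine-tuning sketch using the Hugging Face transformers implementation of SegFormer-B3, assuming three classes (background, bowel, abdominal wall) and a dataloader yielding image/mask tensor pairs, is shown below for illustration.

```python
# Illustrative sketch (not the authors' code): fine-tune SegFormer-B3 on consensus masks.
import torch
from transformers import SegformerForSemanticSegmentation

model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b3",   # ImageNet-pretrained SegFormer-B3 encoder; decode head is newly initialized
    num_labels=3,      # assumed: 0 = background, 1 = bowel, 2 = abdominal wall
)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5)  # lr from the SegFormer paper defaults

def train_step(images: torch.Tensor, masks: torch.Tensor) -> float:
    """images: (B, 3, H, W) float tensor; masks: (B, H, W) long tensor of class ids."""
    outputs = model(pixel_values=images, labels=masks)  # loss = per-pixel cross-entropy
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```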
Train and test dataset crowdsourced annotations and demographics
Train dataset frames (n = 27,000) were annotated by 206 CSW, yielding 250,000 individual annotations and 54,000 consensus annotations of bowel and abdominal wall. 3% (7/206) of CSW identified as MDs, and 1% (2/206) identified as surgical MDs. Test dataset frames (n = 510) were annotated by 48 CSW, yielding 5100 individual annotations and 1020 consensus annotations. 4% (2/48) of CSW identified as MDs, and 0% as surgical MDs (Fig. 1c, e, Supplementary Table 1, Methods).
To further characterize the “unknown” CSW demographics in the crowdsource user population in this study, Supplementary Table 2 presents CSW demographics for the entire annotation platform in the year 2022. It shows that the majority (59.7%) were health science students and that the majority listed their reason for participating in crowdsource annotations as “to improve my skills” (57.3%). This supports the conclusion that most users on this platform are non-physicians and are not full-time annotators.
Crowdsource vs expert hours saved
A primary goal of using crowdsourced annotations is to mitigate the rate-limiting and expensive time of experts. The average time for the three domain experts to complete a frame annotation for bowel and abdominal wall in the test dataset was 120.3 s. Using this average annotation time and the frame totals of 27,000 and 510, crowdsourcing saved an estimated 902 expert hours in the train dataset and 17 expert hours in the test dataset (had experts not been required to annotate the test dataset for this study).
Annotation comparison statistics
The pixel-level agreement of both crowdsourced and Bowel.CSS annotations with expert annotations was assessed using accuracy, sensitivity, specificity, IoU, and F1 scores (Supplementary Fig. 1, Supplementary Eq. (1)). These metrics are accepted measurements of segmentation annotation accuracy in computer vision and surgical data science12.
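A minimal sketch of these pixel-level comparisons for a single class (an illustration using standard definitions, not the authors' evaluation code):

```python
# Pixel-level agreement metrics between a predicted/consensus mask and an expert mask.
import numpy as np

def segmentation_metrics(pred: np.ndarray, expert: np.ndarray) -> dict:
    """pred, expert: binary (H, W) masks for one class."""
    pred, expert = pred.astype(bool), expert.astype(bool)
    tp = np.logical_and(pred, expert).sum()
    tn = np.logical_and(~pred, ~expert).sum()
    fp = np.logical_and(pred, ~expert).sum()
    fn = np.logical_and(~pred, expert).sum()
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 1.0,   # recall
        "specificity": tn / (tn + fp) if (tn + fp) else 1.0,
        "iou":         tp / (tp + fp + fn) if (tp + fp + fn) else 1.0,
        "f1":          2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0,
    }
```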
Difficulty index
Difficulty of the annotation task was measured per frame using a difficulty index (DI), defined in Supplementary Equation 2, which utilizes the average inter-annotator agreement of the individual CSW annotations with the crowdsourced consensus annotation as measured by IoU. This index is supported by evidence that lower inter-annotator agreement has been shown to be an indicator of higher annotation difficulty when other factors such as domain expertise, annotation expertise, instructions, platform, and source material are held constant13,14. DI values range from 0 (100% inter-annotator agreement) to 1 (0% inter-annotator agreement). Values closer to 0 indicate easier frames, especially when the annotation target is not visible and the annotation of “no finding” is used, since annotations of “no finding” are in 100% agreement. Values closer to 1 indicate harder frames where there is less agreement amongst the CSW.
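Based on this description (Supplementary Eq. (2) is not reproduced here), the index is consistent with one minus the mean IoU between each of the n individual worker annotations A_i and the consensus annotation C:

```latex
\mathrm{DI} = 1 - \frac{1}{n}\sum_{i=1}^{n}\mathrm{IoU}(A_i,\, C)
```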
The median DI in the train dataset (0.09 for bowel and 0.12 for abdominal wall) was lower than in the test dataset (0.18 for bowel and 0.26 for abdominal wall). The train dataset included full surgical videos, whereas the test dataset included only clips of surgeons assessing perfusion of the bowel, leading to an increased proportion of “no finding” annotations of bowel (22%) and abdominal wall (32%) in the train dataset versus 2.4% and 11% for bowel and abdominal wall in the test dataset. These “no finding” annotations have low difficulty indices, leading to the lower median difficulty of the train dataset.
Real-time deployment of near infrared artificial intelligence
Advanced near infrared physiologic imaging modalities such as indocyanine green fluorescence angiography and laser speckle contrast imaging show levels of tissue perfusion beyond what is visible with standard white light imaging. These technologies are used in colorectal resections to ensure adequate perfusion of the colon and rectum during reconstruction, to reduce complications and improve patient outcomes. Subjective interpretation of physiologic imaging can be challenging and is dependent on user experience.
Bowel.CSS was developed to mask the physiologic imaging data to only those tissues relevant to the surgeon during colorectal resection and reconstruction to assist with interpretation of the visual signal. The output of this model was the bowel label only and it was deployed in real-time on a modified research unit of a commercially available advanced physiologic imaging platform for laparoscopic, robotic, and open surgery.
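A sketch of how such a bowel mask could gate the physiologic overlay at inference time is shown below; this is illustrative only (the deployed system's implementation is not described), assuming a SegFormer-style model as sketched earlier, an assumed bowel class id of 1, and a hypothetical colorized laser speckle perfusion map as input.

```python
# Illustrative inference step: predict the bowel mask and use it to gate the
# laser speckle perfusion overlay so only bowel pixels show physiologic signal.
import torch
import torch.nn.functional as F

@torch.no_grad()
def masked_overlay(model, frame_tensor, perfusion_map, bowel_class_id=1):
    """frame_tensor: (1, 3, H, W) tensor; perfusion_map: (H, W, 3) numpy array (colorized LSCI)."""
    logits = model(pixel_values=frame_tensor).logits           # (1, C, H/4, W/4)
    logits = F.interpolate(logits, size=frame_tensor.shape[-2:],
                           mode="bilinear", align_corners=False)
    mask = (logits.argmax(dim=1)[0] == bowel_class_id).cpu().numpy()
    overlay = perfusion_map.copy()
    overlay[~mask] = 0                                          # suppress non-bowel pixels
    return overlay
```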
Bowel.CSS-deployed successfully segmented the bowel in real time during 2 colorectal procedures at 10 frames per second. The intraoperative labels were not saved from the procedures, so to evaluate the intraoperative performance of the model, 10 s clips from each procedure were sampled at 1 FPS (20 frames total) from when the surgeon activated the intraoperative AI model. To assess accuracy, the model outputs of Bowel.CSS and Bowel.CSS-deployed were compared to annotations by one of three surgical experts (1 trainee and 2 board-certified surgeons). Model outputs were compared to the expert annotations in these 20 frames using standard computer vision metrics (Supplementary Table 3).
References
1. Madani, A. et al. Artificial Intelligence for Intraoperative Guidance: Using Semantic Segmentation to Identify Surgical Anatomy During Laparoscopic Cholecystectomy. Ann. Surg. 276, 363–369 (2022).
2. Mascagni, P. et al. A Computer Vision Platform to Automatically Locate Critical Events in Surgical Videos: Documenting Safety in Laparoscopic Cholecystectomy. Ann. Surg. 274, e93–e95 (2021).
3. Hashimoto, D. A. et al. Computer Vision Analysis of Intraoperative Video: Automated Recognition of Operative Steps in Laparoscopic Sleeve Gastrectomy. Ann. Surg. 270, 414 (2019).
4. Ward, T. M. et al. Challenges in surgical video annotation. Comput. Assist. Surg. 26, 58–68 (2021).
5. Maier-Hein, L. et al. Can Masses of Non-Experts Train Highly Accurate Image Classifiers?: A Crowdsourcing Approach to Instrument Segmentation in Laparoscopic Images. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014 (eds. Golland, P., Hata, N., Barillot, C., Hornegger, J. & Howe, R.) 8674, 438–445 (Springer International Publishing, Cham, 2014).
6. Vignali, A. et al. Altered microperfusion at the rectal stump is predictive for rectal anastomotic leak. Dis. Colon Rectum 43, 76–82 (2000).
7. Skinner, G. et al. Clinical Utility of Laser Speckle Contrast Imaging (LSCI) Compared to Indocyanine Green (ICG) and Quantification of Bowel Perfusion in Minimally Invasive, Left-Sided Colorectal Resections. Dis. Colon Rectum (in press).
8. Van Gaalen, A. E. J. et al. Gamification of health professions education: a systematic review. Adv. Health Sci. Educ. 26, 683–711 (2021).
9. Bhattacherjee, A. & Fitzgerald, B. Shaping the Future of ICT Research: Methods and Approaches. In IFIP WG 8.2 Working Conference, Tampa, FL, USA, Proceedings (Springer, Heidelberg New York, 2012).
10. Rädsch, T. et al. Labelling instructions matter in biomedical image analysis. Nat. Mach. Intell. 5, 273–283 (2023).
11. Xie, E. et al. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Preprint at http://arxiv.org/abs/2105.15203 (2021).
12. Hicks, S. A. et al. On evaluation metrics for medical applications of artificial intelligence. Sci. Rep. 12, 5979 (2022).
13. Kentley, J. et al. Agreement Between Experts and an Untrained Crowd for Identifying Dermoscopic Features Using a Gamified App: Reader Feasibility Study. JMIR Med. Inf. 11, e38412 (2023).
14. Ribeiro, V., Avila, S. & Valle, E. Handling Inter-Annotator Agreement for Automated Skin Lesion Segmentation. Preprint at http://arxiv.org/abs/1906.02415 (2019).
Acknowledgements
This study was funded by Activ Surgical, Inc. (Boston, MA). We extend our gratitude to the hardworking research team at The Ohio State University Wexner Medical Center for collection and generation of the de-identified video database used in this study.
Author information
Authors and Affiliations
Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
Garrett Skinner & Peter Kim
Activ Surgical, University at Buffalo, Buffalo, NY, USA
Garrett Skinner, Tina Chen, Gabriel Jentis, Yao Liu, Christopher McCulloh & Peter Kim
Warren Alpert Medical School of Brown University, Providence, RI, USA
Yao Liu
The Ohio State University Wexner Medical Center, Columbus, OH, USA
Alan Harzman, Emily Huang & Matthew Kalady
Contributions
T.C., G.S., and P.K. were responsible for designing and training the initial artificial intelligence models. T.C. and G.S. were responsible for obtaining crowdsourced annotations. G.J., T.C., C.M., and G.S. performed data analysis on crowdsourcing demographics and comparisons to expert annotators. G.S., C.M., Y.L., and P.K. provided expert annotations. A.H., M.K., and E.H. provided design considerations and clinical feedback during training and deployment of artificial intelligence models. P.K. supervised this work. All authors contributed to manuscript preparation and critical revisions, and have read and approved the manuscript.
Competing interests
This study was funded by Activ Surgical Inc., Boston, MA. Current or previous consultants for Activ Surgical Inc.: G.S., A.H., M.K. Current or previous employment by Activ Surgical Inc.: T.C., G.J., C.M., Y.L. Founder/ownership of Activ Surgical Inc.: P.K. No competing interests: E.H.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
"title": "Real-time near infrared artificial intelligence using scalable non-expert crowdsourcing in colorectal surgery - npj Digital Medicine",
"description": "Surgical artificial intelligence (AI) is a nascent field with potential to improve patient safety and clinical outcomes. Current surgical AI models can identify surgical phases, critical events, and surgical...",
"content": "<div>\n <p>Surgical artificial intelligence (AI) is a nascent field with potential to improve patient safety and clinical outcomes. Current surgical AI models can identify surgical phases, critical events, and surgical anatomy<sup><a target=\"_blank\" title=\"Madani, A. et al. Artificial Intelligence for Intraoperative Guidance: Using Semantic Segmentation to Identify Surgical Anatomy During Laparoscopic Cholecystectomy. Ann. Surg. 276, 363–369 (2022).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR1\">1</a>,<a target=\"_blank\" title=\"Mascagni, P. et al. A Computer Vision Platform to Automatically Locate Critical Events in Surgical Videos: Documenting Safety in Laparoscopic Cholecystectomy. Ann. Surg. 274, e93–e95 (2021).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR2\">2</a>,<a target=\"_blank\" title=\"Hashimoto, D. A. et al. Computer Vision Analysis of Intraoperative Video: Automated Recognition of Operative Steps in Laparoscopic Sleeve Gastrectomy. Ann. Surg. 270, 414 (2019).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR3\">3</a></sup>. Most of these models utilize supervised machine learning and require large amounts of annotated video data, typically by domain experts. Crowdsourcing, using layperson annotations to form consensus annotations, can scale and accelerate acquisition of high-quality training data<sup><a target=\"_blank\" title=\"Ward, T. M. et al. Challenges in surgical video annotation. Comput. Assist. Surg. 26, 58–68 (2021).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR4\">4</a></sup>.</p><p>Crowdsourced annotations of surgical video, however, have historically relied on unsophisticated crowdsourcing methodologies and have been limited to annotations of simple rigid surgical instruments and other non-tissue structures. Models trained to segment laparoscopic surgical instruments performed equally well when trained on non-expert crowdsourced annotations as when trained on expert annotations<sup><a target=\"_blank\" title=\"Maier-Hein, L. et al. Can Masses of Non-Experts Train Highly Accurate Image Classifiers?: A Crowdsourcing Approach to Instrument Segmentation in Laparoscopic Images. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014 (eds. Golland, P., Hata, N., Barillot, C., Hornegger, J. & Howe, R.) 8674 438–445 (Springer International Publishing, Cham, 2014).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR5\">5</a></sup>. However, annotations of deformable and mobile surgical tissues are believed to require domain expertise due to complexity and need for accurate contextual knowledge of surgical anatomy<sup><a target=\"_blank\" title=\"Ward, T. M. et al. Challenges in surgical video annotation. Comput. Assist. Surg. 26, 58–68 (2021).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR4\">4</a></sup>. The acquisition of expert-annotated training data is cost-prohibitive, time consuming, and slows the development and deployment of surgical AI models for clinical benefit.</p><p>Here we describe an application of gamified, continuous-performance-monitored crowdsourcing to obtain annotated training data of surgical tissues used to train a soft tissue segmentation AI model. 
We validate this by training and deploying a highly accurate, real-time AI-assisted multimodal imaging platform to increase precision when assessing tissue perfusion which may help reduce complications such as anastomotic leak in bowel surgery<sup><a target=\"_blank\" title=\"Vignali, A. et al. Altered microperfusion at the rectal stump is predictive for rectal anastomotic leak. Dis. Colon Rectum. 43, 76–82 (2000).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR6\">6</a>,<a target=\"_blank\" title=\"Skinner, G. et al. Clinical Utility of Laser Speckle Contrast Imaging (LSCI) Compared to Indocyanine Green (ICG) and Quantification of Bowel Perfusion in Minimally Invasive, Left-Sided Colorectal Resections. Dis. Colon. Rectum (In press).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR7\">7</a></sup>.</p><div><p>All video data, composed of 95 de-identified colorectal procedures for benign and malignant indications (IRB #OSU2021H0218), were included for model training (train dataset) and testing (test dataset) (Supplementary Table <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">1</a>, Methods). Crowdsourced annotations of the train and test dataset were obtained using a gamified crowdsourcing platform utilizing continuous performance monitoring and performance-based incentivization (Fig. <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#Fig1\">1a</a>, Methods)<sup><a target=\"_blank\" title=\"Van Gaalen, A. E. J. et al. Gamification of health professions education: a systematic review. Adv. Health Sci. Educ. 26, 683–711 (2021).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR8\">8</a></sup>. Five crowdsourcing parameters were controlled: testing score (TS), running score (RS), minimum crowdsource annotations (n), majority vote (MV), and review threshold (RT) (Fig. <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#Fig1\">1b</a>, Methods).</p><div><figure><figcaption><b>Fig. 1: Gamified crowdsourcing methodology and expert time savings.</b></figcaption><div><div><a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8/figures/1\"><picture><source type=\"image/webp\" srcset=\"https://media.springernature.com/lw685/springer-static/image/art%3A10.1038%2Fs41746-024-01095-8/MediaObjects/41746_2024_1095_Fig1_HTML.png?as=webp\"></source><img src=\"https://media.springernature.com/lw685/springer-static/image/art%3A10.1038%2Fs41746-024-01095-8/MediaObjects/41746_2024_1095_Fig1_HTML.png\" alt=\"figure 1\" /></picture></a></div><p><b>a</b> Screenshot images of annotation instructions (Centaur Labs, Boston MA) for bowel and abdominal wall. <b>b</b> Crowdsource annotation parameters values used for bowel and abdominal wall tasks. For test and train datasets: <b>c</b> Number of videos and frames. <b>d</b> Estimated expert hours saved by utilizing crowdsourcing. <b>e</b> Crowdsource worker demographics indicating percentage of non-MD/unknown (black), MD (green), and surgical MD (red). <b>f</b> Difficulty level (difficulty index) of bowel and abdominal wall (wall) annotations with median values (green dashed line).</p></div><p><a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8/figures/1\"><span>Full size image</span></a></p></figure></div></div><p>Due to the impracticality of time constraints by experts to annotate the large train dataset (27,000 frames), a smaller test dataset (510 frames) was created. 
This dataset was annotated by crowdsourced workers, the models trained on crowdsourced worker annotations, and one of four surgical experts with surgical domain expertise (Methods). The test dataset was then used to compare the annotations from crowdsourced workers and the models trained from crowdsourced workers to expert annotations. These comparisons were done using standardized metrics of Intersection over Union (IoU) (Supplementary Fig. <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">1</a>) and the harmonic mean of precision and recall (F1) (Methods, Supplementary Eq. (<a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">1</a>)).</p><p><i>Bowel.CSS</i> (bowel crowdsourced segmentation), was trained to segment bowel and abdominal wall using crowdsourced annotations of the train dataset. Additionally, a streamlined model was optimized for real-time segmentation of bowel and deployed as a part of an AI-assisted multimodal imaging platform (Methods).</p><div><p>We validate the use of non-expert crowdsourcing with the following primary endpoints:</p><ol>\n <li>\n <span>1.</span>\n <p>Expertise level of crowdsource workers.</p>\n </li>\n <li>\n <span>2.</span>\n <p>Expert hours saved.</p>\n </li>\n <li>\n <span>3.</span>\n <p>Accuracy of the crowdsource annotations to expert annotations.</p>\n </li>\n <li>\n <span>4.</span>\n <p>Accuracy of the <i>Bowel.CSS</i> model predictions to expert annotations.</p>\n </li>\n </ol></div><div><p>Secondary endpoints were:</p><ol>\n <li>\n <span>1.</span>\n <p>Difficulty level of the crowdsourced annotations in the train and test datasets.</p>\n </li>\n <li>\n <span>2.</span>\n <p>Accuracy of real-time predictions of the deployed <i>Bowel.CSS</i> model to expert annotations.</p>\n </li>\n </ol></div><p>Train dataset was annotated by 206 crowdsourced workers (CSW) giving 250,000 individual annotations and 54,000 consensus annotations of bowel and abdominal wall. 3% (7/206) of CSW identified as MDs, and 1% (2/206) identified as surgical MDs. Test dataset was annotated by 48 CSW giving 5100 individual annotations and 1020 consensus annotations. 4% (2/48) of CSW identified as MDs, and 0% as surgical MDs (Fig. <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#Fig1\">1c, e</a>, Supplementary Table <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">1</a>, Methods).</p><p>These demographics indicate non-domain expertise of the CSW. Although demographic data is self-reported and not available for every CSW, the platform reports that the majority of the active CSW are health science students (59.7%) looking to improve their clinical skills (57.3%) (Supplementary Table <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">2</a>).</p><p>On average, an expert spent 120.3 s annotating a frame for bowel and abdominal wall in the test dataset. This extrapolates to an estimated 902 expert hours saved during the annotation of the train dataset by utilizing crowdsourcing methodology, and an estimated 17 expert hours saved in the test dataset (if expert annotations of the test dataset weren’t required for this study). Assuming each of the four expert annotators annotated one hour per day, this estimates to 120 frames annotated per day. In contrast, CSW annotated an average of 774 frames per day in the train dataset (Fig. 
<a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#Fig1\">1d</a>, Methods).</p><p>The difficulty of crowdsourced annotations was measured by Difficulty Index (DI) (Methods). The median difficulty of the crowdsourced annotations was 0.09 DI for bowel and 0.12 DI for abdominal wall in the train dataset, and 0.18 DI for bowel and 0.26 DI for abdominal wall in the test dataset, indicating a robust spectrum of task difficulty across the frame populations. (Fig. <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#Fig1\">1f</a>, Methods).</p><div><p>Compared to expert annotations of bowel and abdominal wall within the test dataset, crowdsource workers and <i>Bowel.CSS</i> were highly accurate; F1 values of 0.86 ± 0.20 for bowel and 0.79 ± 0.26 for abdominal wall for crowdsource workers and 0.89 ± 0.16 and 0.78 ± 0.28 for bowel and abdominal wall for <i>Bowel.CSS</i> (Fig. <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#Fig2\">2a, b</a>).</p><div><figure><figcaption><b>Fig. 2: Evaluation of crowdsource and model anatomy segmentations and deployment of near-infrared artificial intelligence system.</b></figcaption><div><div><a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8/figures/2\"><picture><source type=\"image/webp\" srcset=\"https://media.springernature.com/lw685/springer-static/image/art%3A10.1038%2Fs41746-024-01095-8/MediaObjects/41746_2024_1095_Fig2_HTML.png?as=webp\"></source><img src=\"https://media.springernature.com/lw685/springer-static/image/art%3A10.1038%2Fs41746-024-01095-8/MediaObjects/41746_2024_1095_Fig2_HTML.png\" alt=\"figure 2\" /></picture></a></div><p><b>a</b> Crowdsourced annotations and <i>Bowel.CSS</i> predictions of bowel and abdominal wall compared to expert annotations in the test dataset. <sup>a</sup>IoU intersection over union, <sup>b</sup>F1 dice similarity coefficient. <b>b</b> Representative frames comparing crowdsourced annotations and <i>Bowel.CSS</i> predictions to expert annotations with corresponding difficulty index. <b>c</b> Schematic representing intraoperative deployment of real-time artificial intelligence. <b>d</b> Example of deployed version of <i>Bowel.CSS</i> incorporated into real-time artificial intelligence assisted multimodal imaging utilizing laser speckle contrast imaging to allow visualization of physiologic information beyond human vision.</p></div><p><a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8/figures/2\"><span>Full size image</span></a></p></figure></div></div><p>A streamlined version of <i>Bowel.CSS</i> optimized for real-time bowel segmentation was deployed in real-time to provide AI-assisted display of multimodal imaging and provided highly accurate segmentation of bowel tissue compared to expert annotation. This allowed surgeons to visualize physiologic perfusion the colon and rectum that is normally invisible to human eye (Fig. <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#Fig2\">2c, d</a>, Supplementary Table <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">3</a>, Methods).</p><p>Herein, we report the first complete and adaptable methodology to obtain highly accurate segmentations of surgical tissues using non-expert crowdsourcing. We outline five crowdsourcing parameters; TS, RS, n, MV, and RT which could be adjusted to fit a variety of segmentations depending on task difficulty and applications. 
We validated this methodology by showing the crowdsourced annotations can be used to train a highly accurate surgical tissue segmentation model, while greatly accelerating the speed of development by eliminating over 900 expert annotation hours. This study is limited by lack of source video diversity as all videos came from colorectal procedures at a single institution, and thus performance may suffer when applied to other video datasets. Another limitation is the inability to train segmentation models using both crowdsourced and expert annotations due to the inability to source expert annotations for 27,000 video frames in the train dataset due to expert time constraints. However, the crowdsource annotations and the crowdsource trained model predictions were shown highly accurate to expert annotations, and the inability to secure high volume of expert annotations demonstrates the need for crowdsourcing.</p><p>While we demonstrated that crowdsourcing is viable when scaling these surgical tissue annotations, further work should be done to determine the limitations of this methodology when applied to increasingly complex anatomical structures. While we showed that the deployed AI model accurately segmented bowel as a part of an AI-assisted multimodal imaging platform, future work should be done to investigate clinical outcomes with the use this technology. This accelerated model development using crowdsource annotations will further enable additional applications of AI-assisted multimodal imaging data for enhanced real-time clinical decision support for safer surgery and improved outcomes.</p><div><h2 id=\"Sec1\">Methods</h2><div><p>This study was approved by The Ohio State University Institutional Review Board (IRB #OSU2021H0218). All patients provided written informed consent.</p><h3 id=\"Sec2\">Video source and frame sampling</h3><p>Surgical videos were obtained from a prospective clinical trial evaluating the utility of real-time laser speckle contrast imaging for perfusion assessment in colorectal surgery (IRB #OSU2021H0218). In the source material for the train dataset, video clips were not prefiltered, and frames were extracted at a regular interval (1 frame per second and 1 frame per 30 seconds) to create a diverse set of training data and eliminate frame selection bias. For the test dataset, clips were extracted when the surgeon was assessing perfusion of the colon. Frames were extracted at 1 frame per second to minimize frame selection bias. The final video and frame counts are represented in Supplementary Table <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">1</a>.</p><h3 id=\"Sec3\">Crowdsourced annotations</h3><p>Crowdsourced annotations of bowel and abdominal wall were obtained using a gamified crowdsourcing platform (Centaur Labs, Boston MA) utilizing continuous performance monitoring and performance-based incentivization<sup><a target=\"_blank\" title=\"Van Gaalen, A. E. J. et al. Gamification of health professions education: a systematic review. Adv. Health Sci. Educ. 26, 683–711 (2021).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR8\">8</a></sup>. This methodology differs from standard crowdsourcing platforms such as Amazon’s Mechanical Turk, which don’t allow for such continuous performance monitoring and incentivization<sup><a target=\"_blank\" title=\"Bhattacherjee, A. & Fitzgerald, B. Shaping the Future of ICT Research: Methods and Approaches. In IFIP WG 8.2 Working Conference, Tampa, FL, USA, Proceedings. 
(Springer, Heidelberg New York, 2012).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR9\">9</a></sup>. Previous implementations of crowdsourcing annotations in surgical computer vision have typically only utilized the majority vote crowdsourcing parameter<sup><a target=\"_blank\" title=\"Maier-Hein, L. et al. Can Masses of Non-Experts Train Highly Accurate Image Classifiers?: A Crowdsourcing Approach to Instrument Segmentation in Laparoscopic Images. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014 (eds. Golland, P., Hata, N., Barillot, C., Hornegger, J. & Howe, R.) 8674 438–445 (Springer International Publishing, Cham, 2014).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR5\">5</a></sup>.</p><p>Annotation instructions were developed utilizing as little specialized surgical knowledge as possible while following surgical data science best practices<sup><a target=\"_blank\" title=\"Rädsch, T. et al. Labelling instructions matter in biomedical image analysis. Nat. Mach. Intell. 5, 273–283 (2023).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR10\">10</a></sup>. Crowdsourced annotation instructions given to the crowdsourced workers (CSW) included 13 training steps for each task with 11 and 14 example annotations of abdominal wall and bowel, respectively (Fig. <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#Fig1\">1a</a>). Four experts (two senior surgical trainees and two trained surgeons) provided expert annotations used to calculate training (TS) and running (RS) scores. In our study, CSW were required to achieve a minimum training score (TS) as measured by intersection-over-union (IoU) with 10 expert annotations prior to performing any annotations. A running score (RS) was calculated by intermittently testing the CSW in the same fashion. Annotations from CSW with a sufficient TS and RS were used in consensus generation. A minimum of 5 annotations (n) were required to generate the consensus crowdsourced annotation using the majority vote parameter (MV) to only include pixels annotated by 4 or more, and 2 or more annotations for bowel and abdominal wall respectively. Difficulty index (DI) was calculated for each frame using IoU with values between 0 and 1, higher indicating increasing difficulty (Supplementary Eq. (<a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">2</a>), Methods). Quality assurance (QA) was performed by experts (two surgical trainees) on randomly selected frames above the difficulty review threshold (RT) of 0.4 difficulty index (Fig. <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#Fig1\">1b</a>).</p><h3 id=\"Sec4\">SegFormer B3 framework and model training</h3><p>SegFormer is a semantic segmentation framework developed in partnership with NVIDIA and Caltech. It was selected for the real-time implementation for powerful and yet efficient semantic segmentation capabilities accomplished by unifying transformers with lightweight multilayer perception decoders<sup><a target=\"_blank\" title=\"Xie, E. et al. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Preprint at \n http://arxiv.org/abs/2105.15203\n (2021).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR11\">11</a></sup>.</p><p>Using the SegFormer B3 framework, we trained two versions of <i>Bowel.CSS</i>. 
<i>Bowel.CSS</i> was trained on the entire crowdsource-annotated 27,000 frame dataset (78 surgical videos). A second model, <i>Bowel.CSS-deployed</i>, was trained on a subset of the train dataset (3500 frames from 11 surgical videos) and optimized for real-time segmentation of bowel. This model was deployed in real-time as a part of an AI-assisted multimodal imaging platform (Methods).</p><h3 id=\"Sec5\">Train and test dataset crowdsourced annotations and demographics</h3><p>Train dataset frames (<i>n</i> = 27,000) were annotated by 206 CSW giving 250,000 individual annotations and 54,000 consensus annotations of bowel and abdominal wall. 3% (7/206) of CSW identified as MDs, and 1% (2/206) identified as surgical MDs. Test dataset frames (<i>n</i> = 510) were annotated by 48 CSW giving 5100 individual annotations and 1020 consensus annotations. 4% (2/48) of CSW identified as MDs, and 0% as surgical MDs (Fig. <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#Fig1\">1c, e</a>, Supplementary Table <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">1</a>, Methods).</p><p>To further characterize “unknown” CSW demographics in the crowdsource user population in this study, Supplementary Table <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">3</a> presents CSW demographics for the entire annotation platform in the year 2022. It shows the majority (59.7%) were health science students, and the majority listed the reason for participating in crowdsource annotations as “to improve my skills” (57.3%). This supports the conclusion that most users on this platform are non-physicians and are not full-time annotators.</p><h3 id=\"Sec6\">Crowdsource vs expert hours saved</h3><p>A primary goal of the use of crowdsourced annotations is to mitigate the rate-limiting and expensive time of experts. The average time for the three domain experts to complete a frame annotation for bowel and abdominal wall was 120.3 s in test dataset. Using the average time to annotate, and the frame totals of 27,000 and 510, crowdsourcing saved an estimated 902 expert hours in the train dataset, and 17 in the test dataset (if experts would have not been required to annotate the test dataset for this study).</p><h3 id=\"Sec7\">Annotation comparison statistics</h3><p>The pixel-level agreement of both crowdsourced and <i>Bowel.CSS</i> annotations were compared to expert annotation using accuracy, sensitivity, specificity, IoU and F1 scores (Supplementary Fig. <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">1</a>, Supplementary Eq. (<a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">1</a>)). These metrics are accepted measurements of accuracy of segmentation annotations in computer vision and surgical data science<sup><a target=\"_blank\" title=\"Hicks, S. A. et al. On evaluation metrics for medical applications of artificial intelligence. Sci. Rep. 12, 5979 (2022).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR12\">12</a></sup>.</p><h3 id=\"Sec8\">Difficulty index</h3><p>Difficulty of the annotation task was measured per frame using a difficulty index (DI) defined in Supplementary Equation <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">2</a> which utilizes the average inter-annotator agreement of the individual CSW annotations to the crowdsourced consensus annotation as measured by IoU. 
This index is supported by evidence that lower inter-annotator agreement has shown to be an indicator of higher annotation difficulty when other factors such domain expertise, annotation expertise, instructions, platform and source material are constant<sup><a target=\"_blank\" title=\"Kentley, J. et al. Agreement Between Experts and an Untrained Crowd for Identifying Dermoscopic Features Using a Gamified App: Reader Feasibility Study. JMIR Med. Inf. 11, e38412 (2023).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR13\">13</a>,<a target=\"_blank\" title=\"Ribeiro, V., Avila, S. & Valle, E. Handling Inter-Annotator Agreement for Automated Skin Lesion Segmentation. Preprint at \n http://arxiv.org/abs/1906.02415\n (2019).\" href=\"https://www.nature.com/articles/s41746-024-01095-8#ref-CR14\">14</a></sup>. DI values range from 0 (100% inter-annotator agreement) to 1 (0% inter-annotator agreement). Values closer to 0 indicate easier frames, especially when the annotation target is not visible and the annotation of “no finding” is used since annotations of “no finding” are in 100% agreement. Values closer to 1 indicate harder frames where there is less agreement amongst the CSWs.</p><p>The DI of bowel was 0.09 and 0.12 for abdominal wall in the train dataset and was lower than the DI of 0.18 for bowel and 0.12 for abdominal wall in the test dataset. The train dataset included full surgical videos versus the test dataset, which included only clips of surgeons assessing perfusion of the bowel, leading to an increased proportion of “no finding” annotation of bowel (22%) and abdominal wall (32%) in train dataset versus 2.4% and 11% for bowel and abdominal wall in the test dataset. The “no finding” annotations have low difficulty indices leading to the lower median difficulty of the train dataset.</p><h3 id=\"Sec9\">Real-time deployment of near infrared artificial intelligence</h3><p>Advanced near infrared physiologic imaging like indocyanine green fluorescence angiography and laser speckle contrast imaging show levels of tissue perfusion beyond what is visible in standard white light imaging. These technologies are used in colorectal resections to ensure adequate perfusion of the colon and rectum during reconstruction to reduce complications and improve patient outcomes. Subjectively interpreting physiologic imaging can be challenging and is dependent on user experience.</p><p><i>Bowel.CSS</i> was developed to mask the physiologic imaging data to only those tissues relevant to the surgeon during colorectal resection and reconstruction to assist with interpretation of the visual signal. The output of this model was the bowel label only and it was deployed in real-time on a modified research unit of a commercially available advanced physiologic imaging platform for laparoscopic, robotic, and open surgery.</p><p><i>Bowel.CSS-deployed</i> successfully segmented the bowel in real-time during 2 colorectal procedures at 10 frames per second. The intraoperative labels were not saved from the procedures, so to evaluate the intraoperative performance of the model, 10 s clips from each procedure were sampled at 1 FPS (20 frames total) from when the surgeon activated the intraoperative AI model. To assess for accuracy, the model outputs of <i>Bowel.CSS</i> and <i>Bowel.CSS-deployed</i> were compared to annotations by one of three surgical experts (1 trainee and 2 board-certified surgeons). 
Model outputs were compared to the expert annotations in these 20 frames using standard computer vision metrics. (Supplementary Table <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM1\">3</a>).</p><h3 id=\"Sec10\">Reporting summary</h3><p>Further information on research design is available in the <a target=\"_blank\" href=\"https://www.nature.com/articles/s41746-024-01095-8#MOESM2\">Nature Research Reporting Summary</a> linked to this article.</p></div></div>\n </div><div>\n <div><h2 id=\"data-availability\">Data availability</h2><p>Requests for additional study data will be evaluated by the corresponding author upon request.</p></div><div><h2 id=\"code-availability\">Code availability</h2><div>\n <p>The trained Bowel.CSS models are available free and open source (<a target=\"_blank\" href=\"https://github.com/ACTIV-Sugical/Bowel.CSS\">https://github.com/ACTIV-Sugical/Bowel.CSS</a>).</p>\n </div></div><div><h2 id=\"Bib1\">References</h2><div><ol><li><p>Madani, A. et al. Artificial Intelligence for Intraoperative Guidance: Using Semantic Segmentation to Identify Surgical Anatomy During Laparoscopic Cholecystectomy. <i>Ann. Surg.</i> <b>276</b>, 363–369 (2022).</p><p><a target=\"_blank\" href=\"https://doi.org/10.1097%2FSLA.0000000000004594\">Article</a> \n <a target=\"_blank\" href=\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=33196488\">PubMed</a> \n <a target=\"_blank\" href=\"http://scholar.google.com/scholar_lookup?&title=Artificial%20Intelligence%20for%20Intraoperative%20Guidance%3A%20Using%20Semantic%20Segmentation%20to%20Identify%20Surgical%20Anatomy%20During%20Laparoscopic%20Cholecystectomy&journal=Ann.%20Surg.&doi=10.1097%2FSLA.0000000000004594&volume=276&pages=363-369&publication_year=2022&author=Madani%2CA\">\n Google Scholar</a> \n </p></li><li><p>Mascagni, P. et al. A Computer Vision Platform to Automatically Locate Critical Events in Surgical Videos: Documenting Safety in Laparoscopic Cholecystectomy. <i>Ann. Surg.</i> <b>274</b>, e93–e95 (2021).</p><p><a target=\"_blank\" href=\"https://doi.org/10.1097%2FSLA.0000000000004736\">Article</a> \n <a target=\"_blank\" href=\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=33417329\">PubMed</a> \n <a target=\"_blank\" href=\"http://scholar.google.com/scholar_lookup?&title=A%20Computer%20Vision%20Platform%20to%20Automatically%20Locate%20Critical%20Events%20in%20Surgical%20Videos%3A%20Documenting%20Safety%20in%20Laparoscopic%20Cholecystectomy&journal=Ann.%20Surg.&doi=10.1097%2FSLA.0000000000004736&volume=274&pages=e93-e95&publication_year=2021&author=Mascagni%2CP\">\n Google Scholar</a> \n </p></li><li><p>Hashimoto, D. A. et al. Computer Vision Analysis of Intraoperative Video: Automated Recognition of Operative Steps in Laparoscopic Sleeve Gastrectomy. <i>Ann. 
Surg.</i> <b>270</b>, 414 (2019).</p><p><a target=\"_blank\" href=\"https://doi.org/10.1097%2FSLA.0000000000003460\">Article</a> \n <a target=\"_blank\" href=\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=31274652\">PubMed</a> \n <a target=\"_blank\" href=\"http://scholar.google.com/scholar_lookup?&title=Computer%20Vision%20Analysis%20of%20Intraoperative%20Video%3A%20Automated%20Recognition%20of%20Operative%20Steps%20in%20Laparoscopic%20Sleeve%20Gastrectomy&journal=Ann.%20Surg.&doi=10.1097%2FSLA.0000000000003460&volume=270&publication_year=2019&author=Hashimoto%2CDA\">\n Google Scholar</a> \n </p></li><li><p>Ward, T. M. et al. Challenges in surgical video annotation. <i>Comput. Assist. Surg.</i> <b>26</b>, 58–68 (2021).</p><p><a target=\"_blank\" href=\"https://doi.org/10.1080%2F24699322.2021.1937320\">Article</a> \n <a target=\"_blank\" href=\"http://scholar.google.com/scholar_lookup?&title=Challenges%20in%20surgical%20video%20annotation&journal=Comput.%20Assist.%20Surg.&doi=10.1080%2F24699322.2021.1937320&volume=26&pages=58-68&publication_year=2021&author=Ward%2CTM\">\n Google Scholar</a> \n </p></li><li><p>Maier-Hein, L. et al. Can Masses of Non-Experts Train Highly Accurate Image Classifiers?: A Crowdsourcing Approach to Instrument Segmentation in Laparoscopic Images. In <i>Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014</i> (eds. Golland, P., Hata, N., Barillot, C., Hornegger, J. & Howe, R.) 8674 438–445 (Springer International Publishing, Cham, 2014).</p></li><li><p>Vignali, A. et al. Altered microperfusion at the rectal stump is predictive for rectal anastomotic leak. <i>Dis. Colon Rectum.</i> <b>43</b>, 76–82 (2000).</p><p><a target=\"_blank\" href=\"https://link.springer.com/doi/10.1007/BF02237248\">Article</a> \n <a target=\"_blank\" href=\"https://www.nature.com/articles/cas-redirect/1:STN:280:DC%2BD3c3ntFagtA%3D%3D\">CAS</a> \n <a target=\"_blank\" href=\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=10813128\">PubMed</a> \n <a target=\"_blank\" href=\"http://scholar.google.com/scholar_lookup?&title=Altered%20microperfusion%20at%20the%20rectal%20stump%20is%20predictive%20for%20rectal%20anastomotic%20leak&journal=Dis.%20Colon%20Rectum.&doi=10.1007%2FBF02237248&volume=43&pages=76-82&publication_year=2000&author=Vignali%2CA\">\n Google Scholar</a> \n </p></li><li><p>Skinner, G. et al. Clinical Utility of Laser Speckle Contrast Imaging (LSCI) Compared to Indocyanine Green (ICG) and Quantification of Bowel Perfusion in Minimally Invasive, Left-Sided Colorectal Resections. <i>Dis. Colon. Rectum</i> (In press).</p></li><li><p>Van Gaalen, A. E. J. et al. Gamification of health professions education: a systematic review. <i>Adv. Health Sci. Educ.</i> <b>26</b>, 683–711 (2021).</p><p><a target=\"_blank\" href=\"https://link.springer.com/doi/10.1007/s10459-020-10000-3\">Article</a> \n <a target=\"_blank\" href=\"http://scholar.google.com/scholar_lookup?&title=Gamification%20of%20health%20professions%20education%3A%20a%20systematic%20review&journal=Adv.%20Health%20Sci.%20Educ.&doi=10.1007%2Fs10459-020-10000-3&volume=26&pages=683-711&publication_year=2021&author=Gaalen%2CAEJ\">\n Google Scholar</a> \n </p></li><li><p>Bhattacherjee, A. & Fitzgerald, B. Shaping the Future of ICT Research: Methods and Approaches. In <i>IFIP WG 8.2 Working Conference, Tampa, FL, USA, Proceedings</i>. (Springer, Heidelberg New York, 2012).</p></li><li><p>Rädsch, T. et al. 
Labelling instructions matter in biomedical image analysis. <i>Nat. Mach. Intell.</i> <b>5</b>, 273–283 (2023).</p><p><a target=\"_blank\" href=\"https://doi.org/10.1038%2Fs42256-023-00625-5\">Article</a> \n <a target=\"_blank\" href=\"http://scholar.google.com/scholar_lookup?&title=Labelling%20instructions%20matter%20in%20biomedical%20image%20analysis&journal=Nat.%20Mach.%20Intell.&doi=10.1038%2Fs42256-023-00625-5&volume=5&pages=273-283&publication_year=2023&author=R%C3%A4dsch%2CT\">\n Google Scholar</a> \n </p></li><li><p>Xie, E. et al. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Preprint at <a target=\"_blank\" href=\"http://arxiv.org/abs/2105.15203\">http://arxiv.org/abs/2105.15203</a> (2021).</p></li><li><p>Hicks, S. A. et al. On evaluation metrics for medical applications of artificial intelligence. <i>Sci. Rep.</i> <b>12</b>, 5979 (2022).</p><p><a target=\"_blank\" href=\"https://doi.org/10.1038%2Fs41598-022-09954-8\">Article</a> \n <a target=\"_blank\" href=\"https://www.nature.com/articles/cas-redirect/1:CAS:528:DC%2BB38XpvVWnt7o%3D\">CAS</a> \n <a target=\"_blank\" href=\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=35395867\">PubMed</a> \n <a target=\"_blank\" href=\"http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8993826\">PubMed Central</a> \n <a target=\"_blank\" href=\"http://scholar.google.com/scholar_lookup?&title=On%20evaluation%20metrics%20for%20medical%20applications%20of%20artificial%20intelligence&journal=Sci.%20Rep.&doi=10.1038%2Fs41598-022-09954-8&volume=12&publication_year=2022&author=Hicks%2CSA\">\n Google Scholar</a> \n </p></li><li><p>Kentley, J. et al. Agreement Between Experts and an Untrained Crowd for Identifying Dermoscopic Features Using a Gamified App: Reader Feasibility Study. <i>JMIR Med. Inf.</i> <b>11</b>, e38412 (2023).</p><p><a target=\"_blank\" href=\"https://doi.org/10.2196%2F38412\">Article</a> \n <a target=\"_blank\" href=\"http://scholar.google.com/scholar_lookup?&title=Agreement%20Between%20Experts%20and%20an%20Untrained%20Crowd%20for%20Identifying%20Dermoscopic%20Features%20Using%20a%20Gamified%20App%3A%20Reader%20Feasibility%20Study&journal=JMIR%20Med.%20Inf.&doi=10.2196%2F38412&volume=11&publication_year=2023&author=Kentley%2CJ\">\n Google Scholar</a> \n </p></li><li><p>Ribeiro, V., Avila, S. & Valle, E. Handling Inter-Annotator Agreement for Automated Skin Lesion Segmentation. Preprint at <a target=\"_blank\" href=\"http://arxiv.org/abs/1906.02415\">http://arxiv.org/abs/1906.02415</a> (2019).</p></li></ol><p><a target=\"_blank\" href=\"https://citation-needed.springer.com/v2/references/10.1038/s41746-024-01095-8?format=refman&flavour=references\">Download references</a></p></div></div><div><h2 id=\"Ack1\">Acknowledgements</h2><p>This study was funded by Activ Surgical, Inc. (Boston, MA). 
We extend our gratitude to the hardworking research team at The Ohio State University Wexner Medical Center for the collection and generation of the de-identified video database used in this study.

Author information

Authors and Affiliations

1. Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA: Garrett Skinner & Peter Kim
2. Activ Surgical, Inc., Boston, MA, USA: Garrett Skinner, Tina Chen, Gabriel Jentis, Yao Liu, Christopher McCulloh & Peter Kim
3. Warren Alpert Medical School of Brown University, Providence, RI, USA: Yao Liu
4. The Ohio State University Wexner Medical Center, Columbus, OH, USA: Alan Harzman, Emily Huang & Matthew Kalady

Authors: Garrett Skinner, Tina Chen, Gabriel Jentis, Yao Liu, Christopher McCulloh, Alan Harzman, Emily Huang, Matthew Kalady & Peter Kim.
href=\"http://scholar.google.co.uk/scholar?as_q=&num=10&btnG=Search+Scholar&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=%22Alan%20Harzman%22&as_publication=&as_ylo=&as_yhi=&as_allsubj=all&hl=en\">Google Scholar</a></span></p></div></li><li><span>Emily Huang</span><div><p>You can also search for this author in\n <span><a target=\"_blank\" href=\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&term=Emily%20Huang\">PubMed</a><span> </span><a target=\"_blank\" href=\"http://scholar.google.co.uk/scholar?as_q=&num=10&btnG=Search+Scholar&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=%22Emily%20Huang%22&as_publication=&as_ylo=&as_yhi=&as_allsubj=all&hl=en\">Google Scholar</a></span></p></div></li><li><span>Matthew Kalady</span><div><p>You can also search for this author in\n <span><a target=\"_blank\" href=\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&term=Matthew%20Kalady\">PubMed</a><span> </span><a target=\"_blank\" href=\"http://scholar.google.co.uk/scholar?as_q=&num=10&btnG=Search+Scholar&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=%22Matthew%20Kalady%22&as_publication=&as_ylo=&as_yhi=&as_allsubj=all&hl=en\">Google Scholar</a></span></p></div></li><li><span>Peter Kim</span><div><p>You can also search for this author in\n <span><a target=\"_blank\" href=\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&term=Peter%20Kim\">PubMed</a><span> </span><a target=\"_blank\" href=\"http://scholar.google.co.uk/scholar?as_q=&num=10&btnG=Search+Scholar&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=%22Peter%20Kim%22&as_publication=&as_ylo=&as_yhi=&as_allsubj=all&hl=en\">Google Scholar</a></span></p></div></li></ol></div><h3 id=\"contributions\">Contributions</h3><p>T.C., G.S., and P.K. were responsible for designing and training the initial artificial intelligence models. T.C. and G.S. were responsible for obtaining crowdsourced annotations. G.J, T.C., C.M. and G.S. performed data analysis on crowdsourcing demographics and comparisons to expert annotators. G.S., C.M., Y.L, and P.K. provided expert annotations. A.H., M.K., and E.H., provided design considerations and clinical feedback during training and deployment of artificial intelligence models. P.K. supervised this work. All authors contributed to manuscript preparation, critical revisions, and have read and approved the manuscript.</p><h3 id=\"corresponding-author\">Corresponding author</h3><p>Correspondence to\n <a target=\"_blank\" href=\"mailto:[email protected]\">Peter Kim</a>.</p></div></div><div><h2 id=\"ethics\">Ethics declarations</h2><div>\n <h3 id=\"FPar1\">Competing interests</h3>\n <p>This study was funded by Activ Surgical Inc., Boston, MA. Current or previous consultants for Activ Surgical Inc.: G.S., A.H., M.K. Current or previous employment by Activ Surgical Inc.: T.C., G.J., C.M., Y.L. Founder/Ownership of Activ Surgical Inc.: P.K. 
No competing interests: E.H.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article: Skinner, G., Chen, T., Jentis, G. et al. Real-time near infrared artificial intelligence using scalable non-expert crowdsourcing in colorectal surgery. npj Digit. Med. 7, 99 (2024). https://doi.org/10.1038/s41746-024-01095-8

Received: 31 August 2023
Accepted: 29 March 2024
Published: 22 April 2024
DOI: https://doi.org/10.1038/s41746-024-01095-8