TY - JOUR
T1 - Performance assessment of computational tools to detect microsatellite instability
AU - Anthony, Harrison
AU - Seoighe, Cathal
N1 - Publisher Copyright: © The Author(s) 2024. Published by Oxford University Press.
PY - 2024/9/1
Y1 - 2024/9/1
AB - Microsatellite instability (MSI) is a phenomenon seen in several cancer types, which can be used as a biomarker to help guide immune checkpoint inhibitor treatment. To facilitate this, researchers have developed computational tools to categorize samples as having high microsatellite instability, or as being microsatellite stable using next-generation sequencing data. Most of these tools were published with unclear scope and usage, and they have yet to be independently benchmarked. To address these issues, we assessed the performance of eight leading MSI tools across several unique datasets that encompass a wide variety of sequencing methods. While we were able to replicate the original findings of each tool on whole exome sequencing data, most tools had worse receiver operating characteristic and precision-recall area under the curve values on whole genome sequencing data. We also found that they lacked agreement with one another and with commercial MSI software on gene panel data, and that optimal threshold cut-offs vary by sequencing type. Lastly, we tested tools made specifically for RNA sequencing data and found they were outperformed by tools designed for use with DNA sequencing data. Out of all, two tools (MSIsensor2, MANTIS) performed well across nearly all datasets, but when all datasets were combined, their precision decreased. Our results caution that MSI tools can have much lower performance on datasets other than those on which they were originally evaluated, and in the case of RNA sequencing tools, can even perform poorly on the type of data for which they were created.
KW - benchmarking
KW - cancer biomarker
KW - microsatellite instability
UR - http://www.scopus.com/inward/record.url?scp=85201143004&partnerID=8YFLogxK
U2 - 10.1093/bib/bbae390
DO - 10.1093/bib/bbae390
M3 - Article
C2 - 39129364
AN - SCOPUS:85201143004
SN - 1467-5463
VL - 25
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 5
M1 - bbae390
ER -