Polymerase matters: non-proofreading enzymes inflate fungal community richness estimates by up to 15 %
Rare taxa overwhelm metabarcoding data generated using next-generation sequencing (NGS). Low-frequency Operational Taxonomic Units (OTUs) may be artifacts of PCR-amplification errors that result from polymerase mispairing. We analyzed two Internal Transcribed Spacer 2 (ITS2) MiSeq libraries generated with proofreading (ThermoScientific Phusion) and non-proofreading (ThermoScientific Phire) polymerases from the same MiSeq reaction and the same samples, using the same DNA tags, and applied two different clustering methods to evaluate the effect of polymerase and clustering-tool choice on estimates of richness, diversity, and community composition. Our data show that, while the overall communities are comparable, the non-proofreading polymerase exaggerates OTU richness by up to 15 %, depending on the clustering method and on the threshold for removing low-frequency OTUs. The overestimation of richness also consistently led to underestimation of community evenness, a result of the increase in low-frequency OTUs. Stringent thresholds for eliminating rare reads remedy this issue: excluding reads that occurred ≤10 times reduced the overestimation of OTU numbers to <0.3 %. Given these findings, we strongly recommend using proofreading polymerases to improve data integrity, together with stringent culling thresholds for rare sequences to minimize overestimation of community richness.
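The relationship between rare-OTU culling, richness, and evenness can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in this study: the function names, the toy read counts, and the choice of Shannon diversity with Pielou's evenness are all assumptions made for illustration, and the culling rule assumes that OTUs at or below the threshold are dropped.

```python
import math

def cull_rare(otu_counts, max_rare_reads):
    """Drop OTUs with max_rare_reads reads or fewer (assumed culling rule)."""
    return {otu: n for otu, n in otu_counts.items() if n > max_rare_reads}

def richness(otu_counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in otu_counts.values() if n > 0)

def shannon(otu_counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(otu_counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in otu_counts.values() if n > 0)

def pielou_evenness(otu_counts):
    """Pielou's evenness J' = H' / ln(S); defined as 0.0 when S <= 1."""
    s = richness(otu_counts)
    return shannon(otu_counts) / math.log(s) if s > 1 else 0.0

def richness_inflation_pct(non_proofreading, proofreading):
    """Percent by which richness in one library exceeds the other."""
    base = richness(proofreading)
    return 100.0 * (richness(non_proofreading) - base) / base

# Hypothetical toy libraries (NOT data from this study): the non-proofreading
# library carries two extra low-frequency OTUs standing in for PCR artifacts.
proofreading = {"OTU1": 500, "OTU2": 300, "OTU3": 200}
non_proofreading = {"OTU1": 500, "OTU2": 300, "OTU3": 200, "OTU4": 2, "OTU5": 1}

if __name__ == "__main__":
    for threshold in (0, 1, 10):
        pr = cull_rare(proofreading, threshold)
        npr = cull_rare(non_proofreading, threshold)
        print(threshold,
              richness_inflation_pct(npr, pr),
              pielou_evenness(pr),
              pielou_evenness(npr))
```

In this toy case the extra low-frequency OTUs raise richness and lower evenness in the non-proofreading library, and both effects disappear once the culling threshold exceeds the artifact read counts. Real libraries would also differ in sequencing depth, which this sketch does not normalize.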