The FDA-led MAQC-IV/SEQC2 Consortium has published a data descriptor paper, Zhao Y. et al. Sci Data (2021), detailing the multi-center, multi-platform whole-genome and whole-exome sequencing data sets for the breast cancer cell line HCC1395 and its B lymphocyte-derived matched normal, HCC1395BL. The genomic DNA was produced in a single batch by ATCC to ensure sample homogeneity. The following are some of the Consortium's papers that used those data sets:
- Establish the high-confidence somatic mutation call set that may be used as the "ground truth" for benchmarking analyses or machine learning modeling: Fang L.T. et al. Nat Biotechnol (2021) / PMID:34504347 / SharedIt Link.
- Use the high-confidence somatic mutation call set as the "ground truth" to investigate how different sample preparations, sequencing library kits, and bioinformatic algorithms affect the accuracy of the somatic mutation pipelines, and develop best practices: Xiao W. et al. Nat Biotechnol (2021) / PMID:34504346 / SharedIt Link.
- Use the high-confidence somatic mutation call set as the labeled training data to build more accurate machine learning models for somatic mutation detection: Sahraeian S.M.E. et al. bioRxiv (2019).
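
To illustrate what using a call set as "ground truth" for benchmarking can look like, here is a minimal sketch that scores a set of somatic calls against a truth set by exact match. The `benchmark` function, the `Variant` tuple layout, and the toy variants are hypothetical and are not taken from the Consortium's pipelines or data. A real comparison would typically also normalize variant representation and restrict scoring to confidently characterized regions.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def benchmark(calls: Iterable[Variant], truth: Iterable[Variant]) -> Dict[str, float]:
    """Score calls against a truth set by exact (chrom, pos, ref, alt) match."""
    call_set, truth_set = set(calls), set(truth)
    tp = len(call_set & truth_set)   # called and in the truth set
    fp = len(call_set - truth_set)   # called but not in the truth set
    fn = len(truth_set - call_set)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy data for illustration only; these are not real HCC1395 variants.
toy_truth = [("chr1", 1000, "A", "T"), ("chr2", 2000, "G", "C"), ("chr3", 3000, "C", "A")]
toy_calls = [("chr1", 1000, "A", "T"), ("chr2", 2000, "G", "C"), ("chr4", 4000, "T", "G")]

if __name__ == "__main__":
    print(benchmark(toy_calls, toy_truth))
```

Exact-match set comparison keeps the sketch short, but it treats differently represented yet equivalent indels as mismatches, which is one reason dedicated comparison tools are normally preferred in practice.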
Find all of SEQC2's publications:
- SEQC2 Collection on Nature Biotechnology
- SEQC2 Collection on Genome Biology