  • Sufficient Reads

    Hello.

    I am running FastQC on a batch of samples and am wondering how many million reads are sufficient.

    One sample passed QC with 13 x 10^6 reads (13 million).

    Over 20 million good-quality reads is ideal, but what about 13 million?

  • #2
    Whether there are enough reads really depends on what you're doing with them. More information is required.



    • #3
      I am doing variant calling: I will convert these files into VCF files and call SNPs for downstream analysis.



      • #4
        I am not doing any differential expression analysis. I imagine after I get the VCF files, I will then do a pathway analysis.



        • #5
          It depends on which genome you're using, how long your reads are, whether they're paired-end or single-end, and how even the coverage is. If you're dealing with, say, a wheat genome (17 Gb), then I'd say the number of reads you have is too low. A good rule of thumb is that any variant should be supported by a minimum of 10 reads (preferably in both forward and reverse orientations).

          Further information is required, but I hope this helps.
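
As a rough illustration of why genome size and read length matter, average depth can be estimated as total sequenced bases divided by target size. The sketch below is only a back-of-the-envelope calculation: the 100 bp read length and the ~3.1 Gb comparison genome are assumptions for illustration, the 13 million figure is treated as read pairs, and reads are assumed to spread evenly over the target, which real data rarely does.

```python
def mean_coverage(read_pairs: int, read_length: int, target_size_bp: int,
                  paired: bool = True) -> float:
    """Average depth = total sequenced bases / target size (uniform coverage assumed)."""
    reads = read_pairs * 2 if paired else read_pairs
    return reads * read_length / target_size_bp


# Assumed values for illustration only.
READ_PAIRS = 13_000_000      # interpreted as read pairs (an assumption)
READ_LENGTH = 100            # bp, assumed

for name, size_bp in [("wheat (17 Gb, from the post)", 17_000_000_000),
                      ("3.1 Gb genome (assumed)", 3_100_000_000)]:
    depth = mean_coverage(READ_PAIRS, READ_LENGTH, size_bp)
    print(f"{name}: about {depth:.2f}x average depth")
```

Under these assumptions both whole-genome figures fall far short of the 10 supporting reads suggested above, which is why the size of the sequenced target matters so much.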



          • #6
            I am dealing with a human genome (Homo sapiens), using the hg19 reference genome from UCSC.

            Some samples have 13 million reads, some 15 million, and others more than 20 million.

            If 13 million is too low, do I then sacrifice quality?



            • #7
              Sorry, I should add: the reads are paired-end, and most are 30-128 bp long.



              • #8
                What is your current quality cutoff? I wouldn't go below Phred 20.

                I'm going to assume your "13/15/20 millions" are different samples which can't be pooled?
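
For context on the Phred 20 suggestion: a Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10). A minimal sketch of that conversion (the function name is illustrative, not from any particular tool):

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the estimated probability of a wrong base call."""
    return 10 ** (-q / 10)


for q in (10, 20, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: about 1 error in {round(1 / p)} base calls")
```

So a cutoff of Phred 20 tolerates roughly one miscalled base in a hundred, while Phred 10 tolerates roughly one in ten.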



                • #9
                  Looking through my QC, most samples have around 17 million reads. Does this suffice, or should I relax my parameters in Trimmomatic?



                  • #10
                    1) Yes, they are different samples; we cannot combine the reads.
                    2) I chose a Phred score of 33.
                    3) I am using ILLUMINACLIP, and this is for an RNA-seq experiment on bone marrow using a TruSeq prep kit.

                    Here are my parameters:

                    java -classpath /auto/rcf-proj/sa1/software/Trimmomatic-0.32/trimmomatic-0.32.jar \
                      org.usadellab.trimmomatic.TrimmomaticPE -threads 16 -phred33 \
                      931269_R1.fastq.gz 931269_R2.fastq.gz \
                      paired_trimmed_931269_R1.fastq.gz unpaired_trimmed_931269_R1.fastq.gz \
                      paired_trimmed_931269_R2.fastq.gz unpaired_trimmed_931269_R2.fastq.gz \
                      ILLUMINACLIP:/auto/rcf-proj/sa1/acolombo/Target_2013_229/BoneMarrows_PolyA/Sample_931269/TruSeq2-PE.fa:2:30:10 \
                      LEADING:3 TRAILING:3 HEADCROP:15 SLIDINGWINDOW:4:10 MINLEN:30
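
To illustrate what SLIDINGWINDOW:4:10 asks for, here is a simplified Python sketch of the idea as described in the Trimmomatic documentation: scan 4-base windows from the 5' end and cut the read once a window's average quality drops below the threshold. This is not Trimmomatic's actual code and may differ from it in edge cases; the function name and the quality values are made up for illustration.

```python
def sliding_window_trim(quals, window=4, min_avg=10):
    """Return the number of leading bases to keep.

    Simplified rule: cut at the start of the first window whose mean
    quality is below min_avg. Reads shorter than the window are kept whole.
    """
    for start in range(len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_avg:
            return start
    return len(quals)


# Hypothetical per-base Phred scores for one read: good start, poor tail.
quals = [30, 30, 30, 30, 30, 8, 6, 4, 2]

print(sliding_window_trim(quals, window=4, min_avg=10))  # threshold from the command above
print(sliding_window_trim(quals, window=4, min_avg=20))  # a stricter Phred 20 threshold
```

With a threshold of 10, low-quality bases survive longer at the 3' end than they would with a threshold of 20, which is the trade-off between read count and read quality being discussed here.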



                    • #11
                      I am also trimming the adapters using a custom-made TruSeq2-PE.fa file.



                      • #12
                        Originally posted by arcolombo698
                        I am dealing with a human genome (Homo sapiens), using the hg19 reference genome from UCSC.

                        Some samples have 13 million reads, some 15 million, and others more than 20 million.

                        If 13 million is too low, do I then sacrifice quality?

                        Is it exome, whole genome, RNA-Seq, or a smaller targeted capture?

                        Dan



                        • #13
                          Thank you very much for your response.

                          It is an RNA-seq experiment.



                          • #14
                            If you are doing variant calling from RNA-seq data, 13M reads is enough to get sufficient read depth on a subset of the genes. Because the number of transcripts per gene varies 1000-fold, it is very difficult to get high depth from genes that are poorly expressed (and impossible to get high depth from genes that are not expressed). So for any particular number of reads, you will be able to make SNP calls for a particular number of genes, and as the number of reads increases, you'll be able to call SNPs from more genes.

                            Edit: removed the Phred bit... I thought it was about the parameters for cutting poor-quality bases, not the encoding!
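
The point about expression level can be sketched numerically. The toy model below assumes each gene receives a fixed fraction of all reads and that reads cover the transcript evenly; the three genes, their read fractions (spanning 1000-fold), the 2 kb transcript length, the 100 bp read length, and the 10x depth threshold are all hypothetical values chosen for illustration.

```python
def expected_depth(total_reads, read_fraction, read_length, transcript_length):
    """Mean depth over one transcript, assuming reads are spread evenly along it."""
    return total_reads * read_fraction * read_length / transcript_length


def callable_genes(total_reads, genes, read_length=100, min_depth=10):
    """Names of genes whose expected depth reaches min_depth."""
    return [
        name
        for name, (fraction, length) in genes.items()
        if expected_depth(total_reads, fraction, read_length, length) >= min_depth
    ]


# Hypothetical genes: (fraction of all reads, transcript length in bp).
GENES = {
    "gene_high": (1e-3, 2000),
    "gene_mid": (1.2e-5, 2000),
    "gene_low": (1e-6, 2000),
}

for total in (13_000_000, 20_000_000):
    print(total, callable_genes(total, GENES))
```

In this toy setup the highly expressed gene is callable at either depth, the moderately expressed gene only becomes callable with more reads, and the weakly expressed gene stays out of reach at both.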
                            Last edited by SNPsaurus; 12-24-2013, 12:17 PM.
                            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



                            • #15
                              Note that -phred33 in the Trimmomatic parameters refers to the Illumina encoding of the base qualities, not to the cutoff value.
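
To make the distinction concrete: in Phred+33 encoding, each quality character in a FASTQ record stores the score as its ASCII code minus 33. A minimal sketch of decoding such a string (the function name and the example string are made up for illustration):

```python
def decode_phred33(quality_string: str) -> list[int]:
    """Decode a Phred+33 FASTQ quality string into per-base Phred scores."""
    return [ord(ch) - 33 for ch in quality_string]


# Hypothetical quality string: '!' is the lowest score, 'I' a high one.
print(decode_phred33("!+5I"))
```

The 33 is therefore only an offset into the ASCII table; the actual quality threshold in the command comes from steps such as LEADING, TRAILING, and SLIDINGWINDOW.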
                              Last edited by mastal; 12-24-2013, 11:44 AM.
