  • Varying lengths in 2x150 MiSeq sequencing data

    I have data from a 2x150 MiSeq run, but I have noticed that the read lengths vary quite a lot: they range from 35 to 151 bp. I am interested in mapping these reads with BWA or Bowtie2, and I would like to know whether this would be a problem, and if not, why not?
    Thanks

  • #2
    Hi Elfuser,

    The different read lengths could be caused by adapter clipping. The MiSeq has a small check box for this on the sample setup screen that is often on by default; it clips the adapter sequences off your reads. Any library fragment that is shorter than your read length will run into the adapter (a bit like sequencing into the vector in the old days!). It's a good idea to remove the adapter sequences, but doing so does lead to variable read lengths. Instead of letting the MiSeq do this, I usually remove adapters and quality-trim my data with trim_galore, which lets you set a minimum acceptable read length.
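
    To illustrate why clipping produces variable lengths, here is a minimal sketch of 3' adapter trimming followed by a minimum-length filter. It assumes the common Illumina TruSeq adapter prefix and uses exact matching only; it is not trim_galore's actual algorithm (Trim Galore wraps Cutadapt, which also tolerates mismatches and quality-trims). MIN_OVERLAP and MIN_LENGTH are hypothetical values chosen for the example.

```python
# Assumed adapter: the common Illumina TruSeq prefix. Real trimmers such
# as trim_galore (via Cutadapt) also tolerate mismatches and quality-trim.
ADAPTER = "AGATCGGAAGAGC"
MIN_OVERLAP = 3   # hypothetical: shortest adapter fragment clipped at the 3' end
MIN_LENGTH = 20   # hypothetical minimum acceptable read length


def trim_adapter(seq, adapter=ADAPTER, min_overlap=MIN_OVERLAP):
    """Cut the read where the adapter starts (exact matching only)."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]  # whole adapter lies inside the read
    # Adapter begins near the 3' end, so only its first bases were read.
    # Scanning from the left finds the longest such partial adapter first.
    for start in range(max(0, len(seq) - len(adapter) + 1),
                       len(seq) - min_overlap + 1):
        if adapter.startswith(seq[start:]):
            return seq[:start]
    return seq


def trim_reads(reads, min_length=MIN_LENGTH):
    """Trim adapters, then drop reads shorter than min_length."""
    trimmed = (trim_adapter(seq) for seq in reads)
    return [seq for seq in trimmed if len(seq) >= min_length]


reads = [
    "ACGT" * 10,                    # no adapter: kept at 40 bp
    "ACGT" * 6 + ADAPTER + "GGGG",  # 24 bp insert, read runs into the adapter
    "ACGT" * 3 + ADAPTER,           # 12 bp insert: too short after trimming
    "ACGT" * 8 + "AGATC",           # partial adapter at the 3' end
]
print([len(seq) for seq in trim_reads(reads)])  # [40, 24, 32]
```

    Reads from fragments shorter than the read length come out shorter after trimming, and the length filter then discards the ones too short to be useful.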

    bwa doesn't seem to mind datasets with reads of different lengths. As far as I understand (I'm not a bioinformatician!), bwa works by looking for a short exact match between the read and the reference sequence, called the seed (bwa mem's default minimum seed length is 19 bp). When it finds a match, it then extends it. So as long as your reads are longer than the seed length, it will work.
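
    The seed-and-extend idea can be sketched as a toy aligner. This is an illustration under simplifying assumptions, not how bwa is implemented: seeds are looked up in a plain k-mer dictionary and extension is ungapped with a mismatch budget, whereas BWA-MEM uses an FM-index, maximal exact matches and gapped extension. The function names and max_mismatches are hypothetical. It shows that read length does not matter as long as the read contains at least one seed.

```python
import random
from collections import defaultdict

SEED_LEN = 19  # bwa mem's default minimum seed length (-k)


def build_index(reference, seed_len=SEED_LEN):
    """Map every seed_len-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - seed_len + 1):
        index[reference[i:i + seed_len]].append(i)
    return index


def align(read, reference, index, seed_len=SEED_LEN, max_mismatches=2):
    """Return the reference start of the best ungapped hit, or None."""
    # 1. Seed: every exact seed_len-mer match proposes a read start position.
    #    The loop is empty when the read is shorter than seed_len.
    candidates = set()
    for offset in range(len(read) - seed_len + 1):
        for hit in index.get(read[offset:offset + seed_len], []):
            candidates.add(hit - offset)
    # 2. Extend: compare the whole read with each candidate (no gaps here).
    best = None
    for start in sorted(candidates):
        if start < 0 or start + len(read) > len(reference):
            continue  # read would hang off the end of the reference
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return None if best is None else best[0]


rng = random.Random(1)
reference = "".join(rng.choice("ACGT") for _ in range(500))
index = build_index(reference)

long_read = reference[100:250]          # 150 bp, exact match
short_read = list(reference[300:335])   # 35 bp with one mismatch
short_read[30] = "A" if short_read[30] != "A" else "C"
short_read = "".join(short_read)
too_short = reference[400:415]          # 15 bp, shorter than the seed

for r in (long_read, short_read, too_short):
    print(len(r), align(r, reference, index))
```

    Both the 150 bp and the 35 bp reads are placed, while the 15 bp read cannot be seeded at all.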

    I don't know about Bowtie2, I'm afraid.

    Best wishes,
    Gavin

    • #3
      Hi,

      I am also facing a similar situation right now: I have MiSeq data with varying read lengths. Is it fine to go ahead with de novo assembly into contigs? Will reads of different lengths affect the assembly and the downstream analysis?

      • #4
        Yes, go ahead. Most de novo assemblers these days can handle variable read lengths.

        Depending on the assembler you choose, you probably want to trim your raw data for quality and remove any very short reads before you start the assembly. Remember it's important to keep your forward and reverse reads in the same order if you have paired ends.
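
        One way to keep paired files in step is to filter pairs rather than single reads: drop a pair if either mate is too short. This is a minimal sketch assuming 4-line FASTQ records, lines already stripped of newlines, and two files in matching order; filter_pairs and min_length=30 are hypothetical. trim_galore's --paired mode handles this for you in practice.

```python
from itertools import islice


def fastq_records(lines):
    """Yield 4-line FASTQ records: [header, sequence, '+', qualities]."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        yield record


def filter_pairs(r1_lines, r2_lines, min_length=30):
    """Keep a pair only if BOTH mates pass, so R1 and R2 stay in step.

    Note: zip() stops at the shorter file; a real tool should also check
    that both files end together and that the read names agree.
    """
    kept1, kept2 = [], []
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        if len(rec1[1]) >= min_length and len(rec2[1]) >= min_length:
            kept1.append(rec1)
            kept2.append(rec2)
    return kept1, kept2


r1 = ["@p1/1", "A" * 50, "+", "I" * 50,
      "@p2/1", "A" * 40, "+", "I" * 40,
      "@p3/1", "A" * 20, "+", "I" * 20]
r2 = ["@p1/2", "C" * 50, "+", "I" * 50,
      "@p2/2", "C" * 10, "+", "I" * 10,
      "@p3/2", "C" * 45, "+", "I" * 45]

kept1, kept2 = filter_pairs(r1, r2)
print([rec[0] for rec in kept1], [rec[0] for rec in kept2])
```

        Pair p2 is dropped because its reverse read is too short, and p3 because its forward read is; filtering each file independently would have left the files out of step.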

        Any reads that are shorter than your hash (k-mer) length ought to be removed or ignored by the assembly software.
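
        The reason is that a de Bruijn graph assembler breaks every read into overlapping k-mers: a read of length L yields L - k + 1 of them, so a read shorter than k yields none. A small sketch, with k = 31 as a hypothetical choice:

```python
K = 31  # hypothetical k-mer (hash) length


def kmers(seq, k=K):
    """All overlapping k-mers of seq; a read of length L yields L - k + 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def usable_reads(reads, k=K):
    """Reads shorter than k contain no k-mer, so they add nothing to the graph."""
    return [seq for seq in reads if len(seq) >= k]


reads = ["A" * 35, "C" * 151, "G" * 25]
print([len(kmers(seq)) for seq in reads])  # [5, 121, 0]
print(len(usable_reads(reads)))            # 2
```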

        • #5
          Hi Gwilkie,
          Thank you for your advice. I have trimmed the reads and then assembled them using velvet. Assembly looks good.

          • #6
            Hi,
            Can you suggest which assembler should be preferred for longer read lengths, i.e. above 200 bp?

            • #7
              It depends on what genome you are trying to assemble - e.g. virus, bacteria, vertebrate. I suggest you start a new thread as this is a complex subject and there are now many different assemblers to choose from... each with their strengths and weaknesses.

              ABySS is a good general assembler that is easy to use and does not require a huge amount of computer resources. However, de Bruijn graph assemblers have a maximum hash length (k-mer size), typically fixed when the software is compiled and often no more than about 128, because memory use and computation grow with k.

              Therefore very long reads do not necessarily help the initial assembly but can be useful later for closing gaps, joining contigs or resolving repeats.
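
              A small sketch of why: the graph is built only from k-mers, so one long read and two shorter reads that overlap by at least k - 1 bases contribute exactly the same k-mer set. The random 250 bp "genome" and k = 31 are hypothetical values for the demonstration.

```python
import random


def kmer_set(seq, k):
    """Distinct overlapping k-mers of seq (what a de Bruijn graph stores)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


K = 31  # hypothetical k, below the assembler's maximum
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(250))

long_read = genome                           # one 250 bp read
short_reads = [genome[:150], genome[100:]]   # two 150 bp reads, 50 bp overlap

long_kmers = kmer_set(long_read, K)
short_kmers = set().union(*(kmer_set(r, K) for r in short_reads))
print(long_kmers == short_kmers)  # True
```

              The extra length only pays off outside the k-mer graph, for example when a read spans a repeat longer than k and can be used to join contigs.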

              Hope that helps


              • #8
                Hi Gwilkie,

                I think it's a good idea to start a new thread. So far I was using Velvet for 150 bp reads, and now I am experimenting with IDBA for 250 bp ones. I will be checking out ABySS as well. Thank you.
