  • LizBent
    Member
    • Jan 2012
    • 31

    Velvet paired end after some sequences removed?

    Hi,

    After trimming and filtering my Illumina sequences (paired end 100 bp) for quality, I am left with files that probably don't have the same number of sequences in them any longer. This will be a problem if I try to interleave the files and use them as input for Velvet, right?

    Would it be better to:

    1. Keep every sequence in my files, even those that are very short or 0 bases, so the files can still be interleaved
    2. Remove sequences and use the (smaller) files in Velvet, but not as paired-end reads

    Are there other options I'm not aware of?

    Thanks
  • jjohnson
    Member
    • Aug 2009
    • 20

    #2
    I would bin the valid pairs and the singletons (reads whose mates were removed by quality trimming/filtering) into two separate fastq files. Velvet can accept multiple files, and you can then parameterize around them (for example, specifying insert sizes for the mates file).

    For example:

    velveth Assem 35 -shortPaired -fasta pe_lib1.fasta -short3 se_lib1.fa
    Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio
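
The binning suggested in this post can be sketched in Python roughly as follows. This is only a minimal illustration, assuming old-style read names that end in /1 and /2 and a reverse file small enough to hold in memory. The function names (read_fastq, pair_key, split_pairs) are made up for this example and are not part of Velvet or of any tool mentioned in the thread.

```python
from itertools import islice

def read_fastq(lines):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return
        header, seq, _plus, qual = record
        yield header, seq, qual

def pair_key(header):
    """Read name without the leading '@' and without a trailing /1 or /2 (assumed naming)."""
    name = header[1:]
    return name[:-2] if name.endswith(("/1", "/2")) else name

def split_pairs(forward, reverse):
    """Return (pairs, singletons) from two iterables of FASTQ records."""
    # Index the reverse reads by name so each forward read can look up its mate.
    reverse_by_key = {pair_key(rec[0]): rec for rec in reverse}
    pairs, singletons = [], []
    for rec in forward:
        mate = reverse_by_key.pop(pair_key(rec[0]), None)
        if mate is None:
            singletons.append(rec)       # forward read whose mate was filtered out
        else:
            pairs.append((rec, mate))    # keep mates adjacent, ready for interleaving
    singletons.extend(reverse_by_key.values())  # reverse reads left without a mate
    return pairs, singletons
```

Writing each pair out as two consecutive records gives an interleaved paired file, and the singletons go to a second file that is passed to velveth as unpaired reads.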


    • nposnien
      Member
      • May 2011
      • 13

      #3
      Hi,
      we are facing the same problem at the moment. After trimming/filtering, our two files (one for each read of the pair) will no longer contain the same number of reads.

      My question is: is there a script/program out there that finds the mates in two different files (or in one file, if I merge/shuffle the files prior to trimming/filtering) and bins the unpaired reads into an extra file?

      Any help is highly appreciated!


      • rahularjun86
        Member
        • Jan 2011
        • 58

        #4
        Dear nposnien,
        you can use the Sickle tool (https://github.com/najoshi/sickle). You only need to supply the paired fastq files and a few parameters (the quality scoring system used, the quality threshold, the length cutoff, etc.), and it will generate the paired and singleton files.
        If you want to filter out reads with N's, just replace the whole sequence with N's and the whole quality string with #, then set the Sickle length and quality values. Sickle will then discard those reads.
        Best wishes,
        Rahul
        Rahul Sharma,
        Ph.D
        Frankfurt am Main, Germany
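
The N-masking trick from this post could be done with a small helper like the one below before running Sickle. It is only a sketch: the function name mask_if_contains_n is hypothetical, and it assumes Phred+33 (Sanger-style) quality encoding, where '#' corresponds to a quality of 2.

```python
def mask_if_contains_n(seq, qual):
    """Return the read fully masked if it contains any N, otherwise unchanged.

    A masked read is all 'N' bases with all '#' qualities, so a quality
    trimmer with a threshold above 2 should trim it to nothing and drop it.
    Assumes Phred+33 encoding, where '#' means quality 2.
    """
    if "N" in seq.upper():
        return "N" * len(seq), "#" * len(qual)
    return seq, qual
```

Applying this to every record of both files keeps the record counts unchanged, so the files stay in sync until the trimmer sorts reads into pairs and singletons.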


        • LizBent
          Member
          • Jan 2012
          • 31

          #5
          This script may be useful for interleaving pairs for Velvet (and generating non-paired singleton files):

          lexnederbragt/denovo-assembly-tutorial on GitHub (a tutorial for learning de novo assembly)


          • nposnien
            Member
            • May 2011
            • 13

            #6
            First of all, thanks for the answers!

            @ LizBent: Can I use the script for data that has been processed using CASAVA 1.8? In the discussion you added a link to, it is proposed to replace

            f_suffix = "/1"
            r_suffix = "/2"

            with

            f_suffix = ""
            r_suffix = ""

            My question is: How are the pairs identified then?
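
For context on this question: in CASAVA 1.8 headers the read name is the part before the first space and is identical for both mates, while the part after the space starts with 1 or 2 to mark the mate. So with empty suffixes, matching on the name alone can still identify pairs. Whether the linked script works exactly this way is not confirmed here; the sketch below only shows one way to get a pairing key from both header styles, and read_id_and_mate is a made-up name.

```python
def read_id_and_mate(header):
    """Split a FASTQ header into (read name, mate number or None).

    Handles two conventions:
      CASAVA 1.8:  '@name 1:N:0:INDEX'  (mate number follows the space)
      older style: '@name/1'            (mate number is a suffix)
    """
    text = header[1:] if header.startswith("@") else header
    name, _, comment = text.partition(" ")
    if comment[:2] in ("1:", "2:"):
        return name, int(comment[0])
    if name.endswith(("/1", "/2")):
        return name[:-2], int(name[-1])
    return name, None  # no recognizable mate marker
```

Two reads are mates when they share the same name and carry different mate numbers.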


            • LizBent
              Member
              • Jan 2012
              • 31

              #7
              No idea, you might want to ask the original script writer, who is cited in the comments at the top of the script (and there is also a reference to another SeqAnswers thread there that might answer your question). Sorry I can't help.
