  • Velvet paired end after some sequences removed?

    Hi,

    After trimming and filtering my Illumina sequences (paired end 100 bp) for quality, I am left with files that probably don't have the same number of sequences in them any longer. This will be a problem if I try to interleave the files and use them as input for Velvet, right?

    Would it be better to:

    1. Not remove any sequences from my files, even very short or 0-base ones, so the files can still be interleaved
    2. Remove sequences and use the (smaller) files in Velvet, but not as paired-end reads

    Are there other options I'm not aware of?

    Thanks

  • #2
    I would bin the valid pairs and the singletons (reads whose mates were removed by quality trimming/filtering) into two separate fastq files. Velvet accepts multiple files, and you can then set parameters per file (for example, the insert size for the paired file).

    e.g.

    velveth Assem 35 -shortPaired -fasta pe_lib1.fasta -short se_lib1.fa
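The binning step described in this answer can be sketched in Python. This is a minimal illustration, not a tool from the thread: it assumes 4-line FASTQ records, old-style read names ending in /1 and /2, and enough memory to hold the forward file; the script name bin_pairs.py and all file names are hypothetical. Mates found in both files are written interleaved (forward, then reverse), which is the layout velveth expects for -shortPaired; reads whose mate was filtered out go to a separate singletons file.

```python
"""Bin quality-filtered paired reads into interleaved pairs and singletons.

Sketch only: assumes 4-line FASTQ records and read names ending in /1 or /2.
bin_pairs.py and the file names in the usage line are hypothetical.
"""
import sys
from typing import Dict, Iterable, Iterator, List, Tuple

Record = List[str]  # [header, sequence, "+", quality]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; a truncated last record raises an error."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines, e.g. a trailing newline
        yield [header, next(it), next(it), next(it)]


def pair_key(header: str) -> str:
    """Strip '@', anything after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def bin_reads(fwd: Iterable[Record],
              rev: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (interleaved pairs, singletons) from two filtered read sets."""
    forward: Dict[str, Record] = {pair_key(r[0]): r for r in fwd}
    pairs: List[Record] = []
    singles: List[Record] = []
    for rec in rev:
        mate = forward.pop(pair_key(rec[0]), None)
        if mate is None:
            singles.append(rec)          # reverse read lost its forward mate
        else:
            pairs.extend([mate, rec])    # interleaved: forward, then reverse
    singles.extend(forward.values())     # forward reads that lost their mate
    return pairs, singles


def write_fastq(records: Iterable[Record], path: str) -> None:
    with open(path, "w") as out:
        for rec in records:
            out.write("\n".join(rec) + "\n")


def main(argv: List[str]) -> None:
    if len(argv) != 5:
        print("usage: bin_pairs.py reads_1.fq reads_2.fq pairs.fq singles.fq")
        return
    with open(argv[1]) as a, open(argv[2]) as b:
        pairs, singles = bin_reads(read_fastq(a), read_fastq(b))
    write_fastq(pairs, argv[3])
    write_fastq(singles, argv[4])


if __name__ == "__main__":
    main(sys.argv)
```

Since this writes FASTQ rather than FASTA, the two outputs would then be passed with the FASTQ format flag, e.g. velveth Assem 35 -fastq -shortPaired pairs.fq -short singles.fq. A dictionary is used so the two files do not need to be in the same order; for very large runs a streaming approach over pre-sorted files would use less memory.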
    Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio



    • #3
      Hi,
      we are facing the same problem at the moment. After trimming/filtering, the two files (one per read of the pair) will contain different numbers of reads.

      My question is: is there a script/program out there that finds the mates in two different files (or in one file, if I merge/shuffle the files before trimming/filtering) and bins the unpaired reads into an extra file?

      Any help is highly appreciated!



      • #4
        Dear nposnien,
        you can use the Sickle tool (https://github.com/najoshi/sickle). You only need to give it the paired fastq files and the other parameters (quality encoding, quality threshold, length cutoff, etc.), and it will generate the paired and singleton files.
        If you want to filter out reads with N's, just replace the whole sequence with N's and the quality string with #'s, then set the Sickle length and quality values. This way reads containing N's are filtered out.
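The N-masking trick described above could be done with a small pre-processing script run on each file before Sickle. This is a sketch under stated assumptions: Sanger/Phred+33 qualities (where '#' encodes Q2), 4-line FASTQ records, and a hypothetical script name mask_n.py. Masked reads keep their position in the file, so the two files stay in sync until Sickle drops the masked read and moves its mate to the singletons file.

```python
"""Mask any read containing an N so a quality trimmer will discard it.

Sketch only: assumes Phred+33 qualities ('#' = Q2) and 4-line records.
mask_n.py and the file names in the usage line are hypothetical.
"""
import sys
from typing import List


def mask_n_read(record: List[str]) -> List[str]:
    """Return the record unchanged, or with sequence set to all 'N' and
    quality set to all '#' if the sequence contains an N (any case)."""
    header, seq, plus, qual = record
    if "N" in seq.upper():
        return [header, "N" * len(seq), plus, "#" * len(qual)]
    return record


def main(argv: List[str]) -> None:
    if len(argv) != 3:
        print("usage: mask_n.py reads.fq masked.fq")
        return
    with open(argv[1]) as src, open(argv[2], "w") as dst:
        lines = [line.rstrip("\n") for line in src]
        # step through complete 4-line records only
        for i in range(0, len(lines) - 3, 4):
            dst.write("\n".join(mask_n_read(lines[i:i + 4])) + "\n")


if __name__ == "__main__":
    main(sys.argv)
```

The masked files would then go to Sickle's paired-end mode with the matching quality type (e.g. -t sanger), so that the Q2 read is trimmed below the length cutoff and removed.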
        Best wishes,
        Rahul
        Rahul Sharma,
        Ph.D
        Frankfurt am Main, Germany



        • #5
          This script may be useful for interleaving pairs for Velvet (and generating non-paired singleton files):

          https://github.com/lexnederbragt/den...leave_pairs.py



          • #6
            First of all, thanks for the answers!

            @LizBent: Can I use the script for data that has been processed with CASAVA 1.8? In the discussion you linked to, it is proposed to replace

            f_suffix = "/1"
            r_suffix = "/2"

            with

            f_suffix = ""
            r_suffix = ""

            My question is: How are the pairs identified then?
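A likely explanation, stated as an assumption since the script itself is not reproduced here: in CASAVA 1.8 headers the mate number is no longer part of the read name. Both mates share the same name before the space, and the mate number is the first field after it (e.g. "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"). A script that compares only the part of the header before the first space would therefore match mates with an empty suffix. The sketch shows one way to derive a mate-independent key for both header styles; split_header is a hypothetical name, not a function from interleave_pairs.py.

```python
"""Derive a mate-independent pair key from a FASTQ header.

Sketch only: covers the older '/1' '/2' style and the CASAVA 1.8 style.
split_header is a hypothetical helper, not part of interleave_pairs.py.
"""
from typing import Tuple


def split_header(header: str) -> Tuple[str, str]:
    """Return (pair_key, mate_number) for a FASTQ header line.

    CASAVA 1.8:  '@<name> <read>:<filtered>:<control>:<index>'
                 -> key is <name>, mate is <read> ('1' or '2').
    Older style: '@<name>/1' or '@<name>/2'
                 -> key is <name>, mate is the digit after the slash.
    """
    fields = header.lstrip("@").split()
    name = fields[0]
    if name[-2:] in ("/1", "/2"):
        return name[:-2], name[-1]
    if len(fields) > 1 and fields[1][:2] in ("1:", "2:"):
        return name, fields[1][0]
    raise ValueError(f"cannot find mate number in header: {header!r}")
```

Both mates of a pair yield the same key under either convention, so a script that pairs reads on that key does not need a suffix at all.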



            • #7
              No idea, you might want to ask the original script writer, who is cited in the comments at the top of the script (and there is also a reference to another SeqAnswers thread there that might answer your question). Sorry I can't help.
