
  • fewer reads R1 than R2 bowtie2

    Dear all,
    perhaps I am asking about something that has already been discussed a lot. However, I have been unable to find the right answer.
    Does anyone know how to run a Bowtie2 mapping step when the Illumina R1 and R2 files contain different numbers of reads? I have heard about using Trim_galore prior to the alignment step, but I have to trim several primers, and after trimming with cutadapt, R1 and R2 have different numbers of reads. I guess that running Trim_galore after that could trim useful sequence...
    Any advice would be appreciated.
    Thanks a lot!

  • #2
    If you can reformat your R1/R2 files so that they contain only paired reads, and put the unpaired reads in an extra FASTQ file, you can do...

    -q -1 R1.fastq -2 R2.fastq -U unpaired.fastq

    On my end, I've had good luck with Trimmomatic, which writes separate files for the paired and unpaired reads (1P, 2P, 1U, 2U).
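
Before passing R1/R2 to bowtie2 with -1/-2, it may help to confirm that the two files really are synchronized. Below is a minimal Python sketch, assuming plain (uncompressed) FASTQ with exactly four lines per record and read names that end at the first whitespace, optionally followed by a /1 or /2 suffix. The function names are made up for illustration.

```python
from itertools import zip_longest


def read_names(path):
    """Yield the read name of every record in a 4-line-per-record FASTQ file."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            # Header lines are lines 0, 4, 8, ... (assumes no wrapped sequences).
            if line_number % 4 == 0 and line.strip():
                name = line[1:].split()[0]
                # Drop an old-style mate suffix so R1 and R2 names compare equal.
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def first_mismatch(r1_path, r2_path):
    """Return the 0-based index of the first out-of-sync record, or None."""
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for index, (name1, name2) in enumerate(pairs):
        # A missing mate shows up as None on one side.
        if name1 != name2:
            return index
    return None
```

If `first_mismatch("R1.fastq", "R2.fastq")` returns a number rather than `None`, the files need re-pairing before they can be used together as -1/-2 input.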



    • #3
      Hi ctseto! Thanks for your reply!
      I had a look at Trimmomatic. However, it didn't work as well as I expected: some primer sequences still remained in my fastq files.
      That's why I decided to give cutadapt a chance, but I don't know how to "reformat" my R1 and R2 files...
      Thanks!



      • #4
        Not familiar with cutadapt, but I'll look into it. From https://github.com/marcelm/cutadapt/...ster/README.md

        If you use one of the read-discarding options, then the --paired-output option is needed to keep the two files synchronized. First trim the forward read, writing output to temporary files:

        cutadapt -a ADAPTER_FWD --minimum-length 20 --paired-output tmp.2.fastq -o tmp.1.fastq reads.1.fastq reads.2.fastq
        Then trim the reverse read, using the temporary files as input:

        cutadapt -a ADAPTER_REV --minimum-length 20 --paired-output trimmed.1.fastq -o trimmed.2.fastq tmp.2.fastq tmp.1.fastq
        Finally, remove the temporary files:

        rm tmp.1.fastq tmp.2.fastq
        I assume this is what you tried?

        An inelegant way of pulling out the reads that still have a mate would be to build an index of the read names that exist in both R1 and R2, then extract those reads from R1 and R2 and write a new pair of files containing only them.
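
A rough Python sketch of that index-and-extract idea follows. It is not part of cutadapt or any other tool. It assumes uncompressed 4-line FASTQ records, read names that are unique within each file, and that both files keep their reads in the same relative order (which should hold if trimming only dropped reads). All function and file names are hypothetical.

```python
def read_fastq(path):
    """Yield (name, record_lines) for each 4-line FASTQ record."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0].strip():
                break  # end of file (or a trailing blank line)
            # Make sure every line ends with exactly one newline.
            record = [line.rstrip("\r\n") + "\n" for line in record]
            name = record[0][1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name, record


def repair_pairs(r1_path, r2_path, out1_path, out2_path, unpaired_path):
    """Split R1/R2 into synchronized paired files plus one unpaired file."""
    # Pass 1: index the read names present in each file (held in memory).
    names1 = {name for name, _ in read_fastq(r1_path)}
    names2 = {name for name, _ in read_fastq(r2_path)}
    shared = names1 & names2

    # Pass 2: route each record to its paired output or to the unpaired file.
    unpaired_count = 0
    with open(out1_path, "w") as out1, open(out2_path, "w") as out2, \
            open(unpaired_path, "w") as unpaired:
        for in_path, paired_out in ((r1_path, out1), (r2_path, out2)):
            for name, record in read_fastq(in_path):
                if name in shared:
                    paired_out.writelines(record)
                else:
                    unpaired.writelines(record)
                    unpaired_count += 1
    return len(shared), unpaired_count
```

The three output files then line up with the -1, -2 and -U arguments mentioned earlier in the thread. For very large runs the two name sets may not fit comfortably in memory, so treat this as a starting point only.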
        Last edited by ctseto; 11-14-2013, 09:09 AM.



        • #5
          This is why I prefer Trimmomatic. It handles paired-end reads more elegantly. I suggest you give it a try.



          • #6
            Exactly!
            Older version....
            Thanks a lot ctseto!
            However, I will give Trimmomatic another chance... thanks!



            • #7
              Originally posted by jordi:
              Exactly!
              Older version....
              Thanks a lot ctseto!
              However, I will give Trimmomatic another chance... thanks!
              Might be worth checking your Trimmomatic's list of adapters to be on the safe side.

              That would be the TruSeq3-PE.fa file in your adapters directory in the Trimmomatic directory. (Current version is 0.30)

              If you know which adapters/primers/kits you are using upstream of your NGS, something like Illumina's Customer Sequence Letter (http://support.illumina.com/download...es_letter.ilmn) should give you enough information (too much, even) to put together a list of adapters to trim off.

              A figure illustrating the schema of Trimmomatic is in its manual (http://www.usadellab.org/cms/uploads...nual_V0.30.pdf)

              From Trimmomatic's notes:
              These sequences have not been extensively tested, and depending on specific issues which may occur in library preparation, other sequences may work better for a given dataset.

              To make a custom version of fasta, you must first understand how it will be used. Trimmomatic uses two strategies for adapter trimming: Palindrome and Simple

              With 'simple' trimming, each adapter sequence is tested against the reads, and if a sufficiently accurate match is detected, the read is clipped appropriately.

              'Palindrome' trimming is specifically designed for the case of 'reading through' a short fragment into the adapter sequence on the other end. In this approach, the appropriate adapter sequences are 'in silico ligated' onto the start of the reads, and the combined adapter+read sequences, forward and reverse are aligned. If they align in a manner which indicates 'read-through', the forward read is clipped and the reverse read dropped (since it contains no new data).

              Naming of the sequences indicates how they should be used. For 'Palindrome' clipping, the sequence names should both start with 'Prefix', and end in '/1' for the forward adapter and '/2' for the reverse adapter. All other sequences are checked using 'simple' mode. Sequences with names ending in '/1' or '/2' will be checked only against the forward or reverse read. Sequences not ending in '/1' or '/2' will be checked against both the forward and reverse read. If you want to check for the reverse-complement of a specific sequence, you need to specifically include the reverse-complemented form of the sequence as well, with another name.
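
To make those naming rules concrete, here is a small Python sketch that writes a custom adapter FASTA and reports how each entry would be treated according to the text quoted above. It reflects only my reading of that text, not Trimmomatic's actual parser. The helper names, the file name, and the sequences in the usage note are placeholders, not real adapters.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def adapter_mode(name):
    """Classify a FASTA entry name following the quoted naming rules."""
    if name.startswith("Prefix") and name.endswith(("/1", "/2")):
        return "palindrome"
    if name.endswith("/1"):
        return "simple, forward read only"
    if name.endswith("/2"):
        return "simple, reverse read only"
    return "simple, both reads"


def write_adapter_fasta(path, adapters):
    """Write (name, sequence) pairs as FASTA and return each entry's mode."""
    modes = {}
    with open(path, "w") as handle:
        for name, sequence in adapters:
            handle.write(f">{name}\n{sequence}\n")
            modes[name] = adapter_mode(name)
    return modes
```

For example, `write_adapter_fasta("custom.fa", [("MyPrimer", "ACGTTGCA"), ("MyPrimer_rc", reverse_complement("ACGTTGCA"))])` adds a primer and its reverse complement under two different names, so both forms are checked against both reads in 'simple' mode.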
              Last edited by ctseto; 11-14-2013, 11:46 AM.
