  • #16
    Hi, I'm working with some similar data. One thing I've found is that a lot of trimming tools aren't really set up for paired-end data. I have a pipeline for trimming and aligning reads; it goes basically like this:


    # Inputs: two paired-end Illumina FASTQ files. Keep only reads that passed the
    # Illumina chastity filter (":N:" in the header; ":Y:" marks filtered reads) and
    # write them to the FILTERED files. --no-group-separator (GNU grep) stops grep from
    # inserting "--" lines between non-adjacent matches, which would break the FASTQ format.
    grep -A 3 --no-group-separator '^@.* [^:]*:N:[^:]*:' "$INPUT1" > "$FILTERED1"
    grep -A 3 --no-group-separator '^@.* [^:]*:N:[^:]*:' "$INPUT2" > "$FILTERED2"

    # fastq-mcf handles paired-end reads well; it's the best paired-end trimmer I could find.
    # I don't remember all the parameters, but there's a great resource out there describing this tool.
    fastq-mcf -o "$OUTPUT1" -o "$OUTPUT2" -l 16 -q 15 -w 4 -x 10 -u -P 33 "$ADAPTERS" "$FILTERED1" "$FILTERED2"

    # Align the trimmed pairs with bowtie and write a SAM file.
    bowtie -t -p 8 --sam "$REF_GENOME" -1 "$OUTPUT1" -2 "$OUTPUT2" "$ALIGNED_OUTPUT"

    # Make a sorted BAM from the bowtie alignment (useful for all sorts of things) and index it.
    # This is samtools 0.1.x syntax, where the last argument of `sort` is an output prefix;
    # with samtools 1.3 or later use: samtools sort -o "$SORTED_BAM.bam" -
    samtools view -bS "$ALIGNED_OUTPUT" | samtools sort - "$SORTED_BAM"
    samtools index "$SORTED_BAM.bam" "$SORTED_BAM.bam.bai"
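One caveat with a `grep -A 3` filter like this: when the matching records are not adjacent, GNU grep prints a `--` line between the groups, which breaks the 4-line FASTQ layout. Below is a minimal sketch on a made-up three-record FASTQ (read names and sequences are invented; it assumes GNU grep, which has `--no-group-separator`). It runs the same header pattern with and without the flag and reports what comes out.

```shell
#!/usr/bin/env bash
# Toy demonstration of the ":N:" (passed-filter) grep step on a made-up R1 file.
# Assumes GNU grep, which supports --no-group-separator.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Three made-up records; the middle one failed the chastity filter (":Y:").
printf '%s\n' \
  '@M1:1:FC:1:1101:1000:2000 1:N:0:ATCACG' 'ACGT' '+' 'IIII' \
  '@M1:1:FC:1:1101:1001:2001 1:Y:0:ATCACG' 'GGGG' '+' 'IIII' \
  '@M1:1:FC:1:1101:1002:2002 1:N:0:ATCACG' 'TTTT' '+' 'IIII' \
  > "$tmp/r1.fq"

pattern='^@.* [^:]*:N:[^:]*:'

# Plain -A 3: grep separates the two non-adjacent matches with a "--" line,
# so the output is no longer a clean 4-lines-per-record FASTQ.
separators=$(grep -A 3 "$pattern" "$tmp/r1.fq" | grep -c '^--$' || true)

# With --no-group-separator the output is exactly the two passing records.
grep -A 3 --no-group-separator "$pattern" "$tmp/r1.fq" > "$tmp/r1.filtered.fq"
records=$(( $(wc -l < "$tmp/r1.filtered.fq") / 4 ))

echo "separator lines without the flag: $separators"
echo "records kept with the flag: $records"
```

If your grep lacks `--no-group-separator` (for example a non-GNU grep), an awk filter that reads four lines at a time is a reasonable alternative.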



    That's pretty much how I'm processing my data, and it works well. As for those nasty overrepresented sequences: I'm guessing you're doing quality assessment with FastQC, which is a great tool. In my case, I did RNA-seq on bacteria, so my read depth is very high because the genome is small. Add to that some highly expressed genes, and FastQC flags a lot of overrepresented sequences. I'm basically ignoring them in my data, but think about what overrepresented sequences mean for your data and whether they are really a problem.
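To judge whether overrepresented sequences are adapter contamination or just very highly expressed transcripts, it can help to look at the most frequent exact read sequences directly. A rough sketch, assuming an uncompressed FASTQ; it only approximates FastQC's own overrepresented-sequence check (which has its own sampling and truncation rules), and `top_sequences` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Rough look at which exact read sequences dominate a FASTQ file.
# Approximation only; FastQC's "Overrepresented sequences" module works differently.
set -eu

# Print the N most frequent sequences (line 2 of each 4-line record) with their counts.
top_sequences() {  # usage: top_sequences <fastq> [N]
  awk 'NR % 4 == 2' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Made-up toy file: one sequence appears twice, another once.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' \
  '@r1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@r2' 'ACGTACGT' '+' 'IIIIIIII' \
  '@r3' 'TTTTGGGG' '+' 'IIIIIIII' > "$tmp/reads.fq"

top_sequences "$tmp/reads.fq" 5
```

For gzipped files, decompress first or, in bash, pass `<(zcat reads.fq.gz)` as the file argument. If the top hits match your adapters, trim them; if they BLAST or map to a few very highly expressed genes, they are probably real biology.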

    Hope this helps.



    • #17
      Hello everyone,


      I am working with TruSeq paired-end data (150 bp reads). I have a question about the adapter file provided with Trimmomatic for adapter trimming.

      According to the adapter file provided with Trimmomatic, "TruSeq3-PE-2.fa", the reverse complement of the index adapter sequence is used for trimming the R2 reads and the universal adapter is used for trimming the R1 reads:
      >PrefixPE/1
      TACACTCTTTCCCTACACGACGCTCTTCCGATCT
      >PrefixPE/2
      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
      >PE1
      TACACTCTTTCCCTACACGACGCTCTTCCGATCT
      >PE1_rc
      AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
      >PE2
      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
      >PE2_rc
      AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

      However, in my data the index adapter sequence itself appears in the R1 reads, and the reverse complement of the universal adapter appears in the R2 reads.

      The Illumina support team confirmed this as well.
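A quick way to check which adapter shows up in which file is to count reads containing the start of each adapter. This sketch assumes the two adapters can be told apart by the bases right after their shared `AGATCGGAAGAGC` core; the toy reads and the `count_hits` and `record` helpers are made up for illustration:

```shell
#!/usr/bin/env bash
# Count reads in R1 and R2 that contain the start of each TruSeq adapter.
# Toy data only; point count_hits at real (uncompressed) FASTQ files in practice.
set -eu

INDEX_ADAPTER=AGATCGGAAGAGCACACGTCT   # start of the index adapter
UNIVERSAL_RC=AGATCGGAAGAGCGTCGTGT     # start of the reverse complement of the universal adapter

# Number of reads (line 2 of each record) containing the motif as a substring.
count_hits() {  # usage: count_hits <fastq> <motif>
  awk -v m="$2" 'NR % 4 == 2 && index($0, m) { n++ } END { print n + 0 }' "$1"
}

# Write one FASTQ record with a dummy quality string of the same length.
record() { printf '@%s\n%s\n+\n%s\n' "$1" "$2" "$(printf '%s' "$2" | tr 'ACGTN' 'IIIII')"; }

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
{ record r1 CCCCAGATCGGAAGAGCACACGTCTGAAC; record r2 ACGTACGTACGT; } > "$tmp/R1.fq"
{ record r1 GGGGAGATCGGAAGAGCGTCGTGTAGGG; record r2 TGCATGCATGCA; } > "$tmp/R2.fq"

for f in "$tmp/R1.fq" "$tmp/R2.fq"; do
  printf '%s\tindex=%s\tuniversal_rc=%s\n' "$(basename "$f")" \
    "$(count_hits "$f" "$INDEX_ADAPTER")" "$(count_hits "$f" "$UNIVERSAL_RC")"
done
```

On real data, a high index-adapter count in R1 and a high universal-rc count in R2 would match what is described above.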


      So I prepared my own adapter file using the full adapter sequences:
      >PrefixPE/1
      AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
      >PrefixPE/2
      AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
      >PE1
      AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
      >PE1_rc
      CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
      >PE2
      AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
      >PE2_rc
      AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

      Here PrefixPE/1 and PE1 are the index adapter, PrefixPE/2 and PE2 are the reverse complement of the universal adapter, and PE1_rc and PE2_rc are the reverse complements of PE1 and PE2.
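Since the `_rc` entries in a hand-made adapter file are easy to get wrong, it is worth checking that each one really is the reverse complement of its partner. A minimal sketch using only `tr` and `awk` (it assumes the sequences contain only A, C, G, T and N; `revcomp` and `check_pair` are hypothetical helper names):

```shell
#!/usr/bin/env bash
# Check that each *_rc adapter entry is the reverse complement of its partner.
set -eu

# Complement with tr, then reverse the string with awk (no dependency on `rev`).
revcomp() {
  printf '%s\n' "$1" | tr 'ACGTN' 'TGCAN' |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }'
}

check_pair() {  # usage: check_pair <name> <forward> <claimed_rc>
  if [ "$(revcomp "$2")" = "$3" ]; then
    echo "$1: _rc entry matches"
  else
    echo "$1: _rc entry does NOT match"
  fi
}

PE1=AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
PE1_rc=CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
PE2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
PE2_rc=AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

check_pair PE1 "$PE1" "$PE1_rc"
check_pair PE2 "$PE2" "$PE2_rc"
```

With the sequences listed above, both pairs should report a match.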

      Please let me know whether the adapter file I prepared is fine, or whether the Trimmomatic adapter file is better and should always be used.
      I tried both my custom file and the Trimmomatic-recommended file, and FastQC showed that both removed the adapters.

      Please correct me or let me know if I'm missing something!
      I appreciate your help and guidance!
      Thanks,
      Candida
      Last edited by candida; 05-03-2017, 12:32 AM. Reason: Delete Post

