SEQanswers

Old 05-25-2016, 02:17 AM   #121
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 586

In that case you might add in the other scenarios and see whether it makes a big difference.

When you look at non-CG methylation levels in general (such as from the summary report), do you see very high levels that are indicative of conversion problems?

Frankly, we got mixed results from using this method of looking at the filled-in position. Sometimes the values were very low (e.g. around 0.2% for the Booth et al. data), but sometimes it came back with 25% methylation at that position, which was clearly some sort of artefact since the overall level of non-CG methylation was 1% or so. So yeah, I would be a little careful with the values you get from looking at this position. If you take more global values as a measure, such as (non-CG?) methylation levels over CpG islands, or possibly methylation of chrMT, you might get better estimates for non-conversion. Cheers, Felix
Old 06-24-2016, 10:05 AM   #122
xuguorong
Member
 
Location: US

Join Date: Feb 2010
Posts: 27

I have recently been using your tool Trim Galore to trim the adapter sequence from our miRNA sequencing data. It is an amazing tool and very fast! Thanks a lot for your great work! When I looked into the resulting file, I found two issues that I could not figure out.

Question 1:
1) The raw read is:
NCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGTGGCCATC

2) And the adapter string:
TGGAATTCTCGGGTGCCAAGG

3) I ran the command:
trim_galore --path_to_cutadapt /path/to/cutadapt --clip_R1 1 --length 5 -q 10 -a TGGAATTCTCGGGTGCCAAGG $inputFile".fastq" $inputFile".trim.fastq"

4) Then, I got the resulting string:
CCCGTGG

I think the trimming algorithm kept only the short sequence on the left and discarded the long sequence on the right. I am not sure whether Trim Galore can keep the longer sequence if I change the parameters.

Question 2:
1) The raw reads are:
read1: NCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGTGGCCATC
read2: TCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGTGGCCATC

If I use the option "--clip_R1 1", the first nucleotide "N" in read1 will be trimmed. But the first nucleotide "T" in read2 will also be trimmed. Is there an option that trims only "N" from the reads?

Your response would be really appreciated!
Old 06-24-2016, 10:35 AM   #123
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 586

Hi there, regarding your question 1: Trim Galore runs Cutadapt with the option -a, which does the following:
Code:
-a ADAPTER, --adapter=ADAPTER
                        Sequence of an adapter that was ligated to the 3' end.
                        The adapter itself and anything that follows is
                        trimmed. If the adapter sequence ends with the '$'
                        character, the adapter is anchored to the end of the
                        read and only found if it is a suffix of the read.
This means that, indeed, once the adapter is found anywhere within the read, everything from that point onwards towards the 3' end is removed. This is normally what you want to be doing. I guess if you wanted to keep the sequences further 3' to investigate them, you would need to write something custom.
As a side note, you should be able to leave out the -a SEQUENCE here completely, because Trim Galore should auto-detect your small RNA adapter sequence (but since they are the same, it won't hurt, I guess).
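
To make the -a behaviour described above concrete, here is a minimal Python sketch applied to the read from question 1. It is not how Cutadapt is implemented: it only looks for an exact, full-length adapter match, whereas Cutadapt also tolerates mismatches and partial adapter matches at the read end. The function name `trim_3prime_adapter` and its `clip_5prime` parameter are made up for this illustration, and clipping before adapter removal is an assumption that does not change the result here.

```python
def trim_3prime_adapter(read: str, adapter: str, clip_5prime: int = 0) -> str:
    """Keep only the part of the read upstream of the first exact adapter match."""
    # Hard-clip bases from the 5' end (the role --clip_R1 plays in the thread).
    read = read[clip_5prime:]
    pos = read.find(adapter)
    if pos == -1:
        # No adapter found: the read is returned unchanged.
        return read
    # The adapter and everything 3' of it are discarded.
    return read[:pos]


read = "NCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGTGGCCATC"
adapter = "TGGAATTCTCGGGTGCCAAGG"

insert = trim_3prime_adapter(read, adapter, clip_5prime=1)
print(insert, len(insert))  # CCCGTGG 7
```

The 21 nt downstream of the adapter never come back, because under this model they are not part of the sequenced fragment.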

Regarding your second question: the current development version has an option to remove reads with too many Ns, but I am afraid it doesn't currently have an option to trim Ns from the ends of reads. If this would really help you, I could add it. Often a single N is not going to make a difference in terms of mapping, and in this case trimming it might also change the length of the small RNA species. So yeah, if you think it is absolutely required, I could add it. Best, Felix
Old 06-24-2016, 10:57 AM   #124
xuguorong
Member
 
Location: US

Join Date: Feb 2010
Posts: 27

Hi Felix,

Thank you so much for your response!

For question 1:
After trimming, the left sequence is only 7 nt long, but the right sequence is 21 nt long. Obviously I want to keep the 21 nt sequence and ignore the 7 nt sequence because it is too short. I am not sure whether I can run Cutadapt directly with the -g option to keep the 21 nt sequence instead of the 7 nt sequence.

For question 2:
Sure, a single N does not make a difference for mapping. But for miRNA-seq alignment, it is better to remove the unknown nucleotides before alignment because of the sensitivity.
Old 06-24-2016, 12:07 PM   #125
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 586

To 1) The way the sequencing normally works is that you sequence the first base after the 5' adapter, then the fragment of interest, and then you sequence into the adapter on the 3' end. You don't just get to keep the sequence that appears longer and juicier; you need to keep the sequence of the fragment you wanted to sequence, here the 7 bp. Maybe this read is just not a very representative example of your entire run, because 7 bp is also not a typical length for a miRNA. I would suggest you run Trim Galore on the file once and then look at the sequence length distribution to see whether the majority of the sequences are between 20 and 24 bp long.
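
One way to look at the length distribution suggested above is a small Python sketch like the following. It assumes well-formed, uncompressed 4-line FASTQ records; the function names and the toy records are made up for this illustration, and the 20 to 24 range is simply the one mentioned in the post.

```python
from collections import Counter
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def length_distribution(lines: Iterable[str]) -> Counter:
    """Count how many reads there are of each length."""
    return Counter(len(seq) for seq in fastq_sequences(lines))


def fraction_in_range(dist: Counter, low: int = 20, high: int = 24) -> float:
    """Fraction of reads whose length lies in [low, high]."""
    total = sum(dist.values())
    if total == 0:
        return 0.0
    in_range = sum(n for length, n in dist.items() if low <= length <= high)
    return in_range / total


# Two toy trimmed records: one 22 nt read and the 7 nt read from the thread.
example = [
    "@r1", "ACGTACGTACGTACGTACGTAC", "+", "I" * 22,
    "@r2", "CCCGTGG", "+", "I" * 7,
]
dist = length_distribution(example)
print(sorted(dist.items()), fraction_in_range(dist))
```

For a real file, an open file handle can be passed in place of the `example` list.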

To 2) I can add it to my list, but I am not quite sure when I can address it (we've got a Brexit to stomach right now...)

Cheers, Felix
Old 06-27-2016, 04:11 AM   #126
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 586

Hi Guorong,

I have now added the option --trim-n, which should do just what you need. The new version also adds a few other features:

- Added option '--max_n COUNT' to remove all reads (or read pairs) exceeding this limit of tolerated Ns. In a paired-end setting it is sufficient if one read exceeds this limit. Reads (or read pairs) are removed altogether and are not further trimmed or written to the unpaired output.

- Enabled option '--trim-n' to remove Ns from both ends of the reads. This currently does not work in RRBS mode.

- Added new option '--max_length <INT>', which discards reads that are longer than <INT> bp after trimming. This is only advised for smallRNA sequencing to remove non-small RNA sequences.

- Replaced 'zcat' with 'gunzip -c' so that older versions of Mac OSX do not append a .Z to the end of the file and subsequently fail because the file is not present. Dah...

- Fixed a typo in adapter auto-detection warning message.

I have moved Trim Galore to Github where you can clone the latest development version: https://github.com/FelixKrueger/TrimGalore.
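
As a rough illustration of what the options listed above do to a single read, here is a Python sketch. It is not Trim Galore's or Cutadapt's code: the function names are hypothetical, and details such as whether Ns are counted before or after trimming are assumptions.

```python
from typing import Optional, Tuple


def trim_n_ends(seq: str, qual: str) -> Tuple[str, str]:
    """Remove runs of N from both ends of a read, keeping qualities in sync."""
    start = len(seq) - len(seq.lstrip("N"))  # number of leading Ns
    end = len(seq.rstrip("N"))               # index just past the last non-N base
    return seq[start:end], qual[start:end]


def passes_filters(seq: str,
                   max_n: Optional[int] = None,
                   max_length: Optional[int] = None) -> bool:
    """Mimic the idea of --max_n and --max_length for one read."""
    if max_n is not None and seq.count("N") > max_n:
        return False  # more Ns than tolerated
    if max_length is not None and len(seq) > max_length:
        return False  # longer than allowed after trimming
    return True


# read1 and read2 from question 2, already adapter-trimmed:
print(trim_n_ends("NCCCGTGG", "#IIIIIII"))  # the leading N is removed
print(trim_n_ends("TCCCGTGG", "IIIIIIII"))  # the leading T is kept
```

Unlike '--clip_R1 1', this touches only reads that actually start or end with N, which is the behaviour asked for in question 2.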
Old 06-27-2016, 09:04 AM   #127
xuguorong
Member
 
Location: US

Join Date: Feb 2010
Posts: 27

Hi Felix,

Thank you so much for your new release!
The new features definitely can remove all Ns from the reads! Awesome!

For question 1, I want to try running Cutadapt three times to keep the longer reads.
1: cutadapt -a adapter -q 10 -m 17 --trim-n -o $inputFile".trim.3.fastq" $inputFile".fastq"
2: cutadapt -g adapter -q 10 -m 17 --trim-n -o $inputFile".trim.5.fastq" $inputFile".fastq"
3: cat $inputFile".trim.3.fastq" $inputFile".trim.5.fastq" > $inputFile".trim.fastq"
4: cutadapt -b adapter -q 10 -m 17 --trim-n -o $inputFile".trim.final.fastq" $inputFile".trim.fastq"
5: Then keep only one read and delete the other read with the same FASTQ ID.

The reason I need three runs is that the first Cutadapt run trims the 3' adapter sequence and the second run trims the 5' adapter sequence. After these two runs, some reads in $inputFile".trim.3.fastq" may still contain the 5' adapter sequence, and some reads in $inputFile".trim.5.fastq" may still contain the 3' adapter sequence. After merging the two resulting files, I run Cutadapt a third time to cut either the 3' or the 5' adapter sequence. Since the merged FASTQ file will contain duplicated reads, I then scan $inputFile".trim.final.fastq" to keep only one read and delete the other one with the same FASTQ ID.

Do you have any suggestions about this solution?

Thanks!
Guorong
Old 06-28-2016, 01:37 AM   #128
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 586

Hi Guorong,

Great that it is working. As for your other problem, my view is, as I have already outlined above, that you should absolutely not be doing what you are suggesting here. The sequence you are after is the sequence from the start of the read until you hit the small RNA adapter, which starts with TGGAATTCT... Everything after that is either the adapter that binds to the flowcell or something else you don't want to keep. In any case, the sequence on the 3' end should not align to a genome anyway.

Code:
  -g ADAPTER, --front=ADAPTER
                        Sequence of an adapter that was ligated to the 5' end.
Illumina sequencing does not add any adapter to the 5' end that ends up being sequenced, hence trimming using the option -a is what you want to do. In my opinion if you just run
Code:
 trim_galore --trim-n file
you would get exactly what you are looking for.
Old 08-04-2016, 02:56 PM   #129
Diadema
Junior Member
 
Location: Boston

Join Date: May 2013
Posts: 2
Run Trim Galore! before or after merging technical replicates

I'm quite new to NGS. We just did 4 lanes (2 lanes twice) of Illumina HiSeq Rapid Run 2x51 RNA sequencing of 24 samples. The bcl to fastq conversion was run for us, so every sample has 4 R1 forward fastq files and 4 R2 reverse files. I merged the technical replicates (the 4 R1 files, then the 4 R2 files) with a basic command-line cat and append. I also ran FastQC on the individual technical replicates, as well as on the merged files.

I now plan to upload my files to the Galaxy pipeline for the remainder of the QA/QC and analysis, and was going to start with Trim Galore. But now I'm wondering whether Trim Galore needs to work on the original unmerged technical replicates rather than on the merged files. E.g., the quality at the beginning of all our reads was spiky, possibly indicating sequencing of the same sequence, and those bases may need to be trimmed; can trimming the first n bases of each of the 4 files still be done after the files have been merged? So should I upload the unmerged fastq files, run Trim Galore, and then merge them, or upload the merged files and run Trim Galore? Thank you.
Old 08-05-2016, 12:55 AM   #130
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 586

As long as you merged the R1 and R2 files in the same order (e.g. R1_rep1 R1_rep2, R2_rep1 R2_rep2), it shouldn't matter whether you run Trim Galore on the merged files directly or run it first and then merge. All the best!
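
If in doubt, the "same order" condition can be checked directly on the merged files. The Python sketch below compares read IDs record by record. It assumes well-formed 4-line FASTQ records whose header starts with the read ID, followed by whitespace or a /1 or /2 suffix; the function names and toy records are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read ID from the header line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            name = line.split()[0]
            if name.startswith("@"):
                name = name[1:]
            # Drop an old-style /1 or /2 mate suffix if present.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(r1_lines: Iterable[str],
                   r2_lines: Iterable[str]) -> Optional[int]:
    """Return the index of the first record whose IDs differ, or None if all pair up."""
    pairs = zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for index, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return index
    return None


# Toy records standing in for two technical replicates (made-up read names).
r1_rep1 = ["@read1 1:N:0", "ACGT", "+", "IIII"]
r1_rep2 = ["@read2 1:N:0", "GGCA", "+", "IIII"]
r2_rep1 = ["@read1 2:N:0", "TTGA", "+", "IIII"]
r2_rep2 = ["@read2 2:N:0", "CCAT", "+", "IIII"]

same_order = first_mismatch(r1_rep1 + r1_rep2, r2_rep1 + r2_rep2)
swapped = first_mismatch(r1_rep1 + r1_rep2, r2_rep2 + r2_rep1)
print(same_order, swapped)  # None 0
```

For real data, open file handles (or `gzip.open(path, "rt")` for compressed files) can be passed instead of the lists.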
Old 08-05-2016, 01:44 AM   #131
Diadema
Junior Member
 
Location: Boston

Join Date: May 2013
Posts: 2

That is indeed how I merged them. Thank you!
Old 04-17-2017, 01:17 PM   #132
pig_raffles
Member
 
Location: Sheffield, UK

Join Date: Feb 2012
Posts: 12
Choosing minimum RRBS read length in Trim Galore!

I am new to the bioinformatic analysis of RRBS data. I am using Trim Galore! to QC and adapter trim my RRBS read data. I have generated single-end 75bp reads on an Illumina NextSeq.

The default minimum read length parameter in Trim Galore! is 20 bp but I was wondering if there were any practical considerations for alignment/mapping of reads to take into account when choosing a minimum read length and if anyone had any tips on optimizing this parameter?
Old 04-18-2017, 01:04 AM   #133
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 586

Very short reads generally don't tend to align uniquely in bisulfite-seq mapping because the three letter alignment allows more ambiguous alignments. In that sense the shortness of reads sorts itself out in a way. Some programs however don't like it (or didn't like it in the past) when the sequence entry is extremely short or even empty, which is why we are introducing a short (but arbitrary) cutoff. I hope this helps.
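
The effect of the three-letter alphabet on uniqueness can be seen in a toy Python sketch. It is only an illustration under strong simplifications: exact matching against a made-up 16 bp reference, and only the C-to-T conversion of one strand, whereas real bisulfite mappers also handle the G-to-A converted strand and allow mismatches.

```python
def c_to_t(seq: str) -> str:
    """In-silico bisulfite conversion of one strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def count_hits(reference: str, read: str, bisulfite: bool = True) -> int:
    """Count exact (possibly overlapping) occurrences of read in reference."""
    if bisulfite:
        reference, read = c_to_t(reference), c_to_t(read)
    hits = 0
    start = reference.find(read)
    while start != -1:
        hits += 1
        start = reference.find(read, start + 1)
    return hits


# Made-up reference with two similar regions: TACGTAG ... CATGTAA
reference = "TACGTAGGGCATGTAA"

print(count_hits(reference, "ACGTA", bisulfite=False))  # 1: unique with four letters
print(count_hits(reference, "ACGTA"))                   # 2: ambiguous after C->T
print(count_hits(reference, "TACGTAG"))                 # 1: a longer read is unique again
```

In this toy case the 5 nt read loses its unique placement once C and T become indistinguishable, while the 7 nt read keeps it.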
All times are GMT -8.