**fkrueger** · 05-25-2016, 02:17 AM

In that case you might add in the other scenarios and see whether it makes a big difference.

When you look at non-CG methylation levels in general (such as from the summary report), do you see very high levels that are indicative of conversion problems?

Frankly, we got mixed results from looking at the filled-in position. Sometimes the values were very low (e.g. around 0.2% for the Booth et al. data), but sometimes it came back with 25% methylation at that position, which was clearly some sort of artefact since the overall level of non-CG methylation was 1% or so. So yes, I would be a little careful with the values you get from looking at this position. If you take more global values as a measure, such as (non-CG?) methylation levels over CpG islands, or possibly methylation of chrMT, you might get better estimates for non-conversion. Cheers, Felix

**xuguorong** · 06-24-2016, 10:05 AM

I have recently been using your tool Trim Galore to trim the adapter sequence from our miRNA sequencing data. It is an amazing tool and very fast! Thanks a lot for your great work! When I looked into the resulting file, I found two issues that I could not figure out.

Question 1:
1) The raw sequence is:
NCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGTGGCCATC

2) And the adapter string:
TGGAATTCTCGGGTGCCAAGG

3) After I run the command:
trim_galore --path_to_cutadapt /path/to/cutadapt --clip_R1 1 --length 5 -q 10 -a TGGAATTCTCGGGTGCCAAGG $inputFile".fastq" $inputFile".trim.fastq"

4) Then, I got the resulting string:
CCCGTGG

I think the trimming algorithm only kept the left short sequence and ignored the right long sequence. I am not sure if Trim Galore can keep the longer sequence by changing the parameters.

Question 2:
1) The raw sequences are:
read1: NCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGTGGCCATC
read2: TCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGTGGCCATC

If I use the option "--clip_R1 1", the first nucleotide "N" in read1 will be trimmed, but the first nucleotide "T" in read2 will also be trimmed. Is there an option that trims only "N" from reads?

Your response would be really appreciated!

**fkrueger** · 06-24-2016, 10:35 AM

Hi there, regarding your question 1: Trim Galore runs Cutadapt with the option -a, which does the following:

Code:

-a ADAPTER, --adapter=ADAPTER
                        Sequence of an adapter that was ligated to the 3' end.
                        The adapter itself and anything that follows is
                        trimmed. If the adapter sequence ends with the '$'
                        character, the adapter is anchored to the end of the
                        read and only found if it is a suffix of the read.

This means that once the adapter is found anywhere within the read, everything from that point on towards the 3' end is removed. This is normally what you want. I guess if you wanted to keep the sequence further 3' to investigate it, you would need to write something custom.
As a side note, you should be able to leave out the -a SEQUENCE here completely, because Trim Galore should auto-detect your small RNA adapter sequence (but since they are the same it won't hurt, I guess).
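The 3' adapter behaviour described above can be sketched in a few lines of Python using the read and adapter from the question. This is only an illustration under a simplifying assumption: it looks for an exact, full-length adapter match, whereas Cutadapt itself also tolerates mismatches and partial adapter matches at the end of the read. The function name is made up for this example.

```python
def trim_3prime_adapter(read: str, adapter: str) -> str:
    """Remove the first exact adapter occurrence and everything 3' of it."""
    pos = read.find(adapter)
    # No adapter found: the read is returned unchanged.
    return read if pos == -1 else read[:pos]


read = "NCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGTGGCCATC"
adapter = "TGGAATTCTCGGGTGCCAAGG"

trimmed = trim_3prime_adapter(read, adapter)
clipped = trimmed[1:]  # mimics --clip_R1 1 (drop the first base)
print(clipped)  # CCCGTGG
```

The adapter starts at the ninth base of the read, so only the 8 bases in front of it survive, and clipping the leading N leaves the 7 nt reported in the question. The 21 nt after the adapter are discarded by design.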

Regarding your second question: the current development version has an option to remove reads with too many Ns, but I am afraid it doesn't currently have an option to trim Ns from the ends of reads. If this would really help you, I could add it in. Often a single N is not going to make a difference in terms of mapping, and in this case trimming it might also change the length of the small RNA species. So yes, if you think it is absolutely required I could add it. Best, Felix

**xuguorong** · 06-24-2016, 10:57 AM

Hi Felix,

Thank you so much for your response!

For question 1:
After trimming, the left sequence is only 7 nt long but the right sequence is 21 nt long. Obviously I want to keep the 21 nt sequence and ignore the 7 nt sequence because it is too short. I am not sure whether I can run Cutadapt directly with the -g option to keep the 21 nt sequence instead of the 7 nt one.

For question 2:
Sure, a single N does not make a difference for mapping. But for miRNA-seq alignment, it is better to remove the unknown nucleotides before alignment because of the sensitivity.

**fkrueger** · 06-24-2016, 12:07 PM

To 1) The way the sequencing normally works is that you sequence the first base after the 5' adapter, then the fragment of interest, and then you sequence into the adapter on the 3' end. You don't just get to keep the sequence that appears longer and juicier; you need to keep the sequence of the fragment you wanted to sequence, here the 7 bp. Maybe this sequence is just not a very representative example of your entire run, because 7 bp is also not a typical length for a miRNA. I would suggest you run Trim Galore on the file once and then look at the sequence length distribution to see whether the majority of the sequences are between 20 and 24 bp long.
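One way to look at the length distribution suggested here is a short Python script run over the trimmed FASTQ file. The sketch below assumes plain four-line FASTQ records; the two records in it are made up purely to show the output format, and with real data you would pass an open file handle instead.

```python
from collections import Counter
from io import StringIO


def length_distribution(fastq_handle):
    """Count read lengths in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 == 1:  # the sequence is the second line of each record
            counts[len(line.strip())] += 1
    return counts


# Illustrative records only; replace with open("trimmed.fq") for real data.
seq2 = "TGAGGTAGTAGGTTGTATAGTT"
example = StringIO(
    "@r1\nCCCGTGG\n+\nIIIIIII\n"
    f"@r2\n{seq2}\n+\n{'I' * len(seq2)}\n"
)
for length, n in sorted(length_distribution(example).items()):
    print(length, n)
```

If most reads fall between 20 and 24 bp, the 7 bp read from the question is an outlier rather than a sign that trimming is wrong.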

To 2) I can add it to my list, but I am not quite sure when I can address it (we've got a Brexit to stomach right now...)

Cheers, Felix

**fkrueger** · 06-27-2016, 04:11 AM

Hi Guorong,

I have added the option --trim-n now that should do just what you need. It also adds a few other features:

- Added option '--max_n COUNT' to remove all reads (or read pairs) exceeding this limit of tolerated Ns. In a paired-end setting it is sufficient if one read exceeds this limit. Reads (or read pairs) are removed altogether and are not further trimmed or written to the unpaired output.

- Enabled option '--trim-n' to remove Ns from both ends of the reads. Does not currently work in RRBS mode.

- Added new option '--max_length <INT>' which discards reads that are longer than <INT> bp after trimming. This is only advised for smallRNA sequencing to remove non-small RNA sequences.

- Replaced 'zcat' with 'gunzip -c' so that older versions of Mac OSX do not append a .Z to the end of the file and subsequently fail because the file is not present. Dah...

- Fixed a typo in adapter auto-detection warning message.

I have moved Trim Galore to Github where you can clone the latest development version: https://github.com/FelixKrueger/TrimGalore.
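To make the N-related options listed above concrete, here is a minimal Python sketch of what they do to a sequence. It is an illustration, not Trim Galore's implementation: the function names are invented, the order in which the real tool applies these steps is not shown, and a real trimmer also has to trim the quality string by the same amount.

```python
def trim_n(seq: str) -> str:
    """Strip N bases from both ends of a read (the idea behind --trim-n)."""
    return seq.strip("N")


def passes_filters(seq: str, max_n: int, max_length: int) -> bool:
    """Keep a read only if it has at most max_n Ns and is at most max_length long."""
    return seq.count("N") <= max_n and len(seq) <= max_length


reads = [
    "NCCCGTGGTGGAATTCTCGG",
    "TCCCGTGGTGGAATTCTCGG",
    "ACNNGTNNAC",
]
for read in reads:
    trimmed = trim_n(read)
    print(trimmed, passes_filters(trimmed, max_n=2, max_length=30))
```

Unlike '--clip_R1 1', this leaves the leading T of the second read alone, and the third read is rejected because its four internal Ns exceed the limit of two.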

**xuguorong** · 06-27-2016, 09:04 AM

Hi Felix,

Thank you so much for your new release!
The new features definitely can remove all Ns from the reads! Awesome!

For question 1, I want to try running cutadapt three times to keep the longer reads.
1: cutadapt -a adapter -q 10 -m 17 --trim-n -o $inputFile".trim.3.fastq" $inputFile".fastq"
2: cutadapt -g adapter -q 10 -m 17 --trim-n -o $inputFile".trim.5.fastq" $inputFile".fastq"
3: cat $inputFile".trim.3.fastq" $inputFile".trim.5.fastq" > $inputFile".trim.fastq"
4: cutadapt -b adapter -q 10 -m 17 --trim-n -o $inputFile".trim.final.fastq" $inputFile".trim.fastq"
5: then keep only one read and delete the other read with the same fastq ID.

The reason I need to run it three times is that the first cutadapt run trims the 3' adapter sequence and the second run trims the 5' adapter sequence. After these two runs, some reads in $inputFile".trim.3.fastq" may still contain the 5' adapter and some reads in $inputFile".trim.5.fastq" may still contain the 3' adapter. After merging these two resulting files, I do the third cutadapt run to cut either the 3' or the 5' adapter. Since the merged fastq file will contain some identical reads, I then scan $inputFile".trim.final.fastq" to keep only one read and delete the other one with the same fastq ID.

Do you have any suggestions about this solution?

Thanks!
Guorong

**fkrueger** · 06-28-2016, 01:37 AM

Hi Guorong,

Great that it is working. My thoughts on your other problem are, as I have already outlined above, that you should absolutely not be doing what you are suggesting here. The sequence you are after is the sequence from the start of the read until you hit the small RNA adapter, which starts with TGGAATTCT... Everything after that is either adapter that binds to the flowcell or something else you don't want to keep. In any case, the sequence on the 3' end should not align to a genome anyway.

Code:

  -g ADAPTER, --front=ADAPTER
                        Sequence of an adapter that was ligated to the 5' end.

Illumina sequencing does not add any adapter to the 5' end that ends up being sequenced, hence trimming using the option -a is what you want to do. In my opinion if you just run

Code:

 trim_galore --trim-n file

you would get exactly what you are looking for.

**Diadema** · 08-04-2016, 02:56 PM

Run Trim Galore! before or after merging technical replicates

I'm quite new to NGS. We just did 4 lanes (2 lanes twice) of Illumina HiSeq Rapid Run 2x51 RNA sequencing of 24 samples. The bcl to fastq conversion was run for us, so every sample has 4 R1 forward fastq files and 4 R2 reverse files. I merged the technical replicates (merged the 4 R1 files, then merged the 4 R2 files) doing a basic command line cat and append. I also ran FastQC on the individual technical replicates, as well as on the merged files. I now plan to upload my files to the Galaxy pipeline for the remainder of the QA/QC and analysis, and was going to start with Trim Galore. But now I'm wondering if Trim Galore needs to work on the original unmerged technical replicates rather than the merged files. E.g., the quality at the beginning of all our reads was spiky, possibly indicating sequencing of the same sequence, and may need to be trimmed; but can trimming the first n bases of each of the 4 files still be done after the files have been merged? So do I upload the unmerged fastq files and run Trim Galore, and then merge them, or upload the merged files and run Trim Galore? Thank you.

**fkrueger** · 08-05-2016, 12:55 AM

As long as you merged the R1 and R2 files in the same order (e.g. R1_rep1 R1_rep2, R2_rep1 R2_rep2), it shouldn't matter whether you run Trim Galore on the merged files directly or run it first and merge afterwards. All the best!
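If you want to double-check that merged R1 and R2 files are still in the same order, a quick sanity check is to compare the read IDs record by record. The Python sketch below assumes four-line FASTQ records and headers where the read ID is the part before the first space (optionally ending in /1 or /2); the function names and the records are made up for illustration.

```python
from itertools import zip_longest


def read_ids(fastq_lines):
    """Yield read IDs: header up to the first space, without a /1 or /2 suffix."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of each 4-line FASTQ record
            name = line.strip().split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def same_order(r1_lines, r2_lines) -> bool:
    """True if both files contain the same read IDs in the same order."""
    return all(a == b for a, b in zip_longest(read_ids(r1_lines), read_ids(r2_lines)))


# Made-up records standing in for two technical replicates.
rep1_r1 = ["@read1 1:N:0", "ACGT", "+", "IIII"]
rep1_r2 = ["@read1 2:N:0", "TTGA", "+", "IIII"]
rep2_r1 = ["@read2 1:N:0", "GGCA", "+", "IIII"]
rep2_r2 = ["@read2 2:N:0", "CCAT", "+", "IIII"]

print(same_order(rep1_r1 + rep2_r1, rep1_r2 + rep2_r2))  # True
print(same_order(rep1_r1 + rep2_r1, rep2_r2 + rep1_r2))  # False
```

With real data you would pass two open file handles instead of lists. The comparison streams through both files, so it does not need to hold them in memory.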

**Diadema** · 08-05-2016, 01:44 AM

That is indeed how I merged them. Thank you!

**pig_raffles** · 04-17-2017, 01:17 PM

Choosing minimum RRBS read length in Trim Galore!

I am new to the bioinformatic analysis of RRBS data. I am using Trim Galore! to QC and adapter trim my RRBS read data. I have generated single-end 75bp reads on an Illumina NextSeq.

The default minimum read length parameter in Trim Galore! is 20 bp but I was wondering if there were any practical considerations for alignment/mapping of reads to take into account when choosing a minimum read length and if anyone had any tips on optimizing this parameter?

**fkrueger** · 04-18-2017, 01:04 AM

Very short reads generally don't tend to align uniquely in bisulfite-seq mapping because the three letter alignment allows more ambiguous alignments. In that sense the shortness of reads sorts itself out in a way. Some programs however don't like it (or didn't like it in the past) when the sequence entry is extremely short or even empty, which is why we are introducing a short (but arbitrary) cutoff. I hope this helps.
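The loss of uniqueness in three-letter space can be illustrated with a toy example. Bisulfite aligners such as Bismark convert reads and genome in silico (C to T for one strand, G to A for the other) before aligning, so sequences that differ only at C/T positions become indistinguishable. The sketch below shows only the C-to-T direction; the sequences and helper names are made up.

```python
def c_to_t(seq: str) -> str:
    """In-silico bisulfite conversion of the forward strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def unique_kmers(seq: str, k: int) -> int:
    """Number of k-mers that occur exactly once in seq."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return sum(1 for n in counts.values() if n == 1)


locus_a, locus_b = "ACGTTC", "ATGTTT"
print(locus_a == locus_b)                  # False
print(c_to_t(locus_a) == c_to_t(locus_b))  # True

toy_genome = "ACGTTCGGATGTTTCCA"  # made-up sequence
for k in (4, 6):
    print(k, unique_kmers(toy_genome, k), unique_kmers(c_to_t(toy_genome), k))
```

The number of unique k-mers after conversion can never exceed the number before it, and the shorter the k-mer, the more likely it is to collide. That is why very short reads tend to drop out as ambiguous alignments on their own.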

**badhik** · 01-17-2019, 12:11 AM

Hi all,

I am trimming Illumina 1.9 encoded data with Trim Galore, and after FastQC, the box-plot whiskers in the Per base sequence quality plot go all the way down to a Phred score of 13 or 14.

Here is what I used:
trim_galore --rrbs --paired --length 20 -q 28 --illumina

Why am I getting such a result?

Thanks

**fkrueger** · 01-17-2019, 01:47 AM

The quality trimming is performed with a running-sum approach that works inwards from the 3' end of the read, the same one that is used by BWA. Copied below is the text from the Cutadapt --help:

-q 3'CUTOFF Trim low-quality bases from 3' ends of reads before adapter removal. …The algorithm is the same as the one used by BWA (see documentation).

In some cases this may mean that if the quality briefly drops below the quality threshold but then comes back up again, the trimming algorithm decides that it’s not too bad after all.

I hope this clears things up?
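For anyone who wants to see why low-quality bases can survive, here is a small Python sketch of the BWA-style procedure as described in the Cutadapt documentation: subtract the cutoff from every quality, accumulate the differences from the 3' end, and cut where the running sum is lowest. It is a simplified re-implementation for illustration, not Cutadapt's actual code, and the quality values are made up.

```python
def quality_trim_index(quals, cutoff):
    """Return the position at which to cut the read (keep quals[:index])."""
    running = 0
    lowest = 0
    stop = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - cutoff
        if running > 0:  # good bases outweigh the bad ones seen so far: stop looking
            break
        if running < lowest:  # new minimum of the running sum: best cut so far
            lowest = running
            stop = i
    return stop


# A single dip to Q13 in the middle of an otherwise good read is kept with -q 28.
print(quality_trim_index([36, 36, 36, 13, 36, 36, 36, 36], 28))  # 8

# A tail that stays low is removed.
print(quality_trim_index([36, 36, 36, 13, 14, 12, 10], 28))  # 3

# Example values from the Cutadapt documentation, cutoff 10.
print(quality_trim_index([42, 40, 26, 27, 8, 7, 11, 4, 2, 3], 10))  # 4
```

In the first case the trimmer never even reaches the Q13 base, because the high-quality bases at the 3' end already push the running sum above zero. FastQC will therefore still show whiskers below the cutoff after trimming.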
