**610617109** · 11-07-2015, 11:19 PM

Sorry, I found that I should trim 6 nucleotides from the 5' end of each read.
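A minimal sketch of that trimming step, assuming plain 4-line FASTQ records and single-end reads. The function name `trim_5prime` and the example records are hypothetical and only illustrate the idea; dedicated trimming tools do the same job on real files.

```python
import io


def trim_5prime(fastq_in, fastq_out, n=6):
    """Drop the first n bases (and their qualities) from every FASTQ record.

    Assumes strict 4-line records. Reads that would be empty after
    trimming are skipped. Returns the number of records written.
    """
    kept = 0
    while True:
        header = fastq_in.readline()
        if not header:
            break  # end of input
        seq = fastq_in.readline().rstrip("\n")
        plus = fastq_in.readline()
        qual = fastq_in.readline().rstrip("\n")
        if len(seq) <= n:
            continue  # nothing would remain after trimming
        fastq_out.write(header)
        fastq_out.write(seq[n:] + "\n")
        fastq_out.write(plus)
        fastq_out.write(qual[n:] + "\n")
        kept += 1
    return kept


# Hypothetical in-memory example; real data would come from open("reads.fastq").
example = io.StringIO("@r1\nACGTACGGGTTT\n+\nIIIIIIIIIIII\n")
trimmed = io.StringIO()
trim_5prime(example, trimmed, n=6)
print(trimmed.getvalue())
```

The sequence and quality lines are trimmed together so that they stay the same length, which aligners require.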

**610617109** · 11-08-2015, 12:24 AM

Update:
After adopting the parameters given on the dataset's webpage, I still get only about 10% of reads mapped to the genome. Is this normal for CLIP data?

**GenoMax** · 11-08-2015, 05:38 AM

If this is a published data set, have you tried to follow the method the authors describe in their publication?

**610617109** · 11-08-2015, 05:43 AM

Yes, I used the parameters they describe. They just discard the 6-nucleotide barcode at the 5' end.

**GenoMax** · 11-08-2015, 05:53 AM

This is a perpetual bioinformatics data reproducibility issue (assuming the directions/settings are clear and you are exactly following them).

You are probably using the latest tophat/bowtie etc., which may not match what the authors used at the time of publication. You could go down the path of exactly matching the versions, but I am not sure that would be worth the trouble.

Looks like you are going to have to redo the analysis.

**610617109** · 11-08-2015, 06:09 AM

Ok, I'll try. Thanks.

**blancha** · 11-08-2015, 06:29 AM

You could try running fastqc, to check for the presence of any remaining adapter sequences or very low quality bases that should be trimmed before aligning.
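As a rough complement to FastQC (not a replacement for it), the sketch below estimates what fraction of reads still contain an adapter sequence. It assumes 4-line FASTQ records; the default adapter is the start of the common Illumina adapter sequence, and whether this particular library used it is an assumption. The function name `adapter_fraction` is hypothetical.

```python
def adapter_fraction(fastq_lines, adapter="AGATCGGAAGAGC"):
    """Return the fraction of reads whose sequence contains `adapter`.

    `fastq_lines` is any iterable of lines from a 4-line-per-record FASTQ
    file. The default adapter is an assumption, not taken from the data set.
    """
    total = 0
    hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record is the sequence
            total += 1
            if adapter in line:
                hits += 1
    return hits / total if total else 0.0


# Hypothetical records: the first read contains the adapter, the second does not.
records = [
    "@r1", "ACGTAGATCGGAAGAGCAAA", "+", "I" * 20,
    "@r2", "ACGTACGT", "+", "I" * 8,
]
print(adapter_fraction(records))
```

A high fraction would suggest that adapter trimming is needed before alignment, since CLIP inserts are often shorter than the read length.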

**610617109** · 11-08-2015, 11:28 PM

Update:
After trimming the first 6 nucleotides, I tried tophat and novoalign, both of which can map junction reads. But their results are quite different. For one replicate, tophat reports only about 2 million mapped reads while novoalign reports about 15 million. So which should I believe? I used default parameters for both of them.
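One way to make the two numbers comparable is to count mapped reads the same way for both aligners. The sketch below counts distinct read names that have at least one mapped alignment in SAM text, so that reads reported at several locations are counted once. It assumes single-end reads and SAM (text) input; the function name `count_mapped_reads` is hypothetical.

```python
def count_mapped_reads(sam_lines):
    """Count distinct read names with at least one mapped alignment.

    `sam_lines` is any iterable of SAM text lines. Header lines start
    with '@'. FLAG bit 0x4 marks an unmapped read in the SAM format.
    Assumes single-end data, so one read name corresponds to one read.
    """
    mapped = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # unmapped
        mapped.add(fields[0])
    return len(mapped)


# Hypothetical records: r1 is reported twice, r2 is unmapped, r3 maps once.
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r1\t256\tchr2\t500\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t16\tchr1\t900\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
print(count_mapped_reads(example))
```

TopHat writes its alignments as BAM, so its output would first need converting to SAM text (for example with `samtools view`) before being passed to a function like this.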

**GenoMax** · 11-09-2015, 05:29 AM

You should be using the parameters described in the original paper, otherwise there is no chance of replicating the result.

Since you are going to do an independent analysis with your samples, you should set up a pipeline that works for you. Remember to describe it adequately (version numbers, settings) when you publish.

As an outside chance it is always possible that the original publication has an error in the analysis. You could correspond with the authors (making it clear that you are only trying to adapt their pipeline for your use) and see if they can provide some additional clarification on what is going on.
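For the point about recording version numbers, a small sketch along these lines could log the versions next to the results. It assumes each tool is on the `PATH` and accepts a `--version` flag, which should be checked for the tools actually used; the function name `tool_version` is hypothetical.

```python
import subprocess
import sys


def tool_version(executable):
    """Return the first line printed by `<executable> --version`.

    Returns None if the executable cannot be started. Some tools print
    their version to stderr, so both streams are checked.
    """
    try:
        proc = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
        )
    except OSError:
        return None  # tool not installed or not executable
    lines = (proc.stdout or proc.stderr).strip().splitlines()
    return lines[0] if lines else ""


# The tool names here are examples; list whatever the pipeline really calls.
for tool in ("tophat", "bowtie", sys.executable):
    print(tool, "->", tool_version(tool))
```

Saving this output together with the exact command lines gives the description of versions and settings that a methods section needs.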

**610617109** · 11-09-2015, 05:31 AM

Thanks for your suggestions.
I'll re-read the paper and do exactly what they did.

**GenoMax** · 11-09-2015, 05:35 AM

Sounds like you have spent enough time working on this data so no harm in checking with the authors. Most will be more than happy to help as long as you ask nicely.

**610617109** · 11-09-2015, 05:38 AM

Yes, I thought about it... but I'm afraid the question is too naive.
I'll e-mail the authors if I fail to map most of the reads again.
Thank you. You're very kind.

**SylvainL** · 11-09-2015, 08:09 AM

Hi,

Are you sure you have to discard only the first 6 nucleotides? Usually for CLIP, people add more nucleotides: 4 N (which allow colony recognition if the library was sequenced with Illumina tech), and then the barcode...

Quite easy to check: just take the first 10 nucleotides of all the reads and count the different sequences you get...

edit: I just checked, it was sequenced with Illumina tech...
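The check described above can be sketched as follows, assuming 4-line FASTQ records. The function name `count_prefixes` and the example reads are hypothetical.

```python
from collections import Counter


def count_prefixes(fastq_lines, k=10):
    """Count how often each k-nucleotide read prefix occurs.

    `fastq_lines` is any iterable of lines from a 4-line-per-record
    FASTQ file; only the sequence line of each record is used.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # sequence line
            counts[line.strip()[:k]] += 1
    return counts


# Hypothetical reads: 4 variable bases followed by the same 6-base stretch.
records = [
    "@r1", "ACGTGGTTCCAAAA", "+", "I" * 14,
    "@r2", "TTGAGGTTCCCCCC", "+", "I" * 14,
    "@r3", "ACGTGGTTCCTTTT", "+", "I" * 14,
]
for prefix, n in count_prefixes(records).most_common(5):
    print(prefix, n)
```

If the first 4 positions vary freely while positions 5 to 10 show only a handful of sequences, that would support the layout of 4 random bases followed by a barcode, and more than 6 nucleotides would need to be trimmed.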

Problem with alignment: I can only align 10% of reads (CLIP data, tophat/bowtie)