SEQanswers

SEQanswers > Bioinformatics > Bioinformatics



05-30-2012, 03:09 AM #1
dedee
Junior Member
 
Location: London

Join Date: May 2012
Posts: 3
DNase-seq/Bowtie: many non-unique alignments?

Hi all,

I have analysed ChIP-seq data before but have no experience with DNase-seq, so any input is greatly appreciated.

I have DNase-seq data from a Solexa GA (from a collaborator). For reasons beyond me, I received the raw data in BAM format, but never mind.
I trimmed the 50 bp reads to 20 bp, fed them into bowtie (-m1) and got something like 95% of reads with more than one alignment.
Is this normal?
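For anyone repeating this step, here is a minimal Python sketch of the trimming described above. It assumes the BAM has already been converted to a standard four-line FASTQ (for example with `bedtools bamtofastq`) and keeps only the first 20 bases and their qualities. bowtie 1 can also trim at alignment time with `-3`/`--trim3`: trimming 30 bases off the 3' end of a 50 bp read leaves 20. The function names are illustrative and do not come from any library.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_fastq(handle: TextIO, out: TextIO, length: int = 20) -> int:
    """Keep only the first `length` bases (and qualities) of every read.

    Returns the number of records written.
    """
    written = 0
    for header, seq, qual in read_fastq(handle):
        out.write(f"{header}\n{seq[:length]}\n+\n{qual[:length]}\n")
        written += 1
    return written


if __name__ == "__main__":
    # Hypothetical 25 bp read whose last 5 bases have very low quality.
    demo = io.StringIO("@read1\n" + "ACGT" * 5 + "TTTTT\n+\n" + "I" * 20 + "#" * 5 + "\n")
    trimmed = io.StringIO()
    trim_fastq(demo, trimmed, length=20)
    print(trimmed.getvalue(), end="")
```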

Thanks!
dedee
05-30-2012, 07:45 AM #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,170

dedee,

What organism are you working in? What other options did you use for bowtie? Why did you trim the reads to just 20bp?
05-30-2012, 07:54 AM #3
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

Sounds like the read trimming is the problem. As I recall, with 36-40 bp Illumina reads I get many non-specific hits when aligning to the human transcriptome (>80%, I think!).
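The following toy example illustrates the point about read length. It is not colindaven's actual analysis. The fraction of positions in a reference whose k-mer occurs exactly once rises as k grows, because short k-mers that fall inside repeats cannot be placed uniquely. This sketch counts only exact, forward-strand matches. A real aligner also searches the reverse complement and tolerates mismatches, and both make uniqueness worse. The toy reference (random sequence plus two copies of a 30 bp repeat) is made up for illustration.

```python
import random
from collections import Counter


def unique_kmer_fraction(reference: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer occurs exactly once
    in `reference` (exact matches, forward strand only)."""
    if not 0 < k <= len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


def toy_reference(seed: int = 0) -> str:
    """Hypothetical reference: random sequence containing two copies of a 30 bp repeat."""
    rng = random.Random(seed)

    def rand_seq(n: int) -> str:
        return "".join(rng.choice("ACGT") for _ in range(n))

    repeat = rand_seq(30)
    return rand_seq(100) + repeat + rand_seq(100) + repeat + rand_seq(100)


if __name__ == "__main__":
    ref = toy_reference()
    for k in (10, 20, 30, 40):
        print(f"k={k}: {unique_kmer_fraction(ref, k):.3f} of positions unique")
```

K-mers of 30 bp or less that lie entirely inside either repeat copy are never unique. Once k exceeds the repeat length, the flanking sequence disambiguates them.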

Perhaps a less harsh alternative is to align longer sequences (perhaps using bwa) and filter by mapping quality afterwards.
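Here is a minimal sketch of the "align first, filter by mapping quality afterwards" idea, working on SAM text lines. Header lines pass through unchanged, unmapped records (FLAG bit 0x4) are dropped, and alignments below a MAPQ cutoff are discarded. The cutoff of 20 is a hypothetical choice and was not recommended in the thread. On the command line, `samtools view -q 20` applies a similar MAPQ cutoff. In the SAM specification a MAPQ of 255 means "not available", so check what your aligner actually writes in that column before relying on this kind of filter.

```python
from typing import Iterable, Iterator

FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def filter_by_mapq(sam_lines: Iterable[str], min_mapq: int = 20) -> Iterator[str]:
    """Yield SAM header lines unchanged and mapped alignments with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep @HD/@SQ/@PG header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & FLAG_UNMAPPED:
            continue
        if mapq >= min_mapq:
            yield line


if __name__ == "__main__":
    # Hypothetical records: r1 confidently placed, r2 ambiguous (MAPQ 0), r3 unmapped.
    demo = [
        "@SQ\tSN:chr1\tLN:1000\n",
        "r1\t0\tchr1\t100\t37\t20M\t*\t0\t0\t" + "A" * 20 + "\t" + "I" * 20 + "\n",
        "r2\t16\tchr1\t300\t0\t20M\t*\t0\t0\t" + "C" * 20 + "\t" + "I" * 20 + "\n",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\t" + "G" * 20 + "\t" + "I" * 20 + "\n",
    ]
    for line in filter_by_mapq(demo):
        print(line, end="")  # prints the @SQ header and r1 only
```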
05-30-2012, 07:58 AM #4
dedee
Junior Member
 
Location: London

Join Date: May 2012
Posts: 3

Sorry, thanks for answering anyway.

20 bp: the quality scores beyond position 20 look sh**, so even my collaborator said that everything past 20 bp is not informative and suggested trimming the reads, which is what I am trying.
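A claim like "everything past 20 bp is not informative" can be checked directly rather than by eye. One option is to average the Phred score at each read position and find where it first drops below a cutoff. The minimal sketch below assumes Phred+33 quality strings (the encoding used when BAM qualities are written out as SAM/FASTQ text) and a hypothetical cutoff of Q20. The function names are illustrative.

```python
from typing import Iterable, List


def mean_quality_by_position(quals: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (handles reads of unequal length)."""
    sums: List[int] = []
    counts: List[int] = []
    for qual in quals:
        for i, ch in enumerate(qual):
            if i == len(sums):  # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def usable_prefix_length(means: List[float], min_mean_q: float = 20.0) -> int:
    """Number of leading positions before the mean quality first falls below min_mean_q."""
    for i, mean_q in enumerate(means):
        if mean_q < min_mean_q:
            return i
    return len(means)


if __name__ == "__main__":
    # Hypothetical quality strings: 'I' = Q40, '5' = Q20, '#' = Q2 in Phred+33.
    demo = ["IIIII5##", "IIII5###", "III55###"]
    means = mean_quality_by_position(demo)
    print([round(m, 1) for m in means])
    print("trim to", usable_prefix_length(means), "bp")
```

Using a mean across all reads is a deliberately simple choice. Per-read quality trimming is an alternative that keeps the good reads longer.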

Organism: mouse, mm9
I've tried aligning with -v0 (no mismatches allowed), and this gives more sensible results:

bowtie -m1 -v0 --sam mm9 filename.sam
[samopen] SAM header is present: 22 sequences.
# reads processed: 55341291
# reads with at least one reported alignment: 42077875 (76.03%)
# reads that failed to align: 3731393 (6.74%)
# reads with alignments suppressed due to -m: 9532023 (17.22%)
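As a sanity check on the numbers above, the sketch below parses bowtie's end-of-run summary (the "#" lines, copied from the output above). It confirms that the three categories add up to the number of reads processed. With -m1, the "suppressed" category holds the reads that had more than one reportable alignment. So even with -v0, roughly 17% of the 20 bp reads were still multi-mappers. The parser relies only on the "label: count" layout shown here and is not a documented bowtie format.

```python
import re
from typing import Dict

# Summary copied from the bowtie run in the post above.
SUMMARY = """\
# reads processed: 55341291
# reads with at least one reported alignment: 42077875 (76.03%)
# reads that failed to align: 3731393 (6.74%)
# reads with alignments suppressed due to -m: 9532023 (17.22%)
"""

TOTAL_KEY = "reads processed"


def parse_bowtie_summary(text: str) -> Dict[str, int]:
    """Map each '# label: count' line to {label: count}."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = re.match(r"#\s*(.+?):\s*(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


if __name__ == "__main__":
    counts = parse_bowtie_summary(SUMMARY)
    total = counts[TOTAL_KEY]
    categories = {k: v for k, v in counts.items() if k != TOTAL_KEY}
    for label, n in categories.items():
        print(f"{label}: {100 * n / total:.2f}%")
    print("categories sum to total:", sum(categories.values()) == total)
```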

Sorry for the hassle. I asked for help because figuring this out myself can, in the worst case, take forever. I'm posting the result in case anybody else with little experience runs into the same issue.
Thanks!
D
05-30-2012, 08:02 AM #5
dedee
Junior Member
 
Location: London

Join Date: May 2012
Posts: 3

Quote:
Originally Posted by colindaven
Sounds like the read trimming is the problem. As I recall, with 36-40 bp Illumina reads I get many non-specific hits when aligning to the human transcriptome (>80%, I think!).

Perhaps a less harsh alternative is to align longer sequences (perhaps using bwa) and filter by mapping quality afterwards.
Thanks!
I'm trying this approach as well, using the full 50 bp reads. It just takes a while...
In the meantime, allowing 0 mismatches in bowtie with the reads trimmed to 20 bp gave me something that looks about right. I'll report back.
05-30-2012, 08:10 AM #6
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,170

Quote:
Originally Posted by dedee
Sorry, thanks for answering anyway.
...
Sorry for the hassle. I asked for help because figuring this out myself can, in the worst case, take forever. I'm posting the result in case anybody else with little experience runs into the same issue.
Thanks!
D
No hassle; I just wanted a better picture of the experiment before advising.

But I agree with colindaven, and your subsequent results support that. Aligning 20-mers to a complex eukaryotic genome full of repetitive DNA, with mismatches permitted, doesn't provide enough uniqueness.
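A back-of-the-envelope calculation shows why 20-mers with mismatches are not unique. Consider a uniformly random genome of size G. The expected number of chance hits for a k-mer, counting both strands and allowing up to d mismatches, is about 2 * G * N(k, d) / 4^k. Here N(k, d) = sum over i = 0..d of C(k, i) * 3^i, the number of sequences within d mismatches. The genome size of 2.7e9 used below is an approximate figure for mouse, assumed for illustration. The random-genome model ignores repeats, which make the real situation worse. Allowing up to 2 mismatches is roughly what bowtie 1 permits by default (its `-n 2` seed setting).

```python
from math import comb


def neighbourhood_size(k: int, d: int) -> int:
    """Number of distinct k-mers within Hamming distance d of a given k-mer."""
    return sum(comb(k, i) * 3 ** i for i in range(d + 1))


def expected_random_hits(k: int, d: int, genome_size: float) -> float:
    """Expected chance matches for one k-mer in a uniformly random genome,
    counting both strands. Ignores repeats, so real genomes do worse."""
    return 2 * genome_size * neighbourhood_size(k, d) / 4 ** k


if __name__ == "__main__":
    MOUSE_GENOME = 2.7e9  # approximate size, assumed for illustration
    for k, d in [(20, 0), (20, 1), (20, 2), (50, 2)]:
        print(f"k={k}, up to {d} mismatches: {expected_random_hits(k, d, MOUSE_GENOME):.3g}")
```

With no mismatches, a random 20-mer is expected to hit by chance far less than once. With two mismatches allowed, the expectation rises to several hits. This is consistent with -v0 making such a large difference, and with full-length 50 bp reads being far easier to place.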

And as an unsolicited piece of advice: if your data is s*^t after 20 bp, is it worth using at all? Generating sequence is cheap these days; time spent analyzing it is expensive.
All times are GMT -8.