SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Cufflinks doesn't recognize read type marb Bioinformatics 4 12-13-2011 04:56 AM
cufflinks error:Read Type: 0bp single-end chenyao Bioinformatics 0 07-27-2011 08:09 PM
Does Cufflinks support single-end and paired end data together ? ersenkavak Bioinformatics 1 10-22-2010 08:26 AM
Cufflinks library-type for SOLiD strand specific single data gengen Bioinformatics 1 10-19-2010 09:41 AM
pair-end sequencing produces single-end read artifact pparg Bioinformatics 9 03-29-2010 12:15 PM

Reply
 
Thread Tools
Old 12-13-2010, 05:53 PM   #1
lindymcb
Junior Member
 
Location: New York

Join Date: Sep 2010
Posts: 9
Default cufflinks incorrectly specifies read type as single end

Hi,

I am using cufflinks to analyze a samfile produced using tophat. The data are 75bp paired-end reads and the samfile includes information about location of mate pairs etc. But cufflinks appears to be confused about the nature of the data. It prints that the read type is single-end, yet still estimates a fragment length distribution (see below). Anyone know what the problem is? Should I just ignore the fact that it says the reads are single-end?

> Processed 259464 loci. [*************************] 100%
> Map Properties:
> Total Map Mass: 286777666.76
> Read Type: 75bp single-end
> Fragment Length Distribution: Gaussian (default)
> Estimated Mean: 209.55
> Estimated Std Dev: 70.54
lindymcb is offline   Reply With Quote
Old 12-16-2010, 02:29 PM   #2
polyatail
Member
 
Location: New York, NY

Join Date: Dec 2010
Posts: 25
Default

Hi Lindy,

Can you post a few lines of the input SAM file you're feeding Cufflinks? You can retrieve it from a BAM file with:

Code:
samtools view accepted_hits.bam | tail -n +25 | head -n 10
It may be that the read names in the alignment are not matching up, or are not being properly reported as paired.


Andrew
polyatail is offline   Reply With Quote
Old 12-16-2010, 06:54 PM   #3
lindymcb
Junior Member
 
Location: New York

Join Date: Sep 2010
Posts: 9
Default

Hi Andrew,
Thanks so much for your response. Here are the first 15 lines from one of the many samfiles that give me this problem. It might easier to read if you cut and paste into a text editor. It looks to me like the 5th and 7th reads are paired

Thanks!
Lindy
@HD VN:1.0 SO:sorted
@PG ID:TopHat VN:1.0.14 CL:/usr/local/genome/bin/tophat -p 8 -a 12 -m 1 -r 56 -o /mnt/analyses/LVPant/ AaegL1 LVPa2_R1_fastq_prefilter.txt,LVPa2_B_read_1_fastq_prefilter.txt,LVPa3_A_read_1_fastq_prefilter.txt,LVPa3_B_read_1_fastq_prefilter.txt,LVPa4_A_read_1_fastq_prefilter.txt,LVPa4_B_read_1_fastq_prefilter.txt,LVPa5_A_read_1_fastq_prefilter.txt,LVPa5_B_read_1_fastq_prefilter.txt LVPa2_R2_fastq_prefilter.txt,LVPa2_B_read_2_fastq_prefilter.txt,LVPa3_A_read_2_fastq_prefilter.txt,LVPa3_B_read_2_fastq_prefilter.txt,LVPa4_A_read_2_fastq_prefilter.txt,LVPa4_B_read_2_fastq_prefilter.txt,LVPa5_A_read_2_fastq_prefilter.txt,LVPa5_B_read_2_fastq_prefilter.txt
ILLUMINA-300160:LVPa4_B_read_2:1:6:78:12045:15339#0 137 CH477186.1 702 1 76M * 0 0 CCAAAAAGAGATCAAGATTTATCAAAATGTTATTGTGCAATGGTACAGCCGTTTTGAGAAAGAATTGTTTGGCATC hhhhfffhhhhhhhchhhhhdhhffhhhghhchhehhhghhhfghgchhchhhhhhcfah_fW]chhhhehahhhg NM:i:2 NH:i:4 CC:Z:CH477219.1 CP:i:2212927
HWUSI-EAS1600:LVPa2_R2:10_15_2010:1:8:21:9291:2197#0 137 CH477186.1 934 1 75M * 0 0 GGAACGTCAGCGTAGCAGCTCACTAAACAGCGGTGATCACTCCAGTGCTCAGTGTACCGGACAACATAGTGCCGG bdfchdaefffgeghhghcfhfhghhfhhhhghghhhhghghhhhhhhhhhhhhhhghhhhhhhhhghhhhhhhh NM:i:1 NH:i:3 CC:Z:CH477299.1 CP:i:2381918
ILLUMINA-300160:LVPa3_A_read_2:1:2:114:7815:7323#0 137 CH477186.1 1102 3 76M * 0 0 GCACGACGATTGGAATCGACCTCAACAAAGTGAACAATTTCGGTCACAGAGTGTGGCCAAGATTCGGTGCCCGCCC d`^`R`feedd[a]c[bb]b]ccc]_fdadaffdfd_Q^a_fdbb]fdWfa[dchafffdaafahghh`_gfhefh NM:i:2 NH:i:2 CC:Z:CH477284.1 CP:i:705773
ILLUMINA-300160:LVPa3_A_read_2:1:2:5:9651:17717#0 163 CH477186.1 2520 255 76M = 2641 0 TGCAACTGTAGTAGCTATCCCCAAGGCCAAAAAAGACGTAACGCTGCCTACAAACTACCGTCCCATCAGCCTGCTA gfhhhhhghfh_fhhhgahhhhhhhchghhgahhhhchhhhhhhhdhchhfgfgge_ce[bdfdcahgaga]aaaa NM:i:0 NH:i:1
ILLUMINA-300160:LVPa5_B_read_1:1:8:116:1810:15456#0 99 CH477186.1 2567 255 76M = 2708 0 CTACAAACTACCGTCCCATCAGCCTGCTAAGTAGCCTGAGCAAAGTGTTTGAACGAATCATCTTAAACCTGTTTCA gggggggggggggggggaggdcfffgggggggggggggggfffaeaebefgggggcgggegggdgdfJeddadefg NM:i:1 NH:i:1
ILLUMINA-300160:LVPa3_A_read_1:1:2:5:9651:17717#0 83 CH477186.1 2641 255 76M = 2520 0 CAAAGCCGGTCACTCCACCAACCACCAGTTAGCACGGATAACCAAAATCGTAAAGGACGGTTTCTCTGCAAGAAAA dhcgegdfddfdfcfce]hfehhghhdgghhhfhhhghhhhhhghhhfhhhhhhhhghhhhhhhhhhh_hhhhhhh NM:i:0 NH:i:1
ILLUMINA-300160:LVPa5_B_read_2:1:8:116:1810:15456#0 147 CH477186.1 2708 255 76M = 2567 0 GCAAGAAAATCGACTGGTATGATTATGCTTGACGTCGAAAAGGCATATAATTCCGTCTGGCAGGACGCGATCATCT dbbghggdadghadghgggefdfdfc_dhhhdhhceghhhhhhhhfgchhhcfhhhhfhhhhhhhhhhhhghhhhh NM:i:0 NH:i:1
ILLUMINA-300160:LVPa5_B_read_2:1:8:10:14657:2302#0 163 CH477186.1 2743 255 76M = 2877 0 CGAAAAGGCATATAATTCCGTCTGGCAGGACGCGATCATCTACAAACTACGTAAATGTAACTGTCCTTTCTACATT hhfhhhhhhhhhhhhhhhhhhhhhhhghhfheghhhhchhhhhhh_hghghhhdfhgfhggghdfahhgghhdcgh NM:i:0 NH:i:1
ILLUMINA-300160:LVPa5_B_read_1:1:8:10:14657:2302#0 83 CH477186.1 2877 255 76M = 2743 0 CTGTATCTGGGAAATTCGACATCCCTTGTGGCGTCCCACAGCGATCAGTGTTGAGCCCAACACTCTACAATGTTTT ggdggdgghhhgggWhfhfhhggcgehhghghhffcaggdhhhchhfhhhghhghgghhhgghhhhhhhghhhhhh NM:i:2 NH:i:1
HWUSI-EAS1600:LVPa2_R2:10_15_2010:1:8:3:14836:3337#0 163 CH477186.1 4492 0 75M = 4603 0 ATTCGCCCCATTGCGGGGTGAATAGAAACATACAACATTTTCAATAATATTTGTACGATATAATTTTTATTTTTT hhhhhhhhhhhhhhhhhcbh`ffdehhhhhfhhhhhhhhhhhehhhhhhhhhhgghhghhhhghhhhhghhghhe NM:i:2 NH:i:5 CC:Z:CH477198.1 CP:i:431872
HWUSI-EAS1600:LVPa2_R2:10_15_2010:1:8:47:7635:10354#0 163 CH477186.1 4495 0 75M = 4606 0 TGCCCCATTGCGGGGTGAATAGAAACATACAACATTTTCATAAATATTTGTACGATATAATTTTTATTTTTTAAC hhhhhhhhhhhhhhh`f]dffffffhhhhfhhhhghhhhhhhhdhhhhhhghhhhhghhghhhhhghhhhhhhhh NM:i:1 NH:i:5 CC:Z:CH477198.1 CP:i:431869
ILLUMINA-300160:LVPa2_B_read_2:0:1:53:19937:17604#0 137 CH477186.1 4588 0 76M * 0 0 AAAGGAACACGCGACCCATATTTAAGACTAATAAAATTGGATTTTATCCACACGGTTTAAGTTGTACCCATTCAAA hhfhhZ]GSWRXWXZhhagfhhhhcggfghhhhaeegd]e_afffhgghhahhchhhfedcacfa[VaZZ[[[[Q[ NM:i:2 NH:i:19 CC:Z:CH477198.1 CP:i:431775
HWUSI-EAS1600:LVPa2_R2:10_15_2010:0:8:91:10481:4293#0 137 CH477186.1 4596 0 75M * 0 0 ACGCGACCCATATTTAAGACTAATAAAATTGGATTTTATCCACACGGTTTAAGTTGTACACATTCAAACATGTCG f_[]feae`d^Qb]]c^[caZZZZZZWMWY\aYY\Z_ZUOed]]`VUQTV``WX]d`^Wdb`d^_WUU[QbdZd` NM:i:0 NH:i:16 CC:Z:CH477198.1 CP:i:431768
lindymcb is offline   Reply With Quote
Old 12-16-2010, 09:57 PM   #4
polyatail
Member
 
Location: New York, NY

Join Date: Dec 2010
Posts: 25
Default

From http://samtools.sourceforge.net/SAM1.pdf, the name of both mates in a pair must be the same. So while the two reads you mentioned are certainly paired, they aren't matched up in Cufflinks because the names differ by the number marked with a carrot below. If you preprocess the SAM files to eliminate the mate field given by that number (maybe turning it into an X or something?), the library should be recognized as paired-end.

Code:
ILLUMINA-300160:LVPa5_B_read_1:1:8:116:1810:15456#0 99 CH477186.1 2567 255 76M =
                             ^                                                 
ILLUMINA-300160:LVPa5_B_read_2:1:8:116:1810:15456#0 147 CH477186.1 2708 255 76M =
                             ^
Lemme know how it goes!


Andrew

Last edited by polyatail; 12-16-2010 at 10:11 PM. Reason: misinfo re: sam specification
polyatail is offline   Reply With Quote
Old 04-02-2011, 06:33 PM   #5
lindymcb
Junior Member
 
Location: New York

Join Date: Sep 2010
Posts: 9
Default

I have only just now come back to this issue, and your suggestion worked - thanks! I changed both read_1 and read_2 in the read names in my samfile to read_X, and cufflinks now reports my library as paired end.

I have a second issue now however - less significant than the first. When I run cufflinks with a reference annotation it correctly reports the mean fragment length as ~200 bp. But when I run it without the reference annotation it reports the fragment length as 0 bp (see two outputs below). I assume this is because my samfile was generated by a tophat run which used known junctions from the same reference annotation? It seems like the most likley answer, though I see no reason why cufflinks would not be able to parse the mapping coordinates of each read in a pair and report the correct fragment length distribution regardless of whether a reference annotation is used...

> cufflinks -G aaegypti.BASEFEATURES_Liverpool-AaegL1.2.gff3 LVPa2_abbr.RX.bam
[21:13:01] Inspecting reads and determining fragment length distribution.
> Processed 16360 loci. [*************************] 100%
> Map Properties:
> Total Map Mass: 123784.26
> Read Type: 75bp paired-end
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 197.41
> Estimated Std Dev: 13.71
[21:13:04] Calculating estimated abundances.
> Processed 16360 loci. [*************************] 100%


> cufflinks LVPa2_abbr.RX.bam
[21:05:59] Inspecting reads and determining fragment length distribution.
> Processed 1389 loci. [*************************] 100%
> Map Properties:
> Total Map Mass: 150829.93
> Read Type: 75bp paired-end
> Fragment Length Distribution: Gaussian (default)
> Estimated Mean: 0.02
> Estimated Std Dev: 2.17
[21:06:01] Assembling transcripts and estimating abundances.
> Processed 1389 loci. [*************************] 100%
__________________
Lindy McBride - Rockefeller University
lindymcb is offline   Reply With Quote
Old 04-03-2011, 09:14 AM   #6
adarob
Member
 
Location: Berkeley, CA

Join Date: Jul 2010
Posts: 71
Default

It looks like when you didn't provide an annotation, there was not enough available information to determine the fragment length distribution (see http://cufflinks.cbcb.umd.edu/howitworks.html#hdis). I'm not sure, however, why it defaulted to such low values. Try specifying the mean with -m and the std-dev with -s.

-Adam
adarob is offline   Reply With Quote
Old 04-03-2011, 09:33 AM   #7
lindymcb
Junior Member
 
Location: New York

Join Date: Sep 2010
Posts: 9
Default

Hi Adam - Thanks. I will try that. I guess I can run it with the reference to get the values and then rerun it without the reference specifying the values estimated in the first run. I had not seen the "how cufflinks works" page. That is helpful.
Lindy
__________________
Lindy McBride - Rockefeller University
lindymcb is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:46 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO