![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Cufflinks doesn't recognize read type | marb | Bioinformatics | 4 | 12-13-2011 04:56 AM |
cufflinks error:Read Type: 0bp single-end | chenyao | Bioinformatics | 0 | 07-27-2011 08:09 PM |
Does Cufflinks support single-end and paired end data together ? | ersenkavak | Bioinformatics | 1 | 10-22-2010 08:26 AM |
Cufflinks library-type for SOLiD strand specific single data | gengen | Bioinformatics | 1 | 10-19-2010 09:41 AM |
pair-end sequencing produces single-end read artifact | pparg | Bioinformatics | 9 | 03-29-2010 12:15 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: New York Join Date: Sep 2010
Posts: 9
|
![]()
Hi,
I am using cufflinks to analyze a samfile produced using tophat. The data are 75bp paired-end reads and the samfile includes information about location of mate pairs etc. But cufflinks appears to be confused about the nature of the data. It prints that the read type is single-end, yet still estimates a fragment length distribution (see below). Anyone know what the problem is? Should I just ignore the fact that it says the reads are single-end? > Processed 259464 loci. [*************************] 100% > Map Properties: > Total Map Mass: 286777666.76 > Read Type: 75bp single-end > Fragment Length Distribution: Gaussian (default) > Estimated Mean: 209.55 > Estimated Std Dev: 70.54 |
![]() |
![]() |
![]() |
#2 |
Member
Location: New York, NY Join Date: Dec 2010
Posts: 25
|
![]()
Hi Lindy,
Can you post a few lines of the input SAM file you're feeding Cufflinks? You can retrieve it from a BAM file with: Code:
samtools view accepted_hits.bam | tail -n +25 | head -n 10 Andrew |
![]() |
![]() |
![]() |
#3 |
Junior Member
Location: New York Join Date: Sep 2010
Posts: 9
|
![]()
Hi Andrew,
Thanks so much for your response. Here are the first 15 lines from one of the many samfiles that give me this problem. It might easier to read if you cut and paste into a text editor. It looks to me like the 5th and 7th reads are paired Thanks! Lindy @HD VN:1.0 SO:sorted @PG ID:TopHat VN:1.0.14 CL:/usr/local/genome/bin/tophat -p 8 -a 12 -m 1 -r 56 -o /mnt/analyses/LVPant/ AaegL1 LVPa2_R1_fastq_prefilter.txt,LVPa2_B_read_1_fastq_prefilter.txt,LVPa3_A_read_1_fastq_prefilter.txt,LVPa3_B_read_1_fastq_prefilter.txt,LVPa4_A_read_1_fastq_prefilter.txt,LVPa4_B_read_1_fastq_prefilter.txt,LVPa5_A_read_1_fastq_prefilter.txt,LVPa5_B_read_1_fastq_prefilter.txt LVPa2_R2_fastq_prefilter.txt,LVPa2_B_read_2_fastq_prefilter.txt,LVPa3_A_read_2_fastq_prefilter.txt,LVPa3_B_read_2_fastq_prefilter.txt,LVPa4_A_read_2_fastq_prefilter.txt,LVPa4_B_read_2_fastq_prefilter.txt,LVPa5_A_read_2_fastq_prefilter.txt,LVPa5_B_read_2_fastq_prefilter.txt ILLUMINA-300160:LVPa4_B_read_2:1:6:78:12045:15339#0 137 CH477186.1 702 1 76M * 0 0 CCAAAAAGAGATCAAGATTTATCAAAATGTTATTGTGCAATGGTACAGCCGTTTTGAGAAAGAATTGTTTGGCATC hhhhfffhhhhhhhchhhhhdhhffhhhghhchhehhhghhhfghgchhchhhhhhcfah_fW]chhhhehahhhg NM:i:2 NH:i:4 CC:Z:CH477219.1 CP:i:2212927 HWUSI-EAS1600:LVPa2_R2:10_15_2010:1:8:21:9291:2197#0 137 CH477186.1 934 1 75M * 0 0 GGAACGTCAGCGTAGCAGCTCACTAAACAGCGGTGATCACTCCAGTGCTCAGTGTACCGGACAACATAGTGCCGG bdfchdaefffgeghhghcfhfhghhfhhhhghghhhhghghhhhhhhhhhhhhhhghhhhhhhhhghhhhhhhh NM:i:1 NH:i:3 CC:Z:CH477299.1 CP:i:2381918 ILLUMINA-300160:LVPa3_A_read_2:1:2:114:7815:7323#0 137 CH477186.1 1102 3 76M * 0 0 GCACGACGATTGGAATCGACCTCAACAAAGTGAACAATTTCGGTCACAGAGTGTGGCCAAGATTCGGTGCCCGCCC d`^`R`feedd[a]c[bb]b]ccc]_fdadaffdfd_Q^a_fdbb]fdWfa[dchafffdaafahghh`_gfhefh NM:i:2 NH:i:2 CC:Z:CH477284.1 CP:i:705773 ILLUMINA-300160:LVPa3_A_read_2:1:2:5:9651:17717#0 163 CH477186.1 2520 255 76M = 2641 0 TGCAACTGTAGTAGCTATCCCCAAGGCCAAAAAAGACGTAACGCTGCCTACAAACTACCGTCCCATCAGCCTGCTA gfhhhhhghfh_fhhhgahhhhhhhchghhgahhhhchhhhhhhhdhchhfgfgge_ce[bdfdcahgaga]aaaa NM:i:0 NH:i:1 ILLUMINA-300160:LVPa5_B_read_1:1:8:116:1810:15456#0 99 CH477186.1 2567 255 76M = 2708 0 CTACAAACTACCGTCCCATCAGCCTGCTAAGTAGCCTGAGCAAAGTGTTTGAACGAATCATCTTAAACCTGTTTCA gggggggggggggggggaggdcfffgggggggggggggggfffaeaebefgggggcgggegggdgdfJeddadefg NM:i:1 NH:i:1 ILLUMINA-300160:LVPa3_A_read_1:1:2:5:9651:17717#0 83 CH477186.1 2641 255 76M = 2520 0 CAAAGCCGGTCACTCCACCAACCACCAGTTAGCACGGATAACCAAAATCGTAAAGGACGGTTTCTCTGCAAGAAAA dhcgegdfddfdfcfce]hfehhghhdgghhhfhhhghhhhhhghhhfhhhhhhhhghhhhhhhhhhh_hhhhhhh NM:i:0 NH:i:1 ILLUMINA-300160:LVPa5_B_read_2:1:8:116:1810:15456#0 147 CH477186.1 2708 255 76M = 2567 0 GCAAGAAAATCGACTGGTATGATTATGCTTGACGTCGAAAAGGCATATAATTCCGTCTGGCAGGACGCGATCATCT dbbghggdadghadghgggefdfdfc_dhhhdhhceghhhhhhhhfgchhhcfhhhhfhhhhhhhhhhhhghhhhh NM:i:0 NH:i:1 ILLUMINA-300160:LVPa5_B_read_2:1:8:10:14657:2302#0 163 CH477186.1 2743 255 76M = 2877 0 CGAAAAGGCATATAATTCCGTCTGGCAGGACGCGATCATCTACAAACTACGTAAATGTAACTGTCCTTTCTACATT hhfhhhhhhhhhhhhhhhhhhhhhhhghhfheghhhhchhhhhhh_hghghhhdfhgfhggghdfahhgghhdcgh NM:i:0 NH:i:1 ILLUMINA-300160:LVPa5_B_read_1:1:8:10:14657:2302#0 83 CH477186.1 2877 255 76M = 2743 0 CTGTATCTGGGAAATTCGACATCCCTTGTGGCGTCCCACAGCGATCAGTGTTGAGCCCAACACTCTACAATGTTTT ggdggdgghhhgggWhfhfhhggcgehhghghhffcaggdhhhchhfhhhghhghgghhhgghhhhhhhghhhhhh NM:i:2 NH:i:1 HWUSI-EAS1600:LVPa2_R2:10_15_2010:1:8:3:14836:3337#0 163 CH477186.1 4492 0 75M = 4603 0 ATTCGCCCCATTGCGGGGTGAATAGAAACATACAACATTTTCAATAATATTTGTACGATATAATTTTTATTTTTT hhhhhhhhhhhhhhhhhcbh`ffdehhhhhfhhhhhhhhhhhehhhhhhhhhhgghhghhhhghhhhhghhghhe NM:i:2 NH:i:5 CC:Z:CH477198.1 CP:i:431872 HWUSI-EAS1600:LVPa2_R2:10_15_2010:1:8:47:7635:10354#0 163 CH477186.1 4495 0 75M = 4606 0 TGCCCCATTGCGGGGTGAATAGAAACATACAACATTTTCATAAATATTTGTACGATATAATTTTTATTTTTTAAC hhhhhhhhhhhhhhh`f]dffffffhhhhfhhhhghhhhhhhhdhhhhhhghhhhhghhghhhhhghhhhhhhhh NM:i:1 NH:i:5 CC:Z:CH477198.1 CP:i:431869 ILLUMINA-300160:LVPa2_B_read_2:0:1:53:19937:17604#0 137 CH477186.1 4588 0 76M * 0 0 AAAGGAACACGCGACCCATATTTAAGACTAATAAAATTGGATTTTATCCACACGGTTTAAGTTGTACCCATTCAAA hhfhhZ]GSWRXWXZhhagfhhhhcggfghhhhaeegd]e_afffhgghhahhchhhfedcacfa[VaZZ[[[[Q[ NM:i:2 NH:i:19 CC:Z:CH477198.1 CP:i:431775 HWUSI-EAS1600:LVPa2_R2:10_15_2010:0:8:91:10481:4293#0 137 CH477186.1 4596 0 75M * 0 0 ACGCGACCCATATTTAAGACTAATAAAATTGGATTTTATCCACACGGTTTAAGTTGTACACATTCAAACATGTCG f_[]feae`d^Qb]]c^[caZZZZZZWMWY\aYY\Z_ZUOed]]`VUQTV``WX]d`^Wdb`d^_WUU[QbdZd` NM:i:0 NH:i:16 CC:Z:CH477198.1 CP:i:431768 |
![]() |
![]() |
![]() |
#4 |
Member
Location: New York, NY Join Date: Dec 2010
Posts: 25
|
![]()
From http://samtools.sourceforge.net/SAM1.pdf, the name of both mates in a pair must be the same. So while the two reads you mentioned are certainly paired, they aren't matched up in Cufflinks because the names differ by the number marked with a carrot below. If you preprocess the SAM files to eliminate the mate field given by that number (maybe turning it into an X or something?), the library should be recognized as paired-end.
Code:
ILLUMINA-300160:LVPa5_B_read_1:1:8:116:1810:15456#0 99 CH477186.1 2567 255 76M = ^ ILLUMINA-300160:LVPa5_B_read_2:1:8:116:1810:15456#0 147 CH477186.1 2708 255 76M = ^ Andrew Last edited by polyatail; 12-16-2010 at 10:11 PM. Reason: misinfo re: sam specification |
![]() |
![]() |
![]() |
#5 |
Junior Member
Location: New York Join Date: Sep 2010
Posts: 9
|
![]()
I have only just now come back to this issue, and your suggestion worked - thanks! I changed both read_1 and read_2 in the read names in my samfile to read_X, and cufflinks now reports my library as paired end.
I have a second issue now however - less significant than the first. When I run cufflinks with a reference annotation it correctly reports the mean fragment length as ~200 bp. But when I run it without the reference annotation it reports the fragment length as 0 bp (see two outputs below). I assume this is because my samfile was generated by a tophat run which used known junctions from the same reference annotation? It seems like the most likley answer, though I see no reason why cufflinks would not be able to parse the mapping coordinates of each read in a pair and report the correct fragment length distribution regardless of whether a reference annotation is used... > cufflinks -G aaegypti.BASEFEATURES_Liverpool-AaegL1.2.gff3 LVPa2_abbr.RX.bam [21:13:01] Inspecting reads and determining fragment length distribution. > Processed 16360 loci. [*************************] 100% > Map Properties: > Total Map Mass: 123784.26 > Read Type: 75bp paired-end > Fragment Length Distribution: Empirical (learned) > Estimated Mean: 197.41 > Estimated Std Dev: 13.71 [21:13:04] Calculating estimated abundances. > Processed 16360 loci. [*************************] 100% > cufflinks LVPa2_abbr.RX.bam [21:05:59] Inspecting reads and determining fragment length distribution. > Processed 1389 loci. [*************************] 100% > Map Properties: > Total Map Mass: 150829.93 > Read Type: 75bp paired-end > Fragment Length Distribution: Gaussian (default) > Estimated Mean: 0.02 > Estimated Std Dev: 2.17 [21:06:01] Assembling transcripts and estimating abundances. > Processed 1389 loci. [*************************] 100%
__________________
Lindy McBride - Rockefeller University |
![]() |
![]() |
![]() |
#6 |
Member
Location: Berkeley, CA Join Date: Jul 2010
Posts: 71
|
![]()
It looks like when you didn't provide an annotation, there was not enough available information to determine the fragment length distribution (see http://cufflinks.cbcb.umd.edu/howitworks.html#hdis). I'm not sure, however, why it defaulted to such low values. Try specifying the mean with -m and the std-dev with -s.
-Adam |
![]() |
![]() |
![]() |
#7 |
Junior Member
Location: New York Join Date: Sep 2010
Posts: 9
|
![]()
Hi Adam - Thanks. I will try that. I guess I can run it with the reference to get the values and then rerun it without the reference specifying the values estimated in the first run. I had not seen the "how cufflinks works" page. That is helpful.
Lindy
__________________
Lindy McBride - Rockefeller University |
![]() |
![]() |
![]() |
Thread Tools | |
|
|