Dear Group,
I never came across errors in Tophat2/Cufflinks analysis and this error is troubling me and I cannot find a proper solution. I am posting in anticipation that I could get some help.
I have a fastq files from RNA-Seq experiment (1x50bp;strand specific dUTP) single-end 50bp reads.
I aligned the files using tophat2.
tophat -p 16 --library-type fr-firststrand -G /HumanGenome/Grch37/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2015-07-17-14-31-42/Genes/genes.gtf -o myDir /HumanGenome/Grch37/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome fastq1.fastq
I get accepted_hits.bam file along with other files and I do not see any error.
2. Next I ran, cufflinks and I did not get any problems here. I could successfully generate transcripts.gtf file.
cufflinks -p 16 --library-type fr-firststrand -o myDirCL accepted_hits.bam
3. I next did cuff merge using all gtf files in a list - gtfList
cuffmerge -p 16 -g /HumanGenome/Grch37/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2015-07-17-14-31-42/Genes/genes.gtf -s /HumanGenome/Grch37/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.fa gtfList
4. Cuffdiff fails:
cuffdiff -p 16 --library-type fr-firststrand -o mycfDiff /HumanGenome/Grch37/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.fa -u merged.gtf samp1a.bam,samp1b.bam samp2a.bam,samp2b.bam
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[16:06:53] Loading reference annotation.
right after saying loading reference annotation, pipeline fails.
I compared this with previous analyses. I never had EOF marker absent with successful cuffdiff runs.
I validated using ValidateSamFile of Picard and I do not see any difference between bam files of successful cuffdiff analysis and failed analysis.
Also Bam file is gzip file and I do see the end of bam file of failed run :
tail accepted_hits.bam | hexdump -C
00000980 54 84 0f 1b e3 be fa f6 50 67 5d b5 92 9c 16 24 |T.......Pg]....$|
00000990 b6 9e 54 68 40 ff 07 3c e5 ef 1d 2f 8e 00 00 1f |..Th@..<.../....|
000009a0 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b |...........BC...|
000009b0 00 03 00 00 00 00 00 00 00 00 00 |...........|
000009bb
Can any one suggest what could be wrong here.
I suspect, if I am correct in executing tophat for single-end 50bp, strand-specific reads.
There are 4 fastq files for each sample.
Header of fastq file:
file 1 : @SN930:564:H3Y5YBCXY:1:1101:1228:2226 1:N:0:AGTCAA
file 2: @SN930:565:H3Y5YBCXY:1:1101:1263:2151 1:N:0:AGTCAAC
file 3: @SN930:564:H3Y5YBCXY:2:1101:1433:2070 1:N:0:AGTCAA
file 4: @SN930:565:H3Y5YBCXY:2:1101:1282:2078 1:N:0:AGTCAAC
thanks a lot.
Adrian
I never came across errors in Tophat2/Cufflinks analysis and this error is troubling me and I cannot find a proper solution. I am posting in anticipation that I could get some help.
I have a fastq files from RNA-Seq experiment (1x50bp;strand specific dUTP) single-end 50bp reads.
I aligned the files using tophat2.
tophat -p 16 --library-type fr-firststrand -G /HumanGenome/Grch37/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2015-07-17-14-31-42/Genes/genes.gtf -o myDir /HumanGenome/Grch37/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome fastq1.fastq
I get accepted_hits.bam file along with other files and I do not see any error.
2. Next I ran, cufflinks and I did not get any problems here. I could successfully generate transcripts.gtf file.
cufflinks -p 16 --library-type fr-firststrand -o myDirCL accepted_hits.bam
3. I next did cuff merge using all gtf files in a list - gtfList
cuffmerge -p 16 -g /HumanGenome/Grch37/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2015-07-17-14-31-42/Genes/genes.gtf -s /HumanGenome/Grch37/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.fa gtfList
4. Cuffdiff fails:
cuffdiff -p 16 --library-type fr-firststrand -o mycfDiff /HumanGenome/Grch37/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.fa -u merged.gtf samp1a.bam,samp1b.bam samp2a.bam,samp2b.bam
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[16:06:53] Loading reference annotation.
right after saying loading reference annotation, pipeline fails.
I compared this with previous analyses. I never had EOF marker absent with successful cuffdiff runs.
I validated using ValidateSamFile of Picard and I do not see any difference between bam files of successful cuffdiff analysis and failed analysis.
Also Bam file is gzip file and I do see the end of bam file of failed run :
tail accepted_hits.bam | hexdump -C
00000980 54 84 0f 1b e3 be fa f6 50 67 5d b5 92 9c 16 24 |T.......Pg]....$|
00000990 b6 9e 54 68 40 ff 07 3c e5 ef 1d 2f 8e 00 00 1f |..Th@..<.../....|
000009a0 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b |...........BC...|
000009b0 00 03 00 00 00 00 00 00 00 00 00 |...........|
000009bb
Can any one suggest what could be wrong here.
I suspect, if I am correct in executing tophat for single-end 50bp, strand-specific reads.
There are 4 fastq files for each sample.
Header of fastq file:
file 1 : @SN930:564:H3Y5YBCXY:1:1101:1228:2226 1:N:0:AGTCAA
file 2: @SN930:565:H3Y5YBCXY:1:1101:1263:2151 1:N:0:AGTCAAC
file 3: @SN930:564:H3Y5YBCXY:2:1101:1433:2070 1:N:0:AGTCAA
file 4: @SN930:565:H3Y5YBCXY:2:1101:1282:2078 1:N:0:AGTCAAC
thanks a lot.
Adrian