View Single Post
Old 03-23-2014, 04:54 PM   #1
jans
Junior Member
 
Location: york

Join Date: Mar 2014
Posts: 1
Default help with counting reads (only 0 counts were returned)

Hi,
This is my very first experience analysing RNAseq data. My goal is to do differential analysis between two strains of a bacteria. So far, i managed to align and produce SAM and BAM files. I'm having problems to annotate and count my reads. Here are the commands that I used.
My reads are from SOLID and hence in colourspace


Code:
$ nohup solid2fastq.pl 291_01_01 291_01_01-bwa  #Convert .csfasta and .qual to .fastq

$ nohup bwa index -c TbruceiTreu927Genomic_TriTrypDB-4.0.fasta 

$ nohup bwa aln -c TbruceiTreu927Genomic_TriTrypDB-4.0.fasta 291_01_01-bwa.singleF3.fastq 291_01_01-bwa.sai 

$ perl -ne 'if($_ !~ m/^\S+?\t4\t/){print $_}' 291_01_01-bwa.sam > 291_01_01-bwa.sam.filtered #Convert to SAM file

$ samtools sort 291_01_01-bwa.bam 291_01_01-bwa.bam.sorted

$ samtools index 291_01_01-bwa.bam.sorted.bam
#to produce .rpkm file

$ java -jar ~/bin/bam2rpkm-0.06/bam2rpkm-0.06.jar  -i 291_01_01-bwa.bam.sorted.bam -f Tbrucei427_TriTrypDB-4.0.gff > 291_01_01-bwa.RPKM2.out  # i get an error here
$ERROR: Problem encountered whilst reading gtf file. Could not interpret line 'GeneDB|Tb427_01_v4 EuPathDB supercontig 1

so i tried different method to count

$ htseq-count -i ID 291_01_01-bwa.sam Tbrucei427_TriTrypDB-4.0.gff > 291_01_01-bwa.sam_htseq-count #still error 
$Error occured when processing GFF file (line 37060 of file Tbrucei427_TriTrypDB-4.0.gff):
need more than 1 value to unpack

and different method

$ python make_bed_from_fasta.py ~/Downloads/reference/TbruceiTreu927AnnotatedCDS_TriTrypDB-4.0.fasta > 927_reference.bed #this python script converts .fasta into .bed file since the .gff file cannot be processed
$multiBamCov -q 30 -p -bams 291_01_01-bwa.bam.sorted.bam -bed 927_reference.bed > test_counts.txt
now I only get 0 counts for all genes. Does this mean that there is something wrong with my alignment files or something wrong with the counting method . And it seems like my .gff (version 3) file was unable to be read by htseq-count and also the java script . I downloaded the gff file from GeneDB and it seems like in many tutorials .gtf files are used instead. So I'm stuck at counting the read part and I really need some help . Help please .
jans is offline   Reply With Quote