SEQanswers


marb 12-11-2011 09:26 AM

Cufflinks doesn't recognize read type
 
Hello.
I used Cufflinks to process the output of TopHat.
My data come from an Illumina sequencer and are paired-end.

I ran the following command:
Code:

/path/cufflinks tophat_out/accepted_hits.bam -o cuff_out

Unfortunately, Cufflinks treats the data as single-end with a read length of 0!

Code:

[13:40:01] Inspecting reads and determining fragment length distribution.
Processed 204293 loci.
Map Properties:
      Total Map Mass: 5928337.99
      Read Type: 0bp single-end
      Fragment Length Distribution: Truncated Gaussian (default)
                    Default Mean: 200
                  Default Std Dev: 80
[13:52:49] Assembling transcripts and estimating abundances.
Processed 204293 loci.

Is it necessary to pass additional options to run Cufflinks correctly on paired-end data?
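
One way to check whether the alignments themselves carry pairing information is to count records whose SAM FLAG has the 0x1 (paired) bit set. The sketch below assumes samtools is installed and on the PATH; `count_paired` is a hypothetical helper name, not part of Cufflinks or TopHat.

```shell
# Count SAM records whose FLAG (column 2) has the 0x1 "paired" bit set.
# Reads SAM text on stdin; header lines starting with @ are skipped.
count_paired() {
    awk -F '\t' '
        !/^@/ {
            total++
            # An odd FLAG value means bit 0x1 (read is paired) is set.
            if ($2 % 2 == 1) paired++
        }
        END { printf "%d paired of %d records\n", paired, total }
    '
}

# Usage (assumes samtools is available):
# samtools view tophat_out/accepted_hits.bam | count_paired
```

If the paired count is 0, the reads were most likely aligned as single-end, and the problem lies upstream of Cufflinks. `samtools view -c -f 1 tophat_out/accepted_hits.bam` should report the same paired count directly.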

Wallysb01 12-11-2011 05:16 PM

What do your sequence headers look like and how are your files split up, if at all? I've run into this problem before, and it just required getting the formatting right.

marb 12-12-2011 01:32 AM

Quote:

Originally Posted by Wallysb01 (Post 59296)
What do your sequence headers look like and how are your files split up, if at all? I've run into this problem before, and it just required getting the formatting right.

Do you mean the header of the BAM file?
I obtained 28 FASTQ files from CASAVA: 14 right-end (R1) and 14 left-end (R2).
I processed them with TopHat.

Wallysb01 12-12-2011 02:23 PM

Quote:

Originally Posted by marb (Post 59315)
Do you mean the header of the BAM file?
I obtained 28 FASTQ files from CASAVA: 14 right-end (R1) and 14 left-end (R2).
I processed them with TopHat.

Are you sure TopHat treated them as paired-end and not single-end? What do the ends of your sequence headers look like in the FASTQ files? If they look like this:

@XXXX 1:N:0 @XXXX 2:N:0
AGC.. GCT
+XXXX 1:N:0 +XXXX 2:N:0
.... .....

many programs won't recognize the files as paired-end. You need to convert the headers to:

@XXXX/1 @XXXX/2
AGC.. GCT
+XXXX/1 +XXXX/2
.... .....

I came on here with the same kinds of issues and a friendly commenter made this post to help people like me out:

http://contig.wordpress.com/2011/09/...-fastq-header/
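
The header conversion described above can be sketched with awk. This is a minimal example, assuming four-line FASTQ records and CASAVA 1.8-style headers in which the mate number is the first colon-separated item after the space; `convert_headers` is a hypothetical helper name.

```shell
# Rewrite "@ID 1:N:0" style headers to "@ID/1" (and likewise for mate 2).
# Reads FASTQ on stdin, writes FASTQ on stdout.
convert_headers() {
    awk '
        # Lines 1 and 3 of each record are the "@" and "+" header lines.
        # Only rewrite them when a second, space-separated field is present.
        (NR % 4 == 1 || NR % 4 == 3) && NF >= 2 {
            split($2, parts, ":")      # parts[1] holds the mate number
            print $1 "/" parts[1]
            next
        }
        { print }                      # sequence and quality lines pass through
    '
}

# Usage (file names are placeholders):
# gzip -dc reads_R1.fastq.gz | convert_headers > reads_R1.converted.fastq
```

A bare `+` separator line has only one field, so it is left unchanged.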

marb 12-13-2011 03:56 AM

I tested Cufflinks on another dataset, and it correctly recognised the reads as 57bp x 57bp.
So I know the mistake is at the level of TopHat processing the FASTQ files.

I know that the R1 and R2 (paired-end) files must be listed in the same order, so I used the following command:

Code:

tophat /path/to/genome $(printf "%s," ./*.gz | sed 's/,$/\n/')
Do you think this way of passing the arguments (FASTQ files) is incorrect?
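
A likely cause, assuming the usual TopHat synopsis `tophat [options] <index> <reads1[,...]> [reads2[,...]]`: the command above joins every `.gz` file into one comma-separated list, so TopHat receives a single reads argument and has no second list to pair it with. The sketch below builds two lists in matching order instead. It assumes CASAVA-style file names containing `_R1_` and `_R2_`; `build_mate_lists`, `R1_LIST`, and `R2_LIST` are hypothetical names.

```shell
# Build two comma-separated file lists, one per mate, in the same order.
# Assumption: mate files differ only by _R1_ vs _R2_ in the file name.
build_mate_lists() {
    R1_LIST=""
    R2_LIST=""
    for r1 in "$1"/*_R1_*.gz; do
        # Derive the mate file name by swapping the first _R1_ for _R2_.
        r2=$(printf '%s\n' "$r1" | sed 's/_R1_/_R2_/')
        if [ ! -f "$r2" ]; then
            echo "missing mate file for $r1" >&2
            return 1
        fi
        # Append with a comma only when the list is already non-empty.
        R1_LIST="${R1_LIST:+$R1_LIST,}$r1"
        R2_LIST="${R2_LIST:+$R2_LIST,}$r2"
    done
}

# Usage (not run here):
# build_mate_lists . && tophat /path/to/genome "$R1_LIST" "$R2_LIST"
```

Deriving each R2 name from its R1 partner keeps the two lists in the same order and fails early if a mate file is missing.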

