**KatsenPlatz** · 08-19-2013, 09:15 AM

My guess is that there is a problem somewhere in the FASTQ file, even though the first 4 lines look good. A few checks you can do:

1. There are 4 lines for each read in the file, i.e. the total line count of the FASTQ file is 4x the total number of reads.
2. The first line of every record starts with @ and the third line starts with +.
3. The quality string is the same length as the sequence read for every record.
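
The three checks above can be sketched in a few lines of Python. This is only a minimal illustration, not something TopHat ships: the function name `check_fastq` and the wording of its messages are made up here, and it assumes plain 4-line records (no wrapped sequence lines).

```python
def check_fastq(lines):
    """Return a list of (line_number, message) problems for 4-line FASTQ records.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The function name and messages are hypothetical, for illustration only.
    """
    problems = []
    record = []
    count = 0
    for count, raw in enumerate(lines, start=1):
        record.append(raw.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        first = count - 3  # line number of this record's header
        # Check 2: '@' on the first line, '+' on the third line
        if not header.startswith("@"):
            problems.append((first, "header line does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((first + 2, "separator line does not start with '+'"))
        # Check 3: quality string as long as the sequence
        if len(seq) != len(qual):
            problems.append((count, "quality length differs from sequence length"))
        record = []
    # Check 1: leftover lines mean the total is not a multiple of 4
    # (a trailing blank line is reported here as well)
    if record:
        problems.append((count, "line count is not a multiple of 4"))
    return problems
```

It could be run as `check_fastq(open("myfile.fastq"))`; an empty list means none of the three checks failed. Because it reads one record at a time, it should also cope with large files.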

**GenoMax** · 08-19-2013, 09:21 AM

Looking at the sequence identifiers, I wonder if this is old data from a GAII machine. If so, it is likely in the older Illumina (1.3) FASTQ format, and you may need to add the relevant TopHat option to take that into account.

From the TopHat manual:

--solexa-quals Use the Solexa scale for quality values in FASTQ files.
--solexa1.3-quals As of the Illumina GA pipeline version 1.3, quality scores are encoded in Phred-scaled base-64. Use this option for FASTQ files from pipeline 1.3 or later.
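
To decide which of these options (if any) applies, one common heuristic is to look at the lowest ASCII code used in the quality lines. The sketch below assumes the usual conventions: Phred+33 quality characters start at "!" (33), Solexa+64 at ";" (59), and Illumina 1.3+ Phred+64 at "@" (64). The function name `guess_quality_encoding` and its return labels are hypothetical, and the result is only a guess: a file containing nothing but high-quality Phred+33 scores could be misclassified.

```python
def guess_quality_encoding(quality_lines):
    """Guess the FASTQ quality encoding from the lowest quality character.

    Heuristic only; the name and the return labels are illustrative.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        raise ValueError("no quality characters supplied")
    lowest = min(codes)
    if lowest < 59:
        return "phred33"   # Sanger / Illumina 1.8+: no extra TopHat option
    if lowest < 64:
        return "solexa64"  # old Solexa scale: candidate for --solexa-quals
    return "phred64"       # Illumina 1.3+/1.5+: candidate for --solexa1.3-quals
```

Under this heuristic, any quality line that contains "!" (ASCII 33), as the one quoted in this thread does, would be classified as Phred+33, so it is worth checking a sample of records before adding either option.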

**manvendra7** · 09-30-2013, 03:50 PM

Thanks guys,
My problem is solved. There was a problem with my FASTQ file.

**arkilis** · 09-30-2013, 03:55 PM

Originally posted by manvendra7

Dear Folks,
I am new to this, an early-stage researcher.

I am using TopHat2 to map the reads. I believe I am meeting all the requirements; my command is:

/usr/local/bin/tophat2 -p 8 -G ~/path/to/Homo_sapiens.GRCh37.72.gtf \
  -o ~/path/to/Human_mapping_iPS_s7_rep1 \
  --splice-mismatches 1 --max-multihits 30 --microexon-search --fusion-search \
  ~/path/to/bowtie2_index/hg19 \
  ~/path/to/myfile.fastq

I am submitting it on a Grid Engine cluster with qsub -l h_vmem=50G [above_script], and it fails with this error:
"TopHat requires all reads be either FASTQ or FASTA. Mixing formats is not supported"

I am a bit frustrated because my FASTQ files look fine to me, as shown below:

@SOLEXA-GA05_00009_SRi_AD_MS_BN_VW:7:1:2364:933#ATGAGCA
NGGCCTTCCCACATTCTTTACACTCATAGGTTTTCTCACCAGTGTGAGTTCTCTTGTGCACAATAAGGTAAGAGCC
+SOLEXA-GA05_00009_SRi_AD_MS_BN_VW:7:1:2364:933#ATGAGCA
!454478347;09977778<655476;69;8588380745<75;57495945158::=677976:7674:64763-

Please help!

As far as I know, there are different versions of the FASTQ format. You had better check which one the software requires.

FASTQ: three main Illumina versions, 1.3+, 1.5+ and 1.8+

Where is the real problem: tophat2 code or my fastq files?
