SEQanswers

06-18-2011, 09:13 PM   #1
atgc
Junior Member
 
Location: California

Join Date: Aug 2010
Posts: 6
Mapping and base calling

Hello,

I recently received sequencing data from Illumina's HiSeq. The reads are 100 bp, paired end. The data has been provided to us in fastq format.

I have worked with paired-end (PE) sequencing data before, but back then the data was provided in a .txt format, which we imported into CLC Genomics Workbench. Does anyone know whether fastq files can be imported into CLC Genomics Workbench?
Thanks for your help in advance.

ATGC
06-19-2011, 03:38 AM   #2
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628

It works just like importing the qseq files (which is probably what you imported before as .txt). Just select fastq as the import format.
06-19-2011, 09:45 AM   #3
atgc
Junior Member
 
Location: California

Join Date: Aug 2010
Posts: 6

Thank you.
06-19-2011, 02:54 PM   #4
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245

First of all, the 'older' .txt format was probably also .fastQ: CASAVA writes out .txt files that are basically .fastQ files.

Anyway, the best way to import Illumina paired-end data is via File -> Import High-throughput sequencing data -> Illumina. There, select both files of the read pair and choose your file format (.fastQ). See this link from CLC:
http://www.clcbio.com/index.php?id=1..._Illumina.html

Illumina HiSeq data can be treated the same as Illumina Genome Analyzer data.
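Since the point above is that the .txt output is "basically" FASTQ: each FASTQ record is four lines (an @ header, the bases, a + separator line, and one quality character per base). Below is a minimal Python reader, a sketch assuming unwrapped four-line records; `read_fastq`, `open_fastq` and `FastqRecord` are hypothetical helper names for illustration, not part of CASAVA or CLC.

```python
import gzip
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # line 1, starts with '@'
    sequence: str  # line 2, the bases
    quality: str   # line 4, one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a stream of unwrapped four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header, sequence, quality)


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed (e.g. *.fastq.gz) FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__":
    # Two tiny in-memory records, just to show the shape of the data.
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGNA\n+\n##!#\n")
    for rec in read_fastq(demo):
        print(rec.header, rec.sequence, rec.quality)
```

The reader does not interpret the quality characters. Older (pre-1.8) CASAVA output generally uses a Phred+64 offset while CASAVA 1.8 uses Phred+33, so the quality encoding is worth checking when switching between the two.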
06-19-2011, 08:57 PM   #5
atgc
Junior Member
 
Location: California

Join Date: Aug 2010
Posts: 6

Initially, the paired-end data I was given consisted of two files per sample: one for the forward reads and one for the reverse reads. This new data set, however, includes four or more files per sample. I don't understand why.
06-19-2011, 11:06 PM   #6
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628

Quote:
Originally Posted by atgc
Initially the paired end read data that was given to me was two files/ sample - a forward and a reverse read. However this new data set includes 4 or more files / sample. I don't understand why this is.
If the data was generated with CASAVA 1.8, this is because every fastq file CASAVA generates holds a fixed number of sequences (at most 16 million), except for the last one, which holds the remainder. So one lane of HiSeq data is almost always split into more than one read (fastq) file.

For example:
Code:
sample2_CGATGT_L003_R1_001.fastq.gz
sample2_CGATGT_L003_R2_001.fastq.gz
sample2_CGATGT_L003_R1_002.fastq.gz
sample2_CGATGT_L003_R2_002.fastq.gz
sample2_CGATGT_L003_R1_003.fastq.gz
sample2_CGATGT_L003_R2_003.fastq.gz
hth, Sven
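Going by this layout, an R1 chunk and the R2 chunk with the same number should hold the two mates of the same clusters, so the files need to be matched by sample, lane and chunk number before a paired import. Below is a minimal Python sketch of that matching, assuming every file follows the naming pattern visible in the listing above; `pair_chunks` and `CASAVA18_NAME` are hypothetical names for illustration, not part of CASAVA or CLC.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Pattern inferred from the example listing (an assumption, not a spec):
#   <sample>_<index>_L<lane>_R<read>_<chunk>.fastq[.gz]
CASAVA18_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<index>[^_]+)_L(?P<lane>\d+)"
    r"_R(?P<read>[12])_(?P<chunk>\d+)\.fastq(?:\.gz)?$"
)


def pair_chunks(filenames):
    """Group chunk files by (sample, index, lane) and pair R1/R2 chunk by chunk.

    Returns {(sample, index, lane): [(r1_file, r2_file), ...]} sorted by chunk number.
    """
    groups = defaultdict(dict)  # (sample, index, lane) -> {chunk: {read: filename}}
    for name in filenames:
        match = CASAVA18_NAME.match(PurePath(name).name)
        if match is None:
            raise ValueError(f"unrecognised file name: {name}")
        key = (match["sample"], match["index"], match["lane"])
        groups[key].setdefault(int(match["chunk"]), {})[match["read"]] = name

    paired = {}
    for key, chunks in groups.items():
        ordered = []
        for chunk in sorted(chunks):
            mates = chunks[chunk]
            if set(mates) != {"1", "2"}:
                raise ValueError(f"chunk {chunk:03d} of {key} is missing its R1 or R2 file")
            ordered.append((mates["1"], mates["2"]))
        paired[key] = ordered
    return paired


if __name__ == "__main__":
    listing = [
        "sample2_CGATGT_L003_R1_001.fastq.gz",
        "sample2_CGATGT_L003_R2_001.fastq.gz",
        "sample2_CGATGT_L003_R1_002.fastq.gz",
        "sample2_CGATGT_L003_R2_002.fastq.gz",
        "sample2_CGATGT_L003_R1_003.fastq.gz",
        "sample2_CGATGT_L003_R2_003.fastq.gz",
    ]
    for (sample, index, lane), pairs in pair_chunks(listing).items():
        print(sample, index, lane)
        for r1, r2 in pairs:
            print("  ", r1, "<->", r2)
```

Each R1/R2 pair can then be imported as one paired data set, or the R1 chunks can be concatenated in chunk order (and the R2 chunks in the same order) to get one file per mate.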
06-20-2011, 12:10 PM   #7
atgc
Junior Member
 
Location: California

Join Date: Aug 2010
Posts: 6

That helps. Thanks.
06-20-2011, 01:24 PM   #8
atgc
Junior Member
 
Location: California

Join Date: Aug 2010
Posts: 6

I am getting an error when I import fastq reads into the CLC Genomics Workbench.

The message reads as follows:

"The data seems to be corrupt or originates from different sources since the input files contain different number of reads. The previous sequences have been properly saved but it is highly likely the import data is incomplete or defective. Please double-check the sources."

I have two forward and two reverse fastq files. I tried importing only one of each, and I still get the error.

Does anyone here know what causes this error or what it means?

Thanks for your help in advance.

Last edited by atgc; 06-27-2011 at 03:01 PM.
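The error text says the two input files contain different numbers of reads. One way to check that outside CLC is to count the records in each mate file and compare read IDs position by position. Below is a minimal Python sketch, assuming plain or gzip-compressed four-line FASTQ; `check_pair` and `read_ids` are hypothetical helpers for illustration. A count difference or an early ID mismatch would suggest a truncated file or two files that are not mates (for example an R1_001 selected together with an R2_002).

```python
import gzip
import sys
from itertools import zip_longest


def open_text(path):
    """Open a plain or gzip-compressed FASTQ file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_ids(path):
    """Yield the ID of each FASTQ record: the header up to the first whitespace."""
    with open_text(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 0:
                continue  # in four-line FASTQ only lines 1, 5, 9, ... are headers
            if not line.startswith("@"):
                raise ValueError(f"{path}: line {line_no + 1} is not a FASTQ header")
            fields = line[1:].split()
            read_id = fields[0] if fields else ""
            # CASAVA 1.8 puts the mate number after a space (dropped by split());
            # older headers end in /1 or /2, which is stripped here.
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def check_pair(r1_path, r2_path):
    """Walk two mate files in parallel.

    Returns (records_in_r1, records_in_r2, first_mismatch), where first_mismatch
    is the 1-based record number of the first ID disagreement, or None.
    """
    n1 = n2 = 0
    first_mismatch = None
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))  # None pads the shorter file
    for record_no, (id1, id2) in enumerate(pairs, start=1):
        n1 += int(id1 is not None)
        n2 += int(id2 is not None)
        if first_mismatch is None and id1 != id2:
            first_mismatch = record_no
    return n1, n2, first_mismatch


if __name__ == "__main__" and len(sys.argv) == 3:
    n1, n2, mismatch = check_pair(sys.argv[1], sys.argv[2])
    print(f"R1 records: {n1}, R2 records: {n2}")
    if mismatch is not None:
        print(f"first record whose IDs differ (or that lacks a mate): #{mismatch}")
```

Run it as, for example, `python check_pair.py sample2_CGATGT_L003_R1_001.fastq.gz sample2_CGATGT_L003_R2_001.fastq.gz` (the script name is arbitrary).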
All times are GMT -8.