**sklages** · 11-04-2011, 01:42 AM

Originally posted by skruglyak View Post

Yes, there were strong opinions on both sides of the read naming issue. At the time, unaligned BAM was not supported input to the popular aligners. The format has been getting wider acceptance and I see the value of providing it as an option in the future.

What is the "other side" of "both sides"?

We are running three HiSeqs and a few GAs; reading and rewriting a few hundred gigabytes of compressed sequence data just to fix a deficient header is quite annoying IMHO.

I do agree SAM would be a nice option for data storage (it should probably not replace fastq yet, as many people still use fastq as input for their programs).
Is it really wise to use a binary (sequencing-specific) storage format like BAM? I don't know, just a bad feeling :-)

Strangely enough (and never mentioned) ... lots of IT folks would appreciate it if the "we create many, many files" madness were limited to some reasonable number.
1,629,325 files for a 2x120 run is far too many ...

just my 2p,
Sven

**afaghalavi** · 11-08-2011, 10:16 AM

Hello Dear Sir/Madam

We received our exome data and now I have 2 files (SNPs and indels) in text format.
I have copied and pasted part of one below. Please let me know what the next stage of data analysis is and what I should do. Can I use ANNOVAR for its analysis and annotation?

#$ COLUMNS seq_name pos bcalls_used bcalls_filt ref Q(snp) max_gt Q(max_gt) max_gt|poly_site Q(max_gt|poly_site) A_used C_used G_used T_used
chr1 12783 2 0 G 24 AA 5 AA 5 2 0 0 0
chr1 13057 3 1 G 3 GG 4 CG 31 0 1 2 0
chr1 13351 1 0 T 1 TT 10 GT 3 0 0 1 0
chr1 14673 2 0 G 32 CC 5 CC 5 0 2 0 0

Best
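For anyone reading a table like the one quoted above, a minimal Python sketch of a first step is to load the rows using the names on the `#$ COLUMNS` line. The function names and the `min_q=20` threshold are assumptions made for illustration, not something the data provider or any annotation tool prescribes, and converting the rows into the input format of an annotation tool is not shown.

```python
def parse_snp_table(lines):
    """Yield one dict per data row, keyed by the names on the '#$ COLUMNS' line."""
    columns = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#$ COLUMNS"):
            # Drop the two leading tokens '#$' and 'COLUMNS'.
            columns = line.split()[2:]
            continue
        if line.startswith("#"):
            continue  # any other comment line
        if columns is None:
            raise ValueError("data row seen before the '#$ COLUMNS' line")
        fields = line.split()
        if len(fields) != len(columns):
            raise ValueError("expected %d fields, got %d" % (len(columns), len(fields)))
        yield dict(zip(columns, fields))


def candidate_snps(rows, min_q=20):
    """Keep rows whose Q(snp) is at least min_q (hypothetical threshold)."""
    return [row for row in rows if int(row["Q(snp)"]) >= min_q]


# The rows quoted in the post above.
example = """\
#$ COLUMNS seq_name pos bcalls_used bcalls_filt ref Q(snp) max_gt Q(max_gt) max_gt|poly_site Q(max_gt|poly_site) A_used C_used G_used T_used
chr1 12783 2 0 G 24 AA 5 AA 5 2 0 0 0
chr1 13057 3 1 G 3 GG 4 CG 31 0 1 2 0
chr1 13351 1 0 T 1 TT 10 GT 3 0 0 1 0
chr1 14673 2 0 G 32 CC 5 CC 5 0 2 0 0
"""

rows = list(parse_snp_table(example.splitlines()))
kept = candidate_snps(rows)
for row in kept:
    print(row["seq_name"], row["pos"], row["ref"], row["max_gt"], row["Q(snp)"])
```

All values are kept as strings except where `candidate_snps` converts `Q(snp)` for the comparison, so nothing in the original table is silently altered.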

**Orr Shomroni** · 11-09-2011, 05:45 AM

Thanks for the tip on the filtering, dawe. Our previous filtering resulted in only headers for 'Y' reads and -- as the body, and apparently that wasn't much of an issue. Still, the new command makes it look cleaner.

One thing troubles me, though. I am trying to run the filtered files on FastQC, but I'm getting an error that the filtered fastq files are not in gz format. When I try to compress them, it says it cannot, because they are already in .gz format; when I try to decompress them, I get an error because the files are not GZIP files.

I imagine there should be an easy way to modify the extension for the filtered fastq file, but I am not sure how to do that within the "for" loop

**Orr Shomroni** · 11-09-2011, 07:42 AM

Ok, I solved the problem. Maybe I missed it, but this situation only applies if you are dealing with uncompressed fastq files to begin with. The filtering process necessarily returns an unzipped file, so the filename has to be adjusted and the file has to be compressed
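The symptom described above (a file named `.gz` that gzip refuses to decompress) can be checked directly by looking at the first two bytes of the file, which are `0x1f 0x8b` for any gzip stream. The following is a minimal Python sketch, not the command used in this thread; the helper names `is_gzip` and `ensure_gzipped` and the file name `sample_filtered.fastq.gz` are made up for illustration.

```python
import gzip
import os
import shutil
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file content starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_gzipped(path):
    """Return the path of a genuinely gzip-compressed version of `path`.

    A plain-text file that merely carries a .gz extension is renamed
    (extension stripped) and then compressed back under the .gz name.
    """
    if is_gzip(path):
        return path
    plain = path[:-3] if path.endswith(".gz") else path
    if plain != path:
        if os.path.exists(plain):
            raise FileExistsError(plain)  # refuse to overwrite another file
        os.rename(path, plain)
    target = plain + ".gz"
    with open(plain, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(plain)
    return target


# Demo on a throwaway file: uncompressed FASTQ text saved under a .gz name.
workdir = tempfile.mkdtemp()
mislabeled = os.path.join(workdir, "sample_filtered.fastq.gz")
record = b"@read1\nACGT\n+\nIIII\n"
with open(mislabeled, "wb") as handle:
    handle.write(record)

was_gzip_before = is_gzip(mislabeled)
fixed = ensure_gzipped(mislabeled)
with gzip.open(fixed, "rb") as handle:
    roundtrip = handle.read()
print(was_gzip_before, is_gzip(fixed))
```

Inside a loop over many filtered files, calling `ensure_gzipped` on each output path would leave already-compressed files untouched and repair only the mislabeled ones.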

**olus** · 12-13-2011, 11:12 AM

Originally posted by sparks View Post

Hi,
V1.8 has some extra fields:
<is filtered> is Y if the read is filtered, N otherwise.
<control number> is 0 when none of the control bits are on, otherwise it is an even number.
Does anyone know what these are for?
Is is_filtered reminiscent of QSEQ quality flag and if so does 'Y' mean high or low quality?

Colin

Hi Colin.
Did you find out what <control number> in the '@' FASTQ line is used for?

Apart from the brief definition in the official PDF, I couldn't find any explanation.

If anybody could give me some hints it would be really appreciated!

Gabriele

**sparks** · 12-13-2011, 05:56 PM

Hi Gabriele,
I never found out about the control bits. The is_filtered is a flag that Illumina sets if they think the read might be from a polyclonal cluster.
Colin

Originally posted by olus View Post

Hi Colin.
Did you find out what <control number> in the '@' FASTQ line is used for?

Apart from the brief definition in the official PDF, I couldn't find any explanation.

If anybody could give me some hints it would be really appreciated!

Gabriele

**olus** · 12-14-2011, 08:59 AM

Originally posted by sparks View Post

Hi Gabriele,
I never found out about the control bits. The is_filtered is a flag that Illumina sets if they think the read might be from a polyclonal cluster.
Colin

Thank you for your reply.
In the end I found some clues about what it could be.
It seems that the bit value is inherited from the .control files and stores information about a possible PhiX spike-in, barcode mismatches, etc.

Cheers

Gabriele

(look at OLB_UG_15009920C.pdf from illumina)
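To make the fields discussed above concrete, here is a minimal Python sketch that splits a CASAVA 1.8 style '@' line into its parts. It assumes the layout `@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index sequence>`; the `ReadHeader` type, the `parse_header` function and the example header are illustrative and not taken from a real run. The control number is only exposed as an integer, since this thread does not settle what the individual bits mean.

```python
from typing import NamedTuple


class ReadHeader(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read: int
    is_filtered: bool      # True when the field is 'Y' (read is filtered)
    control_number: int    # 0 when no control bits are on, otherwise even
    index_sequence: str


def parse_header(line):
    """Parse one '@' line of a CASAVA 1.8 style FASTQ record."""
    if not line.startswith("@"):
        raise ValueError("not a FASTQ header line: %r" % line)
    location, _, tags = line[1:].strip().partition(" ")
    loc = location.split(":")
    tag = tags.split(":")
    if len(loc) != 7 or len(tag) != 4:
        raise ValueError("unexpected number of fields: %r" % line)
    if tag[1] not in ("Y", "N"):
        raise ValueError("is_filtered must be Y or N: %r" % tag[1])
    return ReadHeader(
        instrument=loc[0],
        run_number=int(loc[1]),
        flowcell_id=loc[2],
        lane=int(loc[3]),
        tile=int(loc[4]),
        x_pos=int(loc[5]),
        y_pos=int(loc[6]),
        read=int(tag[0]),
        is_filtered=(tag[1] == "Y"),
        control_number=int(tag[2]),
        index_sequence=tag[3],
    )


# Illustrative header, not from a real run.
example = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
header = parse_header(example)
print(header.lane, header.is_filtered, header.control_number, header.index_sequence)
```

With the header parsed this way, dropping filtered reads or collecting reads with a non-zero control number becomes a simple test on `header.is_filtered` or `header.control_number`.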

**boetsie** · 05-14-2012, 03:41 AM

Hi all,

For our Illumina HiSeq2000 we use the phiX spike-in. However, after demultiplexing we see that around 0.05% of the produced reads can align to the phiX genome. We now have a script that filters out the reads/pairs that can align to the phiX genome (with Bowtie). This works OK, but we are wondering whether there is an automated way to do this within CASAVA, or whether there is some flag in the FASTQ header that indicates that a read comes from the phiX genome.

Regards,
Boetsie

**tahamasoodi** · 07-29-2012, 03:03 AM

I am using CASAVA 1.8.2 on a separate machine and trying to convert bcl files generated by a HiSeq 2000 to fastq, but I am getting an error message that the config.xml file does not exist at /usr/local/lib/CASAVA-1.8.2/perl/Casava/Demultiplex.pm line 111.
Can anyone please help me with this issue?
Thanks

**sklages** · 07-29-2012, 09:47 AM

Originally posted by tahamasoodi View Post

I am using CASAVA 1.8.2 on a separate machine and trying to convert bcl files generated by a HiSeq 2000 to fastq, but I am getting an error message that the config.xml file does not exist at /usr/local/lib/CASAVA-1.8.2/perl/Casava/Demultiplex.pm line 111.
Can anyone please help me with this issue?
Thanks

Hmm, it tells you that there is no config.xml file found within the run directory you have supplied. What is the command line you used for bcl conversion? Do you have access to the whole run and all of its files?

Sven

**tahamasoodi** · 07-29-2012, 10:12 AM

Hi Sklages,
Thanks a lot. I have used the following command
configureBclToFastq.pl --input-dir /home/tahashafi/NGS/illumina/Base_Calls/C1.1 --output-dir /home/tahashafi/Desktop/casava_example_dir --Sample-Sheet /home/tahashafi/NGS/illumina/Base_Calls/C1.1/SampleSheet.csv --mismatch=1 --force --use-bases-mask Y101,I7,Y101

No I don't have access to the whole run and its files. Actually I have copied only some bcl files from the instrument connected machine and now am trying to convert those files to fastq on a separate machine.

**sklages** · 07-29-2012, 10:41 AM

Originally posted by tahamasoodi View Post

Hi Sklages,
Thanks a lot. I have used the following command
configureBclToFastq.pl --input-dir /home/tahashafi/NGS/illumina/Base_Calls/C1.1 --output-dir /home/tahashafi/Desktop/casava_example_dir --Sample-Sheet /home/tahashafi/NGS/illumina/Base_Calls/C1.1/SampleSheet.csv --mismatch=1 --force --use-bases-mask Y101,I7,Y101

No I don't have access to the whole run and its files. Actually I have copied only some bcl files from the instrument connected machine and now am trying to convert those files to fastq on a separate machine.

Well, that's not enough. The software needs more than just the BCL files; e.g. it also needs config.xml (among others). You usually convert the whole flowcell from BCL to fastq, as most people want to have fastq files at the end. So if you generated the data yourself, just copy the whole run and do the conversion, or run the conversion on the machine where the complete data resides. If you got the data from a sequencing provider, just ask them to do the conversion for you (including demultiplexing).

hth, Sven
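A quick pre-flight check can catch this kind of partial copy before the conversion is configured. The Python sketch below only looks for `config.xml`, the one file named in the error message, and assumes it is expected in the directory passed as `--input-dir`; the function name `missing_run_files` is hypothetical, and the `required` tuple would need extending with whatever other files the conversion turns out to need.

```python
import os
import tempfile


def missing_run_files(input_dir, required=("config.xml",)):
    """Return the names from `required` that are not regular files in input_dir."""
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(input_dir, name))
    ]


# Demo on a throwaway directory that mimics a partial copy (BCL files only).
partial_copy = tempfile.mkdtemp()
before = missing_run_files(partial_copy)

# After the missing file is copied over, the check passes.
with open(os.path.join(partial_copy, "config.xml"), "w") as handle:
    handle.write("<placeholder/>\n")
after = missing_run_files(partial_copy)

print(before, after)
```

An empty list means every file in `required` is present; it says nothing about whether the rest of the run folder is complete.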

**tahamasoodi** · 07-30-2012, 10:02 PM

Originally posted by sklages View Post

Well, that's not enough. The software needs more than just the BCL files; e.g. it also needs config.xml (among others). You usually convert the whole flowcell from BCL to fastq, as most people want to have fastq files at the end. So if you generated the data yourself, just copy the whole run and do the conversion, or run the conversion on the machine where the complete data resides. If you got the data from a sequencing provider, just ask them to do the conversion for you (including demultiplexing).

hth, Sven

Thanks. Can I run it on a separate machine that is connected via LAN to the instrument machine holding the whole flowcell data? Copying from one machine to the other takes a long time.

**sklages** · 07-30-2012, 10:18 PM

Originally posted by tahamasoodi View Post

Thanks. Can I run it on a separate machine that is connected via LAN to the instrument machine holding the whole flowcell data? Copying from one machine to the other takes a long time.

Yes, that's possible. At least with NFS. Keep in mind that this will be slower than working from local storage, as all of the data still has to be read over the network.

Let us know if it worked for you.

Sven
