
  • #76
    Originally posted by skruglyak View Post
    Yes, there were strong opinions on both sides of the read naming issue. At the time, unaligned BAM was not supported input to the popular aligners. The format has been getting wider acceptance and I see the value of providing it as an option in the future.
    What is the "other side" of "both sides"?

    We are running three HiSeqs and a few GAs; reading and rewriting a few hundred gigabytes of compressed sequence data just to fix a deficient header is quite annoying IMHO.

    I do agree SAM would be a nice option for data storage (it should probably not replace fastq yet, many people do still use fastq as input for their programs).
    Whether it is very wise to use a binary (sequencing-specific) storage format like BAM ... I don't know, just a bad feeling :-)

    Strangely enough (and never mentioned) ... lots of IT folks would appreciate it if the "we create many, many files" madness were limited to some reasonable number.
    1,629,325 files for a 2x120 run is by far too much ...

    just my 2p,
    Sven
    Last edited by sklages; 11-04-2011, 05:18 AM. Reason: typos

    Comment


    • #77
      Hello Dear Sir/Madam

      We received our exome data and now I have 2 files (SNPs and indels) in text format.
      I have pasted part of one of them below. Please let me know what the next stage of the data analysis is and what I should do. Can I use ANNOVAR for analysis and annotation?

      #$ COLUMNS seq_name pos bcalls_used bcalls_filt ref Q(snp) max_gt Q(max_gt) max_gt|poly_site Q(max_gt|poly_site) A_used C_used G_used T_used
      chr1 12783 2 0 G 24 AA 5 AA 5 2 0 0 0
      chr1 13057 3 1 G 3 GG 4 CG 31 0 1 2 0
      chr1 13351 1 0 T 1 TT 10 GT 3 0 0 1 0
      chr1 14673 2 0 G 32 CC 5 CC 5 0 2 0 0


      Best
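As a starting point for inspecting a file like the excerpt above, here is a minimal sketch in Python that reads the `#$ COLUMNS` header and turns each data row into a dictionary keyed by column name. The function name `parse_snp_lines` is hypothetical, and the sketch assumes fields are whitespace-separated exactly as in the excerpt. It does not produce ANNOVAR input; it only makes the columns addressable by name.

```python
def parse_snp_lines(lines):
    """Parse lines of a SNP text file with a '#$ COLUMNS' header into dicts."""
    columns = None
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#$ COLUMNS"):
            # Everything after the '#$ COLUMNS' marker is a column name.
            columns = line.split()[2:]
            continue
        if line.startswith("#"):
            continue  # skip any other comment lines
        if columns is None:
            raise ValueError("no '#$ COLUMNS' header seen before data rows")
        fields = line.split()
        if len(fields) != len(columns):
            raise ValueError(
                f"expected {len(columns)} fields, got {len(fields)}: {line!r}"
            )
        records.append(dict(zip(columns, fields)))
    return records


# Two rows copied from the excerpt in the post.
example = """\
#$ COLUMNS seq_name pos bcalls_used bcalls_filt ref Q(snp) max_gt Q(max_gt) max_gt|poly_site Q(max_gt|poly_site) A_used C_used G_used T_used
chr1 12783 2 0 G 24 AA 5 AA 5 2 0 0 0
chr1 13057 3 1 G 3 GG 4 CG 31 0 1 2 0
""".splitlines()

for rec in parse_snp_lines(example):
    print(rec["seq_name"], rec["pos"], rec["ref"], rec["max_gt"], rec["Q(snp)"])
```

All values are kept as strings; converting `pos` or the `Q(...)` columns to integers is left to the caller.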

      Comment


      • #78
        Thanks for the tip on the filtering, dawe. Our previous filtering left only the headers for 'Y' reads, with -- as the body, and apparently that wasn't much of an issue. Still, the new command makes the output look cleaner.

        One thing troubles me, though. I am trying to run the filtered files on FastQC, but I'm getting an error that the filtered fastq files are not in gz format. When I try to compress them, it says it cannot, because they are already in .gz format; when I try to decompress them, I get an error because the files are not GZIP files.

        I imagine there should be an easy way to modify the extension of the filtered fastq file, but I am not sure how to do that within the "for" loop.
        "Though it may seem that all's been said and done, originality still lives on" - some unoriginal guy who had nothing better to write as his signature

        Comment


        • #79
          Ok, I solved the problem. Maybe I missed it, but this situation only applies if you are dealing with uncompressed fastq files to begin with. The filtering process necessarily writes an uncompressed file, so the filename has to be adjusted and the file has to be compressed afterwards.
          "Though it may seem that all's been said and done, originality still lives on" - some unoriginal guy who had nothing better to write as his signature
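The symptom described above (a file named `.gz` that gzip refuses to decompress) can be checked directly, because every gzip stream starts with the two bytes `1f 8b`. The following is a minimal sketch in Python, not the command used in the thread: the helper names `is_gzip` and `ensure_gzipped` and the file name `filtered.fastq.gz` are hypothetical.

```python
import gzip
import os
import shutil
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file content starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_gzipped(path):
    """Return a path to genuinely gzip-compressed data.

    Files that already contain gzip data are left untouched. Plain files are
    compressed, and '.gz' is appended to the name if it is not already there.
    """
    if is_gzip(path):
        return path
    target = path if path.endswith(".gz") else path + ".gz"
    tmp = target + ".tmp"
    # Compress into a temporary file first so the original is never half-written.
    with open(path, "rb") as src, gzip.open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if target != path:
        os.remove(path)
    os.replace(tmp, target)
    return target


# Demo: a plain-text FASTQ record saved under a misleading '.gz' name.
with tempfile.TemporaryDirectory() as workdir:
    mislabeled = os.path.join(workdir, "filtered.fastq.gz")
    with open(mislabeled, "w") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n")
    print("gzip before:", is_gzip(mislabeled))
    fixed = ensure_gzipped(mislabeled)
    print("gzip after:", is_gzip(fixed), "path:", os.path.basename(fixed))
```

Inside a loop over filtered files, calling `ensure_gzipped` on each output name would leave correctly compressed files alone and repair the mislabeled ones.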

          Comment


          • #80
            Originally posted by sparks View Post
            Hi,
            V1.8 has some extra fields:
            <is filtered> is Y if the read is filtered, N otherwise.
            <control number> is 0 when none of the control bits are on, otherwise it is an even number.
            Does anyone know what these are for?
            Is is_filtered reminiscent of QSEQ quality flag and if so does 'Y' mean high or low quality?

            Colin
            Hi Colin.
            Did you find out what <control number> in the '@' FASTQ line is used for?

            Apart from the brief definition in the official PDF, I couldn't find any explanation.

            If anybody could give me some hints it would be really appreciated!

            Gabriele
            gabriele bucci

            Comment


            • #81
              Hi Gabriele,
              I never found out about the control bits. The is_filtered is a flag that Illumina sets if they think the read might be from a polyclonal cluster.
              Colin
              Originally posted by olus View Post
              Hi Colin.
              Did you find out what <control number> in the '@' FASTQ line is used for?

              Apart from the brief definition in the official PDF, I couldn't find any explanation.

              If anybody could give me some hints it would be really appreciated!

              Gabriele
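For anyone who wants to look at the `<is filtered>` and `<control number>` fields in their own data, here is a minimal Python sketch that splits a CASAVA 1.8 style read identifier into named fields. It assumes the layout `@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index sequence>`; the names `ReadId` and `parse_casava18_header` are hypothetical, and the example identifier is illustrative rather than taken from this thread. The sketch does not try to interpret individual control bits.

```python
from typing import NamedTuple


class ReadId(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read: int
    is_filtered: bool     # True when the header carries 'Y'
    control_number: int   # 0 when no control bits are set
    index_sequence: str


def parse_casava18_header(header: str) -> ReadId:
    """Split a CASAVA 1.8 style '@' line into its eleven fields."""
    if not header.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    # The identifier has two colon-separated groups divided by one space.
    location, _, flags = header[1:].strip().partition(" ")
    loc = location.split(":")
    flg = flags.split(":")
    if len(loc) != 7 or len(flg) != 4:
        raise ValueError(f"not a CASAVA 1.8 style header: {header!r}")
    return ReadId(
        instrument=loc[0],
        run_number=int(loc[1]),
        flowcell_id=loc[2],
        lane=int(loc[3]),
        tile=int(loc[4]),
        x_pos=int(loc[5]),
        y_pos=int(loc[6]),
        read=int(flg[0]),
        is_filtered=(flg[1] == "Y"),
        control_number=int(flg[2]),
        index_sequence=flg[3],
    )


rid = parse_casava18_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")
print(rid.is_filtered, rid.control_number, rid.index_sequence)
```

Tallying `control_number` values over a whole file is one way to see which values actually occur in a run before trying to work out what they mean.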

              Comment


              • #82
                Originally posted by sparks View Post
                Hi Gabriele,
                I never found out about the control bits. The is_filtered is a flag that Illumina sets if they think the read might be from a polyclonal cluster.
                Colin
                Thank you for your reply.
                In the end I found some clues about what it could be.
                It seems that the bit value is inherited from the .control files and stores information about a possible PhiX spike-in, barcode mismatches, etc.:

                Cheers

                Gabriele



                (look at OLB_UG_15009920C.pdf from illumina)
                gabriele bucci

                Comment


                • #83
                  Hi all,

                  For our Illumina HiSeq2000 we use the phiX spike-in. However, we see after demultiplexing that around 0.05% of the produced reads can be aligned to the phiX genome. We now have a script that filters out the reads/pairs that align to the phiX genome (with Bowtie). This works OK, but we are wondering whether there is an automated way to do this within CASAVA, or whether there is a flag in the FASTQ header that indicates that a read comes from the phiX genome.

                  Regards,
                  Boetsie
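The alignment-based filtering described above can be sketched in Python as two steps: collect the names of reads that aligned to phiX from the aligner's SAM output, then drop those records from the FASTQ. This is only an illustration under stated assumptions, not Boetsie's script: the function names are hypothetical, the SAM is assumed to come from aligning against a phiX-only reference, and read names are assumed to match the first whitespace-delimited token of the FASTQ header.

```python
def mapped_read_names(sam_lines):
    """Return the names of reads whose SAM FLAG does not have the unmapped bit (0x4)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        if (int(fields[1]) & 0x4) == 0:
            names.add(fields[0])
    return names


def drop_reads(fastq_lines, names_to_drop):
    """Yield the lines of every 4-line FASTQ record whose name is not in names_to_drop."""
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            # Read name = header without '@', up to the first whitespace.
            name = record[0][1:].split()[0]
            if name not in names_to_drop:
                yield from record
            record = []


# Tiny made-up example: r1 aligned to phiX, r2 did not.
sam = [
    "@SQ\tSN:phiX\tLN:5386\n",
    "r1\t0\tphiX\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n",
]
fastq = [
    "@r1 1:N:0:ATCACG\n", "ACGT\n", "+\n", "IIII\n",
    "@r2 1:N:0:ATCACG\n", "TTTT\n", "+\n", "IIII\n",
]

phix_names = mapped_read_names(sam)
kept = list(drop_reads(fastq, phix_names))
print("".join(kept), end="")
```

Because both mates of a pair share the same first token in the header, applying the same name set to the read 1 and read 2 files removes a pair whenever either mate aligned.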

                  Comment


                  • #84
                    I am using CASAVA 1.8.2 on a separate machine and trying to convert bcl files generated by Hiseq 2000 to fastq but I am getting an error message that config.xml file does not exist at /usr/local/lib/CASAVA-1.8.2/perl/Casava/Demultiplex.pm line 111.
                    Can anyone please help me with this issue?
                    Thanks,

                    Comment


                    • #85
                      Originally posted by tahamasoodi View Post
                      I am using CASAVA 1.8.2 on a separate machine and trying to convert bcl files generated by Hiseq 2000 to fastq but I am getting an error message that config.xml file does not exist at /usr/local/lib/CASAVA-1.8.2/perl/Casava/Demultiplex.pm line 111.
                      Can anyone please help me with this issue?
                      Thanks
                      Hmm, it tells you that there is no config.xml file found within the run directory you have supplied. What is the command line you used for bcl conversion? Do you have access to the whole run and all of its files?

                      Sven

                      Comment


                      • #86
                        Hi Sklages,
                        Thanks a lot. I have used the following command
                        configureBclToFastq.pl --input-dir /home/tahashafi/NGS/illumina/Base_Calls/C1.1 --output-dir /home/tahashafi/Desktop/casava_example_dir --Sample-Sheet /home/tahashafi/NGS/illumina/Base_Calls/C1.1/SampleSheet.csv --mismatch=1 --force --use-bases-mask Y101,I7,Y101

                        No, I don't have access to the whole run and its files. Actually, I have copied only some bcl files from the machine connected to the instrument and am now trying to convert those files to fastq on a separate machine.
                        Thanks,

                        Comment


                          • #88
                            Originally posted by tahamasoodi View Post
                            Hi Sklages,
                            Thanks a lot. I have used the following command
                            configureBclToFastq.pl --input-dir /home/tahashafi/NGS/illumina/Base_Calls/C1.1 --output-dir /home/tahashafi/Desktop/casava_example_dir --Sample-Sheet /home/tahashafi/NGS/illumina/Base_Calls/C1.1/SampleSheet.csv --mismatch=1 --force --use-bases-mask Y101,I7,Y101

                            No, I don't have access to the whole run and its files. Actually, I have copied only some bcl files from the machine connected to the instrument and am now trying to convert those files to fastq on a separate machine.
                            Well, that's not enough. The software needs more than just the BCL files; e.g. it also needs config.xml (among others). You usually convert the whole flowcell from BCL to fastq, as most people want to have fastq files in the end. So if you generated the data yourself, just copy the whole run and do the conversion, or run the conversion on the machine where the complete data resides. If you got the data from a sequencing provider, just ask them to do the conversion for you (including demultiplexing).


                            hth, Sven
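A quick way to catch this before launching the conversion is to check the input directory for the files the software expects. The sketch below is a minimal Python example; `missing_files` is a hypothetical helper, and the only file name taken from this thread is `config.xml`. Any other names passed via `required` are the caller's assumption about what the installed CASAVA version needs.

```python
import tempfile
from pathlib import Path


def missing_files(input_dir, required=("config.xml",)):
    """Return the names from `required` that are not regular files in input_dir."""
    base = Path(input_dir)
    return [name for name in required if not (base / name).is_file()]


# Demo with a throwaway directory that holds only a BCL file,
# mimicking a partial copy of a run.
with tempfile.TemporaryDirectory() as workdir:
    (Path(workdir) / "s_1_1101.bcl").touch()   # hypothetical file name
    print("missing:", missing_files(workdir))

    (Path(workdir) / "config.xml").touch()
    print("missing:", missing_files(workdir))
```

Running such a check against the directory given to `--input-dir` shows right away whether a partial copy is the cause of the error.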

                            Comment


                            • #89
                              Originally posted by sklages View Post
                              Well, that's not enough. The software needs more than just the BCL files; e.g. it also needs config.xml (among others). You usually convert the whole flowcell from BCL to fastq, as most people want to have fastq files in the end. So if you generated the data yourself, just copy the whole run and do the conversion, or run the conversion on the machine where the complete data resides. If you got the data from a sequencing provider, just ask them to do the conversion for you (including demultiplexing).


                              hth, Sven

                              Thanks. Can I run it on a separate machine that is connected via LAN to the instrument machine where I have the whole flowcell data? Copying from one machine to the other takes a long time.
                              Thanks,

                              Comment


                              • #90
                                Originally posted by tahamasoodi View Post
                                Thanks. Can I run it on a separate machine that is connected via LAN to the instrument machine where I have the whole flowcell data? Copying from one machine to the other takes a long time.
                                Yes, that's possible, at least with NFS. Keep in mind that this is slower than working from local storage, as all of the data still needs to be read.

                                Let us know if it worked for you.

                                Sven

                                Comment
