  • Scaffolds without paired-end reads?

    Hello all,
    I have data from SOLiD and from 454; unfortunately, both datasets are single-end.
    Is it possible to create scaffolds from contigs assembled from single-end reads?

  • #2
    Anyone?
    I have read some articles about short-read assembly, and I noticed that after the first assembly step they all use mate-pair or paired-end reads to build scaffolds or supercontigs. Does anyone know if it's possible without paired-end reads?



    • #3
      Hi John,

      It is impossible to make scaffolds from contigs using single-read sequences. You need paired-end or mate-pair data for this.

      See these threads:
      Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

      and
      Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


      Hope this helps,
      Boetsie



      • #4
        Thanks for the info.

        I'm a software engineering student, and for my final project I need to assemble a whole genome from SOLiD and 454 reads. After trying a few assemblers such as Velvet and Newbler, my largest contig was 160 kb.

        No matter what other software I try, and I have tried a lot, I can't get any larger contigs.

        So you're saying it's impossible to assemble a whole genome from the data I have (which is not paired-end)?



        • #5
          Strictly speaking it's not assembly, but if you have a closely related reference genome you can try to map your contigs to it (BLAST?) to gain extra information. The GenomeGraphs package in R can be useful for visualising mapped contigs.

          160 kb isn't too bad for many genomes (especially repeat-rich ones). I hope your supervisors aren't expecting a finished genome, as that's not realistic.
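As a rough illustration of the mapping idea above, here is a minimal Python sketch that orders contigs along a related reference once they have been aligned to it. It assumes you have already reduced each contig to a single best placement (for example from BLAST hits); the `ContigHit` record and `order_contigs` function are hypothetical names, not part of any package, and ordering by reference coordinates only makes sense where your genome is colinear with the reference.

```python
from dataclasses import dataclass


@dataclass
class ContigHit:
    """Best placement of one contig on the reference (hypothetical input format)."""
    name: str
    ref_start: int  # 0-based start of the alignment on the reference
    ref_end: int    # end of the alignment on the reference (exclusive)
    strand: str     # "+" if the contig aligns forward, "-" if reverse-complemented


def order_contigs(hits):
    """Order contigs by reference position and estimate the gap before each one.

    Returns a list of (name, strand, gap_before) tuples. gap_before is the
    reference distance from the furthest end reached so far to this contig's
    start; it is 0 for the first contig and negative when placements overlap.
    """
    result = []
    furthest_end = None
    for hit in sorted(hits, key=lambda h: h.ref_start):
        gap = 0 if furthest_end is None else hit.ref_start - furthest_end
        result.append((hit.name, hit.strand, gap))
        furthest_end = hit.ref_end if furthest_end is None else max(furthest_end, hit.ref_end)
    return result
```

For example, hits for ctg1 (0-4500), ctg2 (5000-9000, reverse strand) and ctg3 (9200-15000) come back as ctg1, ctg2, ctg3 with estimated gaps of 0, 500 and 200 bases. A negative gap means two placements overlap, which is worth inspecting rather than trusting.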



          • #6
            The problem with single-read sequences is that they cannot resolve repeats, because it is impossible to know where each copy of the repeat should be placed.

            Say you have a repeat R that is present twice in the genome. I call the neighbouring sequences of the first occurrence A and B, and the neighbours of the second occurrence C and D:

            A->R->B
            C->R->D

            With de novo assembly, it is impossible to tell whether the sequence should be A->R->B or A->R->D, unless the repeat is shorter than your longest (454) reads, so that a single read can span the whole repeat and reach unique sequence on both sides.
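To make the read-length condition concrete: a single read only resolves A->R->B if it contains all of R plus some unique sequence on both sides. The Python sketch below checks exactly that on toy strings; `spans_repeat`, `min_spanning_length` and the `anchor` parameter (how many flanking bases are needed to tell the flanks of different repeat copies apart) are illustrative assumptions, and real assemblers need far more than one anchoring base.

```python
def spans_repeat(read, left_flank, repeat, right_flank, anchor=1):
    """Return True if `read` contains the whole repeat plus `anchor` flanking bases on each side.

    Only such a read shows that left_flank and right_flank are joined through
    this copy of the repeat. `anchor` must be long enough to tell the flanks of
    different repeat copies apart; one base is a toy minimum, not a realistic value.
    """
    if anchor < 1:
        raise ValueError("anchor must be at least 1")
    junction = left_flank[-anchor:] + repeat + right_flank[:anchor]
    return junction in read


def min_spanning_length(repeat, anchor=1):
    """Shortest read that could possibly span `repeat` with `anchor` bases on each side."""
    return len(repeat) + 2 * anchor


# Toy sequences for the example above: A->R->B and C->R->D (C is not needed here).
A, R, B, D = "GATTACA", "CCCCGGGG", "TTTAAA", "AGAGAG"
read = "TACACCCCGGGGTTT"  # a 15-base read taken from A+R+B

print(spans_repeat(read, A, R, B))  # True: this read links A to B through R
print(spans_repeat(read, A, R, D))  # False: it says nothing about A->R->D
print(min_spanning_length(R))       # 10
```

With an 8-base repeat and a one-base anchor, no read shorter than 10 bases can ever settle the question, whatever the coverage.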

            With paired data, you can infer that A and B belong together when one read of a pair falls on contig A and its mate falls on contig B.
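The pairing logic can be sketched as link counting: every read pair whose mates map to different contigs adds one vote for joining those contigs. This minimal Python version assumes you already know which contig each mate maps to (the input format is hypothetical) and ignores mate orientation and insert size, which real scaffolders typically also use to order, orient and space the contigs.

```python
from collections import Counter


def count_links(pair_placements, min_links=2):
    """Count read pairs whose two mates land on different contigs.

    pair_placements: iterable of (contig_of_mate1, contig_of_mate2) tuples,
    one per mapped pair. Returns {(contig_x, contig_y): n_pairs} for contig
    pairs supported by at least `min_links` read pairs.
    """
    links = Counter()
    for contig1, contig2 in pair_placements:
        if contig1 == contig2:
            continue  # both mates on the same contig: no scaffolding information
        edge = tuple(sorted((contig1, contig2)))  # treat A-B and B-A as one link
        links[edge] += 1
    # Keep only joins supported by enough independent pairs.
    return {edge: n for edge, n in links.items() if n >= min_links}


pairs = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "D"), ("D", "C"), ("A", "D"), ("A", "A")]
print(count_links(pairs))  # {('A', 'B'): 3, ('C', 'D'): 2}
```

Here the single stray A-D pair falls below the threshold, so only A-B and C-D would be proposed as joins; the default of two links is an arbitrary illustrative value.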

            For more information, see this website:



            As colindaven says, 160 kb is quite good. I don't think you should try to improve the assembly further.

            Boetsie



            • #7
              Colindaven and Boetsie, thanks for the help.

              Colindaven, do you mean I should try to find a similar genome online with BLAST? Or should I try to align the first dataset I have against the other?



              • #8
                Hi John,

                No problem, that's what this forum is for.

                About your question: if you have a reference genome (for example, if you have E. coli reads, your reference would be an E. coli genome), do a reference assembly.

                If you don't have a reference genome it is a bit harder. You can try to BLAST your contigs, see whether you hit a closely related genome, and use that genome as the reference for a reference assembly. However, I'm not very familiar with this.
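For the BLAST route, one crude way to pick a candidate reference is to add up how much of your contigs aligns to each subject sequence. The sketch below assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) from searching your contigs against a database of candidate genomes; `best_reference` and the e-value cutoff are illustrative choices, and summing HSP lengths double-counts overlapping hits, so treat the ranking as a hint only.

```python
from collections import defaultdict

# Default columns of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_reference(lines, max_evalue=1e-10):
    """Rank subject sequences by total aligned length of significant hits.

    `lines` are rows of -outfmt 6 output from BLASTing your contigs against
    candidate genomes. Overlapping HSPs are simply added up, so the totals are
    a rough ranking signal, not genome coverage.
    """
    aligned = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < len(OUTFMT6_FIELDS):
            raise ValueError(f"expected 12 tab-separated columns, got {len(fields)}")
        hit = dict(zip(OUTFMT6_FIELDS, fields))
        if float(hit["evalue"]) <= max_evalue:
            aligned[hit["sseqid"]] += int(hit["length"])
    # Largest total aligned length first.
    return sorted(aligned.items(), key=lambda item: item[1], reverse=True)
```

The top subject is only a candidate; it is still worth checking the percent identity (`pident`) before using it for a reference assembly.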

                To do a reference assembly, take a look at the software packages at:

                Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

                or


                Also, if you have a reference genome, try to map the contigs using this tool:



                Go to Assembly -> Contig Aligner and see how they map.

                Good luck.
                Boetsie



                • #9
                  Hi everyone, I am probably asking a very basic question. I have a FASTQ file from an Ion Torrent PGM sequencer and do not know whether the data are single-read or paired-end. How can I check that? I want to know if I can produce scaffolds with this data.

                  Thank you

                  Kind Regards



                  • #10
                    If you have a related genome to serve as a reference you might consider using the bambus2 scaffolder.



                    • #11
                      Hi, but can I tell from the FASTQ file itself whether the reads are single-end or paired-end? How do I know that?



                      • #12
                        I was responding to the question on scaffolding single-end data. For your question, I've not used Ion Torrent, but if its output is similar to Illumina's, paired-end data comes in two files, whereas single-end data comes in a single file. The pairs in the two files are listed in the same order and have the same name up to the part that designates read 1 or read 2. It's possible, however, that paired-end reads come shuffled (interleaved) in a single file, in which case I would expect the first and second records, the third and fourth records, etc., to be pairs sharing the same base ID.
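The naming rules described above can be turned into a quick check. This Python sketch assumes mates are named either with a trailing `/1` and `/2` or with an identical first word followed by a comment (as in newer Illumina headers); `base_id`, `files_look_paired` and `looks_interleaved` are hypothetical helpers, and Ion Torrent read names may follow neither convention.

```python
import re

_MATE_SUFFIX = re.compile(r"/[12]$")  # old-style Illumina mate marker


def base_id(header):
    """Read name without '@', without any comment after whitespace, and without /1 or /2."""
    name = header[1:] if header.startswith("@") else header
    words = name.split()
    return _MATE_SUFFIX.sub("", words[0]) if words else ""


def files_look_paired(headers_1, headers_2):
    """Two files: every record must have a mate with the same base ID at the same position."""
    h1, h2 = list(headers_1), list(headers_2)
    return bool(h1) and len(h1) == len(h2) and all(
        base_id(a) == base_id(b) for a, b in zip(h1, h2))


def looks_interleaved(headers):
    """One file: records 1 and 2, 3 and 4, ... must share a base ID."""
    h = list(headers)
    if not h or len(h) % 2:
        return False
    return all(base_id(h[i]) == base_id(h[i + 1]) for i in range(0, len(h), 2))
```

Checking a few thousand headers is usually enough; a single-end file should fail both tests because consecutive records carry unrelated names.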



                        • #13
                          Hi, thanks. I have a single FASTQ file, and the first record looks like this:

                          @VW27N:4:11
                          ATGAAACGCCGATTATCTTTAGCAATAACATTGTTGGCCGAACCGGAATTAATCATATTAGATGAACCAACTGTAGGCATTGACCTAAATTGCGCCAACAAATATGGCAACAGTTCAAGCAAATGACCAAAGACGGAAAGAGTGTCGTCATCACAACACATGTTATGGATGAGGCGGAACGTTGTGATAAAGTTGGACTTATTGTCGA
                          +
                          CCCCC?CDE@EEE?CC@@@;@AEE?DD>@@C?CD>C>C>CC@E@E@C@C?C?CCCCCE@E686;<5;;C=CCCD==8?A=9;2.(.5:,<49;?C:B;ABE9CCCD=AA@CDDD5;;5;;;?6=BCC=CD<CCCC=CC9>>>>>=DDA<;;BBCD=CC666DD8=>D<A@D==4<>/606@=9?@===CC7@C;C266D=CC=DD?

                          Could anyone give me a clue? Thanks again.



                          • #14
                            Given that the header line
                            @VW27N:4:11
                            lists nothing that could be construed as a "1" or "2" designation, it seems very likely that you have single-end data. Just to be sure, check that the second record in this file is not also named @VW27N:4:11.
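That check can be automated by reading the first few records. The Python sketch below assumes the plain four-line FASTQ layout shown in #13 (no wrapped sequence lines); `read_fastq` and `layout_hints` are illustrative names. It reports whether the first two records share a name and how many names carry a `/1` or `/2` suffix; both coming back negative is consistent with single-end data, though not proof of it.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return  # clean end of input
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record starting {record[0]!r}")
        header, seq, _, qual = record
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ for {header!r}")
        yield header[1:], seq, qual


def layout_hints(lines, max_records=1000):
    """Summarise read names from the first `max_records` records."""
    names = []
    for name, _, _ in islice(read_fastq(lines), max_records):
        words = name.split()
        names.append(words[0] if words else "")  # ignore any comment after the name
    return {
        "n_records": len(names),
        "first_two_share_name": len(names) >= 2 and names[0] == names[1],
        "mate_suffixes": sum(n.endswith(("/1", "/2")) for n in names),
    }
```

Passing an open file handle works because iterating over it yields lines, e.g. `layout_hints(open(path))` with `path` pointing at your FASTQ file.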



                            • #15
                              Paired-end data for Ion Torrent is rare, so it is unlikely you have it. Also, I believe the insert size is usually still close to the typical read length, so in many cases you can get higher-quality fused reads, but they won't be much help for scaffolding.

                              Short-insert libraries don't tend to give you much scaffolding information anyway.
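To put numbers on the fused-read point: with reads of length L from an insert of size I, where L <= I < 2L, the two mates overlap by 2L - I bases. For example, 20-base reads from a 30-base insert overlap by 10 bases and fuse into one 30-base read, which is no longer than the insert itself and so says nothing about sequence beyond it. The Python sketch below is a simplified merge under those assumptions, not the algorithm of any particular tool: it reverse-complements read 2, looks for the longest suffix/prefix overlap within a mismatch limit, and keeps read 1's base wherever the two disagree. `merge_pair` and its parameters are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Fuse a read pair whose insert is shorter than the two reads combined.

    Assumes the insert is at least as long as each read (no adapter read-through).
    Returns the fused sequence, or None if no overlap of at least `min_overlap`
    bases stays within the mismatch limit.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rc2 = revcomp(read2)  # put read 2 in read 1's orientation
    # Try the longest possible overlap first, then shorter ones.
    for overlap in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], rc2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            return read1 + rc2[overlap:]  # read 1 wins where the reads disagree
    return None


fragment = "GATTACAGGCTTCAGCATGACCGTAAGTCT"  # 30-base toy insert
read1 = fragment[:20]                        # forward read
read2 = revcomp(fragment[10:])               # reverse read, as sequenced
print(merge_pair(read1, read2) == fragment)  # True
```

Real mergers usually also use base qualities to choose between disagreeing bases; this sketch deliberately leaves that out.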
