SEQanswers

09-01-2012, 06:36 AM   #1
amango
Member
 
Location: New York

Join Date: Dec 2009
Posts: 17
extracting predicted gene from scaffold: end position precedes start position

I am trying to extract sequences for a list of predicted genes from genomic scaffolds. The list of predicted genes, with scaffold IDs, start and end positions, and other information, comes from published supplementary data. My script to extract the sequences doesn't work because, for some genes, the start position is a larger number than the end position (the fourth-to-last and third-to-last columns below). Here is an example (numbers have been changed from the original):
Quote:
geneID Gene_family Class ScaffoldID start_position end_position Number_of_exons Annotation_status
CSP1 cs Protein candidate gi|294506227|gb|GL650210.1| 61498 52100 2 intact
CSP10 cs Protein candidate gi|294507212|gb|GL649715.1| 293074 297989 2 intact
CSP2 cs Protein candidate gi|294507210|gb|GL650017.1| 234944 236074 2 intact
CSP3 cs Protein candidate gi|294507295|gb|GL649612.1| 323100 323743 2 intact
CSP4 cs Protein candidate gi|294506227|gb|GL650210.1| 41911 40888 2 intact
CSP5 cs Protein candidate gi|294507205|gb|GL649712.1| 274408 272617 2 intact
I am new to working with annotated genomes. Does it make sense that some "starts" come after the "ends"? Is this because the ORF for this gene is on the opposite strand of the scaffold? If so, and if I want to obtain that sequence, what's the best way to get it? Should I extract the sequence in the scaffold between the two numbers and then take the reverse complement?

Thanks for any pointers.
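The approach proposed in the question (take the region between the two coordinates, then reverse-complement it when start > end) can be sketched in Python as below. This is a minimal illustration, not the poster's actual script. It assumes the coordinates are 1-based and inclusive (common in published annotation tables, but worth checking against the paper's methods) and that the scaffold sequence has already been loaded as a plain string, for example from the scaffold FASTA file. The function names `reverse_complement` and `extract_gene` are hypothetical. Also note that if start and end are the outer gene boundaries, then for a two-exon gene the extracted span is the genomic locus including the intron, not the spliced coding sequence.

```python
# Hypothetical helpers; coordinates are ASSUMED to be 1-based and inclusive.

# Translation table for complementing DNA (upper and lower case, N kept as N).
# Other IUPAC ambiguity codes are not mapped and pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_gene(scaffold: str, start: int, end: int) -> str:
    """Extract a gene from a scaffold string.

    If start > end, the gene is treated as lying on the minus strand:
    the region between the two coordinates is taken and reverse-complemented.
    """
    lo, hi = min(start, end), max(start, end)
    # 1-based inclusive [lo, hi] -> 0-based half-open Python slice [lo - 1, hi).
    region = scaffold[lo - 1:hi]
    return reverse_complement(region) if start > end else region


if __name__ == "__main__":
    toy = "AAACCCGGGTTT"  # made-up toy scaffold, not real data
    print(extract_gene(toy, 1, 6))  # plus strand:  AAACCC
    print(extract_gene(toy, 6, 1))  # minus strand: GGGTTT
```

If Biopython is installed, `Bio.Seq.Seq(region).reverse_complement()` gives the same reverse complement as the hand-written helper.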
09-02-2012, 07:25 AM   #2
zhidkov.ilia
Member
 
Location: Israel

Join Date: Dec 2010
Posts: 25

Some genes are transcribed from the opposite strand of the DNA, which is why their coordinates are reversed. You can add an additional column (i.e. strand), filling in '+' when start_position < end_position and '-' when start_position > end_position.
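The strand-column suggestion can be sketched as a small Python filter. This is a hypothetical helper, not an existing tool. It assumes the table is tab-delimited (so multi-word fields such as "Protein candidate" stay in one column) and, as stated in the question, that start and end are the fourth-to-last and third-to-last columns. As a design choice it also rewrites the coordinates so that start <= end, which is the convention formats such as GFF3 and BED expect; the new strand column preserves the orientation.

```python
import sys


def add_strand(line: str) -> str:
    """Append a strand column ('+' or '-') and reorder start/end so start <= end.

    ASSUMES a tab-delimited row where start and end are the fourth-to-last
    and third-to-last fields, as in the table from the question.
    """
    fields = line.rstrip("\r\n").split("\t")
    if not fields[-4].isdigit():
        # Header row: just name the new column.
        return "\t".join(fields + ["strand"])
    start, end = int(fields[-4]), int(fields[-3])
    strand = "-" if start > end else "+"
    fields[-4], fields[-3] = str(min(start, end)), str(max(start, end))
    return "\t".join(fields + [strand])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python add_strand.py genes.tsv > genes_stranded.tsv
    with open(sys.argv[1]) as handle:
        for row in handle:
            if row.strip():
                print(add_strand(row))
```

The script name `add_strand.py` and the file names in the usage comment are hypothetical.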
All times are GMT -8.