- 論壇徽章:
- 7
|
回復(fù) 10# 一串兒葡萄皮
我已經(jīng)在二樓給你提醒過了,ID行有引號(hào)的,不是標(biāo)準(zhǔn)的FASTA格式,BioPerl模塊不能識(shí)別!- >Srcc10g063050.1.1 Receptor-like kinase Try_threonine protein kinase
- MKTKHKLTPRPFTPTPSPTHIMGILLICFIFSITNSFVIAQDDAVPAKSFPVFTPEDNFL
- IDCGATSSITLPGNKAFQPDQNTAKYLSYTGKDIQACASDKINVPSTLYVNAKIFTTEAI
- YTFHASTSGLHWIRLHFFPFKYEEYDLK
- ">Solyc11g072930.1.1 LRR receptor-like serine_threonine-protein kinase, RLP"
- MASSYFLLLVLVLSVFSVSADVFVSLDCGSSEAYTDHETSIDWLGDVDYVANGESHVVPS
- NNSISHVSALEVRGLDSTMYSHVDDNSSDPPRITALYLSKFNLSGSLPDFSSMDAL
- ETIDLSNNNLDGPIPDFFGTLPNLKELNLANNKFSGPVPASLSNKNGLTLDTSGNSDLCS
- SSEESCQNNDSSSPGNDQPTTGSTNNNKKKKKKKNNLPIILGTTISAFLLLWAIVGIFAI
- LHYKNKRAATSLINPGQASGGSTPFVDRVQMSEKIEKNPEVTAHDHENSTNV
- ">Solyc12g00890.1.1 Kinesin IPR001752 Kinesin, motor region"
- ESVINGRDLLGFSLTSPDLVICTGSPDIPARNYGDSPEFLKGCSISLENGI
- KGSEEVQAATKLFTDWQGSKDDDLCAPADFELPSPPVEENSSELSVPIVSINVGSTDCIS
- SESGIQFSEDKYFCGGNVLSTDTRIEESICASVYQTARVGNFSYHFNNLSAGFYLVDLHF
- VEVVLTDGSTGDFSENSPQRNSLEVNGDIKAAGKLQLANVSREK
復(fù)制代碼 這些示例數(shù)據(jù),就用BioPerl的Bio::SeqIO模塊來讀只能識(shí)別第一個(gè)序列!- #!/usr/bin/perl
- use strict;
- use Bio::SeqIO;
- use Bio::Seq;
- my $fasta=shift @ARGV;
- (my $Out=$fasta)=~s/(\.[^\.]+)$/_out$1/;
- open (IN,"<$fasta") or die "Open $fasta failed!\n";
- my $Out=Bio::SeqIO->new(-file=>">$Out",-format=>'fasta');
- my ($flag,$seq,$id,$desc)=();
- while(my $line=<IN>){
- chomp($line);
- if($line=~/\>/){
- if($flag){
- print "$id\t$desc\n$seq\n";
- my $SEQ_OBJ=Bio::Seq->new(-seq=>$seq,-id=>$id,-desc=>$desc,-alphabet=>'protein');
- $Out->write_seq($SEQ_OBJ);
- }
- $line=~s/[\"\>]//g;
- ($id,$desc)=split (/\s+/,$line,2);
- $seq=undef;
- $flag=1;
- }else{
- $seq.=$line;
- }
- }
- print "$id\t$desc\n$seq\n";
- my $SEQ_OBJ=Bio::Seq->new(-seq=>$seq,-id=>$id,-desc=>$desc,-alphabet=>'protein');
- $Out->write_seq($SEQ_OBJ);
- $Out->close();
復(fù)制代碼 |
|