Last edited by 56836430 on 2015-10-10 21:42
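I have many sequences like the ones below and want to filter them by length, for example deleting every sequence shorter than 100 amino acids. The gap characters ("-") should not count toward the length.
in.fasta: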
>lcl|Abi_c1818_g1_i1_m.11845 unnamed protein product
-------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
------------------------PAVFSKGFNPQHVADGLYGRHLFVYSWPEGSLKQTLDLGSTGLIPLEVRFLHDPAKDTGYVACALSS
TLV----------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
-------------------
>lcl|Abi_c315_g1_i1_m.67006 unnamed protein product
------------------------------------------------------------GP----------------------GY-----
-------------------------------ASPKEA---------MEGPREALIYVTAVYT------------GTGRGKPDYLATVDVDP
TSPTYSKVVHRLPVPHLGDELHHSGWNACSSCHGDASAQRRYLILPSLISGRIYAVDTAKDPRAPVLHKVVEPETILAKTGLGYPHTAHCL
ASGDILVSCLGDKDGNAEGNGFLLLDSDLNVKGR---------------------------------------------------WEKPGN
SPKFGYDFWYQPRHKTMISSSWGAPAAFTKGFNPQHVLDGLYGKHLFVYSWPDGTLKQTIDLGNEGLIPLEVRFLHEPSKDTGYVGCALSG
NMVRFFKTSDGSWDHEVVISVPRFKVQNWILPEMPGLITDLLISLDDRYLYFVNWLHGDVRQYNIEDPKKPVLVGQVWVGGLVRKGSKVIV
EKENGQQWQSDVSDVQGKYLRGGPQMIQLSLDGKRLYVTNSLFSAWDRQFY-PELIEKGSHILQIDCNTEKGGLSVNSNFFVDFETEPEGP
ALAHEMRYPGGDCTSDIWV
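Desired out.fasta (the first record has fewer than 100 residues once the gaps are removed, so it is dropped):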
>lcl|Abi_c315_g1_i1_m.67006 unnamed protein product
------------------------------------------------------------GP----------------------GY-----
-------------------------------ASPKEA---------MEGPREALIYVTAVYT------------GTGRGKPDYLATVDVDP
TSPTYSKVVHRLPVPHLGDELHHSGWNACSSCHGDASAQRRYLILPSLISGRIYAVDTAKDPRAPVLHKVVEPETILAKTGLGYPHTAHCL
ASGDILVSCLGDKDGNAEGNGFLLLDSDLNVKGR---------------------------------------------------WEKPGN
SPKFGYDFWYQPRHKTMISSSWGAPAAFTKGFNPQHVLDGLYGKHLFVYSWPDGTLKQTIDLGNEGLIPLEVRFLHEPSKDTGYVGCALSG
NMVRFFKTSDGSWDHEVVISVPRFKVQNWILPEMPGLITDLLISLDDRYLYFVNWLHGDVRQYNIEDPKKPVLVGQVWVGGLVRKGSKVIV
EKENGQQWQSDVSDVQGKYLRGGPQMIQLSLDGKRLYVTNSLFSAWDRQFY-PELIEKGSHILQIDCNTEKGGLSVNSNFFVDFETEPEGP
ALAHEMRYPGGDCTSDIWV
I found an example online, but it does not remove the "-" characters when it computes the length. The script is:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
# arguments: input FASTA, output FASTA, minimum length
my ($infile, $outfile, $cut) = @ARGV;
my $o_seqi = Bio::SeqIO->new(-file => $infile, -format => 'fasta');
my $o_seqo = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');
while (my $o_seq = $o_seqi->next_seq) {
    # length counts every character, including the "-" gap characters
    next if ($o_seq->length < $cut);
    $o_seqo->write_seq($o_seq);
}
How can I change this script so that it filters on the number of amino acids, ignoring the gaps?
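One possible change, offered as a minimal sketch rather than a tested answer: compute the length on a copy of the sequence string with every "-" deleted, but still write the original record with its gaps, which matches the desired out.fasta above. The helper names `ungapped_length` and `filter_fasta` are hypothetical. BioPerl's `Bio::SeqIO` is used as in the original script, but it is loaded with `require` only when the filter actually runs, so the helper can be checked on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: number of residues in a sequence string,
# ignoring "-" gap characters.
sub ungapped_length {
    my ($seq_string) = @_;
    (my $residues = $seq_string) =~ tr/-//d;   # delete every "-" in the copy
    return length $residues;
}

# Hypothetical helper: copy every record with at least $cut residues.
# The record is written unchanged, so its alignment gaps are kept.
sub filter_fasta {
    my ($infile, $outfile, $cut) = @_;
    require Bio::SeqIO;                        # BioPerl, as in the original script
    my $in  = Bio::SeqIO->new(-file => $infile,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');
    while (my $seq = $in->next_seq) {
        next if ungapped_length($seq->seq) < $cut;   # too short without gaps
        $out->write_seq($seq);
    }
}

# Run the filter only when called with: input FASTA, output FASTA, minimum length
filter_fasta(@ARGV) if @ARGV == 3;
```

Usage would look like `perl filter_by_length.pl in.fasta out.fasta 100`, where the script name is only an example. If your alignments also use other gap symbols such as ".", add them to the `tr` list (`tr/-.//d`). BioPerl re-wraps the sequence lines when writing, just as the original script does, so the output line width may differ from the input.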