Concatenating Multiple .fasta Files

I'm trying to concatenate hundreds of .fasta files into a single, large FASTA file containing all of the sequences. I haven't found a specific method to accomplish this in the forums. I did come across some code, which I have adapted a bit. The fasta module that my script imports contains the following code:

class fasta:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

def read_fasta(file):
    items = []
    index = 0
    for line in file:
        if line.startswith(">"):
            if index >= 1:
                items.append(aninstance)
            index += 1
            name = line[:-1]
            seq = ''
            aninstance = fasta(name, seq)
        else:
            seq += line[:-1]
            aninstance = fasta(name, seq)

    items.append(aninstance)
    return items

And here is the adapted script to concatenate .fasta files:

import sys
import glob
import fasta

#obtain directory containing single fasta files for query
filepattern = input('Filename pattern to match: ')

#obtain output directory
outfile = input('Filename of output file: ')

#create new output file
output = open(outfile, 'w')

#initialize lists
names = []
seqs = []

#glob.glob returns a list of files that match the pattern
for file in glob.glob(filepattern):

    print ("file: " + file)

    #we read the contents and an instance of the class is returned
    contents = fasta.read_fasta(open(file).readlines())

    #a file can contain more than one sequence so we read them in a loop
    for item in contents:
        names.append(item.name)
        seqs.append(item.sequence)

#we write the output and close the file
for i in range(len(names)):
    output.write(names[i] + '\n' + seqs[i] + '\n\n')

output.close()


It is able to read the fasta files, but the newly created output file contains no sequences. The error I receive comes from the fasta module, which is beyond my capability to mess with:

Traceback (most recent call last):
  File "C:\Python32\myfiles\test\3\", line 28, in <module>
    contents = fasta.read_fasta(open(file).readlines())
  File "C:\Python32\lib\", line 18, in read_fasta
    seq += line[:-1]
UnboundLocalError: local variable 'seq' referenced before assignment

Any suggestions? Thanks!


I think using Python for this job is overkill. On the command line, a quick way to concatenate all files with the .fasta or .fa extension is the following (the output name is chosen so that the *.fa* pattern does not match it):

cat *.fa* > newfile.txt

The problem is in

       seq += line[:-1]
       aninstance = fasta(name, seq)

Try initializing seq at the start of read_fasta(file), before the loop.

EDIT: Further explanation

When read_fasta is called, the first line of the file evidently does not start with > (it may be blank, for example), so execution reaches seq += line[:-1] before seq = '' has ever run. At that point seq has not been assigned at all: the name is unbound, rather than holding an empty or null value, so Python has nothing to append to. The message in the traceback states exactly this:

UnboundLocalError: local variable 'seq' referenced before assignment
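A minimal sketch of that fix, assuming that any lines appearing before the first > header (or blank lines) can simply be skipped. The name read_fasta_fixed and the tuple-based return value are illustrative choices, not part of the original module:

```python
def read_fasta_fixed(lines):
    """Parse FASTA lines into a list of (name, sequence) tuples.

    Hypothetical replacement for read_fasta: state is initialized
    before the loop, so a file that does not begin with '>' no
    longer raises UnboundLocalError.
    """
    items = []
    name = None   # no header seen yet
    chunks = []   # pieces of the current sequence
    for line in lines:
        # rstrip instead of line[:-1], so a final line without a
        # trailing newline does not lose its last character
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None:
                items.append((name, "".join(chunks)))
            name = line
            chunks = []
        elif name is not None and line:
            chunks.append(line)
        # anything before the first header, and blank lines, are ignored
    if name is not None:
        items.append((name, "".join(chunks)))
    return items


records = read_fasta_fixed(["\n", ">seq1\n", "ACGT\n", "TTGA\n", ">seq2\n", "GGCC"])
print(records)
# [('>seq1', 'ACGTTTGA'), ('>seq2', 'GGCC')]
```

The leading blank line in the sample input is the kind of input that triggers the original error; here it is skipped.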

I'm not a Python programmer, but it seems that the code in the question tries to condense the data for each sequence onto a single line and also to separate the sequences with a blank line. A record whose sequence is wrapped across several lines would become a header line, one unbroken sequence line, and a blank line.

If this is in fact needed, the cat-based solution above will not work. Otherwise cat is the simplest and most effective solution.
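If that reformatting is wanted, it can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above (one sequence line per record, a blank line between records); unwrap_fasta and the sample records are made up for the example:

```python
def unwrap_fasta(lines):
    """Yield output lines with each record's sequence joined onto one line.

    Hypothetical helper: records are separated by a single blank line,
    mirroring what the script in the question appears to aim for.
    """
    chunks = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                yield "".join(chunks)   # finish the previous record
                yield ""                # blank separator line
            yield line
            seen_header = True
            chunks = []
        elif seen_header and line:
            chunks.append(line)
    if seen_header:
        yield "".join(chunks)


# Made-up sample input with a wrapped sequence
wrapped = [">seq1", "ACGT", "TTGA", ">seq2", "GGCC"]
unwrapped = list(unwrap_fasta(wrapped))
print("\n".join(unwrapped))
```

Because the function accepts any iterable of lines, it can be fed an open file object directly.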

For Windows, via the command prompt (note: the folder should contain only the required files), with a single space between the pattern and the output name:

copy *.fasta final.fasta


The following ensures that new files always start on a new line:

$ awk 1 *.fasta > largefile.fasta

The solution using cat might fail on that when a file does not end with a newline:

$ echo -n foo > f1
$ echo bar > f2
$ cat f1 f2
foobar
$ awk 1 f1 f2
foo
bar
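For anyone who still wants to do this from Python (for example where awk is not available), the same newline guarantee can be sketched as follows. The function name concatenate and the demo file names are hypothetical, and each file is read fully into memory, which assumes the individual files are reasonably small:

```python
import glob
import os
import tempfile


def concatenate(pattern, out_path):
    """Append every file matching pattern to out_path.

    A newline is added after any non-empty file that lacks one, so the
    next file's first header always starts on its own line.
    """
    out_abs = os.path.abspath(out_path)
    # Build the input list first and exclude the output file itself
    paths = [p for p in sorted(glob.glob(pattern))
             if os.path.abspath(p) != out_abs]
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return paths


# Small demonstration in a temporary directory (made-up contents)
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "f1.fasta"), "wb") as f:
        f.write(b">a\nACGT")        # no trailing newline
    with open(os.path.join(tmp, "f2.fasta"), "wb") as f:
        f.write(b">b\nGGCC\n")
    merged = os.path.join(tmp, "merged.txt")
    concatenate(os.path.join(tmp, "*.fasta"), merged)
    with open(merged, "rb") as f:
        result = f.read()

print(result.decode())
```

Files are opened in binary mode so that line endings pass through unchanged.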
