Introduction

The blast programs compare a given sequence (of amino acids or nucleotides) with all the sequences of a database using the statistical methods of Karlin and Altschul [Kar Al 90].

There are six blast applications, all implemented at the NCBI: blastp, blastn, tblastn, blastx, tblastx and blast3. Each of them searches a database containing sequences of amino acids or nucleotides and prints the sequences that ``score'' the highest against (i.e. most resemble) the query sequence. In addition, blast3, after printing the primary results, searches those results for three-way alignments.

One way to parallelize these applications is to cut the database into ``chunks'' and give each processor a chunk (i.e. a certain number of sequences) in which to search for the query sequence. When a processor has finished its chunk, it is sent the next one, until no chunks remain. With this method, blastp was parallelized by E. Masson [Masson94]. The other four applications were parallelized with the same method by the author.
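The chunking scheme can be illustrated with a minimal sketch. It is not the code of the parallel blast applications: threads stand in for the processors, and the names toy_score, make_chunks and search_parallel are hypothetical. The scoring function simply counts matching positions and is only a placeholder for the real Karlin-Altschul statistics.

```python
import heapq
import queue
import threading


def toy_score(query, sequence):
    """Hypothetical stand-in for a blast score: number of matching positions."""
    return sum(1 for a, b in zip(query, sequence) if a == b)


def make_chunks(database, chunk_size):
    """Cut the database, a list of (name, sequence) pairs, into chunks."""
    return [database[i:i + chunk_size]
            for i in range(0, len(database), chunk_size)]


def search_parallel(query, database, chunk_size=2, workers=3, top=3):
    """Score every sequence against the query and return the best hits."""
    pending = queue.Queue()
    for chunk in make_chunks(database, chunk_size):
        pending.put(chunk)

    hits = []
    lock = threading.Lock()

    def worker():
        # Each worker takes the next chunk as soon as it has finished the
        # previous one, and stops when no chunks remain.
        while True:
            try:
                chunk = pending.get_nowait()
            except queue.Empty:
                return
            local = [(toy_score(query, seq), name) for name, seq in chunk]
            with lock:
                hits.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Keep only the highest-scoring sequences, best first.
    return heapq.nlargest(top, hits)
```

Because a worker asks for a new chunk only when it is idle, faster or less loaded processors end up handling more chunks, which is the main appeal of the scheme on a cluster of unevenly loaded workstations.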

The next problem was to decide which workstations of the cluster to choose for running a request. During their summer internship, C. Javet and X. Defago implemented a program for monitoring and scoring the load on the machines of the cluster. This point is discussed in detail in [Jav Def 94].

In addition, a network server had to be written to receive the requests, forward them to the parallel blast application, collect the results and send them back to the client. This was done by the author.
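The role of such a server can be sketched as follows. This is only an illustration under assumed conventions, not the server described here: the one-line text protocol, the class BlastRequestHandler and the functions run_parallel_blast, start_server and send_request are all hypothetical, and run_parallel_blast is a placeholder for the call into the parallel blast application.

```python
import socket
import socketserver
import threading


def run_parallel_blast(query):
    """Hypothetical placeholder for the parallel blast application."""
    return "results for " + query


class BlastRequestHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Receive one request: a single line holding the query sequence.
        query = self.rfile.readline().decode("ascii").strip()
        # Forward it to the search and collect the results.
        results = run_parallel_blast(query)
        # Send the results back to the client.
        self.wfile.write((results + "\n").encode("ascii"))


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), BlastRequestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_request(address, query):
    """Client side: send one query and read back one line of results."""
    with socket.create_connection(address) as conn:
        conn.sendall((query + "\n").encode("ascii"))
        with conn.makefile("r", encoding="ascii") as reply:
            return reply.readline().strip()
```

A client would call send_request(server.server_address, "ACGT") against a server returned by start_server(). Handling each request in its own thread keeps the server responsive while a long search is running.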

A problem still to be addressed is what to do with requests that cannot be serviced right away because the available resources are too busy. One has to decide whether they should wait, be redirected to other servers, or be executed on fewer machines than the default.


Group account
Fri Jan 13 15:14:13 MET 1995