All serial blast programs have a very similar structure, as shown below:
Since we have many processors, we can search many chunks at the same time. We implemented a master-slave configuration. The master starts its slaves and sends them the chunk numbers to work on. When a slave has finished a chunk, it requests the next chunk from the master. At the end, every slave sends its results to the master, which adds them together.
The slaves are almost the same as the original program. Only the way they get the next chunk changes, and instead of printing the results, they send them to the master. The master is like the original program, except that it does no searching itself: it distributes the chunks and delegates the searching to the slaves.
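The request-and-reply cycle described above can be illustrated with a small sketch in plain C. It is not the PVM implementation: the message exchange is replaced by a direct function call, the slaves take turns inside one process, and all names (Master, Slave, master_next_chunk, slave_step, run_search, search_chunk) are hypothetical. The search itself is a stand-in that pretends chunk c yields c + 1 hits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical master state: the next chunk to hand out and the total. */
typedef struct {
    int next_chunk;   /* next unassigned chunk number */
    int num_chunks;   /* total number of chunks to search */
    long total_hits;  /* combined result from all slaves */
} Master;

/* Hypothetical slave state: results are kept locally until the end. */
typedef struct {
    long local_hits;  /* partial result of this slave */
    int chunks_done;  /* number of chunks this slave searched */
} Slave;

/* Master side: answer a "next chunk" request; -1 means no work is left. */
int master_next_chunk(Master *m)
{
    if (m->next_chunk >= m->num_chunks)
        return -1;
    return m->next_chunk++;
}

/* Stand-in for the real search: pretend chunk c yields c + 1 hits. */
static long search_chunk(int chunk)
{
    return (long)chunk + 1;
}

/* Slave side: request one chunk and search it.
   Returns 0 when the master has no more work, 1 otherwise. */
int slave_step(Slave *s, Master *m)
{
    int chunk = master_next_chunk(m);  /* with PVM: a message round trip */
    if (chunk < 0)
        return 0;
    s->local_hits += search_chunk(chunk);
    s->chunks_done++;
    return 1;
}

/* Run the whole cycle with the slaves taking turns, then let the
   master add the partial results together. */
long run_search(int num_chunks, Slave *slaves, size_t num_slaves)
{
    Master m;
    size_t i;
    int active;

    assert(num_slaves > 0);
    m.next_chunk = 0;
    m.num_chunks = num_chunks;
    m.total_hits = 0;
    for (i = 0; i < num_slaves; i++) {
        slaves[i].local_hits = 0;
        slaves[i].chunks_done = 0;
    }

    do {
        active = 0;
        for (i = 0; i < num_slaves; i++)
            active += slave_step(&slaves[i], &m);
    } while (active > 0);

    for (i = 0; i < num_slaves; i++)
        m.total_hits += slaves[i].local_hits;
    return m.total_hits;
}
```

Because a slave asks for a new chunk only when it has finished the previous one, faster slaves simply end up with more chunks, so no static partitioning of the chunks is needed.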
The pseudoprograms for the master and the slave are as follows:
As the above figures show, it is quite easy to modify the original programs to obtain the master and the slave. For the master, the original blast main program must be cut in two parts: one that parses the arguments and one that prints the results. The searching (procedure RunWild) must be left out, and the distribution of chunk numbers and the adding of the results must be added. For the slave, one only has to replace the procedure for getting the next chunk (TaskNext) with one that asks the master for the next chunk (net_next_seq), and to leave out the printing of the results.
As a refinement, whenever a slave exits with an error, it sends a message to the master that it failed. The original programs call the function exit() when an error occurs. The C library function atexit() registers a function that is called whenever the process terminates through exit(). So we installed a function that sends a PVM message to the master before the slave exits.
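A minimal sketch of such an exit handler is given here, assuming the slave sets a flag once it has delivered its results normally. The names slave_exit_handler, install_slave_exit_handler, mark_slave_finished, and notify_master_of_failure are hypothetical, and the PVM send is replaced by a stand-in that only counts and logs the notice. Only atexit() itself is a real library call.

```c
#include <stdio.h>
#include <stdlib.h>

static int slave_finished = 0;   /* set once results were delivered normally */
static int failure_notices = 0;  /* counts notices, for illustration only */

/* Hypothetical stand-in for the PVM message sent to the master. */
static void notify_master_of_failure(void)
{
    failure_notices++;
    fprintf(stderr, "slave: reporting failure to master\n");
}

/* Exit handler: it runs on every exit(), so it stays silent
   after a clean finish and reports only premature exits. */
void slave_exit_handler(void)
{
    if (!slave_finished)
        notify_master_of_failure();
}

/* Register the handler; atexit() returns 0 on success. */
int install_slave_exit_handler(void)
{
    return atexit(slave_exit_handler);
}

/* Call after the results have been sent to the master. */
void mark_slave_finished(void)
{
    slave_finished = 1;
}

/* Number of failure notices sent so far (illustration only). */
int failure_notice_count(void)
{
    return failure_notices;
}
```

The advantage of this approach is that the existing error paths, which already call exit(), need no changes. Note that handlers registered with atexit() do not run if the process is killed by a signal or calls abort(), so such failures would have to be detected by other means.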
In version 1.4 of the blast programs, the main programs are so similar that the transformation from the serial to the parallel version is the same for all of them and is implemented with a single Perl [Swarz 93] script. (This was not the case for version 1.3, where one Perl script per blast program was needed.) The script looks for some characteristic statements in the original blast programs and inserts mainly preprocessor statements.
Figure 1: Generation of master and slave from the original programs
Figure 1 shows, for blastp, how the master and the slave are generated. The Perl script net_patch.pe transforms the blastp.c program into net_blastp.c. Conditional compilation of this file yields the master version net_master_blastp.o and the slave version net_slave_blastp.o. Each of these is linked with two other modules: resultsp.o, which mainly contains functions for mapping and unmapping of results, and m_blastp.o or s_blastp.o, respectively, which contain the PVM communications.
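The idea of obtaining two object files from one patched source can be sketched with preprocessor conditionals. The macro names NET_MASTER and NET_SLAVE and the three functions are assumptions made for illustration. The statements that the Perl script actually inserts are not reproduced here.

```c
/* NET_MASTER and NET_SLAVE are hypothetical macro names,
   typically defined on the compiler command line. */
#if defined(NET_MASTER) && defined(NET_SLAVE)
#error "define at most one of NET_MASTER and NET_SLAVE"
#endif

/* Report which variant this object file was compiled as. */
const char *build_role(void)
{
#if defined(NET_MASTER)
    return "master";
#elif defined(NET_SLAVE)
    return "slave";
#else
    return "serial";
#endif
}

/* The master does no searching; the serial program and the slave do. */
int role_searches(void)
{
#if defined(NET_MASTER)
    return 0;
#else
    return 1;
#endif
}

/* The slave prints nothing; the serial program and the master do. */
int role_prints_results(void)
{
#if defined(NET_SLAVE)
    return 0;
#else
    return 1;
#endif
}
```

Compiling the same file once with -DNET_MASTER and once with -DNET_SLAVE would then give the two variants, while compiling it without either macro keeps the serial behaviour.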