This thesis concerns multi-objective optimization problems arising in running scientific application on high performance computing platforms and streaming applications on embedded systems. Some scientific applications are commonly modeled as rooted tree. Due to the size of files, running such a tree may exceed local memory capacity. A practical solution on a multiprocessor system is to partition the tree into many subtrees, and run each on a processor, which is equipped with a local memory. We studied how to partition the tree into several subtrees such that each subtree fits in local memory and the makespan is minimized, when communication costs between processors are accounted.
Streaming applications appears often in video, audio domains. They are characterized by a series of operations on streaming data, and a high throughput. Multi-Processor System on Chip (MPSoC) is a multi/many-core embedded system that integrates many specific cores through a high speed interconnect on a single die. Such systems are widely used for multimedia applications. Lots of MPSoCs are limited-size batteries-operated. Such a tight energy budget intrinsically calls for an efficient schedule to meet the intensive computation demands. Dynamic voltage and frequency scaling (DVFS) can save energy by decreasing the frequency and voltage at the price of increasing failure rates. Another technique of reducing energy cost and meeting the reliability target as well is duplication of tasks. We model applications as linear chains and study how to minimize the energy consumption under throughput and reliability constraints, using dynamic voltage and frequency scaling (DVFS), duplication techniques on MPSoC platforms.
In a further extended work, with the same optimization goal, we model streaming applications as series-parallel graphs, which are more complex and more realistic. The target platform is now with two level hierarchical communication systems. The final part of this thesis is a practical work. A steal-technique based dynamic scheduling algorithm is proposed to achieve a good balance between data locality and load balancing in scheduling tree-shaped graphs onto parallel platforms. Our method has been validated by experiments in a parallel sparse direct solver–PaStiX.