Partitioned Global Address Space (PGAS) languages are a popular alternative when building applications to run on large-scale parallel machines. Unified Parallel C (UPC) is a well-known PGAS language that is available on most high-performance computing systems. Good performance of UPC applications is often an important requirement for a system acquisition. This paper presents the memory management techniques employed by the IBM XL UPC compiler to achieve optimal performance on systems with Remote Direct Memory Access (RDMA). Additionally, we describe a novel technique employed by the UPC runtime for transforming remote memory accesses on the same shared-memory node into local memory accesses, to further improve performance. We evaluate the proposed memory allocation policies for various UPC benchmarks on the IBM® Power® 775 supercomputer.
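The idea of turning same-node remote accesses into local ones can be illustrated with a toy model. The sketch below is not the IBM XL UPC runtime: the `GlobalAddr` and `ToyRuntime` names, the one-bytearray-per-thread layout, and the block assignment of threads to nodes are all assumptions made for illustration. It only shows the decision a runtime might make: if the owning thread lives on the caller's node, the access is served as a plain memory operation; otherwise it is counted as a network (RDMA-style) transfer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalAddr:
    """Hypothetical PGAS address: owning thread plus byte offset in its segment."""
    thread: int
    offset: int


class ToyRuntime:
    """Illustrative model only; real runtimes map segments via shared memory."""

    def __init__(self, num_threads, threads_per_node, segment_size):
        self.threads_per_node = threads_per_node
        # One bytearray per thread stands in for that thread's shared segment.
        self.segments = [bytearray(segment_size) for _ in range(num_threads)]
        self.local_accesses = 0
        self.remote_accesses = 0

    def node_of(self, thread):
        # Assumption: threads are assigned to nodes in contiguous blocks.
        return thread // self.threads_per_node

    def _classify(self, me, addr):
        if self.node_of(addr.thread) == self.node_of(me):
            # Same node: the segment is directly addressable, so a plain
            # load/store suffices and no network operation is needed.
            self.local_accesses += 1
        else:
            # Different node: a real runtime would issue an RDMA get/put here.
            self.remote_accesses += 1

    def read_byte(self, me, addr):
        self._classify(me, addr)
        return self.segments[addr.thread][addr.offset]

    def write_byte(self, me, addr, value):
        self._classify(me, addr)
        self.segments[addr.thread][addr.offset] = value
```

With 8 threads and 4 threads per node, thread 4 reading data owned by thread 5 is counted as a local access, while thread 0 reading the same location is counted as a remote one.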