BazEkon - Main Library of the Krakow University of Economics

Krużel Filip (Cracow University of Technology, Poland), Banaś Krzysztof (AGH University of Science and Technology Kraków, Poland)
Finite Element Numerical Integration on Xeon Phi coprocessor
Annals of Computer Science and Information Systems, 2014, vol. 2, pp. 603-612, figures, tables, 24 references
Keywords
Numeric algorithms, Smart cards, Hardware
In the present article we describe the implementation of the finite element numerical integration algorithm for the Xeon Phi coprocessor. The coprocessor is an extension of the idea of the many-core specialized unit for calculations and, by assumption, its performance has to be competitive with the current families of GPUs. Its main advantages are the built-in set of 512-bit vector registers and the ease of transferring existing codes from normal x86 architectures. In the article we verify the performance of previously developed OpenCL algorithms for finite element numerical integration, ported to the new Xeon Phi coprocessor architecture. The algorithm is tested for standard FEM approximations of selected problems. The obtained timing results allow the performance of the OpenCL kernels executed on the Xeon Phi to be compared with that of contemporary GPUs. (original abstract)
Bibliography
  1. AMD, AMD Accelerated Parallel Processing. OpenCL Programming Guide, revision 2.7, 2013.
  2. Banaś K., and Krużel F., "Large scale numerical integration on GPU", submitted for publication.
  3. Banaś K., Płaszewski P., and Macioł P., "Numerical integration on GPUs for higher order finite elements", Computers & Mathematics with Applications, vol. 67 (6), pp. 1319-1344, 2014.
  4. Barker K. J., Davis K., Hoisie A., Kerbyson D. K., Lang M., Pakin S., and Sancho J. C., "Entering the petaflop era: The architecture and performance of Roadrunner", High Performance Computing, Networking, Storage and Analysis, pp. 1-11, Nov. 2008.
  5. Gaster B., Kaeli D., Howes L., Mistry P., and Schaa D., Heterogeneous Computing With OpenCL, Elsevier Science & Technology, 2011.
  6. Goodwins R., "Intel unveils many-core Knights platform for HPC", 2010.
  7. Govindaraju N. K., Larsen S., Gray J., and Manocha D., "A memory model for scientific algorithms on graphics processors", SC 2006 Conference, Proceedings of the ACM/IEEE, Nov. 2006.
  8. IBM, Cell Broadband Engine Programming Handbook Including the PowerXCell 8i Processor, version 1.11, May 2008.
  9. Intel, Intel 64 and IA-32 Architectures Optimization Reference Manual, April 2012.
  10. Intel, Intel SDK for OpenCL Applications XE 2013 R2 Optimization Guide, 2013.
  11. Intel, Intel Xeon Phi Coprocessor Datasheet, June 2013.
  12. Intel, Intel Xeon Phi Product Family Performance, revision 1.4, 12th December 2013.
  13. Khronos OpenCL Working Group, The OpenCL Specification, Ed. A. Munshi, version 1.2, revision 19, 2012.
  14. Krużel F., and Banaś K., "Vectorized OpenCL implementation of numerical integration for higher order finite elements", Computers & Mathematics with Applications, vol. 66 (10), pp. 2030-2044, 2013.
  15. Michalik K., Banaś K., Płaszewski P., and Cybułka P., "ModFem: a computational framework for parallel adaptive finite element simulations", Computer Methods in Materials Science, vol. 13 (1), pp. 3-8, 2013.
  16. Morgan T. P., "Intel teaches Xeon Phi x86 coprocessor snappy new tricks", 2012.
  17. NVIDIA, "NVIDIA's Next Generation CUDA Compute Architecture: Kepler GK110. The Fastest, Most Efficient HPC Architecture Ever Built", Whitepaper, ver. 1.0, 2012.
  18. NVIDIA, "Tesla K-Series Datasheet", Oct. 2013.
  19. NVIDIA, CUDA C Programming Guide, version 6.0, 2014.
  20. Rojek K., and Szustak L., "Adaptation of double-precision matrix multiplication to the Cell Broadband Engine architecture", in: PPAM'09: Proceedings of the 8th international conference on Parallel processing and applied mathematics, Springer-Verlag, Berlin, Heidelberg, pp. 535-546, 2010.
  21. Roth F., System Administration for the Intel Xeon Phi Coprocessor, 2013.
  22. Rul S., Vandierendonck H., D'Haene J., and De Bosschere K., "An experimental study on performance portability of OpenCL kernels", in: Application Accelerators in High Performance Computing, 2010 Symposium, Knoxville, TN, USA, p. 3, 2010.
  23. Seiler L., Carmean D., Sprangle E., Forsyth T., Abrash M., Dubey P., et al., "Larrabee: a many-core x86 architecture for visual computing", in: SIGGRAPH '08: ACM SIGGRAPH 2008 papers, pp. 1-15, 2008.
  24. Wilt N., The CUDA Handbook: A Comprehensive Guide to GPU Programming, Addison-Wesley Professional, 2013.