BazEkon - The Main Library of the Cracow University of Economics

Author
Krużel Filip (Cracow University of Technology, Poland), Banaś Krzysztof (AGH University of Science and Technology, Kraków, Poland)
Title
Finite Element Numerical Integration on Xeon Phi coprocessor
Source
Annals of Computer Science and Information Systems, 2014, vol. 2, pp. 603-612, figs., tables, bibliography: 24 items.
Keyword
Numeric algorithms, Smart cards, Hardware
Note
Includes summary.
Abstract
In the present article we describe the implementation of the finite element numerical integration algorithm for the Xeon Phi coprocessor. The coprocessor extends the idea of a specialized many-core unit for computation and, by design, its performance is intended to be competitive with current families of GPUs. Its main advantages are the built-in set of 512-bit vector registers and the ease of porting existing codes from standard x86 architectures. In the article we verify the performance of previously developed OpenCL algorithms for finite element numerical integration, ported to the new Xeon Phi coprocessor architecture. The algorithm is tested for standard FEM approximations of selected problems. The obtained timing results allow us to compare the performance of the OpenCL kernels executed on the Xeon Phi and on contemporary GPUs. (original abstract)
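As a generic illustration of what "finite element numerical integration" computes (this is not the authors' OpenCL kernel), the C sketch below integrates the element stiffness matrix of the Laplace operator, K_ij = integral of grad(phi_i) . grad(phi_j), over a single bilinear quadrilateral element. The element type, the 2x2 Gauss-Legendre rule, the counter-clockwise node ordering and the function name element_stiffness are all assumptions chosen for illustration. The loop structure (integration points outside, shape functions inside) is the per-element workload that a GPU or Xeon Phi implementation would replicate across many elements.

```c
#include <assert.h>
#include <math.h>

#define NSHAPE 4 /* bilinear quadrilateral: 4 shape functions (assumed element type) */
#define NGAUSS 2 /* 2-point Gauss-Legendre rule per direction (assumed) */

/* Reference-element node coordinates in [-1,1]^2, counter-clockwise (assumed ordering). */
static const double XI_A[NSHAPE]  = {-1.0,  1.0, 1.0, -1.0};
static const double ETA_A[NSHAPE] = {-1.0, -1.0, 1.0,  1.0};

/* Hypothetical helper: computes K[i][j] = integral of grad(phi_i) . grad(phi_j)
   over one quadrilateral with physical node coordinates (x[a], y[a]). */
void element_stiffness(const double x[NSHAPE], const double y[NSHAPE],
                       double K[NSHAPE][NSHAPE])
{
    const double gp = 1.0 / sqrt(3.0);
    const double gauss_pts[NGAUSS] = {-gp, gp};
    const double gauss_wts[NGAUSS] = {1.0, 1.0};

    for (int i = 0; i < NSHAPE; i++)
        for (int j = 0; j < NSHAPE; j++)
            K[i][j] = 0.0;

    /* Outer loops: integration points of the tensor-product Gauss rule. */
    for (int qx = 0; qx < NGAUSS; qx++) {
        for (int qy = 0; qy < NGAUSS; qy++) {
            const double xi = gauss_pts[qx], eta = gauss_pts[qy];
            const double w = gauss_wts[qx] * gauss_wts[qy];

            /* Shape function derivatives with respect to reference coordinates. */
            double dxi[NSHAPE], deta[NSHAPE];
            for (int a = 0; a < NSHAPE; a++) {
                dxi[a]  = 0.25 * XI_A[a]  * (1.0 + ETA_A[a] * eta);
                deta[a] = 0.25 * ETA_A[a] * (1.0 + XI_A[a] * xi);
            }

            /* Jacobian of the reference-to-physical mapping at this point. */
            double J11 = 0.0, J12 = 0.0, J21 = 0.0, J22 = 0.0;
            for (int a = 0; a < NSHAPE; a++) {
                J11 += x[a] * dxi[a];
                J12 += x[a] * deta[a];
                J21 += y[a] * dxi[a];
                J22 += y[a] * deta[a];
            }
            const double detJ = J11 * J22 - J12 * J21;
            assert(detJ > 0.0); /* non-degenerate, counter-clockwise element */

            /* Physical gradients: grad_x phi = J^{-T} grad_xi phi. */
            double dx[NSHAPE], dy[NSHAPE];
            for (int a = 0; a < NSHAPE; a++) {
                dx[a] = ( J22 * dxi[a] - J21 * deta[a]) / detJ;
                dy[a] = (-J12 * dxi[a] + J11 * deta[a]) / detJ;
            }

            /* Inner loops: accumulate the weighted integrand into the element matrix. */
            for (int i = 0; i < NSHAPE; i++)
                for (int j = 0; j < NSHAPE; j++)
                    K[i][j] += w * detJ * (dx[i] * dx[j] + dy[i] * dy[j]);
        }
    }
}
```

For the unit square element (nodes (0,0), (1,0), (1,1), (0,1)) the 2x2 rule is exact and the result is the familiar matrix with 2/3 on the diagonal, -1/6 between adjacent nodes and -1/3 between opposite nodes. A 512-bit vector register holds eight double-precision values, so one plausible vectorization strategy (an assumption here, not a statement about the paper) is to map several elements or integration points to separate vector lanes.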
Bibliography
  1. AMD, AMD Accelerated Parallel Processing. OpenCL Programming Guide, revision 2.7, 2013.
  2. Banaś K., and Krużel F., "Large scale numerical integration on GPU", submitted for publication.
  3. Banaś K., Płaszewski P., and Macioł P., "Numerical integration on GPUs for higher order finite elements", Computers & Mathematics with Applications, vol. 67 (6), pp. 1319-1344, 2014, http://dx.doi.org/10.1016/j.camwa.2014.01.021
  4. Barker K. J., Davis K., Hoisie A., Kerbyson D. K., Lang M., Pakin S., and Sancho J. C., "Entering the petaflop era: The architecture and performance of Roadrunner," High Performance Computing, Networking, Storage and Analysis, pp. 1-11, Nov. 2008, http://dx.doi.org/10.1109/SC.2008.5217926
  5. Gaster B., Kaeli D., Howes L., Mistry P., and Schaa D., Heterogeneous Computing With OpenCL, Elsevier Science & Technology, 2011.
  6. Goodwins R., "Intel unveils many-core Knights platform for HPC", www.zdnet.co.uk, 2010.
  7. Govindaraju N. K., Larsen S., Gray J., and Manocha D., "A memory model for scientific algorithms on graphics processors," SC 2006 Conference, Proceedings of the ACM/IEEE, Nov. 2006, http://dx.doi.org/10.1109/SC.2006.2
  8. IBM, Cell Broadband Engine Programming Handbook Including the PowerXCell 8i Processor, version 1.11, May 2008.
  9. Intel, Intel 64 and IA-32 Architectures Optimization Reference Manual, April 2012.
  10. Intel, Intel SDK for OpenCL Applications XE 2013 R2 Optimization Guide, 2013.
  11. Intel, Intel Xeon Phi Coprocessor Datasheet, June 2013.
  12. Intel, Intel Xeon Phi Product Family Performance, revision 1.4, 12th December 2013.
  13. Khronos OpenCL Working Group, The OpenCL Specification, Ed. A. Munshi, version 1.2, revision 19, 2012.
  14. Krużel F., and Banaś K., "Vectorized OpenCL implementation of numerical integration for higher order finite elements," Computers & Mathematics with Applications, vol. 66 (10), pp. 2030-2044, 2013, http://dx.doi.org/10.1016/j.camwa.2013.08.026
  15. Michalik K., Banaś K., Płaszewski P., and Cybułka P., "ModFem: a computational framework for parallel adaptive finite element simulations", Computer Methods in Materials Science, vol. 13 (1), pp. 3-8, 2013.
  16. Morgan T. P., "Intel teaches Xeon Phi x86 coprocessor snappy new tricks", www.theregister.co.uk, 2012.
  17. NVIDIA, "NVIDIA's Next Generation CUDA Compute Architecture: Kepler GK110. The Fastest, Most Efficient HPC Architecture Ever Built", Whitepaper, ver. 1.0, 2012.
  18. NVIDIA, "Tesla K-Series Datasheet", Oct. 2013.
  19. NVIDIA, CUDA C Programming Guide, version 6.0, 2014.
  20. Rojek K., and Szustak L., "Adaptation of double-precision matrix multiplication to the Cell Broadband Engine architecture," in: PPAM'09: Proceedings of the 8th international conference on Parallel processing and applied mathematics, Springer-Verlag, Berlin, Heidelberg, pp. 535-546, 2010.
  21. Roth F., System Administration for the Intel Xeon Phi Coprocessor, 2013.
  22. Rul S., Vandierendonck H., D'Haene J., and De Bosschere K., "An experimental study on performance portability of OpenCL kernels", in: Application Accelerators in High Performance Computing, 2010 Symposium, Knoxville, TN, USA, p. 3, 2010.
  23. Seiler L., Carmean D., Sprangle E., Forsyth T., Abrash M., Dubey P., et al., "Larrabee: a many-core x86 architecture for visual computing", in SIGGRAPH '08: ACM SIGGRAPH 2008 papers, pp. 1-15, 2008, http://dx.doi.org/10.1145/1399504.1360617
  24. Wilt N., The CUDA Handbook: A Comprehensive Guide to GPU Programming, Addison-Wesley Professional, 2013.
ISSN
2300-5963
Language
eng