Differences between two revisions of the page:

documentation:tools:testspgi [2015/04/28 12:35] – cicaluga → documentation:tools:testspgi [2023/01/12 09:40] (current version) – deleted by ltaulell
====== Functional and performance tests for pgi and cuda fortran ======
{{INLINETOC}}
===== Benchmarks =====

Several functional and performance tests of these cards are presented:

==== Hardware and software detection tests ====

With the Linux command lspci (which lists the PCI devices, including GPU cards):

<code bash>
c82gpgpu34:
05:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)
	Subsystem: NVIDIA Corporation Device 1015
	Kernel driver in use: nvidia
83:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)
	Subsystem: NVIDIA Corporation Device 1015
	Kernel driver in use: nvidia
</code>

This command returns nothing when run on the compilation servers (e.g. e5-2670comp1), since they have no GPU cards.

The output above was obtained on a compute node equipped with GPU cards (in this example, node c82gpgpu34, which has 2 Tesla K20 cards).
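The exact command typed at the prompt above was lost in this revision. As a minimal, self-contained sketch, the listing below counts the GPU cards by filtering a captured sample of the lspci output shown above (the sample text is copied from this page, not re-run on the node):

```shell
# Sample lspci output copied from the listing above (not re-run here).
sample='05:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)'

# One "3D controller" line per GPU card, so counting matches counts the GPUs.
count=$(printf '%s\n' "$sample" | grep -c '3D controller: NVIDIA')
echo "$count"   # 2 cards on c82gpgpu34
```

On a live node the same filter can be applied directly to the output of lspci.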

The Linux command lsmod lists the kernel modules currently loaded; on a GPU node the NVIDIA driver modules appear:

<code bash>
c82gpgpu34:
nvidia_uvm
nvidia
i2c_core
</code>

To display the version of the installed CUDA driver:
<code bash>
c82gpgpu34:
NVRM version: NVIDIA UNIX x86_64 Kernel Module
GCC version:
</code>
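The driver version number itself is truncated in the listing above. As an illustration of how it could be extracted from output in the shape of /proc/driver/nvidia/version, here is a sketch over a hypothetical sample (the version 319.37 and the build date are placeholders, not the node's real values):

```shell
# Hypothetical sample in the shape of /proc/driver/nvidia/version; the real
# version number is truncated in this page, so 319.37 is only a placeholder.
sample='NVRM version: NVIDIA UNIX x86_64 Kernel Module  319.37  Wed Jul  3 19:00:00 PDT 2013'

# The first dotted number on the NVRM line is the driver version.
driver_version=$(printf '%s\n' "$sample" | grep -o '[0-9][0-9]*\.[0-9][0-9]*' | head -n 1)
echo "$driver_version"   # 319.37 (placeholder)
```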

To display the version of the installed CUDA Toolkit:
<code bash>
c82gpgpu34:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Wed_Jul_17_18:
Cuda compilation tools, release 5.5, V5.5.0
</code>

nvcc is the compiler provided with the CUDA Toolkit for compiling CUDA programs (it calls the gcc compiler to compile the C code).
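The toolkit release can be picked out of the banner above (presumably produced by nvcc --version); a minimal sketch over a captured sample of that banner:

```shell
# nvcc version banner copied from the listing above.
sample='nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Cuda compilation tools, release 5.5, V5.5.0'

# Extract the toolkit release number from the "release X.Y" field.
release=$(printf '%s\n' "$sample" | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
echo "$release"   # 5.5
```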


**Another possibility** (beyond Linux commands) for detecting the presence and type of NVIDIA GPUs is to use the deviceQuery program, whose .cpp source ships with the NVIDIA_GPU_Computing_SDK suite (renamed NVIDIA_CUDA-x.y_Samples in recent x.y versions). Once compiled, running it reports the characteristics of each card it detects:

<code bash>
c82gpgpu34:
c82gpgpu34:

./

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 2 CUDA Capable device(s)

Device 0: "Tesla K20m"
  CUDA Driver Version / Runtime Version
  CUDA Capability Major/Minor version number:
  Total amount of global memory:
MapSMtoCores SM 3.5 is undefined (please update to the latest SDK)!
MapSMtoCores SM 3.5 is undefined (please update to the latest SDK)!
  (13) Multiprocessors x (-1) CUDA Cores/MP
  GPU Clock Speed:
  Memory Clock rate:
  Memory Bus Width:
  L2 Cache Size:
  Max Texture Dimension Size (x,y,z)
  Max Layered Texture Size (dim) x layers
  Total amount of constant memory:
  Total amount of shared memory per block:
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per block:
  Maximum sizes of each dimension of a block:
  Maximum sizes of each dimension of a grid:
  Maximum memory pitch:
  Texture alignment:
  Concurrent copy and execution:
  Run time limit on kernels:
  Integrated GPU sharing Host Memory:
  Support host page-locked memory mapping:
  Concurrent kernel execution:
  Alignment requirement for Surfaces:
  Device has ECC support enabled:
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):
  Device PCI Bus ID / PCI location ID:           5 / 0
  Compute Mode:
     < Exclusive Process (many threads in one process is able to use ::

Device 1: "Tesla K20m"
  CUDA Driver Version / Runtime Version
  CUDA Capability Major/Minor version number:
  Total amount of global memory:
MapSMtoCores SM 3.5 is undefined (please update to the latest SDK)!
MapSMtoCores SM 3.5 is undefined (please update to the latest SDK)!
  (13) Multiprocessors x (-1) CUDA Cores/MP
  GPU Clock Speed:
  Memory Clock rate:
  Memory Bus Width:
  L2 Cache Size:
  Max Texture Dimension Size (x,y,z)
  Max Layered Texture Size (dim) x layers
  Total amount of constant memory:
  Total amount of shared memory per block:
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per block:
  Maximum sizes of each dimension of a block:
  Maximum sizes of each dimension of a grid:
  Maximum memory pitch:
  Texture alignment:
  Concurrent copy and execution:
  Run time limit on kernels:
  Integrated GPU sharing Host Memory:
  Support host page-locked memory mapping:
  Concurrent kernel execution:
  Alignment requirement for Surfaces:
  Device has ECC support enabled:
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):
  Device PCI Bus ID / PCI location ID:           131 / 0
  Compute Mode:
     < Exclusive Process (many threads in one process is able to use ::

deviceQuery,
</code>
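The "MapSMtoCores SM 3.5 is undefined" warnings appear because this SDK release predates compute capability 3.5, so deviceQuery prints "(-1) CUDA Cores/MP" instead of the real figure. For Kepler GK110 (SM 3.5) each multiprocessor has 192 CUDA cores, so the core count can be derived by hand from the 13 multiprocessors reported:

```shell
# deviceQuery reports 13 multiprocessors but cannot map SM 3.5 to a core
# count; on Kepler (compute capability 3.5) each SM has 192 CUDA cores.
sm_count=13        # multiprocessors reported by deviceQuery
cores_per_sm=192   # CUDA cores per SM for compute capability 3.5 (Kepler)
total_cores=$((sm_count * cores_per_sm))
echo "$total_cores"   # 2496 CUDA cores per Tesla K20m
```

This matches the 2496 cores NVIDIA lists for the Tesla K20m.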

==== Bandwidth test ====

Another test provided with the NVIDIA_GPU_Computing_SDK is the bandwidthTest program. After compiling its .cpp source, running it reports the bandwidth of the three transfers that must be taken into account when developing CUDA codes:
  * transfer from the CPU to the GPU
  * transfer from the GPU to the CPU
  * transfer from the GPU to the GPU (intra-GPU)
Below is the full output of this program on the same node as before:

<code bash>
c82gpgpu34:
c82gpgpu34:

./

Running on...

Quick Mode

 Host to Device Bandwidth, 1 Device(s), Paged memory

 Device to Host Bandwidth, 1 Device(s), Paged memory

 Device to Device Bandwidth, 1 Device(s)

</code>
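The measured bandwidth figures are truncated in this revision. As a point of comparison for the device-to-device measurement, the theoretical peak memory bandwidth of the K20m can be computed from its published specifications (a back-of-the-envelope sketch; the clock and bus-width values are NVIDIA's spec-sheet numbers, not read from this node):

```shell
# Theoretical peak, from NVIDIA's published Tesla K20m specifications
# (2600 MHz GDDR5 memory clock, 320-bit bus); GDDR5 is double data rate.
mem_clock_mhz=2600
bus_width_bits=320
peak_mb_s=$((mem_clock_mhz * 2 * bus_width_bits / 8))
echo "$peak_mb_s"   # 208000 MB/s, i.e. about 208 GB/s
```

Measured device-to-device bandwidth is expected to come in below this theoretical ceiling.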