Below are the differences between two revisions of the page.
documentation:tools:benchmarksgpu [2015/04/26 18:58] – [Hardware and software detection tests] cicaluga | documentation:tools:benchmarksgpu [2023/01/12 09:39] (current version) – deleted ltaulell | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Using GPUs at PSMN ====== | ||
- | {{INLINETOC}} | ||
- | |||
- | ===== Hardware available at PSMN ===== | ||
- | |||
- | Several NVIDIA GPU cards are installed at PSMN: | ||
- | * [[http:// | ||
- | * [[http:// | ||
- | * [[http:// | ||
- | |||
- | A comparison of the manufacturer specifications is given in the table below: | ||
- | |||
- | |||
- | ^ Technical specifications
- | | |||| | ||
- | | Peak double-precision performance | 1.17 Tflops
- | | |||| | ||
- | | Peak single-precision performance | 3.52 Tflops
- | | |||| | ||
- | | Number of cores | 2496 | 2304 | 512 | | ||
- | | |||| | ||
- | | Core frequency | 0.706 GHz | 0.863 GHz | 1.3 GHz | | ||
- | | |||| | ||
- | | Memory | 5 GB | 3 GB | 6 GB | | ||
- | | |||| | ||
- | | Max bandwidth (ECC off) | 208 GB/s | 288.4 GB/s | 177.6 GB/s | | ||
- | | |||| | ||
- | | ECC (Error-correcting code memory)| | ||
- | | |||| | ||
- | | Max power consumption | 225 W | 250 W | ||
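As a sanity check on the table, the peak figures of the first column can be reproduced from its core count and core frequency. The sketch below assumes 2 floating-point operations per core per cycle (one fused multiply-add) and a double-precision rate equal to one third of the single-precision rate for that card. Both factors are assumptions about the hardware, not values stated on this page, and the function name `peak_tflops` is hypothetical.

```shell
# peak_tflops CORES FREQ_GHZ [FLOPS_PER_CYCLE]
# Theoretical peak = cores x frequency (GHz) x flops per core per cycle, in Tflops.
# FLOPS_PER_CYCLE defaults to 2 (one fused multiply-add per cycle): an assumption.
peak_tflops() {
    awk -v cores="$1" -v ghz="$2" -v fpc="${3:-2}" \
        'BEGIN { printf "%.2f\n", cores * ghz * fpc / 1000 }'
}

# First column of the table: 2496 cores at 0.706 GHz.
sp=$(peak_tflops 2496 0.706)

# Assumed double-precision rate: one third of single precision for this card.
dp=$(awk -v sp="$sp" 'BEGIN { printf "%.2f\n", sp / 3 }')

echo "single precision: $sp Tflops"
echo "double precision: $dp Tflops"
```

Under these assumptions the script prints 3.52 and 1.17 Tflops, which agrees with the first values of the two peak-performance rows.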
- | ===== Software available at PSMN ===== | ||
- | |||
- | The software required to run these cards is installed from Debian packages. It consists of: | ||
- | * drivers | ||
- | * CUDA library | ||
- | * CUDA SDK (optional) | ||
- | |||
- | |||
- | ===== Files d' | ||
- | |||
- | Les files d' | ||
- | |||
- | ===== Benchmarks ===== | ||
- | |||
- | Several functional and performance tests of these cards are presented: | ||
- | |||
- | ==== Hardware and software detection tests ==== | ||
- | |||
- | With the Linux command lspci (which lists PCI devices, including GPU cards): | ||
- | |||
- | <code bash> | ||
- | c82gpgpu34: | ||
- | 05:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1) | ||
- | Subsystem: NVIDIA Corporation Device 1015 | ||
- | Kernel driver in use: nvidia | ||
- | 83:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1) | ||
- | Subsystem: NVIDIA Corporation Device 1015 | ||
- | Kernel driver in use: nvidia | ||
- | </code> | ||
- | |||
- | This command returns nothing when run on the compilation servers (e.g. e5-2670comp1) since | ||
- | |||
- | The previous output was obtained on a compute node equipped with GPU cards (in this example, node c82gpgpu34, which has 2 Tesla K20 cards). | ||
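The same check can be scripted, for example to decide whether a node has GPUs before submitting work to it. This is a minimal sketch: the function name `count_nvidia_gpus` is hypothetical, and the sample text simply reuses the lspci lines shown above.

```shell
# count_nvidia_gpus: read lspci output on stdin and print the number of
# NVIDIA 3D or VGA controllers found (0 on a node without GPU cards).
count_nvidia_gpus() {
    awk '/3D controller|VGA compatible controller/ && /NVIDIA/ { n++ }
         END { print n + 0 }'
}

# Sample input copied from the output shown above.
sample='05:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)
Subsystem: NVIDIA Corporation Device 1015
Kernel driver in use: nvidia
83:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)
Subsystem: NVIDIA Corporation Device 1015
Kernel driver in use: nvidia'

printf '%s\n' "$sample" | count_nvidia_gpus
```

On a real node the function would be fed by the command itself, as in `lspci | count_nvidia_gpus`. The `Subsystem` lines also mention NVIDIA but are not counted, since they do not describe a controller.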
- | |||
- | The Linux command lsmod displays the | ||
- | |||
- | <code bash> | ||
- | c82gpgpu34: | ||
- | nvidia_uvm | ||
- | nvidia | ||
- | i2c_core | ||
- | </code> | ||
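To test from a script whether the NVIDIA kernel modules are loaded, the lsmod output can be filtered on its first column. A minimal sketch follows; `nvidia_modules` is a hypothetical name, and the sample reuses only the three module names listed above, without the size and usage columns that are cut off on this page.

```shell
# nvidia_modules: read lsmod output on stdin and print the names of the
# loaded modules whose name starts with "nvidia", one per line.
nvidia_modules() {
    awk '$1 ~ /^nvidia/ { print $1 }'
}

# Module names as listed above (other columns omitted).
sample='nvidia_uvm
nvidia
i2c_core'

printf '%s\n' "$sample" | nvidia_modules
```

On a compute node this would be used as `lsmod | nvidia_modules`; an empty result suggests that the driver is not loaded.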
- | |||
- | To display the version of the installed CUDA driver: | ||
- | <code bash> | ||
- | c82gpgpu34: | ||
- | NVRM version: NVIDIA UNIX x86_64 Kernel Module | ||
- | GCC version: | ||
- | </code> | ||
- | |||
- | To display the version of the installed CUDA Toolkit: | ||
- | <code bash> | ||
- | c82gpgpu34: | ||
- | nvcc: NVIDIA (R) Cuda compiler driver | ||
- | Copyright (c) 2005-2013 NVIDIA Corporation | ||
- | Built on Wed_Jul_17_18: | ||
- | Cuda compilation tools, release 5.5, V5.5.0 | ||
- | </code> | ||
- | |||
- | nvcc is the compiler driver shipped with the CUDA Toolkit for compiling CUDA programs (it calls the gcc compiler to compile the host C code). | ||
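When a build script needs the toolkit release number, it can be extracted from the last line of the nvcc banner. The sketch below assumes the banner keeps the `release X.Y,` wording shown above; the function name `nvcc_release` is hypothetical.

```shell
# nvcc_release: read the nvcc version banner on stdin and print the
# toolkit release number found after the word "release".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Banner lines copied from the output shown above.
sample='nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Cuda compilation tools, release 5.5, V5.5.0'

printf '%s\n' "$sample" | nvcc_release
```

With the toolkit in the PATH, the equivalent call would be `nvcc --version | nvcc_release`.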
- | |||
- | |||
- | **Another option** (besides Linux commands) for detecting the presence and type of NVIDIA GPUs is to run the deviceQuery program, whose .cpp source is included in the NVIDIA_GPU_Computing_SDK suite (renamed NVIDIA_CUDA-x.y_Samples in recent x.y versions). After compilation, | ||
- | |||
- | <code bash> | ||
- | c82gpgpu34: | ||
- | c82gpgpu34: | ||
- | |||
- | ./ | ||
- | |||
- | CUDA Device Query (Runtime API) version (CUDART static linking) | ||
- | |||
- | Found 2 CUDA Capable device(s) | ||
- | |||
- | Device 0: "Tesla K20m" | ||
- | CUDA Driver Version / Runtime Version | ||
- | CUDA Capability Major/Minor version number: | ||
- | Total amount of global memory: | ||
- | MapSMtoCores SM 3.5 is undefined (please update to the latest SDK)! | ||
- | MapSMtoCores SM 3.5 is undefined (please update to the latest SDK)! | ||
- | (13) Multiprocessors x (-1) CUDA Cores/ | ||
- | GPU Clock Speed: | ||
- | Memory Clock rate: | ||
- | Memory Bus Width: | ||
- | L2 Cache Size: | ||
- | Max Texture Dimension Size (x, | ||
- | Max Layered Texture Size (dim) x layers | ||
- | Total amount of constant memory: | ||
- | Total amount of shared memory per block: | ||
- | Total number of registers available per block: 65536 | ||
- | Warp size: 32 | ||
- | Maximum number of threads per block: | ||
- | Maximum sizes of each dimension of a block: | ||
- | Maximum sizes of each dimension of a grid: | ||
- | Maximum memory pitch: | ||
- | Texture alignment: | ||
- | Concurrent copy and execution: | ||
- | Run time limit on kernels: | ||
- | Integrated GPU sharing Host Memory: | ||
- | Support host page-locked memory mapping: | ||
- | Concurrent kernel execution: | ||
- | Alignment requirement for Surfaces: | ||
- | Device has ECC support enabled: | ||
- | Device is using TCC driver mode: No | ||
- | Device supports Unified Addressing (UVA): | ||
- | Device PCI Bus ID / PCI location ID: 5 / 0 | ||
- | Compute Mode: | ||
- | < Exclusive Process (many threads in one process is able to use :: | ||
- | |||
- | Device 1: "Tesla K20m" | ||
- | CUDA Driver Version / Runtime Version | ||
- | CUDA Capability Major/Minor version number: | ||
- | Total amount of global memory: | ||
- | MapSMtoCores SM 3.5 is undefined (please update to the latest SDK)! | ||
- | MapSMtoCores SM 3.5 is undefined (please update to the latest SDK)! | ||
- | (13) Multiprocessors x (-1) CUDA Cores/ | ||
- | GPU Clock Speed: | ||
- | Memory Clock rate: | ||
- | Memory Bus Width: | ||
- | L2 Cache Size: | ||
- | Max Texture Dimension Size (x, | ||
- | Max Layered Texture Size (dim) x layers | ||
- | Total amount of constant memory: | ||
- | Total amount of shared memory per block: | ||
- | Total number of registers available per block: 65536 | ||
- | Warp size: 32 | ||
- | Maximum number of threads per block: | ||
- | Maximum sizes of each dimension of a block: | ||
- | Maximum sizes of each dimension of a grid: | ||
- | Maximum memory pitch: | ||
- | Texture alignment: | ||
- | Concurrent copy and execution: | ||
- | Run time limit on kernels: | ||
- | Integrated GPU sharing Host Memory: | ||
- | Support host page-locked memory mapping: | ||
- | Concurrent kernel execution: | ||
- | Alignment requirement for Surfaces: | ||
- | Device has ECC support enabled: | ||
- | Device is using TCC driver mode: No | ||
- | Device supports Unified Addressing (UVA): | ||
- | Device PCI Bus ID / PCI location ID: 131 / 0 | ||
- | Compute Mode: | ||
- | < Exclusive Process (many threads in one process is able to use :: | ||
- | |||
- | deviceQuery, | ||
- | |||
- | </code> | ||
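The deviceQuery report is long, so a script that only needs the list of detected cards can keep the `Device N: "name"` lines and drop the rest. A minimal sketch, assuming the report format shown above; `device_names` is a hypothetical helper.

```shell
# device_names: read a deviceQuery report on stdin and print one line per
# detected card, as "<index> <name>".
device_names() {
    sed -n 's/^ *Device \([0-9][0-9]*\): "\([^"]*\)".*/\1 \2/p'
}

# A few lines copied from the report shown above.
sample='Found 2 CUDA Capable device(s)
Device 0: "Tesla K20m"
Device is using TCC driver mode: No
Device 1: "Tesla K20m"
Device PCI Bus ID / PCI location ID: 131 / 0'

printf '%s\n' "$sample" | device_names
```

Lines such as `Device is using TCC driver mode` are ignored because the pattern requires a numeric index directly after the word `Device`.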
- | |||
- | ==== Bandwidth tests ==== | ||
- | |||
- | ==== Tests de performance des bibliothèques d' | ||
- | ==== Performance tests of the CUDA FFT component ==== | ||
- | ==== Tests de performance des codes " | ||
- | des codes " |