Benchmarks for SGEMM with PCIe bus data transfer on HD 5870
Several interesting observations:
- Memory buffer kernels are faster than image kernels when A and B are written from host to device.
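- Performance approaches the same asymptotic maximum in all cases as matrix size increases. The arithmetic grows as N^3 while the data moved over the bus grows only as N^2, so transfer time is amortized over more and more work. The usual trade of space for speed applies: larger matrices consume more memory but deliver more gigaFLOPS.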
- Accounting for PCIe bus data transfer reduces the performance gap between IL/ISA and OpenCL. Because host-to-device I/O is a bottleneck, OpenCL is more competitive than synthetic kernel-only benchmarks suggest.
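A back-of-the-envelope cost model helps explain why the curves converge and why transfer overhead narrows the gap between a faster and a slower kernel. This is a minimal Python sketch, not the method used to produce these benchmarks. It assumes square N x N float32 matrices, PCIe transfers that run back to back with the kernel (no overlap), and a fixed bus bandwidth. The function names, the kernel rates (1500 and 1000 GFLOPS), and the 5 GB/s bandwidth are hypothetical round numbers chosen for illustration, not measurements from the HD 5870.

```python
# Hypothetical cost model for SGEMM plus PCIe transfers (illustration only).
# Assumptions: square n x n float32 matrices, transfers do not overlap the
# kernel, and PCIe bandwidth is a constant.

BYTES_PER_FLOAT = 4  # single precision


def sgemm_flops(n: int) -> float:
    """Floating-point operations for C = A * B with square n x n matrices."""
    return 2.0 * n ** 3


def transfer_bytes(n: int, write_ab: bool, read_c: bool) -> float:
    """Bytes moved over PCIe: A and B host to device, C device to host."""
    matrices = (2 if write_ab else 0) + (1 if read_c else 0)
    return float(matrices * n * n * BYTES_PER_FLOAT)


def effective_gflops(n: int, kernel_gflops: float, pcie_gb_per_s: float,
                     write_ab: bool, read_c: bool) -> float:
    """Throughput when both kernel time and PCIe transfer time are counted."""
    flops = sgemm_flops(n)
    t_kernel = flops / (kernel_gflops * 1e9)
    t_xfer = transfer_bytes(n, write_ab, read_c) / (pcie_gb_per_s * 1e9)
    return flops / (t_kernel + t_xfer) / 1e9


if __name__ == "__main__":
    FAST, SLOW, PCIE = 1500.0, 1000.0, 5.0  # hypothetical GFLOPS and GB/s
    for n in (1024, 2048, 4096, 8192):
        fast = effective_gflops(n, FAST, PCIE, write_ab=True, read_c=True)
        slow = effective_gflops(n, SLOW, PCIE, write_ab=True, read_c=True)
        print(f"N={n:5d}  fast={fast:7.1f}  slow={slow:7.1f}  ratio={fast / slow:.2f}")
```

Under these assumptions, effective throughput climbs toward the kernel-only rate as N grows, and the fast/slow ratio stays below the kernel-only ratio of 1.5 because transfer time, which is the same for both kernels, dilutes the faster kernel's advantage at smaller sizes.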
[Charts not reproduced. Panels: kernel only; kernel + write A and B from host to device; kernel + read C from device to host; kernel + write A and B from host to device + read C from device to host]
Bellevue, WA, May 19, 2010