Stencil Computation Optimization and Auto-Tuning on State-of-the-art Multicore Architectures