# SW:Matlab

## Revision as of 15:48, 28 March 2017

*THIS PAGE IS UNDER CONSTRUCTION*


# Running Matlab interactively

Matlab is accessible to all HPRC users within the terms of our license agreement. If you have particular concerns about whether specific usage falls within the TAMU HPRC license, please send an email to HPRC Helpdesk.

## Setting up the environment

To use Matlab, you first need to load the Matlab module. This can be done using the following command:

[netID@cluster ~]$ module load Matlab/R2016b

This will set up the environment for Matlab version R2016b. To see a list of all installed versions, use the following command:

[netID@cluster ~]$ module spider Matlab

**Note:** New versions of software become available periodically. Version numbers may change.

## Starting Matlab

To start Matlab, use the following command:

[netID@cluster ~]$ matlab

Depending on your X server settings, this will start either the Matlab GUI or the Matlab command line interface. To start Matlab in command line interface mode, use the following command with the appropriate flags:

[netID@cluster ~]$ matlab -nosplash -nodisplay

By default, Matlab executes a large number of built-in operators and functions multi-threaded and will use as many threads (i.e. cores) as are available on the node. Since login nodes are shared among all users, HPRC restricts the number of computational threads to 8. This should suffice for most cases. The speedup achieved through multi-threading depends on many factors, and in certain cases using 8 threads can even increase the runtime. To explicitly change the number of computational threads, use the following Matlab command:

>>feature('NumThreads',4);

This will set the number of computational threads to 4.

To completely disable multi-threading, use the -singleCompThread option when starting Matlab:

[netID@cluster ~]$ matlab -singleCompThread

## Usage on the Login Nodes

Please limit interactive processing to short, non-intensive usage. Use non-interactive batch jobs for resource-intensive and/or multiple-core processing. Users are requested to be **responsible** and **courteous to other users** when using software on the login nodes.

The most important processing limits here are:

- **ONE HOUR** of **PROCESSING TIME** per login session.
- **EIGHT CORES** per login session on the same node or (cumulatively) across all login nodes.

**Anyone found violating the processing limits will have their processes killed without warning. Repeated violation of these limits will result in account suspension.**

**Note:** Your login session will disconnect after **one hour** of inactivity.

# Running (parallel) Matlab jobs on HPRC resources

When your Matlab job needs more resources than are allowed during an interactive session (e.g. CPU time, number of cores), you can run it on the compute nodes. Normally, you would need to create a batch script and submit it to the batch scheduler (LSF on ada and SLURM on terra). However, HPRC provides a variety of options to run your Matlab jobs on the compute nodes without the need to write (and submit) your own batch script. We will discuss how to do this in the following sections. First, we will introduce TAMUClusterProperties.

## TAMUClusterProperties

TAMUClusterProperties is a Matlab class developed by HPRC that lets you define properties for your Matlab job. Properties include the number of workers you want to use for parallel processing (e.g. parfor, spmd, distributed), the number of threads, and whether you want to use Matlab GPU operators. You can also define properties with respect to walltime and memory (the latter are needed by the batch scheduler). To create a TAMUClusterProperties object:

>> tp=TAMUClusterProperties();

To see all the methods you can use to set properties, type:

    >> help TAMUClusterProperties
     TAMUClusterProperties class that defines all the cluster properties
     this class stores all the properties used to submit jobs to HPRC clusters.
    
     TAMUClusterProperties methods:
       workers - returns (and sets) #workers
       workers_per_node - return (and sets) #workers per node
       threads - returns (and sets) #threads per worker
       memory - returns (and sets) total memory
       gpu - returns (and sets) logical indicating gpu
       walltime - returns (and sets) walltime
       hostname - returns (and sets) hostname for remote jobs
       user - returns (ans sets) username for remote jobs
       scheduler_string - returns (and sets) specific scheduler commands

### Example 1: Specifying number of workers

Suppose you want to use 4 Matlab workers for explicit parallelism (e.g. parfor/spmd/distributed) in your Matlab run. You can use:

    >> tp = TAMUClusterProperties;
    >> tp.workers(4);

### Example 2: Specifying number of computational threads

As mentioned in a previous section, Matlab can execute many built-in operators and functions multi-threaded, but HPRC imposes a limit of 8 threads when running on the login nodes. When you run on the compute nodes, the only limit is the number of cores per node (20 on ada and 28 on terra). To set the number of threads to 16:

    >> tp = TAMUClusterProperties;
    >> tp.threads(16);

**NOTE**: This will set the number of threads to 16 on the client and **all** Matlab workers (if workers are being used). If the requested number of workers and threads/worker does not fit on a single node, the workers will be distributed over multiple nodes.

### Example 3: Specifying workers on multiple nodes

Suppose you want to run a Matlab script that will utilize 4 workers, where every worker needs to run on a different node and needs access to a GPU. You also want every worker to use 16 threads:

    >> tp = TAMUClusterProperties();
    >> tp.workers(4);
    >> tp.workers_per_node(1);
    >> tp.threads(16);
    >> tp.gpu(1);

When you actually run your code every worker can use up to 16 threads and has access to a GPU.
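As a rough sketch of the arithmetic behind Example 3, the Python snippet below (Python is used only for illustration; it is not part of TAMUClusterProperties) computes how many nodes a request occupies when each node hosts at most `workers_per_node` workers, and checks whether the threads on one node fit within the per-node core counts stated above (20 on ada, 28 on terra). The helper names and the assumption that every thread occupies one core are ours, not HPRC's.

```python
import math

# Cores per compute node as stated on this page.
CORES_PER_NODE = {"ada": 20, "terra": 28}


def nodes_needed(workers: int, workers_per_node: int) -> int:
    """Nodes required when each node hosts at most workers_per_node workers."""
    if workers < 1 or workers_per_node < 1:
        raise ValueError("workers and workers_per_node must be positive")
    return math.ceil(workers / workers_per_node)


def threads_fit_on_node(workers_per_node: int, threads: int, cluster: str) -> bool:
    """Assumption: every thread of every worker occupies one core."""
    return workers_per_node * threads <= CORES_PER_NODE[cluster]


# Example 3: 4 workers, 1 worker per node, 16 threads per worker.
print(nodes_needed(4, 1))                 # 4
print(threads_fit_on_node(1, 16, "ada"))  # True
print(threads_fit_on_node(2, 16, "ada"))  # False: 32 threads > 20 cores
```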

## Submitting jobs from a Matlab session

HPRC developed a convenience function, **tamu_run_batch**, to easily submit your Matlab jobs from within an interactive Matlab session.

    >> help tamu_run_batch
     tamu_run_batch runs Matlab script on worker(s).
     j = TAMU_RUN_BATH(tp,'script') runs the script script.m on the worker(s)
     using the TAMUClusterProperties object tp.
     Returns j, a handle to the job object that runs the script.

### Example 1: running a job on the compute nodes

Suppose you have a Matlab script named mysimulation.m that uses a parfor loop, and you want to use 8 workers, where every worker (and the client) can use 2 threads:

    >> tp = TAMUClusterProperties();
    >> tp.workers(8);
    >> tp.threads(2);
    >> myjob = tamu_run_batch(tp,'mysimulation');

## matlabsubmit: submitting jobs from the command line

TAMU HPRC provides a tool named **matlabsubmit** to automate the process of running Matlab simulations on the compute nodes without the need to create your own batch script. This is the recommended way of running Matlab simulations on the compute nodes since it guarantees all batch resources are set correctly. In addition, matlabsubmit will also set the number of Matlab computational threads. If any additional Matlab workers are requested, it will automatically create a Matlab *parpool* using the correct profile (using a *local* profile for single node and a *ClusterProfile* for multiple nodes).

To submit your Matlab script, use the following command:

matlabsubmit myscript.m

When executing, matlabsubmit will do the following:

- generate boilerplate Matlab code to set up the Matlab environment (e.g. #threads, #workers)
- generate a batch script with all resources set correctly and the command to run Matlab
- submit the generated batch script to the batch scheduler and return control back to the user

In addition, matlabsubmit will also save the complete workspace after the Matlab script finishes executing.

### Example 1: basic use

The following example shows the simplest use of matlabsubmit. It will execute matlab script *test.m* using default values for batch resources and Matlab resources. matlabsubmit will also print some useful information to the screen. As can be seen in the example, it will show the Matlab resources requested (e.g. #threads, #workers), the submit command that will be used to submit the job, the batch scheduler JobID, and the location of output generated by Matlab and the batch scheduler.

    -bash-4.1$ matlabsubmit test.m
    ===============================================
    Running Matlab script with following parameters
    -----------------------------------------------
    Script : test.m
    Workers : 0
    Nodes : 1
    Mem/proc : 2500
    #threads : 8
    ===============================================
    bsub -e MatlabSubmitLOG1/lsf.err -o MatlabSubmitLOG1/lsf.out -L /bin/bash -n 8 -R span[ptile=8] -W 02:00 -M 2500 -R rusage[mem=2500] -J test1 MatlabSubmitLOG1/submission_script
    Verifying job submission parameters...
    Verifying project account...
    Account to charge: 082839397478
    Balance (SUs): 81535.6542
    SUs to charge: 16.0000
    Job <2847580> is submitted to default queue <sn_regular>.
    -----------------------------------------------
    matlabsubmit ID : 1
    matlab output file : MatlabSubmitLOG1/matlab.log
    LSF/matlab output file : MatlabSubmitLOG1/lsf.out
    LSF/matlab error file : MatlabSubmitLOG1/lsf.err
    -bash-4.1$

The Matlab script *test.m* has to be in the current directory. Control will be returned immediately after executing the matlabsubmit command. To check the run status or kill a job, use the respective batch scheduler commands (e.g. **bjobs** and **bkill** on ada). matlabsubmit will create a subdirectory named **MatlabSubmitLOG<N>** (where **N** is the matlabsubmit ID). In this directory matlabsubmit stores all its relevant files: the generated batch script, the Matlab driver, redirected output and error, and a copy of the workspace (after the job is done). A listing of this directory will show the following files:

- **lsf.err**: redirected error
- **lsf.out**: redirected output (both LSF and Matlab)
- **matlab.log**: redirected Matlab screen output
- **matlabsubmit_wrapper.m**: Matlab code that sets #threads and calls user function
- **submission_script**: the generated LSF batch script
- **workspace.mat**: a copy of the Matlab workspace (after execution has finished)
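The numbers in the Example 1 output are consistent with SUs being charged as cores × walltime hours: the bsub line requests 8 cores (`-n 8`) for `-W 02:00`, and LSF reports 16 SUs to charge (Example 2 below requests 9 cores for 2 hours and is charged 18 SUs). The Python sketch below estimates the charge under that assumption; the actual accounting policy is not described on this page and may include other factors.

```python
def walltime_hours(hhmm: str) -> float:
    """Convert a walltime of the form hh:mm (as used by -t and -W) to hours."""
    hours, minutes = hhmm.split(":")
    if not 0 <= int(minutes) < 60:
        raise ValueError("minutes must be between 00 and 59")
    return int(hours) + int(minutes) / 60


def estimated_sus(cores: int, hhmm: str) -> float:
    """Assumed charging rule: one SU per core per hour of requested walltime."""
    return cores * walltime_hours(hhmm)


print(estimated_sus(8, "02:00"))  # 16.0, matching "SUs to charge: 16.0000"
```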

### Options with matlabsubmit

The example above showed the simplest case of using matlabsubmit. No options were specified, and matlabsubmit used default values for the requested resources. However, matlabsubmit provides a number of options to set batch resources (e.g. walltime, memory) as well as Matlab-related options (e.g. number of threads, number of workers). To see all the available options, use the "**-h**" option. See below for the output of "**matlabsubmit -h**":

    -bash-4.1$ matlabsubmit -h
    /software/hprc/Matlab/bin/matlabsubmit: option requires an argument -- h
    Usage: /software/hprc/Matlab/bin/matlabsubmit [options] SCRIPTNAME
    
    This tools automates the process of running matlab codes on the compute nodes.
    
    OPTIONS:
      -h Shows this message
      -m set the amount of requested memory in MEGA bytes(e.g. -m 20000)
      -t sets the walltime; form hh:mm (e.g. -t 03:27)
      -w sets the number of ADDITIONAL workers
      -g indicates script needs GPU (no value needed)
      -b sets the billing account to use
      -s set number of threads for multithreading (default: 8 ( 1 when -w > 0)
      -p set number of workers per node
      -f run function call instead of script
      -x add explicit batch scheduler option
    
    DEFAULT VALUES:
      memory : 2500 per core
      time : 02:00
      workers : 0
      gpu : no gpu
      threading: on, 8 threads
    -bash-4.1$

For example, the command **matlabsubmit -t "03:27" -m 17000 -s 20 myscript.m** will request 17 GB of memory and 3 hours and 27 minutes of walltime. It will also set the number of computational threads in Matlab to 20 and execute the Matlab script myscript.m.

**NOTE:** When using the **-f** flag to execute a function instead of a script, the function call must be enclosed in double quotes when it contains parentheses. For example: **matlabsubmit -f "myfunc(21)"**
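If you submit many runs from a script, it can help to build the matlabsubmit command line programmatically. The Python sketch below assembles an argument list using only the flags shown in the `matlabsubmit -h` output above, with the options placed before the script name as in the usage line. The function name is hypothetical, and the `-f` form is deliberately left out. Passing the resulting list to Python's `subprocess.run` (shown commented out) avoids shell quoting issues.

```python
import re


def matlabsubmit_args(script, walltime=None, memory_mb=None, threads=None,
                      workers=None, workers_per_node=None, gpu=False):
    """Build a matlabsubmit argument list (hypothetical helper)."""
    args = ["matlabsubmit"]
    if walltime is not None:
        if not re.fullmatch(r"\d{2}:\d{2}", walltime):
            raise ValueError("walltime must have the form hh:mm")
        args += ["-t", walltime]
    if memory_mb is not None:
        args += ["-m", str(memory_mb)]
    if threads is not None:
        args += ["-s", str(threads)]
    if workers is not None:
        args += ["-w", str(workers)]
    if workers_per_node is not None:
        args += ["-p", str(workers_per_node)]
    if gpu:
        args.append("-g")   # -g takes no value
    args.append(script)     # SCRIPTNAME comes last
    return args


# Same request as the example above: 03:27 walltime, 17000 MB, 20 threads.
cmd = matlabsubmit_args("myscript.m", walltime="03:27", memory_mb=17000, threads=20)
print(" ".join(cmd))  # matlabsubmit -t 03:27 -m 17000 -s 20 myscript.m
# import subprocess; subprocess.run(cmd, check=True)  # on a login node with matlabsubmit
```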

### Example 2: Utilizing Matlab workers (single node)

To utilize additional workers for Matlab's parallel features such as *parfor*, *spmd*, and *distributed*, matlabsubmit provides an option to specify the number of workers. This is done using the *-w <N>* flag (where <N> represents the number of workers). The following example shows a simple case of using 8 additional workers:

    -bash-4.1$ matlabsubmit -w 8 test.m
    ===============================================
    Running Matlab script with following parameters
    -----------------------------------------------
    Script : test.m
    Workers : 8
    Nodes : 1
    Mem/proc : 2500
    #threads : 1
    ===============================================
    bsub -e MatlabSubmitLOG5/lsf.err -o MatlabSubmitLOG5/lsf.out -L /bin/bash -n 9 -R span[ptile=9] -W 02:00 -M 2500 -R rusage[mem=2500] -J test5 MatlabSubmitLOG5/submission_script
    Verifying job submission parameters...
    Verifying project account...
    Account to charge: 082839397478
    Balance (SUs): 80533.2098
    SUs to charge: 18.0000
    Job <2901543> is submitted to default queue <sn_regular>.
    -----------------------------------------------
    matlabsubmit ID : 5
    matlab output file : MatlabSubmitLOG5/matlab.log
    LSF/matlab output file : MatlabSubmitLOG5/lsf.out
    LSF/matlab error file : MatlabSubmitLOG5/lsf.err
    -bash-4.1$

In this example, matlabsubmit will first execute matlab code to create a *parpool* with 8 workers (using the local profile). As can be seen in the output, in this case, matlabsubmit requests 9 cores: 1 core for the client and 8 cores for the workers. The only exception is when the user requests 20 workers. In that case, matlabsubmit will request 20 cores.
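The single-node core counts described above can be summarized in a few lines. The Python sketch below reproduces the rule stated in the text (one core for the client plus one per worker, except for the 20-worker case) and, for a run without workers, uses the thread count, which matches the `-n 8` request in Example 1. It covers only the cases described on this page; hybrid runs (-w together with -s) use a different formula.

```python
def single_node_cores(workers: int, threads: int = 8) -> int:
    """Cores matlabsubmit requests on one node, per the rules on this page."""
    if workers == 0:
        return threads        # Example 1: no workers, 8 threads -> -n 8
    if workers > 20:
        raise ValueError("more than 20 workers needs multiple nodes (see Example 3)")
    if workers == 20:
        return 20             # stated exception: 20 workers -> 20 cores
    return workers + 1        # one core for the client plus one per worker


print(single_node_cores(0))   # 8
print(single_node_cores(8))   # 9, matching "-n 9" in Example 2
```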

### Example 3: Utilizing Matlab workers (multi node)

matlabsubmit also supports Matlab runs that need more than 20 workers (the maximum for a single node) and/or runs where the Matlab workers need to be distributed among multiple nodes. Reasons for distributing workers among different nodes include: using certain resources, such as GPUs, on multiple nodes; enabling multi-threading on every worker; and using the available memory on multiple nodes. The following example shows how to run a Matlab simulation that utilizes 24 workers, where every node will run 4 workers (i.e. the workers will be distributed among 24/4 = 6 nodes).

    -bash-4.1$ matlabsubmit -w 24 -p 4 test.m
    ===============================================
    Running Matlab script with following parameters
    -----------------------------------------------
    Script : test.m
    Workers : 24
    Nodes : 6
    Mem/proc : 2500
    #threads : 1
    ===============================================
    ... starting matlab batch. This might take some time. See MatlabSubmitLOG8/matlab-batch-commands.log
    ...Starting Matlab from host: login4
    MATLAB is selecting SOFTWARE OPENGL rendering.
    < M A T L A B (R) >
    Copyright 1984-2016 The MathWorks, Inc.
    R2016a (9.0.0.341360) 64-bit (glnxa64)
    February 11, 2016
    To get started, type one of these: helpwin, helpdesk, or demo.
    For product information, visit www.mathworks.com.
    ...
    Interactive Matlab session, multi threading reduced to 4
    Academic License
    commandToRun = bsub -L /bin/bash -J Job1 -o '/general/home/pennings/Job1/Job1.log' -n 25 -M 2500 -R rusage[mem=2500] -R "span[ptile=4]" -W 02:00 "source /general/home/pennings/Job1/mdce_envvars ; /general/software/x86_64/tamusc/Matlab/toolbox/tamu/profiles/lsfgeneric/communicatingJobWrapper.sh"
    job =
     Job
     Properties:
       ID: 1
       Type: pool
       Username: pennings
       State: running
       SubmitTime: Mon Aug 01 12:15:15 CDT 2016
       StartTime:
       Running Duration: 0 days 0h 0m 0s
       NumWorkersRange: [25 25]
       AutoAttachFiles: true
       Auto Attached Files: /general/home/pennings/MatlabSubmitLOG8/matlabsubmit_wrapper.m
                            /general/home/pennings/test.m
       AttachedFiles: {}
       AdditionalPaths: {}
     Associated Tasks:
       Number Pending: 25
       Number Running: 0
       Number Finished: 0
       Task ID of Errors: []
       Task ID of Warnings: []
    -----------------------------------------------
    matlabsubmit JOBID : 8
    batch output file (client) : Job1/Task1.diary.txt
    batch output files (workers) : Job1/Task[2-25].diary.txt
    Done
    -bash-4.1$

As can be seen, the output is very different from the previous examples, because matlabsubmit takes a different approach when a job uses multiple nodes. matlabsubmit will start a regular *interactive* Matlab session and, from within it, run the Matlab *batch* command using the **TAMUG** cluster profile. It will then exit Matlab while the Matlab script is executed on the compute nodes.

The contents of the MatlabSubmitLOG directory are also slightly different. A listing will show the following files:

- **matlab-batch-commands.log**: screen output from Matlab
- **matlabsubmit_driver.m**: Matlab code that sets up the cluster profile and calls Matlab *batch*
- **matlabsubmit_wrapper.m**: Matlab code that sets #threads and calls user function
- **submission_script**: the actual command to start Matlab

In addition to the MatlabSubmitLOG directory created by matlabsubmit, Matlab will also create a directory named **Job<N>**, used by the cluster profile to store metadata, log files, and screen output. The **\*.diary.txt** text files will show screen output for the client and all the workers.
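Based on the file names reported in the multi-node output above (client output in `Job1/Task1.diary.txt`, worker output in `Job1/Task[2-25].diary.txt` for 24 workers), a small Python sketch can list the diary files of a job and show the client's output. The directory layout is assumed from that single example and the helper names are hypothetical; check the paths printed by matlabsubmit for your own job.

```python
from pathlib import Path


def diary_files(job_dir: Path, workers: int) -> tuple[Path, list[Path]]:
    """Return (client diary, worker diaries) following the Task<N> naming above."""
    client = job_dir / "Task1.diary.txt"
    worker_files = [job_dir / f"Task{n}.diary.txt" for n in range(2, workers + 2)]
    return client, worker_files


def print_client_output(job_dir: Path) -> None:
    """Print the client's screen output if the job has written it."""
    client, _ = diary_files(job_dir, workers=0)
    if client.exists():
        print(client.read_text())
    else:
        print(f"{client} not found (job may still be pending)")


client, workers = diary_files(Path("Job1"), workers=24)
print(client)       # Job1/Task1.diary.txt
print(workers[-1])  # Job1/Task25.diary.txt
```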

### Hybrid jobs (utilize workers and enable multi threading)

By default, matlabsubmit turns off the multi-threading features when workers are requested. To override this, use both the **-w** flag and the **-s** flag. In that case, the total number of cores matlabsubmit will request is *#workers × #threads + 1*. matlabsubmit will set the number of threads for both the client and all the workers.
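For hybrid runs, the requested core count follows the formula above and grows quickly. The short Python sketch below applies it and compares the result with the 20 cores per node on ada stated earlier on this page; how matlabsubmit itself handles a hybrid request that exceeds one node is not covered here.

```python
ADA_CORES_PER_NODE = 20  # as stated earlier on this page


def hybrid_cores(workers: int, threads: int) -> int:
    """Total cores requested when both -w and -s are given: workers*threads + 1."""
    return workers * threads + 1


for w, s in [(4, 4), (4, 5)]:
    total = hybrid_cores(w, s)
    fits = total <= ADA_CORES_PER_NODE
    print(f"-w {w} -s {s}: {total} cores, fits on one ada node: {fits}")
# -w 4 -s 4: 17 cores, fits on one ada node: True
# -w 4 -s 5: 21 cores, fits on one ada node: False
```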

## Matlab Cluster Profiles

In addition to the 50 general Matlab licenses, HPRC also purchased a Matlab Distributed Computing Server license with a total of 96 tokens. These tokens are used to start additional Matlab workers, which are used by parallel Matlab constructs like *parfor*, *spmd*, and *distributed*.

For parallel processing on the compute nodes Matlab uses Cluster profiles. A cluster profile acts as an interface between Matlab and the batch scheduler (e.g. LSF, SLURM) and lets you define certain properties of your cluster (e.g. how to submit jobs, submission parameters, job requirements, etc). Matlab will use the cluster profile to offload parallel (or sequential) matlab code to one or more workers.

For your convenience, HPRC has already created a custom cluster profile. You can use this profile to define how many workers you want and how you want to distribute the workers over the nodes. Before you can use this profile, you need to import it (you only need to do this once). This can be done by calling the following Matlab function:

>>tamu_import_TAMU_clusterprofile()

This function imports the cluster profile into the workspace, and it also creates a subdirectory structure in your scratch directory to store job information for that cluster.

We will briefly discuss some of the most common parallel Matlab concepts. For more detailed information about these constructs, as well as additional parallel constructs, consult the Parallel Computing Toolbox User Guide.

### matlabpool

The matlabpool function enables the full functionality of the parallel language features (parfor and spmd, discussed below). matlabpool creates a special job on a pool of workers and connects the pool to the Matlab client. For example:

    matlabpool open 4
    :
    :
    matlabpool close

This code starts a worker pool using the default cluster profile, with 4 additional workers.

NOTE: matlabpool was removed in Matlab R2015a. In newer versions (such as R2016b), open a pool with parpool(4) and close it with delete(gcp('nocreate')); the matlabpool examples on this page use the older syntax.

NOTE: only instructions within parfor and spmd blocks are executed on the workers. All other instructions are executed on the client.

NOTE: all variables declared inside the matlabpool block will be destroyed once the block is finished.

For more detailed information, please visit the Matlab matlabpool page.

### parfor

The concept of a parfor-loop is similar to the standard Matlab for-loop. The difference is that parfor partitions the iterations among the available workers to run in parallel. For example:

    matlabpool open 2
    parfor i=1:1024
        A(i)=sin((i/1024)*2*pi);
    end
    matlabpool close

This code will open a matlab pool with 2 workers using the default cluster profile and execute the loop in parallel.
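To illustrate what "partitions the iterations among the available workers" means, the Python sketch below (an analogy, not Matlab) splits the 1024 iterations of the loop above into contiguous blocks, one per worker, computes each block in a thread pool, and writes the results back into one array. Matlab chooses its own division of parfor iterations; the fixed, equal-sized blocks here are a simplification, and threads are used only to keep the sketch self-contained.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 1024  # number of loop iterations, as in the parfor example


def chunk_ranges(n_iter: int, n_workers: int) -> list[range]:
    """Split iterations 1..n_iter into n_workers contiguous, near-equal blocks."""
    base, extra = divmod(n_iter, n_workers)
    blocks, start = [], 1
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def loop_body(block: range) -> list[tuple[int, float]]:
    """Compute A(i) = sin((i/N)*2*pi) for every i in one worker's block."""
    return [(i, math.sin((i / N) * 2 * math.pi)) for i in block]


def parallel_loop(n_workers: int = 2) -> list[float]:
    A = [0.0] * N
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for block_result in pool.map(loop_body, chunk_ranges(N, n_workers)):
            for i, value in block_result:
                A[i - 1] = value  # Matlab index i maps to Python index i-1
    return A


A = parallel_loop(2)
print(chunk_ranges(N, 2))  # [range(1, 513), range(513, 1025)]
```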

For more information, please visit the Matlab parfor page.

### spmd

spmd runs the same program on all workers concurrently. A typical use of spmd is when you need to run the same program on multiple sets of input. For example, suppose you have 4 inputs named data1, data2, data3, and data4, and you want to run function myfun on all of them:

    matlabpool open 4
    spmd (4)
        data = load(['data' num2str(labindex)])
        myresult = myfun(data)
    end
    matlabpool close

NOTE: labindex is a Matlab variable that is set to the worker ID; its values range from 1 to the number of workers.

Every worker will have its own version of variable myresult. To access these variables outside the spmd block, append {i} to the variable name; e.g. myresult{3} represents variable myresult from worker 3.

For more information, please visit the Matlab spmd page.

### batch

The parallel constructs we discussed so far are all interactive, meaning that the client starts the workers and then waits for completion of the job before accepting any other input. The batch command will submit a job and return control back to the client immediately. For example, suppose we want to run the parfor loop from above without waiting for the result. First, create a Matlab script myloop.m:

    parfor i=1:1024
        A(i)=sin((i/1024)*2*pi);
    end

To run it using the batch command:

    myjob = batch('myloop','matlabpool',4)

This will start the parallel job on the workers, and control is returned to the client immediately. To see all your running jobs, click on Parallel/Monitor Jobs. Use the wait command, e.g. wait(myjob), to wait for the job to finish, and use the load command, e.g. load(myjob), to load all variables from the job into the client workspace. In newer Matlab versions, the pool size is specified with the 'Pool' option instead, e.g. batch('myloop','Pool',4).
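The submit, wait, and load pattern of batch resembles futures in other languages. As a rough Python analogy only (not how Matlab implements batch), `submit` returns a handle immediately, and `result()` blocks like `wait(myjob)` and hands back the job's variables like `load(myjob)`. Representing the job's workspace as a dictionary of variables is our assumption for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def myloop() -> dict:
    """Stand-in for myloop.m; returns its 'workspace' as a dict of variables."""
    A = [math.sin((i / 1024) * 2 * math.pi) for i in range(1, 1025)]
    return {"A": A}


executor = ThreadPoolExecutor(max_workers=1)
myjob = executor.submit(myloop)   # like batch(...): returns a job handle at once

# ... the client is free to do other work here ...

workspace = myjob.result()        # like wait(myjob) followed by load(myjob)
executor.shutdown()
print(len(workspace["A"]))        # 1024
```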

For more information, please visit the Matlab batch page.

### Using GPU

Normally all variables reside in the client workspace and Matlab operations are executed on the client machine (e.g. your desktop, or an EOS login node). However, Matlab also provides options to utilize available GPUs to run code faster. Running code on the GPU is actually very straightforward. Matlab provides GPU versions of many built-in operations. These operations are executed on the GPU automatically when the variables involved reside on the GPU. The results of these operations will also reside on the GPU. To see which functions can be run on the GPU, type:

    methods('gpuArray')

This will show a list of all available functions that can be run on the GPU, as well as a list of available static functions to create data on the GPU directly (discussed later).

NOTE: Executing code on the GPU carries significant overhead because of memory transfers.

Another useful function is gpuDevice, which shows all the properties of the GPU. When this function is called from the client (or a node without a GPU), it will just print an error message.

#### Adjusting Cluster Profile

To use the GPUs on EOS, we need to adjust the job requirements to make sure the job is scheduled on a node with a GPU, the same way you would do it with a regular EOS job:

    dcluster = parcluster
    dcluster.ResourceTemplate='-l nodes=1:ppn=1:gpus=1,walltime=02:00:00,mem=20gb'

The above job requirements are just an example. You can adjust the various properties to suit your needs. More detailed information about changing profile properties can be found here.

#### Copying between client and GPU

To copy variables from the client workspace to the GPU, you can use the gpuArray command. For example:

    carr = ones(1000);
    garr = gpuArray(carr);

will copy variable carr to the GPU with name garr. If variable carr is not used in the client workspace, you can write it as:

garr = gpuArray(ones(1000));

The two versions have the same problem. They both need to copy the 1000x1000 matrix from client workspace to the GPU. We mentioned above that Matlab provides methods to create data directly on the GPU to avoid the overhead of copying data to the GPU. For example:

garr=gpuArray.ones(1000)

This will create a 1000x1000 matrix directly on the GPU consisting of all ones.
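To put the copy overhead in numbers: a Matlab double takes 8 bytes, so each of the two gpuArray versions above moves a 1000x1000 matrix of about 8 MB to the GPU. The 10000x10000 matrix used in the data-transfer timing on this page is about 800 MB, and its reported time of almost 0.6 seconds corresponds to an effective rate of roughly 1.3 GB/s. The Python sketch below does this arithmetic; treat the rate as a rough estimate, since that timing also includes creating the matrix with ones().

```python
BYTES_PER_DOUBLE = 8  # ones() and rand() create double-precision arrays by default


def matrix_bytes(rows: int, cols: int, bytes_per_element: int = BYTES_PER_DOUBLE) -> int:
    """Memory occupied by a dense rows-by-cols matrix."""
    return rows * cols * bytes_per_element


def effective_rate_gb_per_s(n_bytes: int, seconds: float) -> float:
    """Bytes moved per second, in GB/s (1 GB = 1e9 bytes)."""
    return n_bytes / seconds / 1e9


print(matrix_bytes(1000, 1000))    # 8000000 bytes (8 MB)
big = matrix_bytes(10000, 10000)   # 800000000 bytes (800 MB)
print(round(effective_rate_gb_per_s(big, 0.6), 2))  # 1.33
```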

You can find a list of all methods to create data directly on the GPU here.

To copy data back to the client workspace Matlab provides the gather operation.

carr2 = gather(garr)

This will copy the array garr on the GPU back to variable carr2 in the client workspace.

#### Overhead

As mentioned before, there is considerable overhead involved when using the GPU. There are two types of overhead:

- warming up the GPU (the first time the GPU is used)
- data transfer

##### Warming up

When the GPU is just starting up computation, many things need to be done, both on the Matlab side and on the GPU device itself (e.g. loading libraries, initializing the GPU state). For example:

    matlabpool open 1
    spmd (1)
        tic
        gpuArray.ones(10,1);
        toc
    end

This code only creates a 10x1 array of ones on the GPU device. The first run takes an astounding 21.5 seconds to execute, while every successive run only needs about 0.00017 seconds. This shows the huge cost of warming up the GPU.

NOTE: These are running times on EOS. Other systems might have very different timing results.

##### Data transfer

GPU operations in Matlab can only be done when the data is physically located on the GPU device. Therefore, data might need to be transferred to the GPU device (and vice versa), which is a significant overhead. For example:

    spmd (1)
        tic; ag = gpuArray(ones(10000)); toc;
    end

The above code only copies a 10000x10000 matrix from the workspace to the GPU device, yet it takes almost 0.6 seconds.

#### Example

Here is a small example that performs a matrix multiplication on the client, a matrix multiplication on the GPU, and prints out elapsed times for both. The actual CPU-GPU matrix multiplication code can be written as:

    a = rand(1000);
    tic; b = a*a; toc;
    tic; ag = gpuArray(a); bg = ag*ag; toc;
    c = gather(bg)

Almost no additional steps are required to use the GPU. In fact, copying the results to the client workspace is not even needed: variables that reside on the GPU can be printed or plotted just like variables in the client workspace.
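To compare the two timings printed by the example, it can help to convert them into a floating-point rate. Multiplying two n×n matrices takes about 2n³ floating-point operations (n³ multiplications and roughly as many additions), so for n = 1000 that is about 2×10⁹ operations. The Python sketch below does this conversion; the timing values are hypothetical placeholders for your own `toc` output. Keep in mind that the GPU timing in the example also includes the `gpuArray(a)` copy, and that Matlab may run GPU operations asynchronously, so `gputimeit` or `wait(gpuDevice)` gives more reliable GPU timings.

```python
def matmul_flops(n: int) -> int:
    """Approximate floating-point operations for an n-by-n times n-by-n product."""
    return 2 * n ** 3


def gflops(n: int, seconds: float) -> float:
    """Achieved rate in GFLOP/s for one n-by-n matrix product taking `seconds`."""
    return matmul_flops(n) / seconds / 1e9


# Hypothetical timings: substitute the elapsed times printed by tic/toc.
cpu_seconds = 0.05
gpu_seconds = 0.01
print(f"CPU: {gflops(1000, cpu_seconds):.1f} GFLOP/s")
print(f"GPU: {gflops(1000, gpu_seconds):.1f} GFLOP/s")
```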

The above code will run without problems if Matlab is installed on a computer with a GPU attached. Since EOS does not have GPUs attached to the login nodes (where the client is running), we need to ensure the above code is run on a GPU node. We will show how to do this in interactive mode (using matlabpool) and by using the Matlab batch command.

For convenience, the code above is saved as mymatrixmult.m.

##### Interactive using matlabpool

A matlabpool needs to be opened, since a GPU node is needed and the client is running on one of the login nodes (no GPU available). In addition, mymatrixmult needs to be inside an spmd block to ensure the code actually runs on the worker instead of the client (see the matlabpool section). The code will be as follows:

    matlabpool open 1
    spmd (1)
        mymatrixmult
    end
    matlabpool close

##### Using Matlab batch command

This example is a basic sequential code (i.e. it uses only one CPU core), so in this case a matlabpool is not even needed. The Matlab batch command will start the job on one of the workers (which has a GPU). The code will look as follows:

    batch('mymatrixmult')

#### Warming up the GPU

There is considerable overhead involved when using the GPU. Besides the data transfer overhead mentioned before, there is another kind of overhead: warming-up time. When the GPU is just starting up computation, many things need to be done, both on the Matlab side and on the GPU device itself (e.g. loading libraries, initializing the GPU state). To get an indication of how much time is needed, look at the following example:

    matlabpool open 1
    spmd (1)
        tic
        gpuArray.ones(10,1);
        toc
    end

This code only creates a 10x1 array of ones on the GPU device. The first run takes 0.026 seconds to execute, while every successive run only needs about 0.00017 seconds (of course, different runs will produce slightly different results). This shows the huge cost of warming up the GPU.

NOTE: These are running times on EOS. Other systems might have very different timing results.