This project contains multiple matrix addition and multiplication implementations using different computing machines and methods.
The following is a layout showing how the files are distributed.
📁ARM
├─ 📁addition
└─ 📁multiplication
📁CPU
├─ 📁addition
└─ 📁multiplication
📁GPU
| Operation | Method | File |
|---|---|---|
| Addition | GPU (CUDA) | multi_gpu_matrix_add_mmap.cu |
| Multiplication | GPU (CUDA) | multi_gpu_matrix_mul_mmap.cu |
| Implementation | File |
|---|---|
| SUMMA Algorithm | summa.cpp |
| Strassen Algorithm | strassen.cpp |
| Cannon Algorithm | cannon.cpp |
| Fox Algorithm | fox.cpp |
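As an illustration of the algorithms above, here is a minimal sketch (not the project's `strassen.cpp`) of one level of Strassen's recursion applied to 2×2 scalar matrices: seven multiplications replace the usual eight, which is the source of the algorithm's asymptotic advantage when applied recursively to matrix blocks.

```cpp
// One level of Strassen's recursion on a 2x2 matrix of scalars.
// In the real algorithm a, b, c, d are n/2 x n/2 blocks.
struct M2 { double a, b, c, d; }; // [[a b], [c d]]

M2 strassen2x2(const M2& A, const M2& B) {
    double m1 = (A.a + A.d) * (B.a + B.d);
    double m2 = (A.c + A.d) * B.a;
    double m3 = A.a * (B.b - B.d);
    double m4 = A.d * (B.c - B.a);
    double m5 = (A.a + A.b) * B.d;
    double m6 = (A.c - A.a) * (B.a + B.b);
    double m7 = (A.b - A.d) * (B.c + B.d);
    return { m1 + m4 - m5 + m7,   // C11
             m3 + m5,             // C12
             m2 + m4,             // C21
             m1 - m2 + m3 + m6 }; // C22
}
```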

*NFS: Network File System
**NIS: Network Information Service
- At least 5 Raspberry Pi 4
- At least 4 GeForce GT 630
- One GeForce RTX 3060 Ti
Each folder contains a corresponding Makefile for the specific machine it is intended to run on. After downloading the appropriate folder, the user should run

```shell
make clean
```

before compiling for the first time. This ensures a clean build and helps avoid possible build errors. Then, by running

```shell
make
```

all files are compiled; on subsequent runs, only the files that have changed are recompiled.
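A minimal sketch of what such a Makefile might look like (the compiler flags and target names here are assumptions for illustration, not the project's actual files):

```makefile
# Hypothetical Makefile sketch for one of the CPU folders.
CXX      = g++
CXXFLAGS = -O2 -pthread
TARGETS  = single_mul multi_mul

all: $(TARGETS)

# Build each target from its .cpp file of the same name.
%: %.cpp
	$(CXX) $(CXXFLAGS) $< -o $@

clean:
	rm -f $(TARGETS)
```

Because `make` rebuilds a target only when its prerequisite `.cpp` file is newer, subsequent runs recompile just the files that changed.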
In the CPU and ARM versions, which exercise threads and processes, the program expects the user to pass the matrix size directly on the command line when starting the program (the argument is converted to an integer with `atoi`).
For example, to run `single_mul.cpp` with n = 1000:

```shell
./single_mul 1000
```

with the corresponding output:

```
n,setup(us),exec*(us)
1000,244,14474
```
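The command-line handling can be sketched as below; `parse_size` is a hypothetical helper, not a function from the project's sources, but it mirrors the described `atoi` conversion:

```cpp
#include <cstdlib>

// Hypothetical helper mirroring how the CPU/ARM programs read n from
// argv[1]. atoi returns 0 for non-numeric input, so anything
// non-positive is rejected.
int parse_size(const char* arg) {
    int n = std::atoi(arg);
    return n > 0 ? n : -1;
}
```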
In the GPU version, once the user starts the program, it prompts for the matrix size and then for the number of available GPUs to distribute the workload across. As in the previous versions, to run `multi_gpu_matrix_mul_mmap.cu`:

```shell
./multi_gpu_matrix_mul_mmap
```

The following messages are then displayed, asking the user to input the matrix size (N) and the number of GPUs to use:

```
Enter matrix size (N): 1000
Enter number of GPUs (K): 4
```

after which the corresponding output appears:

```
Setup Time: 47 ms
Comp Time: 1642 ms
```
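One plausible way to split the N rows of the result across K GPUs is a contiguous band of rows per device. The sketch below is an assumption for illustration, not the project's actual distribution code:

```cpp
// Hypothetical row-band split of an N x N matrix across K devices:
// the first N % K devices each receive one extra row, so the bands
// cover all rows with sizes differing by at most one.
struct Band { int first; int count; };

Band band_for_device(int N, int K, int dev) {
    int base  = N / K;
    int extra = N % K;
    int first = dev * base + (dev < extra ? dev : extra);
    int count = base + (dev < extra ? 1 : 0);
    return { first, count };
}
```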
The OpenMP scenario works similarly to the GPU one. The only difference is that the user is asked to enter the chunk size instead of the number of GPUs, so the prompts look like this:

```
Enter matrix size N: 1000
Enter chunk size: 100
```

Then, since the program computes with all three OpenMP scheduling methods, the output below appears:

```
Schedule: static (chunk = 100), Time: 627 ms
Schedule: dynamic (chunk = 100), Time: 606 ms
Schedule: guided (chunk = 100), Time: 614 ms
```
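The scheduling comparison can be sketched as follows; this is an assumed minimal loop, not the project's exact code, showing how the user-supplied chunk size feeds directly into the `schedule` clause (the real program repeats the timing for `static`, `dynamic`, and `guided`). Compiled without `-fopenmp`, the pragma is ignored and the loop simply runs serially:

```cpp
#include <vector>

// Multiply n x n row-major matrices A and B into C, parceling rows out
// to threads in chunks of `chunk` under OpenMP's dynamic schedule.
void matmul_dynamic(const std::vector<double>& A,
                    const std::vector<double>& B,
                    std::vector<double>& C, int n, int chunk) {
#pragma omp parallel for schedule(dynamic, chunk)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int k = 0; k < n; ++k)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}
```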
The MPI version is compiled as follows:

```shell
mpic++ mpimul_openmp.cpp -fopenmp -lboost_chrono -lboost_system -o mpimul
```

The user can then run the program using the following command:

```shell
mpirun --hostfile /etc/hostfile --mca btl_tcp_if_exclude docker0,lo -np 13 ./mpimul 1000
```