
# 💻 Matrix Computations on Diverse Platforms: A Multiprocessing Evaluation Across CPU, GPU and ARM Architectures

This project contains multiple matrix addition and multiplication implementations across different computing platforms and parallelization methods.
The layout below shows how the files are organized.

## Implementations - File Map

### 📁 CPU

#### 📁 addition

| Method | File |
| --- | --- |
| Single Core | `single_add.cpp` |
| mmap + n forks | `mmap_add.cpp` |
| mmap + 24 forks | `mmap_24_add.cpp` |
| mmap + 100 forks | `mmap_100_add.cpp` |
| 100 Threads | `threads_add.cpp` |

#### 📁 multiplication

| Method | File/Folder |
| --- | --- |
| Single Core | `single_mul.cpp` |
| **📁 Prolific Forking** | |
| n processes | `mmap_pr_n_mul.cpp` |
| 24 processes | `mmap_pr_24_mul.cpp` |
| 100 processes | `mmap_pr_100_mul.cpp` |
| **📁 Collective Forking** | |
| n processes | `mmap_c_n_mul.cpp` |
| 24 processes | `mmap_c_24_mul.cpp` |
| 100 processes | `mmap_c_100_mul.cpp` |
| 100 Threads | `threads_100_mul.cpp` |
| n Threads | `threads_n_mul.cpp` |
| OpenMP | `mul_openMP.cpp` |

### 📁 ARM

#### 📁 addition

| Method | File |
| --- | --- |
| Single Core | `single_add.cpp` |
| mmap + n forks | `mmap_add.cpp` |
| mmap + 24 forks | `mmap_24_add.cpp` |
| mmap + 100 forks | `mmap_100_add.cpp` |
| 100 Threads | `threads_add.cpp` |

#### 📁 multiplication

| Method | File/Folder |
| --- | --- |
| Single Core | `single_mul.cpp` |
| **📁 Prolific Forking** | |
| n processes | `mmap_pr_n_mul.cpp` |
| 24 processes | `mmap_pr_24_mul.cpp` |
| 100 processes | `mmap_pr_100_mul.cpp` |
| **📁 Collective Forking** | |
| n processes | `mmap_c_n_mul.cpp` |
| 24 processes | `mmap_c_24_mul.cpp` |
| 100 processes | `mmap_c_100_mul.cpp` |
| 100 Threads | `threads_100_mul.cpp` |
| n Threads | `threads_n_mul.cpp` |
| OpenMP | `mul_openMP.cpp` |
| OpenMPI | `mpimul_openMP.cpp` |

### 📁 GPU

| Operation | Method | File |
| --- | --- | --- |
| Addition | GPU (CUDA) | `multi_gpu_matrix_add_mmap.cu` |
| Multiplication | GPU (CUDA) | `multi_gpu_matrix_mul_mmap.cu` |

### 📁 Roadmap

| Implementation | File |
| --- | --- |
| SUMMA Algorithm | `summa.cpp` |
| Strassen Algorithm | `strassen.cpp` |
| Cannon Algorithm | `cannon.cpp` |
| Fox Algorithm | `fox.cpp` |

## 🛠️ Requirements

Architecture diagram (see `arch.png`):

\*NFS: Network File System
\*\*NIS: Network Information Service

### x86-64

- Linux Ubuntu LTS
- C++ Boost library
- OpenMP library for C++
- OpenMPI
- NVIDIA CUDA SDK for C++, any version later than 10

### ARM

At least 5 Raspberry Pi 4 boards:

- Linux Ubuntu MATE LTS, 64-bit
- 512MB
- Cloud Core

### GPU

At least 4 GeForce GT 630 cards:

- 980MHz
- 981MB RAM

One GeForce RTX 3060 Ti:

- 1.6GHz
- 8GB RAM

## 🛠️ Compilation-Execution

Each folder contains a Makefile for the specific machine it is intended to run on. After downloading the appropriate folder, the user should run

```shell
make clean
```

before compiling for the first time; this ensures a clean build and helps avoid stale build artifacts. Then, by running

```shell
make
```

all files are compiled, and on subsequent runs only the files that have changed are recompiled.

In the CPU and ARM thread and process versions, the program expects the matrix size as a command-line argument at startup (converted to an integer with `atoi`). For example, to run `single_mul.cpp` with n = 1000:

```shell
./single_mul 1000
```

which produces output of the form:

```
n,setup(us),exec*(us)
1000,244,14474
```

In the GPU version, once the program starts it prompts for the matrix size, followed by a prompt for the number of GPUs to distribute the workload across. As in the previous versions, to run `multi_gpu_matrix_mul_mmap.cu`:

```shell
./multi_gpu_matrix_mul_mmap
```

The following prompts then ask the user to input the matrix size (n) and the number of GPUs to use:

```
Enter matrix size (N): 1000
Enter number of GPUs (K): 4
```

after which the corresponding output appears:

```
Setup Time: 47 ms
Comp Time: 1642 ms
```

The OpenMP scenario works similarly to the GPU one. The only difference is that the user is asked to enter the chunk size instead of the number of GPUs, so the prompts look like this:

```
Enter matrix size N: 1000
Enter chunk size: 100
```

Then, since the program computes with all three OpenMP scheduling methods, the output below appears:

```
Schedule: static (chunk = 100), Time: 627 ms
Schedule: dynamic (chunk = 100), Time: 606 ms
Schedule: guided (chunk = 100), Time: 614 ms
```

Finally, the MPI version is compiled as follows:

```shell
mpic++ mpimul_openmp.cpp -fopenmp -lboost_chrono -lboost_system -o mpimul
```

The user can then run the program with the following command:

```shell
mpirun --hostfile /etc/hostfile --mca btl_tcp_if_exclude docker0,lo -np 13 ./mpimul 1000
```
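The `--hostfile` argument points to a plain-text file listing the cluster nodes and the number of MPI slots each provides. A hypothetical example consistent with `-np 13` (the hostnames and slot counts are illustrative, not the actual cluster configuration):

```
# /etc/hostfile: one line per node, slots = processes that node may run
rpi1 slots=3
rpi2 slots=3
rpi3 slots=3
rpi4 slots=2
rpi5 slots=2
```

The `--mca btl_tcp_if_exclude docker0,lo` option keeps Open MPI's TCP transport off the Docker bridge and loopback interfaces, so ranks on different nodes reach each other over the real network.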