GRU and slide-NN models for AQI predictions
Sotirios Kontogiannis 814648a47c Upload files to 'presentation' 2 months ago

README.md

💻 A Proposed Deep Learning Framework for Air Quality Forecasts, Combining Localized Particle Concentration Measurements and Meteorological Data

This project contains two distinct deep learning models for AQI predictions.
The following is a layout showing how the files are distributed.

Implementations - File Map

The folder layout below groups files by purpose, but some scripts depend on specific data files. A script runs correctly only when the data files it reads are located in the same folder as the script itself. Before executing anything, copy each script and the data files it needs into a dedicated working folder for the task at hand.
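As a minimal sketch of this staging step, the helper below copies a script and its data files into one working folder. The function name `stage_files` is hypothetical and not part of the project; which data files a given script reads has to be checked in the script itself.

```python
import shutil
from pathlib import Path


def stage_files(sources, workdir):
    """Copy every file in `sources` into `workdir` and return the new paths."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in map(Path, sources):
        if not src.is_file():
            raise FileNotFoundError(f"Missing file: {src}")
        # copy2 preserves timestamps and returns the destination path
        staged.append(Path(shutil.copy2(src, workdir / src.name)))
    return staged
```

A call such as `stage_files(["slideNN/codes/slideNN_1.py", "available_data/processed_dataset.csv"], "run_slideNN_1")` would then place both files side by side. The paths in this call are illustrative assumptions.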

📁Project


📁available_data

| Info | File |
| --- | --- |
| CSV with all raw data used | all_data.csv |
| CSV with only raw AQI data | aqi_data.csv |
| CSV with all raw available measurements (used and unused) | complete_data.scv |
| CSV with only raw input data | input_data.csv |
| CSV with only AQI data after preprocessing | processed_aqi.csv |
| CSV with only input data after preprocessing | processed_dataset.csv |

📁slideNN

📁codes

| Info | File |
| --- | --- |
| General slideNN model, hyperparameters left to the user | slideNN.py |
| slideNN model with the first input/output configuration (64->2) | slideNN_1.py |
| slideNN model with the second input/output configuration (128->4) | slideNN_2.py |
| slideNN model with the third input/output configuration (256->8) | slideNN_3.py |
| slideNN model with the fourth input/output configuration (512->16) | slideNN_4.py |

📁slideNN_data

| Info | File |
| --- | --- |
| Separate the data into input and output dataframes | dataframes.py |
| Plot a graph of the reshaping process for each model | model_graph.py |
| Preprocess the input and output data with the appropriate method | preprocess_data.py |
| Reshape the AQI data into the shape required by slideNN's architecture | reshaping_aqi.py |
| Reshape the input data into the shape required by slideNN's architecture | reshaping_models_input.py |

📁GRU

📁codes

| Info | File |
| --- | --- |
| General GRU model, hyperparameters left to the user | WideGRU.py |
| GRU model with the first input/output configuration (64->2) in an edge setting | WideGRU_64.py |
| GRU model with the first input/output configuration (64->2) in a cloud setting | WideGRU_64_cloud.py |
| GRU model with the second input/output configuration (128->4) in an edge setting | WideGRU_128.py |
| GRU model with the second input/output configuration (128->4) in a cloud setting | WideGRU_128_cloud.py |
| GRU model with the third input/output configuration (256->8) in an edge setting | WideGRU_256.py |
| GRU model with the third input/output configuration (256->8) in a cloud setting | WideGRU_256_cloud.py |
| GRU model with the fourth input/output configuration (512->16) in an edge setting | WideGRU_512.py |
| GRU model with the fourth input/output configuration (512->16) in a cloud setting | WideGRU_512_cloud.py |

📁GRU_data

| Info | File |
| --- | --- |
| Generates random data to test this model | generate_data.py |

📁data

| Info | File |
| --- | --- |
| Dataset containing only the input for testing | X_test.csv |
| Dataset containing only the input for training | X_train.csv |
| Dataset containing only the output for testing | y_test.csv |
| Dataset containing only the output for training | y_train.csv |

📁data_64

| Info | File |
| --- | --- |
| Dataset of the first configuration containing only the input for testing | X_test.csv |
| Dataset of the first configuration containing only the input for training | X_train.csv |
| Dataset of the first configuration containing only the output for testing | y_test.csv |
| Dataset of the first configuration containing only the output for training | y_train.csv |

📁data_128

| Info | File |
| --- | --- |
| Dataset of the second configuration containing only the input for testing | X_test.csv |
| Dataset of the second configuration containing only the input for training | X_train.csv |
| Dataset of the second configuration containing only the output for testing | y_test.csv |
| Dataset of the second configuration containing only the output for training | y_train.csv |

📁data_256

| Info | File |
| --- | --- |
| Dataset of the third configuration containing only the input for testing | X_test.csv |
| Dataset of the third configuration containing only the input for training | X_train.csv |
| Dataset of the third configuration containing only the output for testing | y_test.csv |
| Dataset of the third configuration containing only the output for training | y_train.csv |

📁data_512

| Info | File |
| --- | --- |
| Dataset of the fourth configuration containing only the input for testing | X_test.csv |
| Dataset of the fourth configuration containing only the input for training | X_train.csv |
| Dataset of the fourth configuration containing only the output for testing | y_test.csv |
| Dataset of the fourth configuration containing only the output for training | y_train.csv |
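The GRU scripts follow a regular naming pattern across the four configurations and the two deployment settings. The sketch below captures that pattern. The `CONFIGS` dictionary and the `gru_script_name` helper are illustrative names and are not defined anywhere in the project.

```python
# Input -> output sizes of the four configurations listed in the file map.
CONFIGS = {64: 2, 128: 4, 256: 8, 512: 16}


def gru_script_name(input_size, cloud=False):
    """Return the WideGRU script name for one configuration and setting."""
    if input_size not in CONFIGS:
        raise ValueError(f"Unknown configuration: {input_size}")
    suffix = "_cloud" if cloud else ""  # edge scripts carry no suffix
    return f"WideGRU_{input_size}{suffix}.py"


for size, out_size in CONFIGS.items():
    print(f"{size}->{out_size}:", gru_script_name(size), gru_script_name(size, cloud=True))
```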

🛠️ Compilation-Execution

1. Setting up a Python Environment on CMD

Before running a Python file, it is good practice to create a virtual environment. This keeps the project's dependencies isolated from the global Python installation.

Step 1: Navigate to Your Project Directory

cd path\to\your\project

Step 2: Create a Virtual Environment

python -m venv venv

This will create a folder named venv that contains the virtual environment.

Step 3: Activate the Virtual Environment

On Windows CMD:

venv\Scripts\activate

On PowerShell:

.\venv\Scripts\Activate

Once activated, you should see (venv) at the beginning of your command prompt line.
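If the prompt prefix is hidden or customized, the interpreter itself can confirm the activation. The check below is a small sketch that relies on the fact that, inside an environment created with `venv`, `sys.prefix` differs from `sys.base_prefix`. The function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```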

Step 4: Install Dependencies

This project requires specific libraries and packages. Install each one with:

pip install name_of_package
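To find out which packages still need installing, one option is to test whether each import name can be resolved. The sketch below does this with the standard library. The package list is a placeholder assumption, since the README does not enumerate the dependencies; replace it with the imports at the top of the script you intend to run.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Hypothetical list; the project's real dependencies are not stated here.
candidates = ["numpy", "pandas", "tensorflow"]
for name in missing_packages(candidates):
    print(f"pip install {name}")
```

Note that an import name can differ from the name used with pip (for example, `sklearn` is installed as `scikit-learn`).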

3. Running the Python File in CMD

Once the environment is ready:

Step 1: Ensure You Are Inside the Virtual Environment

You should see (venv) in your terminal.

Step 2: Run the File

python filename.py

Step 2.5: For files with extra command-line arguments

The script generate_data.py uses command-line arguments via argparse. You need to provide:

  • --X_file: Path to the input X CSV file

  • --y_file: Path to the input y CSV file

  • --seq_length: Length of each sequence (default 8)

  • --forecast_steps: Forecast horizon (default 2)

  • --test_size: Fraction for testing (default 0.2)

  • --output_dir: Directory to save outputs (default ./data)
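For orientation, an argparse parser matching the options listed above could look like the following. This is a sketch reconstructed from the list, not the parser in generate_data.py. In particular, treating `--X_file` and `--y_file` as required is an assumption based on no default being listed for them.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented generate_data.py options."""
    parser = argparse.ArgumentParser(description="Sketch of the generate_data.py CLI")
    # Assumption: both input files are mandatory because no default is documented.
    parser.add_argument("--X_file", required=True, help="Path to the input X CSV file")
    parser.add_argument("--y_file", required=True, help="Path to the input y CSV file")
    parser.add_argument("--seq_length", type=int, default=8, help="Length of each sequence")
    parser.add_argument("--forecast_steps", type=int, default=2, help="Forecast horizon")
    parser.add_argument("--test_size", type=float, default=0.2, help="Fraction for testing")
    parser.add_argument("--output_dir", default="./data", help="Directory to save outputs")
    return parser


# Passing only the two file paths leaves every other option at its default.
args = build_parser().parse_args(["--X_file", "./input/X.csv", "--y_file", "./input/y.csv"])
print(args.seq_length, args.forecast_steps, args.test_size, args.output_dir)
```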

Example Run

python generate_data.py --X_file ./input/X.csv --y_file ./input/y.csv --seq_length 8 --forecast_steps 2 --test_size 0.2 --output_dir ./processed_data

4. Output

The generate_data.py script generates and saves four CSV files in the specified output directory:

  • X_train.csv

  • X_test.csv

  • y_train.csv

  • y_test.csv

It will also print the shapes of the train/test sets.
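To illustrate how `--seq_length`, `--forecast_steps`, and `--test_size` could interact, here is a hedged sketch of sequence windowing followed by a train/test split. It is not taken from generate_data.py. The function names, the unshuffled chronological split, and the synthetic data are all assumptions, and the real script may window, split, or shuffle differently.

```python
import numpy as np


def make_windows(X, y, seq_length=8, forecast_steps=2):
    """Pair each run of seq_length input rows with the next forecast_steps targets."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(X) - seq_length - forecast_steps + 1
    if n <= 0:
        raise ValueError("Not enough rows for a single window")
    X_seq = np.stack([X[i:i + seq_length] for i in range(n)])
    y_seq = np.stack([y[i + seq_length:i + seq_length + forecast_steps] for i in range(n)])
    return X_seq, y_seq


def chronological_split(X_seq, y_seq, test_size=0.2):
    """Hold out the last test_size fraction of windows for testing, without shuffling."""
    n_test = int(round(len(X_seq) * test_size))
    cut = len(X_seq) - n_test
    return X_seq[:cut], X_seq[cut:], y_seq[:cut], y_seq[cut:]


# Synthetic stand-in data: 100 time steps, 3 input features, 1 target value per step.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = rng.random(100)

X_seq, y_seq = make_windows(X, y)
X_train, X_test, y_train, y_test = chronological_split(X_seq, y_seq)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

With 100 rows and the default arguments, this produces 91 windows, of which the last 18 are held out for testing.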