💻 A Proposed Deep Learning Framework for Air Quality Forecasts, Combining Localized Particle Concentration Measurements and Meteorological Data
This project contains two distinct deep learning models for AQI (Air Quality Index) prediction: slideNN and a GRU-based model.
The layout below shows how the files are organized.
Implementations - File Map
Before executing any file, keep in mind that this arrangement separates the files by purpose and that some of them depend on each other. For a script to run correctly, the data files it reads must be in the same folder as the script itself, so you will need to move them into a dedicated folder for each intended use.
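The same-folder rule can be handled with a small copy step. Below is a minimal sketch built around a hypothetical helper, assemble_workdir, which is not part of the project. It copies one script and the CSV files you name into a single working folder. Copying rather than moving leaves the original layout intact, so the same data files can be reused for other runs. The README does not say which data files each script reads, so you have to supply that list yourself.

```python
import shutil
from pathlib import Path


def assemble_workdir(script, data_files, dest):
    """Copy a script and the data files it reads into one working folder.

    Hypothetical helper: the list of data files a given script needs is
    an input you provide, not something this function can discover.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in [script, *data_files]:
        src = Path(src)
        target = dest / src.name
        shutil.copy2(src, target)  # copies file contents and metadata
        copied.append(target)
    return copied
```

For example, assemble_workdir("slideNN/codes/slideNN_1.py", ["available_data/processed_dataset.csv"], "run_slideNN_1") would put both files in run_slideNN_1. The pairing of this script with this CSV is only an illustration.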
📁Project
📁available_data

| Info | File |
|---|---|
| CSV with all raw data used | all_data.csv |
| CSV with only the raw AQI data | aqi_data.csv |
| CSV with all raw available measurements (used and unused) | complete_data.csv |
| CSV with only the raw input data | input_data.csv |
| CSV with only the AQI data after preprocessing | processed_aqi.csv |
| CSV with only the input data after preprocessing | processed_dataset.csv |

📁slideNN

| File/Folder | Info | File |
|---|---|---|
| 📁codes | General slideNN model, hyperparameters left to the user | slideNN.py |
| | SlideNN model with the first input/output configuration 64->2 | slideNN_1.py |
| | SlideNN model with the second input/output configuration 128->4 | slideNN_2.py |
| | SlideNN model with the third input/output configuration 256->8 | slideNN_3.py |
| | SlideNN model with the fourth input/output configuration 512->16 | slideNN_4.py |
| 📁slideNN_data | Separates the data into the input and output dataframes | dataframes.py |
| | Plots a graph of the reshaping process for each model | model_graph.py |
| | Preprocesses the data (input and output) with the appropriate method | preprocess_data.py |
| | Reshapes the AQI data into the shape required by slideNN's architecture | reshaping_aqi.py |
| | Reshapes the input data into the shape required by slideNN's architecture | reshaping_models_input.py |

📁GRU

| File/Folder | Info | File |
|---|---|---|
| 📁codes | General GRU model, hyperparameters left to the user | WideGRU.py |
| | GRU model with the first input/output configuration 64->2 in an edge setting | WideGRU_64.py |
| | GRU model with the first input/output configuration 64->2 in a cloud setting | WideGRU_64_cloud.py |
| | GRU model with the second input/output configuration 128->4 in an edge setting | WideGRU_128.py |
| | GRU model with the second input/output configuration 128->4 in a cloud setting | WideGRU_128_cloud.py |
| | GRU model with the third input/output configuration 256->8 in an edge setting | WideGRU_256.py |
| | GRU model with the third input/output configuration 256->8 in a cloud setting | WideGRU_256_cloud.py |
| | GRU model with the fourth input/output configuration 512->16 in an edge setting | WideGRU_512.py |
| | GRU model with the fourth input/output configuration 512->16 in a cloud setting | WideGRU_512_cloud.py |
| 📁GRU_data | Generates random data to test this model | generate_data.py |
| 📁GRU_data/data | Dataset containing only the input for testing | X_test.csv |
| | Dataset containing only the input for training | X_train.csv |
| | Dataset containing only the output for testing | y_test.csv |
| | Dataset containing only the output for training | y_train.csv |
| 📁GRU_data/data_64 | Dataset of the first configuration containing only the input for testing | X_test.csv |
| | Dataset of the first configuration containing only the input for training | X_train.csv |
| | Dataset of the first configuration containing only the output for testing | y_test.csv |
| | Dataset of the first configuration containing only the output for training | y_train.csv |
| 📁GRU_data/data_128 | Dataset of the second configuration containing only the input for testing | X_test.csv |
| | Dataset of the second configuration containing only the input for training | X_train.csv |
| | Dataset of the second configuration containing only the output for testing | y_test.csv |
| | Dataset of the second configuration containing only the output for training | y_train.csv |
| 📁GRU_data/data_256 | Dataset of the third configuration containing only the input for testing | X_test.csv |
| | Dataset of the third configuration containing only the input for training | X_train.csv |
| | Dataset of the third configuration containing only the output for testing | y_test.csv |
| | Dataset of the third configuration containing only the output for training | y_train.csv |
| 📁GRU_data/data_512 | Dataset of the fourth configuration containing only the input for testing | X_test.csv |
| | Dataset of the fourth configuration containing only the input for training | X_train.csv |
| | Dataset of the fourth configuration containing only the output for testing | y_test.csv |
| | Dataset of the fourth configuration containing only the output for training | y_train.csv |

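The file names in the two file maps follow a regular pattern. In each configuration the output size is 1/32 of the input size. slideNN numbers the configurations 1 to 4, while the WideGRU scripts carry the input size and add a _cloud suffix for the cloud variant. The GRU data folders are named data_<input size>.

The hypothetical helper below (config_files is not part of the project) encodes that pattern so a run script can pick matching files. It relies only on the names listed in the tables. It does not check that the files exist.

```python
# Input -> output sizes of the four configurations listed in the file maps.
CONFIGS = {64: 2, 128: 4, 256: 8, 512: 16}


def config_files(n_in, cloud=False):
    """Return (slideNN script, WideGRU script, GRU data folder) for one configuration.

    Hypothetical helper: it only reproduces the naming scheme of the file maps.
    """
    if n_in not in CONFIGS:
        raise ValueError(f"unknown configuration {n_in}; expected one of {sorted(CONFIGS)}")
    index = sorted(CONFIGS).index(n_in) + 1  # slideNN_1.py ... slideNN_4.py
    gru = f"WideGRU_{n_in}_cloud.py" if cloud else f"WideGRU_{n_in}.py"
    return f"slideNN_{index}.py", gru, f"data_{n_in}"


for n_in, n_out in CONFIGS.items():
    print(f"{n_in}->{n_out}:", config_files(n_in), config_files(n_in, cloud=True)[1])
```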
🛠️ Compilation-Execution
1. Setting up a Python Environment on CMD
Before running your Python file, it’s a good practice to create a virtual environment. This ensures that your project dependencies remain isolated from the global Python installation.
Step 1: Navigate to Your Project Directory
cd path\to\your\project
Step 2: Create a Virtual Environment
python -m venv venv
This will create a folder named venv that contains the virtual environment.
Step 3: Activate the Virtual Environment
On Windows CMD:
venv\Scripts\activate
On PowerShell:
.\venv\Scripts\Activate
Once activated, you should see (venv) at the beginning of your command prompt line.
Step 4: Install Dependencies
This project requires several third-party libraries/packages. Install each one with:
pip install name_of_package
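The required packages are not listed here. One way to confirm that the active virtual environment has what a script needs is to check that its imports resolve. The sketch below does this with importlib.util.find_spec from the standard library.

The names in REQUIRED are placeholders chosen for illustration, not the project's documented dependencies, so replace them with the modules your scripts import. Import names can differ from pip names: scikit-learn, for example, is installed as scikit-learn but imported as sklearn.

```python
import importlib.util

# Placeholder module names (an assumption): replace them with what the
# scripts you intend to run actually import.
REQUIRED = ["numpy", "pandas", "sklearn", "tensorflow"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing (install with pip):", ", ".join(missing))
    else:
        print("All listed modules are installed.")
```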
2. Running the Python File in CMD
Once the environment is ready:
Step 1: Ensure You Are Inside the Virtual Environment
You should see (venv) at the beginning of your command prompt line.
Step 2: Run the File
python filename.py
Step 2.5: For Files with Extra Command-Line Arguments
The script generate_data.py reads command-line arguments via argparse. It accepts the following arguments:
- --X_file: path to the input X CSV file
- --y_file: path to the input y CSV file
- --seq_length: length of each input sequence (default 8)
- --forecast_steps: forecast horizon, i.e. the number of future steps to predict (default 2)
- --test_size: fraction of the samples used for testing (default 0.2)
- --output_dir: directory where the outputs are saved (default ./data)
Example Run
python generate_data.py --X_file ./input/X.csv --y_file ./input/y.csv --seq_length 8 --forecast_steps 2 --test_size 0.2 --output_dir ./processed_data
3. Output
generate_data.py saves 4 CSV files in the specified output directory:
- X_train.csv
- X_test.csv
- y_train.csv
- y_test.csv
It also prints the shapes of the train and test sets.
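The sketch below shows, using only the standard library, what a script with this command-line interface could do. It is not the project's generate_data.py, and the real script may differ on any of the following assumptions:

- each CSV has a header row;
- X and y have one row per time step and the same number of rows;
- y holds the target in its first column;
- each sample flattens seq_length consecutive rows of X into one CSV row and targets the next forecast_steps values of y;
- the split is chronological, so the last test_size fraction of samples becomes the test set. The real script may shuffle instead.

```python
import argparse
import csv
from pathlib import Path


def build_parser():
    # Option names and defaults mirror the list above; making --X_file and
    # --y_file required is an assumption.
    p = argparse.ArgumentParser(description="Build windowed train/test CSV files")
    p.add_argument("--X_file", required=True, help="Path to the input X CSV file")
    p.add_argument("--y_file", required=True, help="Path to the input y CSV file")
    p.add_argument("--seq_length", type=int, default=8, help="Length of each sequence")
    p.add_argument("--forecast_steps", type=int, default=2, help="Forecast horizon")
    p.add_argument("--test_size", type=float, default=0.2, help="Fraction for testing")
    p.add_argument("--output_dir", default="./data", help="Directory to save outputs")
    return p


def read_rows(path):
    # Assumption: the first line is a header and every value is numeric.
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)
        return [[float(v) for v in row] for row in reader if row]


def make_windows(X, y, seq_length, forecast_steps):
    # Sliding window: seq_length consecutive X rows, flattened into one sample,
    # are paired with the next forecast_steps y values (assumed: target in column 0).
    Xs, ys = [], []
    for start in range(len(X) - seq_length - forecast_steps + 1):
        end = start + seq_length
        Xs.append([v for row in X[start:end] for v in row])
        ys.append([row[0] for row in y[end:end + forecast_steps]])
    return Xs, ys


def split(samples, test_size):
    # Assumption: chronological split, the last test_size fraction is the test set.
    cut = len(samples) - int(round(len(samples) * test_size))
    return samples[:cut], samples[cut:]


def main(argv=None):
    args = build_parser().parse_args(argv)
    X, y = read_rows(args.X_file), read_rows(args.y_file)
    Xs, ys = make_windows(X, y, args.seq_length, args.forecast_steps)
    X_train, X_test = split(Xs, args.test_size)
    y_train, y_test = split(ys, args.test_size)
    out = Path(args.output_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts = {"X_train": X_train, "X_test": X_test, "y_train": y_train, "y_test": y_test}
    for name, rows in parts.items():
        with open(out / f"{name}.csv", "w", newline="") as f:
            csv.writer(f).writerows(rows)  # written without a header row
        # Print (rows, columns) for each set.
        print(name, (len(rows), len(rows[0]) if rows else 0))
    return X_train, X_test, y_train, y_test
```

main() takes an optional argument list so it can also be called from Python. To run it from CMD as in the Example Run, save it as a file and call main() under an if __name__ == "__main__": guard.

A chronological split keeps every test window after every training window. This is a common choice for forecasting, though not necessarily the one this project makes.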