first commit

master
Sotirios Kontogiannis 2 months ago
commit
0718cf0dc3
1 changed file with 184 additions and 0 deletions: `README.md`

# 💻 A Proposed Deep Learning Framework for Air Quality Forecasts, Combining Localized Particle Concentration Measurements and Meteorological Data
This project contains two distinct deep learning models for AQI predictions. \
The following is a layout showing how the files are distributed.
## Implementations - File Map
The layout below separates files by purpose, but some files depend on each other. For a script to run correctly, the data files it reads must be located in the same folder as the script. Before executing a file, move it and its data files into a dedicated folder for the intended use.
### 📁Project
---
<table>
<tr>
<td style="vertical-align: top; padding-right: 40px;">
#### 📁available_data
| Info | File |
|---------------------|-----------------------|
| Csv with all raw data used | `all_data.csv` |
| Csv with only raw aqi data | `aqi_data.csv` |
| Csv with all raw available measurements (used & unused) | `complete_data.csv` |
| Csv with only raw input data | `input_data.csv` |
| Csv with only aqi data after preprocessing | `processed_aqi.csv` |
| Csv with only input data after preprocessing | `processed_dataset.csv` |
</td>
<td style="padding-left: 40px;">
#### 📁slideNN
| File/Folder | | | |
|------------------------------|--------------------|-------------------------|--|
|📁codes |Info | File ||
| |General slideNN model, hyperparameters left to the user | `slideNN.py` ||
| |SlideNN model with the first input/output configuration 64->2 | `slideNN_1.py` ||
| |SlideNN model with the second input/output configuration 128->4 | `slideNN_2.py` ||
| |SlideNN model with the third input/output configuration 256->8 | `slideNN_3.py` ||
| |SlideNN model with the fourth input/output configuration 512->16 | `slideNN_4.py` ||
|📁 slideNN_data | | ||
| |Separate data into the input and output dataframes | `dataframes.py` ||
| |Make a graph of the reshaping process for each model | `model_graph.py` ||
| |Preprocess data (input and output) with the appropriate method | `preprocess_data.py` ||
| |Reshape the aqi data into the desired shape for slideNN's architecture | `reshaping_aqi.py` ||
| |Reshape the input data into the desired shape for slideNN's architecture | `reshaping_models_input.py` ||
</td>
</tr>
</table>
<table>
<tr>
<td style="padding-left: 40px;">
#### 📁GRU
| File/Folder | | | | |
|------------------------------|--------------------|-------------------------|--|---|
|📁codes | |Info | File ||
| | |General GRU model, hyperparameters left to the user | `WideGRU.py` ||
| | |GRU model with the first input/output configuration 64->2 in an edge setting | `WideGRU_64.py` ||
| | |GRU model with the first input/output configuration 64->2 in a cloud setting | `WideGRU_64_cloud.py` ||
| | |GRU model with the second input/output configuration 128->4 in an edge setting | `WideGRU_128.py` ||
| | |GRU model with the second input/output configuration 128->4 in a cloud setting | `WideGRU_128_cloud.py` ||
| | |GRU model with the third input/output configuration 256->8 in an edge setting | `WideGRU_256.py` ||
| | |GRU model with the third input/output configuration 256->8 in a cloud setting | `WideGRU_256_cloud.py` ||
| | |GRU model with the fourth input/output configuration 512->16 in an edge setting | `WideGRU_512.py` ||
| | |GRU model with the fourth input/output configuration 512->16 in a cloud setting | `WideGRU_512_cloud.py` ||
|📁GRU_data | | | ||
| | |Generates random data to test this model | `generate_data.py` ||
| | 📁data | | ||
| | |Dataset containing only the input for testing | `X_test.csv` ||
| | |Dataset containing only the input for training | `X_train.csv` ||
| | |Dataset containing only the output for testing | `y_test.csv` ||
| | |Dataset containing only the output for training | `y_train.csv` ||
| | 📁data_64 | | ||
| | |Dataset of the first configuration containing only the input for testing | `X_test.csv` ||
| | |Dataset of the first configuration containing only the input for training | `X_train.csv` ||
| | |Dataset of the first configuration containing only the output for testing | `y_test.csv` ||
| | |Dataset of the first configuration containing only the output for training | `y_train.csv` ||
| | 📁data_128 | | ||
| | |Dataset of the second configuration containing only the input for testing | `X_test.csv` ||
| | |Dataset of the second configuration containing only the input for training | `X_train.csv` ||
| | |Dataset of the second configuration containing only the output for testing | `y_test.csv` ||
| | |Dataset of the second configuration containing only the output for training | `y_train.csv` ||
| | 📁data_256 | | ||
| | |Dataset of the third configuration containing only the input for testing | `X_test.csv` ||
| | |Dataset of the third configuration containing only the input for training | `X_train.csv` ||
| | |Dataset of the third configuration containing only the output for testing | `y_test.csv` ||
| | |Dataset of the third configuration containing only the output for training | `y_train.csv` ||
| | 📁data_512 | | ||
| | |Dataset of the fourth configuration containing only the input for testing | `X_test.csv` ||
| | |Dataset of the fourth configuration containing only the input for training | `X_train.csv` ||
| | |Dataset of the fourth configuration containing only the output for testing | `y_test.csv` ||
| | |Dataset of the fourth configuration containing only the output for training | `y_train.csv` ||
</td>
</tr>
</table>
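The requirement that a script and its data files share one folder can be handled with a small staging helper. The following is a minimal sketch, not part of the project: the function name `stage_files` and the idea of a separate run folder are assumptions made for illustration, and the README does not state which data files each script reads.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def stage_files(script: Path, data_files: Iterable[Path], run_dir: Path) -> List[Path]:
    """Copy one script and the data files it reads into a single run folder.

    Hypothetical helper: which data files a given script needs is not
    documented in the README, so the caller has to supply that list.
    """
    run_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in [script, *data_files]:
        dst = run_dir / src.name
        # Copy instead of move, so the original layout stays intact.
        shutil.copy2(src, dst)
        staged.append(dst)
    return staged
```

For example, `stage_files(Path("slideNN/codes/slideNN_1.py"), [Path("available_data/processed_dataset.csv"), Path("available_data/processed_aqi.csv")], Path("runs/slideNN_1"))` would place the three files side by side, assuming (unverified) that `slideNN_1.py` reads those two CSV files.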
## 🛠️ Compilation-Execution
**1. Setting up a Python Environment on CMD**
Before running a Python file, it is good practice to create a virtual environment. This keeps the project dependencies isolated from the global Python installation.
**Step 1: Navigate to Your Project Directory**
```bash
cd path\to\your\project
```
**Step 2: Create a Virtual Environment**
```bash
python -m venv venv
```
This will create a folder named `venv` that contains the virtual environment.
**Step 3: Activate the Virtual Environment**
On Windows CMD:
```bash
venv\Scripts\activate
```
On PowerShell:
```bash
.\venv\Scripts\Activate
```
Once activated, you should see `(venv)` at the beginning of your command prompt line.
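Besides the `(venv)` prompt prefix, activation can also be confirmed from inside Python. This is a small sketch using only the standard library; the function name `in_virtualenv` is chosen here for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If this prints `False` after activation, the `python` command on the path is probably not the one inside `venv`.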
**Step 4: Install Dependencies**
This project requires specific libraries and packages. Install each one with:
```bash
pip install name_of_package
```
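Because the required packages are not listed here, it can help to check which modules are still missing before running a script. The sketch below uses `importlib.util.find_spec` from the standard library. The module names in `ASSUMED_MODULES` are placeholders, not a list taken from this project, and an import name may differ from the name passed to `pip install`.

```python
import importlib.util
from typing import Iterable, List

# Hypothetical list: the README does not name the required packages.
ASSUMED_MODULES = ["numpy", "pandas"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Replace the placeholder list with the modules imported at the top of the script you intend to run.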
**2. Running the Python File in CMD**
Once the environment is ready:
**Step 1: Ensure You Are Inside the Virtual Environment**
You should see `(venv)` in your terminal.
**Step 2: Run the File**
```bash
python filename.py
```
**Step 3: Pass Command-Line Arguments Where Required**
The script `generate_data.py` reads its settings from command-line arguments via `argparse`. It accepts the following arguments (those with a default can be omitted):
* `--X_file`: Path to the input X CSV file
* `--y_file`: Path to the input y CSV file
* `--seq_length`: Length of each sequence (default 8)
* `--forecast_steps`: Forecast horizon (default 2)
* `--test_size`: Fraction for testing (default 0.2)
* `--output_dir`: Directory to save outputs (default ./data)
**Example Run**
```bash
python generate_data.py --X_file ./input/X.csv --y_file ./input/y.csv --seq_length 8 --forecast_steps 2 --test_size 0.2 --output_dir ./processed_data
```
**3. Output**
The script generates and saves 4 CSV files in the specified output directory:
* `X_train.csv`
* `X_test.csv`
* `y_train.csv`
* `y_test.csv`
It will also print the shapes of the train/test sets.
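To make the arguments of `generate_data.py` more concrete, here is a minimal sketch of how such a script could parse them and turn two aligned series into train/test sequences. It is not the project's actual code. The function names, marking `--X_file` and `--y_file` as required, the sliding-window pairing, and the unshuffled chronological split are all assumptions. Only the argument names and defaults come from the list above.

```python
import argparse
from typing import List, Sequence, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the options listed for generate_data.py."""
    p = argparse.ArgumentParser(description="Build train/test sequence files.")
    # Assumption: the two input paths have no default, so they are required.
    p.add_argument("--X_file", required=True, help="Path to the input X CSV file")
    p.add_argument("--y_file", required=True, help="Path to the input y CSV file")
    p.add_argument("--seq_length", type=int, default=8)
    p.add_argument("--forecast_steps", type=int, default=2)
    p.add_argument("--test_size", type=float, default=0.2)
    p.add_argument("--output_dir", default="./data")
    return p


def make_windows(X: Sequence, y: Sequence, seq_length: int,
                 forecast_steps: int) -> Tuple[List[list], List[list]]:
    """Pair each run of seq_length inputs with the next forecast_steps targets."""
    inputs, targets = [], []
    last_start = len(X) - seq_length - forecast_steps
    for start in range(last_start + 1):
        end = start + seq_length
        inputs.append(list(X[start:end]))
        targets.append(list(y[end:end + forecast_steps]))
    return inputs, targets


def chronological_split(items: Sequence, test_size: float) -> Tuple[list, list]:
    """Keep the last test_size fraction for testing, without shuffling (assumed)."""
    cut = len(items) - int(round(len(items) * test_size))
    return list(items[:cut]), list(items[cut:])
```

With the default `--seq_length 8` and `--forecast_steps 2`, a series of 12 rows yields 3 windows under this pairing: rows 0 to 7 predict rows 8 and 9, and so on. The real script may window or split the data differently.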
