{ "cells": [ { "metadata": {}, "cell_type": "markdown", "source": [ "# Predictive Modeling\n", "Beijing Air Quality Index forecasting (recommended difficulty level: 10)\n", "\n", "This dataset contains air-quality-related data for Beijing from November 1, 2022 to October 31, 2023.\n", "Answer the following questions based on this dataset." ], "id": "b610f839dca4877" }, { "cell_type": "code", "id": "initial_id", "metadata": { "collapsed": true, "ExecuteTime": { "end_time": "2025-03-22T07:55:04.926730Z", "start_time": "2025-03-22T07:55:03.071940Z" } }, "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from calculate import *\n", "from heatmap import *" ], "outputs": [], "execution_count": 1 }, { "metadata": { "ExecuteTime": { "end_time": "2025-03-22T07:55:05.632142Z", "start_time": "2025-03-22T07:55:04.941177Z" } }, "cell_type": "code", "source": [ "# Load the data\n", "data = pd.read_excel('北京市空气质量指数与气象数据.xlsx')\n", "data.head()" ], "id": "92ea7ba1218799cd", "outputs": [ { "data": { "text/plain": [ " date hour AQI CO NO2 O3 PM10 \\\n", "0 2022-11-01 2 18.371429 0.211429 23.771429 29.057143 13.257143 \n", "1 2022-11-01 5 21.914286 0.180000 26.571429 20.142857 18.914286 \n", "2 2022-11-01 8 28.628571 0.311429 30.028571 14.285714 27.942857 \n", "3 2022-11-01 11 19.000000 0.237143 17.971429 40.529412 17.852941 \n", "4 2022-11-01 14 21.742857 0.252941 15.588235 53.617647 20.941176 \n", "\n", " PM2.5 SO2 T ... P Pa U Ff Tn Tx VV Td \\\n", "0 3.057143 2.628571 6.7 ... 770.5 0.1 36.0 1.0 5.3 17.3 30.0 -7.3 \n", "1 3.771429 2.542857 2.0 ... 770.8 0.3 62.0 0.0 1.9 17.3 7.0 -4.5 \n", "2 6.857143 2.400000 6.6 ... 771.7 0.9 56.0 0.0 0.9 17.3 10.0 -7.1 \n", "3 5.914286 2.176471 13.5 ... 771.3 -0.4 19.0 2.0 0.9 17.3 30.0 -9.7 \n", "4 6.742857 2.000000 15.7 ... 768.6 -2.7 19.0 2.0 0.9 17.3 30.0 -7.9 \n", "\n", " RRR tR \n", "0 0.0 12 \n", "1 0.0 12 \n", "2 0.0 12 \n", "3 0.0 12 \n", "4 0.0 12 \n", "\n", "[5 rows x 21 columns]" ], "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>date</th><th>hour</th><th>AQI</th><th>CO</th><th>NO2</th><th>O3</th><th>PM10</th><th>PM2.5</th><th>SO2</th><th>T</th><th>...</th><th>P</th><th>Pa</th><th>U</th><th>Ff</th><th>Tn</th><th>Tx</th><th>VV</th><th>Td</th><th>RRR</th><th>tR</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>2022-11-01</td><td>2</td><td>18.371429</td><td>0.211429</td><td>23.771429</td><td>29.057143</td><td>13.257143</td><td>3.057143</td><td>2.628571</td><td>6.7</td><td>...</td><td>770.5</td><td>0.1</td><td>36.0</td><td>1.0</td><td>5.3</td><td>17.3</td><td>30.0</td><td>-7.3</td><td>0.0</td><td>12</td></tr>\n",
"<tr><th>1</th><td>2022-11-01</td><td>5</td><td>21.914286</td><td>0.180000</td><td>26.571429</td><td>20.142857</td><td>18.914286</td><td>3.771429</td><td>2.542857</td><td>2.0</td><td>...</td><td>770.8</td><td>0.3</td><td>62.0</td><td>0.0</td><td>1.9</td><td>17.3</td><td>7.0</td><td>-4.5</td><td>0.0</td><td>12</td></tr>\n",
"<tr><th>2</th><td>2022-11-01</td><td>8</td><td>28.628571</td><td>0.311429</td><td>30.028571</td><td>14.285714</td><td>27.942857</td><td>6.857143</td><td>2.400000</td><td>6.6</td><td>...</td><td>771.7</td><td>0.9</td><td>56.0</td><td>0.0</td><td>0.9</td><td>17.3</td><td>10.0</td><td>-7.1</td><td>0.0</td><td>12</td></tr>\n",
"<tr><th>3</th><td>2022-11-01</td><td>11</td><td>19.000000</td><td>0.237143</td><td>17.971429</td><td>40.529412</td><td>17.852941</td><td>5.914286</td><td>2.176471</td><td>13.5</td><td>...</td><td>771.3</td><td>-0.4</td><td>19.0</td><td>2.0</td><td>0.9</td><td>17.3</td><td>30.0</td><td>-9.7</td><td>0.0</td><td>12</td></tr>\n",
"<tr><th>4</th><td>2022-11-01</td><td>14</td><td>21.742857</td><td>0.252941</td><td>15.588235</td><td>53.617647</td><td>20.941176</td><td>6.742857</td><td>2.000000</td><td>15.7</td><td>...</td><td>768.6</td><td>-2.7</td><td>19.0</td><td>2.0</td><td>0.9</td><td>17.3</td><td>30.0</td><td>-7.9</td><td>0.0</td><td>12</td></tr>\n",
"</tbody>\n", "</table>\n", "<p>5 rows × 21 columns</p>\n", "</div>
" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 2 }, { "metadata": {}, "cell_type": "markdown", "source": [ "## Question 1\n", "Examine how the air quality index and the other indicators vary over the course of a single day. Is this pattern periodic?" ], "id": "bca65e544d8bef55" }, { "metadata": { "ExecuteTime": { "end_time": "2025-03-22T07:55:05.749697Z", "start_time": "2025-03-22T07:55:05.746320Z" } }, "cell_type": "code", "source": [ "# Preprocessing: group the data by hour and compute the mean of each indicator for each hour\n", "\n", "# Visualization: plot the hourly means of each indicator as line charts and check for regular fluctuations\n" ], "id": "5f8e89a8d1561e4f", "outputs": [], "execution_count": 3 }, { "metadata": { "ExecuteTime": { "end_time": "2025-03-22T07:55:05.777089Z", "start_time": "2025-03-22T07:55:05.774038Z" } }, "cell_type": "code", "source": "# Check periodicity with the autocorrelation function (ACF)\n", "id": "4521bfa63d480997", "outputs": [], "execution_count": 4 }, { "metadata": {}, "cell_type": "markdown", "source": [ "## Question 2\n", "Briefly describe the relationships among the indicators." ], "id": "59e20f3463e819a6" }, { "metadata": { "ExecuteTime": { "end_time": "2025-03-22T07:55:05.992326Z", "start_time": "2025-03-22T07:55:05.988969Z" } }, "cell_type": "code", "source": [ "# Compute the correlation matrix\n", "\n", "# Plot a heatmap\n" ], "id": "c917d14115569bcd", "outputs": [], "execution_count": 5 }, { "metadata": { "ExecuteTime": { "end_time": "2025-03-22T07:55:06.153442Z", "start_time": "2025-03-22T07:55:06.150747Z" } }, "cell_type": "code", "source": "# Factor analysis (PCA)\n", "id": "509d783a82bbdcb2", "outputs": [], "execution_count": 6 }, { "metadata": { "ExecuteTime": { "end_time": "2025-03-22T07:55:06.261340Z", "start_time": "2025-03-22T07:55:06.258833Z" } }, "cell_type": "code", "source": "# Multiple linear regression (exploratory, just to try it out)\n", "id": "bb2d87337f46df", "outputs": [], "execution_count": 7 }, { "metadata": {}, "cell_type": "markdown", "source": [ "## Question 3\n", "Use the air quality data from November 1, 2022 to September 30, 2023 as the training set and the remaining data as the test set. Using the training set, build air quality index prediction models with two different methods and evaluate them on the test set. Compare the predictive performance of the chosen models." ], "id": "3f89fa62a897a3e3" }, { "metadata": { "ExecuteTime": { "end_time": "2025-03-22T07:55:06.414915Z", "start_time": "2025-03-22T07:55:06.410784Z" } }, "cell_type": "code", "source": 
"# Data split: training set 2022-11-01 to 2023-09-30; test set 2023-10-01 to 2023-10-31.\n", "id": "d1bdac1e4e1562f2", "outputs": [], "execution_count": 8 }, { "metadata": {}, "cell_type": "markdown", "source": "### (1) SARIMA model", "id": "75bc1cfcc85f60a7" }, { "metadata": { "ExecuteTime": { "end_time": "2025-03-22T07:55:06.452015Z", "start_time": "2025-03-22T07:55:06.446830Z" } }, "cell_type": "code", "source": [ "\"\"\"\n", "This model assumes the other indicators in the test set are unknown and predicts future AQI from historical AQI data alone.\n", "\"\"\"\n", "\n", "# Train the model\n", "\n", "# Plot predicted vs. actual AQI\n", "\n", "# Compute goodness of fit\n" ], "id": "24996a0c06820cdc", "outputs": [ { "data": { "text/plain": [ "'\\nThis model assumes the other indicators in the test set are unknown and predicts future AQI from historical AQI data alone.\\n'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 9 }, { "metadata": {}, "cell_type": "markdown", "source": "### (2) XGBoost model", "id": "ebe88094b6c13e0c" }, { "metadata": { "ExecuteTime": { "end_time": "2025-03-22T07:55:06.482520Z", "start_time": "2025-03-22T07:55:06.477496Z" } }, "cell_type": "code", "source": [ "\"\"\"\n", "This model likewise assumes the other indicators in the test set are unknown, and predicts future AQI using multiple variables from the training set.\n", "\"\"\"\n", "\n", "# Train the model\n", "\n", "# Plot predicted vs. actual AQI\n", "\n", "# Compute goodness of fit\n" ], "id": "66f104e110aba36", "outputs": [ { "data": { "text/plain": [ "'\\nThis model likewise assumes the other indicators in the test set are unknown, and predicts future AQI using multiple variables from the training set.\\n'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 10 } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }