Publishing my Jupyter notebook (Public 2nd place, Private 1st place)

Thank you all for your hard work, and many thanks to the organizers.
I tried a lot of different things and enjoyed taking part.
I am publishing my code here.

Environment

　Python 3.9.8
　pandas 2.0.3
　numpy 1.21.6
　matplotlib 3.5.2
　sklearn 1.1.1
　tensorflow 2.11.0

Method overview

　A deep learning model with four LSTM layers followed by two fully connected layers and a linear output layer.

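What I tried

　1) Added holidays as an explanatory variable. Ordinary days are encoded as 0 and each type of holiday gets its own integer code starting at 1 (the holiday list in the code has 12 entries per year, so the codes run from 1 to 12).
　2) Concatenated the past taxi ride counts of all zones with the weather data and fed them to the LSTM.
　3) At the final fully connected layers, additionally fed in the weather data for the prediction target time plus month, day, weekday, time of day, and holiday code.
　4) Used four weather variables: temperature (wet-bulb), humidity, wind speed, and precipitation.
　5) Inserting BatchNormalization before the first LSTM layer and between the LSTM layers made training faster and seemed to improve generalization slightly.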
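
To make item 1 concrete, here is a minimal sketch of the intended holiday encoding. The shortened three-holiday list and the names `HOLIDAYS_PER_YEAR`, `holiday_type`, and `encode_holiday` are hypothetical and only for illustration; the notebook itself uses a list of 12 holidays per year.

```python
import datetime

# Hypothetical, shortened holiday list (the notebook's list has 12 entries per year).
HOLIDAYS_PER_YEAR = 3
holidays = [
    datetime.date(2017, 1, 1),    # New Year's Day   -> type 1
    datetime.date(2017, 7, 4),    # Independence Day -> type 2
    datetime.date(2017, 12, 25),  # Christmas Day    -> type 3
    datetime.date(2018, 1, 1),
    datetime.date(2018, 7, 4),
    datetime.date(2018, 12, 25),
]

# The list repeats the same holidays in the same order every year,
# so the position within the year identifies the holiday type.
holiday_type = {d: i % HOLIDAYS_PER_YEAR + 1 for i, d in enumerate(holidays)}


def encode_holiday(ts: datetime.datetime) -> int:
    """Return 0 for an ordinary day, otherwise the holiday type code."""
    return holiday_type.get(ts.date(), 0)


codes = [
    encode_holiday(datetime.datetime(2018, 7, 4, 0, 30)),
    encode_holiday(datetime.datetime(2018, 7, 4, 23, 30)),
    encode_holiday(datetime.datetime(2018, 7, 5, 0, 0)),
]
print(codes)  # [2, 2, 0]
```

Every 30-minute slot of the same holiday receives the same code, and the same holiday keeps its code across years.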

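Other notes

　I ran this code several times and the result differed on every run; the standard deviation of the RMSE appears to be roughly 0.5.
　So the first-place result may have been partly luck, and I feel a little sheepish about it.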
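
As a rough illustration of how run-to-run spread can be measured, the sketch below computes the RMSE for several runs and then the mean and standard deviation of those scores. The toy targets and predictions are hypothetical placeholders, not results from this competition.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy data: one target series and the predictions of three runs.
y_true = [10.0, 12.0, 8.0, 11.0]
runs = [
    [11.0, 12.0, 7.0, 11.0],
    [10.0, 14.0, 8.0, 9.0],
    [10.0, 12.0, 8.0, 11.0],
]

scores = [rmse(y_true, pred) for pred in runs]
print("RMSE per run:", scores)
print("mean:", statistics.mean(scores), "std:", statistics.stdev(scores))
```

With the real model, each entry of `runs` would be the validation prediction of one independent training run.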

My coding is not very polished, which is a little embarrassing, but here is the code.

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

import gc
import datetime

# Check the environment
import pandas as pd
import numpy as np
#import statsmodels
!python3 --version
print(pd.__version__)
print(np.__version__)
#print(statsmodels.__version__)
import matplotlib
print(matplotlib.__version__)
import matplotlib.pyplot as plt

Python 3.9.8
2.0.3
1.21.6
3.5.2

from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
import sklearn

from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout, Bidirectional, Flatten, BatchNormalization
from tensorflow.keras.utils import plot_model
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import Input, Model

print(sklearn.__version__)
print(tf.__version__)

1.1.1
2.11.0

# When using Google Colab

#from google.colab import drive
#drive.mount('/content/drive')

# Define the New York holidays

holydays_data =( [
datetime.date(2017, 1, 1),  # "New Year's Day"
datetime.date(2017, 1, 16), # 'Martin Luther King Jr. Day'
datetime.date(2017, 2, 13), # "Lincoln's Birthday "
datetime.date(2017, 2, 20), # "Washington's Birthday"
datetime.date(2017, 5, 29), # 'Memorial Day'
datetime.date(2017, 7, 4),  # 'Independence Day'
datetime.date(2017, 9, 4),  # 'Labor Day'
datetime.date(2017, 10, 9), # 'Columbus Day'
datetime.date(2017, 11, 7), # 'Election Day'
datetime.date(2017, 11, 11),# 'Veterans Day'
datetime.date(2017, 11, 23),# 'Thanksgiving'
datetime.date(2017, 12, 25),# 'Christmas Day'
datetime.date(2018, 1, 1),  # "New Year's Day"
datetime.date(2018, 1, 15), # 'Martin Luther King Jr. Day'
datetime.date(2018, 2, 12), # "Lincoln's Birthday "
datetime.date(2018, 2, 19), # "Washington's Birthday"
datetime.date(2018, 5, 28), # 'Memorial Day'
datetime.date(2018, 7, 4),  # 'Independence Day'
datetime.date(2018, 9, 3),  # 'Labor Day'
datetime.date(2018, 10, 8), # 'Columbus Day'
datetime.date(2018, 11, 6), # 'Election Day'
datetime.date(2018, 11, 12),# 'Veterans Day'
datetime.date(2018, 11, 22),# 'Thanksgiving'
datetime.date(2018, 12, 25),# 'Christmas Day'
datetime.date(2019, 1, 1),  # "New Year's Day"
datetime.date(2019, 1, 21), # 'Martin Luther King Jr. Day'
datetime.date(2019, 2, 12), # "Lincoln's Birthday "
datetime.date(2019, 2, 18), # "Washington's Birthday"
datetime.date(2019, 5, 27), # 'Memorial Day'
datetime.date(2019, 7, 4),  # 'Independence Day'
datetime.date(2019, 9, 2),  # 'Labor Day'
datetime.date(2019, 10, 14),# 'Columbus Day'
datetime.date(2019, 11, 5), # 'Election Day'
datetime.date(2019, 11, 11),# 'Veterans Day'
datetime.date(2019, 11, 28),# 'Thanksgiving'
datetime.date(2019, 12, 25) # 'Christmas Day'
])

# Confirm that the GPU is recognized
physical_devices = tf.config.list_physical_devices('GPU')
if len(physical_devices) > 0:
    for device in physical_devices:
        tf.config.experimental.set_memory_growth(device, True)
        print('{} memory growth: {}'.format(device, tf.config.experimental.get_memory_growth(device)))
else:
    print("Not enough GPU hardware devices available")

PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU') memory growth: True

# Load and preprocess the data
import pandas as pd

path = "/home/kodama3104/ws_011/data/" # folder containing the data (train, zone, test, weather)
#path = "/content/drive/MyDrive/data/" # folder containing the data

train_data = pd.read_csv(path + "train_data.csv", index_col='tpep_pickup_datetime')
weather_data = pd.read_csv(path + "nyc_weather_2017_2019.csv")
zone_data = pd.read_csv(path + "taxi_zones.csv")
print(train_data.shape)
print(weather_data.shape)
print(zone_data.shape)

(51072, 79)
(40261, 20)
(79, 3)
/tmp/ipykernel_31951/1210795780.py:8: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.
  weather_data = pd.read_csv(path + "nyc_weather_2017_2019.csv")

# Convert the dates to datetime

train_data.index = pd.to_datetime(train_data.index)
weather_data['DATE'] = pd.to_datetime(weather_data['DATE'])

# Remember the taxi zone columns and their count before extra features are added
x_col = train_data.columns
x_num = train_data.shape[1]

weather_data = weather_data.set_index('DATE')
# Keep only the routine FM-15 (METAR) reports
weather_data = weather_data[weather_data['REPORT_TYPE']=='FM-15']

# Resample the weather data to the 30-minute interval of the taxi ride counts

weather_30min = weather_data.resample('30min').nearest()
# Add a row for 2017-01-01 00:00, copied from the first resampled row
weather_30min.loc['2017-01-01 00:00:00'] = weather_30min.iloc[0]
weather_30min.index = pd.to_datetime(weather_30min.index)
weather_30min = weather_30min.sort_index(ascending=True)
print(weather_data.shape)
print(weather_30min.shape)

(25660, 19)
(51408, 19)
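
The resampling step can be seen on a toy example. The sketch below assumes hourly observations taken at 51 minutes past the hour (the timestamps and the `temp` column are hypothetical) and shows how `resample('30min').nearest()` fills a 30-minute grid, and why a row for midnight has to be added by hand.

```python
import pandas as pd

# Hypothetical hourly observations, reported at :51 past each hour.
idx = pd.to_datetime(["2017-01-01 00:51", "2017-01-01 01:51", "2017-01-01 02:51"])
hourly = pd.DataFrame({"temp": [38.0, 37.0, 36.0]}, index=idx)

# Each 30-minute slot takes the value of the observation closest in time.
half_hourly = hourly.resample("30min").nearest()
print(half_hourly.index[0])  # the grid starts at 00:30, not 00:00

# Add the missing midnight slot by copying the first row, then restore time order.
half_hourly.loc[pd.Timestamp("2017-01-01 00:00")] = half_hourly.iloc[0]
half_hourly = half_hourly.sort_index()
print(half_hourly)
```

The grid starts at the 30-minute boundary at or before the first observation, which is why the 00:00 slot is absent until it is added.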

# Forward-fill missing values
weather_30min = weather_30min.ffill()
train_data    = train_data.ffill()

# Drop weather variables that are unlikely to matter
weather_30min = weather_30min.drop(['REPORT_TYPE','SOURCE','HourlyPresentWeatherType',
                                    'HourlySkyConditions','HourlyWindGustSpeed','REM','HourlyDryBulbTemperature',
                                    'HourlyAltimeterSetting','HourlyPressureChange','HourlyPressureTendency',
                                    'HourlySeaLevelPressure','HourlyDewPointTemperature','HourlyWindDirection',
                                    'HourlyStationPressure','HourlyVisibility'],axis=1)

# Add month, weekday, day of month, hour, and minute as variables

weather_30min['month'] = weather_30min.index.month
weather_30min['week'] = weather_30min.index.weekday
weather_30min['day'] = weather_30min.index.day
weather_30min['hour'] = weather_30min.index.hour
weather_30min['minute'] = weather_30min.index.minute

train_data['month'] = train_data.index.month
train_data['week'] = train_data.index.weekday
train_data['day'] = train_data.index.day
train_data['hour'] = train_data.index.hour
train_data['minute'] = train_data.index.minute

# Add a holiday column: 0 for ordinary days, 1-12 for the type of holiday

# holydays_data lists the same 12 holidays in the same order for each year,
# so the position within the year identifies the holiday type
holydays_type = {d: i % 12 + 1 for i, d in enumerate(holydays_data)}

weather_30min['holydays'] = [holydays_type.get(ts.date(), 0) for ts in weather_30min.index]
train_data['holydays']    = [holydays_type.get(ts.date(), 0) for ts in train_data.index]

#for i in range(train_data.shape[1]):
#    plt.figure(figsize=(18,2))
#    plt.plot(train_data.iloc[:train_data.shape[0], i])
#    plt.show()
#    plt.clf()

# Replace non-numeric entries in the weather data with numbers

for elem in weather_30min.select_dtypes(include=object).columns :
    print(elem)
    weather_30min[elem] = weather_30min[elem].str.replace('s', '0')
    weather_30min[elem] = weather_30min[elem].str.replace('V', '0')
    weather_30min[elem] = weather_30min[elem].str.replace('RB', '0')
    weather_30min[elem] = weather_30min[elem].str.replace('T', '0.001')

weather_30min = weather_30min.astype('float32')

HourlyPrecipitation
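
The effect of these replacements on a single precipitation value can be sketched as follows. The helper name `parse_precipitation` is hypothetical, and reading `T` as a trace amount and a trailing `s` as a quality flag is an assumption about the weather file's conventions rather than something stated in this notebook.

```python
def parse_precipitation(raw: str) -> float:
    """Apply the same replacements as the notebook, then convert to float."""
    # A trailing 's' becomes a trailing zero, which leaves the number unchanged.
    # 'T' (assumed to mean a trace amount) becomes a small positive value.
    cleaned = raw.replace("s", "0").replace("T", "0.001")
    return float(cleaned)


for raw in ["0.00", "0.02s", "T"]:
    print(raw, "->", parse_precipitation(raw))
```

Mapping a trace to 0.001 keeps it distinguishable from a completely dry slot while staying far below any measured amount.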

#for i in range(weather_30min.shape[1]):
#    plt.figure(figsize=(18,2))
#    plt.plot(weather_30min.iloc[:48*92, i])
#    plt.show()
#    plt.clf()

weather_30min.head(1)

	HourlyPrecipitation	HourlyRelativeHumidity	HourlyWetBulbTemperature	HourlyWindSpeed	month	week	day	hour	minute	holydays
DATE
2017-01-01	0.0	44.0	38.0	10.0	1.0	6.0	1.0	0.0	0.0	1.0

weather_30min.tail(1)

	HourlyPrecipitation	HourlyRelativeHumidity	HourlyWetBulbTemperature	HourlyWindSpeed	month	week	day	hour	minute	holydays
DATE
2019-12-07 23:30:00	0.0	50.0	27.0	8.0	12.0	5.0	7.0	23.0	30.0	0.0

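lag_max = int(48*7)      # number of past 30-minute steps used as input (7 days)
predict_len = int(48*7)  # forecast horizon in 30-minute steps (7 days)

# Split the weather data into the training period (train), the prediction-target times (future), and the test period (test)
wea_30_train  = weather_30min[:-predict_len]
wea_30_future = weather_30min[lag_max + predict_len-1:-predict_len]
wea_30_test   = weather_30min[-predict_len:]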
wea_30_train  = wea_30_train.astype('float32')
wea_30_future = wea_30_future.astype('float32')
wea_30_test   = wea_30_test.astype('float32')

print(train_data.shape)
print(wea_30_train.shape)
print(wea_30_future.shape)
print(wea_30_test.shape)

(51072, 85)
(51072, 10)
(50401, 10)
(336, 10)

# Concatenate the taxi ride counts with the weather data

df_train = pd.concat([train_data, wea_30_train['HourlyPrecipitation'],wea_30_train['HourlyRelativeHumidity']
                       ,wea_30_train['HourlyWindSpeed'],wea_30_train['HourlyWetBulbTemperature']
                      ],axis='columns', ignore_index=True)

# df_train = pd.concat([train_data, wea_30_train],axis='columns', ignore_index=True)

df_train = df_train.astype('float32')

df_train.head(1)

	0	1	2	3	4	5	6	7	8	9	...	79	80	81	82	83	84	85	86	87	88
2017-01-01	53.0	16.0	45.0	38.0	12.0	6.0	2.0	47.0	31.0	238.0	...	1.0	6.0	1.0	0.0	0.0	1.0	0.0	44.0	10.0	38.0

1 rows × 89 columns

df_train.tail(1)

	0	1	2	3	4	5	6	7	8	9	...	79	80	81	82	83	84	85	86	87	88
2019-11-30 23:30:00	12.0	1.0	9.0	11.0	4.0	4.0	0.0	23.0	7.0	25.0	...	11.0	5.0	30.0	23.0	23.0	0.0	0.0	42.0	9.0	27.0

1 rows × 89 columns

df_train.shape

(51072, 89)

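# Build the training inputs, training targets, and the inputs for prediction
def gen_dataset(dataset, lag_max, predict_len):
    X, y = [], []
    test_X = []

    # Each training sample is a window of lag_max steps; its target is the row
    # predict_len steps after the last step of the window
    for i in range(len(dataset) - lag_max - predict_len + 1):
        a = i + lag_max
        X.append(dataset[i:a, :]) # inputs
        y.append(dataset[a + predict_len - 1, :])   # targets

    # One input window for each of the predict_len steps after the end of the data
    for i in range(len(dataset),len(dataset)+predict_len):
        end = i - predict_len + 1
        start = end - lag_max
        test_X.append(dataset[start:end, :])
    return np.array(X), np.array(y), np.array(test_X)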

X_train, y_train, X_test = gen_dataset(df_train.values, lag_max, predict_len)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)

X_train = X_train.astype(np.float32)
y_train = y_train.astype(np.float32)
X_test  = X_test.astype(np.float32)

(50401, 336, 89)
(50401, 89)
(336, 336, 89)
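
The windowing is easier to follow on a tiny series. The sketch below applies the same indexing to 10 time steps with a window of 3 and a horizon of 2; `make_windows` and the toy array are hypothetical names used only for this illustration.

```python
import numpy as np


def make_windows(series, lag, horizon):
    """Return training windows, their targets, and windows for future steps."""
    n = len(series)
    n_train = n - lag - horizon + 1
    X = np.array([series[i:i + lag] for i in range(n_train)])
    # The target lies `horizon` steps after the last step of its window.
    y = np.array([series[i + lag + horizon - 1] for i in range(n_train)])
    # One window for each of the `horizon` steps that follow the data.
    X_future = np.array(
        [series[t - horizon + 1 - lag:t - horizon + 1] for t in range(n, n + horizon)]
    )
    return X, y, X_future


toy = np.arange(10, dtype=np.float32).reshape(10, 1)  # values 0..9, one feature
X, y, X_future = make_windows(toy, lag=3, horizon=2)
print(X.shape, y.shape, X_future.shape)  # (6, 3, 1) (6, 1) (2, 3, 1)
print(X[0].ravel(), y[0])                # [0. 1. 2.] [4.]
print(X_future[0].ravel())               # [6. 7. 8.]
```

With `lag_max` and `predict_len` both set to one week, every target is exactly one week after the last step of its input window, so each of the 336 test timestamps is predicted directly from its own window.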

del df_train,  wea_30_train, weather_30min
gc.collect()

y_train = y_train[:,:x_num]

validation_len = 48*1

X_valid = X_train[-validation_len:]
X_train = X_train[:-validation_len]

w_valid = wea_30_future[-validation_len:]
w_train = wea_30_future[:-validation_len]

y_valid = y_train[-validation_len:]
y_train = y_train[:-validation_len]

print(X_train.shape, w_train.shape)
print(X_valid.shape, w_valid.shape)

print(y_train.shape)
print(y_valid.shape)

(50353, 336, 89) (50353, 10)
(48, 336, 89) (48, 10)
(50353, 79)
(48, 79)

# Model parameters

# Number of units in each LSTM layer
hidden_num1 = 512

# Inputs
input1 = Input(( X_train.shape[1], X_train.shape[2])) # past taxi ride counts concatenated with weather data
input2 = Input(( w_train.shape[1], )) # weather and calendar features at the prediction target time

# Model
x  = BatchNormalization()(input1)
x  = LSTM(hidden_num1, return_sequences = True)(x)
x  = BatchNormalization()(x)
x  = LSTM(hidden_num1, return_sequences = True)(x)
x  = BatchNormalization()(x)
x  = LSTM(hidden_num1, return_sequences = True)(x)
x  = BatchNormalization()(x)
x  = LSTM(hidden_num1, return_sequences = True)(x)

x1 = Flatten()(x)
x2 = input2 # weather and calendar features at the prediction target time
x3 = tf.keras.layers.Concatenate(axis=1)([x1, x2])
x3 = Dense(128, activation="relu")(x3)
x3 = Dense(128, activation="relu")(x3)

# Output layer
outputs = Dense(y_train.shape[1], activation="linear")(x3)  # fully connected layer

model = Model(inputs=[input1,input2], outputs=[outputs])
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 336, 89)]    0           []                               
                                                                                                  
 batch_normalization (BatchNorm  (None, 336, 89)     356         ['input_1[0][0]']                
 alization)                                                                                       
                                                                                                  
 lstm (LSTM)                    (None, 336, 512)     1232896     ['batch_normalization[0][0]']    
                                                                                                  
 batch_normalization_1 (BatchNo  (None, 336, 512)    2048        ['lstm[0][0]']                   
 rmalization)                                                                                     
                                                                                                  
 lstm_1 (LSTM)                  (None, 336, 512)     2099200     ['batch_normalization_1[0][0]']  
                                                                                                  
 batch_normalization_2 (BatchNo  (None, 336, 512)    2048        ['lstm_1[0][0]']                 
 rmalization)                                                                                     
                                                                                                  
 lstm_2 (LSTM)                  (None, 336, 512)     2099200     ['batch_normalization_2[0][0]']  
                                                                                                  
 batch_normalization_3 (BatchNo  (None, 336, 512)    2048        ['lstm_2[0][0]']                 
 rmalization)                                                                                     
                                                                                                  
 lstm_3 (LSTM)                  (None, 336, 512)     2099200     ['batch_normalization_3[0][0]']  
                                                                                                  
 flatten (Flatten)              (None, 172032)       0           ['lstm_3[0][0]']                 
                                                                                                  
 input_2 (InputLayer)           [(None, 10)]         0           []                               
                                                                                                  
 concatenate (Concatenate)      (None, 172042)       0           ['flatten[0][0]',                
                                                                  'input_2[0][0]']                
                                                                                                  
 dense (Dense)                  (None, 128)          22021504    ['concatenate[0][0]']            
                                                                                                  
 dense_1 (Dense)                (None, 128)          16512       ['dense[0][0]']                  
                                                                                                  
 dense_2 (Dense)                (None, 79)           10191       ['dense_1[0][0]']                
                                                                                                  
==================================================================================================
Total params: 29,585,203
Trainable params: 29,581,953
Non-trainable params: 3,250
__________________________________________________________________________________________________
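
The parameter counts in the summary can be reproduced by hand, which also shows where most of the weights sit. The helper functions below are hypothetical names; the formulas are the standard ones for LSTM, Dense, and BatchNormalization layers.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    return input_dim * units + units


def batchnorm_params(features: int) -> int:
    # gamma and beta are trainable; moving mean and variance are not.
    return 4 * features


steps, features, units = 336, 89, 512   # window length, input columns, LSTM width
extra, hidden, outputs = 10, 128, 79    # target-time features, Dense width, taxi zones

layers = {
    "batch_normalization": batchnorm_params(features),
    "lstm": lstm_params(features, units),
    "batch_normalization_1..3": 3 * batchnorm_params(units),
    "lstm_1..3": 3 * lstm_params(units, units),
    "dense": dense_params(steps * units + extra, hidden),
    "dense_1": dense_params(hidden, hidden),
    "dense_2": dense_params(hidden, outputs),
}
total = sum(layers.values())
print(layers["dense"], total)  # 22021504 29585203
```

Because `Flatten` keeps the LSTM output of all 336 time steps, the first Dense layer alone accounts for roughly three quarters of the parameters.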

# Loss function: MSE, optimizer: Adam
opt = tf.keras.optimizers.Adam( learning_rate=0.0001)
model.compile(loss="mse", optimizer=opt)

# Plot the loss curves
def history_graph(history):
    loss = history.history['loss']
    val_loss = history.history["val_loss"]
    plt.plot(loss, label='train')
    plt.plot(val_loss, label='validation')
    plt.ylabel('loss')
    plt.xlabel('epochs')
    plt.legend()
    plt.show()

# Training
early_stopping =  EarlyStopping(monitor='val_loss', min_delta=0.0, patience=10)

history = model.fit([X_train, w_train] , y_train,
                    validation_data = ([X_valid, w_valid], y_valid),
                    epochs = 300,
                    callbacks=[early_stopping],
                    batch_size = 64)

Epoch 1/300
184/787 [======>.......................] - ETA: 56s - loss: 3625.9412

# Plot the loss curves
history_graph(history)

# Prediction
def predict_test(model, data1, data2):
    dt = pd.date_range(start='2019-12-01 00:00:00', end='2019-12-07 23:30:00',freq='30min')
    pred = model.predict([data1, data2])

    df_pred = pd.DataFrame(pred, columns=x_col)
    df_pred['tpep_pickup_datetime'] = dt
    rnn_pred = df_pred.set_index("tpep_pickup_datetime")
    return rnn_pred

predict_df = predict_test(model, X_test, wea_30_test)

11/11 [==============================] - 1s 34ms/step

# Show the predictions
predict_df.head(5)

	0	1	2	3	4	5	6	7	8	9	...	69	70	71	72	73	74	75	76	77	78
tpep_pickup_datetime
2019-12-01 00:00:00	20.524151	10.021410	8.623723	5.892232	5.599199	1.902857	3.138924	9.182867	3.296710	32.088543	...	65.809982	5.078995	55.101887	205.071075	14.857251	10.543024	-3.234933	3.898872	12.069722	76.063858
2019-12-01 00:30:00	33.481152	11.196672	6.777467	3.840772	7.880636	2.677950	3.882199	9.041745	6.170432	22.862257	...	57.680309	4.072216	54.963619	234.529877	20.588902	13.257601	-3.807296	0.079872	13.746686	83.828537
2019-12-01 01:00:00	42.203674	12.913897	8.758183	5.161508	7.966932	4.861914	1.806206	9.746524	8.666472	14.500937	...	50.382183	2.432377	71.211182	252.702972	23.935171	17.963701	-2.904502	0.630962	9.963893	81.621979
2019-12-01 01:30:00	44.559643	14.488832	6.742269	4.959348	8.468466	5.595797	1.123459	11.294849	10.808940	2.638332	...	35.153545	1.659405	74.073509	230.377274	25.196012	20.871161	-0.390971	1.259590	2.504596	71.689178
2019-12-01 02:00:00	32.880737	14.082376	8.775491	3.238468	5.866222	2.661674	1.542237	9.066517	8.514407	-4.506200	...	23.671654	3.578797	71.901833	169.839447	19.969809	15.573454	2.060479	5.451656	2.956710	51.237854

5 rows × 79 columns

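submit_df = predict_df
# Ride counts cannot be negative, so replace non-positive predictions with 0
submit_df = submit_df.where(submit_df > 0.0, 0.0)

# Round to whole rides
submit_df = submit_df.round()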

submit_df.to_csv('./submit/submission_LSTM_BEST.csv')
#submit_df.to_csv('/content/drive/MyDrive/submit/submission_180.csv')
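
The post-processing before submission amounts to clipping at zero and rounding. A minimal NumPy sketch of the same idea, using a few hypothetical prediction values:

```python
import numpy as np

# Hypothetical raw model outputs; a linear output layer can produce negative values.
pred = np.array([[20.52, -3.23],
                 [2.64, -0.39]])

# Ride counts cannot be negative, so clip at zero and then round to whole rides.
counts = np.clip(pred, 0.0, None).round()
print(counts)
```

For finite values this matches `where(submit_df > 0.0, 0.0)` followed by `round()`.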

# For each zone, plot the last three weeks of observations followed by the forecast week
for i in range(submit_df.shape[1]):
    plt.figure(figsize=(18,2))
    plt.plot(train_data.iloc[-48*21:, i])
    plt.plot(submit_df.iloc[:, i])
    plt.show()
    plt.close()
