Material Book of Statistics

Notes on experiments in statistics, machine learning, programming, and related topics.

Setup notes for TensorFlow + Keras

Introduction

About two weeks ago I bought a new PC to study machine learning and Kaggle.
This post is my working notes from installing TensorFlow and Keras on that machine.

Installation environment

The environment I installed into is shown below.
Python was already installed via Anaconda.
Anaconda gets mixed reviews, but it comes with all the packages I want to use, so it really is convenient.

Item      Product or version
PC        GALLERIA ZZ
OS        Ubuntu 18.04 Desktop
GPU       NVIDIA GeForce GTX 1080 Ti
Python    3.6.5 (Anaconda 5.2)

References used this time

I followed these two references for the installation:
https://qiita.com/gott/items/28524953547d13894e08
https://medium.com/@taylordenouden/installing-tensorflow-gpu-on-ubuntu-18-04-89a142325138

Installation steps

Installing CUDA 9.0

I downloaded CUDA Toolkit 9.0 from the URL below.
https://developer.nvidia.com/cuda-90-download-archive

今回は"Select Target Platform"を以下のとおりに選択します。

Item                Selected value
Operating System    Linux
Architecture        x86_64
Distribution        Ubuntu
Version             17.04
Installer Type      runfile (local)

Once the download finishes, run the following commands to start the installation.

$ cd ~/Downloads    # the directory the installer was downloaded to
$ sudo chmod +x cuda_9.0.176_384.81_linux.run
$ ./cuda_9.0.176_384.81_linux.run --override

Among the prompts that appear after the long disclaimer, answer no only to "Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?" and yes to everything else.
The actual dialogue looks like this:

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
    OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 
    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 
    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.  

-----------------
Do you accept the previously read EULA?
accept/decline/quit: accept

You are attempting to install on an unsupported configuration. Do you wish to continue?
(y)es/(n)o [ default is no ]: yes

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: no

Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: yes

Enter Toolkit Location
 [ default is /usr/local/cuda-9.0 ]: 

/usr/local/cuda-9.0 is not writable.
Do you wish to run the installation with 'sudo'?
(y)es/(n)o: yes

Please enter your password: 
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: yes

Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: yes

Enter CUDA Samples Location
 [ default is /home/katsuya ]: 

Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
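
Once the installer finishes, it is worth a quick sanity check that the toolkit actually landed where expected. The snippet below is a small sketch (the path assumes the default location /usr/local/cuda-9.0 chosen above) that calls nvcc and prints its version banner, which should mention "release 9.0".

import subprocess

# Ask the newly installed CUDA 9.0 compiler for its version banner.
out = subprocess.check_output(["/usr/local/cuda-9.0/bin/nvcc", "--version"])
print(out.decode())  # expected to contain "release 9.0"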

Installing cuDNN 7.1

From the NVIDIA cuDNN download page, I selected "Download cuDNN v7.1.4 (May 16, 2018), for CUDA 9.0" and downloaded "cuDNN v7.1.4 Library for Linux".

Run the following commands to place the library files in the appropriate directories.

$ tar -zxvf cudnn-9.0-linux-x64-v7.1.tgz 
$ sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-9.0/lib64/
$ sudo cp cuda/include/cudnn.h /usr/local/cuda-9.0/include/
$ sudo chmod a+r /usr/local/cuda-9.0/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
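
To confirm the header was copied correctly, the short snippet below (a sketch that assumes cudnn.h is at the path used above) parses the version macros in cudnn.h; it should print 7.1.4.

import re

# cudnn.h encodes its version in lines such as "#define CUDNN_MAJOR 7".
version = {}
with open("/usr/local/cuda-9.0/include/cudnn.h") as f:
    for line in f:
        m = re.match(r"#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)", line)
        if m:
            version[m.group(1)] = m.group(2)
print("cuDNN {MAJOR}.{MINOR}.{PATCHLEVEL}".format(**version))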

Installing the NVIDIA CUDA Profiling Tools Interface

This package (libcupti) provides profiling features such as inspecting memory load and identifying bottlenecks.
As shown below, it installs easily with apt.

$ sudo apt install libcupti-dev
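
Once TensorFlow is installed (see the later steps), libcupti is what enables its GPU tracing. As a minimal sketch using the TensorFlow 1.x session API, the following runs a single matrix multiplication with full tracing and writes a Chrome-timeline JSON that can be opened at chrome://tracing to inspect kernel times and memory activity.

import tensorflow as tf
from tensorflow.python.client import timeline

# A single GPU op to trace.
a = tf.random_normal([1000, 1000])
b = tf.matmul(a, a)

# Request full tracing for this session.run() call; this is where libcupti is used.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
with tf.Session() as sess:
    sess.run(b, options=run_options, run_metadata=run_metadata)

# Convert the collected step stats into a Chrome trace file.
tl = timeline.Timeline(run_metadata.step_stats)
with open("timeline.json", "w") as f:
    f.write(tl.generate_chrome_trace_format())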

Setting environment variables

Add the following two lines to the end of .bashrc.

export PATH="/usr/local/cuda-9.0/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"

Once that is done, run the following command to apply the environment variables.

$ source ~/.bashrc
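
As a small convenience check (just a sketch, nothing more), the snippet below confirms that the new shell picked up both variables.

import os

# Both should print True after sourcing .bashrc.
print("/usr/local/cuda-9.0/bin" in os.environ.get("PATH", ""))
print("/usr/local/cuda/lib64" in os.environ.get("LD_LIBRARY_PATH", ""))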

Installing TensorFlow and Keras

The environment is finally ready for TensorFlow and Keras.
All that is left is to run

$ pip install tensorflow-gpu
$ pip install keras

and the installation is complete.
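
Before running a full training script, it is worth confirming that TensorFlow can actually see the GPU. A minimal check using the TensorFlow 1.x device-listing helper looks like this; a working install should list a /device:GPU:0 entry for the GTX 1080 Ti.

from tensorflow.python.client import device_lib

# Prints every device TensorFlow can use, including the GPU if CUDA/cuDNN are set up correctly.
print(device_lib.list_local_devices())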

Running a test script

Let's check that everything actually works.
For the test code, I borrowed

https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py

as-is and saved it as keras_test_code.py.

'''Train a simple deep CNN on the CIFAR10 small images dataset.
It gets to 75% validation accuracy in 25 epochs, and 79% after 50 epochs.
(it's still underfitting at that point, though).
'''

from __future__ import print_function
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import os

batch_size = 32
num_classes = 10
epochs = 100
data_augmentation = True
num_predictions = 20
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'keras_cifar10_trained_model.h5'

# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

# initiate RMSprop optimizer
opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)

# Let's train the model using RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        zca_epsilon=1e-06,  # epsilon for ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        # randomly shift images horizontally (fraction of total width)
        width_shift_range=0.1,
        # randomly shift images vertically (fraction of total height)
        height_shift_range=0.1,
        shear_range=0.,  # set range for random shear
        zoom_range=0.,  # set range for random zoom
        channel_shift_range=0.,  # set range for random channel shifts
        # set mode for filling points outside the input boundaries
        fill_mode='nearest',
        cval=0.,  # value used for fill_mode = "constant"
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False,  # randomly flip images
        # set rescaling factor (applied before any other transformation)
        rescale=None,
        # set function that will be applied on each input
        preprocessing_function=None,
        # image data format, either "channels_first" or "channels_last"
        data_format=None,
        # fraction of images reserved for validation (strictly between 0 and 1)
        validation_split=0.0)

    # Compute quantities required for feature-wise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(x_train, y_train,
                                     batch_size=batch_size),
                        epochs=epochs,
                        validation_data=(x_test, y_test),
                        workers=4)

# Save model and weights
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
model_path = os.path.join(save_dir, model_name)
model.save(model_path)
print('Saved trained model at %s ' % model_path)

# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

The output from the run is shown below.

katsuya@katsuya:~$ python ./keras_test_code.py 
/home/katsuya/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 203s 1us/step
x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
Using real-time data augmentation.
Epoch 1/100
2018-09-05 01:20:30.859870: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-05 01:20:31.135949: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-09-05 01:20:31.137171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.42GiB
2018-09-05 01:20:31.137233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-05 01:20:36.353767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-05 01:20:36.353854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 
2018-09-05 01:20:36.353880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N 
2018-09-05 01:20:36.363145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10075 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
1563/1563 [==============================] - 18s 11ms/step - loss: 1.8860 - acc: 0.3071 - val_loss: 1.5808 - val_acc: 0.4329
Epoch 2/100
1563/1563 [==============================] - 10s 6ms/step - loss: 1.5933 - acc: 0.4186 - val_loss: 1.4743 - val_acc: 0.4629

This confirms that everything is working correctly.