Deploy MongoDB Cluster as a Microservice on Kubernetes with Persistent Storage

Overview

In this post we will learn to deploy a MongoDB replica set (cluster) as a microservice running in Docker containers on Kubernetes. Since MongoDB is a database, its data must survive a container being deleted and recreated. To achieve this persistence we will use the persistent volume feature in Kubernetes to allocate NFS-backed volumes to the containers.

Prerequisites
To follow this article we need a Kubernetes cluster up and running with service discovery enabled via DNS (i.e., KubeDNS).

Things to consider when running MongoDB on container

  • MongoDB database nodes are stateful. In the event that a container fails, and is rescheduled, it's undesirable for the data to be lost (it could be recovered from other nodes in the replica set, but that takes time). To solve this, features such as the Persistent Volume abstraction in Kubernetes can be used to map what would otherwise be an ephemeral MongoDB data directory in the container to a persistent location where the data survives container failure and rescheduling.

  • MongoDB database nodes within a replica set must communicate with each other – including after rescheduling. All of the nodes within a replica set must know the addresses of all of their peers, but when a container is rescheduled, it is likely to be restarted with a different IP address. For example, all containers within a Kubernetes Pod share a single IP address, which changes when the pod is rescheduled. With Kubernetes, this can be handled by associating a Kubernetes Service with each MongoDB node, which uses the Kubernetes DNS service to provide a hostname for the service that remains constant through rescheduling.

  • Once each of the individual MongoDB nodes is running (each within its own container), the replica set must be initialized and each node added. This is likely to require some additional logic beyond that offered by off the shelf orchestration tools. Specifically, one MongoDB node within the intended replica set must be used to execute the rs.initiate and rs.add commands.

Ref: mongodb.com

We will create a MongoDB replica set in a single Kubernetes cluster. It will have 3 members: 1 primary and 2 secondaries.

Each member of the replica set will run as its own pod, with a service exposed using NodePort. We will use Kubernetes service discovery so that all the replica set members can talk to each other; even if a pod's IP address changes when its container is recreated, the members can still reach each other through the services we create.

We will create 3 MongoDB replica set members:

mongo-node-1
mongo-node-2
mongo-node-3

We will also create 3 services, one attached to each member:

mongo-node-1 - service for mongo-node-1 (container)
mongo-node-2 - service for mongo-node-2 (container)
mongo-node-3 - service for mongo-node-3 (container)

For persistent storage I am using an external NFS server where I have created 3 directories, named after the respective replica set members. I will create 3 persistent volumes of 10Gi each in Kubernetes, one per replica set member, using the YAML below.

ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-1-pv.yaml 
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-node-1-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # FIXME: use the right IP
    server: 10.9.80.58
    path: "/kubernetes/mongo-node-1"


ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-2-pv.yaml 
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-node-2-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # FIXME: use the right IP
    server: 10.9.80.58
    path: "/kubernetes/mongo-node-2"


ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-3-pv.yaml 
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-node-3-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # FIXME: use the right IP
    server: 10.9.80.58
    path: "/kubernetes/mongo-node-3"
ubuntu@kube-apiserver-1:~/mongodb$
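The three PV manifests above differ only in the member index, so they can also be generated with a small shell loop instead of being maintained by hand. This is just a sketch of that shortcut; the NFS server IP and export path are taken from this post's setup, so substitute your own values.

```shell
# Generate mongo-node-{1,2,3}-pv.yaml in the current directory.
# NFS_SERVER and the export path match this post's environment -- adjust them.
NFS_SERVER=10.9.80.58
for i in 1 2 3; do
cat > "mongo-node-$i-pv.yaml" <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-node-$i-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: $NFS_SERVER
    path: "/kubernetes/mongo-node-$i"
EOF
done
```

The generated files are identical to the ones shown above and can be fed to kubectl create the same way.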

Then we need to create a persistent volume claim for each volume, and bind each claim to its replica set member by referencing the claim in the pod definition.

ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-1-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-node-1-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 10Gi


ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-2-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-node-2-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 10Gi

ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-3-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-node-3-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 10Gi
ubuntu@kube-apiserver-1:~/mongodb$
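As with the PVs, the claims can be generated in a loop. One caveat with the manifests above: since all three PVs are 10Gi ReadWriteMany volumes, Kubernetes is free to bind any claim to any of them. The `volumeName` field in the sketch below is an addition not in the original manifests that pins each claim to its intended PV; it is optional if you don't care which member lands on which NFS directory.

```shell
# Generate mongo-node-{1,2,3}-pvc.yaml, pinning each claim to its matching PV.
for i in 1 2 3; do
cat > "mongo-node-$i-pvc.yaml" <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-node-$i-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: mongo-node-$i-pv
  resources:
    requests:
      storage: 10Gi
EOF
done
```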

Now we need to create the pod definitions for our 3 replica set members as below. You can see that I am passing command arguments (mongod --replSet rs0 --bind_ip_all) when the mongod process starts in the container, which sets the replica set name and binds mongod to all the IP addresses on the container. If you don't pass a bind_ip argument then, starting with MongoDB 3.6, the default bind_ip is 127.0.0.1 and you won't be able to access the DB from outside the container. Once these members are created we need to connect to any one of them, initialize the replica set, and add the other members.
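For reference, the same two flags can also be expressed as a mongod configuration file, if you would rather bake the settings into an image than pass command arguments. This is just a sketch of the equivalent config, not part of the original setup:

```yaml
# Equivalent of 'mongod --replSet rs0 --bind_ip_all'
replication:
  replSetName: rs0
net:
  port: 27017
  bindIpAll: true
```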


ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-1.yaml 
apiVersion: v1
kind: Service
metadata:
  name: mongo-node-1
  labels:
    name: mongo-node-1
spec:
  type: NodePort
  ports:
  - port: 27017
    targetPort: 27017
    protocol: TCP
    name: mongo-node-1
  selector:
    name: mongo-node-1
    
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo-node-1-rc
  labels:
    name: mongo-node-1-rc
spec:
  replicas: 1
  selector:
    name: mongo-node-1
  template:
    metadata:
      labels:
        name: mongo-node-1
  
    spec:
      containers:
      - name: mongo-node-1
        image: mongo
        command:
          - mongod
          - "--replSet"
          - rs0
          - "--bind_ip_all"
        ports:
        - containerPort: 27017
        volumeMounts:
          - name: mongo-node-1-db
            mountPath: /data/db
      volumes:
        - name: mongo-node-1-db
          persistentVolumeClaim:
            claimName: mongo-node-1-pvc


ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-2.yaml 
apiVersion: v1
kind: Service
metadata:
  name: mongo-node-2
  labels:
    name: mongo-node-2
spec:
  type: NodePort
  ports:
  - port: 27017
    targetPort: 27017
    protocol: TCP
    name: mongo-node-2
  selector:
    name: mongo-node-2
    
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo-node-2-rc
  labels:
    name: mongo-node-2-rc
spec:
  replicas: 1
  selector:
    name: mongo-node-2
  template:
    metadata:
      labels:
        name: mongo-node-2
      
    spec:
      containers:
      - name: mongo-node-2
        image: mongo
        command:
          - mongod
          - "--replSet"
          - rs0
          - "--bind_ip_all"
        ports:
        - containerPort: 27017
        volumeMounts:
          - name: mongo-node-2-db
            mountPath: /data/db
      volumes:
        - name: mongo-node-2-db
          persistentVolumeClaim:
            claimName: mongo-node-2-pvc


ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-3.yaml 
apiVersion: v1
kind: Service
metadata:
  name: mongo-node-3
  labels:
    name: mongo-node-3
spec:
  type: NodePort
  ports:
  - port: 27017
    targetPort: 27017
    protocol: TCP
    name: mongo-node-3
  selector:
    name: mongo-node-3
    
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo-node-3-rc
  labels:
    name: mongo-node-3-rc
spec:
  replicas: 1
  selector:
    name: mongo-node-3
  template:
    metadata:
      labels:
        name: mongo-node-3
    spec:
      containers:
      - name: mongo-node-3
        image: mongo
        command:
          - mongod
          - "--replSet"
          - rs0
          - "--bind_ip_all"
        ports:
        - containerPort: 27017
        volumeMounts:
          - name: mongo-node-3-db
            mountPath: /data/db
      volumes:
        - name: mongo-node-3-db
          persistentVolumeClaim:
            claimName: mongo-node-3-pvc
ubuntu@kube-apiserver-1:~/mongodb$

Let's create the persistent volumes, persistent volume claims, and replica set pods.

ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-1-pv.yaml
persistentvolume "mongo-node-1-pv" created
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-2-pv.yaml
persistentvolume "mongo-node-2-pv" created
ubuntu@kube-apiserver-1:~/mongodb$   sudo kubectl create -f mongo-node-3-pv.yaml
persistentvolume "mongo-node-3-pv" created


ubuntu@kube-apiserver-1:~/mongodb$  sudo kubectl create -f mongo-node-1-pvc.yaml
persistentvolumeclaim "mongo-node-1-pvc" created
ubuntu@kube-apiserver-1:~/mongodb$  sudo kubectl create -f mongo-node-2-pvc.yaml
persistentvolumeclaim "mongo-node-2-pvc" created
ubuntu@kube-apiserver-1:~/mongodb$  sudo kubectl create -f mongo-node-3-pvc.yaml
persistentvolumeclaim "mongo-node-3-pvc" created


ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-1.yaml 
service "mongo-node-1" created
replicationcontroller "mongo-node-1-rc" created
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-2.yaml 
service "mongo-node-2" created
replicationcontroller "mongo-node-2-rc" created
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-3.yaml
service "mongo-node-3" created
replicationcontroller "mongo-node-3-rc" created

Verification

Kubernetes cluster

ubuntu@kube-apiserver-1:~$ kubectl get nodes
NAME            STATUS    AGE       VERSION
kube-worker-1   Ready     139d      v1.7.4
kube-worker-2   Ready     139d      v1.7.4
kube-worker-3   Ready     139d      v1.7.4

ubuntu@kube-apiserver-1:~$ kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok                   
controller-manager   Healthy   ok                   
etcd-0               Healthy   {"health": "true"}   
etcd-1               Healthy   {"health": "true"}   
etcd-2               Healthy   {"health": "true"}   
ubuntu@kube-apiserver-1:~$

Persistent Volume

ubuntu@kube-apiserver-1:~$ kubectl get pv
NAME              CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                      STORAGECLASS   REASON    AGE
mongo-node-1-pv   10Gi       RWX           Retain          Bound     default/mongo-node-1-pvc                            1h
mongo-node-2-pv   10Gi       RWX           Retain          Bound     default/mongo-node-2-pvc                            1h
mongo-node-3-pv   10Gi       RWX           Retain          Bound     default/mongo-node-3-pvc                            1h
nfs               500Gi      RWX           Retain          Bound     default/nfs                                         3d

Persistent Volume Claim

ubuntu@kube-apiserver-1:~$ kubectl get pvc
NAME               STATUS    VOLUME            CAPACITY   ACCESSMODES   STORAGECLASS   AGE
mongo-node-1-pvc   Bound     mongo-node-1-pv   10Gi       RWX                          1h
mongo-node-2-pvc   Bound     mongo-node-2-pv   10Gi       RWX                          1h
mongo-node-3-pvc   Bound     mongo-node-3-pv   10Gi       RWX                          1h
nfs                Bound     nfs               500Gi      RWX                          3d
ubuntu@kube-apiserver-1:~$ 

Kubernetes Services

ubuntu@kube-apiserver-1:~$ kubectl  get svc
NAME           CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
kubernetes     10.20.0.1       <none>        443/TCP           140d
mongo-node-1   10.20.134.1     <nodes>       27017:32521/TCP   2h
mongo-node-2   10.20.178.21    <nodes>       27017:31164/TCP   2h
mongo-node-3   10.20.134.155   <nodes>       27017:30326/TCP   2h
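Because the services are of type NodePort, each member is also reachable from outside the cluster on any worker node's address at the mapped high port. The sketch below assembles the mongo CLI invocation for mongo-node-1 using the NodePort from the listing above; the node hostname is just a placeholder from this cluster, so substitute an address reachable from your machine.

```shell
# External access to mongo-node-1 via NodePort (values from the svc listing).
NODE_HOST=kube-worker-2   # any worker node reachable from outside the cluster
NODE_PORT=32521           # NodePort mapped to mongo-node-1's 27017
echo "mongo --host $NODE_HOST --port $NODE_PORT"
```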

Kubernetes pods

ubuntu@kube-apiserver-1:~$ kubectl get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP               NODE
busybox-2125412808-h9wd0   1/1       Running   1136       48d       172.200.180.26   kube-worker-1
mongo-node-1-rc-3d35h      1/1       Running   0          2h        172.200.127.15   kube-worker-2
mongo-node-2-rc-z5wtv      1/1       Running   0          2h        172.200.127.22   kube-worker-2
mongo-node-3-rc-c45wr      1/1       Running   0          2h        172.200.127.23   kube-worker-2

Our MongoDB replica set pods are running. Now let's initialize the replica set. To do this I will connect to mongo-node-1 using the mongo CLI and issue the commands below:

rs.initiate()
conf=rs.conf()
conf.members[0].host="mongo-node-1:27017"
rs.reconfig(conf)
rs.add("mongo-node-2")
rs.add("mongo-node-3")
ubuntu@kube-apiserver-1:~/mongodb$ kubectl get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP               NODE
busybox-2125412808-h9wd0   1/1       Running   1134       48d       172.200.180.26   kube-worker-1
mongo-node-1-rc-3d35h      1/1       Running   0          33s       172.200.127.15   kube-worker-2
mongo-node-2-rc-z5wtv      1/1       Running   0          33s       172.200.127.22   kube-worker-2
mongo-node-3-rc-c45wr      1/1       Running   0          31s       172.200.127.23   kube-worker-2


ubuntu@kube-apiserver-1:~/mongodb$ 
ubuntu@kube-apiserver-1:~/mongodb$ mongo --host 172.200.127.15
MongoDB shell version: 2.6.10
connecting to: 172.200.127.15:27017/test
Server has startup warnings: 
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] 
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] 
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] 
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] 
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2018-02-05T23:50:45.758+0000 I CONTROL  [initandlisten] 
> 
> rs.initiate()
{
        "info2" : "no configuration specified. Using a default configuration for the set",
        "me" : "mongo-node-1-rc-3d35h:27017",
        "ok" : 1,
        "operationTime" : Timestamp(1517874756, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1517874756, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
rs0:SECONDARY> conf=rs.conf()
{
        "_id" : "rs0",
        "version" : 1,
        "protocolVersion" : NumberLong(1),
        "members" : [
                {
                        "_id" : 0,
                        "host" : "mongo-node-1-rc-3d35h:27017",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                }
        ],
        "settings" : {
                "chainingAllowed" : true,
                "heartbeatIntervalMillis" : 2000,
                "heartbeatTimeoutSecs" : 10,
                "electionTimeoutMillis" : 10000,
                "catchUpTimeoutMillis" : -1,
                "catchUpTakeoverDelayMillis" : 30000,
                "getLastErrorModes" : {

                },
                "getLastErrorDefaults" : {
                        "w" : 1,
                        "wtimeout" : 0
                },
                "replicaSetId" : ObjectId("5a78ee430c0ce5fbd023ca9e")
        }
}
rs0:PRIMARY> conf.members[0].host="mongo-node-1:27017"
mongo-node-1:27017
rs0:PRIMARY> rs.reconfig(conf)
{
        "ok" : 1,
        "operationTime" : Timestamp(1517874871, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1517874871, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
rs0:PRIMARY> rs.add("mongo-node-2")
{
        "ok" : 1,
        "operationTime" : Timestamp(1517874896, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1517874896, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
rs0:PRIMARY> rs.add("mongo-node-3")
{
        "ok" : 1,
        "operationTime" : Timestamp(1517874901, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1517874901, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
rs0:PRIMARY> 
rs0:PRIMARY> 
rs0:PRIMARY> 
rs0:PRIMARY> 

Verify Replica set status

rs0:PRIMARY> rs.status()
{
        "set" : "rs0",
        "date" : ISODate("2018-02-06T00:08:16.827Z"),
        "myState" : 1,
        "term" : NumberLong(1),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(1517875688, 1),
                        "t" : NumberLong(1)
                },
                "readConcernMajorityOpTime" : {
                        "ts" : Timestamp(1517875688, 1),
                        "t" : NumberLong(1)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1517875688, 1),
                        "t" : NumberLong(1)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1517875688, 1),
                        "t" : NumberLong(1)
                }
        },
        "members" : [
                {
                        "_id" : 0,
                        "name" : "mongo-node-1:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 1058,
                        "optime" : {
                                "ts" : Timestamp(1517875688, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2018-02-06T00:08:08Z"),
                        "electionTime" : Timestamp(1517874756, 2),
                        "electionDate" : ISODate("2018-02-05T23:52:36Z"),
                        "configVersion" : 4,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "mongo-node-2:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 800,
                        "optime" : {
                                "ts" : Timestamp(1517875688, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(1517875688, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2018-02-06T00:08:08Z"),
                        "optimeDurableDate" : ISODate("2018-02-06T00:08:08Z"),
                        "lastHeartbeat" : ISODate("2018-02-06T00:08:16.057Z"),
                        "lastHeartbeatRecv" : ISODate("2018-02-06T00:08:15.046Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "mongo-node-1:27017",
                        "configVersion" : 4
                },
                {
                        "_id" : 2,
                        "name" : "mongo-node-3:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 795,
                        "optime" : {
                                "ts" : Timestamp(1517875688, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(1517875688, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2018-02-06T00:08:08Z"),
                        "optimeDurableDate" : ISODate("2018-02-06T00:08:08Z"),
                        "lastHeartbeat" : ISODate("2018-02-06T00:08:16.057Z"),
                        "lastHeartbeatRecv" : ISODate("2018-02-06T00:08:15.339Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "mongo-node-2:27017",
                        "configVersion" : 4
                }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1517875688, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1517875688, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
rs0:PRIMARY>  
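With the replica set healthy, applications inside the cluster should connect using a replica set connection string that lists all three service DNS names, so the driver can discover the topology and fail over automatically if the primary changes. A minimal sketch (the bare service names assume the client runs in the same namespace as the services):

```shell
# Replica set URI built from the three Kubernetes service names
MONGO_URI="mongodb://mongo-node-1:27017,mongo-node-2:27017,mongo-node-3:27017/?replicaSet=rs0"
echo "$MONGO_URI"
```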

Thanks.