PolarSPARC |
Hands-on MongoDB :: Part-1
Bhaskar S | 01/23/2021 (UPDATED) |
Overview
MongoDB is a very popular, modern, general purpose, distributed, Document-Oriented NoSQL database with the following features:
JSON Data Model
Rich Hierarchical Structure
Flexible Dynamic Schema
Ad-hoc Queries
Server-side JavaScript
High Availability via Replica Set
Horizontal Scalability via Sharding
Installation and Setup
The installation will be on an Ubuntu 20.04 LTS based Linux desktop.
Ensure Docker is installed on the system. Else, follow the instructions provided in the article Introduction to Docker to complete the installation.
Check for the latest stable version of the official Mongo docker image. Version 4.4.3 was the latest at the time of this article.
To download the latest docker image of Mongo, execute the following command:
$ docker pull mongo:4.4.3
The following would be a typical output:
4.4.3: Pulling from library/mongo
f22ccc0b8772: Pull complete
3cf8fb62ba5f: Pull complete
e80c964ece6a: Pull complete
329e632c35b3: Pull complete
3e1bd1325a3d: Pull complete
4aa6e3d64a4a: Pull complete
035bca87b778: Pull complete
874e4e43cb00: Pull complete
0e50e71d834e: Pull complete
27768a0d0c67: Pull complete
be4e0bd8b992: Pull complete
262b87da894c: Pull complete
Digest: sha256:001400644bfc27b5da634ee09b95b4129566e2d4dccb6d27bd403b10cff9191b
Status: Downloaded newer image for mongo:4.4.3
docker.io/library/mongo:4.4.3
For our MongoDB setup, we will go with a 3-node high availability cluster with each node running in docker.
In MongoDB, one server (referred to as the primary node) replicates data to the other nodes in the cluster (referred to as the secondary nodes). The primary node and its secondary nodes together are referred to as a Replica Set. The nodes in the cluster go through a voting process to elect one node as the primary, while the others automatically become the secondary nodes.
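The voting process mentioned above needs a majority of the voting members to agree on a primary. The following is a minimal sketch of that arithmetic in plain JavaScript; the function names majorityOf and faultTolerance are made up for illustration and are not part of any MongoDB API.

```javascript
// Number of votes needed to elect a primary among `votingMembers` nodes.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that can be lost while a majority still remains.
function faultTolerance(votingMembers) {
  return votingMembers - majorityOf(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(
    `${n} voting members: majority = ${majorityOf(n)}, ` +
    `tolerates ${faultTolerance(n)} failure(s)`
  );
}
```

For the 3-node cluster used in this article, the majority is 2, so the cluster can still elect a primary with one node down.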
We need to specify a data directory on the host that will be mounted as a data volume for each of the MongoDB nodes in the cluster.
To create a data directory (for each of the nodes in the cluster) on the host, execute the following commands:
$ mkdir -p $HOME/Downloads/DATA/mongodb/node-1
$ mkdir -p $HOME/Downloads/DATA/mongodb/node-2
$ mkdir -p $HOME/Downloads/DATA/mongodb/node-3
For our cluster setup, we will create and use a docker bridge network. This will allow containers connected to the bridge network to communicate with each other using container names, while providing network isolation from containers outside the bridge network.
To create a docker bridge network called mongodb-net, execute the following command:
$ docker network create mongodb-net --driver bridge
The following would be a typical output:
b779bd624f445b822821821bbe3c49e5ee2007a0015075c239a8ac9e6065002e
Time to initialize and start each of the 3 nodes in the MongoDB database cluster.
To start the first MongoDB database node (with name mongodb-n1 and using the database port of 5001), execute the following command:
$ docker run -d --rm -it --name mongodb-n1 --net mongodb-net -p 5001:5001 -v $HOME/Downloads/DATA/mongodb/node-1:/data/db mongo:4.4.3 mongod --bind_ip_all --replSet mongodb-rs --port 5001
The following would be a typical output:
b680df1b4180d277ef96f78679b4744bd675a674deb55c128e3a3a1d2c9dce1f
The following are brief descriptions for some of the options used to start the mongod daemon:
--bind_ip_all :: By default, the mongod daemon binds to localhost. This option allows it to bind to all the IP address(es)
--replSet :: Specifies the name of the replica set (mongodb-rs), which will be used later to initialize and configure the replica set
To check the MongoDB database node log, execute the following command:
$ docker logs mongodb-n1
The following would be a typical output:
{"t":{"$date":"2021-01-23T02:12:38.985+00:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"} {"t":{"$date":"2021-01-23T02:12:39.000+00:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"} {"t":{"$date":"2021-01-23T02:12:39.000+00:00"},"s":"I", "c":"NETWORK", "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."} {"t":{"$date":"2021-01-23T02:12:39.000+00:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"} {"t":{"$date":"2021-01-23T02:12:39.001+00:00"},"s":"I", "c":"STORAGE", "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":5001,"dbPath":"/data/db","architecture":"64-bit","host":"b680df1b4180"}} {"t":{"$date":"2021-01-23T02:12:39.001+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.3","gitVersion":"913d6b62acfbb344dde1b116f4161360acd8fd13","openSSLVersion":"OpenSSL 1.1.1 11 Sep 2018","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu1804","distarch":"x86_64","target_arch":"x86_64"}}}} {"t":{"$date":"2021-01-23T02:12:39.001+00:00"},"s":"I", "c":"CONTROL", "id":51765, "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"18.04"}}} {"t":{"$date":"2021-01-23T02:12:39.001+00:00"},"s":"I", "c":"CONTROL", "id":21951, "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"bindIp":"*","port":5001},"replication":{"replSet":"mongodb-rs"}}}} {"t":{"$date":"2021-01-23T02:12:39.003+00:00"},"s":"I", "c":"STORAGE", "id":22297, "ctx":"initandlisten","msg":"Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. 
See http://dochub.mongodb.org/core/prodnotes-filesystem","tags":["startupWarnings"]} {"t":{"$date":"2021-01-23T02:12:39.003+00:00"},"s":"I", "c":"STORAGE", "id":22315, "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=15544M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],"}} {"t":{"$date":"2021-01-23T02:12:39.804+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1611367959:804920][1:0x7f5ea98c9ac0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (0, 0)"}} {"t":{"$date":"2021-01-23T02:12:39.804+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1611367959:804974][1:0x7f5ea98c9ac0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global oldest timestamp: (0, 0)"}} {"t":{"$date":"2021-01-23T02:12:39.811+00:00"},"s":"I", "c":"STORAGE", "id":4795906, "ctx":"initandlisten","msg":"WiredTiger opened","attr":{"durationMillis":808}} {"t":{"$date":"2021-01-23T02:12:39.811+00:00"},"s":"I", "c":"RECOVERY", "id":23987, "ctx":"initandlisten","msg":"WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}} {"t":{"$date":"2021-01-23T02:12:39.823+00:00"},"s":"I", "c":"STORAGE", "id":4366408, "ctx":"initandlisten","msg":"No table logging settings modifications are required for existing WiredTiger tables","attr":{"loggingEnabled":false}} {"t":{"$date":"2021-01-23T02:12:39.823+00:00"},"s":"I", "c":"STORAGE", "id":22262, "ctx":"initandlisten","msg":"Timestamp monitor starting"} {"t":{"$date":"2021-01-23T02:12:39.827+00:00"},"s":"W", "c":"CONTROL", "id":22120, 
"ctx":"initandlisten","msg":"Access control is not enabled for the database. Read and write access to data and configuration is unrestricted","tags":["startupWarnings"]} {"t":{"$date":"2021-01-23T02:12:39.834+00:00"},"s":"I", "c":"STORAGE", "id":20536, "ctx":"initandlisten","msg":"Flow Control is enabled on this deployment"} {"t":{"$date":"2021-01-23T02:12:39.837+00:00"},"s":"I", "c":"SHARDING", "id":20997, "ctx":"initandlisten","msg":"Refreshed RWC defaults","attr":{"newDefaults":{}}} {"t":{"$date":"2021-01-23T02:12:39.837+00:00"},"s":"I", "c":"STORAGE", "id":20320, "ctx":"initandlisten","msg":"createCollection","attr":{"namespace":"local.startup_log","uuidDisposition":"generated","uuid":{"uuid":{"$uuid":"d9529b4d-4ed9-4758-b817-6371c2dcc4d0"}},"options":{"capped":true,"size":10485760}}} {"t":{"$date":"2021-01-23T02:12:39.847+00:00"},"s":"I", "c":"INDEX", "id":20345, "ctx":"initandlisten","msg":"Index build: done building","attr":{"buildUUID":null,"namespace":"local.startup_log","index":"_id_","commitTimestamp":{"$timestamp":{"t":0,"i":0}}}} {"t":{"$date":"2021-01-23T02:12:39.847+00:00"},"s":"I", "c":"FTDC", "id":20625, "ctx":"initandlisten","msg":"Initializing full-time diagnostic data capture","attr":{"dataDirectory":"/data/db/diagnostic.data"}} {"t":{"$date":"2021-01-23T02:12:39.848+00:00"},"s":"I", "c":"STORAGE", "id":20320, "ctx":"initandlisten","msg":"createCollection","attr":{"namespace":"local.replset.oplogTruncateAfterPoint","uuidDisposition":"generated","uuid":{"uuid":{"$uuid":"ed917ea5-0f43-4295-a609-80419ea20491"}},"options":{}}} {"t":{"$date":"2021-01-23T02:12:39.857+00:00"},"s":"I", "c":"INDEX", "id":20345, "ctx":"initandlisten","msg":"Index build: done building","attr":{"buildUUID":null,"namespace":"local.replset.oplogTruncateAfterPoint","index":"_id_","commitTimestamp":{"$timestamp":{"t":0,"i":0}}}} {"t":{"$date":"2021-01-23T02:12:39.857+00:00"},"s":"I", "c":"STORAGE", "id":20320, 
"ctx":"initandlisten","msg":"createCollection","attr":{"namespace":"local.replset.minvalid","uuidDisposition":"generated","uuid":{"uuid":{"$uuid":"190d60d6-7cb7-424c-8984-f356390474a8"}},"options":{}}} {"t":{"$date":"2021-01-23T02:12:39.866+00:00"},"s":"I", "c":"INDEX", "id":20345, "ctx":"initandlisten","msg":"Index build: done building","attr":{"buildUUID":null,"namespace":"local.replset.minvalid","index":"_id_","commitTimestamp":{"$timestamp":{"t":0,"i":0}}}} {"t":{"$date":"2021-01-23T02:12:39.867+00:00"},"s":"I", "c":"STORAGE", "id":20320, "ctx":"initandlisten","msg":"createCollection","attr":{"namespace":"local.replset.election","uuidDisposition":"generated","uuid":{"uuid":{"$uuid":"313af4b1-3c34-4ce9-ac09-132cb8cbfe00"}},"options":{}}} {"t":{"$date":"2021-01-23T02:12:39.876+00:00"},"s":"I", "c":"INDEX", "id":20345, "ctx":"initandlisten","msg":"Index build: done building","attr":{"buildUUID":null,"namespace":"local.replset.election","index":"_id_","commitTimestamp":{"$timestamp":{"t":0,"i":0}}}} {"t":{"$date":"2021-01-23T02:12:39.876+00:00"},"s":"I", "c":"REPL", "id":21311, "ctx":"initandlisten","msg":"Did not find local initialized voted for document at startup"} {"t":{"$date":"2021-01-23T02:12:39.876+00:00"},"s":"I", "c":"REPL", "id":21312, "ctx":"initandlisten","msg":"Did not find local Rollback ID document at startup. 
Creating one"} {"t":{"$date":"2021-01-23T02:12:39.876+00:00"},"s":"I", "c":"STORAGE", "id":20320, "ctx":"initandlisten","msg":"createCollection","attr":{"namespace":"local.system.rollback.id","uuidDisposition":"generated","uuid":{"uuid":{"$uuid":"aed2c993-7563-4031-990b-cc6706e6c37a"}},"options":{}}} {"t":{"$date":"2021-01-23T02:12:39.884+00:00"},"s":"I", "c":"INDEX", "id":20345, "ctx":"initandlisten","msg":"Index build: done building","attr":{"buildUUID":null,"namespace":"local.system.rollback.id","index":"_id_","commitTimestamp":{"$timestamp":{"t":0,"i":0}}}} {"t":{"$date":"2021-01-23T02:12:39.884+00:00"},"s":"I", "c":"REPL", "id":21531, "ctx":"initandlisten","msg":"Initialized the rollback ID","attr":{"rbid":1}} {"t":{"$date":"2021-01-23T02:12:39.884+00:00"},"s":"I", "c":"REPL", "id":21313, "ctx":"initandlisten","msg":"Did not find local replica set configuration document at startup","attr":{"error":{"code":47,"codeName":"NoMatchingDocument","errmsg":"Did not find replica set configuration document in local.system.replset"}}} {"t":{"$date":"2021-01-23T02:12:39.885+00:00"},"s":"I", "c":"REPL", "id":40440, "ctx":"initandlisten","msg":"Starting the TopologyVersionObserver"} {"t":{"$date":"2021-01-23T02:12:39.885+00:00"},"s":"I", "c":"REPL", "id":40445, "ctx":"TopologyVersionObserver","msg":"Started TopologyVersionObserver"} {"t":{"$date":"2021-01-23T02:12:39.885+00:00"},"s":"I", "c":"NETWORK", "id":23015, "ctx":"listener","msg":"Listening on","attr":{"address":"/tmp/mongodb-5001.sock"}} {"t":{"$date":"2021-01-23T02:12:39.885+00:00"},"s":"I", "c":"NETWORK", "id":23015, "ctx":"listener","msg":"Listening on","attr":{"address":"0.0.0.0"}} {"t":{"$date":"2021-01-23T02:12:39.886+00:00"},"s":"I", "c":"NETWORK", "id":23016, "ctx":"listener","msg":"Waiting for connections","attr":{"port":5001,"ssl":"off"}} {"t":{"$date":"2021-01-23T02:12:39.888+00:00"},"s":"I", "c":"CONTROL", "id":20714, "ctx":"LogicalSessionCacheRefresh","msg":"Failed to refresh session cache, will try 
again at the next refresh interval","attr":{"error":"NotYetInitialized: Replication has not yet been configured"}} {"t":{"$date":"2021-01-23T02:12:39.888+00:00"},"s":"I", "c":"CONTROL", "id":20712, "ctx":"LogicalSessionCacheReap","msg":"Sessions collection is not set up; waiting until next sessions reap interval","attr":{"error":"NamespaceNotFound: config.system.sessions does not exist"}}
Now, to start the second MongoDB database node (with name mongodb-n2 and using the database port of 5002), execute the following command:
$ docker run -d --rm -it --name mongodb-n2 --net mongodb-net -p 5002:5002 -v $HOME/Downloads/DATA/mongodb/node-2:/data/db mongo:4.4.3 mongod --bind_ip_all --replSet mongodb-rs --port 5002
Next, to start the third MongoDB database node (with name mongodb-n3 and using the database port of 5003), execute the following command:
$ docker run -d --rm -it --name mongodb-n3 --net mongodb-net -p 5003:5003 -v $HOME/Downloads/DATA/mongodb/node-3:/data/db mongo:4.4.3 mongod --bind_ip_all --replSet mongodb-rs --port 5003
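The three docker run commands differ only in the node number, which drives the container name, the port, and the data directory. As a rough sketch of that pattern, the following plain JavaScript helper (dockerRunCommand is a hypothetical name used only here) rebuilds the command string for a given node number:

```javascript
// Build the `docker run` command string for node number `n` (1, 2 or 3),
// following the naming and port pattern used in this article.
function dockerRunCommand(n) {
  const port = 5000 + n;
  return [
    "docker run -d --rm -it",
    `--name mongodb-n${n}`,
    "--net mongodb-net",
    `-p ${port}:${port}`,
    `-v $HOME/Downloads/DATA/mongodb/node-${n}:/data/db`,
    "mongo:4.4.3",
    `mongod --bind_ip_all --replSet mongodb-rs --port ${port}`,
  ].join(" ");
}

// Print the command for each of the three nodes.
for (const n of [1, 2, 3]) {
  console.log(dockerRunCommand(n));
}
```

The helper only prints the commands; it does not run docker.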
Make a note of the host's IP address - in our case it is 192.168.1.53.
Finally, time to configure the MongoDB high availability replica set. For this, we will need to connect to one of the nodes in the cluster. To connect to the first MongoDB node, execute the following command:
$ docker exec -it mongodb-n1 mongo --port 5001
The following would be a typical output:
MongoDB shell version v4.4.3 connecting to: mongodb://127.0.0.1:5001/?compressors=disabled&gssapiServiceName=mongodb Implicit session: session { "id" : UUID("9605137b-1afd-4e0c-b51a-bc8622a8dd1b") } MongoDB server version: 4.4.3 Welcome to the MongoDB shell. For interactive help, type "help". For more comprehensive documentation, see https://docs.mongodb.com/ Questions? Try the MongoDB Developer Community Forums https://community.mongodb.com --- The server generated these startup warnings when booting: 2021-01-23T02:12:39.003+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem 2021-01-23T02:12:39.827+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted --- --- Enable MongoDB's free cloud-based monitoring service, which will then receive and display metrics about your deployment (disk utilization, CPU, operation statistics, etc). The monitoring data will be available on a MongoDB website with a unique URL accessible to you and anyone you share the URL with. MongoDB may use this information to make product improvements and to suggest MongoDB products and deployment options to you. To enable free monitoring, run the following command: db.enableFreeMonitoring() To permanently disable this reminder, run the following command: db.disableFreeMonitoring() --- >
The prompt will change to >
We need to create a configuration object (in JSON format) specifying the nodes of the replica set. To do that, execute the following command at the > prompt:
> config = {"_id": "mongodb-rs", "members": [{"_id":0, "host":"192.168.1.53:5001"}, {"_id":1, "host":"192.168.1.53:5002"}, {"_id":2, "host":"192.168.1.53:5003"}]}
The very first document field _id is set to the name of the replica set mongodb-rs, which was used when starting each of the mongod nodes of the cluster.
The following would be a typical output:
{ "_id" : "mongodb-rs", "members" : [ { "_id" : 0, "host" : "192.168.1.53:5001" }, { "_id" : 1, "host" : "192.168.1.53:5002" }, { "_id" : 2, "host" : "192.168.1.53:5003" } ] }
It is *VERY* important to use the host's IP address and *NOT* the docker host name (or container name) in the host fields.
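The configuration object follows a regular pattern: the replica set name, plus one member per port with an increasing _id. The following is a small sketch in plain JavaScript that builds the same shape from its parts; buildReplicaSetConfig is a hypothetical helper name, not a MongoDB function.

```javascript
// Build a replica set configuration document.
// `setName` must match the --replSet value passed to each mongod.
function buildReplicaSetConfig(setName, hostIp, ports) {
  return {
    _id: setName,
    members: ports.map((port, index) => ({
      _id: index,
      host: `${hostIp}:${port}`,
    })),
  };
}

const config = buildReplicaSetConfig("mongodb-rs", "192.168.1.53", [5001, 5002, 5003]);
console.log(JSON.stringify(config, null, 2));
```

The resulting object has the same fields and values as the config typed by hand above.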
To initialize the MongoDB high availability replica set, execute the following command:
> rs.initiate(config)
The following would be a typical output:
{ "ok" : 1, "$clusterTime" : { "clusterTime" : Timestamp(1611368632, 1), "signature" : { "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="), "keyId" : NumberLong(0) } }, "operationTime" : Timestamp(1611368632, 1) }
SUCCESS - since we got an "ok" : 1. Also, the prompt will change to either mongodb-rs:PRIMARY> or mongodb-rs:SECONDARY> based on the role this node is designated as - primary or secondary.
In this case, the prompt changed to mongodb-rs:PRIMARY> indicating this is the primary node.
To check the status of the replica set, execute the following command:
mongodb-rs:PRIMARY> rs.status()
The following would be a typical output:
{ "set" : "mongodb-rs", "date" : ISODate("2021-01-23T02:28:51.453Z"), "myState" : 1, "term" : NumberLong(1), "syncSourceHost" : "", "syncSourceId" : -1, "heartbeatIntervalMillis" : NumberLong(2000), "majorityVoteCount" : 2, "writeMajorityCount" : 2, "votingMembersCount" : 3, "writableVotingMembersCount" : 3, "optimes" : { "lastCommittedOpTime" : { "ts" : Timestamp(1611368924, 1), "t" : NumberLong(1) }, "lastCommittedWallTime" : ISODate("2021-01-23T02:28:44.174Z"), "readConcernMajorityOpTime" : { "ts" : Timestamp(1611368924, 1), "t" : NumberLong(1) }, "readConcernMajorityWallTime" : ISODate("2021-01-23T02:28:44.174Z"), "appliedOpTime" : { "ts" : Timestamp(1611368924, 1), "t" : NumberLong(1) }, "durableOpTime" : { "ts" : Timestamp(1611368924, 1), "t" : NumberLong(1) }, "lastAppliedWallTime" : ISODate("2021-01-23T02:28:44.174Z"), "lastDurableWallTime" : ISODate("2021-01-23T02:28:44.174Z") }, "lastStableRecoveryTimestamp" : Timestamp(1611368884, 1), "electionCandidateMetrics" : { "lastElectionReason" : "electionTimeout", "lastElectionDate" : ISODate("2021-01-23T02:24:04.140Z"), "electionTerm" : NumberLong(1), "lastCommittedOpTimeAtElection" : { "ts" : Timestamp(0, 0), "t" : NumberLong(-1) }, "lastSeenOpTimeAtElection" : { "ts" : Timestamp(1611368632, 1), "t" : NumberLong(-1) }, "numVotesNeeded" : 2, "priorityAtElection" : 1, "electionTimeoutMillis" : NumberLong(10000), "numCatchUpOps" : NumberLong(0), "newTermStartDate" : ISODate("2021-01-23T02:24:04.159Z"), "wMajorityWriteAvailabilityDate" : ISODate("2021-01-23T02:24:05.346Z") }, "members" : [ { "_id" : 0, "name" : "192.168.1.53:5001", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 973, "optime" : { "ts" : Timestamp(1611368924, 1), "t" : NumberLong(1) }, "optimeDate" : ISODate("2021-01-23T02:28:44Z"), "syncSourceHost" : "", "syncSourceId" : -1, "infoMessage" : "", "electionTime" : Timestamp(1611368644, 1), "electionDate" : ISODate("2021-01-23T02:24:04Z"), "configVersion" : 1, "configTerm" : 1, "self" : 
true, "lastHeartbeatMessage" : "" }, { "_id" : 1, "name" : "192.168.1.53:5002", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 298, "optime" : { "ts" : Timestamp(1611368924, 1), "t" : NumberLong(1) }, "optimeDurable" : { "ts" : Timestamp(1611368924, 1), "t" : NumberLong(1) }, "optimeDate" : ISODate("2021-01-23T02:28:44Z"), "optimeDurableDate" : ISODate("2021-01-23T02:28:44Z"), "lastHeartbeat" : ISODate("2021-01-23T02:28:50.147Z"), "lastHeartbeatRecv" : ISODate("2021-01-23T02:28:49.652Z"), "pingMs" : NumberLong(0), "lastHeartbeatMessage" : "", "syncSourceHost" : "mongodb-n1:5001", "syncSourceId" : 0, "infoMessage" : "", "configVersion" : 1, "configTerm" : 1 }, { "_id" : 2, "name" : "192.168.1.53:5003", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 298, "optime" : { "ts" : Timestamp(1611368924, 1), "t" : NumberLong(1) }, "optimeDurable" : { "ts" : Timestamp(1611368924, 1), "t" : NumberLong(1) }, "optimeDate" : ISODate("2021-01-23T02:28:44Z"), "optimeDurableDate" : ISODate("2021-01-23T02:28:44Z"), "lastHeartbeat" : ISODate("2021-01-23T02:28:50.147Z"), "lastHeartbeatRecv" : ISODate("2021-01-23T02:28:49.652Z"), "pingMs" : NumberLong(0), "lastHeartbeatMessage" : "", "syncSourceHost" : "mongodb-n1:5001", "syncSourceId" : 0, "infoMessage" : "", "configVersion" : 1, "configTerm" : 1 } ], "ok" : 1, "$clusterTime" : { "clusterTime" : Timestamp(1611368924, 1), "signature" : { "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="), "keyId" : NumberLong(0) } }, "operationTime" : Timestamp(1611368924, 1) }
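The rs.status() output is long, but the fields of interest for a quick health check are few. The sketch below, in plain JavaScript, works on a hand-trimmed copy of the output above (only name, health, and stateStr are kept); the helper names summarizeMembers and primaryOf are illustrative assumptions.

```javascript
// A trimmed-down copy of the rs.status() output, keeping only the fields used here.
const sampleStatus = {
  set: "mongodb-rs",
  members: [
    { _id: 0, name: "192.168.1.53:5001", health: 1, stateStr: "PRIMARY" },
    { _id: 1, name: "192.168.1.53:5002", health: 1, stateStr: "SECONDARY" },
    { _id: 2, name: "192.168.1.53:5003", health: 1, stateStr: "SECONDARY" },
  ],
};

// Return one summary line per member: address, role, and health flag.
function summarizeMembers(rsStatus) {
  return rsStatus.members.map((m) => `${m.name} ${m.stateStr} health=${m.health}`);
}

// Return the address of the primary member, or null if there is none.
function primaryOf(rsStatus) {
  const primary = rsStatus.members.find((m) => m.stateStr === "PRIMARY");
  return primary ? primary.name : null;
}

summarizeMembers(sampleStatus).forEach((line) => console.log(line));
console.log("primary:", primaryOf(sampleStatus));
```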
Finally, to exit from the MongoDB node shell, execute the following command:
mongodb-rs:PRIMARY> exit
The following will be the output:
bye
Basic Concepts
This section will cover some basic concepts of MongoDB:
A Document is the most basic unit of data in MongoDB. It is analogous to a record in the traditional relational database. A document is nothing more than a JSON object, which is an ordered set of key-value pairs, where the key is a string and the value could be: a boolean, a number, a string, an array, or another JSON document. Internally, MongoDB stores a document in an optimized Binary JSON (BSON) format
A Collection is a group of documents. It is analogous to a table in the traditional relational database. Within a collection, documents can have different key-value pairs resulting in a dynamic schema
A Database is a container for collections. A single instance of MongoDB can have many databases. Each database is stored in separate files on the underlying disk
MongoDB by default stores all the database files in the directory /data/db. In our case, we mapped it to a data directory on the host using docker volumes
The default port on which the MongoDB database is listening for MongoDB clients is 27017. In our case, we use ports 5001 through 5003 for each of the nodes in our cluster
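To make the idea of a dynamic schema concrete, here is a toy model in plain JavaScript: a collection is represented as an array of documents (plain objects), and the documents do not share the same set of keys. The sample data and the helper name topLevelKeys are assumptions for illustration; this is not how MongoDB stores data internally.

```javascript
// A collection modelled as a plain array of documents (toy model only).
const sampleCollection = [
  { first: "Alice", last: "Thompson", email: { personal: "alice.t@home.io" } },
  { first: "Eve", middle: "Jo", last: "Parker", mobile: { personal: "345 678 9012" } },
];

// Collect the distinct top-level keys used by any document in the collection.
function topLevelKeys(documents) {
  const keys = new Set();
  for (const doc of documents) {
    Object.keys(doc).forEach((key) => keys.add(key));
  }
  return [...keys].sort();
}

console.log(topLevelKeys(sampleCollection));
```

Neither document carries every key, yet both sit in the same collection, which is what a fixed relational table schema would not allow.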
Hands-on with MongoDB
The best way to explore MongoDB is to use the command-line interface called mongo, which is nothing more than an interactive JavaScript shell. In the following paragraphs we will explore some basics of MongoDB.
From the rs.status() output above, we can infer the PRIMARY node is running on the host port 5001.
To launch the command-line interactive MongoDB client on the PRIMARY node running on the port 5001 using docker, execute the following command:
$ docker run --rm -it mongo:4.4.3 mongo --host 192.168.1.53 --port 5001 test
The following will be the output:
MongoDB shell version v4.4.3 connecting to: mongodb://192.168.1.53:5001/test?compressors=disabled&gssapiServiceName=mongodb Implicit session: session { "id" : UUID("2597330c-c633-4cc4-ab54-9efb1a7a34b2") } MongoDB server version: 4.4.3 Welcome to the MongoDB shell. For interactive help, type "help". For more comprehensive documentation, see https://docs.mongodb.com/ Questions? Try the MongoDB Developer Community Forums https://community.mongodb.com --- The server generated these startup warnings when booting: 2021-01-23T02:12:39.003+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem 2021-01-23T02:12:39.827+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted --- --- Enable MongoDB's free cloud-based monitoring service, which will then receive and display metrics about your deployment (disk utilization, CPU, operation statistics, etc). The monitoring data will be available on a MongoDB website with a unique URL accessible to you and anyone you share the URL with. MongoDB may use this information to make product improvements and to suggest MongoDB products and deployment options to you. To enable free monitoring, run the following command: db.enableFreeMonitoring() To permanently disable this reminder, run the following command: db.disableFreeMonitoring() --- mongodb-rs:PRIMARY>
To list all the currently available databases, execute the following command:
mongodb-rs:PRIMARY> show dbs
The following will be the output:
admin 0.000GB
config 0.000GB
local 0.000GB
The above three MongoDB databases are system specific internal databases.
By default the MongoDB client connects to the database called test, which will be physically created only when we perform some operation on that database.
Executing any of the MongoDB database commands on the SECONDARY node(s) will produce the following error:
uncaught exception: Error: listDatabases failed:{ "topologyVersion" : { "processId" : ObjectId("6025cd3844aef80738ee67f5"), "counter" : NumberLong(5) }, "operationTime" : Timestamp(1613090185, 1), "ok" : 0, "errmsg" : "not master and slaveOk=false", "code" : 13435, "codeName" : "NotPrimaryNoSecondaryOk", "$clusterTime" : { "clusterTime" : Timestamp(1613090185, 1), "signature" : { "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="), "keyId" : NumberLong(0) } } }
More on this in a future part of this series.
For our demo, we will use our own database called mydb. To switch to the mydb database, execute the following command:
mongodb-rs:PRIMARY> use mydb
The following will be the output:
switched to db mydb
Notice that we only indicated that we want to use the mydb database; we did not create one. MongoDB uses lazy initialization and delays the creation of the database physically until we create a collection and add a document to it.
MongoDB client sets the global variable db to the current database in use.
To check the database currently in use, execute the following command:
mongodb-rs:PRIMARY> db
The following will be the output:
mydb
To perform any operation on a database, we will use the global variable db. For the demo, we will work with the collection contacts. To access this collection, we refer to it as db.contacts. Again, just as MongoDB did not physically create a database, MongoDB will defer the creation of the collection contacts until we add at least one document to that collection.
To list all the collection(s) in a database, execute the following command:
mongodb-rs:PRIMARY> show collections
The output will be empty indicating that there are no collection(s) yet.
To create the collection contacts, we need to add at least one document to the collection. To add a new document to a collection, use the insert() command. Let us add a new document by executing the following command:
mongodb-rs:PRIMARY> db.contacts.insert({ first: "Alice", last: "Thompson", email: { personal: "alice.t@home.io", work: "alice.thompson@work.net" }, mobile: { personal: "123 456 7890" } })
The following will be the output:
WriteResult({ "nInserted" : 1 })
The above output indicates that there were no errors and the document was successfully added.
This is similar to the INSERT INTO contacts VALUES(...) SQL statement from the relational world.
Let us now list all the collection(s) in a database by executing the following command:
mongodb-rs:PRIMARY> show collections
The following will be the output:
contacts
As can be seen from the above output, MongoDB has created the collection contacts.
To display the number of documents in the collection contacts, use the count() command. Now, execute the following command:
mongodb-rs:PRIMARY> db.contacts.count()
The following will be the output:
1
As can be seen from the above output, we have 1 document in the collection contacts.
This is similar to the SELECT COUNT(*) FROM contacts SQL statement from the relational world.
To display all the documents in the collection contacts, use the find() command. Now, execute the following command:
mongodb-rs:PRIMARY> db.contacts.find()
The following will be the output:
{ "_id" : ObjectId("600b8fd9df2ffe1ec2eaa031"), "first" : "Alice", "last" : "Thompson", "email" : { "personal" : "alice.t@home.io", "work" : "alice.thompson@work.net" }, "mobile" : { "personal" : "123 456 7890" } }
As can be seen from the above output, we see the document we inserted into the collection contacts earlier.
This is similar to the SELECT * FROM contacts SQL statement from the relational world.
But wait !!! What is with the key _id ??? We never had that in the document when we added it.
Every MongoDB document must have a unique key by which the document can be identified. This is analogous to the primary key of a table in a relational database. The key _id is the unique primary key, which MongoDB adds automatically when the document does not supply one.
The document key _id is an object of type ObjectId, a 12-byte value (displayed as a 24-character hex string) that is very likely to be unique across a cluster of machines. As of MongoDB 4.4, it is generated by concatenating:
Time in Seconds since Epoch (4 bytes)
Random Value generated once per process (5 bytes)
Next Value from an Incrementing Counter (3 bytes)
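The byte layout can be checked against the _id values seen in this article. The following is a minimal sketch in plain JavaScript that reads the leading 4 bytes as the creation time and the trailing 3 bytes as the counter, and leaves the 5 bytes in between uninterpreted; decodeObjectId is a hypothetical helper, not a MongoDB shell function.

```javascript
// Split a 24-character ObjectId hex string into its parts.
function decodeObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  // First 4 bytes (8 hex characters): seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    createdAt: new Date(seconds * 1000).toISOString(),
    // Next 5 bytes (10 hex characters): kept as-is, not interpreted here.
    middle: hex.slice(8, 18),
    // Last 3 bytes (6 hex characters): the incrementing counter.
    counter: parseInt(hex.slice(18), 16),
  };
}

// The _id values of the first two documents inserted in this article.
console.log(decodeObjectId("600b8fd9df2ffe1ec2eaa031"));
console.log(decodeObjectId("600b94a5df2ffe1ec2eaa032"));
```

For these two ids, the middle bytes are identical and the counters differ by one, which is consistent with both documents being created from the same shell session.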
To display all the documents in the collection contacts in a prettier readable format, use the pretty() command. Now, execute the following command:
mongodb-rs:PRIMARY> db.contacts.find().pretty()
The following will be the output:
{
    "_id" : ObjectId("600b8fd9df2ffe1ec2eaa031"),
    "first" : "Alice",
    "last" : "Thompson",
    "email" : {
        "personal" : "alice.t@home.io",
        "work" : "alice.thompson@work.net"
    },
    "mobile" : {
        "personal" : "123 456 7890"
    }
}
The insert() command on execution just returns a status of how many document(s) were inserted. What if we want the key _id value after the addition? In order to do that, use the insertOne() command. Let us add a new document by executing the following command:
mongodb-rs:PRIMARY> db.contacts.insertOne({ first: "Bob", last: "Jones", email: { work: "bobj@doktor.net" }, mobile: { work: "234 567 8901" } })
The following will be the output:
{ "acknowledged" : true, "insertedId" : ObjectId("600b94a5df2ffe1ec2eaa032") }
Now let us insert 4 more documents to the collection contacts by executing the following commands:
mongodb-rs:PRIMARY> db.contacts.insertOne({ first: "Charlie", last: "Lee", email: { personal: "cl3000@ranch.net" } })
mongodb-rs:PRIMARY> db.contacts.insert({ first: "Eve", middle: "Jo", last: "Parker", email: { work: "ej_parker@awesome.org" }, mobile: { personal: "345 678 9012" } })
mongodb-rs:PRIMARY> db.contacts.insert({ first: "Frank", last: "Smith", email: { personal: "frank45@root.org", work: "frank.smith@excellent.net" }, mobile: { personal: "456 789 0123", work: "567 890 1234" } })
mongodb-rs:PRIMARY> db.contacts.insertOne({ first: "Frank", last: "Cooper", email: { personal: "frankc@runner.org" } })
Now, let us query and display all the documents in the collection contacts by executing the following command:
mongodb-rs:PRIMARY> db.contacts.find()
The following will be the output:
{ "_id" : ObjectId("600b8fd9df2ffe1ec2eaa031"), "first" : "Alice", "last" : "Thompson", "email" : { "personal" : "alice.t@home.io", "work" : "alice.thompson@work.net" }, "mobile" : { "personal" : "123 456 7890" } }
{ "_id" : ObjectId("600b94a5df2ffe1ec2eaa032"), "first" : "Bob", "last" : "Jones", "email" : { "work" : "bobj@doktor.net" }, "mobile" : { "work" : "234 567 8901" } }
{ "_id" : ObjectId("600b94fadf2ffe1ec2eaa033"), "first" : "Charlie", "last" : "Lee", "email" : { "personal" : "cl3000@ranch.net" } }
{ "_id" : ObjectId("600b9513df2ffe1ec2eaa034"), "first" : "Eve", "middle" : "Jo", "last" : "Parker", "email" : { "work" : "ej_parker@awesome.org" }, "mobile" : { "personal" : "345 678 9012" } }
{ "_id" : ObjectId("600b9527df2ffe1ec2eaa035"), "first" : "Frank", "last" : "Smith", "email" : { "personal" : "frank45@root.org", "work" : "frank.smith@excellent.net" }, "mobile" : { "personal" : "456 789 0123", "work" : "567 890 1234" } }
{ "_id" : ObjectId("600b9636df2ffe1ec2eaa036"), "first" : "Frank", "last" : "Cooper", "email" : { "personal" : "frankc@runner.org" } }
To query all the document(s) on the key first with a value of Bob from the collection contacts, execute the following command:
mongodb-rs:PRIMARY> db.contacts.find({ first: "Bob" })
The following will be the output:
{ "_id" : ObjectId("600b94a5df2ffe1ec2eaa032"), "first" : "Bob", "last" : "Jones", "email" : { "work" : "bobj@doktor.net" }, "mobile" : { "work" : "234 567 8901" } }
As can be seen from the above output, we have one document from the collection contacts with the key first having a value of Bob.
This is similar to the SELECT * FROM contacts WHERE first = 'Bob' SQL statement from the relational world.
To query all the document(s) on the key first with a value of Charlie and on the key last with a value of Lee from the collection contacts, execute the following command:
mongodb-rs:PRIMARY> db.contacts.find({ first: "Charlie", last: "Lee" })
The following will be the output:
{ "_id" : ObjectId("600b94fadf2ffe1ec2eaa033"), "first" : "Charlie", "last" : "Lee", "email" : { "personal" : "cl3000@ranch.net" } }
As can be seen from the above output, we have one document from the collection contacts with the key first having a value of Charlie and the key last having a value of Lee.
This is similar to the SELECT * FROM contacts WHERE first = 'Charlie' AND last = 'Lee' SQL statement from the relational world.
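The two queries above can be read as equality tests joined by an implicit AND. The following is a minimal plain JavaScript sketch (runnable in Node.js, no MongoDB needed) that models this behavior on an in-memory array. The helper name matchesFilter is hypothetical, and the sketch only covers top-level keys with scalar values, which is far less than what the real query engine supports:

```javascript
// In-memory stand-in for a few documents from the collection contacts
const contacts = [
  { first: "Bob", last: "Jones" },
  { first: "Charlie", last: "Lee" },
  { first: "Frank", last: "Smith" },
  { first: "Frank", last: "Cooper" }
];

// Hypothetical helper: true when every key in the filter equals the
// corresponding top-level key of the document (an implicit AND)
function matchesFilter(doc, filter) {
  return Object.keys(filter).every((key) => doc[key] === filter[key]);
}

// Models find({ first: "Frank" })
const franks = contacts.filter((doc) => matchesFilter(doc, { first: "Frank" }));

// Models find({ first: "Charlie", last: "Lee" })
const charlieLee = contacts.filter((doc) =>
  matchesFilter(doc, { first: "Charlie", last: "Lee" })
);

console.log(franks.length);     // 2
console.log(charlieLee.length); // 1
```

An empty filter {} has no keys to test, so it matches every document, which is why find() without criteria returns the whole collection.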
To query all the document(s) on the key first with a value of Frank from the collection contacts and display them in a pretty JSON format, execute the following command:
mongodb-rs:PRIMARY> db.contacts.find({ first: "Frank" }).forEach(printjson)
The following will be the output in a pretty JSON format:
{
    "_id" : ObjectId("600b9527df2ffe1ec2eaa035"),
    "first" : "Frank",
    "last" : "Smith",
    "email" : {
        "personal" : "frank45@root.org",
        "work" : "frank.smith@excellent.net"
    },
    "mobile" : {
        "personal" : "456 789 0123",
        "work" : "567 890 1234"
    }
}
{
    "_id" : ObjectId("600b9636df2ffe1ec2eaa036"),
    "first" : "Frank",
    "last" : "Cooper",
    "email" : {
        "personal" : "frankc@runner.org"
    }
}
What if we want to find all the document(s) based on the key work that is nested inside the key email ??? The search key to use is the dotted key "email.work" (MongoDB calls this dot notation), and it must be enclosed in quotes. The dot between email and work instructs the query engine to look for a key named email that contains an inner key named work and then to match the value of the inner key. To query all the document(s) on the key "email.work" with a value of "bobj@doktor.net" from the collection contacts, execute the following command:
mongodb-rs:PRIMARY> db.contacts.find({ "email.work": "bobj@doktor.net" })
The following will be the output:
{ "_id" : ObjectId("600b94a5df2ffe1ec2eaa032"), "first" : "Bob", "last" : "Jones", "email" : { "work" : "bobj@doktor.net" }, "mobile" : { "work" : "234 567 8901" } }
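To make the dotted key lookup concrete, here is a small plain JavaScript sketch (Node.js, no MongoDB needed) that walks a dotted path one key at a time. The helper name getPath and the document noEmail are made up for illustration, and the sketch ignores arrays, which the real query engine also traverses:

```javascript
// Hypothetical helper: follow a dotted path such as "email.work" one key
// at a time; yields undefined as soon as a key along the path is missing
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) =>
      value !== null && typeof value === "object" ? value[key] : undefined,
    doc
  );
}

const bob = { first: "Bob", email: { work: "bobj@doktor.net" } };
const charlie = { first: "Charlie", email: { personal: "cl3000@ranch.net" } };
const noEmail = { first: "Nobody" }; // made-up document without the key email

console.log(getPath(bob, "email.work"));     // bobj@doktor.net
console.log(getPath(charlie, "email.work")); // undefined
console.log(getPath(noEmail, "email.work")); // undefined
```

Only the first document yields a value for "email.work", so it is the only one that can match the query above.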
To query all the document(s) and list only the keys first and last from the collection contacts, execute the following command:
mongodb-rs:PRIMARY> db.contacts.find({}, { first: 1, last: 1 })
The following will be the output:
{ "_id" : ObjectId("600b8fd9df2ffe1ec2eaa031"), "first" : "Alice", "last" : "Thompson" }
{ "_id" : ObjectId("600b94a5df2ffe1ec2eaa032"), "first" : "Bob", "last" : "Jones" }
{ "_id" : ObjectId("600b94fadf2ffe1ec2eaa033"), "first" : "Charlie", "last" : "Lee" }
{ "_id" : ObjectId("600b9513df2ffe1ec2eaa034"), "first" : "Eve", "last" : "Parker" }
{ "_id" : ObjectId("600b9527df2ffe1ec2eaa035"), "first" : "Frank", "last" : "Smith" }
{ "_id" : ObjectId("600b9636df2ffe1ec2eaa036"), "first" : "Frank", "last" : "Cooper" }
As can be seen from the above output, it shows all the documents from the collection contacts with the keys first and last.
This is similar to the SELECT first, last FROM contacts SQL statement from the relational world.
But WAIT !!! Why is the key _id showing up ??? We never asked for it - did we ?
MongoDB by default includes the key _id in the result of every query, irrespective of whether we asked for it or not. If we do not want the key _id to show up, we need to explicitly suppress it by specifying _id: 0 along with the keys we want.
To query all the document(s) and list only the keys first and last (without the key _id) from the collection contacts, execute the following command:
mongodb-rs:PRIMARY> db.contacts.find({}, { first: 1, last: 1, _id: 0 })
The following will be the output:
{ "first" : "Alice", "last" : "Thompson" }
{ "first" : "Bob", "last" : "Jones" }
{ "first" : "Charlie", "last" : "Lee" }
{ "first" : "Eve", "last" : "Parker" }
{ "first" : "Frank", "last" : "Smith" }
{ "first" : "Frank", "last" : "Cooper" }
To query all the document(s) and list only the keys first, last, and mobile.personal from the collection contacts, execute the following command:
mongodb-rs:PRIMARY> db.contacts.find({}, { first: 1, last: 1, "mobile.personal": 1, _id: 0 })
The following will be the output:
{ "first" : "Alice", "last" : "Thompson", "mobile" : { "personal" : "123 456 7890" } }
{ "first" : "Bob", "last" : "Jones", "mobile" : { } }
{ "first" : "Charlie", "last" : "Lee" }
{ "first" : "Eve", "last" : "Parker", "mobile" : { "personal" : "345 678 9012" } }
{ "first" : "Frank", "last" : "Smith", "mobile" : { "personal" : "456 789 0123" } }
{ "first" : "Frank", "last" : "Cooper" }
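Notice in the above output that Bob has an empty mobile while Charlie has none at all. The following plain JavaScript sketch (Node.js, no MongoDB needed) models the key selection seen in the last three queries and reproduces that difference. The helper name project is hypothetical, the _id values are simple stand-ins for ObjectId values, and the sketch handles dotted keys only one level deep:

```javascript
// Hypothetical helper modeling the key selection: copies only the
// requested keys and keeps _id unless it is explicitly set to 0
function project(doc, projection) {
  const result = {};
  if (projection._id !== 0 && "_id" in doc) {
    result._id = doc._id;
  }
  for (const path of Object.keys(projection)) {
    if (path === "_id" || projection[path] !== 1) continue;
    const [outer, inner] = path.split(".");
    if (!(outer in doc)) continue;          // key absent: nothing to copy
    if (inner === undefined) {
      result[outer] = doc[outer];           // top-level key such as first
    } else if (typeof doc[outer] === "object" && doc[outer] !== null) {
      result[outer] = result[outer] || {};  // nested key such as mobile.personal
      if (inner in doc[outer]) {
        result[outer][inner] = doc[outer][inner];
      }
    }
  }
  return result;
}

const alice = { _id: 1, first: "Alice", last: "Thompson", mobile: { personal: "123 456 7890" } };
const bob = { _id: 2, first: "Bob", last: "Jones", mobile: { work: "234 567 8901" } };
const charlie = { _id: 3, first: "Charlie", last: "Lee" };

const wanted = { first: 1, last: 1, "mobile.personal": 1, _id: 0 };

console.log(JSON.stringify(project(alice, { first: 1, last: 1 })));
// {"_id":1,"first":"Alice","last":"Thompson"}
console.log(JSON.stringify(project(bob, wanted)));
// {"first":"Bob","last":"Jones","mobile":{}}
console.log(JSON.stringify(project(charlie, wanted)));
// {"first":"Charlie","last":"Lee"}
```

Bob has the key mobile but no inner key personal, so an empty mobile survives, whereas Charlie has no key mobile to begin with.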
Until now we have been using the find() command on the MongoDB collection contacts and it appears to return a list of documents from that collection. In reality, the find() command returns a database cursor and not a list of documents (even if there is only one entry). The shell displays documents only because it automatically iterates a cursor that is not assigned to a variable (up to 20 documents at a time).
Since the MongoDB shell is also a JavaScript interpreter, we can iterate the database cursor ourselves from the command-line interface. Execute the following commands in the command-line interface:
mongodb-rs:PRIMARY> var cur = db.contacts.find({}, { first: 1, last: 1, _id: 0 })
mongodb-rs:PRIMARY> while (cur.hasNext()) {
... var doc = cur.next();
... print("First name: " + doc.first + ", Last name: " + doc.last);
... }
The following will be the output:
First name: Alice, Last name: Thompson
First name: Bob, Last name: Jones
First name: Charlie, Last name: Lee
First name: Eve, Last name: Parker
First name: Frank, Last name: Smith
First name: Frank, Last name: Cooper
This is cool, ain't it !!!
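To get a feel for what a cursor is, here is a tiny plain JavaScript sketch (Node.js, no MongoDB needed) of a cursor-like object over an in-memory array. The helper name makeCursor is hypothetical. A real cursor fetches documents from the server in batches, which this sketch does not attempt to model:

```javascript
// Hypothetical stand-in for a database cursor over an in-memory array;
// exposes hasNext() and next() like the cursor used above
function makeCursor(docs) {
  let position = 0;
  return {
    hasNext: () => position < docs.length,
    next: () => {
      if (position >= docs.length) {
        throw new Error("cursor exhausted");
      }
      return docs[position++];
    }
  };
}

const cur = makeCursor([
  { first: "Alice", last: "Thompson" },
  { first: "Bob", last: "Jones" }
]);

const lines = [];
while (cur.hasNext()) {
  const doc = cur.next();
  lines.push("First name: " + doc.first + ", Last name: " + doc.last);
}
console.log(lines.join("\n"));

// A cursor is consumed as it is iterated; nothing is left afterwards
console.log(cur.hasNext()); // false
```

The sketch hands out one document per call to next() and is empty once the loop completes, so iterating it a second time yields nothing.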
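Now, to query and return an actual document (rather than a cursor) for the key first with a value of Eve from the collection contacts, use the findOne() function, which returns the first matching document (or null if there is none). Execute the following command: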
mongodb-rs:PRIMARY> db.contacts.findOne({ first: "Eve" })
The following will be the output:
{ "_id" : ObjectId("600b9513df2ffe1ec2eaa034"), "first" : "Eve", "middle" : "Jo", "last" : "Parker", "email" : { "work" : "ej_parker@awesome.org" }, "mobile" : { "personal" : "345 678 9012" } }
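The difference between a cursor and a single document can be sketched in plain JavaScript (Node.js, no MongoDB needed) as follows. The helper name findOneModel is hypothetical and matches only top-level keys:

```javascript
const contacts = [
  { first: "Eve", last: "Parker" },
  { first: "Frank", last: "Smith" },
  { first: "Frank", last: "Cooper" }
];

// Hypothetical helper: the first document whose top-level keys equal the
// filter's keys, or null when nothing matches
function findOneModel(docs, filter) {
  const hit = docs.find((doc) =>
    Object.keys(filter).every((key) => doc[key] === filter[key])
  );
  return hit === undefined ? null : hit;
}

console.log(findOneModel(contacts, { first: "Eve" }).last);   // Parker
console.log(findOneModel(contacts, { first: "Frank" }).last); // Smith
console.log(findOneModel(contacts, { first: "Zed" }));        // null
```

Even when more than one document matches (the two Franks), only the first one is returned, and the result can be used directly without iterating.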
To limit the number of documents returned by the find() query command, use the limit() function. To demonstrate this capability, execute the following command:
mongodb-rs:PRIMARY> db.contacts.find({}, { first: 1, last: 1, "mobile.personal": 1, _id: 0 }).limit(3)
The following will be the output:
{ "first" : "Alice", "last" : "Thompson", "mobile" : { "personal" : "123 456 7890" } }
{ "first" : "Bob", "last" : "Jones", "mobile" : { } }
{ "first" : "Charlie", "last" : "Lee" }
We will cover more advanced queries in a later part of this series.
Let us move on to updating documents now.
To update a document, use the update() function.
Let us go ahead and update the document for the key first with a value of Charlie to contain the key mobile. For this, let us execute the following command:
mongodb-rs:PRIMARY> db.contacts.update({ first: "Charlie" }, { mobile: { personal: "678 901 2345" } } )
The following will be the output:
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
Now, let us query the document for the key first with a value of Charlie from the collection contacts by executing the following command:
mongodb-rs:PRIMARY> db.contacts.find({ first: "Charlie" })
There will be nothing returned. No document found ??? What happened here ???
When its second argument is a plain document with no update operators, the update() command replaces the whole matched document (only the key _id is retained). If we query the document for the key mobile with a value of { personal: "678 901 2345" }, we will find the document. Let us execute the following command:
mongodb-rs:PRIMARY> db.contacts.find({ mobile: { personal: "678 901 2345" } })
The following will be the output:
{ "_id" : ObjectId("600b94fadf2ffe1ec2eaa033"), "mobile" : { "personal" : "678 901 2345" } }
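The surprise above can be modeled in a few lines of plain JavaScript (Node.js, no MongoDB needed). The helper name applyUpdate is hypothetical and the _id value is a stand-in for the ObjectId. The sketch also models the $set update operator, which MongoDB provides for changing selected keys without replacing the document, although only for top-level keys here:

```javascript
// Hypothetical helper modeling two shapes of the update document:
//  - plain key/value pairs: the document is replaced, only _id is kept
//  - { $set: {...} }: the listed top-level keys are added or overwritten
function applyUpdate(doc, update) {
  if ("$set" in update) {
    return Object.assign({}, doc, update.$set);
  }
  return Object.assign({ _id: doc._id }, update);
}

const charlie = {
  _id: 3, // stand-in for the ObjectId
  first: "Charlie",
  last: "Lee",
  email: { personal: "cl3000@ranch.net" }
};
const change = { mobile: { personal: "678 901 2345" } };

const replaced = applyUpdate(charlie, change);
const merged = applyUpdate(charlie, { $set: change });

console.log(Object.keys(replaced).join(", ")); // _id, mobile
console.log(Object.keys(merged).join(", "));   // _id, first, last, email, mobile
```

In the shell, the second form would correspond to passing { $set: { mobile: { personal: "678 901 2345" } } } as the second argument to update().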
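Let us fix the document for the key mobile with a value of { "personal" : "678 901 2345" } to restore the missing keys first, last, and email while keeping the key mobile. Since update() replaces the whole document here, the new document must list all four keys. For this, let us execute the following command: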
mongodb-rs:PRIMARY> db.contacts.update({ mobile: { personal: "678 901 2345" } }, { first: "Charlie", last: "Lee", email: { personal: "cl3000@ranch.net" }, mobile: { personal: "678 901 2345" } })
Now, we should be able to query the document for the key first with a value of Charlie from the collection contacts by executing the following command:
mongodb-rs:PRIMARY> db.contacts.find({ first: "Charlie" })
The following will be the output:
{ "_id" : ObjectId("600b94fadf2ffe1ec2eaa033"), "first" : "Charlie", "last" : "Lee", "email" : { "personal" : "cl3000@ranch.net" }, "mobile" : { "personal" : "678 901 2345" } }
What would happen if we try to update a document that does not EXIST ???
Let us go ahead and update the document for the key first with a value of George. For this, let us execute the following command:
mongodb-rs:PRIMARY> db.contacts.update({ first: "George" }, { first: "George", last: "Baker", email: { work: "g_baker@crap.org" }, mobile: { work: "789 012 3456" } })
The following will be the output:
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })
As is evident from the above output, nothing was matched, upserted, or modified. What if we want the document to be updated if present or inserted if NOT present - an upsert operation ???
To upsert a document, use the update() function with a third parameter specifying the option { upsert: true }.
Let us go ahead and upsert the document for the key first with a value of George. For this, let us execute the following command:
mongodb-rs:PRIMARY> db.contacts.update({ first: "George" }, { first: "George", last: "Baker", email: { work: "g_baker@crap.org" }, mobile: { work: "789 012 3456" } }, { upsert: true })
The following will be the output:
WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : ObjectId("601c911b8d37d50b592856f5") })
Now, we should be able to query the document for the key first with a value of George from the collection contacts by executing the following command:
mongodb-rs:PRIMARY> db.contacts.find({ first: "George" })
The following will be the output:
{ "_id" : ObjectId("601c911b8d37d50b592856f5"), "first" : "George", "last" : "Baker", "email" : { "work" : "g_baker@crap.org" }, "mobile" : { "work" : "789 012 3456" } }
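The update-or-insert decision can be sketched in plain JavaScript (Node.js, no MongoDB needed) as shown below. The helper name updateModel and the counter nextId are hypothetical, the returned object merely borrows the field names seen in the WriteResult output, and the sketch always reports a matched document as modified, which is a simplification:

```javascript
// Hypothetical helper modeling update(filter, replacement, { upsert })
// on an in-memory array; returns counters named after WriteResult fields
let nextId = 100; // stand-in for generated ObjectId values

function updateModel(docs, filter, replacement, options = {}) {
  const index = docs.findIndex((doc) =>
    Object.keys(filter).every((key) => doc[key] === filter[key])
  );
  if (index >= 0) {
    // Match found: replace the document, keeping its _id
    docs[index] = Object.assign({ _id: docs[index]._id }, replacement);
    return { nMatched: 1, nUpserted: 0, nModified: 1 };
  }
  if (options.upsert) {
    // No match and upsert requested: insert the replacement as a new document
    docs.push(Object.assign({ _id: nextId++ }, replacement));
    return { nMatched: 0, nUpserted: 1, nModified: 0 };
  }
  return { nMatched: 0, nUpserted: 0, nModified: 0 };
}

const contacts = [{ _id: 1, first: "Alice", last: "Thompson" }];
const george = { first: "George", last: "Baker" };
const georgeV2 = { first: "George", last: "Baker", mobile: { work: "789 012 3456" } };

const withoutUpsert = updateModel(contacts, { first: "George" }, george);
const withUpsert = updateModel(contacts, { first: "George" }, george, { upsert: true });
const secondRun = updateModel(contacts, { first: "George" }, georgeV2, { upsert: true });

console.log(JSON.stringify(withoutUpsert)); // {"nMatched":0,"nUpserted":0,"nModified":0}
console.log(JSON.stringify(withUpsert));    // {"nMatched":0,"nUpserted":1,"nModified":0}
console.log(JSON.stringify(secondRun));     // {"nMatched":1,"nUpserted":0,"nModified":1}
console.log(contacts.length);               // 2
```

The third call shows that once the document exists, the same upsert call behaves like a regular update and does not insert a duplicate.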
We will cover more advanced updates in a later part of this series.
Let us move on to deleting documents now.
To delete a document, use the remove() function.
Let us go ahead and delete the document for the key first with a value of Bob. For this, let us execute the following command:
mongodb-rs:PRIMARY> db.contacts.remove({ first: "Bob" })
The following will be the output:
WriteResult({ "nRemoved" : 1 })
This is similar to the DELETE FROM contacts WHERE first = 'Bob' SQL statement from the relational world.
Now, let us query all the documents from the collection contacts by executing the following command:
mongodb-rs:PRIMARY> db.contacts.find()
The following will be the output:
{ "_id" : ObjectId("600b8fd9df2ffe1ec2eaa031"), "first" : "Alice", "last" : "Thompson", "email" : { "personal" : "alice.t@home.io", "work" : "alice.thompson@work.net" }, "mobile" : { "personal" : "123 456 7890" } }
{ "_id" : ObjectId("600b94fadf2ffe1ec2eaa033"), "first" : "Charlie", "last" : "Lee", "email" : { "personal" : "cl3000@ranch.net" }, "mobile" : { "personal" : "678 901 2345" } }
{ "_id" : ObjectId("600b9513df2ffe1ec2eaa034"), "first" : "Eve", "middle" : "Jo", "last" : "Parker", "email" : { "work" : "ej_parker@awesome.org" }, "mobile" : { "personal" : "345 678 9012" } }
{ "_id" : ObjectId("600b9527df2ffe1ec2eaa035"), "first" : "Frank", "last" : "Smith", "email" : { "personal" : "frank45@root.org", "work" : "frank.smith@excellent.net" }, "mobile" : { "personal" : "456 789 0123", "work" : "567 890 1234" } }
{ "_id" : ObjectId("600b9636df2ffe1ec2eaa036"), "first" : "Frank", "last" : "Cooper", "email" : { "personal" : "frankc@runner.org" } }
{ "_id" : ObjectId("601c911b8d37d50b592856f5"), "first" : "George", "last" : "Baker", "email" : { "work" : "g_baker@crap.org" }, "mobile" : { "work" : "789 012 3456" } }
As can be seen from the above output, the document for the key first with a value of Bob is gone !!!
To delete all the documents from a collection, call the remove() function with an empty criteria document {}, which matches every document.
Let us go ahead and delete all the documents by executing the following command:
mongodb-rs:PRIMARY> db.contacts.remove({})
The following will be the output:
WriteResult({ "nRemoved" : 6 })
This is similar to the DELETE FROM contacts SQL statement from the relational world.
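Both forms of delete can be modeled in plain JavaScript (Node.js, no MongoDB needed) with a single helper. The name removeModel is hypothetical, and the returned object merely borrows the field name seen in the WriteResult output:

```javascript
// Hypothetical helper modeling remove(filter): deletes matching documents
// in place and returns a counter named after the WriteResult field
function removeModel(docs, filter) {
  const keep = docs.filter(
    (doc) => !Object.keys(filter).every((key) => doc[key] === filter[key])
  );
  const nRemoved = docs.length - keep.length;
  docs.length = 0;      // empty the array in place
  docs.push(...keep);   // put back the documents that did not match
  return { nRemoved };
}

const contacts = [{ first: "Alice" }, { first: "Bob" }, { first: "Charlie" }];

console.log(removeModel(contacts, { first: "Bob" }).nRemoved); // 1
console.log(removeModel(contacts, {}).nRemoved);               // 2
console.log(contacts.length);                                  // 0
```

An empty filter has no keys to compare, so every document matches it and the collection ends up empty while still existing.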
Now let us display the number of documents in the collection contacts by executing the following command:
mongodb-rs:PRIMARY> db.contacts.count()
The following will be the output:
0
As can be seen from the above output, all the documents from the collection contacts are gone !!!
To drop the collection contacts, use the drop() function.
Let us go ahead and drop the collection contacts by executing the following command:
mongodb-rs:PRIMARY> db.contacts.drop()
The following will be the output:
true
This is similar to the DROP TABLE contacts SQL statement from the relational world.
Finally, to exit the MongoDB command-line shell, execute the following command:
exit
References