Wednesday, 30 November 2016

MongoDB Basics II - Data Types

For MongoDB version 3.4, the data types are as follows:

Type                     Number  Alias                  Notes
Double                   1       "double"               stores floating point values
String                   2       "string"               stores UTF-8 text
Object                   3       "object"               stores embedded documents
Array                    4       "array"                can store multiple values of multiple data types
Binary data              5       "binData"              stores binary data
Undefined                6       "undefined"            Deprecated. Works like Null
ObjectId                 7       "objectId"             stores the document's ID. 12 bytes, system generated
Boolean                  8       "bool"                 true or false
Date                     9       "date"                 64-bit integer representing milliseconds since 01.01.1970
Null                     10      "null"
Regular Expression       11      "regex"
DBPointer                12      "dbPointer"            Deprecated.
JavaScript               13      "javascript"           stores JavaScript data without Scope
Symbol                   14      "symbol"               Deprecated.
JavaScript (with scope)  15      "javascriptWithScope"  stores JavaScript data with Scope
32-bit integer           16      "int"
Timestamp                17      "timestamp"            64-bit integer. 32 bits representing seconds since 01.01.1970 + 32 bits incrementing. Unique per Mongo instance
64-bit integer           18      "long"
Decimal128               19      "decimal"              New in version 3.4. See Wikipedia: Decimal128_floating-point_format
Min key                  -1      "minKey"               always the lowest result in a sort
Max key                  127     "maxKey"               always the highest result in a sort
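
The Timestamp layout in the table can be pictured with a little plain JavaScript. This is only a sketch of the bit arithmetic, assuming the seconds occupy the high 32 bits and the incrementing counter the low 32 bits; packTimestamp and unpackTimestamp are made-up helper names, not shell functions.

```javascript
// Sketch only: models the 64-bit Timestamp layout from the table with BigInt.
// packTimestamp and unpackTimestamp are hypothetical helpers.
function packTimestamp(seconds, increment) {
  // High 32 bits: seconds since 01.01.1970. Low 32 bits: incrementing counter.
  return (BigInt(seconds) << 32n) | BigInt(increment);
}

function unpackTimestamp(value) {
  return {
    seconds: Number(value >> 32n),
    increment: Number(value & 0xffffffffn),
  };
}

// Two operations in the same second differ only in the counter,
// so the packed values still sort in order.
const firstOp = packTimestamp(1480464000, 1);
const secondOp = packTimestamp(1480464000, 2);
console.log(unpackTimestamp(firstOp));
console.log(secondOp > firstOp);
```

Because the seconds sit in the high bits, comparing two packed values orders them by time first and by counter second.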

Each data type has a number and string alias that can be used with the $type operator to query documents by BSON type. Additionally, there is a "number" alias that corresponds to all the numeric data-types.

db.addresses.find( { "areaCode" : { $type : "number" } } )
This will find all the Documents in the addresses Collection where areaCode is any numeric type.
 
db.addresses.find( { "areaCode" : { $type : "string" } } )
This will find all the Documents in the addresses Collection where areaCode is a string.
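
To make the "number" alias a little more concrete, here is a plain-JavaScript model of it. It is only an illustration: the lookup values are copied from the table of types, and expandTypeAlias is a hypothetical helper rather than anything the mongo shell provides.

```javascript
// Sketch only: models how the "number" alias groups the numeric BSON types.
// expandTypeAlias is a hypothetical helper, not part of the mongo shell.
const BSON_TYPE_NUMBERS = { double: 1, string: 2, int: 16, long: 18, decimal: 19 };
const NUMERIC_ALIASES = ["double", "int", "long", "decimal"];

function expandTypeAlias(alias) {
  // "number" stands for every numeric type; any other alias stands for itself.
  const aliases = alias === "number" ? NUMERIC_ALIASES : [alias];
  return aliases.map((name) => BSON_TYPE_NUMBERS[name]);
}

console.log(expandTypeAlias("number"));
console.log(expandTypeAlias("string"));
```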

Types can be implicitly defined:
{
 screenname: "Max",                     // string
 age: 29,                               // number (the mongo shell stores plain numbers as doubles)
 visible: false,                        // boolean
 subscriptions: ["mongoDB","MariaDB"],  // array
 "phone" : {"cell" : "555-123-4567",
            "office" : "555-312-6789"}, // object (embedded document)
 email: null                            // null
}


or explicitly defined:
{
 "_id" : ObjectId("5b6b9f34a1f359177d123a5a"),
 field1: NumberLong(123456),      // 64-bit integer
 field2: NumberInt(123456789)     // 32-bit integer
}
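
An ObjectId is not just a random value: its first 4 bytes hold the creation time in seconds since 01.01.1970. The sketch below reads that time out of the hex string used in the example. objectIdToDate is a hypothetical helper written for this post; in the mongo shell, ObjectId("...").getTimestamp() does the same job.

```javascript
// Sketch only: reads the creation time embedded in an ObjectId hex string.
// objectIdToDate is a hypothetical helper, not a MongoDB function.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters (12 bytes)");
  }
  // The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdToDate("5b6b9f34a1f359177d123a5a").toISOString());
```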





Tuesday, 29 November 2016

MongoDB Basics I - Data Structures

Here's a quick look at some of the MongoDB basics:
  • Documents: The basic unit of data in MongoDB is a Document, composed of Field and Value pairs, which is roughly equivalent to a row in a relational DB.
    Documents use JSON format but are stored internally as BSON.
    A sample document structure:
    {
       _id: ObjectId("5087250df3f4948bd2f72351"),
       "field2": value2,
       ...
       fieldN: "valueN"
    }

    Field names are case-sensitive. Values are case- and type-sensitive.
  • Collections: MongoDB stores Documents in Collections, which are analogous to tables in a relational DB but with a dynamic schema.
    Collections can be created explicitly or implicitly during an insert or index creation.
  • Databases: Collections are grouped into Databases. Databases are both logical and physical: each Database has its own permissions and its own files in the filesystem. A MongoDB server can host multiple independent Databases, each having its own Collections.
    Databases can be created dynamically simply by using them.
  • _id: Every document has an identifying key, "_id", that is unique within a collection.
Running the following in MongoDB's JavaScript shell will create a new Database and Collection (if they don't already exist) and a new Document:

use myNewDB
db.myNewCollection1.insert( { x: 1 } )


As no _id field is specified here, MongoDB adds one automatically with an ObjectId value. If we specify _id ourselves, its value can be of any data type except an array, but it must be unique within the Collection.
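
The two _id rules (one is supplied if missing, and duplicates are rejected) can be mimicked with a small in-memory stand-in. This is purely illustrative: createCollection and its insert method are invented for this post, the generated ids are simple counters rather than real ObjectIds, and the sketch only handles primitive _id values.

```javascript
// Sketch only: an in-memory stand-in for a Collection, to show the _id rules.
// createCollection is hypothetical and is not a MongoDB API.
function createCollection() {
  const documents = new Map();
  let nextGeneratedId = 1;
  return {
    insert(doc) {
      // Supply an _id when the caller did not provide one.
      const id = "_id" in doc ? doc._id : `generated-${nextGeneratedId++}`;
      if (documents.has(id)) {
        throw new Error(`duplicate _id: ${id}`);
      }
      const stored = { _id: id, ...doc };
      documents.set(id, stored);
      return stored;
    },
    count() {
      return documents.size;
    },
  };
}

const clients = createCollection();
console.log(clients.insert({ x: 1 }));            // _id is generated
console.log(clients.insert({ _id: 1001, x: 1 })); // _id is supplied
try {
  clients.insert({ _id: 1001, x: 2 });            // same _id again
} catch (err) {
  console.log(err.message);
}
```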

To query Collections we use the find command.
db.collection.find(query, projection)

query and projection are optional parameters used to filter Documents and Fields respectively.

To return only the _id of the Document we inserted above (and of any others where x is 1), we would run:
db.myNewCollection1.find({x: 1}, {_id: 1})
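
To show what the two parameters do, here is a rough plain-JavaScript imitation of find working over an array. It is a sketch under heavy assumptions: it supports only equality on top-level fields and inclusion projections, and findSketch is a hypothetical helper, not how MongoDB implements find.

```javascript
// Sketch only: mimics find(query, projection) over a plain array.
// findSketch is a hypothetical helper, not a MongoDB function.
function findSketch(documents, query = {}, projection = null) {
  // query filters Documents: every listed field must equal the given value.
  const matches = documents.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
  if (projection === null) return matches;

  // projection filters Fields: _id is returned unless explicitly switched off.
  const includeId = !("_id" in projection) || Boolean(projection._id);
  return matches.map((doc) => {
    const result = {};
    if (includeId && "_id" in doc) result._id = doc._id;
    for (const [field, include] of Object.entries(projection)) {
      if (include && field in doc) result[field] = doc[field];
    }
    return result;
  });
}

const sampleDocs = [
  { _id: "a", x: 1, y: 10 },
  { _id: "b", x: 2, y: 20 },
  { _id: "c", x: 1, y: 30 },
];
console.log(findSketch(sampleDocs, { x: 1 }, { _id: 1 }));
console.log(findSketch(sampleDocs, { x: 1 }, { y: 1, _id: 0 }));
```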

Documents can be embedded within parent Documents. Here, the name and contact fields are embedded Documents:
{
   _id: 1001,
   name: { first: "Max", last: "Musterman" },
   contact: { phone: { type: "cell", number: "555-123-4567" } },
   fieldN: "valueN"
}

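We reference fields within embedded Documents using dot notation, with the path written as a quoted string. If this Document were in the clients Collection, we would use:
db.clients.find({ "name.last": "Musterman" })
or
db.clients.find({ "contact.phone.number": "555-123-4567" })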
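
A dotted path is simply the field names joined by full stops, walked one level at a time. The plain-JavaScript sketch below resolves such a path against the sample Document; getByPath is a hypothetical helper written for illustration, not a MongoDB function.

```javascript
// Sketch only: resolves a dotted path such as "name.last" against a nested object.
// getByPath is a hypothetical helper, not a MongoDB function.
function getByPath(doc, path) {
  return path.split(".").reduce(
    // Step down one level per path segment; stop with undefined if a level is missing.
    (value, key) => (value !== null && typeof value === "object" ? value[key] : undefined),
    doc
  );
}

const client = {
  _id: 1001,
  name: { first: "Max", last: "Musterman" },
  contact: { phone: { type: "cell", number: "555-123-4567" } },
};

console.log(getByPath(client, "name.last"));
console.log(getByPath(client, "contact.phone.number"));
console.log(getByPath(client, "contact.fax.number")); // no such field
```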


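To query for all the Documents in the Collection for "Max Musterman", we would run:
db.clients.find({ name: { first: "Max", last: "Musterman" }})
An equality match on a whole embedded Document is exact, including field order, so the fields must be listed in the order in which they are stored.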
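
When a query matches on a whole embedded Document, MongoDB compares the field order as well as the values. The plain-JavaScript sketch below imitates that behaviour by comparing serialised forms; exactMatch is a hypothetical helper and only an approximation of the real comparison.

```javascript
// Sketch only: models order-sensitive equality between embedded Documents.
// exactMatch is a hypothetical helper, not a MongoDB function.
function exactMatch(stored, queried) {
  // JSON.stringify keeps the insertion order of string keys, so a different
  // field order produces a different string.
  return JSON.stringify(stored) === JSON.stringify(queried);
}

const storedName = { first: "Max", last: "Musterman" };
console.log(exactMatch(storedName, { first: "Max", last: "Musterman" })); // same order
console.log(exactMatch(storedName, { last: "Musterman", first: "Max" })); // swapped order
```

Querying on individual fields with dot notation avoids the ordering issue altogether.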

Embedded Documents are useful for denormalising data and avoiding joins.



Monday, 28 November 2016

MongoDB Europe 2016

We attended the MongoDB Europe conference in London.

Our company was making a technical presentation at this event, but I was not involved in presenting and was there simply as an attendee.

Overall, I didn't find MongoDB Europe as valuable as other such conferences (e.g. IDUG DB2 or Oracle World). I think it would be greatly improved by extending it to at least three days and making all the additional content user presentations.

The keynote address by Prof Brian Cox was not very relevant and only barely touched on its tenuous link to MongoDB: that some observatory data was hosted in MongoDB. Other than that, his talk had nothing to do with MongoDB and was a real waste of an hour in a one-day event.

The vendor presentations focused quite heavily on pushing their cloud solution, Atlas, and less on the release of MongoDB 3.4 than I had hoped.

Recordings of the presentations for "Shard 1" are available online:
  1. Welcome
    Dev Ittycheria, CEO, MongoDB
  2. MongoDB 3.4 preview and introduction to MongoDB Atlas
    Eliot Horowitz, CTO and Co-founder, MongoDB
  3. Debugging MongoDB Performance
    Asya Kamsky, Lead Product Manager, MongoDB
  4. Building WiredTiger
    Keith Bostic, Senior Staff Engineer, MongoDB
  5. Distributed Ledgers, Blockchain + MongoDB
    Bryan Reinero, Product Manager, MongoDB
  6. MongoDB Atlas
    Andrew Davidson, Product Manager, MongoDB
  7. Who’s Helping Themselves To Your Data? Demystifying MongoDB’s Security Capabilities
    Paul Done, Solutions Architect, MongoDB

Slides from the other presentations in Shards 2 and 3 are available here

The most interesting sessions for me were:
Debugging MongoDB Performance
Instant Search from Amadeus
Comparison of Drivers

MongoDB University M202: MongoDB Advanced Deployment and Operations

M202: MongoDB Advanced Deployment and Operations is an advanced course for operations staff and DBAs. It gives a much better understanding of MongoDB concepts and is backed up by much more hands-on work than the introductory M102 course.

The course lectures are provided via YouTube videos as normal with MongoDB University and the practical side is performed on a provided VM installation.

I took the August 2016 release of the course and passed with a 100% grade.

Chapter 1: System Sizing and Tuning 
Installing your VMs, MongoDB's use of memory, pre-heating data, spinning disks, SSDs, RAID, network storage, swap space, readahead, MongoDB CPU and disk usage

Chapter 2: Backup Options and Disaster Recovery 
Disaster recovery requirements, assessing tolerance for data loss, assessing tolerance for downtime, disaster recovery in sharded clusters, backup strategies

Chapter 3: Fault Tolerance and Availability
Rolling maintenance, reading from secondaries, driver options, connection management, read preferences, rollback

Chapter 4: Sharded Cluster Management 
Scaling out, config servers, periodic maintenance, the mongos process, chunks and splitting, pre-splitting data, the balancer, migration, tag-based sharding, hash-based sharding, unbalanced chunks, orphaned chunks, removing a shard

Chapter 5: Log Files
Database profiler, examining log files, mtools

Chapter 6: Security, Authentication and Authorization 
Native authentication, authorization roles, securing MongoDB, using SSL and x509 with MongoDB

Final Exam 

Sunday, 27 November 2016

MongoDB University Course M102: MongoDB for DBAs

My first introduction to MongoDB was to sign up to university.mongodb.com and take the course M102: MongoDB for DBAs. This is the basic course for DBAs and while I wouldn't say it made me feel production-ready, it demystified MongoDB and gave me a good overview of JSON and the various MongoDB concepts.

The course lectures are provided via YouTube videos as normal with MongoDB University and the practical side is performed on a personal MongoDB installation.

I installed the latest version of MongoDB for Windows, 3.2.6, although the course recommended 3.2.2. However, I didn't have any problems related to the version.

I did encounter some problems due to starting MongoDB without the necessary permissions. This comes from how you start the Windows command line interface, cmd.exe. Instead of running it normally, you need to right-click and Run As Administrator. Then when you start your MongoDB processes they will function correctly. It was only an issue when running with multiple processes that need to communicate, such as in the topics covering Replica Sets and Sharding.

I took the May 2016 release of the course and passed with a 100% grade.

The questions were all quite straightforward and covered in the online course material.

Chapter 1: Introduction 
Introduction to MongoDB, key concepts and installing Mongo

Homework 1.1
What do you get as a result?


Homework 1.2
What's the result?


Homework 1.3
Now, what query would you run to get all the products where brand equals the string "ACME"?


Homework 1.4
Check all that apply:
 var c = db.products.find( { }, { name : 1, _id : 0 } ).sort( { name : 1 } ); while( c.hasNext() ) { print( c.next().name); }
 var c = db.products.find( { } ).sort( { name : 1 } ); c.forEach( function( doc ) { print( doc.name ) } );


Chapter 2: CRUD and Administrative Commands  
Creating, reading and updating data

Homework 2.1
What is the output? (The above will check that products_bak is populated.)


Homework 2.2
What is the output?


Homework 2.3
How many products have a voice limit? (That is, have a voice field present in the limits subdocument.)



Chapter 3: Performance 
Indexing and monitoring

Homework 3.1
When you are done, run:
homework.a()
and enter the numeric result below (no spaces).


Homework 3.2
Once you have eliminated the slow operation, run (on your second tab):
homework.c()
and enter the output below. Once you have it right and are ready to move on, ctrl-c (terminate) the shell that is still running the homework.b() function.


Homework 3.3
  • Q1: How many products match this query?
  • Q2: Run the same query, but this time do an explain(). How many documents were examined?
  • Q3: Does the explain() output indicate that an index was used?
Check all that apply:
Which of the following are available in WiredTiger but not in MMAPv1? Check all that apply.


Chapter 4: Replication 
Replication, Failover, Recovery
Homework 4.1
Now run:
homework.a()
and enter the result. This will simply confirm all the above happened ok.

Homework 4.2
Once done with that, run
homework.b()
in the mongo shell and enter that result below.

Homework 4.3
Once you have two secondary servers, both of which have sync'd with the primary and are caught up, run (on your primary):
homework.c()
and enter the result below.

Homework 4.4
When done, run:
> homework.d()
and enter the result.

Homework 4.5
What result does this expression give when evaluated?
db.oplog.rs.find( { } ).sort( { $natural : 1 } ).limit( 1 ).next( ).o.msg[0]

Chapter 5: Replication Part 2 
Optimizing and monitoring your Replica Sets
Homework 5.1
What is the text in the "state" field for the arbiter when you run rs.status()?

Homework 5.2
Which of the following options will allow you to ensure that a primary is available during server maintenance, and that any writes it receives will replicate during this time?

Homework 5.3
You only have two data centers available. Which arrangement(s) of servers will allow you to stay up (as in, still able to elect a primary) in the event of a failure of either data center (but not both at once)? Check all that apply.

Homework 5.4
Find out the optional parameter that you'll need, and input it into the box below for your rs.reconfig(new_cfg, OPTIONAL PARAMETER).

Chapter 6: Scalability 
Sharding setup, sharding monitoring, shard key selection, inserting large amounts of data
Homework 6.1
Run homework.a() and enter the result below. This method will simply verify that this simple cluster is up and running and return a result key.

Homework 6.2
Run homework.b() to verify the above and enter the return value below.

Homework 6.3
When done, run homework.c() and enter the result value.

Chapter 7: Backup and Recovery 
Security, backups and restoring from backups

Final Exam:
Question 1:
How many documents do you have?

Question 2:
Question: Which of the following are true about mongodb's operation in these scenarios? Check all that apply.
Choose the best answer:

Reconfigure the replica set so that the third member can never be primary. Then run:
$ mongo --shell a.js --port 27003
And run:
> part4()
And enter the result in the text box below (with no spaces or line feeds just the exact value returned).

Which of these statements is true?
Check all that apply:

db.postings.find( { "comments.flagged" : true } )

db.postings.update(
  { _id: . . . , voters:{$ne:'joe'} },
  { $inc : {votes:1}, $push : {voters:'joe'} } );



Question 6:
Which of these statements is true?
Check all that apply:

Question 7:
Which of these statements is true?
Check all that apply:
Once you have the config server running, confirm the restore of the config server data by running the last javascript line below in the mongo shell, and entering the 5 character result it returns.
Connect to the mongos with a mongo shell. Run this:
use snps
var x = db.elegans.aggregate( [ { $match : { N2 : "T" } } , { $group : { _id:"$N2" , n : { $sum : 1 } } } ] ).next(); print( x.n )
Enter the number output for n.
Based on the explain output, which of the following statements below are true?
Check all that apply:

 

Welcome

I am a DBA with 20 years' experience of RDBMSs, from DB2 V5 on mainframes to Oracle and MariaDB on Unix.

My company is now jumping on the NoSQL bandwagon and we are adding MongoDB to our portfolio. We are actually quite far along this path already and are one of the largest MongoDB shops in Europe. The problem we face now is that we don't have any MongoDB DBAs in Ops to support it all and our recruiters seemingly can't find any MongoDB DBAs out there to recruit. So responsibility for maintenance is being handed to the SQL DBAs! Problem solved!!

I plan to use this Blog to document my personal MongoDB journey, and the mistakes we will no doubt make along the way.