CST-363 Week 7

This week was our seventh week in CST-363, also known as Introduction to Database Systems.

Questions

Compare MongoDB with MySQL.

What are some similarities?

Both MongoDB and MySQL are database applications. Both function similarly in fundamental ways, such as having tables/collections.
Both are best used alongside a general-purpose coding language, such as through the use of an API.

What are some differences?

MySQL uses SQL, while MongoDB uses Javascript. Because of this the two applications are very different to use. Using the two applications with Java works very differently also.
MySQL uses relational databases, while MongoDB uses document databases. This results in the two applications acting very differently. For example, MySQL has strict schema, whereas MongoDB has flexible schema.

When would you choose one over the other?

If the database needed to be very structured, or the structure was guaranteed, then I would use MySQL. If the database deals with Big Data, or needs to have a flexible schema, then I would use MongoDB.

Database Architecture Vocabulary

MySQL Architecture

Architecture: Describes the components of a computer system and the relationships between components.

Tools: Interact directly with database users and administrators, and send queries to the query processor.

Query Processor: Manages connections from multiple users and compiles queries into low-level instructions for the storage engine.
Storage Engine: Also called a storage manager, executes instructions, manages indexes, and interacts with the file system.
File System: Accesses data on storage media.

Utility Programs: Include approximately 30 tools, grouped in five categories: installation, client, administrative, developer, and miscellaneous tools. Most utility programs are intended for database administrators or programmers.

Connection: A link between tools and the query processor. Each connection specifies a database name, server address, logon name, and password.

Execution Plan: A detailed, low-level sequence of steps that specify exactly how to process a query.

Query Parser: Checks each query for syntax errors and converts valid queries to an internal representation.

Query Optimizer: Reads the internal representation, generates alternative execution plans, estimates execution times, and selects the fastest plan.

Cache Manager: Stores reusable information in main memory.

Buffer Manager: Retains data blocks from the file system for possible reuse.

Buffer: The data blocks are retained in an area of main memory called the buffer.

Least Recently Used Algorithm: The LRU algorithm tracks the time each block was last accessed and, when space is needed, discards 'stale' blocks.

Catalog: Also known as a data dictionary, is a directory of tables, columns, keys, indexes, and other objects in a relational database.

Cloud Databases

Tier: Multiple computers linked by a network are often grouped in layers, called tiers, and arranged in a hierarchy.

Single-Tier Architecture: Consists of personal or corporate computer connected directly to monitors.

Multi-Tier Architecture

The top tier consists of computers interacting directly with end-users.
The bottom tier consists of servers managing resources like databases and email.
One or more middle tiers execute a variety of functions, such as user authorization, business logic, and communication with other computers.

Web Architecture: A multi-tier architecture consisting of web browsers and web servers communicating over the internet:

Web browsers, on the top tier, manage user interaction.
Web servers, on a middle tier, generate web pages for display on web browsers and transmit user requests to services running on lower tiers.
Application servers run application software, process user requests, and communicate with databases and other services.
Services, such as database and authentication, comprise the bottom tier.

On-Premise: Prior to 2000, most software was installed and ran on customer computers.

Cloud Services: Since 2000, cloud services have increasingly replaced on-premise software. With cloud services, a vendor such as Amazon, Microsoft, or Google implements computer services on lower tiers of a web architecture.

Infrastructure-as-a-service (IaaS): Provides computer processing, memory, and storage media, as if the customer were renting a computer.
Platform-as-a-service (PaaS): Provides tools and services, such as databases, application development tools, and messaging services.
Software-as-a-service (SaaS): Provides complete applications, usually through web browsers on customer machines.

Virtual Machine (VM): A software layer that emulates a complete, independent computing environment.

Distributed Databases

Parallel Computer: Consists of multiple processors managed by a single operating system instance. Parallel computers achieve faster processing speeds by processing multiple instructions concurrently.

Shared Memory: Processors share the same memory and storage media.
Shared Storage: Processors share storage media only. Each processor has private memory.
Shared Nothing: Processors share neither memory nor storage media.

Local Area Network: Consists of cables extending over a small area, typically within one facility. Local area networks usually use the Ethernet communication protocol.

Wide Area Network: Spans multiple facilities in different geographic locations, separated by many miles. Wide area networks may communicate via cables, satellite, or telephone lines, often using internet communication protocols.

Node: One of a group of computers connected by either a local or wide area network.

Cluster: A group of nodes connected by a local area network, managed by separate operating system instances, and coordinated by specialized cluster management software.

Parallel Database: Runs on a parallel computer or cluster.

Distributed Database: Runs on multiple computers connected by a wide area network.

Distributed Transactions: Updates data on multiple nodes of a distributed database.

Two-Phase Commit: A common technique implemented by databases for distributed transactions.

Local Transaction: Updates data on a single node of a distributed database.

Consistent Database: Conforms to all rules at all times.

Available Database: 'live' nodes must respond to queries at all times.

Network Partition: Forms when a network error prevents nodes from communicating.

Partition-Tolerant Database: Continues to function when a network partition occurs.

CAP Theorem: States that a distributed database cannot simultaneously be Consistent, Available, and Partition-tolerant. A distributed database can guarantee any two, but not all three, of these properties.

Replicated Databases

Replica: A copy of an entire database, a table, or a subset of table data.

Replicated Database: Maintains two or more replicas on separate storage devices.

Storage Arrays: Some storage devices, called storage arrays, manage replicas internally, without database intervention.

Primary/Secondary: Designates one node as primary. All updates are first applied to the primary node in local transactions. Secondary nodes are updated after the primary node commits, with independent local transactions. If the primary node fails, the database automatically designates a new primary node to ensure continued availability.

Group Replication: Technique applies updates to any node in a group. Prior to committing, a node broadcasts transaction information to other nodes, which look for conflicts with concurrent transactions. If any node detects a conflict, an algorithm determines which transaction commits and which rolls back.

Catalog: A system database that describes objects in user databases, such as tables, columns, keys, and indexes.

Central Catalog: Resides on a single node.

Replicated Catalog: Maintains a copy of the catalog on each node.

Synchronous Update: Applied to all replicas in one transaction. The transaction fails if any replica is unavailable.

Asynchronous Update: Applied to the local replica and then propagated to other replicas in separate transactions. As these transactions execute, remote replicas are momentarily out of date, which may cause a query to fail.

NoSQL Databases Vocabulary

Big Data

Transactional Data: Structured data created within an organization, with sizes ranging from gigabytes to terabytes.

Big Data: Data generated by new internet and multimedia applications. Differs from Transactional Data in volume, velocity, variety, and veracity.

Transactional Data Requirements: Stable Schema, Vertical Scaling, Stringent Transaction Requirements, Limited Replication: Partitioning, Rapid Insert, Update, Delete

Vertical Scaling: Increases capacity by increasing speed and size of CPUs and storage devices for a limited number of machines.

Horizontal Scaling: Increases capacity by adding large numbers of low-cost components like standard disk drives and CPUs.

Partitioning: Splits large tables into separate physical files on one machine.

Big Data Requirements: Flexible Schema, Horizontal Scaling, Relaxed Transaction Requirements, Extensive Replication, Sharding, Rapid Insert and Analysis

Sharding: Splits data sets across multiple machines.

Key-Value Databases

Key-Value Database: Data is represented as a single key and an associated value. The key is used to access the value.

Wide Column Databases

Wide Column Database: Also called column-based, column family, or tabular database. Data is represented as a key and multiple values. Since each record has multiple values, a descriptive name is stored with each value.

Document Databases

Document Database: Data is represented as a key and a 'document'. Usually the document is in a structured, human-readable format such as XML or JSON.

Graph Databases

Graph Database: Data is represented as a graph with nodes and edges.

Vertex: Also called a node, is a hub where network lines converge.

Edge: Also called a node, is a hub where network lines converge.

Property: Is descriptive information associated with vertices and edges.

Directed Graph: Edges have a starting and ending vertex and are depicted as arrows.

Undirected Graph: Edges have no direction and are depicted as lines.

MongoDB

BSON Document: A binary representation of JSON with additional type information. MongoDB stores documents internally as BSON documents.

insertOne(): Inserts a single document into a collection.

insertMany(): Inserts multiple documents into a collection.

_id: Is automatically created for every document and is always the first field in the document. The _id acts as a primary key.

ObjectId: A 12-byte BSON type that contains a unique value. An ObjectId is displayed as hexadecimal numbers.

find(): Returns all documents by default or documents that match an optional query parameter.

findOne(): Returns only the first document matching the query.

updateOne(): Modifies a single document in a collection.

updateMany(): Modifies multiple documents in a collection.

deleteOne(): Deletes a single document from a collection.

deleteMany(): Deletes multiple document from a collection.

Search This Blog

CS Online Learning Journal

Week 23