

Amazon Redshift is (for the most part) a data warehouse as a service: there's no need to provision hardware or install databases or patches. Although there are few options available to tune or customize the database, it's absolutely critical to design the physical table layout correctly to maximize performance.

System Architecture

Before diving into the detail, it's worth giving an overview of how Redshift is internally architected. The diagram below illustrates how every query is submitted to the Leader Node, which is responsible for parsing the query, determining the best execution plan, and coordinating and aggregating results. When data is loaded, it's distributed across the compute nodes in the cluster as a series of slices, where each slice corresponds to a CPU core, memory allocation, and disk space. When a query is executed, the Leader Node breaks the task up into a number of parallel steps, executed by the Compute Nodes, which actually store the data and perform the heavy lifting. This method maximizes parallel execution and supports scalability, as the system can be migrated to a larger cluster with additional nodes.
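
To make the leader/compute split concrete, here is a minimal Python sketch of the idea: rows are spread across slices by hashing a key, each slice aggregates only its own rows, and a leader combines the partial results. This is a toy model, not Redshift's actual implementation. The cluster size, the `customer_id` column, the CRC32 hash, and every function name are assumptions made purely for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NODES = 2            # hypothetical cluster size
SLICES_PER_NODE = 2  # hypothetical: one slice per CPU core in this toy model
TOTAL_SLICES = NODES * SLICES_PER_NODE


def slice_for(key: str) -> int:
    """Map a key value to a slice with a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % TOTAL_SLICES


def load(rows):
    """Spread rows across slices, the way a load distributes data over the cluster."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row["customer_id"])].append(row)
    return slices


def partial_sum(slice_rows):
    """Work done on one slice: aggregate only the rows stored locally."""
    return sum(r["amount"] for r in slice_rows)


def leader_total(slices):
    """Leader role: fan the step out to every slice, then combine the partial results."""
    per_slice = (slices.get(i, []) for i in range(TOTAL_SLICES))
    with ThreadPoolExecutor(max_workers=TOTAL_SLICES) as pool:
        partials = list(pool.map(partial_sum, per_slice))
    return sum(partials)


rows = [{"customer_id": f"c{i}", "amount": i} for i in range(1, 101)]
slices = load(rows)
total = leader_total(slices)
print(total)  # 5050, the same answer a single-node sum would give
```

How many rows land on each slice depends entirely on the key and the hash. If most rows shared one key value, a single slice would do most of the work while the others sat idle, which is why the physical table layout matters so much for performance.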
