Redshift vs Teradata: An In-Depth Comparison
Redshift Data Model
Teradata Pros and Cons
Features supported only by Teradata, not Redshift
Redshift Vs Teradata In A Nutshell
Pricing and Effort Comparison
When and How to Migrate data from Teradata to Redshift
Summary
Redshift Vs Teradata
Redshift versus Teradata is one of the most debated data warehouse comparisons. This ebook compares the two in detail.
Redshift Architecture & Its Features
Redshift is a fully managed, petabyte-scale data warehouse in the cloud. You can start with a few gigabytes or terabytes of data and scale up to petabytes as your business requires. A Redshift engine is called a cluster, and it is built from one or more nodes of two types: a leader node and compute nodes. Each compute node is divided into two or more slices, depending on the node type. The leader node communicates with JDBC/ODBC clients, creates the query execution plan, and distributes it to the compute node(s). Every cluster has a leader node; in a single-node cluster, the leader and compute roles share the same node.
You can check out our blog for a detailed article on Redshift Architecture.
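The division of labor between the leader node and the compute-node slices can be sketched in Python. The sketch shows a leader node splitting a simple SUM across slices and combining the partial results.

It is a toy model under stated assumptions, not Redshift's implementation:
- The `Slice`, `LeaderNode`, and `build_cluster` names are hypothetical.
- Rows are placed round-robin, similar to EVEN distribution.
- Slices run one after another rather than in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    """One slice of a compute node; it stores and processes a share of the rows."""
    rows: list = field(default_factory=list)

    def partial_sum(self, column: str) -> float:
        # Each slice aggregates only its own rows. In a real cluster all slices
        # run this step at the same time; this sketch runs them one by one.
        return sum(row[column] for row in self.rows)


@dataclass
class LeaderNode:
    """Hypothetical model of the leader node's role in a single SUM query."""
    slices: list

    def total(self, column: str) -> float:
        # Step 1: send the same plan step to every slice.
        partials = [s.partial_sum(column) for s in self.slices]
        # Step 2: combine the partial results before returning them to the client.
        return sum(partials)


def build_cluster(rows: list, nodes: int, slices_per_node: int) -> LeaderNode:
    """Place rows round-robin across nodes * slices_per_node slices."""
    slices = [Slice() for _ in range(nodes * slices_per_node)]
    for i, row in enumerate(rows):
        slices[i % len(slices)].rows.append(row)
    return LeaderNode(slices)


# Usage: 100 rows on a 2-node cluster with 2 slices per node (4 slices in total).
leader = build_cluster([{"amount": a} for a in range(1, 101)], nodes=2, slices_per_node=2)
print(leader.total("amount"))  # 5050
```

Because each slice only returns one partial sum, the leader combines four numbers instead of 100 rows. This is why aggregates scale well as slices are added.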
Teradata Architecture & Its Features
Redshift Data Model
The Redshift data model is designed for data warehousing, and the following features make Redshift a strong data warehouse choice:
1. Redshift is a fully managed data warehouse. You don't have to set up or install the database: you spin up your cluster and the database is ready.
2. Redshift's backup and restore are fully automatic. Through automatic snapshots, Redshift backs up your data to Amazon S3 at regular intervals.
3. Data is protected by inbound security rules and SSL connections. Clusters in VPC mode use VPC security groups, while classic-mode clusters use inbound security rules.
4. Unlike row-oriented data warehouses, Redshift stores data in columnar format. If your query touches a specific column, Redshift reads only that column's data instead of entire rows, which saves a great deal of query processing time.
5. Data is stored in 1 MB blocks instead of the typical 8 KB or 64 KB blocks, so each block holds more data.
6. Redshift has no indexes. Instead, it uses zone maps, which record the lowest and highest value of a column in each block. Zone maps tell the cluster which blocks it needs to read, so it can skip the rest.
7. Redshift supports column compression (encoding). The ANALYZE COMPRESSION command recommends a compression encoding for each column of a table. Redshift provides various encoding techniques; refer to the AWS documentation for details.
8. Redshift caches the results of repeated queries for faster performance. To check whether a query used the cache, look at the source_query column of the SVL_QLOG view. If the result came from the cache, source_query holds the ID of the original query whose result was reused; otherwise it is NULL. Example:
SELECT USERID, QUERY, ELAPSED, SOURCE_QUERY
FROM SVL_QLOG
WHERE USERID IN (600, 601);
In the example output below, query 853219, run by user 601, used the cache: its SOURCE_QUERY is 123456, the query originally run by user 600. Because its result came from the cache, its elapsed time (in microseconds) dropped drastically.
USERID | QUERY ID | ELAPSED | SOURCE_QUERY
--------+-------------+----------+---------------
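If you fetch the result of the SVL_QLOG query above into Python, a small helper can list which queries were served from the result cache. The rows might come, for example, as tuples from a DB-API driver's `fetchall()`.

This is a minimal sketch: the `cache_hits` name is hypothetical. It relies only on the behavior described in item 8, that SOURCE_QUERY is NULL (`None` in Python) unless the result came from the cache.

```python
from typing import Iterable, Optional, Tuple

# One row of the SVL_QLOG query: (userid, query, elapsed, source_query).
QlogRow = Tuple[int, int, int, Optional[int]]


def cache_hits(rows: Iterable[QlogRow]) -> dict:
    """Map each query ID served from the result cache to the query it reused.

    source_query is NULL (None in Python) for queries that actually ran, and
    holds the ID of the original query when the result came from the cache.
    """
    return {query: source for _userid, query, _elapsed, source in rows if source is not None}


# Usage with the query IDs from the example; the elapsed values are hypothetical.
rows = [(600, 123456, 5_000_000, None), (601, 853219, 1_500, 123456)]
print(cache_hits(rows))  # {853219: 123456}
```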
Redshift Pros and Cons
Pros
1. Loading and unloading data is exceptionally fast because data can be loaded in parallel. Redshift can also load compressed (zipped) files, even for high data volumes, and AWS recommends the COPY command for the fastest loads.
2. You can load data from Amazon DynamoDB, AWS's NoSQL database service. Refer to the AWS documentation for details about DynamoDB.
3. You can choose the node type (Dense Storage or Dense Compute) for your cluster according to your data needs and business requirements.
4. You can resize your cluster's storage and CPU whenever you need better performance. Depending on the resize method, the cluster may be read-only or briefly unavailable while the resize runs.
5. You can migrate data from various data warehouses into Redshift without much hassle. AWS provides a service for this, the AWS Database Migration Service (DMS). Refer to the AWS documentation for details.
6. Security is straightforward to manage: you can launch your cluster inside a VPC and use SSL encryption for further protection.
7. Redshift backup and restore is simple. Automatic snapshots back up your data regularly, and because snapshots are incremental, each one stores only the data changed since the previous snapshot. You can also copy snapshots to another region if your business needs it. Refer to the AWS documentation for details on working with snapshots.
8. Redshift Spectrum lets you query large amounts of data directly in Amazon S3, without first loading it through the COPY command or any other method. Refer to the detailed guide on Redshift Spectrum for more information.
9. Sort keys store data pre-sorted on specific columns. This lets queries that filter on those columns skip more blocks and run faster.
10. Distribution keys control how rows are spread across nodes. A well-chosen key distributes data evenly, so every node shares the query work and performance improves.
11. Redshift provides many prebuilt system tables and views that help developers and designers during ETL and other processes.
12. Setup commands can be run through the AWS console, the Command Line Interface (CLI), the API, and so on.
13. AWS applies patches and upgrades to the cluster automatically during a configurable maintenance window, so you don't have to apply patches yourself.
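Item 10 depends on how rows are hashed to slices. The Python sketch below shows why the choice of distribution key matters:
- A high-cardinality key spreads rows evenly across slices.
- A key with only a few distinct values piles rows onto a few slices (skew).

It is illustrative only. CRC32 stands in for Redshift's internal hash function, and the function names and sample keys are hypothetical.

```python
import zlib
from collections import Counter


def slice_for(key_value: str, num_slices: int) -> int:
    # Assumption: CRC32 stands in for Redshift's internal hash function.
    return zlib.crc32(key_value.encode("utf-8")) % num_slices


def rows_per_slice(key_values: list, num_slices: int) -> list:
    """Count how many rows land on each slice when distributed by key."""
    counts = Counter(slice_for(v, num_slices) for v in key_values)
    return [counts.get(s, 0) for s in range(num_slices)]


def skew_ratio(counts: list) -> float:
    """Rows on the busiest slice divided by the average per slice (1.0 means even)."""
    average = sum(counts) / len(counts)
    return max(counts) / average


# Usage: 10,000 rows on 8 slices, distributed by two different candidate keys.
customer_ids = [f"cust-{i}" for i in range(10_000)]  # high cardinality
regions = ["EU", "US"] * 5_000                       # only two distinct values
print(round(skew_ratio(rows_per_slice(customer_ids, 8)), 2))  # close to 1.0
print(skew_ratio(rows_per_slice(regions, 8)))                 # at least 4.0
```

With two distinct values, at most two of the eight slices receive rows, so those slices do all the work while the others sit idle.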
Cons
1. Redshift has no triggers or stored procedures, and user-defined functions are limited to scalar SQL and Python UDFs.
2. Redshift has no sequences. IDENTITY columns generate unique values, but the values are not guaranteed to be consecutive. If you need sequence numbers for a column, you must generate them in your ETL logic.
3. Unlike many common data warehouses, Redshift does not enforce primary keys or foreign keys; the query planner only uses them as hints. This can create data integrity issues.
4. COPY loads data in parallel only from Amazon S3, Amazon EMR, Amazon DynamoDB, and remote hosts over SSH. To load data from other services, you need to write ETL scripts or use an ETL solution such as Hevo.
5. Redshift requires a good understanding of sort keys and distribution keys. There are basic ground rules for choosing them, and poorly chosen keys hurt performance.
6. A table's distribution key cannot be changed once the table is created, so design your tables carefully: a wrong distribution key can hurt overall performance.
7. Redshift has no equivalent of a DBLink, so your queries cannot directly access tables in another database or data warehouse.
8. VACUUM and ANALYZE must be run regularly on key tables. Running them during business hours can badly hurt performance, so schedule them carefully.
9. A Redshift cluster has limits on the number of nodes, databases, tables, and so on, and its maximum storage is still lower than that of data warehouses such as Teradata. The node limits are:
Node Type   | vCPU | Storage per Node | Node Range
------------+------+------------------+-----------
dc1.large   | 2    | 160 GB SSD       | 1-32
dc1.8xlarge | 32   | 2.56 TB SSD      | 2-128
dc2.large   | 2    | 160 GB NVMe-SSD  | 1-32
dc2.8xlarge | 32   | 2.56 TB NVMe-SSD | 2-128
ds2.xlarge  | 4    | 2 TB HDD         | 1-32
ds2.8xlarge | 36   | 16 TB HDD        | 2-128
Refer to the AWS documentation for more about the limits in Amazon Redshift.
10. Redshift in classic mode is still in use, but its cluster performance is relatively modest.
11. Redshift supports only a single Availability Zone (AZ) deployment; multi-AZ deployments are not supported.
12. Query concurrency is limited: AWS recommends a total concurrency of 15 or fewer queries across your workload management (WLM) queues, and a cluster can have at most 8 queues. Unmanaged queues hurt performance.
13. Make sure your workloads do not use the cluster during the maintenance window; otherwise, jobs can fail.
14. Redshift has no table partitioning.
15. Redshift has no SET and MULTISET tables (a SET table does not allow duplicate rows). You must handle duplicates programmatically; otherwise, they can cause reporting errors.
For a complete discussion, refer to Hevo's blog on the pros and cons of Amazon Redshift.
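For item 15, a common approach is to drop exact duplicate rows in your ETL before loading. This mirrors how an INSERT ... SELECT into a Teradata SET table silently discards duplicate rows.

The following Python sketch is a minimal version under simple assumptions:
- Rows are hashable tuples.
- `existing` holds the rows already in the target table. In practice you might compare against a key or a row hash instead of full rows.
- The `set_table_insert` name is hypothetical.

```python
def set_table_insert(existing: list, incoming: list) -> list:
    """Return the incoming rows a SET table would accept.

    A row is rejected if it exactly duplicates a row already in the table
    or a row accepted earlier in the same batch. Rows must be hashable tuples.
    """
    seen = set(existing)
    accepted = []
    for row in incoming:
        if row not in seen:
            seen.add(row)
            accepted.append(row)
    return accepted


# Usage: one row already exists in the table, and the batch repeats (2, "b").
print(set_table_insert([(1, "a")], [(1, "a"), (2, "b"), (2, "b"), (3, "c")]))
# [(2, 'b'), (3, 'c')]
```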
Teradata Pros and Cons
Cons
1. One of Teradata's biggest drawbacks is that its core data warehouse is not cloud-based. It requires initial setup, or you must run it on a cloud provider such as AWS or Azure.
2. It is not a columnar data warehouse.
3. Because Teradata is not columnar, it reads entire rows even when a query touches a single column. You may hit performance issues unless your data warehouse is well designed.
4. Queries over many different columns of a large dataset can perform poorly unless they run on indexed columns.
5. Teradata supports at most 128 joins in a single query. To perform more joins, you must break the query into parts.
6. Redshift outperforms Teradata in visualizing analytical performance, storage, and CPU utilization: everything can be viewed in the AWS console or through Amazon CloudWatch. Teradata, by contrast, provides separate visual tools, and some checks require running commands in a Teradata client.
7. Teradata has no default column compression. You must compress columns manually, and you can compress up to 255 distinct values (plus NULL) per column.
8. Teradata has many limits on the number of columns, table values, and table name length. Refer to the Teradata documentation for details.
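The cost described in items 2 and 3 can be estimated with simple arithmetic. The Python sketch below compares the bytes a full scan reads in a row store and in a column store, for a query that needs one column.

It is a simplification:
- It assumes fixed-width, uncompressed columns and ignores block and header overhead.
- The table layout and column widths are hypothetical.

```python
def bytes_scanned(column_widths: dict, rows: int, needed: list, columnar: bool) -> int:
    """Estimate the bytes read to scan `rows` rows when a query needs `needed` columns."""
    missing = set(needed) - set(column_widths)
    if missing:
        raise KeyError(f"unknown columns: {sorted(missing)}")
    if columnar:
        width = sum(column_widths[c] for c in needed)  # only the needed columns
    else:
        width = sum(column_widths.values())            # every column of every row
    return width * rows


# Hypothetical orders table: bytes per value for each column (128 bytes per row).
orders = {"order_id": 8, "customer_id": 8, "order_date": 4, "amount": 8, "note": 100}

# SELECT SUM(amount) FROM orders over 1,000,000 rows.
row_store = bytes_scanned(orders, 1_000_000, ["amount"], columnar=False)
col_store = bytes_scanned(orders, 1_000_000, ["amount"], columnar=True)
print(row_store, col_store, row_store // col_store)  # 128000000 8000000 16
```

The wider the rows and the fewer columns a query needs, the larger the gap. This is why the comparison favors columnar storage for aggregate-heavy analytics.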
Features supported only by Teradata, not Redshift
1. Teradata supports procedures, triggers, and similar features.
2. Teradata supports column sequencing; Redshift doesn't.
3. Teradata provides various load and unload utilities, such as TPT, FastLoad, FastExport, MultiLoad, TPump, and BTEQ. You can choose among them based on data volume and business logic, and use them in your ETL.
4. Teradata has visual utilities that Redshift lacks, such as Teradata Visual Explain. In Redshift, you must run an EXPLAIN query to view a query plan.
5. Teradata supports MULTISET and SET tables; Redshift doesn't.
6. Teradata supports macros; Redshift doesn't. A macro is a predefined set of SQL statements stored in the database and executed by name, which also reduces LAN traffic. Example:
);
Exec Get_Sales;
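Because Redshift lacks Teradata-style sequences (item 2), a common workaround is to number new rows in the ETL step. Numbering continues from the highest value already in the target table.

Here is a minimal Python sketch. These parts are assumptions:
- The `assign_sequence` name.
- Reading the current maximum with a query such as `SELECT COALESCE(MAX(seq_id), 0) FROM target` (hypothetical column and table names).

It is only safe when one load job at a time writes to the table. Concurrent jobs could read the same maximum and issue duplicate numbers.

```python
def assign_sequence(rows: list, current_max: int) -> list:
    """Prefix each row with a sequence number, continuing after current_max.

    current_max is the highest sequence number already in the target table
    (0 for an empty table). The new numbers are consecutive, with no gaps.
    """
    return [(current_max + i, *row) for i, row in enumerate(rows, start=1)]


# Usage: the target table already holds sequence numbers up to 10.
print(assign_sequence([("alice",), ("bob",)], current_max=10))
# [(11, 'alice'), (12, 'bob')]
```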
Redshift Vs Teradata In A Nutshell
Redshift: A fully managed data warehouse in the cloud.
Teradata: The core data warehouse is not in the cloud, and DBAs/experts must perform the initial setup. Teradata can be scaled to run in the cloud (AWS/Azure) on a pay-as-you-go model.
Backup and restore strategy
Redshift: Backups are handled automatically through the snapshot feature. Snapshots are stored internally in Amazon S3, which is highly durable.
Teradata: Backup and restore can be manual or automated (using BAR), but the backup data is stored in an outside system.
Data load and unload
Redshift: Data is loaded with the COPY command and unloaded with the UNLOAD command. COPY loads data in parallel so that all nodes participate equally, for faster performance.
Teradata: Separate utilities handle loading and unloading, such as TPT, FastExport, and FastLoad. Use them as appropriate in your ETL/ELT.
Table storage
Redshift: Columnar storage format. Queries on a specific column or set of columns perform impressively, and aggregates are very fast because only the referenced columns are read.
Internal storage
Redshift: Data is stored per column in 1 MB blocks. Each block has a zone map entry that records the minimum and maximum value of that column in the block.
Teradata: Data storage is managed by AMPs on vDisks. Rows are distributed by a hash algorithm (based on the defined primary index) and retrieved the same way.
Referential integrity model
Redshift: Tables can have primary keys and foreign keys, but they are not enforced. You must apply your own logic to maintain referential integrity.
Teradata: Primary keys and foreign keys are enforced, which adds the overhead of reference checks during processing.
Sequence support
Redshift: There is no column sequencing. To generate a sequence for a column, you must handle it programmatically.
Teradata: You can define a sequence on a column.
Triggers, stored procedures
Redshift: No triggers or stored procedures.
Teradata: You can create triggers and stored procedures.
Visual features
Redshift: As an integrated AWS service, Redshift performance can be monitored entirely through the AWS console, Amazon CloudWatch, and automatic alerts.
Teradata: A few visual tools exist, such as Teradata Visual Explain, but they are cluttered.
Loading NoSQL data
Redshift: Redshift cannot load NoSQL data from other vendors, but it can load data from DynamoDB.
Teradata: No such feature is supported yet.
Maximum storage capacity
Teradata: Storage capacity of well over 2 PB of data.
Column compression
Redshift: When a table is created, Redshift automatically applies default compression to its columns. The ANALYZE COMPRESSION command also helps you choose column encodings.
Teradata: You must specify compression on individual columns, and you can compress up to 255 distinct values (plus NULL) per column.
Maximum columns per table
Redshift: Maximum 1,600 columns per table.
Teradata: Maximum 2,048 columns per table.
Maximum joins
Redshift: No limit as such.
Teradata: 64 joins per query block.
Data warehouse maintenance/updates
Redshift: Applies patches regularly and performs automatic maintenance within the maintenance window.
Teradata: DBAs must handle all these activities manually or through a tool.
Table indexes
Redshift: There is no table index concept, but its performance is …
Teradata: Provides various types of indexes, such as the Primary Index and Secondary Index. Tables can be partitioned.
Fault tolerance
Redshift: Fault tolerant. If a node fails, Redshift automatically replaces it with a replacement node. However, multi-AZ is not supported.
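The zone maps described under Internal storage work like this. Each block records its minimum and maximum value, so a range filter can skip every block whose range cannot contain a match.

The Python sketch below models that pruning. It is a simplified illustration, not Redshift's code:
- A "block" here is a fixed number of integers rather than 1 MB of data.
- The names are hypothetical.

It also suggests why sort keys help: on sorted data the block ranges barely overlap, so far fewer blocks are read.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Block:
    """A block of column values plus its zone-map entry (min and max value)."""
    values: list
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self):
        # The zone-map entry is computed once, when the block is written.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def make_blocks(values: list, block_size: int) -> list:
    # Simplification: a block holds block_size values instead of 1 MB of data.
    return [Block(values[i:i + block_size]) for i in range(0, len(values), block_size)]


def blocks_to_read(blocks: list, low: int, high: int) -> list:
    """Indexes of blocks that may hold values in [low, high]; all others are skipped."""
    return [i for i, b in enumerate(blocks) if b.max_value >= low and b.min_value <= high]


# Usage: 1,000 values in blocks of 100, filtered on 250 <= x <= 320.
sorted_values = list(range(1, 1001))
shuffled_values = sorted_values[:]
random.Random(0).shuffle(shuffled_values)

print(blocks_to_read(make_blocks(sorted_values, 100), 250, 320))         # [2, 3]
print(len(blocks_to_read(make_blocks(shuffled_values, 100), 250, 320)))  # typically all 10
```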
Pricing and Effort Comparison
Redshift leads Teradata on effort and in-house cost: it is cheaper and easier to run. With Redshift, you turn on the cluster and configure security settings and a few other options (maintenance window, snapshot settings, and so on), and you are ready to go. This reduces DBA effort. In terms of storage, however, Teradata has the upper hand because a Redshift cluster has storage limits. Redshift can still work around this through Amazon S3, which has no practical space limit. Remember that Teradata and Redshift are designed for different purposes. Refer to the Redshift and Teradata websites for pricing details.
When and How to Migrate data from Teradata to Redshift
Several considerations determine whether you should migrate from Teradata to AWS or another cloud:
1) How stable is your Teradata warehouse?
2) How large is your Teradata data volume?
3) How complex is your Teradata data model?
4) What is your current Teradata data latency?
5) How good is your Teradata RDBMS performance?
6) How many BI tools use your Teradata tables/views/cubes?
7) Do you rely heavily on Teradata features that Redshift does not support?
8) Will migrating your data warehouse from Teradata to Redshift break your system?
9) What is your budget for maintaining Redshift and other key AWS services after the migration?
If all these conditions are satisfied, you can migrate your data from Teradata to Redshift with relative ease. AWS provides the AWS Database Migration Service (DMS) and the AWS Schema Conversion Tool (SCT) for this. These are handy, but the migration is not fully automated and requires some manual effort. Refer to the AWS documentation on migrating data from Teradata to Redshift.
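Question 7 can be partly automated once you have an inventory of the Teradata features your warehouse uses. The Python sketch below flags the features this comparison lists as unsupported in Redshift, together with the note or workaround given above.

Assumptions:
- The feature labels and the `migration_gaps` function are hypothetical.
- Building the inventory itself (for example, from Teradata's data dictionary views) is outside the sketch.

```python
# Teradata features that this comparison lists as unsupported in Redshift,
# mapped to the note or workaround it gives. The feature labels are hypothetical.
REDSHIFT_GAPS = {
    "trigger": "no triggers in Redshift; the logic must be redesigned",
    "stored_procedure": "no stored procedures in Redshift; the logic must be redesigned",
    "macro": "no macros in Redshift",
    "sequence": "generate sequence numbers programmatically in ETL",
    "set_table": "remove duplicate rows programmatically",
    "enforced_foreign_key": "keys are not enforced; apply integrity logic yourself",
    "index": "no indexes; Redshift relies on zone maps instead",
    "partitioned_table": "no table partitioning in Redshift",
}


def migration_gaps(features_in_use: set) -> dict:
    """Return each feature in use that has no direct Redshift equivalent, with a note."""
    return {f: REDSHIFT_GAPS[f] for f in sorted(features_in_use) if f in REDSHIFT_GAPS}


# Usage with a hypothetical feature inventory.
for feature, note in migration_gaps({"macro", "view", "set_table"}).items():
    print(f"{feature}: {note}")
```

A long list of flagged features is a signal that the "plenty of unsupported features" concern in question 7 applies, and that the migration needs more manual effort.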