Sybase IQ technology overview

An InDetail paper by Bloor Research
Author: Philip Howard
Publish date: January 2008


Sybase IQ can offer significant performance and total cost of ownership advantages over traditional products for query-intensive computing requirements. Philip Howard


© 2008 Bloor Research

Fast facts

Sybase IQ is a column-based relational database that has been designed specifically for analytics and business intelligence applications. It can offer a number of very significant advantages within a data warehousing environment, including performance, scalability and cost of ownership benefits when compared to conventional approaches.

Although the product has broad applicability, the company targets three primary markets:

• Data warehouses for aggregators of data, typically those who offer multi-client data and analytics services.

• Advanced analytics, where there is a significant requirement to support complex and unpredictable queries, either as an EDW in its own right or as an analytics accelerator, where Sybase IQ is complementary to an existing EDW.

• As a report accelerator, offloading high performance reports from operational databases or centralised data warehouses that perform too slowly to meet business needs.

In addition, the company expects to expand into applications where there are significant amounts of unstructured elements (for example, documents and images for insurance claims) and where very large quantities of data need to be kept on-line, amongst others.

The benefits associated with Sybase IQ are predicated upon its column-based approach, which differs significantly from the row-based approach that is traditional for relational databases such as Sybase ASE, Oracle, IBM DB2 or Microsoft SQL Server. For example, storage by column means that, in effect, all tables are automatically indexed but without the overhead (storage, management and tuning) that is associated with traditional approaches to indexing. Columnar storage also means that much more effective compression algorithms can be applied to the data so that storage requirements are reduced even further. Indeed, in general Sybase does not expect any data warehouse based on Sybase IQ to exceed the size of the raw data and compression rates of up to 90% are not uncommon.

As a result of these and other features, such as Sybase IQ’s multiplexed grid architecture, Sybase IQ installations will normally improve query performance by orders of magnitude when compared to row-based database solutions, while requiring fewer hardware resources. This is especially true where queries are complex or require large table scans and, in the latter case, this has the knock-on advantage that you do not have to pre-aggregate data, which represents both a performance and a management saving when compared to traditional approaches to data warehousing. The reduced size of Sybase IQ data warehouses (along with other features of the product) also means that Sybase IQ has the potential to offer significant performance advantages when scaling for large numbers of users.

In other words, Sybase IQ provides much better performance with a lower total cost of ownership. Moreover, apart from the fact that data is stored by column, in all other respects Sybase IQ acts exactly like a conventional relational database. For instance, you use standard SQL, hardware and operating systems; database schemas are (or may be) the same, as are applications; and training requirements are similar, in that you can add a column as easily as a row, and so on.

Key findings

Bloor Research believes prospective users should be aware of the following key facts:

• In addition to its column-based storage, Sybase IQ delivers a number of specialised indexes in order to further accelerate ad hoc query performance. These include indexes for low cardinality data (which further reduces storage requirements, and improves query performance, through the use of tokenisation), grouped data, range data, joined columns, textual analysis (providing analytics on unstructured data that may be combined with structured analysis), real-time comparisons for Web applications, and date and time analysis.

• Both multithreading and 24/7 high availability features (including partnerships with relevant storage vendors for high availability and disaster recovery) are available with Sybase IQ. In particular, separate read and write nodes allow for procedures to be executed in parallel, without affecting one another. Separated read nodes are particularly useful for data aggregators offering multi-client analytics services because a node can be assigned to an individual account for later chargeback.

• Sybase IQ offers significant performance advantages when compared to conventional approaches. Apart from the features already mentioned, it also supports Rcube flat schemas that can provide major benefits when compared to conventional star schemas. In particular, Rcubes can significantly speed up implementation as well as improving run-time performance and providing increased flexibility. In addition, Sybase IQ allows on-the-fly changes to schema attributes (columns); that is, you can add/delete columns in a table while the Sybase IQ server is up and running.

• Sybase has designed Sybase IQ to support as many queries as possible running in parallel, rather than optimise the performance of any particular query. This is not the trade-off that it would appear, since Sybase IQ’s columnar approach provides such intrinsic performance improvements (ad hoc queries can often be hundreds of times faster) that virtually all queries will be faster when compared to traditional approaches.

• Sybase provides column-based encryption capabilities as well as database-level encryption. This is particularly important for data aggregators with multi-client services where you want to be able to encrypt different customers’ data using different algorithms. It is also worth noting that Sybase IQ has been Common Criteria Certified (ISO/IEC 15408 EAL3) for its user administration facilities.

• Sybase IQ provides standard ODBC/JDBC/OLE-DB connections to its query engine, thereby enabling access from any standards-based front-end BI tool. Sybase IQ is certified to work with most industry leading tools such as Business Objects, Cognos, Microstrategy, SAS, SPSS and others.


• Sybase IQ also offers an ETL tool as part of its Extended Enterprise Edition, enabling developers to quickly build and deploy their data sets for analysis on Sybase IQ.

• Sybase IQ can also be loaded on a continuous near-real time basis using an infrastructure comprised of Sybase Replication Server, a staging database (Sybase ASE or Sybase ASA) and a set of scripts generated by Sybase PowerDesigner. Simultaneous loading and querying is provided via Sybase IQ’s versioning capability: a new version is created for the load process while the queries run on the older version until the new load is committed.
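The versioning behaviour described in the last point can be illustrated with a minimal Python sketch. It is not Sybase IQ code: the `VersionedTable` class and its method names are hypothetical, and the snapshot mechanism is assumed purely for illustration. Queries read the last committed version while a load builds a new one, which only becomes visible on commit.

```python
class VersionedTable:
    """Toy table in which readers only ever see the last committed version."""

    def __init__(self, rows=None):
        self._committed = tuple(rows or ())  # immutable snapshot seen by queries
        self._pending = None                 # version under construction by a load

    def begin_load(self):
        # The load works on a private copy; the committed version is untouched.
        self._pending = list(self._committed)

    def load(self, rows):
        if self._pending is None:
            raise RuntimeError("begin_load() must be called first")
        self._pending.extend(rows)

    def commit(self):
        if self._pending is None:
            raise RuntimeError("no load in progress")
        # Publishing the new version is a single reference swap.
        self._committed = tuple(self._pending)
        self._pending = None

    def query(self):
        # Queries always run against the committed snapshot.
        return self._committed


table = VersionedTable([("2008-01-01", 100)])
table.begin_load()
table.load([("2008-01-02", 250)])
rows_during_load = table.query()   # still the old version
table.commit()
rows_after_commit = table.query()  # now includes the loaded row
```

A real engine would version at the level of pages or blocks rather than copying whole tables; the copy here simply keeps the sketch short.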

The bottom line

Unlike standard row-based databases that were originally designed for online transaction processing, Sybase IQ has been engineered specifically for query processing and ad hoc analysis. As a result, Sybase IQ can offer significant performance and total cost of ownership advantages over traditional products for query-intensive computing requirements.

These advantages will be most obvious in environments where the query loads are unpredictable, composed largely of ad hoc enquiries. In this scenario, traditional databases cannot be pre-tuned for unexpected queries. But the column-based approach used by Sybase IQ provides effective self-tuning capabilities. In addition, complex queries that involve multiple selection criteria across a variety of tables, and those involving large table scans, can be deployed much more efficiently within a column-based environment. Finally, Sybase IQ scales well for large data stores containing finely detailed transactions and sub-transactions, such as clickstream data. Sybase IQ does not require data to be pre-aggregated for analysis, allowing users to efficiently and quickly analyse atomic level data.

Sybase IQ’s underlying columnar architecture is more efficient; servers have to do much less work to answer any particular query when data is organised by column. Of course, Sybase has augmented this basic design advantage in other ways but this is its key differentiator, along with its reduced cost of ownership. The reduced cost is achieved in two ways. First, through a lower absolute price, which is a function both of this improved performance and of the compression techniques that Sybase IQ applies against individual columns; these reduce disk requirements and, consequently, the necessary investment in hardware. Secondly, this combines with the product’s reduced administration and tuning requirements to produce significantly less management overhead when compared with that of the traditional enterprise data warehouse vendors.

The data warehouse landscape, however, is now no longer the sole domain of the traditional suppliers. Today, there is significant interest in data warehouse appliances. In one sense, the market entry of these specialists has helped Sybase IQ because it has called into question the dominance exerted by the major providers of row-based databases as organisations re-think their available options. The rise of purpose-built appliances has raised the interest in Sybase IQ for the report ‘accelerator’ market in particular. However, interest in data warehouse appliances also introduces a new class of competitors that claim similar performance improvements with an excellent ease of use profile.

Where Sybase has an advantage over these appliance vendors is that Sybase IQ offers more flexible tuning abilities. Most appliance vendors eschew the use of indexes completely or only use them in very limited circumstances. Sybase IQ provides a range of indexing options. While these increase the amount of administration required, they make the product far more adaptable to variations in data cardinality and datatypes. Further, we know of no data warehouse appliance that is currently able to support text analytics (as Sybase IQ does) and the ability to manage mixed query workloads is also typically limited when compared to what Sybase can offer. It is also worth noting that Sybase IQ can easily and linearly scale up to support a large data set and user workload. Additional disk space can be added to the shared disk pool to support growing data sets and reader node(s) may be added to the Sybase IQ multiplex grid in small increments to support an expanding user base.

To conclude, Sybase IQ is well positioned to compete with both the traditional and appliance vendors. While it has different advantages in different environments it is our view that Sybase IQ merits careful review by organisations investigating data warehousing, high-speed analytics and business intelligence options.

Vendor information

Background information

Sybase IQ is based on technology that Sybase acquired when it purchased Expressway in 1995. Many of its key defining features were introduced in version 12.0 in 1999. With the 12.0 release, the product was marketed as Sybase IQ with multiplexing (that is, support for multiple read nodes and a parallel write capability) as an additional option. Multiplexing became a standard component with version 12.4.2 in 2000, thus emphasising the product’s ability to scale incrementally all the way up to very large data warehouses (VLDW). In this context it is worth noting Sybase’s recent implementation of a reference 1Pb VLDW in conjunction with Sun and BmmSoft. This is the largest, independently audited, warehouse that we are aware of.

Sybase now focuses on Sybase IQ as an analytic server, especially for analytics services, advanced analytics and fast reporting, as discussed previously, and this approach has clearly paid off. There are now more than 1,900 Sybase IQ installations worldwide in more than 930 organisations. Further, Sybase IQ revenues have been rising steadily, with growth in the first three quarters of 2007 of close to 65%. It is also particularly pertinent to note that a significant percentage of the product’s sales have been to organisations that do not use Sybase ASE (the company’s flagship transactional database).

In addition to the markets identified at the top of this paper, Sybase is now actively working in a number of interesting emerging areas where the company expects Sybase IQ’s combination of features to offer significant value, including:

1. Capital Markets Risk Management: Sybase Risk Analytics Platform is a product that leverages Sybase IQ as a VLDW engine, capable of combining massive volumes of both real-time and historical data to provide a holistic view of the market needed by traders, portfolio managers and corporate risk officers.

2. Compliance: with the retention and reporting of both structured and unstructured data. This often means the need to keep very large and increasing volumes of data on-line (which you might also want to do for analytic purposes) for which Sybase IQ is well suited. Note that for capital markets, compliance has a wider connotation, which can be met through the Risk Analytics Platform. Sybase IQ is itself compliant to various regulations pertaining to accessibility (508 compliance, which relates to disabilities, for both the user interface and documentation) and security. A further compliance capability is that you can lock down selected IQ databases (timestamped) into read-only hardware. Old data can be changed only in a new ‘version’ (copy) of the data. Reverting to data at a given point in time is possible when using this technique.

3. Data/text mining: Sybase is working with partners to provide solutions for the packaged data and text mining markets. In particular, you can cross-correlate relational data stored in IQ tables with non-relational data stored as a LOB. In this context, it is worth noting that Sybase IQ has extended facilities for supporting complex analytics that involve CLOBs (character large objects) as well as BLOBs (binary large objects) and XML. In particular, in the latest release, LOB load sizes have been increased to 20Mb (from 32Kb) and enhanced CLOB indexing capabilities have been introduced to support text searching.

Sybase has entered into a number of partnerships focused on Sybase IQ with vendors that include specialists in hardware, storage, data quality, business intelligence, complex event processing and other areas, as well as various VARs and system integrators. It is notable that there is also a partnership with SAP (as an accelerator for BW).

Sybase IQ Web address: www.sybase.com/bi.

Product availability

The current version number of Sybase IQ is version 12.7. The Extended Enterprise Edition includes the ETL tool that Sybase acquired in 2006 as a standard feature.

The main themes of the latest release are information lifecycle management and enhanced operational management, with a particular focus on more efficient concurrent inserts, enhanced LOB (large object) management and ease of use. Most of the relevant features that support these are discussed during the course of this report. However, there are two that are not. The first of these, which is a security enhancement, is the introduction of the ability to log connection details (user ids, durations and so forth) directly to a file and, secondly, there is a new table/index usage tracking facility that allows you to monitor these details based upon the use of timestamps.

The product runs under Windows NT/2003/XP, Linux (Red Hat 5.0 and SuSE 10 are both supported in the latest release on both Intel and IBM POWER platforms), and the leading UNIX operating systems from HP, IBM and Sun Microsystems. Windows Vista is supported as a client.

Product overview

The Sybase approach to data warehousing is fundamentally different to that of other relational database vendors. Sybase has concluded that conventional relational approaches (that is, row-based architectures) to ROLAP are inefficient for all data warehouse needs except for routine reporting applications on relatively small data volumes. These traditional systems can only provide adequate performance (if that) at the cost of a significant, and otherwise unnecessary, investment in additional hardware, software, resources, money and time. Sybase therefore has developed what might best be described as an inverted relational database. That is, it uses a conventional relational structure and a similarly familiar terminology, but is column-based rather than row-oriented. In our view, this offers significant advantages when compared to conventional approaches.

Architecture

Unlike ordinary relational databases that store data in tables by row, Sybase IQ stores and accesses data in tables by column. While this would obviously be inappropriate for a transactional environment, in which a transaction is effectively equivalent to a row, it is entirely sensible within a query-processing environment, since queries are generally selected on the basis of defining columns.

A major advantage of this column-based approach is that, in effect, the entire database is automatically indexed, because selection criteria in a query are defined by column. In fact, it is somewhat more complicated than this since there are a number of ways in which Sybase IQ supports these indexes as columns, which are discussed below.

By using columns, Sybase IQ is much more efficient than traditional approaches when it comes to data compression. This is because, needless to say, all the data fields in the same column have the same data type. So, each column can be compressed for optimal efficiency and retrieval. By contrast, when data is stored by row, different fields will consist of the distinct data types that are best suited for transaction processing. In such an environment, it is generally impracticable to keep changing to the optimal compression algorithm, which means that any compression offered will tend to be of the lowest common denominator variety. That said, merchant database vendors have been introducing advanced compression facilities into their products recently, which means that Sybase’s advantage in this area, while it should still be significant, is less now than it was formerly.

Another major advantage of a column-based approach is simply the amount of data that needs to be read. Whenever you access data from a conventional database, you read each row in its entirety, regardless of the actual fields that you are interested in. In practice, this might mean reading a 3000 byte record to retrieve just 20 characters of data. But by reading data on a columnar basis, you only have to read what you want to know. Of course the difference in performance when you are reading a single record will be negligible, but many queries require full table scans. Multiply that single read by a few million rows per table and the performance difference is very significant.
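The arithmetic behind this can be made concrete with a back-of-envelope sketch in Python. The 3,000-byte record and the 20-character field come from the example above (one byte per character is assumed); the row count of five million is an assumed figure standing in for "a few million rows", and the calculation ignores compression, page sizes and caching.

```python
ROW_BYTES = 3000        # size of a full record, from the example above
FIELD_BYTES = 20        # size of the one field the query needs (1 byte per char assumed)
ROWS = 5_000_000        # assumed table size ("a few million rows")

def scan_bytes(bytes_per_row, rows):
    """Bytes that must be read to scan `rows` entries of the given width."""
    return bytes_per_row * rows

row_store_bytes = scan_bytes(ROW_BYTES, ROWS)       # every row read in full
column_store_bytes = scan_bytes(FIELD_BYTES, ROWS)  # only the needed column read
ratio = row_store_bytes / column_store_bytes

print(f"row store:    {row_store_bytes / 1e9:.1f} GB")
print(f"column store: {column_store_bytes / 1e9:.1f} GB")
print(f"ratio:        {ratio:.0f}x")
```

Under these assumptions the full scan reads 15 GB from a row store against 0.1 GB from a column store, a factor of 150 before any compression is taken into account.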

Sybase argues that the columnar nature of Sybase IQ results in performance so much better than normal ROLAP approaches that it does not need to support hardware parallelism in the same way as do its major competitors. In particular, the company points out the problems that are associated with the data partitioning needed to support hardware parallelism. While it is certainly true that partitioning, no matter how implemented, can create problems (not the least of which is additional maintenance), it nevertheless opens the way to substantial performance improvements and Sybase does support partitioning within Sybase ASE. However, Sybase would further argue that partitioning is simply a compensating mechanism for the poor performance inherent in a row-based approach when it comes to analytics and fast reporting.

While there is a lot of truth in Sybase’s arguments, this does not mean that Sybase eschews all forms of data partitioning. However, rather than implementing horizontal partitioning, it instead implements vertical partitioning: partitioning by column rather than by row. One of the advantages of this approach is that partitions can never become unbalanced, since there will always be the same number of fields in each column of a table. This significantly reduces the maintenance requirement of managing partitions and should eliminate the database reorganisation that may become necessary when partitions become unbalanced and start to impair performance. That said, there are advantages to certain types of partitioning, notably range partitioning (where you can partition by geography, say, or by time period), that Sybase cannot currently offer. The company plans to extend into this area with its next major release.
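A small Python sketch shows why column partitions stay balanced. The sample rows and the choice of region as the horizontal partitioning key are invented for illustration and say nothing about how Sybase IQ lays out its storage.

```python
from collections import defaultdict

# Hypothetical sample data: (region, product, amount)
rows = [
    ("North", "widget", 10),
    ("North", "gadget", 25),
    ("North", "widget", 40),
    ("South", "gadget", 15),
]

# Horizontal partitioning: rows are split by a key, so sizes depend on the data.
horizontal = defaultdict(list)
for row in rows:
    horizontal[row[0]].append(row)
horizontal_sizes = {key: len(part) for key, part in horizontal.items()}

# Vertical partitioning: one partition per column, each holding one value per row.
column_names = ("region", "product", "amount")
vertical = {name: [row[i] for row in rows] for i, name in enumerate(column_names)}
vertical_sizes = {name: len(values) for name, values in vertical.items()}

print(horizontal_sizes)  # sizes differ when the data is skewed
print(vertical_sizes)    # every column has exactly len(rows) entries
```

However skewed the data, each column partition holds exactly one entry per row, so there is nothing to rebalance.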

Finally, it should be noted that Sybase does not eschew the use of OLAP. For users who want to query data at an aggregated level in a relatively pre-determined fashion, OLAP has significant advantages. For this reason, Sybase supports OLAP capabilities with features such as rankings, partition windows, percentiles and averaging.


Supported schemas

Sybase IQ supports conventional relational schemas, including the normalized schemas used for transaction processing as well as the star, snowflake and constellation (a collection of stars) schemas that are used in data warehousing. However, its column-based approach also opens the possibility of using Rcubes. These support what is known as a flat schema, an example of which is illustrated in Figure 1 as a comparison to the use of star schemas.

As can be seen, it is much simpler to use Rcubes than to employ multiple star schemas. In this particular example, there are eight fewer tables (and, therefore, eight fewer ETL operations to be completed) and there are 25 fewer joins.

However, perhaps the biggest impact of using Rcubes is that they are simply much easier to understand and, most particularly, that they offer much more efficient navigation for cross-functional purposes than constellation schemas. As a consequence it is much easier to get an implementation up and running when using Rcubes.

Other advantages of Rcubes include:

• It is much easier to load data from operational systems because Rcubes are simple and require much less ETL code than traditional schemas.

• The complexity of fact and dimension tables is significantly reduced (see Figure 1). As long as all the columns in a table share the same keys, Sybase IQ will manage the facts and dimension data in one table, without data explosion.

Figure 1: Star schemas and Rcubes

• Rcubes leverage data at the transaction level and perform aggregations ‘on the fly’. This makes Rcubes intrinsically more flexible and users should not have the problems that arise when pre-aggregated data does not meet the needs of an unexpected query.

• Rcubes work well with data mining tools because these tools expect data to be presented as flat files with large numbers of columns, which is exactly what Rcubes represent.

• Rcubes deliver very fast query performance and greatly simplify queries by significantly reducing the number of joins required.

• Because it is easy to add columns to a table with Sybase IQ, Rcubes are very forgiving if there are changes in the business environment or if new data sources become available.

• Because Rcubes are easy to implement, fast to load and extremely fast to access, they lend themselves to real-time enterprise and closed-loop applications.
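The difference between the two schema styles can be sketched in Python. The tables, column names and figures below are hypothetical, and plain lists and dictionaries stand in for database tables; the point is only that the flat form answers the same question without join lookups, aggregating transaction-level rows on the fly.

```python
from collections import defaultdict

# Star schema: a fact table referencing a dimension table by key.
store_dim = {1: {"store": "Leeds", "region": "North"},
             2: {"store": "Bristol", "region": "South"}}
sales_fact = [(1, 120.0), (2, 80.0), (1, 45.5)]  # (store_id, amount)

def revenue_by_region_star(fact, dim):
    totals = defaultdict(float)
    for store_id, amount in fact:
        region = dim[store_id]["region"]  # join: look up the dimension row
        totals[region] += amount
    return dict(totals)

# Flat schema: facts and dimension attributes held together in one wide table.
sales_flat = [
    {"store": "Leeds", "region": "North", "amount": 120.0},
    {"store": "Bristol", "region": "South", "amount": 80.0},
    {"store": "Leeds", "region": "North", "amount": 45.5},
]

def revenue_by_region_flat(table):
    totals = defaultdict(float)
    for row in table:
        totals[row["region"]] += row["amount"]  # no join needed
    return dict(totals)

star_totals = revenue_by_region_star(sales_fact, store_dim)
flat_totals = revenue_by_region_flat(sales_flat)
```

Both functions return the same totals. In a row store the repeated store and region values in the flat table would inflate storage; the claim made above is that column storage avoids that cost.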

Indexes

Although every column is, in effect, its own index, there are substantial advantages to using specific indexes in a number of situations. This is one area where Sybase has a major advantage over appliance vendors. Indeed, the secret of Sybase IQ is its indexing capabilities. As Sybase customers discover new needs for analysis, Sybase can simply create new index types to meet those needs. The beauty of this approach is that new indexes can be added to the data warehouse with little, if any, impact on the data warehouse architecture or the analytical applications using the warehouse. In the realm of real-time enterprise and closed-loop applications, Sybase sees its approach to indexes as the key for even greater query performance against multi-terabyte and (in the future) petabyte data warehouses. Today, Sybase IQ uses seven indexing techniques:

• Low Fast Indexes: these are low cardinality indexes that use a process known as tokenisation. Using this process, the data is converted into a token and then the tokens are stored rather than the data. This is particularly useful for reducing the quantity of redundant data. For example, a supplier with a large customer base throughout the UK would have to store each customer’s address. This would mean a very large duplication of county names. So, rather than having hundreds of instances of “Banffshire,” for example, the supplier might replace each county with a numerical value. So, as Banffshire is the fifth county in the UK alphabetically (after Aberdeen, Armagh, Avon and Ayrshire), it might therefore be assigned the value 5. Where a column consists of a numeric value anyway, that value can itself be used as the basis for the tokenisation. Once the tokens are established (which will be an automated process), a bitmapped index is created to reference these tokens.

Tokenisation typically will apply when there are a limited number of possible data values. This is why Sybase refers to these as low-cardinality indexes, since they are typically only used for fields that have fewer than 1,500 unique values.

• Bit-Wise Indexes: for high cardinality fields, where the number of possible values exceeds 1,500 (for example, monetary values), Sybase IQ uses a patented technology known as Bit-Wise indexing. This is particularly useful where you want to combine calculations with range searches, for example to find the total revenue and number of units sold where the price was less than £50.

• High Group Indexes: these are, in fact, B-trees. However, the principle here is that the user only defines these indexes when several columns are likely to be used in a group, in particular to combine low and high-cardinality searches. An example here might be an inquiry about product item sales and value (high cardinality) by store (low cardinality).


• Fast Projection Indexes: this indexing type (which is the default) is simply the column store itself. If a user always plans to retrieve an entire column of data, then the fact that storage is columnar means the column can be projected into a report or inquiry without having to explicitly define any index at all. This is useful, for example, in “where” clauses.

• Word Index: this is a text index capability. It is based on key words or phrase string searches. This type of indexing has not, historically, been associated with data warehousing. However, there are a number of significant markets where it is important to be able to combine quantitative and qualitative information. For example, within healthcare, medical notes are usually exactly that: notes. To extract information about, for example, morbidity rates it may be necessary to have access to this unstructured data. Insurance is another such sector. This index may also be used against documents or large text-based objects.

• Compare Indexes: this indexing technique allows data column comparisons that are effectively equivalent to an “if … then … else” statement. For example, “if expenses are greater than revenue, then …”. This type of index is particularly useful for real-time comparisons in web applications.

• Join Indexes: as the name implies, these are designed to obviate the need for table joins. Like a number of the supported indexes, these will be most useful when query requirements can be predicted in advance.

• Time Analytic Indexes: these offer the option to create indexes based on a date, time, or date and time. It should be noted that time-based queries tend to be particularly difficult for conventional relational databases to handle.
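The Bit-Wise query mentioned in the list above (total revenue and units sold where the price is below £50) can be illustrated with a bit-sliced index, a published technique in which one bitmap is kept per binary digit of the column. Whether Sybase's patented implementation works this way internally is an assumption; the Python sketch below uses integers as bitmaps, handles non-negative integers only, and runs on invented sample data.

```python
def build_slices(values):
    """Return (slices, nbits): slices[i] has bit r set if bit i of values[r] is 1."""
    nbits = max(max(values).bit_length(), 1)
    slices = [0] * nbits
    for row, value in enumerate(values):
        for i in range(nbits):
            if (value >> i) & 1:
                slices[i] |= 1 << row
    return slices, nbits

def rows_less_than(slices, nbits, nrows, limit):
    """Bitmap of rows whose value is < limit, using only bitmap operations."""
    all_rows = (1 << nrows) - 1
    if limit.bit_length() > nbits:
        return all_rows  # limit exceeds every representable value
    less, equal = 0, all_rows
    for i in reversed(range(nbits)):
        if (limit >> i) & 1:
            # Rows equal so far with a 0 where the limit has a 1 are smaller.
            less |= equal & ~slices[i] & all_rows
            equal &= slices[i]
        else:
            equal &= ~slices[i] & all_rows
    return less

def popcount(bitmap):
    return bin(bitmap).count("1")

def filtered_sum(slices, selected):
    """Sum of the indexed column over the selected rows."""
    return sum((1 << i) * popcount(s & selected) for i, s in enumerate(slices))

prices = [30, 75, 49, 50, 12]   # hypothetical unit prices in pounds
units = [3, 1, 2, 5, 4]         # hypothetical units sold per row
revenue = [p * u for p, u in zip(prices, units)]

price_slices, price_bits = build_slices(prices)
selected = rows_less_than(price_slices, price_bits, len(prices), 50)

units_slices, _ = build_slices(units)
revenue_slices, _ = build_slices(revenue)
total_units = filtered_sum(units_slices, selected)
total_revenue = filtered_sum(revenue_slices, selected)
```

Both the range test and the aggregation touch only bitmaps, never the individual values, which is why this style of index suits calculations combined with range searches.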

A number of extended facilities are supported to allow the use of these indexes in a variety of circumstances. These include index compression to reduce disk (or memory: bitmaps may be cached) requirements, the ability to use different types of index in combination, and the facility to filter the bit arrays using Boolean operators such as AND and OR. These features mean that the indexing in Sybase IQ overcomes a number of the traditional drawbacks of bitmapping, namely, that it is not suitable for joining tables or aggregating data. It is also noteworthy that Sybase IQ includes an Index Advisor that will advise administrators as to when it would be useful to add a new index and of what type. In the latest release the Optimiser can present query plans in XML format.
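Tokenisation and Boolean filtering of bit arrays can be sketched together in a few lines of Python. This is an illustration of the general technique, not of Sybase IQ internals: the `TokenisedColumn` class, its methods and the sample data are all hypothetical, and Python integers stand in for bit arrays. Tokens are assigned in order of first appearance here rather than alphabetically.

```python
class TokenisedColumn:
    """Low-cardinality column stored as tokens plus one bitmap per distinct value."""

    def __init__(self, values):
        self.dictionary = {}   # value -> token
        self.tokens = []       # one small integer per row instead of the value
        self.bitmaps = {}      # token -> bitmap with bit r set if row r holds it
        for row, value in enumerate(values):
            token = self.dictionary.setdefault(value, len(self.dictionary) + 1)
            self.tokens.append(token)
            self.bitmaps[token] = self.bitmaps.get(token, 0) | (1 << row)

    def rows_equal(self, value):
        """Bitmap of rows holding `value` (0 if the value never occurs)."""
        token = self.dictionary.get(value)
        return self.bitmaps.get(token, 0)

def row_numbers(bitmap):
    """Expand a bitmap into a list of row numbers."""
    return [r for r in range(bitmap.bit_length()) if (bitmap >> r) & 1]

county = TokenisedColumn(["Banffshire", "Avon", "Banffshire", "Ayrshire", "Avon"])
status = TokenisedColumn(["active", "active", "lapsed", "active", "lapsed"])

# county = 'Banffshire' AND status = 'active'
both = county.rows_equal("Banffshire") & status.rows_equal("active")
# county = 'Avon' OR county = 'Ayrshire'
either = county.rows_equal("Avon") | county.rows_equal("Ayrshire")
```

Combining predicates costs one bitwise operation per pair of bitmaps, regardless of how long the original values are.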

Database operations

Sybase IQ includes a SQL API that allows SQL-based access. This is SQL-99 compliant and is the same SQL that is used in Sybase Adaptive Server Anywhere and (with a few exceptions) is also compatible with the syntax employed in Sybase ASE (that is, T-SQL) so that Sybase IQ can natively use most ASE stored procedures. In this context it is also worth noting that both Sybase IQ and Sybase ASE have the same look and feel and that Sybase IQ includes support for the ASE bulk copy facility. Within the product, Sybase IQ includes a graphical SQL Editor.

Sybase IQ also supports both ODBC and JDBC (2.0) call-level interfaces. In addition, Sybase IQ provides Java 2 capability, and this language can be used for writing stored procedures and for creating user-defined functions. However, Java objects are not supported in the database.

There is also extensive support for XML, including the ability to store XML documents (thanks to an XML datatype) and query them, as well as the ability to export query results in XML format (with an embedded DTD). There is also support for SQL/XML, which is a draft standard describing how SQL can be used in conjunction with XML; Sybase IQ supports a number of functions from this standard, which can be used as an alternative to the FOR XML clause that Sybase IQ has added as a SELECT statement option in this release.

In conjunction with the above, it is important to appreciate the web services functionality that is available in Sybase IQ. There is an HTTP(S) web server built directly into the database, which supports the retrieval of data in XML format as well as standards such as SOAP. There is also direct integration with Microsoft Visual Studio .NET via an ADO.NET provider.

Historically, Sybase IQ has leveraged what has been known as EnterpriseConnect to provide interoperability between Sybase products and other popular database products, ranging from replication (actually via the Sybase Replication Server) to the ability to support combined queries that cross Sybase IQ and external databases, and to populate Sybase IQ databases in the first place. However, EnterpriseConnect has been replaced by the Sybase Data Integration Suite, which not only replaces but also extends the facilities currently offered. This is discussed later.


Multiplex

The Multiplex component of Sybase IQ adds the ability to support multiple SMP machine nodes within a single Sybase IQ environment. Within each node, the product runs each process on multiple lightweight operating system threads. This multithreading significantly reduces processing and memory overheads.

One node must be designated to own, manage and update the database, while all the other nodes have read-only access to the database. Since there is only one write instance, there is never any need to lock records, so there is no contention between the read-only instances. That said, Sybase plans to enhance the multiplex architecture to provide a more robust read/write grid architecture in the future. As a first step in this process the company has introduced concurrent write queuing with this release.

A further point to note about this architecture is the advantage that it offers to data aggregators and resellers: each subscriber can have its own reader, separate from everyone else’s, which has obvious security (as well as chargeback) benefits. The fact that you can choose, column by column, whether or not to encrypt data further reinforces this message. It is also worth noting that you can support different service levels for different users.

An example of a Multiplex environment is illustrated in Figure 2. Here, five nodes (there could be any number and you can add them as required) are connected to a single Sybase IQ physical database (again, there could be many of these, including mirrored options for 24x7 operations) by means of a fibre channel (though we would like to see twin fibre channels for resiliency purposes).

Figure 2: Example of a Multiplex environment

Should any node fail, including the updating node, you can switch users or responsibilities to another node. There are also hot standby capabilities, failover and extended versioning capabilities, and load-balancing capabilities across nodes. These capabilities are not automated but are under the DBA’s control, which allows the DBA to define dynamic resource allocation based upon business needs. In addition, there is an OpenSwitch load balancing application available, if required, that operates at the application server level. It is also worth noting the company’s partnerships with a number of storage hardware vendors to further ensure high availability and disaster recovery.

[Figure 2 labels: four IQ Reader nodes and one IQ Writer node, each shown with CPUs and memory, attached to a Fiber Channel SAN]

There is also a NonStopIQ HA-DR methodology, which replaces the disk shown in Figure 2 with two further SANs (typically, one local and one remote) with either synchronous or asynchronous communications between them. The big advantage of this is not just that it provides disaster recovery but also that it eliminates the need to take the system down even for planned outages. As more and more companies adopt operational BI and embed query capability into operational applications, the warehouse increasingly becomes as mission critical as those applications, and a solution such as this becomes necessary.

It is worth noting here that Sybase’s strategy in terms of parallelism is at odds with most of its competitors, and arguably determines the circumstances in which Sybase IQ will be most suitable. Most suppliers have focused their efforts, with respect to parallelism, primarily on improving the performance of individual queries. The basis for this is that if each individual query performs better, then this will also reap benefits in terms of the number of concurrent queries that can be supported. However, this is not necessarily valid. It is relatively easy, for example, to see how a parallel database might use data partitioning to optimise a particular query but, at the same time, cause deterioration in the performance of a second query.

Sybase’s stance, on the other hand, is that its Sybase IQ product is intrinsically designed for individual query optimisation. Thus it leaves its parallel facilities, as instanced by the Multiplex component, to focus on supporting multiple queries rather than enhancing individual query performance.
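
The partitioning trade-off described above can be sketched with a toy model. The Python example below is an illustration only and does not reflect how any particular parallel database or its optimiser works: the table, the region-based partitioning scheme and the `partitions_touched` helper are all assumptions made for the example. It partitions a small fact table by region and then reports which partitions hold matching rows for two different queries.

```python
# Hypothetical fact table partitioned by region; names and data are illustrative.
REGIONS = ["EU", "US", "APAC", "LATAM"]

def partition_for(region):
    """Place each region in its own partition (a stand-in for a hash function)."""
    return REGIONS.index(region)

# One row per (region, month) pair for a single year.
rows = [(region, month) for region in REGIONS for month in range(1, 13)]

partitions = {}
for region, month in rows:
    partitions.setdefault(partition_for(region), []).append((region, month))

def partitions_touched(predicate):
    """Return ids of partitions holding at least one row that matches the predicate."""
    return sorted(pid for pid, part in partitions.items()
                  if any(predicate(row) for row in part))

# Query filtering on the partitioning key: confined to a single partition.
by_region = partitions_touched(lambda row: row[0] == "EU")
# Query filtering on a different column: every partition has to take part.
by_month = partitions_touched(lambda row: row[1] == 6)

print(by_region)  # [0]
print(by_month)   # [0, 1, 2, 3]
```

A layout chosen to suit the first query gives the second no help at all, which is the kind of conflict between queries that the argument above points to.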

Sybase IQ supports in-flight maintenance operations (including column addition on the fly). During database maintenance, because of the separation between read and write nodes, a query user does not see any updates that take place during the current session; they become visible only when the user reconnects to the database in a subsequent session.
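
The session-level visibility described here can be pictured with a small versioning sketch. The Python below is a simplified model rather than Sybase IQ's actual mechanism; the `VersionedStore` and `ReaderSession` classes and the `total_orders` key are hypothetical names introduced for the example. A reader is pinned to the version that was current when it connected and sees later commits only after reconnecting.

```python
class VersionedStore:
    """Toy single-writer store that keeps every committed version of the data."""

    def __init__(self, initial):
        self._versions = [dict(initial)]

    def commit(self, changes):
        """Writer node: publish a new version; older versions stay readable."""
        new_version = dict(self._versions[-1])
        new_version.update(changes)
        self._versions.append(new_version)

    def connect(self):
        """Reader node: open a session pinned to the latest committed version."""
        return ReaderSession(self._versions[-1])


class ReaderSession:
    """Read-only view of the version that was current at connect time."""

    def __init__(self, snapshot):
        self._snapshot = snapshot

    def read(self, key):
        return self._snapshot.get(key)


store = VersionedStore({"total_orders": 100})
session = store.connect()            # reader connects before the update
store.commit({"total_orders": 125})  # writer publishes a new version

print(session.read("total_orders"))          # 100: the open session is unchanged
print(store.connect().read("total_orders"))  # 125: a new session sees the update
```

Because the writer never modifies a version that a reader holds, the reader needs no locks in this model.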

There are various ways to keep a Sybase IQ warehouse up-to-date. You can load in batch mode (though batches may be time-delimited so that you can define a batch as consisting of, say, 2 minutes’ worth of transactional data). Alternatively, you can use Sybase’s own ETL capabilities (or those of a third party) to load the data initially and then update it by means of the synchronisation capabilities provided by Sybase Replication Server, or you can use third-party change data capture facilities.
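
To show what a time-delimited batch might look like in practice, here is a minimal Python sketch that groups timestamped events into consecutive two-minute windows before loading. It is an assumption-laden illustration, not a Sybase loading utility: the `batch_by_window` function, the event tuples and the window length are all invented for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def batch_by_window(events, window=timedelta(minutes=2)):
    """Group (timestamp, payload) events into consecutive fixed-length windows.

    Windows are measured from the earliest event; empty windows are skipped.
    """
    if not events:
        return []
    ordered = sorted(events, key=lambda event: event[0])
    start = ordered[0][0]
    batches = defaultdict(list)
    for timestamp, payload in ordered:
        # Integer division of two timedeltas gives the window number.
        batches[(timestamp - start) // window].append(payload)
    return [batches[number] for number in sorted(batches)]

t0 = datetime(2008, 1, 1, 9, 0, 0)
events = [
    (t0, "a"),
    (t0 + timedelta(seconds=30), "b"),
    (t0 + timedelta(minutes=2, seconds=10), "c"),
    (t0 + timedelta(minutes=5), "d"),
]

print(batch_by_window(events))  # [['a', 'b'], ['c'], ['d']]
```

Each inner list would then be handed to the loader as one batch.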

Summary

We used to believe that Sybase IQ faced two hurdles: its column-based technology (which was not well understood by the market) and disbelief that it could genuinely offer the performance advantages it claims. However, we do not think that either of these is an obstacle any more. First, Sybase has proved the column-based concept with repeated success in many leading companies. Secondly, both Sybase and the appliance vendors have made it quite clear that better analytics performance is truly possible, whetting the appetites of companies that still struggle with traditional relational databases in query-intensive applications.

From Bloor Research’s viewpoint the column-based approach advocated by Sybase will provide substantially better performance at lower cost than traditional approaches for analytical, reporting and data warehousing environments. Furthermore, it provides considerably more flexibility than can be provided by appliance vendors. We therefore see no obstacles to its continued success. Indeed, we would like to see the product targeted more widely. We are pleased to see that the company is starting to address new markets but its stance remains one of targeting specific sub-sectors of the market rather than advocating its use as a general-purpose product for the entire data warehousing sector. We think it is capable of that and we would like to hear Sybase saying it.


Bloor Research overview

Bloor Research has spent the last decade developing what is recognised as Europe’s leading independent IT research organisation. With its core research activities underpinning a range of services, from research and consulting to events and publishing, Bloor Research is committed to turning knowledge into client value across all of its products and engagements. Our objectives are:

• Save clients’ time by providing comparison and analysis that is clear and succinct.

• Update clients’ expertise, enabling them to have a clear understanding of IT issues and facts and validate existing technology strategies.

• Bring an independent perspective, minimising the inherent risks of product selection and decision-making.

• Communicate our visionary perspective of the future of IT.

Founded in 1989, Bloor Research is one of the world’s leading IT research, analysis and consultancy organisations—distributing research and analysis to IT user and vendor organisations throughout the world via online subscriptions, tailored research services and consultancy projects.

About the author

Philip Howard
Research Director - Data

Philip started in the computer industry way back in 1973 and has variously worked as a systems analyst, programmer and salesperson, as well as in marketing and product management, for a variety of companies including GEC Marconi, GPT, Philips Data Systems, Raytheon and NCR.

After a quarter of a century of not being his own boss Philip set up what is now P3ST (Wordsmiths) Ltd in 1992 and his first client was Bloor Research (then ButlerBloor), with Philip working for the company as an associate analyst. His relationship with Bloor Research has continued since that time and he is now Research Director. His practice area encompasses anything to do with data and content and he has five further analysts working with him in this area. While maintaining an overview of the whole space Philip himself specialises in databases, data management, data integration, data quality, data federation, master data management, data governance and data warehousing. He also has an interest in event stream/complex event processing.

In addition to the numerous reports Philip has written on behalf of Bloor Research, he also contributes regularly to www.IT-Director.com and www.IT-Analysis.com and was previously the editor of both “Application Development News” and “Operating System News” on behalf of Cambridge Market Intelligence (CMI). He has also contributed to various magazines and has had a number of reports published by companies such as CMI and The Financial Times.

Away from work, Philip’s primary leisure activities are canal boats, skiing, playing Bridge (at which he is a Life Master) and walking the dog.


Copyright & disclaimer

This document is copyright © 2007 Bloor Research. No part of this publication may be reproduced by any method whatsoever without the prior consent of Bloor Research.

Due to the nature of this material, numerous hardware and software products have been mentioned by name. In the majority, if not all, of the cases, these product names are claimed as trademarks by the companies that manufacture the products. It is not Bloor Research’s intent to claim these names or trademarks as our own. Likewise, company logos, graphics or screen shots have been reproduced with the consent of the owner and are subject to that owner’s copyright.

Whilst every care has been taken in the preparation of this document to ensure that the information is correct, the publishers cannot accept responsibility for any errors or omissions.


Suite 4, Town Hall, 86 Watling Street East

TOWCESTER, Northamptonshire,

NN12 6BS, United Kingdom

Tel: +44 (0)870 345 9911 Fax: +44 (0)870 345 9922

Web: www.bloor-research.com email: [email protected]