

NPS SQL Extensions Toolkit User’s Guide
Document Number: D20484 Rev. 1
Software Release: 4.5.2
Revised: January 30, 2009

Netezza Corporation

Corporate Headquarters
26 Forest St., Marlborough, Massachusetts 01752
tel 508.382.8200 fax 508.382.8300
www.netezza.com


The specifications and information regarding the products described in this manual are subject to change without notice. All statements, information, and recommendations in this manual are believed to be accurate.

Netezza makes no representations or warranties of any kind, express or implied, including, without limitation, those of merchantability, fitness for a particular purpose, and noninfringement, regarding this manual or the products' use or performance. In no event will Netezza be liable for indirect, incidental, consequential, special, or economic damages (including lost business profits, business interruption, loss or damage of data, and the like) arising out of the use or inability to use this manual or the products, regardless of the form of action, whether in contract, tort (including negligence), breach of warranty, or otherwise, even if Netezza has been advised of the possibility of such damages.

Copyright © 2005-2009 Intelligent Integration Systems, Inc.

Portions of this publication were derived from PostgreSQL documentation. For those portions of the documentation that were derived originally from PostgreSQL documentation, and only for those portions, the following applies:

PostgreSQL is copyright © 1996-2001 by the PostgreSQL Global Development Group and is distributed under the terms of the license of the University of California below.

Postgres95 is copyright © 1994-5 by the Regents of the University of California.

Permission to use, copy, modify, and distribute this documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.

In no event shall the University of California be liable to any party for direct, indirect, special, incidental, or consequential damages, including lost profits, arising out of the use of this documentation, even if the University of California has been advised of the possibility of such damage.

The University of California specifically disclaims any warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The documentation provided hereunder is on an "as-is" basis, and the University of California has no obligations to provide maintenance, support, updates, enhancements, or modifications.

Netezza, the Netezza logo, NPS, Snippet, Snippet Processing Unit, SPU, Snippet Processing Array, SPA, Performance Server, Netezza Performance Server, Asymmetric Massively Parallel Processing, AMPP, Intelligent Query Streaming, SQL-Blast and other marks are trademarks or registered trademarks of Netezza Corporation in the United States and/or other countries. All rights reserved.

The Netezza implementation of the ODBC driver is an adaptation of an open source driver, Copyright © 2000, 2001, Great Bridge LLC. The source code for this driver and the object code of any Netezza software that links with it are available upon request to [email protected].

Red Hat is a trademark or registered trademark of Red Hat, Inc. in the United States and/or other countries.

Linux is a trademark or registered trademark of Linus Torvalds in the United States and/or other countries.

D-CC, D-C++, Diab+, FastJ, pSOS+, SingleStep, Tornado, VxWorks, Wind River, and the Wind River logo are trademarks, registered trademarks, or service marks of Wind River Systems, Inc. Tornado patent pending.

APC and the APC logo are trademarks or registered trademarks of American Power Conversion Corporation.

All document files and software of the above named third-party suppliers are provided "as is" and may contain deficiencies. Netezza and its suppliers disclaim all warranties of any kind, express or implied, including, without limitation, those of merchantability, fitness for a particular purpose, and noninfringement.

In no event will Netezza or its suppliers be liable for indirect, incidental, consequential, special, or economic damages (including lost business profits, business interruption, loss or damage of data, and the like), or the use or inability to use the above-named third-party products, even if Netezza or its suppliers have been advised of the possibility of such damages.

All other trademarks mentioned in this document are the property of their respective owners.

Document Number: 20484

Software Release Number: 4.5.2

NPS SQL Extensions Toolkit User’s Guide

Copyright © 2009 Netezza Corporation.

All rights reserved.

Regulatory Notices

Install the NPS 8000 Series in a restricted-access location. Ensure that only those trained to operate or service the equipment have physical access to it.

Install each AC power outlet near the NPS rack that plugs into it, and keep it freely accessible.

You must provide all disconnect devices and over-current protection devices.

Product may be powered by redundant power sources. Disconnect ALL power sources before servicing.

FCC Statement

This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to part 15 of the FCC rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio-frequency energy and, if not installed and used in accordance with the instruction manual, may cause harmful interference to radio communications. Operation of this equipment in a residential area is likely to cause harmful interference, in which case users will be required to correct the interference at their own expense.

CSA Statement

This Class A digital apparatus meets all requirements of the Canadian Interference-Causing Equipment Regulations (ICES-003).

Cet appareil numérique de la classe A est conforme à la norme NMB-003 du Canada.

CE Statement (Europe)

This product complies with the European Low Voltage Directive 73/23/EEC and EMC Directive 89/336/EEC as amended by European Directive 93/68/EEC.

Warning: This is a class A product. In a domestic environment this product may cause radio interference in which case the user may be required to take adequate measures.


Contents

Preface

1 Installation and Setup
  Licensing Information
  NPS Administration Information
    NPS System Prerequisites
    Installing the Netezza SQL Extensions Toolkit
    Enabling SQL Functions Support in a Database
    User Account Permissions and Requirements
    Displaying the SQL Extensions Toolkit Version
    Upgrading the SQL Extensions Toolkit
    Disabling the SQL Extensions Toolkit in a Database
    Removing the SQL Extensions Toolkit
    Using Different Versions of the SQL Extensions Toolkit
    Best Practices for Upgrading NPS Systems with the SQL Extensions Toolkit
    Best Practices for Backups and Restores of the NPS Data
  Known Issues

2 XML Data
  User Type XML
  Referencing Columns
  Getting Started: Publishing SQL Data as XML
  Using XPath Expressions
  XML Function Reference
    IsValidXML
    IsXML
    XMLAGG
    XMLAttributes
    XMLConcat
    XMLElement
    XMLExistsNode
    XMLExtract
    XMLExtractValue
    XMLParse
    XMLRoot
    XMLSerialize
    XMLUpdate

3 Data Transformation
  Data Transformation Function Reference
    compress
    decompress
    encrypt/decrypt
    uuencode
    uudecode

4 Hashing
  Hash Function Reference
    hash
    hash4
    hash8

5 Date and Time Comparisons
  Date and Time Function Reference
    day
    days_between
    hour
    hours_between
    minute
    minutes_between
    month
    next_month
    next_quarter
    next_year
    second
    seconds_between
    this_month
    this_quarter
    this_week
    this_year
    weeks_between
    year

6 Text Analytics
  Word Comparison Function Reference
    word_diff
    word_find
    word_key
    word_key_tochar
    word_keys_diff
    word_stem
  Regular Expression Function Reference
    The Flags Argument
    regexp_extract
    regexp_extract_all
    regexp_extract_all_sp
    regexp_extract_sp
    regexp_instr
    regexp_like
    regexp_match_count
    regexp_replace
    regexp_replace_sp

7 Text Utility
  Text Utility Function Reference
    hextoraw
    rawtohex
    replace
    strleft
    strright

8 Array
  Array Function Reference
    add_element
    array
    array_combine
    array_concat
    array_count
    array_split
    array_type
    delete_element
    element_name
    get_value_type
    replace_element

9 Collection
  User Type Collection
  Collection Function Reference
    collection
    element_type

10 Miscellaneous
  Miscellaneous Function Reference
    greatest
    least
    mt_random
    corr
    covar_pop
    covar_samp

Index


List of Tables

Table 1-1: Known Issues
Table 3-1: Uuencoding, Part I
Table 3-2: Uuencoding, Part II
Table 4-1: Algorithms Supported for Cryptographic Hashing
Table 6-1: Algorithms Supported for Phonetic Encoding
Table 6-2: Flags used in Regular Expressions Functions
Table 8-1: Array Types


Preface

This document describes the SQL Extensions Toolkit for the Netezza platform. The Netezza SQL Extensions Toolkit was developed by NDN innovator Intelligent Integration Systems, Inc.

Audience

This guide is intended for users who need the additional capabilities of the SQL Extensions functions, which let them manipulate SQL data in more sophisticated ways. Users should be familiar with the basic operation and concepts of the NPS system. Users should also be familiar with C-style function declarations, because the API in this document is described with C-style declarations rather than SQL-style declarations.

About This Guide

The guide contains the following chapters.

System prerequisites, installation, version information, upgrading, disabling, and removing the toolkit, using different toolkit versions, backups, and restores. See “Installation and Setup” on page 1-1.

Importing and storing XML data in a SQL database, manipulating XML within the database, and publishing both XML and conventional SQL data in XML form. See “XML Data” on page 2-1.

Transforming data by compressing, encrypting, or uuencoding, and restoring it to its original form using decompress, decrypt, and uudecode. See “Data Transformation” on page 3-1.

Using hash functions for cryptography, checksums, and lookups. See “Hashing” on page 4-1.

Using date and time functions to compare values of type date or of type timestamp. See “Date and Time Comparisons” on page 5-1.

Performing “fuzzy” comparisons (approximately matching a search key) and using regular expressions to match precise patterns of characters. See “Text Analytics” on page 6-1.

Converting between ASCII hexadecimal and ASCII, substituting strings, and extracting substrings. See “Text Utility” on page 7-1.

Creating, combining, and splitting arrays, and retrieving, deleting, replacing, and counting array elements. See “Array” on page 8-1.

Grouping heterogeneous pieces of data, that is, data of different types. See “Collection” on page 9-1.

Determining the greatest or least value, the correlation coefficient, and covariance, and generating random numbers. See “Miscellaneous” on page 10-1.

Symbols and Conventions

This guide uses the following typographical conventions:

Numbered steps for procedures

Bulleted lists for topics

Italics for terms and for user-defined variables such as file names

Bold for command-line input and system output examples

If You Need Help

If you are having trouble using the Netezza Performance Server, do the following:

1. Retry the action, carefully following the instructions given for that task in the documentation.

2. Go to the Netezza Support Web page at https://support.netezza.com. Select “Login to Customer Support Center” and enter your support username and password. Click the Knowledge tab to search the knowledge base solutions, or click the Service Desk tab to submit a support request.

3. If you are unable to access the Support Web site, contact Netezza Support at one of the following telephone numbers:

North American Toll-Free: +1.877.810.4441

United Kingdom Free-Phone: +0.800.032.8382

International Direct: +1.508.620.2281

For a description of the Netezza Support plans, refer to http://www.netezza.com/support/offerings.cfm. Refer to your Netezza maintenance agreement for details about your support plan choices and coverage.

Netezza Welcomes Your Comments

Let us know what you like or dislike about our manuals. To help us with future versions of our manuals, we want to know about any corrections or clarifications that you would find useful.

Include the following information:

The name and version of the manual that you are using

Any comments that you have about the manual

Your name, address, and phone number

Send us an e-mail message at the following address: [email protected]

The doc alias is reserved exclusively for reporting errors and omissions in our documentation.

We appreciate your suggestions.


CHAPTER 1

Installation and Setup

What’s in this chapter

Licensing Information

NPS Administration Information

Known Issues

The Netezza SQL Extensions Toolkit is an optional package for Netezza Performance Server (NPS) systems. This toolkit was developed by NDN innovator Intelligent Integration Systems, Inc.

This chapter provides information on installing and configuring the Netezza SQL Extensions Toolkit on an NPS system, as well as special information for managing backups and upgrades.

Licensing Information

Netezza customers can obtain the toolkit from the Releases area of the Netezza FTP server. The software kit consists of two files, libnetcrypto-version.tar.gz and libnetxml-version.tar.gz, where version is the currently released version of the software kit. The kit contains a readme file, libraries, the object files for the functions, and scripts that simplify defining and using the toolkit functions in an NPS database, as well as disabling and removing them.

NPS Administration Information

This section describes the system prerequisites and administration information for the Netezza SQL Extensions Toolkit.

NPS System Prerequisites

The Netezza SQL Extensions Toolkit is designed for use on NPS systems that run NPS Release 4.5.2 or later.


Installing the Netezza SQL Extensions Toolkit

To install the Netezza SQL Extensions Toolkit, do the following:

1. Log in to the NPS system as the root user.

2. Copy the sqlext.package.tar.z file to a directory on the NPS system such as /home/nz or another location. (You obtain the package from the Netezza FTP site.)

3. Untar the package using the following command:

tar -xzvpf sqlext.package.tar.z

The command extracts two files, libnetcrypto-version.tar.gz and libnetxml-version.tar.gz.

4. Extract the software files and compiled objects in the libnetcrypto-version.tar.gz file:

tar -xzf libnetcrypto-version.tar.gz

The tar command uncompresses and untars the contents to a directory named libnetcrypto/version in the current directory, where version is the version number of the SQL Extensions Toolkit.

5. Extract the software files and compiled objects in the libnetxml-version.tar.gz file:

tar -xzf libnetxml-version.tar.gz

The tar command uncompresses and untars the contents to a directory named libnetxml/version in the current directory, where version is the version number of the SQL Extensions Toolkit.
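Steps 3 through 5 can also be scripted. The following is a minimal sketch, not part of the toolkit. The function name extract_sqlext is hypothetical. The sketch assumes GNU tar, for the -C option, and assumes the inner archives follow the libnetcrypto-version.tar.gz and libnetxml-version.tar.gz naming shown above.

```shell
#!/bin/sh
# Hypothetical sketch of steps 3-5: unpack the outer package, then both inner kits.
# Assumes GNU tar and inner archives named libnetcrypto-<version>.tar.gz and
# libnetxml-<version>.tar.gz.

extract_sqlext() {
    pkg=$1    # path to sqlext.package.tar.z
    dest=$2   # directory to unpack into, for example /home/nz

    mkdir -p "$dest" || return 1

    # Step 3: the outer package yields the two inner .tar.gz kits.
    tar -xzpf "$pkg" -C "$dest" || return 1

    # Steps 4 and 5: each inner kit unpacks to <dest>/<library>/<version>.
    for kit in "$dest"/libnetcrypto-*.tar.gz "$dest"/libnetxml-*.tar.gz; do
        # An unmatched glob stays literal, so a missing kit is caught here.
        if [ ! -f "$kit" ]; then
            echo "extract_sqlext: missing kit $kit" >&2
            return 1
        fi
        tar -xzf "$kit" -C "$dest" || return 1
    done
}
```

For example, extract_sqlext /home/nz/sqlext.package.tar.z /home/nz should leave the kits in /home/nz/libnetcrypto/version and /home/nz/libnetxml/version, ready for the enabling procedure.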

Enabling SQL Functions Support in a Database

After you untar the SQL Extensions Toolkit files, you can enable SQL Extensions query support by registering the SQL Extensions functions and API.

To enable SQL Extensions queries, do the following:

1. Log in to the NPS system as the nz user account.

2. Change to the directory that contains the first part of the SQL Extensions library files (the crypto library), where dir is the directory in which you untarred the files:

cd <dir>/libnetcrypto/version

3. Run the following command, specifying the database in which to define the SQL Extensions functions and the NPS user account (and its password) that will own the functions:

./install -d <dbname> -u <username> -W <password>

The command could take up to one minute to run. Upon completion, the command displays the message Successfully Installed Crypto Library to <dbname>.

Note: If your database name uses spaces or mixed-case letters such as “myDatabase”, make sure that you specify double-quotation marks around the database name and escape the quotes. For example:

./install -d \"myDatabase\" -u user -W password


4. Change to the directory that contains the second part of the SQL Extensions library files (the XML library), where dir is the directory in which you untarred the files:

cd <dir>/libnetxml/version

5. Run the following command, specifying the database in which to define the SQL Extensions functions and the NPS user account (and its password) that will own the functions:

./install -d <dbname> -u <username> -W <password>

The command could take up to one minute to run. Upon completion, the command displays the message Successfully Installed XML Library to <dbname>.

These commands define the SQL Extensions functions and register them in the specified database. The NPS user account you specify becomes the owner of the functions. After this procedure, NPS administrators can manage the SQL Extensions functions as objects in the NPS database, and users who have permission to use the SQL Extensions functions can include them in queries.
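When you enable the toolkit in several databases, you can wrap the two install runs and the quoting rule from the note above in shell functions. This is a hypothetical sketch: sqlext_dbarg and install_sqlext are illustrative names, not toolkit scripts. The quoting rule follows the note: names with spaces or mixed-case letters are wrapped in literal double quotes, and other names pass through unchanged.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the ./install runs from the procedure above.

# Print the -d argument for ./install. Names containing whitespace, or both
# upper- and lowercase letters, get literal double quotes, which is what
# typing \"myDatabase\" on the command line produces.
sqlext_dbarg() {
    name=$1
    quote=no
    case $name in
        *[[:space:]]*) quote=yes ;;
        *[[:upper:]]*)
            case $name in
                *[[:lower:]]*) quote=yes ;;
            esac ;;
    esac
    if [ "$quote" = yes ]; then
        printf '"%s"\n' "$name"
    else
        printf '%s\n' "$name"
    fi
}

# Register both libraries in one database: the crypto library first
# (steps 2-3), then the XML library (steps 4-5). Stops at the first failure.
install_sqlext() {
    root=$1 version=$2 db=$3 user=$4 pass=$5
    dbarg=$(sqlext_dbarg "$db")
    for lib in libnetcrypto libnetxml; do
        # Subshell, so the caller's working directory is unchanged.
        ( cd "$root/$lib/$version" && ./install -d "$dbarg" -u "$user" -W "$pass" ) || return 1
    done
}
```

For example, install_sqlext /home/nz "$version" myDatabase admin "$password" would run both install scripts against "myDatabase". Substitute the version directory that your kit actually created.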

Figure 1-1 shows a sample NzAdmin window for an NPS system that has the SQL Extensions Toolkit.

Figure 1-1: NzAdmin Interface with the Netezza SQL Extensions Toolkit Functions


User Account Permissions and Requirements

To run a SQL Extensions query, an NPS user account must have execute permission for function and aggregate objects, as well as for the toolkit functions and aggregates that are added to the system.

Users who need to modify the functions (for example, to replace the object files with new ones) must also have create and alter permissions for function and aggregate objects.

Displaying the SQL Extensions Toolkit Version

To display the version of the XML functions available in the SQL Extensions toolkit, use the following SQL command:

SELECT regexp_Version();

Sample output follows:

 REGEXP_VERSION
------------------------------------------------
 IISI XML/Regular Expression Library Version 1.2 Build ()
(1 row)

To display the version of the rest of the functions available in the SQL Extensions toolkit, use the following SQL command:

SELECT CRYPTO_VERSION();

Sample output follows:

 CRYPTO_VERSION
------------------------------------------------
 IISI CRYPTO Library Version 1.2 Build ()
(1 row)
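You can also check these version strings from a script, for example to confirm that several databases run the same toolkit build. The sketch below only parses a result line. The function name is hypothetical. The commented nzsql invocation assumes that nzsql accepts the psql-style -d, -t, -A, and -c options.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric version from a REGEXP_VERSION() or
# CRYPTO_VERSION() result line read on standard input; prints nothing for
# lines that do not contain "Library Version <number>".
sqlext_version_number() {
    sed -n 's/.*Library Version \([0-9][0-9.]*\).*/\1/p'
}

# Assumed usage against a live system (nzsql option behavior is an assumption):
#   nzsql -d mydb -t -A -c 'SELECT CRYPTO_VERSION();' | sqlext_version_number
#   nzsql -d mydb -t -A -c 'SELECT REGEXP_VERSION();' | sqlext_version_number
```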

Upgrading the SQL Extensions Toolkit

Update kits or upgrades of the SQL Extensions toolkit may be made available with fixes or enhancements to the functionality. When such kits become available, they will contain instructions for updating or upgrading to the latest software API.

Disabling the SQL Extensions Toolkit in a Database

You can disable the SQL Extensions functions either temporarily (during testing or troubleshooting) or permanently (for example, before removing the package).

To disable support for SQL Extensions queries in a particular database, follow these steps:

1. Log in to the NPS system as the nz user account.

2. Change to the installation location of the XML functions in the toolkit, for example:

cd <install-dir>/libnetxml/version

3. Run the following command and specify the database name, NPS user name, and password for your system:

./install -R -d <dbname> -u <username> -W <password>

The command displays the message Successfully Uninstalled XML Library from <dbname> when it completes.


4. Change to the installation location of the rest of the functions in the toolkit, for example:

cd <install-dir>/libnetcrypto/version

5. Run the following command and specify the database name, NPS user name, and password for your system:

./install -R -d <dbname> -u <username> -W <password>

The command displays the message Successfully Uninstalled Crypto Library from <dbname> when it completes.

6. Repeat Steps 2-5 for each database in which you want to disable the SQL Extensions query support.

The install -R command uses the DROP FUNCTION and DROP AGGREGATE commands to drop the SQL Extensions functions and aggregates that the install script added.
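If you disable the toolkit in many databases, you may want to script Steps 2 through 5. The following Python sketch only assembles the two install -R commands per database described above and, by default, prints them with the password masked instead of running them. The install directory, the version directory name, and the database, user, and password values are placeholders you must replace; the script is an illustration, not a Netezza-supplied tool.

```python
import subprocess
from pathlib import PurePosixPath

# The two package directories named in Steps 2 and 4, in the order the steps use them.
LIBRARIES = ("libnetxml", "libnetcrypto")


def uninstall_commands(install_dir, version, databases, user, password):
    """Return (working directory, argv) pairs for disabling the toolkit in each database."""
    commands = []
    for db in databases:
        for lib in LIBRARIES:
            workdir = PurePosixPath(install_dir) / lib / version
            argv = ["./install", "-R", "-d", db, "-u", user, "-W", password]
            commands.append((workdir, argv))
    return commands


def describe(workdir, argv):
    """Render one step as a shell line, hiding the value that follows -W."""
    shown = list(argv)
    if "-W" in shown:
        shown[shown.index("-W") + 1] = "****"
    return f"cd {workdir} && {' '.join(shown)}"


def run(commands, dry_run=True):
    for workdir, argv in commands:
        if dry_run:
            print(describe(workdir, argv))
        else:
            # check=True stops at the first failing command instead of continuing silently.
            subprocess.run(argv, cwd=str(workdir), check=True)


if __name__ == "__main__":
    # Placeholder values: substitute your own install directory, version, and databases.
    run(uninstall_commands("/path/to/install-dir", "version",
                           ["salesdb", "testdb"], "admin", "secret"))
```

Keeping dry_run as the default lets you review the exact command sequence before anything is dropped.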

Removing the SQL Extensions Toolkit

To remove or uninstall the SQL Extensions toolkit from an NPS system, first follow the procedure in the previous section to disable SQL Extensions support in each database where it is currently enabled. After you disable support for the SQL Extensions functions, you can remove all of the files in the libnetcrypto/version directory and the libnetxml/version directory.

Using Different Versions of the SQL Extensions Toolkit

Since you install the toolkit to a specific database on the NPS system, it is possible to unpack a new or different version of the kit, install it in a different database, and thus use different versions of the API simultaneously on the NPS system. However, this is not a recommended practice for long-term use.

If you install a newer version of the toolkit to a different database, such as a test database for testing and comparison purposes, you should eventually update your production databases with the latest toolkit.

Best Practices for Upgrading NPS Systems with the SQL Extensions Toolkit

After you install the Netezza SQL Extensions Toolkit, take special precautions before you patch or upgrade the NPS software on your system. While most patch and service pack updates should not affect the operation of the toolkit functions, it is possible that an upgrade could stop the functions from working. For example, an upgrade from one major release to another could require you to obtain a new toolkit installation package with new function object files.

Before you upgrade the NPS software on your system, consult Netezza Support to confirm that the planned upgrade will not affect your toolkit functions. The NPS Release Notes or the service pack readme file identifies any known situations where an update or upgrade can impact the functions.


Best Practices for Backups and Restores of the NPS Data

As a best practice, keep a backup copy of the toolkit installation files in a safe location outside of the NPS system. Make sure that you have recent backups of your NPS systems in the event that you need to recover from an accidental change to your data, or to restore NPS services as part of a disaster recovery situation.

There are no special requirements or procedures needed to back up the SQL Extensions functions. After you register the toolkit functions on an NPS system, they and their associated object source files are backed up during the normal Netezza nzbackup operations. If you alter a function or an aggregate (perhaps as a result of a new object file with fixes), the next incremental backup also captures the new object files.

For a schema-only restore, you can use the nzrestore -allincs argument, which restores the object files from all available backup increments so that any referenced functions will be created and executable following the restore. If you attempt a -schema-only restore on an increment which does not have function object files (because they have not been altered during this time), the restore process creates zero-length placeholder object files for those functions and logs the signatures of the incomplete functions in the restoresvr log file. The resulting functions are defined in the database, but they cannot be executed because their object files have not been restored. You must use CREATE OR REPLACE commands to update the functions or aggregates with their necessary object files.

Known Issues

This release of the Netezza SQL Extensions Toolkit has the following known issues:

Table 1-1: Known Issues

Reference Issue Description

44849 XMLAgg() can only aggregate VARCHAR columns, not CHAR columns. For example, if emp.name is defined as CHAR(12), the following SELECT will return an error:

SELECT XMLElement ('emp', XMLAgg (XMLElement ('name', name))) from emp;

ERROR: 0 : XML: Corrupted XML Block

The workaround is to use rtrim() on the CHAR column, for example:

SELECT XMLElement ('emp', XMLAgg (XMLElement ('name', rtrim (name)))) from emp;

44894 Only arrays of type varchar support replacing elements by name. For example, given an array of integers, attempting to replace the array element named ‘one’ with the integer 22 returns an error:

SELECT replace_element(myarray,'one',22);

ERROR: 16 : Expected string argument

The workaround is to replace the element by index instead. For example:

SELECT replace_element(myarray,1,22);

44384 Arrays of type timetz are not supported.


CHAPTER 2

XML Data

What’s in this chapter

• User Type XML
• Referencing Columns
• Getting Started: Publishing SQL Data as XML
• Using XPath Expressions
• XML Function Reference

“One of the most intriguing and urgent requirements to arise from the appearance of XML is a well-defined relationship between XML and SQL. Vast quantities of business data are currently stored in SQL database systems and great demand exists for the ability to present that data in XML form to various client applications.” (Special Interest Group on Management of Data, ACM)

The XML functions provided by Netezza as extensions to the SQL language are modeled after the SQL/XML specification contained in SQL-2003. The SQL/XML specification defines ways of importing and storing XML data in a SQL database, manipulating it within the database, and publishing both XML and conventional SQL data in XML form.

Publishing conventional SQL data in XML form enables you to transform the flat (non-hierarchical) result sets of SQL queries into hierarchically structured XML data; one important use of this transformation is to make this data available via web services. The functions used to publish SQL data in XML format are XMLRoot, XMLElement, XMLConcat, XMLAgg, and XMLAttributes.

Data that is already stored in the database as XML can be queried, manipulated, and updated using functions such as XMLExistsNode, XMLExtract, XMLExtractValue, and XMLUpdate. Because XML data consists of a tree of nodes, these functions rely on W3C XPath expressions to locate individual XML nodes within the tree.

Note: Certain features of the SQL 2003 SQL/XML specification, including the ability to pass column names into functions and the ability to construct sets, are not supported by Netezza user-defined functions (UDFs).

For more information on industry standards for SQL extensions, refer to ISO/IEC 9075-14.


User Type XML

The XML functions in the Netezza SQL Extensions Toolkit rely on the XML data type as defined in the SQL 2003 SQL/XML specification. Because the Netezza database currently does not support user-defined types, the XML type is stored in a varchar field. The maximum size of a varchar field is 64000 bytes.

The XML type is a compiled representation of an XML file, usable wherever a SQL data type is allowed. The semantics of operations on values of XML type assume a tree-based internal representation. An XML value is either the null value, or a collection of nodes that consists of exactly one XML root node and every node that can be reached recursively by traversing the properties of the nodes.

Referencing Columns

The SQL/XML specification supports the ability to pass column names directly into functions. Netezza user-defined functions (UDFs) do not support this ability. Therefore, element names must be explicitly specified as additional parameters, as in the following example:

SELECT XMLElement('Employee', XMLAttributes('EID', a.id), a.name) from employees a;

Getting Started: Publishing SQL Data as XML

This section explains how to use the XMLElement, XMLConcat, XMLAgg, and XMLAttributes functions within a SQL expression to transform the results of a database query into XML. These are often referred to as “publishing” functions because the goal is to convert data stored in a relational database into XML that can be made available to other applications, for example web services. The main function in this regard is XMLElement, which takes two arguments: the name of the XML element to create and the content of that element. The following select statement (which does not query any table) highlights the use of XMLElement:

select XMLElement('Parent', 'Parent Text');

This creates the following XML:

<Parent>Parent Text</Parent>

Note that the output from the XMLElement function is a value of type XML, which is the Netezza compiled representation of the XML element. If you run the preceding select statement, the result shows only the type name XML:

XMLELEMENT
-----------
XML
(1 row)

In order to see the actual XML element created by the XMLElement call (<Parent>Parent Text</Parent>), you need to wrap the XMLElement call with XMLSerialize. For example:

select XMLSerialize(XMLElement('Parent', 'Parent Text'));
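To see why the wrapping is needed, it can help to compare the same two steps in a general-purpose XML library. The following Python sketch uses the standard xml.etree.ElementTree module as a stand-in: xml_element and xml_serialize are hypothetical helpers that loosely mimic XMLElement and XMLSerialize and are not part of the toolkit. The element object plays the role of the compiled XML value, and only the serialization step produces readable text.

```python
import xml.etree.ElementTree as ET


def xml_element(name, content):
    """Loose analog of XMLElement(name, value): content is text or a nested element."""
    elem = ET.Element(name)
    if isinstance(content, ET.Element):
        elem.append(content)  # nesting builds the hierarchy
    else:
        elem.text = str(content)
    return elem


def xml_serialize(elem):
    """Loose analog of XMLSerialize: turn the in-memory tree back into a string."""
    return ET.tostring(elem, encoding="unicode")


doc = xml_element("Parent", xml_element("Child", "Child text"))
print(type(doc).__name__)   # an Element object, not text
print(xml_serialize(doc))   # <Parent><Child>Child text</Child></Parent>
```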


The real power of XMLElement is that the function calls can be nested to produce the hierarchical structure required for XML data. For example:

select
  XMLElement('Parent',
    XMLElement('Child', 'Child text'));

This query produces the following XML:

<Parent>
  <Child>Child text</Child>
</Parent>

The publishing functions can be nested as required, up to a limit of 10,000 nested calls. For example:

select
  XMLElement('Parent',
    XMLElement('Child',
      XMLElement('GrandChild', 'Grandchild text')));

This query produces the following XML:

<Parent>
  <Child>
    <GrandChild>Grandchild text</GrandChild>
  </Child>
</Parent>

As a more realistic example, suppose there is a DEPARTMENTS table that contains three columns: DEPTNO, DEPTNAME, and DEPTLOC:

DEPTNO DEPTNAME    DEPTLOC
------ ----------- ---------
10     MARKETING   BOSTON
20     HR          BOSTON
30     SALES       NEW YORK
40     ENGINEERING NEW YORK

A plain SQL query to list all departments would look like the following:

select * from departments;

But suppose you needed to return all four rows of department data as XML, with one <Dept> node for each department, and each <Dept> node containing three child nodes, <Number>, <Name>, and <Location>, as shown in the following XML document:

<Departments>
  <Dept>
    <Number>10</Number>
    <Name>MARKETING</Name>
    <Location>BOSTON</Location>
  </Dept>
  <Dept>
    <Number>20</Number>
    <Name>HR</Name>
    <Location>BOSTON</Location>
  </Dept>
  <Dept>
    <Number>30</Number>
    <Name>SALES</Name>
    <Location>NEW YORK</Location>
  </Dept>
  <Dept>
    <Number>40</Number>
    <Name>ENGINEERING</Name>
    <Location>NEW YORK</Location>
  </Dept>
</Departments>

To create this XML document, you would use a SELECT statement modeled after the following:

SELECT
  XMLElement('Departments', XMLAGG(
    XMLElement('Dept', XMLConcat(
      XMLElement('Number', d.deptno),
      XMLElement('Name', d.deptname),
      XMLElement('Location', d.deptloc)))))
from departments d;

In each of the first two XMLElement calls, the content of the element is created by a nested XML function call. To create a hierarchically structured XML document of parent and child nodes, you nest the XMLElement calls within a SQL statement.

So the first XMLElement function in the query creates the top-level <Departments> node:

XMLElement('Departments', XMLAgg (

The XMLAgg call is used for the second argument, indicating that the content for the top-level <Departments> node is a group of aggregated nodes, which means these nodes will be child nodes of a single parent node.

The second XMLElement call establishes <Dept> as the name of each child node of the <Departments> parent node, and then relies on the next three embedded XMLElement calls for the contents of each <Dept> child node:

XMLElement('Dept', XMLConcat(
  XMLElement('Number', d.deptno),
  XMLElement('Name', d.deptname),
  XMLElement('Location', d.deptloc)))))


Together, the Dept XMLElement call and its three embedded XMLElement calls create one <Dept> child node for each row returned from the departments table. It is important to understand the role of the XMLAGG function. This function aggregates child nodes under their parent node, which in the preceding example means that there is a single parent <Departments> node that contains all four <Dept> nodes. Without the XMLAGG call, the query would produce four <Departments> nodes, each containing a single <Dept> node, which together do not form a valid XML document, as shown here:

<Departments>
  <Dept>
    <Number>10</Number>
    <Name>MARKETING</Name>
    <Location>BOSTON</Location>
  </Dept>
</Departments>
<Departments>
  <Dept>
    <Number>20</Number>
    <Name>HR</Name>
    <Location>BOSTON</Location>
  </Dept>
</Departments>
<Departments>
  <Dept>
    <Number>30</Number>
    <Name>SALES</Name>
    <Location>NEW YORK</Location>
  </Dept>
</Departments>
<Departments>
  <Dept>
    <Number>40</Number>
    <Name>ENGINEERING</Name>
    <Location>NEW YORK</Location>
  </Dept>
</Departments>

This is not valid XML syntax because there are four instances of the <Departments> document element, and an XML document can have only one. This demonstrates why you should use the IsValidXML function to confirm that the XML you create with the function library can be parsed as XML. Furthermore, if you are using schemas, then you are also responsible for returning valid XML (XML that conforms to the structure specified by the schema).
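The difference between the aggregated and non-aggregated forms can be reproduced outside NPS. In the following Python sketch, which uses the standard xml.etree.ElementTree module and the sample DEPARTMENTS rows shown earlier, with_agg places every <Dept> node under one <Departments> parent (the role XMLAGG plays), and without_agg emits one <Departments> wrapper per row. Only the aggregated string parses as a single XML document. The helper names are illustrative, not toolkit functions.

```python
import xml.etree.ElementTree as ET

# Sample rows from the DEPARTMENTS table shown earlier: (DEPTNO, DEPTNAME, DEPTLOC).
DEPARTMENTS = [
    (10, "MARKETING", "BOSTON"),
    (20, "HR", "BOSTON"),
    (30, "SALES", "NEW YORK"),
    (40, "ENGINEERING", "NEW YORK"),
]


def dept_element(deptno, name, loc):
    """Build one <Dept> node with <Number>, <Name>, and <Location> children."""
    dept = ET.Element("Dept")
    for tag, value in (("Number", deptno), ("Name", name), ("Location", loc)):
        ET.SubElement(dept, tag).text = str(value)
    return dept


def with_agg(rows):
    """All <Dept> nodes under a single <Departments> parent."""
    parent = ET.Element("Departments")
    for row in rows:
        parent.append(dept_element(*row))
    return ET.tostring(parent, encoding="unicode")


def without_agg(rows):
    """One <Departments> wrapper per row, concatenated."""
    pieces = []
    for row in rows:
        parent = ET.Element("Departments")
        parent.append(dept_element(*row))
        pieces.append(ET.tostring(parent, encoding="unicode"))
    return "".join(pieces)


def parses(text):
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(parses(with_agg(DEPARTMENTS)))     # True
print(parses(without_agg(DEPARTMENTS)))  # False: more than one document element
```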

As another example, suppose you want to return a list of employees by department, tagged as follows:


<EmployeesByDepartment>
  <Dept DeptNo="10">
    <Name>ACCOUNTING</Name>
    <Location>NEW YORK</Location>
    <Employees>
      <Employee EmpNo="7782">
        <Name>CLARK</Name>
        <Job>MANAGER</Job>
        <Manager>7839</Manager>
        <Salary>2450</Salary>
      </Employee>
      <Employee EmpNo="7839">
        <Name>KING</Name>
        <Job>PRESIDENT</Job>
        <Salary>5000</Salary>
      </Employee>
      ...
    </Employees>
  </Dept>
  ...
</EmployeesByDepartment>

To return employees by department, two select statements are required: first create an employee grouping and then group the employees by department:

CREATE TEMP TABLE emp_grouping AS
SELECT emp.deptno,
       XMLElement('Employees', XMLAGG(
         XMLElement('Employee', XMLAttributes('EmpNo', emp.empno),
           XMLConcat(
             XMLElement('Name', emp.name),
             XMLElement('Job', emp.job),
             XMLElement('Manager', emp.mgr),
             XMLElement('Salary', emp.sal),
             XMLElement('Comm', emp.comm))))) AS xml
FROM emp INNER JOIN dept
  ON emp.deptno = dept.deptno
GROUP BY emp.deptno;

SELECT XMLElement('EmployeesByDepartment', XMLAGG(
         XMLElement('Dept', XMLAttributes('DeptNo', d.deptno), XMLConcat(
           XMLElement('Name', d.dname),
           XMLElement('Location', d.loc),
           emp_grouping.xml))))
FROM dept d INNER JOIN emp_grouping
  ON d.deptno = emp_grouping.deptno;
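The two-step structure of these queries (group the employees first, then join each group to its department) can be sketched in Python with xml.etree.ElementTree. The sample rows below are hypothetical values shaped like the EMP and DEPT tables in the example, the Comm column is omitted for brevity, and the sketch assumes that a NULL column produces no element, which matches the sample output above where KING has no <Manager> element. That NULL handling is an assumption of the sketch, not a statement about how the toolkit treats NULLs.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample rows. DEPT: deptno -> (dname, loc).
DEPT = {10: ("ACCOUNTING", "NEW YORK"), 20: ("RESEARCH", "DALLAS")}
# EMP: (empno, name, job, mgr, sal, deptno); None stands in for a NULL column.
EMP = [
    (7782, "CLARK", "MANAGER", 7839, 2450, 10),
    (7839, "KING", "PRESIDENT", None, 5000, 10),
]


def employees_by_department(depts, emps):
    # Step 1 (like emp_grouping): one <Employees> element per department number.
    groups = defaultdict(lambda: ET.Element("Employees"))
    for empno, name, job, mgr, sal, deptno in emps:
        emp = ET.SubElement(groups[deptno], "Employee", EmpNo=str(empno))
        for tag, value in (("Name", name), ("Job", job), ("Manager", mgr), ("Salary", sal)):
            if value is not None:  # assumption: a NULL column yields no element
                ET.SubElement(emp, tag).text = str(value)

    # Step 2 (like the second SELECT): inner-join each group to its department row.
    root = ET.Element("EmployeesByDepartment")
    for deptno in sorted(groups):
        if deptno not in depts:
            continue  # no matching DEPT row, so the inner join drops this group
        dname, loc = depts[deptno]
        dept = ET.SubElement(root, "Dept", DeptNo=str(deptno))
        ET.SubElement(dept, "Name").text = dname
        ET.SubElement(dept, "Location").text = loc
        dept.append(groups[deptno])
    return root


print(ET.tostring(employees_by_department(DEPT, EMP), encoding="unicode"))
```

Department 20 has no employees in the sample, so, as with the inner join, it does not appear in the output.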


Using XPath Expressions

XML documents are organized as a tree, consisting of a root node and descendant child nodes. The function library relies on XPath arguments to navigate within this tree and locate individual XML nodes. The result of an XPath expression can be either a single node or a set of element, text, or attribute nodes. For example, the XPath expression /ABC/DEF selects all DEF child nodes under the ABC root node of the XML document. The following table gives an overview of the most common features of XPath syntax.

XPath Syntax Usage

/ The initial forward slash in an XPath expression specifies the root of the tree. Specify an absolute path with an initial slash. For example, /ABC specifies the root node’s child element named ABC. If the initial slash is omitted, the path is relative and the context of the relative path defaults to the root node. Subsequent forward slashes within an XPath expression are used as path separators to identify the child nodes of any given node. For example, /ABC/DEF specifies the DEF element, which is a child of the ABC element, which is a child of the root element.

// Two forward slashes specify all descendants of the current node. For example, ABC//DEF matches any DEF element under the ABC element.

* The asterisk is the wildcard character and specifies a match on any child node. For example, /ABC/*/DEF matches any DEF element that is a grandchild of the ABC element.

[] Specifies predicate expressions. Predicates can combine conditions with the Boolean operators (and, or) and negate them with the not() function. For example, /RESIDENTS[AGE=65 and NAME="Jane Doe"]/ADDRESS selects the ADDRESS element of all residents whose age is 65 and whose name is Jane Doe. [ ] is also used to denote a position in a list, counting from 1. For example, /POSTOFFICE/BOX[10] identifies the tenth BOX element under the POSTOFFICE root element.

nodename Selects the child elements of the current node that have the given name. For example:
• bookstore selects all bookstore elements that are children of the current node.
• /bookstore selects the root element bookstore. If the path starts with a slash ( / ), it always represents an absolute path to an element.
• bookstore/book selects all book elements that are children of bookstore.
• //book selects all book elements in the document, wherever they occur.
• bookstore//book selects all book elements that are descendants of bookstore, no matter where they are under the bookstore element.

. Selects the current node.

.. Selects the parent of the current node.

@ Selects attributes. For example, //@lang selects all attributes that are named lang.

function-name XPath supports a set of built-in functions such as substring(), round(), and not(). In addition, user-defined functions can be made available using namespaces.
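Python's standard xml.etree.ElementTree module implements a small subset of this syntax, which is enough to try out several rows of the table on a sample document. The sketch below only illustrates the path semantics: ElementTree evaluates paths relative to an element (so the absolute /bookstore form becomes a path relative to the bookstore root), does not support and/or predicates or XPath functions, and cannot return attribute nodes, so the //@lang case is emulated with Element.get. The toolkit's own XPath engine is not guaranteed to behave identically.

```python
import xml.etree.ElementTree as ET

DOC = """
<bookstore>
  <book lang="en"><title>A</title></book>
  <book lang="fr"><title>B</title></book>
  <shelf><book lang="en"><title>C</title></book></shelf>
</bookstore>
"""
root = ET.fromstring(DOC.strip())  # the <bookstore> element is the context node


def titles(elements):
    return [e.findtext("title") for e in elements]


print(titles(root.findall("book")))                 # bookstore/book   -> ['A', 'B']
print(titles(root.findall(".//book")))              # bookstore//book  -> ['A', 'B', 'C']
print(titles(root.findall("*/book")))               # bookstore/*/book -> ['C']
print(titles(root.findall("book[2]")))              # 1-based position -> ['B']
print(titles(root.findall(".//book[@lang='en']")))  # predicate        -> ['A', 'C']
print([b.get("lang") for b in root.iter("book")])   # like //@lang     -> ['en', 'fr', 'en']
```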


XML Function Reference

This section lists the available XML functions alphabetically.

IsValidXML

Determines whether or not a character string can be parsed as XML.

Description

The IsValidXML function has the following syntax:

boolean = IsValidXML(varchar input);

The input value specifies the character string to analyze.

Returns

The function returns true if the character string input can be parsed as XML; otherwise, the function returns false. For example:

select IsValidXML('<tag1>12</tag1>');

select IsValidXML('<tag1><tag2>');

This first example returns true; the second example returns false.
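When you prepare XML outside the database, a comparable check is easy to run on the client. The following Python sketch is a hypothetical stand-in for IsValidXML built on xml.etree.ElementTree; like IsValidXML as described above, it only tests whether the string parses, and it does not validate the document against a schema.

```python
import xml.etree.ElementTree as ET


def is_valid_xml(text):
    """Return True if text parses as a single, well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_valid_xml("<tag1>12</tag1>"))  # True
print(is_valid_xml("<tag1><tag2>"))     # False: the tags are never closed
```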

IsXML

Determines whether the input argument is a compiled Netezza XML document; in other words, whether the input argument is of type XML.

Description

The IsXML function has the following syntax:

bool = IsXML(XML input);

The input value specifies the XML object to analyze.

Returns

The function returns true if the input value is a compiled Netezza XML document; otherwise, it returns false. It is important to explicitly check whether the XML you produce by embedding SQLX functions within your SQL is valid XML, because the underlying SQLX engine does not perform any error checking or validation. Note that if you are using schemas, then you are also responsible for returning valid XML (meaning that it conforms to the structure specified by the schema).

For example:

select IsXML(XMLParse('<tag1>12345</tag1>'));

This example returns true.


XMLAGG

This publishing function aggregates the set of XML inputs into a single XML object.

Description

The XMLAGG function has the following syntax:

XML = XMLAGG(Set(XML) inputs);

The inputs value specifies the set of XML inputs to aggregate into a single XML object.

Returns

The function returns a compiled representation (type XML) of a single XML object which has been aggregated from a set of XML inputs. For example:

SELECT
  XMLElement('Departments', XMLAGG(
    XMLElement('Dept', XMLConcat(
      XMLElement('Number', d.deptno),
      XMLElement('Name', d.deptname),
      XMLElement('Location', d.deptloc)))))
from
  departments d;

Assuming that the query returns three rows of data, a possible return value might look like this:

<Departments>
  <Dept>
    <Number>10</Number>
    <Name>MARKETING</Name>
    <Location>BOSTON</Location>
  </Dept>
  <Dept>
    <Number>20</Number>
    <Name>HR</Name>
    <Location>BOSTON</Location>
  </Dept>
  <Dept>
    <Number>30</Number>
    <Name>SALES</Name>
    <Location>NEW YORK</Location>
  </Dept>
</Departments>


XMLAttributes

This publishing function constructs an XML Attribute object. This object is not a valid XML object on its own; instead, you pass it as the attribute argument of an XMLElement call.

Description

The XMLAttributes function has the following syntax:

XML_Attrib = XMLAttributes(varchar name, varchar value);

The name value specifies the name of the XML attribute to construct.

The value value specifies the value of the XML attribute to construct.

Returns

The function returns an XML Attribute object. The following example produces an Emp element for each employee, with an ID attribute and a name attribute:

SELECT XMLElement('Emp',
         XMLConcat(XMLAttributes('ID', e.id),
                   XMLAttributes('name', e.fname || ' ' || e.lname)),
         '') AS result
FROM employees e
WHERE e.id > 200;

Serialized with XMLSerialize, the query results look similar to the following XML fragment:

<Emp ID="1001" name="John Smith"/>
<Emp ID="1206" name="Jane Doe"/>

XMLConcat

This publishing function concatenates two XML objects (either two elements or two attributes) to produce a single XML object.

Description

The XMLConcat function has two forms, one for concatenating elements and another for concatenating attributes:

XML = XMLConcat(XML inputa, XML inputb);

XML_Attrib = XMLConcat(XML_Attrib inputa, XML_Attrib inputb);

The inputa value specifies the first XML object to concatenate.

The inputb value specifies the second XML object to concatenate.

Returns

The function returns a compiled representation (type XML) of the concatenated XML input objects as a single XML object. If either of the input XML objects is null, the function returns null. For an example of the use of XMLConcat, see the example for XMLAgg.
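The documented NULL rule means that a single NULL input discards the whole concatenated result. The following Python sketch models an XML value as a list of sibling elements and SQL NULL as None; xml_concat and fragment are hypothetical helpers, and the list representation is an assumption made for illustration, not the toolkit's internal format.

```python
import xml.etree.ElementTree as ET


def fragment(tag, text):
    """A one-element XML value, modeled as a list of sibling elements."""
    elem = ET.Element(tag)
    elem.text = text
    return [elem]


def xml_concat(a, b):
    """Concatenate two XML values; None (SQL NULL) in either input gives None."""
    if a is None or b is None:
        return None
    return a + b


dept = ET.Element("Dept")
dept.extend(xml_concat(fragment("Number", "10"), fragment("Name", "MARKETING")))
print(ET.tostring(dept, encoding="unicode"))
# <Dept><Number>10</Number><Name>MARKETING</Name></Dept>
print(xml_concat(fragment("Number", "10"), None))  # None
```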


XMLElement

This publishing function constructs an XML element. The XMLElement function is typically nested to produce a hierarchically structured XML document.

Description

The XMLElement function has the following syntax:

XML = XMLElement(varchar name, [XML_Attrib attrib,] varchar value);

The name value specifies the name of the enclosing tag for the XML element. If the identifier specified is NULL, then no element is returned. Note that, unlike in the SQL/XML specification, the name cannot be a column name or column reference.

The optional attrib value specifies the attributes (name-value pairs) of the XML element. Use XMLAttributes to construct an attribute, and XMLConcat to combine several attributes.

The value value specifies the content of the newly constructed XML element. This can be either a scalar value or a nested XMLElement call.

Returns

The function returns a compiled representation (type XML) of an XML element with the specified name, content, and optionally a collection of attributes. It does not create prolog information. To display the element as text, wrap the call with XMLSerialize. For example:

select XMLSerialize(XMLElement('Parent', XMLElement('Child', 'Child text')));

This example returns:

<Parent><Child>Child text</Child></Parent>

XMLExistsNode

Determines whether traversing the XML input document with an XPath expression finds at least one XML element or text node.

Description

The XMLExistsNode function has the following syntax:

bool = XMLExistsNode(XML input, varchar XPath);

The input value specifies a compiled representation (type XML) of an XML document.

The XPath value specifies the XPath expression to evaluate against the input document.

Returns

Returns true if the XPath leads to an XML element or text node in the XML input object; otherwise, returns false. For example:

SELECT person
FROM MAILINGLIST
WHERE XMLExistsNode(person, '/MailingList[Occupation="Doctor"]') = 1;

This example returns only the rows of MAILINGLIST whose person document satisfies the condition.

Note: When using the XMLExistsNode() function in a query, it must always be specified in the WHERE clause, not in the SELECT list.
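The same existence test can be approximated on the client with xml.etree.ElementTree, whose find method returns None when a path matches nothing. In this hypothetical sketch the parsed document element stands in for the /MailingList root, so the toolkit path /MailingList[Occupation="Doctor"] is written relative to it as .[Occupation='Doctor']. ElementTree supports only simple predicates, so this approximates, and does not replace, the toolkit's XPath evaluation.

```python
import xml.etree.ElementTree as ET


def xml_exists_node(doc, path):
    """True when the path matches at least one element at or under doc."""
    return doc.find(path) is not None


person = ET.fromstring(
    "<MailingList><Name>Jane Doe</Name><Occupation>Doctor</Occupation></MailingList>"
)
print(xml_exists_node(person, ".[Occupation='Doctor']"))  # True
print(xml_exists_node(person, ".[Occupation='Lawyer']"))  # False
print(xml_exists_node(person, "Phone"))                   # False
```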


XMLExtract

Finds the XML node(s) specified by the XPath expression. The extracted nodes can be elements, attributes, or text nodes. XMLExtract can be used to extract:

Numerical values on which function-based indexes can be created to speed up processing.

Collection expressions for use in the FROM clause of SQL statements.

XML fragments to be combined into a single XML document.

Description

The XMLExtract function has the following syntax:

XML = XMLExtract(XML input, varchar XPath);

The input value specifies the XML file from which to extract the node.

The XPath value specifies an XPath query which specifies an XML node within the XML file.

Returns

If the XPath expression matches more than one node, only the first is returned. If no node matches, null is returned. The following example uses XMLExtract to query the Reference element of purchase orders whose SpecialInstructions element is set to Rush:

SELECT XMLExtract(object_value, '/PurchaseOrder/Reference') "REFERENCE"
FROM PURCHASEORDER
WHERE XMLExistsNode(object_value, '/PurchaseOrder[SpecialInstructions="Rush"]') = 1;

An example of a possible return value is as follows:

<Reference>JSMITH-20021009123336271PDT</Reference>
<Reference>ABELL-20021009123336321PDT</Reference>
<Reference>JDOE-20021009123337303PDT</Reference>
<Reference>GWASHINGTON-20021009123337123PDT</Reference>
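The first-match rule described above can be illustrated with a client-side Python sketch. xml_extract is a hypothetical helper built on xml.etree.ElementTree: it returns the first node matching a path, serialized as XML text, or None when nothing matches. It handles element paths only, and the path is written relative to the PurchaseOrder document element.

```python
import copy
import xml.etree.ElementTree as ET


def xml_extract(doc, path):
    """Return the first node matching path as XML text, or None if nothing matches."""
    node = doc.find(path)  # find() stops at the first match
    if node is None:
        return None
    node = copy.copy(node)
    node.tail = None  # drop trailing text that belongs to the parent's layout
    return ET.tostring(node, encoding="unicode")


order = ET.fromstring(
    "<PurchaseOrder>"
    "<Reference>JSMITH-20021009123336271PDT</Reference>"
    "<SpecialInstructions>Rush</SpecialInstructions>"
    "</PurchaseOrder>"
)
print(xml_extract(order, "Reference"))  # <Reference>JSMITH-20021009123336271PDT</Reference>
print(xml_extract(order, "Buyer"))      # None
```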

XMLExtractValue

Extracts the actual (scalar) value from the XML input object at the location specified by the XPath parameter. The result of the XPath query must be a single node, and that node must be an element, a text node, or an attribute.

If a specific datatype is desired, XMLExtractValue can be wrapped with a conversion function, for example a function that converts the varchar to a date.

Description

The XMLExtractValue function has the following syntax:

varchar = XMLExtractValue(XML input, varchar XPath);

The input value specifies an XML file.

The XPath value specifies the XPath query.


Returns

If the result is an element, it must have a single text node as its child; that child node provides the text content for the scalar return value. If the node does not exist, this function returns null. If the XPath expression matches more than one node, or if it points to an element node with anything other than a single text child node, this function returns an error.

For example, the following query extracts the scalar value of the Reference element:

SELECT XMLExtractValue(object_value, '/PurchaseOrder/Reference') "REFERENCE"
FROM PURCHASEORDER
WHERE XMLExistsNode(object_value, '/PurchaseOrder[SpecialInstructions="Rush"]') = 1;

An example of a possible return value is shown below. Note the difference from the return value for the similar example for XMLExtract. In that example, each line of data is wrapped with a <Reference> element. Here, just the scalar value is extracted and returned:

JSMITH-20021009123336271PDT
ABELL-20021009123336321PDT
JDOE-20021009123337303PDT
GWASHINGTON-20021009123337123PDT
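The stricter rules for XMLExtractValue (null for no match, an error for several matches or for an element without a single text child) can be sketched the same way. xml_extract_value is a hypothetical Python helper using xml.etree.ElementTree. It covers element targets only, because ElementTree paths cannot select attribute or text nodes, and it raises ValueError where the toolkit function reports an error. The int() call in the test mirrors the advice above about wrapping the result in a conversion function.

```python
import xml.etree.ElementTree as ET


def xml_extract_value(doc, path):
    """Return the text of the single element matching path, following XMLExtractValue's rules."""
    matches = doc.findall(path)
    if not matches:
        return None  # no such node: null
    if len(matches) > 1:
        raise ValueError(f"{path!r} matched {len(matches)} nodes")
    node = matches[0]
    if len(node) > 0 or node.text is None:
        # Child elements, or no text at all, means there is no single text child.
        raise ValueError(f"{path!r} does not point to an element with a single text child")
    return node.text


order = ET.fromstring(
    "<PurchaseOrder><Reference>JSMITH-20021009123336271PDT</Reference>"
    "<Items><Item>1</Item><Item>2</Item></Items></PurchaseOrder>"
)
print(xml_extract_value(order, "Reference"))  # JSMITH-20021009123336271PDT
print(xml_extract_value(order, "Buyer"))      # None
```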

XMLParse

Converts a value of type varchar to a value of type XML, which is the Netezza compiled representation of an XML object (stripping white space by default). The inverse function is XMLSerialize.

Note: XMLParse is not intended for parsing and loading external data into XML columns. Although you can call XMLParse as part of an external table load, the resulting XML datatype is stored as a VARCHAR, which has a maximum size of 64000 bytes.

Description

The XMLParse function has the following syntax:

XML = XMLParse(varchar input);

The input value specifies a varchar representation of an XML input object.

Returns

The function returns the Netezza compiled representation of an XML object. If the input varchar resolves to null, the function returns null. For example:

select XMLParse('<Parent>Parent Text</Parent>');

This example returns a value of type XML which is the compiled representation of the XML object <Parent>Parent Text</Parent>.


XMLRoot

This publishing function creates a new XML value by providing the version and standalone properties in the XML root information (prolog) of the specified value of type XML. This creates the root node if it does not already exist. Typically, this is done to ensure data-model compliance.

Description

The XMLRoot function has the following syntax:

XML = XMLRoot(XML input, float version, bool standalone);

The input value specifies the XML object to update.

The version value specifies the version property of the input XML object.

The standalone value specifies the standalone property of the input XML object.

Returns

The function returns the updated object. If a prolog already exists, an error is returned. For example:

INSERT INTO employees (id, xvalue)
VALUES (1001,
        XMLROOT(XMLPARSE('<Emp> John Smith </Emp>'), '1.0', true));
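For comparison, the following Python sketch shows what adding a prolog amounts to at the text level. xml_root is a hypothetical helper: it prepends an XML declaration carrying the version and standalone properties and, like XMLRoot, refuses a document that already has one. Working on serialized text is an assumption of the sketch; XMLRoot itself operates on compiled XML values.

```python
import xml.etree.ElementTree as ET


def xml_root(text, version="1.0", standalone=True):
    """Prepend an XML declaration; raise if the document already has one."""
    if text.lstrip().startswith("<?xml"):
        raise ValueError("a prolog already exists")
    flag = "yes" if standalone else "no"
    return f'<?xml version="{version}" standalone="{flag}"?>{text}'


doc = xml_root("<Emp> John Smith </Emp>")
print(doc)                     # <?xml version="1.0" standalone="yes"?><Emp> John Smith </Emp>
print(ET.fromstring(doc).tag)  # Emp
```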

XMLSerialize

Converts a value of type XML to a value of type varchar. The inverse function is XMLParse.

Description

The XMLSerialize function has the following syntax:

varchar = XMLSerialize(XML input);

The input value specifies a value of type XML, which is the Netezza compiled representation of an XML file.

Returns

The function returns the varchar representation of the input XML object. For example:

select XMLSerialize(XMLElement('Parent', 'Parent Text'));

This example returns:

<Parent>Parent Text</Parent>

Without the XMLSerialize call, the XMLElement call returns the type name “XML”:

XMLELEMENT
-----------
XML
(1 row)


XMLUpdate

Updates the portion of an XML document (elements, attributes, or nodes) identified by an XPath expression with a new value. The datatypes of the XPath target and the new value must match.

XMLUpdate cannot be used directly to insert a new node or to delete an existing node, element, or attribute. Instead, update the containing parent element with the new value.

Description

The XMLUpdate function has two forms, one to update the XML document with a scalar (varchar) value and another to update the XML document with an XML document:

XML = XMLUpdate(XML input, varchar XPath, varchar value);

XML = XMLUpdate(XML input, varchar XPath, XML value);

The input value specifies an XML document that contains the fragment to be updated.

The XPath value specifies the XPath expression used to locate the XML fragment to update. If the XPath identifies an XML element, then the corresponding value must be of type XML. If the XPath identifies an attribute or text node, then the value can be any scalar datatype.

The value value specifies the new value to assign to the XML fragment.

Returns

The function returns an XML document that contains an updated fragment. For example:

update sales_tab
set order = XMLUpdate(order,
                      '/order/company/name',
                      XMLParse('<name>Netezza</name>'))
where sales_person = 'John Smith';

This example updates the company name in order XML documents to “Netezza”, where the salesperson is “John Smith”.
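XMLUpdate returns a new document rather than editing its input in place. The following Python sketch of the element form uses xml.etree.ElementTree; xml_update is a hypothetical helper that copies the document, replaces the first element at a simple child path (written relative to the <order> document element, so /order/company/name becomes company/name), and returns the copy. It does not handle predicates, attributes, or text-node targets, and the Acme value is sample data.

```python
import copy
import xml.etree.ElementTree as ET


def xml_update(doc, path, new_node):
    """Return a copy of doc in which the first element at path is replaced by new_node."""
    updated = copy.deepcopy(doc)
    parent_path, _, tag = path.rpartition("/")
    parent = updated.find(parent_path) if parent_path else updated
    if parent is None:
        return updated  # nothing to update
    for index, child in enumerate(parent):
        if child.tag == tag:
            replacement = copy.deepcopy(new_node)
            replacement.tail = child.tail  # keep the surrounding layout
            parent[index] = replacement
            break
    return updated


order = ET.fromstring("<order><company><name>Acme</name></company></order>")
new_order = xml_update(order, "company/name", ET.fromstring("<name>Netezza</name>"))
print(ET.tostring(new_order, encoding="unicode"))
# <order><company><name>Netezza</name></company></order>
print(ET.tostring(order, encoding="unicode"))
# <order><company><name>Acme</name></company></order>
```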


CHAPTER 3

Data Transformation

What’s in this chapter

• Data Transformation Function Reference

The functions in this chapter transform data into a different representation to improve security, save space, or reduce transmission time. In many cases the functions rely on industry-standard algorithms, as noted in the function descriptions. For more information on these algorithms, refer to the publicly available documentation.

Note: Compressed and encrypted data is stored in a binary format that is not readable. To display this data without output alignment problems, first decompress or decrypt it. If table columns contain compressed or encrypted data, queries that select from that table must use the decompress/decrypt functions to process the binary data in those columns properly.

Data Transformation Function Reference

Because compress/decompress, encrypt/decrypt, and uuencode/uudecode are inverse functions, they are listed together, rather than strictly alphabetically, for ease of comparison.

compress

Compresses a varchar using the open-source zlib software library. The zlib library uses the DEFLATE compression algorithm, which combines LZ77 (Lempel-Ziv 1977) with Huffman coding.

Compression is the process of encoding data so that it uses fewer bits. For example, a simple scheme replaces a run of contiguous, repeated characters with a single character and a count; DEFLATE goes further and replaces repeated strings with references to their earlier occurrences. Compressed data must be decompressed before it can be used.

Description

The compress function has the following syntax:

varchar = compress(varchar input[, int level]);

The input value specifies the varchar to be compressed.

The level value specifies the compression level. It can be an integer from 0 to 9, where 0 gives the least compression (in zlib, level 0 stores the data without compressing it) and 9 gives the most. The default is 6. Higher compression levels increase processing time.


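Returns

The function returns the compressed varchar. Because compressed data is binary, the following example passes it to decompress so that the result is readable: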

select decompress (compress('1234567890'));

This example returns:

1234567890

decompress

Decompresses a previously compressed varchar.

Description

The decompress function has the following syntax:

varchar = decompress(varchar input);

The input value specifies the compressed varchar to be decompressed.

Returns

The function returns the decompressed varchar. For example:

select decompress (compress('1234567890'));

This example returns:

1234567890
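Because compress is documented as using zlib's DEFLATE, Python's zlib module (a binding to the same library) can illustrate the level trade-off and the compress/decompress round trip. This is a sketch only; it makes no claim that Netezza's compressed bytes match Python's, and the sample input is arbitrary.

```python
import zlib

# A repetitive sample input compresses well.
data = ("1234567890" * 50).encode("ascii")

sizes = {}
for level in (0, 1, 6, 9):  # 0 = stored (no compression), 6 = zlib default, 9 = best
    packed = zlib.compress(data, level)
    assert zlib.decompress(packed) == data  # decompress restores the exact input
    sizes[level] = len(packed)

print(len(data), sizes)
```

At level 0 the output is slightly larger than the input (zlib adds a header and checksum), while the higher levels shrink this repetitive input substantially.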

encrypt/decrypt

Encrypts or decrypts the input varchar using the supplied key.

Encryption transforms data to keep it secret; the data can be read (decrypted) only by a recipient who has the required key. The Netezza implementation uses symmetric encryption, also known as private-key or secret-key encryption: the same secret key is used both to encrypt and to decrypt the data. This secret key must therefore be available on any server that decrypts previously encrypted data. You can choose which symmetric encryption algorithm the functions use, either AES (Advanced Encryption Standard) or RC4.

Private-key encryption is more secure than public-key encryption because all public-key encryption schemes are susceptible to brute-force key search attacks. However, private-key encryption depends on keeping the key secret, so change the private key periodically and take steps to ensure that it cannot be discovered in use, in storage, or in distribution (see the description of the key argument below for Netezza-specific security recommendations).

Note: This is field-level encryption, not database encryption.

Description

The encrypt function has the following syntax:

varchar = encrypt(varchar text, varchar key [, int algorithm]);

The decrypt function has the following syntax:

varchar = decrypt(varchar text, varchar key [, int algorithm]);

The text value specifies the value to be encrypted/decrypted.


The key value specifies the key used to encrypt or decrypt the value. Secure the key carefully; anyone who obtains it can decrypt your data. When designing your security system, keep in mind the following aspects of the Netezza architecture:

SQL function calls are logged in the pg.log file on the Netezza host, so executing encrypt(secret_column, 'my_secret_key') reveals your key to anyone who can read the pg.log file.

ODBC/JDBC conversations can easily be captured with many diagnostic and hacking tools. If your key is transmitted as part of the SQL, it can be compromised in transit.

For these reasons, store the secret key in a table and pass it to the encrypt/decrypt functions through a table join. For example:

SELECT decrypt(a.value, b.key) FROM my_table a, my_keys b WHERE b.key_id = 1;

The algorithm value selects RC4 or one of the versions of AES, as shown in the following list:

0 – RC4 (default if no algorithm given)

1 – AES 128

2 – AES 192

3 – AES 256

RC4, although one of the most widely used stream ciphers (used, for example, by SSL and WEP), is not cryptographically secure and is vulnerable to attacks.

The Advanced Encryption Standard (AES) is the encryption standard adopted by the United States government and is approved for protecting classified information. The three versions of AES differ in key length (128, 192, or 256 bits) and, correspondingly, in the number of encryption rounds. While all three key lengths are sufficient to protect classified information up to the SECRET level, TOP SECRET information requires 192-bit or 256-bit keys.

Returns

The function returns an encrypted or decrypted varchar. For example:

select decrypt(encrypt('123456', '100', 0), '100', 0);

This example returns:

123456
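RC4 (algorithm code 0) is short enough to sketch in full, and it shows the symmetric property described above: running the same function with the same key a second time recovers the plaintext. The sketch below is the textbook RC4 cipher in Python, not Netezza's code; how Netezza derives the RC4 key from the varchar key argument and how it formats its output are not documented here, so byte-for-byte agreement should not be assumed. As noted above, RC4 is not considered secure; prefer one of the AES options for real data.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: the same call both encrypts and decrypts."""
    # Key-scheduling algorithm (KSA): permute 0..255 under control of the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Pseudo-random generation algorithm (PRGA): XOR a keystream into the data.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


print(rc4(b"Key", b"Plaintext").hex().upper())  # BBF316E8D940AF0AD3 (published test vector)
print(rc4(b"100", rc4(b"100", b"123456")))      # b'123456'
```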

uuencode

Encodes a binary value as ASCII using the Unix uuencode format.

The encoding translates the binary value into ASCII character codes in the range 32 to 95. Uuencoding has historically been used to encode files for e-mail transmission. The uudecode function reverses the effect of uuencode, recreating the original binary value exactly.

The uuencode algorithm does the following:

1. Divides the binary value into groups of three bytes (24 bits), adding zeroes to the end of the binary value if necessary to complete the final group of three bytes.

2. Splits each 24-bit group into four groups of six bits each. This produces four decimal numbers in the range 0 to 63.

3. Adds decimal 32 to each number to create ASCII characters in the range 32 (space) to 95 (underscore).
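The three steps can be sketched in Python as follows. The function name uuencode_groups is hypothetical; like the :&%T result documented for hat, it emits only the encoded characters. A complete uuencoded line, such as the one Python's binascii.b2a_uu produces, also begins with a length character (chr(32 + number of bytes)) and ends with a newline.

```python
import binascii


def uuencode_groups(data: bytes) -> str:
    """Apply the three uuencode steps, without a length prefix or line breaks."""
    # Step 1: pad with zero bytes to a multiple of three bytes (24 bits).
    padded = data + b"\x00" * (-len(data) % 3)
    chars = []
    for k in range(0, len(padded), 3):
        group = int.from_bytes(padded[k:k + 3], "big")
        # Step 2: split the 24 bits into four 6-bit values (0 to 63).
        for shift in (18, 12, 6, 0):
            value = (group >> shift) & 0x3F
            # Step 3: add 32 to get a character from space (32) to underscore (95).
            chars.append(chr(value + 32))
    return "".join(chars)


print(uuencode_groups(b"hat"))  # :&%T
print(binascii.b2a_uu(b"hat"))  # b'#:&%T\n' ('#' is chr(32 + 3), the length)
```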

Step 1 is illustrated by the following table.

Table 3-1: Uuencoding, Part I

ASCII input             h          a          t
ASCII decimal           104        97         116
ASCII binary (8 bit)    01101000   01100001   01110100

Steps 2 and 3 are illustrated by the following table. Note how the three 8-bit binary values in Table 3-1 become the four 6-bit binary values in the first row of this table:

Table 3-2: Uuencoding, Part II

6-bit binary            011010   000110   000101   110100
Decimal equivalent      26       6        5        52
Decimal + 32            58       38       37       84
Uuencoding              :        &        %        T

Description

The uuencode function has the following syntax:

varchar = uuencode(varchar input);

The input value specifies the binary varchar to be uuencoded.

Returns

The function returns a uuencoded string. For example:

select uuencode ('hat');

The uuencoding for hat is:

:&%T

uudecode

Decodes an ASCII value that was previously encoded using the Unix uuencode format.

Description

The uudecode function has the following syntax:

varchar = uudecode(varchar input);

The input value specifies the string to be uudecoded.

Returns

The function returns a uudecoded string. For example:

select uudecode (':&%T');

This example returns:

hat
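Decoding simply runs the uuencode steps backward: subtract 32 from each character, join four 6-bit values into 24 bits, and split those into three bytes. In this Python sketch (uudecode_groups is a hypothetical name) the input is assumed to consist of complete four-character groups with no length prefix; without that prefix, zero bytes added as padding in step 1 cannot be told apart from real trailing zero bytes, so they are returned as-is.

```python
import binascii


def uudecode_groups(text: str) -> bytes:
    """Reverse the uuencode steps for text made of complete 4-character groups."""
    out = bytearray()
    for k in range(0, len(text), 4):
        group = 0
        for ch in text[k:k + 4]:
            # Subtract 32 to recover each 6-bit value; masking with 0x3F also maps
            # the backquote (96) that some encoders use for zero back to 0.
            group = (group << 6) | ((ord(ch) - 32) & 0x3F)
        out += group.to_bytes(3, "big")  # 24 bits -> three bytes
    return bytes(out)


print(uudecode_groups(":&%T"))     # b'hat'
print(binascii.a2b_uu("#:&%T\n"))  # b'hat' (the length prefix lets it trim padding)
```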


CHAPTER 4

Hashing

What’s in this chapter

Hash Function Reference

Hashing functions are used to encode data, transforming the input into a “hash code” or “hash value.” The hash algorithm is designed to minimize the chance that two inputs will have the same hash value, termed a “collision.”

Hashing functions are used to speed up the retrieval of data records (simple one-way look-ups), for the validation of data (“checksums”), and for cryptography. For lookups, the hash code is used as an index into a hash table which contains a pointer to the data record. For checksums, the hash code is computed for the data before storage/transmission and then recomputed afterward to verify data integrity; if the hash codes do not match, the data is corrupt. Cryptographic hash functions are used for data security.

Some common use cases for hashing functions include:

Detect duplicated records. Because duplicate records hash to the same “bucket” in the hash table, the task reduces to scanning the buckets that contain two or more records, a much faster method than sorting and comparing every record in the file. (A similar technique can find similar records: with a hash function designed so that similar keys hash to contiguous buckets, the search for similar records can be limited to those buckets.)

Locate points that are near each other. Applying a hashing function to spatial data effectively partitions the space being modeled into a grid, and as in the previous example, the retrieval/comparison time is greatly reduced because only contiguous cells in the grid need to be searched. This same technique works for other types of spatial data, such as shapes and images.

Verify message integrity. A hash (message digest) of the message is computed both before and after transmission, and the two hash values are compared to determine whether the message was corrupted.

Verify passwords. During authentication, the user’s login credentials are hashed and this value is compared with the hashed password stored for that user.
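The duplicate-detection use case can be sketched in Python: records are grouped into buckets by hash value, and only buckets holding two or more records are examined. Here hashlib.md5 stands in for whichever hash you would use, and records are compared directly within a bucket so that a (very unlikely) collision is not reported as a duplicate. The function name find_duplicates and the sample records are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(records: list[str]) -> list[list[int]]:
    """Return groups of indexes whose records are identical."""
    buckets: dict[bytes, list[int]] = defaultdict(list)
    for idx, record in enumerate(records):
        buckets[hashlib.md5(record.encode("utf-8")).digest()].append(idx)

    groups = []
    for idxs in buckets.values():
        if len(idxs) < 2:
            continue  # a bucket with one record cannot hold a duplicate
        # Compare the records themselves to rule out a hash collision.
        by_value: dict[str, list[int]] = defaultdict(list)
        for idx in idxs:
            by_value[records[idx]].append(idx)
        groups.extend(g for g in by_value.values() if len(g) >= 2)
    return groups


print(find_duplicates(["alpha", "beta", "alpha", "gamma", "beta"]))  # [[0, 2], [1, 4]]
```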


Hash Function Reference

The SQL Extensions Toolkit has three hashing functions: hash(), hash4(), and hash8(). The hash() function is a cryptographic function that virtually never produces the same output for two different inputs. However, if you need fast hash generation and comparison, or only a simple one-way lookup function, use hash4() or hash8() instead.

hash

Returns a 128-bit, 160-bit, or 256-bit hash of the input data, depending on the algorithm selected. This function provides between 2^128 and 2^256 distinct return values and is intended for cryptographic purposes.

hash() is generally much slower to calculate than hash4() or hash8().

The return type is a 16- to 32-byte binary varchar, so comparing these hashes can be slower than a simple integer comparison.

On the Netezza platform, a column of these hashes cannot make use of zone-maps and other performance enhancements.

Description

The hash function has the following syntax:

varchar = hash(varchar data [, int algorithm]);

The data value specifies the varchar to hash.

The algorithm value is specified by an integer code (defaults to 0). The available algorithms and the size of the resulting hash value are shown in the following table:

Table 4-1: Algorithms Supported for Cryptographic Hashing

Code   Description   Result
0      MD5           128 bit
1      SHA-1         160 bit
2      SHA-2         256 bit

Both the MD5 and SHA algorithms are message digest algorithms derived from MD4. The SHA (Secure Hash Algorithm) hash functions are the result of an effort by the National Security Agency (NSA) to provide strong cryptographic hashing capabilities. Security flaws have been identified in both SHA-1 and MD5. SHA-2 is still considered secure as of the publication date of this manual, but SHA-3 development is underway to prepare for any future security flaw discovered in SHA-2.

Returns

The function returns the hashed input data. For example:

select hash('Netezza', 0);

This example returns the 128-bit (16-byte) MD5 hash of the string Netezza as a binary varchar.
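Python's hashlib provides the same three digest algorithms, which makes the result sizes in Table 4-1 easy to check. Mapping code 2 to SHA-256 is an assumption based on the 256-bit result size (SHA-2 is a family that also includes other sizes), and crypto_hash is a hypothetical name; whether Netezza's hash() output bytes match hashlib's for the same input is not documented here.

```python
import hashlib

# Algorithm codes from Table 4-1; code 2 is assumed to mean SHA-256.
ALGORITHMS = {0: hashlib.md5, 1: hashlib.sha1, 2: hashlib.sha256}


def crypto_hash(data: bytes, algorithm: int = 0) -> bytes:
    """Return the raw binary digest, analogous to hash()'s binary varchar."""
    return ALGORITHMS[algorithm](data).digest()


for code in sorted(ALGORITHMS):
    digest = crypto_hash(b"Netezza", code)
    print(code, len(digest), "bytes =", len(digest) * 8, "bits")
# 0 16 bytes = 128 bits
# 1 20 bytes = 160 bits
# 2 32 bytes = 256 bits
```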

hash4

Returns the 32-bit checksum hash of the input data. This function provides 2^32 (approximately 4 billion) distinct return values and is intended for data retrieval (lookups).

Description

The hash4 function has the following syntax:

int4 = hash4(varchar data [, int algorithm]);

The data value specifies the varchar to hash.

The algorithm can be one of the following (defaults to Adler):

0 – Adler

1 – CRC32

Adler is the fastest checksum hash provided. However, it has poor coverage when the messages are shorter than a few hundred bytes (poor coverage means that different inputs are more likely to hash to the same value, referred to as a “collision”). For short messages, use the CRC32 algorithm, or switch to hash8 instead.
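Adler-32 is simple enough to write out, and doing so shows why its coverage is poor for short inputs: the high 16 bits are a running sum of the low 16 bits, so for a short message both halves stay far below their maximum of 65520 and most of the 32-bit range goes unused. The hand-written function below agrees with Python's zlib.adler32; for Netezza both give 186778338, the same value as the hash4('Netezza',0) example that follows, which suggests (but does not prove) that algorithm 0 is standard Adler-32. zlib.crc32 is shown for comparison with algorithm 1; whether Netezza's CRC32 variant matches it is an assumption.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data: bytes) -> int:
    """Standard Adler-32 checksum."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER  # low half: 1 + sum of the bytes
        b = (b + a) % MOD_ADLER     # high half: sum of the running `a` values
    return (b << 16) | a


value = adler32(b"Netezza")
print(value, zlib.adler32(b"Netezza"))  # 186778338 186778338
print(value >> 16, value & 0xFFFF)      # 2850 738: both halves are small
print(zlib.crc32(b"Netezza"))           # CRC-32 of the same input
```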

Returns

The function returns the hashed input data. For example:

select hash4('Netezza',0);

This example returns 186778338.

hash8

Returns the 64-bit hash of the input data. This function provides 2^64 distinct return values and is intended for data retrieval (lookups).

Description

The hash8 function has the following syntax:

int8 = hash8(varchar data [, int algorithm]);

The data value specifies the varchar to hash.

Only one algorithm value is supported for this hashing function: 0, which indicates the Jenkins algorithm.

Returns

The function returns the hashed input data. For example:

select hash8('Netezza');

This example returns 4894810902878370255.


CHAPTER 5

Date and Time Comparisons

What’s in this chapter

Date and Time Function Reference

Three types are associated with the date and time functions: date, time, and timestamp. The timestamp type is implicitly converted to date and time and can therefore be passed into any of the date/time functions. The date type is implicitly converted to timestamp (but not to time) and can therefore be supplied to any function that takes either a date or a timestamp. Values of type time cannot be converted to any other type and can therefore be supplied only to functions that take a time.

For example, although the signature for the next_month function indicates that the function takes an input value of type date, you can pass an input value of type timestamp into the next_month function.

Date and Time Function Reference

The functions are organized alphabetically.

day

Determine the day of the month in the specified date.

Note: This can also be accomplished using the Netezza date_part() function.

Description

The day function has the following syntax:

int1 = day(date input);

The input value specifies the date.

Returns

Returns an integer representation of the day of the month in the specified date. For example:

select day('1996-2-29');

This example returns 29.


days_between

Determine the truncated number of full days between two timestamps.

Description

The days_between function has the following syntax:

int = days_between(timestamp t1, timestamp t2);

The t1 value specifies the beginning timestamp.

The t2 value specifies the ending timestamp.

Returns

Returns the truncated number of full days between t1 and t2. For example:

select days_between('1996-02-27 06:12:33' , '1996-03-01 07:12:33');

This example returns 3.
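All of the *_between functions in this chapter (days_between, hours_between, minutes_between, seconds_between, and weeks_between) follow the same rule: compute the elapsed time and keep only whole units. The Python sketch below reproduces the documented examples. The name units_between is hypothetical, and truncating toward zero when t2 is earlier than t1 is an assumption; this guide does not describe that case.

```python
from datetime import datetime

SECONDS_PER = {"weeks": 7 * 86400, "days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}
FMT = "%Y-%m-%d %H:%M:%S"


def units_between(t1: str, t2: str, unit: str) -> int:
    """Number of full `unit`s from t1 to t2, discarding any partial unit."""
    elapsed = int((datetime.strptime(t2, FMT) - datetime.strptime(t1, FMT)).total_seconds())
    # Assumption: a negative interval is also truncated toward zero.
    sign = -1 if elapsed < 0 else 1
    return sign * (abs(elapsed) // SECONDS_PER[unit])


print(units_between("1996-02-27 06:12:33", "1996-03-01 07:12:33", "days"))     # 3
print(units_between("1996-02-27 06:12:33", "1996-03-01 07:12:33", "hours"))    # 73
print(units_between("1996-02-27 06:12:33", "1996-02-27 07:12:00", "minutes"))  # 59
print(units_between("1996-02-27 06:12:33", "1996-02-27 06:55:22", "seconds"))  # 2569
print(units_between("1996-02-27 06:12:33", "1996-03-05 07:12:33", "weeks"))    # 1
```

Note that 1996 is a leap year, which is why three days (not two) separate February 27 from March 1.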

hour

Determine the hours value in the specified time.

Note: This can also be accomplished using the Netezza date_part function.

Description

The hour function has the following syntax:

int1 = hour(time input);

The input value specifies the time.

Returns

Returns an integer representation of the hour in the specified time. For example:

select hour ('01:12:55');

This example returns 1.

hours_between

Determine the truncated number of full hours between two timestamps.

Description

The hours_between function has the following syntax:

int = hours_between(timestamp t1, timestamp t2);

The t1 value specifies the beginning timestamp.

The t2 value specifies the ending timestamp.

Returns

Returns the truncated number of full hours between t1 and t2. For example:

select hours_between('1996-02-27 06:12:33' , '1996-03-01 07:12:33');

This example returns 73.


minute

Determine the minutes value in the specified time.

Note: This can also be accomplished using the Netezza date_part function.

Description

The minute function has the following syntax:

int1 = minute(time input);

The input value specifies the time.

Returns

Returns an integer representation of the minute in the specified time. For example:

select minute ('01:12:55');

This example returns 12.

minutes_between

Determine the truncated number of full minutes between two timestamps.

Description

The minutes_between function has the following syntax:

int = minutes_between(timestamp t1, timestamp t2);

The t1 value specifies the beginning timestamp.

The t2 value specifies the ending timestamp.

Returns

Returns the truncated number of full minutes between t1 and t2. For example:

select minutes_between('1996-02-27 06:12:33' , '1996-02-27 07:12:00');

This example returns 59.

month

Determine the month in the specified date.

Note: This can also be accomplished using the Netezza date_part function.

Description

The month function has the following syntax:

int1 = month(date input);

The input value specifies the date.

Returns

Returns an integer representation of the month in the specified date. For example:

select month('1996-2-29');

This example returns 2.


next_month

Determine the first day of the next month after the specified date.

Description

The next_month function has the following syntax:

date = next_month(date input);

The input value specifies a date.

Returns

Returns a date value representing the first day of the next month after the month specified by the input. For example:

select next_month('1996-2-29');

This example returns 1996-03-01.

next_quarter

Determine the first day of the next quarter after the quarter specified by the input.

Description

The next_quarter function has the following syntax:

date = next_quarter(date input);

The input value specifies a date.

Returns

Returns a date value representing the first day of the next quarter after the quarter specified by the input. For example:

select next_quarter('1996-2-29');

This example returns 1996-04-01.

next_year

Determine the first day of the next year after the year specified by the input.

Description

The next_year function has the following syntax:

date = next_year(date input);

The input value specifies a date.

Returns

Returns a date value representing the first day of the next year after the year specified by the input. For example:

select next_year('1996-2-29');

This example returns 1997-01-01.
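The next_month, next_quarter, and next_year rules amount to simple calendar arithmetic. This Python sketch uses hypothetical function names and assumes calendar quarters that start in January, April, July, and October (consistent with the 1996-04-01 result above); it reproduces the three documented examples.

```python
from datetime import date


def next_month(d: date) -> date:
    # December rolls over to January of the following year.
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def next_quarter(d: date) -> date:
    first_month = 3 * ((d.month - 1) // 3) + 1  # 1, 4, 7, or 10
    if first_month == 10:
        return date(d.year + 1, 1, 1)
    return date(d.year, first_month + 3, 1)


def next_year(d: date) -> date:
    return date(d.year + 1, 1, 1)


d = date(1996, 2, 29)
print(next_month(d), next_quarter(d), next_year(d))  # 1996-03-01 1996-04-01 1997-01-01
```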


second

Determine the seconds value in the specified time.

Note: This can also be accomplished using the Netezza date_part function.

Description

The second function has the following syntax:

int1 = second(time input);

The input value specifies the time.

Returns

Returns an integer representation of the seconds value in the specified time. For example:

select second ('01:12:55');

This example returns 55.

seconds_between

Determine the truncated number of full seconds between two timestamps.

Description

The seconds_between function has the following syntax:

int = seconds_between(timestamp t1, timestamp t2);

The t1 value specifies the beginning timestamp.

The t2 value specifies the ending timestamp.

Returns

Returns the truncated number of full seconds between t1 and t2. For example:

select seconds_between('1996-02-27 06:12:33','1996-02-27 06:55:22');

This example returns 2569.

this_month

Determine the first day of the month in the specified date.

Note: This functionality is also provided by the Netezza date_trunc() function.

Description

The this_month function has the following syntax:

date = this_month(date input);

The input value specifies a date.

Returns

Returns a date representing the first day of the month specified by input. For example:

select this_month('1996-2-29');

This example returns 1996-02-01.


this_quarter

Determine the first day of the quarter in which the specified date occurs.

Description

The this_quarter function has the following syntax:

date = this_quarter(date input);

The input value specifies a date.

Returns

Returns a date value representing the first day of the quarter in which the input date occurs. For example:

select this_quarter('1996-2-29');

This example returns 1996-01-01.

this_week

Determine the first day of the week in the specified date.

Description

The this_week function has the following syntax:

date = this_week(date input);

The input value specifies a date.

Returns

Returns a date value representing the first day of the week that contains the input date (weeks begin on Sunday; 1996-02-25 is a Sunday). For example:

select this_week('1996-2-29');

This example returns 1996-02-25.

this_year

Determine the first day of the year in the specified date.

Note: This functionality is also provided by the Netezza date_trunc() function.

Description

The this_year function has the following syntax:

date = this_year(date input);

The input value specifies a date.

Returns

Returns a date value representing the first day of the year specified by input. For example:

select this_year('1996-2-29');

This example returns 1996-01-01.
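The this_month, this_quarter, this_week, and this_year functions truncate a date to the start of its period. In the Python sketch below (hypothetical names), this_week assumes weeks begin on Sunday, which is what the documented result 1996-02-25 implies, since 1996-02-29 was a Thursday.

```python
from datetime import date, timedelta


def this_month(d: date) -> date:
    return d.replace(day=1)


def this_quarter(d: date) -> date:
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)


def this_week(d: date) -> date:
    # date.weekday() is 0 for Monday ... 6 for Sunday; step back to Sunday.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def this_year(d: date) -> date:
    return date(d.year, 1, 1)


d = date(1996, 2, 29)
print(this_month(d), this_quarter(d), this_week(d), this_year(d))
# 1996-02-01 1996-01-01 1996-02-25 1996-01-01
```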


weeks_between

Determine the truncated number of full weeks between two timestamps.

Description

The weeks_between function has the following syntax:

int = weeks_between(timestamp t1, timestamp t2);

The t1 value specifies the beginning timestamp.

The t2 value specifies the ending timestamp.

Returns

Returns the truncated number of full weeks between t1 and t2. For example:

select weeks_between('1996-02-27 06:12:33' , '1996-03-05 07:12:33');

This example returns 1.

year

Determine the year in the specified date.

Note: This can also be accomplished using the Netezza date_part function.

Description

The year function has the following syntax:

int2 = year(date input);

The input value specifies the date.

Returns

Returns an integer representation of the year in the specified date. For example:

select year('1996-2-29');

This example returns 1996.


CHAPTER 6

Text Analytics

What’s in this chapter

Word Comparison Function Reference

Regular Expression Function Reference

The functions in this chapter fall into two distinct groupings. The word comparison functions are useful for “fuzzy” comparisons, finding records in a database that approximately match a search key, phonetically or lexically. The regular expression functions identify precise patterns of characters and are useful for data validation, for example type checks, range checks, and checks for illegal characters.

Word Comparison Function Reference

The functions are listed alphabetically.

For those functions that operate only on ASCII characters, you can transliterate the strings to convert any accented characters to their ASCII unaccented versions.

For those functions that consider case when evaluating strings, if you want to ignore case, you can use Netezza functions such as upper() and lower() to change the letter casing of strings prior to the comparison. For information on these functions, refer to the Netezza Performance Server Database User’s Guide.

word_diff

Finds the number of modifications required to change the first string into the second string. Adding, deleting, substituting, or changing the case of a single character each counts as one modification. Transposing two adjacent characters counts as two modifications in all but the Damerau-Levenshtein algorithm, which counts a transposition as a single modification.

Note: Using the word_diff function with the Soundex or Double-Metaphone algorithms achieves the same result as using the combination of the word_key function to convert the strings to their phonetic encodings and then using the word_keys_diff function to compare those encodings. The word_diff function both converts the strings to their phonetic encodings and compares those encodings.

Description

The word_diff function has the following syntax:

int1 = word_diff(varchar word1, varchar word2 [, int algorithm]);


The word1 value specifies the first word in the comparison.

The word2 value specifies the second word in the comparison.

Algorithm is one of the following:

0 – Soundex-Miracode

1 – Soundex-Simplified

2 – Soundex-SQLServer

3 – Double-Metaphone (default if no algorithm given)

10 – Levenshtein

11 – Damerau-Levenshtein

Note: The built-in Netezza le_dst() function is equivalent to using the word_diff() function with the Levenshtein algorithm. The built-in Netezza dle_dst() function is equivalent to using the word_diff() function with the Damerau-Levenshtein algorithm.

Returns

Returns an integer that indicates how similar or different the two strings are. A value of 0 indicates the strings are the same. The results vary depending on the algorithm chosen. For example:

select word_diff('anderson','andrsn',0);

This example returns 0, because the Soundex algorithms consider only the initial vowel, not subsequent vowels. Suppose the algorithm is changed to Damerau-Levenshtein, as in the following example:

select word_diff('anderson','andrsn',11);

This call returns 2, because Damerau-Levenshtein accounts for the missing vowels e and o in the second string.
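The Levenshtein (10) and Damerau-Levenshtein (11) options can be sketched with the usual dynamic-programming table. The transposition branch below is the common "optimal string alignment" form of Damerau-Levenshtein; whether Netezza uses this form or the unrestricted one (they differ only on unusual inputs) is not stated here, and edit_distance is a hypothetical name. Comparing characters exactly makes a case change count as one substitution, as described above.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Levenshtein distance; with transpositions=True, optimal string alignment."""
    # d[i][j] = edits needed to turn a[:i] into b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


print(edit_distance("anderson", "andrsn", True))                    # 2
print(edit_distance("ab", "ba"), edit_distance("ab", "ba", True))  # 2 1
print(edit_distance("Swim", "swim"))                                # 1
```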

word_find

Searches the input varchar text for the first word that matches the input parameter word within the specified tolerance.

Description

The word_find function has the following syntax:

int4 = word_find(varchar word, varchar text, int1 difference [, int algorithm1 [, int algorithm2 [, int algorithm3]]]);

The word value specifies the word you want to search for in text.

The text value specifies the varchar text to search.

The difference value specifies the tolerance used by each specified algorithm when searching for a match.

Each specified algorithm is used to try to find a match within the tolerance defined by difference. If no algorithms are specified, or if the only algorithm specified is a stemming algorithm, an exact (case-insensitive) match is required. Each algorithm value is one of the following:


0 – Soundex-Miracode

1 – Soundex-Simplified

2 – Soundex-SQLServer

3 – Double-Metaphone

10 – Levenshtein

11 – Damerau-Levenshtein

100 – Porter

Returns

Returns the position of the first character of the matching string, or 0 if no match is found. For example:

select word_find('swimming', 'she swims in the competition in red wsimwear', 0, 11, 100);

This example returns 5.

select word_find('swimming', 'she swims in the competition in red wsimwear', 1, 11);

This example returns 0.

select word_find('SwimweaR ', 'she swims in the competition in red wsimwear', 0, 11);

This example returns 37.

word_key

Phonetically encode a word, according to its pronunciation in English, using the Double Metaphone algorithm or one of the three supported varieties of the Soundex algorithm.

The phonetically encoded words can subsequently be compared with the word_keys_diff function for a “fuzzy” comparison. Words with the same pronunciation but different spellings are encoded the same; depending on the algorithm selected, similar sounding words might also be encoded the same.

The goal is to enable you to match names based on their pronunciation and reduce misses that might result from spelling variations. For example, this type of fuzzy comparison can be used to find duplicate records resulting from spelling errors; another use is to find ancestor names in a genealogical database when the spelling has changed slightly over time.

The phonetic matching functions are case-insensitive comparisons: the phonetic representations are the same for two strings that have the same spelling but different letter casing. The functions ignore any characters outside the ASCII subset.

Description

The word_key function has the following syntax:

int4 = word_key(varchar word [, int algorithm]);

The word value specifies the varchar word to be given a phonetic encoding.


The algorithm value is specified by an integer code (defaults to 3). The available algorithms are listed in the following table:

Table 6-1: Algorithms Supported for Phonetic Encoding

0 – Soundex-Miracode: The original Soundex algorithm, used to encode surnames in the United States census between 1880 and 1930. All surnames are encoded as a four-character string: the first character represents the first letter of the person’s last name, and characters two, three, and four are digits that encode the remaining consonants in the name, ignoring vowels, collapsing duplicate encodings to a single value, and right-padding with zeroes if necessary.

1 – Soundex-Simplified: An updated form of the original Soundex algorithm. It is identical to Miracode except that it does not encode H or W.

2 – Soundex-SQLServer: The version of the Soundex algorithm implemented in Microsoft SQL Server. It does not apply the H or W rule, and similarity grouping starts after the first character.

3 – Double-Metaphone: Encodes most English words, not just names. The algorithm better quantifies the rules of English pronunciation and also recognizes a subset of non-Latin characters, making it a much better choice than Soundex (it is used by many spell checkers). Whereas Soundex encodes all names with a key of the same length, Double-Metaphone outputs variable-length encodings that more accurately represent the sounds of the word. The algorithm also handles words that have an alternate pronunciation by returning a primary and a secondary encoding.

Note: The Netezza built-in dbl_mp() function is equivalent to using the word_key() function with the Double Metaphone algorithm. The Netezza built-in nysiis() function is roughly equivalent to using the word_key() function with the Soundex-Simplified algorithm.

Returns

The function returns the word_key code of a word as an integer. These codes can be compared using the word_keys_diff() function. For example:

select word_key('persistent',1);

This example returns the encoding 67106.

word_key_tochar

Returns the varchar representation of the phonetic encoding produced by the word_key function.

Description

The word_key_tochar function has the following syntax:

varchar = word_key_tochar(int wordkey [, int algorithm]);

The wordkey value specifies the word_key encoding to be given a varchar representation.

Algorithm is one of the following:

0 – Soundex-Miracode

1 – Soundex-Simplified

2 – Soundex-SQLServer

3 – Double-Metaphone (default if no algorithm given)

Returns

Returns the varchar representation of the word key. For example, word_key_tochar(word_key('Ashcroft', 0), 0) returns A261. Similarly:

select word_key_tochar(word_key('PERsisteNT',2),2);

This example returns P622.
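The Soundex-Miracode rules in Table 6-1 can be sketched in Python as follows; the name soundex is hypothetical. It returns the four-character form that word_key_tochar displays (the integer packing used by word_key, such as 67106, is not documented here, so the sketch does not attempt it). H and W are skipped without separating letters that share a code, which is the American Soundex H/W rule. The sketch reproduces A261 for Ashcroft and the J525 and J125 codes quoted for word_keys_diff, and it also gives P622 for PERsisteNT.

```python
# Soundex digit for each consonant; vowels (and Y) have no code.
CODES = {ch: digit
         for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6"))
         for ch in letters}


def soundex(word: str) -> str:
    """American (Miracode-style) Soundex code, e.g. 'Ashcroft' -> 'A261'."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]  # ASCII letters only
    if not letters:
        return ""
    digits = []
    last = CODES.get(letters[0], "")  # the first letter's code suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != last:
            digits.append(code)
        last = code  # a vowel resets `last`, so a repeated code after it is kept
    return (letters[0] + "".join(digits) + "000")[:4]


for name in ("Ashcroft", "Johnson", "Jeppeson", "PERsisteNT"):
    print(name, soundex(name))
# Ashcroft A261 / Johnson J525 / Jeppeson J125 / PERsisteNT P622
```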

word_keys_diff

Computes the lexical difference between phonetic encodings produced by the word_key function.

Note: Soundex word keys can be compared for an exact match by comparing the int4 keys directly without using this function.

Description

The word_keys_diff function has the following syntax:

int1 = word_keys_diff(int4 wordkey1, int4 wordkey2 [, int algorithm]);

The wordkey1 value specifies the first word_key encoding in the comparison.

The wordkey2 value specifies the second word_key encoding in the comparison.

Algorithm is one of the following:

0 – Soundex-Miracode

1 – Soundex-Simplified

2 – Soundex-SQLServer

3 – Double-Metaphone (default if no algorithm given)

Returns

For the Soundex algorithms, the function returns a value between 0 and 4: 0 represents an exact match, and 1 through 4 represent increasing degrees of inexactness. For example:

select word_keys_diff(word_key('Johnson',0),word_key('Jeppeson',0),0);

This example returns 1 because the two Soundex encodings differ by one character: the Soundex code for Johnson is J525, and the Soundex code for Jeppeson is J125.
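A minimal Python sketch of the Soundex comparison follows. It assumes the difference is the number of positions at which the two four-character codes differ, which matches the Johnson/Jeppeson example above. The toolkit may compute it differently. `soundex_diff` is a hypothetical helper that works on text codes rather than integer word keys. Note that this scale runs in the opposite direction from SQL Server's DIFFERENCE() function, where 4 indicates the closest match.

```python
def soundex_diff(code1: str, code2: str) -> int:
    """Number of positions at which two 4-character Soundex codes differ."""
    if len(code1) != 4 or len(code2) != 4:
        raise ValueError("expected 4-character Soundex codes")
    # 0 means an exact match; 4 means every position differs.
    return sum(a != b for a, b in zip(code1, code2))


print(soundex_diff("J525", "J125"))  # 1 (Johnson vs. Jeppeson)
```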


word_stem

Returns the root stem of the given varchar word (for example, “fishing”, “fished”, and “fisher” all return “fish”).

Description

The word_stem function has the following syntax:

varchar = word_stem(varchar word [, int algorithm]);

The word value specifies the varchar word whose root stem you want.

The algorithm value has only one option, 100, which indicates the Porter algorithm. This is the default, so you do not need to specify an algorithm.

Returns

The function returns the root stem of the given varchar word. For example:

select word_stem('fishing');

select word_stem('fisher');

Both of these examples return fish.

Regular Expression Function Reference

The supported regular expression functions are fully compatible with Perl 5 regular expressions. A discussion of how regular expressions operate is beyond the scope of this document. For more information, refer to the many texts that describe how to construct Perl regular expressions.

The Flags Argument

The functions described in this section all accept an optional flags argument, which can contain any of the following:

Table 6-2: Flags used in Regular Expressions Functions

Flag – Short description: Full description

m – Multi-line: Specifies that the input data may contain more than one line, so that ‘^’ and ‘$’ match at the start and end of each line rather than only at the start and end of the input. Equivalent to the Perl /m option.

i – Case insensitive: Matching takes place without regard to case. Equivalent to the Perl /i option.

c – Case sensitive: The default; the opposite of the ‘i’ flag.

s – Dot all: Specifies that the ‘.’ character also matches newlines. Equivalent to the Perl /s option.

n – Equivalent to the ‘s’ flag. Included for compatibility with vendors that use the n flag.

x – Extended: Whitespace characters in the pattern are ignored unless escaped. Equivalent to the Perl /x option.
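To make the flag letters in Table 6-2 concrete, here is a Python sketch that maps them onto flags of Python's re module. Python's re module is close to Perl 5 but not identical; for example, it has no POSIX classes such as [[:space:]]. `compile_with_flags` and `FLAG_MAP` are hypothetical names. The sketch applies the letters left to right, so a later 'c' cancels an earlier 'i'. How the toolkit resolves conflicting flags is not documented here, so treat that as an assumption.

```python
import re

# Hypothetical mapping from the flag letters in Table 6-2 to Python re flags.
FLAG_MAP = {
    "m": int(re.MULTILINE),  # '^' and '$' match at line boundaries
    "i": int(re.IGNORECASE),
    "s": int(re.DOTALL),     # '.' also matches newlines
    "n": int(re.DOTALL),     # 'n' is a synonym for 's'
    "x": int(re.VERBOSE),    # unescaped whitespace in the pattern is ignored
}


def compile_with_flags(pattern: str, flags: str = "") -> re.Pattern:
    """Compile pattern, applying flag letters from left to right."""
    value = 0
    for letter in flags:
        if letter == "c":
            value &= ~int(re.IGNORECASE)  # back to case-sensitive (the default)
        elif letter in FLAG_MAP:
            value |= FLAG_MAP[letter]
        else:
            raise ValueError(f"unknown flag: {letter!r}")
    return re.compile(pattern, value)


print(compile_with_flags("hello", "i").search("HELLO") is not None)  # True
```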


regexp_extract

Extracts the matching text.

Note: Analogous to the REGEXP_SUBSTR() function provided by some vendors.

Description

The regexp_extract function has the following syntax:

varchar = regexp_extract(varchar input, varchar pattern [, int start_pos [, int reference]] [, varchar flags]);

The input value specifies the varchar on which the regular expression is processed.

The pattern value specifies the regular expression.

The start_pos value specifies the character position at which to start the search (defaults to position 1).

The reference value specifies which instance of the pattern to extract (defaults to 1).

For a description of flags, see “The Flags Argument” on page 6-6.

Returns

For example:

select regexp_extract('hello to you', '.o', 1, 1);

select regexp_extract('hello to you', '.o', 1, 2);

select regexp_extract('hello to you', '.o', 1, 3);

This first example returns lo, the second returns to, and the third returns yo.
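The start_pos and reference semantics can be emulated in Python with `Pattern.finditer`, whose pos argument starts the scan at a given offset. This is a sketch under assumptions: `regexp_extract` here is a hypothetical Python stand-in rather than the toolkit's implementation. It also assumes that too few matches yields NULL (None).

```python
import re
from typing import Optional


def regexp_extract(text: str, pattern: str, start_pos: int = 1,
                   reference: int = 1) -> Optional[str]:
    """Return the reference-th match of pattern, searching from 1-based start_pos."""
    matches = re.compile(pattern).finditer(text, start_pos - 1)
    for n, match in enumerate(matches, start=1):
        if n == reference:
            return match.group(0)
    return None  # assumption: too few matches maps to SQL NULL


print([regexp_extract("hello to you", ".o", 1, n) for n in (1, 2, 3)])  # ['lo', 'to', 'yo']
```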

regexp_extract_all

Extracts all the matching text items and returns them in a varchar array.

Description

The regexp_extract_all function has the following syntax:

array(varchar) = regexp_extract_all(varchar input, varchar pattern [, int start_pos] [, varchar flags]);

The input value specifies the varchar on which the regular expression is processed.

The pattern value specifies the regular expression.

The start_pos value specifies the character position at which to start the extract (defaults to position 1).

For a description of flags, see “The Flags Argument” on page 6-6.

Returns

For example:

select array_combine(regexp_extract_all('Steven and Stephen are the best players','Ste(v|ph)en'),'|');

This example returns:

Steven|Stephen
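A Python sketch of the same extraction is shown below. `regexp_extract_all` is a hypothetical emulation, and `"|".join` stands in for array_combine. It collects `group(0)` from each match rather than using `re.findall`, because findall returns the captured groups ('v', 'ph') instead of the full matches when the pattern contains parentheses.

```python
import re
from typing import List


def regexp_extract_all(text: str, pattern: str, start_pos: int = 1) -> List[str]:
    """Return every full match of pattern, scanning from 1-based start_pos."""
    # group(0) is the whole match, even when the pattern has capture groups.
    return [m.group(0) for m in re.compile(pattern).finditer(text, start_pos - 1)]


# "|".join(...) plays the role of array_combine(..., '|').
print("|".join(regexp_extract_all("Steven and Stephen are the best players",
                                  "Ste(v|ph)en")))  # Steven|Stephen
```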


regexp_extract_all_sp

Processes the specified regular expression on the varchar input. All sub-patterns are returned in an array, with the first element (element 0) corresponding to the full match.

Description

The regexp_extract_all_sp function has the following syntax:

array(varchar) = regexp_extract_all_sp(varchar input, varchar pattern [, int start_pos][, varchar flags]);

The input value specifies the varchar on which the regular expression is processed.

The pattern value specifies the regular expression.

The start_pos value specifies the character position at which to start the extract (defaults to position 1).

For a description of flags, see “The Flags Argument” on page 6-6.

Returns

For example:

select array_combine(regexp_extract_all_sp('Robert Szissel, 128 Folson St, Boston', '([^,]*),[[:space:][:digit:]]*([^[:space:]]*).*,[[:space:]]*(.*)'),'|');

This example returns:

Robert Szissel, 128 Folson St, Boston|Robert Szissel|Folson|Boston
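In Python, the equivalent result is the full match followed by each capture group. This sketch makes several assumptions. `regexp_extract_all_sp` is a hypothetical emulation that uses only the first match, and it returns an empty list when nothing matches. Because Python's re has no POSIX classes, the manual's pattern is translated: [[:space:]] becomes \s, [[:digit:]] becomes \d, and [^[:space:]] becomes \S.

```python
import re
from typing import List


def regexp_extract_all_sp(text: str, pattern: str, start_pos: int = 1) -> List[str]:
    """Return [full match, group 1, group 2, ...] for the first match, or []."""
    match = re.compile(pattern).search(text, start_pos - 1)
    if match is None:
        return []  # assumption: no match yields an empty array
    # Groups that did not take part in the match come back as None; use "".
    return [match.group(0)] + [g if g is not None else "" for g in match.groups()]


# Translation of the manual's POSIX-class pattern into Python syntax.
pattern = r"([^,]*),[\s\d]*(\S*).*,\s*(.*)"
print("|".join(regexp_extract_all_sp("Robert Szissel, 128 Folson St, Boston", pattern)))
# Robert Szissel, 128 Folson St, Boston|Robert Szissel|Folson|Boston
```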

regexp_extract_sp

Processes the specified regular expression on the varchar input, returning the specified sub-pattern.

Description

The regexp_extract_sp function has the following syntax:

varchar = regexp_extract_sp(varchar input, varchar pattern, int start_pos, int reference [, varchar flags]);

The input value specifies the varchar on which the regular expression is processed.

The pattern value specifies the regular expression.

The start_pos value specifies the character position at which to start the extract (defaults to position 1).

The reference value specifies which instance of the pattern to extract.

For a description of flags, see “The Flags Argument” on page 6-6.

Returns

For example, consider the following table:


create table sample(col1 varchar(20));

CREATE TABLE

insert into sample values('bcaaabc');

INSERT 0 1

insert into sample values('abcbc');

INSERT 0 1

insert into sample values('bbb');

INSERT 0 1

insert into sample values('bcd');

INSERT 0 1

insert into sample values('bccdebc');

INSERT 0 1

insert into sample values('def');

INSERT 0 1

insert into sample values('efgbcbc');

INSERT 0 1

And consider the following query executed against this table:

select regexp_extract_sp(col1, '[acf]', 1, 1) from sample order by rowid;

This example returns 7 rows. The row containing 'bbb' has no match for the pattern, so only six rows contain a value:

c

a

c

c

f

f

regexp_instr

Returns the character position of the matching text.

Description

The regexp_instr function has the following syntax:

int = regexp_instr(varchar input, varchar pattern [, int start_pos [, int reference]] [, varchar flags]);

The input value specifies the varchar on which the regular expression is processed.

The pattern value specifies the regular expression.

The start_pos value specifies the character position at which to start the search for a match (defaults to position 1).

The reference value specifies which instance of the pattern to locate.

For a description of flags, see “The Flags Argument” on page 6-6.


Returns

The function returns the 1-based character position at which the match begins. If there is no match, or if there are fewer than reference occurrences of the pattern, it returns 0. For example:

select regexp_instr('hello to you', '.o', 1, 1);

select regexp_instr('hello to you', '.o', 1, 2);

select regexp_instr('hello to you', '.o', 1, 3);

This first example returns 4, the second returns 7, and the third returns 10.
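The position arithmetic can be checked with a Python sketch. `regexp_instr` is a hypothetical emulation: Python reports 0-based offsets, so the sketch adds 1. It returns 0 when there are too few matches, as described above.

```python
import re


def regexp_instr(text: str, pattern: str, start_pos: int = 1, reference: int = 1) -> int:
    """Return the 1-based start of the reference-th match, or 0 if there is none."""
    matches = re.compile(pattern).finditer(text, start_pos - 1)
    for n, match in enumerate(matches, start=1):
        if n == reference:
            return match.start() + 1  # convert Python's 0-based offset
    return 0


print([regexp_instr("hello to you", ".o", 1, n) for n in (1, 2, 3, 4)])  # [4, 7, 10, 0]
```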

regexp_like

Returns true if there is at least one matching occurrence in input.

Description

The regexp_like function has the following syntax:

bool = regexp_like(varchar input, varchar pattern [, int start_pos] [, varchar flags]);

The input value specifies the varchar on which the regular expression is processed.

The pattern value specifies the regular expression.

The start_pos value specifies the character position at which to start the search for a match (defaults to position 1).

For a description of flags, see “The Flags Argument” on page 6-6.

Returns

For example:

select regexp_like('my password is 09124 or 069az6','[0-9][^0-9]+[0-9]$');

This example returns true.
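A Python sketch of the same test follows. `regexp_like` is a hypothetical emulation that reports whether a search starting at the 1-based start_pos finds any match. In the example, "9az6" at the end of the input satisfies the pattern.

```python
import re


def regexp_like(text: str, pattern: str, start_pos: int = 1) -> bool:
    """True if pattern matches anywhere in text at or after 1-based start_pos."""
    return re.compile(pattern).search(text, start_pos - 1) is not None


print(regexp_like("my password is 09124 or 069az6", "[0-9][^0-9]+[0-9]$"))  # True
```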

regexp_match_count

Returns the number of matching occurrences in input.

Description

The regexp_match_count function has the following syntax:

int = regexp_match_count(varchar input, varchar pattern [, int start_pos] [, varchar flags]);

The input value specifies the varchar on which the regular expression is processed.

The pattern value specifies the regular expression.

The start_pos value specifies the character position at which to start the search for a match (defaults to position 1).

For a description of flags, see “The Flags Argument” on page 6-6.


Returns

For example:

select regexp_match_count('Steven Jones and Stephen Smith are the best players','Ste(v|ph)en');

This example returns 2.
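Counting matches can be sketched in Python as shown below. `regexp_match_count` is a hypothetical emulation. It assumes that matches are non-overlapping, as with Perl's global matching, and that the scan begins at the 1-based start_pos.

```python
import re


def regexp_match_count(text: str, pattern: str, start_pos: int = 1) -> int:
    """Count non-overlapping matches of pattern from 1-based start_pos."""
    return sum(1 for _ in re.compile(pattern).finditer(text, start_pos - 1))


print(regexp_match_count(
    "Steven Jones and Stephen Smith are the best players", "Ste(v|ph)en"))  # 2
```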

regexp_replace

Replaces each instance of pattern in input with the value in the varchar replacement.

Description

The regexp_replace function has the following syntax:

varchar = regexp_replace(varchar input, varchar pattern, varchar replacement [, int start_pos [, int reference]] [, varchar flags]);

The input value specifies the varchar on which the regular expression is processed.

The pattern value specifies the regular expression.

The replacement value specifies the value to substitute for each instance of pattern.

The start_pos value specifies the character position at which to start the replace (defaults to position 1).

The reference value specifies which instance of the pattern to replace.

For a description of flags, see “The Flags Argument” on page 6-6.

Returns

If reference is set to 0 (or not specified), all occurrences of the pattern are replaced. For example:

select regexp_replace('Awake! Fear, Fire, Foes!','Foes','Flee');

This example returns:

Awake! Fear, Fire, Flee!
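The reference rule (0 replaces every match, n replaces only the nth) can be emulated in Python as follows. `regexp_replace` is a hypothetical sketch. It inserts the replacement literally, so any back-reference syntax the toolkit may support is not modeled. Text before start_pos is left untouched.

```python
import re


def regexp_replace(text: str, pattern: str, replacement: str,
                   start_pos: int = 1, reference: int = 0) -> str:
    """Replace every match (reference=0) or only the reference-th match."""
    pieces, last = [], 0
    matches = re.compile(pattern).finditer(text, start_pos - 1)
    for n, match in enumerate(matches, start=1):
        if reference in (0, n):
            pieces.append(text[last:match.start()])
            pieces.append(replacement)  # inserted literally; no back-references
            last = match.end()
    pieces.append(text[last:])
    return "".join(pieces)


print(regexp_replace("Awake! Fear, Fire, Foes!", "Foes", "Flee"))  # Awake! Fear, Fire, Flee!
```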

regexp_replace_sp

Processes the specified regular expression on the varchar input and replaces each sub-pattern with the corresponding value in the replacements array.

Description

The regexp_replace_sp function has the following syntax:

varchar = regexp_replace_sp(varchar input, varchar pattern, array replacements [, int start_pos] [, varchar flags]);

The input value specifies the varchar on which the regular expression is processed.

The pattern value specifies the regular expression.

The replacements array specifies the values to substitute for the sub-patterns, in order: the first element replaces the first sub-pattern, the second element replaces the second, and so on.

The start_pos value specifies the character position at which to start the replace (defaults to position 1).


For a description of flags, see “The Flags Argument” on page 6-6.

Returns

For example:

select regexp_replace_sp('Robert Szissel, 128 Folson St, Boston', '([[:digit:]]+)[^.]*,.*(Boston)', array_split('37000,Cleveland', ','));

This example returns:

Robert Szissel, 37000 Folson St, Cleveland
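A Python sketch of sub-pattern replacement follows. It rests on several assumptions. `regexp_replace_sp` is a hypothetical emulation that works on the first match only, and it expects capture groups that are not nested and appear left to right. Python's `str.split` stands in for array_split. The manual's [[:digit:]] is written as \d, because Python's re lacks POSIX classes.

```python
import re
from typing import Sequence


def regexp_replace_sp(text: str, pattern: str, replacements: Sequence[str],
                      start_pos: int = 1) -> str:
    """Replace capture group i of the first match with replacements[i - 1]."""
    match = re.compile(pattern).search(text, start_pos - 1)
    if match is None:
        return text
    pieces, last = [], 0
    for group, value in enumerate(replacements[:match.re.groups], start=1):
        if match.start(group) == -1:
            continue  # this group did not take part in the match
        pieces.append(text[last:match.start(group)])
        pieces.append(value)
        last = match.end(group)
    pieces.append(text[last:])
    return "".join(pieces)


# "37000,Cleveland".split(",") stands in for array_split('37000,Cleveland', ',').
print(regexp_replace_sp("Robert Szissel, 128 Folson St, Boston",
                        r"(\d+)[^.]*,.*(Boston)", "37000,Cleveland".split(",")))
# Robert Szissel, 37000 Folson St, Cleveland
```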


Chapter 7

Text Utility

What’s in this chapter

Text Utility Function Reference

The text utility functions in this chapter enable you to convert between ASCII text and its hexadecimal representation, replace substrings, and extract substrings.

Text Utility Function Reference

Functions are listed alphabetically.

hextoraw

Interprets each pair of characters (left to right) in the input varchar as the hexadecimal code for an ASCII character and converts the hexadecimal sequence into a character string.

Description

The hextoraw function has the following syntax:

varchar = hextoraw(varchar input);

The input value specifies the varchar to convert.

Returns

For example:

SELECT hextoraw('68656C6C6f');

This example returns the varchar:

hello
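Python's built-in `bytes.fromhex` performs the same pair-by-pair decoding and accepts both upper-case and lower-case digits, as in the example input. The sketch below uses a hypothetical `hextoraw` wrapper around it. It raises an error on odd-length or non-ASCII input, and the toolkit's behavior in those cases is not documented here.

```python
def hextoraw(hex_text: str) -> str:
    """Decode pairs of hex digits (left to right) into ASCII characters."""
    return bytes.fromhex(hex_text).decode("ascii")


print(hextoraw("68656C6C6f"))  # hello
```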

rawtohex

Converts a character string into its ASCII hexadecimal representation.

Description

The rawtohex function has the following syntax:

varchar = rawtohex(varchar input);

The input value specifies the varchar to convert.


Returns

For example:

SELECT rawtohex('hello');

This example returns the varchar:

68656C6C6F
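The reverse conversion is sketched below in Python, using a hypothetical `rawtohex` wrapper. Each character is encoded as ASCII and written as two upper-case hexadecimal digits, which matches the example output above.

```python
def rawtohex(text: str) -> str:
    """Encode each ASCII character as two upper-case hexadecimal digits."""
    return text.encode("ascii").hex().upper()


print(rawtohex("hello"))  # 68656C6C6F
```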

replace

Replaces each instance of pattern in input with the value in the varchar replacement.

Description

The replace function has the following syntax:

varchar = replace(varchar input, varchar pattern, varchar replacement);

The input value specifies the varchar in which the characters are replaced.

The pattern value specifies the characters to replace.

The replacement value specifies the characters to substitute for each instance of pattern.

Returns

For example:

select replace('persisaent','a','t');

This example returns:

"persistent"

strleft

Returns the left-most n characters from the varchar input.

Description

The strleft function has the following syntax:

varchar = strleft(varchar input, int n);

The input value specifies the varchar from which the characters are returned.

The n value specifies the number of characters to return.

Returns

For example:

select strleft('1234567891', 5);

This example returns:

"12345"


strright

Returns the right-most n characters from the varchar input.

Description

The strright function has the following syntax:

varchar = strright(varchar input, int n);

The input value specifies the varchar from which the characters are returned.

The n value specifies the number of characters to return.

Returns

For example:

select strright('1234567891', 5);

This example returns:

"67891"


Chapter 8

Array

What’s in this chapter

Array Function Reference

The array functions in the Netezza SQL Extensions Toolkit rely on the array data type. Because the Netezza database currently does not support user-defined types, the array type is stored in a varchar field. The maximum size of a varchar field is 64000 bytes.

The array type consists of a sequence of name-value pairs. Names can be at most 40 characters long. Values can be any built-in SQL type, but all values in an array must be of the same type. Elements can be referenced either by name or by 1-based index.
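The array semantics just described can be modeled in Python: a single element type, optional names of up to 40 characters, and access by name or by 1-based index. This is a hypothetical sketch. `ToolkitArray` and its methods are illustrative names, and the model does not reproduce the toolkit's varchar storage format. It also changes the array in place, whereas the SQL functions return a new array value.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple, Union

MAX_NAME_LEN = 40  # element names may be at most 40 characters


@dataclass
class ToolkitArray:
    """Hypothetical in-memory model of a typed array of (name, value) pairs."""
    elem_type: type
    items: List[Tuple[Optional[str], Any]] = field(default_factory=list)

    def add_element(self, value: Any, name: Optional[str] = None) -> "ToolkitArray":
        # Every element must share the array's single element type.
        if not isinstance(value, self.elem_type):
            raise TypeError(f"array holds {self.elem_type.__name__} values")
        if name is not None and len(name) > MAX_NAME_LEN:
            raise ValueError("element names are limited to 40 characters")
        self.items.append((name, value))
        return self

    def _position(self, key: Union[int, str]) -> int:
        # Integer keys are 1-based indexes; string keys are element names.
        if isinstance(key, int):
            if not 1 <= key <= len(self.items):
                raise IndexError(key)
            return key - 1
        for pos, (name, _) in enumerate(self.items):
            if name == key:
                return pos
        raise KeyError(key)

    def get_value(self, key: Union[int, str]) -> Any:
        return self.items[self._position(key)][1]

    def element_name(self, index: int) -> Optional[str]:
        return self.items[self._position(index)][0]

    def delete_element(self, key: Union[int, str]) -> "ToolkitArray":
        del self.items[self._position(key)]
        return self

    def count(self) -> int:
        return len(self.items)

    def combine(self, delimiter: str) -> str:
        return delimiter.join(str(value) for _, value in self.items)


arr = ToolkitArray(int).add_element(12).add_element(23, "second")
print(arr.combine("|"), arr.count(), arr.get_value("second"))  # 12|23 2 23
```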

Array Function Reference

Functions are listed alphabetically.

add_element

Appends a new element to the end of the input array and assigns it the specified value. This is an overloaded function, with seven forms corresponding to the seven data types.

Description

The syntax of the add_element function has seven forms, one for each data type:

array = add_element(array input, varchar value [, varchar name])

array = add_element(array input, nvarchar value [, varchar name])

array = add_element(array input, int8 value [, varchar name])

array = add_element(array input, double value [, varchar name])

array = add_element(array input, time value [, varchar name])

array = add_element(array input, date value [, varchar name])

array = add_element(array input, timestamp value [, varchar name]);

The input value specifies the array to which the element is appended.

The value value specifies the value to store in the new array element.

The optional name value specifies the name of the array element being appended.


Returns

For example:

add_element(my_array, 45)

Assuming my_array has four elements, this example appends a fifth element to the end of the array and stores the value 45 in that element.

array

Creates an array of the given type.

Description

The array function has the following syntax:

array = array(int type);

The type value specifies the type of array to create. The type takes an integer code between 1 and 11 that indicates the type, as shown in the following table:

Table 8-1: Array Types

Code – Type: Size and range

1 – Int1: 8 bit

2 – Int2: 16 bit

3 – Int4: 32 bit

4 – Int8: 64 bit

5 – Date: Ranges from January 1, 0001, to December 31, 9999. Disk usage: 4 bytes.

6 – Time: Hours, minutes, and seconds to 6 decimal positions, ranging from 00:00:00.000000 to 23:59:59.999999. Disk usage: 8 bytes.

7 – Timestamp: Has a date part and a time part, with seconds stored to 6 decimal positions, ranging from January 1, 0001 00:00:00.000000 to December 31, 9999 23:59:59.999999. Disk usage: 8 bytes.

8 – Varchar: Variable length to a maximum length of n, with no blank padding; stored as entered. The maximum character string size is 64,000. Uses N+2 or fewer bytes, depending on the data.

9 – NvarChar: Variable-length Unicode data with a maximum length of 16,000 characters. With UTF-8 encoding, each Unicode code point requires 1-4 bytes of storage, so a 10-character string requires 10 bytes of storage if it is ASCII, up to 20 bytes if it is Latin, or as many as 40 bytes if it is pure Kanji (but typically 30 bytes).

10 – Float: Floating-point number with precision 1 to 15. Precision less than 7 uses 4 bytes; precision between 7 and 15 uses 8 bytes.

11 – Double: Equivalent to float with precision 15, using 8 bytes.


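Returns

The function returns a new, empty array of the given type. The examples later in this chapter operate on arrays stored in col2 of the following table:

create table array_t(col1 int, col2 varchar(100));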

array_combine

Combines the elements of the array input into a single varchar, delimited by delimiter.

Description

The array_combine function has the following syntax:

varchar = array_combine(array input, char delimiter);

The input value specifies the array to combine into a single varchar.

The delimiter value specifies the delimiter that separates the array elements.

Returns

For example:

select array_combine(col2, '|') from array_t;

A possible return value might be:

12|23

array_concat

Concatenates two arrays, creating a new array that contains all the elements in the first array followed by all the elements in the second array.

Note: The two arrays must be of the same type, and they cannot share element names.

Description

The array_concat function has the following syntax:

array = array_concat(array array1, array array2);

The array1 value specifies the first of the two arrays to concatenate.

The array2 value specifies the second of the two arrays to concatenate.

Returns

For example:

select array_concat(array(2), array(2));

array_count

Returns the number of elements in the array.

Description

The array_count function has the following syntax:

int = array_count(array input);

The input value specifies the array in which to count elements.


Returns

For example:

select array_count(col2) from array_t;

A possible return value might be:

2

array_split

Parses the input for elements separated by a delimiter to create an array.

Description

The array_split function has the following syntax:

array = array_split(varchar input, varchar delimiter [, int type]);

The input value specifies a character-delimited list of elements.

The delimiter value specifies the delimiter used in the input.

The optional type value specifies the type of the array; the type defaults to varchar.

Returns

For example:

select array_combine(array_split('1,2,3,4,5,6,7,8',','),'|');

This example returns:

1|2|3|4|5|6|7|8

array_type

Returns the type of the array.

Description

The array_type function has the following syntax:

int = array_type(array input);

The input value specifies the array for which to get the type.

Returns

For example:

select array_type(array(4));

This example returns 4.

This second example determines the array type of an array that is stored in a table:

select array_type(col2) from array_t;


delete_element

Deletes an element from the input array.

Description

The syntax for the delete_element function supports deleting by name or by index:

array = delete_element(array input, int index);

array = delete_element(array input, varchar name);

The input value specifies the array that contains the element to delete.

The index value specifies the index of the element to delete from the input array.

The name value specifies the name of the element to delete from the input array.

Returns

For example:

select delete_element(col2, 1) from array_t;

element_name

Returns the name of an element, if it exists.

Description

The element_name function has the following syntax:

varchar = element_name(array input, int index);

The input value specifies the array that contains the named element.

The index value specifies the element for which to retrieve the name.

Returns

For example:

select element_name(add_element(array(4),4,'Netezza'),1);

This example returns:

Netezza

get_value_type

Retrieves the value stored in the specified array element. The function name has the form get_value_type, where type is the data type of the element to retrieve (for example, get_value_varchar). There are seven data types, with two versions of the function for each: one retrieves array elements by index and the other by name.

Description

The get_value_type function has the following syntax:

varchar = get_value_varchar(array input, int index);

varchar = get_value_varchar(array input, varchar name);

nvarchar = get_value_nvarchar(array input, int index);

nvarchar = get_value_nvarchar(array input, varchar name);


int8 = get_value_int(array input, int index);

int8 = get_value_int(array input, varchar name);

double = get_value_double(array input, int index);

double = get_value_double(array input, varchar name);

time = get_value_time(array input, int index);

time = get_value_time(array input, varchar name);

date = get_value_date(array input, int index);

date = get_value_date(array input, varchar name);

time_tz = get_value_timestamp(array input, int index);

time_tz = get_value_timestamp(array input, varchar name);

The input value specifies the array that contains the element to retrieve.

The index value specifies the index of the element to retrieve from the input array.

The name value specifies the name of the element to retrieve from the input array.

Returns

If the specified element is of a different type than the function returns, the function attempts to convert it. If the conversion fails, or if the element does not exist, the function returns an error. For example:

select get_value_int(col2, 1) from array_t;

A possible return value is 12.

replace_element

Replaces an element in the input array. This is an overloaded function, with 14 forms corresponding to the seven data types (each referenced by name or by array index).

Description

The syntax of the replace_element function has 14 variations, two for each of the seven data types (one for referencing an element by name and one for referencing an element by index):

array = replace_element(array input, int index, varchar value)

array = replace_element(array input, varchar name, varchar value)

array = replace_element(array input, int index, nvarchar value)

array = replace_element(array input, varchar name, nvarchar value)

array = replace_element(array input, int index, int8 value)

array = replace_element(array input, varchar name, int8 value)

array = replace_element(array input, int index, double value)

array = replace_element(array input, varchar name, double value)

array = replace_element(array input, int index, time value)

array = replace_element(array input, varchar name, time value)

array = replace_element(array input, int index, date value)

array = replace_element(array input, varchar name, date value)

array = replace_element(array input, int index, timestamp value)

array = replace_element(array input, varchar name, timestamp value);


The input value specifies the array in which the element is replaced.

The index value specifies the position in the array at which the element is replaced.

The name value specifies the name of the array element to replace.

The value value specifies the new value for the specified array element.

Returns

For example:

select replace_element(col2, 1, 15) from array_t;


Chapter 9

Collection

What’s in this chapter

User Type Collection

Collection Function Reference

Collections are useful for grouping heterogeneous information: each element in a collection can hold a value of a different data type, unlike arrays, in which every element must be of the same data type.

User Type Collection

This section defines a new user type, collection. The collection type consists of a sequence of name-value pairs. Names can be at most 40 characters long. Values can be any built-in SQL type. Elements can be referenced either by name or by 1-based index.

Because Netezza user-defined functions (UDFs) currently do not support new user types, the collection type is stored in a varchar field.

Collection Function Reference

In addition to the two functions listed in this section, you can use any of the array functions listed in Chapter 8, “Array” to retrieve and manipulate collection elements.

collection

Creates an empty collection.

Description

The collection function has the following syntax:

collection = collection();

Returns

For example:

create table collection_t(col1 int, col2 varchar(100));


element_type

Returns the type of the collection element.

Description

The element_type function has the following syntax:

int = element_type(collection input, int index);

int = element_type(collection input, varchar name);

The input value specifies the collection.

The index value specifies the index of the element whose type you want.

The name value specifies the name of the element whose type you want.

Returns

For example:

select element_type(col2, 1) from collection_t;

Assuming an element of type Int8 (code 4 in Table 8-1), the example returns 4.


Chapter 10

Miscellaneous

What’s in this chapter

Miscellaneous Function Reference

This chapter contains the functions that do not fit neatly into the functional groupings of the preceding chapters.

Miscellaneous Function Reference

Functions are listed alphabetically.

greatest

Returns the largest of up to four input values (variable-length argument lists are not supported).

Description

The syntax of the function has three forms, depending on the data type of the values being compared:

int4 = Greatest(int4 value1, int4 value2 [, int4 value3 [, int4 value4]]);

int8 = Greatest(int8 value1, int8 value2 [, int8 value3 [, int8 value4]]);

double = Greatest(double value1, double value2 [, double value3 [, double value4]]);

The value1 value specifies the first input to compare.

The value2 value specifies the second input to compare.

The value3 value specifies the third input to compare.

The value4 value specifies the fourth input to compare.

Returns

For example:

select greatest(12,45,85);

This example returns 85.


least

Returns the smallest of up to four input values (variable-length argument lists are not supported).

Description

The syntax of the function has three forms, depending on the data type of the values being compared:

int4 = Least(int4 value1, int4 value2 [, int4 value3 [, int4 value4]]);

int8 = Least(int8 value1, int8 value2 [, int8 value3 [, int8 value4]]);

double = Least(double value1, double value2 [, double value3 [, double value4]]);

The value1 value specifies the first input to compare.

The value2 value specifies the second input to compare.

The value3 value specifies the third input to compare.

The value4 value specifies the fourth input to compare.

Returns

For example:

select least(14,45,75);

This example returns 14.

mt_random

Returns a pseudo-random number between 0.0 and 1.0 using the Mersenne Twister pseudo-random number generator, an open source library that quickly generates high-quality pseudo-random numbers with a period of 2^19937 - 1 and a very good distribution.

The pseudo-random numbers are excellent for simulations, such as Monte Carlo simulations, as well as for sampling, for example selecting a random sample of 1,000 records from a table of one million records.

On its own, this algorithm is not suitable for cryptography, because observing as few as 624 consecutive outputs is enough to predict all future outputs. Wrapping this function with a hash function is likely sufficient to provide cryptographically secure random numbers.

Note: NPS offers a built-in random() function, which is based on the Linear Congruential Generator algorithm. The Mersenne Twister algorithm is often favored for applications that require high-quality randomness.

Description

The mt_random function has the following syntax:

mt_random = mt_random();

Returns

The function returns a pseudo-random number between 0.0 and 1.0. The following example pulls a well-distributed random sample of 10 records from Customer_Table:

SELECT * FROM Customer_Table ORDER BY mt_random() LIMIT 10;
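The ORDER BY mt_random() LIMIT 10 idiom can be mirrored in Python, because CPython's `random.Random` class is itself an MT19937 Mersenne Twister. `random_sample` and the `customers` list are hypothetical stand-ins for the SQL query and Customer_Table. In plain Python, `random.sample(rows, k)` gives the same kind of result directly. The sort-by-random-key form is shown only because it matches the SQL pattern.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def random_sample(rows: Sequence[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Mimic ORDER BY mt_random() LIMIT k: sort by a random key, keep k rows."""
    rng = random.Random(seed)  # CPython's Random is a Mersenne Twister (MT19937)
    # sorted() calls the key function once per row, so each row gets one random key.
    return sorted(rows, key=lambda _row: rng.random())[:k]


customers = [f"customer-{i}" for i in range(1000)]
print(random_sample(customers, 10, seed=42))
```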


corr

This aggregate function returns the correlation coefficient of the set of number pairs inputa and inputb.

Description

The corr function has the following syntax:

double = corr(Set(double) inputa, Set(double) inputb);

The inputa value specifies the first number of each pair.

The inputb value specifies the second number of each pair.

Returns

For example, assuming a table function_t with the values 1.2, 1.4, and 1.6 in col1 and the values 1.4, 1.6, and 1.8 in col2:

select corr(col1, col2) from function_t;

This example returns 1.
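The correlation coefficient can be computed by hand in Python to check the example. The sketch assumes corr is the Pearson product-moment coefficient, which the example's result of 1 for perfectly linear data is consistent with. `corr` is a hypothetical helper, and SQL NULL handling is not modeled.

```python
import math
from typing import Sequence


def corr(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of the pairs (xs[i], ys[i])."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(corr([1.2, 1.4, 1.6], [1.4, 1.6, 1.8]))  # approximately 1.0
```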

covar_pop

This aggregate function returns the population-based covariance of the set of number pairs inputa and inputb.

Description

The covar_pop function has the following syntax:

double = covar_pop(Set(double) inputa, Set(double) inputb);

The inputa value specifies the first number of each pair.

The inputb value specifies the second number of each pair.

Returns

For example, assuming a table function_t with the values 1.2, 1.4, and 1.6 in col1 and the values 1.4, 1.6, and 1.8 in col2:

select covar_pop(col1, col2) from function_t;

This example returns:

0.026666666666667

covar_samp

This aggregate function returns the sample-based covariance of the set of number pairs inputa and inputb.

Description

The covar_samp function has the following syntax:

double = covar_samp(Set(double) inputa, Set(double) inputb);

The inputa value specifies the first number of each pair.

The inputb value specifies the second number of each pair.


Returns

For example, assuming a table function_t with the values 1.2, 1.4, and 1.6 in col1 and the values 1.4, 1.6, and 1.8 in col2:

select covar_samp(col1, col2) from function_t;

This example returns:

0.040000000000001
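The two covariance variants differ only in their divisor: n for the population version and n - 1 for the sample version. The Python sketch below reproduces both example results. `covar_pop` and `covar_samp` are hypothetical helpers that do not model SQL NULL handling. The trailing ...001 in the manual's 0.040000000000001 most likely reflects floating-point rounding rather than a different formula.

```python
from typing import Sequence


def _sum_of_products(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sum of (x - mean_x) * (y - mean_y) over all pairs."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def covar_pop(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Population covariance: divide by n."""
    return _sum_of_products(xs, ys) / len(xs)


def covar_samp(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance: divide by n - 1."""
    return _sum_of_products(xs, ys) / (len(xs) - 1)


col1, col2 = [1.2, 1.4, 1.6], [1.4, 1.6, 1.8]
print(covar_pop(col1, col2), covar_samp(col1, col2))  # about 0.0266667 and 0.04
```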

Index

Aaccented characters 6-1add_element 8-1Adler algorithm 4-3Advanced Encryption Standard 3-2AES 3-2Algorithms

AES 3-2algorithms

Adler 4-3CRC32 4-3Damerau-Levenshtein 6-1DEFLATE 3-1Double-Metaphone 6-1, 6-3, 6-4, 6-5Jenkins 4-3MD5 4-2Mersenne Twister 10-2SHA 4-2Soundex 6-1, 6-3, 6-4, 6-5

array data type 8-1, 8-2array function 8-2array_combine 8-3array_concat 8-3array_count 8-3array_split 8-4array_type 8-4ASCII 6-1, 6-3ASCII to hexadecimal conversions 7-1

Bbackups, for SQL Extensions toolkit 1-6

C
characters, accented 6-1
checksum hash function 4-3
checksums 4-1
collection data type 9-1
compress function 3-1
conversion
  ASCII to hexadecimal 7-1
  hexadecimal to ASCII 7-1
corr function 10-3
correlation coefficient 10-3
covar_pop 10-3
covar_samp 10-3
covariance
  population-based 10-3
  sample-based 10-3
CRC32 algorithm 4-3
cryptographic hash function 4-2
cryptography 4-1

D
Damerau-Levenshtein algorithm 6-1
data transformation functions 3-1
data type
  array 8-1, 8-2
  array elements 8-2
  collection 9-1
  converting in XMLExtractValue 2-12
  date 5-1
  implicit conversion of date and time 5-1
  SQL 2-2
  time 5-1
  timestamp 5-1
  type checking 6-1
  user defined types 2-2
database, registering SQL Extension functions in 1-2
date data type 5-1
day function 5-1
days_between 5-2
dbl_mp 6-4
decompress function 3-2
decrypt function 3-2
DEFLATE compression algorithm 3-1
delete_element 8-5
detecting duplicated records 4-1
dle_dst 6-2
double function 10-2
Double-Metaphone algorithm 6-1, 6-3, 6-4, 6-5
duplicate records, detecting 4-1

E
element_name 8-5
element_type 9-2
encrypt function 3-2
encryption
  private key 3-2
  secret key 3-2
  symmetric 3-2
examples
  XMLAgg 2-4
  XMLConcat 2-4
  XMLElement 2-2, 2-3, 2-4
  XMLSerialize 2-2
expressions, XPath 2-7

F
fuzzy comparisons 6-1

G
get_value_date 8-5
get_value_double 8-5
get_value_int 8-5
get_value_nvarchar 8-5
get_value_time 8-5
get_value_timestamp 8-5
get_value_timetz 8-5
get_value_varchar 8-5
greatest function 10-1

H
hash function 4-2
  cryptographic 4-2
hash functions
  checksum 4-3
  lookup 4-3
  lookups 4-3
hash table 4-1
hash4 4-3
hash8 4-3
hexadecimal to ASCII conversions 7-1
hextoraw 7-1
hour function 5-2
hours_between 5-2

I
installation instructions 1-2
ISO/IEC 9075-14 2-1
IsValidXML 2-5, 2-8
IsXML 2-8

J
JDBC conversations 3-3
Jenkins algorithm 4-3

K
key search attacks 3-2

L
le_dst 6-2
least function 10-2
Levenshtein algorithm
  algorithms, Levenshtein 6-2, 6-3
lexical comparisons 6-1
libnetsqlextensions.tar.gz file, untarring 1-2
license information 1-1
locating spatial points 4-1
lookup hash function 4-3
lookups 4-1

M
MD5 algorithm 4-2
Mersenne Twister algorithm 10-2
messages, verifying integrity 4-1
minute function 5-3
minutes_between 5-3
month function 5-3

N
Netezza SQL Extensions Toolkit
  backups and restores 1-6
  disabling in a database 1-4
  displaying version 1-4
  installing 1-2
  obtaining 1-1
  registering functions in a database 1-2
  removing 1-5
  upgrading 1-4
next_month 5-4
next_quarter 5-4
next_year 5-4
nysiis 6-4
NzAdmin screenshot with functions 1-3

O
ODBC conversations 3-3

P
passwords, verifying 4-1
pattern matching 6-1
Perl 5 regular expressions 6-6
pg.log file 3-3
phonetic comparisons 6-1, 6-3
population-based covariance 10-3
Porter algorithm
  algorithms, Porter 6-3
private key encryption 3-2
pseudo-random number 10-2
publishing XML data 2-2

R
random function 10-2
random number generator 10-2
range checks 6-1
rawtohex 7-1
regexp_extract 6-7
regexp_extract_all 6-7
regexp_extract_all_sp 6-8
regexp_extract_sp 6-8
regexp_instr 6-9
regexp_like 6-10
regexp_match_count 6-10
regexp_replace 6-11
regexp_replace_sp 6-11
regular expressions 6-1
  flags argument 6-6
  overview 6-6
removal instructions 1-5
replace function 7-2
replace_element 8-6
restores, for SQL Extensions toolkit 1-6

S
sample-based covariance 10-3
second function 5-5
seconds_between 5-5
secret key encryption 3-2
SHA algorithm 4-2
Soundex algorithm 6-1, 6-3, 6-4, 6-5
spatial points, locating 4-1
SQL 2003 2-1, 2-2
SQL Extension functions, registering 1-2
SQL Functions toolkit
  disabling 1-4
strleft 7-2
strright 7-3
symmetric encryption 3-2
system prerequisites 1-1

T
text
  fuzzy comparisons 6-1
  lexical comparisons 6-1
  phonetic comparisons 6-1, 6-3
  regular expressions 6-1
this_month 5-5
this_quarter 5-6
this_week 5-6
this_year 5-6
time data type 5-1
timestamp data type 5-1
transliterating accented characters 6-1
type checks 6-1

U
UDFs 2-1, 2-2
uninstall instructions 1-5
user accounts, permissions 1-4
user defined types 2-2
uudecode 3-4
uuencode 3-3

V
verifying message integrity 4-1
verifying passwords 4-1
version, displaying for SQL Extensions toolkit 1-4

W
weeks_between 5-7
word_diff 6-1
word_find 6-2
word_key 6-3
word_key_tochar 6-4
word_keys_diff 6-5
word_stem 6-6

X
XML data type 2-2, 2-13, 2-14
XML data, publishing 2-2
XML examples
  XMLAgg 2-4
  XMLConcat 2-4
  XMLElement 2-2, 2-3, 2-4
  XMLSerialize 2-2
XML functions, nesting 2-3
XML standalone property 2-14
XML version property 2-14
XMLAGG 2-9
XMLAgg 2-1, 2-2
XMLAttributes 2-1, 2-2, 2-10
XMLConcat 2-1, 2-2, 2-10
XMLElement 2-1, 2-2, 2-11
XMLExistsNode 2-1, 2-11
XMLExtract 2-1, 2-12
XMLExtractValue 2-1, 2-12
XMLParse 2-13
XMLRoot 2-1, 2-14
XMLSerialize 2-14
XMLUpdate 2-1, 2-15
XPath expressions 2-7

Y
year function 5-7

Z
zlib library 3-1
zone maps 4-2
