openSAP
TEXT ANALYTICS WITH SAP HANA PLATFORM – WEEK 1
Version: January 20, 2016
Exercises / Solutions
Anthony Waite / SAP Labs, LLC.
Bill Miller / SAP Labs, LLC.


Contents

Desktop in SAP Cloud Appliance Library
    Preconfigured front-end client
Front-End Client Configuration
    Getting Help
    Adding a HANA Studio Perspective
    Create a connection to the HANA server
    Create a Repository Workspace
    Create a HANA Project
    Important HANA Studio customizations
    (Mandatory) Set permissions for course exercises
Exercise 1 – Solution
    In SAP HANA Studio
Exercise 2 – Solution
    In SAP HANA Studio
Exercise 3 – Solution
    In SAP HANA Studio
Exercise 4 – Solution
    In SAP HANA Studio

BEFORE YOU START

System Host: HANA IP address

System Instance Number: 00

System User ID: SYSTEM

Password: Master Password you entered for the solution when creating the instance in the SAP Cloud Appliance Library

DESKTOP IN SAP CLOUD APPLIANCE LIBRARY

Preconfigured front-end client

1) Click on the “SAP Dev Tools for Eclipse” icon on your desktop in the SAP Cloud Appliance Library (CAL) to open the front-end client.

Note: If you have SAP HANA Studio installed on your computer, you can use it instead of the preconfigured front-end client in CAL. If so, skip this section and go to “Adding a HANA Studio Perspective”.

2) Click on the “Systems” view. Right-click on “HDB (SYSTEM)”.

3) Select “Log On” from the pop-up menu.

4) Log on as:
User Name: SYSTEM
Password: Master Password you specified when creating the instance in SAP CAL

Click the “OK” button to close the dialog box.

5) You are now logged on to your SAP HANA database.

FRONT-END CLIENT CONFIGURATION

Getting Help

If you need additional help resources beyond this document: http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf

Adding a HANA Studio Perspective

1) To support the developer-centric workflow, an additional Eclipse perspective has been added to SAP HANA Studio. It may not be displayed by default. In the upper right corner of your front-end client, click the “Open Perspective” button.

2) Add the “SAP HANA Development” perspective. This is the perspective you should be using for the entire workshop.

3) After adding the “SAP HANA Development” perspective, you may also still see other perspectives such as “SAP HANA Administration Console”. If that is the case, you may want to right-click and choose “Close” as we will not be using this or any other perspectives.

Create a connection to the HANA server

1) Make sure you are in the “SAP HANA Development” perspective by clicking on the button.

Note: This section applies only if you are using your own SAP HANA Studio installation on your computer. If you are using the SAP Dev Tools for Eclipse in CAL, skip this section and go to “Create a Repository Workspace”.

2) Click on the “Systems” view. Right-click on the white space below this tab and choose “Add System…”.

3) Enter the server Host Name: HANA IP address
Enter the Instance Number: 00
Optionally, you can enter a meaningful description of your choice.
Click the “Next” button.

4) Specify the user ID and password.
User Name: SYSTEM
Password: Master Password specified when creating the instance in SAP CAL

Select the “Store user name and password in secure storage” option. Click the “Finish” button.

5) You should now have a new connection with your specific user ID for the HANA system. Please make sure to use this connection for the rest of the course exercises.

Note: The System ID and users shown in these screenshots might be different from the ones you are working with and may be blurred to avoid any confusion.

Create a Repository Workspace

1) Switch to the “Repositories” view. Find your system entry and the default workspace. Right-click and choose “Create Repository Workspace”.

Note: In this and the following screenshot, your system naming conventions and workspace paths may be different.

2) Confirm the file system location on your local machine which will hold the local copy of this workspace. Click the “Finish” button.

3) You should now see the local workspace mapped to the remote workspace in the “Repositories” view.

Note: Your System ID, User ID, Host Name, and System Numbers will be different than those displayed in subsequent screenshots. For this reason we may have blurred this information in screenshots to avoid confusion.

Create a HANA Project

1) Under the “Repositories” tab, navigate to “(Default) / student00”. Right-click “student00” and choose “Check Out and Import Projects …” from the pop-up menu.

2) Click the “Next” button.

Note: It may take several minutes to download the package contents to your local machine.

3) Click the “Finish” button to complete the process.

4) Switch to the “Project Explorer” view. You should see the “student00” project that you imported in the previous step.

Important HANA Studio customizations

1) Open “Preferences” from the “Window” menu.

2) In the “Preferences” dialog box, under “SAP HANA / Runtime / Result”, select the “Enable zoom of LOB columns” option.

Click the “Apply” button.

3) Continuing in the “Preferences” dialog box, navigate to “General / Appearance / Content Types”.

Under the “Content Types” tree, navigate to “Text / XML”. Click the “Add…” button and enter each of the following:
*.hdbtextconfig
*.hdbtextdict
You will see these two new entries under “File associations”.

4) Continuing in the “Preferences” dialog box, navigate to “General / Editors / File Associations”.

Under “File types”, highlight “*.sql”. Under “Associated editors”, highlight “SAP HANA SQL Console”, then click the “Default” button.

Note: This particular setting may apply to you if you are using the SAP Dev Tools for Eclipse in CAL.

Click the “OK” button to close the “Preferences” dialog box.

(Mandatory) Set permissions for course exercises

1) Under the “Repositories” view, navigate to “(Default) / student00 / data”. Double-click on “prepare.sql”.

2) If “No connection to database” is displayed in the SQL console, click on the “Choose Connection” icon, which is found to the right of the green circle with an arrow (Execute) icon.

3) In the “Choose Connection” dialog, select the appropriate database.

Click the “OK” button.

4) In the SQL console, you will notice the following SQL syntax from the “prepare.sql” file:

call "_SYS_REPO"."GRANT_ACTIVATED_ROLE"('student00.security::TA_DEVELOPER','SYSTEM');

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

5) Select the “Don’t show this message again during this editor session” option and click the “Yes” button.

Note: Disregard this warning message throughout the course exercises.

6) Afterwards you should see a similar confirmation in the SQL console:

Statement 'call "_SYS_REPO"."GRANT_ACTIVATED_ROLE"('student00.security::TA_DEVELOPER','SYSTEM')' successfully executed in 17 ms 256 µs (server processing time: 15 ms 968 µs) - Rows Affected: 0
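
Note: To double-check the grant, a query along the following lines can be run in the same SQL console. This is a minimal sketch and not part of the course files; it assumes that the SYS.GRANTED_ROLES system view is readable by your user, and it reuses the role and user names from “prepare.sql”.

```sql
-- Sketch (not part of prepare.sql): list the TA_DEVELOPER role grant for SYSTEM.
-- Assumes read access to the SYS.GRANTED_ROLES system view.
SELECT GRANTEE, ROLE_NAME, GRANTOR
  FROM SYS.GRANTED_ROLES
 WHERE GRANTEE = 'SYSTEM'
   AND ROLE_NAME = 'student00.security::TA_DEVELOPER';
```

A returned row indicates that the role is assigned to the SYSTEM user.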

Please proceed to the exercises.

EXERCISE 1 – CREATE FULLTEXT INDEX WITHOUT SPECIFYING LANGUAGES

Objective

In this exercise, you will execute a SQL statement to create a fulltext index for your copy of the workshop data. Intentionally, we will omit the LANGUAGE DETECTION parameter.

Exercise Description

Create fulltext index without specifying possible languages

Search for restaurant terms in the content

Observe that English is assumed for all inputs

EXERCISE 1 – SOLUTION

In SAP HANA Studio

1) Under the “Repositories” tab, navigate to “(Default) / student00 / solutions / week-1”. Double-click on “exercises.sql”.

2) If “No connection to database” is displayed in the SQL console, click on the “Choose Connection” icon, which is found to the right of the green circle with an arrow (Execute) icon.

3) In the “Choose Connection” dialog, select the appropriate database.

Click the “OK” button.

4) In the SQL console, highlight the following SQL syntax:

SET SCHEMA OPENSAP_TA_WORKSHOP;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

Note: If you close this session at any point while working on Week 1 exercises, you will need to re-execute this command at the start.

5) In the SQL console, highlight the following SQL syntax:

CREATE FULLTEXT INDEX PRODUCT_REVIEWS_IDX ON "student00.data::PRODUCT_REVIEWS"(CONTENT) FAST PREPROCESS OFF;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

6) In the SQL console, highlight the following SQL syntax:

SELECT FILE_NAME, LANGUAGE(CONTENT), SNIPPETS(CONTENT)
FROM "student00.data::PRODUCT_REVIEWS"
WHERE CONTAINS(CONTENT, 'meal OR menu OR entree OR appetizer OR food', LINGUISTIC);

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

7) As you can see, these are the results of the search for some restaurant terms in the fulltext content. Notice in the query results that English is assumed for all inputs.

en = English
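
Note: If the search returns nothing or fails, it can help to confirm the session schema and that the index exists. The statements below are a sketch outside the course script; they assume that the SYS.FULLTEXT_INDEXES system view is available on your system, and they use the schema name from step 4.

```sql
-- Sketch: confirm the active schema for this SQL console session.
SELECT CURRENT_SCHEMA FROM DUMMY;

-- Sketch: show the stored settings of the fulltext index created in step 5.
-- SELECT * is used because the available columns can vary by HANA revision.
SELECT *
  FROM SYS.FULLTEXT_INDEXES
 WHERE SCHEMA_NAME = 'OPENSAP_TA_WORKSHOP'
   AND INDEX_NAME = 'PRODUCT_REVIEWS_IDX';
```

The first statement should report OPENSAP_TA_WORKSHOP if SET SCHEMA was executed in this session.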

EXERCISE 2 – RECREATE FULLTEXT INDEX SPECIFYING LANGUAGES

Objective

In this exercise, you will execute a SQL statement to create a fulltext index for your copy of the workshop data. We will specify the LANGUAGE DETECTION parameter.

Exercise Description

Drop fulltext index from previous exercise

Create fulltext index specifying possible languages

Search for restaurant terms in the content

Language will be automatically determined for all inputs

EXERCISE 2 – SOLUTION

In SAP HANA Studio

1) In the SQL console, highlight the following SQL syntax:

DROP FULLTEXT INDEX PRODUCT_REVIEWS_IDX;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

2) In the SQL console, highlight the following SQL syntax:

CREATE FULLTEXT INDEX PRODUCT_REVIEWS_IDX ON "student00.data::PRODUCT_REVIEWS"(CONTENT)
FAST PREPROCESS OFF
LANGUAGE DETECTION ('EN', 'DE', 'FR');

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

3) In the SQL console, highlight the following SQL syntax:

SELECT FILE_NAME, LANGUAGE(CONTENT), SNIPPETS(CONTENT)
FROM "student00.data::PRODUCT_REVIEWS"
WHERE CONTAINS(CONTENT, 'meal OR menu OR entree OR appetizer OR food', LINGUISTIC);

This repeats the search from Exercise 1. Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

4) As you can see, these are the results of the search for some restaurant terms in the fulltext content. Notice in the query results that the correct language is detected for all inputs.

en = English
de = German
fr = French
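
Note: The same search can also be sorted by relevance. The following variation is a sketch that is not part of “exercises.sql”; it assumes that the SCORE() and HIGHLIGHTED() fulltext functions are available for this CONTAINS search on your system, and the column aliases are illustrative.

```sql
-- Sketch: repeat the restaurant-term search, best matches first.
-- DETECTED_LANGUAGE, SEARCH_SCORE, and MATCHED_TEXT are illustrative aliases.
SELECT FILE_NAME,
       LANGUAGE(CONTENT)    AS DETECTED_LANGUAGE,
       SCORE()              AS SEARCH_SCORE,
       HIGHLIGHTED(CONTENT) AS MATCHED_TEXT
  FROM "student00.data::PRODUCT_REVIEWS"
 WHERE CONTAINS(CONTENT, 'meal OR menu OR entree OR appetizer OR food', LINGUISTIC)
 ORDER BY SEARCH_SCORE DESC;
```

Sorting by score places the documents that best match the search terms at the top of the result list.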

EXERCISE 3 – RECREATE FULLTEXT INDEX SPECIFYING LANGUAGES WITH TEXT ANALYSIS ON

Objective

In this exercise, you will execute a SQL statement to create a fulltext index for your copy of the workshop data. We will specify the LANGUAGE DETECTION and TEXT ANALYSIS ON parameters.

Exercise Description

Drop fulltext index from previous exercise

Create fulltext index specifying languages and text analysis output

Notice new table with text analysis annotations for each input

EXERCISE 3 – SOLUTION

In SAP HANA Studio

1) In the SQL console, highlight the following SQL syntax:

DROP FULLTEXT INDEX PRODUCT_REVIEWS_IDX;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

2) In the SQL console, highlight the following SQL syntax:

CREATE FULLTEXT INDEX PRODUCT_REVIEWS_IDX ON "student00.data::PRODUCT_REVIEWS"(CONTENT)
FAST PREPROCESS OFF
LANGUAGE DETECTION ('EN', 'DE', 'FR')
TEXT ANALYSIS ON;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

3) In the SQL console, highlight the following SQL syntax:

SELECT *
FROM "$TA_PRODUCT_REVIEWS_IDX"
WHERE TA_LANGUAGE = 'en'
ORDER BY ID, TA_COUNTER;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

4) In the query results, notice the text analysis annotations for each input, in order of their appearance in the original text.
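
Note: For a quick overview of what text analysis produced, the annotations can be counted per language and type. The query below is a sketch that is not part of “exercises.sql”; it assumes that the generated “$TA_PRODUCT_REVIEWS_IDX” table contains a TA_TYPE column alongside the TA_LANGUAGE column used in step 3, and the alias is illustrative.

```sql
-- Sketch: count text analysis annotations per language and annotation type.
-- ANNOTATION_COUNT is an illustrative alias.
SELECT TA_LANGUAGE, TA_TYPE, COUNT(*) AS ANNOTATION_COUNT
  FROM "$TA_PRODUCT_REVIEWS_IDX"
 GROUP BY TA_LANGUAGE, TA_TYPE
 ORDER BY TA_LANGUAGE, ANNOTATION_COUNT DESC;
```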

EXERCISE 4 – MONITOR TEXT ANALYSIS PROCESSING

Objective

In this exercise, you will learn how to monitor the progress and status of your text data processing.

Exercise Description

Run a command to monitor the progress for text processing

Run a command to report any errors for text processing

EXERCISE 4 – SOLUTION

In SAP HANA Studio

1) In the SQL console, highlight the following SQL syntax:

SELECT *
FROM SYS.M_FULLTEXT_QUEUES
WHERE SCHEMA_NAME = 'OPENSAP_TA_WORKSHOP'
AND TABLE_NAME = 'student00.data::PRODUCT_REVIEWS';

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

2) Notice the queue status is shown at the table level.

3) In the SQL console, highlight the following SQL syntax:

SELECT ID, INDEXING_STATUS(CONTENT), INDEXING_ERROR_CODE(CONTENT), INDEXING_ERROR_MESSAGE(CONTENT)
FROM "student00.data::PRODUCT_REVIEWS"
ORDER BY ID;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

4) Notice this shows the errors for each “job” (document a.k.a. row of the input table).

Note: This command will be useful later when we start modifying the configuration settings for text analysis.
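
Note: As a rough additional check, the number of source documents can be compared with the number of documents that have text analysis output. The statements below are a sketch that is not part of “exercises.sql”; they assume that the index from Exercise 3 (created with TEXT ANALYSIS ON) still exists, and the aliases are illustrative.

```sql
-- Sketch: number of documents in the source table.
SELECT COUNT(*) AS TOTAL_DOCUMENTS
  FROM "student00.data::PRODUCT_REVIEWS";

-- Sketch: number of documents with at least one text analysis annotation.
SELECT COUNT(DISTINCT ID) AS ANNOTATED_DOCUMENTS
  FROM "$TA_PRODUCT_REVIEWS_IDX";
```

A difference between the two counts does not necessarily indicate an error, because a document that yields no annotations does not appear in the annotation table. Use the per-document status query from step 3 to investigate.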

www.sap.com

© 2015 SAP SE or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. 
All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.