FileTable and Semantic Search in SQL Server 2012
SQL Saturday 109 presentation on FileTable and Semantic Search in SQL Server 2012
© 2012 Microsoft
FILETABLE AND SEMANTIC SEARCH IN SQL SERVER 2012
Michael Rys
Principal Program Manager
Microsoft Corp
@SQLServerMike
MY FAVORITE BEYOND RELATIONAL APPLICATION
Structured and unstructured Search
Related/“Semantic” Search
BEYOND RELATIONAL DATA
Building and maintaining applications with relational and non-relational data is hard
Pain Points
• Complex integration
• Duplicated functionality
• Compensation for unavailable services
Goals
• Reduce the cost of managing all data
• Simplify the development of applications over all data
• Provide management and programming services for all data
RICH UNSTRUCTURED DATA IN SQL SERVER 2012
• 80% of all data is not stored in databases! Most of it is “unstructured”
• Make SQL Server the preferred choice for managing Unstructured Data and allow building Rich Application Experience on top
• Address important customer requests for Capabilities and rich services for Rich Unstructured Data (RUDS)
o Scale-up for storage and search to 100 to 500 million documents
o Easy use of and access to unstructured data from all applications
o Rich insight into unstructured data to make better decisions
DEMO
Teaser: MySemanticSearch
http://mysemanticsearch.codeplex.com
RICH UNSTRUCTURED DATA & SERVICES ECOSYSTEM
[Diagram: Rich Services (Full-Text Search, Semantic Similarity Search) sit on top of the database, which scales up across multiple containers (Disk1, Disk2, Disk3). Three access paths lead into it: database and SQL applications with transactional access to BLOBs and DB FILESTREAMs; Windows apps using SMB share files/folders and streaming Win32 access through the FILESTREAM API (FileStreams, FileTable); and customer applications using the SQL RBS API for Remote BLOB Storage, with Azure, Centera, and SQL FILESTREAM libraries targeting Azure, Centera, and SQL DB stores. Integrated backup/replication/AlwaysOn and integrated administration span the storage layer.]
DEMO
Integrated Management of documents in SQL Server 2012
FILETABLE OVERVIEW
FileTable: A Table of Files/Directories
User created Table with a fixed schema
contains FILESTREAM and File Attributes
Each row represents a File or a Directory
System defined constraints maintain the tree integrity
File/Directory hierarchy view through a Windows Share
Supports Win32 APIs for File/Directory Management
DB Storage is Transparent to Win32 applications
SMB level of application compatibility
Virtual network name (VNN) path support for transparent Win32 application failover
[Diagram: FileTable folder hierarchy. The FILESTREAM share (MSSQLSERVER) contains database directories (Private Docs for Database1, Office Docs for Database2); these contain FileTable directories such as LogFiles, Documents, and Media, each holding a user-defined directory structure. Example path: \\my_machine\MSSQLSERVER\Office Docs\Documents]
CREATING A FILETABLE
Prerequisites
• Enable FILESTREAM
• Create FILESTREAM share and filegroup
• Enable non-transactional access at the DB level
ALTER DATABASE Contoso SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'Contoso')
Create FileTable
CREATE TABLE Contoso..Documents AS FILETABLE
WITH (FILETABLE_DIRECTORY = N'Document Library')
Access at \\<machine name>\<FILESTREAM share>\Contoso\Document Library\
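The prerequisites and the two statements on this slide can be strung together into one script. The following is a minimal sketch, not the presenter's demo script: the file paths under C:\Data, the filegroup name ContosoFS, and the logical file names are hypothetical, and it assumes FILESTREAM has already been enabled for the service in SQL Server Configuration Manager.

```sql
-- Instance level: allow T-SQL and Win32 streaming access to FILESTREAM data.
-- (Assumes FILESTREAM is already enabled for the service itself.)
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO

-- Database with a FILESTREAM filegroup and non-transactional access.
-- Paths, filegroup name, and logical file names are illustrative only.
CREATE DATABASE Contoso
ON PRIMARY
    (NAME = Contoso_data, FILENAME = 'C:\Data\Contoso.mdf'),
FILEGROUP ContosoFS CONTAINS FILESTREAM
    (NAME = Contoso_fs, FILENAME = 'C:\Data\ContosoFS')
LOG ON
    (NAME = Contoso_log, FILENAME = 'C:\Data\Contoso.ldf')
WITH FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'Contoso');
GO

USE Contoso;
GO

-- The FileTable has a fixed schema, so no column list is given.
CREATE TABLE dbo.Documents AS FILETABLE
WITH (FILETABLE_DIRECTORY = N'Document Library');
GO

-- Returns the UNC path under which the FileTable is exposed.
SELECT FileTableRootPath(N'dbo.Documents') AS filetable_root;
```

The final SELECT is a convenient way to confirm the share path before pointing Windows applications at it.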
MODIFYING A FILETABLE
FileTable has a fixed schema
• Columns and system-defined constraints cannot be altered/dropped
• Allows user-defined indexes/constraints/triggers
Disabling/Enabling FileTable Namespace
ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE
• Disables all system-defined constraints and Win32 access to FileTable
• Useful for bulk-loading/re-organization of data
FileTable can be dropped similar to any other table
Catalog views can be used for obtaining metadata
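A bulk load typically brackets the work between a disable and an enable of the namespace. This is a sketch under assumptions: the table name dbo.Documents is carried over from the previous slide, and the load step itself is left as a placeholder comment.

```sql
-- Turn off the system-defined constraints and Win32 access before a bulk load.
ALTER TABLE dbo.Documents DISABLE FILETABLE_NAMESPACE;
GO

-- ... bulk load or re-organize rows here ...

-- Re-enabling validates the namespace constraints again.
ALTER TABLE dbo.Documents ENABLE FILETABLE_NAMESPACE;
GO

-- Catalog view: one row per FileTable in the current database.
SELECT OBJECT_NAME(object_id) AS filetable_name,
       is_enabled,
       directory_name
FROM sys.filetables;
```

If the loaded rows violate the hierarchy rules, the ENABLE statement is where the failure surfaces, so it is worth running it in the same maintenance window as the load.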
DATA ACCESS – FILE SYSTEM ACCESS
FileTable hierarchy is visible through the FILESTREAM share
\\machine\<FILESTREAMshare>\<Database_directory>\<FileTable_Directory>\...
Provides transparent Win32 API & File/Directory Management capabilities
e.g. MS Word can create/open/save files; xcopy can copy directory trees into the database
Win32 API operations are non-transactional
• Operations cannot be part of any user transactions
Win32 operations are intercepted by SQL Server at the File system level
e.g. File/Directory creation/deletion => insert/delete into FileTable
Full locking/concurrency semantics with other accesses
Allows in-place update of file stream data/File attributes
Transactional FILESTREAM APIs can also be used.
DATA ACCESS – T-SQL ACCESS
Normal Insert/Update/Delete allowed for FileTable manipulation
• FileTable namespace integrity constraints enforced
Set-based operations on the file attributes – value add
Built-in functions
• GetFileNamespacePath() – UNC path for a file/directory
• FileTableRootPath() – UNC path to the FileTable root
• GetPathLocator() – path_locator value for a file/directory
DDL/DML triggers are supported
• DML triggers on a FileTable cannot update any FileTables
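The built-in functions and set-based access listed on this slide might be used as follows. This is a sketch assuming a FileTable named dbo.Documents; the file name, file content, and the docx filter are made up for illustration.

```sql
-- Create a file in the FileTable root with plain T-SQL.
-- The row also becomes visible as a file through the Windows share.
INSERT INTO dbo.Documents (name, file_stream)
VALUES (N'readme.txt', CAST('Hello FileTable' AS varbinary(max)));

-- Set-based query over file attributes, with the full UNC path per file.
SELECT name,
       cached_file_size,
       last_write_time,
       file_stream.GetFileNamespacePath(1) AS unc_path
FROM dbo.Documents
WHERE is_directory = 0
  AND file_type = N'docx'
ORDER BY last_write_time DESC;

-- Map a UNC path back to the path_locator (hierarchyid) of its row.
SELECT GetPathLocator(FileTableRootPath(N'dbo.Documents') + N'\readme.txt')
       AS path_locator;
```

The second query is the kind of reporting that is awkward against a plain file system, which is the "value add" the slide refers to.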
MANAGING FILETABLE
DB Backup/Restore operations include FileTable data
• A point-in-time restore may contain more recent FILESTREAM data due to non-transactional updates during backup
FileTables are secured similar to any other user tables
• The same security is also enforced for Win32 access
Data Loading
• Windows tools like xcopy/robocopy or drag-and-drop operations through Windows Explorer can be used
• BCP operations are supported for direct T-SQL data inserts
SSMS supports FileTable creation/exploration
MANAGING FILETABLE – HIGH AVAILABILITY
SQL Server 2012 AlwaysOn is fully supported
Transparent data failover
• FileTables can be configured with multiple secondary nodes
• Both sync and async data replication are supported
• File data and metadata are available on the secondary in case of failover
Transparent application failover
• Virtual network name (VNN) path support for transparent Win32 application failover
• Applications use the \\VNN\Share\db\... path
• Applications are automatically redirected to the secondary in case of failover
Restrictions
• FileTables cannot participate in “Read-only” replicas.
FILETABLE RESTRICTIONS
FileTables cannot be partitioned
Merge/Transactional replication is not supported
RCSI/Snapshot isolation mode
• Applications cannot modify file stream data in FileTables
Win32 application compatibility
• Memory-mapped files, directory notifications, and links are not supported
UNSTRUCTURED DATA SCALE-UP: MULTIPLE CONTAINERS FOR FILESTREAM DATA
SQL Server 2008 R2
• Only one storage container per FILESTREAM filegroup
• Limits storage capacity scaling and I/O scaling
SQL Server 2012
• Support for multiple storage containers per filegroup
• DDL changes to the CREATE/ALTER DATABASE statements
• Ability to set a maximum size (MAXSIZE) for the containers
• DBCC SHRINKFILE EMPTYFILE support
Scaling Flexibility
• Storage scaling by adding additional storage drives
• I/O scaling with multiple spindles
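The DDL changes can be illustrated by adding a second container on another drive. This is a sketch under assumptions: the database Contoso, the filegroup ContosoFS, the logical names Contoso_fs and Contoso_fs2, the D:\ path, and the 100 GB cap are all hypothetical.

```sql
-- Add a second FILESTREAM container, on a different drive, to the same
-- filegroup and cap its size (names, path, and size are illustrative).
ALTER DATABASE Contoso
ADD FILE
    (NAME = Contoso_fs2, FILENAME = 'D:\Data\ContosoFS2', MAXSIZE = 100GB)
TO FILEGROUP ContosoFS;
GO

USE Contoso;
GO

-- Move the data out of the original container into the remaining
-- containers of the filegroup, for example before retiring a drive.
DBCC SHRINKFILE (N'Contoso_fs', EMPTYFILE);
```

Placing each container on its own spindle is what yields the I/O scaling mentioned above; several containers on one disk only help with capacity management.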
UNSTRUCTURED DATA : MULTIPLE CONTAINERS
Use of multiple spindles for achieving better I/O Scalability
RUDS SCALE-UP: FILESTREAM PERF/SCALE
Improved performance of T-SQL and File I/O access
• Various enhancements to improve read/write throughput
• 5-fold increase in read throughput
• Linear scaling with large number of concurrent threads
SUMMARY: FILETABLE
Application Compatibility for Windows Applications
• Windows applications run on top of files stored in FileTables with no modifications
Relational Value Proposition
• Provide integrated administration and services
• Backup, Log Shipping, HA-DR, Full-Text and Semantic Search, …
T-SQL orthogonality
• File/Folder attributes surfaced through relational columns
• Power of set-based operations, Policy Management, Reporting, etc.
• FileNamespace hierarchy management
FULL TEXT SEARCH IMPROVEMENTS IN SQL SERVER 2012
Improved Performance and Scale:
• Scale-up to 350M documents
• iFTS query performance 7-10 times faster than in SQL Server 2008
• Worst-case iFTS query response times < 3 sec for corpus
• On par with or better than main database search competitors
New Functionality:
• Property Search
• Customizable NEAR
• New word breakers: updates to existing word breakers, plus Czech and Greek
Innovation in Search: Semantic Similarity Search
FULLTEXT SEARCH PERFORMANCE & SCALE IMPROVEMENTS
Architectural Improvements
• Improved internal implementation
• Queries no longer block index updates
Improved Query Plans: Better plans for common queries
• Full-text predicate folding
• Parallel plan execution
Index and query tested at scale up to 350 million documents with <~2 sec response
• ~3X better w/o DML and ~9X better with DML throughput
• Scales easily with increasing number of connections
SCALE-UP: FULL-TEXT SEARCH
Queries over a 350M-document database with random DML running in the background. SQL Server 2012 beats SQL Server 2005 by a scale factor of more than 2x, with on average 60x better throughput.
[Chart: SQL Server 2005/2008 vs. SQL Server 2012]
SCALE-UP: FULL-TEXT SEARCH
Average query execution time (ms) for varying numbers of connections (50 to 2000 users) in a customer playback benchmark
[Chart: SQL Server 2005/2008 vs. SQL Server 2012]
FULLTEXT PROPERTY SCOPED SEARCH
• Setup once per database instance to load the Office filters
exec sp_fulltext_service 'load_os_resources', 1
go
exec sp_fulltext_service 'restart_all_fdhosts'
go
• Create a property list
CREATE SEARCH PROPERTY LIST p1;
• Add properties to be extracted
ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH
(PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab91-08002b27b3d9', PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = N'System.Author');
• Create/Alter the full-text index to specify the property list to be extracted
ALTER FULLTEXT INDEX ON fttable... SET SEARCH PROPERTY LIST = [p1];
• Query for properties
SELECT * FROM fttable WHERE CONTAINS(PROPERTY(ftcol, 'System.Author'), 'fernlope');
New Search Filter for Document Properties
CONTAINS (PROPERTY ( { column_name }, 'property_name' ), 'contains_search_condition' )
FULL-TEXT CUSTOMIZABLE NEAR
OLD NEAR SYNTAX
select * from fttable where contains(*, 'test near Space')
NEW NEAR USAGES
• SPECIFY DISTANCE
select * from fttable where contains(*, 'near((test, Space), 5,false)')
• REDUCE DISTANCE
select * from fttable where contains(*, 'near((test, Space), 2,false)')
• ORDER OF WORDS IS SPECIFIED AS IMPORTANT
select * from fttable where contains(*, 'near((test, Space), 5,true)')
STATISTICAL SEMANTIC SEARCH
Semantic insight into textual content
• Uses language models to find the most important keywords in a document
• No need to build brittle ontologies!
Statistically Prominent Keywords
• Autogenerated tag clouds
Potentially Related Content based on extracted keywords, such as
• Similar Products (based on description)
• Similar Jobs or Applicants
• Similar Support Incidents (based on call logs)
• Potential Solutions (based on similar incidents)
First-class usage experience
• Efficient linear algorithms
• Integrated with FTS and SQL
• New rowset functions for all results using SQL query
DEMO
Semantic Extraction and Relationships
FullText Search in SQL Server 2012
SEMANTIC SIMILARITY
• Input: Text such as varchar, Office, PDF, HTML, email…
• Output: Rowset functions with standard SQL queries
Illustrating example:
Source Table
| Key | Title | Document |
| --- | --- | --- |
| D1 | Annual Budget | … |
| D2 | Corporate Earnings | … |
| D3 | Marketing Reports | … |
| … | … | … |
Keyword Index (Full-Text)
| ID | Keyword | Colid | … | compDocid | CompOc | CompPid |
| --- | --- | --- | --- | --- | --- | --- |
| K1 | revenue | 1 | … | 10,23,123 | (1,4),(5,8),(1,34) | 2,5,6,8,4,3 |
| K2 | growth | 1 | … | 10,23,123 | (1,5),(5,9),(1,34) | 2,5,6,8,5,4 |
| … | … | … | … | … | … | … |
Keyphrases
| ID | Keyword |
| --- | --- |
| T1 | revenue |
| T2 | growth |
| T3 | Windows |
| T4 | Azure |
| … | … |
KeyphraseDocuments
| ID | DocID |
| --- | --- |
| T1 (revenue) | D1 (Annual Budget) |
| T2 (growth) | D2 (Corporate Earnings) |
| T3 (Windows) | D3 (Marketing Reports) |
| … | … |
| T1 (revenue) | D7 (Finance Report) |
| … | … |
| T3 (Windows) | D11 (Azure Strategy) |
| T4 (Azure) | D11 (Azure Strategy) |
DocumentSimilarity
| DocID | MatchedDocID |
| --- | --- |
| D1 (Annual Budget) | D2 (Corporate Earnings) |
| D1 (Annual Budget) | D7 (Finance Report) |
| D3 (Marketing Reports) | D11 (Azure Strategy) |
| … | … |
[Diagram: Full-Text and Semantic Processing (+ Language Models) extracts keywords (e.g. quarter, record, revenue…) from the source table into the keyword index, from which the Keyphrases, KeyphraseDocuments, and DocumentSimilarity results are derived.]
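The rowset functions behind the example tables can be queried roughly as follows. This is a sketch under assumptions, not the demo shown in the session: it uses a FileTable named dbo.Documents, a hypothetical catalog DocumentsCatalog, a hypothetical unique index UQ_Documents_stream_id as the full-text key, a made-up file name, and it presumes the semantic language statistics database is already registered.

```sql
-- Hypothetical catalog and key index; a full-text key must be a unique,
-- single-column, non-nullable index.
CREATE FULLTEXT CATALOG DocumentsCatalog;
CREATE UNIQUE INDEX UQ_Documents_stream_id ON dbo.Documents (stream_id);
GO

-- STATISTICAL_SEMANTICS adds key phrase and similarity extraction
-- on top of the regular full-text index.
CREATE FULLTEXT INDEX ON dbo.Documents
    (file_stream TYPE COLUMN file_type LANGUAGE 1033 STATISTICAL_SEMANTICS)
    KEY INDEX UQ_Documents_stream_id
    ON DocumentsCatalog
    WITH CHANGE_TRACKING AUTO;
GO

-- Pick one source document (the file name is illustrative).
DECLARE @doc uniqueidentifier =
    (SELECT TOP (1) stream_id FROM dbo.Documents
     WHERE name = N'Annual Budget.docx');

-- Statistically prominent key phrases of that document.
SELECT TOP (10) keyphrase, score
FROM SEMANTICKEYPHRASETABLE(dbo.Documents, file_stream, @doc)
ORDER BY score DESC;

-- Documents most similar to it.
SELECT TOP (10) d.name, s.score
FROM SEMANTICSIMILARITYTABLE(dbo.Documents, file_stream, @doc) AS s
JOIN dbo.Documents AS d ON d.stream_id = s.matched_document_key
ORDER BY s.score DESC;
```

The two result sets correspond to the KeyphraseDocuments and DocumentSimilarity tables in the illustration; results appear only after the index population has finished.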
SEMANTIC EXTRACTION: END-2-END EXPERIENCE
• Downloadable Language Statistical Database with registration stored procedure
• Setup along with Full-Text
• Metadata / Catalog views
• System level DMVs for progress state and usage
• Manageability through SSMS and SMO
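Registration and monitoring could look like the following. This is a sketch assuming the downloaded statistics database files were extracted to the hypothetical folder shown; the database name semanticsdb is a common choice, not a requirement.

```sql
-- Attach the downloaded language statistics database
-- (the folder path is an assumption; adjust to where the files were extracted).
CREATE DATABASE semanticsdb
ON (FILENAME = 'C:\Microsoft Semantic Language Database\semanticsdb.mdf')
LOG ON (FILENAME = 'C:\Microsoft Semantic Language Database\semanticsdb_log.ldf')
FOR ATTACH;
GO

-- Register it with the instance (once per instance).
EXEC sp_fulltext_semantic_register_language_statistics_db
     @dbname = N'semanticsdb';
GO

-- Catalog views: which statistics database is registered,
-- and which languages it supports.
SELECT * FROM sys.fulltext_semantic_language_statistics_database;
SELECT * FROM sys.fulltext_semantic_languages;

-- DMV: progress of the similarity population for semantic indexes.
SELECT * FROM sys.dm_fts_semantic_similarity_population;
```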
KEY TAKEAWAYS
SQL Server’s unstructured data support is a key strategy to enable you to build complex data applications that go beyond relational data!
Content and Collaboration, eDiscovery, Healthcare, Document management etc.
RELATED CONTENT
SQL Server 2012 Whitepapers and information: http://www.sqlserverlaunch.com
Channel 9 DataBound Episode 2: http://channel9.msdn.com
MySemanticSearch Demo: http://mysemanticsearch.codeplex.com
More demo data sets and demo scripts: http://blogs.msdn.com/b/sqlfts/archive/2011/07/21/introducing-fulltext-statistical-semantic-search-in-sql-server-codename-denali-release.aspx
Microsoft Virtual Academy Recording: Coming Soon!
Find Me Later At…
• On Twitter: @SQLServerMike
• Blog: http://sqlblog.com/blogs/michael_rys
• Email: [email protected]