Full-Text Support in a Database Semantic File System Kristen LeFevre & Kevin Roundy Computer Sciences 736

Post on 19-Jan-2016

Full-Text Support in a Database Semantic File System

Kristen LeFevre & Kevin Roundy

Computer Sciences 736

Leveraging DBs in File Systems

What do databases have to offer?

• Transactions

• Concurrency control

• Crash recovery

• Query power (metadata)

• Extensibility – add new objects/modules

• Efficient Search!

Re-thinking Directories

• Current state of directories:
• User remembers what, not where

Our System:
• Search tools for grouping related files
• Semantically meaningful directories [Semantic FS]
• Files are stored in tables
• Directories are just for looks

LAME!

Related Work

• Semantic Filesystems
• Use a DB [Inversion Filesystem]
• NFS Meets Databases [Halverson]
• NFS for portability, transparency, existing code support, familiar semantics
• Server-side caching for performance

Bringing ideas together:
• Use [Halverson]’s infrastructure to implement semantic filesystem ideas

Roadmap

• Overview of System Design and Implementation

• Virtual Directories and Full-Text Queries

• Live Demonstration

• Conclusions & Future Work

System Architecture

[Figure: standard NFS clients talk to an NFS server, made up of an NFS front end and a custom backend, which stores data in an object-relational database; boxes labeled M and TS2 sit alongside the database storage.]

Postgres Capabilities

An object-relational DB such as Postgres lets you define and add modules.

Case in point: Tsearch2

New type: tsvector

Related function: to_tsvector

Example: to_tsvector('a b a c') returns 'a':1,3 'b':2 'c':4

Related index: idxFTI

Set triggers to do updates
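
To make the to_tsvector example concrete, here is a minimal Python sketch of the word-to-positions mapping. It is an illustration only, not Tsearch2's implementation: the names `toy_tsvector` and `format_tsvector` are hypothetical, and the sketch skips the stemming and stopword removal that a real text-search configuration would apply.

```python
from collections import defaultdict


def toy_tsvector(text):
    """Map each word to the 1-based positions where it occurs.

    Hypothetical helper for illustration; real Tsearch2 also stems
    words and drops stopwords, which this sketch does not do.
    """
    positions = defaultdict(list)
    for pos, word in enumerate(text.lower().split(), start=1):
        positions[word].append(pos)
    # Sort by word so the output order is stable.
    return dict(sorted(positions.items()))


def format_tsvector(vec):
    """Render the mapping in the 'word':pos,pos style used on the slide."""
    return " ".join(
        "'{}':{}".format(word, ",".join(str(p) for p in plist))
        for word, plist in vec.items()
    )


print(format_tsvector(toy_tsvector("a b a c")))  # 'a':1,3 'b':2 'c':4
```

Storing positions rather than bare word counts is what lets a ranking function later ask whether two search terms occur near one another.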

Mapping FS data to DB Schema

Filesystem Data             Database Table
Metadata                    fileatt
Directory Structure         naming
Non-indexed File Content    allfiles
Indexed File Content        allfiles_txt

[Halverson] Schema

fileatt(inode, uid, gid, mode, nlinks, size, ctime, mtime, atime)

naming(inode, name, parent)

allfiles(inode, chunk_id, data)

[Figure: the diagram connects these tables with three relationships, each labeled 1 to N.]

Database Schema

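fileatt(inode, uid, gid, mode, nlinks, size, ctime, mtime, atime, istext)

naming(inode, name, parent)

allfiles(inode, chunk_id, data)

[Figure: the diagram connects these tables with three relationships, each labeled 1 to N, and carries the annotation strstr(a, ".txt").]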

Database Schema

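fileatt(inode, uid, gid, mode, nlinks, size, ctime, mtime, atime, istext)

naming(inode, name, parent)

allfiles(inode, chunk_id, data)

allfiles_txt(inode, fulltext, tsvector), with a tsearch2 index

[Figure: the diagram labels three relationships 1 to N and one relationship 1 to 1, and carries the annotation strstr(a, ".txt").]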
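
The four tables can be sketched with Python's built-in sqlite3 module. This is a rough illustration under stated assumptions, not the authors' Postgres schema: the column names come from the slides, but the column types, the key declarations, the hypothetical `create_file` helper, and the choice to route text files only into allfiles_txt are all guesses. SQLite has no tsvector type, so that column is a plain text placeholder.

```python
import sqlite3

# Column names follow the slides; the types, keys, and the use of SQLite
# (instead of Postgres with Tsearch2) are assumptions for illustration.
SCHEMA = """
CREATE TABLE fileatt (
    inode INTEGER PRIMARY KEY,
    uid INTEGER, gid INTEGER, mode INTEGER, nlinks INTEGER,
    size INTEGER, ctime INTEGER, mtime INTEGER, atime INTEGER,
    istext INTEGER
);
CREATE TABLE naming (
    inode INTEGER REFERENCES fileatt(inode),
    name TEXT,
    parent INTEGER
);
CREATE TABLE allfiles (
    inode INTEGER REFERENCES fileatt(inode),
    chunk_id INTEGER,
    data BLOB
);
CREATE TABLE allfiles_txt (
    inode INTEGER REFERENCES fileatt(inode),
    fulltext TEXT,
    tsvector TEXT  -- placeholder: SQLite has no tsvector type
);
"""


def create_file(conn, inode, name, parent, content):
    """Insert one file, routing content by the ".txt" substring test."""
    istext = 1 if ".txt" in name else 0  # analogue of strstr(a, ".txt")
    conn.execute(
        "INSERT INTO fileatt (inode, nlinks, size, istext) VALUES (?, 1, ?, ?)",
        (inode, len(content), istext),
    )
    conn.execute("INSERT INTO naming VALUES (?, ?, ?)", (inode, name, parent))
    if istext:
        # Assumed routing: indexed text content goes to allfiles_txt.
        conn.execute(
            "INSERT INTO allfiles_txt (inode, fulltext) VALUES (?, ?)",
            (inode, content),
        )
    else:
        # Assumed routing: everything else is stored as a single chunk.
        conn.execute(
            "INSERT INTO allfiles VALUES (?, 0, ?)", (inode, content.encode())
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
create_file(conn, 2, "notes.txt", 1, "nfs meets databases")
create_file(conn, 3, "image.bin", 1, "raw bytes")
print(conn.execute("SELECT inode, istext FROM fileatt ORDER BY inode").fetchall())
```

Note that a substring test also flags names such as "a.txt.bak" as text, which is one reason a real system might prefer a stricter check.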

Roadmap

• Overview of System Design and Implementation

• Virtual Directories and Full-Text Queries

• Live Demonstration

• Conclusions & Future Work

Virtual Directories and Text Search

• Want to handle 2 types of text queries
• Boolean keyword queries
• e.g. (‘Kristen’ | ‘Kevin’ | ‘Remzi’) & ‘file’ & ‘system’
• IR rank queries
• e.g. Rank files with respect to (‘computer’ & ‘architecture’)
• More powerful than grep!
• Virtual directories proposed for Semantic File systems
• Incorporate full-text queries without “breaking” NFS interface for existing applications
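
The Boolean keyword query above can be read as a small expression tree. The following Python sketch evaluates such a tree against the set of words in a document. The nested-tuple query form and the `matches` function are hypothetical stand-ins chosen for illustration; the system described here delegates this work to the database's text index rather than scanning word sets.

```python
def matches(query, words):
    """Evaluate a Boolean keyword query against a set of words.

    A query is either a keyword string or a tuple ("and" | "or", q1, q2, ...).
    This nested-tuple form is a hypothetical stand-in for a query parser.
    """
    if isinstance(query, str):
        return query.lower() in words
    op, *operands = query
    results = (matches(q, words) for q in operands)
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError("unknown operator: {}".format(op))


# ('Kristen' | 'Kevin' | 'Remzi') & 'file' & 'system'
QUERY = ("and", ("or", "Kristen", "Kevin", "Remzi"), "file", "system")

doc_a = set("kevin wrote a file system paper".split())
doc_b = set("a paper about file system caching".split())
print(matches(QUERY, doc_a), matches(QUERY, doc_b))  # True False
```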

DBMS Full-Text Support

• Keyword Search
• Text indices support search over keywords
• Words extracted from document, stemmed, “stopwords” removed
• Rank
• Used existing rank() function as a black-box
• rank() counts number of times each word appears in document, and whether search terms are near one another
• Optionally, normalize by document length
• Other notions of IR rank could easily be substituted
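
As a rough illustration of the ranking idea, the Python sketch below counts search-term occurrences after stopword removal and optionally divides by document length. It is a toy under stated assumptions, not the database's rank() function: `toy_rank`, `index_terms`, and the stopword list are hypothetical, there is no stemming, and the proximity of search terms is ignored.

```python
STOPWORDS = {"a", "an", "and", "of", "the", "to"}  # tiny illustrative list


def index_terms(text):
    """Lowercase, split on whitespace, and drop stopwords (no stemming)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def toy_rank(text, search_terms, normalize=False):
    """Count occurrences of the search terms; optionally divide by length.

    Unlike the rank() described above, this sketch ignores how close
    the search terms are to one another.
    """
    terms = index_terms(text)
    wanted = {t.lower() for t in search_terms}
    hits = sum(1 for w in terms if w in wanted)
    if normalize and terms:
        return hits / len(terms)
    return float(hits)


docs = {
    "short": "computer architecture",
    "long": "the architecture of a computer and the history of computer science",
}
for name, text in docs.items():
    raw = toy_rank(text, ["computer", "architecture"])
    norm = toy_rank(text, ["computer", "architecture"], normalize=True)
    print(name, raw, norm)
```

In this example the longer document wins on raw counts while the shorter one wins once scores are normalized by length, which is why normalization is worth exposing as an option.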

Semantics of Virtual Directories

• Encountered some tradeoffs
• What we did:
• Static virtual directories (search once on mkdir)
• Directory contents as a snapshot at one point in time
• Hard links

[Figure: directory tree rooted at /CS736 with entries project, papers, reading questions, and %nfs%, and files writeup, NFS, talk outline, NFS vs AFS, and Thread ideas.]
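
The snapshot semantics can be imitated on a local file system. The Python sketch below is an analogy only: it assumes a plain substring match in place of a full-text query, uses a hypothetical `make_virtual_dir` helper and directory name, and works on ordinary files, whereas the system described here would record the links in its database tables behind the NFS interface.

```python
import os
import tempfile


def make_virtual_dir(root, vdir_name, keyword):
    """Create a directory holding hard links to files whose content
    contains the keyword. The search runs once, at creation time."""
    vdir = os.path.join(root, vdir_name)
    os.mkdir(vdir)
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if not os.path.isfile(path):
            continue  # skip directories, including the new one
        with open(path) as f:
            if keyword in f.read().lower():
                os.link(path, os.path.join(vdir, entry))  # hard link
    return vdir


root = tempfile.mkdtemp()
for name, text in [("writeup", "NFS proxy design"), ("ideas", "thread notes")]:
    with open(os.path.join(root, name), "w") as f:
        f.write(text)

vdir = make_virtual_dir(root, "nfs_results", "nfs")

# A file created after the snapshot does not appear in the directory.
with open(os.path.join(root, "talk"), "w") as f:
    f.write("NFS vs AFS")
# Removing the original name leaves the hard link readable.
os.remove(os.path.join(root, "writeup"))

print(sorted(os.listdir(vdir)))
```

Because the entries are hard links, deleting a file's original name does not leave a broken entry behind, at the cost of the directory going stale as new matching files appear.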

Semantics of Virtual Directories

• Encountered some tradeoffs
• Alternatives (all also valid):
• Static virtual directory creation with symbolic links
• leads to dangling (broken) links
• Process query lazily on readdir command
• Semantics used in Semantic File System paper
• Dynamically update contents of virtual directories on file creation, deletion, or write
• Can be implemented using database triggers
• More expensive, heavier back-end load
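
For contrast with the static approach, the lazy alternative can be sketched as a directory object that re-runs its query on every listing. This is a minimal in-memory Python illustration; the `LazyVirtualDir` class, its `readdir` method, and the substring match are hypothetical and stand in for a query issued to the database when an NFS readdir request arrives.

```python
class LazyVirtualDir:
    """Virtual directory whose contents are computed on each listing."""

    def __init__(self, files, keyword):
        self.files = files      # shared dict: file name -> text content
        self.keyword = keyword

    def readdir(self):
        # Re-run the query every time, so results track the current files.
        return sorted(
            name for name, text in self.files.items()
            if self.keyword in text.lower()
        )


files = {"writeup": "NFS proxy design", "ideas": "thread notes"}
vdir = LazyVirtualDir(files, "nfs")
before = vdir.readdir()

files["talk"] = "NFS vs AFS"   # file created after the directory
del files["writeup"]           # file deleted after the directory
after = vdir.readdir()

print(before, after)  # ['writeup'] ['talk']
```

The listing is always current, but every readdir pays the cost of a query, which is the tradeoff against the one-time search of a static virtual directory.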

Roadmap

• Overview of System Design and Implementation

• Virtual Directories and Full-Text Queries

• Live Demonstration

• Conclusions & Future Work

Conclusions

• Benefits of our proxy architecture:
• Standard NFS clients
• Postgres as black box
• Simple to expose functionality of DB
• Use & add DB objects at will

Future Work

• Performance evaluation to understand the overhead of new functionality
• Dynamic index maintenance (file creation & modification)
• Virtual directory creation and text querying
• Block-level text writes and caching
• Query support for other file types
• Mechanisms for extracting and indexing meta-data from additional file types (e.g., image files)
• Performance Monitoring, Adaptive Indexing and storage format within the NFS Proxy

Thanks! Questions?

Special Thanks:
Remzi Arpaci-Dusseau
Alan Halverson
David DeWitt