Object Oriented Database Design - A Case Study
Cliff Frazier
CS457/657
December 6, 2002
Motivations
• Permanent access to internet-published Linux kernel programming information
• Explore object oriented database design
• Learn Enhanced Entity Relationship (EER) modeling
Internet Sources of Programming Information
DB Design Approach
• Data requirements
• Functional requirements
• Develop data model that represents our “miniworld” - EER modeling
• Convert data model to physical model
Data Requirements
• Organize information on Linux kernel development
• Classify data by subject
• Query / View / Report capability
• Display mailing list and newsgroup data as threads
• Provide annotation capability
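The threaded display requirement could be met by linking each post to its parent through the Message-ID and In-Reply-To header fields. The sketch below is only an illustration under that assumption, not the LKPDB design; the `Post` class, the `build_threads` and `render` functions, and the sample IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Post:
    # Hypothetical minimal post record; the field names are assumptions.
    message_id: str
    subject: str
    in_reply_to: Optional[str] = None
    replies: list = field(default_factory=list)


def build_threads(posts):
    """Attach each reply to its parent and return the thread roots."""
    by_id = {p.message_id: p for p in posts}
    roots = []
    for p in posts:
        parent = by_id.get(p.in_reply_to) if p.in_reply_to else None
        if parent is None:
            roots.append(p)  # no known parent: this post starts a thread
        else:
            parent.replies.append(p)
    return roots


def render(post, depth=0):
    """Return subject lines indented by reply depth."""
    lines = ["  " * depth + post.subject]
    for reply in post.replies:
        lines.extend(render(reply, depth + 1))
    return lines


posts = [
    Post("<a@example>", "[PATCH] compatibility syscall layer"),
    Post("<b@example>", "Re: [PATCH] compatibility syscall layer", "<a@example>"),
    Post("<c@example>", "Re: [PATCH] compatibility syscall layer", "<b@example>"),
]
roots = build_threads(posts)
thread_lines = render(roots[0])
print("\n".join(thread_lines))
```

A post whose parent is not in the database is treated as a thread root, so a partially imported thread still displays.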
LKPDB Functional Diagram
[Diagram labels. Data sources: Mailing List, Newsgroup, FAQs, Tutorial, How To, ... Import stage: Data Parse/Import. Database: LKPDB. Functions: Classify, View, Report, Annotate, Query. Legend: Manual Function, Automated Function.]
Data Parse/Import
• Use source specific data parsing rules where possible
  – Mailing lists & newsgroups
  – Specific rule set for each mailing list & each newsgroup
  – Automated data import
• Use generic data parsing rules otherwise
  – One rule set for each data source type
  – Manual assistance required for import
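The choice between source specific and generic rule sets can be pictured as a lookup with a fallback. This is a rough sketch, assuming a rule set is a mapping from field name to regular expression; the `SOURCE_RULES` and `GENERIC_RULES` tables, their patterns, and the function names are hypothetical and do not come from the slides.

```python
import re

# Hypothetical rule sets: each maps a field name to a regular expression.
SOURCE_RULES = {
    "linux-kernel": {"subject": r"^Subject: (.*)$", "author": r"^From: (.*)$"},
}
GENERIC_RULES = {
    "mailing_list": {"subject": r"^Subject: (.*)$"},
    "faq": {"question": r"^(.*\?)$"},
}


def select_rules(source_name, source_type):
    """Return (rules, automated): specific rules if known, else generic."""
    if source_name in SOURCE_RULES:
        return SOURCE_RULES[source_name], True
    # Generic rules apply per source type and imply manual assistance.
    return GENERIC_RULES[source_type], False


def parse(text, rules):
    """Apply each rule and keep the first match for each field."""
    result = {}
    for name, pattern in rules.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            result[name] = match.group(1).strip()
    return result


sample = "From: A. Author\nSubject: Re: [PATCH] example\n\nBody text"
rules, automated = select_rules("linux-kernel", "mailing_list")
fields = parse(sample, rules)
other_rules, other_automated = select_rules("some-other-list", "mailing_list")
```

The `automated` flag is one way to record whether an import can run unattended or needs a person to review the result.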
Mailing List Parsing - Header
Received: from vmg.prodigy.net by vmg with SMTP; Thu, 5 Dec 2002 12:17:39 -0500
X-Originating-IP: [209.116.70.75]
. . .
Date: Thu, 5 Dec 2002 09:03:03 -0800 (PST)
From: Linus Torvalds <[email protected]>
To: george anzinger <[email protected]>
cc: Jim Houston <[email protected]>, Stephen Rothwell <[email protected]>, LKML <[email protected]>, <[email protected]>, "David S. Miller" <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>
Subject: Re: [PATCH] compatibility syscall layer (lets try again)
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: [email protected]: bulk
X-Mailing-List: [email protected]
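Header fields like these could be extracted with Python's standard `email` package. The sketch below assumes a standard Internet message header; the addresses and message IDs are placeholders rather than the ones in the original post, and the `record` layout is hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr

# Sample message; addresses and IDs are placeholders, not the original ones.
raw = (
    "Date: Thu, 5 Dec 2002 09:03:03 -0800 (PST)\n"
    "From: Linus Torvalds <author@example.org>\n"
    "To: george anzinger <reader@example.org>\n"
    "Subject: Re: [PATCH] compatibility syscall layer (lets try again)\n"
    "In-Reply-To: <parent-id@example.org>\n"
    "Message-ID: <this-id@example.org>\n"
    "\n"
    "Body text\n"
)

msg = message_from_string(raw)
author_name, author_addr = parseaddr(msg["From"])

# Hypothetical record layout holding the fields of interest.
record = {
    "author": author_name,
    "subject": msg["Subject"],
    "date": msg["Date"],
    "message_id": msg["Message-ID"],
    "in_reply_to": msg["In-Reply-To"],
}
print(record)
```

The In-Reply-To and Message-ID values are the ones a threaded display would need to connect a reply to its parent.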
Mailing List Parsing - Body
On Thu, 5 Dec 2002, george anzinger wrote:
>
> I think this covers all the bases. It builds boots and
> runs. I haven't tested nano_sleep to see if it does the
> right thing yet...

Well, it definitely doesn't, since at least this test is the wrong way
around (as well as being against the coding style whitespace rules ;-p):

+ if ( ! current_thread_info()->restart_block.fun){
+ return current_thread_info()->restart_block.fun(&parm);

Also, I would suggest against having a NULL pointer, and instead just
initializing it with a function that sets it to an error return (don't use
ENOSYS, since the system call _does_ exist, and ENOSYS is what old kernels
would return if you do it by hand by mistake. I'd suggest -EINTR, since
that will "DoTheRightThing(tm)" if we somehow get confused).

Linus
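A body like this mixes quoted text, patch lines, and the author's own reply. One simple way to separate them is by line prefix, sketched below. The prefix heuristics and the `split_body` function are assumptions for illustration; the slides do not state the actual body parsing rules.

```python
def split_body(body):
    """Sort body lines into quoted text, patch lines, and new text."""
    parts = {"quoted": [], "patch": [], "text": []}
    for line in body.splitlines():
        if line.startswith(">"):
            quoted = line.lstrip(">").strip()
            if quoted:  # skip empty quote lines
                parts["quoted"].append(quoted)
        elif line.startswith("+") or line.startswith("-"):
            # Heuristic only: a lone "-" separator line would also land here.
            parts["patch"].append(line)
        elif line.strip():
            parts["text"].append(line.strip())
    return parts


# Shortened excerpt of the post shown on the slide.
body = """On Thu, 5 Dec 2002, george anzinger wrote:
>
> I think this covers all the bases.
> right thing yet...

Well, it definitely doesn't, since at least this test is the wrong way

+ if ( ! current_thread_info()->restart_block.fun){

Linus
"""
parts = split_body(body)
print(parts["quoted"])
```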
Mailing List Parsing - Postscript
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected] majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
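Because the postscript is the same on every post from a list, a source specific rule could split it off before the body is stored. The sketch below assumes the footer always begins with the "To unsubscribe from this list:" line, optionally preceded by a "-" separator; the `strip_postscript` function is hypothetical.

```python
def strip_postscript(body, marker="To unsubscribe from this list:"):
    """Return (content, postscript), splitting at the list footer."""
    lines = body.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(marker):
            start = i
            # Include the separator line just above the footer, if present.
            if i > 0 and lines[i - 1].strip() == "-":
                start = i - 1
            content = "\n".join(lines[:start]).rstrip()
            return content, "\n".join(lines[start:])
    return body.rstrip(), ""  # no footer found


sample = (
    "Linus\n"
    "-\n"
    'To unsubscribe from this list: send the line "unsubscribe linux-kernel" in\n'
    "Please read the FAQ at http://www.tux.org/lkml/\n"
)
content, postscript = strip_postscript(sample)
```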
Parsing for Linux Threads FAQ
What kinds of things should be threaded or multitasked?

If you are a programmer and would like to take advantage of multithreading, the natural question is what parts of the program should/ should not be threaded. Here are a few rules of thumb (if you say "yes" to these, have fun!):

• Are there groups of lengthy operations that don't necessarily depend on other processing (like painting a window, printing a document, responding to a mouse-click, calculating a spreadsheet column, signal handling, etc.)?
• Will there be few locks on data (the amount of shared data is identifiable and "small")?
• Are you prepared to worry about locking (mutually excluding data regions from other threads), deadlocks (a condition where two COEs have locked data that other is trying to get) and race conditions (a nasty, intractable problem where data is not locked properly and gets corrupted through threaded reads & writes)?
• Could the task be broken into various "responsibilities"? E.g. Could one thread handle the signals, another handle GUI stuff, etc.?
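A generic FAQ rule set might split the text into question and answer pairs. The sketch below assumes a question is a short one-line paragraph ending in "?" and that paragraphs are separated by blank lines; both assumptions, the `max_len` limit, and the `parse_faq` function are hypothetical.

```python
def parse_faq(text, max_len=80):
    """Split FAQ text into (question, answer) pairs.

    Assumption: a question is a one-line paragraph of at most max_len
    characters that ends in "?". Everything else belongs to the answer.
    """
    entries = []
    question, answer = None, []
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        is_question = (
            "\n" not in para and para.endswith("?") and len(para) <= max_len
        )
        if is_question:
            if question is not None:
                entries.append((question, "\n\n".join(answer)))
            question, answer = para, []
        elif question is not None:
            answer.append(para)
    if question is not None:
        entries.append((question, "\n\n".join(answer)))
    return entries


faq_text = (
    "What kinds of things should be threaded or multitasked?\n\n"
    "If you are a programmer and would like to take advantage of "
    "multithreading, the natural question is what parts of the program "
    "should/ should not be threaded.\n\n"
    'Here are a few rules of thumb (if you say "yes" to these, have fun!):\n'
)
entries = parse_faq(faq_text)
```

Answer text that itself contains short questions can defeat this heuristic, which is consistent with generic rules needing manual assistance during import.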
Classification
• Both automatic & manual modes
• Each entry classified based on keywords
• Multiple categories allowed
• Categories:
  – Scheduler
  – Virtual memory management
  – File system
Classification Categories (cont)
  – Interprocess communication
  – Modules
  – Networking
  – Architecture related
  – Symmetric multiprocessing
  – Device drivers
  – Compiling
  – Debugging
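Automatic keyword classification with multiple categories per entry could look like the sketch below. The category names come from the slides, but the keyword lists are invented for illustration, and the `classify` function is hypothetical.

```python
# Hypothetical keyword lists; the slides name the categories, not the keywords.
CATEGORY_KEYWORDS = {
    "Scheduler": ["scheduler", "nano_sleep", "timeslice"],
    "Virtual memory management": ["page fault", "swap", "mmap"],
    "File system": ["inode", "ext2", "vfs"],
    "Compiling": ["gcc", "kgcc", "makefile", "bzimage"],
}


def classify(text):
    """Return every category with at least one keyword in the text."""
    lowered = text.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


single = classify("The gcc compiler distributed with RedHat 7.0")
multiple = classify("nohup make bzImage after tuning swap")
unmatched = classify("Linus")
```

An entry that matches no keyword returns an empty list, which is a natural point to hand the entry over to manual classification.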
Query Operations
• SQL based
• Queries used for Views, Reports, Annotations, and Classification
• Primary use: SELECTs to search for and view or print certain data
• Also includes keyword search capability
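As an illustration of SELECT-based queries and keyword search, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `list_entry` table, its columns, and the abbreviated sample rows are assumptions; the slides do not give a physical schema or name a database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; names loosely follow labels used in the data model.
conn.execute(
    "CREATE TABLE list_entry (post_sn INTEGER PRIMARY KEY, author TEXT, "
    "subject TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO list_entry (post_sn, author, subject, body) VALUES (?, ?, ?, ?)",
    [
        (1, "george anzinger", "[PATCH] compatibility syscall layer",
         "It builds boots and runs."),
        (2, "Linus Torvalds", "Re: [PATCH] compatibility syscall layer",
         "I'd suggest -EINTR."),
    ],
)


def keyword_search(connection, keyword):
    """Return posts whose subject or body contains the keyword."""
    pattern = f"%{keyword}%"
    rows = connection.execute(
        "SELECT post_sn, author, subject FROM list_entry "
        "WHERE subject LIKE ? OR body LIKE ? ORDER BY post_sn",
        (pattern, pattern),
    )
    return rows.fetchall()


by_author = conn.execute(
    "SELECT subject FROM list_entry WHERE author = ?", ("Linus Torvalds",)
).fetchall()
hits = keyword_search(conn, "EINTR")
```

Passing the keyword as a bound parameter rather than formatting it into the SQL string keeps user-supplied search text from altering the query.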
Annotation Example from the Kernel HowTo
. . .
7. Now, give the make command -

The gcc compiler distributed with RedHat 7.0 will not compile the kernel correctly. They do supply a kernel compatible compiler as well, which is invoked with kgcc. On RH 7.0 distributions of Linux, change all occurrences of gcc to kgcc in the root level Makefile before giving the make command.
___________________________________________________________
bash# cd /usr/src/linux
bash# man nohup
bash# nohup make bzImage &
bash# man tail
bash# tail -f nohup.out (.... to monitor the progress)
This will put the kernel in /usr/src/linux/arch/i386/boot/bzImage
___________________________________________________________
. . .
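One way to support annotations without altering the imported text is to store each annotation separately, together with the character offset where it applies, and merge the two only when the text is displayed. The sketch below assumes that approach; the `Annotation` class, the `render_annotated` function, and the "[Note]" marker are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    offset: int  # character position in the original text
    text: str    # annotation text to show at that position


def render_annotated(original, annotations):
    """Return the original text with annotations inserted at their offsets."""
    out, pos = [], 0
    for note in sorted(annotations, key=lambda a: a.offset):
        out.append(original[pos:note.offset])
        out.append(note.text)
        pos = note.offset
    out.append(original[pos:])
    return "".join(out)


howto = "7. Now, give the make command -\nbash# cd /usr/src/linux\n"
note = Annotation(
    offset=howto.index("bash#"),
    text="[Note] On RH 7.0, change gcc to kgcc in the root level Makefile.\n",
)
annotated = render_annotated(howto, [note])
print(annotated)
```

Keeping the original text untouched means it can be re-imported or compared against its source without losing the annotations.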
Data Model
• Enhanced Entity Relationship (EER) modeling
• Enhanced = object oriented concepts
• Initial design: list entity types and their attributes
• Refinement: some attributes converted to relationships
Mailing List Entity Attributes
• MAILING_LIST_POST
  – Name, e.g. Linux-Kernel M.L. *
  – Serial number
  – Author
  – Subject
  – Date/time stamp
  – Header
  – Body of post *
• * Converted to relationships
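The refinement step can be sketched with plain Python classes: the two starred attributes become references to separate objects, while the rest stay as simple fields. The class names `MailingList` and `PostText`, the field names, and the sample values are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class MailingList:
    name: str  # e.g. "Linux-Kernel M.L."


@dataclass
class PostText:
    text: str


@dataclass
class MailingListPost:
    serial_number: int
    author: str
    subject: str
    date_time_stamp: str
    header: str
    mailing_list: MailingList  # was the Name attribute, now a relationship
    body: PostText             # was the Body of post attribute, now a relationship


lkml = MailingList("Linux-Kernel M.L.")
first = MailingListPost(
    1, "george anzinger", "[PATCH] compatibility syscall layer",
    "2002-12-05", "(header text)", lkml, PostText("(body text)"),
)
reply = MailingListPost(
    2, "Linus Torvalds", "Re: [PATCH] compatibility syscall layer",
    "2002-12-05", "(header text)", lkml, PostText("(body text)"),
)
```

Both posts refer to the same `MailingList` object, so the list name is stored once instead of being repeated in every post.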
EER Diagrams
• Rectangle - entity
• Oval - attribute
• Diamond - relationship
• Structural constraints
  – Participation
  – Cardinality ratio
• Added types besides mailing lists
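[EER diagram labels, as extracted. Upper-case labels: INFO_SOURCE, LIST, LIST_ENTRY, DOCUMENT, FAQ *, BOOK *, UNSTR. *, TEXT, DOC_TEXT, A_TEXT. Other labels: Name, URL, Type, Date Pub., Post_SN, Thread_SN, Parent, Child, Chap., Quest., Subs., Ans., Date_Stamp, Author, Subject, Keyword, Category, Size, Annotated, Offset. Relationships: INCLUDES (N, 1), CONTAINS (1, 1), and a further 1, N pair. Two "d" specialization symbols appear.]
* Same relationships to DOC_TEXT and A_TEXT as LIST_ENTRY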
Conclusion
• An OODB for Linux kernel programming information was designed using EER
• The choice between attribute and relationship can change during the design process
• The design methodology influences the content of the DB
• Next project - Implement DB