Discovering Web Access Patterns and Trends by Applying OLAP
and Data Mining Technology on Web logs
Data Engineering Lab
성 유 진
Abstract
Web server log file analysis
• server performance improvement
• system performance improvement
• customer targeting in electronic commerce
Problems and difficulties
• processing large raw log data is not easy
• the data must be reduced
  – in size and time
• current web log miners
  – slow, inflexible, difficult to maintain
  – frequency counts alone are not enough
WebLogMiner
• Virtual University / data mining: WebLogMiner
• OLAP and data mining techniques
• multi-dimensional data cube
• scalability, interactivity, variety, flexibility
Design of a Web log Miner
Web server log file information
• domain name of the request / user name / date and time of the request / method of the request (GET, POST) / name of the file requested / result of the request (success, failure, error, etc.) / size of the data sent back / URL of the referring page / identification of the client agent
• Example
  210.114.3.64 - - [01/Jul/1998:17:34:05 0900] "GET /~yjsung/sign.html HTTP/1.1" 200 740
  210.114.3.64 - - [01/Jul/1998:17:38:44 -0900] "POST /cgi-bin/yjsung/sign HTTP/1.1" 200 352
  – POST: used when the browser submits a filled-in form to the server
  – GET: used when requesting data from the server
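A minimal sketch of how log lines like the example above could be split into the listed fields. The regular expression assumes the usual space-separated layout (host, identity, user, bracketed timestamp with zone, quoted request, status, size); the name `parse_log_line` and the handling of a "-" size are illustrative assumptions, not part of WebLogMiner.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [timestamp zone] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\] ]+) (?P<zone>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # %b expects English month abbreviations (the default C locale).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

# Second example line from the slide, with the spacing restored.
sample = '210.114.3.64 - - [01/Jul/1998:17:38:44 -0900] "POST /cgi-bin/yjsung/sign HTTP/1.1" 200 352'
entry = parse_log_line(sample)
print(entry["method"], entry["path"], entry["status"], entry["size"])
```

Lines that do not fit the assumed layout return None, so they can be counted or inspected instead of silently dropped.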
• Cache information
  – frequent backtracking and reloading suggest deficient design
  – requires a client-side log
• Access count
  – not always a measure of interestingness
  – e.g., a page that must be passed through to reach a particular document
• Time and date
  – evaluate user interest by time spent
• Domain name
• Sequence of requests
  – can predict the next request and so improve traffic
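As a rough illustration of predicting the next request from request sequences, the sketch below counts which page most often follows each page. This first-order frequency model is an assumption chosen for brevity, not the method used by WebLogMiner, and the session data and page names are hypothetical.

```python
from collections import Counter, defaultdict

def build_transitions(sessions):
    """Count how often each page is directly followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions

def predict_next(transitions, page):
    """Return the most frequent successor of `page`, or None if it has none."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical request sequences, one list per user session.
sessions = [
    ["/index.html", "/courses.html", "/sign.html"],
    ["/index.html", "/courses.html", "/grades.html"],
    ["/index.html", "/courses.html", "/sign.html"],
]
transitions = build_transitions(sessions)
print(predict_next(transitions, "/courses.html"))
```

A server could use such a prediction to prefetch or cache the likely next page, which is one way a request sequence can help with traffic.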
WebLogMiner: 4 Stages
1. Filtering the data and creating a relational database
2. Data cube construction
3. OLAP is used
4. Data mining techniques are used
1. DATABASE CONSTRUCTION FROM SERVER LOG FILES
Data Cleansing and Transformation
• filter out page graphics (sound and video) entries, but preserve them
• two types
  – without knowledge about the site: e.g., transforming time into day, month, and year is possible without server information
  – with knowledge about the site: associating a server request with its intended action needs the site structure
• relational database
  – cleaned data and new implicit data are added
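The cleaning and site-independent transformation steps can be sketched as follows. The extension list, the record layout (a dict with `path` and `time`), and the function names are assumptions made for illustration; the media entries are set aside rather than discarded, in line with the note that they are preserved.

```python
from datetime import datetime

# Assumed extensions for page graphics, sound, and video.
MEDIA_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".wav", ".au", ".mpg", ".avi")

def is_media(path):
    """True if the requested file looks like a graphic, sound, or video resource."""
    return path.lower().endswith(MEDIA_EXTENSIONS)

def transform(record):
    """Add implicit time fields derived from the timestamp (no site knowledge needed)."""
    t = record["time"]
    return {**record, "hour": t.hour, "day": t.day, "month": t.month, "year": t.year}

def clean(records):
    """Split records into (kept rows with derived fields, set-aside media rows)."""
    kept, media = [], []
    for record in records:
        if is_media(record["path"]):
            media.append(record)  # filtered out of the analysis but preserved
        else:
            kept.append(transform(record))
    return kept, media

# Hypothetical parsed entries; the second path is invented for the example.
records = [
    {"path": "/~yjsung/sign.html", "time": datetime(1998, 7, 1, 17, 34, 5)},
    {"path": "/images/logo.gif", "time": datetime(1998, 7, 1, 17, 34, 6)},
]
kept, media = clean(records)
print(kept[0]["day"], kept[0]["month"], kept[0]["year"], len(media))
```

Mapping a request to its intended user action is not shown, since that step depends on knowledge of the site structure.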
2. MULTI-DIMENSIONAL WEB LOG DATA CUBE CONSTRUCTION AND MANIPULATION
Data Cube
• the GROUP BY operator in SQL is used to compute aggregates on a set of attributes
  – e.g., sum of sales by P, C: for each product, give a breakdown of how much of it was sold to each customer
• CUBE is the n-dimensional generalization of GROUP BY
  – gives remarkable flexibility to manipulate and view the data
  – allows OLAP operations such as drill-down, roll-up, slice and dice
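To make the "n-dimensional generalization of group-by" concrete, the sketch below computes a sum for every subset of the chosen dimensions, which is what a CUBE over n dimensions yields (2^n group-bys). The function name `cube` and the sales rows are hypothetical, and this brute-force version ignores the efficiency techniques a real OLAP engine would use.

```python
from collections import defaultdict
from itertools import combinations

def cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (all 2^n group-bys)."""
    result = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            result[dims] = dict(totals)
    return result

# Hypothetical sales rows for products (P) and customers (C).
rows = [
    {"product": "P1", "customer": "C1", "sales": 10},
    {"product": "P1", "customer": "C2", "sales": 5},
    {"product": "P2", "customer": "C1", "sales": 7},
]
agg = cube(rows, ["product", "customer"], "sales")
print(agg[("product", "customer")])  # finest level: sales by product and customer
print(agg[("product",)])             # roll-up over customers
print(agg[()])                       # grand total
```

Moving from `agg[("product", "customer")]` to `agg[("product",)]` corresponds to a roll-up, and the reverse direction to a drill-down.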
• Attributes
  – URL
  – domain name
  – size of resource
  – time
  . . .
3. DATA MINING ON WEB LOG DATA CUBE AND WEB LOG DATABASE
Data Characterization
• find rules that summarize a user-defined data set
☞ the traffic on a web server for a given type of media at a particular time of day
Class comparison
• discover discriminant rules
☞ compare requests from two different web browsers
Association
• discover patterns in which accesses to different resources consistently occur together
Prediction
☞ access to a new resource on a given day can be predicted based on accesses to similar old resources on similar days
Classification
• can be used to develop a better understanding of each class in the web log database, and perhaps to restructure a web site or customize answers to requests based on classes of requests
Time-series analysis
• analyze data along time sequences to discover time-related interesting patterns …
☞ disclose patterns and trends to guide the improvement of the web server's services
Focus will be on time-series analysis because web log records are highly time-related
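A very small example of looking at web log data along the time axis: requests are counted per hour of day and the counts are smoothed with a trailing moving average to make trends easier to see. Both choices (hourly buckets, moving average) are assumptions for illustration and are not taken from WebLogMiner; the timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(timestamps):
    """Number of requests in each hour of the day (a list of 24 counts)."""
    counts = Counter(t.hour for t in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]

def moving_average(values, window):
    """Trailing moving average; positions before a full window are omitted."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical request times.
timestamps = [
    datetime(1998, 7, 1, 9, 5, 0),
    datetime(1998, 7, 1, 17, 34, 5),
    datetime(1998, 7, 1, 17, 38, 44),
]
per_hour = hourly_counts(timestamps)
smoothed = moving_average(per_hour, window=3)
print(per_hour[9], per_hour[17], len(smoothed))
```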
Experiments with the Web Log Miner
Virtual-U: six different major components
• Goal: understand the usage and user behavior patterns
Data Cleaning and Transformation
• all entries were mapped one-to-one into a relational database
• the fields site and user action were added
• Problems
  – extraneous information => identify those entries and eliminate them
  – multiple server requests caused by the same user action
  – the same server request caused by multiple user actions
  – local activities are not recorded
Multi-dimensional data cube construction and manipulation
• summarization (group-bys on different dimensions)
• request / domain / event / session / bandwidth / error / referring organization / browser summary
Examples
Fig2) OLAP analysis of Web log
Fig3) Typical event sequence and user behavior pattern analysis
Fig4) Web traffic analysis of Web log
Fig6) Event trees of months one to four
Discussion and Conclusion
WebLogMiner
• OLAP and data mining techniques
• multi-dimensional data cube
• major strengths
  – scalability, interactivity, variety, flexibility
Problems with current log files
• web servers should collect more information
• a new structure is needed ==> would simplify pre-processing