VoIP Data IIIT Allahabad

Margaret H. Dunham, Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas 75275, USA. [email protected]. Support provided by Fulbright Grant and IIIT Allahabad.

Page 1: VoIP Data IIIT Allahabad

VoIP Data
IIIT Allahabad

Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas 75275, USA
[email protected]

Support provided by Fulbright Grant and IIIT Allahabad


Page 2: VoIP Data IIIT Allahabad

VoIP Data Outline

• VoIP overview
• CDR
• CDR Example using EMM


Page 3: VoIP Data IIIT Allahabad

VoIP Overview

http://www.voipmechanic.com/what-is-voip.htm


Page 4: VoIP Data IIIT Allahabad

VoIP Advantages

• Travel
• Cost reduction
• Additional features: voice messages, call forwarding, logs, caller ID, …
• Integration of business tools
• Common network infrastructure


Page 5: VoIP Data IIIT Allahabad

VoIP Disadvantages

• Need reliable broadband internet connection
• Voice quality


Page 6: VoIP Data IIIT Allahabad

Telephone-VoIP Steps

• Analog Telephone Adapter (ATA) converts analog phone call to digital signal.
• Sent over internet as data packets.
• Converted back to analog.

Page 7: VoIP Data IIIT Allahabad

VoIP Codec

• Software on server or ATA that converts voice signal into digital data.
• COmpressor – DECompressor
• COder – DECoder
• Sample (8000, 24000, 32000 times per second)
• Sort
• Compress
• Packetize
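The sample and packetize steps can be illustrated with a minimal Python sketch. It assumes 8-bit samples and 20 ms of audio per packet, neither of which is stated on the slide; the function names are hypothetical, and the sort and compress steps are skipped entirely.

```python
import math

SAMPLE_RATE_HZ = 8000   # lowest sampling rate listed on the slide
BITS_PER_SAMPLE = 8     # assumption: one byte per sample
FRAME_MS = 20           # assumption: 20 ms of audio per packet


def sample_tone(freq_hz: float, duration_ms: int) -> bytes:
    """Sample a sine tone and quantize each sample to one unsigned byte."""
    n = SAMPLE_RATE_HZ * duration_ms // 1000
    return bytes(
        int(round(127.5 + 127.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)))
        for i in range(n)
    )


def packetize(samples: bytes, frame_ms: int = FRAME_MS) -> list[bytes]:
    """Split the sample stream into fixed-duration payloads."""
    frame_len = SAMPLE_RATE_HZ * frame_ms // 1000
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


audio = sample_tone(440.0, 100)          # 100 ms of a 440 Hz tone
packets = packetize(audio)
bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
print(len(audio), len(packets), len(packets[0]), bitrate_bps)
```

Under these assumptions, each packet carries 160 samples and the uncompressed stream runs at 64,000 bits per second, which is why a compress step matters before transmission.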


Page 8: VoIP Data IIIT Allahabad

Protocols

• SIP (Session Initiation Protocol)
  • Signaling to set up and tear down sessions.
• SDP (Session Description Protocol)
  • Describe call
• RTP (Real-time Transport Protocol)
  • Exchange data/voice packets
  • Media Transport to transmit packets


Page 9: VoIP Data IIIT Allahabad

SIP

• Setup
• Connect
• Disconnect
• Syntax similar to HTTP
• Bind to IP address using SIP registration
• URLs for address format: [email protected]
• Independent of application or data types
• Uses RTP and SDP
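To make "syntax similar to HTTP" concrete, here is a minimal Python sketch that builds and parses a text message shaped like a SIP INVITE. The addresses, the Call-ID, and both function names are hypothetical, and the message is deliberately incomplete: a real request carries further mandatory headers (such as Via and Max-Forwards) and usually an SDP body.

```python
def build_invite(caller: str, callee: str, call_id: str) -> str:
    """Build a simplified INVITE: a start line, headers, then a blank line."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"From: <sip:{caller}>",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(raw: str) -> tuple[str, dict[str, str]]:
    """Split a message into its start line and a header dictionary."""
    head, _, _body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")   # split at the first colon only
        headers[name.strip()] = value.strip()
    return start_line, headers


# Hypothetical parties and call identifier.
raw = build_invite("alice@example.com", "bob@example.org", "abc123")
start, headers = parse_message(raw)
print(start)
print(headers["From"], headers["Call-ID"])
```

As with HTTP, the message is plain text with a request line, colon-separated headers, and a blank line before any body.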


Page 10: VoIP Data IIIT Allahabad

SIP Overview

http://www.voipmechanic.com/sip-basics.htm


Page 11: VoIP Data IIIT Allahabad

VoIP Data Packet [4]


Page 12: VoIP Data IIIT Allahabad

VoIP Data

• Any of this digital data could be saved and analyzed.
• Typically only statistical/summary information about the calls is saved.
• These Call Detail Records (CDR) are used for billing and analysis.

Page 13: VoIP Data IIIT Allahabad

Call Detail Record

• Log of VoIP usage
• May be by account
• Typical attributes:
  • Source
  • Destination
  • Duration of call
  • Amount billed
  • Total usage time in billing period
  • Remaining time in billing period
  • Total charge in billing period
• The format of the CDR varies among VoIP providers or programs. Some programs allow CDRs to be configured by the user.
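As a rough illustration of how per-call attributes roll up into the per-period totals listed above, the following Python sketch models a CDR as a small record type. The field names, the account identifiers, and the sample values are all hypothetical; real CDR layouts differ by provider.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallDetailRecord:
    """Hypothetical per-call record; real CDR layouts vary by provider."""
    account: str
    source: str
    destination: str
    duration_s: int
    amount_billed_cents: int


def usage_by_account(records):
    """Sum call time and charges per account for one billing period."""
    totals = defaultdict(lambda: {"seconds": 0, "charge_cents": 0})
    for r in records:
        totals[r.account]["seconds"] += r.duration_s
        totals[r.account]["charge_cents"] += r.amount_billed_cents
    return dict(totals)


# Made-up records for illustration only.
records = [
    CallDetailRecord("acct-1", "1001", "2001", 300, 25),
    CallDetailRecord("acct-1", "1001", "2002", 120, 10),
    CallDetailRecord("acct-2", "1002", "2001", 60, 5),
]
totals = usage_by_account(records)
print(totals)
```

Amounts are kept as integer cents here to avoid floating-point rounding when summing charges.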


Page 14: VoIP Data IIIT Allahabad

CDR Generation [3]

• Usually created through a special Authentication, Authorization, and Accounting (AAA) server.
• May also be created by logging capabilities at a gateway or router using syslog server software.
• Normally in simple CSV format.
• Normally uses UDP, so underlying data packets are not sequenced and may be lost. (Redundancy of servers can help.)
• Timestamps between routers can be synchronized using the Network Time Protocol (NTP).
• CDR generated for both forward and return leg of call.
• http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml
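Because CSV records sent over UDP may arrive out of order or not at all, a consumer typically re-sorts by timestamp and checks that both legs of each call are present. The Python sketch below shows one way to do that; the column names and sample rows are invented for illustration and do not reflect the Cisco CDR layout.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical CSV layout and rows, deliberately out of order.
RAW = """call_id,leg,start,duration_s
c2,forward,2024-01-01T12:20:05,30
c1,return,2024-01-01T12:17:32,61
c1,forward,2024-01-01T12:17:32,61
"""


def load_cdrs(text):
    """Parse CSV rows and restore time order, since arrival order is not guaranteed."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["start"] = datetime.fromisoformat(row["start"])
        row["duration_s"] = int(row["duration_s"])
    rows.sort(key=lambda r: (r["start"], r["call_id"], r["leg"]))
    return rows


def calls_missing_a_leg(rows):
    """Return call ids for which the forward or return record never arrived."""
    legs = defaultdict(set)
    for row in rows:
        legs[row["call_id"]].add(row["leg"])
    return sorted(cid for cid, seen in legs.items() if seen != {"forward", "return"})


rows = load_cdrs(RAW)
missing = calls_missing_a_leg(rows)
print([(r["call_id"], r["leg"]) for r in rows])
print(missing)
```

Sorting on timestamps is only meaningful if the routers' clocks agree, which is the role NTP plays in the list above.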


Page 15: VoIP Data IIIT Allahabad

Example: CISCO CDR Data

• VoIP traffic in their Richardson, Texas facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17 11:29:11 2003.
• Over 1.5 million call trials were logged
• 272,646 connected calls
• 66 attributes including source, destination, starting time, duration, routing/switching, device, etc.
• Application: Anomaly Detection (Classification)
• Goal: Find unusual call patterns based on type and time of call
• Technique: New data structure, New classification algorithm, New visualization technique
• Sample of raw csv data: http://lyle.smu.edu/~mhd/iiit/start.csv


Page 16: VoIP Data IIIT Allahabad

CISCO Preprocessing

• Remove the attributes other than source, destination, starting time, duration from the logs.
• Count the connected calls and discard unconnected calls.
• The total number of connected calls was 272,646.
• 5 phone classes: internal, local, national, international, unknown.
• 25 link classes (source class + destination class)
• Data is aggregated into 15 minute time intervals.
• The total number of time points is 5422 and the total number of attributes is 26.
• Add two attributes, namely, type of day (workday or weekend) and time of the day, to the processed data. This step gives a spatio-temporal cube in the model space.
• http://www.engr.smu.edu/~mhd/7331f08/CISCOEMM.xls
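The aggregation step can be sketched in Python as follows. This is an illustration under assumptions, not the authors' preprocessing code: calls are taken as already classified into the five phone classes, time of day is encoded as minutes since midnight, intervals with no calls are simply omitted, and the sample calls are made up.

```python
from collections import Counter
from datetime import datetime
from itertools import product

PHONE_CLASSES = ["internal", "local", "national", "international", "unknown"]
LINK_CLASSES = list(product(PHONE_CLASSES, repeat=2))   # 25 (source, destination) pairs
BIN_MINUTES = 15


def bin_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 15-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % BIN_MINUTES, second=0, microsecond=0)


def aggregate(calls):
    """calls: iterable of (start_time, source_class, destination_class)."""
    counts = {}
    for start, src, dst in calls:
        counts.setdefault(bin_start(start), Counter())[(src, dst)] += 1
    rows = {}
    for t in sorted(counts):
        day_type = "weekend" if t.weekday() >= 5 else "workday"
        time_of_day = t.hour * 60 + t.minute     # assumed encoding: minutes since midnight
        rows[t] = [counts[t][link] for link in LINK_CLASSES] + [day_type, time_of_day]
    return rows


# Made-up connected calls for illustration.
calls = [
    (datetime(2024, 1, 1, 12, 17, 32), "internal", "local"),
    (datetime(2024, 1, 1, 12, 29, 59), "internal", "local"),
    (datetime(2024, 1, 6, 9, 31, 0), "local", "national"),
]
rows = aggregate(calls)
for t, vector in rows.items():
    print(t, vector)
```

Each output row holds the 25 link-class counts for one interval followed by the two added attributes.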


Page 17: VoIP Data IIIT Allahabad

CISCO Data Visualization


http://www.lyle.smu.edu/~mhd/7331f11/CiscoEMM.png


Page 18: VoIP Data IIIT Allahabad


Spatiotemporal Stream Data

• Records may arrive at a rapid rate
• High volume (possibly infinite) of continuous data
• Concept drifts: Data distribution changes on the fly
• Data does not necessarily fit any distribution pattern
• Multidimensional
• Temporal
• Spatial
• Data are collected in discrete time intervals
• Data are in structured format, <a1, a2, …>
• Data hold an approximation of the Markov property


Page 19: VoIP Data IIIT Allahabad


Spatiotemporal Environment

• Events arriving in a stream
• At any time, t, we can view the state of the problem as represented by a vector of n numeric values:

Vt = <S1t, S2t, ..., Snt>

      V1    V2    …    Vq
S1    S11   S12   …    S1q
S2    S21   S22   …    S2q
…     …     …     …    …
Sn    Sn1   Sn2   …    Snq

Time runs across the columns V1, V2, …, Vq.

Page 20: VoIP Data IIIT Allahabad


Data Stream Modeling

• Single pass: Each record is examined at most once
• Bounded storage: Limited memory for storing synopsis
• Real-time: Per record processing time must be low
• Summarization (synopsis) of data
• Use data NOT SAMPLE
• Temporal and Spatial
• Dynamic
• Continuous (infinite stream)
• Learn
• Forget
• Sublinear growth rate - Clustering


Page 21: VoIP Data IIIT Allahabad


MM

A first order Markov Chain is a finite or countably infinite sequence of events {E1, E2, …} over discrete time points, where Pij = P(Ej | Ei), and at any time the future behavior of the process is based solely on the current state.

A Markov Model (MM) is a graph with m vertices or states, S, and directed arcs, A, such that:
• S = {N1, N2, …, Nm}, and
• A = {Lij | i ∈ 1, 2, …, m, j ∈ 1, 2, …, m}, and each arc Lij = <Ni, Nj> is labeled with a transition probability Pij = P(Nj | Ni).
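One common way to obtain the arc labels Pij from data is to count observed transitions and normalize per source state. The Python sketch below shows that estimate on a made-up state sequence; the function name and the sequence are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def transition_probabilities(states):
    """Estimate Pij = P(Nj | Ni) as (count of i -> j) / (count of transitions out of i)."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        i: {j: Fraction(c, sum(row.values())) for j, c in row.items()}
        for i, row in counts.items()
    }


# Made-up sequence of visited states.
sequence = ["N1", "N1", "N2", "N3", "N3", "N1"]
P = transition_probabilities(sequence)
for i, row in P.items():
    print(i, {j: str(p) for j, p in row.items()})
```

Only the current state is used when counting, which is exactly the first order assumption: the next state depends on the present one and not on the earlier history.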


Page 22: VoIP Data IIIT Allahabad


Extensible Markov Model (EMM)

• Time Varying Discrete First Order Markov Model
• Nodes are clusters of real world states.
• Learning continues during application phase.
• Learning:
  • Transition probabilities between nodes
  • Node labels (centroid/medoid of cluster)
  • Nodes are added and removed as data arrives
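The incremental learning described above can be sketched in Python. This is a simplified illustration, not the published EMM algorithm: it assumes Euclidean distance with a fixed threshold for deciding whether an input joins an existing node, keeps each node's label fixed at its first member rather than recomputing a centroid or medoid, and never removes nodes. The class name and the threshold value are hypothetical.

```python
import math
from collections import Counter, defaultdict


class SimpleEMM:
    def __init__(self, threshold: float):
        self.threshold = threshold
        self.labels = []                          # one representative vector per node
        self.node_counts = Counter()              # how often each node was visited
        self.transitions = defaultdict(Counter)   # transitions[i][j] = count of i -> j
        self.current = None

    def _nearest(self, vector):
        best, best_dist = None, math.inf
        for node, label in enumerate(self.labels):
            d = math.dist(vector, label)
            if d < best_dist:
                best, best_dist = node, d
        return best, best_dist

    def update(self, vector):
        """Assign the input to the nearest node, or create a new node if none is close."""
        node, dist = self._nearest(vector)
        if node is None or dist > self.threshold:
            self.labels.append(tuple(vector))
            node = len(self.labels) - 1
        self.node_counts[node] += 1
        if self.current is not None:
            self.transitions[self.current][node] += 1
        self.current = node
        return node

    def probability(self, src, dst):
        total = sum(self.transitions[src].values())
        return self.transitions[src][dst] / total if total else 0.0


emm = SimpleEMM(threshold=3.0)                    # hypothetical threshold
vectors = [
    (18, 10, 3, 3, 1, 0, 0), (17, 10, 2, 3, 1, 0, 0), (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0), (14, 8, 2, 3, 0, 0, 0), (18, 10, 3, 3, 1, 1, 0),
]
states = [emm.update(v) for v in vectors]
print(states, emm.probability(0, 0))
```

The input vectors are the six shown on the EMM Creation slide. With this threshold they map to nodes 0, 0, 0, 1, 1, 0; the resulting graph depends on the assumed distance and threshold and is not claimed to match the slide's figure.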


Page 23: VoIP Data IIIT Allahabad


EMM Creation

Input vectors:

<18,10,3,3,1,0,0>
<17,10,2,3,1,0,0>
<16,9,2,3,1,0,0>
<14,8,2,3,1,0,0>
<14,8,2,3,0,0,0>
<18,10,3,3,1,1,0>

[Figure: successive EMM graphs, growing from a single node N1 to nodes N1, N2, and N3, with transition probabilities labeling the arcs.]

Page 24: VoIP Data IIIT Allahabad


EMMRare

• EMMRare algorithm indicates if the current input event is rare. Using a threshold occurrence percentage, the input event is determined to be rare if either of the following occurs:
  • The frequency of the node at time t+1 is below this threshold
  • The updated transition probability of the MC transition from node at time t to the node at t+1 is below the threshold
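The two conditions can be expressed as a short Python check. This is a sketch under assumptions rather than the published EMMRare algorithm: node frequency is taken to be the node's visit count divided by all visits, the counts are assumed to already include the current event, and the counts and threshold below are made up.

```python
def is_rare(node_counts, transition_counts, prev_node, node, threshold):
    """Return True if the node or the transition into it falls below the threshold."""
    total = sum(node_counts.values())
    node_freq = node_counts.get(node, 0) / total if total else 0.0

    outgoing = transition_counts.get(prev_node, {})
    out_total = sum(outgoing.values())
    trans_prob = outgoing.get(node, 0) / out_total if out_total else 0.0

    return node_freq < threshold or trans_prob < threshold


# Made-up counts for illustration.
node_counts = {"N1": 90, "N2": 9, "N3": 1}
transition_counts = {
    "N1": {"N1": 88, "N2": 2},
    "N2": {"N1": 4, "N2": 4, "N3": 1},
}
THRESHOLD = 0.05    # hypothetical occurrence threshold (5%)

common = is_rare(node_counts, transition_counts, "N1", "N1", THRESHOLD)
rare_node = is_rare(node_counts, transition_counts, "N2", "N3", THRESHOLD)
rare_transition = is_rare(node_counts, transition_counts, "N1", "N2", THRESHOLD)
print(common, rare_node, rare_transition)
```

The third case shows why both tests are needed: N2 is visited often enough on its own, yet reaching it directly from N1 is unusual.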


Page 25: VoIP Data IIIT Allahabad

Sublinear Growth Rate


Page 26: VoIP Data IIIT Allahabad

Rare Event in Cisco Data


Page 27: VoIP Data IIIT Allahabad

References

1. VoIP Mechanic, “What is VoIP?, a tutorial.” http://www.voipmechanic.com/what-is-voip.htm .
2. Yu Meng, Margaret Dunham, Marco Marchetti, and Jie Huang, “Rare Event Detection in a Spatiotemporal Environment,” Proceedings of the IEEE Conference on Granular Computing, May 2006, pp 629-634.
3. Cisco, “CDR Logging Configuration with Syslog Servers and Cisco IOS Gateways,” Document ID: 14068, February 24, 2006, http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml .
4. Cisco, “Voice Over IP – Per Call Bandwidth Consumption,” Document ID: 7934, February 2, 2008, http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml .
5. “VoIPThink”, http://www.en.voipforo.com , Accessed February 1, 2012.
6. Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374.
7. Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.)
8. Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.
9. Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.) (Extended version submitted to Journal of Computers.)
10. Yu Meng, Margaret Dunham, Marco Marchetti, and Jie Huang, “Rare Event Detection in a Spatiotemporal Environment,” Proceedings of the IEEE Conference on Granular Computing, May 2006, pp 629-634.
