AO Assignment: Sanmeet Dhokay


Uploaded by sanmeet-dhokay on 27-Jan-2017.


1. Makeup.csv: data on cosmetics purchases by 9 different women

Code used –

## set the working directory first (or set it from the console's dropdown menu)
setwd("E:/PGPMX/AO")

## the files provided are CSV files, so they can be read with read.csv():
makeup_data = read.csv("Makeup.csv", header=TRUE)

## print the column names:
colnames(makeup_data)

## make the columns accessible by name in the R console:
attach(makeup_data)

## extract the complete tables for two levels of the categorical variable 'Product':
makeup_mascara = makeup_data[Product=="mascara",]
makeup_foundation = makeup_data[Product=="foundation",]
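The extraction step works through a logical index: Product=="mascara" is TRUE for every matching row, and [rows, ] keeps those rows with all their columns. A minimal sketch on a small made-up data frame (the names toy, toy_mascara, by_product and all values are hypothetical, not taken from Makeup.csv); split() is shown as an alternative that returns one table per product level in a single call.

```r
# Hypothetical toy data standing in for Makeup.csv (values are made up)
toy <- data.frame(
  Product = c("mascara", "foundation", "lipstick", "mascara", "foundation"),
  Units   = c(3, 5, 1, 4, 6),
  stringsAsFactors = FALSE
)

# Logical index: TRUE where the row's Product is "mascara"
is_mascara <- toy$Product == "mascara"

# Keep the matching rows and every column (the empty slot after the comma means all columns)
toy_mascara    <- toy[is_mascara, ]
toy_foundation <- toy[toy$Product == "foundation", ]

# split() returns a list with one data frame per level of Product
by_product <- split(toy, toy$Product)

nrow(toy_mascara)   # 2 rows
names(by_product)   # "foundation" "lipstick" "mascara"
```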

# a simple plot of the Dollars variable across the entire dataset:
plot(Dollars)

# sorted plot of the Dollars variable across the entire dataset:
plot(sort(Dollars))

# the same plot with points joined by lines, axis labels and limits:
plot(sort(Dollars), type='b', xlab='Order (sorted by amount)', ylab='Dollars', ylim=c(0,500))

# six-number summary of Dollars (minimum, 1st quartile, median, mean, 3rd quartile, maximum):
summary(Dollars, digits=2)
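summary() reports the minimum, first quartile, median, mean, third quartile and maximum. A minimal sketch with made-up dollar values (not from Makeup.csv) showing that the same six numbers come from quantile(), using its default method (type 7, which summary() also uses), plus mean().

```r
# Hypothetical order values in dollars (made up for illustration)
x <- c(40, 75, 90, 120, 130, 160, 210, 475)

# summary() gives min, 1st quartile, median, mean, 3rd quartile and max
s <- summary(x)

# The same six numbers built by hand; quantile() uses type 7 by default
q   <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
six <- c(q[1], q[2], q[3], mean(x), q[4], q[5])

names(s)      # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
unname(six)   # 40, 86.25, 125, 162.5, 172.5, 475
```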

## boxplot of Dollars across the entire dataset; range=0 extends the whiskers to the minimum and maximum values:
boxplot(Dollars, horizontal=TRUE, range=0, xlab='Dollars')

## boxplot of Dollars with the default whiskers (range=1.5, i.e. limited to 1.5 times the interquartile range):
boxplot(Dollars, horizontal=TRUE, xlab='Dollars Spent')

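# with range=0 the whiskers reach the minimum and maximum values, so no points are drawn as outliers
# without range=0 (default range=1.5) the whiskers stop at the most extreme values lying within Q1 - 1.5*IQR and Q3 + 1.5*IQR; values beyond these limits are drawn as individual outlier points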
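boxplot() takes its whiskers from boxplot.stats(), with range passed on as coef. A minimal sketch with made-up numbers containing one extreme value shows the two rules side by side. Note that R's box edges are Tukey's hinges from fivenum(), which are close to, but not always identical to, the quartiles printed by summary().

```r
# Hypothetical data with one extreme value (made up for illustration)
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 500)

# boxplot(x, range = r) draws its box and whiskers from boxplot.stats(x, coef = r)
default_box <- boxplot.stats(x, coef = 1.5)  # as in boxplot(x)
full_range  <- boxplot.stats(x, coef = 0)    # as in boxplot(x, range = 0)

# stats = lower whisker, lower hinge, median, upper hinge, upper whisker
default_box$stats  # 10 30 55 80 90: 500 lies beyond 80 + 1.5 * (80 - 30), so it is an outlier
default_box$out    # 500
full_range$stats   # 10 30 55 80 500: whiskers reach the minimum and maximum
```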

## histograms; a single number such as breaks=4 is only a suggested number of bins, which R adjusts to "pretty" break points:
hist(Dollars, xlab='Price paid', main='Make Up Data')
hist(Dollars, breaks=4, xlab='Price paid', main='Make Up Data')
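In hist(), a single number for breaks is treated as a suggestion: R passes it to pretty() to pick round break points, so the resulting number of bins can differ from 4. A minimal sketch with made-up values; passing an explicit vector of break points is one way to force exactly four equal-width bins.

```r
# Hypothetical values (made up for illustration)
x <- 1:100

# A single number is only a hint: hist() hands it to pretty()
h_hint <- hist(x, breaks = 4, plot = FALSE)
h_hint$breaks  # "pretty" break points; the number of bins need not be 4

# To force exactly 4 equal-width bins, pass the break points explicitly
edges   <- seq(min(x), max(x), length.out = 5)
h_exact <- hist(x, breaks = edges, plot = FALSE)
length(h_exact$counts)  # 4 bins
```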

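## steps to draw bar charts: counts and relative frequencies per customer
User_counts = table(Name)
User_relfreq = User_counts/sum(User_counts)

## counts and relative frequencies per product
Product_counts = table(Product)
Product_relfreq = Product_counts/sum(Product_counts)

## bar plot of transactions per customer and pie chart of transactions per product (one colour per category)
barplot(User_counts, col=rainbow(length(User_counts)), main='User Bar Plot', ylim=c(0,300))
pie(Product_counts, col=rainbow(length(Product_counts)), main='Product Pie Chart')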
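table() counts how often each value occurs, and dividing by the total turns the counts into relative frequencies. A minimal sketch on made-up names (hypothetical, not the customers in Makeup.csv); it also sizes the colour vector with length(), so each bar gets its own colour whatever the number of categories.

```r
# Hypothetical customer names (made up; not the real Makeup.csv customers)
name <- c("Ann", "Bea", "Cici", "Cici", "Ann", "Dee", "Cici", "Bea")

counts  <- table(name)            # transactions per customer
relfreq <- counts / sum(counts)   # relative frequencies; they sum to 1

# prop.table() gives the same proportions in one call
stopifnot(isTRUE(all.equal(relfreq, prop.table(counts))))

# One colour per bar, however many categories there are
cols <- rainbow(length(counts))
barplot(relfreq, col = cols, main = "Share of transactions (toy data)")
```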

## comparing two products: Units (column 5) ordered per transaction, mascara vs. foundation
boxplot(makeup_mascara[,5], makeup_foundation[,5], range=0, border=rainbow(2), names=c('mascara','foundation'), main="Mascara Vs. Foundation Quantity Ordered: Boxplot")

> colnames(makeup_data)
[1] "Trans.Number" "Name" "Date" "Product" "Units" "Dollars" "Location"

The data is read from the Makeup.csv file and its column names are printed.

The bar graph below compares the different product types with respect to the number of orders.


The graph below shows the linear relationship between the dollars spent and the quantity ordered.

The box plot below shows that the median amount spent per order was around $125.


The User bar plot below compares the users by the number of transactions each one performed. The transaction counts are almost equal across users, with Cici marginally ahead as the top transaction maker.

The pie chart below shows that lipstick has the smallest share of the transactions.


The box plot below shows that the quantity ordered per transaction is slightly higher for 'Foundation' than for 'Mascara'.

2. Traveldata.csv: amount spent on travel, along with gender and age data

Code Used


## set the working directory first (or set it from the console's dropdown menu)
setwd("E:/PGPMX/AO")

## to read a whitespace-delimited text file instead, read.table() can be used, e.g.:
## ipl_data = read.table("IPL2014_DATA.txt", header=TRUE)

## the files provided are CSV files, so they can be read with read.csv():
travel_data = read.csv("TravelData.csv", header=TRUE)

## print the column names:
colnames(travel_data)

## rename the columns to our desired names:
colnames(travel_data) = c("Amount","Age","Gender")
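read.csv() with header=TRUE takes the column names from the first line, and assigning to colnames() replaces them all at once, one name per column. A minimal sketch using the text= argument of read.csv() so that it runs without a file; the CSV content and the names csv_text, travel_toy and raw_names are hypothetical, not the real TravelData.csv. read.csv() also turns spaces in header names into dots, which is one common reason to rename columns.

```r
# Hypothetical CSV text standing in for TravelData.csv (header and values are made up)
csv_text <- "Amount spent,Age of traveller,Sex
850,25,M
640,31,F
1210,27,M"

# header = TRUE: the first line supplies the column names
travel_toy <- read.csv(text = csv_text, header = TRUE)
raw_names  <- colnames(travel_toy)  # spaces become dots: "Amount.spent" "Age.of.traveller" "Sex"

# Replace all the names at once; the vector needs one entry per column
colnames(travel_toy) <- c("Amount", "Age", "Gender")
```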

## make the columns accessible by name in the R console:
attach(travel_data)

## extract the complete tables for two levels of the categorical variable 'Gender':
travel_male = travel_data[Gender=="M",]
travel_female = travel_data[Gender=="F",]

# a simple plot of the Amount variable across the entire dataset:
plot(Amount)

# sorted plot of the Amount variable across the entire dataset:
plot(sort(Amount))

# the same plot with points joined by lines, axis labels and limits:
plot(sort(Amount), type='b', xlab='Traveller (sorted by amount)', ylab='Amount', ylim=c(300,1500))

# six-number summary of Amount (minimum, 1st quartile, median, mean, 3rd quartile, maximum):
summary(Amount, digits=2)

## boxplot of Amount across the entire dataset; range=0 extends the whiskers to the minimum and maximum values:
boxplot(Amount, horizontal=TRUE, range=0, xlab='Amounts')

## boxplot of Amount with the default whiskers (range=1.5):
boxplot(Amount, horizontal=TRUE, xlab='Amounts')

# with range=0 the whiskers reach the minimum and maximum values, so no points are drawn as outliers
# without range=0 (default range=1.5) the whiskers stop at the most extreme values within Q1 - 1.5*IQR and Q3 + 1.5*IQR; values beyond are drawn as outlier points

## histograms; breaks=4 is only a suggested number of bins, which R adjusts to "pretty" break points:
hist(Amount, xlab='Amount spent', main='Travel data')
hist(Amount, breaks=4, xlab='Amount spent', main='Travel data')

## steps to draw bar charts: counts and relative frequencies per gender
User_counts = table(Gender)
User_relfreq = User_counts/sum(User_counts)

## bar plot and pie chart of the number of travellers per gender
barplot(User_counts, col=rainbow(length(User_counts)), main='User Bar Plot', ylim=c(0,600))
pie(User_counts, col=rainbow(length(User_counts)), main='User Pie Chart')

## comparing males and females: column 2 of each table holds Age
boxplot(travel_male[,2], travel_female[,2], range=0, border=rainbow(2), names=c('Male','Female'), main="Male Vs. Female Age Comparison: Boxplot")

## comparing the Amount boxplots of both genders together:
boxplot(Amount~Gender, range=0, border=rainbow(6), main='Gender-wise: boxplot of Amounts')
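The formula form boxplot(Amount~Gender) splits Amount by each level of Gender and draws one box per group. A minimal sketch on made-up data (the data frame toy and its values are hypothetical, not TravelData.csv): with plot=FALSE, boxplot() returns its statistics instead of drawing, and with range=0 each column of those statistics equals fivenum() of one group.

```r
# Hypothetical travel records (made up; not the real TravelData.csv)
toy <- data.frame(
  Amount = c(500, 900, 1200, 700, 1300, 950, 650, 800),
  Gender = c("M", "F", "M", "F", "M", "F", "M", "F")
)

# plot = FALSE returns the statistics instead of drawing the boxes
b <- boxplot(Amount ~ Gender, data = toy, range = 0, plot = FALSE)

# With range = 0, each column of b$stats is the five-number summary of one group
by_hand <- sapply(split(toy$Amount, toy$Gender), fivenum)

b$names  # "F" "M"
```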

## comparing the Amount boxplots across ages (one box per distinct age):
boxplot(Amount~Age, range=0, border=rainbow(6), main='Age-wise: Amount box plot')


The scatter plot below shows the amounts spent by all travellers, males and females together, in the order they appear in the data.

The sorted plot below shows the amount spent increasing gradually from the smallest to the largest value.

The box plot below shows a median of 920, i.e. the median amount spent across all travellers, males and females together, was 920. The minimum amount is 380 and the maximum is 1400. The middle 50% of the amounts lie between 710 (first quartile) and 1100 (third quartile).


> summary(Amount, digits=2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    380     710     920     910    1100    1400
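The printed values look rounded to two significant figures (for example 910, 1100 and 1400), which is what the digits=2 argument asks for, so quantities read from them are approximate. A minimal sketch with made-up unrounded values (the vector exact is hypothetical) showing the same kind of rounding with signif(), and the interquartile range, the spread of the middle 50%, worked out from the printed quartiles.

```r
# Hypothetical unrounded values (made up for illustration)
exact <- c(min = 383, q1 = 712.5, median = 918, q3 = 1137)

# Rounding to two significant figures, as in the printed summary
rounded <- signif(exact, 2)
rounded  # min 380, q1 710, median 920, q3 1100

# Interquartile range (spread of the middle 50%) from the printed quartiles
iqr_printed <- 1100 - 710  # 390; approximate, since both quartiles are rounded
```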

The histogram below gives the frequency of different ranges of the amount spent. Amounts between 900 and 1000 have the highest frequency.


The pie chart below shows that the numbers of males and females are almost the same.

The same data is also displayed as a bar graph.

The box plot below compares the ages of males and females. The median age of females is slightly higher than that of males, which suggests that the female travellers were, on the whole, slightly older.


The box plot below compares the amounts spent by males and females. The medians are almost the same. The third quartile for males is slightly higher than for females, which suggests that large spenders were slightly more common among males.

The box plots for the different ages below show that the younger travellers (around ages 25 and 27) were the highest spenders.


3. CCBMDO.csv: placement data of one batch of a course at IIM Indore (an MBA program for military officers).

Code Used

## set the working directory first (or set it from the console's dropdown menu)
setwd("E:/PGPMX/AO")

## to read a whitespace-delimited text file instead, read.table() can be used, e.g.:
## ipl_data = read.table("IPL2014_DATA.txt", header=TRUE)

## the files provided are CSV files, so they can be read with read.csv():
army_data = read.csv("CCMBDO.csv", header=TRUE)

## print the column names:
colnames(army_data)

## rename the columns to our desired names:
colnames(army_data) = c("SNO","Rank","Wing","Name","Gender","Age","Salary")

## make the columns accessible by name in the R console:
attach(army_data)

## extract the complete tables for two levels of the categorical variable 'Rank':
army_Capt = army_data[Rank=="Capt",]
army_Maj = army_data[Rank=="Maj",]

# a simple plot of the Salary variable across the entire dataset:
plot(Salary)

# sorted plot of the Salary variable across the entire dataset:
plot(sort(Salary))

# the same plot with points joined by lines, axis labels and limits:
plot(sort(Salary), type='b', xlab='Officer (sorted by salary)', ylab='Salary', ylim=c(0,25))

# six-number summary of Salary (minimum, 1st quartile, median, mean, 3rd quartile, maximum):
summary(Salary, digits=2)

## boxplot of Salary across the entire dataset; range=0 extends the whiskers to the minimum and maximum values:
boxplot(Salary, horizontal=TRUE, range=0, xlab='Salary')

## boxplot of Salary with the default whiskers (range=1.5):
boxplot(Salary, horizontal=TRUE, xlab='Salary')

# with range=0 the whiskers reach the minimum and maximum values, so no points are drawn as outliers
# without range=0 (default range=1.5) the whiskers stop at the most extreme values within Q1 - 1.5*IQR and Q3 + 1.5*IQR; values beyond are drawn as outlier points

## histograms; breaks=4 is only a suggested number of bins, which R adjusts to "pretty" break points:
hist(Salary, xlab='Salary', main='Army Data')
hist(Salary, breaks=4, xlab='Salary', main='Army Data')

## steps to draw bar charts: counts and relative frequencies per rank
Rank_counts = table(Rank)
Rank_relfreq = Rank_counts/sum(Rank_counts)

## bar plot and pie chart of the number of officers per rank
barplot(Rank_counts, col=rainbow(length(Rank_counts)), main='Rank Bar Plot', ylim=c(0,25))
pie(Rank_counts, col=rainbow(length(Rank_counts)), main='Rank Pie Chart')

## comparing two ranks: column 6 of each table holds Age
boxplot(army_Capt[,6], army_Maj[,6], range=0, border=rainbow(2), names=c('Capt','Maj'), main="Capt Vs. Maj Age Comparison: Boxplot")


## comparing the Age boxplots of all ranks together:
boxplot(Age~Rank, range=0, border=rainbow(6), main='Rank-wise: Age box plot')

## comparing the Salary boxplots of all ranks together:
boxplot(Salary~Rank, range=0, border=rainbow(6), main='Rank-wise: Salary box plot')

## comparing the Salary boxplots of both genders:
boxplot(Salary~Gender, range=0, border=rainbow(6), main='Gender-wise: Salary box plot')
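The thick line in each box of boxplot(Salary~Rank) is that rank's median, so ordering the ranks by their boxes amounts to a grouped median. A minimal sketch with tapply() on made-up officer records (the data frame toy and its values are hypothetical, not the CCMBDO data).

```r
# Hypothetical officer records (made up; not the CCMBDO data)
toy <- data.frame(
  Rank   = c("Capt", "Capt", "Maj", "Maj", "Maj", "Lt Col", "Lt Col", "Col"),
  Salary = c(10, 14, 12, 15, 18, 16, 20, 11)
)

# Median salary per rank (the thick line in each box of boxplot(Salary ~ Rank))
med <- tapply(toy$Salary, toy$Rank, median)

# Ranks ordered from highest to lowest median salary
ranking <- names(med)[order(med, decreasing = TRUE)]
ranking  # "Lt Col" "Maj" "Capt" "Col"
```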

The scatter plot of the salary data below shows the salaries spread fairly evenly between 10 and 20.

The sorted plot below shows the salary increasing gradually, in steps.


The plot below, with the sorted salaries joined by lines, also shows an almost linear increase from the lowest to the highest salary.

The bar graph below compares the number of officers in each rank. The Maj rank tops the list, followed by the Capt rank.


The histogram below shows the frequency of salaries in different ranges. The range 10-12 is the most common salary range.

The box plot below shows that the median age of Capts is higher than the median age of Majs. The age range is much wider among Majs than among Capts, and the lowest age is found in the Maj rank.


Below is the box plot comparison of the different ranks with respect to age. The Col rank has the highest median age.

The box plot of salary for the two genders shows that, overall, males earn considerably more than females. The shapes of the boxes also differ: female salaries are more spread out above the female median, whereas male salaries are more spread out below the male median.
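Because the median splits each group in half, half of each group's salaries always lie above that group's own median; what the box shape does show is how spread out each half is. A minimal sketch on made-up salaries (the vectors female and male and the helper box_halves() are hypothetical, not the CCMBDO data) that measures the lower half of each box (median minus lower hinge) and the upper half (upper hinge minus median) with fivenum().

```r
# Hypothetical salaries for two groups (made up for illustration)
female <- c(8, 9, 10, 11, 15, 18, 20)
male   <- c(12, 13, 16, 19, 20, 20, 21)

# Hypothetical helper: lengths of the lower and upper halves of a box
box_halves <- function(x) {
  f <- fivenum(x)  # min, lower hinge, median, upper hinge, max
  c(lower = f[3] - f[2], upper = f[4] - f[3])
}

box_halves(female)  # lower 1.5, upper 5.5: values above the median are more spread out
box_halves(male)    # lower 4.5, upper 1.0: values below the median are more spread out
```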


The box plot of salaries for the different ranks below shows that Lt Col has the highest median salary, followed by Maj, then Capt, and lastly Col. The maximum salary is the same, 20, for three ranks: Maj, Capt and Lt Col.
