SAS Assignment for Practice
Assignment for beginners in the world of SAS programming
ASSIGNMENT 1
1. Import the Sales file and map brands using the brand mapping given in the Mapping tab of the attached Excel file:
a. using the SAS Import Wizard
b. using PROC IMPORT
c. using custom code with the format below:

Variable        Data Type   Length   Format
Brand           Character   50
Item            Numeric     50
Store           Numeric     10
Month           Character   5
Monthly_QTY     Numeric     8        Comma Format
Monthly_SALES   Numeric     8        Dollar Format
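
As a starting point for parts b and c, the two code-based imports might look like the sketch below. The file path, the sheet names ("Sales", "Mapping"), the Brand_Desc column, and the CSV copy of the sheet are all assumptions, not details taken from the attached workbook. DBMS=XLSX also assumes SAS/ACCESS Interface to PC Files is licensed. SAS stores numeric variables in at most 8 bytes, so the lengths 50 and 10 listed for Item and Store are treated here as display widths.

```sas
/* Assumed path and sheet names: change them to match the attached workbook. */
%let xlsx = C:\temp\Sales.xlsx;

/* b. PROC IMPORT: one call per sheet */
proc import datafile="&xlsx" out=work.sales dbms=xlsx replace;
  sheet="Sales";
  getnames=yes;
run;

proc import datafile="&xlsx" out=work.brand_map dbms=xlsx replace;
  sheet="Mapping";
  getnames=yes;
run;

/* Map brands. Assumed columns: Brand in both tables, Brand_Desc in the mapping. */
proc sql;
  create table work.sales_mapped as
  select s.*, m.Brand_Desc
  from work.sales as s
    left join work.brand_map as m
      on s.Brand = m.Brand;
quit;

/* c. Custom code: assumes the Sales sheet was saved as a CSV file with a
      header row and columns in the order listed on the INPUT statement. */
data work.sales_custom;
  infile "C:\temp\Sales.csv" dsd firstobs=2 truncover;
  length Brand $50 Month $5 Item Store Monthly_QTY Monthly_SALES 8;
  format Item 50. Store 10. Monthly_QTY comma12. Monthly_SALES dollar12.2;
  input Brand Item Store Month Monthly_QTY Monthly_SALES;
run;
```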
Raw File:
2. Prepare the summary tables in the formats below:

a.
Brand Desc   Item   # of Stores   Average QTY   Sales Volume               CLM QTY   Average SALES   Sales Value                  CLM SALES
XXX          XXX    XXX           XXX           XXX (Sum of Monthly Qty)   XXX:XXX   XXX             XXX (Sum of Monthly Sales)   XXX:XXX

b.
Brand Desc   Month   Average QTY   Max:Min Qty   Average SALES   Max:Min Sales
XXX          XXX     XXX           XXX:XXX       XXX             XXX:XXX

c. Based on Monthly QTY, prepare a summary table for the top 10 items for each Brand, provided that all the top 10 items are selling in an equal number of stores.
3. How will you split a dataset into four equal parts when:
a. Observations are picked on the basis of descending salary? For example, people with the top 25% of salaries should be output to the first dataset, the top 26-50% go into the second dataset, and so on.
b. Records are picked randomly from the input dataset?

Example: Create a table with some sample data in the format below to perform the above exercise.

EmpNo   Name    Address   Mobile            Salary
ID1     XXXX1   Add 1     +91-7892123456    56,065.00
ID2     XXXX2   Add 2     +91-9728737466    34,013.00
ID3     XXXX3   Add 3     +91-8285405233    40,138.00
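
One possible way to approach both parts is to order the rows (by salary, or by a random key) and then assign each row to a quarter from its position. The sketch below assumes a cut-down table with only EmpNo and Salary. The first three salaries come from the sample above; the remaining five rows are made up so that the row count divides evenly by four.

```sas
/* Sample data: rows ID4-ID8 are invented for illustration. */
data emp;
  input EmpNo $ Salary;
  datalines;
ID1 56065
ID2 34013
ID3 40138
ID4 61200
ID5 28750
ID6 47300
ID7 39990
ID8 52410
;
run;

/* a. Split by descending salary */
proc sort data=emp out=emp_by_salary;
  by descending Salary;
run;

data top25 next25 third25 bottom25;
  set emp_by_salary nobs=n;
  part = ceil(_n_ * 4 / n);   /* 1 to 4; equal sizes when n is a multiple of 4 */
  if part = 1 then output top25;
  else if part = 2 then output next25;
  else if part = 3 then output third25;
  else output bottom25;
run;

/* b. Random split: attach a random key, sort by it, reuse the same rule */
data emp_shuffled;
  call streaminit(2013);      /* fixed seed so the split is repeatable */
  set emp;
  u = rand('uniform');
run;

proc sort data=emp_shuffled;
  by u;
run;

data rand1 rand2 rand3 rand4;
  set emp_shuffled nobs=n;
  part = ceil(_n_ * 4 / n);
  if part = 1 then output rand1;
  else if part = 2 then output rand2;
  else if part = 3 then output rand3;
  else output rand4;
  drop u;
run;
```

PROC RANK with GROUPS=4 is another option for part a, but it assigns tied salaries to the same group, so the four parts are not guaranteed to be equal in size.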
4. You have the data set shown in the example below as Data Set A. Prepare new data sets Data Set B and Data Set C, also shown below:

Data Set A: Raw Dataset
ID    Month   Unit      ID    Month   Unit        ID    Month   Unit
101   Jan     87.89     102   Jan     23.70513    103   Jan     98.96
101   Feb     20.95     102   Feb     27.56596    103   Feb     53.11
101   Mar     24.14     102   Mar     45.22387    103   Mar     17.17
101   Apr     29.13                               103   Apr     21.52
                                                  103   May     63.26
                                                  103   Jun     24.32
                                                  103   Jul     42.93

Data Set B: Missing months are filled with the last value of the available months.
ID    Month   Unit      ID    Month   Unit      ID    Month   Unit
101   Jan     87.89     102   Jan     23.70     103   Jan     98.96
101   Feb     20.95     102   Feb     27.56     103   Feb     53.11
101   Mar     24.14     102   Mar     45.22     103   Mar     17.17
101   Apr     29.13     102   Apr     45.22     103   Apr     21.52
101   May     29.13     102   May     45.22     103   May     63.26
101   Jun     29.13     102   Jun     45.22     103   Jun     24.32
101   Jul     29.13     102   Jul     45.22     103   Jul     42.93
101   Aug     29.13     102   Aug     45.22     103   Aug     42.93
101   Sep     29.13     102   Sep     45.22     103   Sep     42.93
101   Oct     29.13     102   Oct     45.22     103   Oct     42.93
101   Nov     29.13     102   Nov     45.22     103   Nov     42.93
101   Dec     29.13     102   Dec     45.22     103   Dec     42.93

Data Set C: Missing months are filled with the average of the available months.
ID    Month   Unit      ID    Month   Unit      ID    Month   Unit
101   Jan     87.89     102   Jan     23.70     103   Jan     98.96
101   Feb     20.95     102   Feb     27.56     103   Feb     53.11
101   Mar     24.14     102   Mar     45.22     103   Mar     17.17
101   Apr     29.13     102   Apr     32.16     103   Apr     21.52
101   May     40.53     102   May     32.16     103   May     63.26
101   Jun     40.53     102   Jun     32.16     103   Jun     24.32
101   Jul     40.53     102   Jul     32.16     103   Jul     42.93
101   Aug     40.53     102   Aug     32.16     103   Aug     45.90
101   Sep     40.53     102   Sep     32.16     103   Sep     45.90
101   Oct     40.53     102   Oct     32.16     103   Oct     45.90
101   Nov     40.53     102   Nov     32.16     103   Nov     45.90
101   Dec     40.53     102   Dec     32.16     103   Dec     45.90
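
A rough sketch of one way to build Data Set B and Data Set C from Data Set A follows. It creates a 12-month skeleton per ID, joins the raw values onto it, and then fills the gaps either with the last available value or with the ID's mean. The helper names (a_num, ids, skeleton, full, carry, avg_unit) are illustrative, and the year 2013 is used only to turn month names into month numbers.

```sas
data a;
  input ID Month $ Unit;
  datalines;
101 Jan 87.89
101 Feb 20.95
101 Mar 24.14
101 Apr 29.13
102 Jan 23.70513
102 Feb 27.56596
102 Mar 45.22387
103 Jan 98.96
103 Feb 53.11
103 Mar 17.17
103 Apr 21.52
103 May 63.26
103 Jun 24.32
103 Jul 42.93
;
run;

/* Month number for ordering; the year is arbitrary */
data a_num;
  set a;
  m = month(input(cats('01', Month, '2013'), date9.));
run;

/* Skeleton: every ID crossed with months 1 to 12 */
proc sort data=a_num(keep=ID) out=ids nodupkey;
  by ID;
run;

data skeleton;
  set ids;
  do m = 1 to 12;
    output;
  end;
run;

/* Attach the raw value (if any) and the per-ID mean of available months */
proc sql;
  create table full as
  select s.ID, s.m, a.Unit, v.avg_unit
  from skeleton as s
    left join a_num as a
      on s.ID = a.ID and s.m = a.m
    left join (select ID, mean(Unit) as avg_unit
                 from a_num group by ID) as v
      on s.ID = v.ID
  order by s.ID, s.m;
quit;

data b(keep=ID Month Unit) c(keep=ID Month Unit);
  set full(rename=(Unit=orig));
  by ID;
  length Month $3;
  retain carry;
  format Unit 8.2;
  Month = put(mdy(m, 1, 2013), monname3.);
  if first.ID then carry = .;
  if not missing(orig) then carry = orig;   /* last available value so far */
  Unit = coalesce(orig, carry);             /* Data Set B */
  output b;
  Unit = coalesce(orig, avg_unit);          /* Data Set C */
  output c;
run;
```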
5. The Dummy_2 data set is Item x Date level data, but the items do not all have the same start and end dates. The time series is also not continuous for the items, i.e. between Start Date and End Date, WKLY QTY and WKLY SALES are missing for some dates.
Prepare the data set as a continuous time series for each Item. The missing WKLY QTY and WKLY SALES will be replaced by:
a. Zeros
b. Average of the available time series for each item
c. Average of the top 5 WKLY QTY and WKLY SALES for each item
d. Save all 3 of the above data sets in different sheets of an Excel file using SAS.
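
As an illustration of parts a and d, the sketch below builds a weekly skeleton between each item's first and last date, fills the gaps with zeros, and writes the result to a named sheet. The sample rows, the column names Item, Date, WKLY_QTY and WKLY_SALES, the output path, and the sheet name are assumptions. It also assumes that every date for an item falls on the same weekday, and that SAS 9.4 with DBMS=XLSX is available.

```sas
/* Hypothetical sample standing in for Dummy_2 */
data dummy2;
  input Item Date :date9. WKLY_QTY WKLY_SALES;
  format Date date9.;
  datalines;
1 06JUL2013 10 100
1 20JUL2013 30 300
2 13JUL2013 5 50
2 27JUL2013 7 70
;
run;

/* First and last date per item */
proc sql;
  create table span as
  select Item, min(Date) as start_dt, max(Date) as end_dt
  from dummy2
  group by Item
  order by Item;
quit;

/* One row per item per week between its start and end dates */
data skeleton(keep=Item Date);
  set span;
  format Date date9.;
  do Date = start_dt to end_dt by 7;
    output;
  end;
run;

proc sort data=dummy2;
  by Item Date;
run;

/* a. Weeks missing from the source get zeros */
data fill_zero;
  merge skeleton dummy2;
  by Item Date;
  WKLY_QTY   = coalesce(WKLY_QTY, 0);
  WKLY_SALES = coalesce(WKLY_SALES, 0);
run;

/* d. One PROC EXPORT per data set, each with its own sheet name */
proc export data=fill_zero
    outfile="C:\temp\dummy2_filled.xlsx"
    dbms=xlsx replace;
  sheet="Zeros";
run;
```

Parts b and c can reuse the same skeleton and merge, replacing the zero in COALESCE with a per-item average computed beforehand.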
6. The attached Dummy_3 data set is Item x Store level Monthly SALES data.
a. Prepare a summary table to show the distribution of sales among the stores.
b. Prepare a data set with duplicate observations as per the following criteria:
i. If Total Sales of an Item is LESS than 15% of Total Sales for a particular Store, then the Item x Store combination will be repeated 3 times.
ii. If Total Sales of an Item is MORE than 15% but LESS than 50% of Total Sales for a particular Store, then the Item x Store combination will be repeated 5 times.
iii. If Total Sales of an Item is MORE than 50% of Total Sales for a particular Store, then the Item x Store combination will be repeated 10 times.
c. Prepare a summary table for the above data set in the format below:

Item   Store   Total SALES (Item)   Total SALES (Store)   %Sales Item by Store   # of Repetition
XXX    XXX     XXX                  XXX                   XXX                    XXX
XXX    XXX     XXX                  XXX                   XXX                    XXX
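
The repetition rule in part b could be sketched as follows. The sample rows and the column names Item, Store and Sales are made up. The sketch reads "Total Sales of an Item" as the item's sales within that store, which is an interpretation. The source does not say how shares of exactly 15% or 50% should be treated, so the boundaries chosen here are also an assumption.

```sas
/* Hypothetical sample standing in for Dummy_3 */
data dummy3;
  input Item $ Store $ Sales;
  datalines;
A S1 8
A S1 2
B S1 35
C S1 55
A S2 60
B S2 40
;
run;

/* Item sales within each store, store totals, and the item's share */
proc sql;
  create table item_store as
  select a.Item, a.Store, a.item_sales, b.store_sales,
         a.item_sales / b.store_sales as pct_sales format=percent8.1
  from (select Item, Store, sum(Sales) as item_sales
          from dummy3 group by Item, Store) as a
       inner join
       (select Store, sum(Sales) as store_sales
          from dummy3 group by Store) as b
       on a.Store = b.Store
  order by a.Store, a.Item;
quit;

/* Repeat each Item x Store row 3, 5 or 10 times depending on its share */
data repeated;
  set item_store;
  if pct_sales < 0.15 then n_rep = 3;
  else if pct_sales < 0.50 then n_rep = 5;   /* exactly 15% falls here (assumption) */
  else n_rep = 10;                           /* exactly 50% falls here (assumption) */
  do i = 1 to n_rep;
    output;
  end;
  drop i;
run;
```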
7. Create a cumulative series for each of the variables in the attached raw file:
a. using the RETAIN statement
b. without using the RETAIN statement
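
A minimal sketch of both variants, assuming a raw file with two numeric variables named x and y (hypothetical names and values). The second data step relies on the sum statement, which retains its accumulator implicitly and skips missing values.

```sas
/* Hypothetical raw data */
data raw;
  input x y;
  datalines;
1 10
2 .
3 30
;
run;

/* a. With RETAIN: the running totals are kept across iterations */
data cum_retain;
  set raw;
  retain cum_x cum_y 0;
  cum_x = sum(cum_x, x);   /* SUM function ignores missing values */
  cum_y = sum(cum_y, y);
run;

/* b. Without RETAIN: the sum statement (variable + expression) */
data cum_sum_stmt;
  set raw;
  cum_x + x;
  cum_y + y;
run;
```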
8. Use Dummy 4.xlsx to create the APLs below. (In the _AAPPPL suffix of an APL, AA stands for Ad-stock, PPP stands for Power and L stands for Lag value.)
TV_REG_GRP_100401
TV_REG_DOL_000000
DSP_NAT_IMP_200600
DSP_NAT_CLK_400402
9. The input data set has the following values:

Variable Name APL
WK_END_DT
Region
TV_REG_GRP__Variable_100401
Variable_TV_REG_DOL_000000
DSP_NAT_IMPRESSIONS_200600
DSP_NATIONAL_CICKS_400402

Write a data step so that the output data set has the following values:

Variable Name APL              Variable Name
WK_END_DT                      WK_END_DT
Region                         Region
TV_REG_GRP__Variable_100401    TV_REG_GRP__Variable
Variable_TV_REG_DOL_000000     Variable_TV_REG_DOL
DSP_NAT_IMPRESSIONS_200600     DSP_NAT_IMPRESSIONS
DSP_NATIONAL_CICKS_400402      DSP_NATIONAL_CICKS_400402
10. TV GRP data is provided at region, week level. Create national level TV data. Population data is provided for all regions.

TV GRP data
Region   Week          TV_REG_GRP
101      01-Jul-2013   10
101      08-Jul-2013   15
102      01-Jul-2013   30
102      08-Jul-2013   25
103      01-Jul-2013   8
103      08-Jul-2013   10

Population
Region   Population
101      10500
102      22300
103      12800
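
The question does not say how regional GRPs should be combined. A common choice, assumed here, is a population-weighted average per week: the sum of GRP times population across regions, divided by the total population. The data set names tv, pop and tv_nat and the output column TV_NAT_GRP are illustrative.

```sas
data tv;
  input Region Week :date11. TV_REG_GRP;
  format Week date11.;
  datalines;
101 01-Jul-2013 10
101 08-Jul-2013 15
102 01-Jul-2013 30
102 08-Jul-2013 25
103 01-Jul-2013 8
103 08-Jul-2013 10
;
run;

data pop;
  input Region Population;
  datalines;
101 10500
102 22300
103 12800
;
run;

/* National GRP per week as a population-weighted mean (assumed method) */
proc sql;
  create table tv_nat as
  select t.Week format=date11.,
         sum(t.TV_REG_GRP * p.Population) / sum(p.Population) as TV_NAT_GRP
  from tv as t
    inner join pop as p
      on t.Region = p.Region
  group by t.Week;
quit;
```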
11. Import the following daily sales data for two products (Shoes and Bags) in different stores. Sales data is provided at State code, Store number level (NY 12 means the 12th store of New York).
Sales data
Store_ID   Date          Sales_Shoes Sales_Bags
NY 10      01-Jul-2013   1020
NY 10      03-Jul-2013   10
NY 10      08-Jul-2013   58
NY 203     02-Jul-2013   89
NY 203     03-Jul-2013   10
NY 203     10-Jul-2013   1520
NJ 20      01-Jul-2013   37
NJ 20      03-Jul-2013   41
NJ 20      08-Jul-2013   58
NJ 123     02-Jul-2013   65
NJ 123     03-Jul-2013   47
NJ 123     10-Jul-2013   3
a. Create a dataset having a distinct list of all the stores (the resulting dataset will have only one variable).
b. Create a dataset having total sales of both products at state, week level (week ending Saturday), like Total Sales in NY during the week 30th June to 6th July. The new dataset will have 3 columns: State, Week_end_date, and Total_Sales.
c. Merge with the predefined list of stores below and report those store IDs for which no sales information is provided.

Store_ID
NY 10
NJ 10
NJ 123
NY 203
NJ 20
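
The three parts could be approached along the lines below. The sales rows are illustrative values only, not a transcription of the table above, and a period marks a missing sale. Store_ID is read from fixed columns because it contains a blank. INTNX with the 'week' interval and 'end' alignment returns the Saturday that closes a Sunday-to-Saturday week.

```sas
/* Illustrative rows; Store_ID occupies columns 1-6 */
data sales;
  input Store_ID $ 1-6 Date :date11. Sales_Shoes Sales_Bags;
  format Date date11.;
  datalines;
NY 10  01-Jul-2013 10 20
NY 10  08-Jul-2013 5 8
NY 203 02-Jul-2013 8 9
NJ 20  01-Jul-2013 3 7
NJ 123 02-Jul-2013 6 .
;
run;

/* a. Distinct list of stores (one variable) */
proc sort data=sales(keep=Store_ID) out=stores nodupkey;
  by Store_ID;
run;

/* b. Total sales of both products by state and week ending Saturday */
proc sql;
  create table state_week as
  select scan(Store_ID, 1, ' ') as State length=2,
         intnx('week', Date, 0, 'end') as Week_end_date format=date11.,
         sum(sum(Sales_Shoes, Sales_Bags)) as Total_Sales
  from sales
  group by calculated State, calculated Week_end_date;
quit;

/* c. Stores in the predefined list with no sales rows */
data store_list;
  input Store_ID $ 1-6;
  datalines;
NY 10
NJ 10
NJ 123
NY 203
NJ 20
;
run;

proc sort data=store_list;
  by Store_ID;
run;

data no_sales;
  merge store_list(in=in_list) stores(in=in_sales);
  by Store_ID;
  if in_list and not in_sales;
run;
```

With these sample rows, NJ 10 is the only listed store that has no sales information.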
12. A dataset contains both numeric and character variables. Write code that:
a. attaches the text "End" to all the character variables
b. adds 10 to all the numeric values
c. converts negative values to positive values, while positive, 0 or missing values remain the same
13. Provide the output when a dataset is sorted using the NODUPKEY option and using the NODUPRECS option. Explain the difference between the two outputs.
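
A small made-up example can make the difference concrete. NODUPKEY keeps the first observation for each BY value, while NODUPRECS drops an observation only when it is identical in every variable to the one written just before it.

```sas
/* Hypothetical input: id 1 has one exact duplicate and one differing row */
data have;
  input id val;
  datalines;
1 10
1 10
1 20
2 30
;
run;

/* Keeps one row per id: (1,10) and (2,30) */
proc sort data=have out=by_key nodupkey;
  by id;
run;

/* Removes only the repeated (1,10): leaves (1,10), (1,20) and (2,30) */
proc sort data=have out=by_rec noduprecs;
  by id;
run;
```

Because NODUPRECS compares adjacent observations only, identical records that the BY variables do not bring together can survive; sorting by all variables avoids that.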
SOLUTION Q12:
Variable1 = trim(Variable1) || 'End';
New_Number = abs(negative_value); OR New_Number = negative_value * -1;
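
To apply the same idea to every variable rather than one at a time, the _CHARACTER_ and _NUMERIC_ name lists can be loaded into arrays. The sketch below uses a made-up input data set and one data step per part, so the three transformations stay independent. The character variables are declared long enough to hold the extra three characters.

```sas
/* Hypothetical input with two character and two numeric variables */
data have;
  length name $10 city $10;
  input name $ city $ x y;
  datalines;
Amit Delhi -5 3
Rita Pune 0 .
;
run;

/* a. Append 'End' to every character variable */
data want_a;
  set have;
  array chars {*} _character_;
  do i = 1 to dim(chars);
    chars{i} = cats(chars{i}, 'End');   /* CATS strips blanks before joining */
  end;
  drop i;
run;

/* b. Add 10 to every non-missing numeric value */
data want_b;
  set have;
  array nums {*} _numeric_;             /* defined before i exists, so i is excluded */
  do i = 1 to dim(nums);
    if not missing(nums{i}) then nums{i} = nums{i} + 10;
  end;
  drop i;
run;

/* c. Negative to positive; positive, 0 and missing are unchanged by ABS */
data want_c;
  set have;
  array nums {*} _numeric_;
  do i = 1 to dim(nums);
    nums{i} = abs(nums{i});
  end;
  drop i;
run;
```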
--------------------- All the best---------------------
Dummy3.xlsx
APLs
WK_END_DT   Region    TV_REG_GRP   TV_REG_DOL   DSP_NAT_IMP   DSP_NAT_CLK
3/10/13     Newyork   60.63        53.45        55.00         59.45
3/17/13     Newyork   67.26        19.91        67.91         67.82
3/24/13     Newyork   66.62        63.03        44.75         13.37
3/31/13     Newyork   76.71        26.89        84.28         46.48
4/7/13      Newyork   32.35        88.30        72.52         53.41
4/14/13     Newyork   8.37         97.21        54.25         16.06
4/21/13     Newyork   17.72        79.27        5.91          31.94
4/28/13     Newyork   67.29        62.36        77.66         17.73
5/5/13      Newyork   57.55        18.70        51.72         30.49
5/12/13     Newyork   45.41        28.93        17.50         89.17
5/19/13     Newyork   35.68        31.28        94.21         79.18
5/26/13     Newyork   3.30         64.01        21.50         71.13
6/2/13      Newyork   70.10        53.16        38.36         76.65
6/9/13      Newyork   1.99         27.20        82.58         72.97
6/16/13     Newyork   5.97         33.62        63.44         25.38
6/23/13     Newyork   47.21        15.65        30.77         91.03
6/30/13     Newyork   34.41        59.63        57.62         33.16
7/7/13      Newyork   14.78        17.70        48.95         30.05
7/14/13     Newyork   27.31        6.54         25.98         28.94
7/21/13     Newyork   44.39        12.63        22.58         74.30
7/28/13     Newyork   68.45        91.15        19.28         7.82
8/4/13      Newyork   62.05        52.76        3.84          61.75

Sales.xlsx
Dummy2.xlsx