SAS assignment for practice

Upload: amit-anand

Post on 09-Mar-2016

DESCRIPTION

Assignment for beginners in the world of SAS programming

TRANSCRIPT

ASSIGNMENT 1

1. Import the Sales file and map brands using the brand mapping given in the Mapping tab of the attached Excel file:
a. using the SAS Import Wizard
b. using PROC IMPORT
c. using custom code with the format below:

Variable         Data Type    Length    Format
Brand            Character    50
Item             Numeric      50
Store            Numeric      10
Month            Character    5
Monthly_QTY      Numeric      8         Comma Format
Monthly_SALES    Numeric      8         Dollar Format

Raw File:

2. Prepare summary tables in the formats given below:

a.
Brand Desc    Item    # of Stores    Average QTY    Sales Volume                CLM QTY    Average SALES    Sales Value                   CLM SALES
XXX           XXX     XXX            XXX            XXX (Sum of Monthly Qty)    XXX:XXX    XXX              XXX (Sum of Monthly Sales)    XXX:XXX

b.
Brand Desc    Month    Average QTY    Max:Min Qty    Average SALES    Max:Min Sales
XXX           XXX      XXX            XXX:XXX        XXX              XXX:XXX

c. Based on Monthly QTY, prepare a summary table of the top 10 items for each Brand, provided that all the top 10 items are selling in an equal number of stores.

3. How will you split a dataset into four equal parts such that:
a. Observations are picked on the basis of descending salary? For example, people in the top 25% by salary should be output to the first dataset, the next 25% (26-50%) to the second dataset, and so on.
b. Records are picked at random from the input dataset?

Example: Create a table with some sample data in the format below to perform the above exercise.

EmpNo    Name     Address    Mobile            Salary
ID1      XXXX1    Add 1      +91-7892123456    56,065.00
ID2      XXXX2    Add 2      +91-9728737466    34,013.00
ID3      XXXX3    Add 3      +91-8285405233    40,138.00
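
One possible approach to question 3, sketched under assumptions: the dataset name emp, the variable names, and five of the eight salaries are made up for illustration (only the first three salaries come from the sample table). PROC RANK with GROUPS=4 assigns each row a quartile number, which a DATA step can then route to four output datasets. Tied salaries can make the groups slightly unequal.

```sas
/* Sample data: the first three salaries come from the text, the rest are made up */
data emp;
    input empno $ salary;
    datalines;
ID1 56065
ID2 34013
ID3 40138
ID4 61200
ID5 28500
ID6 47300
ID7 52750
ID8 39900
;
run;

/* a. Quartile by descending salary: grp = 0 is the top 25%, grp = 3 the bottom 25% */
proc rank data=emp out=ranked groups=4 descending;
    var salary;
    ranks grp;
run;

data top25 next25 third25 last25;
    set ranked;
    select (grp);
        when (0) output top25;
        when (1) output next25;
        when (2) output third25;
        otherwise output last25;
    end;
run;

/* b. Random split: rank a uniform random number instead of the salary */
data emp_rand;
    set emp;
    if _n_ = 1 then call streaminit(2016);   /* fixed seed, so the split is reproducible */
    u = rand('uniform');
run;

proc rank data=emp_rand out=ranked_rand groups=4;
    var u;
    ranks grp;
run;
```

The same SELECT block can be applied to ranked_rand to write the four random parts.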

4. You are given Data Set A, shown below. Prepare the new data sets Data Set B and Data Set C, also shown below:

Data Set A: Raw Dataset

ID     Month    Unit     ID     Month    Unit        ID     Month    Unit
101    Jan      87.89    102    Jan      23.70513    103    Jan      98.96
101    Feb      20.95    102    Feb      27.56596    103    Feb      53.11
101    Mar      24.14    102    Mar      45.22387    103    Mar      17.17
101    Apr      29.13                                103    Apr      21.52
                                                     103    May      63.26
                                                     103    Jun      24.32
                                                     103    Jul      42.93

Data Set B: Missing months are filled with the last value of the available months.

ID     Month    Unit     ID     Month    Unit     ID     Month    Unit
101    Jan      87.89    102    Jan      23.70    103    Jan      98.96
101    Feb      20.95    102    Feb      27.56    103    Feb      53.11
101    Mar      24.14    102    Mar      45.22    103    Mar      17.17
101    Apr      29.13    102    Apr      45.22    103    Apr      21.52
101    May      29.13    102    May      45.22    103    May      63.26
101    Jun      29.13    102    Jun      45.22    103    Jun      24.32
101    Jul      29.13    102    Jul      45.22    103    Jul      42.93
101    Aug      29.13    102    Aug      45.22    103    Aug      42.93
101    Sep      29.13    102    Sep      45.22    103    Sep      42.93
101    Oct      29.13    102    Oct      45.22    103    Oct      42.93
101    Nov      29.13    102    Nov      45.22    103    Nov      42.93
101    Dec      29.13    102    Dec      45.22    103    Dec      42.93

Data Set C: Missing months are filled with the average of the available months.

ID     Month    Unit     ID     Month    Unit     ID     Month    Unit
101    Jan      87.89    102    Jan      23.70    103    Jan      98.96
101    Feb      20.95    102    Feb      27.56    103    Feb      53.11
101    Mar      24.14    102    Mar      45.22    103    Mar      17.17
101    Apr      29.13    102    Apr      32.16    103    Apr      21.52
101    May      40.53    102    May      32.16    103    May      63.26
101    Jun      40.53    102    Jun      32.16    103    Jun      24.32
101    Jul      40.53    102    Jul      32.16    103    Jul      42.93
101    Aug      40.53    102    Aug      32.16    103    Aug      45.90
101    Sep      40.53    102    Sep      32.16    103    Sep      45.90
101    Oct      40.53    102    Oct      32.16    103    Oct      45.90
101    Nov      40.53    102    Nov      32.16    103    Nov      45.90
101    Dec      40.53    102    Dec      32.16    103    Dec      45.90
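
A minimal sketch of question 4, assuming Data Set A is stored in long form (one row per ID and month) and that the missing months are always the trailing ones, as in the example. The dataset names a, a_idx, a_avg, b and c and the helper variables mon_no and avg_unit are illustrative, not part of the assignment. Rounding may differ in the last digit from the values printed above.

```sas
/* Data Set A in long form, values as given in the text */
data a;
    input id month $ unit;
    datalines;
101 Jan 87.89
101 Feb 20.95
101 Mar 24.14
101 Apr 29.13
102 Jan 23.70513
102 Feb 27.56596
102 Mar 45.22387
103 Jan 98.96
103 Feb 53.11
103 Mar 17.17
103 Apr 21.52
103 May 63.26
103 Jun 24.32
103 Jul 42.93
;
run;

/* Numeric month index (1-12) so rows can be ordered by calendar month */
data a_idx;
    set a;
    mon_no = month(input(cats('01', month, '2000'), date9.));
run;

proc sort data=a_idx;
    by id mon_no;
run;

/* Data Set B: after the last available month, repeat the last value up to Dec */
data b (keep=id month unit);
    set a_idx;
    by id;
    output;
    if last.id then do mon_no = mon_no + 1 to 12;
        month = put(mdy(mon_no, 1, 2000), monname3.);
        output;                      /* unit still holds the last value read */
    end;
run;

/* Data Set C: same idea, but fill with the per-ID average of the available months */
proc sql;
    create table a_avg as
    select id, mon_no, month, unit, mean(unit) as avg_unit
    from a_idx
    group by id
    order by id, mon_no;
quit;

data c (keep=id month unit);
    set a_avg;
    by id;
    output;
    if last.id then do mon_no = mon_no + 1 to 12;
        month = put(mdy(mon_no, 1, 2000), monname3.);
        unit = round(avg_unit, 0.01);
        output;
    end;
run;
```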

5. The Dummy_2 data set is Item x Date level data, but the items do not all have the same start and end dates. The time series is also not continuous for the items, i.e. for some dates between the Start Date and the End Date, WKLY QTY and WKLY SALES are missing.

Prepare the data set as a continuous time series for each Item. The missing WKLY QTY and WKLY SALES will be replaced by:
a. Zeros
b. Average of the available time series for each item
c. Average of the top 5 WKLY QTY and WKLY SALES for each item
d. Save the above 3 data sets in different sheets of one Excel file using SAS.

6. The attached Dummy_3 data set is Item x Store level monthly SALES data.
a. Prepare a summary table showing the distribution of sales among the stores.
b. Prepare a data set with duplicate observations as per the following criteria:
i. If the Total Sales of an Item is LESS than 15% of the Total Sales for a particular Store, then the Item x Store combination will be repeated 3 times.
ii. If the Total Sales of an Item is MORE than 15% but LESS than 50% of the Total Sales for a particular Store, then the Item x Store combination will be repeated 5 times.
iii. If the Total Sales of an Item is MORE than 50% of the Total Sales for a particular Store, then the Item x Store combination will be repeated 10 times.
c. Prepare a summary table for the above data set in the format given below:

Item    Store    Total SALES (Item)    Total SALES (Store)    %Sales Item by Store    # of Repetition
XXX     XXX      XXX                   XXX                    XXX                     XXX
XXX     XXX      XXX                   XXX                    XXX                     XXX
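
The repetition rule in question 6b can be sketched with an explicit OUTPUT inside a DO loop. The layout of Dummy_3 is not shown in the text, so the dataset dummy3, its variables item, store and sales, and the sample rows are all assumptions. The text does not say what happens at exactly 15% or 50%; here those boundary cases fall into the higher band.

```sas
/* Hypothetical rows standing in for Dummy_3 */
data dummy3;
    input item $ store $ sales;
    datalines;
A S1 10
B S1 30
C S1 60
A S2 50
B S2 50
;
run;

proc sql;
    /* Total sales per Item x Store combination */
    create table item_store as
    select item, store, sum(sales) as item_sales
    from dummy3
    group by item, store;

    /* Store total and the item's share of it (the store total is remerged onto each row) */
    create table shares as
    select item, store, item_sales,
           sum(item_sales) as store_sales,
           item_sales / sum(item_sales) as pct_sales
    from item_store
    group by store;
quit;

/* Repeat each Item x Store row 3, 5 or 10 times depending on its share */
data repeated;
    set shares;
    if pct_sales < 0.15 then n_rep = 3;
    else if pct_sales < 0.50 then n_rep = 5;
    else n_rep = 10;
    do rep = 1 to n_rep;
        output;
    end;
    drop rep;
run;
```

The shares table already carries the columns needed for the summary in part c, with n_rep as the number of repetitions.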

7. Create a cumulative series for each of the variables in the attached raw file:
a. using the RETAIN statement
b. without using the RETAIN statement
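
A minimal sketch of both variants in question 7, using a made-up single-variable dataset raw, since the attached raw file is not reproduced here. The second variant relies on the sum statement, which keeps its accumulator across observations without an explicit RETAIN.

```sas
/* Hypothetical input; the real raw file has its own variables */
data raw;
    input x;
    datalines;
10
20
30
;
run;

/* a. With RETAIN: cum_x keeps its value from one observation to the next */
data cum_retain;
    set raw;
    retain cum_x 0;
    cum_x = sum(cum_x, x);    /* SUM() skips missing values of x */
run;

/* b. Without RETAIN: the sum statement (variable + expression)
      starts at 0, is retained automatically, and ignores missing values */
data cum_sum;
    set raw;
    cum_x + x;
run;
```

For several variables, the same pattern can be repeated per variable or wrapped in an array loop.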

8. Use Dummy 4.xlsx and create the APLs below. (In the _AAPPPL suffix of an APL, AA stands for Ad-stock, PPP stands for Power and L stands for Lag value.)

TV_REG_GRP_100401

TV_REG_DOL_000000

DSP_NAT_IMP_200600

DSP_NAT_CLK_400402

9. The input data set has the following values:

Variable Name APL
WK_END_DT
Region
TV_REG_GRP__Variable_100401
Variable_TV_REG_DOL_000000
DSP_NAT_IMPRESSIONS_200600
DSP_NATIONAL_CICKS_400402

Write a DATA step so that the output dataset has the following values:

Variable Name APL               Variable Name
WK_END_DT                       WK_END_DT
Region                          Region
TV_REG_GRP__Variable_100401     TV_REG_GRP__Variable
Variable_TV_REG_DOL_000000      Variable_TV_REG_DOL
DSP_NAT_IMPRESSIONS_200600      DSP_NAT_IMPRESSIONS
DSP_NATIONAL_CICKS_400402       DSP_NATIONAL_CICKS_400402
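
One way to approach question 9 is a regular-expression substitution that removes a trailing underscore followed by six digits. This is only a sketch: the dataset and variable names (apl_names, names, var_apl, var_name) are illustrative. It treats every name alike, so it also strips the suffix from the last name; whether that row is meant to keep its suffix is not clear from the text.

```sas
data apl_names;
    length var_apl $40;
    input var_apl $;
    datalines;
WK_END_DT
Region
TV_REG_GRP__Variable_100401
Variable_TV_REG_DOL_000000
DSP_NAT_IMPRESSIONS_200600
DSP_NATIONAL_CICKS_400402
;
run;

data names;
    set apl_names;
    length var_name $40;
    /* Remove "_" plus six digits at the end of the name; \s* absorbs the
       trailing blanks that pad a SAS character value. Names without such
       a suffix are returned unchanged. */
    var_name = prxchange('s/_\d{6}\s*$//', 1, var_apl);
run;
```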

10. TV GRP data is provided at region, week level. Create national level TV data. Population data is provided for all regions.

TV GRP data

Region    Week           TV_REG_GRP
101       01-Jul-2013    10
101       08-Jul-2013    15
102       01-Jul-2013    30
102       08-Jul-2013    25
103       01-Jul-2013    8
103       08-Jul-2013    10

Population

Region    Population
101       10500
102       22300
103       12800
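
The text does not say how the regional GRPs should be combined for question 10. A common choice, assumed here, is a population-weighted average per week: national GRP = sum(GRP x population) / sum(population). The dataset names tv_grp, population and tv_nat and the output variable tv_nat_grp are illustrative.

```sas
data tv_grp;
    input region week :date11. tv_reg_grp;
    format week date11.;
    datalines;
101 01-Jul-2013 10
101 08-Jul-2013 15
102 01-Jul-2013 30
102 08-Jul-2013 25
103 01-Jul-2013 8
103 08-Jul-2013 10
;
run;

data population;
    input region population;
    datalines;
101 10500
102 22300
103 12800
;
run;

/* Population-weighted average of the regional GRPs, one row per week */
proc sql;
    create table tv_nat as
    select g.week format=date11.,
           sum(g.tv_reg_grp * p.population) / sum(p.population) as tv_nat_grp
    from tv_grp as g
         inner join population as p
         on g.region = p.region
    group by g.week
    order by g.week;
quit;
```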

11. Import the following daily sales data for two products (Shoes and Bags) in different stores. Sales data is provided at State code and Store number level (NY 12 means the 12th store of New York).

Sales data

Store_ID    Date           Sales_Shoes Sales_Bags
NY 10       01-Jul-2013    1020
NY 10       03-Jul-2013    10
NY 10       08-Jul-2013    58
NY 203      02-Jul-2013    89
NY 203      03-Jul-2013    10
NY 203      10-Jul-2013    1520
NJ 20       01-Jul-2013    37
NJ 20       03-Jul-2013    41
NJ 20       08-Jul-2013    58
NJ 123      02-Jul-2013    65
NJ 123      03-Jul-2013    47
NJ 123      10-Jul-2013    3

a. Create a dataset having a distinct list of all the stores (the resulting dataset will have only one variable).
b. Create a dataset having the total sales of both products at state, week level (week ending Saturday), like Total Sales in NY during the week 30th June to 6th July. The new dataset will have 3 columns: State, Week_end_date and Total_Sales.
c. Merge with a predefined list of stores and report those store IDs for which no sales information is provided.

Predefined list of stores:

Store_ID
NY 10
NJ 10
NJ 123
NY 203
NJ 20
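
A sketch of the three parts of question 11. The sales rows below are a small illustrative subset with assumed column splits, not the full table, and all dataset names are made up. The week-ending date relies on the default SAS week running Sunday to Saturday, so aligning INTNX to the end of the week returns the Saturday.

```sas
/* Illustrative sales rows (comma-delimited so Store_ID can contain a blank) */
data sales;
    infile datalines dsd dlm=',';
    length store_id $10;
    input store_id $ date :date11. sales_shoes sales_bags;
    format date date11.;
    datalines;
NY 10,01-Jul-2013,10,20
NY 10,08-Jul-2013,5,8
NJ 20,01-Jul-2013,3,7
;
run;

/* Predefined store list from the text */
data store_master;
    infile datalines truncover;
    input store_id $10.;
    datalines;
NY 10
NJ 10
NJ 123
NY 203
NJ 20
;
run;

/* Derive the state code and the Saturday that ends each date's week */
data sales_wk;
    set sales;
    length state $2;
    state = scan(store_id, 1, ' ');
    week_end_date = intnx('week', date, 0, 'end');
    total = sum(sales_shoes, sales_bags);      /* ignores a missing product */
    format week_end_date date11.;
run;

proc sql;
    /* a. distinct list of stores */
    create table stores as
    select distinct store_id from sales;

    /* b. total sales at state, week level */
    create table state_week as
    select state, week_end_date, sum(total) as total_sales
    from sales_wk
    group by state, week_end_date;

    /* c. stores in the predefined list with no sales rows */
    create table no_sales as
    select store_id
    from store_master
    where store_id not in (select store_id from sales);
quit;
```

Part c could equally be written as a DATA step MERGE with IN= flags after sorting both datasets by store_id.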

12. A dataset contains both numeric and character variables. Write code:
a. Which would attach the text End to all the character variables?
b. Which would add 10 to all the numeric values?
c. Which would convert negative values to positive values, while positive, 0 or missing values remain the same?

13. Please provide the output when a dataset is sorted using the Nodupkey and Noduprecs. Explain the difference between two outputs.
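
A small demonstration for question 13, using a made-up four-row dataset dup. NODUPKEY keeps one observation per BY-variable value, while NODUPRECS drops an observation only when it is identical in every variable to the previous observation written to the output.

```sas
data dup;
    input id value;
    datalines;
1 10
1 10
1 20
2 30
;
run;

/* One row per ID: keeps (1,10) and (2,30) */
proc sort data=dup out=by_key nodupkey;
    by id;
run;

/* Drops only the fully identical adjacent row: keeps (1,10), (1,20), (2,30) */
proc sort data=dup out=by_rec noduprecs;
    by id;
run;
```

Because NODUPRECS only compares adjacent rows, identical records that end up non-adjacent after sorting by the BY variables are not removed; sorting by all variables avoids this.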

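SOLUTION Q12:

Variable1 = trim(Variable1) || 'End';    /* a. append the text End to a character variable */
New_Number = abs(Value);                 /* c. negative becomes positive; positive, 0 and missing stay the same */
OR
if Value < 0 then New_Number = Value * -1; else New_Number = Value;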
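
To apply the Q12 logic to every variable at once, one option is to loop over arrays built from the _CHARACTER_ and _NUMERIC_ name lists. The dataset have and its variables are hypothetical. Each character variable needs enough declared length to hold the extra three characters, otherwise the result is truncated.

```sas
/* Hypothetical input with two character and two numeric variables */
data have;
    length name $20 city $20;
    input name $ city $ x y;
    datalines;
Amit Delhi -5 10
Ravi Pune . 0
;
run;

/* a. and b.: append End to every character variable, add 10 to every numeric one */
data want_ab;
    set have;
    array chars {*} _character_;
    array nums  {*} _numeric_;     /* i is created later, so it is not in this array */
    do i = 1 to dim(chars);
        chars{i} = cats(chars{i}, 'End');
    end;
    do i = 1 to dim(nums);
        if not missing(nums{i}) then nums{i} = nums{i} + 10;
    end;
    drop i;
run;

/* c.: make negative values positive; positive, 0 and missing are unchanged */
data want_c;
    set have;
    array nums {*} _numeric_;
    do i = 1 to dim(nums);
        nums{i} = abs(nums{i});
    end;
    drop i;
run;
```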

--------------------- All the best---------------------

Dummy3.xlsx

APLs

WK_END_DT    Region     TV_REG_GRP    TV_REG_DOL    DSP_NAT_IMP    DSP_NAT_CLK
3/10/13      Newyork    60.63         53.45         55.00          59.45
3/17/13      Newyork    67.26         19.91         67.91          67.82
3/24/13      Newyork    66.62         63.03         44.75          13.37
3/31/13      Newyork    76.71         26.89         84.28          46.48
4/7/13       Newyork    32.35         88.30         72.52          53.41
4/14/13      Newyork    8.37          97.21         54.25          16.06
4/21/13      Newyork    17.72         79.27         5.91           31.94
4/28/13      Newyork    67.29         62.36         77.66          17.73
5/5/13       Newyork    57.55         18.70         51.72          30.49
5/12/13      Newyork    45.41         28.93         17.50          89.17
5/19/13      Newyork    35.68         31.28         94.21          79.18
5/26/13      Newyork    3.30          64.01         21.50          71.13
6/2/13       Newyork    70.10         53.16         38.36          76.65
6/9/13       Newyork    1.99          27.20         82.58          72.97
6/16/13      Newyork    5.97          33.62         63.44          25.38
6/23/13      Newyork    47.21         15.65         30.77          91.03
6/30/13      Newyork    34.41         59.63         57.62          33.16
7/7/13       Newyork    14.78         17.70         48.95          30.05
7/14/13      Newyork    27.31         6.54          25.98          28.94
7/21/13      Newyork    44.39         12.63         22.58          74.30
7/28/13      Newyork    68.45         91.15         19.28          7.82
8/4/13       Newyork    62.05         52.76         3.84           61.75

Sales.xlsx

Dummy2.xlsx