DataStage Interview Tips

Post on 06-Apr-2015

INTERVIEW QUESTIONS

What is E-R modeling and why is it used for OLTP design?
The E-R (Entity-Relationship) model is used in two-dimensional databases such as SQL Server or Oracle, where a table is made up of rows and columns. OLTP systems are generally based on these two dimensions. Dimensional modeling, by contrast, works with more than two dimensions. A cube represents a three-dimensional model in a data warehouse, where the data is stored as summarized information and can be retrieved more easily than from a normal OLTP database.
Assume PROD, GEOG, TIME and MEAS are the four dimensions, and that a DW system stores information along them. Suppose you want to know the sales of Lux (PROD) in North India (GEOG) during October 2006 (TIME) for the measure Lux 75 grams (MEAS). That is, FACT_TBL(PROD LUX, GEOG NORTH_INDIA, TIME OCT06, MEAS Units) would return some quantity, say 75809 units, meaning that this many units were sold in North India during the given period. You can get the same answer from a normal OLTP system, but as the data grows the system will not tolerate the load and query performance will degrade. For this and many other advantages, we need a DWH instead of a normal OLTP system.

What is the architecture of any Data warehousing project? What is the flow?
1) Data warehousing starts with data modeling, i.e. the creation of dimensions and facts.
2) Data is collected from source systems such as OLTP, CRM and ERP systems.
3) Cleansing and transformation are done with an ETL (Extraction, Transformation, Loading) tool.
4) By the end of the ETL process, the target databases (dimensions, facts) are loaded with data that satisfies the business rules.
5) Finally, reporting tools (OLAP) are used to get the information needed for decision support.
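The five steps above can be sketched end to end with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the source rows, the `transform` and `units_sold` helpers, and the column names are hypothetical, and a real project would use an ETL tool such as DataStage rather than hand-written code. The figures are chosen so that the report reproduces the 75809 units from the Lux example.

```python
import sqlite3

# Hypothetical source rows standing in for an OLTP extract:
# (product, region, period, units). Note the inconsistent spelling.
source_rows = [
    ("lux ", "North India", "OCT06", 40000),
    ("LUX", "north india", "OCT06", 35809),
    ("Lux", "South India", "OCT06", 12000),
]


def transform(row):
    """Step 3: cleanse one row by trimming and normalising the text keys."""
    prod, geog, period, units = row
    return (prod.strip().upper(),
            geog.strip().upper().replace(" ", "_"),
            period,
            units)


conn = sqlite3.connect(":memory:")

# Step 1: data model. One fact table keyed by three dimensions plus a measure.
conn.execute(
    "CREATE TABLE fact_tbl (prod TEXT, geog TEXT, period TEXT, units INTEGER)"
)

# Steps 2 to 4: extract the source rows, transform them, load the target.
conn.executemany(
    "INSERT INTO fact_tbl VALUES (?, ?, ?, ?)",
    [transform(r) for r in source_rows],
)


def units_sold(prod, geog, period):
    """Step 5: the reporting query for one product, region and period."""
    cur = conn.execute(
        "SELECT SUM(units) FROM fact_tbl "
        "WHERE prod = ? AND geog = ? AND period = ?",
        (prod, geog, period),
    )
    return cur.fetchone()[0]


print(units_sold("LUX", "NORTH_INDIA", "OCT06"))  # 75809
```

Without the cleansing step, the two differently spelled Lux rows would not roll up under a single key, which is why transformation comes before the load.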

Discuss the advantages & disadvantages of star & snowflake schema?
In a star schema every dimension has a primary key, and a dimension table does not have any parent table. In a snowflake schema, a dimension table has one or more parent tables. In a star schema the hierarchies for a dimension are stored in the dimension table itself, whereas in a snowflake schema the hierarchies are broken out into separate tables. These hierarchies help to drill down the data from the topmost level to the lowermost level.

Compare Data Warehousing Top-Down approach with Bottom-up approach?
In the top-down approach, the data warehouse is built first and the data marts are built afterwards. This needs more cross-functional skills, takes more time and costs more.
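Returning to the star versus snowflake comparison, the difference in where a hierarchy is stored can be shown with a small sqlite3 sketch. All table names, column names and rows here are hypothetical. The same category rollup needs one join in the star layout and two joins in the snowflake layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the product -> category hierarchy lives inside one dimension table.
CREATE TABLE dim_product_star (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT,
    category_name TEXT
);

-- Snowflake: the category level is split out into a parent table.
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product_snow (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);

CREATE TABLE fact_sales (product_key INTEGER, units INTEGER);

INSERT INTO dim_product_star VALUES
    (1, 'Lux', 'Soap'), (2, 'Soap B', 'Soap'), (3, 'Detergent A', 'Detergent');
INSERT INTO dim_category VALUES (10, 'Soap'), (20, 'Detergent');
INSERT INTO dim_product_snow VALUES
    (1, 'Lux', 10), (2, 'Soap B', 10), (3, 'Detergent A', 20);
INSERT INTO fact_sales VALUES (1, 10), (2, 20), (3, 5);
""")


def star_rollup():
    """Units per category: the fact table joins directly to the dimension."""
    return conn.execute("""
        SELECT d.category_name, SUM(f.units)
        FROM fact_sales f
        JOIN dim_product_star d ON d.product_key = f.product_key
        GROUP BY d.category_name
        ORDER BY d.category_name
    """).fetchall()


def snowflake_rollup():
    """Same rollup, but the category is reached through the parent table."""
    return conn.execute("""
        SELECT c.category_name, SUM(f.units)
        FROM fact_sales f
        JOIN dim_product_snow p ON p.product_key = f.product_key
        JOIN dim_category c ON c.category_key = p.category_key
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()


print(star_rollup())       # [('Detergent', 5), ('Soap', 30)]
print(snowflake_rollup())  # [('Detergent', 5), ('Soap', 30)]
```

The star table repeats the category name on every product row, which is the redundancy traded for the simpler join.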

In the bottom-up approach, the data marts are built first and the data warehouse afterwards. The first data mart built serves as a proof of concept for the others. This takes less time and costs less than the top-down approach.

Definition of data marts?
A data mart is a subset of the data warehouse. You can also think of a data mart as holding the data of one subject area. For example, consider an organization that has HR, Finance, Communications and Corporate Service divisions. You can create a data mart for each division. The historical data is stored in the data marts first and finally exported to the data warehouse.

What is the difference between E-R modeling and Dimensional modeling?
E-R modeling expresses the relations between entities in normalized form. Dimensional modeling expresses the relations between dimensions in denormalized form.

Are OLAP databases also called decision support systems? True/false?
True.

What is the difference between OLAP and data warehouse?
A data warehouse is the place where the data is stored for analysis, whereas OLAP is the process of analyzing the data, managing aggregations and partitioning information into cubes for in-depth visualization.

What is the difference between Data warehousing and Business Intelligence?
Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of its business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term "business intelligence" is used to encompass OLAP, data visualization, data mining and query/reporting tools. Think of the data warehouse as the back office and business intelligence as the entire business including the back office. The business needs the back office in order to function, but a back office without a business to support makes no sense.

Why is Denormalization promoted in Universe Designing?
In a relational data model, for normalization purposes, some lookup tables are not merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table called a DIMENSION table, for performance and for slicing data. Merging the tables into one large dimension table removes the complex intermediate joins, because dimension tables are joined directly to fact tables. Though redundancy of data occurs in the DIMENSION table, the size of the DIMENSION table is
only 15% when compared to the FACT table. This is why denormalization is promoted in Universe designing.

What is a factless fact table? Where have you used it in your project?
A factless fact table contains nothing but dimensional keys. It is used to support negative analysis reports, for example a store that did not sell a product for a given period.

What is a snapshot?
A snapshot is a static data source; it is a permanent local copy or picture of a report, and it is suitable for disconnected networks. We cannot add any columns to a snapshot, but we can sort, group and aggregate. It is mainly used for analyzing historical data.

What are non-additive facts in detail?
A fact may be a measure, a metric or a dollar value. Measures and metrics are non-additive facts; a dollar value is an additive fact. If we want to find the amount for a particular place for a particular period of time, we can add the dollar amounts and come up with the total amount. For a non-additive fact, e.g. the measured height(s) of citizens by geographical location, when we roll up city data to state-level data we should not add the heights of the citizens; rather, we may want to use it to derive a count.

Data warehouse interview questions only
What is source qualifier?
Difference between DSS & OLTP?

What is a cube and why are we creating a cube? What is the difference between ETL and OLAP cubes?
Any schema, table or report that gives you meaningful information about one attribute with respect to more than one other attribute is called a cube. For example, in a product table with Product ID and Sales columns, we can analyze Sales with respect to Product Name; but if you analyze Sales with respect to Product as well as Region (Region being an attribute in the Location table), the report, resultant table or schema would be a cube.
ETL cubes: built in the staging area to load frequently accessed reports to the target.
Reporting cubes: built after the actual load of all the tables to the target, depending on the customer's requirements for business analysis.

What is a surrogate key?
A surrogate key is a substitute for the natural primary key.

What are Aggregate tables?
An aggregate table contains a summary of existing warehouse data, grouped to certain levels of the dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this, we can aggregate the table to the required level and query that instead. The aggregate table reduces the load on the database server, improves query performance and returns results much faster.
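An aggregate table of this kind can be sketched with sqlite3. The detail table, its rows and the `monthly_total` helper are hypothetical. The idea is that the summary is computed once at load time, so reports read a handful of pre-grouped rows instead of scanning the detail table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Detail-level fact table (hypothetical rows; a real one may hold millions).
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("2006-10-01", "S1", 100.0),
        ("2006-10-15", "S1", 50.0),
        ("2006-10-20", "S2", 75.0),
        ("2006-11-02", "S1", 20.0),
    ],
)

# Aggregate table: the same data summarised to the month and store level.
conn.execute("""
    CREATE TABLE agg_sales_month AS
    SELECT substr(sale_date, 1, 7) AS month,
           store,
           SUM(amount) AS total_amount,
           COUNT(*)    AS sale_count
    FROM fact_sales
    GROUP BY substr(sale_date, 1, 7), store
""")


def monthly_total(month, store):
    """Answer a monthly report from the aggregate, not the detail table."""
    row = conn.execute(
        "SELECT total_amount FROM agg_sales_month WHERE month = ? AND store = ?",
        (month, store),
    ).fetchone()
    return row[0] if row else None


print(monthly_total("2006-10", "S1"))  # 150.0
```

The aggregate has to be refreshed whenever new detail rows are loaded, which is the maintenance cost paid for the faster queries.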

How is data in the data warehouse stored after it has been extracted and transformed from heterogeneous sources, and where does the data go from the data warehouse?
Data in the data warehouse is stored in the form of relational tables; most data warehouses use a snowflake schema.

What is the difference between hierarchies and levels?
Levels: the columns available in a dimension table are levels.
Hierarchies: the process of representing levels in a top-to-bottom or bottom-to-top approach. Ex: Region, Country, State, City; Year, Month, Day, Hours.
Multi-level hierarchies can be natural, like Year, Month and Day. But a hierarchy doesn't have to be natural. You can create a hierarchy just for navigational or reporting purposes. Ex: Days to Manufacture and Safety Stock Level. There is no relationship between the two attributes in this navigational hierarchy. A natural hierarchy is one in which you should define attribute relationships between levels. Levels are constructed from attributes.

What is the difference between data warehouse and BI?
DATA WAREHOUSE: A data warehouse is an integrated, time-variant, subject-oriented and non-volatile collection of data in support of the management decision-making process.
BUSINESS INTELLIGENCE: Business intelligence is the process of extracting the data, converting it into information and then into a knowledge base.

What are non-additive facts?
# Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
# Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
# Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

What are the