20 November, 2015

SSAS Interview Questions and Answers

SSAS Interview Questions and Answers


1. Star Vs Snowflake schema and Dimensional modeling
Star Schema: One Fact Table surrounded by number of Dimension Tables. It is a De-Normalized form.
Dimension table will not have any parent table.
Hierarchies in the Dimension are stored in the Dimension itself.

Snowflake: Normalized form of star schema is a snow flake schema. Dimension tables can be further broken down into sub dimensions.
Dimension table will have one or more parent tables.
Hierarchies are broken into separate tables in snow schema. These hierarchies helps to drilldown the data from Top hierarchy to lowermost hierarchy.
Increases the number of joins and poor performance in retrieval of data.

2. Data storage modes - MOLAP, ROLAP, HOLAP

In ROLAP, the structure of aggregation along with the values are stored in the 2 dimensional relational formats at disc level.
   ROLAP offers low latency, but it requires large storage space as well as slower processing and query response times.


In MOLAP, the structure of aggregation along with the data values are stored in multi dimensional format, takes more space with less time for data analysis compared to ROLAP.
    MOLAP offers faster query response and processing times, but offers a high latency and requires average amount of storage space. This storage mode leads to duplication of data as the detail data is present in both the relational as well as the multidimensional storage.

In HOLAP, structure is stored in Relational model and data is stored in multi-dimensional model which provides optimal usage and space.
This storage mode offers optimal storage space, query response time, latency and fast processing times.

Default storage setting is MOLAP.

3. Types of Dimensions


Dimension type

Description

Regular

A dimension whose type has not been set to a special dimension type.

Time

A dimension whose attributes represent time periods, such as years, semesters, quarters, months, and days.

Organization

A dimension whose attributes represent organizational information, such as employees or subsidiaries.

Geography

A dimension whose attributes represent geographic information, such as cities or postal codes.

BillOfMaterials

A dimension whose attributes represent inventory or manufacturing information, such as parts lists for products.

Accounts

A dimension whose attributes represent a chart of accounts for financial reporting purposes.

Customers

A dimension whose attributes represent customer or contact information.

Products

A dimension whose attributes represent product information.

Scenario

A dimension whose attributes represent planning or strategic analysis information.

Quantitative

A dimension whose attributes represent quantitative information.

Utility

A dimension whose attributes represent miscellaneous information.

Currency

This type of dimension contains currency data and metadata.

Rates

A dimension whose attributes represent currency rate information.

Channel

A dimension whose attributes represent channel information.

Promotion

A dimension whose attributes represent marketing promotion information.
5.    Types of Measures
Fully Additive Facts: These are facts which can be added across all the associated dimensions.
For example, "Sales Amount" is a fact which can be summed across different dimensions like Customer, Geography, Date, Product.

Semi-Additive Facts: These are facts which can be added across only few dimensions rather than all dimensions.
For example, Bank balance is a fact which can be summed across the customer dimension (i.e. the Total balance of all the customers in a bank at the end of a particular quarter).  However, the same fact cannot be added across the date dimension (i.e. the total balance at the end of Quarter 1 is $X million and $Y million at the end of Quarter 2, so at the end of quarter 2, the total balance is only $Y million and not $X+$Y).

Non-Additive Facts: These are facts which cannot be added across any of the dimensions in the cube. For example, profit margin is a fact which cannot be added across any of the dimensions.
For example, if product P1 has a 10% profit and product P2 has a 10% profit then your Net Profit is still 10% and not 20%.  We cannot add profit margins across product dimensions. Similarly, if your profit margin is 10% on Day1 and 10% on Day2, then your net Profit Margin at the end of Day2 is still 10% and not 20%.

Derived Facts: Derived facts are the facts which are calculated from one or more base facts, often by applying additional criteria. Often these are not stored in the cube and are calculated on the fly at the time of accessing them. For example, profit margin.

Factless Facts: A factless fact table is one which only has references (Foreign Keys) to the dimensions and it does not contain any measures. These types of fact tables are often used to capture events (valid transactions without a net change in a measure value).
For example, a balance enquiry at an ATM. Though there is no change in the account balance, this transaction is still important for analysis purposes.

Textual Facts: Textual facts refer to the textual data present in the fact table, which is not measurable (non-additive), but is important for analysis purposes. For example, codes (i.e. product codes), flags (i.e. status flag), etc.

5. Types of relationships between dimensions and measure groups. 
No relationship: The dimension and measure group are not related.
Regular: The dimension table is joined directly to the fact table.
Referenced: The dimension table is joined to an intermediate table, which in turn, is joined to the fact table.
Many to many: The dimension table is joined to an intermediate fact table, the intermediate fact table is joined, in turn, to an intermediate dimension table to which the fact table is joined.
Data mining: The target dimension is based on a mining model built from the source dimension. The source dimension must also be included in the cube.
Fact table: The dimension table is the fact table.

6. Proactive caching
Proactive caching can be configured to refresh the cache (MOLAP cache) either on a pre-defined schedule or in response to an event (change in the data) from the underlying relational database. Proactive caching settings also determine whether the data is queried from the underlying relational database (ROLAP) or is read from the outdated MOLAP cache, while the MOLAP cache is rebuilt.
Proactive caching helps in minimizing latency and achieve high performance.
It enables a cube to reflect the most recent data present in the underlying database by automatically refreshing the cube based on the predefined settings.
Lazy aggregations:
When we reprocess SSAS cube then it actually bring new/changed relational data into SSAS cube by reprocessing dimensions and measures. Partition indexes and aggregations might be dropped due to changes in related dimensions data so aggregations and partition indexes need to be reprocessed. It might take more time to build aggregation and partition indexes.
If you want to bring cube online sooner without waiting rebuilding of partition indexes and aggregations then lazy processing option can be chosen. Lazy processing option bring SSAS cube online as soon as dimensions and measures get processed. Partition indexes and aggregations are triggered later as a background job.


Lazy processing option can be changed by server level property "OLAP\LazyProcessing\Enabled"
Advantage: Lazy processing saves processing time as it brings as soon as measures and dimension data is ready.
Disadvantage: User will see performance hit when aggregation are getting build in background.
7. Partition processing options
Process Default: SSAS dynamically chooses from one of the following process options.
Process Full: Drop all object stores, and rebuild the objects. This option is used when a structural change has been made to an object, for example, when an attribute hierarchy is added, deleted, or renamed.
Process Update: Forces a re-read of data and an update of dimension attributes. Flexible aggregations and indexes on related partitions will be dropped.
Process Add: For dimensions, adds new members and updates dimension attribute captions and descriptions.
Process Data: Processes data only without building aggregations or indexes. If there is data is in the partitions, it will be dropped before re-populating the partition with source data.
Process Index:  Creates or rebuilds indexes and aggregations for all processed partitions. For unprocessed objects, this option generates an error.
Unprocess: Delete data from the object.
Process Structure: Drop the data and perform process default on all dimensions.
Process Clear: Drops the data in the object specified and any lower-level constituent objects. After the data is dropped, it is not reloaded.
Process Clear Structure: Removes all training data from a mining structure.

8. Difference between attribute hierarchy and user hierarchy
An Attribute Hierarchy is created by SSAS for every Attribute in a Dimension by default. An Attribute by default contains only 2 levels - An "All" level and a
"Detail" level which is nothing but the Dimension Members.

A User Defined Hierarchy is defined explicitly by the user/developer and often contains multiple levels. For example, a Calendar Hierarchy contains Year, Quarter, Month, and Date as its levels.

Some of the highlights/differences of Attribute and User Defined Hierarchies:
1. Attribute Hierarchies are always two-Level (Unless All Level is suppressed) whereas User Defined Hierarchies are often Multi-Level.
2. By default, Every Attribute in a Dimension has an Attribute Hierarchy whereas User Defined Hierarchies have to be explicitly defined by the user/developer.
3. Every Dimension has at least one Attribute Hierarchy by default whereas every Dimension does not necessarily contain a User Defined Hierarchy. In essence, a Dimension can contain zero, one, or more User Defined Hierarchies.
4. Attribute Hierarchies can be enabled or disabled. Disable the Attribute Hierarchy for those attributes which are commonly not used to slice and dice the data during analysis, like Address, Phone Number, and Unit Price etc. Doing this will improve the cube processing performance and also reduces the size of the cube as those attributes are not considered for performing aggregations.
5. Attribute Hierarchies can be made visible or hidden. When an Attribute Hierarchy is hidden, it will not be visible to the client application while browsing the Dimension/Cube. Attribute Hierarchies for those attributes which are part of the User Defined Hierarchies, like Day, Month, Quarter, and Year, which are part of the Calendar Hierarchy, can be hidden, since the attribute is available to the end users through the User Defined Hierarchy and helps eliminate the confusion/redundancy for end users.

9. Dimension, Hierarchy, Level, and Members
Dimensions in Analysis Services contain attributes that correspond to columns in dimension tables. These attributes appear as attribute hierarchies and can be organized into user-defined hierarchies, or can be defined as parent-child hierarchies based on columns in the underlying dimension table. Hierarchies are used to organize measures that are contained in a cube. 
Hierarchy: is the relation between attributes in a dimension.
Level: refers to individual attribute within the Hierarchy.

10. Difference between database dimension and cube dimension?

In SQL Server Analysis Services, there are Two types of Dimensions
1.Database Dimension
2.Cube Dimension


All the dimensions that are created using the Dimension Wizard of the Solution Explorer are treated as database dimensions. In other words, the dimensions which are at Database level are called Database Dimensions.
Database dimensions is independent of cube and can be processed on their own. 

When you build a cube, and you add dimensions to that cube, you create cube dimensions: cube dimensions are instances of a database dimension inside a cube.

Difference Between Database Dimension and Cube Dimension in SSAS:
1.A database dimension can be used in multiple cubes, and multiple cube dimensions can be based on a single database dimension
2.The Database dimension has only Name and ID properties, whereas a Cube dimension has several more properties.
3.Database dimension is created once where as the Cube dimension is referenced from database dimension.
4.Database dimension exists only once, whereas Cube dimensions can be created more than one using ROLE PLAYING Dimensions concept.

11. Importance of CALCULATE keyword in MDX script, data pass and limiting cube space

12. Effect of materialize

When setting up a dimension with a Reference relationship type, we have the option of "materializing" the dimension.

   Select to store the attribute member in the intermediate dimension that links the attribute in the reference dimension to the fact table in the MOLAP
structure.  This imporvies the qery performance, but increases the processing time and storage space.
  If the option is not selected, only the relationship between the fact records and the intermediate dimension is stored in the cube. This means that Anaylysis services has to derive the aggregated values for the members of the referenced dimension when a query is executed, resulting in slower query performance.
13. Partition processing and Aggregation Usage Wizard

14. Perspectives, Translations, Linked Object Wizard


15. Handling late arriving dimensions / early arriving facts


16. Role playing Dimensions, Junk Dimensions, Conformed Dimensions, SCD and other types of dimensions

Role playing Dimension:
A Role-Playing Dimension is a Dimension which is connected to the same Fact Table multiple times using different Foreign Keys.
eg: Consider a Time Dimension which is joined to the same Fact Table (Say FactSales) multiple times, each time using a different Foreign Key in the Fact Table like Order Date, Due Date, Ship Date, Delivery Date.
Steps:
In Cube Designer, click the Dimension Usage tab.
Either click the 'Add Cube Dimension' button, or right-click anywhere on the work surface and then click Add Cube Dimension.
In the Add Cube Dimension dialog box, select the dimension that you want to add, and then click OK.
Conformed Dimension is a Dimension which connects to multiple Fact Tables across one or more Data Marts (cubes). Conformed Dimensions are exactly the same structure, attributes, values (dimension members), meaning and definition.
Example: A Date Dimension has exactly the same set of attributes, same members and same meaning irrespective of which Fact Table it is connected to.
linked dimension is based on a dimension that is stored in a separate Analysis Services Database which may or may not be on the same server. You can create and maintain a dimension in just one database and then reuse that dimension by creating linked dimensions for use in multiple databases.

Linked Dimensions can be used when the exact same dimension can be used across multiple Cubes within an Organization like a Time Dimension, geography Dimension etc.

Here are some of the highlights of a Linked Dimension:
-More than one Linked Dimension can be created from a Single Database Dimension.
-These can be used to implement the concept of Conformed Dimensions.
-For an end user, a Linked Dimension appears like any other Dimension.

Degenerate Dimension is a Dimension which is derived out of a Fact Table and it appears to the end user as a separate/distinct Dimension, its data is actually stored in the Fact Table. It's a Dimension table which does not have an underlying physical table of its own.

Degenerate Dimensions are commonly used when the Fact Table contains/represents Transactional data like Order Details, etc. and each Order has an Order Number associated with it, which forms the unique value in the Degenerate Dimension.

Junk Dimension is often a collection of Unrelated Attributes like indicators, flags, codes, etc. It is also called as a Garbage Dimension.
Junk Dimensions are usually small in size.

One of the common scenarios is when a Fact Table contains a lot of Attributes which are like indicators, flags, etc. Using Junk Dimensions, such Attributes can be removed/cleaned up from a Fact Table.

SCD: The Slowly Changing Dimension (SCD) concept is basically about how the data modifications are absorbed and maintained in a Dimension Table.

 The new (modified) record and the old record(s) are identified using some kind of a flag like say IsActive, IsDeleted etc. or using Start and End Date fields to indicate the validity of the record.
17. Parent Child Hierarchy, NamingTemplate property, MemberWithLeafLevelData property

18. How will you keep measure in cube without showing it to user?


19. How to pass parameter in MDX

Here is an example, suppose this is your query:
select {[Measures].[Internet Order Quantity]} on 0,
[Product].[Category].[Category] on 1
from [Adventure Works]
where [Date].[Calendar Year].&[2002]

You can modify it like this:
select {[Measures].[Internet Order Quantity]} on 0,
[Product].[Category].[Category] on 1
from [Adventure Works]
where strtomember(@P1)

Now, if you pass the value [Date].[Calendar Year].&[2002] to the P1, then it will run just like:
where [Date].[Calendar Year].&[2002]

STRTOSET returns a set.
STRTOMEMBER returns a member.

20. Why do we need to create partitions only on fact table. Why not on the dimension tables.?
Mainly partitioning technique is used for performance, usually Fact Tables have large volume of data, but Dimension Table is not like that, so that there won’t be any performance issues regarding the Dimension data. 
For example if i have multiple partition in fact table, we can process the partitions parallel, it will reduce the processing time.


21. Dimension security vs Cell security

22. SCOPE statement, THIS keyword, SUBCUBE


23. CASE (CASE, WHEN, THEN, ELSE, END) statement, IF THEN END IF, IS keyword, HAVING clause


24. CELL CALCULATION and CONDITION clause


25. RECURSION and FREEZE statement


26. Common types of errors encountered while processing a dimension / measure groups / cube
Common Errors in the OLAP storage engine: 
1. The process operation ended because the number of errors encountered during processing reached the defined limit of allowable errors for the operation.
2. An error occurred while processing the partition 
3. The attribute key cannot be found: Table: TableName
4. The record was skipped because the attribute key was not found.


27. Logging and monitoring MDX scripts and cube performance

28. What do you understand by attribute relationship? what are the main advantages in using attribute relationship? 
An Attribute Relationship is a relationship between various attributes within a Dimension. By default, every Attribute in a Dimension is related to the Key
Attribute.

There are basically 2 types of Attribute Relationships: 

1. Rigid
2. Flexible
29. What is natural hierarchy and how will you create it?
Natural hierarchies, where each attribute is related either directly or indirectly to all other attributes in the same hierarchy, as in product category - product 
subcategory - product name

30. What do you understand by rigid and flexible relationship? Which one is better from performance perspective?
Rigid
Attribute Relationship should be set to Rigid when the relationship between those attributes is not going to change over time. For example, 
relationship between a Month and a Date is Rigid since a particular Date always belongs to a particular Month like 1st Feb 2012 always belongs to Feb 
Month of 2012. Try to set the relationship to Rigid wherever possible.

Flexible
Attribute Relationship should be set to Flexible when the relationship between those attributes is going to change over time. For example, relationship between an Employee and a Manager is Flexible since a particular Employee might work under one manager during this year (time period) and under a different manager during next year (another time period).
31. In which scenario, you would like to go for materializing dimension? 
Reference dimensions let you create a relationship between a measure group and a dimension using an intermediate dimension to act as a bridge between 
them.

 32. In dimension usage tab, how many types of joins are possible to form relationship between measure group and dimension?
33. What is deploy, process and build?
Bulid: Verifies the project files and create several local files.
Deploy: Deploy the structure of the cube(Skeleton) to the server.
Process: Read the data from the source and build the dimesions and cube structures
34. Can you create server time dimension in analysis services(Server time dimension)? 

35. How many types of dimension are possible in SSAS?
Account
Bill of Materials
Currency
Channel
Customer
Geography
Organizations
Products
promotion
Regular
Scenario
Time
Unary
36. What is time intelligence? How will you implement in SSAS?
37. What do you understand by linked cube or linked object feature in SSAS?
38. How will you write back to dimension using excel or any other client tool?
39. What do you understand by dynamic named set (SSAS 2008)? How is i different from static named set?
40. In Process Update, which relationship will be better(Rigid and Flexible relationship)?  
41. What is the difference between "ProcessingGroup" ByAttribute and ByTable? 

42. What do you understand by following properties of dimension attribute?
AttributeHierarchyEnabled:
The value of the "AttributeHierarchyEnabled" property determines whether an attribute hierarchy is created. 
If this property is set to False, the attribute hierarchy is not created and the attribute cannot be used as a level in a user hierarchy; the attribute hierarchy exists as a member property only. However, a disabled attribute hierarchy can still be used to order the members of another attribute. 
If the value of the AttributeHierarchyEnabled property is set to True, the value of the AttributeHierarchyVisible property determines whether the attribute hierarchy is visible independent of its use in a user-defined hierarchy.

When an attribute hierarchy is enabled, you may want to specify values for the following 3 additional properties:
1. IsAggregatable:  To disable the (All) level for an enabled attribute hierarchy, set the value for this property to False.
2. AttributeHierarchyOrdered
3. AttributeHierarchyOptimizedState

Default Member
DiscretizationMethod
OrderBy
OrderByAttribute


43.  What are different storage mode options in SSAS and which scenario, they will be useful?
There are 3 standard storage modes (Multi dimensional OLAP, Relational OLAP, Hybrid OLAP) in OLAP applications which affect the performance of OLAP queries and cube processing, storage requirements and also determine storage locations.

44.  How will you implement data security for given scenario in analysis service data?
"I have 4 cubes and 20 dimension. I need to give access to CEO, Operation managers and Sales managers and employee. 
1) CEO can see all the data of all 4 cubes.
2) Operation Managers can see only data related to their cube. There are four operation managers.
3) Employees can see only certain dimension and measure groups data. (200 Employees) "

45. What are the options to deploy SSAS cube in production?
Right click on Project in Solution Explorer -> Properties
Build -> Select ' Output Path'
Deployment ->
 Processing Option: Default, Full, Do Not Process
 Transactional Deployment: False, True
 Deployment Mode: Deploy All, Deploy Changes only

1.Firstly, when we build the cube, the build process will create 4 XML files inside the bin folder. The asdatabase file is the main object definition file and the 3 other files contain configuration and deployment metadata.

.asdatabase -  is the main object definition file
.configsettings
.deploymentoptions
.deploymenttargets


2. Secondly, the deployment process will attach the above files to running Server Analysis Service. This deployment can be done by many ways, but more appropriate option for production deployment is deploy by Deployment Wizard Tool, because this tool will allow doing many configuration settings.
Run the wizard from Start -> All Programs ->Microsoft Sql Server -> Analysis Services -> Deployment wizard.

1. Browse to the .asdatabase file created by the build
2. connect to the target server
3. Configure how partitions and roles should be deployed
4. specify how configuration settings are deployed
5. Specify Processing options:
Default processing allows SSAS to decide what needs to be done; Full processing can be used to process all objects.  You can also choose not to process at all.
6. choose whether to deploy instantly or to create an XMLA command script for later deployment. The script will be created in the same location as the 
.asdatabase file.

Finally, the Processing is the step, or series of steps, in which Analysis Service loads data from a relational data source into a multidimensional model.
For objects that use MOLAP storage, data is saved on disk in the database file folder. 
For ROLAP storage, processing occur on demand, in response to MDX query on an object. For object that uses ROLAP storage, processing refers to updating the cache before returning query results.

46. What are the options available to incrementally load relational data into SSAS cube?
Use Slowly Changing Dimension

47. Why will you use aggregation at remote server?    
48.  What are different ways to create aggregations in SSAS? 
49. What do you understand by Usage based optimization?
50.  Can we use different aggregation scheme for different partitions?
51. Why will you use perspective in SSAS?
52. What are KPIs? How will you create KPIs in SSAS?
53. What are the main feature differences in SSAS 2005 and SSAS 2008 from developer point of view?
54.What are the aggregate functions available for measure in SSAS?
Sum, Min, Max, Count, and Distinct Count
55. What are the processing modes available for measure group? What do you understand by lazy aggregation? 

56. How can you improve dimension design?
1: Limit the Number of Dimensions Per Measure Group and Number of Attributes Per Dimension.
AttributeHierarchyOptimizedState: Determines the level of optimization applied to the attribute hierarchy. By default, an attribute hierarchy is FullyOptimized, which means that Analysis Services builds indexes for the attribute hierarchy to improve query performance. The other option, NotOptimized, means that no indexes are built for the attribute hierarchy.


2: Use Dimension Properties Effectively
For large dimensions that expose millions of rows and have a large number of attributes, pay particular attention to the ProcessingGroup property. By default, this property is assigned a value of ByAttribute.


3: Use Regular Dimension Relationship Whenever Possible
4: Use Integer Data Type for Attribute Keys If at All Possible
5: Use Natural Hierarchies When Possible
57. What are the performance issues with parent child hierarchy? 
In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute, i.e., the All attribute unless it is disabled.
58. What do you understand by formula engine and storage engine?
Formula Engine is single-threaded, Storage Engine (SE) is multi-threaded.
The Query Processor Cache/Formula Engine Cache caches the calculation results whereas the Storage Engine Cache caches aggregated/fact data being 
queried.

59. How can you improve overall cube performance?
1. Partitioning the cube can help to reduce the processing time. The benefit of partitioning is that it allows  to process multiple partitions in parallel on a
server that has multiple processors.
2. Keep Cube space as small as possible by only including measures groups that are needed.
3. Place the measures that are queried together in same measure group. A query that retrieves measures from multiple measure groups requires multiple storage space.
4. Define the aggregations to reduce the number of records that the storage engine needs to scan from disk to satisfy a query.
5. Avoid designing excessive aggregations. Excessive aggregations may reduce processing performance and query performance.

Regarding the best possible processing strategy, the following steps:
1. Process Update all the dimensions that could have had data changed.  Depending on the nature of the changes in the dimension table, Process Update can affect dependent partitions.  If only new members were added, then the partitions are not affected.  But if members were deleted or if member relationships changed, then some of the aggregation data and bitmap indexes on the partitions are dropped.
2. Process Data the partitions that have changed data (which are usually the most recent partitions).  Of course, the smaller the partition, the better, so try to use daily partitions instead of monthly or use monthly partitions instead of yearly.
3. Process Index for the rest of the partitions 

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.