Tuesday, January 22, 2013

DataWareHousing - ETL Project Life Cycle ( Simple to understand )

-> Datawarehousing projects are categorized into 4 types.
1) Development Projects.
2) Enhancement Projects
3) Migration Projects
4) Production support Projects.

-> The following are the different phases involved in an ETL project development life cycle.
1) Business Requirement Collection ( BRD )
2) System Requirement Collection ( SRD )
3) Design Phase
a) High Level Design Document ( HLD )
b) Low level Design Document ( LLD )
c) Mapping Design
4) Code Review
5) Peer Review
6) Testing
a) Unit Testing
b) System Integration Testing.
c) User Acceptance Testing ( UAT )
7) Pre - Production
8) Production ( Go-Live )

Business Requirement Collection :-
---------------------------------------------
-> Business requirement gathering is started by the Business Analyst, the onsite technical lead, and the client business users.
-> In this phase, a Business Analyst prepares the Business Requirement Document ( BRD ), also called the Business Requirement Specifications ( BRS ).
-> BR collection takes place at the client location.
-> The outputs from BR analysis are:
-> BRS :- The Business Analyst gathers the business requirements and documents them in the BRS.
-> SRS :- Senior technical people (or the ETL architect) prepare the SRS, which contains the software and hardware requirements.
The SRS includes
a) O/S to be used ( Windows or Unix )
b) RDBMS required to build the database ( Oracle, Teradata etc. )
c) ETL tools required ( Informatica, DataStage )
d) OLAP tools required ( Cognos, BO )
The SRS is also called the Technical Requirement Specifications ( TRS ).

Designing and Planning the Solution :-
------------------------------------------------
-> The outputs from the design and planning phase are
a) HLD ( High Level Design ) Document
b) LLD ( Low Level Design ) Document
HLD ( High Level Design ) Document :-
An ETL architect and a DWH architect participate in designing the solution to build the DWH.
The HLD document is prepared based on the business requirements.
LLD ( Low Level Design ) Document :-
Based on the HLD, a senior ETL developer prepares the Low Level Design Document.
The LLD contains more technical details of the ETL system.
An LLD contains the data flow diagram ( DFD ) and details of the sources and targets of each mapping.
An LLD also contains information about full and incremental loads.
After the LLD, the development phase starts.

Development Phase ( Coding ) :-
--------------------------------------------------
-> Based on the LLD, the ETL team creates the mappings ( ETL code ).
-> After designing the mappings, the code ( mappings ) is reviewed by developers.
Code Review :-
-> Code review is done by the developer.
-> In code review, the developer reviews the code and the logic, but not the data.
-> The following activities take place in code review:
-> Check the naming standards of the transformations, mappings, etc.
-> Check the source-to-target mapping ( whether the correct logic is placed in the mapping or not ).
Peer Review :-
-> The code is reviewed by a team member ( a third-party developer ).
Testing:-
--------------------------------

The following types of testing are carried out in the testing environment.
1) Unit Testing
2) Development Integration Testing
3) System Integration Testing
4) User Acceptance Testing

Unit Testing :-
-> A unit test for the DWH is white-box testing; it checks the ETL procedures and mappings.
-> The following test cases can be executed by an ETL developer:
1) Verify data loss
2) No.of records in the source and target
3) Dataload/Insert
4) Dataload/Update
5) Incremental load
6) Data accuracy
7) verify Naming standards.
8) Verify column Mapping
-> Unit testing is carried out by the ETL developer in the development phase.
-> The ETL developer also performs the data validations in this phase.
Development Integration Testing -
-> Run all the mappings in sequence order.
-> First run the source-to-stage mappings.
-> Then run the mappings related to dimensions and facts.
System Integration Testing :-
-> After the development phase, we move the code to the QA environment.
-> In this environment, the testing people are given read-only permissions.
-> They test all the workflows.
-> They test the code according to their standards.
User Acceptance Testing ( UAT ) :-
-> This test is carried out in the presence of client side technical users to verify the data migration from source to destination.

Production Environment :-
---------------------------------
-> Migrate the code into the Go-Live environment from the test ( QA ) environment.

Monday, January 14, 2013

Functions in Informatica

1 Character Functions:
1.1 LENGTH:
The LENGTH function returns the number of characters in a string, including trailing blanks. It is available in the Designer and the Workflow Manager.
LENGTH (string)
Example: The following expression returns the length of each customer name:
LENGTH (CUSTOMER_NAME)
CUSTOMER_NAME    RETURN VALUE
Leonardo         8
NULL             NULL
Edwin Britto     12
1.2 LPAD:
The LPAD function adds a set of blanks or characters to the beginning of a string, to set a string to a specified length. It is available in the Designer and the Workflow Manager.
LPAD (first_string, length [, second_string])
Example: The following expression standardizes numbers to five digits by padding them with leading zeros.
LPAD (NUM, 5, '0')
NUM    RETURN VALUE
1      00001
250    00250
1.3 LTRIM:
The LTRIM function removes blanks or characters from the beginning of a string. It is available in the Designer and the Workflow Manager.
LTRIM (string [, trim_set])
LTRIM (string) removes the leading spaces or blanks from the string. When LTRIM function is used with a trim set, which is optional, it removes the characters in the trim set from the string.
Example : The following expression removes the leading zeroes in the port
ITEM_CODE.
LTRIM (ITEM_CODE,'0')
ITEM_CODE    RETURN VALUE
006          6
0803         803
* The LTRIM function can be nested when needed to remove multiple characters.
1.4 RPAD:
The RPAD function converts a string to a specified length by adding blanks or characters to the end of the string. It is available in the Designer and the Workflow Manager.
RPAD( first_string, length [, second_string ] )
Example: The following expression returns the string with a length of 5 characters, appending the string ':' to the end of each word:
RPAD (WORD, 5, ':’)
WORD    RETURN VALUE
Date    Date:
Time    Time:
1.5 RTRIM:
The RTRIM function removes blanks or characters from the end of a string. It is available in the Designer and the Workflow Manager.
RTRIM (string [, trim_set])
The RTRIM function can be combined with the LENGTH function if the trailing blanks are to be ignored. It can also be nested when needed to remove multiple characters.
RTRIM (string) removes the trailing spaces or blanks from the string. When RTRIM function is used with a trimset, which is optional, it removes the characters in the trimset from the string.
For example,
RTRIM (ITEM_CODE,'10')
The above expression removes the characters 10 in the port ITEM_CODE.
ITEM_CODE    RETURN VALUE
0610         06
380          38
In the second example the function removes only the trailing zero: RTRIM compares the first character in the trim set with the last character of the string; since it does not match, it takes the second character in the trim set and compares it with the last character of the string; since that matches, it is removed.
1.6 SUBSTR:
The SUBSTR function returns a portion of a string. It is available in the Designer and the Workflow Manager.
SUBSTR( string, start [, length ] )
The SUBSTR may not give the desired result if the string on which it is used is not trimmed. Though it is always a good practice to trim the strings before using them in any expression, it becomes extremely important to trim them if they are used in a SUBSTR function.
For example, if there is a function
SUBSTR (NAME, 2,2)
It will not return the 2nd and 3rd characters of NAME if the port has leading spaces. In this case LTRIM becomes essential.
SUBSTR(LTRIM(NAME),2,2)
The SUBSTR function can also be used to get the last few characters as described below.
SUBSTR(NAME,-3,3)
This function will return the last three characters of the string. But it may not return the required last three characters if the port has trailing blanks, hence RTRIM is essential.
SUBSTR(RTRIM(NAME),-3,3)
Hence it is always better to trim the strings before using them in a SUBSTR function.
SUBSTR(LTRIM(RTRIM(NAME)),3,2)
The above expression will get the 3rd and 4th characters of the port NAME, whether or not the port has leading or trailing blanks.
2 Conversion Functions:
2.1 TO_CHAR:
The TO_CHAR function converts numeric values and dates to text strings. It is available in the Designer and the Workflow Manager.
TO_CHAR( numeric_value )
TO_CHAR (date [, format ] )
Example : The following expression converts the values in the SALES port to text:
TO_CHAR (SALES )
SALES        RETURN VALUE
1800.03      '1800.03'
-22.57891    '-22.57891'
The following expression converts the dates in the DATE_PROMISED port to text in the format MON DD YYYY:
TO_CHAR (DATE_PROMISED, 'MON DD YYYY' )
DATE_PROMISED            RETURN VALUE
Apr 1 1998 12:00:10AM    'Apr 01 1998'
If we omit the format_string argument, TO_CHAR returns a string in the default date format ‘MM/DD/YYYY’.
We can use Conversion functions with DATE functions in order to do some calculations.
The following composite expression converts the string DATE_PROMISED to date, adds 1 to it and then converts the same to text string with the format YYYYMMDD.
TO_CHAR(ADD_TO_DATE(TO_DATE(DATE_PROMISED),'DD',1),'YYYYMMDD')
Test functions can also be used with Conversion functions.
The following expression uses IS_DATE along with TO_CHAR.
IS_DATE(TO_CHAR(DATE_PROMISED,'YYYYMMDD'))
* TO_CHAR returns NULL if invalid Date is passed to the function.
2.2 TO_DATE:
The TO_DATE function converts a character string to a date datatype in the same format as the character string. It is available in the Designer and the Workflow Manager.
TO_DATE( string [, format ] )
Example : The following expression returns date values for the strings in the DATE_PROMISED port. TO_DATE always returns a date and time. If we pass a string that does not have a time value, the date returned always includes the time 00:00:00. If we execute a session in the twentieth century, the century will be 19.
The current year on the machine running the Informatica Server is 1998:
TO_DATE( DATE_PROMISED, 'MM/DD/YY' )
DATE_PROMISED    RETURN VALUE
'12/28/81'       Dec 28 1981 00:00:00
NULL             NULL
The format of the string must exactly be the format given in the TO_DATE function.
* TO_DATE function fails if invalid date entries are given. To avoid this we must use IS_DATE function to check if the string has a valid date to be converted.
2.3 TO_DECIMAL:
The TO_DECIMAL function converts any value (except binary) to a decimal. It is available in the Designer.
TO_DECIMAL( value [, scale ] )
Example : This expression uses values from the port IN_TAX. The datatype is decimal with precision of 10 and scale of 3:
TO_DECIMAL( IN_TAX, 3 )
IN_TAX          RETURN VALUE
'15.6789'       15.678
NULL            NULL
'A12.3Grove'    0
We can also use two conversion functions together in a single expression.
The following expression uses the functions TO_DECIMAL and TO_CHAR.
TO_DECIMAL(TO_CHAR(DATE_PROMISED,'YYYYMMDD'))
2.4 TO_FLOAT:
The TO_FLOAT function converts any value (except binary) to a double-precision floating point number (the Double datatype). It is available in the Designer and the Workflow Manager.
TO_FLOAT( value )
Example : This expression uses values from the port IN_TAX:
TO_FLOAT( IN_TAX )
IN_TAX       RETURN VALUE
'15.6789'    15.6789
NULL         NULL
2.5 TO_INTEGER:
The TO_INTEGER function converts any value (except binary) to an integer by rounding the decimal portion of a value. It is available in the Designer and the Workflow Manager.
TO_INTEGER( value )
Example : This expression uses values from the port IN_TAX:
TO_INTEGER( IN_TAX )
IN_TAX       RETURN VALUE
'15.6789'    16
'60.2'       60
3 Date Functions:
Date Format Strings in the Transformation Reference
D, DD, DDD, DAY, DY, J
Days (01-31). We can use any of these format strings to specify the entire day portion of a date. For example, if we pass 12-APR-1997 to a date function, we can use any of these format strings to specify 12.
HH, HH12, HH24
Hour of day (0 to 23), where zero is 12 AM (midnight). We can use any of these formats to specify the entire hour portion of a date. For example, if we pass the date 12-APR-1997 2:01:32 PM, we can use HH, HH12, or HH24 to specify the hour portion of the date.
MI
Minutes.
MM, MON, MONTH
Month portion of date (01 to 12). We can use any of these format strings to specify the entire month portion of a date. For example, if we pass 12-APR-1997 to a date function, we can use MM, MON, or MONTH to specify APR.
SS , SSSS
Second portion of date (0 to 59).
Y, YY, YYY, YYYY , RR
Year portion of date (1753 to 9999). We can use any of these format strings to specify the entire year portion of a date. For example, if we pass 12-APR-1997 to a date function, we can use Y, YY, YYY, or YYYY to specify 1997.
3.1 ADD_TO_DATE
The ADD_TO_DATE function adds a specified amount to one part of a date/time value, and returns a date in the same format as the specified date.
Note: If we do not specify the year as YYYY, the Informatica Server assumes the date is in the current century. It is available in the Designer and the Workflow Manager.
ADD_TO_DATE( date, format, amount )
Example : The following expression adds one month to each date in the
DATE_SHIPPED port. If we pass a value that creates a day that does not exist in a particular month, the Informatica Server returns the last day of the month. For example, if we add one month to Jan 31 1998, the Informatica Server returns Feb 28 1998.
Also note, ADD_TO_DATE recognizes leap years and adds one month to Jan 29 2000:
ADD_TO_DATE( DATE_SHIPPED, 'MM', 1 )
DATE_SHIPPED              RETURN VALUE
Jan 12 1998 12:00:30AM    Feb 12 1998 12:00:30AM
The following expression subtracts 10 days from each date in the DATE_SHIPPED port:
ADD_TO_DATE( DATE_SHIPPED, 'D', -10 )
DATE_SHIPPED             RETURN VALUE
Jan 1 1997 12:00:30AM    Dec 22 1996 12:00:30AM
The following expression subtracts 15 hours from each date in the DATE_SHIPPED port:
ADD_TO_DATE( DATE_SHIPPED, 'HH', -15 )
DATE_SHIPPED             RETURN VALUE
Jan 1 1997 12:00:30AM    Dec 31 1996 9:00:30AM
In ADD_TO_DATE function, if the argument passed evaluates to a date that does not exist in a particular month, the Informatica Server returns the last day of the month.
The following expression reveals this.
ADD_TO_DATE( DATE_SHIPPED, 'MON', 3 )
DATE_SHIPPED             RETURN VALUE
Jan 31 1998 6:24:45PM    Apr 30 1998 6:24:45PM
3.2 DATE_COMPARE
The DATE_COMPARE function returns a value indicating the earlier of two dates. It is available in the Designer and the Workflow Manager.
DATE_COMPARE( date1, date2 )
Example : The following expression compares each date in the DATE_PROMISED and DATE_SHIPPED ports, and returns an integer indicating which date is earlier:
DATE_COMPARE ( DATE_PROMISED, DATE_SHIPPED )
DATE_PROMISED    DATE_SHIPPED    RETURN VALUE
Jan 1 1997       Jan 13 1997     -1
Feb 1 1997       Feb 1 1997      0
Dec 22 1997      Dec 15 1997     1
3.3 DATE_DIFF
The DATE_DIFF function returns the length of time between two dates, measured in the specified increment (years, months, days, hours, minutes, or seconds). It is available in the Designer and the Workflow Manager.
DATE_DIFF( date1, date2, format )
Example: The following expressions return the number of days between the DATE_PROMISED and the DATE_SHIPPED ports:
DATE_DIFF ( DATE_PROMISED, DATE_SHIPPED, 'D' )
DATE_DIFF ( DATE_PROMISED, DATE_SHIPPED, 'DD' )
DATE_PROMISED             DATE_SHIPPED              RETURN VALUE
Jan 1 1997 12:00:00AM     Mar 29 1997 12:00:00PM    -87.5
Mar 29 1997 12:00:00PM    Jan 1 1997 12:00:00AM     87.5
We can combine DATE functions and TEST functions so as to validate the dates.
For example, while using the DATE functions like DATE_COMPARE and DATE_DIFF, the dates given as inputs can be validated using the TEST function IS_DATE and then passed to them if valid.
3.4 GET_DATE_PART
The GET_DATE_PART function returns the specified part of a date as an integer value, based on the default date format of MM/DD/YYYY HH24:MI:SS. It is available in the Designer and the Workflow Manager.
GET_DATE_PART( date, format )
Example: The following expressions return the day for each date in the
DATE_SHIPPED port:
GET_DATE_PART ( DATE_SHIPPED, 'D' )
GET_DATE_PART ( DATE_SHIPPED, 'DD' )
DATE_SHIPPED              RETURN VALUE
Mar 13 1997 12:00:00AM    13
June 3 1997 11:30:44PM    3
NULL                      NULL
3.5 LAST_DAY
The LAST_DAY function returns the date of the last day of the month for each date in a port. It is available in the Designer and the Workflow Manager.
LAST_DAY( date )
Example : The following expression returns the last day of the month for each date in
the ORDER_DATE port:
LAST_DAY( ORDER_DATE )
ORDER_DATE               RETURN VALUE
Apr 1 1998 12:00:00AM    Apr 30 1998 12:00:00AM
Jan 6 1998 12:00:00AM    Jan 31 1998 12:00:00AM
DATE functions combine with Conversion functions also.
The following expression has LAST_DAY and TO_DATE functions nested or combined together.
LAST_DAY( TO_DATE( GIVEN_DATE, 'DD-MON-YY' ))
3.6 MAX
The MAX function returns the latest date found in a group. It is available in the Designer.
MAX( date, filter_condition )
We can return the maximum date for a port or group.
Example: The following expression returns the maximum order date for flashlights:
MAX( ORDERDATE, ITEM_NAME='Flashlight' )
ITEM_NAME             ORDER_DATE
Flashlight            Apr 20 1998
Regulator System      May 15 1998
Flashlight            Sep 21 1998
Diving Hood           Aug 18 1998
Halogen Flashlight    Feb 1 1998
Flashlight            Oct 10 1998
RETURN VALUE: Oct 10 1998
3.7 MIN
The MIN function returns the earliest date found in a group. It is available in the Designer.
MIN( date, filter_condition )
Example: The following expression returns the oldest order date for flashlights:
MIN( ORDER_DATE, ITEM_NAME='Flashlight' )
ITEM_NAME             ORDER_DATE
Flashlight            Apr 20 1998
Regulator System      May 15 1998
Flashlight            Sep 21 1998
Diving Hood           Aug 18 1998
Halogen Flashlight    Feb 1 1998
Flashlight            Oct 10 1998
RETURN VALUE: Feb 1 1998
3.8 ROUND
The ROUND function rounds one part of a date. It is available in the Designer and the Workflow Manager.
ROUND( date [, format ] )
Example: The following expressions round the month portion of each date in the DATE_SHIPPED port.
ROUND( DATE_SHIPPED, 'MM' )
ROUND( DATE_SHIPPED, 'MON' )
DATE_SHIPPED             RETURN VALUE
Jan 15 1998 2:10:30AM    Jan 1 1998 12:00:00AM
Similarly the ROUND function can be used to round off Year, Day or Time portions.
3.9 SET_DATE_PART
The SET_DATE_PART function sets one part of a date/time value to a specified value. It is available in the Designer and the Workflow Manager.
SET_DATE_PART( date, format, value )
Example: The following expressions change the month to June for the dates in the DATE_PROMISED port. The Informatica Server displays an error when we try to create a date that does not exist, such as changing March 31 to June 31:
SET_DATE_PART( DATE_PROMISED, 'MM', 6 )
SET_DATE_PART( DATE_PROMISED, 'MON', 6 )
DATE_PROMISED            RETURN VALUE
Jan 1 1997 12:15:56AM    Jun 1 1997 12:15:56AM
NULL                     NULL
Similarly the SET_DATE_PART function can be used to set the Year, Day or Time portions.
3.10 TRUNC
The TRUNC function truncates dates to a specific year, month, day, hour, or minute. It is available in the Designer and the Workflow Manager.
TRUNC( date [, format ] )
Example: The following expressions truncate the year portion of dates in the DATE_SHIPPED port:
TRUNC( DATE_SHIPPED, 'Y' )
TRUNC( DATE_SHIPPED, 'YY' )
DATE_SHIPPED             RETURN VALUE
Jan 15 1998 2:10:30AM    Jan 1 1998 12:00:00AM
Similarly the TRUNC function can be used to truncate Month , Day or Time portions.
The functions TRUNC & ROUND can be nested in order to manipulate dates.
4 Special Functions:
4.1 DECODE
The DECODE function searches a port for the specified value. It is available in the Designer and the Workflow Manager.
DECODE( value, first_search, first_result [, second_search, second_result ]…[, default ] )
Example: We might use DECODE in an expression that searches for a particular ITEM_ID and returns the ITEM_NAME:
DECODE( ITEM_ID, 10, 'Flashlight',
14, 'Regulator',
20, 'Knife',
40, 'Tank',
'NONE' )
ITEM_ID    RETURN VALUE
10         Flashlight
14         Regulator
17         NONE
4.2 IIF
The IIF function returns one of two values we specify, based on the results of a condition. It is available in the Designer and the Workflow Manager.
IIF( condition, value1 [, value2 ] )
Example : IIF( SALES < 100, 0, SALARY )
SALES    SALARY       RETURN VALUE
150      50,000.00    50,000
50       20,000.00    0
IIF functions can be nested if there is more than one condition to be tested. But it is always a better option to go for the DECODE function when the number of conditions is large, since the DECODE function is less costly compared to the IIF function.
For example consider the following expression
IIF(MARKS>=90,'A',
 IIF(MARKS>=75,'B',
  IIF(MARKS>=65,'C',
   IIF(MARKS>=55,'D',
    IIF(MARKS>=45,'E','F')))))
The same result can be obtained with
DECODE(TRUE,
MARKS>=90,'A',
MARKS>=75,'B',
MARKS>=65,'C',
MARKS>=55,'D',
MARKS>=45,'E',
'F')
When the number of conditions increases, we can appreciate the simplicity of the DECODE function compared to the complexity of the nested IIF expression.
In both cases, if MARKS>=90 the expression returns 'A' even though the value satisfies all of the conditions, because evaluation stops at the first condition that is satisfied.
Therefore, even if a value satisfies two or more conditions, only the first one takes effect; ordering is important in IIF and DECODE functions.
4.3 ERROR:
The ERROR function causes the Informatica Server to skip a record and throw an error message defined by the user. It is available in the Designer.
ERROR( string )
Example : The following example shows how you can reference a mapping that calculates the average salary for employees in all departments of your company, but skips negative values. The following expression nests the ERROR function in an IIF expression so that if the Informatica Server finds a negative salary in the Salary port, it skips the row and displays an error:
IIF( SALARY < 0, ERROR ('Error. Negative salary found. Row skipped.'), EMP_SALARY )
SALARY    RETURN VALUE
10000     10000
-15000    'Error. Negative salary found. Row skipped.'
The below example combines two special functions, a test Function and a conversion function.
IIF(IS_DATE(DATE_PROMISED,'MM/DD/YY'),TO_DATE(DATE_PROMISED),ERROR('Invalid Date'))
4.4 LOOKUP:
The LOOKUP function searches for a particular value in a lookup source column. It is available in the Designer.
LOOKUP( result, search1, value1 [, search2, value2]… )
Example : The following expression searches the lookup source :TD.SALES for a specific item ID and price, and returns the item name if both searches find a match:
LOOKUP( :TD.SALES.ITEM_NAME, :TD.SALES.ITEM_ID, 10, :TD.SALES.PRICE, 15.99 )
ITEM_ID    ITEM_NAME     PRICE
5          Regulator     100.00
10         Flashlight    15.99
RETURN VALUE: Flashlight
5 Test Functions:
5.1 ISNULL
The ISNULL function returns whether a value is NULL. It is available in the Designer and the Workflow Manager.
ISNULL( value )
Example : The following example checks for null values in the items table:
ISNULL ( ITEM_NAME )
ITEM_NAME     RETURN VALUE
Flashlight    0 (FALSE)
NULL          1 (TRUE)
''            0 (FALSE) Empty string is not NULL
5.2 IS_DATE
The IS_DATE function returns whether a value is a valid date. It is available in the Designer and the Workflow Manager.
IS_DATE( value )
Example : The following expression checks the INVOICE_DATE port for valid dates:
IS_DATE( INVOICE_DATE )
This expression returns data similar to the following:
INVOICE_DATE             RETURN VALUE
NULL                     NULL
180                      0 (FALSE)
'04/01/98'               0 (FALSE)
'04/01/1998 00:12:15'    1 (TRUE)
'02/31/1998 12:13:55'    0 (FALSE) (February does not have 31 days)
'John Smith'             0 (FALSE)
This function can also be used to validate a date for a specified format for which the
syntax is
IS_DATE( value, format )
If the format is not specified, ‘MM/DD/YYYY’ is taken as the default format.
5.3 IS_NUMBER
The IS_NUMBER returns whether a string is a valid number. It is available in the Designer and the Workflow Manager.
IS_NUMBER( value )
Example : The following expression checks the ITEM_PRICE port for valid numbers:
IS_NUMBER( ITEM_PRICE )
ITEM_PRICE    RETURN VALUE
123.00        1 (True)
-3.45e+3      1 (True)
''            0 (False) Empty string
+123abc       0 (False)
ABC           0 (False)
-ABC          0 (False)
NULL          NULL
5.4 IS_SPACES
The IS_SPACES function returns whether a value consists entirely of spaces. It is available in the Designer and the Workflow Manager.
IS_SPACES( value )
Example : The following expression checks the ITEM_NAME port for rows that consist entirely of spaces:
IS_SPACES ( ITEM_NAME )
ITEM_NAME           RETURN VALUE
Flashlight          0 (False)
'       '           1 (True)
Regulator system    0 (False)

Saturday, August 18, 2012

Configuration of informatica - Informatica Training With Live Project

Unix interview questions


How do you send the session report (.txt) to the manager after the session completes?
Use the email variables: %a (attach a file), %g (attach the session log file).

How do you identify empty lines in a flat file in Unix, and how do you remove them?
grep -v "^$" filename

List the files in sorted order in Unix?
ls -lt (sort by last modified date)  ls -ltr (reverse order)
ls -lS (sort by file size)

How to open a database using a Unix command?
mysql -u username -h hostname -p (the -p switch prompts for the password)

What is command to check space in Unix?
Ans: df -k 

What is command to kill last background Job? 
Ans: kill $!

How will you list all hidden files?
Ans: ls -a | grep "^\."

How to kill a process forcibly?
Ans: kill -9 PID (Process Identification Number) 

How to print/display the first line of a file?
There are many ways to do this. However the easiest way to display the first line of a file is using the [head] command.
$> head -1 file.txt
No prize in guessing that if you specify [head -2] then it would print first 2 records of the file.
Another way can be by using the [sed] command. [sed] is a very powerful stream editor which can be used for various text manipulation purposes like this.
$> sed '2,$ d' file.txt
How does the above command work? The 'd' parameter basically tells [sed] to delete all the records from display from line 2 to last line of the file (last line is represented by $ symbol). Of course it does not actually delete those lines from the file, it just does not display those lines in standard output screen. So you only see the remaining line which is the 1st line.

How to print/display the last line of a file?
The easiest way is to use the [tail] command.
$> tail -1 file.txt
If you want to do it using [sed] command, here is what you should write:
$> sed -n '$ p' file.txt
From our previous answer, we already know that '$' stands for the last line of the file. So '$ p' basically prints (p for print) the last line in standard output screen. '-n' switch takes [sed] to silent mode so that [sed] does not print anything else in the output.

How to display n-th line of a file?
The easiest way to do it will be by using [sed] I guess. Based on what we already know about [sed] from our previous examples, we can quickly deduce this command:
$> sed -n '<n> p' file.txt
You need to replace <n> with the actual line number. So if you want to print the 4th line, the command will be
$> sed -n '4 p' file.txt
Of course you can do it by using [head] and [tail] command as well like below:
$> head -<n> file.txt | tail -1
You need to replace <n> with the actual line number. So if you want to print the 4th line, the command will be
$> head -4 file.txt | tail -1

How to remove the first line / header from a file?
We already know how [sed] can be used to delete a certain line from the output - by using the 'd' switch. So if we want to delete the first line the command should be:
$> sed '1 d' file.txt
But the issue with the above command is, it just prints out all the lines except the first line of the file on the standard output. It does not really change the file in-place. So if you want to delete the first line from the file itself, you have two options.
Either you can redirect the output of the file to some other file and then rename it back to original file like below:
$> sed '1 d' file.txt > new_file.txt
$> mv new_file.txt file.txt
Or, you can use an inbuilt [sed] switch '-i' which changes the file in-place. See below:
$> sed -i '1 d' file.txt

How to remove the last line/ trailer from a file in Unix script?
Always remember that the [sed] switch '$' refers to the last line. So using this knowledge we can deduce the below command:
$> sed -i '$ d' file.txt

How to remove certain lines from a file in Unix?
If you want to remove line <m> to line <n> from a given file, you can accomplish the task in the similar method shown above. Here is an example:
$> sed -i '5,7 d' file.txt
The above command will delete line 5 to line 7 from the file file.txt

How to remove the last n-th line from a file?
This is bit tricky. Suppose your file contains 100 lines and you want to remove the last 5 lines. Now if you know how many lines are there in the file, then you can simply use the above shown method and can remove all the lines from 96 to 100 like below:
$> sed -i '96,100 d' file.txt   # alternative to the command [head -95 file.txt]
But not always you will know the number of lines present in the file (the file may be generated dynamically, etc.) In that case there are many different ways to solve the problem. There are some ways which are quite complex and fancy. But let's first do it in a way that we can understand easily and remember easily. Here is how it goes:
$> tt=`wc -l file.txt | cut -f1 -d' '`; sed -i "`expr $tt - 4`,$tt d" file.txt
As you can see there are two commands. The first one (before the semi-colon) calculates the total number of lines present in the file and stores it in a variable called “tt”. The second command (after the semi-colon), uses the variable and works in the exact way as shows in the previous example.

How to check the length of any line in a file?
We already know how to print one line from a file which is this:
$> sed -n '<n> p' file.txt
Where <n> is to be replaced by the actual line number that you want to print. Now once you know it, it is easy to print out the length of this line by using the [wc] command with the '-c' switch.
$> sed -n '35 p' file.txt | wc -c
The above command will print the length of the 35th line in file.txt.

How to get the nth word of a line in Unix?
Assuming the words in the line are separated by space, we can use the [cut] command. [cut] is a very powerful and useful command and it's real easy. All you have to do to get the n-th word from the line is issue the following command:
cut -f<n> -d' '
The '-d' switch tells [cut] what the delimiter (or separator) is, which is space ' ' in this case. If the separator were a comma, we would have written -d','. So, suppose I want to find the 4th word from the string "A quick brown fox jumped over the lazy cat"; we will do something like this:
$> echo "A quick brown fox jumped over the lazy cat" | cut -f4 -d' '
And it will print "fox"

How to reverse a string in unix?
Pretty easy. Use the [rev] command.
$> echo "unix" | rev
xinu

How to get the last word from a line in Unix file?
We will make use of two commands that we learnt above to solve this. The commands are [rev] and [cut]. Here we go.
Let's imagine the line is: “C for Cat”. We need “Cat”. First we reverse the line. We get “taC rof C”. Then we cut the first word, we get 'taC'. And then we reverse it again.
$>echo "C for Cat" | rev | cut -f1 -d' ' | rev
Cat

How to get the n-th field from a Unix command output?
We know we can do it with [cut]. For example, the below command extracts the first field from the output of the [wc -c] command:
$>wc -c file.txt | cut -d' ' -f1
109
But I want to introduce one more command to do this here: the [awk] command. [awk] is a very powerful command for text pattern scanning and processing. Here we will see how we may use [awk] to extract the first field (or first column) from the output of another command. As above, suppose I want to print the first column of the [wc -c] output. It goes like this:
$>wc -c file.txt | awk ' ''{print $1}'
109
The basic syntax of [awk] is like this:
awk 'pattern space''{action space}'
The pattern space can be left blank or omitted, like below:
$>wc -c file.txt | awk '{print $1}'
109
In the action space, we have asked [awk] to take the action of printing the first column ($1). More on [awk] later.

How to replace the n-th line in a file with a new line in Unix?
This can be done in two steps. The first step is to remove the n-th line. And the second step is to insert a new line in n-th line position. Here we go.
Step 1: remove the n-th line
$>sed -i'' '10 d' file.txt       # d stands for delete
Step 2: insert a new line at n-th line position
$>sed -i'' '10 i This is the new line' file.txt     # i stands for insert

How to show the non-printable characters in a file?
Open the file in VI editor. Go to VI command mode by pressing [Escape] and then [:]. Then type [set list]. This will show you all the non-printable characters, e.g. Ctrl-M characters (^M) etc., in the file.

How to zip a file in Linux?
Use the inbuilt [zip] command in Linux.
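For example, to compress two files into one archive (the file names are illustrative):
$> zip archive.zip file1.txt file2.txt
You can add a whole directory recursively with the -r switch:
$> zip -r archive.zip mydir/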

How to unzip a file in Linux?
Use inbuilt [unzip] command in Linux.
$> unzip -j file.zip

How to test if a zip file is corrupted in Linux?
Use the "-t" switch with the inbuilt [unzip] command:
$> unzip -t file.zip

How to check if a file is zipped in Unix?
In order to know the file type of a particular file use the [file] command like below:
$> file file.txt
file.txt: ASCII text
If you want to know the technical MIME type of the file, use “-i” switch.
$>file -i file.txt
file.txt: text/plain; charset=us-ascii
If the file is zipped, following will be the result
$> file -i file.zip
file.zip: application/x-zip

How to connect to Oracle database from within shell script?
You will be using the same [sqlplus] command to connect to database that you use normally even outside the shell script. To understand this, let's take an example. In this example, we will connect to database, fire a query and get the output printed from the unix shell. Ok? Here we go –
$>res=`sqlplus -s username/password@database_name <<EOF
SET HEAD OFF;
select count(*) from dual;
EXIT;
EOF`
$> echo $res
1
If you connect to database in this method, the advantage is – you will be able to pass Unix side shell variables value to the database. See below:
$>res=`sqlplus -s username/password@database_name <<EOF
SET HEAD OFF;
select count(*) from customer where last_name='$1';
EXIT;
EOF`
$> echo $res
12

How to execute a database stored procedure from Shell script?
$> SqlReturnMsg=`sqlplus -s username/password@database <<EOF
BEGIN
Proc_Your_Procedure(… your-input-parameters …);
END;
/
EXIT;
EOF`
$> echo $SqlReturnMsg

How to check the command line arguments in a UNIX command in Shell Script?
In a bash shell, you can access the command line arguments using $0, $1, $2, … variables, where $0 prints the command name, $1 prints the first input parameter of the command, $2 the second input parameter of the command and so on.
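As a minimal sketch, suppose we save the following as a script (the name show_args.sh is hypothetical):
#!/bin/bash
# show_args.sh - demonstrate positional parameters
echo "Command name   : $0"
echo "First argument : $1"
echo "Second argument: $2"
echo "Argument count : $#"
Running it would look like:
$> ./show_args.sh hello world
Command name   : ./show_args.sh
First argument : hello
Second argument: world
Argument count : 2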

How to fail a shell script programmatically?
Just put an [exit] command in the shell script with a return value other than 0. This is because the exit code of a successful Unix program is zero. So, if you write
exit -1
inside your program, the program will exit immediately with a failure status.

How to list down file/folder lists alphabetically?
Normally the [ls -lt] command lists files/folders sorted by modified time. If you want to list them alphabetically, simply specify: [ls -l]

How to check if the last command was successful in Unix?
To check the status of last executed command in UNIX, you can check the value of an inbuilt bash variable [$?]. See the below example:
$> echo $?
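For example (the file names are illustrative), you can branch on [$?] immediately after a command:
$> cp file.txt /backup/
$> if [ $? -eq 0 ]; then echo "Copy succeeded"; else echo "Copy failed"; fi
Note that [$?] holds the status of the most recent command only, so test it before running anything else.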

How to check if a file is present in a particular directory in Unix?
We can do it in many ways. Based on what we have learnt so far, we can make use of the [ls] command and [$?] to do this. See below:
$> ls -l file.txt; echo $?
If the file exists, the [ls] command will be successful, hence [echo $?] will print 0. If the file does not exist, then the [ls] command will fail and [echo $?] will print a non-zero value.
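Another common way, shown here as a sketch, is to use the shell's own file test operator instead of [ls]:
$> if [ -f file.txt ]; then echo "Present"; else echo "Absent"; fi
The -f test returns true only if the file exists and is a regular file; use -d for directories.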

How to check all the running processes in Unix?
The standard command to see this is [ps]. But [ps] only shows you the snapshot of the processes at that instance. If you need to monitor the processes for a certain period of time and need to refresh the results in each interval, consider using the [top] command.
$> ps -ef
If you wish to see the % of memory usage and CPU usage, then consider the below switches
$> ps aux
If you wish to use this command inside some shell script, or if you want to customize the output of [ps] command, you may use “-o” switch like below. By using “-o” switch, you can specify the columns that you want [ps] to print out.
$>ps -e -o stime,user,pid,args,%mem,%cpu

How to tell if my process is running in Unix?
You can list down all the running processes using [ps] command. Then you can “grep” your user name or process name to see if the process is running. See below:
$>ps -e -o stime,user,pid,args,%mem,%cpu | grep "opera"
14:53 opera 29904 sleep 60                     0.0  0.0
14:54 opera 31536 ps -e -o stime,user,pid,arg  0.0  0.0
14:54 opera 31538 grep opera                0.0  0.0

How to get the CPU and Memory details in Linux server?
In Linux based systems, you can easily access the CPU and memory details from the /proc/cpuinfo and /proc/meminfo, like this:
$>cat /proc/meminfo
$>cat /proc/cpuinfo
Just try the above commands in your system to see how it works.



How to display the top 10 logged-in users?
who | head -10
How to display the 10 most recently modified files?
ls -lrt | tail -10
 
What is ls -ltd?
ls -ltd shows information about the current directory.
Option l: long listing
       t: sort by time stamp
       d: show information about the directory itself
          (used with the -l option)

Saturday, August 11, 2012

Informatica Parameter File Concept

Slowly Changing Dimension Type 2 Illustration Using Informatica

DWH,Informatica,Oracle(FAQ)


Data warehousing concepts (FAQ’s)
1)      What is Data warehouse?

A data warehouse is a relational database used for query analysis and reporting. By definition, a data warehouse is subject-oriented, integrated, non-volatile, and time-variant.
Subject oriented : The data warehouse is organized around a particular subject.
Integrated : Data collected from multiple sources is integrated into a user-readable, unique format.
Non volatile : Historical data is maintained.
Time variant : Data can be displayed weekly, monthly, yearly.

2)      What is Data mart?

A subset of data warehouse is called Data mart.

3)      Difference between Data warehouse and Data mart?

A data warehouse maintains data for the entire organization and can contain multiple data marts, whereas a data mart maintains data for only a particular subject.
4)      Difference between OLTP and OLAP?
OLTP is Online Transaction Processing; it maintains current transactional data, which means inserts, updates, and deletes must be fast. OLAP is Online Analytical Processing; it maintains historical data for analysis and reporting.
5)      Explain ODS?
An Operational Data Store is a part of the data warehouse architecture that maintains only current transactional data. An ODS is subject-oriented, integrated, volatile, and holds current data.
6)      Difference between Power Center and Power Mart?
Power Center receives all product functionality, including the ability to register multiple servers, share metadata across repositories, and partition data ( one repository, multiple Informatica servers ). Power Mart receives all features except the ability to register multiple servers and partition data.
7)      What is a staging area?
A staging area is a temporary storage area where data is cleaned, integrated, and processed before it is loaded into the data warehouse.
8)      Explain Additive, Semi-additive, Non-additive facts?
Additive fact: an additive fact can be aggregated by simple arithmetic addition across all dimensions.
Semi-additive fact: a semi-additive fact can be aggregated by simple arithmetic addition along only some of the dimensions.
Non-additive fact: a non-additive fact can't be added at all.
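A rough SQL sketch of the difference (table and column names are made up for illustration): a sales amount can be summed across any dimension, while an account balance must not be summed across time; instead you take the balance as of a chosen snapshot date.
-- Additive fact: safe to SUM across all dimensions
SELECT store_id, SUM(sales_amount) FROM sales_fact GROUP BY store_id;
-- Semi-additive fact: balance sums across accounts, but not across dates,
-- so restrict the query to a single snapshot date
SELECT SUM(balance) FROM account_balance_fact
WHERE snapshot_date = TO_DATE('31-DEC-1998', 'DD-MON-YYYY');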
9)      What is a Fact less Fact and example?
A fact table that has no measures; for example, a table that records event occurrences such as student attendance.
10)  Explain Surrogate Key?
 Surrogate Key is a series of sequential numbers assigned to be a primary key for the table.
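For example, in Oracle a surrogate key is commonly generated from a sequence (the object names here are hypothetical):
CREATE SEQUENCE customer_key_seq START WITH 1 INCREMENT BY 1;
INSERT INTO customer_dim (customer_key, customer_id, customer_name)
VALUES (customer_key_seq.NEXTVAL, 'C1001', 'Leonardo');
-- customer_key carries no business meaning; it simply identifies the row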
11)  How many types of approaches in DHW?
Two approaches: top-down ( Bill Inmon approach ) and bottom-up ( Ralph Kimball approach ).
12)  Explain Star Schema?
A star schema consists of one or more fact tables and a number of dimension tables that are related to the fact tables through foreign keys. Dimension tables are de-normalized; the fact table is normalized.
     Advantages: less database space & simpler queries.
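A minimal star schema sketch in SQL (illustrative names): one normalized fact table whose foreign keys point at de-normalized dimension tables.
CREATE TABLE date_dim (
  date_key      NUMBER PRIMARY KEY,
  calendar_date DATE,
  month_name    VARCHAR2(10)   -- de-normalized: stored directly, not in a separate month table
);
CREATE TABLE product_dim (
  product_key   NUMBER PRIMARY KEY,
  product_name  VARCHAR2(50)
);
CREATE TABLE sales_fact (
  date_key      NUMBER REFERENCES date_dim(date_key),
  product_key   NUMBER REFERENCES product_dim(product_key),
  sales_amount  NUMBER(10,2)
);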
13)  Explain Snowflake schema?
A snowflake schema normalizes the dimensions to eliminate redundancy: instead of being grouped into one large table, the dimension data is split into multiple related tables. Both dimension and fact tables are normalized.
14)  What is a conformed dimension?
If both data marts use the same dimension, that is called a conformed dimension: the same dimension can be used with multiple fact tables.
15)  Explain the DWH architecture?
16)  What is a slowly growing dimension?
Slowly growing dimensions are dimensions whose data grows over time without updates to existing rows; new data is simply appended to the existing dimension.
17)  What is a slowly changing dimension?
Slowly changing dimensions are dimensions whose data grows over time with updates to the existing dimension rows.
Type 1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 mapping, all rows contain current dimension data.
  Use the Type 1 mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.
Type 2: The Type 2 mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension in the table.
  Use the Type 2 mapping to update a slowly changing dimension when you want to keep a full history of dimension data in the table; version numbers and versioned primary keys track the order of changes to each dimension.
Type 3: The Type 3 mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions into the target. Rows containing changes to existing dimensions are updated in the target: when updating an existing dimension, the Informatica server saves the existing data in different columns of the same row and replaces the existing data with the update.
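As a small illustration of Type 2 versioning (hypothetical data), the same customer gets a new row and a new version number each time a tracked attribute changes; the versioned surrogate key preserves the full history:
CUSTOMER_KEY   CUSTOMER_ID   CITY        VERSION
1              C1001         Chennai     0
2              C1001         Bangalore   1
3              C1001         Hyderabad   2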
18)  When do you use a dynamic cache?
When your target table is also the lookup table, you go for a dynamic cache. In a dynamic cache, multiple matches return an error, and you can use only the = operator.
19)  what is lookup override?
It overrides the default SQL statement that the lookup generates. You can join multiple sources using a lookup override. By default, the Informatica server adds an ORDER BY clause to the lookup query.
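A sketch of what an override might look like (the table and column names are made up); the trailing comment marker is a common trick to suppress the ORDER BY clause the server appends:
SELECT I.ITEM_NAME, I.ITEM_ID, P.PRICE
FROM ITEM I, PRICE_LIST P
WHERE I.ITEM_ID = P.ITEM_ID
ORDER BY I.ITEM_ID --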
20)  Can we pass a null value in a lookup transformation?
The lookup condition does not match null values; when no match is found, the lookup returns NULL.
21)  what is the target load order?
You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
22)  what is default join that source qualifier provides?
Inner equi join.
23)  what are the difference between joiner transformation and source qualifier transformation?
You can join heterogeneous data sources in a joiner transformation, which cannot be achieved in a source qualifier transformation.
You need matching keys to join two relational sources in a source qualifier transformation, whereas a joiner needs only a join condition, not matching keys.
Two relational sources must come from the same data source to be joined in a source qualifier; a joiner can join relational sources that come from different data sources.
24)  what is update strategy transformation?       
An update strategy transformation flags each row for insert, update, delete, or reject; it determines whether the target table stores historical data or only current transactional data.
25)  Describe the two levels at which the update strategy is set?
Within a session ( the Treat source rows as property: insert, update, delete, or data driven ) and within a mapping ( the update strategy transformation flags rows for insert, update, delete, or reject ).
26)  what is default source option for update strategy transformation?
Data driven.
27)  What is data driven?
The Informatica server follows instructions coded into the update strategy transformations within the session mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose the data driven option, the Informatica server ignores all update strategy transformations in the mapping.
28)  what are the options in the target session of update strategy transformation?
  Insert
  Delete
  Update
  Update as update
 Update as insert
Update else insert
 Truncate table.
29)  Difference between the source filter and filter?
A source filter filters data only from relational sources, whereas a filter transformation can filter data from any type of source.

30)  what is a tracing level?
The amount of information sent to the session log file.
-- What are the types of tracing levels?
Normal, terse, verbose initialization, verbose data.
-- Explain sequence generator transformation?
It generates unique numeric values through its NEXTVAL and CURRVAL ports, and is typically used to create surrogate keys.
-- can you connect multiple ports from one group to multiple transformations?
   Yes
31)  can you connect more than one group to the same target or transformation?
 NO
32)  what is a reusable transformation?
A reusable transformation is a single transformation that can be used in multiple mappings. When you need to incorporate it into a mapping, you add an instance of it. Later, if you change the definition of the transformation, all instances of it inherit the changes, since each instance is a pointer to that transformation. You can change the transformation in the Transformation Developer, and its instances automatically reflect the changes. This feature can save a great deal of work.
-- what are the methods for creating reusable transformation?
      Two methods
1)      Design it in the transformation developer.
2)      Promote a standard transformation from the mapping designer. After you add a transformation to a mapping, you can promote it to the status of a reusable transformation.
Once you promote a standard transformation to reusable status, you cannot demote it back to a standard transformation.
If you change the properties of a reusable transformation instance in a mapping, you can revert to the original reusable transformation properties by clicking Revert.
33)  what are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you can define before running a session; it retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.
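A minimal parameter file sketch (the folder, workflow, session, and parameter names are all hypothetical); the heading identifies which session the values apply to:
[MyFolder.WF:wf_daily_load.ST:s_m_load_sales]
$$LOAD_DATE=01/22/2013
$$SOURCE_SCHEMA=STG
You declare $$LOAD_DATE as a mapping parameter in the mapping, then point the session or workflow at this file so the value is picked up at run time.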
34)  can you use the mapping parameters or variables created in one mapping into another mapping?
No. Mapping parameters and variables can be used only in the transformations of the same mapping or mapplet in which they were created.
35)  Can the mapping parameters or variables created in one mapping be used in a reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.
36)  How the informatica server sorts the string values in rank transformation?
When the Informatica server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.

37)  What is the rank index in rank transformation?
The Designer automatically creates a RANKINDEX port for each rank transformation. The Informatica server uses the rank index port to store the ranking position of each record in a group. For example, if you create a rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.
38)  what is the mapplet?
A mapplet is a set of transformations that you build in the mapplet designer and can reuse in multiple mappings.
39)  Difference between mapplet and reusable transformation?
A reusable transformation is a single transformation, whereas a mapplet contains multiple transformations.
40)  what is a parameter a file?
A parameter file defines the values for parameters and variables.

WORKFLOW MANAGER

41)  what is a server?
The power center server moves data from source to targets based on a workflow and mapping metadata stored in a repository.
42)  what is a work flow?
A workflow is a set of instructions that describe how and when to run tasks related to extracting, transforming, and loading data.
       -- what is session?
A session is a set of instructions that describes how to move data from source to target using a mapping.
-- what is workflow monitor?
Use the workflow monitor to monitor workflows and to stop the Power Center server.
43)  explain a work flow process?
 The power center server uses both process memory and system shared memory to perform these tasks.
Load Manager process: locks the workflow, reads the workflow tasks, and starts the DTM to run the sessions.
Data Transformation Manager ( DTM ) process: performs session validations, creates threads to initialize the session, reads, writes and transforms data, and handles pre- and post-session operations.
     The default memory allocation is 12,000,000 bytes.
44)  What are types of threads in DTM?
The main dtm thread is called the master thread.
Mapping thread.
Transformation thread.
Reader thread.
Writer thread.
Pre-and-post session thread.
45)  Explain work flow manager tools?
1)      Task developer.
2)      Work flow designer.
3)      Worklet designer.
46)  Explain work flow schedule.
You can schedule a workflow to run continuously, to repeat at a given time or interval, or you can manually start it. By default, the workflow runs on demand.
47)  Explain stopping or aborting a session task?
If the Power Center server is executing a session task when you issue the stop command, it stops reading data but continues processing, writing, and committing the data already read to the targets.
If the Power Center server cannot finish processing and committing data, you issue the abort command.
You can also abort a session by using the Abort() function in the mapping logic.
48)  What is a worklet?
A worklet is an object that represents a set of tasks. It can contain any task available in the workflow manager. You can run worklets inside a workflow, and you can also nest a worklet in another worklet. The workflow manager does not provide a parameter file for worklets.
The power center server writes information about worklet execution in the workflow log.

49)  what is a commit interval and explain the types?
A commit interval is the interval at which the Power Center server commits data to targets during a session; it is the number of rows you want to use as the basis for the commit point.
Target Based commit: The power center server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
Source-based commit: the Power Center server commits data based on the number of source rows that pass through active sources in the mapping.
User-defined commit: the Power Center server commits data based on transaction boundaries defined in the mapping ( transaction control ).
50)  Explain bulk loading?
You can use bulk loading to improve the performance of a session that inserts a large amount of data into a DB2, Sybase, Oracle, or MS SQL Server database.
When bulk loading, the Power Center server bypasses the database log, which speeds performance.
Without writing to the database log, however, the target database cannot perform rollback; as a result, you may not be able to perform recovery.
51)  What is a constraint based loading?
When you select this option, the Power Center server orders the target load on a row-by-row basis only.
Edit tasks -> Properties -> select Treat source rows as Insert.
Edit tasks -> Config Object tab -> select Constraint based.
If a session is configured for constraint-based loading and a target table receives rows from different sources, the Power Center server reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible, loading the primary key table first and then the foreign key table.
Use constraint-based loading only when the session option Treat source rows as is set to Insert.
Constraint-based load ordering allows developers to read the source once and populate parent and child tables in a single process.
52)  Explain incremental aggregation?
When using incremental aggregation, you apply captured changes in the source to the aggregate calculations in a session. If the source changes only incrementally and you can capture those changes, you can configure the session to process only the changes. This allows the Power Center server to update the target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.
Use incremental aggregation when you can capture new source data each time you run the session; use a stored procedure or a filter transformation to process only the new data.
Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from incremental aggregation; in that case, drop the table and recreate the target with the complete source data.
53)  Processing of incremental aggregation
The first time you run an incremental aggregation session, the Power Center server processes the entire source. At the end of the session, it stores the aggregate data from the session run in two files, the index file and the data file, which it creates in a local directory.

     Transformations.

--- what is transformation?
A transformation is a repository object that generates, modifies, or passes data.
54)  what are the type of transformations?
2 types: 
1) active
2) passive.
   -- explain active and passive transformation?
      An active transformation can change the number of rows that pass through it; the number of output rows is less than or equal to the number of input rows.
       A passive transformation does not change the number of rows; the number of output rows always equals the number of input rows.
55)  Difference filter and router transformation.
A filter transformation filters data on a single condition and drops the rows that don't meet the condition.
Dropped rows are not stored anywhere, not even in the session log file.
A router transformation filters data based on multiple conditions and gives you the option to route rows that don't match any condition to a default group.
56)  what r the types of groups in router transformation?
A router transformation has 2 kinds of groups: 1. input group 2. output groups.
Output groups are of 2 types: 1. user-defined groups 2. the default group.
57)  difference between expression and aggregator transformation?
An expression transformation calculates single-row values before writing to the target; it executes on a row-by-row basis only.
An aggregator transformation lets you perform aggregate calculations such as MAX, MIN, and AVG, and performs those calculations on groups.
58)  How can u improve the session performance in aggregate transformation?
Use sorted input.
59)  what is aggregate cache in aggregate transformation?
The aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.
60)  explain joiner transformation?
Joiner transformation joins two related heterogeneous sources residing in different locations or files.
--What are the types of joins in joiner in the joiner transformation?
Normal
Master outer
Detail outer
Full outer
61)  Difference between connected and unconnected transformations.
Connected transformation is connected to another transformation with in a mapping.
Unconnected transformation is not connected to any transformation with in a mapping.

62)  In which conditions we cannot use joiner transformation(limitations of joiner transformation)?
Both pipelines begin with the same original data source.
Both input pipelines originate from the same source qualifier transformation.
Both input pipelines originate from the same normalizer transformation
Both input pipelines originate from the same joiner transformation.
Either input pipeline contains an update strategy transformation.
Either input pipeline contains a sequence generator transformation.
63)  what are the settings that u use to configure the joiner transformation?
   Master and detail source.
   Type of join
  Condition of the join
64)  what is a lookup transformation?
A lookup transformation is used to look up data in a relational table or view based on a condition; by default, the lookup performs a left outer join.
65)  why use the lookup transformation?
To perform the following tasks:
Get a related value: for example, if your source includes an employee ID but you want the employee name in the target.
Perform a calculation: retrieve values such as gross sales per invoice or sales tax, rather than storing a calculated value ( such as net sales ).
Update slowly changing dimension tables: you can use a lookup transformation to determine whether records already exist in the target.
66)  what are the types of lookup?
Connected and unconnected
67)  difference between connected and unconnected lookup?
Connected lookup vs. unconnected lookup:
1. Receives input values directly from the pipeline. / Receives input values from the result of a :LKP expression in another transformation.
2. You can use a dynamic or static cache. / You can use only a static cache.
3. The cache includes all lookup columns used in the mapping ( lookup table columns included in the lookup condition and lookup table columns linked as output ports to other transformations ). / The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
4. Can return multiple columns from the same row, or insert into the dynamic lookup cache. / You designate one return port ( R ); it returns one column from each row.
5. If there is no match for the lookup condition, the Informatica server returns the default value for all output ports; if you configure dynamic caching, the Informatica server inserts the row into the cache. / If there is no match for the lookup condition, the Informatica server returns NULL.
6. Passes multiple output values to another transformation ( link lookup/output ports to another transformation ). / Passes one output value to another transformation ( the lookup/output/return port passes the value to the transformation that calls the :LKP expression ).
7. Supports user-defined default values. / Does not support user-defined default values.
68)  explain index cache and data cache?
The Informatica server stores condition values in the index cache and output values in the data cache.
69)  What are the types of lookup cache?
Persistent cache: you can save the lookup cache files and reuse them the next time the Informatica server processes a lookup transformation configured to use the cache.
Static cache: you can configure a static, or read-only, cache. By default, the Informatica server creates a static cache: it caches the lookup table and lookup values for each row that comes into the transformation. When the lookup condition is true, the Informatica server does not update the cache while it processes the lookup transformation.
Dynamic cache: if you want to cache the target table, and insert new rows into both the cache and the target, you can configure the lookup transformation to use a dynamic cache; the Informatica server dynamically inserts data into the target table.
Shared cache: you can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.

70)  Difference between static cache and dynamic cache?
Static cache vs. dynamic cache:
1. You cannot insert or update a static cache. / You can insert rows into a dynamic cache as you pass rows to the target.
2. With a static cache, the Informatica server returns a value from the lookup table or cache when the condition is true; when the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations. / With a dynamic cache, the Informatica server inserts a row into the cache when the condition is false; this indicates that the row is not in the cache or the target table, and you can pass such rows to the target table.




ORACLE:

71)  Difference between primary key and unique key?
A primary key is NOT NULL and unique.
A unique key accepts NULL values.
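A quick illustration in SQL (the table is hypothetical):
CREATE TABLE dept (
  dept_id   NUMBER PRIMARY KEY,   -- NOT NULL and unique
  dept_code VARCHAR2(10) UNIQUE   -- unique, but NULLs are allowed
);
INSERT INTO dept VALUES (1, NULL);    -- succeeds: a unique key accepts NULL
INSERT INTO dept VALUES (NULL, 'HR'); -- fails: a primary key cannot be NULL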
72)  Difference between INSTR and SUBSTR?
73)  What is referential integrity?
74)  Difference between view and materialized view?
75)  What is Redolog file?
The set of redo log files for a database is collectively known as the database's redo log.
76)  What is RollBack statement?
A database contains one or more rollback segments to temporarily store undo information. Rollback segments are used to generate read-consistent database information, to recover the database, and to roll back uncommitted transactions for users.
-- what is table space?
A database is divided into logical storage units called tablespaces. A tablespace is used to group related logical structures together.
-- How to delete the duplicate records.
-- What are the difference types of joins in Oracle?
    Self-join,equi-join,outer join.
77)  What is outer join?
An outer join returns all rows from one table, including those that don't match rows in the common column of the other table.
78)  Write a query for the top 5 salaries.
SELECT * FROM emp e WHERE 5 > (SELECT COUNT(*) FROM emp WHERE sal > e.sal);
The correlated subquery counts how many salaries are higher than the current row's; only rows with fewer than 5 higher salaries ( i.e., the top 5 ) pass the filter.
79)  What is a synonym?
A synonym is an alias for a table, view, sequence, or other schema object.
82)  What is a bitmap index and example?
A bitmap index stores a bitmap for each distinct key value; it suits low-cardinality columns such as gender or region.
83)  What is a stored procedure and its advantages?
A stored procedure is a named PL/SQL block stored in the database; its advantages include reusability, reduced network traffic, and centralized business logic.
84)  Explain cursor, and how many types of triggers are there in Oracle?
A cursor is a pointer to the result set of a query that lets you process the rows one at a time. A trigger is like a stored procedure that Oracle executes automatically when a triggering event ( INSERT, UPDATE, or DELETE ) occurs; triggers can be row-level or statement-level, each firing BEFORE or AFTER the event.
85)  Difference between function and stored procedure?
A function returns a value; a procedure does not return a value ( though it can pass values back through OUT and IN OUT parameters ).
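A minimal sketch of the difference (the names are hypothetical):
CREATE OR REPLACE FUNCTION get_bonus (p_sal NUMBER) RETURN NUMBER IS
BEGIN
  RETURN p_sal * 0.10;  -- a function must RETURN a value
END;
/
CREATE OR REPLACE PROCEDURE calc_bonus (p_sal IN NUMBER, p_bonus OUT NUMBER) IS
BEGIN
  p_bonus := p_sal * 0.10;  -- a procedure passes results back through OUT parameters
END;
/
A function can be called inside a SQL statement ( SELECT get_bonus(sal) FROM emp ), while a procedure is invoked on its own, for example from an anonymous block.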
86)  Difference between REPLACE and TRANSLATE?
REPLACE substitutes an entire matching substring, whereas TRANSLATE substitutes characters one for one.
87)  Write the query for the nth max salary.
SELECT DISTINCT a.sal FROM emp a WHERE &n = (SELECT COUNT(DISTINCT b.sal) FROM emp b WHERE a.sal <= b.sal);

88)  Write the query for odd and even rows.
Odd rows:  SELECT * FROM emp WHERE (rowid, 1) IN (SELECT rowid, MOD(rownum, 2) FROM emp);
Even rows: SELECT * FROM emp WHERE (rowid, 0) IN (SELECT rowid, MOD(rownum, 2) FROM emp);
