Saturday, March 5, 2011

How to remove duplicate records with out sort

How to remove duplicate records with out sort

Suppose if the records like below:

345, bcd, 789
123, abc, 456
234, abc, 456
712, bcd, 789

Output should be
345, bcd, 789
123, abc, 456

Key for the records is 2nd and 3rd fields. fields are separated by colon(,).

Ans : nawk -F',' '!x[$2$3]++' input_file

Ans : awk '!($2$3 in a){a[$2$3]++;print}' FS=',' file

How about this code?

for i in `cat filename | cut -d "," -f2 | uniq`
do
sed -n '/'${i}'/{p;q;}' filename
done

Or:

Ans : awk '!x[$2,$3]++' FS="," file

Use nawk or /usr/xpg4/bin/awk on Solaris.

you can use 'uniq' command also to remove the duplicate records which is very easy to use. Check out the man pages of this command

You need a sorted input for uniq ....

yes I agree with you that uniq like a sorted input

Oracle NVL function

The NVL function is used to replace NULL values by another value.

The syntax for the NVL function is:
NVL( value_in, replace_with )

value_in if the function to test on null values. The value_in field can have a datatype char, varchar2, date or number datatype.
replace_with is the value that is returned if value_in has a null value. The NVL statement works like this pl/sql code:

if (value_in is NULL) then
return replace_with;
else
return value_in;
end if;

Sample code:

select nvl(salary, 0) from employees;

select nvl(ref_code,'Unknown') from users;

what is sql override in lookup transformation

ANS : It overrides the default sql statement to query the lookup table. specifies the sql statement you want the informatica server to use for quering lookup values. use only with the lookup cache enabled. enter only the select,from and where clauses when you enter the sql override
donot enter an orderby clause ,the informatica server always generates an order by clause, even if you enter one in the override

Stored Procedure Transformation :
It is a passive and both connected and unconnected transformation. It is useful to automate time consuming tasks and also used to create and drop the indexes and also used in error handling.It must exist in the database before creating the stored procedure transformation.

Sequence Generator Transformation :
It is a passive and connected transformation. It is used to create unique primary key values

Lookup Transformation :
It is a passive and both connected and unconnected transformation. It is used to lookup data from in a relational table view or synonym. Lookup definition can be imported either from source or target tables

What is the difference between SQL Overriding in Source qualifier and Lookup transformation?

ANS :
1. In LOOKUP SQL override if you add or subtract ports from the SELECT statement the session fails.

2. In LOOKUP if you override the ORDER BY statement the session fails if the ORDER BY statement does not contain the condition ports in the same order they appear in the Lookup condition

You can use SQL override in lookup if you have
1. More than one look up table
2. If you use where condition to reduce the records in the cache.

If you write a query in source qualifier(to override using sql editor) and press validate you can recognise whether the querry written is right or wrong.
But in lookup override if the querry is wrong and if you press validate button.You cannot recognise but when you run a session you will get error message and session fails


EXECUTE THE WORKFLOWS FROM COMMAND PROMPT IN WINDOWS ENVIRONMENT

c:\Informatica>Powercenter8.6.0>Server>bin>pmcmd

pmcmd>connect -sv ravana -d Ravan -u Adminstrator -p Adminstrator

connected to integration service[ravan]

here -sv is server name
-d is domain name, -u is user name, -p is password

pmcmd>start workflow -f practice sanju_agg

here -f practice is practice folder name, sanju_agg is workflow name

Dimensional Table:- It contains detailed information of one
Particular department

Eg:- Customer dimension it contains all the information
about the customer like name, place, DOB, city n so on along
with one primary key.

What is operational data source (ODS)?

ODS captures Day to day transactions .and u can generate
report on ODS

Difference between stop and abort

Stopping a session task means the server stops reading data.

Abort has timeout of 60 sec , If the server is not finished
processing and committing data by the timeout ,the threads
and process assosiated with the sessions are killed.

abort stops the session forcefully so in less time it stops
our task as compared to stop.

What is Dimensional Modelling

In Dimensional Modeling Data is stored in two kinds of tables: Fact Tables and Dimension tables.
Fact Table contains fact data e.g. sales revenue profit etc.....
Dimension table contains dimensional data such as Product Id product name product description etc.....