Surrogate key is a unique identification key, it is like an artificial or alternative key to production key, bz the production key may be alphanumeric or composite key but the surrogate key is always single numeric key.
============================
Surrogate key is the primary key for the Dimensional table.
===============================
surrogate key is a substitution for the natural primary key.
It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.
Data warehouses typically use a surrogate, (also known as artificial or identity key), key for the dimension tables primary keys. They can use Infa sequence generator, or Oracle sequence, or SQL Server Identity values for the surrogate key.
It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.
Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users) but ,not only can these change, indexing on a numerical value is probably better and you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system and as far as the client is concerned you may display only the AIRPORT_NAME.
2. Adapted from response by Vincent on Thursday, March 13, 2003
Another benefit you can get from surrogate keys (SID) is :
Tracking the SCD - Slowly Changing Dimension.
Let me give you a simple, classical example:
On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in your Employee Dimension). This employee has a turnover allocated to him on the Business Unit 'BU1' But on the 2nd of June the Employee 'E1' is muted from Business Unit 'BU1' to Business Unit 'BU2.' All the new turnover have to belong to the new Business Unit 'BU2' but the old one should Belong to the Business Unit 'BU1.'
If you used the natural business key 'E1' for your employee within your datawarehouse everything would be allocated to Business Unit 'BU2' even what actualy belongs to 'BU1.'
If you use surrogate keys, you could create on the 2nd of June a new record for the Employee 'E1' in your Employee Dimension with a new surrogate key.
This way, in your fact table, you have your old data (before 2nd of June) with the SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the SID of the employee 'E1' + 'BU2.'
You could consider Slowly Changing Dimension as an enlargement of your natural key: natural key of the Employee was Employee Code 'E1' but for you it becomes
Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2.' But the difference with the natural key enlargement process, is that you might not have all part of your new key within your fact table, so you might not be able to do the join on the new enlarge key -> so you need another id.
========================
When creating a dimension table in a data warehouse, we generally create the tables witha system generated key to unqiuely identify a row in the dimension. This key is also known as a surrogate key. The surrogate key is used as the primary key in the dimension table. The surrogate key will also be placed in the fact table and a foreign key will be defined between the two tables. When you ultimately join the data it will join just as any other join within the database. should you need any further assistance pls revert to this mail id venkatdba2000@yahoo.com or venkata.veluri@gmail.comRegardsVen (Venkat)
===========================
Surrogate Key a simple concept.
Correct n exact answer for SURROGATE KEY IS BELOW:
Definition of Surrogate Key:
Alternate of Primary Key that allows duplication of datas/records.
Need, Where & Why we use Surrogate Key:
OLTP is of "Normalised Form" whereas OLAP (i.e.) Datawarehouse is of "De-normalised form".
Actually the DWH concept is to maintain the historic datas for analysing. So its should denormalized form.
To be denormalise form duplication should be allowed in DWH. When datas entering the DWH Surrogate key a new column named serial number is introduced to allow duplication in OLAP Systems to maintain historic datas.
You all know one thing a single mobile is used by other person if it is not in use for more than one year. how is it posssible just because of this Surrogate Key.
===========================
I think there already enough statements of what a surrogate key is. From experience, the huge advantage:-
Our warehouse captured data from an existing system which identified SALES by a unique number. We implemented this as the "Natural" or "Business" Key but also generated a unique "Surrogate" key. Seems pointless as the incoming business key is numeric already. But we did it.
Six months into the project, the users announced the feeder system would be replaced, but the two SALES systems would run in parallel for a year. Our existing system had a numeric natural key, the new system had a unique ID of a 20 character field.
We simply added the 20 character field to the existing dimension, and the ETL looked up using the appropriate key (existing system key or new replacement ID). Either way it matched and was converted a source system independent surrogate key.
No fuss, no re-design. It just worked. Surrogate key importance proven.
So there's little room for confusion:-
Dimension: SALESMAN
SALES_ID Number not null /* Primary Key */
LEGACY_SALE_IDENTIFIER Number /* Legacy system business key */
NEW_SALE_IDENTIFIER varchar(20) /* New system business key */
...Other attributes...
The SALES_ID the the "Surrogate key" which is the Foreign Key link to the Fact Table(s). This key is simply a generated sequence whenever a new SALESMAN entry is created.
The design (as already described) can be extended to support type 2 Slowly changing dimensions.
====================================
No comments:
Post a Comment