Stage: Analyze
High Level Design Document (HLD)
IPC & Security Reference Manager (SRM) – Data Migration
Document Version: 1.0
17/02/2012
| Project Name | Client | Process Name |
| Index Platform Consolidation | Russell Investments | |

| Release Date | Version No. | Summary of Changes | Prepared / Modified By | Reviewed By |
| 14-Feb-2012 | 0.1 | Initial version | Vaithi | |
| 16-Feb-2012 | 0.2 | Updated Detailed Architecture for IPC and SRM in Sections 3.3, 3.5 and 3.7 | Shyam Sundar | |
| 21-Feb-2012 | 0.3 | Updated ETL Framework and Execution Components in Section 4 | Shyam Sundar | |
1 Executive Summary
Russell Indexes launched the Platform Consolidation Program with the primary goal, as identified by Russell's Board of Directors and the Indexes business unit, of reducing operational risk in the Indexes business unit: the risk of loss resulting from inadequate or failed internal processes, people, systems, or external events.
The Neptune Project, within the Platform Consolidation Program, tackles consolidation of the platform supporting Index Operations' business processes. The objective of this project is to consolidate the Global and US Index operations onto a single platform. This consolidation will result in a single, standardized index-production unit using uniform global-production procedures, executed in multiple locations. To achieve this, the sub-project "Data Migration and Integration" within the Index Platform Consolidation program will migrate data from the existing source systems to the IPC applications in a phased manner.
· Release 0 is planned to migrate static data and securities for the Security Reference Manager (SRM). This release will integrate securities data from multiple source systems and then provide a unique security definition and a unique internal security identifier for use by all existing and future systems within the Russell Indexes division.
· Release 1 is planned to migrate the static data required to create securities and indexes, and all Russell universe securities, for the IPC applications.
· Release 2 is planned to migrate the complete data from the Global systems (UK).
· Release 3 is planned to migrate the complete data from the Eagle Pace systems (US).
1.1 Introduction
This document provides the high-level design and overall design considerations for the Data Migration and Integration phase for Release 0 (SRM) and Release 1 (IPC).
1.2 Purpose
This document explains the Data Migration and Integration process for IPC and SRM from the following applications, which will be the sources of data for migration and integration into the target database:
· US Market – Production System – Eagle Pace database
· Global Market – Production System – Sybase "Global Research" database
· R&R – Global Research system – SQL Server "Global Index" database
The document provides the following information related to the design and operation of the ETL processes required to load current registration cycle-point information:
· An overview of the high-level Extract, Transformation, and Load (ETL) architecture to support loading source data into the target system.
· The detailed ETL architecture that identifies data sources, transformations and procedures.
· The operational process flow.
1.3 Scope
The scope of this document is to define the strategy for the various phases of data migration. The phases in this data migration project are as follows:
o Design
o Construction
o Testing
o Pre-Implementation / Dry Runs
o Implementation / Production data migration
1.4 System Description
The scope of the data migration is to migrate the data from the existing multiple source systems to the SQL Server database. The system architecture related to these systems is described below.
1.4.1 Source System Description
| S.No | Operating System | Software Platform | Database |
| 1 | Windows 2008 Server | Eagle Pace – US | SQL Server 2005 |
| 2 | Unix | Portico Application – UK | Sybase 11.x |
| 3 | Windows 2008 Server | Research – RnR | SQL Server 2005 |
1.5 Target System Description
| S.No | Operating System | Software Platform | Database |
| 1 | Windows 2008 Server | Security Reference Manager (SRM) | SQL Server 2008 |
| 2 | Windows 2008 Server | Index Platform Consolidation (IPC) Application | SQL Server 2008 |
1.6 Methodology and Tools
| Environment | Platform / Tool |
| Database Server | Microsoft Windows 2008 Server Enterprise Edition; SQL Server 2008 R2 |
| Development | SQL Server 2008 Business Intelligence Development Studio; 32-bit ODBC driver to support Sybase connectivity |
1.7 Definitions, Acronyms and Abbreviations
| ETL | Extraction – Transformation – Loading |
| OLTP | Online Transaction Processing |
| SSIS | SQL Server Integration Services |
| HEX or Hexaware | Hexaware Technologies Ltd. |
| IPC | Index Platform Consolidation |
| ASIS / Source | Source systems currently being used |
| To Be / Target | Target or proposed system |
| AMD | Apply Market Data |
| Target System | IPC – Eagle Pace |
| DM | Data Migration |
| SRM | Security Reference Manager |
| FS | Functional Specification |
1.8 Assumptions
· The target data model will be developed iteratively and so may undergo several changes; source data analysis therefore has to be performed against the evolving target data model. Once the target data model is baselined, unmapped fields in the source will be analyzed further to confirm whether they can actually be ignored.
· Russell will define the strategy, analysis, design and construction of scripts for data cleansing. Hexaware will support and complement this.
· The production cut-over window for implementation is expected to be 48 hours over a weekend. This could change based on the volume of records and the relationships between tables, which define the order of migration.
· The current strategy is to extract the data from the source systems using SSIS, and to extract, transform and load it into the SQL Server target database.
· Existing master data will be extracted into Excel sheets, and the new target data is mapped one to one manually.
· Data to be migrated is frozen before the start of the migration.
· There will not be any explicit lock on the data to be migrated by any of the applications accessing the data during the outage window.
· The current existing model is baselined and assumed to be 100% complete.
· The scope of the data migration project is to migrate only the data that will be accessed by the target application system.
2 Standards, Conventions and Procedures
2.1 Programming Standards
The following SharePoint link points to the SSIS Development Guide:
http://project-sharepoint/PWA/Program%2000001%20Enterprise%20Data%20Management/201089%20EDM%20Program%20and%20Data%20Factory/Project%20Documents/Forms/AllItems.aspx
The following SharePoint link points to the Data Modeling and Architecture Standards:
http://project-sharepoint/PWA/Program%2000001%20Enterprise%20Data%20Management/201089%20EDM%20Program%20and%20Data%20Factory/Project%20Documents/Forms/AllItems.aspx
3 Design Illustration
3.1 Introduction
A High Level Design (HLD) document provides an overview of a solution, platform, system, or process. It usually includes a high-level architecture diagram depicting the components, interfaces and networks that need to be further specified or developed in the Low Level Design. The document may also depict, or otherwise refer to, workflows and/or data flows between component systems.
The objective of this document is to elaborate and provide insight on all stages of the Data Migration and Integration life cycle.
The High Level Design Document content includes:
· Overall High Level Design Architecture
· Key Technical Requirements
· Key Technical Design Decisions and Criteria
· Solution Overview
· List of dependencies and prerequisites
· Project Risks and any assumptions made
3.2 High Level Architecture for IPC and SRM
The following diagram depicts the overall high-level architecture for the IPC and SRM Data Migration and Integration layers.
- Source Layer:
o All the source systems, Eagle Pace – US, Global – UK (Sybase DB) and Research (RnR), are available in the source area.
- Landing Area:
o All the source tables identified from US-Eagle, UK-Global and RnR for migration into SRM will be imported as landing tables. The table structure will be the same as in the source system and the complete data will be extracted (see the sketch after this list).
- Staging Area 1:
o Source data includes Company/Security/Cross Reference information with all the date columns, status flags and legacy info source.
o Security history data is not extracted.
o Only securities data with the latest effective/updated date is extracted.
o Based on the data analysis, the fields that need to be cleansed are identified. Data cleansing is required to ensure that only accurate, consistent and complete data is loaded into the target database. Data cleansing will be required for:
§ Junk characters
§ Invalid values
§ Values not within the range of the field
§ Format consolidation (e.g., dates, amount fields)
§ Referential integrity (e.g., an affiliate RUT in any transaction table should also be present in the affiliate master)
- Staging Area 2:
o The table structure will be the same as in the target system.
o Data from Staging 1 is mapped one to one and converted from the source data format into the destination format.
o Data transformation applied at this stage includes code conversion, reformatting, consolidation, summation, etc.
- Target System:
o Data is finally moved from Staging Area 2, with the proper data element mappings, to the target system.
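As an illustration of the Landing Area pattern above, the sketch below pulls one source table in full into a landing table whose structure mirrors the source. The schema, table and server names (Landing.SECURITY, SourceServer.EaglePace) are assumptions; in the actual solution the extract is performed by SSIS data flow tasks rather than hand-written SQL.

-- Minimal sketch: full extract of one source table into the landing area.
-- All object names are illustrative; the real load is done by SSIS packages.
IF OBJECT_ID('Landing.SECURITY', 'U') IS NOT NULL
    DROP TABLE Landing.SECURITY;

SELECT *
INTO   Landing.SECURITY                       -- landing table keeps the source structure as-is
FROM   SourceServer.EaglePace.dbo.SECURITY;   -- linked-server reference to the US Eagle Pace source (assumed)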
3.3 Detailed Architecture for IPC and SRM
The following diagram depicts the overall SRM detailed ETL architecture.
All the security-related data will be loaded through the ETL process to the staging areas, and from there to the target, as follows:
- All the source tables identified from US-Eagle, UK-Global and RnR for migration into SRM will be imported as landing tables by ETL jobs. The table structure is the same as in the source system and the complete data will be extracted.
- Once the data is loaded into the landing zone, it is moved to Staging Area 1. Between the landing area and Staging Area 1, data cleansing and validation rules are applied (see the sketch after this list), such as:
§ Date values must be within 75 years
§ Checks for null data
§ Format consolidation (e.g., dates, amount fields)
- Once the data is loaded into Staging Area 1, it is moved to Staging Area 2. Between Staging Area 1 and Staging Area 2, transformation rules are applied and the mappings are checked against the target mappings; here the table structure is the same as the target database table structure. Transformation rules include:
§ Checking the mappings
§ Checking the data types
- Once the data is loaded into Staging Area 2, it is loaded into the target database by the wrapper procedure.
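The cleansing and validation rules listed above can be expressed as set-based checks when rows move from the landing tables into Staging Area 1. The sketch below is illustrative only; the table and column names (Landing.SECURITY, Staging1.SECURITY, IssueDate, SecName) are assumptions, and in the actual solution these rules run inside SSIS data flow and Execute SQL tasks.

-- Illustrative validation pass from the landing area into Staging Area 1.
-- Rows that fail a rule are kept out of staging and can be routed to a reject table instead.
INSERT INTO Staging1.SECURITY (SecurityId, SecName, IssueDate, Currency)
SELECT  s.SecurityId,
        LTRIM(RTRIM(s.SecName)),                                      -- basic cleansing of stray whitespace
        CONVERT(DATE, s.IssueDate),                                   -- format consolidation for date fields
        UPPER(s.Currency)
FROM    Landing.SECURITY AS s
WHERE   s.SecurityId IS NOT NULL                                      -- check for null data in key fields
  AND   ISDATE(s.IssueDate) = 1                                       -- reject unparseable dates
  AND   CONVERT(DATE, s.IssueDate) >= DATEADD(YEAR, -75, GETDATE());  -- date values must fall within 75 years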
3.4 High Level Architecture for Static Data
The following diagram depicts the overall SRM static data ETL architecture.
These data and data types tend to be created once, are maintained over a long time frame, and are shared and used by a number of business activities. This is the information the system needs for proper operation. It is the single access point for the creation, update and deletion of content relevant for SRM in performing its functions, such as companies, securities, and others. The required data is validated across the three different source systems, US-Eagle Pace, UK-Global and Research, and then the data is mastered.
The following static data content is identified as related to securities information:
· Country
· Region
· Exchanges
· Security Classification
· GICS, ICB, RGS Classification
· Industry Group, Industry, Sub-Industry, Sector
· Currencies
· Caps Style
· Asset Classification
· Asset Sub-Classification
· Static data content such as Currencies, Countries, Exchanges, etc., will be extracted from each of the source systems US-Eagle, UK-Global and RnR.
· Data extracted from each source will be checked against the mastering data.
· Using Excel sheets, the data is arranged into the structure required to load the target system.
· From the Excel sources, the data is loaded into the staging area and then loaded to the target system.
3.5 Detailed Architecture for Static Data
The following diagram depicts the overall SRM detailed static data ETL architecture.
The static data will be loaded to the target through the following staging areas. The static SRM database is mainly used for reference of securities.
- All the source tables identified from US-Eagle, UK-Global and RnR for static data migration into the SRM database will be imported into the landing area (Excel source) by a manual process; these Excel sheets maintain unique records for each database.
- Once the data is loaded into the landing zone, it is moved to the staging area. Between the landing and staging areas, mapping and transformation rules are applied:
§ Checking the mappings
- Once the data is loaded into the staging area, it is loaded into the target database by the wrapper procedure.
3.6 High Level Architecture for Data Synchronization
The following diagram depicts the SRM data synchronization.
The purpose of data synchronization is to maintain the consistency of data contained in multiple applications, databases or systems. The need for data synchronization can be either permanent (data synchronization between operational systems) or temporary (for example, data synchronization for certain periods). Data synchronization can be either mono-directional or bi-directional.
In the database delta approach, the system looks at each database table that can be synchronized and evaluates whether a record has been changed, inserted or deleted. Changed records are the easiest to identify and update. New records need to take into account identity fields, particularly if the ID field is referenced in other places. It then becomes the responsibility of the data synchronization controller to ensure that the new ID, assigned by the server, cascades to the referencing tables before the data is synchronized.
For each of the source systems US-Eagle and UK-Global, securities-related records that have been changed or inserted can be detected using the Binary Checksum algorithm provided in SQL Server (see the sketch after this list). Data synchronization will be mono-directional, i.e. from the source systems.
· Source data related to securities will be extracted and then either updated or inserted into the landing area.
· In the landing area tables, a binary checksum is maintained for each row along with its index ID. This is compared with the previously stored value to detect changed and new records.
· Changed values and new records are extracted to Staging 1 and then transformed to Staging 2.
· Source record changes are updated in the SRM database and then moved to the RnR database through a flat file.
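A minimal sketch of the checksum-based delta detection described above is shown below. It assumes a table holding the previously stored checksum per row (Landing.SECURITY_CHECKSUM with a RowChecksum column) and compares it with a freshly computed BINARY_CHECKSUM over the incoming source columns; all table and column names are illustrative.

-- Illustrative delta detection using SQL Server's BINARY_CHECKSUM.
-- New rows and rows whose checksum differs from the stored value are picked up for Staging 1.
SELECT  src.SecurityId,
        src.SecName,
        src.Currency,
        BINARY_CHECKSUM(src.SecName, src.Currency) AS NewChecksum
FROM    Landing.SECURITY AS src
LEFT JOIN Landing.SECURITY_CHECKSUM AS prev
       ON prev.SecurityId = src.SecurityId
WHERE   prev.SecurityId IS NULL                                          -- newly inserted record
   OR   prev.RowChecksum <> BINARY_CHECKSUM(src.SecName, src.Currency);  -- changed record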
3.7 Detailed Architecture for Data Synchronization
· For each of the source systems US-Eagle and UK-Global, securities-related records that have been changed or inserted are detected using the Binary Checksum algorithm provided in SQL Server. Data synchronization will be mono-directional, i.e. from the source systems.
· Source data related to securities will be extracted and then either updated or inserted into the landing area.
· After the data load into the landing zone is complete, the landing zone data is checked against the reconciliation database.
· Metadata is maintained at this stage, covering details such as logging, configuration and auditing.
· Landing data is moved to Staging Area 1 by the ETL process. At this stage data cleansing, data validation and the binary checksum algorithm are applied. Cleansing and validation rules include:
§ Date values must be within 75 years
§ Checks for null data
§ Format consolidation (e.g., dates, amount fields)
· Rejected records are maintained at this stage in the rejected database.
· After the data load into Staging Area 1 is complete, the Staging Area 1 data is checked against the reconciliation database.
· Metadata is maintained at this stage, covering details such as logging, configuration and auditing.
· Staging Area 1 data is moved to Staging Area 2 by the ETL process. At this stage data transformation rules are applied, including:
§ Checking the mappings
§ Checking the data types
· Rejected records are maintained at this stage in the rejected database.
· After the data load into Staging Area 2 is complete, the Staging Area 2 data is checked against the reconciliation database.
· Metadata is maintained at this stage, covering details such as logging, configuration and auditing.
· Staging Area 2 data is moved to the SRM database by the ETL process, using the wrapper procedure.
· Rejected records are maintained at this stage in the rejected database.
· After the data load into the SRM database is complete, the SRM data is checked against the reconciliation database, and the SRM data is then moved to the RnR database through a flat file.
3.8 Sequence of Execution Flow
· The static data migration is performed by taking the data from the three source systems (US, Global and Research) and loading it into an Excel sheet using ETL jobs.
· This Excel sheet now holds the consolidated information of the three source systems. Using ETL package transformations, the data is loaded into the staging database.
· Finally, the staging data is passed through ETL packages and loaded into the target database. This loads the static data from the source systems into the SRM target database.
· In the SRM data migration approach, data from the three systems (US, Global and Research) is loaded into the SRM landing database with the help of ETL jobs.
· The data from the SRM landing database is cleansed and loaded into the Staging 1 database using ETL packages. The data from Staging 1 is then passed on to the Staging 2 database after the corresponding transformation rules are applied.
· Finally, the transformed data from the three sources is moved to the SRM target database using ETL packages.
4 ETL Framework
The concept of the ETL Framework is to provide a set of terminology and a focus by which the interface requirements and common processes can be identified, and reusable designs and components can be provided to streamline design, build, deployment and ongoing production running.
In a Data Migration and Integration project that uses SSIS as the ETL tool, the first step is to build an ETL framework. This framework handles error checking, alert notification, task failures, logging, execution history and batch control.
4.1 ETL Definition
'Extract, Transform, Load' (ETL) is the technology specialized in:
· Extraction of data from one or more sources;
· Transformation of the data (e.g. cleansing, reformatting, standardization, aggregation, or the application of any number of business rules); and
· Loading of the resulting data set into specified target systems or file formats.
4.2 ETL Framework Architecture
Execution Components
- Master Package – ETL batch processing along with the configuration details.
- Execution Package – Provides the workflow for activities.
- Container – Provides the ETL Framework for activities.
- Data Flow – Responsible for all ETL processing.
- Metadata Information – Contains all the metadata information such as log details, audits and configuration details.
ETL Framework Capabilities
- This functionality is supported by the ETL Framework and is implemented by a combination of the Master Package, Execution Packages and Containers. These ETL Framework capabilities provide support for logging, exception handling, email notifications and reconciliation reports.
Metadata Information
The metadata information such as configuration details, logs and audit data is stored in the database and is used for ETL package execution.
- Configuration – Contains both runtime and system configuration information.
- Log – Populated with all Master Package, Activity and Error information from ETL batch runs.
- Audit – Populated with pre-update and deleted records from ETL processing.
Utilities
- Excel Reporting – Provides detailed information on ETL batch execution.
4.3 ETL Master Package
· The diagram below provides an overview of capabilities common to most ETL frameworks. It covers the configuration and logging features. In addition, most frameworks also include a collection of SSIS templates that interface with the ETL Framework and demonstrate core ETL patterns. The execution package groups all the activities and helps to extract data, validate it, apply cleansing rules and perform the data mapping operation.
The SRM Master Package creates the package template and schedules it dynamically as per the vendor SLA.
o Logging – Contains all custom logging activity from master and child package execution details.
o Configurations – All required database connection string information; consists of the technical metadata tables used to drive ETL package execution, including configuration tables and execution package sequencing.
o Reports / Audit Trail – The logging activity is presented to users through a predefined email notification. The logging activity can also be customized at any point in time.
o The execution package helps to extract data, validate it, cleanse it, apply transformation rules and store the result in the target table.
The following diagram depicts the overall SRM DM ETL components.
The diagram below depicts the steps of the ETL execution package.
o ETL Execution Package – The SRM execution package performs file validation, creates the staging area, validates the data logs, and provides the workflow for one or more data flows. In addition, this ETL Framework's execution package template supports efficient set-based updates that execute after a data flow completes.
o The SRM execution package has three main phases:
o Landing Area
o Staging 1 Area
o Staging 2 Area
o The landing zone gets its input data from the three sources: the Global Sybase, US Eagle Pace and Research databases. Data is extracted from the source systems and loaded into the landing environment.
o The data from the landing database is cleansed, validated, transformed and finally loaded into the Staging 1 database.
o The Staging 1 environment now holds the cleansed and validated data. The transformation rules are applied to this data and the result is loaded into Staging 2.
o The data from Staging 2 is then moved to the target SRM database with the help of the wrapper procedure.
4.4 ETL Execution Components
4.4.1 Extraction Stage (Landing)
The data is extracted from the three source systems: the US Eagle Pace, Global UK and Research databases.
This extracted data is mapped to the landing database tables using one-to-one mapping. The landing database then holds the raw data from all three sources.
4.4.1.1 Entry Criteria
The entry criteria for the landing area are the source data from all three systems: US Eagle Pace, Global UK and the Research databases.
4.4.1.2 Approach
The ETL (SSIS) approach is used here to load data from the source systems into the landing area. The data flow tasks direct the flow to the OLE DB source task, which gets the input from the US, Global and Research databases. This source data is then loaded into the landing database using one-to-one mapping.
4.4.1.3 Extraction Process
The data is extracted from the source systems using ETL SSIS tasks and containers. Once the data is extracted, it is loaded into the landing database.
4.4.1.4 Reconciliation
A reconciliation report is generated at the end of this process, providing the record counts of the source and landing tables. This report helps us to validate the data.
4.4.1.5 Error Handling
The data is loaded from the source systems into the landing tables. If the process fails in between, the ETL error handling operation is performed and the cause of the error is tracked.
4.4.1.6 Exit Criteria
The raw data from all three systems is now available in the landing tables.
4.4.2 Staging 1 (Data Cleansing & Validation)
4.4.2.1 Entry Criteria
The landing database is now the source system; it holds the consolidated data from all three sources: US, Global and Research.
4.4.2.2 Approach
The data from the landing database is first validated with the help of SSIS transformation tasks. Once the validation is complete, the cleansing rules are applied and the data is loaded into the first staging area.
4.4.2.3 Extraction Process
4.4.2.4 Reconciliation
A reconciliation report is generated at the end of this process, providing the record counts of the landing and staging tables. This report helps us to validate the data between the source and staging areas.
4.4.2.5 Error Handling
Error handling takes place when the data is migrated from the landing tables to the staging tables. The ETL handles the errors by default and logs the failure details to the relevant tables.
4.4.2.6 Exit Criteria
The staging table now contains the validated and cleansed data. This is passed on to Staging Area 2 for processing.
4.4.3 Staging 2 (Data Transformation & Mapping)
4.4.3.1 Entry Criteria
The Staging 1 database is the source system; it holds the validated and cleansed data. This data is mapped to the corresponding data elements and loaded into the Staging 2 environment.
4.4.3.2 Approach
The data from the Staging 1 database is mapped to the corresponding data elements using the ETL transformations. The transformation rules are applied to this data and the result is loaded into the Staging 2 database.
4.4.3.3 Extraction Process
4.4.3.4 Reconciliation
A reconciliation report is generated at the end of this process, providing the record counts of the Staging 1 and Staging 2 tables. This report helps us to validate the data between the staging areas.
4.4.3.5 Error Handling
Error handling takes place when the data is migrated between the two staging areas. The ETL handles the errors by default and logs the failure details to the relevant tables.
4.4.3.6 Exit Criteria
The Staging 2 database now holds the transformed and validated records. This data will be manipulated and passed on to the target SRM database.
4.4.4 Target (Data Transformation & Mapping)
4.4.4.1 Entry Criteria
The Staging 2 database is the source system; it holds the transformed data. This data is passed through the wrapper procedure to load the target SRM database.
4.4.4.2 Approach
The wrapper procedure is used to load data from the staging environment into the target SRM database (a minimal sketch follows below).
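A hedged sketch of what such a wrapper procedure could look like is shown below. The procedure, schema and table names (usp_LoadSrmFromStaging2, Staging2.SECURITY, SRM.dbo.SECURITY) are assumptions for illustration; the actual wrapper procedure is defined as part of the SRM solution.

-- Illustrative wrapper procedure moving data from Staging Area 2 into the SRM target.
-- Uses an "update else insert" MERGE so that a re-run after a failure does not create duplicates.
CREATE PROCEDURE dbo.usp_LoadSrmFromStaging2
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        BEGIN TRANSACTION;

        MERGE SRM.dbo.SECURITY AS tgt
        USING Staging2.SECURITY AS src
            ON tgt.SecurityId = src.SecurityId
        WHEN MATCHED THEN
            UPDATE SET tgt.SecName  = src.SecName,
                       tgt.Currency = src.Currency
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (SecurityId, SecName, Currency)
            VALUES (src.SecurityId, src.SecName, src.Currency);

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        DECLARE @ErrMsg NVARCHAR(2048) = ERROR_MESSAGE();
        RAISERROR(@ErrMsg, 16, 1);   -- surface the failure so the calling ETL package can log it
    END CATCH
END;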
4.4.4.3 Extraction Process
4.4.4.4 Reconciliation
A reconciliation report is generated at the end of this process, providing the record counts of the staging and target systems. This report helps us to validate the data between the staging and SRM databases.
4.4.4.5 Error Handling
Error handling takes place when the data is migrated from the staging to the target databases. The ETL handles the errors by default and logs the failure details to the relevant tables.
4.4.4.6 Exit Criteria
The SRM / target database now holds the validated data from the three systems: US, Global and Research.
4.5 ETL Framework DBs (Metadata Information)
4.5.1 Configuration
Package configurations are used to update the values of properties at run time. A configuration is a property/value pair that you add to a completed package. Typically, you create a package and set properties on the package objects during package development, and then add the configuration to the package. When the package runs, it gets the new values of the properties from the configuration (a sketch of a typical configuration table follows the list below).
Package configurations provide the following benefits:
· Configurations make it easier to move packages from a development environment to a production environment.
· Configurations are useful when you deploy packages to many different servers.
· Configurations make packages more flexible.
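When the SQL Server configuration type is used, SSIS stores these property/value pairs in a configuration table. The sketch below shows the layout the Package Configuration Wizard typically generates, plus a sample row; the filter name, server and database values are assumptions, and the actual table used by the framework may differ.

-- Table layout typically generated by the SSIS Package Configuration Wizard for SQL Server configurations.
CREATE TABLE dbo.[SSIS Configurations]
(
    ConfigurationFilter  NVARCHAR(255) NOT NULL,
    ConfiguredValue      NVARCHAR(255) NULL,
    PackagePath          NVARCHAR(255) NOT NULL,
    ConfiguredValueType  NVARCHAR(20)  NOT NULL
);

-- Example row (illustrative values): pointing a connection manager at the landing database.
INSERT INTO dbo.[SSIS Configurations]
    (ConfigurationFilter, ConfiguredValue, PackagePath, ConfiguredValueType)
VALUES
    (N'SRM_DM',
     N'Data Source=DBSERVER;Initial Catalog=SRM_Landing;Provider=SQLNCLI10.1;Integrated Security=SSPI;',
     N'\Package.Connections[Landing].Properties[ConnectionString]',
     N'String');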
4.5.2 Log
The execution log table provides a concise summary of package execution results (e.g., start time, end time, overall status) in a single row, along with other useful information.
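A minimal sketch of such an execution log table is shown below. The table and column names are assumptions; the framework's actual log tables (see Section 4.6.1) carry additional package, task, row-count and error detail.

-- Minimal illustration of a one-row-per-execution log table (names are assumptions).
CREATE TABLE dbo.ETL_ExecutionLog
(
    ExecutionLogId  INT IDENTITY(1, 1) PRIMARY KEY,
    PackageName     NVARCHAR(255)  NOT NULL,
    BatchId         INT            NULL,
    StartTime       DATETIME       NOT NULL DEFAULT (GETDATE()),
    EndTime         DATETIME       NULL,
    OverallStatus   VARCHAR(20)    NOT NULL DEFAULT ('Running'),  -- Running / Succeeded / Failed
    RowsLoaded      INT            NULL,
    ErrorMessage    NVARCHAR(2048) NULL
);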
4.5.3 Audit
A key component of the extract, transformation, and load (ETL) process is the cleanup, or transformation, of data while in transition from one place to another. In most cases there are restrictions, regulatory or otherwise, that require any data that is modified in the ETL pipeline to be audited as such. It is a wise practice to build an audit mechanism for any data you update or exclude as part of the ETL process. Years down the road, if you or your successor is quizzed about the reasoning behind certain transformations, an auditing file or table will be an invaluable tool in answering those questions. In addition, some users will occasionally request access to the unmodified data as it was originally extracted from the source system, and an audit table can be useful in providing that information as well.
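One way to populate such an audit table with pre-update records, as described for the Audit component in Section 4.2, is T-SQL's OUTPUT clause. The sketch below is illustrative; the audit table layout and all object names are assumptions.

-- Illustrative capture of pre-update images into an audit table using the OUTPUT clause.
UPDATE  tgt
SET     tgt.SecName  = src.SecName,
        tgt.Currency = src.Currency
OUTPUT  deleted.SecurityId,                 -- "deleted" exposes the pre-update values
        deleted.SecName,
        deleted.Currency,
        GETDATE(),
        SUSER_SNAME()
INTO    Audit.SECURITY_Audit (SecurityId, SecName, Currency, AuditDate, AuditUser)
FROM    SRM.dbo.SECURITY AS tgt
JOIN    Staging2.SECURITY AS src
    ON  src.SecurityId = tgt.SecurityId;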
4.6 ETL Framework Capabilities
4.6.1 Logging
The framework logs SSIS package run information into five tables: ETL Job, ETL Package, ETL_Rowcount, ETL Task and ETL_Error (see Figure 4 – TGIEH_SSIS_ETL_Framework Database Diagram). Therefore, you no longer need to use SSIS logging and the dbo.sysssislog table.
4.6.2 Exception Handling
The ETL handles errors using the logging mechanism available in the SSIS package. This logging mechanism helps to capture all the success and failure notifications of the SSIS package. Package failures will mainly be captured for the following events:
· OnError
· OnPostExecute
· OnPreExecute
Package failures are tracked in separate tables, and each failure / success event is recorded there. When there is a failure in the package, a new row is inserted into the log table.
This table also maintains the start time and end time of the package, along with a description of the errors encountered and the package details.
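As an illustration, the OnError event handler could run an Execute SQL Task that writes such a failure row, with SSIS system variables (System::PackageName, System::SourceName, System::ErrorDescription) mapped to the statement's parameters. The log table name and columns below are assumptions.

-- Illustrative statement executed by an Execute SQL Task inside the OnError event handler.
-- The ? placeholders are mapped to System::PackageName, System::SourceName and
-- System::ErrorDescription in the task's parameter mapping (OLE DB connection).
INSERT INTO dbo.ETL_Error (PackageName, TaskName, ErrorDescription, ErrorTime)
VALUES (?, ?, ?, GETDATE());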
4.6.3 Reconciliation
Data reconciliation is a necessary process of comparing the data in the source and target systems. The main purpose of data reconciliation is to ensure that the reference and transaction data has been converted and migrated correctly. At the completion of a data migration effort we want to understand whether the validity of the entire data set is still intact. To verify correctness and completeness, reconciliation will be done at each stage.
The number of records transformed into the Stage 1 tables will be compared against the landing table counts. Similarly, the amount sums or record counts of the transformed records in Stage 2 will be compared against the Stage 1 tables' column sums. Reports will be generated for both the record count and column sum comparisons (see the sketch below).
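A minimal sketch of these two reconciliation checks is shown below; the table and column names (Landing.SECURITY, Staging1.SECURITY, Staging2.SECURITY, MarketValue) are assumptions used for illustration.

-- Record count reconciliation between the landing and Stage 1 tables.
SELECT  (SELECT COUNT(*) FROM Landing.SECURITY)  AS LandingRowCount,
        (SELECT COUNT(*) FROM Staging1.SECURITY) AS Staging1RowCount;

-- Column sum reconciliation of a business-important amount field between Stage 1 and Stage 2.
SELECT  (SELECT SUM(MarketValue) FROM Staging1.SECURITY) AS Staging1Sum,
        (SELECT SUM(MarketValue) FROM Staging2.SECURITY) AS Staging2Sum;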
4.6.4 Email Notifications
Sending an email is a frequent requirement, to notify a user of the occurrence of certain events, especially when an unexpected event happens (for example, sending a notification on a failure that could be either logical or physical). SSIS provides a built-in "Send Mail Task" to send email in these circumstances.
4.7 Configuration and Initialization
The ETL Framework stores its configuration information in a database and queries this database at run time to configure the solution. This is the objective for every ETL solution targeted at a production environment. Achieving it requires, first, using variables and expressions to dynamically configure all connections and workflow variables and, second, using SSIS and custom configurations to populate these variables at runtime. To handle any unexpected event at the package level, an e-mail connection is provided.
5 Database Management
5.1 SRM Data Logical Process Flow
All the security-related data will be loaded through the ETL process to the staging areas, and from there to the target, as follows:
· All the source tables identified for migration into SRM will be imported into the landing zone by the ETL process. The table structure is the same as in the source system and the complete data will be extracted. After the data load into the landing zone is complete, the landing zone data is checked against the reconciliation database by the reconciliation process.
· Landing zone data is moved to Staging Area 1 by the ETL process. At this stage data cleansing and validation rules are applied, such as:
§ Date values must be within 75 years
§ Checks for null data
§ Format consolidation (e.g., dates, amount fields)
If any errors or improper data are found, error logs are maintained. After the data load into Staging Area 1 is complete, the data is checked against the reconciliation database by the reconciliation process.
· Staging Area 1 data is moved to Staging Area 2 by the ETL process. At this stage data transformation rules and data mappings are applied:
§ Checking the mappings
§ Checking the data types
If any errors are found, error logs are maintained. After the data load into Staging Area 2 is complete, the data is checked against the reconciliation database by the reconciliation process.
· Once the data is loaded into Staging Area 2, it is loaded into the target database by the wrapper procedure. If any database errors are found at this time, error logs are maintained. After the data load into the target database is complete, the data is checked against the reconciliation database.
5.2 Static Data Logical Process Flow
All the security-related static data will be loaded through the ETL process to the staging area, and from there to the target, as follows:
· All the static data tables identified for static migration into SRM will be imported into the landing zone by a manual process. Landing zone data is maintained in Excel sources. After the data load into the landing zone is complete, the landing zone data is checked against the reconciliation database by the reconciliation process.
· Landing zone data is moved to the staging area. At this stage data transformation rules and data mappings are applied:
§ Checking the mappings
§ Checking the data types
If any errors are found, error logs are maintained. After the data load into the staging area is complete, the data is checked against the reconciliation database by the reconciliation process.
· Once the data is loaded into the staging area, it is loaded into the target database by the wrapper procedure. If any database errors are found at this time, error logs are maintained. After the data load into the target database is complete, the data is checked against the reconciliation database by the reconciliation process.
5.3 Landing Database
5.3.1 Landing Data Validation
Landing zone data is moved to Staging Area 1 by the ETL process. At this stage data cleansing and validation rules are applied, such as:
§ Date values must be within 75 years
§ Checks for null data
§ Format consolidation (e.g., dates, amount fields)
If any errors or improper data are found, error logs are maintained. After the data load into Staging Area 1 is complete, the data is checked against the reconciliation database by the reconciliation process.
5.4 Staging 1 Database
5.4.1 Staging 1 Data Validation
Staging Area 1 data is moved to Staging Area 2 by the ETL process. At this stage data transformation rules and data mappings are applied:
§ Checking the mappings
§ Checking the data types
If any errors are found, error logs are maintained. After the data load into Staging Area 2 is complete, the data is checked against the reconciliation database by the reconciliation process.
5.5 Staging 2 Database
5.5.1 Staging 2 Data Validation
Once the data is loaded into Staging Area 2, it is loaded into the target database by the wrapper procedure. If any database errors are found at this time, error logs are maintained. After the data load into the target database is complete, the data is checked against the reconciliation database.
6 Deployment Strategy
7 Testing Strategy
This phase comprises testing the data migration suite for each iteration. Testing will check all the transformations / mappings / workflows / cleansing / audit and validations. Individual test cases need to be prepared for testing the various functionalities.
7.1 Strategy
The following matrix illustrates the broad areas that the test cases will pertain to.
| Attributes | Measurement Plan | Remarks |
| Business important fields for checksum | Identify all business-important fields that can be used for summation checks for data extracts and in target tables. Perform summations on the identified fields in incoming data files and match the sums. | Business-important fields that can be used for checksums need to be requested from Russell users and should be included in the extracts. |
| Integrity checks | Identify all integrity constraints. All data must pass through the associated integrity constraints (e.g., there can be no detail records in the absence of a master). | Integrity constraints are verified and validated. |
| Outlier conditions | Identify the minimum, maximum and default values for data attributes. All data attributes should contain a valid value. Raise an alert when invalid values are detected. | Min, max and default values should be provided, verified and validated. |
| Alert mechanism | Identify all steps which need to generate an alert (e.g., invalid incoming data, failed integrity checks, outliers, load). Raise alerts. | Any specific alert requirements should be specified in the ETL strategy so they can be incorporated in development. |
| Correctness of calculations | Identify fields involving complex calculations. Recalculate once loading is complete. Match with previously calculated values. | Russell users to specify critical fields involving complex calculations, and the same should be incorporated. |
| Audit trail | Identify the data to be captured in the audit trail (e.g., table name, number of records on file, records inserted from table). Capture audit attributes during the load process and store them in the audit table. | Any specific audit requirements should be specified in the ETL specs and will be incorporated. |
| Incoming data summary | Identify summary information for input data to be sent in an additional file (file name, number of records, date, etc.). Perform checks on incoming data (e.g., match the record count in the control file against the actual number of files received). Raise an alert in case of mismatch. | Incoming control summary file specification to be provided, and the same should be incorporated in the extract. |
| Business test cases | Test cases will be written to check the business scenarios for audit. Business test cases could be SQL queries that get the data from the target and verify it using existing mainframe data. The choice of business criteria can be identified from the legacy reports or may be provided by Russell. | Information on critical reports to be provided by Russell. |
7.2 Validation
The following validations will be performed to ensure the correctness of the data migrated.
| S.No | Category | Source | Destination | Criteria |
| 1 | Number of physical records | All entities | All entities | Exact match or deviation justified |
| 2 | Sum | Field-1, Table-1; Field-2, Table-2; Field-1, Table-1 | Field-1, Table-1; Field-2, Table-2; Field-1, Table-3 | Exact match or deviation justified |
| 3 | Sum against a branch | Field-1, Table-1 | Field-1, Table-1 | Exact match or deviation justified |
| 4 | Total number of active affiliates | Field-1, Table-1 | Field-1, Table-1 | Exact match or deviation justified |
| 5 | Total number of deceased affiliates | Field-1, Table-1 | Field-1, Table-1 | Exact match or deviation justified |
| 6 | Totals | Field-1, Table-1 | Field-1, Table-1 | Exact match or deviation justified |
| 7 | Status fields | Group By count | Group By count | Exact match or deviation justified |
| 8 | Null fields | Count Field-1; Count Field-2 | Count Field-1; Count Field-2 | Exact match |
| 9 | Blank fields | Count Field-1; Count Field-2 | Count Field-1; Count Field-2 | Exact match |
| 10 | Not null fields | Count Field-1 | Count Field-1 | Exact match |
| 11 | Duplicate rows | Table-1; Table-2 | Table-1; Table-2 | Exact match |
| 12 | Deleted rows | | | Justify |
| 13 | Key fields (RUT, Folio Number) | Group by range | Group by range | Exact match |
| 14 | Name fields | Compare by key | Compare by key | Exact match |
| 16 | Round off | Verify correct decimal places | Verify correct decimal places | Exact match |
| 17 | Truncation error on identified field | Correct truncation | Correct truncation | Exact match |
| 18 | Exceptions | Defined | Defined | Validate |
| 19 | Bad records | Identify | Defined | Validate |
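For illustration, some of these checks translate directly into simple queries; the table and field names below are placeholders, matching the Field-1 / Table-1 notation used above.

-- Null field check (category 8): null counts per field compared between source and destination.
SELECT  (SELECT COUNT(*) FROM Source_Table1 WHERE Field1 IS NULL) AS SourceNullCount,
        (SELECT COUNT(*) FROM Target_Table1 WHERE Field1 IS NULL) AS TargetNullCount;

-- Duplicate row check (category 11): key values that appear more than once in the destination.
SELECT  Field1, COUNT(*) AS Occurrences
FROM    Target_Table1
GROUP BY Field1
HAVING  COUNT(*) > 1;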
7.3 Audit
Audit rules are expected to be defined by the Russell team in the following format. The auditing should be done based on reliable reports from the business. Business reports to be used for auditing will be provided by Russell.
| No | Category | Source | Destination | Criteria |
| | | | | |
Assumptions
· Data migration suite available (extraction, transformation and load routines)
· Audit and validation routines available
· Source data for migration is available
8 Pre-Implementation Dry Runs
A pre-implementation or dry run is a simulation of the production implementation in the test environment. The objective is to understand the complexities during implementation, in terms of the window for data migration and the infrastructure requirements, and to fine-tune the programs and implementation procedures if required. The data migration implementation is planned in two phases.
Assumptions
o A tested data migration suite is available for the current implementation phase
o A test environment simulated from production is available
o Source data for the pre-implementation dry run is available
9 Implementation
The implementation phase comprises the activities for performing the actual production data migration.
The implementation of the data migration depends mainly on the implementation window, the volume of data to be migrated and the type of data. The implementation strategy will be finalized after further analysis of the data and further discussions.
Points to be considered in adopting this approach:
1. All the backup data can be extracted in 48 hours
2. All the reference data can be extracted in 48 hours
3. All the transaction, master and catch-up reference data can be extracted, cleansed, transformed and loaded in 48 hours
4. Additional effort is involved in doing catch-up for reference data
5. Testing of the data will be done during the parallel run time
6. Incremental migration may be required for reference data
10 Technical Constraints
10.1 Performance and Volume
As part of the initial load, 30 million rows will be extracted from the security history tables. The SSIS throughput should be between 800 and 1,000 rows per second. The objective is to maintain high ETL performance by sizing the database server capacity in terms of the number of processors and memory.
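As a rough sizing check, at the stated throughput the 30 million row initial load alone would take roughly 30,000,000 / 1,000 ≈ 30,000 seconds (about 8.3 hours) at 1,000 rows per second, or about 10.4 hours at 800 rows per second, which is why the throughput target and server sizing matter for the 48-hour cut-over window assumed in Section 1.8.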
10.2 ETL Job Failure
In case of any job failure during the data loads, the ETL load will restart from the beginning with "Update else Insert" logic. This logic will be implemented within the ETL tasks.
The objective of this methodology is to avoid the insertion of duplicate data when failed jobs are re-run (a minimal sketch of this pattern follows the examples below).
Some examples of failures caused by unexpected errors during the ETL process are:
· Primary key constraint violations
· Bad data in the source
· Database disconnections during the process
· Data type conversion errors
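A minimal sketch of the "Update else Insert" pattern on SQL Server 2008 is shown below: an UPDATE of rows that already exist in the target, followed by an INSERT of rows that do not, so that re-running the load after a failure does not duplicate data. All table and column names are assumptions.

-- "Update else Insert" re-run logic (illustrative names).
-- 1) Update rows that already reached the target on a previous (failed) run.
UPDATE  tgt
SET     tgt.SecName  = src.SecName,
        tgt.Currency = src.Currency
FROM    SRM.dbo.SECURITY AS tgt
JOIN    Staging2.SECURITY AS src
    ON  src.SecurityId = tgt.SecurityId;

-- 2) Insert only the rows that are not yet present in the target.
INSERT INTO SRM.dbo.SECURITY (SecurityId, SecName, Currency)
SELECT  src.SecurityId, src.SecName, src.Currency
FROM    Staging2.SECURITY AS src
WHERE   NOT EXISTS (SELECT 1
                    FROM SRM.dbo.SECURITY AS tgt
                    WHERE tgt.SecurityId = src.SecurityId);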
10.3 ETL Job Scheduler
11 Appendix
11.1 Reference (Links)
| Reference | Description |
| Functional Spec DM – IPC & SRM | Functional Specifications document for IPC & SRM Data Migration |
11.2 Conceptual Data Model
[Conceptual data model diagram]
11.3 ERD Data Model
ERD diagrams for the tables in scope for the three systems:
[ERD data model diagrams]