Conceptualization of a Data Migration I
Goals, Objectives & Requirements
One of today’s challenges is finding the right mix of technologies for building a solution to a business need. There are so many choices that the person responsible is easily tempted to use a trending technology just because they want to learn something new or because the technology seems to fit into the bigger picture, which in many cases is probably acceptable. Unfortunately, there’s also a tendency to pick a technology without looking at what functionality it provides and whether that functionality meets the intended solution’s requirements. Moreover, the requirements are sometimes barely defined at the appropriate level of detail, which makes the implementation project a candidate for failure. Sometimes even the goals and objectives aren’t clearly stated, which makes a project’s success questionable from the beginning.
A goal is a general statement that reflects the desired result toward which an organization’s effort needs to be directed. For example, the primary goal of a Data Migration (DM) can be formulated as ‘to make available all the master and transactional data needed by the business from the legacy systems to the target system(s) within the expected timeline and quality, with minimal disruption for the business’.
An objective is a breakdown of the goal into several components that should foster a clear understanding of how the goal will be achieved. Ideally the objectives should be SMART (specific, measurable, attainable, relevant, time-bound), even if measurable objectives are sometimes hard to define properly. One can consider them as the tactics used in achieving the goal. For example, the goal formulated above can be broken down into the following objectives:
- Build a DM concept/strategy
- Build a flexible and performant infrastructure for DM that can be adapted to further requirements
- Provide a basis for further DMs
- Align DM and main project’s requirements and activities
- Provide an interface and support for the Data Management areas
- Foster trust, transparency and awareness
- Address internal/external compliance requirements
- Document and communicate accountability for the various activities
- Cleanse and enrich the data needed by the target system
- Archive the DM and project data
One can attempt to define the objectives directly from the goal(s). However, unless one is aware of all the implications a DM has, one will more likely need to define and evaluate the individual functional and nonfunctional requirements for the DM first, and then consolidate the requirements into a set of objectives. In the end it can be a combination of both: some objectives are formulated first, the requirements are defined and evaluated, and the objectives are then refined to accommodate the requirements.
ISO 9126, an international standard for the evaluation of software quality (since superseded by ISO/IEC 25010), defines about 45–50 attributes that can be used for addressing the requirements of software solutions, attributes that reflect functionality, reliability, usability, efficiency, maintainability and portability characteristics. One can start with such a list and identify how important the respective attributes are for the solution. The next step would be to document the requirements in a consolidated list, providing a short justification for each and describing how it will be addressed as part of the solution. The process can prove time-consuming; however, it is a useful exercise that usually needs to be done only once and reviewed occasionally.
The list can be created independently of any other documentation or be included directly in a concept or strategy. The latter ensures, in theory, that the document provides a unified view of the migration, considering that each new or obsolete requirement can impact the concept.
Plan vs. Concept vs. Strategy
A concept is a document that describes at a high level the set of steps necessary to achieve a desired result, together with their implications; the desired result is typically the object of a project. A concept is usually needed to provide more technical and nontechnical information about the desired solution: the context in which the steps are conducted, the changes considered, how the changes will be implemented, and further aspects that need to be considered. It can include a high-level plan and sometimes also information that typically belongs in a Business Case: goals, objectives, required resources, estimated effort and costs, risks and opportunities.
A concept is used primarily as a basis for sign-off as well as for establishing common ground and understanding. When approved, it’s used for the actual implementation and for the validation of the solution. The concept should be updated as the project progresses and as new information is discovered.
Creating a concept for a DM can be considered a best practice because it allows documenting the context, the technical and organizational requirements, the dependencies between the DM and other projects, and how they will be addressed. The concept can also include a high-level plan of the main activities (to be detailed in a separate document).
Especially when the concept has an exploratory nature (due to incomplete knowledge or other considerations), it can be validated with the help of a proof-of-concept (PoC): a prototype of the high-level design that focuses on the main characteristics of the solution and thus allows identifying the challenges. Once the PoC is implemented, the feedback can be used to round out the concept.
Building a PoC for a DM should be considered an objective even when the project doesn’t seem to face any major challenges. The PoC should be limited to addressing the most important DM requirements, ideally by implementing all or the most important aspects of the functionality (e.g. data extraction, data transformations, integrity validation, and the import into the target system) for one or two data entities. Once the PoC is built, the team can use it as a basis for the evolutionary development of the solution during the planned iterations.
A strategy is a set of coordinated and sustainable actions that follow a set of well-defined goals; the actions are organized into a plan and designed to create value and overcome further challenges. A strategy has the character of a concept, though it has a broader scope: it is usually considered when multiple projects or initiatives compete for the same resources, in order to provide a broader context and handle the challenges, risks and opportunities. Moreover, the strategy takes an inventory of the current issues and architecture (the ‘AS-IS’ perspective) and sketches the ‘TO-BE’ perspective by devising a roadmap that bridges the gap between the two.
In the case of a DM a strategy might be required when multiple DM projects need to be performed in parallel or sequentially, as it can help the organization to better manage the migrations.
A plan is a high-level document that describes the tasks, schedule and resources required to carry out an activity. Even if it typically refers to the work or product breakdown structure, it can cover other information usually available in a Business Case. A project plan is used to guide both project execution and project control, while in the context of Strategic Management the (strategic) plan provides a high-level roadmap of how the defined goals and objectives will be achieved during the period covered by the strategy.
For small DM projects a plan can, in theory, be enough. As both a strategy and a concept can include a high-level plan, the names are in practice often used interchangeably.
Probably one of the most difficult things to learn as a technical person is using the right technology for a given purpose, mainly because one is inclined to use the tools one knows best. Moreover, the overlap between technologies makes the task more and more challenging, as the difference between competing technologies often lies in the details. Thus, identifying the gaps comes down to understanding the details of the problem(s) or need(s), as well as the advantages and disadvantages of one technology over another. This is especially true of competing technologies, including the ones that replace other technologies.
There are simple heuristics that can help in approaching such challenges. For example, heavy data processing usually belongs in databases, while import/export functionality belongs in an ETL tool. Therefore, one can start looking at the problems from these two perspectives. Would the solution benefit from these two approaches, or are there more appropriate technologies (e.g. data streaming, ELT, non-relational databases)? How much effort would building the solution involve?
Commercial Off-The-Shelf (COTS) tools provided by third-party vendors usually offer specialized functionality in each area. Gartner and Forrester provide regular analyses of the main players in the important areas, analyses which can, in theory, be used as a basis for further research. Even if COTS tools tend to be more expensive and can have some important functionality gaps, as long as they are extensible they can prove a good starting point for developing a solution.
Sometimes it helps to research on the web what other people or organizations did, how they approached the same aspects, and what technologies, techniques and best practices they used to overcome the challenges. One doesn’t need to reinvent the wheel, even if it’s sometimes fun to do so. Moreover, a few hours of research can give one a basis of useful information and a better understanding of the work ahead.
On the other hand, sometimes it’s advisable to use the tools one knows best, though this can also lead to unusable or poorly performing solutions. For example, MS Excel and Access have been for years the tools of choice for building personal solutions that later grew into maintenance nightmares for the IT team. They can still be used for data entry or data cleaning, though building solutions based exclusively on (one of) them can prove to be far from optimal.
When one doesn’t know whether a technology or mix of technologies can be used to provide a solution, it’s recommended to start a proof-of-concept (PoC) that addresses the most important aspects of the needed solution. One can start small by focusing on the minimal functionality needed to check the main aspects and evolve the PoC over several iterations as needed.
For example, in the case of a Data Migration (DM) this would involve building the data extraction layer for an entity, implementing several data transformations based on the defined mappings, building a few integrity rules for validation, and attempting to import the data into the target system. Once this is accomplished, one can start increasing the volume of data to check how the solution behaves under stress. The volume of data can be increased incrementally or by considering all the data available.
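The four PoC steps for a single entity can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the legacy rows, the column names, the mapping and the two integrity rules are all hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import sqlite3

# Hypothetical legacy rows for one entity (customers); in a real DM these
# would be read from the legacy system or from its export files.
LEGACY_CUSTOMERS = [
    {"CUSTNO": "1001", "NAME": " Acme Corp ", "CTRY": "de"},
    {"CUSTNO": "1002", "NAME": "Globex", "CTRY": "US"},
    {"CUSTNO": "", "NAME": "No Key Ltd", "CTRY": "FR"},  # breaks the key rule
]

# Assumed mapping: legacy column -> (target column, transformation).
MAPPING = {
    "CUSTNO": ("customer_id", str.strip),
    "NAME": ("name", str.strip),
    "CTRY": ("country_code", lambda value: value.strip().upper()),
}


def extract():
    """Extraction layer: return the legacy rows for the entity."""
    return list(LEGACY_CUSTOMERS)


def transform(rows):
    """Apply the mapping to every legacy row."""
    return [
        {target: convert(row[source]) for source, (target, convert) in MAPPING.items()}
        for row in rows
    ]


def validate(rows):
    """Split rows into valid and rejected using two simple integrity rules."""
    valid, rejected = [], []
    for row in rows:
        is_valid = bool(row["customer_id"]) and len(row["country_code"]) == 2
        (valid if is_valid else rejected).append(row)
    return valid, rejected


def load(conn, rows):
    """Import the valid rows into the (stand-in) target system."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "customer_id TEXT PRIMARY KEY, name TEXT NOT NULL, country_code TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer (customer_id, name, country_code) "
        "VALUES (:customer_id, :name, :country_code)",
        rows,
    )
    conn.commit()


def run_poc(conn):
    """Run extract, transform, validate and load; return the count and the rejects."""
    valid, rejected = validate(transform(extract()))
    load(conn, valid)
    return len(valid), rejected
```

Calling `run_poc(sqlite3.connect(":memory:"))` loads the valid rows and returns the rejected ones, which can then be handed back to the business for cleansing. Keeping each step in its own function makes it easier to swap the stand-ins for the real source and target later.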
As soon as the skeleton is built, one can consider all the mappings and add several entities to build the dependencies existing between them, as well as other functionality. The prototype might not address all the requirements from the beginning, so the problems should be considered as they arise. For example, if the volume of data seems to cause problems, one can attempt to split the data into batches during processing, or consider specific optimization techniques like indexing, or scaling techniques like increasing the computing resources.
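Splitting the data into batches can be sketched as below. The helper names `batches` and `migrate_in_batches` are hypothetical, and `process_batch` stands for whatever transformation or import step the prototype applies to each batch; the appropriate batch size is an assumption to be tuned against the actual data volume.

```python
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def migrate_in_batches(rows, size, process_batch):
    """Process the rows batch by batch and return the number of rows handled."""
    total = 0
    for chunk in batches(rows, size):
        process_batch(chunk)  # e.g. transform and import a single batch
        total += len(chunk)
    return total
```

Because the rows are consumed lazily, only one batch needs to be held in memory at a time, which is usually the point of batching when the full volume of data causes problems.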
Originally published at http://sql-troubles.blogspot.com.