Analyze - Identify Data Structure Issues
Armed with the results of profiling, the project moves into the analysis phase. Applaud’s advanced analysis tools include four styles of report generators and other advanced analysis features. Applaud quickly identifies data structure problems such as redundant data, orphans, failures in data relationship rules, invalid integrity constraints, parent/child issues, etc.
The analysis tools also make it easy to "drill down" into the data content problems found in profiling to further understand the issues. For example, suppose that profiling identifies unexpected values in a code field. Applaud's analysis tools can quickly identify the exact conditions where the unexpected values exist, and can easily provide detailed reports broken down by each related business unit for subsequent analysis and resolution.
In many cases, the easiest solution is to provide Applaud's data validation reports to the client with instructions for the data correction. After the data is corrected, the data analysis components that identified the data problems may be easily run again to ensure that all problems are fully resolved.
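The drill-down described above (unexpected code values reported by business unit) can be illustrated with a short, generic sketch. This is not Applaud's own tooling; the function name, the field names, and the set of valid codes are assumptions chosen only for illustration.

```python
from collections import Counter

def unexpected_values_by_unit(records, code_field, unit_field, valid_codes):
    """Count each unexpected code value per business unit.

    Hypothetical helper: `records` is a list of dicts, and the field
    names are supplied by the caller.
    """
    counts = Counter()
    for record in records:
        if record[code_field] not in valid_codes:
            counts[(record[unit_field], record[code_field])] += 1
    return counts

# Illustrative data only.
records = [
    {"unit": "East", "status": "A"},
    {"unit": "East", "status": "?"},
    {"unit": "West", "status": "?"},
    {"unit": "West", "status": "?"},
]
report = unexpected_values_by_unit(records, "status", "unit", {"A", "I"})
for (unit, value), count in sorted(report.items()):
    print(unit, value, count)
```

A report of this shape can be handed to each business unit so that it corrects only its own records.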
Sample
Applaud includes advanced sampling tools to assist in selecting samples of large volume data for testing and analysis. Applaud supports random sampling, attribute sampling, interval sampling, and monetary unit sampling. All of Applaud's sampling capabilities can be coupled to Applaud's analytical tools to enable sampling within any component process.
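Two of the sampling styles named above can be sketched generically as follows. This is a minimal illustration, not Applaud's implementation: the function names and the `amount` field are assumptions, and the monetary unit sampler uses the common fixed-interval method with a random start and assumes non-negative amounts.

```python
import random

def interval_sample(records, interval, start=0):
    """Interval (systematic) sampling: every `interval`-th record from `start`."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return records[start::interval]

def monetary_unit_sample(records, amount_key, sample_size, seed=None):
    """Fixed-interval monetary unit sampling.

    Every currency unit has the same chance of being hit, so records with
    large amounts are more likely to be selected. Assumes amounts >= 0.
    """
    total = sum(r[amount_key] for r in records)
    if total <= 0 or sample_size <= 0:
        return []
    step = total / sample_size
    next_hit = random.Random(seed).uniform(0, step)  # random start in first interval
    selected, cumulative = [], 0.0
    for record in records:
        cumulative += record[amount_key]
        if cumulative > next_hit:
            selected.append(record)
            # Skip any further hits that land inside the same record.
            while next_hit < cumulative:
                next_hit += step
    return selected

invoices = [{"id": 1, "amount": 10}, {"id": 2, "amount": 1000}, {"id": 3, "amount": 5}]
print(interval_sample(list(range(10)), 3))
print(monetary_unit_sample(invoices, "amount", sample_size=2, seed=1))
```

Because selection is proportional to value, the 1000-unit invoice in this example is selected whatever the random start.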
Parse
Multiple data values are often contained in larger unstructured data elements and must be separated (parsed) and analyzed. Applaud includes a family of tools that makes it easy to parse data components. For example, Applaud makes it easy to parse address data into separate fields: "123 South Main Street Suite 456" is parsed into separate components for street number (123), directional (South), street name (Main), street type (Street), unit type (Suite), and unit identifier (456). In another example, a system may have multiple fields for company name, address and person name, but there may not be a consistent way of using the fields. Again, Applaud makes it easy to determine the content of each separate field so it is applied consistently. Applaud's family of parsing tools works with names, street addresses, city/state/zip/county names, as well as other data.
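The address example above can be reproduced with a small token-based parser. This is only a sketch under simplifying assumptions (US-style addresses, short hypothetical keyword lists, a hypothetical `parse_address` function); production parsing tools handle far more variation.

```python
# Hypothetical, deliberately short keyword lists for illustration.
DIRECTIONALS = {"north", "south", "east", "west", "n", "s", "e", "w"}
STREET_TYPES = {"street", "st", "avenue", "ave", "road", "rd",
                "boulevard", "blvd", "drive", "dr", "lane", "ln"}
UNIT_TYPES = {"suite", "ste", "apt", "apartment", "unit"}

def _norm(token):
    """Lowercase a token and drop a trailing period ("St." -> "st")."""
    return token.lower().rstrip(".")

def parse_address(line):
    """Split a street address line into labeled components."""
    parts = dict.fromkeys(
        ["street_number", "directional", "street_name",
         "street_type", "unit_type", "unit_id"])
    tokens = line.split()
    if tokens and tokens[0].isdigit():
        parts["street_number"] = tokens.pop(0)
    if tokens and _norm(tokens[0]) in DIRECTIONALS:
        parts["directional"] = tokens.pop(0)
    # A unit keyword followed by at least one token marks the unit section.
    for i, token in enumerate(tokens):
        if _norm(token) in UNIT_TYPES and i + 1 < len(tokens):
            parts["unit_type"] = token
            parts["unit_id"] = " ".join(tokens[i + 1:])
            tokens = tokens[:i]
            break
    if tokens and _norm(tokens[-1]) in STREET_TYPES:
        parts["street_type"] = tokens.pop()
    if tokens:
        parts["street_name"] = " ".join(tokens)
    return parts

print(parse_address("123 South Main Street Suite 456"))
```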
Identify Redundant Data
Applaud speeds the process of identifying redundant/duplicate data from disparate systems. Applaud provides the ability to match any type of data based on any number of flexible criteria. Applaud's rapid application development approach allows the matching tools to be applied flexibly in any combination to solve any cleansing requirement. Multiple criteria may be established. Weighting may be applied to results to obtain "most likely" match combinations. Negative criteria may be applied to reduce the possibility of "false positives".
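One way to picture multiple criteria, weighting, and negative criteria working together is the generic scoring sketch below. It is an assumption-laden illustration rather than Applaud's matching engine: the helper names, field names, and weights are all hypothetical.

```python
def same(field):
    """Positive criterion: both records carry the same non-empty value."""
    return lambda a, b: bool(a.get(field)) and a.get(field) == b.get(field)

def conflicting(field):
    """Negative criterion: both records carry values, and they differ."""
    return lambda a, b: bool(a.get(field)) and bool(b.get(field)) and a[field] != b[field]

def match_score(a, b, criteria, negative_criteria=()):
    """Return a 0..1 score; any negative criterion vetoes the match."""
    if any(rule(a, b) for rule in negative_criteria):
        return 0.0
    total = sum(weight for _, weight in criteria)
    if total == 0:
        return 0.0
    earned = sum(weight for test, weight in criteria if test(a, b))
    return earned / total

# Hypothetical weights: last name counts most, phone least.
CRITERIA = [(same("last_name"), 3.0), (same("postal_code"), 2.0), (same("phone"), 1.0)]
NEGATIVE = [conflicting("tax_id")]

a = {"last_name": "Smith", "postal_code": "07054", "phone": "5550100", "tax_id": "11"}
b = {"last_name": "Smith", "postal_code": "07054", "phone": "5550199", "tax_id": "11"}
c = dict(b, tax_id="22")

print(match_score(a, b, CRITERIA, NEGATIVE))  # weighted share of criteria met
print(match_score(a, c, CRITERIA, NEGATIVE))  # vetoed by the conflicting tax ID
```

Candidate pairs can then be ranked by score to surface the "most likely" matches, while the veto keeps records with conflicting identifiers from being proposed as false positives.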
Name/Address Matching
Applaud also includes comprehensive name/address cleansing tools for matching and consolidating name/address data. Among many other capabilities, Applaud's name/address matching tools can equate names and words (Bob = Robert = Rob = Robt, etc.), ignore "noise" words (Company, Co, Corp, Corporation, Inc, Ltd, Limited, LLC, etc.), and phonetically encode names/words to find duplicates that may have misspellings (Parsippany = Parsipany = Parsipanie, etc.). Applaud uses these tools to automate the process of identifying the "hard to find" duplicates.
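The three techniques above (word equivalence, noise-word removal, and phonetic encoding) can be combined into a single match key, as in the sketch below. The text does not say which phonetic algorithm Applaud uses; standard American Soundex is used here purely as a stand-in, and the lookup tables and function names are hypothetical.

```python
import re

# Hypothetical lookup tables built from the examples in the text.
EQUIVALENTS = {"bob": "robert", "rob": "robert", "robt": "robert"}
NOISE_WORDS = {"company", "co", "corp", "corporation", "inc", "ltd", "limited", "llc"}

SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"), "L": "4",
    **dict.fromkeys("MN", "5"), "R": "6",
}

def soundex(word):
    """Standard American Soundex: first letter plus three digits."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        if ch not in "HW":  # H and W do not separate equal codes; vowels do
            previous = code
    return (encoded + "000")[:4]

def match_key(name):
    """Normalize a name: drop noise words, map equivalents, encode phonetically."""
    words = re.findall(r"[a-z]+", name.lower())
    words = [EQUIVALENTS.get(w, w) for w in words if w not in NOISE_WORDS]
    return " ".join(soundex(w) for w in words)

print(soundex("Parsippany"), soundex("Parsipany"), soundex("Parsipanie"))
print(match_key("Bob Smith Co.") == match_key("Robt. Smith Corporation"))
```

Records that share a match key become duplicate candidates for review; the key is intentionally loose, so it is normally combined with other criteria before records are consolidated.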
Match Any Data
Applaud's matching features are not limited to name and address data. Applaud's tools for word replacement, "noise" word elimination, phonetic encoding and duplicate identification are equally valuable in identifying duplicate records in any type of data, including products, inventory, services, contracts, assets, etc. For example, the same tools used to match Bob = Robert = Rob = Robt may also be used to equate units of measure, product abbreviations, asset names, service names, etc. The same phonetic encoding tools can be used to match misspellings in any type of data, not just name and address data. The full suite of matching tools may be used for any type of data, providing the flexibility to support the most challenging data cleansing requirements.
Consolidate
Applaud's Rapid Application Development tools make it easy to combine selected data from matched records into a single record and eliminate duplicates. Applaud's Data Repository plays an invaluable role in this process. Data is extracted into the repository for analysis and consolidation. The repository acts as a flexible staging area where various elements of data may be quickly examined and consolidated.
In addition, redundant records must be eliminated. Applaud speeds this process as well, providing a thorough analysis of the results for the client to examine.
Correct/Enhance
Next, invalid data values must be corrected and missing data must be calculated or otherwise provided. This often involves complex transformation rules and additional data sources containing crosswalk/lookup tables to compute the values for selected data elements. Third-party data may easily be appended as needed. Additional data can be easily inserted and applied at any step of the process. Applaud makes this process easy with its ability to quickly insert complex transformation logic and the ability to easily add any number of crosswalk/lookup tables to any component. Applaud's load tools are then used to load the cleansed data into the correct tables.
Name/Address Cleansing
In addition, Applaud's comprehensive name/address cleansing offers the ability to determine if addresses actually exist, correct addresses to postal standards, and obtain CASS certification. Applaud can compute ZIP+4 codes, postal codes, and county names, geocode addresses (compute longitude and latitude), and provide other address-related information.
Data Entry
Further, Applaud includes the ability to quickly construct data entry screens (with comprehensive data validation) to capture changes for selected data. These screens can be used as needed to enter data changes into the source tables (in the existing system), target tables (in a new system being implemented), or intermediate tables (in the Applaud data repository). Data changes stored in the Applaud data repository can be used to "override" appropriate data values at the "go live" date.
Manual Cleansing Support
Applaud can also interface with Excel for manual cleansing projects. Applaud can build “batches” of data in Excel spreadsheets for manual cleansing and can provide comprehensive validation upon re-importing the data after the cleansing. Cleansed data can be stored in the Applaud data repository until the “go live” date. Controls can be provided that report the status of each batch of data to ensure that every batch is correctly processed. Ongoing evaluation features may be activated that monitor subsequent changes during cleansing and report any issues that may cause problems.
Confirmation
Applaud enables users to dynamically confirm (or reject) match candidates identified during the processing. For example, Applaud can easily produce a spreadsheet of "probable" match candidates thus allowing the client to specify which records should be consolidated (or which should not be consolidated). The spreadsheet data is extracted into Applaud and processed per the specifications provided.
Standardize
In the end, all data must be formatted consistently according to system rules. Applaud makes it easy to standardize the data to ensure a consistent format throughout all records. For example, this might include the normalization of phone numbers, tax ID numbers, part numbers, product information, etc. Every project has its unique challenges. Applaud makes it easy to solve the most complex cleansing requirement.
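As a concrete instance of the standardization described above, the sketch below normalizes phone numbers to one format. It assumes 10-digit North American numbers and a hypothetical target format; the real rules would come from the target system.

```python
import re

def standardize_phone(raw):
    """Return the number as "(NNN) NNN-NNNN", or None if it cannot be normalized.

    Assumes North American numbers; a leading country code of 1 is dropped.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # leave for manual review rather than guessing
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

for value in ["1-973.555.0100", "973 555 0100", "555-0100"]:
    print(value, "->", standardize_phone(value))
```

Returning None for values that do not fit the rule keeps questionable data visible for manual cleansing instead of silently reformatting it.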
Analyze Results
Applaud's four styles of report generators and advanced analytical tools can analyze the results of cleansing as well as the original data. This makes it easy to ensure that data issues are fully corrected and resolved. Applaud analysis components may be run at every test date to confirm that the cleansing is successful and to identify any new issues.
Parallel Test Analysis
Applaud's analysis tools can also assist in analyzing the results of parallel tests (i.e., comparing legacy system results to new system results). It is easy to extract data from both the legacy and target systems into Applaud's Data Repository after parallel processes have been run. The data repository makes it easy to normalize/reconcile the two data formats and compare the results. Differences are quickly identified on Applaud's analytical reports, which can include any data from both systems. The result is a very fast way of ensuring the new system operates exactly as expected.
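The comparison step in a parallel test can be sketched as a keyed field-by-field comparison once both extracts have been normalized to a common layout. The function and field names below are hypothetical and the data is illustrative only; this is not Applaud's reporting logic.

```python
def compare_results(legacy, target, key, fields):
    """List differences between two record sets that share a key field.

    Each difference is a tuple: (key value, field or status, legacy value, target value).
    """
    legacy_by_key = {row[key]: row for row in legacy}
    target_by_key = {row[key]: row for row in target}
    differences = []
    for k in sorted(legacy_by_key.keys() | target_by_key.keys()):
        old, new = legacy_by_key.get(k), target_by_key.get(k)
        if old is None:
            differences.append((k, "missing in legacy", None, None))
        elif new is None:
            differences.append((k, "missing in target", None, None))
        else:
            for field in fields:
                if old.get(field) != new.get(field):
                    differences.append((k, field, old.get(field), new.get(field)))
    return differences

legacy = [{"invoice": 1, "total": 100.0}, {"invoice": 2, "total": 250.0}]
target = [{"invoice": 1, "total": 100.0}, {"invoice": 2, "total": 255.0},
          {"invoice": 3, "total": 40.0}]
for difference in compare_results(legacy, target, "invoice", ["total"]):
    print(difference)
```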
Post-Implementation Audit
Applaud's analytical tools can also be used to perform post-implementation audit and analysis. Applaud can quickly (and if desired continually) test new system data to ensure that processes are operating properly. All of Applaud's analytical tools are independent of the data source. All tools can analyze data in source system tables, Applaud's data repository tables, and target system tables. This makes it easy to ensure that the new system is operating as planned.
Data Correction and Documentation
Applaud can also make corrections to the live production data and can provide simultaneous analysis of the changes made. For example, suppose that it is learned that a specific process has been operating incorrectly, resulting in several RDBMS tables containing incorrect data. Applaud can easily make the correction to the affected data. But more important, Applaud's analytical tools can easily provide a complete audit trail, showing the "before" and "after" values for every change as an integrated part of the correction. With Applaud, all changes are fully documented.
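The idea of capturing "before" and "after" values as part of the correction itself can be sketched as follows. This is a generic in-memory illustration with hypothetical names; a real correction would run against the affected database tables inside a transaction.

```python
def apply_correction(rows, key, field, fix):
    """Apply `fix` to one field of every row and return an audit trail.

    Rows are updated in place; only values that actually change are logged.
    """
    audit = []
    for row in rows:
        before = row[field]
        after = fix(before)
        if after != before:
            row[field] = after
            audit.append({"key": row[key], "field": field,
                          "before": before, "after": after})
    return audit

# Illustrative data: a faulty process stored some rates as percentages.
rates = [{"id": 1, "rate": 0.05}, {"id": 2, "rate": 5.0}, {"id": 3, "rate": 7.5}]
trail = apply_correction(rates, "id", "rate", lambda r: r / 100 if r > 1 else r)
for entry in trail:
    print(entry)
```

Because the log is produced by the same pass that makes the change, every correction is documented and none can be applied without a record.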
Ongoing Data Cleansing
Applaud is also the ideal solution for ongoing data quality initiatives. Applaud data cleansing processes can identify missing/invalid/inconsistent data, correct problems, cleanse addresses, identify duplicates, produce data quality metrics, and build many other data quality-related processes. Everything developed in the initial migration project is reusable in ongoing data quality solutions, providing complete solutions quickly and at very low cost.
Applaud cleansing processes may be run as needed using Applaud's robust batch scheduler. (Hourly, daily, weekly, and monthly processes are supported.) Alternatively, any other batch scheduler may run Applaud processes.