Data Discovery Platform for One of the Largest Casino and Hotel Operators in the US
Overview.
The initial objective was to build a solution that could help identify object dependencies while migrating 6,500+ tables of a legacy data warehouse (DWH) from an on-premises Teradata system to a modernized data lake and DWH on Azure, as part of a large-scale Data Modernization Program.
In the first phase, Coforge developed a data discovery solution that crawls Teradata objects (tables, views, users, BTEQ and TPump scripts, etc.) together with downstream Tableau reports, and persists the linkages between them in polyglot storage.
The solution was built with technologies including Neo4j, Elasticsearch, Python, Spline, a SQL database, Node.js and Angular.
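As a rough illustration of the crawl-and-link step described above, the sketch below pulls table and view references out of SQL text (for example a view definition or a BTEQ script) and records which objects read from which. It is a hypothetical, regex-based approximation: the function names and sample object names are invented for illustration, and a production crawler would need a real SQL parser to handle subqueries, CTEs, comments and quoted identifiers. It only builds the edge list in memory; persisting it to a graph store such as Neo4j is not shown.

```python
import re
from collections import defaultdict

# Hypothetical, naive pattern: captures the (optionally database-qualified)
# object name that follows FROM or JOIN. It does not understand subqueries,
# CTE names, comments or quoted identifiers.
_REF_PATTERN = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w$#]*(?:\.[A-Za-z_][\w$#]*)?)",
    re.IGNORECASE,
)


def extract_sources(sql: str) -> set[str]:
    """Return the upper-cased object names referenced after FROM/JOIN."""
    return {name.upper() for name in _REF_PATTERN.findall(sql)}


def build_lineage(definitions: dict[str, str]) -> dict[str, set[str]]:
    """Map each source object to the set of objects that read from it.

    `definitions` maps a target object name (e.g. a view) to its SQL text.
    """
    downstream: dict[str, set[str]] = defaultdict(set)
    for target, sql in definitions.items():
        for source in extract_sources(sql):
            downstream[source].add(target.upper())
    return dict(downstream)
```

Storing edges as "source feeds target" makes the next question, what breaks if a source changes, a simple forward walk over this map.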
The solution is now used as a data discovery platform and answers questions such as:
What will be impacted, directly or indirectly, if I drop or modify this dataset?
Does the schema of Table A in the QA environment reconcile with its schema in the Prod environment?
Where did this dataset in my data lake originate?
Which datasets store customer phone numbers?
What access does Jerry have across all data assets within the enterprise?
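The first and third questions above (impact and origin) reduce to graph traversal once the linkages are stored. The sketch below is a hypothetical illustration, not the platform's code: it walks an in-memory "source feeds target" map breadth-first and reports each reachable object with its hop distance, so distance 1 means a direct dependent and anything larger means an indirect one. Reversing the edges answers the lineage question instead. The object names are invented; in a graph store such as Neo4j the same walk would typically be written as a variable-length path pattern in Cypher.

```python
from collections import defaultdict, deque


def traverse(graph: dict[str, set[str]], start: str) -> dict[str, int]:
    """Breadth-first walk from `start`, returning {object: hop distance}.

    The `seen` set keeps the walk finite even if the graph contains cycles.
    """
    distances: dict[str, int] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in sorted(graph.get(node, ())):
            if child not in seen:
                seen.add(child)
                distances[child] = depth + 1
                queue.append((child, depth + 1))
    return distances


def reverse_edges(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Flip "source feeds target" edges into "target reads from source"."""
    upstream: dict[str, set[str]] = defaultdict(set)
    for source, targets in graph.items():
        for target in targets:
            upstream[target].add(source)
    return dict(upstream)


if __name__ == "__main__":
    # Hypothetical objects: a table, two views and a Tableau report.
    feeds = {
        "SALES.ORDERS": {"SALES.V_DAILY_REVENUE", "SALES.V_CUSTOMER_ORDERS"},
        "SALES.V_DAILY_REVENUE": {"RPT.V_TOP_DAYS"},
        "RPT.V_TOP_DAYS": {"TABLEAU.REVENUE_DASHBOARD"},
    }
    print("Impact of changing SALES.ORDERS:", traverse(feeds, "SALES.ORDERS"))
    print("Origin of the dashboard:", traverse(reverse_edges(feeds), "TABLEAU.REVENUE_DASHBOARD"))
```

Keeping the hop distance, rather than only a set of names, lets a UI separate direct from indirect impact without a second query.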
The Impact.
A highly interactive way to check dependencies.
Significantly fewer failures during migration and cut-over.
Improved regulatory compliance through PII data discovery.
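PII discovery of the kind mentioned above (for example, finding which datasets store customer phone numbers) can be approximated by sampling column values and testing them against a pattern. The sketch below is a hypothetical heuristic, not the platform's actual classifier: the regex targets North American phone formats only, the 0.8 match threshold is an arbitrary assumption, and the sampled values are assumed to be fetched elsewhere.

```python
import re

# Hypothetical pattern for North American numbers such as "702-555-0143",
# "(702) 555 0143" or "+1 702.555.0143". Any bare 10-digit string also
# matches, so numeric ID columns can produce false positives.
_PHONE = re.compile(r"^(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$")


def looks_like_phone_column(samples: list[str], threshold: float = 0.8) -> bool:
    """True if at least `threshold` of the non-blank samples match _PHONE."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return False
    hits = sum(1 for v in values if _PHONE.match(v))
    return hits / len(values) >= threshold


def find_phone_columns(
    table_samples: dict[str, dict[str, list[str]]], threshold: float = 0.8
) -> list[tuple[str, str]]:
    """Return (table, column) pairs whose sampled values look like phone numbers.

    `table_samples` maps table name -> column name -> sampled string values.
    """
    return [
        (table, column)
        for table, columns in table_samples.items()
        for column, samples in columns.items()
        if looks_like_phone_column(samples, threshold)
    ]
```

Flagged columns would then be tagged in the metadata store so that searches such as "which datasets hold phone numbers" become a lookup rather than a scan.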