Senior Data Analyst
Tech Mahindra, Bangalore
February 2020 - May 2022
PySpark, Pandas, NumPy, Matplotlib, Seaborn, Big Data, Hadoop, HDFS, Tableau, SQL, Git, Bitbucket, Impala, Hive, PCR, Jira, Confluence
. Provided data and business analysis solutions to the client.
. Used Python packages (PySpark, Pandas, NumPy, Matplotlib, re, Seaborn, scikit-learn) to pull data from different databases and files and transform it into the format required for business purposes.
. Responsible for data analysis and data munging with PySpark, connecting to large datasets of more than 75 million records across sources such as CSV files, Hive tables, Oracle tables, and Teradata.
. Developed Spark code in Python using Spark SQL and DataFrames for aggregations, transformations, and actions, and applied filters based on business requirements.
. Experienced in migrating data with Sqoop from HDFS to relational database systems and vice versa.
. Experienced in creating Hive (Hadoop) tables, loading data, and analyzing it with Hive queries.
. Experienced in loading data from the Linux file system to HDFS using UNIX commands.
. Hands-on experience in writing and reviewing requirements, architecture documents, test plans, and test designs; maintaining documents; performing quality analysis; and helping with the software release process.
. Responsible for generating data analysis and visualization graphs using the Python Matplotlib and Seaborn packages.
. Hands-on experience with the Redwood tool and UNIX commands to automate jobs that refresh tables at given intervals.
. Actively interacted with team members and clients in review meetings to evaluate the progress and performance of applications.
. Familiar with the version control tools Git, GitHub, and Bitbucket, and with object-oriented programming and database design.
. Good exposure to the Agile software development process through active involvement in daily scrum meetings and sprint review sessions.