Surendra Kumar Arivappagari

Bengaluru, KA, INDIA, PIN:560066
surendra.arivappagari@gmail.com

Senior Data Analyst with 8 years of hands-on experience in PySpark, Big Data, Google Cloud Platform, Hadoop, Tableau, and SQL. Python for Data Science certified professional with hands-on experience in the Pandas, NumPy, Matplotlib, PySpark, Seaborn, and scikit-learn Python packages.

Experience

Software Engineer-3 (Data Analyst)

Walmart, Bangalore

May 2022 - Present

SQL, PySpark, Pandas, NumPy, Matplotlib, Seaborn, Big Data, GCP, Google BigQuery, Google Cloud Storage, Tableau, Git, GitHub, CI/CD, Looper, Concord, UDP, JIRA, Confluence

. Provide data and business solutions to the client using PySpark, GCP, and CI/CD.
. Used Python packages (PySpark, Pandas, NumPy, Matplotlib, re, Seaborn, logging, json), with exception handling, to connect to different source systems (SAP, CILL, SharePoint), fetch the data, and store it in Google Cloud Platform.
. Maintained Landing, Staging, and Standard layers in Google Cloud Storage to clean, harmonize, and transform the data before loading it into Google BigQuery.
. Experienced in Google Cloud services such as Google BigQuery and Google Cloud Storage for analysing and storing data.
. Responsible for writing Python code and maintaining it under version control in GitHub.
. Take part in the Agile software development process through daily scrum meetings and sprint review sessions. Interact actively with team members and clients in review meetings to evaluate the progress and performance of applications.
. Responsible for generating data analysis and visualization graphs using the Python Matplotlib and Seaborn packages.
. Use CI/CD tools and platforms (Looper (Jenkins), Concord, UDP) to automate report generation, providing clients with live data at scheduled intervals.
. Used logging to maintain logs for debugging reports and sent email alerts and status updates for scheduled reports.
. Experienced in writing unit and integration test cases and in using DQAF (Data Quality Assessment) to track data validation for reports.
. Well experienced in developing Spark code in Python using Spark SQL and DataFrames for aggregations, transformations, and actions, applying filters based on business requirements.

Senior Data Analyst

Tech Mahindra, Bangalore

February 2020 - May 2022

PySpark, Pandas, NumPy, Matplotlib, Seaborn, Big Data, Hadoop, HDFS, Tableau, SQL, Git, Bitbucket, Impala, Hive, PCR, JIRA, Confluence

. Provided data and business analysis solutions to the client.
. Used Python packages (PySpark, Pandas, NumPy, Matplotlib, re, Seaborn, scikit-learn) to pull data from different databases and files and transform it into the format required for business purposes.
. Responsible for data analysis and data munging with PySpark, connecting to large datasets of more than 75 million records across different sources such as CSV files, Hive tables, Oracle tables, and Teradata.
. Developed Spark code in Python using Spark SQL and DataFrames for aggregations, transformations, and actions, applying filters based on business requirements.
. Experienced in migrating data with Sqoop from HDFS to relational database systems and vice versa.
. Experienced in creating Hive (Hadoop) tables, loading data, and analyzing it with Hive queries.
. Experienced in loading data from the Linux file system to HDFS using UNIX commands.
. Hands-on experience in writing and reviewing requirements, architecture documents, test plans, and test designs, maintaining documents, performing quality analysis, and helping with the software release process.
. Responsible for generating data analysis and visualization graphs using the Python Matplotlib and Seaborn packages.
. Hands-on experience with the Redwood tool and UNIX commands for automating jobs that refresh tables at given intervals.
. Interacted actively with team members and clients in review meetings to evaluate the progress and performance of applications.
. Familiar with the version control tools Git, GitHub, and Bitbucket, and with object-oriented programming and database design.
. Good exposure to the Agile software development process through active involvement in daily scrum meetings and sprint review sessions.

Data Analyst

TCS (Tata Consultancy Services), Chennai

October 2016 - November 2019

PySpark, Pandas, NumPy, Big Data, Hadoop, SQL, SAS, Git, GitHub, Impala, Hive

. Migrated SAS datasets and their logic to PySpark DataFrames and stored the data in Hive tables for faster response to the visualization and Data Science teams.
. Collected data using Sqoop from relational database systems to HDFS and vice versa.
. Implemented Sales business requirements using Pandas DataFrames and OOP concepts, and provided Oracle tables to Tableau teams for visualization.
. Maintained the developed applications under version control with Git and GitHub.
. Used the JIL automation tool to schedule job executions at given intervals, refreshing the tables with updated data.
. Experienced with the Agile software development process through active involvement in daily scrum meetings and sprint review sessions, and in meeting the deadlines of user stories.
. Hands-on experience in writing and reviewing requirements, architecture documents, test plans, and test designs, maintaining documents, performing quality analysis, and helping with the software release process.

Data Engineer - Internship

Aktha Software Solutions, Hyderabad

May 2015 - July 2015

SQL, SQL Server, BigData

. Responsible for collecting data from Excel files and converting it into SQL tables.
. Applied aggregation functions and window functions to very large datasets to gain insights into data trends.
. Removed duplicates and replaced missing data with default values.
. Provided clean data to customers to support data-driven business decisions.

Education

IIIT, RGUKT, RK-Valley

Bachelor of Technology (B.Tech)

Computer Science and Engineering

June 2012 - May 2016

IIIT, RGUKT, RK-Valley

Pre University Course (PUC)

Maths, Physics, Chemistry, Biology, IT

June 2010 - May 2012

APR School, Kodigenahalli

High School

Maths, Physics, Chemistry, Biology

June 2004 - May 2010

Awards & Certifications

	Platform		Certificate Name & Link
	Google		Google Data Analytics
	Udemy		Python for Data Science and Machine Learning Bootcamp
	Udemy		Python NLP - Natural Language Processing
	Udemy		Python OpenCV and Deep Learning
	Udemy		Tensorflow Deep Learning and Artificial Intelligence
	Udemy		AWS Essentials
	Udemy		Agile Project Management & Delivery
	Udemy		Python for Data Structures and Algorithms
	Udemy		SQL for Data Science
	Udemy		GitHub Ultimate: Master Git and GitHub
	Udemy		Probability and Statistics for Business and Data Science
	Udemy		Hadoop Hands-On Tame your Big Data
	Udemy		Spark and Python for Big Data with Pyspark
	Udemy		Tableau Hands-on Advanced Training: Master Tableau in Data Science
	Udemy		Python3 Complete Bootcamp
	Microsoft		Python for Data Science - Intermediate