Surendra Kumar Arivappagari

   Bengaluru, KA, India, PIN: 560066
   surendra.arivappagari@gmail.com

Senior Data Analyst with 8 years of hands-on experience in PySpark, Big Data, Google Cloud Platform, Hadoop, Tableau, and SQL. Python for Data Science certified professional with hands-on experience in the Pandas, NumPy, Matplotlib, PySpark, Seaborn, and scikit-learn Python packages.


  Experience

  Software Engineer-3 (Data Analyst)

  Walmart, Bengaluru
  May 2022 - Present
 SQL, PySpark, Pandas, NumPy, Matplotlib, Seaborn, Big Data, GCP, Google BigQuery, Google Cloud Storage, Tableau, Git, GitHub, CI/CD, Looper, Concord, UDP, JIRA, Confluence
.  Providing data and business solutions to the client using PySpark, GCP, and CI/CD pipelines.
.  Used Python packages (PySpark, Pandas, NumPy, Matplotlib, re, Seaborn, logging, json) and exception handling to connect to different source systems (SAP, CILL, SharePoint), fetch the data, and store it in Google Cloud Platform.
.  Maintained landing, staging, and standard layers in Google Cloud Storage to clean, harmonize, and transform the data into Google BigQuery (see the sketch after this list).
.  Experienced in Google Cloud services such as Google BigQuery and Google Cloud Storage for analysing and storing data.
.  Responsible for writing Python code and maintaining it under version control in GitHub.
.  Part of the Agile software development process, actively participating in daily scrum meetings and sprint review sessions; interacted with team members and clients in review meetings to evaluate the progress and performance of applications.
.  Responsible for generating data analysis and visualization graphs using the Python Matplotlib and Seaborn packages.
.  Used CI/CD tools (Looper/Jenkins, Concord, UDP platforms) to automate report generation and deliver live data to clients at scheduled intervals.
.  Used logging functionality to maintain logs for report debugging and to send alert/status emails for scheduled reports.
.  Experienced in writing unit and integration test cases and in using DQAF (Data Quality Assessment) to track data validation for reports.
.  Well experienced in developing Spark code in Python using Spark SQL and DataFrames for aggregations, transformations, and actions, applying filters based on business requirements.
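
A minimal, illustrative PySpark sketch of the GCS-layer-to-BigQuery flow described above; the bucket, project, dataset, and column names are hypothetical, and the BigQuery write assumes the spark-bigquery connector is available.

    # Illustrative sketch only: bucket, table, and column names are hypothetical.
    import logging
    from pyspark.sql import SparkSession, functions as F

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("sales_report")

    spark = SparkSession.builder.appName("landing_to_bigquery").getOrCreate()

    # Read raw files from the landing layer in Google Cloud Storage.
    raw = spark.read.csv("gs://example-landing/sales/*.csv", header=True, inferSchema=True)

    # Clean and harmonize into the staging layer: drop duplicates, standardise types.
    staged = (raw.dropDuplicates(["order_id"])
                 .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
                 .filter(F.col("amount").isNotNull()))

    # Publish the standard layer to Google BigQuery (spark-bigquery connector).
    (staged.write.format("bigquery")
           .option("table", "example-project.sales.standard_orders")
           .option("temporaryGcsBucket", "example-staging-bucket")
           .mode("overwrite")
           .save())

    log.info("Loaded %d rows into BigQuery", staged.count())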

  Senior Data Analyst

  Tech Mahindra, Bengaluru
  February 2020 - May 2022
 PySpark, Pandas, NumPy, Matplotlib, Seaborn, Big Data, Hadoop, HDFS, Tableau, SQL, Git, Bitbucket, Impala, Hive, PCR, JIRA, Confluence
.  Provided data and business analysis solutions to the client.
.  Used Python packages (PySpark, Pandas, NumPy, Matplotlib, re, Seaborn, scikit-learn) to pull data from different databases and files and transform it into the format required by the business.
.  Responsible for data analysis and data munging with PySpark, connecting to large datasets of more than 75 million records from sources such as CSV files, Hive tables, Oracle tables, and Teradata.
.  Developed Spark code in Python using Spark SQL and DataFrames for aggregations, transformations, and actions, applying filters based on business requirements (see the sketch after this list).
.  Experienced in migrating data with Sqoop between HDFS and relational database systems.
.  Experienced in creating Hive (Hadoop) tables, loading data, and analyzing it using Hive queries.
.  Experienced in loading data from the Linux file system to HDFS using UNIX commands.
.  Hands-on experience in writing and reviewing requirements, architecture documents, test plans, and test designs; maintaining documentation; quality analysis; and supporting the software release process.
.  Responsible for generating data analysis and visualization graphs using the Python Matplotlib and Seaborn packages.
.  Hands-on experience with the Redwood tool and UNIX commands to automate jobs that refresh tables at given intervals.
.  Interacted actively with team members and clients in review meetings to evaluate the progress and performance of applications.
.  Familiar with version control tools (Git, GitHub, Bitbucket), object-oriented programming, and database design.
.  Good exposure to the Agile software development process, actively participating in daily scrum meetings and sprint review sessions.
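
A minimal sketch of the kind of Spark SQL / DataFrame aggregation described above; the Hive database, table, and column names are hypothetical.

    # Illustrative sketch only: table and column names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("sales_aggregation")
             .enableHiveSupport()
             .getOrCreate())

    # Pull a large Hive table and filter it to the business scope.
    orders = (spark.table("sales_db.orders")
                   .filter(F.col("region") == "APAC")
                   .filter(F.col("order_date") >= "2021-01-01"))

    # Aggregate with the DataFrame API ...
    revenue_by_month = (orders
        .groupBy(F.date_format("order_date", "yyyy-MM").alias("month"))
        .agg(F.sum("amount").alias("revenue"),
             F.countDistinct("customer_id").alias("customers")))

    # ... or express the same aggregation with Spark SQL.
    orders.createOrReplaceTempView("orders_apac")
    revenue_sql = spark.sql("""
        SELECT date_format(order_date, 'yyyy-MM') AS month,
               SUM(amount)                        AS revenue
        FROM orders_apac
        GROUP BY date_format(order_date, 'yyyy-MM')
    """)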

  Data Analyst

  TCS (Tata Consultancy Services), Chennai
  October 2016 - November 2019
 PySpark, Pandas, NumPy, Big Data, Hadoop, SQL, SAS, Git, GitHub, Impala, Hive
.  Migrated SAS datasets and their logic to PySpark DataFrames and stored the data in Hive tables for faster access by the visualization and data science teams (see the sketch after this list).
.  Collected data with Sqoop from relational database systems to HDFS and vice versa.
.  Used Pandas DataFrames and OOP concepts to implement sales business requirements and provided Oracle tables to Tableau teams for visualization.
.  Maintained developed applications under version control with Git and GitHub.
.  Used the JIL automation tool to schedule job executions at given intervals, refreshing tables with updated data.
.  Experienced with the Agile software development process, actively participating in daily scrum meetings and sprint review sessions and meeting user-story deadlines.
.  Hands-on experience in writing and reviewing requirements, architecture documents, test plans, and test designs; maintaining documentation; quality analysis; and supporting the software release process.
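
A minimal sketch of a SAS-to-Hive migration step like the one described above; the file path and table names are hypothetical, and the SAS extract is staged through pandas on the assumption that it fits in driver memory.

    # Illustrative sketch only: file path and table names are hypothetical.
    import pandas as pd
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("sas_to_hive")
             .enableHiveSupport()
             .getOrCreate())

    # Read the exported SAS dataset (assumes the extract fits in driver memory).
    sas_pdf = pd.read_sas("/data/exports/customer_sales.sas7bdat", encoding="latin-1")

    # Convert to a PySpark DataFrame and persist it as a Hive table for the
    # visualization and data science teams.
    sales_df = spark.createDataFrame(sas_pdf)
    sales_df.write.mode("overwrite").saveAsTable("analytics_db.customer_sales")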

  Data Engineer - Internship

  Aktha Software Solutions, Hyderabad
  May 2015 - July 2015
 SQL, SQL Server, BigData
.  Responsible for collecting data from Excel files and converting them into SQL tables.
.  Applied aggregation and window functions over large datasets to derive insights into data trends.
.  Removed duplicates and replaced missing data with default values (see the sketch after this list).
.  Provided clean data to customers so they could make data-driven business decisions.
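
A minimal Python/pandas sketch of the Excel-to-SQL cleaning steps described above; the workbook name, connection string, and column names are hypothetical.

    # Illustrative sketch only: file, connection string, and columns are hypothetical.
    import pandas as pd
    from sqlalchemy import create_engine

    # Collect raw data from an Excel workbook.
    raw = pd.read_excel("monthly_sales.xlsx", sheet_name="Orders")

    # Remove duplicates and replace missing values with defaults.
    clean = (raw.drop_duplicates(subset=["order_id"])
                .fillna({"discount": 0.0, "region": "UNKNOWN"}))

    # Load the cleaned data into a SQL Server table.
    engine = create_engine(
        "mssql+pyodbc://user:password@server/sales_db?driver=ODBC+Driver+17+for+SQL+Server")
    clean.to_sql("orders_clean", engine, if_exists="replace", index=False)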

  Education

  IIIT, RGUKT, RK-Valley

     Bachelor of Technology (B.Tech)
           Computer Science and Engineering
June 2012 - May 2016

  IIIT, RGUKT, RK-Valley

     Pre University Course (PUC)
           Maths, Physics, Chemistry, Biology, IT
June 2010 - May 2012

  APR School, Kodigenahalli

     High School
           Maths, Physics, Chemistry, Biology
June 2004 - May 2010


  Skills  &  Technologies

Programming Languages & Tools
    Python, Hadoop, Spark, NumPy, Pandas, Matplotlib, Seaborn, Plotly,
    scikit-learn, TensorFlow, Impala, Tableau, Bitbucket, Git, AWS, Linux, Agile

  Blogs  &  Publications

Below are detailed blogs that are helpful for learners in the data science domain.

Blog 1). SQL + Python + Spark Tutorials

Blog 2). Pandas Tutorials

Blog 3). Numpy Tutorials

Blog 4). MatPlotLib Tutorials

Blog 5). Seaborn Tutorials

Blog 6). Data Science Preprocessing Project Tutorials


  Interests

Apart from being a Data Analyst, I love to do: