Vivek

Charlotte, North Carolina, United States

Skills

Python (Intermediate)

Demonstrated experience with Python.

Scala (Intermediate)

Demonstrated experience with Scala.

Java (Intermediate)

Demonstrated experience with Java.

Hadoop (Intermediate)

Demonstrated experience with Hadoop.

Spark (Intermediate)

Demonstrated experience with Spark.

Languages

English (Native)

Experience

Data Engineer II

State Farm

December 2023 - Present

Orchestrated the deployment, automation, and maintenance of AWS cloud-based production systems using Apache Airflow, AWS Step Functions, and CloudFormation, ensuring availability, performance, and scalability.

Established Infrastructure as Code (IaC) using CloudFormation to provision AWS services, and applied monthly critical patches via AWS Patch Manager. Implemented IAM roles and policies to secure pipeline access.

Led the migration of sensitive financial data from on-premises SQL Server to an Amazon Redshift data warehouse via S3 and Lake Formation, enabling centralized, secure, and governed access to analytics-ready datasets.

Designed and implemented scalable ETL pipelines using PySpark in AWS Glue to perform data cleansing, transformation, and feature engineering in collaboration with data science teams. Used AWS Glue Crawlers and the Glue Data Catalog for schema inference and metadata management.

Built real-time ingestion pipelines from Kinesis/MSK to S3, triggering Glue jobs through Lambda and Step Functions, which automated data processing and reduced latency by 20%.

Processed and transformed large Parquet datasets using PySpark and Spark SQL, creating Hive-compatible tables and applying partitioning, broadcast joins, and in-memory optimization techniques.

Contributed to LLM fine-tuning, prompt evaluation workflows, and prototype retrieval workflows to enable basic GenAI applications.

Improved Redshift performance by 3x through schema optimization using a star schema. Integrated Glue Data Quality for validation rules and AWS CloudWatch for monitoring and alerting on data pipeline failures and delays.

Data Engineer

Evoke Technologies

December 2021 - July 2023

Worked with key components of the Hadoop ecosystem, including Spark, HDFS, Hive, HBase, ZooKeeper, Sqoop, and Oozie.

Developed Sqoop jobs to ingest data from diverse systems of record into the Enterprise Data Lake.

Created Spark jobs in PySpark and Spark SQL to operate on Hive tables, generating transformed datasets for downstream use.

Installed and configured Hadoop MapReduce and HDFS, and created multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.

Designed ETL jobs using Spark with Scala to migrate data from Oracle to new Cassandra tables.

Used Spark with Scala (RDDs, DataFrames, Spark SQL) and the Spark-Cassandra integration for tasks such as data migration and business report generation.

Employed dbt (data build tool) for transformations in the ETL process, alongside AWS Lambda and SQS. Worked extensively with AWS services such as EC2, S3, VPC, AppFlow, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, and RDS.

Integrated CI/CD pipelines using Jenkins for automated deployment of Spark and ETL jobs.