Prometheus Relabel Example Jobs in USA
LTIMindtree is an equal opportunity employer that is committed to diversity in the workplace. Our employment decisions are made without regard to race, color, creed, religion, sex (including pregnancy, childbirth or related medical conditions), gender identity or expression, national origin, ancestry, age, family-care status, veteran status, marital status, civil union status, domestic partnership status, military service, handicap or disability or history of handicap or disability, genetic information, atypical hereditary cellular or blood trait, union affiliation, affectional or sexual orientation or preference, or any other characteristic protected by applicable federal, state, or local law, except where such considerations are bona fide occupational qualifications permitted by law.
A little about us...
Role: AWS DevOps Engineer
Location: Charlotte, NC
Salary: Market Rate
Job Description:
We are seeking a highly skilled Senior DevOps Engineer with strong expertise in AWS cloud infrastructure, automation, databases, and modern containerized environments. The ideal candidate will have experience designing, implementing, and maintaining scalable, secure, and reliable systems while enabling fast and efficient development workflows. You will work closely with development, architecture, and operations teams to build robust CI/CD pipelines, automate infrastructure provisioning, and ensure high availability of business-critical applications.
Key Responsibilities:
- Design, implement, and manage AWS cloud infrastructure (EC2, S3, Lambda, ECS/EKS, etc.) with scalability and security in mind
- Develop and maintain Infrastructure as Code (IaC) using Terraform
- Build, manage, and optimize Docker base images and containerized application stacks
- Orchestrate and maintain Kubernetes (EKS) clusters for production and staging environments
- Set up, manage, and optimize CI/CD pipelines in GitLab to support fast, reliable deployments
- Manage MCP servers and ensure reliable operations for critical services
- Automate operational tasks and workflows using Python and JavaScript
- Support full-stack teams (React, Node.js) by providing containerized environments and deployment strategies
- Manage and optimize databases (SQL, PostgreSQL) for performance, security, and scalability
- Integrate and manage AWS streaming services (Kinesis, MSK, Kafka, or similar) for real-time data pipelines
- Implement container image security scanning, governance, and lifecycle management
- Monitor system performance, availability, and cost, implementing proactive improvements
- Ensure compliance with security and governance standards across cloud infrastructure and database layers
- Collaborate with developers and architects to improve application delivery, scalability, and resilience
Required Skills & Qualifications:
- 8 years of experience in DevOps/Cloud Infrastructure
- Strong hands-on experience with AWS services (EC2, S3, ECS/EKS, Lambda, VPC, IAM, CloudWatch, Kinesis, MSK)
- Proficiency in Terraform for infrastructure automation
- Expertise with Docker (including base image creation) and Kubernetes orchestration
- Strong scripting/programming skills in Python and JavaScript
- Experience with GitLab CI/CD for pipelines, automation, and environment management
- Strong database experience with SQL and PostgreSQL (setup, scaling, replication, performance tuning)
- Exposure to streaming architectures (AWS Kinesis, Kafka, MSK, or similar)
- Experience supporting React-based applications from a DevOps perspective
- Familiarity with MCP servers and containerized service deployments
- Knowledge of cloud cost optimization and security best practices
- Strong problem-solving, troubleshooting, and communication skills
Preferred Qualifications:
- AWS certifications (e.g., AWS Certified Solutions Architect, DevOps Engineer Professional)
- Experience with monitoring/observability tools (Prometheus, Grafana, ELK, Datadog)
- Knowledge of networking, load balancing, and distributed system design
- Familiarity with Agile/Scrum methodologies
Skills
- Mandatory Skills: AWS Lambda, Docker, Python
- Good to Have Skills: Ansible, Git, Kubernetes
Role: Azure DevOps Engineer
Location: Berkeley Heights, NJ
Job Description:
1. Extensive hands-on experience with GitHub Actions, writing workflows in YAML using reusable templates
2. Extensive hands-on experience with application CI/CD pipelines, both for Azure and on-prem, for different frameworks
3. Hands-on experience with Azure DevOps and with CI/CD pipeline migration programs, preferably from Azure DevOps to GitHub Actions
4. Proficiency in integrating and consuming REST APIs to achieve automation through scripting
5. Hands-on experience with at least one scripting language, including out-of-the-box automations for platforms such as PeopleSoft, SharePoint, MDM, etc.
6. Hands-on experience with CI/CD of databases
7. Good to have: experience with infrastructure as code, including ARM templates, Terraform, Azure CLI, and Azure PowerShell modules
8. Exposure to monitoring tools such as ELK, Prometheus, and Grafana
Hi
I hope you’re doing well.
My name is Sai, and I’m an Account Manager with Astir IT Solutions. We are currently working with our client on a senior-level opportunity for an Agentic AI QA Engineer in Dallas, TX (Need Locals).
Based on your background, I believe this role could be a strong fit.
Job Title: Agentic AI QA Engineer
Location: Dallas, TX (Need Locals)
Experience: 7+ years
Position type: Contract W2/C2C
Required Qualifications
• 7+ years in Software QA/Testing, with 2+ years in AI/ML or LLM-based systems; hands-on experience testing agentic/multi-agent architectures.
• Strong programming skills in Python or TypeScript/JavaScript; experience building test harnesses, simulators, and fixtures.
• Experience with LLM evaluation (exact/soft match, BLEU/ROUGE, BERTScore, semantic similarity via embeddings), guardrails, and prompt testing.
• Expertise in distributed systems testing: latency profiling, resiliency patterns (circuit breakers, retries), chaos engineering, and message queues.
• Familiarity with orchestration frameworks (LangChain, LangGraph, LlamaIndex, DSPy, OpenAI Assistants/Actions, Azure OpenAI orchestration, or similar).
• Proficiency with CI/CD (GitHub Actions/Azure DevOps), observability (OpenTelemetry, Prometheus/Grafana, Datadog), and feature flags/canaries.
• Solid understanding of privacy/security/compliance in AI systems (PII handling, content policies, model safety).
• Excellent communication and leadership skills; proven ability to work cross-functionally with Ops, Data, and Engineering.
Preferred Qualifications
• Experience with multi-agent simulators, agent graph testing, and tooling latency emulation.
• Knowledge of MLOps (model versioning, datasets, evaluation pipelines) and A/B experimentation for LLMs.
• Background in cloud (AWS), serverless, containerization, and event-driven architectures.
• Prior ownership of cost/latency/SLAs for AI workloads in production.
If you are currently open to new opportunities, I would appreciate the chance to connect and discuss this role in more detail. Please let me know a convenient time for a quick call, or feel free to share your updated resume.
Looking forward to hearing from you.
Thanks & Regards.
Sai
Sr. Account Manager
Astir IT Solutions, Inc.
ID: , Contact: 732-694-6000 * 795
We are seeking an experienced Cloud Platform Engineer with deep expertise in Red Hat OpenShift and a strong Linux systems engineering background. This role is responsible for designing, building, and operating large-scale OpenShift platforms within on-premises datacenter environments.
The ideal candidate will work closely with SRE teams and Program Management to drive the successful implementation, scaling, and operationalization of enterprise-grade OpenShift infrastructure.
Key Responsibilities
1. Platform Engineering
- Design, deploy, and manage enterprise-scale Red Hat OpenShift clusters in on-prem datacenter environments.
- Architect highly available, scalable, and secure OpenShift platforms.
- Implement cluster lifecycle management (installation, upgrades, patching, scaling).
- Configure networking, storage, ingress, and security components for OpenShift.
2. Infrastructure Build & Automation
- Build and automate infrastructure in datacenter environments (compute, storage, networking).
- Integrate OpenShift with virtualization platforms (VMware/other hypervisors as applicable).
- Develop Infrastructure-as-Code (IaC) solutions using tools such as Terraform, Ansible, or similar.
- Implement CI/CD pipelines for platform deployments and updates.
3. Linux Systems Engineering
- Provide deep Linux system administration and troubleshooting support.
- Optimize OS-level configurations for performance, reliability, and security.
- Automate system configuration and compliance management.
- Diagnose and resolve complex kernel, networking, and storage issues.
4. Reliability & Operations
- Partner closely with the SRE team to establish SLOs, SLIs, monitoring, and alerting.
- Drive observability implementation (logging, metrics, tracing).
- Participate in incident management, root cause analysis (RCA), and remediation.
- Ensure platform resiliency, performance tuning, and capacity planning.
5. Program & Cross-Functional Collaboration
- Work with Program Management to drive large-scale OpenShift implementation milestones.
- Provide technical input into roadmap planning, timelines, and risk mitigation.
- Collaborate with security, networking, storage, and application teams.
- Document architecture, standards, and operational procedures.
6. Security & Compliance
- Implement RBAC, security policies, and compliance controls within OpenShift.
- Harden clusters according to enterprise security standards.
- Support vulnerability management and patch governance processes.
Required Qualifications
- 5+ years of experience in Linux systems engineering (RHEL preferred).
- 3+ years of hands-on experience with Red Hat OpenShift (OCP 4.x preferred).
- Proven experience building infrastructure in on-prem datacenter environments.
- Strong understanding of:
  - Kubernetes architecture
  - Networking (DNS, load balancing, firewalls, SDN)
  - Storage (SAN, NAS, CSI drivers)
  - Virtualization platforms (VMware, etc.)
- Experience with automation tools (Terraform, Ansible, GitOps).
- Strong troubleshooting and problem-solving skills.
Preferred Qualifications
- Red Hat certifications (RHCE, OpenShift Certification).
- Experience implementing OpenShift at enterprise scale (multi-cluster environments).
- Experience working in SRE-driven environments.
- Knowledge of DevOps/GitOps practices.
- Experience with monitoring tools (Prometheus, Grafana, ELK, etc.).
Job Title: Windows SRE – Vulnerability Management & PowerShell
Location: Onsite
Experience: 8+ Years
Job Summary:
Looking for a Windows SRE with strong experience in managing enterprise Windows environments, vulnerability remediation, and automation using PowerShell. The role focuses on improving system reliability, security, and operational efficiency.
Main Skills Required:
- Windows Server Administration (2016/2019/2022)
- Vulnerability Management (Qualys / Tenable / Nessus / Rapid7)
- PowerShell Scripting & Automation
- Patch Management (SCCM / WSUS / Intune)
- Active Directory & Group Policy
- SRE / Production Support Experience
- Monitoring Tools (Splunk / Datadog / Prometheus)
- Incident Management & Root Cause Analysis
- Security Hardening & Compliance (CIS / NIST)
- Cloud Exposure (Azure / AWS)
- Infrastructure Automation (Ansible / Terraform)
Job Title: Rotating Equipment Planner
Location: Baytown TX
Duration: indefinite
Rate: $50-$60 per hour DOE
Description:
Position Summary
The Rotating Equipment Planner specializes in planning, scheduling, and coordinating maintenance activities for critical rotating equipment (pumps, compressors, turbines, motors, gearboxes, cooling towers, etc.). This role prepares detailed plans for non-emergency maintenance work selected through the Risk Based Work Selection (RBWS) process, ensuring optimal equipment reliability and performance while minimizing production downtime.
Key Responsibilities
• Planning: Develop detailed work plans for rotating equipment maintenance, including precision alignments, vibration analysis, and bearing replacements with appropriate man-hour and cost estimates
• Technical Expertise: Apply specialized knowledge of rotating equipment mechanics, tolerances, and failure modes to develop effective maintenance strategies and troubleshooting procedures
• Materials Management: Ensure critical rotating equipment spare parts (bearings, seals, couplings) are properly inventoried and available; create and maintain Bills of Material
• Work Coordination: Coordinate with Contractor Management Coordinator for resource requirements; prioritize maintenance activities between crews and production teams to minimize process disruption
• Documentation & Systems: Create and maintain task lists for repetitive jobs; outline detailed work instructions with safety advice, resources, and tools; close out jobs by entering notification history
• Reliability Improvement: Collaborate with production and technical teams to establish preventive/predictive maintenance plans, including vibration monitoring programs and lubrication schedules
• Backlog Management: Review and purge backlog weekly, distributing 'ready-to-schedule' work; identify and communicate repetitive equipment problems to Asset Engineer
Required Qualifications
• High school diploma or equivalent
• 12 years of heavy industrial maintenance experience OR 7 years with an associate's degree OR 4 years with a bachelor's degree
• Certification from Vocational or Technical school in millwright or verifiable millwright experience
• Demonstrated experience in equipment planning for rotating equipment and cooling towers
• Minimum 2 years planning/scheduling experience
• In-depth knowledge of SAP-PM Maintenance Transactions and Prometheus
• Experience using Microsoft Office Products (Word, Excel, Outlook etc.)
• The eligibility to apply for and obtain a Transportation Worker Identification Credential (TWIC) within a reasonable timeframe
Physical Requirements
• Ability to climb stairs and work at heights up to 100+ feet
• Ability to climb vertical ladders
• Sufficient physical strength to perform requirements safely
• Ability to work at computer workstation for extended periods
Success Metrics
Performance is measured by the quality of planning and by meeting established KPIs.
Job Title: Redis Admin
Location: NYC, NY (3 days onsite minimum)
Duration: 6 months
The ideal candidate will be responsible for designing, deploying, maintaining, and scaling Kafka clusters in mission-critical environments, while also supporting the Linux-based infrastructure that forms the foundation of our real-time data platform.
Responsibilities
- Manage and maintain Redis instances, ensuring high availability and optimal performance.
- Be well versed in Redis administration and management, e.g., have a strong understanding of data structures, caching mechanisms, and performance tuning in Redis.
- Monitor system health, troubleshoot issues, and implement backup and recovery strategies for Redis clusters.
- Configure Redis caching, session management, and data storage.
- Develop and maintain Python scripts for data manipulation, integration, and automation related to Redis.
- Create efficient data processing pipelines to ingest and process data from various sources.
- Write Python scripts for database interactions and automation tasks, and optimize them for performance, scalability, and maintainability.
- Work closely with development teams to design and implement Redis-based solutions that meet business requirements.
- Provide technical support and training to team members on Redis functionalities and Python scripting best practices.
- Document Redis configurations, Python scripts, and integration workflows for knowledge sharing and compliance.
- Generate performance reports and dashboards to monitor Redis usage and efficiency
Qualifications
- BE/B Tech/MCA
- Excellent written and verbal communication skills
Preferred Qualifications/ Skills
- Experience with Redis clustering, caching strategies, and distributed systems
- Familiarity with monitoring tools like Prometheus and ELK Stack and cloud solutions like AWS ElastiCache
- Preferred experience running Redis on Kubernetes and familiarity with Redis modules like RedisJSON
- Working experience with OpenShift/Kubernetes cloud services to deploy Redis clusters using vendor-provided Docker/Helm charts
- Redis cluster monitoring and alerting
- Optimizing Redis cluster performance using JVM tuning and profiling
Build and scale enterprise Kafka infrastructure using Confluent Cloud and Platform across hybrid environments. Design event-driven architectures, automate deployments with Terraform/CI/CD, optimize performance, ensure security compliance, and troubleshoot distributed streaming systems at scale.
Must Have:
- 5+ years Kafka (2+ years Confluent Cloud/Platform, Kafka Connect, Schema Registry, ksqlDB)
- Expertise in hybrid cloud Kafka deployments (AWS/Azure/GCP + on-prem)
- Strong automation (Terraform, Ansible, Jenkins) and programming (Java, Python, Scala)
- Experience with monitoring/troubleshooting distributed systems (Splunk, Datadog, Prometheus)
- Security expertise (Kerberos, SSL, RBAC) and compliance knowledge (GDPR, SOC, PCI)
You'll Build: Scalable Kafka clusters • Event-driven architectures • Automated CI/CD pipelines • Observability frameworks • Secure, compliant streaming platforms.
Please find the JD
Role: Site Reliability Engineer
Location: Columbus, OH (Onsite)
- 8+ years of Software Engineering experience
- 4+ years of experience in Site Reliability Engineering teams with continued focus on improving Platform health
- Familiar with Agile or other rapid application development practices
- Hands-on expertise in building dashboards using APM tools.
- Experience with distributed (multi-tiered) systems, algorithms, relational databases, and NoSQL databases.
- Knowledge of and exposure to caching tools (Redis, Memcached) or messaging tools such as MQ or Kafka.
- Must have working knowledge of APM tools such as Splunk, GCL, ELK, Grafana, Prometheus, etc.
Gopi Pabbu
Resource Specialist
Yochana IT Solutions Inc
Mail Id :
Seeking Senior Cloud Infrastructure Engineers - Hybrid Roles in Palo Alto, CA (Remote Option Not Available)
Duration: 5+ months with possibility of longer term extensions
Pay rate: $55/hr on W2
If interested, please email me your resume at
Please Note: Client is not open to C2C, H1B, TN Visa, 1099, F1 – CPT & OPT at this time.
*Must be located in and authorized to work in the US without visa sponsorship or transfer, now or in the future. No C2C inquiries, please.
Role and Responsibilities:
Manage AWS environment using Control Tower, EKS, EC2, S3, IAM, and related services.
Administer and troubleshoot Kubernetes (EKS) and EC2 instances, including patching, lifecycle management, and performance optimization.
Provision and manage infrastructure as code using Terraform (CloudFormation a plus).
Triage and resolve ServiceNow tickets for OS and cloud issues, including vulnerability remediation, ensuring compliance with SLAs.
Automate operational tasks using Python scripting.
Drive cloud resource lifecycle activities, including commissioning/decommissioning, backups, patching, DR activities, and cost optimization.
Implement and maintain monitoring and observability (CloudWatch, Prometheus, Grafana, etc.); build/run operational runbooks and playbooks.
Design, deploy, and securely manage networks (LAN/WAN/VPN/firewalls/routers/switches) across AWS and on-premise hybrid environments.
Integrate, secure, and troubleshoot Windows and Linux systems, with expertise in Active Directory, clustering, and Hyper-V.
Collaborate with security, operations, and application teams; participate in on-call and incident response rotations.
Work with security tools (AWS WAF, Inspector, Macie); experience with vulnerability remediation workflows.
Deep knowledge of AWS cost management and optimization practices.
Proven skills in documentation, process improvement, and compliance in regulated environments.
Experience with container orchestration (Docker/Kubernetes), CI/CD tooling, and automation pipelines.
Support and manage Zscaler Cloud proxy, policies, certificate management, and API integrations.
Excellent communication, collaboration, and problem-solving abilities.
Required Qualifications:
Bachelor's degree in Computer Science/IT preferred.
Minimum 7 years' experience in cloud and network engineering (focused on AWS, Windows, and Linux).
Expert knowledge of AWS core services (EC2, S3, EKS, RDS, IAM), networking protocols (TCP/IP, BGP, OSPF).
Experience in Terraform and Python.
Strong background in cloud and network security, compliance, and automation.
Track record of supporting production environments in high-growth, fast-paced organizations.