Alibaba Cloud Linux Ubuntu Jobs in USA

2,267 positions found — Page 2

Cloud Infrastructure Support Specialist (FLORENCE)
Salary not disclosed
FLORENCE, Kentucky 3 days ago
Jabil is a product solutions company providing comprehensive design, manufacturing, supply chain and product management services. Operating from over 100 facilities in 29 countries, Jabil delivers innovative, integrated, and tailored solutions to customers across a broad range of industries and end-markets, such as automotive, consumer lifestyle and wearable tech, defense and aerospace, connected home and building, industrial and energy, enterprise and infrastructure, healthcare, mobility, packaging and printing.

How will you make an impact?

- As a Site Reliability Technician within Jabil’s Cloud Test Software Development team, you will directly contribute to the daily operations and development of our Cloud Test Platform Infrastructure deployed at multiple production facilities worldwide.

Working Hours:

- 12 HOUR SHIFT

- 6 AM TO 6 PM

- Sunday, Monday, Tuesday, Alternating Wednesdays

What will you do?

- As the Site Reliability Technician, you will provide the first line response to production issues including but not limited to outages, end user performance, change management, monitoring, improving the efficiency and usability of production test infrastructure and applications, and ensuring all site test infrastructure software and hardware is maintained with the latest updates to ensure high levels of performance and reliability.

How will you get here?

- Sustaining support and maintenance for the manufacturing server (L10) and rack (L11-L12) level test software and infrastructure deployed at our production facilities.

- Support the site’s manufacturing server (L10) and rack (L11-L12) current test infrastructure as well as future expansion planning, deployments, and assembly.

- Maintain manufacturing server (L10) and rack (L11-L12) test infrastructure documentation of installations, upgrades, and management.

- Communicate manufacturing test infrastructure enhancements while providing insights based on site operations and uptime challenges.

- Support manufacturing test incident response, analysis, and corrective actions for the site operations.

- Participate in closed loop analysis/responses to factory test failures.

- Perform scheduled preventive maintenance on the test infrastructure, including MDF, IDF, and SUT TORs

Experience:

- Experience in the following programming/scripting languages:

- Python,

- Java,

- BASH,

- C, C++ experience a plus

- Understanding of Linux fundamentals:

- CentOS

- Ubuntu

- Familiarity with hardware and API solutions for controlling, managing and stressing L10 devices (servers, network and storage SSDs, NVMe):

- IPMI,

- Redfish,

- mprime,

- FIO,

- Linpack,

- ptugen,

- memtester

- Familiarity with the creation and configuration (DHCP, PXE boot, nginx) of Virtual Machines (VMs) using VMware is a plus.

- Experience with leading edge networking systems, hardware, software, and protocols including but not limited to enterprise ethernet datacenter switching/routing L1, L2, and L3 (BGP, DHCP Relay, ECMP). Arista CloudVision is a plus.

- Experience with networking systems, hardware, software, and protocols including but not limited to enterprise ethernet datacenter switching/routing (L1 – L3).

- Demonstrated systematic problem-solving capability, coupled with effective communication skills and a sense of ownership and drive.

What Can Jabil Offer You?

Along with growth, stability, and the opportunity to be challenged, Jabil offers a competitive benefits package that includes:

- Medical, Dental, Prescription Drug, and Vision Insurance with HRA and HSA options

- 401K Match

- Employee Stock Purchase Plan

- Paid Time Off

- Tuition Reimbursement

- Life, AD&D, and Disability Insurance

- Commuter Benefits

- Employee Assistance Program

- Pet Insurance

- Adoption Assistance

- Annual Merit Increases

- Community Volunteer Opportunities
temporary
Senior Cloud Test Automation Engineer (FLORENCE)
✦ New
🏢 JABIL CIRCUIT, INC
Salary not disclosed
FLORENCE, Kentucky 1 day ago
Operating from over 100 facilities in 29 countries, Jabil delivers innovative, integrated, and tailored solutions to customers across a broad range of industries and end-markets, such as automotive, consumer lifestyle and wearable tech, defense and aerospace, connected home and building, industrial and energy, enterprise and infrastructure, healthcare, mobility, packaging and printing.

How will you make an impact?

- Jabil is seeking a Sr. Manufacturing Cloud Test Development Engineer who will directly contribute to the transformative growth within our Enterprise and Infrastructure division by applying unique and innovative approaches to solving problems within a large-scale software production environment.

- The Software Test Development Engineer plays a vital role in ensuring the quality and reliability of hardware products, contributing to the overall success of the manufacturing process and customer satisfaction.

- You will be responsible for contributing to the end-to-end architecture, definition, development and production deployment of production software applications and infrastructure spanning multiple customers and manufacturing regions.

- As the Sr. Manufacturing Cloud Test Development Engineer, you will also be responsible for interfacing with internal engineering, manufacturing and quality teams and our end customers to ensure your software deliverables meet the rigorous standards of Jabil’s world-class manufacturing environments.

What will you do?

Test System Development:

- Design and develop test systems and procedures for manufacturing processes. This includes creating test plans, test cases, and test scripts to assess the functionality and performance of hardware components or devices such as

- motherboard,

- memory,

- CPU, storage (SSD, HDD, NVMe) and

- PCIe devices (NIC, GPU, Mezz cards, RAID cards)

Test Software Development:

- Create, validate, release, and maintain test software and scripts that automate the testing process. This software may include code for controlling test equipment, collecting and analyzing data, and generating test reports.

Sustaining Test:

- Support and maintenance for the manufacturing server (L10) and rack (L11) level test software and infrastructure deployed at our production facilities, including the implementation of minor system configuration changes (new IPNs).

Documentation:

- Maintain comprehensive manufacturing server (L10) and rack (L11) documentation of test procedures, specifications, and infrastructure.

Collaboration:

- Work closely with cross-functional teams, including hardware engineers, manufacturing engineers, and quality assurance personnel, to ensure alignment on testing requirements and quality standards.

Continuous Learning:

- Stay updated on the latest advancements in testing technologies, methodologies, and industry best practices to keep manufacturing processes competitive and up to date.

- Definition and collaboration on overall test infrastructure and application architectures.

Management & Supervisory Responsibilities

- Reports to Management

How will you get here?

- Expertise in the following programming/scripting languages:

- Python,

- BASH,

- C, C++ experience a plus

- Linux development expertise with a solid understanding of its fundamentals:

- CentOS

- Ubuntu

- Expertise with hardware and API solutions for controlling, managing, and stressing L10 devices (servers, network, and storage SSDs, NVMe):

- IPMI,

- Redfish,

- mprime,

- FIO,

- Linpack,

- ptugen,

- memtester

- Expertise in the creation and configuration (DHCP, PXE boot, nginx) of Virtual Machines (VMs), VMware preferred.

- Expertise with leading edge networking systems, hardware, software, and protocols including but not limited to enterprise ethernet datacenter switching/routing L1, L2, and L3 (BGP, DHCP Relay, ECMP). Arista CloudVision is a plus.

- Experience with code versioning tools (Git preferred).

- Strong knowledge of professional software engineering practices for the complete software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.

Education:

- BS degree in Electrical/Computer Engineering, Computer Science, or related field. MS preferred.

Experience:

- 5-8 years’ experience in software manufacturing test development/sustaining with enterprise server, storage, or networking products.

- Excellent verbal and written communication skills.

- Experience working in multi-site and multi-cultural environments.

- Domestic and/or international travel, up to 10%, may be required.

What Can Jabil Offer You?

Along with growth, stability, and the opportunity to be challenged, Jabil offers a competitive benefits package that includes:

- Medical, Dental, Prescription Drug, and Vision Insurance with HRA and HSA options

- 401K Match

- Employee Stock Purchase Plan

- Paid Time Off

- Tuition Reimbursement

- Life, AD&D, and Disability Insurance

- Commuter Benefits

- Employee Assistance Program

- Pet Insurance

- Adoption Assistance

- Annual Merit Increases

- Community Volunteer Opportunities
temporary
Cloud Manufacturing Test Development Engineer - Transformative Growth Opportunities (FLORENCE)
✦ New
🏢 JABIL CIRCUIT, INC
Salary not disclosed
FLORENCE, Kentucky 1 day ago
Operating from over 100 facilities in 29 countries, Jabil delivers innovative, integrated, and tailored solutions to customers across a broad range of industries and end-markets, such as automotive, consumer lifestyle and wearable tech, defense and aerospace, connected home and building, industrial and energy, enterprise and infrastructure, healthcare, mobility, packaging and printing.

How will you make an impact?

- Jabil is seeking a Sr. Manufacturing Cloud Test Development Engineer who will directly contribute to the transformative growth within our Enterprise and Infrastructure division by applying unique and innovative approaches to solving problems within a large-scale software production environment.

- The Software Test Development Engineer plays a vital role in ensuring the quality and reliability of hardware products, contributing to the overall success of the manufacturing process and customer satisfaction.

- You will be responsible for contributing to the end-to-end architecture, definition, development and production deployment of production software applications and infrastructure spanning multiple customers and manufacturing regions.

- As the Sr. Manufacturing Cloud Test Development Engineer, you will also be responsible for interfacing with internal engineering, manufacturing and quality teams and our end customers to ensure your software deliverables meet the rigorous standards of Jabil’s world-class manufacturing environments.

What will you do?

Test System Development:

- Design and develop test systems and procedures for manufacturing processes. This includes creating test plans, test cases, and test scripts to assess the functionality and performance of hardware components or devices such as

- motherboard,

- memory,

- CPU, storage (SSD, HDD, NVMe) and

- PCIe devices (NIC, GPU, Mezz cards, RAID cards)

Test Software Development:

- Create, validate, release, and maintain test software and scripts that automate the testing process. This software may include code for controlling test equipment, collecting and analyzing data, and generating test reports.

Sustaining Test:

- Support and maintenance for the manufacturing server (L10) and rack (L11) level test software and infrastructure deployed at our production facilities, including the implementation of minor system configuration changes (new IPNs).

Documentation:

- Maintain comprehensive manufacturing server (L10) and rack (L11) documentation of test procedures, specifications, and infrastructure.

Collaboration:

- Work closely with cross-functional teams, including hardware engineers, manufacturing engineers, and quality assurance personnel, to ensure alignment on testing requirements and quality standards.

Continuous Learning:

- Stay updated on the latest advancements in testing technologies, methodologies, and industry best practices to keep manufacturing processes competitive and up to date.

- Definition and collaboration on overall test infrastructure and application architectures.

Management & Supervisory Responsibilities

- Reports to Management

How will you get here?

- Expertise in the following programming/scripting languages:

- Python,

- BASH,

- C, C++ experience a plus

- Linux development expertise with a solid understanding of its fundamentals:

- CentOS

- Ubuntu

- Expertise with hardware and API solutions for controlling, managing, and stressing L10 devices (servers, network, and storage SSDs, NVMe):

- IPMI,

- Redfish,

- mprime,

- FIO,

- Linpack,

- ptugen,

- memtester

- Expertise in the creation and configuration (DHCP, PXE boot, nginx) of Virtual Machines (VMs), VMware preferred.

- Expertise with leading edge networking systems, hardware, software, and protocols including but not limited to enterprise ethernet datacenter switching/routing L1, L2, and L3 (BGP, DHCP Relay, ECMP). Arista CloudVision is a plus.

- Experience with code versioning tools (Git preferred).

- Strong knowledge of professional software engineering practices for the complete software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.

Education:

- BS degree in Electrical/Computer Engineering, Computer Science, or related field. MS preferred.

Experience:

- 5-8 years’ experience in software manufacturing test development/sustaining with enterprise server, storage, or networking products.

- Excellent verbal and written communication skills.

- Experience working in multi-site and multi-cultural environments.

- Domestic and/or international travel, up to 10%, may be required.

What Can Jabil Offer You?

Along with growth, stability, and the opportunity to be challenged, Jabil offers a competitive benefits package that includes:

- Medical, Dental, Prescription Drug, and Vision Insurance with HRA and HSA options

- 401K Match

- Employee Stock Purchase Plan

- Paid Time Off

- Tuition Reimbursement

- Life, AD&D, and Disability Insurance

- Commuter Benefits

- Employee Assistance Program

- Pet Insurance

- Adoption Assistance

- Annual Merit Increases

- Community Volunteer Opportunities
temporary
Cloud Manufacturing Test Development Engineer
🏢 JABIL CIRCUIT, INC
Salary not disclosed
Florence, KY 4 days ago

Jabil is a product solutions company providing comprehensive design, manufacturing, supply chain and product management services. Operating from over 100 facilities in 29 countries, Jabil delivers innovative, integrated, and tailored solutions to customers across a broad range of industries and end-markets, such as automotive, consumer lifestyle and wearable tech, defense and aerospace, connected home and building, industrial and energy, enterprise and infrastructure, healthcare, mobility, packaging and printing.

How will you make an impact?

  • Jabil is seeking a Software Test Development Engineer who will directly contribute to the transformative growth within our Global Cloud Test Development (GCTD) team in the Cloud Enterprise and Intelligent Infrastructure (CE&I) division by applying unique and innovative approaches to solving problems within a large-scale software production environment.

  • The Software Test Development Engineer plays a vital role in ensuring the quality and reliability of hardware products, contributing to the overall success of the manufacturing process and customer satisfaction.

  • You will be responsible for contributing to the end-to-end architecture, definition, development and production deployment of production software applications and infrastructure spanning multiple customers and manufacturing regions. 

  • You will also be responsible for interfacing with internal engineering, manufacturing and quality teams and our end customers to ensure your software deliverables meet the rigorous standards of Jabil’s world-class manufacturing environments.

What will you do?

Test System Development:

  • Design and develop test systems and procedures for manufacturing processes. This includes creating test plans, test cases, and test scripts to assess the functionality and performance of hardware components or devices such as:

    • Motherboard,

    • Memory,

    • CPU,

    • Storage (SSD, HDD, NVMe) and

    • PCIe devices (NIC, GPU, Mezz cards, RAID cards)

Software Development Test:

  • Create, validate, release, and maintain test software and scripts that automate the testing process.

  • This software may include code for controlling test equipment, collecting and analyzing data, and generating test reports.

Sustaining Test:

  • Support and maintenance for the manufacturing server (L10) and rack (L11) level test software and infrastructure deployed at our production facilities, including the implementation of minor system configuration changes (new IPNs).

Test Infrastructure Expansions:

  • Support the site’s manufacturing server (L10) and rack (L11) current test infrastructure as well as future expansions planning, deployments, and assembly.

Debugging and Troubleshooting:

  • Diagnose and resolve issues with test software, or hardware components (servers, switches, racks, L10, L12) that may arise during the manufacturing process.

Documentation:

  • Maintain comprehensive manufacturing server (L10) and rack (L11) documentation of test procedures, specifications, and infrastructure.

Collaboration:

  • Work closely with cross-functional teams, including hardware engineers, manufacturing engineers, and quality assurance personnel, to ensure alignment of testing requirements and quality standards.

Continuous Learning:

  • Stay updated on the latest advancements in testing technologies, methodologies, and industry best practices to keep manufacturing processes competitive and up to date.

  • Definition and collaboration on overall test infrastructure and application architectures.

Management & Supervisory Responsibilities

  • Reports to Management

Education:

  • BS degree in Electrical/Computer Engineering, Computer Science or related field. MS preferred

Experience:

  • 5 years’ experience in software manufacturing test development/sustaining with enterprise servers, storage or networking products.

  • Experience in the following programming/scripting languages:

    • Python,

    • Java,

    • BASH,

    • C, C++ experience a plus

  • Linux development experience with a solid understanding of its fundamentals:

    • CentOS

    • Ubuntu

  • Experience with hardware and API solutions for controlling, managing and stressing L10 devices (servers, network and storage SSDs, NVMe):

    • IPMI,

    • Redfish,

    • mprime,

    • FIO,

    • Linpack,

    • ptugen,

    • memtester

  • Familiarity with the creation and configuration (DHCP, PXE boot, nginx) of Virtual Machines (VMs) using VMware.

  • Expertise with leading edge networking systems, hardware, software and protocols including but not limited to enterprise ethernet datacenter switching/routing L1, L2, and L3 (BGP, DHCP Relay, ECMP)

  • Arista CloudVision is a plus.

  • Experience with code versioning tools (Git preferred).

  • Knowledge of professional software engineering practices for the complete software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.

  • Excellent verbal and written communication skills.

  • Experience working in multi-site and multi-cultural environments.

What Can Jabil Offer You?

Along with growth, stability, and the opportunity to be challenged, Jabil offers a competitive benefits package that includes:

  • Medical, Dental, Prescription Drug, and Vision Insurance with HRA and HSA options

  • 401K Match

  • Employee Stock Purchase Plan

  • Paid Time Off

  • Tuition Reimbursement

  • Life, AD&D, and Disability Insurance

  • Commuter Benefits

  • Employee Assistance Program

  • Pet Insurance

  • Adoption Assistance

  • Annual Merit Increases

  • Community Volunteer Opportunities

permanent
Linux Cloud Engineer
Salary not disclosed
Charlotte 3 days ago
A financial firm is looking for a Linux Cloud Engineer with OpenShift/AKS experience to join their team in Charlotte, NC.

Compensation: $150-195k

Responsibilities:

• Design, deploy, and manage container orchestration platforms using OpenShift and AKS.

• Administer and optimize Linux-based systems in hybrid and multi-cloud environments.

• Automate infrastructure provisioning and configuration using Ansible Automation Platform.

• Develop and maintain Infrastructure as Code (IaC) using Terraform, Helm, and GitOps workflows.

• Collaborate with DevOps and application teams to implement CI/CD pipelines and DevSecOps practices.

• Monitor system performance, troubleshoot issues, and ensure high availability and disaster recovery.

• Implement security best practices for containerized workloads and cloud environments.

• Provide technical leadership and mentorship to junior engineers.

• Stay current with emerging technologies and contribute to strategic cloud initiatives.

• Assist with migrations to cloud, ensuring best practices are followed and architecture is compliant with company standards.

Qualifications:

Required:

• Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).

• 5+ years of professional experience in Linux system administration and cloud engineering.

• 3+ years of hands-on experience with OpenShift and AKS in production environments.

• Strong proficiency in scripting languages (e.g., Bash, Python).

• Experience with CI/CD tools (e.g., Jenkins, GitLab CI, ArgoCD).

• Deep understanding of Kubernetes architecture, networking, and security.

• Familiarity with cloud platforms (Azure, AWS, GCP) and hybrid cloud strategies.

• Knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack).

• Excellent problem-solving and communication skills.

• Linux Administration: Deep expertise in RHEL environment.

• Container Platforms: 3+ years of hands-on experience with OpenShift and AKS.

• Automation: Proficiency with Ansible, Ansible Tower/AAP, and scripting (Bash, Python).

• Infrastructure as Code: Experience with Terraform, Helm, and GitOps tools (e.g., ArgoCD, Flux).

• CI/CD: Familiarity with Jenkins, GitLab CI, Azure DevOps, or similar tools.

• Cloud Platforms: Strong knowledge of Azure, with exposure to AWS or GCP a plus.

• Monitoring & Logging: Experience with Prometheus, Grafana, ELK/EFK, and Azure Monitor.

• Security: Understanding of container security, RBAC, network policies, and compliance frameworks.

• Networking: Solid grasp of Kubernetes networking, service mesh (e.g., Istio), and ingress controllers.

Preferred:

• Red Hat Certified Specialist in OpenShift Administration.

• Microsoft Certified: Azure Kubernetes Service Specialist.

• Experience with service mesh technologies (e.g., Istio, Linkerd).

• Experience in regulated industries (e.g., finance, healthcare) is a plus.
Not Specified
Senior Cloud Support Engineer
✦ New
🏢 Crusoe
Salary not disclosed
San Mateo, CA 5 hours ago

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI — without sacrificing scale, speed, or sustainability.

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.


About This Role:

Crusoe Cloud is revolutionizing high-performance computing by offering sustainable, low-cost GPU compute power. As a Senior Cloud Support Engineer, you'll play a crucial role in empowering our customers to leverage this technology for groundbreaking advancements in fields like AI/ML, physics simulations, and computational biology. You will be the primary point of contact for technical support, ensuring our customers can seamlessly utilize Crusoe Cloud to achieve their goals. This role directly impacts Crusoe's mission by enabling our customers to accelerate their research and development, contributing to a more sustainable future. You will be involved in exciting projects, working with cutting-edge technologies and collaborating with a talented team to solve complex challenges. The ideal candidate is a highly motivated and experienced technical professional with a passion for customer success, a deep understanding of cloud technologies, and a commitment to Crusoe's values. This is a full-time position.


What You’ll Be Working On:


  • Customer Support: Provide exceptional technical support to customers via Zendesk, meeting SLAs and maintaining high CSAT (95%+).
  • On-Call Rotation: Participate in a 24/7 on-call rotation to ensure timely resolution of critical issues.
  • Troubleshooting: Diagnose and resolve issues related to VMs, hardware failures, and scaling tests using CLI and internal tools.
  • Alert Triage and Maintenance: Manage alert triage, prepare for maintenance windows, and conduct node delivery testing.
  • Collaboration: Work closely with SRE, Networking, and Storage teams from initial triage to root cause analysis (RCA) delivery.
  • Global Teamwork: Adhere to global team collaboration and handoff processes for ticketing and on-call procedures.
  • Knowledge Sharing: Develop onboarding/training materials, knowledge base documentation, and standard operating procedures (SOPs).


What You’ll Bring to the Team:


  • Education/Experience: Bachelor's degree in IT, Computer Science, Engineering, or a related field, or 4+ years of equivalent technical experience.
  • Linux Proficiency: Strong command-line interface (CLI) skills in Linux environments.
  • Version Control: Proficiency with Git for code management and collaboration.
  • Customer Support Experience: 5+ years of experience in a customer support role, ideally within cloud, storage, or networking environments.
  • Cloud Technologies: Experience with container orchestration (e.g., Kubernetes), workload management (e.g., Slurm, Terraform), and monitoring tools (e.g., Grafana).
  • Public Cloud Knowledge: Familiarity with other public cloud platforms (e.g., AWS, Azure, GCP).
  • Communication Skills: Excellent communication and customer service skills, including the ability to prioritize competing escalations.
  • HPC Knowledge: Understanding of HPC technologies such as InfiniBand, RDMA, RoCE, and Software Defined Networking (SDN).


Bonus Points:

  • Certifications: CKA, CKAD, CKS, KCNA, AWS Machine Learning - Specialty, Data Analytics - Specialty, Solutions Architect - Professional, Developer - Associate, NVIDIA AI Infrastructure and Operations, Generative AI and LLMs, Generative AI Multi-modal, InfiniBand, Linux Foundation IT Associate, System Administrator.
  • Cloud Expertise: Deep understanding of specific cloud platforms and services.
  • Automation Skills: Experience with automation tools and scripting languages.
  • Problem-Solving Abilities: Demonstrated ability to analyze complex technical issues and develop effective solutions.
  • Collaboration and Mentorship: Proven ability to mentor, train, and onboard colleagues.
  • Passion for Sustainability: A strong interest in contributing to a more sustainable future through technology.


Benefits:

  • Industry competitive pay
  • Restricted Stock Units in a fast growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company paid commuter benefit; $300 per pay period


Compensation:

Compensation will be paid between $125,000 and $151,000 + Bonus. Restricted Stock Units are included in all offers. Salary will be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

Not Specified
Cloud Monitoring Engineer
✦ New
Salary not disclosed
Atlanta, GA 5 hours ago

Senior Dynatrace Engineer responsible for designing, implementing, and maintaining enterprise monitoring solutions. The role focuses on ensuring end-to-end observability across applications, infrastructure, and cloud environments using Dynatrace. The engineer will also provide expertise in performance monitoring, troubleshooting, and proactive incident management.

Key Responsibilities

Monitoring & Observability

· Configure, maintain, and optimize monitoring solutions using Dynatrace.

· Provide end-to-end visibility across infrastructure, applications, and services.

· Develop dashboards, alerts, and health checks to monitor system performance.

· Define monitoring thresholds to reduce false alerts and improve reliability.

Infrastructure Monitoring

· Monitor Windows and Linux servers, virtual environments (VMware), and cloud platforms (AWS, Azure, GCP).

· Monitor databases, middleware, and network infrastructure components.

· Identify system trends and capacity requirements with infrastructure teams.

· Proactively detect and resolve system performance issues.

Application Performance Monitoring

· Monitor application performance and transaction flows using Dynatrace APM.

· Implement synthetic monitoring and real user monitoring.

· Collaborate with development teams to ensure comprehensive monitoring coverage.

· Troubleshoot performance issues across applications and systems.

Incident Management

· Support incident response activities related to performance and availability issues.

· Provide monitoring insights during root cause analysis.

· Identify monitoring gaps and improve monitoring coverage.

Continuous Improvement

· Improve monitoring standards, documentation, and best practices.

· Recommend enhancements to monitoring configurations and alerting strategies.

· Integrate monitoring tools with ITSM platforms such as Jira or ServiceNow.

Required Qualifications

Education

· Bachelor’s degree in Computer Science, Information Systems, or equivalent experience.

Experience

· 5+ years of experience in systems engineering, infrastructure support, or monitoring roles.

· 5+ years of hands-on experience with Dynatrace or similar APM tools.

· Experience migrating Dynatrace Managed to Dynatrace SaaS environments.

Technical Skills

· Strong knowledge of Windows/Linux servers and VMware environments.

· Experience with cloud platforms (AWS, Azure, GCP).

· Understanding of networking concepts such as DNS, TCP/IP, and load balancing.

· Experience with automation or scripting (PowerShell, Bash).

· Knowledge of monitoring baselines, KPIs, and SLAs.

· Familiarity with enterprise log analysis frameworks and tools like Jira or ServiceNow.

· Experience monitoring containerized environments such as Kubernetes (preferred).

Preferred

· Exposure to the AWS Solutions Architect Professional certification.

Soft Skills

· Strong troubleshooting and analytical skills.

· Excellent written and verbal communication.

· Ability to collaborate across technical and business teams.

· Detail-oriented with a proactive approach to system monitoring and reliability.

Not Specified
Sr. Cloud Engineer
✦ New
🏢 Venteon
Salary not disclosed
Rochester, MI 5 hours ago

Position Summary

Our client is building a modern, cloud-native platform that powers connected, data-driven manufacturing operations. Their technology sits at the center of increasingly automated factories, integrating equipment, software systems, and real-time production data into a scalable SaaS platform used by global manufacturers.

To support rapid growth and platform scale, they are seeking a Senior Cloud Operations Engineer to own the reliability, performance, and operational excellence of their cloud infrastructure. This is a highly impactful role responsible for ensuring the platform remains highly available, secure, and scalable as adoption continues to grow.

This position is ideal for engineers who thrive in modern cloud environments, enjoy solving complex reliability challenges, and prefer automating everything possible. The right person will combine deep technical expertise with strong operational discipline, helping build a world-class cloud platform supporting real industrial environments.

Key Responsibilities

Cloud Operations & Reliability

• Maintain and optimize production, staging, and development environments running in Kubernetes on AWS

• Implement and manage monitoring, logging, alerting, and observability frameworks

• Lead incident response efforts and drive post-incident reviews focused on continuous improvement

• Own backup, disaster recovery, and business continuity processes

• Perform system capacity planning and performance tuning

Automation & Infrastructure Management

• Build and maintain Infrastructure-as-Code using tools such as Terraform or Pulumi

• Automate provisioning, configuration management, and environment lifecycle processes

• Identify and eliminate operational inefficiencies through automation

• Manage secrets, environment configuration, and version control across infrastructure environments

Security & Compliance

• Implement and maintain least-privilege access models and cloud security guardrails

• Support vulnerability management, patching workflows, and dependency maintenance

• Assist with compliance readiness efforts including SOC 2, ISO 27001, or similar frameworks

• Ensure proper logging, retention, and audit practices across cloud environments

FinOps / Cost Optimization

• Monitor and optimize cloud spend across services and environments

• Implement tagging standards, budget alerts, and cost visibility frameworks

• Recommend architectural improvements to balance performance and cost efficiency

Collaboration & Leadership

• Partner closely with engineering teams to improve reliability, deployment pipelines, and system architecture

• Mentor engineers on operational best practices and cloud platform management

• Develop runbooks, documentation, and operational standards

• Champion reliability engineering principles, operational maturity, and risk reduction practices

Technical Environment

Candidates should be comfortable working in modern cloud-native environments and familiar with:

• Kubernetes clusters, autoscaling, Helm charts, and service mesh concepts

• AWS cloud services including compute, networking, storage, and cost management

• Infrastructure-as-Code frameworks such as Terraform

• Observability platforms such as Datadog, CloudWatch, Prometheus, or New Relic

• CI/CD tools such as GitHub Actions, Bitbucket Pipelines, or Bamboo

• Linux systems administration and troubleshooting

• SRE practices including SLIs, SLOs, MTTR, RTO/RPO, and incident management

Not Specified
Splunk Engineer/Cloud Logging Engineer (CLS Support)
✦ New
Salary not disclosed
Fairfax, VA 1 day ago


Job ID

2026-2158

# of Openings

1

Overview

Pyramid Systems is seeking a Cloud Logging Engineer (Splunk & AWS) who will be responsible for ensuring the availability, performance, and security of the Centralized Logging Solution (CLS).



Responsibilities

  • Advise on cost efficiency for future usage and cost optimization for current infrastructure.
  • Automate the management and enforcement of policies.
  • Create and maintain documentation related to architecture and operational processes for CLS (Centralized Logging Solution).
  • Develop a set of best practices and architecture patterns.
  • Help maintain regulatory compliance of the CLS (Centralized Logging Solution) infrastructure.
  • Help monitor and maintain CLS performance, availability, and capacity.
  • Help maintain application container images.
  • Offer solutions for ingestion of logs to Splunk via cloud native solutions.
  • Maintain all infrastructure as code.
  • Provide operations monitoring of CLS platform to enable proactive issue identification, response, and resolution.
  • Recommend and execute improvements to the existing CLS architecture and design with growth and scalability in mind to optimize performance, stability, reliability, and agility.
  • Responsible for reporting on current infrastructure status, and planning for future usage.
  • Responsible for Beats agent deployments and container infrastructure analysis, optimization, and capacity planning.
  • Maintain CI/CD pipelines for configuration deployments to applications.
  • Support large-scale deployments with data feeds from multiple on-premises and cloud data centers.
  • Upgrade, install, and configure monitoring solutions on AWS for Windows and Linux servers.
  • Use automation tools such as Terraform, Ansible, AWS CloudFormation, Azure Resource Manager, or similar.
  • Participate in a rotating on-call schedule and weekly off-hours maintenance.


Qualifications

  • Splunk certification required***
  • Background eligibility: candidates must be a United States citizen or Permanent Resident and have lived in the United States for at least 3 years, have a clean criminal background, and be able to obtain a Public Trust (High-Risk) position.
  • Bachelor's degree in computer science, electronics engineering, or another engineering or technical discipline, OR an AWS/Azure certification (AWS Professional / Specialty Cert. OR Azure Expert / Advanced Cert.), OR 4 years of relevant experience in one of the VAECOT suite of tools (ScienceLogic, Dynatrace, Turbot, AppDynamics)

  • Minimum of three (3) years of experience in leading technical teams to achieve objectives and outcomes.

  • Minimum of six (6) years setting up, configuring, and using AWS cloud operational tools to ensure service level agreements and performance targets are met, and continued compliance with policies, standards and guidelines.

  • Minimum of three (3) years specific to monitoring Centralized Logging Solution (CLS)/Splunk

  • Subject matter expertise with ALL VAEC Cloud Service Providers which currently includes Microsoft Azure and Amazon Web Services (AWS).

  • Experience programming in Splunk's Search Processing Language (SPL) or an equivalent (e.g., Python, PowerShell, AWS or Azure CLI).

  • One or more of these Splunk certifications: Splunk Core Certified Power User, Splunk Core Certified Advanced Power User, Splunk Enterprise Certified Admin, Splunk Enterprise Certified Architect, Splunk Enterprise Security Certified Admin, Splunk IT Service Intelligence Certified Admin.

  • Knowledge of enterprise logging, with a focus on security event logging.

  • Solid understanding of cloud concepts, either using Azure or AWS semantics.

  • Experience in one or more of the VAECOT suite of tools, shown below:

VAEC Operational Tools (VAECOT)

Some experience in one or more of the following tools:

Third party tools

* Application Performance Monitoring: Dynatrace, AppDynamics

* Cloud Security: Nessus, NetSkope, Enterprise Security External Change Council, Identity and Assessment Management, Continuous Monitoring as a Service, McAfee, eMASS, Centrify

* Cloud Governance: Turbot

* DevOps/Configuration Management/Help Desk: Ansible, Service Desk, ScienceLogic, ServiceNow, Splunk, Jira Service Desk, Cloudockit, GitHub

* Containerization: Red Hat OpenShift

* Migration: CloudKey, Version One

* Reporting: Apptio

Cloud Service Provider (CSP) Operational Tools/Services

* AWS Security: Systems Manager (Explorer and OpsCenter), CloudWatch, Config, CloudTrail, Elasticsearch (Kinesis Data Streams), GuardDuty, Inspector, Key Management Service (KMS), Security Hub, Directory Service, Identity and Access Management, Resource Access Manager, Cognito, Secrets Manager, Certificate Manager, Artifact

* AWS Monitoring and Logging: QuickSight, EventBridge (AWS Kinesis Data Streams), Simple Notification Service (SNS), Elasticsearch (AWS Kinesis Data Streams), CloudTrail, CloudWatch

* AWS Networking: Virtual Private Cloud (VPC), Route 53, API Gateway, Direct Connect, AppStream 2.0, Transit Gateway, Elastic Load Balancer, Firewall Manager, WAF & Shield

* AWS Storage: Cloud Tiering Services to S3 from On-Prem, Simple Storage Service (S3), S3 Glacier, Storage Gateway, Elastic File System (EFS), Backup

* Azure Security: Monitor (Log Analytics and ASC), Event Hubs, Security Center (ASC), Information Protection (AIP), Key Vault, Power BI, Network Watcher (Performance Monitor)

* Azure Monitoring and Logging: Information Protection (AIP), Advanced Threat Protection, Security Center (ASC), Key Vault, Active Directory, Role Based Access Control (RBAC), Resource Manager (ARM), Resource Graph (ARG), Active Directory B2C, App Service, Service Trust Portal

* Azure Networking: Virtual Network, Traffic Manager, DNS, Application Gateway, ExpressRoute, Web Apps, Front Door, VPN Gateway, Load Balancer, Firewall

* Azure Storage: NetApp File Service, Storage (Blobs, Disks, Files, Queues, Tables), Storage Archive Access Tier, StorSimple, Files, Backup



Target Pay Range

The pay range listed below for this position is not a guarantee of compensation or salary. The final offered salary will be influenced by a host of factors including, but not limited to, geographic location, Federal Government contract labor categories and contract wage rates, relevant prior work experience, specific skills and competencies, education, and certifications. Our employees value the flexibility at Pyramid Systems that allows them to balance quality work and their personal lives. We offer competitive compensation and benefits, including our Employee Stock Ownership Program, FlexPTO, and learning and development opportunities.

Pyramid Min

USD $92,168.00/Yr.

Pyramid Max

USD $138,252.00/Yr.

Why Pyramid?

Pyramid Systems, Inc. is an award-winning technology leader driving digital transformation across federal agencies. We empower forward-thinking innovations, accelerate production-ready software, and deliver secure solutions so federal agencies can meet their mission goals. Voted a Top Workplace both regionally (Washington, DC) and nationally (USA) for the past 2 years (2023 and 2024) based on feedback from our employees, we are headquartered in Fairfax, VA, and have a growing national footprint. We value and promote our Flexible Workplace approach because of the positive impact it has on work-life integration. We remain committed to ensuring that every employee's voice is heard, performance and results are recognized and rewarded, development and advancement are a focus, and diversity, equity, and inclusion are a company priority. We offer competitive compensation and benefits (including a recently launched Employee Stock Ownership Plan - ESOP), a robust performance-based rewards program, and we know how to have fun! Our people and culture have endured and delivered for our clients for nearly three decades.

EEO Statement

Pyramid Systems, Inc. is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against on the basis of disability.

Not Specified
Cloud Infrastructure Automation Engineer (AUSTIN)
🏢 JABIL CIRCUIT, INC
Salary not disclosed
AUSTIN, Texas 4 days ago
**This position supports a hybrid work schedule depending on organizational needs.**

RESPONSIBILITIES:

- Architect, design, and maintain scalable CI/CD pipelines using Azure/AWS DevSecOps.

- Build and optimize Docker-based microservices, images, and deployment pipelines.

- Lead deployments across Docker Swarm, Kubernetes/EKS, and multi-location environments.

- Develop infrastructure automation using Ansible, Bash scripting, Terraform, and Git-based workflows.

- Manage release pipelines using container registries, artifact feeds, template pipelines, and multi-stage workflows.

- Design multi-environment strategies for dev, QA, staging, and production deployment.

- Implement cloud-native services with AWS & Azure cloud platforms.

- Implement basic security practices, including IAM roles, secrets management, and access controls.

- Develop secure, modular, reusable build and release systems.

- Work closely with full-stack engineering teams (Angular, Java, Python, backend APIs, database engineers).

- Mentor junior DevOps engineers and lead DevOps roadmap decisions.

KNOWLEDGE REQUIREMENTS:

DevOps Expertise:

Azure DevOps pipelines, YAML templating, CI/CD strategy, Git branching models.

Containerization & Orchestration:

Docker images, Docker Compose, Docker Swarm, multi-node/multi-location deployments.

Cloud Technologies:

Azure deployments & infrastructure, AWS (IAM, Lambda, S3, CloudWatch).

Programming / Scripting Languages:

Python, Bash, Linux/Unix administration, awk, shell automation, Groovy.

Infrastructure Automation:

Ansible playbooks, tasks/roles, inventory design, configuration management.

Distributed Deployment Architecture:

Multi-site replication, node selection by IP, dynamic service routing.

Database Stack Experience:

PostgreSQL, MySQL, MariaDB operations & migrations.

Observability & Logging:

CloudWatch monitoring, log collection, Prometheus, Grafana, reporting & metrics.

Version Control & Build Systems:

Azure DevOps, Git, Git submodules, artifact storage, registry solutions, secrets management.

Nice to have: AI knowledge/experience and a willingness to learn.

EDUCATION & EXPERIENCE REQUIREMENTS

- BS degree in Electrical/Computer Engineering, Computer Science or related field. MS preferred.
- 7+ years of experience in a software DevOps/development/test capacity with enterprise server, storage, or networking products.
temporary