Prometheus Query Examples Curl Jobs in USA

1,517 positions found — Page 5

Senior Full-Stack Engineer
🏢 Titl
Salary not disclosed
Miami, FL 6 days ago

Company Description

At Titl, we simplify the real estate process by eliminating paperwork, legal obstacles, and delays associated with buying, owning, or selling a home. Our advanced technology ensures transparency and peace of mind throughout every transaction. We provide a modern and user-friendly way to handle property—designed for today and prepared for future needs.


Role Description

We're seeking an experienced Full-Stack Engineer to join our team working on a sophisticated property data research and report generation platform. This role involves building and maintaining enterprise-grade systems that automate property data extraction from government sources, generate comprehensive property reports, and manage complex business workflows including payments, authentication, and blockchain integration.


What You'll Work On

  • Backend Services: Develop and maintain NestJS microservices handling property data scraping, PDF generation, report aggregation, and enterprise account management
  • Frontend Applications: Build responsive Next.js applications with complex state management and real-time updates
  • Data Pipeline: Work with automated scraping systems using Puppeteer and AI-powered document processing (Google Document AI, OpenAI)
  • Integration Development: Implement OAuth flows, Stripe payment processing, webhook handling, and third-party API integrations
  • Queue Management: Design and maintain Bull queue systems for background job processing and async workflows
  • Blockchain Integration: Work with Polymesh blockchain for property ownership verification and asset tokenization
  • Database Design: Create efficient Prisma schemas and optimize PostgreSQL queries for complex property data relationships


Required Technical Skills


Core Stack (Must Have)

  • Backend: Advanced proficiency in NestJS with deep understanding of dependency injection, decorators, guards, and service patterns
  • Frontend: Expert-level Next.js 14 (App Router) and React with TypeScript
  • Database: Strong Prisma ORM experience and PostgreSQL optimization skills
  • TypeScript: Production-level TypeScript across full stack
  • API Design: RESTful API design, DTOs, validation, and Swagger documentation


Infrastructure & DevOps

  • Docker: Container orchestration and development environments
  • Cloud Platforms: Google Cloud Platform (Cloud Storage, Cloud Run)
  • Queue Systems: Bull or similar job queue systems (Redis-backed)
  • Monorepo: Experience with pnpm workspaces or similar monorepo tooling


Authentication & Payments

  • OAuth 2.0: Multi-provider authentication (Google, Facebook, LinkedIn)
  • JWT: Token-based authentication and authorization patterns
  • Stripe: Payment processing, webhooks, subscription management, and usage-based billing


Specialized Skills
  • Web Scraping: Puppeteer or similar browser automation tools
  • PDF Processing: PDF generation, manipulation, and data extraction
  • AI/ML Integration: Experience with AI APIs (OpenAI, Google AI, etc.)
  • Background Jobs: Async processing, retry logic, and error handling


Highly Desired Skills

  • Blockchain: Polymesh or Ethereum blockchain integration experience
  • Document Processing: OCR, document AI, or legal document processing
  • Property/Real Estate Domain: Understanding of property records, deeds, liens, title commitments
  • Legal Tech: Experience with legal document workflows or compliance systems
  • Testing: Jest, testing-library, E2E testing frameworks
  • Performance Optimization: Query optimization, caching strategies, lazy loading
  • Security: OWASP best practices, rate limiting, encryption


Architecture & Design Requirements

You should be comfortable with:

  • Design Patterns: Service-oriented architecture, repository pattern, factory pattern
  • Dependency Injection: Understanding NestJS DI container and module system
  • Database Relations: Complex multi-tenant data models with proper isolation
  • State Management: React Context, server/client component patterns
  • Error Handling: Comprehensive error handling, retry logic, fallback mechanisms
  • API Security: Rate limiting, API key management, webhook signature verification
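Webhook signature verification, mentioned in the list above, generally follows one pattern: compute an HMAC over the raw payload with a shared secret and compare in constant time. A minimal Python sketch with a made-up secret; note that Stripe's real scheme additionally includes a timestamp in the signed material:

```python
import hashlib
import hmac

def sign(secret: bytes, payload: bytes) -> str:
    """Compute an HMAC-SHA256 signature over a webhook payload."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify(secret: bytes, payload: bytes, signature: str) -> bool:
    """Constant-time comparison to avoid timing attacks."""
    return hmac.compare_digest(sign(secret, payload), signature)

secret = b"whsec_demo"  # hypothetical shared secret
payload = b'{"event":"payment_succeeded"}'
sig = sign(secret, payload)
print(verify(secret, payload, sig))      # True
print(verify(secret, b"tampered", sig))  # False
```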


Experience Requirements

  • 5+ years of full-stack development experience
  • 3+ years with TypeScript in production environments
  • 2+ years with NestJS or similar enterprise Node.js frameworks
  • 2+ years with modern React and Next.js
  • Experience building production SaaS applications with multi-tenant architecture
  • Track record of shipping complex features end-to-end
  • Experience with third-party integrations and webhook systems

Domain Knowledge (Preferred)

  • Understanding of property data and real estate records
  • Familiarity with government data systems and public records
  • Knowledge of legal document structures (deeds, liens, mortgages, title commitments)
  • Experience with regulated industries and compliance requirements
  • Understanding of Miami-Dade County or similar municipal systems (bonus)


Development Practices

You should have experience with:

  • Git workflows: Feature branches, pull requests, code review
  • Documentation: Writing clear technical documentation and API specs
  • Testing: Unit tests, integration tests, E2E tests
  • CI/CD: Automated testing and deployment pipelines
  • Agile: Working in iterative development cycles
  • Code Quality: ESLint, Prettier, TypeScript strict mode


Problem-Solving Skills

We're looking for someone who can:

  • Debug complex distributed systems across multiple services
  • Optimize database queries and reduce API response times
  • Design scalable architectures for high-volume data processing
  • Handle edge cases in automated scraping and data extraction
  • Troubleshoot integration issues with third-party services
  • Implement robust error handling and monitoring

Communication & Collaboration

  • Clear written communication for documentation and code reviews
  • Ability to explain technical concepts to non-technical stakeholders
  • Collaborative approach to problem-solving
  • Proactive in identifying and addressing technical debt
  • Experience mentoring junior developers (preferred)

Package Manager Note

This project uses pnpm exclusively for monorepo management. Experience with pnpm workspaces is preferred, but npm/yarn monorepo experience transfers well.


What Makes You Stand Out

  • Contributions to open-source projects
  • Experience with LangChain or LangGraph for AI orchestration
  • FastAPI or Python experience (for AI service integration)
  • Understanding of title insurance or property ownership verification
  • Experience with Puppeteer clusters and browser farm optimization
  • Background in fintech or regulated industries
  • Experience with multi-environment deployments (local, staging, production)


Working Style

This role requires:

  • Attention to detail when working with legal and financial data
  • Systematic approach to debugging complex systems
  • Ability to work independently on ambiguous problems
  • Comfort with reading and understanding existing codebases
  • Pragmatic decision-making balancing speed and quality

Tech Stack Summary: NestJS • Next.js • TypeScript • Prisma • PostgreSQL • Puppeteer • Bull • OAuth • Stripe • Google Document AI • OpenAI • Docker • GCP • Polymesh • pnpm

This role offers the opportunity to work on challenging technical problems at the intersection of PropTech, LegalTech, and AI, building systems that handle real-world property data at scale.
Permanent
Product Data Analyst
✦ New
Salary not disclosed
Dallas, TX 1 day ago

Loloi Rugs is a leading textile brand that designs and crafts rugs, pillows, and throws for the thoughtfully layered home. Family-owned and led since 2004, Loloi is growing more quickly than ever. To date, we’ve expanded our diverse team to hundreds of employees, invested in multiple distribution facilities, introduced thousands of products, and earned the respect and business of retailers and designers worldwide. A testament to our products and our team, Loloi has earned the ARTS Award for “Best Rug Manufacturer” in 2010, 2011, 2015, 2016, 2018, 2023, and 2025.


Security Advisory: Beware of Frauds

Protect yourself from potential fraud and verify the authenticity of any job offer you receive from Loloi. Rest assured that we never request payment or demand any sensitive personal information, such as bank details or social security numbers, at any stage of the recruiting process. To ensure genuine communication, our recruiters will solely reach out to applicants using an @ email address. Your security is of paramount importance to us at Loloi, and we are committed to maintaining a safe and trustworthy hiring experience for all candidates.


We are building a Business Operations Center of Excellence, and we need a Product Data Analyst to serve as the "Guardian of the Golden Record." In this role, you are the absolute owner of product data integrity as it relates to the digital customer experience. You ensure that every item we sell is accurately represented across every touchpoint—from our ERP and PIM to our website storefront and marketing feeds. This is not a data entry role; it is a high-impact technical logic and investigation role. You will work directly with our Data Platform and Software Engineering teams to define business rules, audit data health via complex SQL, and troubleshoot data transmission errors before they impact the customer.


Responsibilities

  • Storefront Governance: Serve as the absolute owner of product data integrity within the PIM. Ensure that all storefront-critical attributes (pricing, dimensions, weights, image links) are accurate and standardized for a seamless customer experience.
  • Technical Data Auditing: Write and run complex SQL queries against our centralized database to identify anomalies, "orphan" records, and data hygiene issues that need resolution. You will be expected to query across multiple schemas to validate data consistency between systems.
  • Feed Logic & Mapping: You will manage the logic of how data translates from our PIM to external endpoints. You will ensure that our products appear correctly on Google Shopping, Meta, Amazon, and other marketplaces by managing feed rules and mapping definitions.
  • API Payload Analysis: You will act as the first line of defense for data transmission errors. If a product isn't showing up on the site, you will review the JSON/XML response bodies to determine if it is a data payload error or a software code bug.
  • Cross-Functional Impact Analysis: You will act as the gatekeeper for data changes, predicting downstream impacts (e.g., "If Merchandising changes this Category Name, it will break the Finance reporting filter").
  • Hygiene Logic Definition: You will partner with our IT/Database team to define automated health checks. You identify the "rot" (bad data patterns), and they implement the database constraints to stop it.
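The API-payload bullet above, in miniature: a first-pass check is simply diffing a payload's keys against the set of storefront-required attributes. A small Python sketch; the `REQUIRED` schema and sample payloads are invented for illustration:

```python
import json

REQUIRED = {"sku", "price", "weight", "image_url"}  # hypothetical required attributes

def missing_keys(payload: str) -> set:
    """Return required keys absent from a product JSON payload."""
    return REQUIRED - json.loads(payload).keys()

good = '{"sku": "RUG-1", "price": 99.0, "weight": 18.5, "image_url": "x"}'
bad = '{"sku": "RUG-1", "price": 99.0}'
print(missing_keys(good))          # set()
print(sorted(missing_keys(bad)))   # ['image_url', 'weight']
```

If the payload passes a check like this and the product is still missing from the site, the issue is handed to Engineering as a likely code or caching bug, as described below.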


What You Will NOT Do (The Boundaries)

  • No Web Development: You are not a Front-End Developer. You do not write HTML, CSS, or React code. You ensure the data powering those components is 100% accurate.
  • No Manual Data Entry: Your job is not to copy-paste descriptions. You build the systems, bulk processes, and logic that ensure data quality at scale.
  • No Database Administration: You do not manage server uptime or schema changes (IT owns this). You own the quality of the records inside the database.


Intersection with Technical Teams

  • With IT (Database Mgmt): IT owns the infrastructure and schema; you own the quality of the data within it. When you identify a systemic issue (e.g., "5,000 orphan records"), you partner with IT to implement the technical fix (scripts/constraints).
  • With Software Engineering (Commerce): If a product is missing from the site, you check the data payload. If the data is correct, you hand off to Engineering, confirming it is a code/caching bug rather than a data error.


Experience, Skills, & Ability Requirements

  • 5-8 years of experience in Data Management, PIM Administration, or technical eCommerce Operations.
  • SQL Proficiency: You are comfortable writing queries beyond simple SELECT *. You should be proficient with CTEs (Common Table Expressions), Window Functions (e.g., Rank, Lead/Lag), Subqueries, and complex Joins to act as a forensic data investigator.
  • API Fluency: You can read and understand JSON and XML. You know what a valid payload looks like and can spot formatting errors or missing keys.
  • Data Manipulation: You are an expert at handling large datasets (CSVs, Excel) and understand data types, formatting standards, and normalization concepts.
  • You love hunting down the root cause of an error. You don't just fix the wrong price; you find out why the price was wrong and build a rule to stop it from happening again.
  • You have high standards for accuracy. You understand that a wrong weight in the system means a financial loss on shipping for the business.
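The SQL requirement above (CTEs plus window functions for forensic auditing) in miniature: rank rows per SKU by recency and surface stale duplicates. Sketched here with Python's built-in sqlite3 so it is self-contained; in the role this would run against the real warehouse, and the table and data are invented:

```python
import sqlite3

# In-memory product table with a duplicated SKU (hypothetical example).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (sku TEXT, price REAL, updated TEXT);
INSERT INTO products VALUES
  ('RUG-1', 99.0, '2024-01-01'),
  ('RUG-1', 89.0, '2024-02-01'),
  ('RUG-2', 45.0, '2024-01-15');
""")

# CTE + window function: rank rows per SKU by recency, flag the stale copies.
rows = con.execute("""
WITH ranked AS (
  SELECT sku, price,
         ROW_NUMBER() OVER (PARTITION BY sku ORDER BY updated DESC) AS rn
  FROM products
)
SELECT sku, price FROM ranked WHERE rn > 1  -- stale duplicates to purge
""").fetchall()
print(rows)  # [('RUG-1', 99.0)]
```

The same shape (partition, rank, filter) answers most "which record is the golden one?" questions in a duplicated dataset.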


Bonus Points (Nice-to-Haves)

  • Familiarity with Visio/Lucidchart to visualize data flows.
  • Ability to build simple dashboards in Tableau to track data health scores.
  • Basic familiarity with Python or R for data manipulation.


What We Offer

  • Health, dental, and vision benefits
  • Paid parental leave
  • 401(k) with employer match
  • A culture of meritocracy that fosters ongoing growth opportunities
  • A stable, growing family-owned company that looks after its employees


Loloi Rugs does not discriminate on the basis of race, sex, color, religion, age, national origin, marital status, disability, veteran status, genetic information, sexual orientation, gender identity or any other reason prohibited by law in provision of employment opportunities and benefits. We seek a diverse pool of applicants and consider all qualified candidates regardless of race, ancestry, color, gender identity or expression, sexual orientation, religion, national origin, citizenship, disability, Veteran status, marital status, or any other protected status. If you have a special need or disability that requires accommodation, please let us know.

Not Specified
Site Reliability Engineer
✦ New
Salary not disclosed
Austin, TX 1 day ago

Job Title: Site Reliability Engineer (SRE) – DataHub & GraphQL

Location: Austin, TX & Sunnyvale, CA


Only candidates with independent visa status (no employer sponsorship required) will be considered.


Role Overview

We are seeking a highly skilled Site Reliability Engineer (SRE) with strong expertise in DataHub ingestion pipelines and GraphQL APIs. The ideal candidate will be responsible for designing, building, and maintaining scalable data ingestion frameworks, ensuring reliability and performance of enterprise data platforms, and enabling seamless integration with downstream applications. This role requires a balance of software engineering, systems reliability, and data platform knowledge.

Key Responsibilities

  • Design, implement, and optimize DataHub ingestion pipelines for large-scale enterprise data systems.
  • Develop and maintain GraphQL APIs to support data discovery, metadata management, and integration.
  • Ensure high availability, scalability, and performance of data services across cloud and on-prem environments.
  • Collaborate with data engineering, product, and infrastructure teams to deliver reliable data solutions.
  • Automate monitoring, alerting, and incident response processes to improve system resilience.
  • Drive best practices in observability, logging, and distributed system reliability.
  • Troubleshoot complex production issues and implement long-term fixes.
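Automating monitoring and alerting (per the list above) usually starts with programmatic calls to the Prometheus HTTP API's standard `/api/v1/query` endpoint. A minimal URL-building sketch in Python; the server hostname and metric name are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical Prometheus server; the HTTP API path itself is standard.
BASE = "http://prometheus.example.com:9090/api/v1/query"

def instant_query(expr: str) -> str:
    """Build an instant-query URL for PromQL expression `expr`."""
    return f"{BASE}?{urlencode({'query': expr})}"

# Alerting-style check: error rate of an ingestion pipeline over 5 minutes.
url = instant_query('rate(ingestion_errors_total[5m]) > 0.05')
print(url)
```

The resulting URL can be fetched with any HTTP client (or curl) and the JSON response inspected for the `result` vector.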

Must-Have Skills

  • 5+ years of experience as an SRE, DevOps Engineer, or Software Engineer with a focus on reliability and scalability.
  • Strong hands-on experience with DataHub ingestion frameworks and metadata pipelines.
  • Proficiency in GraphQL API design and implementation.
  • Solid understanding of cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes, Docker).
  • Expertise in monitoring tools (Prometheus, Grafana, ELK, Datadog, etc.).
  • Strong programming skills in Python, Java, or Go.
  • Experience with CI/CD pipelines and infrastructure-as-code (Terraform, Ansible).

Good-to-Have Skills

  • Familiarity with data governance and metadata management tools.
  • Experience integrating with data platforms like Kafka, Spark, or Snowflake.
  • Knowledge of REST APIs and microservices architecture.
  • Exposure to security and compliance practices in data systems.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
  • Proven track record of delivering reliable, scalable data infrastructure solutions.
Not Specified
Cassandra Database Engineer/Administrator
✦ New
Salary not disclosed
Beaverton, OR 1 hour ago

Position: Database Engineer

Duration: 11-12 months

Location: Austin, TX (78759). Hybrid role: in office Mon, Wed, Thurs is a must (no flexibility on these days).


Job Description:


The Cassandra Database Engineer is an expert across NOSQL database technologies, but specifically a specialist on Cassandra database administration.

For this position, NOSQL database expertise is mandatory with a primary focus on Cassandra databases, as well as expertise in Public Cloud technology (AWS and/or GCP).

For this mission, the engineer will primarily be responsible for database operational activities.


Essential Functions / Key Areas of Responsibility

The Database Engineer's primary responsibilities:

· Database performance analysis and operations review for production database platforms

· Manage database operations activities including incident response, database alert resolution, and managing third party support engagement

· Deploy and maintain database monitoring solutions.

· Test and build database restore and recovery procedures

· Database platform deployment, installation, patching, change management, and third-party software upgrades.

· Responsible for database hardening procedure identification and deployment on public cloud, hosted, and on-premises platforms.

· Responsible for providing database expertise and operations support to the technical support teams and project delivery teams.

· Responsible for participating in database platform review, bench and tuning exercises, security evaluation, provide technical analysis and proactive recommendations for improvements and/or design changes for production platforms


Minimum Requirements: Skills, Experience & Education

· HS diploma with 8+ years' experience in Cassandra administration (NOT architecture or design)

· College degree in Computer Science preferred + 8-10 years’ experience

· NOSQL Database: 8-10 years Cassandra administration

· Extensive background with public cloud database deployment, management and migration.

· Expertise in database concepts, defining standards, processes, and procedures in database deployment methodologies

· Expert in operations of high-profile production database platforms with high SLA and high-performance expectation

· High level of experience in managing change on production database platform on hosted, on premise, and cloud database platforms

· Expert in deploying high availability database architectures

· Proactive, team player, and leadership qualities with strong technical background

· Excellent verbal and written communication skills


Preferred Qualifications

· Highly skilled in Cassandra database administration

· DataStax Enterprise Cassandra administration a plus

· Strong production operations and troubleshooting skills

· Linux operating system background

· Skilled in Public Cloud deployment methods/tools (Gitlab, Terraform, Datadog)

· Knowledge of Kubernetes and Docker.

· Database performance evaluation and platform bench participation


Special Position Requirements:

Candidate will need to be able to multitask and quickly switch, if needed, to work on emergency incidents on production platforms. The position requires the ability to manage tight deadlines, maintain visibility on project delivery goals, and communicate effectively with project teams and management. The candidate must be able to thrive in a fast-paced work environment.

  1. Looking for a candidate who is currently maintaining Cassandra clusters (avoid those who last did so in the past or a couple of years ago)
  2. How many clusters are maintained today
  3. How many nodes
  4. What Cassandra version are they
  5. How many years have you worked on Cassandra (ideally 5+)
  6. Candidate has operations experience and can speak to challenges in his environment today
  7. manages patching / upgrades
  8. is called upon in crisis to manage
  9. delivers new environments
  10. Performance tuning experience with Cassandra
  11. familiar with backup and recovery
  12. Familiar with monitoring Cassandra (Prometheus or Datadog a plus)
  13. is go to for other teams on Cassandra database topics
  14. Candidate is adaptable to work in fast paced environment, context switching is normal
  15. Candidate is ok to be in stressful/challenging situations
  16. Outages
  17. Crisis team
  18. War room
Not Specified
Site Reliability Engineer II
Salary not disclosed
Alpharetta, GA 3 days ago
Title: Site Reliability Engineer II

Location: Alpharetta, GA (3 days a week onsite)

Duration: 6 months


Job Description:

We are seeking a skilled Site Reliability Engineer to join our team and help build, maintain, and scale our cloud-native infrastructure. You will work closely with development and operations teams to ensure our systems are reliable, scalable, and efficient. The ideal candidate is passionate about automation, observability, and infrastructure-as-code, and thrives in a collaborative, fast-paced environment.

Key Responsibilities



  • Design, implement, and manage cloud infrastructure on Azure using Terraform and Terragrunt.
  • Maintain and optimize Kubernetes clusters on Azure Kubernetes Service (AKS).
  • Build and manage CI/CD pipelines using GitHub Actions/Workflows and ArgoCD for GitOps deployments.
  • Enhance system reliability by implementing monitoring, alerting, and observability solutions with Grafana.
  • Automate operational tasks to reduce toil and improve team efficiency.
  • Participate in on-call rotations, incident response, and post-mortem analysis.
  • Collaborate with development teams to improve application performance, scalability, and resilience.
  • Implement and advocate for SRE best practices, including SLIs, SLOs, and error budgets.
  • Continuously improve system performance, cost efficiency, and security.
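The SLO/error-budget bullet above reduces to simple arithmetic: the error budget is the unavailability permitted over the measurement window. For example, at a 99.9% availability SLO over a 30-day window:

```python
# Error-budget arithmetic for an availability SLO (standard SRE formula).
slo = 0.999                            # 99.9% availability target
period_minutes = 30 * 24 * 60          # 30-day window = 43,200 minutes
budget_minutes = (1 - slo) * period_minutes
print(round(budget_minutes, 1))        # 43.2 minutes of allowed downtime
```

Once the budget is spent, the usual SRE practice is to slow feature releases in favor of reliability work.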



Required Skills & Qualifications



  • 3+ years of experience in an SRE, DevOps, or cloud infrastructure role.
  • Strong experience with Azure cloud services and infrastructure.
  • Hands-on experience with Java, Terraform, and Terragrunt for infrastructure-as-code.
  • Proficiency with Kubernetes (preferably AKS) and container orchestration.
  • Experience with CI/CD tools, especially GitHub Workflows/Actions and ArgoCD.
  • Solid understanding of observability tools like Grafana (Prometheus, Loki, Tempo experience is a plus).

Education Requirements: Bachelor's degree required (Master's preferred).

Not Specified
Staff Software Engineer, Observability
Salary not disclosed
San Francisco, CA 3 days ago

About Pinterest:


Millions of people around the world come to our platform to find creative ideas, dream about new possibilities and plan for memories that will last a lifetime. At Pinterest, we're on a mission to bring everyone the inspiration to create a life they love, and that starts with the people behind the product.


Discover a career where you ignite innovation for millions, transform passion into growth opportunities, celebrate each other's unique experiences and embrace the flexibility to do your best work. Creating a career you love? It's Possible.


At Pinterest, AI isn't just a feature, it's a powerful partner that augments our creativity and amplifies our impact, and we're looking for candidates who are excited to be a part of that. To get a complete picture of your experience and abilities, we'll explore your foundational skills and how you collaborate with AI.


Through our interview process, what matters most is that you can always explain your approach, showing us not just what you know, but how you think. You can read more about our AI interview philosophy and how we use AI in our recruiting process here.

We're seeking an exceptional Staff Software Engineer to join our Observability team at Pinterest. This role combines deep technical expertise in distributed systems and data engineering with a product-oriented mindset to build world-class observability solutions that empower our engineering organization. As a Staff Engineer on the Observability team, you'll be responsible for designing and building the infrastructure and tools that provide visibility into Pinterest's large-scale distributed systems, helping thousands of engineers understand, debug, and optimize their services.


What you'll do:



  • Define and execute the observability roadmap, treating it as a product. Understand engineering team needs and translate them into technical solutions with measurable impact.
  • Architect, build, and scale distributed observability infrastructure (metrics, logs, traces) to handle massive volumes across Pinterest's distributed systems.
  • Build high-performance data pipelines and storage for real-time and historical telemetry analysis at Pinterest scale.
  • Champion Best Practices: Establish observability standards and patterns across the organization, making it easy for teams to instrument their services and gain actionable insights
  • Technical Leadership: Mentor engineers, lead architectural reviews, and influence technical decisions across teams to improve overall system reliability and performance
  • Cross-functional Collaboration: Partner with SRE, Infrastructure, Product Engineering, and other teams to understand pain points and deliver solutions that improve developer productivity and system reliability
  • Innovation: Stay current with observability trends and technologies, evaluating and adopting cutting-edge tools and techniques to keep Pinterest at the forefront

What we're looking for:



  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • Product Mindset: Demonstrated ability to work backwards from customer needs: understanding user needs, prioritizing features, measuring success, and iterating based on feedback. Experience building internal platforms or tools with strong adoption
  • Distributed Systems Expertise: 7+ years of experience designing and operating large-scale distributed systems with deep understanding of consistency, availability, scalability, and failure modes
  • Data Engineering Skills: Strong background in building data pipelines, working with time-series databases, columnar storage, stream processing (Kafka, Flink, etc.), and data modeling at scale
  • Observability Domain Knowledge: Hands-on experience with modern observability tools and practices including metrics, logging, tracing, and profiling. Familiarity with OpenTelemetry, Prometheus, Grafana, or similar technologies
  • Programming Proficiency: Expert-level coding skills in languages like Java, Python, Go, or Scala with ability to write production-quality code
  • Systems Thinking: Ability to see the big picture while managing complex technical details, balancing trade-offs between cost, performance, and reliability
  • Experience building observability platforms from the ground up or significantly scaling existing solutions
  • Familiarity with cloud-native architectures and technologies (Kubernetes, service mesh, etc.)
  • Track record of driving adoption of internal platforms through excellent documentation, UX, and developer advocacy
  • Experience with machine learning or anomaly detection applied to observability use cases
  • Strong communication skills with ability to influence stakeholders at all levels
  • Contributions to open-source observability projects, a plus


In-Office Requirement Statement:



  • We let the type of work you do guide the collaboration style. That means we're not always working in an office, but we continue to gather for key moments of collaboration and connection.
  • This role will need to be in the office for in-person collaboration 1-2 times/quarter and therefore can be situated anywhere in the country.

Relocation Statement:



  • This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.



At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. The position is also eligible for equity. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise.


Information regarding the culture at Pinterest and benefits available for this position can be found here.

US-based applicants only. Base salary range: $177,185–$364,795 USD.

Our Commitment to Inclusion:


Pinterest is an equal opportunity employer and makes employment decisions on the basis of merit. We want to have the best qualified people in every job. All qualified applicants will receive consideration for employment without regard to race, color, ancestry, national origin, religion or religious creed, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, age, marital status, status as a protected veteran, physical or mental disability, medical condition, genetic information or characteristics (or those of a family member) or any other consideration made unlawful by applicable federal, state or local laws. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you require a medical or religious accommodation during the job application process, please complete this form for support.

Not Specified
REO Resiliency Engineering and Quality Leader (Hybrid)
✦ New
Salary not disclosed

*At Securian Financial, the internal position title is Infrastructure Dir.*

Mission

"To lead the engineering discipline that ensures Securian's technology platforms and cloud services are built and operated with uncompromising resilience, performance, and quality. This role drives the design and automation of fault-tolerant, high-availability architectures across AWS, Azure, and GCP-ensuring the enterprise meets resiliency, scalability, and efficiency expectations at every layer of technology."

Positioning

The Director of Resilience Engineering and Quality Leader is both a strategic peer and technical counterpart to the Infrastructure & Reliability Engineering Leader.

This role provides bench depth and succession coverage for REO's most technically complex domains while driving innovation in reliability, resilience, and performance practices.

  • Strategic influence: Shapes cloud reliability, quality engineering, and resilience strategy across REO and Architecture domains.

  • Operational authority: Leads Sr. Managers and Managers who own the execution of quality, resilience, and performance engineering capabilities.

  • Enterprise collaboration: Works hand-in-hand with Technology, Solution, Business, Data, and Enterprise Architects to embed reliability and resilience as core architecture principles.

Scope of Accountability

Resilience Engineering & Cloud Reliability

  • Architect and validate fault-tolerant, regionally resilient architectures across AWS, Azure, and GCP.

  • Own resilience automation, chaos testing, and IaC-based recovery validation.

  • Lead cross-cloud reliability design reviews and failure-mode analyses for critical systems.

Quality Engineering & Continuous Testing

  • Define enterprise-wide quality engineering strategy integrated into CI/CD pipelines.

  • Drive automation-first testing (functional, non-functional, performance, resilience).

  • Embed observability-driven quality validation and contract testing across services.

Performance, Capacity & Efficiency Engineering

  • Oversee predictive capacity planning, scaling automation, and cost/efficiency optimization (FinOps/GreenOps).

  • Partner with Platform & Infrastructure teams to tune performance across application and platform layers.

  • Measure and report on performance SLIs/SLAs aligned to REO's Reliability Metrics framework.

Cross-Domain Architecture Collaboration

  • Partner with Enterprise Architects to codify resilience and reliability standards in technology blueprints.

  • Collaborate with Technology & Solution Architects to design service reliability into delivery architectures.

  • Engage Data Architects for data resilience, replication, and pipeline reliability.

  • Work with Business Architects to align technical reliability goals with critical business outcomes.

Leadership & Talent Development

  • Lead a team of Sr. Managers and Managers, fostering a high-performance, hands-on engineering culture.

  • Build and mentor top-tier technical talent in cloud reliability, resilience, and quality automation.

  • Partner with HR and REO Enablement to develop succession plans and technical competency frameworks.

Core Technical Competencies

  • AWS (primary) - Multi-account design, HA architecture, region failover, resilience automation, Terraform/CDK/CloudFormation.

  • Azure & GCP (secondary) - Compute, networking, and reliability constructs; hybrid cloud design and failover integration.

  • Infrastructure as Code (IaC) - Deep proficiency in Terraform, policy-as-code (OPA/Conftest), drift detection, pipeline integration.

  • Reliability & Chaos Engineering - AWS Fault Injection Simulator, Gremlin, steady-state hypothesis design.

  • Observability & Quality Automation - OpenTelemetry, Prometheus, CloudWatch, K6, Gatling; CI/CD quality gates and dashboards.

  • Performance Engineering - Load, stress, and soak testing automation; performance profiling and SLO alignment.

  • Disaster Recovery Automation - Cross-region orchestration, IaC-driven DR runs, replication validation.

  • FinOps/GreenOps - Cloud cost and efficiency automation, carbon-aware scaling policies.

Leadership Competencies

  • Strategic Technical Leadership: Operates at the intersection of deep engineering and executive strategy.

  • Multi-Domain Collaborator: Integrates reliability and resilience across architecture, operations, and business domains.

  • Talent Multiplier: Develops and empowers senior managers, fostering engineering mastery and innovation.

  • Credible Technical Authority: Trusted peer to Infrastructure & Reliability Engineering; capable of leading architecture reviews and executive briefings.

  • Change Champion: Drives transformation of reliability practices across platforms, pipelines, and teams.

Qualifications & Experience

  • 12+ years in cloud engineering, reliability, or platform leadership roles.

  • 5+ years leading Sr. Managers/Managers in technical domains.

  • Proven expertise across AWS, with working knowledge of Azure and GCP.

  • Experience with multi-cloud governance, DR design, IaC at scale, and reliability automation.

  • Strong understanding of observability, SRE principles, and REO/ITIL-aligned reliability frameworks.

  • Certifications:

    • Required: AWS Certified Solutions Architect - Professional

    • Preferred: AWS DevOps Engineer, Azure Solutions Architect Expert, Google Professional Cloud Architect

Success Metrics

  • 99.9% availability maintained for Tier-1 workloads.

  • 100% coverage of DR automation for Tier-1 services.

  • 25% annual increase in automated quality/test coverage.

  • 15% annual improvement in resource efficiency and cost performance.

  • Documented resilience participation across all enterprise architecture blueprints.

  • Positive "technical peer readiness" and succession rating from Head of REO.
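The 99.9% availability target in the success metrics above maps to a concrete error budget. A minimal sketch of that arithmetic (the 30-day window length is an assumption for illustration):

```python
# Back-of-envelope error budget for an availability SLO over a
# 30-day window (window length is illustrative).

def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime in the window for a given SLO."""
    total_minutes = window_days * 24 * 60
    return round(total_minutes * (1.0 - slo), 2)

print(error_budget_minutes(0.999))   # 99.9%  -> 43.2 minutes/month
print(error_budget_minutes(0.9999))  # 99.99% -> 4.32 minutes/month
```

In other words, a Tier-1 workload held to 99.9% can afford roughly three quarters of an hour of downtime per month, which is what the DR automation and chaos testing above are meant to protect.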

Summary Value Proposition

This Director role blends deep AWS reliability engineering expertise, multi-cloud technical breadth, and leadership scale.

It ensures REO maintains both technical depth and leadership redundancy, and it strengthens the bridge between engineering execution and enterprise architecture alignment.

#LI-hybrid This position will be in a hybrid working arrangement.


Securian Financial believes in hybrid work as an integral part of our culture. Associates get the benefit of working both virtually and in our offices. If you're in a commutable distance (90 minutes), you'll join us 3 days each week in our offices to collaborate and build relationships. Our policy allows flexibility for the reality of business and personal schedules.

The estimated base pay range for this job is:

$145,000.00 - $267,000.00

Pay may vary depending on job-related factors and individual experience, skills, knowledge, etc. More information on base pay and incentive pay (if applicable) can be discussed with a member of the Securian Financial Talent Acquisition team.

Be you. With us. At Securian Financial, we understand that attracting top talent means offering more than just a job - it means providing a rewarding and fulfilling career. As a valued member of our high-performing team, we want you to connect with your work, your relationships and your community. Enjoy our comprehensive range of benefits designed to enhance your professional growth, well-being and work-life balance, including the advantages listed here:

Paid time off:

  • We want you to take time off for what matters most to you. Our PTO program provides flexibility for associates to take meaningful time away from work to relax, recharge and spend time doing what's important to them. And Securian Financial rewards associates for their service by providing additional PTO the longer you stay at Securian.

  • Leave programs: Securian's flexible leave programs allow time off from work for parental leave, caregiver leave for family members, bereavement and military leave.

  • Holidays: Securian provides nine company paid holidays.

Company-funded pension plan and a 401(k) retirement plan: Share in the success of our company. Securian's 401(k) company contribution is tied to our performance up to 10 percent of eligible earnings, with a target of 5 percent. The amount is based on company results compared to goals related to earnings, sales and service.

Health insurance: From the first day of employment, associates and their eligible family members - including spouses, domestic partners and children - are eligible for medical, dental and vision coverage.

Volunteer time: We know the importance of community. Through company-sponsored events, volunteer paid time off, a dollar-for-dollar matching gift program and more, we encourage you to support organizations important to you.

Associate Resource Groups: Build connections, be yourself and develop meaningful relationships at work through associate-led ARGs. Dedicated groups focus on a variety of interests and affinities, including:

  • Mental Wellness and Disability

  • Pride at Securian Financial

  • Securian Young Professionals Network

  • Securian Multicultural Network

  • Securian Women and Allies Network

  • Servicemember Associate Resource Group

For more information regarding Securian's benefits, please review our Benefits page.

This information is not intended to explain all the provisions of coverage available under these plans. In all cases, the plan document dictates coverage and provisions.

Securian Financial Group, Inc. does not discriminate based on race, color, religion, national origin, sex, gender, gender identity, sexual orientation, age, marital or familial status, pregnancy, disability, genetic information, political affiliation, veteran status, status in regard to public assistance or any other protected status. If you are a job seeker with a disability and require an accommodation to apply for one of our jobs, please contact us by email at , by telephone (voice), or 711 (Relay/TTY).

To view our privacy statement click here

To view our legal statement click here


Remote working/work at home options are available for this role.
Not Specified
W2 Role: Senior Site Reliability Engineer
✦ New
🏢 Yochana
Salary not disclosed

Job Title : Senior Site Reliability Engineer

Location : Charlotte, NC/ Columbus, OH – Hybrid (3 days onsite a week)

Duration : Contract role (W2)

In-person interview required in NJ or NC on Saturday, March 21st.

Job Description:

Tech Stack: Java/J2EE (Spring, Spring Boot), Python, Shell Scripting, Kafka, Oracle, MongoDB, etc.

  • 10+ years of Software Engineering experience
  • 5+ years of experience in Site Reliability Engineering teams with continued focus on improving Platform health
  • Familiar with Agile or other rapid application development practices
  • Hands-on expertise in building dashboards using APM tools.
  • Experience with distributed (multi-tiered) systems, algorithms, relational databases, and NoSQL databases.
  • Knowledge of and exposure to caching tools (Redis, Memcached) or messaging tools such as MQ or Kafka.
  • Must have working knowledge of APM tools such as Splunk, GCL, ELK, Grafana, Prometheus, etc.
  • Able to create dashboards using GCL/Splunk/ELK and set up alerts.
  • Working knowledge of CI/CD is a plus: source control (e.g., Git) and continuous integration (e.g., Jenkins, UCD Release).
  • Ability to work with engineering teams across the ecosystem (Security, Networking, Infrastructure) on challenges that can impact platform health and resiliency.
  • Shell scripting and DevOps tools such as Ansible, with good knowledge of YAML for writing playbooks.
  • Experience with distributed storage technologies such as NFS, as well as dynamic resource management frameworks (PCF, Kubernetes/OpenShift, AWS, or Azure).
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
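The dashboard and alerting bullets above typically come down to issuing PromQL queries against the Prometheus HTTP API. A small sketch of building such a request; the server address and metric labels are assumptions for illustration:

```python
# Sketch: building a Prometheus HTTP API instant-query URL, the same
# request you would issue with curl behind a dashboard panel or alert.
from urllib.parse import urlencode

PROM = "http://localhost:9090"  # hypothetical Prometheus server

def instant_query_url(promql: str) -> str:
    """Return the /api/v1/query URL for a PromQL expression."""
    return f"{PROM}/api/v1/query?{urlencode({'query': promql})}"

# Per-job 5xx error rate over 5 minutes -- a typical platform-health signal:
url = instant_query_url('sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)')
print(url)

# Equivalent curl invocation:
#   curl -G 'http://localhost:9090/api/v1/query' \
#        --data-urlencode 'query=<expr>'
```

The JSON response carries a `data.result` array of series that dashboarding tools like Grafana render directly.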
Not Specified
RedHat OpenShift & Kubernetes SME
✦ New
🏢 VDart
Salary not disclosed
Princeton, New Jersey 7 hours ago

Job Title: RedHat OpenShift & Kubernetes SME

Location: Princeton - NJ - 08540

Mode : Contract (6+ Months) – Onsite

Min 15 Years of experience required.

Qualifications:

Design, deploy and maintain Red Hat OpenShift and Rancher Managed Kubernetes Clusters

Architect Highly available, scalable, and secure container platforms

Install, configure, upgrade and patch OpenShift and Rancher Clusters

Implement logging, monitoring, and alerting (Prometheus, Grafana, EFK etc.)

Troubleshoot Cluster, Networking, Storage, and application issues

Perform root cause analysis and provide performance optimization

Act as an SME for OpenShift and Rancher Technologies

Provide guidance to Customer and application teams

Create documentation, standards, and operational runbooks

Strong hands-on experience with Red Hat OpenShift and Rancher (RKE, RKE2)

Expert knowledge of Kubernetes architecture and Operations

Experience supporting mixed OS environments (Windows and Linux).

Excellent communication skills, able to explain complex concepts to technical and non-technical audiences.

Demonstrated ability to work independently and as part of a team.

Relevant certifications (RHCA, CKA, CKAD, etc.) and active participation in the Kubernetes community are a plus.

Experience with CI/CD Pipelines
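The logging, monitoring, and alerting work above (Prometheus, Grafana, EFK) is usually expressed as declarative alert rules. A minimal, hypothetical PrometheusRule sketch, assuming the Prometheus Operator CRDs that ship with OpenShift's monitoring stack; all names are illustrative:

```yaml
# Hypothetical alert rule: fire when a node's kubelet stops reporting.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-health
  namespace: openshift-monitoring
spec:
  groups:
    - name: node.rules
      rules:
        - alert: KubeletDown
          expr: up{job="kubelet"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Kubelet on {{ $labels.node }} has been down for 5 minutes"
```

Rules like this are picked up by the cluster's Prometheus instances and routed through Alertmanager, which is where the troubleshooting and runbook work described above usually begins.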

Not Specified
Agentic AI Engineer
✦ New
🏢 Unisys
Salary not disclosed
Rockville, Maryland 7 hours ago

Overview

Architects and builds the infrastructure and tooling that powers AI agent development across the Software Development Lifecycle (SDLC). Develops production-grade agentic systems, orchestration frameworks, and observability solutions that enable teams to build, deploy, and monitor reliable AI agents at scale. Plays a key role in defining and implementing the next generation of SDLC through AI-first innovation and comprehensive instrumentation.

What We're Looking For

You demonstrate sharp product sense for high-impact automation opportunities, technical taste in implementation decisions, and the ability to clearly articulate trade-offs. You know when to apply AI agent solutions versus simpler approaches and can explain the "why" behind architectural choices.

You excel at 0-to-1 (and 1-to-100) product development, comfortable operating in ambiguous environments where requirements emerge through experimentation and iteration rather than upfront specification.

Key Responsibilities

AI Agent Development & Automation:

• Develop production-grade AI agents that eliminate manual handoffs across the SDLC

• Create custom integrations and CLI tools that give agents deep understanding of internal systems and codebases

• Design comprehensive testing strategies to ensure agent reliability and output quality

• Implement "Golden Path" scaffolding that embeds organizational standards into new projects

• Build AI solutions that improve codebase navigation, documentation, and developer workflows

• Identify workflow bottlenecks and deliver measurable impact through intelligent automation

• Shape SDLC evolution by identifying AI-first opportunities and proving outcomes through experimentation

Agent Infrastructure & Platform:

• Architect and maintain production infrastructure supporting agent deployment, lifecycle management, and scaling

• Develop agent frameworks, templates, and SDKs that accelerate agent development

• Create governed Model Context Protocol (MCP) catalog enabling compliant agent-to-agent and agent-to-MCP communication

• Implement governance controls for agent behavior, permissions, and system access

Observability & Performance Analytics:

• Design and implement metrics, monitoring, and logging infrastructure for AI agents and development workflows

• Build dashboards that provide actionable insights into developer productivity, tool adoption, and agent performance

• Establish KPIs and measurement frameworks to quantify the impact of AI-powered automation

• Create alerting and anomaly detection systems to ensure reliability of agents and tooling

• Analyze telemetry data to identify optimization opportunities and guide strategic investment decisions

Collaboration & Impact:

• Partner across teams to drive adoption of AI-powered tooling and process transformation

• Stay current with LLM technologies and coach colleagues on AI-assisted development and automation best practices

• Rapidly prototype solutions to validate use cases and prove value quickly

• Communicate data-driven insights to stakeholders through clear visualizations and reports

Preferred Qualifications:

• 5-7+ years of software engineering experience building production systems

• Proven experience building agentic systems using LLM orchestration frameworks

• Hands-on expertise with AI-powered development tools (code assistants, AI-enhanced editors)

• Strong foundation in SDLC, system design, and internal tooling development

• Experience with observability tools and practices including metrics collection, logging frameworks, and dashboard development

• Full-stack technical proficiency:

• Languages: Java, Python, JavaScript/TypeScript

• Frameworks: Angular, Spring Boot

• CI/CD platforms and cloud infrastructure (AWS)

• Monitoring/observability tools (e.g., Prometheus, Grafana, CloudWatch)

• Passion for transforming software development through AI innovation and data-driven decision making

#LI-CGTS

#TS-2505

Not Specified