Senior Site Reliability Engineer
Company: Credit Acceptance
Location: Southfield
Posted on: May 27, 2025
Job Description:
Credit Acceptance is proud to be an award-winning company with
local and national workplace recognition in multiple categories!
Our world-class culture is shaped by dedicated Team Members who
share a drive to succeed as professionals and together as a
company. A great product, amazing people and our stable financial
history have made us one of the largest used car finance companies
nationally.
Our Engineering and Analytics Team Members utilize the
latest technology to develop, monitor, and maintain complex
practices that help optimize our success. Our Team Members value
being challenged, are encouraged to express their ideas, and have
the flexibility to enjoy work-life balance. We build intrinsic
value by partnering with all functions of our business to support
their success and make strategic business decisions. We focus on
professional development and continuous improvement while enjoying
a casual work environment and Great Place to Work culture!
We are
seeking a talented and experienced Senior Site Reliability Engineer
to join our dynamic and innovative team. As a Senior Site
Reliability Engineer, you will play a crucial role in ensuring our
software systems' reliability, availability, and performance. You
will collaborate with cross-functional teams to design, implement,
and maintain robust systems, monitoring tools, and
processes.
Outcomes and Activities:
- This position will work from home; occasional planned travel to
an assigned Southfield, Michigan office location may be required.
However, this position is permitted to work at a Southfield,
Michigan office location if requested by the team member.
- System Architecture and Design:
  - Collaborate with software engineers, architects, and operations
    teams to design highly reliable and scalable systems.
  - Evaluate existing systems and propose improvements to enhance
    reliability, performance, and availability.
  - Drive modernization initiatives, including implementing
    OpenTelemetry collectors and transitioning to structured logging for
    improved observability and cost efficiency.
- Implementation and Coding:
  - Develop and implement code to automate operational processes
    and tasks to improve system reliability and performance.
  - Create self-service tools, such as observability dashboards and
    automated incident analysis solutions, enabling teams to detect and
    resolve issues faster.
  - Build and maintain scripts, pipelines, and tools for
    monitoring, logging, and alerting, aligned with Golden Path
    initiatives.
- Monitoring and Incident Response:
  - Implement and manage monitoring solutions to proactively
    identify and address reliability issues.
  - Participate in on-call rotations and respond promptly to
    incidents to minimize downtime and improve Mean Time to Restore
    (MTTR).
  - Define and implement standardized logging schemas for improved
    debugging efficiency and cost optimization.
  - Lead efforts to adopt OpenTelemetry (OTel) for distributed
    tracing, metrics, and logs, enabling better observability and
    scalability.
- Performance Analysis and Optimization:
  - Conduct performance analysis to identify bottlenecks and
    optimize system performance.
  - Partner with development teams to address performance issues in
    the codebase and ensure systems are resilient under load.
- Capacity Planning:
  - Collaborate with capacity planning teams to ensure systems can
    handle anticipated growth and demand.
  - Proactively identify capacity-related challenges and propose
    solutions.
- Documentation and Knowledge Sharing:
  - Maintain comprehensive documentation for system configurations,
    processes, and procedures to ensure operational transparency.
  - Contribute to knowledge sharing within the SRE team and across
    departments by creating best-practice guides and conducting
    training sessions.
Competencies: The following items detail how you
will be successful in this role.
- Development: Develops solutions using the standards and best
practices of the application's language. Writes code that implements
the design and is testable, extensible, efficient, and
maintainable.
- Impact Analysis: Understand the rationale behind changes and how
they impact the enterprise, its applications, and the wider
technical ecosystem.
- Solution Design: Ability to translate high level requirements
to create and implement designs that meet the needs of the
customer, are technically sound, maintainable and cost-effective.
Ability to identify missing or ambiguous requirements. Ability to
design at both high and low levels of abstraction, understand
complex requirements and translate into understandable solutions.
Ability to accurately estimate based on requirements.
- Technical Domain: Have an understanding of the technical
domain, including the architecture, design, and data of
the application you support and the systems with which it
interfaces.
- Facilitation Techniques: Organize, support, and/or conduct
workshops, meetings, and presentations tailored to the objectives of
each, the problem to be solved, and the needs of the audience.
Requirements:
- Bachelor's or Master's degree in Computer Science, Information
Technology, or a related field.
- Proven experience as a Site Reliability Engineer or similar
role.
- Proficient in distributed systems and modern observability
practices (e.g., OpenTelemetry, Prometheus), with strong
cross-functional collaboration and knowledge-sharing skills.
- Experience implementing and maintaining distributed systems
using modern architectural patterns.
- In-depth knowledge of system architecture, distributed systems,
and networking.
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and
containerization technologies (e.g., Docker, Kubernetes).
- Familiarity with continuous integration and continuous
deployment (CI/CD) practices.
- Excellent troubleshooting and problem-solving skills.
- Strong communication and collaboration skills.
- Certification in relevant areas (e.g., AWS Certified DevOps
Engineer, Certified Kubernetes Administrator) is a plus.
- Expertise in designing and implementing resilience patterns for
distributed systems and microservices architectures, such as
Circuit Breakers and Retries. Proficient in applying modern
resiliency frameworks to address diverse failure scenarios.
- Ability to identify and address gaps in observability,
scalability, and fault tolerance prior to deployment, ensuring
systems meet reliability and performance standards throughout the
SDLC.
- Develop efficient, testable, and maintainable solutions using
industry best practices to enhance reliability and automate
operational tasks.
- Design resilient, scalable, and cost-effective systems while
evaluating the broader impact of changes on the technical
ecosystem.
Target Compensation: A competitive base salary range from
$117,963 to $173,012. This position is eligible for an annual
variable cash bonus between 7.5% and 15%. Final compensation within
the range is influenced by many factors including role-specific
skills, depth and experience level, industry background, relevant
education and certifications.
Candidates who reside in the following
major metropolitan areas may be eligible for a premium on top of
the posted range based on their specific zone: San Francisco,
Seattle, Boston, New York City, Los Angeles and San Diego.
Benefits:
- Excellent benefits package that includes 401(k) match, adoption
assistance, parental leave, tuition reimbursement, comprehensive
medical/dental/vision and many nonstandard benefits that make us a
Great Place to Work
Our Company Values: To be successful in this
role, Team Members need to be:
- Positive by maintaining resiliency and focusing on
solutions
- Respectful by collaborating and actively listening
- Insightful by cultivating innovation, accumulating business and
role specific knowledge, demonstrating self-awareness and making
quality decisions
- Direct by effectively communicating and conveying courage
- Earnest by taking accountability, applying feedback and
effectively planning and setting priorities
Expectations:
- Remain compliant with our policies, processes, and legal
guidelines
- All other duties as assigned
- Attendance as required by department
Advice!
We understand that
your career search may look different than others. Our hiring team
wants to make sure that this would be a fit not just for us, but
for you long term. If you are actively looking or starting to
explore new opportunities, send us your application!
P.S. We have
great details around our stats, success, history and more. We're
proud of our culture and are happy to share why - let's
talk!
Required degrees must have been earned at institutions of
Higher Education which are accredited by the Council for Higher
Education Accreditation or equivalent.
Credit Acceptance is
dedicated to providing a safe and inclusive working environment for
all. As part of our Culture of Compliance, we are proud to be an
Equal Opportunity Employer and value our culturally diverse
workforce. All qualified applicants will receive consideration for
employment regardless of the person's age, race, color, religion,
sex, gender, sexual orientation, gender identity, national origin,
veteran or disability status, criminal history, or any other
legally protected characteristic.
California Residents: A California Consumer Privacy Act (CCPA)
notice is available regarding the personal information Credit
Acceptance may collect from you.