We are currently seeking a Site Reliability Engineer who will be responsible for the reliability of ARIN’s infrastructure systems and processes, with an emphasis on automation, enabling successful deployments, monitoring releases, and keeping software-defined and containerized infrastructure highly available. This engineer will run the production environment by monitoring availability and taking a holistic view of system health. This position will also measure and optimize system performance and provide operational support and engineering for multiple distributed software applications and databases.
Job Description and Responsibilities
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.
- Support development teams to improve services through rigorous testing and release procedures.
- Participate in platform management and capacity planning.
- Utilize Infrastructure-as-Code (IaC) tools such as Ansible to automate deployments and configurations.
- Manage, configure, and troubleshoot operating systems, storage, and networking, and administer high-availability PostgreSQL clusters.
- Continuously evaluate system security, make recommendations for related improvements and incorporate security improvements as required by ARIN policies.
- Provide technical expertise in the operation of all information or database platforms and DNS zones managed by ARIN.
- Establish guidelines and training for disaster recovery processes.
- Evaluate and make recommendations on hardware and software products based on an assessment of operating requirements.
- Provide on-call support for all critical network and system operations on a rotating basis.
Additional Duties
- Perform other related duties as required and assigned.
- Ability and willingness to travel in accordance with ARIN travel guidelines.
Background/Skills Required
- 4+ years building or supporting applications in distributed environments (Linux/SQL) and supporting or improving the Systems/Software Development Life Cycle.
- 4+ years of experience writing automation scripts, building application dashboards for proactive monitoring, and setting up alerts for early detection of issues.
- 4-year college degree, preferably in an information systems or computer science related discipline, OR an equivalent combination of education and experience.
- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
- Experience in modern DevOps technologies and container orchestration (Kubernetes, Docker), service deployment, monitoring and scaling.
- Experience in installing, configuring, cabling, and maintaining physical server hardware, either hands-on or remotely by guiding on-site technicians.
- Knowledge of Red Hat Enterprise Linux OS, OpenShift, nginx, Apache httpd, Ansible / Puppet / Chef.
- Knowledge of IP networking, including DNS, DHCP, firewalls, IP routing, and load balancing.
- Coding experience in one or more programming languages.
- Good interpersonal skills. Strong verbal and written communication skills. Ability to complete tasks on time while working in the office or from home.
Background/Skills Preferred
- Experience with common Internet protocols such as TCP/IP, IPv6, DNS, HTTP, IGPs and BGP.
- Experience with network operations and database administration (PostgreSQL and MySQL).
- Prior experience working in a SOC-2 environment.
Physical Requirements
- Moderate physical activity with the ability to type and perform repetitive motions.
- Ability to pick up and maneuver computer equipment weighing up to 50 pounds. May occasionally maneuver heavy equipment weighing up to 200 pounds.
- 80% of time is spent looking at a computer screen.
About ARIN
The American Registry for Internet Numbers, Ltd. (ARIN) is a nonprofit, member-based association that administers IP addresses and ASNs (collectively referred to as Internet number resources) in support of the operation and growth of the Internet. Additionally, ARIN coordinates the development of policies by the community for the management of Internet number resources and advances the Internet through informational outreach. ARIN is a well-respected leader in the Internet community and likewise a thought leader in the Internet governance discussion. Learn more at www.arin.net.
ARIN offers competitive salaries and comprehensive benefits, including but not limited to:
- Group health and dental insurance – ARIN pays over 90% of the premium costs
- Group vision care – no deduction for employees or their dependents
- Flexible Spending Account and Dependent Care Account
- 401(k) retirement plan – up to 9% matching after the first year of service; all contributions are 100% vested
- Education/Tuition Reimbursement – up to $5,000 per year
- Reimbursement for training opportunities
- Casual work environment with snacks, drinks, and coffee
- Regularly scheduled team outings and staff lunches
- Twelve paid holidays, one floating holiday, and a generous comprehensive leave program starting at 4 weeks (20 days)
- Six weeks paid Parental Leave Program for full-time employees with at least 12 months of service
- Sabbatical Leave Program for employees with 20+ years of combined service
Job Type: Full-time
Benefits:
- 401(k)
- 401(k) matching
- Dental insurance
- Employee assistance program
- Flexible spending account
- Health insurance
- Life insurance
- Paid time off
- Parental leave
- Professional development assistance
- Tuition reimbursement
- Vision insurance
Work Location: Hybrid remote in Chantilly, VA 20151