Site Reliability Engineer

Remote

Full Time

Mid Level

At TRG we work in a global environment where every diverse personality and culture is included. We look for talented people all over the world who have passion for what they do and work together, shoulder to shoulder, to empower our customers in their fight against crime and terror.

Though technology is important, people always come first!

Our Core Values are our DNA, so if your DNA matches ours, you are in the right place!

Let's do it! We follow a can-do approach, taking responsibility for milestones.
Passion is our fuel! We Learn. We Grow. We Innovate.
We make an impact! We bring innovation and always ask why.
We work together! One for all, and all for one.

Your team: In the Operations team, we are all about stability and observability. Our main goal is to innovate and automate the majority of operational workflows, providing a stable environment for our customers to work with. We aim to get the most out of today's technologies, such as Docker, Kubernetes, and multiple cloud providers, along with many related tools, including custom ones.

About you

As a Site Reliability Engineer, you will combine your skills with IT engineering practices to create highly reliable systems. Put another way, you will strike a balance between releasing new features and ensuring reliability for users. Standardization and automation are your companions.

What you’ll work on:

Collaborate with Customer Support and DevOps teams to establish SLAs, SLOs, and SLIs, ensuring clear expectations for internal and external customers.
Maintain 24/7 production stability year-round.
Deploy, configure, and monitor production environments.
Automate production deployments, validations, and reporting processes.
Develop and maintain tools for production operations.
Manage and document incidents.
Develop disaster recovery automation.
Track and improve Mean Time to Respond (MTTR) and Mean Time to Detect (MTTD) metrics.
Implement strategies to ensure 100% application uptime.
Work with development and QA teams to enhance code quality and resilience.

What we look for:

At least 2 years of experience in a similar role (DevOps, SRE, System Engineer)
Experience with IaC practices (Terraform)
Experience with Docker and Kubernetes
Experience with one of the major cloud providers (AWS, Azure)
Linux administration skills
Proven work experience with Python is mandatory
Excellent problem-solving and communication skills
Willingness to understand the business logic of each component and the impact if it becomes unavailable

You’ll stand out if you have:

Experience with monitoring tools such as Prometheus, New Relic, or similar
Experience with web-related technologies (Web applications, Web Services, Service Oriented Architecture) and network/web-related protocols
Ability to understand and implement complex networking solutions between different cloud providers and/or bare metal infrastructure
Experience configuring and managing data sources such as MongoDB, Elasticsearch, Redis, ArangoDB, etc.

Your perks:

Working from home so you can hit your goals from the comfort of your home, because we value performance, not place.
Flexible hours because we promote work-life balance.
Yearly performance bonus to reward good performance and hard work.
Paid medical insurance because we like to take care of you.
Daily lunch allowance so you can save lots of money, time, and effort by enjoying your free lunch at the office or at home every day.
Sport/gym (exercise) allowance to focus more on yourself, your well-being, and your health.
Unlimited Udemy subscription to promote your learning and development and help you grow your career.
Onboarding plan and training so that you have a smooth induction and feel confident and ready to take over your new role.
Equipment support so you have all the tools to do your work effectively and efficiently.
No dress code because we want you to be as comfortable as possible.
Gifts and rewards for celebrating birthdays, anniversaries, and personal milestones.
Happy hours, coffee time, online team building, company events, and much more to promote team bonding and of course to have fun!
Fresh fruit, snacks, coffee, and tea at the office because work makes us hungry!

About us

TRG Research and Development deals with Data Fusion and AI products for civilian protection.

Our mission is to empower our customers to fight crime and terror through state-of-the-art technologies that provide accurate and precise intelligence.

Our product, Intellectus, is the only cloud-based all-in-one fusion platform with integrated web intelligence, discovery capabilities, advertisement intelligence, virtual ID management, and third-party database integration that can be deployed in less than 24 hours.

Want to learn more? Visit our website at www.trgint.com and our LinkedIn page.

TRG Research and Development Ltd processes and/or controls all data in accordance with the EU General Data Protection Regulation (GDPR) 2016/679. For more information visit our Privacy Policy.
