Senior Observability Engineer
Date: 15 Apr 2026
Location: Braddell, SG
Company: Network For Electronic Transfers (S)
BCS is NETS’ wholly owned subsidiary, and is an entity within the NETS Group. It manages and operates clearing and payment infrastructure for the Singapore Automated Clearing House, including Fast And Secure Transfers (FAST), Inter-bank GIRO (IBG), Cheque Truncation System (CTS), and provides services for PayNow and SGQR Central Repository.
Overview
The Technology Division at BCS is responsible for the operations, maintenance, and support of BCS applications such as FAST, GIRO, CTS, PayNow, SGQR, TFR, and eGIRO, together with their critical infrastructure, ensuring operational stability, regulatory compliance, and service reliability.
The IT Operations department is regarded as BCS’s backbone, overseeing the day-to-day operations, maintenance, and support of IT systems and infrastructure that ensure the continuous delivery of BCS applications. This includes the management of data center operations, application monitoring and Level incident management support, ensuring the systems are always available, secure, and compliant.
Senior Observability Engineer role
We are seeking a hands-on Observability Engineer to design, build, and operate enterprise-grade observability platforms. This role focuses strongly on Elastic Stack (Elasticsearch, Logstash, Kibana, ML, AI Assistant) while integrating with other monitoring and APM tools such as SolarWinds, Dynatrace, Prometheus, and Grafana.
The ideal candidate is deeply technical, passionate about data quality and usability, and able to work closely with application teams, engineers, and stakeholders to understand which telemetry data matters and how it should be ingested, parsed, enriched, and visualized to deliver actionable insights. The candidate will continuously drive observability maturity to enable faster problem identification, probable cause analysis, intelligent recommendations, and, ultimately, automated recovery. The candidate should also be able to drive observability projects end to end with minimal supervision. This role encompasses both BAU operations and project-based initiatives, requiring the ability to balance operational support with continuous improvement and delivery.
This role is also forward-looking: it drives AI-driven observability, service mapping and topology-based visualization, and advanced analytics using Graph DB, RAG, Graph RAG, Vector DB, and agentic AI, particularly within an AWS cloud environment.
Key Responsibilities
Observability Platform Engineering (ELK-Focused)
• Design, implement, and operate scalable Elastic Stack (ELK) solutions for logs, metrics, traces, and events.
• Own end-to-end log ingestion pipelines using Beats, Logstash, Elastic Agent, and custom integrations.
• Perform log parsing, filtering, cleanup, normalization, and enrichment using Grok, conditionals, processors, ingest pipelines, and ECS standards.
• Define and implement ingestion best practices for performance and reliability.
• Configure and maintain Kibana dashboards, visualizations, Lens, and Canvas for operational and business observability use cases.
• Use Elastic Observability/SIEM and Elastic APM to instrument applications, collect and correlate logs, metrics, and traces, perform performance analysis, and visualize service dependencies.
• Create and manage Elastic Machine Learning jobs (multi-metric anomaly detection, forecasting) and interpret outcomes to generate insights and alerts.
• Integrate Elasticsearch with other observability tools such as:
o Prometheus & Grafana (metrics collection and visualization).
o SolarWinds and Dynatrace (infrastructure monitoring and APM).
• Correlate logs, metrics, traces, and events across platforms to enable unified observability.
• Design observability solutions that support operations, infrastructure, and application teams.
• Set up Kibana alert rules and write advanced Watcher scripts.
• Leverage Elastic AI Assistant, including LLM integrations in cloud environments (especially AWS), to enhance investigation, analysis, and insights.
________________________________________
ELK Cluster Administration & Operations
• Manage Elasticsearch clusters, including:
o Installation of the ELK stack, patching, and upgrades.
o Node roles, index lifecycle management (ILM), shard strategies, and data tiers.
o Security (users, roles, API keys, TLS).
o Performance tuning, scaling, and troubleshooting.
• Apply ELK cluster management best practices for stability, availability, and resiliency.
• Monitor cluster health and proactively address capacity and performance issues.
________________________________________
Stakeholder & Use-Case Driven Observability
• Work closely with operations, infrastructure, and application teams to:
o Understand application behaviour and observability requirements.
o Identify what data should be collected, parsed, enriched, and retained.
o Translate business and technical needs into effective observability designs.
• Act as a consultant and advisor on observability best practices across teams.
________________________________________
AI-Driven & Future Observability Capabilities
• Contribute to building next-generation observability, including:
o Topology-based observability.
o Graph-based data modeling and relationships.
o Retrieval-Augmented Generation (RAG) for operational intelligence.
o Integration of LLMs and agentic AI for automated analysis, root-cause discovery, and recommendations.
• Explore and prototype AI-assisted workflows for incident response and system understanding.
________________________________________
Cloud & Infrastructure Observability (AWS)
• Instrument and observe AWS workloads including: EC2, Lambda, ECS/EKS, API Gateway, RDS, S3, and other supporting services.
• Some experience in using Elastic Cloud will be an advantage.
• Implement cloud-native telemetry collection and integration with ELK and other monitoring tools.
• Optimize observability architectures for scalability, resilience, and cloud cost management.
________________________________________
DevSecOps & Automation
• Integrate observability deployments (e.g., Logstash deployment) into DevSecOps practices.
• Automate operational tasks (e.g., data extraction, cleaning, transformation, and reconciliation) using automation tools and scripting or programming languages such as Python, where applicable.
• Ensure observability configurations meet security and compliance requirements.
________________________________________
General
• Provide technical support for observability tools, including incident/problem coordination, troubleshooting, resolution, and performance optimization.
• Collaborate with application, infrastructure, security, and operations teams.
• Some experience in handling audit and risk activities from MAS, Group Audit, Risk & Compliance, and third-party auditors.
• Prepare and update observability documentation, knowledge base articles, SOPs, and best practices as required.
• Mentor peer observability engineers as required.
• Liaise with vendors to discuss requirements and solutioning, and to validate implementations.
Requirements
Education
Bachelor’s degree in Information Technology, Computer Science, or a related field.
Experience
At least 7 years of hands-on experience in the following:
Technical Skills
1. Strong understanding of observability concepts, e.g., what counts as important telemetry, the golden signals, how to monitor, and how to derive insights.
2. Able to propose solutions that can uplift observability maturity in the organization.
3. Strong hands-on experience with Elasticsearch, Logstash, Kibana, and Elastic ML.
4. Strong know-how in log ingestion, parsing, Grok patterns, filtering, and enrichment.
5. Experience managing and operating production enterprise ELK clusters.
6. Experience with monitoring tools such as SolarWinds, Prometheus, Slack, Grafana, Dynatrace, or similar tools.
7. Good understanding of AWS services (EC2, S3, Lambda, VPC, CloudWatch) relevant to observability.
8. Familiarity with REST APIs, AI/ML, LLMs, RAG, graph databases, OpenTelemetry (OTel), and emerging observability intelligence concepts.
9. Experience with topology mapping or service dependency visualization.
10. Strong scripting and automation skills.
11. Experience with CI/CD pipelines and deployment automation for Logstash pipeline deployment or dashboard/Canvas deployment.
12. Good understanding of infrastructure (servers, network, storage) and application technology stack monitoring.
13. Ability to ensure observability configurations meet security and compliance requirements.
14. Familiarity with Erlang, Java and MQ application architecture for understanding application behaviour and identifying useful observability telemetry would be an advantage.
Personal Attributes
1. Strong communication and stakeholder engagement skills, with the ability to translate complex telemetry data into clear, actionable insights.
2. Strong sense of ownership and accountability, with a high level of commitment to delivery, quality, and outcomes.
3. Ability to drive observability projects end to end and perform BAU roles.
4. Proactive, self-motivated, and growth-oriented, with a passion for continuous improvement and innovation.
5. Demonstrated analytical and problem-solving skills, with the ability to identify issues, assess impact, and drive resolution.
6. Excellent follow-up and follow-through, ensuring end-to-end execution and adherence to committed timelines.
7. Able to work effectively both independently and within cross-functional teams in fast-paced environments.
8. Collaborative team player with strong interpersonal skills and a focus on building trusted working relationships.
9. Open mindset with willingness to learn new skills and apply them to improve team productivity and outcomes.
Certifications would be a plus:
1. Elastic Certified Observability Engineer / Analyst
2. AWS Certified Solutions Architect (Associate / Professional)
3. Red Hat Ansible Automation
Banking Computer Services Pte Ltd (a subsidiary of Network for Electronic Transfers (Singapore) Pte Ltd)