
Centralised Monitoring For Compliance



End-to-end monitoring can help reduce infrastructure issues and downtime through proactive identification of fault lines and bottlenecks in infrastructure.

A centralised monitoring dashboard can ensure that everyone, from engineers to CXOs, is on the same page when it comes to the state of infra.


I want to tell you about a very interesting conversation from my sysadmin days.

This was back in 2010, while I was working as a sysadmin. One day our head of infra services called us and asked: “Is everything fine with the network? Why have I not seen any alerts for the last few months?”


To that our senior replied, "There are no alarms raised because everything has been working fine."

He was still not convinced, so he had to be taken through the monitoring dashboard and shown that there were no alerts because there were no issues. Interesting story, isn’t it? Even though it’s been 11 years since then, I still see organisations having trust issues with infrastructure. It’s the easiest scapegoat to blame for any issue in the services.

As the saying goes, seeing is believing. It is necessary to bring transparency into all activities happening in infrastructure through a robust monitoring system. My ideal design is a one-stop station for monitoring all kinds of metrics and analytics: a centralised dashboard with metrics from:


  • Infrastructure resources - Network, System, Storage etc

  • Applications - Frontend and Backend Services, Third Party Integrations etc

  • Backend Services - Deployment state, health check, performance, latency etc

  • Frontend App (Mobile) - Synthetics, Crash analytics etc

  • Business - User activity, sessions, transactions etc


How to categorise Monitoring?

Monitoring data can be categorised in two ways:


What to Monitor:

  • Infrastructure

  • Application

  • Services

How to Monitor:

  • State

  • Performance

  • Events
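The two axes above can be sketched as a small data structure. This is only an illustration: the metric names below are hypothetical examples, and a real monitoring tool would have its own way of labelling or tagging metrics.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    """What to monitor."""
    INFRASTRUCTURE = "infrastructure"
    APPLICATION = "application"
    SERVICES = "services"


class Kind(Enum):
    """How to monitor."""
    STATE = "state"
    PERFORMANCE = "performance"
    EVENTS = "events"


@dataclass(frozen=True)
class Metric:
    name: str  # hypothetical metric name, for illustration only
    target: Target
    kind: Kind


def group_by_category(metrics):
    """Group metric names under (what, how) pairs."""
    grouped = {}
    for m in metrics:
        grouped.setdefault((m.target, m.kind), []).append(m.name)
    return grouped


metrics = [
    Metric("uptime", Target.INFRASTRUCTURE, Kind.STATE),
    Metric("cpu_utilisation", Target.INFRASTRUCTURE, Kind.PERFORMANCE),
    Metric("failed_logins", Target.INFRASTRUCTURE, Kind.EVENTS),
    Metric("transactions", Target.APPLICATION, Kind.EVENTS),
]
grouped = group_by_category(metrics)
```

Tagging every metric on both axes makes it easy to lay out a dashboard by "what" and filter it by "how".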


Infrastructure Monitoring:

Based on the two categories above, infrastructure must be monitored for its:

State:

  • Health Check

  • Uptime/Downtime

  • Availability - Data Center/AZ monitoring

  • Connectivity - Intranet & Internet works

Performance:

  • CPU/Memory/Disk utilisation

  • Network Bandwidth

  • Peak hour Traffic

Events:

  • Authentication failures/Too many failed logins

  • Unauthorised access

  • Change Management

  • Configuration Changes like FW rules

  • Deployments
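As a rough sketch of how the performance and event checks above might be evaluated, here is a minimal example. The threshold values, the failed-login limit and the event field names (`user`, `type`) are all assumptions made for illustration; they are not taken from any particular tool.

```python
from collections import Counter

# Assumed thresholds, in percent; tune these for your own environment.
THRESHOLDS = {"cpu": 85.0, "memory": 90.0, "disk": 80.0}
MAX_FAILED_LOGINS = 5  # assumed limit per user within one sample window


def utilisation_breaches(sample, thresholds=THRESHOLDS):
    """Return the resources whose utilisation is at or above its threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if sample.get(name, 0.0) >= limit
    )


def too_many_failed_logins(events, limit=MAX_FAILED_LOGINS):
    """Return users with more than `limit` failed logins in `events`."""
    failures = Counter(e["user"] for e in events if e["type"] == "login_failed")
    return sorted(user for user, count in failures.items() if count > limit)


# Hypothetical sample data.
sample = {"cpu": 91.2, "memory": 40.0, "disk": 80.0}
events = [{"user": "alice", "type": "login_failed"}] * 6 + [
    {"user": "bob", "type": "login_failed"},
    {"user": "bob", "type": "login_ok"},
]
breaches = utilisation_breaches(sample)
suspects = too_many_failed_logins(events)
```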


Application Monitoring:

Likewise, for applications and services, monitoring must be set up for the following:

State:

  • Third party connectivity

  • Integration

  • Frontend to Backend Flow

  • Truepath/Purepath for fault identification

Performance:

  • Crash analytics for mobile apps

  • Performance tests

  • Regression Tests

Events:

  • User activity/sessions

  • Transactions

  • Downloads

  • Analytics Business

  • Business metrics, conversions, behavior etc
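The frontend-to-backend flow check can be sketched as a chain of hop checks that reports the first hop that fails. This is a simplified illustration of the idea, not how any specific tracing product works; the hop names and the lambda checks are hypothetical stand-ins for real endpoint calls.

```python
def first_failing_hop(hops):
    """Run each hop check in order; return the name of the first that fails.

    `hops` is a list of (name, check) pairs, where `check` is a callable
    returning True when the hop is healthy. A check that raises counts as
    a failure. Returns None when every hop passes.
    """
    for name, check in hops:
        try:
            healthy = check()
        except Exception:
            healthy = False
        if not healthy:
            return name
    return None


# Hypothetical hops; real checks would call the actual endpoints.
hops = [
    ("frontend", lambda: True),
    ("api-gateway", lambda: True),
    ("payments-backend", lambda: False),
    ("third-party-psp", lambda: True),
]
failing = first_failing_hop(hops)
```

Stopping at the first failing hop points first responders at the most likely origin of the fault instead of every downstream symptom.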


Centralised Monitoring Dashboard:

A centralised monitoring dashboard is a one-stop station that provides end-to-end, pure-path visibility of how every component involved in the software delivery cycle is performing.

This monitoring system should be integrated with a centralised identity provider that supports role-based access control and single sign-on, to provide secure user access.

Compliance of user data, localised caching, security and cloud service monitoring are some more factors that should be considered while selecting any monitoring system.


How to choose your monitoring tool:

There are both cloud-managed and self-managed options available for monitoring tools. The choice depends on various factors:

Ensure the tool you choose can monitor everything from infrastructure to applications to business metrics. Centralised monitoring means you have everything under one umbrella. Many clouds provide centralised managed services for monitoring, such as AWS CloudWatch, GCP Operations etc.

There are also third-party tools, such as Prometheus and Dynatrace, that come in both self-managed and managed forms. You can either set them up yourself or use the managed or SaaS versions available from the respective vendor.

Cloud-managed or SaaS versions of monitoring tools are efficient because they take the operations overhead away, for a service charge. If you compare the time and effort spent on setting up and managing an entire centralised monitoring tool by yourself, you will find that spending a few more dollars on managed tools is cost efficient.

However, there are some security factors to consider when using managed services. At times, cloud-managed monitoring tools have been observed scanning the apps and infra from a management console outside the customer VPC. In some of my previous projects this raised red flags, as we had access to neither those networks nor the management console. Hence we had to get an agreement from the cloud provider confirming that none of our data was cached outside our designated region and that it was protected from third-party snooping or data theft.

Another important aspect is the ability of the monitoring tool to integrate with your centralised identity provider (IDP). Look for a tool that can integrate with your IDP through SSO. This makes onboarding and off-boarding users a lot easier and more traceable. All the more reason to choose a tool if it also provides RBAC.
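To make the RBAC idea concrete, here is a minimal sketch of mapping groups from an identity provider to dashboard roles. The group names, role names and permissions are all hypothetical; a real monitoring tool will define its own roles and its own way of reading group claims from the IDP.

```python
# Hypothetical mapping from IDP groups to dashboard roles.
GROUP_TO_ROLE = {
    "sre": "admin",
    "developers": "editor",
    "management": "viewer",
}

# Hypothetical permissions granted to each role.
ROLE_PERMISSIONS = {
    "admin": {"view", "edit_dashboards", "manage_alerts"},
    "editor": {"view", "edit_dashboards"},
    "viewer": {"view"},
}


def permissions_for(groups):
    """Union of permissions for every role the user's IDP groups map to."""
    allowed = set()
    for group in groups:
        role = GROUP_TO_ROLE.get(group)
        if role is not None:
            allowed |= ROLE_PERMISSIONS[role]
    return allowed


def can(groups, action):
    """True if a user in `groups` may perform `action`."""
    return action in permissions_for(groups)
```

Because access is derived from IDP groups, removing a user from a group in the IDP is enough to off-board them from the dashboard, and unknown groups get no access by default.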


Don't Forget The Alerts/Alarms:

As important as it is to choose the right tool, it’s equally important to set the alerts/alarms properly. Set alerts only for what you want to be notified about; neither enable everything nor nothing.

Here are some useful alerts for:

Cloud Account

  • Billing payment due or credit check

  • Root Account Login

  • Renewal of Subscription

  • Modifications of ACLs and security groups

  • Updates - version, patches, certificates etc

Service Availability

  • Alert when something goes down and comes back up

  • Fault Aware and Tolerant

  • Separate Alerts for infra and app

Operations Dashboard

  • Collect Alerts in one system like Slack

  • PagerDuty

  • Sanitise Alerts, don't spam

And last but not least, once alerts are received, don't forget to acknowledge them, or they will spam your inbox or annoy you with constant notifications.
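The points on alerting when something goes down and comes back up, not spamming, and acknowledging alerts can be sketched together as follows. This is a toy in-memory model under my own assumptions (states are just "up" or "down", and services start as "up"); tools like the ones named above handle this with their own rules and APIs.

```python
class AlertInbox:
    """Collects alerts, drops repeats, and tracks acknowledgement."""

    def __init__(self):
        self.last_state = {}   # service -> "up" or "down"
        self.open_alerts = {}  # service -> message of the unacknowledged alert

    def report(self, service, state):
        """Record a health state; return a message only on a transition."""
        previous = self.last_state.get(service, "up")
        self.last_state[service] = state
        if state == previous:
            return None  # no change, so no new notification
        if state == "down":
            message = f"{service} is DOWN"
            self.open_alerts[service] = message
        else:
            message = f"{service} is back UP"
            self.open_alerts.pop(service, None)  # recovery closes the alert
        return message

    def acknowledge(self, service):
        """Mark the open alert for `service` as seen; True if one existed."""
        return self.open_alerts.pop(service, None) is not None


inbox = AlertInbox()
sent = [inbox.report("db", s) for s in ["up", "down", "down", "down", "up"]]
```

Here three consecutive "down" reports produce a single notification, followed by one recovery message, which is the behaviour you want from a sanitised alert stream.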


Things to consider for monitoring setup:

  • First responders need a better dashboard where valid alerts are collected, monitored and acknowledged. Alerts must be sanitised and categorised to prevent spamming.

  • It is ok to go with licensed monitoring tools, provided they cover all the layers like infra, app, mobile, business etc and come with technical support and maintenance.

  • You can use more than one tool, but ensure all of them are integrated into one dashboard.

  • Monitoring screens also help in satellite service centers or during critical launches.


Summary:

For a fault tolerant infrastructure you need robust end to end monitoring and alerting.
This can be achieved through a centralised system that monitors the state, performance and events across infrastructure and applications, and sends alerts almost immediately when incidents occur.
This monitoring system should be integrated with a centralised identity provider that supports role-based access control and single sign-on, to provide secure user access.

If you like this article, I am sure you will find the 10-Factor Infrastructure even more useful. It compiles all these tried and tested methodologies, design patterns & best practices into a complete framework for building secure, scalable and resilient modern infrastructure. 

 

Don’t let your best-selling product suffer due to an unstable, vulnerable & mutable infrastructure.




 


Thanks & Regards

Kamalika Majumder


©2024 by Staxa LLP. All Rights Reserved.
