![Centralised Monitoring Guide](https://static.wixstatic.com/media/981170_531eb59cda6e4dc99c5f5a10dd66b343~mv2.png/v1/fill/w_980,h_551,al_c,q_90,usm_0.66_1.00_0.01,enc_avif,quality_auto/981170_531eb59cda6e4dc99c5f5a10dd66b343~mv2.png)
End-to-end monitoring helps reduce infrastructure issues and downtime by proactively identifying fault lines and bottlenecks. A centralised monitoring dashboard keeps everyone, from engineers to CXOs, on the same page about the state of the infrastructure.
As the saying goes, seeing is believing. A robust monitoring system brings transparency to everything happening in the infrastructure: a one-stop station for all kinds of metrics and analytics, with a centralised dashboard that draws metrics from:
Infrastructure resources - Network, System, Storage etc
Applications - Frontend and Backend Services, Third Party Integrations etc
Backend Services - Deployment state, health check, performance, latency etc
Frontend App (Mobile) - Synthetics, Crash analytics etc
Business - User activity, sessions, transactions etc
Centralised Monitoring Dashboard:
A one stop station that provides end to end pure path visibility of how each and every component involved in the software delivery cycle is performing.
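One way to picture such a dashboard is as a single store that tags every metric with the layer it came from. The sketch below is only an illustration of that idea, not the design of any particular tool: the `Metric` and `CentralDashboard` names, the layer labels and the metric names are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Layer labels mirror the categories listed in this article (names are illustrative).
LAYERS = {"infrastructure", "application", "backend", "frontend", "business"}


@dataclass(frozen=True)
class Metric:
    layer: str    # one of LAYERS
    name: str     # e.g. "cpu_utilisation" (hypothetical metric name)
    value: float


class CentralDashboard:
    """Hypothetical in-memory store; a real tool would persist and visualise data."""

    def __init__(self):
        self._by_layer = defaultdict(list)

    def record(self, metric):
        # Reject metrics that do not belong to a known layer.
        if metric.layer not in LAYERS:
            raise ValueError(f"unknown layer: {metric.layer}")
        self._by_layer[metric.layer].append(metric)

    def latest(self):
        """Return the most recent value of each metric, grouped by layer."""
        view = {}
        for layer, metrics in self._by_layer.items():
            # Later entries overwrite earlier ones with the same name.
            view[layer] = {m.name: m.value for m in metrics}
        return view
```

Every team then reads the same `latest()` view, whichever layer they care about.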
This monitoring system should be integrated with a centralised identity provider that offers role-based access control and single sign-on, so that user access is secure.
User data compliance, localised caching, security and cloud service monitoring are further factors to consider when selecting a centralised monitoring system.
How to choose your monitoring tool:
Monitoring tools come in both cloud-managed and self-managed options. The choice depends on several factors:
Ensure the tool you choose can monitor everything from infrastructure through applications to business metrics. Centralised monitoring means you have everything under one umbrella. Many cloud providers offer centralised managed monitoring services, such as AWS CloudWatch and GCP Operations.
Third-party tools such as Prometheus and Dynatrace are also available. You can either set them up yourself or use a managed or SaaS version where one is available.
Cloud-managed or SaaS versions of monitoring tools are efficient because they take the operations overhead away, for a service charge. If you compare this with the time and effort spent on setting up and managing an entire centralised monitoring tool yourself, you will often find that spending a few more dollars on a managed tool is cost efficient.
However, there are security factors to consider when using managed services. Some cloud-managed monitoring tools have been observed to scan applications and infrastructure from a management console outside the customer VPC.
In some of my previous projects this raised red flags, as we had no access to those networks or to the management console. We therefore had to get an agreement from the cloud provider confirming that none of our data was cached outside our designated region and that it was protected from third-party snooping and data theft.
Another important aspect is the tool's ability to integrate with your centralised identity provider (IDP). Look for a tool that integrates with your IDP through SSO. This makes onboarding and off-boarding users much easier and more traceable. A tool that also provides RBAC is an even stronger choice.
Cloud vs On-Premise:
With both cloud-managed and self-managed options available, organizations must carefully consider various factors before making a choice. Let’s explore cloud-based and on-premise centralized monitoring models, comparing them on several critical points including cost, scalability, security, maintenance, and performance.
Cost:
Cloud-based monitoring systems typically operate on a subscription model, which includes various service tiers based on usage and required features. This model allows for predictable budgeting and operational expenses without the need for significant upfront capital investment in hardware and software. Additionally, cloud providers handle infrastructure costs, including power, cooling, and physical space, which can further reduce overall expenses. Managed services, such as AWS CloudWatch and GCP Operations, remove the operational overhead, making them cost-efficient despite the service charges.
On-premise solutions require a substantial initial investment in hardware, software, and the setup of a physical datacenter. This capital expenditure can be a significant barrier for many organisations. Additionally, ongoing costs for power, cooling, physical security, and hardware maintenance can add up over time. However, for organizations with existing infrastructure, on-premise solutions may leverage sunk costs, potentially offering lower incremental costs compared to cloud-based models. Self-managed tools like Prometheus can be deployed on-premise, allowing for full control over the infrastructure.
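The trade-off between a recurring subscription and an upfront investment can be estimated with simple arithmetic. The sketch below is a rough model under stated assumptions: costs are linear, the function names are made up for this example, and every figure you feed in must come from your own quotes and payroll rather than from this article.

```python
def cumulative_cost(upfront, monthly, months):
    """Total spend after `months`, assuming a flat monthly cost."""
    return upfront + monthly * months


def break_even_month(managed_monthly, self_upfront, self_monthly, horizon_months=60):
    """Return the first month in which the self-managed option becomes
    cheaper in total than the managed one, or None within the horizon."""
    for month in range(1, horizon_months + 1):
        managed_total = cumulative_cost(0.0, managed_monthly, month)
        self_total = cumulative_cost(self_upfront, self_monthly, month)
        if self_total < managed_total:
            return month
    return None
```

If `self_monthly` (staff time, power, cooling, hardware maintenance) is not lower than the subscription, the function returns `None`: under this model the self-managed setup never pays for itself within the horizon.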
Scalability:
Cloud-Based Monitoring:
One of the most significant advantages of cloud-based monitoring is its scalability. Cloud providers offer elastic scaling, allowing organisations to quickly adjust their monitoring capacity to meet changing demands without manual intervention. This flexibility is particularly beneficial for businesses with fluctuating workloads or those experiencing rapid growth. Cloud platforms also provide global reach, making it easier to monitor resources across different geographic locations seamlessly.
On-Premise Monitoring:
Scalability in on-premise solutions is often limited by physical constraints and the need for manual hardware upgrades. Scaling up requires procuring and installing additional servers and storage, which can be time-consuming and expensive. Conversely, scaling down to reduce costs is also challenging, as it may involve decommissioning and repurposing hardware, often at a financial loss. This inflexibility makes on-premise solutions less ideal for dynamic or unpredictable workloads.
Data Security:
Cloud-Based Monitoring: Security in cloud-based systems is a double-edged sword. On one hand, leading cloud providers invest heavily in security measures, offering robust protection against threats, including DDoS attacks, data breaches, and other vulnerabilities. These providers often comply with stringent industry standards and certifications. On the other hand, cloud-based solutions can introduce concerns about data sovereignty, as data is stored offsite and possibly across different jurisdictions, which may not align with all regulatory requirements.
Cloud managed services sometimes scan applications and infrastructure from a management console outside the customer VPC, raising potential security concerns. Organisations must ensure that their data is not cached outside their designated regions and is protected from third-party snooping or data theft. This often requires agreements and assurances from the cloud provider.
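Once the provider has disclosed where each monitoring component runs and caches data, checking that inventory against your designated regions is easy to automate. The following is a minimal sketch; the inventory format, the component names and the function name are assumptions for illustration, and the inventory itself has to come from the provider's documentation or agreement.

```python
def find_residency_violations(component_regions, allowed_regions):
    """Return the components whose data lives outside the allowed regions.

    component_regions: dict mapping a component name to the region where it
                       stores or caches data (hypothetical inventory format).
    allowed_regions:   iterable of region names the organisation has approved.
    """
    allowed = set(allowed_regions)
    return {
        component: region
        for component, region in component_regions.items()
        if region not in allowed
    }
```

An empty result means every listed component stays inside the designated regions; anything returned is a question to raise with the provider.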
On-Premise Monitoring: On-premise solutions offer more control over security configurations and data management, which can be critical for organisations with strict compliance requirements or those handling highly sensitive data. By keeping data within the physical premises, organisations can implement custom security protocols and have direct oversight of all security measures. However, this control also means that the responsibility for security lies entirely with the organisation, necessitating substantial investments in security infrastructure and expertise.
Maintenance:
Cloud-Based Monitoring: Cloud-based monitoring significantly reduces the burden of maintenance. The service provider is responsible for hardware upkeep, software updates, and ensuring system availability. This managed approach allows internal IT teams to focus on strategic initiatives rather than routine maintenance tasks. Automatic updates and patches ensure that the monitoring system is always running the latest and most secure version, reducing vulnerability to exploits.
On-Premise Monitoring: Maintaining an on-premise monitoring solution is resource-intensive. It requires dedicated IT staff to manage hardware, apply software updates, and ensure continuous system availability. The maintenance overhead can be substantial, especially in large or complex environments. Failure to keep systems updated can lead to security vulnerabilities and performance issues. Despite these challenges, some organisations prefer the hands-on control that on-premise maintenance provides, allowing for tailored configurations and optimisations.
Performance:
Cloud-Based Monitoring: Cloud-based monitoring solutions often boast high performance due to the advanced infrastructure and optimised environments provided by leading cloud vendors. These platforms leverage distributed architectures, load balancing, and high-speed connectivity to deliver reliable and fast monitoring services. Additionally, cloud services are designed to handle large-scale data processing and analytics, offering real-time insights and quick anomaly detection.
On-Premise Monitoring: Performance in on-premise monitoring systems can be highly optimised for specific organisational needs, as the infrastructure is dedicated and under complete control of the organisation. However, achieving and maintaining high performance requires significant expertise and resources. On-premise solutions can suffer from bottlenecks if not properly managed, and scaling performance to meet increasing demands can be challenging and costly. For organisations with the requisite expertise and resources, on-premise solutions can be fine-tuned to offer exceptional performance tailored to specific requirements.
Integration and Identity Management:
Cloud-Based and On-Premise Monitoring: An important aspect of choosing a monitoring tool is its ability to integrate with your centralised identity provider (IDP). Tools that support single sign-on (SSO) and role-based access control (RBAC) streamline the onboarding and off-boarding of users, making user management more efficient and secure. Many cloud-managed and SaaS versions of monitoring tools offer robust integration capabilities with popular IDPs, enhancing security and simplifying access management.
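To make the RBAC idea concrete, here is a minimal sketch of how groups asserted by an IDP at sign-on could be mapped to dashboard permissions. The group names, role names and permission strings are all hypothetical; a real tool defines its own roles and reads group membership from the SSO assertion.

```python
# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS = {
    "viewer": {"view_dashboard"},
    "engineer": {"view_dashboard", "acknowledge_alert"},
    "admin": {"view_dashboard", "acknowledge_alert", "edit_alert_rules", "manage_users"},
}

# Hypothetical mapping from IDP group names to dashboard roles.
GROUP_TO_ROLE = {
    "cxo": "viewer",
    "sre": "engineer",
    "platform-admins": "admin",
}


def permissions_for(groups):
    """Union of permissions granted by every recognised IDP group."""
    granted = set()
    for group in groups:
        role = GROUP_TO_ROLE.get(group)
        if role is not None:
            granted |= ROLE_PERMISSIONS[role]
    return granted


def is_allowed(groups, action):
    """True if any of the user's groups grants the requested action."""
    return action in permissions_for(groups)
```

Because access follows group membership, off-boarding a user in the IDP removes their dashboard access without touching the monitoring tool.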
Don't Forget Alerts:
Choosing the right tool is important, and setting the alerts and alarms properly is equally important. Set alerts only for what you actually need to be notified about; alerting on everything and alerting on nothing are both mistakes.
Here are some useful alerts, grouped by area:
Cloud Account:
Billing payment due or credit check
Root Account Login
Renewal of Subscription
Modifications of ACLs and security groups
Updates - versions, patches, certificates etc
Service Availability:
Alert when something goes down and comes back up
Fault Aware and Tolerant
Separate Alerts for infra and app
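Alerting when something goes down and again when it comes back up amounts to alerting on state changes rather than on every health check. A minimal sketch of that idea follows; the function name and message format are made up for this example, and it assumes a service is healthy until a check says otherwise.

```python
def transition_alerts(checks):
    """Turn a time-ordered stream of health checks into alerts.

    checks: iterable of (service_name, is_up) tuples.
    Returns one message per state change, so a service that stays down
    produces a single DOWN alert, followed by one UP alert on recovery.
    """
    last_state = {}
    alerts = []
    for service, is_up in checks:
        # Assumption: a service not seen before is treated as previously up.
        previous = last_state.get(service, True)
        if previous != is_up:
            alerts.append(f"{service} is {'UP' if is_up else 'DOWN'}")
        last_state[service] = is_up
    return alerts
```

Prefixing service names with their layer, for example `infra/db` and `app/checkout`, is one simple way to keep infra and app alerts separate.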
Operations Dashboard:
Collect alerts in one system like Slack
PagerDuty
Sanitise Alerts, don't spam
And last but not least, once an alert is received, don't forget to acknowledge it, or it will spam your inbox or annoy you with constant notifications.
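The interplay between sanitising and acknowledging can be sketched as a small gatekeeper in front of the notification channel. This is an illustrative model only: the `AlertInbox` class, its method names and the five-minute repeat window are assumptions, and tools such as PagerDuty implement their own versions of this logic.

```python
class AlertInbox:
    """Suppress repeats of an alert until it is due again, acknowledged or resolved."""

    def __init__(self, repeat_after_seconds=300):
        self.repeat_after_seconds = repeat_after_seconds
        self._last_sent = {}      # alert key -> time of last notification
        self._acknowledged = set()

    def should_notify(self, key, now):
        """Decide whether to send a notification for `key` at time `now` (seconds)."""
        if key in self._acknowledged:
            return False          # someone is already working on it
        last = self._last_sent.get(key)
        if last is not None and now - last < self.repeat_after_seconds:
            return False          # too soon to repeat: would be spam
        self._last_sent[key] = now
        return True

    def acknowledge(self, key):
        """Stop all further notifications for this alert."""
        self._acknowledged.add(key)

    def resolve(self, key):
        """Forget the alert so that a future occurrence notifies again."""
        self._acknowledged.discard(key)
        self._last_sent.pop(key, None)
```

An unacknowledged alert repeats once per window, which is exactly the nagging described above; acknowledging it silences the repeats until the incident is resolved.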
Summary:
Ultimately, the decision hinges on the specific needs and constraints of the organisation. A hybrid approach, leveraging the strengths of both models, is also an increasingly popular strategy, offering a balanced solution that can adapt to a wide range of operational requirements. Ensuring that the chosen tool can monitor everything from infrastructure to application to business metrics under one umbrella is crucial for comprehensive oversight and efficiency. Lastly, a few things need to be kept in mind while implementing a centralised monitoring system:
First responders need a better dashboard where valid alerts are collected, monitored and acknowledged. Alerts must be sanitised and categorised to prevent spamming.
It is OK to go with licensed monitoring tools, provided they cover all the layers (infra, app, mobile, business etc) and come with technical support and maintenance.
You can use more than one tool, but ensure all of them are integrated into one dashboard.
Monitoring screens also help in cases such as satellite service centers or critical launches.
For a fault tolerant infrastructure you need robust end to end monitoring and alerting.
This can be achieved through a centralised system to monitor the state, performance and events happening across infrastructure and application which sends alerts almost immediately when incidents occur.
This monitoring system should be integrated with a centralised identity provider that offers role-based access control and single sign-on, so that user access is secure.
If you like this article, I am sure you will find 10-Factor Infrastructure even more useful. It compiles all these tried and tested methodologies, design patterns & best practices into a complete framework for building secure, scalable and resilient modern infrastructure.
![](https://static.wixstatic.com/media/981170_fabf63fc52a842519deaca41970a5be2~mv2.jpg/v1/fill/w_980,h_245,al_c,q_80,usm_0.66_1.00_0.01,enc_avif,quality_auto/981170_fabf63fc52a842519deaca41970a5be2~mv2.jpg)
Thanks & Regards
Kamalika Majumder