# Cloud Native Observability

The term **cloud native observability** might sound like another buzzword used to sell new tools. There is some truth to that: many new tools have emerged to solve the problems of monitoring container infrastructure.

Conventional server monitoring typically collects basic system metrics, such as CPU and memory usage, along with logs from processes and the operating system.

**Observability** is often used synonymously with monitoring, but monitoring is only one subcategory of cloud native observability and does not do justice to its scope. The term observability comes from *control theory*, which deals with the behavior of dynamic systems. In essence, control theory describes how the external outputs of a system can be measured in order to infer its internal state and steer its behavior. With container orchestration and microservices, the biggest challenge is keeping track of the systems: how they interact with each other and how they behave under load or in an error state.

The **higher goal of observability** is to allow analysis of the collected data. This helps to get a better understanding of the system and to react to error states. This technical practice is closely related to modern agile software development, which also relies on feedback loops: you analyze the behavior of the software and constantly adapt it based on the outcome.

The term **telemetry** has Greek roots: *tele* means remote or distant, and *metry* means measuring. Measuring and collecting data points and then transferring them to another system is of course not exclusive to cloud native or even IT systems. A good example is a weather station with a data logger that measures temperature, humidity, wind speed and more at a certain location and then transmits the readings to another system that can process and display the data.

In container systems, every application should have built-in tooling that generates telemetry data, which is then collected and transferred to a centralized system. The data can be divided into three categories:

* Logs
* Metrics
* Traces

To ship the logs, different methods can be used:

* **Node-level logging** – The most efficient way to collect logs. An administrator configures a log shipping tool that collects logs and ships them to a central store.
* **Logging via sidecar container** – The application has a sidecar container that collects the logs and ships them to a central store.
* **Application-level logging** – The application pushes the logs directly to the central store. While this seems very convenient at first, it requires configuring the logging adapter in every application that runs in a cluster.
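Node-level and sidecar logging both assume the application simply writes its logs to a stream that a separate shipper picks up. As a minimal sketch of that application side, assuming the shipper reads one JSON object per line from standard output, the following uses only Python's standard `logging` module. The names `JsonFormatter` and `build_logger` and the service name `checkout` are hypothetical and chosen for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False     # keep output on our handler only
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")  # hypothetical service name
    log.info("order accepted")
```

Because the application only writes to a stream, the choice of shipper and central store stays a cluster-level decision and no logging adapter has to be configured per application.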

**Prometheus** is an open source monitoring system, originally developed at SoundCloud, which became the second CNCF hosted project in 2016. Over time, it became a very popular monitoring solution and is now a standard tool that integrates especially well in the Kubernetes and container ecosystem.

**Prometheus** can collect metrics emitted by applications and servers as time series data. Each sample is a very simple record that includes a timestamp, labels and the measurement itself. The Prometheus data model provides four core metric types:

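* Counter: A value that only increases (or resets to zero on restart), like a request or error count
* Gauge: A value that can increase or decrease, like memory size
* Histogram: Samples observations, like request duration or response size, and counts them in configurable buckets
* Summary: Similar to a histogram, but calculates configurable quantiles over a sliding time window. Both types also provide the total count and the sum of observations.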
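To make the difference between the metric types concrete, here is a plain-Python sketch of how a counter, a gauge and a histogram behave. It is not the Prometheus client library API; the class and attribute names are assumptions for illustration, and a real application would normally use an official client library instead. The summary type is left out because its quantile calculation does not fit a short sketch.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Counter:
    """Monotonically increasing value, e.g. total requests served."""
    value: float = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


@dataclass
class Gauge:
    """Value that can go up or down, e.g. memory currently in use."""
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value


@dataclass
class Histogram:
    """Counts observations in cumulative buckets and tracks sum and count."""
    bounds: list                      # sorted upper bounds of the buckets
    bucket_counts: list = field(default_factory=list)
    total: float = 0.0                # sum of all observed values
    count: int = 0                    # number of observations

    def __post_init__(self) -> None:
        self.bucket_counts = [0] * len(self.bounds)

    def observe(self, value: float) -> None:
        self.total += value
        self.count += 1
        # Buckets are cumulative: every bucket whose upper bound is
        # greater than or equal to the value includes the observation.
        first = bisect.bisect_left(self.bounds, value)
        for i in range(first, len(self.bounds)):
            self.bucket_counts[i] += 1


if __name__ == "__main__":
    latency = Histogram(bounds=[0.1, 0.5, 1.0])
    for seconds in (0.05, 0.3, 0.3, 2.0):
        latency.observe(seconds)
    print(latency.bucket_counts, latency.count)
```

An observation larger than every bound, such as the 2.0 above, only shows up in `count`, which plays the role of an implicit "infinity" bucket in this sketch.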

A **trace** describes the tracking of a request as it passes through the services. A trace consists of multiple units of work, called **spans**, which represent the different events that occur while the request passes through the system. Each application can contribute a span to the trace, which can include information like start and finish time, name, tags or a log message.

These traces can be stored and analyzed in a tracing system like **Jaeger**.
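The relationship between a trace and its spans can be sketched as a small data structure. This is an illustrative model only, not the data format of Jaeger or any other tracing system; the names `Span`, `start_trace` and `child`, as well as the request and operation names, are hypothetical.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work within a trace."""
    trace_id: str                       # shared by all spans of one request
    name: str
    parent_id: Optional[str] = None     # None marks the root span
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    tags: dict = field(default_factory=dict)
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    def finish(self) -> None:
        """Record the finish time of this unit of work."""
        self.end = time.time()

    def child(self, name: str) -> "Span":
        """Start a span for downstream work that belongs to the same trace."""
        return Span(trace_id=self.trace_id, name=name, parent_id=self.span_id)


def start_trace(name: str) -> Span:
    """Create the root span of a new trace."""
    return Span(trace_id=uuid.uuid4().hex, name=name)


if __name__ == "__main__":
    root = start_trace("GET /checkout")      # hypothetical incoming request
    payment = root.child("charge-card")      # hypothetical downstream call
    payment.tags["provider"] = "example"
    payment.finish()
    root.finish()
    print(root.trace_id == payment.trace_id, payment.parent_id == root.span_id)
```

The shared `trace_id` is what lets a tracing backend stitch spans from different services back into one request, while `parent_id` preserves the call hierarchy.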

Tracing was a new technology and method geared towards cloud native environments, but it initially lacked standardization, with competing projects covering similar ground. In 2019, the **OpenTracing** and **OpenCensus** projects merged to form the OpenTelemetry project, which is now also a CNCF project.

**OpenTelemetry** is a set of application programming interfaces (APIs), software development kits (SDKs) and tools that can be used to integrate telemetry such as metrics, logs and especially traces into applications and infrastructures.

Since cloud providers don’t offer their services "pro-bono", ***the key to cost optimization in the cloud is to analyze what is really needed*** and, if possible, automate the scheduling of the resources needed. There are several ways to do automatic and manual optimization:

* Identify wasted and unused resources
* Right-Sizing
* Reserved Instances
* Spot Instances
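As an illustration of right-sizing, the sketch below derives a CPU request from observed usage instead of a guess. The approach (a nearest-rank percentile of usage samples plus a safety margin) and all names and default values are assumptions made for this example, not a rule from any cloud provider or tool.

```python
def recommend_cpu_request(usage_millicores, percentile=95, headroom_percent=20):
    """Suggest a CPU request (in millicores) from integer usage samples.

    Takes the nearest-rank percentile of the samples and adds headroom,
    so short spikes above the percentile are deliberately ignored.
    """
    if not usage_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_millicores)
    # Ceiling division with integers avoids floating-point rounding issues.
    rank = max(1, -(-percentile * len(ordered) // 100))
    typical_peak = ordered[rank - 1]
    return -(-typical_peak * (100 + headroom_percent) // 100)


if __name__ == "__main__":
    # Hypothetical samples: mostly idle at 100m with one spike to 400m.
    samples = [100] * 19 + [400]
    print(recommend_cpu_request(samples))
```

Comparing the recommendation with the currently requested amount also helps with the first item in the list: a workload that requests far more than it uses is a candidate for wasted resources.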

