Determining the Value of Proactive Monitoring
System downtime has a cost to any organization. If your organization uses Microsoft Teams for its collaboration and communications, your business is impacted whenever Teams is unavailable. The challenge is separating headline-grabbing estimates, “… downtime can eclipse $5 million an hour in certain scenarios…” (Forbes Technology Council, April 10, 2024), often associated with only the largest organizations, from reasonable estimates for medium and large organizations of various sizes.

To address this challenge, Martello Technologies commissioned EnableUC to independently develop a model that estimates the impact of deploying Vantage DX proactive monitoring and enhanced diagnostic tools within a Teams environment, for organizations of different sizes and configurations. Our research and model development occurred without any input or influence from Martello, and we only shared the results when completed.

In completing this project, we ended up building two models (we are overachievers 😊): one that focused on the operational costs to support a Teams environment and another that looked more broadly at operational, productivity, and revenue impacts. This article discusses our first model, which calculated the difference proactive monitoring and enhanced diagnostic tools can make to operational costs.

Key Takeaways

➡️ Up to 60% of issues could potentially be mitigated with proactive monitoring.
➡️ For an organization with 1,000 users, proactive monitoring is likely to halve IT support labor required.
➡️ An organization with over 10,000 users should expect to reduce required staffing by 70% if proactive monitoring and enhanced diagnostics are deployed.

Building a Model

To assess the advantages of an enhanced monitoring and issue diagnosing toolset, we developed an operational model loosely based on the Microsoft Operations Framework (MOF).
While Microsoft has shifted its focus to a tool-based approach they call the Microsoft Operations Management Suite (OMS), MOF provides a structured, life-cycle-based approach and serves as a good foundational model for IT service management. We extended MOF using a series of “runbooks” we have developed over the years for various organizations that have implemented Microsoft Teams. The result was a clearly defined series of daily, weekly, monthly, and annual tasks required to successfully operate any Microsoft Teams environment.

Task Effort Estimates

Based on our collective expertise, discussions with IT professionals who are responsible for managing Teams environments and with Microsoft MVPs (Most Valuable Professionals), along with online research, we assigned effort estimates to each of the identified Teams management tasks.

We then estimated the number of issues and tickets that would be generated, based on hands-on experience and research. Understanding the number of tickets generated is critical because a significant portion of daily IT time is typically allocated to addressing tickets.

We identified 11 categories of issues that created outages or service degradation. (Categories included core services issues, supporting service issues, hardware and software issues, human error, loss of power, etc. We will explore these categories in detail in a follow-up article.) Collectively, these items degrade Teams service 1.8% of the time, for one or more users. Unless you operate 24x7, not all of these outages will occur during your organization’s working hours; the model accounts for this.

Additional assumptions built into the model (which can be configured) include:

- Expect 1 incident per 1,000 physical phones deployed per day.
- Expect 1 incident per 50 Microsoft Teams Rooms per day.
- An issue or outage needs to last 10 minutes in order to potentially create a ticket.
For instance, if a momentary “blip” occurs while trying to join a meeting, most users simply retry a few times. On average, 16% of users raise a ticket when an incident or issue occurs.

The Impact of Proactive Monitoring

Proactive monitoring reduces the number of user-impacting incidents because it allows IT teams to correct issues quickly, potentially before users are impacted, or, when an issue can’t be quickly corrected, allows IT to communicate alternatives. For example, if a network issue is impacting a location, users can be advised to work from home, a coffee shop, or another nearby location. If Teams or a supporting service (e.g., authentication) is experiencing an issue, users can be alerted that they should use a backup UC solution or their mobile phones for an upcoming meeting.

For each of the 11 identified issue categories, we estimated the percentage of issues that could be mitigated with proactive monitoring, ranging from 0% to 90% depending on the source of the issue. In total, our model indicates that up to 60% of potential issues could be mitigated with proactive monitoring.

Implementing Proactive Monitoring

To proactively monitor a Microsoft Teams environment, synthetic transactions and agents or appliances are key tools. Here’s a breakdown of how they work and their benefits:

Synthetic Transactions

Synthetic transactions simulate user activities to test and monitor the performance and availability of Microsoft Teams services. These transactions are pre-scripted actions that mimic real user interactions, such as:

- Joining a Teams meeting
- Sending a message
- Sharing a file
- Scheduling a meeting

By continuously running these synthetic transactions, IT teams can detect issues before they impact actual users. This proactive approach helps identify performance bottlenecks, service outages, and other problems early on.

Agents or Appliances

To execute synthetic transactions, organizations deploy agents or appliances at various locations.
These agents can be software-based or hardware devices that perform the following functions:

- Monitoring Performance: Agents simulate user activities and measure the response times and success rates of these actions.
- Collecting Data: They gather detailed metrics on network performance, application responsiveness, and service availability.
- Alerting and Reporting: When an issue is detected, agents can trigger alerts and generate reports, providing IT teams with actionable insights.

Enhanced Diagnostics

Proactive monitoring can reduce issues, but it cannot eliminate every issue or the corresponding tickets that users raise. As such, our model takes into account how enhanced diagnostics can reduce the time required to identify a root cause and address a particular issue.

Microsoft continues to improve the built-in diagnostic reports, most recently deprecating the Call Quality Dashboard (CQD) in favor of Power BI Quality of Experience Report (QER) templates. However, both CQD and QER reports can be data rich and information poor. They provide lots of technical details but overwhelm all but the most skilled IT professionals.

Additionally, the Microsoft reports don’t provide much detail outside the Microsoft environment. Local network and ISP details are not fully captured using the Microsoft built-in reports. For organizations using Direct Routing, session border controller (SBC) details and carrier SIP trunk details are incomplete. For customers using Operator Connect, key carrier or network service provider details are sparse.

We believe that enhanced third-party diagnostic tools can reduce the time taken to resolve a particular incident from an average of 30 minutes to 15 minutes. Put another way, a typical support engineer can handle an average of 20 tickets per day with the built-in tools and an average of 30 tickets per day with an enhanced set of tools.
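To make the throughput figures above concrete, here is a minimal sketch of how they might translate into headcount. The 20 and 30 tickets-per-day values come from this article; the function name `engineers_needed` and the 90-tickets-per-day workload are hypothetical and chosen only for illustration. They are not outputs of our model.

```python
import math

# Throughput figures quoted in the article (tickets per engineer per day).
TICKETS_PER_DAY_BUILT_IN = 20
TICKETS_PER_DAY_ENHANCED = 30


def engineers_needed(daily_tickets: int, tickets_per_engineer: int) -> int:
    """Return the whole number of engineers required to clear a day's tickets."""
    if tickets_per_engineer <= 0:
        raise ValueError("tickets_per_engineer must be positive")
    if daily_tickets <= 0:
        return 0
    return math.ceil(daily_tickets / tickets_per_engineer)


# Hypothetical workload, NOT a figure from the model.
daily_tickets = 90
print("built-in tools:", engineers_needed(daily_tickets, TICKETS_PER_DAY_BUILT_IN))
print("enhanced tools:", engineers_needed(daily_tickets, TICKETS_PER_DAY_ENHANCED))
```

Rounding up reflects that support staff are hired in whole people, which is also why very small organizations see little change in headcount.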
Note that these tickets-per-day averages assume some tickets are more straightforward moves, adds, or changes and do not require root cause analysis.

Results

Taking into consideration all of the above, here is what the model indicates for organizations of several different sizes.

For organizations smaller than approximately 200 users, you typically require at least one person whether or not proactive monitoring or enhanced diagnostic tools are deployed. Once you reach approximately 250 users, you can invest in more people or use better tools to reduce overall labor costs.

With 1,000 users working in the office 3 out of 5 days (a common hybrid arrangement), the potential labor savings are significant, as proactive monitoring reduces the number of tickets that require investigating and enhanced diagnostics shorten the time to resolution for issues that can’t be mitigated.

Scenario: 1,000 users in 2 locations

As the number of users increases, proactive monitoring has a larger potential impact.

Scenario: 2,500 users in 5 locations

Scenario: 10,000 users in 20 locations

The complete model takes into consideration other factors, including the number of desk phones and room systems deployed, the number of locations, the number of time zones operated in, etc.

Conclusion

Using reasonable assumptions related to operational management of a Microsoft Teams environment, for most organizations with 200 or more people, proactive monitoring and enhanced diagnostic tools can provide a significant return on investment by reducing the amount of support labor required. For organizations with over 1,000 users, proactive monitoring can halve the amount of IT support labor required. Larger organizations with over 10,000 users can expect proactive monitoring to reduce support labor by two-thirds.

This is only part of the story because outages also impact productivity and revenue generation for an organization.
We will explore these broader impacts in a follow-up article that will dive into the details of the second model we developed as part of this project.
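For readers who want to experiment with the ticket-volume assumptions described in this article, the following is a minimal sketch, not our actual model. The device incident rates, the 16% ticket rate, and the "up to 60%" mitigation figure come from the article. The class name `TicketModel`, the `users_affected_per_incident` parameter, and the choice to apply the overall 60% figure uniformly to device incidents are assumptions made purely for illustration. The sketch also ignores the 10-minute threshold and working-hours adjustments.

```python
from dataclasses import dataclass


@dataclass
class TicketModel:
    """Simplified, hypothetical view of the article's ticket assumptions."""

    phones_per_daily_incident: int = 1000  # article: 1 incident per 1,000 phones per day
    rooms_per_daily_incident: int = 50     # article: 1 incident per 50 Teams Rooms per day
    ticket_rate: float = 0.16              # article: 16% of affected users raise a ticket
    mitigated_share: float = 0.60          # article: up to 60% of issues mitigated (upper bound)

    def daily_device_incidents(self, phones: int, rooms: int) -> float:
        """Expected device-related incidents per day."""
        return (phones / self.phones_per_daily_incident
                + rooms / self.rooms_per_daily_incident)

    def daily_tickets(self, phones: int, rooms: int,
                      users_affected_per_incident: float,
                      proactive: bool) -> float:
        """Expected tickets per day.

        users_affected_per_incident is an illustrative assumption;
        the article does not publish this value.
        """
        incidents = self.daily_device_incidents(phones, rooms)
        if proactive:
            # Assumption: the overall mitigation share applies evenly here.
            incidents *= 1 - self.mitigated_share
        return incidents * users_affected_per_incident * self.ticket_rate


model = TicketModel()
# Hypothetical estate: 1,000 phones, 50 rooms, 25 users affected per incident.
reactive = model.daily_tickets(1000, 50, 25, proactive=False)
proactive = model.daily_tickets(1000, 50, 25, proactive=True)
print(f"reactive: {reactive:.1f} tickets/day, proactive: {proactive:.1f} tickets/day")
```

Every default is a dataclass field, so each assumption can be reconfigured, mirroring the configurable assumptions in the model described above.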
Teams Reporting: Evolving But Still Gaps
Microsoft has consistently worked to improve quality and usage reporting associated with Lync, then Skype for Business, then Skype Online, and now Teams. There have been significant advances over the past nine years, but gaps and opportunities for improvement remain. Let’s first acknowledge the significant advancements Microsoft has made. Then we can examine the current gaps and opportunities to improve.

The History

The Call Quality Dashboard (CQD) was originally released as a free add-on for Skype for Business Server, the on-premises version of Microsoft’s chat, meeting, and communications server. In 2015, Microsoft released a version of CQD that worked with Skype for Business Online (the platform that would eventually morph into Teams). In 2016, version 2.0 of CQD provided access to 6 months of data and expanded reporting beyond audio quality to include video and app sharing information. The year 2017 brought further updates to CQD that added a reliability issue report focused on call setup issues. This was also the year Teams launched.

The combined Teams and Skype for Business admin center was launched in 2019, and it integrated the Call Quality Dashboard (although it really was just a menu link to the CQD portal). A significant number of CQD updates were launched in 2019 under the “Advanced CQD” banner. Call data was now updated within 30 minutes (labeled “near real-time data”) as opposed to taking over 24 hours. The ability to drill down within reports, even to the user level, was provided along with several new reports.

After years of improving CQD, Microsoft pivoted in 2020, bringing call quality data into Power BI (business intelligence) with the release of the first version of the Quality of Experience Report (QER) templates.

Current State

The latest version of the Power BI QER templates, version 8, and a detailed listing of the various Power BI QER reports are available from Microsoft (see “Power BI Quality of Experience Reporting” in the references).
Recently, Microsoft deprecated the original CQD portal, adding a banner that directs users to Power BI.

The current series of QER Power BI templates is packaged into five different templates, each with many reports:

- QER.pbit is the main template, with over 20 reports focused on identifying Teams meeting and calling issues.
- QER MTR.pbit provides reports focused on Microsoft Teams Rooms.
- QER PS.pbit is a template optimized to analyze Microsoft Teams Phone System deployments.
- CQD Teams Auto Attendant & Call Queue Historical Report.pbit includes three reports related to auto attendant, call queue, and agent usage.
- CQD Teams Usage Report.pbit details how users in your organization are using Teams.

Current Gaps

Despite the significant number of changes and the large number of reports available through the admin center, the Teams admin center, and Power BI, there are several gaps between the current state and the ideal state:

1. Too much data, too few insights

The goal of analytics is to provide actionable insights, that is, to highlight issues you can take corrective action to address. Too often, the current reports still provide interesting visuals that don’t point IT professionals towards specific issues.

2. Inability to compare groups

The ability to compare quality, reliability, usage, adoption, and user satisfaction across different geographical, functional, and facility groups is one of the most powerful mechanisms to identify potential issues. While some existing Teams reports allow you to group results based on IP address, they lack the ability to track “VIPs” or other functional groups.

3. Too many “good” calls

CQD uses a very specific formula to classify calls as “poor”. The rules are too rigid: a call with multiple parameters near, but not over, their thresholds is marked as good even though users often report that the call was poor.
Specifically, CQD only marks a call as poor if Packet Utilization is > 500 packets and one or more of the conditions listed in Microsoft’s CQD stream classification documentation (see the references) are met.

4. Lacking a complete view

The CQD and Power BI reports do not have the ability to pull data from on-premises session border controllers (SBCs) or other network devices, which means you have an incomplete view of what may be causing issues. For organizations using Operator Connect or Direct Routing as a Service (DRaaS), this becomes even more challenging, as they don’t have access to details that can help identify the likely source of an issue.

Filling the Gaps

I recently had a detailed discussion with representatives from VOSS that focused on how they address the issues related to the built-in Teams reports for their customers. I came away from our discussion understanding that VOSS Insights is focused on addressing several significant Teams reporting limitations:

1. Focusing on actionable insights.

According to VOSS, the name of its reporting product, “Insights”, speaks to the intent for the VOSS toolset to provide actionable intelligence into your complete UC estate. Customized dashboards can readily compare different user groupings. Customized dashboards can be complemented by intelligent alerting: the ability to group and summarize alerts as opposed to overwhelming IT pros with a barrage of alerts during an incident. Beyond providing actionable insights and alerting, in some cases the VOSS tools can initiate automated remedial action, known as self-healing. This can reduce the burden on the operations team and help to resolve certain issues more quickly.

2. Delivering multi-platform reporting.

While many organizations have standardized on Microsoft O365 and Teams, many still use other UC&C platforms for specific use cases. VOSS provides a “single pane of glass” even if you use multiple UC&C tools, so you can view and manage the full UC stack from a single point of control.
Understandably, Microsoft reporting does not (and likely will not) provide this capability.

3. Providing a more complete “big picture”.

VOSS Insights incorporates the traditional CQD data along with proactive synthetic testing data, detailed data from SBCs, and network layer data such as NetFlow to provide in-depth insight into the UC stack, helping to ensure better UC observability. This more complete picture can help shorten resolution time and reduce finger-pointing between teams (or providers).

4. Helping optimize cost.

VOSS Insights can help analyze usage and optimize capacity and licensing to ensure you are delivering communications and collaboration capabilities as cost-effectively as possible. Additionally, by ingesting facility information, including power consumption data, customized Insights dashboards can assist in delivering better overall asset management.

Information is Key

The built-in Teams reports have certainly evolved, and no doubt will continue to improve. However, the Microsoft approach often provides lots of reports, all with an overwhelming amount of data and limited information. Based on my discussions with VOSS, their toolset starts where the Microsoft reports end and focuses on providing actionable insights. For those responsible for delivering consistent, reliable, cost-effective communications and collaboration, this combination is worth investigating.

References:

- VOSS site: https://www.voss-solutions.com/
- VOSS Insights product details: https://www.voss-solutions.com/offerings/voss-insights/
- CQD Stream Classification: https://learn.microsoft.com/en-us/microsoftteams/stream-classification-in-call-quality-dashboard
- Power BI Quality of Experience Reporting: https://learn.microsoft.com/en-us/microsoftteams/cqd-power-bi-query-templates
- RFC 3550: https://datatracker.ietf.org/doc/html/rfc3550