

Recently in Availability & Performance Management Category

As a follow-up to my recent post titled “End User Experience Monitoring as lynchpin for BSM,” I spoke with Eden Shochat, CTO of Aternity, to learn more about their offerings in this space and to discuss the unique contributions they could make as part of a larger BSM strategy. For those who aren’t familiar with them, Aternity provides a new class of Application Performance Management software via its Frontline Performance Intelligence Platform.

An excerpt of our very interesting discussion is presented below:

Abbas: I’ve described the role that EUEM solutions like Aternity’s Frontline Performance Intelligence play in the context of BSM, but what is unique about your technology in this area?

Eden: Aternity’s Platform fuses application, desktop and user performance monitoring with real-time business intelligence. This approach generates the most accurate and comprehensive user experience information from multiple levels of the network and application stack on the end-user machines (physical or virtual). By combining this unique data with correlation, clustering and anomaly detection analysis algorithms, Aternity’s Platform is able to perform preemptive problem detection, usage & usability analysis, fact-based capacity planning and activity-oriented compliance. 


Abbas: What is the typical deployment architecture for one of your Global 2000 customers? Does the solution require agents, appliances, or a combination? 

Eden: There is basically a four-tiered architecture approach we follow during the course of our enterprise deployments: 

> The Microsoft Certified Agent(s) collects information and measures the performance of the desktop, applications and user productivity 
> Next, Aggregation services communicate with the Agent(s) to aggregate the measurements and further compress the traffic 
> Then, Analytics Services perform analysis on any incoming data, such as activity usage metrics, and clusters similar data together, performing anomaly detection and correlating endpoints with similar characteristics to locate probable cause 
> Finally, the architecture is supported by a Management Console and a historical database store for enabling the management, configuration and interactive drill-down into specific user experience business intelligence data. 

The aggregation, analytics and management tiers can all run on the same server or they can be split into multiple servers, scaling horizontally or vertically to support tens of thousands of monitored users. 
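
To make the aggregation tier concrete, here is a minimal Python sketch of the idea: raw samples from agents are collapsed into one summary record per endpoint and activity before anything is sent upstream. This is only an illustration under my own assumptions, not Aternity's implementation; the `aggregate` function, the record layout and the activity names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(measurements):
    """Collapse raw (endpoint, activity, seconds) samples into one summary
    per (endpoint, activity) pair, so fewer records travel upstream.

    Hypothetical record layout, chosen only for illustration.
    """
    buckets = defaultdict(list)
    for endpoint, activity, seconds in measurements:
        buckets[(endpoint, activity)].append(seconds)
    return {
        key: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
        for key, vals in buckets.items()
    }


# Three raw samples from two endpoints become two summary records.
raw = [
    ("pc-01", "open_mail", 1.0),
    ("pc-01", "open_mail", 3.0),
    ("pc-02", "open_mail", 2.0),
]
summary = aggregate(raw)
```

The point of the design is that the analytics tier sees a compact summary per endpoint rather than every individual sample, which matters most on expensive or slow links.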

Unlike appliances, which are located within the data center, the Microsoft Certified Agent(s) resides on the endpoint where the service is consumed, providing a level of accuracy that is impossible to achieve with sniffer-based technologies. Additionally, Agent distribution is performed as a software update rather than by distributing multiple hardware appliances.
 
And, by keeping aggregation separate, the architecture can more easily support distributed models. Examples include the oil and gas industry, where some services are consumed behind very high-cost VSAT links. Bank branches are another good example, where some of the network and application services are local and don’t go through a general corporate network.
 
Abbas: When monitoring performance of applications, do you treat them all the same (i.e. agnostic) or are there specialized analyses for common applications such as Exchange, SAP, and Web?
 
Eden: The Aternity Agent performs both protocol-agnostic monitoring, which supports virtually all applications, and technology-specific monitoring. This includes:

> Generic network Cartridge: Supports any request → response type protocol, e.g. Java RMI, CIFS, 3270 or other, unpublished protocols.
> HTTP/s Cartridge: Supports any HTTP-based application, web or otherwise, without requiring the security compromises that appliance-type key management entails.
> Win32 Client/Server Cartridge: Passive monitoring of any Win32 user interface application, be it .NET forms, PowerBuilder or plain vanilla Win32 programming.
> Oracle EBusiness Cartridge: Generic monitoring of JInitiator and JDK based form applications, including customizations performed by customers.
> Java: Monitors any Swing-based applications and applets (AWT is supported by the Win32 Cartridge).
> Server-based Computing ICA/RDP Support: We monitor both the latency of the screen refreshes and the actual applications on the Citrix/Terminal Server for published desktops and applications.
 
The support for technology-based instrumentation means that most (if not all) of the applications, shrink wrapped or custom, can be monitored by the Aternity Platform.
 
In addition, the Agent collects environmental information, e.g. network statistics, process information (including crashed and hung processes and user activity), operating system, service packs, and installed applications and patches. The Agent is a Windows service that monitors the endpoint, providing insight into the network, the desktop and server-based computing protocols.
 
This monitoring can be applied to standard desktop/laptop clients, server-based computing environments like Citrix XenApp and virtual desktop infrastructure (VDI) deployments, all with a minimal footprint of less than 10MB of physical RAM and under 0.1 percent CPU on average.
 
Abbas: Analytics seems to play a prominent role in how Aternity positions itself. Can you go into more detail about how it works and the value that provides to your customers?
 
Eden: Attempting to understand or derive business intelligence from volumes of end user performance metrics is like looking for the proverbial “needle in the haystack”. Sophisticated, real-time analytics are therefore necessary to truly bring about what we call Frontline Performance Intelligence.
 
The issue that plagued early attempts at monitoring user experience was transforming huge volumes of data into actionable intelligence. Previously, organizations would try to lessen the flow of data from end users’ desktops by only supporting partial deployments of Application Performance Management (APM) technologies. These deployments would be applied only to PCs that were already exhibiting performance issues. This mode of operation precludes ongoing introspection of user productivity and experience.
 
By collecting a comprehensive set of end user performance and productivity metrics at the Frontline, and processing this data with analytics, the Aternity Platform generates Frontline Performance Intelligence from real frontline performance metrics. The analytical components in the Aternity algorithm engine include:
 
Autonomic Performance Profiling: An Autonomic Performance Profile™ is the mathematical model used for automatic, real-time identification of groups of homogeneous users sharing the same behavior at a particular time, and is used to quantify, detect and distinguish between normal and abnormal behavior.
 
Deviation detection: Autonomic Performance Profiles were designed to provide the earliest possible detection of performance problems that impact multiple users while simultaneously eliminating the need for manual alert configuration and tuning, which many other products in the market require. The Analytic Engine performs continuous correlation between the real-time performance measurements captured by the Agents, and the Baselines of the Autonomic Performance Profiles. In this way, performance deviations of any magnitude can be automatically detected, for groups of users of any size, with no manual configuration and/or intervention.
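
The general idea of comparing live measurements against a learned baseline can be sketched in a few lines of Python. This is a deliberately simple stand-in (a z-score test against the mean and standard deviation of past samples), not the Autonomic Performance Profile model itself, whose details are not described here. The function name and the `threshold` parameter are my own assumptions; a fixed threshold is exactly the kind of manual tuning the real product claims to avoid.

```python
from statistics import mean, stdev


def is_deviation(baseline, current, threshold=3.0):
    """Return True when `current` lies more than `threshold` standard
    deviations from the mean of the baseline samples.

    Illustrative z-score check only; `threshold` is an assumed parameter.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Response times (seconds) previously observed for one group of users.
baseline = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8]

normal_sample = is_deviation(baseline, 2.1)  # within the usual spread
slow_sample = is_deviation(baseline, 3.5)    # far outside the usual spread
```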
 
Problem Minimization: Each of the detected symptoms is analyzed for commonalities to tie multiple symptoms together into a problem. This has been shown to greatly reduce the number of alerts going to the IT operations.
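
As a rough illustration of tying symptoms together, the Python sketch below groups symptoms that share the same application and site into a single problem, so one alert is raised per group rather than per user. The grouping keys and field names are hypothetical; the commonalities the real engine looks for are not specified in this discussion.

```python
from collections import defaultdict


def group_symptoms(symptoms, keys=("application", "site")):
    """Group symptom records that share the same values for `keys`.

    Returns {shared_values: [affected users]}. The choice of keys is an
    assumption made for illustration.
    """
    problems = defaultdict(list)
    for symptom in symptoms:
        shared = tuple(symptom[k] for k in keys)
        problems[shared].append(symptom["user"])
    return dict(problems)


symptoms = [
    {"user": "alice", "application": "mail", "site": "London"},
    {"user": "bob", "application": "mail", "site": "London"},
    {"user": "carol", "application": "crm", "site": "Austin"},
]
problems = group_symptoms(symptoms)  # three symptoms, two problems
```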
 
Problem Isolation through Endpoint Classification: End users with like symptoms are first grouped together into an “Effect group”, and an alert is raised. The analytic engine then automatically identifies the end users’ unique commonalities, with two levels of correlation, across the effect group:
 
1. Positive Correlation: the attributes that the affected group have in common
2. Negative Correlation: the attributes common to the effect group that are also common to the non-affected user groups
 
The intersection of these two correlations, i.e. of the Query Group and the Effect Group, is the “Match Group”. The attributes that produce the strongest Match Group are “surfaced” as a Probable Cause. Any attribute collected by the Aternity Agent (e.g. the amount of memory, an installed application or the subnet where the endpoint resides) may be used for Dynamic Problem Isolation, i.e. Probable Cause Analysis.
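
One simple way to picture the two levels of correlation is to score each attribute value by how common it is among affected endpoints, minus how common it is among unaffected ones. The Python sketch below does just that. It is a simplified assumption of mine, not the actual scoring used by the Aternity analytic engine, and the endpoint records, attribute names and `probable_cause` function are hypothetical.

```python
def probable_cause(endpoints, affected_ids):
    """Rank (attribute, value) pairs by how much more common they are
    among affected endpoints than among unaffected ones.

    Score = share of affected endpoints with the value (positive
    correlation) minus share of unaffected endpoints with the same value
    (negative correlation). Illustrative scoring only.
    """
    affected = [e for e in endpoints if e["id"] in affected_ids]
    others = [e for e in endpoints if e["id"] not in affected_ids]
    if not affected:
        return []
    scores = {}
    for endpoint in affected:
        for attr, value in endpoint.items():
            if attr == "id" or (attr, value) in scores:
                continue
            in_affected = sum(e.get(attr) == value for e in affected) / len(affected)
            in_others = (
                sum(e.get(attr) == value for e in others) / len(others)
                if others else 0.0
            )
            scores[(attr, value)] = in_affected - in_others
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


endpoints = [
    {"id": "pc-01", "subnet": "10.1.0", "ram_gb": 4},
    {"id": "pc-02", "subnet": "10.1.0", "ram_gb": 8},
    {"id": "pc-03", "subnet": "10.2.0", "ram_gb": 4},
    {"id": "pc-04", "subnet": "10.2.0", "ram_gb": 8},
]
ranking = probable_cause(endpoints, affected_ids={"pc-01", "pc-02"})
top_attribute, top_score = ranking[0]
```

In this toy data set the subnet is shared by every affected endpoint and by no unaffected one, so it ranks first, while memory size is spread evenly across both groups and scores zero.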
 
Abbas: Given that FPI may well be the early warning system that companies would rely on to get ahead of end user performance issues, what mechanisms do you provide by which another management platform can gain access to the results of the analytics so that they can be presented inside of a BSM view?
 
Eden: When we designed the Aternity Platform, it was clear that we were generating a new type of data stream - user experience combined with activity data. As such, we architected the system to be totally open. The system components communicate over a message bus among themselves. And, the complete database schema is open, documented and simple for custom-built reports. The problem detection analytics are exposed through our object-oriented Problem Life Cycle Manager and CLI layers.
 
Some of the existing integrations at customers include ticketing systems (CA, BMC), portals (IBM WebSphere, Microsoft SharePoint), SNMP alert systems (HP OpenView) and other proprietary systems.
 
Contact information for Aternity is available here.


Bits and bytes from itSMF Fusion 2008

We had a great week at the itSMF Fusion 2008 show in San Francisco – certainly time well spent. We had some really insightful conversations with current and prospective customers, engaged several analysts, scoped out the competition and sat in on a handful of very interesting sessions that surfaced some rather unique data points.

For example, in one session on CMDBs, in excess of 50% of the audience (by show of hands) said they were currently implementing a CMDB. Two percent admitted they were on their second try, having failed the first time around. No surprise: integration was routinely cited as the main culprit, and that’s an area Managed Objects has certainly mastered.

StackSafe has posted some notes from the show here – and we’d like to offer some bits and bytes – mostly paraphrases – as well:

>> Congratulations to Cindy from Hallmark (photo nearby) who won our Wii raffle.

>> IT is good at measuring performance, but poor at measuring quality. A help desk that aims to solve 60% of incidents on the first call is really just encouraging staff to close a ticket with a poor answer and open a new one with another call. – Malcolm Fry, “CIO and the 366 Degree Circle”

>> Roughly 10% of the audience raised their hands when asked “do you know what BSM is?” – Lisa Erickson Harris, “BSM and Best Practices, Elevating the Role of the Service Desk”

>> IT investments will continue to grow, but they must either produce cost savings in the supply chain or improve the customer experience. – Charlie Feld, “Enabling 21st Century Business Model with IT”

>> A well run IT department is like air – it’s taken for granted. – Dennis Ravenelle, “IT Service Continuity Management, Where do I start?”

>> “An inaccurate CMDB is worse than no CMDB.” – Richard Peasley, “Building Decision Support Systems that Work”

- Abbas


I get to have some very interesting and varied discussions with people interested in BSM. They generally start off with a high level strategic goal for their IT organization and go from there. A common theme that’s emerged in a large percentage of them lately is the topic of Application Performance Management (APM) and its relationship with BSM. More specifically, I’ve been spending a lot of time talking about End-User Experience Monitoring (as a subset of APM) as a key element of IT management strategy.

The most important factor when organizations look towards End-User Experience Monitoring (EUEM) technologies is of course that user experience is the ultimate success criterion by which IT services should be evaluated. If I could provide highly responsive application services that were always up and running, what else could be expected of me? The end users would be happy and the lines of business would be realizing the value of their application investments. The underlying technology at that point becomes largely irrelevant. Whether the applications run in corporate datacenters, are hosted by a 3rd party, run in a cloud, or are 100% virtualized, it just doesn’t matter. I am of course oversimplifying the matter by focusing on performance and availability of applications, while leaving out other elements such as security, but, for the sake of discussion, let’s stick to performance and availability for now.

The benefits of having an EUEM solution in place are obvious:

-- Instead of waiting for the phone to ring at the IT Ops Help Desk (“this application is so slow!”), you are proactively notified about application failures and brownouts.
-- Verify whether a reported issue is real: is the application really slow, or is it performing the same as it always does?
-- Quickly get a handle on the scope of an issue: how many users are impacted, which locations, is it all applications or just a few, etc.
-- Get initial diagnostic information to start the triage process: who gets called into the war room?

Visibility always sounds great, but I always like to ask – “Have you considered the downside of total visibility into end user performance?”

-- Issues that previously went unreported will be obvious on the chosen EUEM product’s reports. If an application performs slowly and no one calls it in, was it a real problem? The answer is now Yes, whereas before the EUEM tool, it never happened.
-- Don’t be surprised when you hear talk about establishing internal service level objectives for performance, availability, and MTTR, not for IT infrastructure elements but for End User Experience.
-- How well equipped are you to handle an increase in the number of events that need troubleshooting?  Now that you know every time an app fails or is slow, how quickly can you figure out WHY it happened?
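
To show what a service level objective defined on End User Experience (rather than on infrastructure elements) might look like in practice, here is a minimal Python sketch. The two-second threshold, the 95% target and the `slo_attainment` function are assumptions of mine chosen purely for illustration.

```python
def slo_attainment(response_times, threshold_s):
    """Return the fraction of user transactions that completed within
    `threshold_s` seconds."""
    if not response_times:
        raise ValueError("no measurements to evaluate")
    within = sum(1 for t in response_times if t <= threshold_s)
    return within / len(response_times)


# Response times (seconds) as experienced by end users.
samples = [0.8, 1.2, 0.9, 4.5, 1.0]

attained = slo_attainment(samples, threshold_s=2.0)  # 4 of 5 within 2 s
meets_objective = attained >= 0.95                   # assumed 95% target
```

Note that every server involved could be reporting green while this objective is still missed, which is exactly why the measurement has to be taken from the user's side.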

This is where the BSM discussion starts to come up organically (unless it was the starting point, of course). Having a defined service model in place for the key applications & services that IT provides to the business is key to the WHY side of the EUEM equation. We talk about having a BSM implementation that ties together best-of-breed End-User Experience Monitoring solutions, Service Desks, Fault and Event Management solutions, Element Management platforms, IT infrastructure performance monitoring tools, and other types of products to build a complete strategy for IT and Business Operations. Integration discussions quickly follow, but that is a topic that has been covered extensively in other blog posts (if comments suggest it’s time to revisit the topic, we certainly can).

I often wind up talking about specific types of EUEM technologies and vendors because of my background in this space (see full disclosure below), and I wanted to capture some of those discussions in a series of posts. First off, I generally group EUEM tools into one of three categories:

1. Passive monitoring systems

    -- Generally appliance offerings which capture and analyze IP packets to provide insight into End-User Experience of a wide array of applications
    -- May have specialized analyses for standards-based apps such as Web, Voice, and Video
    -- Deployment is usually at datacenters or wherever the applications are hosted
    -- Vendors providing solutions in this space (in alphabetical order):
            -- CA’s Wily Customer Experience Manager (via Wily)
            -- Compuware’s ClientVantage (via Adlex)
            -- Coradiant
            -- HP’s Real User Monitor software (via Mercury Interactive)
            -- IBM Tivoli
            -- NetQoS
            -- Nimsoft
            -- OPNET Technologies
            -- Quest’s Foglight End User Management

2. Active monitoring systems w/ synthetic transactions

    -- In most cases this involves recording sample user activities (e.g. login, search for info, run report, etc.) and using deployed robot agents/appliances to replay them at varied intervals.
    -- Most frequently these solutions tend to be oriented exclusively towards Web apps, but there are specialized vendors that focus on large enterprise application suites such as SAP and Oracle.
    -- Generally deployed at user population centers such as campuses or representative locations such as selected offices in Europe or Asia.
    -- Market was pretty much created by Mercury Interactive (now part of HP) so they are the dominant player here.
    -- Some of the vendors in the passive monitoring space also have some active monitoring elements in their portfolio to “poke” the applications when users might not be utilizing their applications (after business hours, for example); other smaller vendors offer lower cost solutions when compared to the HP suite including Managed Objects’ own Business Experience Manager.
    -- For Internet-facing applications, services from companies like Gomez and Keynote Systems are great sources of performance data without having to deploy your own monitoring “robots”.
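
For readers who have not seen a synthetic transaction up close, the following Python sketch shows the core idea: a scripted sequence of user steps is replayed, and each step is timed and marked as passed or failed. It is not modeled on any particular vendor's robot; the step functions are stand-ins (a short sleep and a simulated failure) for what would be real application calls, and all names are hypothetical.

```python
import time


def run_synthetic_transaction(steps):
    """Run each named step once and return {name: (succeeded, seconds)}.

    `steps` is a list of (name, callable) pairs; a step that raises an
    exception is recorded as failed instead of aborting the whole run.
    """
    results = {}
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
            succeeded = True
        except Exception:
            succeeded = False
        results[name] = (succeeded, time.perf_counter() - start)
    return results


# Stand-in steps; a real robot would drive the actual application here.
def login():
    time.sleep(0.01)


def search():
    raise RuntimeError("search page unavailable")


results = run_synthetic_transaction([("login", login), ("search", search)])
```

A scheduler would then call `run_synthetic_transaction` at varied intervals from each representative location, which is what lets these tools report on an application even when no real users are active.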

3. End-user behavior monitoring & analysis

    -- A few of the passive monitoring vendors offer aggregate or high level data, but I tend to group into this category the solutions that provide detailed analysis of not just performance, but also business analytics about user behavior.
    -- Some of the defining elements in this category include capture and playback of complete user sessions, tracking time spent on each page, click-through rates, and error or missing content identification.
    -- Solutions in this space are specialized by applications (e.g. Web, SAP, Oracle, etc.)
    -- Deployment can be a combination of appliances and software agents. 
    -- A few of the vendors in this space (in alphabetical order):
            -- Aternity
            -- Knoa
            -- Tealeaf

The list of players in the EUEM space is not comprehensive, just a short list of those with whom I have some level of familiarity. Some quick Google searches or a call to your favorite analyst should turn up more.

I would like to explore this area in more detail and plan a series of follow up posts with topics such as: details about what information from EUEM solutions should be incorporated into the BSM fabric (Hint: it’s not just events); sharing field experience on deployment strategies that have worked & pitfalls to avoid; detailed coverage on specific vendors including strengths, weaknesses, integration details, etc.

If you have experience with any of the products mentioned, know of other vendors that you feel should be mentioned in any of the categories, have general feedback, or would like to have a one-on-one discussion on this (or any other) topic I’ve covered, leave me a comment below.

***Disclosure: The author of this post, Abbas Haider Ali, held roles at OPNET Technologies and IBM prior to Managed Objects.