Sunday, February 21, 2010

Let's Troubleshoot (putting it all together) !!

Note: This is the last of a set of articles about troubleshooting outbound messaging issues. If you need or want to read from the beginning, start here.

Now that you have some knowledge of where you can look and what information you can find, how do you use it to troubleshoot? Let's go back to my original roadmap:

First, know the path a message will take to get to its destination
Second, determine how far along that path it got
Third, figure out why it stopped (or if it stopped)
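The roadmap can be sketched as a small function. This is a minimal illustration, assuming you have already written down the expected hops in order and noted which hops show evidence of the message (still in the Outbox, a Message Tracking entry, an SMTP log entry). The function name and the hop labels are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def locate_stop(path: List[str], seen_at: Set[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (last hop with evidence of the message, first hop without it).

    path    -- the expected route, in order (step one of the roadmap)
    seen_at -- hops where you found evidence of the message (step two)
    The second value is where to start asking why it stopped (step three);
    None means the message made it all the way along the path.
    """
    last_reached: Optional[str] = None
    for hop in path:
        if hop not in seen_at:
            return last_reached, hop
        last_reached = hop
    return last_reached, None


# Hypothetical route for a server that forwards everything to a smarthost.
route = ["Outlook", "Exchange server", "smarthost", "recipient system"]
print(locate_stop(route, {"Outlook", "Exchange server"}))
# ('Exchange server', 'smarthost')
```

Here the message reached Exchange but there is no sign of it at the smarthost, so the next place to look is the hop between them.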

What path should a message take? By now you should know whether Exchange will send outbound messages to a Smarthost or directly to the recipient system.

How far did it get? Did it reach the Exchange server? Check Message Tracking. If the message doesn't appear, then the client (typically Outlook) never delivered it successfully. Check Outlook - is it still in the Outbox?

Let's say the message shows up in Message Tracking. Did the message leave the Exchange server? In other words, does it report it was transferred through SMTP? If it does, it means Exchange delivered the message to the next hop.

If not, what is the last thing reported by Exchange? Check the Queues. Remember your routing configuration. Does Exchange send all messages to a Smarthost, or does it use DNS?

It's at this point you may want to verify DNS lookups and test communications with Telnet.
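If you would rather script the connectivity check than type it into Telnet, here is a minimal sketch using Python's standard socket module: connect to TCP port 25 and read the greeting (a 220 reply is the usual "ready" banner). The function name is hypothetical, and it only proves that something on that port answered with an SMTP-style greeting; it does not send a message.

```python
import socket
from typing import Tuple


def read_smtp_banner(host: str, port: int = 25, timeout: float = 10.0) -> Tuple[int, str]:
    """Connect the way `telnet <host> 25` would and return the greeting.

    Returns the three-digit reply code and the rest of the first line.
    Connection problems surface as OSError subclasses (for example
    ConnectionRefusedError or a timeout), which is itself a useful answer.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Read until the first complete line (SMTP lines end with CRLF).
        while b"\r\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:
                break  # the server closed the connection early
            data += chunk
        line = data.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
        if len(line) < 3 or not line[:3].isdigit():
            raise ValueError(f"no SMTP greeting received: {line!r}")
        try:
            # End the session politely, as you would by typing QUIT in Telnet.
            sock.sendall(b"QUIT\r\n")
        except OSError:
            pass
    return int(line[:3]), line[4:]


# Example (needs network access to the host you are testing):
# print(read_smtp_banner("mail.example.com"))
```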

If Exchange delivered the message to the next hop, it's time to examine the SMTP log. Checking the SMTP log will show you the communication between your Exchange server and the system to which it wanted to deliver the message.
Did it receive an OK response to the HELO/EHLO command?
Did it receive an OK response to the MAIL FROM command?
Did it receive an OK response to the RCPT TO command?
Did it receive a positive response to the DATA command (354), and an OK once the message content was sent?
Did it receive an OK response to the QUIT command?
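The checklist above can be sketched as a function that walks the commands in order and stops at the first one that was not accepted. This assumes you have already copied each command and its reply code out of the log by hand; the (command, code) tuple format and the function names are hypothetical, not the Exchange log format. One detail worth building in: the positive reply to DATA is 354 ("go ahead and send"), and the OK for the message itself comes after the content ends with a line containing only a period, shown here as ".".

```python
from typing import List, Optional, Tuple


def is_positive(command: str, code: int) -> bool:
    """Did the recipient system accept this step of the conversation?"""
    if command.upper() == "DATA":
        # 354 = "start mail input": go ahead and send the message content.
        return code == 354
    # Everything else, including the final reply after the content ends
    # (shown here as "."), expects a 2xx reply.
    return 200 <= code <= 299


def first_failure(transcript: List[Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Return the first (command, reply code) that was not accepted, or None."""
    for command, code in transcript:
        if not is_positive(command, code):
            return command, code
    return None


# Hypothetical session: the recipient system refused the recipient address.
session = [("EHLO", 250), ("MAIL FROM", 250), ("RCPT TO", 550)]
print(first_failure(session))  # ('RCPT TO', 550)
```

A None result means every step was accepted through QUIT, which is the point where the message becomes the other system's responsibility.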

If it passed all the way through the QUIT command, the message is now the responsibility of the system that received it from your Exchange server. If that system is under your administration, check there. If not, your troubleshooting has come to an end. You have verified that your system delivered the message successfully.

While this does not account for many situations, it does take a lot of the mystery out of troubleshooting. You can certainly dig deeper into SMTP and other (non-MS Exchange) systems, but I think this will start you on the way to becoming a troubleshooting guru. With an understanding of some of the concepts, you can read through technical articles and reference materials for more information.

Understanding basic SMTP commands and responses

There are only a few SMTP commands commonly used, and more importantly only a few responses that matter.

SMTP responses
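Recipient systems will respond to each SMTP command with a three-digit numeric code and optional text. Any value in the 200-299 range is considered to be an "OK" acknowledgement. The one positive reply outside that range you will commonly see is 354, the response to DATA, which tells the sender to go ahead and transmit the message. Values in the 400s indicate a temporary failure (the sender should try again later), and values in the 500s indicate a permanent failure.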
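Because the first digit of an SMTP reply code carries its general meaning (2xx OK, 3xx go ahead, 4xx temporary failure, 5xx permanent failure), reading a log can be reduced to a simple lookup. A minimal sketch; the function name and the labels are hypothetical.

```python
def classify_reply(code: int) -> str:
    """Map a three-digit SMTP reply code to its general meaning.

    The first digit carries the meaning; the other two add detail.
    """
    if not 200 <= code <= 599:
        raise ValueError(f"not an SMTP reply code: {code}")
    return {
        2: "OK",                 # e.g. 250 after MAIL FROM or RCPT TO, 221 after QUIT
        3: "go ahead",           # e.g. 354 after DATA: send the message content
        4: "temporary failure",  # try again later; the message stays queued
        5: "permanent failure",  # give up; the sender typically gets an NDR
    }[code // 100]


print(classify_reply(250), "/", classify_reply(421), "/", classify_reply(550))
# OK / temporary failure / permanent failure
```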

HELO / EHLO
This is how the sending system opens an SMTP conversation once the recipient system has accepted a connection on TCP port 25 and sent its greeting. HELO comes from the original SMTP specification; EHLO is the ESMTP (Extended SMTP) version of the command. The command is followed by the sending system's name. Note that some recipient systems may compare that name against the result of a reverse-DNS lookup of the sending IP address.

MAIL FROM:
This command specifies the sender's address. It is the return address for the transmission: non-delivery reports are sent back to it.

RCPT TO:
This command specifies a recipient address. Only one address is allowed per command, so a message with multiple recipients produces a separate RCPT TO command (and log entry) for each one.

DATA
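This command signifies the start of the actual message content. That includes what appears in the TO, CC, BCC, and Subject lines of the message, as well as the message body and attachments. None of that information is displayed in the SMTP log. The content ends with a line containing only a period, after which the recipient system accepts or refuses the message as a whole. ESMTP sending systems may declare the size of the message in advance, using the SIZE parameter of the MAIL FROM command.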

QUIT
This command requests termination of the SMTP session.
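Putting the commands together, here is a sketch of the sequence a sending system issues for one message. It is an illustration, not a working SMTP client: the function name, host name, and addresses are hypothetical, and the message content itself is left out, just as it is left out of the SMTP log.

```python
from typing import List


def smtp_commands(helo_name: str, sender: str, recipients: List[str]) -> List[str]:
    """Commands a sending system issues for one message, in order."""
    if not recipients:
        raise ValueError("at least one recipient is required")
    commands = [f"EHLO {helo_name}", f"MAIL FROM:<{sender}>"]
    # One RCPT TO per recipient.
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    # After DATA comes the content (headers, body, attachments), which the
    # SMTP log does not show; a line holding only "." ends it.
    commands += ["DATA", ".", "QUIT"]
    return commands


for command in smtp_commands("mail.example.com", "alice@example.com",
                             ["bob@example.net", "carol@example.org"]):
    print(command)
```

With two recipients, this prints two RCPT TO lines between MAIL FROM and DATA, which is the pattern to look for in the log.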

Understanding the Queues view

For each server listed in the ESM, there is a folder called Queues. It displays the status of all connections the Exchange server is attempting to make. It will show the number of messages waiting to be sent. For any of the individual messages, you can see the sender, recipients, subject, and size. You can also see if a particular message is a Non-Delivery Report (NDR).

Messages that stay in the queue typically indicate a transmission problem. It could be that the destination domain doesn't exist, or that the destination domain is refusing your transmission, or that the destination domain is having a problem receiving messages. In any case, if a message stays in the queue, Exchange will attempt to send it again later.
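To get a feel for "again later", here is a sketch of how a retry schedule plays out: a few initial retry intervals, then a fixed interval for every later attempt, until an expiration time is reached and the sender gets an NDR instead. The function name and all interval values are placeholders for illustration, not Exchange's actual settings; check your own server's delivery configuration for the real numbers.

```python
from datetime import datetime, timedelta
from typing import List


def retry_schedule(first_attempt: datetime,
                   intervals: List[timedelta],
                   subsequent: timedelta,
                   expire_after: timedelta) -> List[datetime]:
    """Times at which a queued message would be retried before it expires.

    intervals    -- waits before the first few retries
    subsequent   -- wait before every retry after those
    expire_after -- give up this long after the first attempt (an NDR follows)
    """
    if subsequent <= timedelta(0) or any(i <= timedelta(0) for i in intervals):
        raise ValueError("retry intervals must be positive")
    deadline = first_attempt + expire_after
    times: List[datetime] = []
    current = first_attempt
    step = 0
    while True:
        current += intervals[step] if step < len(intervals) else subsequent
        if current > deadline:
            return times
        times.append(current)
        step += 1


# Placeholder values for illustration only.
start = datetime(2010, 2, 21, 9, 0)
for t in retry_schedule(start, [timedelta(minutes=10)] * 3,
                        timedelta(minutes=15), timedelta(hours=1)):
    print(t.strftime("%H:%M"))
```

With these made-up values the retries land at 09:10, 09:20, 09:30, 09:45, and 10:00. That is why a message can sit in the queue for a long time before anyone sees an NDR.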

Understanding the SMTP log

In Exchange 2003, all inter-server communication is done via SMTP by default. This makes the SMTP log a convenient way to see the high-level communication between your Exchange server and other SMTP systems; in particular, it shows the SMTP commands and the responses to those commands.

Depending upon the amount of traffic your Exchange server handles, SMTP logs can get large. There is no automatic purging, so carefully consider where you store the log files. SMTP logging is enabled or disabled on the General tab of the Default SMTP Virtual Server properties.

One of the unfortunate issues with the SMTP log is that there is no thread organization. In other words, the log does not show which entry belongs to which thread (SMTP conversation). Entries are written in the order they occur, so if multiple threads are running concurrently, their entries are mixed together. That said, you can typically sort them out by matching the sender and recipient addresses.
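If you export the log and want to untangle it mechanically, one rough approach is to group entries by the remote system's IP address and start a new group at each HELO/EHLO. This is a sketch under a loud assumption: the three-field line format below is hypothetical and much simpler than the real log. Two simultaneous conversations with the same remote system would still be mixed together, so you would fall back on the sender and recipient addresses for those.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Session = List[Tuple[str, int]]


def group_sessions(lines: List[str]) -> Dict[str, List[Session]]:
    """Split interleaved log entries into conversations, per remote system.

    Assumes a hypothetical, simplified line format:
        "<remote-ip> <command> <reply-code>"   e.g. "192.0.2.10 RCPT 250"
    A HELO/EHLO starts a new conversation with that remote system.
    """
    sessions: Dict[str, List[Session]] = defaultdict(list)
    for line in lines:
        ip, command, code = line.split()
        command = command.upper()
        if command in ("HELO", "EHLO") or not sessions[ip]:
            sessions[ip].append([])
        sessions[ip][-1].append((command, int(code)))
    return dict(sessions)


log = [
    "192.0.2.10 EHLO 250",
    "198.51.100.7 EHLO 250",
    "192.0.2.10 MAIL 250",
    "198.51.100.7 MAIL 250",
    "192.0.2.10 RCPT 550",
    "198.51.100.7 RCPT 250",
]
print(group_sessions(log)["192.0.2.10"])
# [[('EHLO', 250), ('MAIL', 250), ('RCPT', 550)]]
```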

Understanding Message Tracking

Message Tracking (MT) is a tool that is part of the Exchange System Manager (ESM). It reports what a given Exchange server does with particular messages. Note that tracking ends when the message leaves that Exchange server. If the message goes to another Exchange server, you can consult MT on that other server for more information.

MT can be used to determine if a message was delivered to a mailbox, or to another system. If the tracking ends without the message being delivered, you will see what Exchange was doing last. That can give clues as to what the underlying issue is.

MT can be enabled or disabled. Open the properties of the Exchange server. On the General tab there is a checkbox to Enable Message Tracking. There are also settings for log location and retention.

What's a Relay?

A Relay is essentially any system that accepts and forwards mail to SMTP domains for which it is not authoritative. That's a mouthful - what does that mean exactly? Let's say you address a message to uemurad@yahoo.com and send it to my corporate server. If my server accepts and forwards the message, my server is a relay. If my server accepts and forwards all messages from everyone, it's an open relay. Open relays are bad because spammers find and use them to disguise the source of their spam. Recipient systems then see the spam coming from your system, and your reputation quickly degrades. You can end up on blacklists and then have your legitimate messages rejected.
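The relay definition boils down to one question: is the recipient's domain one this server is authoritative for? Here is a minimal sketch of a relay that is not open. It accepts mail for its own domains from anyone, and relays for other domains only when the sender is trusted. The function names, the example domain, and the sender_trusted flag are hypothetical; how a real server decides who is trusted (authentication, internal IP ranges, and so on) is left out.

```python
from typing import Set


def is_relay(recipient: str, authoritative_domains: Set[str]) -> bool:
    """True if delivering to this recipient means forwarding mail for a
    domain this server is not authoritative for."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    return domain not in {d.lower() for d in authoritative_domains}


def accepts(recipient: str, authoritative_domains: Set[str], sender_trusted: bool) -> bool:
    """A relay that is not open: take local mail from anyone, relay only for trusted senders.

    An open relay would return True for every recipient, whoever the sender is.
    """
    if not is_relay(recipient, authoritative_domains):
        return True
    return sender_trusted


ours = {"example.com"}  # hypothetical domain this server is authoritative for
print(is_relay("uemurad@yahoo.com", ours))                         # True
print(accepts("uemurad@yahoo.com", ours, sender_trusted=False))    # False
print(accepts("someone@example.com", ours, sender_trusted=False))  # True
```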

A system can be a Relay without being an Open Relay. A Relay is not a bad thing in itself.

What's a Smarthost?

A Smarthost is essentially a system to which all messages are forwarded, regardless of their ultimate destination. Examples of Smarthosts are antivirus appliances, content filters, sender/recipient filters, and spam filters. In addition, some ISPs (or some of their service levels) require customers to send all outbound mail through their systems instead of directly to recipients. A Smarthost is responsible for receiving mail and forwarding it to the appropriate system. Note that it, too, may send messages to a Smarthost or directly to a recipient system.

Understanding Logs

Logs are nothing more than a collection of status notes a system makes about a process. They are a way to review what happened during that process to understand what worked and what didn't. Most systems have some sort of logging option. Exchange has a couple of useful logging features that you've probably heard about - Message Tracking and SMTP logging.

Enabling Message Tracking is done in the Exchange System Manager (ESM). This is done at the server level. Open the properties of the server and find the checkbox on the General tab. That configuration also defines the location of the log files and the retention period.

Enabling SMTP logging is also done in the ESM. Although this is configured per server, the configuration is found in the properties of the Default SMTP Virtual Server (expand the server, Protocols, and SMTP).

Understanding your outbound mail flow

Before you can troubleshoot why outbound mail isn't getting where it's supposed to go, you first have to understand how it's supposed to get there. One of the keys to this understanding is knowing whether or not Exchange sends messages directly to recipient systems. When an Exchange server gets a message destined for the outside world, there are two main mechanisms it consults: the Virtual Server (VS) and Connectors.

The Virtual Server setting takes precedence, so look there first. Open the Exchange System Manager (ESM) and expand to the server in question. Further expand Protocols and SMTP. Beneath SMTP there is typically a single entry - the Default SMTP Virtual Server. Open the Properties of it, go to the Delivery tab, and click on the Advanced button. The main thing you are looking for is whether there is an entry in the Smart host field. If there is, it means that all outbound messages will be sent to that FQDN. If not, your Exchange Organization will require one or more SMTP Connectors to route outbound messages.

If the VS is not configured with a Smarthost, Exchange then looks to the Connectors for outbound routing information. The Connector structure can be simple or complex, depending upon the size of your Exchange Organization and how your enterprise wants mail to flow. For this example, let's assume the simplest configuration possible - a single Exchange server environment.

To route outbound messages, your single-server Exchange Organization requires at least one Connector (assuming you did not configure a smarthost via the VS).

Let's take a moment to review the function and difference between Exchange Administrative Groups (AG) and Exchange Routing Groups (RG). AGs allow you to easily configure administrative permissions to a group of servers, regardless of their geographic location or purpose. RGs are specifically for message routing. There is no implied correlation between any particular AG and RG.

Connectors are associated with RGs and are also referred to as SMTP Connectors.

Getting back to our discussion of mail flow, in the simplest configuration you'd have a single Connector. In its Properties, the General tab has a radio-button selection allowing you to either configure a smarthost or use DNS.

If smarthost is selected, there should be an IP address in the field (as opposed to the FQDN format used in the VS).
If DNS is selected, it means that Exchange will attempt to send messages directly to the recipient system.
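The precedence described above can be summarized as a small decision function. This is a sketch of the single-server case only. The ConnectorSettings class, its fields, and the host names are hypothetical stand-ins for the settings you would read in ESM, not an Exchange API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectorSettings:
    """Hypothetical stand-in for an SMTP Connector's General tab choice."""
    use_dns: bool
    smarthost_ip: Optional[str] = None  # used only when use_dns is False


def next_hop(vs_smarthost: Optional[str], connector: Optional[ConnectorSettings]) -> str:
    """Describe where Exchange sends outbound mail in a single-server setup."""
    # 1. A smarthost on the Default SMTP Virtual Server takes precedence.
    if vs_smarthost:
        return f"smarthost {vs_smarthost} (Virtual Server)"
    # 2. Otherwise a Connector is required.
    if connector is None:
        return "no outbound route"
    # 3. DNS: look up the recipient domain's MX records and deliver directly.
    if connector.use_dns:
        return "recipient system via DNS (Connector)"
    if not connector.smarthost_ip:
        raise ValueError("Connector is set to use a smarthost but none is configured")
    return f"smarthost {connector.smarthost_ip} (Connector)"


print(next_hop(None, ConnectorSettings(use_dns=True)))
# recipient system via DNS (Connector)
print(next_hop("relay.isp.example", ConnectorSettings(use_dns=True)))
# smarthost relay.isp.example (Virtual Server)
```

The second call shows why the VS is the first place to look: a smarthost there wins even when the Connector is set to use DNS.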

Troubleshooting Outbound Messaging

A couple of years ago I wrote a series of articles on what to do when users say inbound messages aren't getting to them. I am frequently asked for help figuring out why outbound messages seemingly don't get to where they are going.

Troubleshooting outbound mail flow is relatively simple (at least to me, but then again I'm a self-professed Messaging-geek). Whenever I bring up the subject, people look at me like I'm talking about something more (dark) art than science. I think that's only true until you gain an understanding of the science. With that in mind, here's my attempt to explain some of that science and hopefully put you on the path to becoming a troubleshooting wiz.

The easiest way to think about messaging is that it is a point-to-point transmission system. It starts somewhere and wants to go somewhere else. It may require only one transmission (hop) or several. Each transmission succeeds or fails. Sounds simple so far, right?

Let me put it another way - here's how I look at troubleshooting message flow:

First, you have to know the path a message will take to get to its destination
Second, you have to determine how far along that path it got
Third, you have to figure out why it stopped (or if it stopped)

Sounds simple enough, but as the saying goes the devil is in the details. Towards that end, here are some separate articles about narrowing the focus of your search. I'm not sure of the best way to follow along. As you get used to some of these ideas you may have to go back and forth. You may want to jump to the last article in the list and see how far you can get. I'll try to link up the articles as much as possible.


Understanding your outbound mail flow
Understanding Logs
What's a Smarthost?
What's a Relay?
Understanding Message Tracking
Understanding the Queues view
Understanding the SMTP log
Understanding basic SMTP commands and responses
Using NSLookup to verify recipient system address
Using Telnet to verify SMTP connectivity
Let's Troubleshoot (putting it all together) !!