Sunday, February 21, 2010

Let's Troubleshoot (putting it all together)!!

Note: This is the last of a set of articles about troubleshooting outbound messaging issues. If you need or want to read from the beginning, start here.

Now that you have some knowledge of where you can look and what information you can find, how do you use it to troubleshoot? Let's go back to my original roadmap:

First, know the path a message will take to get to its destination
Second, determine how far along that path it got
Third, figure out why it stopped (or if it stopped)

What path should a message take? By now you should know whether Exchange will send outbound messages to a Smarthost or directly to the recipient system.

How far did it get? Did it reach the Exchange server? Check Message Tracking. If the message doesn't appear, then the client (typically Outlook) never delivered it successfully. Check Outlook - is it still in the Outbox?

Let's say the message shows up in Message Tracking. Did the message leave the Exchange server? In other words, does Message Tracking report that it was transferred through SMTP? If it does, Exchange delivered the message to the next hop.

If not, what is the last thing reported by Exchange? Check the Queues. Remember your routing configuration. Does Exchange send all messages to a Smarthost, or does it use DNS?

At this point you may want to verify DNS lookups (the MX records for the recipient domain) and use Telnet to test communication with the destination system on TCP port 25.
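The Telnet test can also be scripted. What follows is only a minimal sketch in Python, using the standard library's smtplib to walk through the envelope commands and stop before DATA, so no message is actually sent. The function name probe_smtp, the factory parameter, and the host and addresses in the usage note are placeholders made up for illustration.

```python
import smtplib


def probe_smtp(host, sender, recipient, port=25, timeout=30, factory=smtplib.SMTP):
    """Walk EHLO, MAIL FROM, RCPT TO and QUIT against host; return the reply codes.

    Stops before DATA, so no message is sent. `factory` exists only so the
    function can be exercised without a network connection.
    """
    results = []
    smtp = factory(host, port, timeout=timeout)  # connects and reads the greeting
    try:
        steps = (
            ("EHLO", smtp.ehlo),
            ("MAIL FROM", lambda: smtp.mail(sender)),
            ("RCPT TO", lambda: smtp.rcpt(recipient)),
        )
        for label, call in steps:
            code, _ = call()
            results.append((label, code))
            if not 200 <= code <= 299:
                break  # this is where the conversation stopped
    finally:
        try:
            code, _ = smtp.quit()
            results.append(("QUIT", code))
        except (smtplib.SMTPException, OSError):
            pass  # the other side already dropped the connection
    return results
```

Calling probe_smtp("mail.example.com", "me@example.com", "someone@example.net") returns a list of (command, code) pairs, ending at the first command that was refused. A connection that cannot be opened at all raises an exception instead, which is the scripted equivalent of Telnet failing to connect.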

If Exchange delivered the message to the next hop, it's time to examine the SMTP log. Checking the SMTP log will show you the communication between your Exchange server and the system to which it wanted to deliver the message.
Did it receive an OK response to the HELO/EHLO command?
Did it receive an OK response to the MAIL FROM command?
Did it receive an OK response to the RCPT TO command?
Did it receive a go-ahead response (354) to the DATA command, followed by an OK response once the message content was sent?
Did it receive an OK response to the QUIT command?

If it passed all the way through the QUIT command, the message is now the responsibility of the system that received it from your Exchange server. If that system is under your administration, check there. If not, your troubleshooting has come to an end. You have verified that your system delivered the message successfully.
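The checklist above can be sketched as a small Python helper. This is an illustration rather than a tool from this series: it assumes you have already copied the commands and reply codes out of the SMTP log by hand, and the name first_failed_step is made up.

```python
def first_failed_step(steps):
    """Return the first (command, code) pair that was not accepted, else None.

    `steps` is a list of (command, reply_code) pairs in log order.
    Any 2xx code counts as OK. 354 after DATA also counts, because it
    tells the sender to go ahead and transmit the message content.
    """
    for command, code in steps:
        accepted = 200 <= code <= 299 or (command == "DATA" and code == 354)
        if not accepted:
            return command, code
    return None


conversation = [("EHLO", 250), ("MAIL FROM", 250), ("RCPT TO", 550)]
print(first_failed_step(conversation))  # ('RCPT TO', 550)
```

A result of None means every step was accepted, so the message is now the receiving system's responsibility.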

While this does not account for many situations, it does take a lot of the mystery out of troubleshooting. You can certainly dig deeper into SMTP and other (non-MS Exchange) systems, but I think this will start you on the way to becoming a troubleshooting guru. With an understanding of some of the concepts, you can read through technical articles and reference materials for more information.

Understanding basic SMTP commands and responses

There are only a few SMTP commands commonly used, and more importantly only a few responses that matter.

SMTP responses
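Recipient systems will respond to each SMTP command with a numeric value and optional text. Any value in the 200-299 range is considered to be an "OK" acknowledgement. The one exception you will see in a normal conversation is 354 after DATA, which tells the sender to go ahead and transmit the message content.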
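For values outside that range, the first digit tells you what kind of answer it was. The SMTP standard defines 3xx as "continue", 4xx as a temporary failure (the sender may retry later), and 5xx as a permanent failure. A minimal Python sketch of that rule, with a made-up function name:

```python
def classify_reply(code):
    """Map an SMTP reply code to its class, based on the first digit."""
    classes = {
        2: "success",
        3: "intermediate",       # e.g. 354 after DATA
        4: "temporary failure",  # worth retrying later
        5: "permanent failure",  # retrying will not help
    }
    return classes.get(code // 100, "unknown")
```

For example, classify_reply(250) gives "success", while classify_reply(550) gives "permanent failure".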

HELO / EHLO
This is how the sending system opens an SMTP conversation with a recipient system that has accepted a connection on TCP port 25. HELO comes from the original SMTP specification; EHLO is its ESMTP (extended SMTP) equivalent. The parameter after the command is the name the sending system gives for itself. The standard expects it, although in practice some recipient systems do not insist on it. Note that some recipient systems may attempt to match that name against the domain name returned by a reverse-DNS lookup of the sending IP address.

MAIL FROM:
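This command shows the envelope sender: the address to which non-delivery reports are returned. It is not necessarily the same as the From or Reply-To address shown in the message itself.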

RCPT TO:
This command displays the recipient address. Only one address is allowed per command, so messages with multiple recipients will show each separately.

DATA
This command signifies the start of the actual message. That includes what appears in the TO, CC, BCC, and Subject lines of the message. It includes the message body and attachments. None of that information is displayed in the SMTP log. ESMTP sending systems may declare the length of the message.

QUIT
This command requests a termination to the SMTP session.
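Put together, the commands appear in a fixed order. The Python sketch below builds that sequence for one message. It is illustrative only: the function name and the addresses in the usage note are made up, and the message content that follows DATA is left out.

```python
def envelope_commands(helo_name, sender, recipients):
    """Return the SMTP command lines for one message, in the order sent."""
    commands = [f"EHLO {helo_name}", f"MAIL FROM:<{sender}>"]
    # One RCPT TO per recipient: each address gets its own command and reply.
    commands += [f"RCPT TO:<{address}>" for address in recipients]
    # After DATA the sender transmits the headers and body (not shown here),
    # then ends the session with QUIT.
    commands += ["DATA", "QUIT"]
    return commands
```

For a message from me@example.com to two recipients, envelope_commands("mail.example.com", "me@example.com", ["a@example.net", "b@example.net"]) returns six lines: EHLO, MAIL FROM, two RCPT TO lines, DATA, and QUIT.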

Understanding the Queues view

For each server listed in the Exchange System Manager (ESM), there is a folder called Queues. It displays the status of all connections the Exchange server is attempting to make. It will show the number of messages waiting to be sent. For any of the individual messages, you can see the sender, recipients, subject, and size. You can also see if a particular message is a Non-Delivery Report (NDR).

Messages that stay in the queue typically indicate a transmission problem. It could be that the destination domain doesn't exist, or that the destination domain is refusing your transmission, or that the destination domain is having a problem receiving messages. In any case, if a message stays in the queue, Exchange will attempt to send it again later.

Understanding the SMTP log

Starting with Exchange 2003, all inter-server communication is by default done via SMTP. This makes the SMTP log a convenient way to see the high-level communication between your Exchange server and other SMTP systems, in particular the SMTP commands and the responses to those commands.

Depending upon the amount of traffic your Exchange server handles, SMTP logs can get large. There is no automatic purging, so carefully consider where you are storing the log files. Logging is enabled or disabled on the General tab of the Default SMTP Virtual Server properties.

One of the unfortunate issues with the SMTP log is that there is no thread-organization. In other words, it is not possible to tell which log entry belongs to which thread. The entries are posted in the order received. If multiple threads are running concurrently, the entries will all be mixed together. That said, you can typically figure it out because of the sending address and recipient address.
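If the log is large, a short script can do that sorting for you. The Python sketch below assumes the log is written in W3C Extended format, which starts with a "#Fields:" line naming the columns. Which columns are present depends on what you selected in the logging properties, and grouping on c-ip is an assumption; use whichever column identifies the remote system in your own log. The function name is made up.

```python
from collections import defaultdict


def group_log_entries(lines, key="c-ip"):
    """Group W3C-style log rows by one column, keeping the original order.

    Column names come from the log's own "#Fields:" line. Rows that
    appear before any "#Fields:" line are skipped.
    """
    fields = []
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            entry = dict(zip(fields, line.split()))
            groups[entry.get(key, "-")].append(entry)
    return dict(groups)
```

Pass it the lines of a log file, for example group_log_entries(open(path)) where path points at one of your log files. Each key in the result is one remote address, and its list holds that system's entries in the order they were logged, so interleaved conversations are pulled apart.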

Understanding Message Tracking

Message Tracking (MT) is a tool that is part of the Exchange System Manager (ESM). It reports what a given Exchange server does with particular messages. Note that tracking ends when the message leaves that Exchange server. If the message goes to another Exchange server, you can consult MT on that other server for more information.

MT can be used to determine if a message was delivered to a mailbox, or to another system. If the tracking ends without the message being delivered, you will see what Exchange was doing last. That can give clues as to what the underlying issue is.

MT can be enabled or disabled. Open the properties of the Exchange server. On the General tab there is a checkbox to Enable Message Tracking. There are also settings for log location and retention.