Showing posts with label troubleshooting. Show all posts

Sunday, February 21, 2010

Let's Troubleshoot (putting it all together) !!

Note: This is the last of a set of articles about troubleshooting outbound messaging issues. If you need or want to read from the beginning, start here.

Now that you have some knowledge of where you can look and what information you can find, how do you use it to troubleshoot? Let's go back to my original roadmap:

First, know the path a message will take to get to its destination
Second, determine how far along that path it got
Third, figure out why it stopped (or if it stopped)

What path should a message take? By now you should know whether Exchange will send outbound messages to a Smarthost or directly to the recipient system.

How far did it get? Did it reach the Exchange server? Check Message Tracking. If the message doesn't appear, then the client (typically Outlook) never delivered it successfully. Check Outlook - is it still in the Outbox?

Let's say the message shows up in Message Tracking. Did the message leave the Exchange server? In other words, does it report it was transferred through SMTP? If it does, it means Exchange delivered the message to the next hop.

If not, what is the last thing reported by Exchange? Check the Queues. Remember your routing configuration. Does Exchange send all messages to a Smarthost, or does it use DNS?

It's at this point you may want to verify DNS lookups and test communications with Telnet.

If Exchange delivered the message to the next hop, it's time to examine the SMTP log. Checking the SMTP log will show you the communication between your Exchange server and the system to which it wanted to deliver the message.
Did it receive an OK response to the HELO/EHLO command?
Did it receive an OK response to the MAIL FROM command?
Did it receive an OK response to the RCPT TO command?
Did it receive an OK response to the DATA command?
Did it receive an OK response to the QUIT command?

If it passed all the way through the QUIT command, the message is now the responsibility of the system that received it from your Exchange server. If that system is under your administration, check there. If not, your troubleshooting has come to an end. You have verified that your system delivered the message successfully.
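
The checks above can be sketched as a small decision function. This is only an illustration in Python, not something Exchange provides: the function name and its three yes/no inputs are assumptions standing in for what you read from Message Tracking, the Queues view, and the SMTP log.

```python
def next_step(in_message_tracking, transferred_via_smtp, session_completed):
    """Suggest where to look next, following the roadmap in this article.

    All three arguments are booleans the troubleshooter fills in by hand.
    """
    if not in_message_tracking:
        # Exchange never saw the message, so the client did not deliver it.
        return "check the client: is the message still in the Outlook Outbox?"
    if not transferred_via_smtp:
        # Exchange has the message but has not handed it to the next hop.
        return "check the Queues, then verify DNS lookups and test with Telnet"
    if not session_completed:
        # The hand-off started but one of the SMTP commands was refused.
        return "read the SMTP log to find the command that was refused"
    return "delivered to the next hop: continue on the receiving system"


print(next_step(True, False, False))
```

Calling it with True, False, False describes a message that Message Tracking can see but that never left the server, which points you at the Queues.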

While this does not account for many situations, it does take a lot of the mystery out of troubleshooting. You can certainly dig deeper into SMTP and other (non-MS Exchange) systems, but I think this will start you on the way to becoming a troubleshooting guru. With an understanding of some of the concepts, you can read through technical articles and reference materials for more information.

Understanding basic SMTP commands and responses

There are only a few SMTP commands commonly used, and more importantly only a few responses that matter.

SMTP responses
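Recipient systems will respond to each SMTP command with a three-digit numeric value and optional text. Any value in the 200-299 range is considered to be an "OK" acknowledgement. The one exception you will see in a healthy session is the DATA command, which is answered with 354 (go ahead and send the message); the 250 follows once the message itself has been transmitted.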
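
As a rough illustration, a reply line can be sorted into a category by its first digit. The 2xx meaning comes from the paragraph above; the 3xx, 4xx, and 5xx meanings are taken from the SMTP standard rather than from this article, and the function name is made up for the example.

```python
def classify_reply(line):
    """Return (code, category) for an SMTP reply line such as '250 OK'."""
    code = int(line[:3])  # the reply code is always the first three characters
    categories = {
        2: "ok",
        3: "intermediate",       # e.g. 354 after DATA: keep going
        4: "temporary failure",  # the sender may try again later
        5: "permanent failure",  # the command was rejected
    }
    return code, categories.get(code // 100, "unknown")


for reply in ("250 OK", "354 Start mail input", "550 Mailbox unavailable"):
    print(reply, "->", classify_reply(reply))
```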

HELO / EHLO
This is how the sending system opens an SMTP conversation with a recipient system that acknowledges a TCP 25 communication attempt. HELO is the command from the original SMTP specification; EHLO is its ESMTP equivalent. Parameters after the command are optional, although it should be noted that some recipient systems may attempt to match the name you supply against the domain name returned by a reverse-DNS lookup of the sending IP address.

MAIL FROM:
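This command gives the sender's address. It is the address to which the receiving system will return a Non-Delivery Report if the message cannot be delivered.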

RCPT TO:
This command displays the recipient address. Only one address is allowed per command, so messages with multiple recipients will show each separately.

DATA
This command signifies the start of the actual message. That includes what appears in the TO, CC, BCC, and Subject lines of the message. It includes the message body and attachments. None of that information is displayed in the SMTP log. ESMTP sending systems may declare the length of the message.

QUIT
This command requests a termination to the SMTP session.

Understanding the Queues view

For each server listed in the ESM, there is a folder called Queues. It displays the status of all connections the Exchange server is attempting to make. It will show the number of messages waiting to be sent. For any of the individual messages, you can see the sender, recipients, subject, and size. You can also see if a particular message is a Non-Delivery Report (NDR).

Messages that stay in the queue typically indicate a transmission problem. It could be that the destination domain doesn't exist, or that the destination domain is refusing your transmission, or that the destination domain is having a problem receiving messages. In any case, if a message stays in the queue, Exchange will attempt to send it again later.

Understanding the SMTP log

Starting with Exchange 2003, all inter-server communication is by default done via SMTP. This makes the SMTP log a convenient way to see the high-level communication between your Exchange server and other SMTP systems, in particular the SMTP commands and the responses to those commands.

Depending upon the amount of traffic your Exchange server handles, SMTP logs can get large. There is no automatic purging, so carefully consider where you are storing the log files. Logging is enabled or disabled on the General tab of the Default SMTP Virtual Server properties.

One of the unfortunate issues with the SMTP log is that there is no thread-organization. In other words, it is not possible to tell which log entry belongs to which thread. The entries are posted in the order received. If multiple threads are running concurrently, the entries will all be mixed together. That said, you can typically figure it out because of the sending address and recipient address.

Understanding Message Tracking

Message Tracking (MT) is a tool that is part of the Exchange System Manager (ESM). It reports what a given Exchange server does with particular messages. Note that tracking ends when the message leaves that Exchange server. If the message goes to another Exchange server, you can consult MT on that other server for more information.

MT can be used to determine if a message was delivered to a mailbox, or to another system. If the tracking ends without the message being delivered, you will see what Exchange was doing last. That can give clues as to what the underlying issue is.

MT can be enabled or disabled. Open the properties of the Exchange server. On the General tab there is a checkbox to Enable Message Tracking. There are also settings for log location and retention.

What's a Relay?

A Relay is essentially any system that accepts and forwards mail to SMTP domains for which it does not act authoritatively. That's a mouthful - what does that mean exactly? Let's say you address a message to uemurad@yahoo.com and send it to my corporate server. If my server accepts and forwards the message, my server is a relay. If my server accepts and forwards all messages from everyone, it's an open relay. Open relays are considered bad because Spammers find and utilize them to help disguise the source of the spam. Recipient systems will see the spam coming from your system and will quickly degrade your reputation. You can end up on blacklists and then have your legitimate messages rejected.

A system can be a Relay without being an Open Relay. A Relay is not a bad thing in itself.
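
To make the distinction concrete, here is a minimal sketch of the decision a receiving server makes. The function names, the list of local domains, and the idea of a single "trusted sender" flag are all assumptions for illustration; real servers decide trust by IP address, authentication, or both.

```python
def is_relay_request(recipient, local_domains):
    """True when the recipient's domain is not one this server is authoritative for."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    return domain not in {d.lower() for d in local_domains}


def accepts(recipient, local_domains, sender_is_trusted):
    """A relay that is not open: it forwards to outside domains only for trusted senders."""
    if not is_relay_request(recipient, local_domains):
        return True  # mail for our own domains is always accepted
    return sender_is_trusted


# Hypothetical server that is authoritative for corp.example only.
print(accepts("uemurad@yahoo.com", ["corp.example"], sender_is_trusted=False))
```

An open relay is the case where the last line of accepts would return True no matter who the sender is.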

What's a Smarthost?

A Smarthost is essentially a system to which all messages are forwarded, regardless of their ultimate destination. Examples of Smarthosts are AntiVirus appliances, Content filters, Sender/Recipient filters, and Spam filters. In addition, some ISPs and some of their service levels require customers to send all outbound mail through their systems instead of directly to recipients. A Smarthost is responsible for receiving mail and forwarding to the appropriate system. Note that it too may send messages to a Smarthost or directly to a recipient system.

Understanding Logs

Logs are nothing more than a collection of status notes a system makes about a process. They are a way to review what happened during that process to understand what worked and what didn't. Most systems have some sort of logging option. Exchange has a couple of useful logging features that you've probably heard about - Message Tracking and SMTP logging.

Enabling Message Tracking is done in the Exchange System Manager (ESM). This is done at the server level. Open the properties of the server and find the checkbox on the General tab. That configuration also defines the location of the log files and the retention period.

Enabling SMTP logging is also done in the ESM. Although this is configured per server, the configuration is found in the properties of the Default SMTP Virtual Server (expand the server, Protocols, and SMTP).

Understanding your outbound mail flow

Before you can troubleshoot why outbound mail isn't getting to where it's supposed to, you first have to understand how it's supposed to get there. One of the keys to this understanding is knowing whether or not Exchange sends messages directly to recipient systems. When an Exchange server gets a message destined for the outside world, there are two main mechanisms that it consults: the Virtual Server (VS) and Connectors.

The Virtual Server setting takes precedence, so look there first. Open the Exchange System Manager (ESM) and expand to the server in question. Further expand Protocols and SMTP. Beneath SMTP there is typically a single entry - the Default SMTP Virtual Server. Open the Properties of it, go to the Delivery tab, and click on the Advanced button. The main thing you are looking for is whether there is an entry in the Smart host field. If there is, it means that all outbound messages will be sent to that FQDN. If not, your Exchange Organization will require one or more SMTP Connectors to route outbound messages.

If the VS is not configured with a Smarthost, Exchange then looks to the Connectors for outbound routing information. The Connector structure can be simple or complex, depending upon the size of your Exchange Organization and how your enterprise wants mail to flow. For this example, let's assume the simplest configuration possible - a single Exchange server environment.

To route outbound messages, your single-server Exchange Organization requires at least one Connector (assuming you did not configure a smarthost via the VS).

Let's take a moment to review the function and difference between Exchange Administrative Groups (AG) and Exchange Routing Groups (RG). AGs allow you to easily configure administrative permissions to a group of servers, regardless of their geographic location or purpose. RGs are specifically for message routing. There is no implied correlation between any particular AG and RG.

Connectors are associated with RGs and are also referred to as SMTP Connectors.

Getting back to our discussion on mail flow, in the simplest configuration you'd have a single Connector. Looking at the Properties, on the General tab there is a radio-button selection allowing you to either configure a smarthost or to use DNS.

If smarthost is selected, there should be an IP address in the field (as opposed to the FQDN format used in the VS).
If DNS is selected, it means that Exchange will attempt to send messages directly to the recipient system.
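
The order of precedence described in this article can be summarized in a few lines. This is only a sketch of the single-server case: the function and its two arguments are hypothetical stand-ins for the Smart host field on the Virtual Server and the smarthost setting on the Connector, with an empty value meaning "not configured".

```python
def outbound_next_hop(vs_smart_host, connector_smart_host):
    """Return (mechanism, target) for outbound mail in a single-server setup."""
    if vs_smart_host:
        # The Virtual Server setting takes precedence over any Connector.
        return ("virtual server smarthost", vs_smart_host)
    if connector_smart_host:
        return ("connector smarthost", connector_smart_host)
    # No smarthost anywhere: Exchange looks up the recipient domain in DNS.
    return ("dns", None)


# Example values are placeholders, not real systems.
print(outbound_next_hop("", "192.0.2.25"))
```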

Troubleshooting Outbound Messaging

A couple of years ago I wrote a series of articles on what to do when users say inbound messages aren't getting to them. I am frequently asked for help figuring out why outbound messages seemingly don't get to where they are going.

Troubleshooting outbound mail flow is relatively simple (at least to me, but then again I'm a self-professed Messaging-geek). Whenever I bring up the subject, people look at me like I'm talking about something more (dark) art than science. I think that's only true until you gain an understanding of the science. With that in mind, here's my attempt to explain some of that science and hopefully put you on the path to becoming a troubleshooting wiz.

The easiest way to think about messaging is that it is a point-to-point transmission system. It starts somewhere and wants to go somewhere else. It may require only one transmission (hop) or several. Each transmission succeeds or fails. Sounds simple so far, right?

Let me put it another way - here's how I look at troubleshooting message flow:

First, you have to know the path a message will take to get to its destination
Second, you have to determine how far along that path it got
Third, you have to figure out why it stopped (or if it stopped)

Sounds simple enough, but as the saying goes the devil is in the details. Towards that end, here are some separate articles about narrowing the focus of your search. I'm not sure the best way to follow along. As you get used to some of these ideas you may have to go back and forth. You may want to jump to the last article in the list and see how far you can get. I'll try to link up the articles as much as possible.


Understanding your outbound mail flow
Understanding Logs
What's a Smarthost?
What's a Relay?
Understanding Message Tracking
Understanding the Queues view
Understanding the SMTP log
Understanding basic SMTP commands and responses
Using NSLookup to verify recipient system address
Using Telnet to verify SMTP connectivity
Let's Troubleshoot (putting it all together) !!

Saturday, June 9, 2007

Using Telnet to simulate server communication

The best place to run Telnet is on the server which sends out your SMTP traffic. This will show you the same information that your SMTP engine receives when communicating with an outside system. Telnet allows you to specify the port through which to communicate. SMTP is defined as TCP port 25.

Open a command prompt window. You will need the FQDN or the IP address of the receiving system. If you don't have it, you can find it with NSLookup, provided you know the SMTP domain name you are attempting to connect to. For more information about this, read this article.

At the prompt, type telnet fqdn 25

If the receiving server is accepting SMTP communications, it will respond with an acknowledgement message indicating it is ready to receive your transmission. The acknowledgement should also indicate if it understands SMTP or ESMTP.

Type ehlo testdomain.com

There are two established protocols, SMTP and ESMTP (Enhanced SMTP). If the receiving system only understands SMTP, you must begin with helo. If the receiving system understands ESMTP, you may begin with either helo or ehlo.

If you receive an OK message from the receiving mail system, proceed. If not, double check the protocol named in the response to the telnet command.

Type
mail from:exchange.admin@testdomain.com

This indicates the reply address. Some receiving systems will compare the parameter from the ehlo command, and the domain listed in the address on the mail from: command to the domain name returned when performing a reverse DNS (RDNS) lookup on the IP address from which the message is coming. It is a method to combat address spoofing and more reliably identify undesirable senders.

If you are testing communications to an outside messaging system, you may need to use your actual domain name to be allowed to continue.

Type
rcpt to:valid.user@receivingdomain.com

This indicates the recipient address. If receivingdomain.com is not a domain being fielded by the receiving system, and the system does not allow relaying to receivingdomain.com, an error code will be returned.

Type data

This begins the actual message. Optionally, From:, To:, and BCC: can be entered at this time (to be covered in a future article).

Type subject: Test message via Telnet

Type a blank line - this denotes the end of the subject and the beginning of the message body.

Type This is a test
Type Please reply if received
Type a blank line
Type a period (".") and press Enter - this marks the end of the message body. The receiving system will understand and return a prompt.
Type quit

This ends the Telnet session and you will be returned to the OS prompt.

If everything has gone well, the message will be on its way to the recipient address. Give it a minute and check. You now know how to manually create and send an SMTP message! This can be a great troubleshooting tool, as you will receive responses and acknowledgements from the receiving system that can aid in diagnosing a communication problem.
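
If you run this test often, it can help to generate the lines to type ahead of time. The sketch below simply builds the same sequence as the steps above; the function name is made up, and the addresses are the placeholder ones used in this article. The one thing it adds is the standard SMTP rule that a body line beginning with a period is sent with an extra period in front, so the receiver does not take it for the end of the message.

```python
def telnet_script(helo_domain, sender, recipient, subject, body_lines):
    """Return the lines to type, in order, for one test message."""
    # Double a leading "." so it is not mistaken for the end-of-message marker.
    body = ["." + line if line.startswith(".") else line for line in body_lines]
    return [
        f"ehlo {helo_domain}",
        f"mail from:{sender}",
        f"rcpt to:{recipient}",
        "data",
        f"subject: {subject}",
        "",      # blank line: end of the subject, start of the message body
        *body,
        "",      # blank line, as in the steps above
        ".",     # a lone period marks the end of the message body
        "quit",
    ]


for line in telnet_script(
    "testdomain.com",
    "exchange.admin@testdomain.com",
    "valid.user@receivingdomain.com",
    "Test message via Telnet",
    ["This is a test", "Please reply if received"],
):
    print(line)
```

Wait for the receiving system's response after each command before typing the next one; the script only tells you what to send, not what comes back.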

Friday, June 8, 2007

Using NSLookup to determine an SMTP receiving system

Background
NSLookup is a great tool that comes with Windows that allows you to search DNS for information. It is especially useful to troubleshoot particular issues with Exchange. Exchange is reliant upon DNS to know where to send outbound messages. When Exchange has problems getting messages to a particular domain, it's time to open the toolbox.

The best place to run NSLookup is on the server which sends out your SMTP traffic. This will show you the same information that your SMTP engine uses when determining where to send mail to a particular domain.

Open a command prompt window
At the prompt, type nslookup
Type the command set type=mx
Type the registered domain name (e.g. domain.com)

You will receive a response similar to:

Non-authoritative answer:
domain.com MX preference = 10, mail exchanger = mail1.domain.com
domain.com MX preference = 20, mail exchanger = mail2.domain.com
domain.com MX preference = 30, mail exchanger = mail3.domain.com

Interpreting the NSLookup results
Your SMTP engine will attempt to use the MX records in ascending order according to their value. The name associated with the MX record is what your engine will use. You can simulate what the engine does by using the Telnet command. In other words, the FQDN associated with the lowest numbered MX value would be the one that your SMTP engine would attempt to connect with.
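
The ordering rule is easy to check by hand, but if you paste NSLookup output into a script, something like the following would list the hosts in the order they would be tried. It is a sketch that assumes the output looks like the sample response shown earlier; the function name and the regular expression are my own.

```python
import re

# Matches lines such as:
# domain.com MX preference = 10, mail exchanger = mail1.domain.com
MX_LINE = re.compile(r"MX preference = (\d+), mail exchanger = (\S+)")


def mx_hosts_in_order(nslookup_output):
    """Return the mail exchanger names, lowest preference value first."""
    records = []
    for line in nslookup_output.splitlines():
        match = MX_LINE.search(line)
        if match:
            records.append((int(match.group(1)), match.group(2)))
    return [host for _, host in sorted(records)]


sample = """Non-authoritative answer:
domain.com MX preference = 20, mail exchanger = mail2.domain.com
domain.com MX preference = 10, mail exchanger = mail1.domain.com
domain.com MX preference = 30, mail exchanger = mail3.domain.com"""

print(mx_hosts_in_order(sample))
```

The preference values are compared as numbers, not text, so a record with preference 5 sorts ahead of one with preference 10.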

Using the NSLookup results to test connectivity
In the simulated response shown above, you can test the readiness for receiving SMTP communications by using the Telnet command. In a command-prompt window, type telnet mail1.domain.com 25. If the system connected to the FQDN is accepting SMTP communications, you’ll receive a response.
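
Where Telnet is not available, the same readiness check can be approximated with a few lines of Python using only the standard library. This is a sketch, not a replacement for the full Telnet test: the function name is made up, it reads only the first response from the server, and mail1.domain.com is the placeholder name from the sample above.

```python
import socket


def smtp_banner(host, port=25, timeout=10.0):
    """Connect to host:port and return the first response received, or None on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return conn.recv(1024).decode("ascii", errors="replace").strip()
    except OSError:
        # Covers refused connections, timeouts, and names that do not resolve.
        return None


if __name__ == "__main__":
    print(smtp_banner("mail1.domain.com"))
```

A result of None means no connection could be made or no response arrived in time, which is the same situation as Telnet failing to connect.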

Thursday, June 7, 2007

DNS Root Servers

What are the root servers? These are the DNS servers from which all others get information.

Why would anyone care? To verify the information other mail systems see when trying to reach your SMTP domain.

How do I use the information? You can force NSLookup to poll a particular DNS server by using the command:
server ip_address or server fqdn

As an example, open a command prompt window
At the prompt, type nslookup
Type the command set type=mx
Type the command server a.root-servers.net
Type the registered domain name (e.g. domain.com)

You have requested the MX information for domain.com directly from one of the DNS root servers.

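Here is IP information for the thirteen (13) root servers as of this writing. These addresses do change from time to time, so verify them against the website listed below before relying on them.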
a.root-servers.net   198.41.0.4
b.root-servers.net   192.228.79.201
c.root-servers.net   192.33.4.12
d.root-servers.net   128.8.10.90
e.root-servers.net   192.203.230.10
f.root-servers.net   192.5.5.241
g.root-servers.net   192.112.36.4
h.root-servers.net   128.63.2.53
i.root-servers.net   192.36.148.17
j.root-servers.net   192.58.128.30
k.root-servers.net   193.0.14.129
l.root-servers.net   198.32.64.12
m.root-servers.net   202.12.27.33

The complete information can be found at the website: http://www.root-servers.org

Sunday, June 3, 2007

Missing messages - Part 4

The message was delivered to the mailbox - where did it go?

This is the most common scenario [I mean, speaking as one Exchange Admin to another - what else could it be? ;) ].  As a personal aside, your goal is to figure out what happened and calmly point it out to the user.  The user will likely feel embarrassed already - no need to editorialize or lecture.

As discussed in Part 1, a successful message delivery typically means one of the following:

-  It reached the mailbox and was segregated or deleted by a system function

-  It reached the mailbox and was segregated or deleted by a client function

-  It reached the mailbox and was segregated or deleted by a user function

-  It reached the mailbox and was manually segregated or deleted

System function typically means forwarding configured in AD

To check for forwarding, open the Users & Computers console (ADUC) and open the properties of the recipient's object.  On the Exchange General tab go to Delivery Options.  Any forwarding configured at the Active Directory level will appear there.

Client functions include anti-virus/anti-spam filtering, and directing new messages to a personal folder

Check the console and logs of any 3rd-party anti-virus and anti-spam software.  Check the Junk E-mail folder in the user's mailbox.

Check all workstations this user logs on to for a profile that directs all new messages to a Personal Folder instead of to the mailbox.

User functions include rules, auto-archiving, and viewing filters

Check for and disable any viewing filters in Outlook (View-->ArrangeBy-->Custom)

Check for auto-archiving (File-->Archive), look in all Personal folders listed in the Outlook profile.  Search for all PST files on the local drive and all mapped drives.

Check for rules (Tools-->Rules and Alerts)

If the ruleset is empty, there is still a possibility that something formerly in rules is still acting on messages.  To make sure, close Outlook, then launch it again from a command line using the /cleanrules switch (e.g. outlook.exe /cleanrules)

If the ruleset is not empty and you wish to keep them, you can export the set to a file then import again later.

Remember that the Out Of Office function can also have rules.  If OOO is enabled, make sure you check that configuration for rules.

Manual processes initiated by the user

Look for and search any PST files in the Outlook profile and on the local drive.

Look in the Deleted Items folder.  Look at the Recover Deleted Items area.

Search the other folders for items which were Shift-Deleted.

Missing messages - Part 3

Message Tracking sees the message, but it was not delivered to the mailbox

If Message Tracking (MT) has a record of the message in question, Exchange has received it.  If it does not reach the mailbox, the message is typically:
-  stuck in a queue
-  stuck in a routing loop
-  segregated by anti-virus/anti-spam filtering

You can often gain insight as to what is happening by reading through the audit trail of the message in MT.

Search the local and inter-server queues on all your servers.  If found, try manually releasing it and see what happens.

Check the logs of any 3rd-party anti-virus and anti-spam software.

Missing messages - Part 2

Message Tracking does not find the message I was expecting, where could it be?

This situation calls for additional sleuthing.  You need to understand your messaging environment and all the systems a message passes through on its way to the Exchange server.  Identify each and check any available logs.  Configuration and operation of routers, firewalls, mail gateways, even managed layer-3 switches can have an effect on inbound mail.


How widespread is this issue?  Does it affect all inbound messages, a significant number of inbound messages, or a small number of inbound messages?  Look for any consistencies (sending domain, sending address, receiving address).


Test inbound routing by sending yourself a message from an outside mail system (e.g. Yahoo, Hotmail, Gmail).  Test by sending the affected user a message from that same outside system.


This scenario can get very complicated and vary greatly from environment to environment because you are dealing with any number of different devices and configurations.  Take it systematically, start at the outside and work your way in. Test each step, raise logging levels if necessary.

Missing messages - Part 1

How to start

I have been approached many times by users claiming that they never received a particular Email message. So where does one start looking? Most of the following scenarios have happened to me. The rest are follow-up thoughts of my own.

I start by asking my user some questions:
- What was the sending address?
- Approximately what time was the message sent?
- Are you seeing other messages arrive?
- If necessary may I open your mailbox to investigate?

I could also ask if the sender received a "bounce" message (a.k.a. Non-Delivery Report, a.k.a. NDR), but that tends to take extra coordinative effort. It's easier to assume that the sender is fine and to search for issues in the environment you can control (i.e. your own). Prove your own system sound before trying to look for causes outside. Show that you want to solve problems and not look for someone to blame.

Armed with this information, let's consider some possibilities:
1. It never reached our systems
2. It reached our systems but did not reach Exchange
3. It reached Exchange but was not delivered to the recipient's mailbox
4. It reached the recipient's mailbox but does not appear in the client software

The list is sorted according to message flow, but that does not mean you have to investigate in the same order.

The first question I ask myself is, does Exchange think it was delivered to the recipient's mailbox? Most of the time I find that the message did in fact reach the recipient's mailbox and something was done to it either automatically or manually.

Use Message Tracking (MT) to confirm whether the message was delivered. Use the information obtained from the user as the search parameters. If MT finds the message (regardless of outcome), rule out #1 and #2. If MT reports "Message delivered locally to store", it reached the recipient's mailbox, ruling out #3.

At this point, let's break the investigation into three parts.

If you cannot find the message in MT, continue with Part 2, Message Tracking does not find the message I was expecting, where could it be?

If MT finds the message, but reports something other than "delivered locally", continue with Part 3, Message Tracking sees the message, but it was not delivered to the mailbox

If MT finds the message and reports "delivered locally", continue with Part 4, The message was delivered to the mailbox - where did it go?
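
Put as a sketch, the choice of which part to read next depends on just two answers from Message Tracking: whether it found the message, and whether it reported the message as delivered locally to the store. The function below is only an illustration; its name and arguments are made up.

```python
def next_part(found_in_tracking, delivered_locally):
    """Return which part of this series to continue with (2, 3, or 4)."""
    if not found_in_tracking:
        return 2  # the message never reached Exchange
    if not delivered_locally:
        return 3  # Exchange has it, but it did not reach the mailbox
    return 4      # it reached the mailbox, so look at what happened to it there


print(next_part(found_in_tracking=True, delivered_locally=True))
```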