One of the more common questions I get from my users concerns Non-Delivery Reports (NDRs) they receive for messages they didn't send. Some users are understandably upset because many of the subject lines would lead you to question their character.
In addition, I see lots of questions from messaging administrators concerned that their environment is an open relay because either they are receiving a lot of NDRs from outside systems, or their own queues are clogged with NDRs to other SMTP domains.
Recently I posted this response at MSExchange.org to an Exchange administrator searching for a reasonable explanation for the flood of inbound NDRs his users were receiving.
===============
Consider this scenario:
1. I am a "Secret Creator of Unwanted Messages" (a.k.a. SCUM)
2. There is an SMTP domain named company.com
3. The address no.one@company.com does not exist
4. Your SMTP domain is unsuspecting-user.com
5. You have a user with the address someone@unsuspecting-user.com
6. I send out a message addressed from someone@unsuspecting-user.com and to no.one@company.com
The message goes out from my secret lair in a nearby septic tank out to the Internet.
The message then gets delivered to the server at company.com. That server accepts the message, attempts to find a match for no.one@company.com and discovers there is no such address. Being the RFC-compliant system it is, it dutifully creates and sends out an NDR to the sender. The trouble is, it thinks the sender is someone@unsuspecting-user.com and so sends the NDR there.
===============
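The scenario above can be sketched in a few lines of Python. This is only a simulation of the logic, not how any particular mail server is implemented: the mailbox table, the postmaster@company.com address, and the function name are assumptions made up for illustration. The point it shows is that a server which accepts first and looks up the recipient afterwards can only address its NDR to whatever sender the other side claimed.

```python
from email.message import EmailMessage

# Hypothetical mailbox table for company.com; no.one@ is deliberately absent.
KNOWN_RECIPIENTS = {"real.person@company.com"}


def accept_and_route(envelope_from, envelope_to, message):
    """Accept a message first, look up the recipient afterwards.

    Returns None when the recipient exists, or an NDR addressed to the
    (unverified) envelope sender when it does not.
    """
    if envelope_to in KNOWN_RECIPIENTS:
        return None  # delivered normally, nothing to report

    ndr = EmailMessage()
    ndr["From"] = "postmaster@company.com"  # assumed NDR sender address
    ndr["To"] = envelope_from               # all the server knows is the claimed sender
    ndr["Subject"] = "Undeliverable: " + str(message["Subject"])
    ndr.set_content(f"{envelope_to} does not exist at company.com.")
    return ndr


# The SCUM forges the sender; the real someone@ never sees this message go out.
FORGED_SENDER = "someone@unsuspecting-user.com"
BAD_RECIPIENT = "no.one@company.com"

forged = EmailMessage()
forged["From"] = FORGED_SENDER
forged["To"] = BAD_RECIPIENT
forged["Subject"] = "Something embarrassing"
forged.set_content("Unwanted message body")

ndr = accept_and_route(FORGED_SENDER, BAD_RECIPIENT, forged)
print(ndr["To"], "-", ndr["Subject"])
```

The NDR ends up addressed to the forged sender, with the embarrassing subject line attached, even though that user's own mail system was never involved in sending anything.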
This isn't to say that we shouldn't watch for signs that our system has been compromised, but more on that later...
Tuesday, October 23, 2007
Thursday, August 30, 2007
GAL vs. OAB - Why can't I see my AD changes?
When you run Outlook in non-cache mode, or connect using Outlook Web Access (OWA), you access the Global Address List (GAL) directly. Any changes made will appear as soon as your AD forest is replicated.
When you run Outlook in cache mode, you are viewing an offline copy of the GAL called an offline address book (a.k.a. offline address list). By default, Exchange rebuilds the OAB once a day. Also by default, Outlook downloads the OAB once when you launch it - thinking it doesn't need to check more often because of the rebuild schedule.
If you need to force a new entry to show up immediately, you need to do two things. First, manually rebuild the OAB. Open the System Manager (ESM) and expand Recipients, then click on Offline Address Lists. In the right pane, right-click Default Offline Address List and select Rebuild. You should then wait anywhere from 2 to 10 minutes (depending upon how many entries are in your GAL). Second, in Outlook (running in cache mode) go to Tools-->Send/Receive and select Download Address Book. You should then see the new entry.
If you wish to view or change the rebuild schedule, open the ESM and expand Recipients. Open the properties of the particular OAB in question. On the General tab, there is a field for Update interval. By default this is set to run daily - early in the morning. If you wish to have the OAB rebuild more than once a day, you can select "Use custom schedule" and create your schedule.
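To make the timing concrete, here is a rough Python sketch of when a cache-mode user would first see a change, based only on the two defaults described above (one rebuild a day, one download per Outlook launch). The function name, the 5:00 AM rebuild time, and the example dates are assumptions for illustration; check your own OAB schedule. It also ignores AD replication delay and the few minutes the rebuild itself takes.

```python
from datetime import datetime, time, timedelta


def first_visible(change, rebuild_at, outlook_launches):
    """Return the first Outlook launch that downloads an OAB containing the change.

    change:           when the AD change was made (assumed already replicated)
    rebuild_at:       daily OAB rebuild time (assumed value, not a verified default)
    outlook_launches: sorted datetimes at which the user starts Outlook
    """
    # Find the next scheduled rebuild at or after the change.
    rebuild = datetime.combine(change.date(), rebuild_at)
    if rebuild < change:
        rebuild += timedelta(days=1)

    # The entry shows up at the first launch after that rebuild.
    for launch in outlook_launches:
        if launch >= rebuild:
            return launch
    return None  # Outlook has not been restarted since the rebuild


change = datetime(2007, 8, 27, 9, 0)            # Monday 09:00, example only
launches = [datetime(2007, 8, 27, 8, 0),        # Monday, before the change
            datetime(2007, 8, 28, 8, 0)]        # Tuesday
print(first_visible(change, time(5, 0), launches))
```

A user who leaves Outlook running for days never hits a later launch, which is why the manual Download Address Book step matters.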
Saturday, July 14, 2007
Lesson learned - Recovering from corruption
As if the previous lesson learned wasn't enough, it led to another painful lesson.
Our Microsoft PSS engineer had us take a backup of the current mailbox stores that wouldn't mount, along with the transaction logs, so that we'd have something to fall back on should things not go well. Our mailbox stores are large enough that the process would take several hours. Rather than have new mail bounce, he had us create empty databases by giving them file names different from those of the original files. This gave the new messages a place to go.
After the backup completed, we restored one mailbox store's files back to their state from the previous day (which took a couple of hours). We then went through several rounds of replaying the logs until we figured out where the corruption started. When that process finally completed, we had a restored mailbox store as good as we could get it, plus a newly created mailbox store with a couple of days' worth of messages. The final step is to merge the two. This is done by placing one of the stores in the Recovery Storage Group and merging the new data with the old.
I'll interrupt the story here to say that because of the length of time involved with the process, we ended up doing this with two different groups of people. The first group completed the merge and all appeared well. The second group went through the same process with another mailbox store with a different PSS engineer.
The bad news
When the second group got to the merge, it was taking a very long time, much longer than it took the first group. We could do nothing but watch the merge wizard slowly process each mailbox. Our mailbox store was large enough that the entire process took over a day to complete. Our users were patient and understanding, but at the same time displeased.
Lesson learned
Some of you may have recognized that what we were doing was performing a Dial-tone Restore. Henrik Walther wrote an excellent set of articles on this subject at MSExchange.org. The first group did exactly what Henrik described, and the second group left out one important step which would have saved us many hours of frustration. Before performing the merge, you need to swap the mailbox stores so that you are merging the small into the large. Because the second group skipped the swap, the wizard had to work through the large mailbox store, which was somewhere between 75GB and 100GB, and it therefore took a very long time!
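The reasoning behind the swap is simple arithmetic, and a small Python sketch makes it explicit. This is not part of any Exchange tooling: the function name and the 2GB dial-tone size are assumptions for illustration (the source only says the new store held a couple of days of mail), and it assumes merge time grows roughly with the amount of data copied.

```python
def plan_merge(dial_tone_gb, restored_gb):
    """Pick the merge direction that copies the least data."""
    sizes = {"dial-tone": dial_tone_gb, "restored": restored_gb}
    # The smaller store is the one whose contents get copied...
    source = "dial-tone" if dial_tone_gb <= restored_gb else "restored"
    # ...into the larger store, which stays in place.
    target = "restored" if source == "dial-tone" else "dial-tone"
    return {"source": source, "target": target, "copied_gb": sizes[source]}


# Assumed sizes: a couple of days of new mail versus a large restored store.
plan = plan_merge(dial_tone_gb=2, restored_gb=100)
print(plan)
```

With these numbers the merge copies about 2GB instead of about 100GB, which is the difference between the first group's experience and the second's.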
I strongly urge everyone to read through Henrik's articles to familiarize yourself with the process. You never know when you'll need it!
Friday, July 13, 2007
Addendum - Stopping Exchange services
Andy Grogan wrote an excellent commentary to my previous post, and I wrote in return. I don't know how many people would actually read the comments, and the points made are important enough that I wanted them to be more than a footnote. I left Andy's comments verbatim, and added a couple of notes to mine.
Andy wrote:
"The problem is, how do you know when the store is being written to? or not, I have seen a similar problem before CPU 100 % nothing much responding in terms of Exchange so you naturally try to stop the service. Then you get the dreaded "The Microsoft Information Store Service did not respond to the stop request in a timely fashion" and hang on stopping. This then raises - how long do you wait - and hour, two hours - a day?"
"I have in the past opened up Task Manager and added in the IO, and Bytes counters to see if the store process is writing, but its not fool proof. Personally I would love Microsoft to put some options into ending processes like there are in Unix where you have multiple levels to killing a process - sorry for the ramble - I just don't think there was much you could have done your were in a hole that most of us Exchange folks face at some point - do I pull the plug or don't I?"
The issues Andy raises in his comment are exactly the same questions and issues that haunted me. That's why I've adopted this new strategy which seems to work (at least it alleviates my fears). Dismount the stores first - don't try to do anything with the services. In fact, don't do anything with the services until the stores dismount.
In an emergency, I'd pull the plug on new messages (e.g. block port 25) and cause bounces rather than risk corruption. That will allow the stores to eventually catch up and quiet down. Then dismount the stores, then finally stop the services.
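The order of operations is the whole point, so here is a minimal Python sketch of it. Every callable passed in (block_inbound, dismount, is_mounted, stop_services) is a hypothetical hook you would have to supply yourself; none of them is a real Exchange API, and the timeout values are arbitrary. It also simplifies the "catch up and quiet down" step into waiting for the dismount to finish.

```python
import time


def safe_shutdown(block_inbound, stores, dismount, is_mounted, stop_services,
                  timeout_s=3600, poll_s=5, sleep=time.sleep):
    """Stop Exchange in the order described above.

    All callables are hypothetical hooks supplied by the caller.
    Returns True if the services were stopped, False if a store never
    dismounted and the services were left alone.
    """
    block_inbound()              # e.g. block port 25 so no new mail arrives

    for store in stores:
        dismount(store)          # request the dismount; services stay untouched

    waited = 0
    while any(is_mounted(s) for s in stores):
        if waited >= timeout_s:
            # Never touch the services while a store is still mounted.
            return False
        sleep(poll_s)
        waited += poll_s

    stop_services()              # only reached once every store is dismounted
    return True
```

Returning False instead of forcing the stop leaves the "do I pull the plug or don't I?" decision with a person rather than with a script.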
Wednesday, July 11, 2007
Lesson Learned - Stopping Exchange services
I inadvertently corrupted a couple of mailbox stores (like anyone actually causes corruption intentionally) because of a lack of understanding. I hope this story will save someone else some pain.
The story started when an Exchange server's CPU was running at 100% for a long time, causing my monitor to alert us. After some initial troubleshooting, it was decided to restart the Exchange services. The restart process failed, and the CPU continued to run at 100%.
We waited an hour, and the status had not changed. Figuring a reboot would resolve any ailments, we did just that. The server took a long time, but finally rebooted after 15 minutes. When the server restarted, some of the mailbox stores did not mount. After spending some time to try to fix things, we put in a call to Microsoft PSS.
The bad news
We discovered that our actions corrupted several mailbox stores (this was on an Enterprise Edition server). In talking to Microsoft, we discovered that a shutdown or restart of the Operating System does not necessarily wait for all services to stop. The Information Store service apparently did not stop completely and after a timed delay, Windows shut itself down. We were told that the corruption happened because the store was actively being written to when the service stopped.
Lesson learned
With a new understanding of the Information Store service, whenever maintenance is performed on our Exchange servers, we always dismount all of the mailbox stores first. This ensures that all "in flight" transactions are complete before the service is stopped.