TsooRaD: 2011-02

2011/02/24

Exchange 2010 SP1 UM & UCMAredist.msp

I ran into a bizarre error today. The client has two full-up DAG servers – all roles on each server, including UM. We noticed that the UM role would not stay running. Checking the DR site revealed that the DR server had the exact same set of errors and the UM service would not stay running. The errors we were seeing are:

(application log, EID 1000, source “application error”)

Faulting application name: UMworkerprocess.exe, version: 14.1.218.11, time stamp: 0x4c5faa82
Faulting module name: Microsoft.Rtc.Internal.Media.dll, version: 3.5.6907.206, time stamp: 0x4c2c21fe
Exception code: 0xc0000005
Fault offset: 0x000000000019ccab
Faulting process id: 0x%9
Faulting application start time: 0x%10
Faulting application path: %11
Faulting module path: %12
Report Id: %13

(application log, EID 1430, source MSExchange Unified Messaging)

The Unified Messaging server shut down process umservice (PID=17732) because a fatal error occurred.

(application log, EID 1038, source MSExchange Unified Messaging)

The Microsoft Exchange Unified Messaging service was unable to start. More information: "Microsoft.Exchange.UM.UMService.UMServiceException: The UM worker process exceeded the configured maximum number of consecutive crashes, "5".
   at Microsoft.Exchange.UM.UMService.WorkerProcessManager.RestartWorkerInstance(WorkerInstance workerInstance)
   at Microsoft.Exchange.UM.UMService.WorkerProcessManager.OnWorkerExited(WorkerInstance workerInstance, Boolean resetRequested, Boolean fatalError)
   at Microsoft.Exchange.UM.UMService.WorkerInstance.OnExited(Object sender, EventArgs e)
   at System.Diagnostics.Process.OnExited()
   at System.Diagnostics.Process.RaiseOnExited()
   at System.Threading._ThreadPoolWaitOrTimerCallback.WaitOrTimerCallback_Context_f(Object state)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading._ThreadPoolWaitOrTimerCallback.PerformWaitOrTimerCallback(Object state, Boolean timedOut)"

WTF?!

I could not find anything obvious, and a Google search on these items turned up nothing substantive. I double-checked my install as-builts and verified that the proper prerequisites were met (desktop-experience, speechplatformruntime, ucmaruntimesetup). I also checked the remainder of the services to confirm that all other Exchange functions were as expected.
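With the same crash on both the prod and DR servers, it can help to pull the faulting module and version out of the EID 1000 text so the two machines can be compared side by side. The following is only a rough sketch: the function name `parse_fault` and the idea of pasting the event text into a string are my own assumptions, not anything Exchange provides.

```python
import re

# Sample text copied from the EID 1000 entry in this post.
EVENT_TEXT = """\
Faulting application name: UMworkerprocess.exe, version: 14.1.218.11, time stamp: 0x4c5faa82
Faulting module name: Microsoft.Rtc.Internal.Media.dll, version: 3.5.6907.206, time stamp: 0x4c2c21fe
Exception code: 0xc0000005
Fault offset: 0x000000000019ccab
"""


def parse_fault(text):
    """Return the fields worth comparing between servers (hypothetical helper)."""
    patterns = {
        "application": r"Faulting application name: ([^,]+), version: ([\d.]+)",
        "module": r"Faulting module name: ([^,]+), version: ([\d.]+)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            result[key] = {"name": match.group(1), "version": match.group(2)}
    # 0xc0000005 is the Windows access-violation status code.
    code = re.search(r"Exception code: (0x[0-9a-fA-F]+)", text)
    if code:
        result["exception_code"] = code.group(1)
    return result


fault = parse_fault(EVENT_TEXT)
print(fault["module"]["name"], fault["module"]["version"])
```

Running the same function over the text from each server makes it easy to see whether both are dying in the same module at the same version.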

I went down the galgrammargenerator route to rule out corrupt grxml files. I stood up a naked E2010 in a lab, installed the UM role, grabbed the core grxml files and still the UM service would not start and stay running. I then removed the service and reinstalled, which did not help.

What DID work: I remembered that the OCS to OWA integration required a ucmaredist.msp file…so I tried that. Voila!

Question to ponder: why did the lab server install and function properly without the ucmaredist.msp file? I built the lab server using the same sequence and files as the prod server so as to duplicate the error.

2011/02/23

Lync 2010 DNS and certificate fun

Maybe I am dense, and maybe I just did not know before this. Or maybe I forgot. No matter.

The scenario is multi-label domains inside of a single forest, with one of them used to log in and the SIP domain being something other than that login domain.

So, we have a forest/domain of domain.org, and another name on the same domain as second.com. We want the FQDN of the server (an SE in this case) to be on the certificate, but we also need to have the second domain listed as well; otherwise the client gets confused. We will ignore the other SAN entries in hopes of keeping this simple.

Our certificate looks like this:

CN=fqdn.domain.org, SAN=fqdn.domain.org, sip.second.com

We put the (we thought) proper DNS records in place: SRV and A records in each zone. Except that we had the second.com zone point the SRV to the A record in the zone for domain.org. Our logic was imperfect: we expected the client to accept the certificate because the redirect would end up at a cert that had a name rooted in the SIP domain (sip.second.com). Oops. Cert errors. You can click through them, or accept them forever, but that is clunky.

The proper method is to have a full _sipinternaltls._tcp.domain.org SRV record pointing to an A record for each affected domain, SIP or server. In our case the SIP domain SRV record pointed at the fqdn.domain.org server. While technically the names matched, and the root certs were in AD and chained properly, the client did not like the lack of a clean track. After all, the client started the session looking at username@second.com and got pointed to fqdn.domain.org. The bottom line is that the client expected to show up at a server that was, or at least reported itself as, somethingorother.second.com.
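To make the rule of thumb concrete, here is a small sketch of the check as I understand it from the behavior above: the host the SRV record points at should live in the user's SIP domain, and that host should be on the certificate. This is a simplified model, not the Lync client's actual logic, and the function names are made up for illustration.

```python
def sip_domain(sip_uri):
    """Return the domain part of a SIP address such as 'username@second.com'."""
    return sip_uri.split("@", 1)[1].lower()


def target_in_sip_domain(sip_uri, srv_target):
    """True when the SRV target host sits inside the user's SIP domain."""
    host = srv_target.rstrip(".").lower()
    return host.endswith("." + sip_domain(sip_uri))


def cert_lists_host(host, cert_names):
    """True when the host is among the cert's CN/SAN names (exact match only)."""
    return host.rstrip(".").lower() in {name.lower() for name in cert_names}


# Names taken from the certificate in this post.
CERT_NAMES = ["fqdn.domain.org", "sip.second.com"]

# What we did: second.com SRV pointing at the server name in domain.org.
broken = target_in_sip_domain("username@second.com", "fqdn.domain.org")

# What the client wanted: an SRV target inside second.com.
fixed = target_in_sip_domain("username@second.com", "sip.second.com")

print(broken, fixed, cert_lists_host("sip.second.com", CERT_NAMES))
```

Note that in the broken case the certificate check alone would still pass, since fqdn.domain.org is on the cert. That is the trap: the names matched, but the SRV target was outside the SIP domain.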

Simple?

2011/02/21

Lync Server 2010 Virtualization White Paper

Microsoft has revised their virtualization guidance for Lync 2010. Download the new white paper here.

Lync BPA

The Lync Server 2010 Best Practices Analyzer is here.

2011/02/15

Lync 2010 documentation

http://technet.microsoft.com/en-us/library/gg398616.aspx has many updated Lync 2010 documents. From the basics to the byzantine. Enjoy.

2011/02/11

Lync not searching GALContacts.db

We all expect the Lync client to just give us the name of the contact right after we type in the first few letters of the contact name. But what happens when it does not? For some strange reason, yesterday, right after an initial deployment of the first pool server, the clients would not search the GAL. We patiently waited for the db file to arrive (overnight – it just worked out that way). Nothing. No galcontacts.db. We created the initial 10 users and ran Update-CsAddressBook. Nothing.

The really strange part is after a log off/login cycle – with a full client exit, we did get the galcontacts.db file along with the idx, but the client did NOTHING. Then it started just giving me the “A” list – which is the two builtins – Audio Test Service & Announcement Service – as shown…

But no real user. You could not even type in the full SIP URI and get the user. So I removed the db and idx file, logged back in, and then I could use the manual method. When I put the db and idx back in place, the problem returned. Odd.

Long story short, there was something wrong with that galcontacts.db and galcontacts.db.idx pair. We forced a new gal download by removing the existing files, closing the client completely, and setting the client to ignore the gal download delay by setting the registry as shown…

reg add HKLM\Software\Policies\Microsoft\Communicator /v GalDownloadInitialDelay /t REG_DWORD /d 0 /f

After getting a new GAL, everything was fine.
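If you have to do this on more than one workstation, the file removal step can be scripted. Below is a minimal sketch, assuming you pass in the client's profile folder yourself; the folder location is not given in this post, so the path is left as a parameter, and the helper name `remove_gal_files` is hypothetical. Exit the client completely before running it.

```python
from pathlib import Path

# The two files named in this post.
GAL_FILES = ("galcontacts.db", "galcontacts.db.idx")


def remove_gal_files(profile_dir):
    """Delete the cached GAL pair from profile_dir and return the names removed."""
    removed = []
    for name in GAL_FILES:
        path = Path(profile_dir) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed
```

The function returns the list of files it actually deleted, so an empty list means there was nothing to clean up. The registry change above still has to be in place for the client to pull a new GAL without waiting.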

2011/02/10

Lync 2010 Visio Stencil

For those folks who are into creating Visio diagrams of the topology, design, and architecture of Lync Server 2010, here is a squeaky new copy of the official Visio stencil for Lync 2010.

2011/02/07

Lync 2010 Planning Tool is here

Finally! Woo hoo! Et Cetera! One of the nicest things for your Lync project is now in a final form.

clicketh here for the download

2011/02/04

Greatness

"Greatness lies in your ability to overcome."

Walter Irvin

‘nuff said
