
Convert a VM snapshot to Memory Dump



Imagine a critical virtual machine hosted on the VMware platform that hangs at a particular stage: it has either frozen on a screen or stopped at a Blue Screen of Death. Often the only option is a hard reboot to bring the VM's primary functionality back online, but afterwards we are asked why the machine got into that state and how to keep it from happening again.

If Crash Dump or Minidump settings are configured in the guest OS, we can analyze the dump to understand the state of the virtual machine at that point. However, if crash dump collection is not enabled, if the pagefile is too small to capture a crash dump, or if the volume where the dump would be written does not have enough free space, we will not get a usable crash dump. In that case, before rebooting the server, we can take a snapshot of the virtual machine that includes its memory.

This snapshot can be converted into a memory dump, which can then be analyzed with debugger tools such as WinDbg.

1. Download the vmss2core.exe tool.
2. Copy it to a Windows server that has sufficient free space.
3. Copy the saved-state file from the datastore where the VM is located to the same folder as vmss2core.exe. A suspended VM produces a .vmss file; a snapshot normally produces a .vmsn file, which the tool also accepts.
4. Run the utility to convert the saved state to a dump, as shown below:

vmss2core -W VM_Snapshot_Filename.vmss

5. This converts the snapshot file into a memory dump that we can use to analyse why the server hung.
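The conversion step can also be scripted. The following is a minimal Python sketch, not part of the original procedure: the function names are made up for illustration, the -W8 flag for Windows 8 / Server 2012 and later guests and the optional separate .vmem memory file are assumptions taken from VMware's general guidance for the tool, and vmss2core.exe is assumed to sit in the current folder or on PATH.

```python
import subprocess
from pathlib import Path


def build_vmss2core_command(state_file, memory_file=None,
                            windows8_or_later=False, tool="vmss2core.exe"):
    """Return the argument list that converts a saved VM state to a Windows dump."""
    state = Path(state_file)
    # The tool reads suspend (.vmss) and snapshot (.vmsn) state files.
    if state.suffix.lower() not in {".vmss", ".vmsn"}:
        raise ValueError(f"expected a .vmss or .vmsn file, got {state.name}")
    # -W is the flag used in the post; -W8 is an assumption for newer guests.
    flag = "-W8" if windows8_or_later else "-W"
    command = [tool, flag, str(state)]
    if memory_file is not None:
        # Assumption: guest memory may be stored in a separate .vmem file.
        command.append(str(memory_file))
    return command


def convert(state_file, **options):
    """Run the conversion; raises CalledProcessError if the tool fails."""
    return subprocess.run(build_vmss2core_command(state_file, **options), check=True)


if __name__ == "__main__":
    # Only prints the command from step 4; call convert() to actually run it.
    print(" ".join(build_vmss2core_command("VM_Snapshot_Filename.vmss")))
```

Building the argument list separately from running it makes it easy to review the exact command before pointing it at a multi-gigabyte state file.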


What if vCenter Server\Appliance is Down?



The best thing about the VMware platform is the centralized management of all resources through vCenter Server. We connect to vCenter Server with the vSphere Client or Web Client to administer the virtual datacenter. Without vCenter Server we would have to connect to each ESXi host manually to manage its VMs, which is a tedious task.
vCenter Server also provides features such as DRS, Storage DRS, vMotion, HA and FT. This post looks at what happens to each of these functions when the vCenter Server or Appliance goes down.

Management:
Management is not severely affected, because we can still connect to each ESXi host via SSH or the vSphere Client and manage the servers. It is less convenient, but the environment itself is not impacted.

Virtual Machines & ESXi Hosts:
The functionality and uptime of virtual machines and ESXi hosts do not depend on vCenter Server. The hosts can still be reached via SSH or the vSphere Client, and all virtual machines keep running.

Distributed Resource Scheduling:
DRS relies on vCenter Server to balance resources and virtual machines across the ESXi hosts in a DRS cluster, so DRS stops working while vCenter Server is down.

vMotion\svMotion:
vMotion and svMotion operate across hosts and are coordinated by vCenter Server (DRS in turn relies on vMotion), so both vMotion and svMotion are unavailable while vCenter Server is down.

High Availability:
HA is moderately affected: hosts and clusters already configured with HA keep HA running even while vCenter Server is down. However, settings such as Admission Control policies cannot be changed until vCenter Server is back.

Fault Tolerance:
FT continues to work for all virtual machines that were configured before vCenter Server went down. No changes can be made while vCenter Server is down.

Distributed Switch:
A Distributed Switch continues to work while the vCenter service is down and stays connected to the network it is configured on.

VM Snapshots:
There are no issues taking snapshots of a virtual machine; we just need to connect to the ESXi host directly and take the snapshot there.
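As an illustration of taking a snapshot directly on a host, the following Python sketch builds the commands one could run in an ESXi SSH session. It assumes the host's vim-cmd utility with its vmsvc/getallvms and vmsvc/snapshot.create subcommands (arguments: VM ID, name, description, include-memory flag, quiesce flag); the helper name and the example values are hypothetical, and the syntax should be checked against the ESXi version in use.

```python
import shlex

# Lists the VMs registered on the host together with their numeric IDs.
LIST_VMS_COMMAND = "vim-cmd vmsvc/getallvms"


def snapshot_create_command(vm_id, name, description="",
                            include_memory=True, quiesce=False):
    """Build the ESXi shell command that snapshots one VM on the local host."""
    return " ".join([
        "vim-cmd", "vmsvc/snapshot.create",
        str(int(vm_id)),           # ID taken from the getallvms listing
        shlex.quote(name),         # quoted so spaces survive the shell
        shlex.quote(description),
        str(int(include_memory)),  # 1 = capture guest memory as well
        str(int(quiesce)),         # 1 = quiesce the guest file system
    ])


if __name__ == "__main__":
    print(LIST_VMS_COMMAND)
    print(snapshot_create_command(12, "pre-reboot", "hung VM state"))
```

Including memory in the snapshot is what makes it usable for the memory dump conversion described in the first post.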

vSphere Update Manager:
Since vSphere Update Manager (VUM) is a vCenter plugin, VUM does not work while vCenter Server is down.

Note:
The features and impacts described above were tested on vCenter Server 5.x only.

What are VSS Writers and How to Troubleshoot Error States



Every Windows administrator comes across backup issues related to file-level backup. These issues are often fixed by a reboot [as we all know, a reboot fixes most issues], but it is hard to get the required application\server downtime. Repeatedly asking the server\application owners for a reboot also causes friction when the same server is affected again and again.

Most of these backups fail because one of the VSS writers is in an Error\Failed or Waiting for Completion state. A reboot resets these VSS writers and therefore fixes the backup failure.

What are these VSS Writers?
VSS writers are application-specific components that plug into Microsoft's Volume Shadow Copy Service [VSS]. Each writer works with its application so that a complete snapshot of the data can be taken even while input\output transactions are in progress, which ensures that no incomplete data is captured. If a transaction interferes with the snapshot process, the VSS writer may go into an Error state and cause backup failures. Most administrators then recommend rebooting the server, but there is a better way that reduces downtime: bring the writers back to a stable state directly.

Listed below are a few VSS writers and their associated Windows services. Restarting the service terminates the snapshot process and brings the writer back to a Stable state. Simply restart the listed service if its VSS writer is in an Error\Failed or Waiting for Completion state.

VSS Writer Name | Service Name | Service Display Name
ASR Writer | VSS | Volume Shadow Copy
BITS Writer | BITS | Background Intelligent Transfer Service
Certificate Authority | CertSvc | Active Directory Certificate Services
COM+ REGDB Writer | VSS | Volume Shadow Copy
DFS Replication service writer | DFSR | DFS Replication
DHCP Jet Writer | DHCPServer | DHCP Server
FRS Writer | NtFrs | File Replication
FSRM writer | srmsvc | File Server Resource Manager
IIS Config Writer | AppHostSvc | Application Host Helper Service
IIS Metabase Writer | IISADMIN | IIS Admin Service
Microsoft Exchange Replica Writer | MSExchangeRepl | Microsoft Exchange Replication Service
Microsoft Exchange Writer | MSExchangeIS | Microsoft Exchange Information Store
Microsoft Hyper-V VSS Writer | vmms | Hyper-V Virtual Machine Management
MSMQ Writer | MSMQ | Message Queuing
MSSearch Service Writer | WSearch | Windows Search
NPS VSS Writer | EventSystem | COM+ Event System
NTDS | NTDS | Active Directory Domain Services
Registry Writer | VSS | Volume Shadow Copy
Shadow Copy Optimization Writer | VSS | Volume Shadow Copy
SMS Writer | SMS_SITE_VSS_WRITER | SMS_SITE_VSS_WRITER
SqlServerWriter | SQLWriter | SQL Server VSS Writer
System Writer | CryptSvc | Cryptographic Services
TermServLicensing | TermServLicensing | Remote Desktop Licensing
WMI Writer | Winmgmt | Windows Management Instrumentation
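The lookup from writer to service can be automated. The Python sketch below is an illustration only: it assumes the writer states come from the output of "vssadmin list writers", that each writer appears as a "Writer name: '...'" line followed by a "State: [n] ..." line, and it uses a trimmed, made-up sample in place of real output. The mapping is a subset of the table above, and the generated PowerShell Restart-Service lines are suggestions to review, not commands the script runs.

```python
import re

# Subset of the table above: VSS writer name -> Windows service name.
WRITER_SERVICES = {
    "ASR Writer": "VSS",
    "BITS Writer": "BITS",
    "Registry Writer": "VSS",
    "SqlServerWriter": "SQLWriter",
    "System Writer": "CryptSvc",
    "WMI Writer": "Winmgmt",
}

# Made-up, trimmed sample in the assumed "vssadmin list writers" layout.
SAMPLE_OUTPUT = """\
Writer name: 'System Writer'
   State: [1] Stable

Writer name: 'SqlServerWriter'
   State: [8] Failed

Writer name: 'WMI Writer'
   State: [5] Waiting for completion
"""


def unhealthy_writers(vssadmin_output):
    """Return (writer, state) pairs for every writer that is not Stable."""
    found = []
    current = None
    for line in vssadmin_output.splitlines():
        name = re.match(r"\s*Writer name: '(.+)'", line)
        state = re.match(r"\s*State: \[\d+\] (.+)", line)
        if name:
            current = name.group(1)
        elif state and current and state.group(1).strip().lower() != "stable":
            found.append((current, state.group(1).strip()))
    return found


def restart_commands(writers):
    """Map unhealthy writers to one restart command per known service."""
    services = []
    for writer, _state in writers:
        service = WRITER_SERVICES.get(writer)
        if service and service not in services:
            services.append(service)
    return [f"Restart-Service -Name {service}" for service in services]


if __name__ == "__main__":
    for command in restart_commands(unhealthy_writers(SAMPLE_OUTPUT)):
        print(command)
```

Writers that are missing from the mapping are skipped rather than guessed at, so extend WRITER_SERVICES from the table before relying on the result.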


Force Mount Snapshot LUNs on ESXi


Force Mounting Snapshot LUNs on VMware ESXi Hosts


The recommended way to build a virtual server environment is a cluster of ESXi hosts with shared datastores presented to every host in that cluster. Ideally, we add the storage on one host in the cluster and it is then automatically reflected on all the other hosts, but there are cases where some hosts do not see the storage disk\share automatically.

In that case, an administrator usually rescans the storage and HBAs, or tries to add the storage LUN from the Add Storage console, selecting the disk by the NAA or WWN ID provided by the storage team. In most cases this adds the presented disk to the host.

There are times, however, when the disk still does not appear on the host. This happens because the host considers the LUN to be a snapshot disk and will not mount it automatically. To mount the LUN manually, follow the steps below:

1. Connect to the host over SSH (for example with PuTTY).
2. Run the command to list the volumes detected as snapshots: esxcfg-volume -l
3. From the listing, make a note of the volume UUID, as it is needed to mount the volume manually.
4. Run the following command to mount the volume: esxcfg-volume -m <UUID>
Here <UUID> is replaced with the UUID obtained from the output of step 2.

Note: In the listing (the highlighted part of the screenshot), the storage disk UUID and label are shown together, separated by a "/" character. When mounting the disk, use only the UUID, as in the last line of the screenshot.

Running the above command mounts the LUN on the host, and the change appears in the vCenter client within a few minutes.
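The UUID extraction described in the note can be sketched in Python as follows. This is an illustration under stated assumptions: the listing is assumed to contain a line ending in "UUID/label: <UUID>/<label>", the sample text and its UUID are made up, and the function names are hypothetical. The -M variant, which is assumed here to make the mount persist across reboots, is not part of the original steps.

```python
import re

# Made-up sample in the assumed layout of the "esxcfg-volume -l" listing.
SAMPLE_LISTING = """\
VMFS UUID/label: 4e26f26a-9fe2664c-c9c7-000c2988e4dd/datastore1
Can mount: Yes
Can resignature: Yes
"""


def snapshot_volumes(listing):
    """Return (uuid, label) pairs, splitting each entry at the first '/'."""
    volumes = []
    for line in listing.splitlines():
        match = re.search(r"UUID/label:\s*([^/\s]+)/(.*)", line)
        if match:
            volumes.append((match.group(1), match.group(2).strip()))
    return volumes


def mount_command(uuid, persistent=False):
    """Build the mount command; only the UUID is passed, never the label."""
    flag = "-M" if persistent else "-m"  # -M: assumed persistent mount
    return f"esxcfg-volume {flag} {uuid}"


if __name__ == "__main__":
    for uuid, label in snapshot_volumes(SAMPLE_LISTING):
        print(f"{label}: {mount_command(uuid)}")
```

Splitting at the first "/" mirrors the manual rule of copying only the part before the separator.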