Saturday, October 26, 2013

ESXi 5.5 Hypervisor Enhancements

Hot-Pluggable SSD PCI Express Device: Users are able to hot-add or hot-remove an SSD device while a vSphere host is running, and the underlying storage stack detects the operation.
Support for Reliable Memory Technology: To provide greater reliability and uptime, vSphere 5.5 supports a new feature called Reliable Memory. The ESXi hypervisor runs directly in memory, so a memory error can potentially crash it and the VMs running on the host.
Reliable Memory is a CPU hardware feature through which a region of memory is reported by the hardware to the ESXi hypervisor as being more “reliable”. This information is then used to optimize the placement of the VMkernel and other critical components, such as the initial thread (initd), hostd, and the watchdog process, and helps guard against memory errors.

Enhancements for CPU C-States: In vSphere 5.1 and earlier, the default power management policy, called Balanced, used only the processor performance states (P-states), which keep the processor running at a lower frequency and voltage.
In vSphere 5.5, the Balanced policy also uses deep processor power states (C-states), providing additional power savings. Another potential benefit of reduced power consumption is increased performance: with all C-states enabled, turn Turbo mode on to get the maximum power and performance benefit. vSphere 5.5 also uses USB auto-suspend to automatically put idle USB hubs into a lower power state.
c-Power
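As a rough illustration of working with these power policies outside the UI, the sketch below uses pyVmomi to read a host’s current power policy and switch it to Balanced. This is an assumption-laden example: the vCenter/host names and credentials are placeholders, and the Balanced policy is matched by name rather than a hard-coded key.

```python
# Sketch: inspect and set an ESXi host's power management policy with pyVmomi.
# Host names, credentials, and the policy-name match below are assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab use only: skip cert checks
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem], True)
host = next(h for h in view.view if h.name == 'esxi01.example.local')

power = host.configManager.powerSystem
print('Current policy:', power.info.currentPolicy.shortName)

# List the policies the host advertises and pick Balanced (which in 5.5
# also uses deep C-states, as described above). The Balanced policy is
# usually reported with the shortName 'dynamic' - treat that as an assumption.
for policy in power.capability.availablePolicy:
    print(policy.key, policy.shortName, policy.name)
    if policy.shortName.lower() in ('dynamic', 'balanced'):
        power.ConfigurePowerPolicy(policy.key)   # apply the Balanced policy
        break

Disconnect(si)
```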

    

Wednesday, July 10, 2013

VM Disk provisioning policies

When you perform certain VM management operations, such as creating a virtual disk, cloning a VM to a template, or migrating a VM, you can specify a provisioning policy for the virtual disk file.
NFS datastores with Hardware Acceleration and VMFS datastores support the following disk provisioning policies. On NFS datastores that do not support Hardware Acceleration, only thin format is available.
You can use Storage vMotion to transform virtual disks from one format to another.
Thick Provision Lazy Zeroed: Creates a virtual disk in a thick format. Space required for the virtual disk is allocated when the virtual disk is created. Data remaining on the physical device is not erased during creation, but is zeroed out on demand at a later time, on first write from the VM.
Thick Provision Eager Zeroed: A type of thick virtual disk that supports clustering features such as Fault Tolerance. Space required for the virtual disk is allocated at creation time. The data remaining on the physical device is zeroed out when the virtual disk is created. It takes longer to create disks in this format than to create other types of disks.
Thin Provision: Use this format to save storage space. For the thin disk, you provision as much datastore space as the disk would require based on the value that you enter for the disk size. However, the thin disk starts small and, at first, uses only as much datastore space as it needs for its initial operations. By implementing thin provisioned disks, you are able to over-allocate storage. If storage is over-allocated, thin virtual disks can grow to fill an entire datastore if left unchecked.
In order for a guest operating system to make use of a virtual disk, the guest operating system must first partition and format the disk to a file system it can recognize. Depending on the type of format selected within the guest operating system, the format may cause the thin provisioned disk to grow to a full size.
For example, if you present a thin provisioned disk to a Microsoft Windows operating system and format the disk, unless you explicitly select the Quick Format option, the Microsoft Windows format tool writes information to all of the sectors on the disk, which in turn inflates the thin provisioned disk.
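The same provisioning choice can be made when adding a disk through the vSphere API. The pyVmomi sketch below is a minimal, hedged example (the vCenter name, credentials, VM name, unit number, and disk size are all assumptions): thin provisioning corresponds to thinProvisioned=True, eager-zeroed thick to eagerlyScrub=True, and lazy-zeroed thick to leaving both flags false.

```python
# Sketch: add a new virtual disk with a chosen provisioning policy (pyVmomi).
# The VM name, disk size, and unit number below are assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == 'TestVM')

# Reuse the VM's existing SCSI controller.
controller = next(d for d in vm.config.hardware.device
                  if isinstance(d, vim.vm.device.VirtualSCSIController))

backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
backing.diskMode = 'persistent'
backing.thinProvisioned = True       # Thin Provision
backing.eagerlyScrub = False         # True (with thinProvisioned False) = Eager Zeroed Thick

disk = vim.vm.device.VirtualDisk()
disk.backing = backing
disk.controllerKey = controller.key
disk.unitNumber = 1                  # assumes unit 1 is free on the controller
disk.capacityInKB = 10 * 1024 * 1024 # 10 GB

spec = vim.vm.device.VirtualDeviceSpec()
spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
spec.fileOperation = vim.vm.device.VirtualDeviceSpec.FileOperation.create
spec.device = disk

vm.ReconfigVM_Task(vim.vm.ConfigSpec(deviceChange=[spec]))
Disconnect(si)
```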
Thanks to VMware, Information is from the white paper provided by VMware.
    

Beacon probing

Beacon probing is a network failover detection mechanism that sends out and listens for beacon probes on all NICs in the team and uses this information along with link status to determine link failure. Beacon probing detects failures, such as cable pulls and physical switch power failures on the immediate physical switch and also on the downstream switches.
ESXi periodically (every 10 seconds) broadcasts beacon packets (approximately 62 bytes) from all uplinks in a team. The physical switch is expected to forward all packets to other ports on the same broadcast domain. Therefore, a team member is expected to see beacon packets from other team members. If an uplink fails to receive three consecutive beacon packets, it is marked as bad. The failure can be due to the immediate link or a downstream link.

Note: Beaconing is most useful with three or more uplinks in a team, because ESXi can then detect the failure of a single uplink. When there are only two NICs in service and one of them loses connectivity, it is unclear which NIC needs to be taken out of service, because neither receives beacons; as a result, all packets are sent to both uplinks. Using at least three NICs in such a team allows for n-2 failures, where n is the number of NICs in the team, before reaching an ambiguous situation.
Note: Do not use beacon probing with IP hash load balancing.
You must enable beacon probing when downstream link failures may impact availability and there is no Link State Tracking on the physical switch.
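Beacon probing can also be switched on programmatically. The pyVmomi sketch below flips the checkBeacon flag in a standard vSwitch’s NIC teaming failure criteria; the vCenter, host, and vSwitch names are assumptions, and it presumes the vSwitch already has a populated teaming policy.

```python
# Sketch: enable beacon probing on a standard vSwitch's NIC team (pyVmomi).
# Host and vSwitch names are assumptions; the existing spec is reused.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem], True)
host = next(h for h in view.view if h.name == 'esxi01.example.local')
net_sys = host.configManager.networkSystem

vswitch = next(s for s in host.config.network.vswitch if s.name == 'vSwitch0')
spec = vswitch.spec                    # start from the switch's current spec
# Assumes spec.policy.nicTeaming.failureCriteria is populated (it normally is
# for an existing vSwitch); checkBeacon=True turns beacon probing on.
spec.policy.nicTeaming.failureCriteria.checkBeacon = True
net_sys.UpdateVirtualSwitch(vswitchName='vSwitch0', spec=spec)

Disconnect(si)
```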
Thanks to VMware, Information is from the white paper provided by VMware. 
    

vCenter Inventory Service

vCenter Inventory Service stores vCenter Server application and inventory data, enabling you to search inventory objects across linked vCenter Server systems. If the vCenter Inventory Service database is corrupted or inoperable, you can reset it. You should also reset the vCenter Inventory Service database if you reset the vCenter Server database.
Procedure to reset the vCenter Inventory Service database:
  1. Stop the vCenter Inventory Service.
    1. From the Windows Start menu, select Administrative Tools > Services.
    2. Right-click vCenter Inventory Service and select Stop.
  2. Open a command prompt:
    Delete the entire contents of the Inventory_Service_Directory/data directory. For example, if you installed vCenter Inventory Service in the default location, run this command and then delete the contents of that directory:
    cd /Program Files/VMware/Infrastructure/Inventory Service/data
  3. Change directory to Inventory_Service_Directory/scripts. For example, if you installed vCenter Inventory Service in the default location, run this command.
    cd /Program Files/VMware/Infrastructure/Inventory Service/scripts
  4. Run the createDB.bat command, with no arguments, to reset the vCenter Inventory Service database.
  5. Run the register.bat command to update the stored configuration information of the Inventory Service
    register.bat current_vCenter_Server_fully_qualified_domain_name vCenter_Server_HTTPS_port
    For example, if the vCenter Server fully qualified domain name is machinename.corp.com and the HTTPS port is 443, run this command.
    register.bat machinename.corp.com 443
  6. Restart the vCenter Inventory Service.
    From the Windows Start menu, select Administrative Tools > Services.
    Right-click vCenter Inventory Service and select Start.
Thanks to VMware, Information is from the white paper provided by VMware.
    

N-Port ID Virtualization (NPIV)

N-Port ID Virtualization (NPIV) enables a single Fibre Channel HBA port to register several worldwide port names (WWPNs) with the fabric. Each address appears as a unique entity on the Fibre Channel fabric, and each can be assigned to an individual VM. When VMs do not have WWN assignments, they access storage LUNs with the WWNs of their host’s physical HBAs. By using NPIV, a SAN administrator can monitor and route storage access on a per-VM basis.
When a virtual machine has a WWN assigned to it, the virtual machine’s configuration file (.vmx) is updated to include a WWN pair (a World Wide Port Name, WWPN, and a World Wide Node Name, WWNN). When that VM is powered on, the VMkernel instantiates a virtual port (VPORT) on the physical HBA, which is used to access the LUN. The VPORT is a virtual HBA that appears to the FC fabric as a physical HBA; that is, it has its own unique identifier, the WWN pair that was assigned to the VM. Each VPORT is specific to the VM; when the VM is powered off, the VPORT is destroyed on the host and no longer appears to the FC fabric. When a VM is migrated from one host to another, the VPORT is closed on the first host and opened on the destination host.
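The WWN assignment itself can be requested through the VM’s ConfigSpec. The pyVmomi sketch below asks vCenter to generate an NPIV WWN pair for a VM; the vCenter and VM names and the WWN counts are assumptions, and the VM still needs an RDM disk (see the requirements later in this post) for NPIV to be used.

```python
# Sketch: ask vCenter to generate NPIV WWNs for a VM (pyVmomi).
# The VM name and the number of requested WWNs are assumptions; the VM needs
# an RDM disk, and the WWNs should be assigned while it is powered off.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == 'NPIV-VM')

spec = vim.vm.ConfigSpec()
spec.npivWorldWideNameOp = 'generate'   # have vCenter generate the WWN pair
spec.npivDesiredNodeWwns = 1            # number of WWNNs to generate
spec.npivDesiredPortWwns = 1            # number of WWPNs to generate
vm.ReconfigVM_Task(spec)

# After the task completes, the generated pair shows up in
# vm.config.npivNodeWorldWideName / vm.config.npivPortWorldWideName
# (and in the .vmx file, as noted above).
Disconnect(si)
```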
NPIV in the SAN
NPIV Advantages:
ESXi leverages NPIV to assign individual WWNs to each VM, so that each VM can be recognized as a specific endpoint in the fabric. The benefits of this approach are as follows:
  • Granular security: Access to specific storage LUNs can be restricted to specific VMs using the VM WWN for zoning, in the same way that they can be restricted to specific physical servers.
  • Easier monitoring and troubleshooting: The same monitoring and troubleshooting tools used with physical servers can now be used with VMs, since the WWN and the fabric address that these tools rely on to track frames are now uniquely associated to a VM.
  • Flexible provisioning and upgrade: Since zoning and other services are no longer tied to the physical WWN “hard-wired” to the HBA, it is easier to replace an HBA. You do not have to reconfigure the SAN storage, because the new server can be pre-provisioned independently of the physical HBA WWN.
  • Workload mobility: The virtual WWN associated with each VM follows the VM when it is migrated across physical servers. No SAN reconfiguration is necessary when the work load is relocated to a new server.
  • Applications identified in the SAN: Since virtualized applications tend to be run on a dedicated VM, the WWN of the VM now identifies the application to the SAN.
  • Quality of Service (QoS): Since each VM can be uniquely identified, QoS settings can be extended from the SAN to VMs.
Requirements for Using NPIV
  • NPIV can be used only for VMs with RDM disks. VMs with regular virtual disks use the WWNs of the host’s physical HBAs.
  •  HBAs on your host must support NPIV.
  • Use HBAs of the same type, either all Brocade or all QLogic or all Emulex. VMware does not support heterogeneous HBAs on the same host accessing the same LUNs.
  • If a host uses multiple physical HBAs as paths to the storage, zone all physical paths to the VM. This is required to support multipathing even though only one path at a time will be active.
  • Make sure that physical HBAs on the host have access to all LUNs that are to be accessed by NPIV-enabled VMs running on that host.
  • The switches in the fabric must be NPIV-aware.
  • When configuring a LUN for NPIV access at the storage level, make sure that the NPIV LUN number and NPIV target ID match the physical LUN and Target ID.
  • Use the vSphere Client to manipulate virtual machines with WWNs.
NPIV Capabilities and Limitations
ESXi with NPIV supports the following items:
  • NPIV supports vMotion. When you use vMotion to migrate a VM it retains the assigned WWN.
    If you migrate an NPIV-enabled virtual machine to a host that does not support NPIV, VMkernel reverts to using a physical HBA to route the I/O.
  • If your FC SAN environment supports concurrent I/O on the disks from an active-active array, the concurrent I/O to two different NPIV ports is also supported.
When you use ESXi with NPIV, the following limitations apply:
  • Because the NPIV technology is an extension to the FC protocol, it requires an FC switch and does not work on the direct attached FC disks.
  • When you clone a virtual machine or template with a WWN assigned to it, the clones do not retain the WWN.
  • NPIV does not support Storage vMotion.
  • Disabling and then re-enabling the NPIV capability on an FC switch while VMs are running can cause an FC link to fail and I/O to stop.
Thanks to VMware, Information is from the white paper provided by VMware.
    

Sysprep Location on vCenter

The Microsoft Sysprep tool is used to customize Windows guest operating systems. Sysprep is useful when you clone a VM or deploy VMs from a template, so to automate the customization of Windows guest operating systems, copy the correct version of Sysprep for each guest operating system to vCenter.
Microsoft includes the tool set on the installation CD-ROM discs for Windows 2000, Windows XP, and Windows 2003 (copy all files after extracting the “\Support\Tools\Deploy.cab” file from your CD). The Sysprep tool is built in to the Windows Vista, Windows 7, and Windows 2008 operating systems, so you do not need to copy it for them.
The location to store the Sysprep files on your vCenter Server depends on the operating system on which the vCenter Server service is installed and running:
For a Windows 2003 operating system (64-bit), the path will be “C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\sysprep\<OS-Name>” (<OS-Name> is the name of the individual operating system).
Copy these files by extracting the “\Support\Tools\Deploy.cab” file from the Windows product CD-ROM.
For Windows 2008 operating system (64bit) path will be “C:\ProgramData\VMware\VMware VirtualCenter\sysprep\<OS-Name>”.
Location to copy sysprep files on Windows 2008 based vCenter Server
For the appliance-based vCenter Server, the path will be “/etc/vmware-vpx/sysprep/<OS-Name>”. (You can use a tool like WinSCP to copy files between your local computer and the Linux-based appliance using a graphical interface.)

Using Winscp to Copy Sysprep files onto Appliance Based vCenter Server
Make sure the password for the local administrator account on the VM to be cloned, or on the template, is set to blank (“”).
    

vSphere DPM

vSphere DPM is a cluster power management feature. vSphere DPM continuously monitors resource requirements and power consumption across a DRS cluster. When the cluster needs fewer resources, it consolidates workloads and powers off unused ESXi hosts to reduce power consumption. When resource requirements of workloads increase, vSphere DPM powers hosts on for VMs to use.
Create a DRS cluster functioning in fully automated mode
vSphere DPM can use one of three Power management protocols to bring a host out of standby mode:
  • Intelligent Platform Management Interface (IPMI)
  • Hewlett-Packard Integrated Lights-Out (iLO)
  • Wake on LAN (WOL)
Hosts powered off by vSphere DPM are marked by vCenter Server as being in standby mode. This indicates that the hosts are available to be powered on whenever they are needed.
vSphere DPM operates by awakening ESXi hosts from a powered-off state through WOL packets. These packets are sent over the vMotion network interface by another host in the cluster, so vSphere DPM keeps at least one host powered on at all times. (Manually test “exit standby” for the host with the vSphere Client.)
vSphere DPM Operation
The vSphere DPM algorithm does not frequently power servers on and off; it powers off a server only when it is very likely that the server will stay powered off for some time. It does this by analyzing the cluster workload history.
When vSphere HA admission control is disabled, failover resource constraints are not passed on to DRS and DPM. DPM places the hosts in standby mode, even if doing so violates failover requirements.
vSphere DPM powers off a host when the cluster load is low; it considers a 40-minute load history, and all VMs on the selected host are migrated to other hosts before the host is put into standby mode.
vSphere DPM powers on a host when the cluster load is high; for this, DPM considers a 5-minute load history. A WOL packet is then sent to the selected host, which boots up, after which DRS load balances and migrates some VMs to this host.
vSphere DPM evaluates the CPU and memory resource use of each host and aims to keep each host’s resource use in the range of 45–81 percent (63 percent +/- 18 percent). ESXi hosts cannot automatically be brought out of standby mode unless vCenter Server is running in the cluster.
The power management automation levels are:
  • Off disables the DPM feature.
  • Manual sets vSphere DPM to make recommendations for host power operations; these recommendations are displayed on the cluster’s DRS tab in the vSphere Client.
  • Automatic sets vSphere DPM to execute host power operations automatically if the related VM migrations can be executed automatically by DRS.
Select Power Management and Specify Automation Level
Priority ratings are based on the amount of overutilization or underutilization found in the DRS cluster and the improvement that is expected from the intended host power state change. A priority 1 (Conservative) recommendation is mandatory. A priority 5 (Aggressive) recommendation brings only slight improvement.
vSphere DPM Set for Fully Automatic with priority level 5 “Apply all Recommendation”
When you enable vSphere DPM, hosts in the DRS cluster inherit the power management automation level of the cluster by default. You can override this default for an individual host so that its automation level differs from that of the cluster.
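DPM can also be enabled through the cluster’s configuration spec. The pyVmomi sketch below turns DPM on in automatic mode for a cluster; the vCenter and cluster names are assumptions, and the spec only touches the DPM settings.

```python
# Sketch: enable vSphere DPM on a DRS cluster in automatic mode (pyVmomi).
# The cluster name is an assumption; DRS should already be enabled on it.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == 'Production-Cluster')

dpm = vim.cluster.DpmConfigInfo()
dpm.enabled = True
dpm.defaultDpmBehavior = 'automated'    # or 'manual' for recommendation-only mode

spec = vim.cluster.ConfigSpecEx(dpmConfig=dpm)
cluster.ReconfigureComputeResource_Task(spec, modify=True)

Disconnect(si)
```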

Thanks to VMware, Information is from the white paper provided by VMware.
    

Memory Compression

With memory compression, ESXi stores pages that would otherwise be swapped out to disk through host swapping in a compression cache located in main memory. Memory compression outperforms host swapping because accessing a compressed page requires only a page decompression, which is much faster than a disk access involving disk I/O. ESXi determines whether a page can be compressed by checking the compression ratio for the page. Memory compression occurs when the page’s compression ratio is greater than 50%; otherwise, the page is swapped out.
Only pages that would otherwise be swapped out to disk are chosen as candidates for memory compression. This means ESXi will not proactively compress guest pages when host swapping is not necessary, so memory compression does not affect workload performance when host memory is undercommitted. With memory compression, a swap candidate page (4KB) is compressed and stored using 2KB of space in a per-virtual-machine compression cache.
Note: For more efficient usage of the compression cache, if a page’s compression ratio is larger than 75%, ESXi will store the compressed page using a 1KB quarter-page space.
Managing Per-VM Compression Cache
The compression cache is accounted for in the VM’s guest memory usage, so a large compression cache may waste VM memory and unnecessarily create host memory pressure, especially when most compressed pages would not be used in the future. The compression cache size starts at zero when host memory is undercommitted and grows when the VM starts to be swapped out.
If the compression cache is full, one compressed page must be replaced to make room for a new compressed page. The page that has not been accessed for the longest time is decompressed and swapped out. ESXi will not swap out compressed pages.
The maximum compression cache size is important for maintaining good VM performance.  The default maximum compression cache size (Mem.MemZipMaxPct) is set to 10% of configured VM memory size.
Configure Maximum Compression Cache Size in vSphere Client.
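As a scripted alternative to editing the setting in the vSphere Client, the pyVmomi sketch below reads and updates Mem.MemZipMaxPct on a host. The vCenter/host names, credentials, and the new value of 15 percent are assumptions.

```python
# Sketch: change the Mem.MemZipMaxPct advanced setting on a host (pyVmomi).
# The host name and the value of 15 (percent) are assumptions; the default is 10.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem], True)
host = next(h for h in view.view if h.name == 'esxi01.example.local')

opt_mgr = host.configManager.advancedOption
current = opt_mgr.QueryOptions('Mem.MemZipMaxPct')
print('Current value:', current[0].value)

# The option expects an integer (long) value; 15 means 15% of configured VM memory.
new_value = vim.option.OptionValue(key='Mem.MemZipMaxPct', value=15)
opt_mgr.UpdateOptions(changedValue=[new_value])

Disconnect(si)
```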

Thanks to VMware, Information is from the white paper provided by VMware.
    

Understanding Snapshot

A snapshot in VMware allows you to preserve the state and data of a virtual machine at the time the snapshot is taken. When you want to test a new application or update and you do not know what effect the installation will have on the current state of the virtual machine, take a snapshot to preserve the current state and then perform the installation (changes); even if nothing works, you still have the VM in a known working condition.
Snapshots to manage VMs
Snapshots preserve the state and data of a Virtual Machine at the time you take the snapshot. Snapshots are useful when you want to revert repeatedly to the same virtual machine state. Snapshots are useful as a short term solution for testing software with unknown or potentially harmful effects. For example, you can use a snapshot as a restoration point during a linear process, such as installing update packages, or during a branching process, such as installing different versions of a program. Using snapshots ensures that each installation begins from an identical baseline.
A snapshot preserves the following information:
  • Virtual machine settings: The virtual machine directory, which includes disks that were added or changed after you took the snapshot.
  • Power state: The virtual machine can be powered on, powered off, or suspended.
  • Disk state: State of all the virtual machine’s virtual disks.
  • Memory state (Optional): The contents of the virtual machine’s memory.
The relationship between snapshots is like that of a parent to a child. In the linear process, each snapshot has one parent snapshot and one child snapshot, except for the last snapshot, which has no child snapshots. Each parent snapshot can have more than one child. You can revert to the current parent snapshot or restore any parent or child snapshot in the snapshot tree and create more snapshots from that snapshot. Each time you restore a snapshot and take another snapshot, a branch, or child snapshot, is created.
A .vmsd file contains the virtual machine’s snapshot information and is the primary source of information for the Snapshot Manager. This file contains line entries, which define the relationships between snapshots and between child disks for each snapshot.
Taking a snapshot preserves the disk state at a specific time by creating a series of delta disks for each attached virtual disk or virtual RDM and optionally preserves the memory and power state by creating a memory file. Taking a snapshot creates a snapshot object in the Snapshot Manager that represents the virtual machine state and settings.
Each snapshot creates an additional delta .vmdk disk file. When you take a snapshot, the snapshot mechanism prevents the guest operating system from writing to the base .vmdk file and instead directs all writes to the delta disk file. The delta disk represents the difference between the current state of the virtual disk and the state that existed at the time that you took the previous snapshot. If more than one snapshot exists, delta disks can represent the difference between each snapshot. Delta disk files can expand quickly and become as large as the entire virtual disk if the guest operating system writes to every block of the virtual disk.
A Take Snapshot operation creates .vmdk, -delta.vmdk and .vmsn files. By default, the first and all delta disks are stored with the base .vmdk file. The .vmsn files are stored in the virtual machine directory.
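Taking a snapshot can of course also be scripted. The pyVmomi sketch below creates a snapshot of a VM, optionally including the memory state; the vCenter and VM names, the snapshot name, and the memory/quiesce choices are assumptions.

```python
# Sketch: take a snapshot of a VM, optionally including its memory state (pyVmomi).
# The VM name, snapshot name, and memory/quiesce choices are assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == 'TestVM')

# memory=True also captures the memory state (the optional part described above);
# quiesce=True instead asks VMware Tools to quiesce the guest file system.
vm.CreateSnapshot_Task(name='before-update',
                       description='Baseline before installing update packages',
                       memory=True,
                       quiesce=False)

# Reverting later to the current parent snapshot:
# vm.RevertToCurrentSnapshot_Task()
Disconnect(si)
```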

Snapshot Limitations
Snapshots can affect virtual machine performance and do not support some disk types or virtual machines configured with bus sharing. Snapshots are useful as short-term solutions for capturing point-in-time virtual machine states and are not appropriate for long-term virtual machine backups.
  • VMware does not support snapshots of raw disks, RDM physical mode disks, or guest operating systems that use an iSCSI initiator in the guest.
  • Virtual machines with independent disks must be powered off before you take a snapshot. Snapshots of powered-on or suspended virtual machines with independent disks are not supported.
  • Snapshots are not supported with PCI vSphere Direct Path I/O devices.
  • VMware does not support snapshots of virtual machines configured for bus sharing. If you require bus sharing, consider running backup software in your guest operating system as an alternative solution. If your virtual machine currently has snapshots that prevent you from configuring bus sharing, delete (consolidate) the snapshots.
  • Snapshots provide a point-in-time image of the disk that backup solutions can use, but snapshots are not meant to be a robust method of backup and recovery. If the files containing a virtual machine are lost, its snapshot files are also lost. Also, large numbers of snapshots are difficult to manage, consume large amounts of disk space, and are not protected in the case of hardware failure.
    Backup solutions, such as VMware Data Recovery, use the snapshot mechanism to freeze the state of the virtual machine. The Data Recovery backup method has additional capabilities that mitigate the limitations of snapshots.
  • Snapshots can negatively affect the performance of a virtual machine. Performance degradation is based on how long the snapshot or snapshot tree is in place, the depth of the tree, and how much the virtual machine and its guest operating system have changed from the time you took the snapshot. Also, you might see a delay in the amount of time it takes the virtual machine to power on. Do not run production virtual machines from snapshots on a permanent basis.
Thanks to VMware, Information is from the white paper provided by VMware.
    

vNetwork Load Balancing Policies

NIC Teaming
Connecting multiple physical Ethernet adapters (vmnics) to a single virtual switch is called NIC teaming. A team can share the load of traffic between physical and virtual networks among some or all of its members and provide passive failover in the event of a hardware failure or network outage.
Load Balancing
When NIC teaming is configured, load balancing spreads network traffic from virtual machines on a virtual switch across two or more physical Ethernet adapters.
Load Balancing Policies (Is Only for Outbound Traffic):
Route based on originating virtual port ID: This is the default policy; virtual ports of the vSwitch are associated with the physical Ethernet adapters. The physical NIC is determined by the ID of the virtual port to which the virtual machine is connected. Traffic from a given virtual Ethernet adapter is consistently sent to the same physical adapter unless there is a failover to another adapter in the NIC team. This method is simple and fast and does not require the VMkernel to examine the frame for any information. This setting provides an even distribution of traffic if the number of virtual Ethernet adapters is greater than the number of physical adapters.
Replies are received on the same physical adapter as the physical switch learns the port association.
A VM cannot use more than one physical Ethernet adapter at any given time unless it has multiple virtual adapters.
Route based on source MAC hash:  In this policy, each VM’s outbound traffic is mapped to a specific physical NIC that is based on the virtual NIC’s MAC address. This method has low overhead and is compatible with all switches, but it might not spread traffic evenly across the physical NICs.
An uplink is chosen based on a hash of the source Ethernet MAC address (the VM virtual NIC’s MAC hash). Traffic from a given virtual NIC is consistently sent to the same physical adapter unless there is a failover to another adapter in the NIC team. This setting provides an even distribution of traffic if the number of virtual Ethernet adapters is greater than the number of physical adapters.
Replies are received on the same physical adapter as the physical switch learns the port association.
A given virtual machine cannot use more than one physical Ethernet adapter at any given time unless it uses multiple source MAC addresses for traffic it sends.
Route based on IP hash: In this policy, a physical NIC for each outbound packet (packets sent from the VM) is chosen based on the packet’s source and destination IP addresses. The uplink selection is based on a hash of the source and destination IP addresses of each packet. (For non-IP packets, whatever is at those offsets is used to compute the hash.) This method has higher CPU overhead but a better distribution of traffic across the physical NICs associated with the vSwitch.
You can use link aggregation, grouping multiple physical adapters to create a fast network pipe for a single virtual adapter in a virtual machine. All adapters in the NIC team must be attached to the same physical switch or an appropriate set of stacked physical switches. 802.3ad link aggregation support or EtherChannel must be supported on the physical switch. The Link Aggregation Control Protocol (LACP) is a method to control the bundling of several physical ports to form a single logical channel; LACP is part of the IEEE 802.3ad specification. EtherChannel and the IEEE 802.3ad standard are similar and accomplish the same goal. EtherChannel is a port trunking technology used primarily on Cisco switches; it allows grouping several physical Ethernet links to create one logical Ethernet link for providing fault tolerance and high-speed links between switches, routers, and servers. When one VM communicates with different physical machines, different physical NICs can be chosen. On the return traffic, the packet can come in on multiple paths because more than two NICs might be teamed. Thus, link aggregation must be supported on the physical switch.
A single virtual NIC of a virtual machine might use the bandwidth of multiple physical adapters associated with vSwitch.
The physical switch sees the client MAC address on multiple ports. There is no way to predict which physical Ethernet adapter will receive inbound traffic. So the link aggregation must be supported on the physical switch.
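For reference, the teaming policy of a standard vSwitch can also be set programmatically. The pyVmomi sketch below switches vSwitch0 to IP hash load balancing; the vCenter, host, and vSwitch names are assumptions, and the physical switch must already be configured for EtherChannel/802.3ad as described above.

```python
# Sketch: set the load balancing policy on a standard vSwitch's NIC team (pyVmomi).
# Host/vSwitch names are assumptions. Valid policy strings include
# 'loadbalance_srcid' (port ID, the default), 'loadbalance_srcmac' (MAC hash),
# and 'loadbalance_ip' (IP hash - requires EtherChannel/802.3ad on the switch).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem], True)
host = next(h for h in view.view if h.name == 'esxi01.example.local')

vswitch = next(s for s in host.config.network.vswitch if s.name == 'vSwitch0')
spec = vswitch.spec
spec.policy.nicTeaming.policy = 'loadbalance_ip'   # route based on IP hash
host.configManager.networkSystem.UpdateVirtualSwitch(vswitchName='vSwitch0',
                                                     spec=spec)
Disconnect(si)
```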
Thanks to VMware, Information is from the white paper provided by VMware.
    

Software FCoE Adapter

Like the software-based iSCSI initiator, VMware introduced a software FCoE adapter in vSphere 5.0. To use the software FCoE initiator, you must have a NIC that supports FCoE offloads (for example, Intel X520); if you have such a NIC, you can add the software FCoE initiator.
  1. Create a VMkernel port and, when prompted, enter the VLAN ID used for FCoE traffic. Assign an IP address to the VMkernel port and associate the physical NIC that is physically connected to the SAN topology.
  2. Add software FCoE initiator and select the appropriate physical uplink.
The software FCoE adapter now appears in the list of storage adapters.
For the software FCoE adapter, a node WWN and a port WWN are listed. Use these values when creating the zoning or masking; the LUNs from the SAN will then be visible to the ESXi server.
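For completeness, the equivalent API call looks roughly like the pyVmomi sketch below, which activates the software FCoE adapter on top of an FCoE-capable NIC. The vCenter/host names and the vmnic are assumptions.

```python
# Sketch: activate the software FCoE adapter on an FCoE-capable NIC (pyVmomi).
# The host name and vmnic are assumptions; the NIC must support FCoE offloads.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem], True)
host = next(h for h in view.view if h.name == 'esxi01.example.local')

storage_sys = host.configManager.storageSystem
fcoe_spec = vim.host.FcoeConfig.FcoeSpecification(underlyingPnic='vmnic2')
storage_sys.DiscoverFcoeHbas(fcoeSpec=fcoe_spec)

# The new software FCoE adapter (with its node and port WWNs) then appears
# in the host's list of storage adapters.
Disconnect(si)
```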
    

How VMs Access Data on SAN Storage

ESXi provides host-level storage virtualization, which abstracts the physical storage layer from VMs.
A VM uses a virtual disk to store its operating system, program files, and other data. To access virtual disks, a VM uses virtual SCSI controllers (BusLogic Parallel, LSI Logic Parallel, LSI Logic SAS, and VMware Paravirtual). To a VM, each virtual disk appears as if it were a SCSI drive connected to a SCSI controller. Whether the actual physical disk device is being accessed through parallel SCSI, iSCSI, network, or Fibre Channel adapters on the host is transparent to the guest operating system and to applications running on the VM.
When guest operating systems issue SCSI commands to their virtual disks, the SCSI virtualization layer translates these commands to VMFS file operations.
  1. When the guest operating system in a VM reads from or writes to a SCSI disk, it issues SCSI commands to the virtual disk.
  2. Device drivers in the VM’s operating system communicate with the virtual SCSI controllers.
  3. The virtual SCSI controller forwards the command to the VMkernel.
  4. The VMkernel performs the following tasks:
    a. Locates the file in the VMFS datastore that corresponds to the VM disk.
    b. Maps the requests for the blocks on the virtual disk to blocks on the appropriate physical device.
    c. Sends the modified I/O request from the device driver in the VMkernel to the physical HBA (FC HBA or iSCSI initiator).
  5. Depending on the adapter type, one of the following occurs:
    If the adapter is a physical FC HBA, it performs the following tasks:
    a. Packages the I/O request according to the rules of the FC protocol.
    b. Transmits the request to the SAN.
    If the iSCSI initiator is a hardware iSCSI adapter (independent or dependent), the adapter performs the following tasks:
    a. Encapsulates I/O requests into iSCSI Protocol Data Units (PDUs).
    b. Encapsulates iSCSI PDUs into TCP/IP packets.
    c. Sends IP packets over Ethernet to the iSCSI storage system.
    If the iSCSI initiator is a software iSCSI adapter, the following takes place:
    a. The iSCSI initiator encapsulates I/O requests into iSCSI Protocol Data Units (PDUs).
    b. The initiator sends iSCSI PDUs through TCP/IP packets.
    c. The VMkernel TCP/IP stack relays the TCP/IP packets to a physical NIC.
    d. The physical NIC sends IP packets over Ethernet to the iSCSI storage system.
  6. Depending on the port type (FC or iSCSI), the SAN switches receive the request and route it to the storage device that the host wants to access.
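To see the virtual SCSI controller and virtual disk pairing described above for a particular VM, a small pyVmomi sketch such as the following can be used (the vCenter and VM names are assumptions):

```python
# Sketch: list a VM's virtual SCSI controllers and the virtual disks attached
# to them (pyVmomi). The VM name is an assumption.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == 'TestVM')

devices = vm.config.hardware.device
for ctrl in devices:
    if isinstance(ctrl, vim.vm.device.VirtualSCSIController):
        print('Controller:', ctrl.deviceInfo.label, type(ctrl).__name__)
        for disk in devices:
            if (isinstance(disk, vim.vm.device.VirtualDisk)
                    and disk.controllerKey == ctrl.key):
                print('  Disk:', disk.deviceInfo.label,
                      disk.backing.fileName, disk.capacityInKB, 'KB')
Disconnect(si)
```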
Thanks to VMware, Information is from the white paper provided by VMware.
    

Storage vMotion

Storage vMotion leverages the same technology that is used for vMotion but applies it to the migration of virtual disk files. Storage vMotion allows VMware to implement a new patented load balancing technique for virtual machines based on storage usage and load. Storage vMotion can also be performed on individual virtual machines. Storage vMotion is storage-type independent and works across NFS datastores as well as across VMFS datastores on Fibre Channel, iSCSI, and local SCSI storage.
The Process:
The Storage vMotion process is fairly straightforward and not as complex as one might expect.
  1. The virtual machine working directory is copied by VPXA to the destination datastore.
  2. A “shadow” virtual machine is started on the destination datastore using the copied files. The “shadow” virtual machine idles, waiting for the copying of the virtual machine disk file(s) to complete. (A new vpxa process is started on the same host.)
  3. Storage vMotion enables the Storage vMotion Mirror driver to mirror writes of already copied blocks to the destination.
  4. In a single pass, a copy of the virtual machine disk file(s) is completed to the target datastore while mirroring I/O.
  5. Storage vMotion invokes a Fast Suspend and Resume of the virtual machine (similar to vMotion) to transfer the running virtual machine over to the idling shadow virtual machine.
  6. After the Fast Suspend and Resume completes, the old home directory and VM disk files are deleted from the source datastore.
Note: The shadow VM is only created when the VM home directory is moved. If it is a “disks-only” Storage vMotion, the VM is simply fast suspended and resumed.
 
Mirror Driver
The mirror driver mirrors the I/O: when a VM that is being Storage vMotioned writes to disk, the write is committed to both the source and the destination disk. The write is acknowledged to the VM only when both the source and the destination have acknowledged it. Because of this, it is unnecessary to do re-iterative copies.
Datamover
The hypervisor uses a component called the datamover for copying data and when provisioning new virtual machines. The datamover was first introduced with ESX 3.0 and is utilized by Storage vMotion as blocks will need to be copied between datastores.
  • fsdm – This is the legacy 3.0 datamover which is the most basic version and the slowest as the data moves all the way up the stack and down again.
  • fs3dm – This datamover was introduced with vSphere 4.0 and contained some substantial optimizations so that data does not travel through all stacks.
  • fs3dm – hardware offload – This is the VAAI hardware offload full copy that is leveraged and was introduced with vSphere 4.1. Maximum performance and minimal host CPU/Memory overhead.
In ESXi, if a VMFS volume with a different block size or a different array is selected as the destination, ESXi reverts to the legacy datamover (fsdm). If the same block sizes are used, the new datamover (fs3dm) is utilized. Depending on the capabilities of the array, the task is performed via the software stack or offloaded to the array through the use of VAAI.
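A Storage vMotion can be started through the API by relocating a powered-on VM to another datastore. The pyVmomi sketch below is a minimal, hedged example; the vCenter, VM, and datastore names are assumptions.

```python
# Sketch: trigger a Storage vMotion by relocating a powered-on VM's files to
# another datastore (pyVmomi). The VM and datastore names are assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()

vms = content.viewManager.CreateContainerView(content.rootFolder,
                                              [vim.VirtualMachine], True)
dss = content.viewManager.CreateContainerView(content.rootFolder,
                                              [vim.Datastore], True)
vm = next(v for v in vms.view if v.name == 'TestVM')
target_ds = next(d for d in dss.view if d.name == 'Datastore02')

spec = vim.vm.RelocateSpec(datastore=target_ds)   # move home directory and disks
vm.RelocateVM_Task(spec)                          # Storage vMotion when powered on

Disconnect(si)
```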
Thanks to VMware, Information is from the white paper provided by VMware.