Troubleshooting CloudStack
Likitha Shetty, Rajesh Battala & Sailaja Mada
Tuesday, April 11, 2023
Agenda
ACS Developer
• ACS Error codes
• Debugging tips in ACS development
• SSVM troubleshooting
• ACS ports
ACS Cloud Admin
• Install, Configuration & Deployment
• Log analysis
• Important Global Config Parameters
• Best Practices
• Cloud Database
• Reusing Hypervisors
References
Q & A
ACS Developer
ACS error codes
- Client error codes
  - public static final int MALFORMED_PARAMETER_ERROR = 430;
  - public static final int PARAM_ERROR = 431;
  - public static final int UNSUPPORTED_ACTION_ERROR = 432;
  - public static final int PAGE_LIMIT_EXCEED = 433;
- Server error codes
  - public static final int INTERNAL_ERROR = 530;
  - public static final int ACCOUNT_ERROR = 531;
  - public static final int ACCOUNT_RESOURCE_LIMIT_ERROR = 532;
  - public static final int INSUFFICIENT_CAPACITY_ERROR = 533;
  - public static final int RESOURCE_UNAVAILABLE_ERROR = 534;
  - public static final int RESOURCE_ALLOCATION_ERROR = 534;
  - public static final int RESOURCE_IN_USE_ERROR = 536;
  - public static final int NETWORK_RULE_CONFLICT_ERROR = 537;
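The two groups above can be turned into a small lookup when reading API failures. The Java sketch below is illustrative only: ErrorCodeLookup is a hypothetical helper (not a CloudStack class), it assumes the 4xx/5xx split shown on the slide, and it keeps the duplicated 534 value exactly as listed.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of CloudStack: maps the API error codes listed
// above to their constant names and to a client/server category.
final class ErrorCodeLookup {
    private static final Map<Integer, String> NAMES = new TreeMap<>();

    static {
        NAMES.put(430, "MALFORMED_PARAMETER_ERROR");
        NAMES.put(431, "PARAM_ERROR");
        NAMES.put(432, "UNSUPPORTED_ACTION_ERROR");
        NAMES.put(433, "PAGE_LIMIT_EXCEED");
        NAMES.put(530, "INTERNAL_ERROR");
        NAMES.put(531, "ACCOUNT_ERROR");
        NAMES.put(532, "ACCOUNT_RESOURCE_LIMIT_ERROR");
        NAMES.put(533, "INSUFFICIENT_CAPACITY_ERROR");
        // The list above assigns 534 to two constants, so a reverse lookup
        // cannot tell them apart.
        NAMES.put(534, "RESOURCE_UNAVAILABLE_ERROR or RESOURCE_ALLOCATION_ERROR");
        NAMES.put(536, "RESOURCE_IN_USE_ERROR");
        NAMES.put(537, "NETWORK_RULE_CONFLICT_ERROR");
    }

    private ErrorCodeLookup() {}

    // Assumption: 4xx codes are client errors and 5xx codes are server errors,
    // following the two groups on the slide.
    static String category(int code) {
        if (code >= 400 && code < 500) return "client";
        if (code >= 500 && code < 600) return "server";
        return "unknown";
    }

    // Returns the constant name(s) for a code, or "UNKNOWN" if it is not listed.
    static String name(int code) {
        return NAMES.getOrDefault(code, "UNKNOWN");
    }
}
```

For example, ErrorCodeLookup.name(533) returns "INSUFFICIENT_CAPACITY_ERROR" and ErrorCodeLookup.category(533) returns "server".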
Debugging tips in CS development
- Generally, use Eclipse to attach a debugger to the management server
- SystemVM agents
  - Stop the cloud service
  - Add -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8787 to /usr/local/cloud/systemvm/_run.sh
  - Open port 8787
  - Start the Java process: ./run.sh
- Usage
  - To check whether events are being logged, check the usage_event table in the cloud DB
  - To start the usage server in a dev setup:
    mvn -pl usage -Drun -Dpid=$$
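Before attaching Eclipse, it can help to confirm that the JVM really started with the debug agent. A minimal Java sketch, assuming a hypothetical JdwpCheck helper: it recognizes both the -Xrunjdwp form used above and the newer -agentlib:jdwp= form, and reads a JVM's own startup arguments through the standard RuntimeMXBean. The same parsing can be applied to the argument list of any java process.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: finds a JDWP debug agent option in a JVM argument list
// and returns its address value.
final class JdwpCheck {
    private JdwpCheck() {}

    // Returns the address=... value of the first JDWP option, an empty string if
    // the agent is present without an explicit address, or Optional.empty() if
    // no JDWP agent option is present.
    static Optional<String> jdwpAddress(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            String opts = null;
            if (arg.startsWith("-Xrunjdwp:")) {
                opts = arg.substring("-Xrunjdwp:".length());
            } else if (arg.startsWith("-agentlib:jdwp=")) {
                opts = arg.substring("-agentlib:jdwp=".length());
            }
            if (opts == null) {
                continue;
            }
            for (String kv : opts.split(",")) {
                if (kv.startsWith("address=")) {
                    return Optional.of(kv.substring("address=".length()));
                }
            }
            return Optional.of("");
        }
        return Optional.empty();
    }

    // Checks the JVM this code is running in.
    static Optional<String> currentJvm() {
        return jdwpAddress(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```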
SSVM troubleshooting
- Login
  - ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@<ip>, where <ip> is the SSVM's link-local IP on XenServer and its private IP on VMware
- Script to check the health of the SSVM
  - /usr/local/cloud/systemvm/ssvm-check.sh
- Check that port 8250 is open
- Check that the 'host' global configuration parameter is set to the management server IP
- Check the agent status: service cloud status
- Logs can be found at
  - /var/log/cloud/cloud.log
- Template status can be found in the template_store_ref DB table
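The "port 8250 is open" check can be done as a plain TCP connect with a timeout. A minimal Java sketch; PortProbe is a hypothetical helper, and a successful connect only shows that something is listening on the port, not that the agent handshake will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: TCP reachability probe for a host and port.
final class PortProbe {
    private PortProbe() {}

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }
}
```

For example, PortProbe.isOpen("<management server IP>", 8250, 3000) run from a machine with a JVM tests the agent port; the host and the 3-second timeout are placeholders.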
And a couple more …
- DB Encryption
  To decrypt a value encrypted with the database secret key, use the following:
  java -classpath /usr/share/java/cloud-jasypt-1.8.jar org.jasypt.intf.cli.JasyptPBEStringDecryptionCLI decrypt.sh input=<encryptedValue> password=<secretKey> verbose=false
  (where secretKey is the value in the /etc/cloudstack/management/key file)
- GUI timeout
  - The default timeout is 15 minutes.
  - To increase the timeout, edit /usr/share/cloud/management/webapps/client/WEB-INF/web.xml and add (the value is in minutes):
    <session-config>
      <session-timeout>60</session-timeout>
    </session-config>
- Restart the management server
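The DB Encryption command above relies on jasypt's password-based (PBE) string decryption. The Java sketch below shows the same kind of round trip with the JDK's own crypto classes, under assumptions that have not been checked against a CloudStack install: algorithm PBEWithMD5AndDES, 1000 key-derivation iterations, an 8-byte random salt prepended to the ciphertext, and Base64 encoding. PbeSketch is a hypothetical name; use the jasypt CLI above for real values.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.PBEParameterSpec;

// Hypothetical sketch of PBE string encryption; the parameters below are
// assumptions modelled on jasypt's defaults, not values read from CloudStack.
final class PbeSketch {
    private static final String ALGORITHM = "PBEWithMD5AndDES";
    private static final int ITERATIONS = 1000;
    private static final int SALT_BYTES = 8;

    private PbeSketch() {}

    // Derives the key from the password and initializes a cipher for the given salt.
    private static Cipher cipher(int mode, String password, byte[] salt)
            throws GeneralSecurityException {
        SecretKey key = SecretKeyFactory.getInstance(ALGORITHM)
                .generateSecret(new PBEKeySpec(password.toCharArray()));
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(mode, key, new PBEParameterSpec(salt, ITERATIONS));
        return cipher;
    }

    // Output layout: Base64(salt || ciphertext).
    static String encrypt(String plain, String password) throws GeneralSecurityException {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        byte[] body = cipher(Cipher.ENCRYPT_MODE, password, salt)
                .doFinal(plain.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[SALT_BYTES + body.length];
        System.arraycopy(salt, 0, out, 0, SALT_BYTES);
        System.arraycopy(body, 0, out, SALT_BYTES, body.length);
        return Base64.getEncoder().encodeToString(out);
    }

    // Splits the salt off the front, then decrypts the remaining bytes.
    static String decrypt(String encoded, String password) throws GeneralSecurityException {
        byte[] all = Base64.getDecoder().decode(encoded);
        byte[] salt = Arrays.copyOfRange(all, 0, SALT_BYTES);
        byte[] body = Arrays.copyOfRange(all, SALT_BYTES, all.length);
        byte[] plain = cipher(Cipher.DECRYPT_MODE, password, salt).doFinal(body);
        return new String(plain, StandardCharsets.UTF_8);
    }
}
```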
ACS Ports
- Management Server
  - 8080: Primary GUI / Authentication API port
  - 8096: User/Client Management Server (unauthenticated)
  - 8787: CloudStack (Tomcat) debug socket
  - 9090: CloudStack Management Cluster Interface
- SystemVM Agent
  - 3922: SystemVM to Management (secure)
  - 8250: SystemVM to Management (unsecure)
- MySQL Server
  - 3306: MySQL Server
- Hypervisor
  - 22/443: XenServer
  - 22: KVM
  - 443: vCenter
- 7080: AWS API server
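The port list above can also be kept as data, which makes it quick to answer "what should be listening on port X?" while reading firewall rules or netstat output. A Java sketch; AcsPorts and its component keys are hypothetical labels for this example, and the numbers are exactly those listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: the ACS port list as a component -> ports table.
final class AcsPorts {
    static final Map<String, List<Integer>> PORTS = new LinkedHashMap<>();

    static {
        PORTS.put("management-server", List.of(8080, 8096, 8787, 9090)); // 8787 only for debugging
        PORTS.put("systemvm-agent", List.of(3922, 8250));
        PORTS.put("mysql", List.of(3306));
        PORTS.put("xenserver", List.of(22, 443));
        PORTS.put("kvm", List.of(22));
        PORTS.put("vcenter", List.of(443));
        PORTS.put("aws-api", List.of(7080));
    }

    private AcsPorts() {}

    // Ports listed for a component, or an empty list if the key is unknown.
    static List<Integer> portsFor(String component) {
        return PORTS.getOrDefault(component, List.of());
    }

    // Reverse lookup: components that use a given port, in table order.
    static List<String> componentsUsing(int port) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : PORTS.entrySet()) {
            if (entry.getValue().contains(port)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```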
ACS Administrator
• Install, Configuration & Deployment
• Logs
• Global Config Parameters
• Cloud database
• Reusing Hypervisors
• Best Practices
Install, Configuration & Deployment Issues
• Failed to log in to the ACS Management Server
  ACS 4.2 requires a minimum of 2 GB RAM. Redeploy the DB and run cloudstack-setup-management.
• Issues with instances in an isolated network
  Check VLAN trunking in the switch port configuration.
• Failed to deploy instances
  Check for insufficient resources by analyzing the management server log.
• Failed to add host
  - XCP host: copy the echo plugin
  - Host license
  - Use compatible hosts when creating a cluster
• Host/storage pool in avoid set
  - Reachability issues
  - Timeout
  - Capacity of the storage pool/host
  - Host in Alert state
• Move XS hosts from Alert state
  - Restart the cloud management service, which reconnects to the hosts.
  - Unmanage the cluster containing the affected host.
  - Clear the host tags of the affected host:
    xe host-param-clear param-name=tags uuid=<UUID of affected host>
  - Manage the cluster containing the affected host again.
• Host in Alert state
  Monitor host root disk usage.
• SSVM access on VMware
  On VMware, system VMs do not have a link-local IP, so access the SSVM from the management server (CCP server) using its private IP address:
  ssh -i /var/cloudstack/management/.ssh/id_rsa -p 3922 root@<Private IP>
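For the "monitor host root disk usage" item above, the JDK can report filesystem capacity directly. A minimal Java sketch, where RootDiskMonitor is a hypothetical helper and the 90% threshold is an arbitrary example rather than a CloudStack setting; it has to run on the host being checked, and the figure is approximate because it is derived from total and usable space.

```java
import java.io.File;

// Hypothetical helper: flags a filesystem whose usage exceeds a threshold.
final class RootDiskMonitor {
    // Example threshold only (assumption), not a CloudStack parameter.
    static final double DEFAULT_THRESHOLD = 0.90;

    private RootDiskMonitor() {}

    // Fraction of the filesystem in use, computed from total and usable bytes.
    static double usedFraction(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        return (double) (totalBytes - usableBytes) / totalBytes;
    }

    // True if the filesystem holding path is more than threshold full.
    static boolean isAboveThreshold(File path, double threshold) {
        return usedFraction(path.getTotalSpace(), path.getUsableSpace()) > threshold;
    }
}
```

For example, RootDiskMonitor.isAboveThreshold(new File("/"), RootDiskMonitor.DEFAULT_THRESHOLD) checks the root filesystem of the machine it runs on.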
Logs
- Management Server logs
  - /var/log/cloudstack/management/management-server.log
  - /var/log/cloudstack/api.log
- SSVM
  - /var/log/cloud/cloud.out
- KVM CloudStack agent
  - /var/log/cloudstack/agent/agent.log
- vSphere logs
  - /var/log/hostd.log (host log)
  - /var/log/vmkernel.log (kernel log)
  - /var/log/vpxa.log (agent log)
- XenServer logs
  - /var/log/SMlog
  - /var/log/xensource.log
- For more detail, set the priority to TRACE in /etc/cloudstack/management/log4j-cloud.xml
  Levels: FATAL, ERROR, WARN, INFO, DEBUG, TRACE
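With TRACE enabled the logs get noisy, so filtering by severity helps. A Java sketch of log4j-style thresholds (a message is emitted when its level is at least as severe as the configured priority) plus a line filter. LogLevels is hypothetical, and the filter assumes each line carries its level as a separate whitespace-delimited token, which depends on the layout configured in log4j-cloud.xml.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: log4j-style level ordering and a severity filter for log lines.
final class LogLevels {
    // Ordered from most to least severe, matching the level list above.
    enum Level { FATAL, ERROR, WARN, INFO, DEBUG, TRACE }

    private LogLevels() {}

    // A message is emitted when it is at least as severe as the threshold.
    static boolean isEnabled(Level threshold, Level message) {
        return message.ordinal() <= threshold.ordinal();
    }

    // Keeps lines whose first level-like token is at least as severe as minLevel.
    // Assumption: the level appears as its own token before any message text.
    static List<String> filter(List<String> lines, Level minLevel) {
        return lines.stream()
                .filter(line -> {
                    for (String token : line.split("\\s+")) {
                        for (Level level : Level.values()) {
                            if (token.equals(level.name())) {
                                return isEnabled(minLevel, level);
                            }
                        }
                    }
                    return false;
                })
                .collect(Collectors.toList());
    }
}
```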
Global Config Parameters
- expunge.delay: Determines how long (in seconds) to wait before actually expunging a destroyed VM. Default: 60 (the default value of expunge.interval)
- expunge.interval: The interval (in seconds) to wait before running the expunge thread. Default: 60
- expunge.workers: Number of workers performing expunge. Default: 1
- network.gc.interval: Seconds to wait before checking for networks to shut down. Default: 600
- network.gc.wait: Time (in seconds) to wait before shutting down a network that is not in use. Default: 600
- pool.storage.allocated.capacity.disablethreshold: Percentage (as a value between 0 and 1) of allocated storage utilization above which allocators stop using the pool because too little allocatable storage remains. Default: 1
- secstorage.allowed.internal.sites: Comma-separated list of CIDRs internal to the datacenter that can host template download servers. Note that 0.0.0.0 is not a valid site.
- wait: Time in seconds to wait for control commands to return. Default: 1800
- vmware.vcenter.session.timeout: VMware client timeout in seconds. Default: 12000
- integration.api.port: Integration API port. Default: 8096
- storage.cleanup.interval: The interval (in seconds) to wait before running the storage cleanup thread. Default: 86400
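Several of these parameters pair a wait time with a thread interval. Assuming the thread wakes every interval seconds and acts on objects that have waited at least the wait time, the worst case before action is roughly wait + interval. The Java sketch below encodes that arithmetic and the pool disable-threshold check; ConfigMath is hypothetical, and both rules are simplifications of what the cleanup threads and allocators actually do.

```java
// Hypothetical helper, not CloudStack code: back-of-the-envelope arithmetic
// for the wait/interval parameters and the storage disable threshold.
final class ConfigMath {
    private ConfigMath() {}

    // Assumption: a periodic thread wakes every intervalSeconds and acts on objects
    // that have waited at least waitSeconds, so the upper bound is their sum.
    static long worstCaseSeconds(long waitSeconds, long intervalSeconds) {
        return waitSeconds + intervalSeconds;
    }

    // True if allocated/capacity does not exceed the threshold (a fraction 0..1),
    // i.e. allocators may still use the pool under this simplified rule.
    static boolean poolUsable(long allocatedBytes, long capacityBytes, double threshold) {
        if (capacityBytes <= 0) {
            return false;
        }
        return (double) allocatedBytes / capacityBytes <= threshold;
    }
}
```

Under that assumption, the defaults above give about 120 seconds for expunge.delay + expunge.interval and about 1200 seconds for network.gc.wait + network.gc.interval.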
Cloud Database
Useful tables: op_dc_vnet_alloc, op_dc_ip_address_alloc, user_ip_address, image_store, vm_template, template_store_ref, volume, storage_pool, host, vm_instance, nics, network_offering, physical_network_traffic_types
Reusing Hypervisors
XenServer
• xe vm-uninstall --multiple --force
• Unmount storage
• xe vif-unplug uuid=<uuid>
• xe vif-destroy uuid=<uuid>
• xe network-destroy uuid=<cloud link local uuid>
• sh /opt/xensource/bin/cloud-clean-vlan.sh
• Remove the cloud tags created on the host
VMware
• Delete all instances
• Delete templates
• Unmount datastores
• Remove all cloud networks
Best Practices
- Configure switch ports correctly (VLANs must be trunked).
- Restrict the IP addresses that can access storage, to avoid data loss.
- Monitor host disk space.
- All hosts must be 64-bit and must support HVM (Intel VT or AMD-V enabled). All hosts within a cluster must be homogeneous.
- The volumes used for primary and secondary storage should be accessible from the Management Server and the hypervisors, and should allow root users to read and write data. These volumes must be for the exclusive use of CloudStack and should not contain any other data.
- All resources used for CloudStack must be used for CloudStack only. CloudStack cannot share ESXi instances or storage with other management consoles. Do not share the storage volumes used by CloudStack with a different set of ESXi servers that are not managed by CloudStack.
- The Management Servers communicate with the XenServers on ports 22 (SSH), 80 (HTTP) and 443 (HTTPS).
- The Management Servers communicate with VMware vCenter servers on port 443 (HTTPS).
- The Management Servers communicate with the KVM servers on port 22 (SSH).
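The 64-bit and HVM requirement can be checked on a Linux host from /proc/cpuinfo: the lm flag indicates 64-bit long mode, vmx indicates Intel VT-x and svm indicates AMD-V. A Java sketch with a hypothetical HostCpuCheck helper. These flags describe CPU capability and may not show whether virtualization is enabled in the BIOS, and inside a XenServer dom0 they may be hidden, so treat a negative result there with care.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: reads CPU feature flags from /proc/cpuinfo text.
final class HostCpuCheck {
    private HostCpuCheck() {}

    // Collects the flags from the first "flags" line of cpuinfo text.
    static Set<String> flags(String cpuinfo) {
        for (String line : cpuinfo.split("\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equals("flags")) {
                String values = line.substring(colon + 1).trim();
                return new HashSet<>(Arrays.asList(values.split("\\s+")));
            }
        }
        return new HashSet<>();
    }

    // "lm" = long mode, i.e. a 64-bit capable CPU.
    static boolean is64Bit(Set<String> flags) {
        return flags.contains("lm");
    }

    // "vmx" = Intel VT-x, "svm" = AMD-V.
    static boolean supportsHvm(Set<String> flags) {
        return flags.contains("vmx") || flags.contains("svm");
    }

    // Reads /proc/cpuinfo on the current Linux machine.
    static Set<String> localFlags() throws IOException {
        return flags(Files.readString(Path.of("/proc/cpuinfo")));
    }
}
```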
References
- https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM%2C+templates%2C+Secondary+storage+troubleshooting
- https://cwiki.apache.org/confluence/display/CLOUDSTACK/Ports+used+by+CloudStack
- http://dlafferty.blogspot.in/2013/08/using-cloudstacks-log-files-xenserver.html
Get Involved
Web: http://cloudstack.apache.org/
Mailing Lists: cloudstack.apache.org/mailing-lists.html
IRC: irc.freenode.net: 6667 #cloudstack
Twitter: @cloudstack
LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859
If it didn’t happen on the mailing list, it didn’t happen.