Various Troubleshooting Notes

By admin, January 11, 2014 7:38 pm

Yesterday felt like one long fight: different parts started to fall apart within a few hours. First it was one of the ESX hosts, then the EqualLogic array, and finally Group Manager and iDRAC problems. People say "Shxt happens", and that fits my case exactly!

One thing I'm glad about is that Dell fulfilled its promise this time and fixed everything (hardware-wise, of course) within the 4-hour ProSupport contract window. The poor technician had to go to the NOC twice and worked until almost midnight, with me working remotely.

So let me list the problems in order:

1. ESX Host:
I suddenly received a host failure alert. vCenter showed the problem host as disconnected, and all the VMs on it were greyed out. The funny thing is that all the VMs could still be pinged and functioned perfectly normally, as if nothing was wrong.

Telnet, SSH and even the console hung completely, so there was no way to log in as root, and OpenManage wouldn't load. Later I found out from the iDRAC system log that a 15K 146GB disk in a RAID1 configuration had failed.

Worse still, the replacement disk did not start to rebuild. Dell's technician then went into the MegaRAID BIOS utility and found that he had to add the disk back manually. I suspect the cause is that the replacement disk is a Fujitsu whereas the faulty disk was a Hitachi, which is why they didn't work together at first. (In theory they should, but in reality, NO.)

At this stage, since there was no way to remove the live VMs or do a vMotion, I had no choice but to power down the host manually. Even stranger, HA didn't kick in: none of the VMs restarted on other hosts in the cluster, even after 5 minutes.

The whole rebuild took about 15 minutes, thanks to RAID1. The rebuild status in OpenManage stayed at 33% even though the disk light had stopped blinking (meaning the rebuild had completed), funny! After another reboot, the optimal status could be verified in the MegaRAID BIOS and was later reflected in OpenManage too, which suggests OpenManage takes time to fetch status from the different hardware components.

So I still have no clue why a faulty disk in a RAID1 set caused the ESX host to become unresponsive.

2. EqualLogic:

I received the following notice multiple times by email. Group Manager showed it as Information type, in green, but I sensed something must be wrong, so I called Dell EQL support. As expected, local support knew nothing about it.

—————————————–
INFO event from storage array eql01
subsystem: SP
event: 14.2.22
time: Fri Jan 10 12:10:30 2014

I/Os containing bad blocks were read from drive 10 and successfully reconstructed in the last 8 minutes.
—————————————–

Some eight hours after the notice quoted above (the timestamps show 12:10 and 20:11), the following error alert confirmed my earlier worry.

—————————————–
ERROR event from storage array eql01
subsystem: SP
event: 14.4.22
time: Fri Jan 10 20:11:15 2014

Disk drive 10 failed in RAID LUN 0.
—————————————–

So the earlier notice was actually EQL's predictive failure detection in action!!!
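Since the early warning arrived as a green INFO mail that is easy to overlook, a small filter over the alert mails could flag it. The following is only a minimal sketch: the field layout is assumed from the two alerts quoted above, and the function names and the "bad blocks" rule are my own illustration, not anything provided by EqualLogic.

```python
import re
from datetime import datetime

# Header layout assumed from the two alert e-mails quoted in this post.
HEADER = re.compile(r"^(?P<severity>[A-Z]+) event from storage array (?P<array>\S+)$")

def parse_eql_alert(text):
    """Parse one alert body into a dict; return None if no header line is found."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Drop blank lines and separator lines made only of dashes.
    lines = [ln for ln in lines if ln.strip("\u2014\u2013-")]
    event, message = None, []
    for ln in lines:
        m = HEADER.match(ln)
        if m:
            event = {"severity": m.group("severity"), "array": m.group("array")}
            continue
        if event is None:
            continue  # ignore anything before the header
        key, sep, value = ln.partition(": ")
        if sep and key in ("subsystem", "event", "time"):
            event[key] = value
        else:
            message.append(ln)
    if event is None:
        return None
    if "time" in event:
        # English day/month names are assumed (C or English locale).
        event["time"] = datetime.strptime(event["time"], "%a %b %d %H:%M:%S %Y")
    event["message"] = " ".join(message)
    return event

def is_early_warning(event):
    """Hypothetical rule: an INFO event that mentions bad blocks deserves a look."""
    return event["severity"] == "INFO" and "bad blocks" in event["message"]

INFO_ALERT = """
INFO event from storage array eql01
subsystem: SP
event: 14.2.22
time: Fri Jan 10 12:10:30 2014

I/Os containing bad blocks were read from drive 10 and successfully reconstructed in the last 8 minutes.
"""

ERROR_ALERT = """
ERROR event from storage array eql01
subsystem: SP
event: 14.4.22
time: Fri Jan 10 20:11:15 2014

Disk drive 10 failed in RAID LUN 0.
"""

if __name__ == "__main__":
    for body in (INFO_ALERT, ERROR_ALERT):
        ev = parse_eql_alert(body)
        print(ev["severity"], ev["event"], "early warning:", is_early_warning(ev))
```

Matching on the message text rather than on the event number is a deliberate choice here, because I only have these two samples and do not know how the event numbering is organised.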

SANHQ also generated a similar alert.

Warning conditions:

  • 1/10/2014 8:10:50 PM to 1/10/2014 8:12:50 PM
    • Warning: Member eql01 RAID Set Is Degraded
      • Warning: Member eql01 RAID set is degraded because a disk drive failed or was removed.
    • Warning: Member eql01 RAID More Spares Expected
      • Warning: Member eql01 The current RAID configuration requires more spare drives then are currently available.
    • Warning: Member eql01 has a failed drive in slot 10

With the replacement disk inserted, reconstruction started immediately and took about 1 hour to complete, again thanks to RAID1.

3. EQL Group Manager

I needed to verify that the replaced EQL disk had successfully become a hot spare, but then found I could no longer log in to EQL Group Manager because of a strange Java error, in both IE and Firefox. The installed Java version was v7 u45; I tried different versions until I found that only v7 u17 worked. My conclusion is that the EQL firmware plays a big role here: I am still on v5.2.2, so EQL probably hard-coded the requirement into their application. In any case, the Java JRE version always causes nasty problems in my environment one way or another, so I've decided not to upgrade it.

4. iDRAC

Back to the disconnected host with the faulty disk: I found I could no longer log in to the iDRAC web UI. IE worked but produced all sorts of problems, not to mention that the console, with its ActiveX stuff, didn't show up at all. I even tried removing the iDRAC certificate under the advanced options and rebooting the managed machine, but neither helped. It turned out that simply clearing the content cache in Firefox solved the problem completely! Ridiculous, really!

If it still doesn't work, do a soft reset with “racadm racreset soft”.

5. Veeam

Yes, it's not finished yet. I also found that Veeam's scheduled jobs had stopped working. I am still on v5.0.1, and there is a Veeam KB and an update (v5.0.2) for this issue, but I can't explain why it had worked for 3+ years and then suddenly stopped for no reason. So I removed all the old backups and created a new full backup; tomorrow morning will tell, and I shall verify the scheduled job again then.

Update: I had to install the update to fix the scheduled jobs not running. Also remember to close all the extra TCP/UDP ports that are re-enabled by the Veeam B&R upgrade. (Potential risks: Veeam Agent, NFS and Windows shares in particular.)
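To check whether ports were re-opened after the upgrade, a quick TCP probe from another machine can help. This is a rough sketch only: the function name is my own, the port list is something you supply yourself (I am not listing Veeam's actual ports here), and UDP ports cannot be checked with a simple connect like this.

```python
import socket

def open_tcp_ports(host, ports, timeout=1.0):
    """Return the subset of `ports` on `host` that accept a TCP connection."""
    reachable = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                reachable.append(port)
    return reachable

if __name__ == "__main__":
    # Hypothetical host name and port list; replace with your own.
    print(open_tcp_ports("backup-server.example", [111, 135, 445, 2049]))
```

Run it before and after the upgrade and compare the two lists; anything new in the second list is a candidate for closing in the firewall.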

Updated:

Restarting the management agents on ESX may help:

  1. Log in to your ESX Server as root (by su -) from either an SSH session or directly from the console of the server.
  2. Type “service mgmt-vmware restart”.
    Caution: Ensure Automatic Startup/Shutdown of virtual machines is disabled before running this command or you risk rebooting the virtual machines.
  3. Press Enter.
  4. Type “service vmware-vpxa restart”.
  5. Press Enter.
  6. Type “logout” and press Enter to disconnect from the ESX host.

Autoart: Licensing Is No Small Matter (Repost)

By admin, January 11, 2014 5:35 pm

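Everything in this latest AA newsletter is fact. I also believe that prices for this kind of labour-intensive, hand-built industrial product are unlikely to come down, so the earlier you buy, the better. BBR/HPI, seeing no profit to be made, recently announced their withdrawal from the diecast model car market.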

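I also agree that high-end precision model cars in particular will surely keep appreciating over time, because no manufacturer will be willing to invest in producing such low-margin goods any more. This can be seen in the record-breaking sale prices on international auction sites in recent years: model car collectors around the world have cast their vote of confidence through their actions.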

The one thing I don't understand is how Exoto has managed to stay standing for so many years without licences.

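Making model cars often involves licensing. When producing a model car, the model maker has to pay a royalty to the car manufacturer. Before 1990, intellectual property received little attention; a model maker only had to ask the car manufacturer for permission to make the car in question, and royalties were rarely raised. Car manufacturers were happy to see these models on sale because they promoted the image of the real cars. They were especially keen on toy model cars, since these products helped educate the next generation and drew children into becoming real car buyers once they grew up. The model licence agreements drafted in those days often ran to only a few pages and merely set out the car manufacturer's basic conditions.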

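By the mid-1990s, competition in the car market was heating up. New car prices could only be adjusted downward to stay competitive. While working to cut costs at every stage of manufacturing, car makers also sought new ways to increase revenue. From the mid-1990s on, the royalties payable for making model cars were gradually recognised by car manufacturers as an additional source of income.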

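In those days, labour costs in China were very low: an unskilled worker earned only about US$40 a month. This allowed model cars to be made cheaply and well; a 1:18 diecast model car retailed for under US$20 and could easily sell hundreds of thousands of units. Car manufacturers realised that, once model car developers could sell such large volumes, it was time to make their move. From then on, applying for a model licence was no longer a matter of simple permission; the royalty percentage became the key point.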

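At first, car manufacturers asked only for a small percentage as a token royalty, which did not burden the overall manufacturing cost of a model. But after Mattel, the giant of the toy world, started playing the money game and paid Ferrari several million US dollars for the exclusive licence to make Ferrari diecast model cars, other car manufacturers became convinced that making and selling model cars was a hugely lucrative business, and they raised their royalty rates one after another. By the mid-2000s, the royalty rate demanded by some reasonably well-known sports car brands had reached double digits! In some cases it was even worse: the car manufacturer demanded a six-figure advance before it would guarantee the licence to make the model.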

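Many tyre and oil companies have long paid huge sums to sponsor racing cars, placing their logos or colours on the bodywork for publicity. When models of these race cars were made, these companies were happy to see their colours and logos on the models, since this had the same promotional effect. But once some companies realised that these logos could generate income, licensing and royalties immediately became another problem in making race car models. And that's not all: quite a few race organisers also seized the chance to join the queue demanding royalties, because every car entered in a race carries the organiser's labels or event stickers.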

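Once a model race car involves film characters and company decals and appears on a particular circuit, even more royalties have to be paid. These assorted royalties go separately to the car manufacturer, the film character owners, the oil company for its logo, the tyre company for its logo, and the organiser for its labels and stickers. Added up, the combined royalty rate often exceeds 20 percent.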

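After the minimum wage for Chinese workers rocketed upward in 2006, making model cars was no longer a profitable business. Producing a sports car model is a very costly process that often needs a great deal of manual painting and finishing; adding the extra royalty cost only deters model makers and makes sports car models unprofitable. In recent years fewer and fewer high-quality sports car models have come to market, and those products have become more and more expensive, so that few collectors are willing to pay for them.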

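Although Chinese labour costs are rocketing, the royalty rates demanded by car manufacturers are still trending upward. Licensing departments are often under pressure from management to bring in more revenue, so they add harsher clauses and stricter controls to licence agreements, making the whole agreement more complex, often 20 to 30 pages thick. The terms seem entirely one-sided: only the licensee can be found at fault at every turn. The licensor, i.e. the car manufacturer, can routinely audit the model maker's accounts to find any minor detail that does not comply with the agreement, and such details can often yield huge compensation payments. For some car manufacturers, auditing licensees' accounts under these licences has become an additional source of income. Because of these audits, model makers have become the victims; under such harsh terms they cannot commit to go on making model cars unless a project is sure to be profitable. As a result, the business relationships that car manufacturers and model makers built up in the past have turned cold because of this routine auditing, and neither side brings up new development projects any more.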

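In addition, the product approval procedures that car manufacturers require in licence contracts are becoming ever more complex and extend to small details, including packaging design; the position, size and typeface of logos; and promotional materials and online designs, all of which must be approved one by one. Only after everything is confirmed can the next production step begin. And that's not all: it is increasingly common for licensors to require holographic stickers on the product packaging box. All these complex requirements tend to lengthen the production time of a model car and likewise push up production costs.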

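Product liability insurance is another obstacle to making model cars. Each licence contract requires product liability cover of about US$10 million to protect the licensor. If the sales market includes the United States, the premium easily runs to more than ten thousand US dollars. Unlike toys, which might harm children, a precision model car is sold to adults and will certainly not harm collectors. Yet having to pay these premiums for no good reason only keeps the cost of model cars high.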

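The recent steady rise in model car retail prices is mainly due to rising wages for Chinese workers, from US$40 a month in the past to US$400 now. The Chinese government's target is to quadruple workers' wages again within the next five years. Licensing issues will only make model cars ever more expensive, making the future of the model car industry increasingly difficult.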

Reply from MC forum member darkangelfbi:

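Frankly, these so-called high-end models are just high-grade toys. In the eyes of outsiders they are the same as toy cars for children. The rise in industry costs is real. But besides the approach described above, a simpler one is to lower costs by raising sales volume. A model probably has to sell at least 2,000 units to recover its tooling cost. Of course, it is understandable that after taking the high road a company is unwilling to turn back. But a diecast model kept in its box for a long time will develop paint problems. Admittedly, Autoart's quality on this point is still acceptable and can last at least 2 to 3 years; Kyosho's products, by contrast, are painful to look at after 3 years. However good a company's products are, there will always be 10% to 20% that do not sell well. A company's profits should be stable over the long term to benefit its own survival and development. To take myself as an example, I now have 200+ models, and the workmanship I ask for doesn't need to reach the level of CMC or Exoto. Just keep it at the 2003 level. Recently Autoart's official Weibo has been promoting how many parts its products use and how complex the workmanship is. In my humble, unambitious opinion, all that effort is wasted. Everyone gets into models for different reasons, but the most basic one is a love of cars, or more specifically of a particular car. Simply put, if you pick the right car to model, people will buy it even as a sealed-body model with no opening parts. If you don't believe me, try producing the EVO I to V, sealed-body only, and I guarantee they will sell better than any 86.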

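As for me, when I buy a new model I just put it in the cabinet and that's that. Although many people are into models now, how many go home and play with them for more than 30 minutes a day? In the end they are used for static display. Secondly, 1:18 models really do take up a lot of storage space. Many people change direction after collecting a few dozen. The hobby can only carry on by continually attracting new enthusiasts. Autoart once used watches as an analogy, so let me use one too: it's like an ordinary wage earner who wants a watch to wear and would normally just buy a Seiko; if you insist on lifting watches straight to the level of Vacheron Constantin, then he can only go without a watch.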