antsle Forum



Corrupted ZPOOL -- Missing Antlets -- Data Gone

I'm having a SERIOUS issue with my antsle. Please help if possible.

Logged into antMan to do a soft reboot of the machine after one of my PHP scripts in an Ubuntu LXC container stopped sending emails through Gmail.

When the machine came back up, I noticed I had no antlets listed in antMan. I called tech support and was given a list of commands to try:

  • zpool list
  • zpool status
  • zpool import -a
  • virsh list --all
  • virsh -c lxc:/// list --all
  • virsh start ANTLET_NAME
  • virsh -c lxc:/// start ANTLET_NAME
zpool status initially showed the hdd pool as degraded, with one of the hard drives listed as unavailable (screenshot attached).
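For anyone searching later (the real thing is in the attached screenshot): a degraded mirror looks roughly like this in zpool status, with generic device names standing in for mine:

    pool: hdd
   state: DEGRADED
  status: One or more devices could not be opened.  Sufficient replicas exist for
          the pool to continue functioning in a degraded state.
  action: Attach the missing device and online it using 'zpool online'.
    scan: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          hdd         DEGRADED     0     0     0
            mirror-0  DEGRADED     0     0     0
              sda     ONLINE       0     0     0
              sdb     UNAVAIL      0     0     0  cannot open

  errors: No known data errors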
I noticed the box was fairly warm, so I unplugged the machine for a few minutes and restarted it.
zpool status then showed the hard drive recognized by the OS, but it also indicated that the pool had been resilvered (I'm sure this is bad).
My primary work antlet container TBAAPPSRV and all associated data are gone without a trace. The only thing left is an old test container I had made called TBADEVSRV (which will not start through antMan -- screenshot attached) and a snapshot I had made of it after a few configuration changes.
I cloned a TEST antlet from the non-working TBADEVSRV snapshot and it does start up, albeit with the old mysql persistence data from the snapshot.
At this point, is there any hope of recovering my missing antlets or their container data (mysql specifically)?...or is there anywhere the container data is stored from which it might be extracted? I will pay for premium support if necessary, but I really need those files.
Thanks in advance,
-Justin-
Uploaded files:
  • 2019-05-10-19_23_01-https___docs.oracle.com_cd_E19253-01_819-5461_gcfhw_index.html_.png
  • 2019-05-10-22_17_18-.png
  • 2019-05-10-22_21_04-192.168.1.102-KiTTY.png
  • 2019-05-10-22_24_26-_antlets-ANTSLE-WinSCP.png

Update: OK, I was able to fix the problem, but the zpool degradation / hdd connection issue has me seriously concerned.

First -- it was late and I was panicked...TBADEVSRV is actually my primary work container, but in the past I had one named TBAAPPSRV, hence the initial confusion and the panicked phone call.

The TEST container I made from the backup snapshot of the non-working TBADEVSRV container tipped me off, because the IP address information in /etc/network/interfaces inside the Ubuntu container was the same one I always work with.
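(For reference, the static IP stanza I checked lives in /etc/network/interfaces inside the Ubuntu container and looks roughly like this; the interface name and addresses below are placeholders, not my actual values.)

  # /etc/network/interfaces inside the Ubuntu container (placeholder values)
  auto eth0
  iface eth0 inet static
      address 192.168.1.xxx
      netmask 255.255.255.0
      gateway 192.168.1.1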

Procedure: Hit Clone on the non-working container via antMan....Start....and the clone fires up with all data intact. I re-point my ANTSLE's nginx virtual hosts to the new container's IP address and my web apps are back.
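In case it helps someone else: the re-point is just changing the upstream address in the nginx server block and reloading. The server name, IP, and port below are placeholders, not my actual config.

  # nginx virtual host on the ANTSLE -- proxy to the cloned antlet's new IP (placeholders)
  server {
      listen 80;
      server_name myapp.example.com;

      location / {
          proxy_pass http://10.1.1.xx:80;   # update this to the new container's IP
          proxy_set_header Host $host;
          proxy_set_header X-Real-IP $remote_addr;
      }
  }

  # then test the config and reload nginx
  nginx -t && nginx -s reload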

Great!...phew. Grocery list for today: big-ass backup drive + iDrive or whatever....triple check marks.

Second -- the ANTSLE's fanless case / heat mitigation might be the culprit (it could very well be just a bad drive). As I mentioned, the case was VERY warm to the touch when I pulled it from the rack (which has fans built in to keep air circulating around the other servers running in there...they have been fine for years). After several initial / desperate boot attempts and letting the case cool down a bit, the hdd seemed to connect again and was detected in the zpool without issue.

Just as an experiment...I've hard rebooted several more times with the ANTSLE just sitting on a desk in my office with plenty of A/C going in the room:  no issues.

Weird.
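For what it's worth, next time it acts up I'll also sanity-check whether the drive itself is dying (vs. just heat) with SMART data, assuming smartmontools is available on the box; the device name below is a placeholder:

  # overall health verdict for the suspect drive (placeholder device name)
  smartctl -H /dev/sdb

  # attributes worth watching: temperature and reallocated sectors
  smartctl -A /dev/sdb | grep -i -e temperature -e reallocated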

I appreciate tech support getting back to me on a Friday afternoon before beer time.

Uploaded files:
  • 2019-05-11-06_54_29-Welcome-to-antMan.png

Hello JYoung,

For some reason it looks like your antsle did not find the antlets zpool; the 'zpool import -a' command probably resolved that, as well as the "cannot find init path..." error message when trying to start the antlet.

Regarding the degraded hdd zpool: each zpool is a pair of mirrored drives. The 'resilvering' message is ZFS's way of saying it is re-syncing the mirror. You can also run

zpool scrub hdd

which will verify/correct the data on the drives.
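While the scrub runs, 'zpool status hdd' will show its progress in the scan line, something like this (the numbers are just an example):

  scan: scrub in progress since Sat May 11 09:00:00 2019
        120G scanned out of 500G at 180M/s, 0h36m to go
        0 repaired, 24.00% done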

If the degraded status persists, it may be a bad drive and we would gladly repair that for you. You can contact us via the 'Support' link at anthill.antsle.com.

You might want to test by creating a few antlets on the hdd zpool to see if the zpool goes into a 'degraded' state again.

Hi Mario,

Thanks for the insights. And yes, you're absolutely correct...after I had a little downtime to bone up on my Linux knowledge, I discovered that a mirrored zpool is a redundancy setup against hardware failure. Phew.
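(In generic ZFS terms, a mirror pool is a pair of drives created like the line below, so either drive can die without losing the pool; the device names are generic, not whatever is actually inside the antsle.)

  # generic two-disk ZFS mirror (generic device names)
  zpool create hdd mirror /dev/sdX /dev/sdY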

For whatever reason, when the zpool import -a command was run, it did bring back my TBADEVSRV container. It did not actually solve the 'cannot find init path' error, and the container still will not start up. I left it in place though...as it's not bothering anything by being there.

What DID work to get a working container back up and running was cloning a NEW container from the borked TBADEVSRV container. In the screenshot one post above, you can see TBADEVSRV_CLONE running and the old borked TBADEVSRV container still in a stopped state.
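I have no idea exactly what antMan does under the hood when you hit Clone, but since ZFS clones always start from a snapshot, I'd guess it's conceptually something like the sketch below (dataset and snapshot names are made up for illustration):

  # snapshot the borked container's dataset, then clone that snapshot into a new writable dataset (names made up)
  zfs snapshot antlets/TBADEVSRV@clone_src
  zfs clone antlets/TBADEVSRV@clone_src antlets/TBADEVSRV_CLONE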

I went ahead and pulled all mysql data, PHP code, and LAMP configurations off of the container, and will be pulling the ANTSLE from the network in the meantime. I just have a really bad feeling about keeping it in production use, as I'm still not sure whether the degraded zpool status is the result of the heat situation or a legit failed hard drive. A mirrored / RAID setup shouldn't bring down a box because one of the hard drives failed. All I did was reboot the machine...no bueno.
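For anyone doing the same kind of bail-out: the pull-off was basically a database dump plus a tarball of the code and configs, copied off over SSH. The database name, paths, and IP below are placeholders, not my actual setup.

  # inside the container: dump the database and bundle the code + configs (placeholder names/paths)
  mysqldump -u root -p mydatabase > /root/mydatabase.sql
  tar czf /root/site_backup.tar.gz /var/www/html /etc/apache2 /etc/php

  # from the workstation: copy both files off the container over SSH (placeholder IP)
  scp root@192.168.1.xxx:/root/mydatabase.sql root@192.168.1.xxx:/root/site_backup.tar.gz .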

I will most likely be taking you guys up on a warranty claim, as maybe something in the logs will let you diagnose things on the hardware side better than I can. I do appreciate the willingness to help out, though. The support is much appreciated.