Antsle Forum

Welcome to our Antsle community! This forum connects all Antsle users so they can share experiences, make user-generated content available to the entire community, and more.

Please note: This forum is about discussing one specific issue at a time. No generalizations. No judgments. Please check the Forum Rules before posting. If you have specific questions about your Antsle and expect a response from our team directly, please continue to use the appropriate channels (email: [email protected]) so every inquiry is tracked. 


root partition 100% full

My root partition (/) on my Antsle is 100% full.  My antlets zpool has plenty of space. I can browse the Antsle Web UI, and my various antlet web pages load OK (although there are weird rendering issues, which is why I'm investigating things).

Is it normal or expected for the / partition to be full?

What might be filling it up?

# df -H

Filesystem                        Size  Used Avail Use% Mounted on
udev                               11M     0   11M   0% /dev
/dev/sda3                          17G   17G     0 100% /
tmpfs                             1.7G  783k  1.7G   1% /run
shm                               8.4G     0  8.4G   0% /dev/shm
cgroup_root                        11M     0   11M   0% /sys/fs/cgroup
/dev/sda2                         133M   88M   45M  67% /boot
antlets                           199G     0  199G   0% /antlets

...

...

...

Fixed. The culprit was an out-of-control nginx access log in /var/log/nginx/.
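For anyone else who hits this, something like the following should narrow down what's eating the root partition; the paths are just the usual suspects, not anything Antsle-specific:

# Largest directories on the root filesystem only (-x stays on one filesystem)
du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -n 15

# Drill into the winner, e.g. /var/log
du -xh --max-depth=1 /var/log | sort -rh | head -n 15

# Or list individual files over 100 MB
find /var/log -xdev -type f -size +100M -exec ls -lh {} \;

In my case the first command pointed straight at /var/log.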

Does the Antsle not do log rotation?

Quote from jwdenzel on March 18, 2019, 10:01 am

My root partition (/) on my Antsle is 100% full.  My antlets zpool has plenty of space. I can browse the Antsle Web UI, and my various antlet web pages load OK (although there are weird rendering issues, which is why I'm investigating things).

Interesting. I just hit this last night too. My box was up and running, but Antman wouldn't respond. It was strange because the docker containers were running okay. After poking around, I saw the same 100% state. The bulk of it was in /var/log, with syslog, kern.log, messages, etc., all maxing out at 2GB each. After deleting, touching and restarting, things were back to normal.
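(For anyone doing the same cleanup: truncating in place gets the same result without the delete/touch/restart dance, since the daemons keep writing to the same open file. Roughly:)

# Truncate the runaway logs in place; no service restart needed
: > /var/log/syslog
: > /var/log/kern.log
: > /var/log/messages
# (truncate -s 0 FILE does the same thing)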

Now, I would have written this off to just some random glitch in the cosmos, but then I decided to check here and lo and behold, it looks like I'm not the only one. I wonder what happened (globally) to cause this? With two of us having the same result, within a day of each other, this just can't be coincidence.

Anyone else?


I heard from Antsle support yesterday regarding this and they said that log rotation on the top-level Linux OS is a feature they plan to implement in the future. In the meantime, they encouraged me to set up logrotate on my own.

I'm fine with this solution, but my new worry is that something might be happening to fill those logs more quickly than they did before.

Quote from jwdenzel on March 19, 2019, 11:21 am

I heard from Antsle support yesterday regarding this and they said that log rotation on the top-level Linux OS is a feature they plan to implement in the future. In the meantime, they encouraged me to set up logrotate on my own.

I'm fine with this solution, but my new worry is that something might be happening to fill those logs more quickly than they did before.

Logrotate is such a staple of any Linux system... I'm kind of shocked that it's not turned on by default. I was thrown off by seeing the usual evidence of log rotation (files named base.0, base.1, etc.), but looking closer, I see they haven't rotated since mid-2018. It looks like they might have checkpointed the system and left the old files around, but never turned rotation on.
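(To double-check on your own box, something along these lines should show whether logrotate is installed and whether anything is actually scheduling it; the exact cron paths vary by system:)

# Is logrotate installed at all?
which logrotate

# Is anything scheduling it? (cron locations differ between distros)
ls /etc/cron.daily /etc/cron.weekly 2>/dev/null | grep -i logrotate
crontab -l | grep -i logrotate

# Dry run: shows what would be rotated without touching anything
logrotate -d /etc/logrotate.conf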

I'll turn it on for my box. If you've already set up a conf file, do you mind sharing?

And yes, I have the same concern. I think something happened that started those log files growing super fast. Unfortunately, I deleted them all, otherwise I could have looked deeper at them. It's suspicious, though.

Thanks!

Hmm. May have found it. I asked a co-worker if his box had hit 100% and he said no, so we jumped on the console and started poking around. Although his / is filling up, it's not near the 100% mark yet. Looking at his messages file, we found TONS of these entries:

Feb 14 11:29:15 rex kernel: [ 24.766855] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
Feb 14 11:29:15 rex kernel: [ 24.883112] docker0: port 1(veth6261176) entered blocking state
Feb 14 11:29:15 rex kernel: [ 24.883114] docker0: port 1(veth6261176) entered disabled state
Feb 14 11:29:15 rex kernel: [ 24.883157] device veth6261176 entered promiscuous mode
Feb 14 11:29:15 rex kernel: [ 24.883219] IPv6: ADDRCONF(NETDEV_UP): veth6261176: link is not ready

We went back to the earliest occurrence of it, which was 14-Feb. Interestingly enough, this was when one of the point releases came out.

Looking at docker.log, we saw the same evidence of tons of log entries like this:

time="2019-02-14T11:29:16.952002854-08:00" level=info msg="Loading containers: done."
time="2019-02-14T11:29:17.015968657-08:00" level=info msg="Docker daemon" commit=0ffa825 graphdriver(s)=zfs version=18.06.0-ce
time="2019-02-14T11:29:17-08:00" level=info msg="shim reaped" id=d961289bfa388eff3255a6676fe5abdfbe9858e75022bf04d29c33f2bbadad4a
time="2019-02-14T11:29:17.038577902-08:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
time="2019-02-14T11:29:17.519646708-08:00" level=info msg="Daemon has completed initialization"
time="2019-02-14T11:29:17.563807831-08:00" level=info msg="API listen on /var/run/docker.sock"
time="2019-02-14T11:29:17-08:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/d961289bfa388eff3255a6676fe5abdfbe9858e75022bf04d29c33f2bbadad4a/shim.sock" debug=false pid=6908

time="2019-02-14T11:29:18-08:00" level=info msg="shim reaped" id=d961289bfa388eff3255a6676fe5abdfbe9858e75022bf04d29c33f2bbadad4a
time="2019-02-14T11:29:18.581390270-08:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
time="2019-02-14T11:29:19-08:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/d961289bfa388eff3255a6676fe5abdfbe9858e75022bf04d29c33f2bbadad4a/shim.sock" debug=false pid=7350

Both of these files have thousands of virtually identical errors.

I'm guessing that a change they made in one of the docker containers in 0.11.1c (b?) caused this. It looks like a loop run amok.

I'll see what else we can find out.
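(If anyone wants to check their own box in the meantime, the plain docker CLI should show whether a container is stuck cycling, something like:)

# Container status at a glance; "Restarting" or a very recent "Up" is suspicious
docker ps -a --format 'table {{.Names}}\t{{.Status}}'

# A restart count that keeps climbing points at the culprit
docker inspect --format '{{.Name}} {{.RestartCount}}' $(docker ps -aq)

# Watch container die/restart events live
docker events --filter event=die --filter event=restart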


Same docker errors are copied to kern.log and syslog as well.

This explains why all of those log files blew up to 2GB so fast.

(sigh)

Thanks for looking at it.  My messages file seems to be rotating and is minimal in size. I've not implemented any custom log rotation yet.

-rw-r----- 1 root  545 Mar 16 03:11 messages.0
-rw-r----- 1 root  271 Mar  9 03:11 messages.1.gz
-rw-r----- 1 root  592 Mar  2 03:10 messages.2.gz
-rw-r----- 1 root  128 Feb 23 03:11 messages.3.gz

In my case, it was the nginx access_log that was growing huge.  As a quick fix / hack, I might just create a cron job to cat /dev/null into that file on a weekly basis or whatever.  Primitive, but effective.
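Something like this in root's crontab would do it; the exact filename under /var/log/nginx/ is an assumption, use whatever is actually growing on your box:

# crontab -e (as root): null out the nginx access log every Sunday at 03:00
0 3 * * 0 cat /dev/null > /var/log/nginx/access.log

Truncating rather than deleting means nginx keeps writing to the same open file, so no reload is needed.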

Quote from jwdenzel on March 19, 2019, 11:46 am

Thanks for looking at it.  My messages file seems to be rotating and is minimal in size. I've not implemented any custom log rotation yet.

I'm confused. You said you spoke to Antsle and they said you needed to set up logrotate, so that would imply to me that you didn't have it running before. And if not, how were your other log files being rotated?

Also, the max log for nginx would be 2GB, which wouldn't fill your disk (you said you had > 2GB free after cleaning up), so what other log files were maxed out to reach the 16GB partition limit?

I had to install logrotate on my system to get it working (it wasn't installed by default)...
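For what it's worth, a minimal drop-in along these lines should keep the nginx logs in check once logrotate is installed and being run from cron; the retention numbers are arbitrary and the reload command in postrotate depends on how nginx is managed on your box:

# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
    weekly
    rotate 4
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        # assumes an OpenRC-style init script; adjust for your setup
        /etc/init.d/nginx reload >/dev/null 2>&1 || true
    endscript
}

Then make sure something actually runs logrotate daily, e.g. an /etc/cron.daily/logrotate script or a crontab entry that calls logrotate /etc/logrotate.conf.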

Any tips/suggestions on the best logrotate installation/config for an Antsle? I'm not the most experienced sysadmin (that's why I bought the Antsle; I didn't think I'd have to deal with something like this!)...