For the security enthusiasts out there, Docker 1.10 comes with some really cool security-focused additions. In particular:
- Seccomp filtering: you can now use BPF to filter exactly which system calls the processes inside your containers can use.
- Default Seccomp Profile: Using the newly added seccomp filtering capabilities, we added a default seccomp profile that helps reduce the attack surface exposed by your kernel. For example, last month's use-after-free vuln in join_session_keyring was blocked by our current default profile.
- User Namespaces: root inside of the container isn't root outside of the container (opt-in, for now).
- Authorization Plugins: you can now write plugins for allowing or denying API requests to the daemon. For example, you could block anyone from using --privileged.
- Content Addressed Images: The new manifest format in Docker 1.10 is a full Merkle DAG, and all the downloaded content is finally content addressable.
- Support for TUF Delegations: Docker now has support for read/write TUF delegations, and as soon as notary 0.2 comes out, you will be able to use delegations to provide signing capabilities to a team of developers with no shared keys.
These are just a few of the things we've been working on, and we think these are super cool.
Check out more details here: http://blog.docker.com/2016/02/docker-engine-1-10-security/ or let me know if you have any questions.
https://github.com/docker/docker/pull/5773 and https://github.com/docker/docker/issues/3629
Docker containers /in principle/ do work with systemd. They are implemented as transient units when you use --exec-opt native.cgroupdriver=systemd (on your daemon's command line). I've been working on making this support much better (in runC, and therefore in Docker); however, systemd just has bad support for many of the newer cgroups when creating transient units.
So really, Docker has systemd support. Systemd doesn't have decent support for all of the cgroup knobs that libcontainer needs (not to mention that systemd has no support for namespaces). I'd recommend pushing systemd to improve their transient unit knobs.
But I'd rather like to know why the standard cgroupfs driver doesn't fulfil your needs. The main issue we've had with systemd is that it seems to have a mind of its own (it randomly swaps cgroups and has its own ideas about how things should be run).
Can someone elaborate on this a bit more? From a CS point-of-view, sounds like a problem where a data structure came in handy but I'm not sure what it solves. Thanks!
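One way to make the content-addressing concrete: every blob (layer, config, manifest) is named by the hash of its bytes, and a manifest references its children by those hashes, so you get deduplication and tamper evidence for free. A toy sketch in plain shell (this is an illustration, not Docker's actual code):

```shell
# Toy content addressing: a blob's identity is the SHA-256 of its bytes.
printf 'layer contents' > /tmp/blob_a
printf 'layer contents' > /tmp/blob_b
id_a=$(sha256sum /tmp/blob_a | cut -d' ' -f1)
id_b=$(sha256sum /tmp/blob_b | cut -d' ' -f1)
# Identical bytes => identical address, so duplicate layers dedupe for
# free, and changing a single byte changes the ID (tamper evidence).
[ "$id_a" = "$id_b" ] && echo "same address: sha256:$id_a"
# A manifest that lists its child blobs by digest is itself hashed the
# same way; those hashes-of-hashes are what make the image a Merkle DAG.
```

Because parents embed the digests of their children, verifying the root digest transitively verifies every layer below it.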
This is really interesting.
Sounds like the up/download manager has improved too. I did some early work adding parallel stuff to that (which was then very helpfully refactored into actually decent Go code, thanks Docker team :)), and it's great to see it improved. I remember some people looking at adding torrenting for shunting around layers; I guess this should help along that path too.
In previous versions, only the name of a container would be aliased to its IP address, which can make it hard to deploy a setup with multiple containers in a given network group that should address each other using their names (e.g. "api" host connects to "postgres") and then have multiple instances of those groups on the same server (as container names need to be unique).
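A hedged sketch of how 1.10's network-scoped aliases handle this (all names here are made up for illustration): the same alias can be reused in each user-defined network, and containers resolve it to the instance inside their own network.

```shell
# Two app groups on one host, each with its own "postgres".
docker network create group1
docker network create group2
# Container names must stay unique, but the alias is per-network:
docker run -d --name pg1 --net group1 --net-alias postgres postgres
docker run -d --name pg2 --net group2 --net-alias postgres postgres
# An "api" container joined to group1 now resolves "postgres" to pg1;
# one joined to group2 resolves the same name to pg2.
```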
Use case:
Using compass within a container for development
Compass creates the Sass output files as whatever user it runs under within the container (likely root)
Host must chmod them to do stuff with them
As a workaround, I've been building images and creating a user:group that matches my host. Obviously this destroys portability for other developers.
In Unix, a uid is just an integer (chown <random_int> file will work). In your case, the container created a file in a volume with a uid. This uid makes no sense on the host, but it leaks out to the host anyway since it's a 'volume'.
I think with the userns map, you can map container uid to a host uid. The files created in the volume will then be visible on the host as the mapped uid.
This is my understanding, I have to play with it :-)
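If I'm reading the 1.10 docs right, the opt-in looks roughly like this (the subordinate range below is just an example):

```shell
# Opt in on the daemon: remap container uids/gids to an unprivileged
# host range taken from the "dockremap" user's subordinate ids.
docker daemon --userns-remap=default
# With e.g. "dockremap:100000:65536" in /etc/subuid and /etc/subgid,
# uid 0 inside the container appears as uid 100000 on the host,
# including for files the container writes into a volume.
```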
Don't run anything as root unnecessarily.
(Your problem is likely because you refer to the root namespace as a "host". It is common practice, but leads people to bad conclusions.)
EDIT: And a default seccomp profile! Did I miss the memo about containerisation suddenly becoming a competitive industry?
Heh. Yeah, it took quite a while. Lots of kernel bugs. :P
1. docker stats --all
A built-in alternative to 'docker ps -q | xargs docker stats' that takes care of dynamic additions to the list.
For consistency, it would be nice to have a similar option in the API stats call to fetch statistics for all running containers.
2. 'docker update' command, although I would have preferred 'docker limit'.
Ability to change container limits at runtime:
- CPUQuota
- CpusetCpus
- CpusetMems
- Memory
- MemorySwap
- MemoryReservation
- KernelMemory
With this feature in place, there is no reason to run containers without limits, at least memory limits.
3. Logging driver for Splunk
A better approach would be to enhance the generic drivers to be flexible enough to send logs to any logging consumer.
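For point 2, a hedged example of what changing limits at runtime looks like (container name and values are made up):

```shell
# Start a container with a generous limit, then tighten it live.
docker run -d --name web --memory 512m nginx
docker update --memory 256m --memory-swap 512m web
# CPU knobs work the same way, no restart needed:
docker update --cpu-quota 50000 web
```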
This is not correct. You cannot change kernel memory limits on Linux after processes have been started (disclaimer: I've done a bunch of work with the runC code that deals with this in order to add support for the pids cgroup). You can update everything else though.
So happy right now.
> ./docker-1.10.0 daemon -b br0 --default-gateway 172.16.0.1
> ./docker-1.10.0 run --ip 172.16.0.130 -ti ubuntu bash
docker: Error response from daemon: User specified IP address is supported on user defined networks only.
But my KVM VMs work fine with that bridged network. I know I could just port-forward, but I don't want to. Yes, it seems I am treating my containers as VMs, but it worked just fine in default LXC; we could even use an Open vSwitch bridge for advanced topologies.
> docker network create --gateway 172.16.0.1 --subnet 172.16.0.0/21 mynet
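With that user-defined network in place, the --ip flag from the failing command above should be accepted (sketch):

```shell
# Attach the container to the user-defined network and pin its address:
docker run --net mynet --ip 172.16.0.130 -ti ubuntu bash
```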
Docker 1.9 brought in native multi-host networking support using the overlay driver. Proper multicast support for the overlay driver would require a proper multicast control plane (L2 & L3). Contributions welcome.
Weave would just get in the way in this scenario (and has a tendency to over-complicate simple stuff like running an ElasticSearch cluster with auto discovery)
The highlights are networks/volumes in Compose files, a bunch of security updates, and lots of new networking features.
It seems like if one piece gets an upgrade, every moving component relying on some APIs may need to be looked at as well.
Did a PR on one issue.
Currently chasing my tail to see if a third party lib is out of whack with the new version or it's something I did.
The whole area is evolving and the cross pollination of frameworks, solutions (weave, etc), make for a complicated ecosystem. Most people don't stay "Docker only". I'm curious to see the warts that pop up.
Even within the Mesos environment, there are so many nuts and bolts which have to fit together that sometimes I'm just fed up with the complexity. Furthermore, releases of Mesos and Marathon are not necessarily synched... Stateful apps? No go... Persistent volumes in Marathon? Maybe in v0.16... Graceful shutdown of Docker containers? No go...
In these use cases, I want to start a container, have it process a unit of work, clear any state, and start over again. Previously, you could orchestrate this by (as an example, there are other ways) mounting a tmpfs file system into whatever directories the runtime needs, starting the container, stopping it once the work is done, cleaning up the tmpfs, and then starting the container again.
Now, you can create everything once with the --tmpfs flag and simply use "restart" to clear any state. Super simple. Awesome!
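A hedged sketch of that flow (image name and mount path are hypothetical):

```shell
# Scratch state lives on a tmpfs mount created with the container...
docker run -d --name worker --tmpfs /scratch:rw,size=64m myimage
# ...so clearing state between units of work is just a restart:
docker restart worker   # /scratch comes back empty
```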
If you're using Docker Compose, add this to your service's environment:
environment:
 - WAIT_COMMAND=[ $(curl --write-out %{http_code} --silent --output /dev/null http://elastic:9200/_cat/health?h=st) = 200 ]
 - WAIT_START_CMD=python /code/lytten/main.py
 - WAIT_SLEEP=2
 - WAIT_LOOPS=10
Then, create a 'wait' bash script in your app's source code that looks like this:
#!/bin/bash
echo $WAIT_COMMAND
echo $WAIT_START_CMD
is_ready() {
eval "$WAIT_COMMAND"
}
# wait until is ready
i=0
while ! is_ready; do
i=`expr $i + 1`
if [ $i -ge $WAIT_LOOPS ]; then
echo "$(date) - still not ready, giving up"
exit 1
fi
echo "$(date) - waiting to be ready"
sleep $WAIT_SLEEP
done
#start the script
exec $WAIT_START_CMD
Then, finally, for your nginx container, add:
command: sh /code/wait_to_start.sh
to its specification.
timeout 60 bash -c 'until is_ready; do sleep 10; done'
You'll need to `export -f is_ready` if it's a shell function.
That's assuming you can use a timeout instead of number-of-retries.

Apparently no one else has been paying an ounce of attention... And you get downvoted for it. The HN way! https://github.com/docker/docker/issues/19474

Not least, you're forced to go through their DNS server, which doesn't support TCP. Boy, this is absolutely going to fuck people. Because I bet a bunch of people are going to run Go containers on the 1.10 engine. And guess what happens when you send a Go app a DNS response, in UDP format, that is larger than 4096 bytes? You get a panic and crash! Woohoo! And yes, there are DNS servers that incorrectly throw out UDP DNS responses larger than 4096 bytes. Can't wait for my containers to fail because of fucking Docker putting a DNS service in Engine. Unacceptable. Docker should've realized they needed to think about this stuff, all the while shykes was too busy picking fights with people as Kubernetes encroached on what he saw as "his" territory.

There's a reason that everyone is very excited about the rkt announcement today. Particularly amongst some Kubernetes users... (In the interest of not tainting the waters, I do NOT work for Google)
This will end well...
Simply from a learning perspective. I just don't know why and would like to know.
I'd give it some time before using that in production and use your own DNS/Service Discovery mix.