EuroPython 2015 - Day 5 - Kubernetes, Yelp’s Microservices Story, DumbDev, Parallelism Shootout, FOSS Docs - 24th July 2015

Containers and Kubernetes - the Keynote

@tekgrrl presented Kubernetes in her keynote. You can read all about it at https://kubernetes.io.

Interestingly, when asked about the security of containers, she commented that this is still very much a work in progress. Multi-tenant usage might not be a good idea for applications that require security guarantees.

Yelp’s Microservices Story

Scott spoke to us about the difficulties Yelp faced moving their transaction platform to a microservices architecture.

Problems they faced:

  • increase in API complexity
  • coupling between individual services increased
  • interactions between all services got very difficult to debug
  • the whole thing got slower

Decouple all the things

How do you decouple shared concepts in a maintainable way across multiple services? If you ever want to refactor those, how would you do that?

Yelp’s answer to that was to move to an API description language called Swagger.
It doubles as a documentation system, but requires a gigantic, very detailed specification document upfront.

After all the hard work, they’ve gained a pretty API with a web view that’s self-documenting.

  • Lessons
    • interfaces should be intentional
    • interfaces need to be explicit
    • automate everything, especially the repetitive mechanical stuff
    • logging is everything - use logstash and be happy
  • General Lessons
    • Measure everything
    • be explicit
    • know your business and build based on that
    • automate everything

Summary is available as yelp/service-principles at https://github.com/Yelp/service-principles

DumbDev

Programmers aren’t good at remembering. Modern research shows that humans in general aren’t able to hold more than 6 to 7 facts in short term memory.

For example, try to memorize all 12 rules of the 12 Factor App in the next minute.

The 12 Factor App Rules

Now, please try to repeat them without looking at the list.

Research suggests that you should be able to repeat about 4 to 7 of the twelve; don’t worry if it’s more or fewer.

This illustrates very nicely that we need a framework to ensure that any problem topic actually fits into our head.

Rob Collins then introduced his idea to utilize a Noughts and Crosses (Tic-tac-toe) board,
which any concept, idea, proposal or diagram must fit into.

DumbDev Empty

To explain this further he then used the concept to visualize itself; a concept inception.

DumbDev

You can watch the whole talk, here:
https://archive.org/details/EuroPython_2015_VklpGvbz

Parallelism Shootout - threads vs. multiprocesses vs. asyncio

Shahriar Tajbakhsh benchmarked different parallelism approaches for I/O-bound applications. His benchmark application downloaded 30 websites.

To compare all the parallel approaches, let’s first look at the sequential timing.

Sequential Python Benchmark

The sequential processing time increases as expected in this benchmark.

Threaded Python Benchmark

No surprises in the threaded version either; there is a certain amount of setup time
to get the threading started.

Multiprocessing Python Benchmark

Multiprocessing has similar overhead and similar results.

AsyncIO Python Benchmark

We can clearly see a winner here, AsyncIO is significantly
faster than any of the other approaches.
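The comparison can be approximated without touching the network. The following is a minimal sketch, not the speaker's benchmark: the "download" is simulated with a sleep, and the names DELAY, SITES, fetch_blocking and fetch_async are made up for illustration. Multiprocessing is left out to keep the example short.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.05  # simulated latency per "download" (assumption)
SITES = 10    # hypothetical number of sites to fetch


def fetch_blocking(site):
    time.sleep(DELAY)  # stands in for a blocking HTTP request
    return site


async def fetch_async(site):
    await asyncio.sleep(DELAY)  # stands in for a non-blocking HTTP request
    return site


def run_sequential():
    # One request after the other: total time is roughly SITES * DELAY.
    return [fetch_blocking(s) for s in range(SITES)]


def run_threaded():
    # One thread per request: the waits overlap.
    with ThreadPoolExecutor(max_workers=SITES) as pool:
        return list(pool.map(fetch_blocking, range(SITES)))


async def _gather_all():
    return await asyncio.gather(*(fetch_async(s) for s in range(SITES)))


def run_asyncio():
    # A single thread; the event loop interleaves the waiting coroutines.
    return list(asyncio.run(_gather_all()))


def timed(fn):
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (run_sequential, run_threaded, run_asyncio):
        _, elapsed = timed(fn)
        print(f"{fn.__name__}: {elapsed:.2f}s")
```

With a simulated delay the threaded and asyncio variants finish in about the same time; the differences shown in the talk come from real network I/O and per-thread or per-process overhead, which this sketch does not model.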

The source code for the benchmarks can be found at:
https://github.com/s16h?tab=repositories
once he publishes it.

FOSS Docs by Mikey Ariel

Mikey is a senior technical writer at Red Hat. She talked about open-source projects and
why their documentation is so essential.

Documentation helps to:

— build a unified and intuitive user experience
— have portable and adoptable workflows
— create a scalable and adaptable project

How should one keep up with documentation? Build a tighter integration with the developers on the project and make documentation part of the testing cycle: DevOps for Docs.

Bad documentation is worse than no documentation. Always ask “Who are my readers?” and write for their needs.

As a starting point for documentation, she suggested a simple Markdown-based readme file in the following format:

Readme.md basic example

EuroPython 2015 - Day 4 - GitFS, RaspberryPi weather station, Python and the infamous GIL, PyPy STM - 23rd July 2015

Day 4 at EuroPython

PyPySTM vs PyPy

The day after the EuroPython Social Event started very slowly. I missed the keynote as well as the first talk. The theme of the day was the GIL and PyPy. (In the attached picture, you can see how much faster PyPy-STM is than normal PyPy.)

GitFS

https://github.com/PressLabs/gitfs

Danci and Vlad presented their solution for dealing with people who can only upload via FTP and don’t know what git is.
They built a mountable representation of a git branch tree in Python, using pygit2 and FUSE via pyfuse.

Raspberry Pi - Weatherstation

A really neat project that is supposed to show kids how to interact and work with digital sensors. For example, the students had to calculate how much it rained based on a variety of digital and analog sensor data.

  • naturebits (birdbox)
  • AstroPi - fixed date for the rocket launch
  • RaspPi - teaching

On a side note, he shared an anecdote with us: he’d been testing the air quality and noticed that around 3pm on weekdays the quality
would drop significantly. It turns out his office is on the first floor, right above the parking lot, and all the teachers would leave the building at 3pm.

The Infamous GIL

larry hastings talking about the cpython gil

Larry Hastings’ presentations are, as always, an absolute event to watch.

Speaking about the GIL, he took us on a historic journey to the beginning of the GIL.

In 1992, multithreading became known.
Globals held a pointer to the currently executing frame.

The GIL received some additional changes for Python 3 in 2009, the new GIL.

Guido only allows the GIL to be changed if single core performance doesn’t get slower.

There have been multiple attempts to fix the GIL, starting with a branch of Python 1.4; however, all implementations had the same issue.

modern python language implementations

If the GIL is a deal breaker, one can always use one of the other implementations of Python, such as Jython or IronPython, which use garbage collection instead.

All major languages developed in the time after Python came out use garbage collection and don’t support C extensions.

He comes to the conclusion that to support a garbage collector, CPython needs to drop C-extension support.

PyPy - STM - the GIL is dead

PyPy is a just-in-time (JIT) compiler for Python that started about 10 years ago. PyPy-STM is a fork of PyPy that implements
software transactional memory.

This provides a huge speed benefit if you’re using multiple threads and shared data.

As an example, he showed us a real-time 3D rendered view as a comparison: CPython managed to generate 0.8 FPS, PyPy did 17 FPS and PyPy-STM rendered 35 FPS. PyPy-STM can run N threads concurrently, where N equals the number of CPU cores. One important current limitation is that each thread needs to be either CPU-bound or I/O-bound, not both, and must not switch between the two, since switching carries a lot of overhead.

This sounds easy; however, simply executing a log statement constitutes an I/O operation, which will slow down a CPU-bound thread significantly.

How do the STM transactions work

While analyzing the GIL, one notices that it actually behaves just like a database. PyPy-STM simply needs to check whether another thread touched the same object in another transaction; only if they conflict does it need to do extra work to reach a consensus.
Obviously I/O work will be committed first since we can’t turn back time and redo it.

To ensure that no automatic context switching happens, he added a context manager, “with atomic”.

http://morepypy.blogspot.com

#pypy on irc.freenode.net

QnA

How does PyPy-STM know when an effect like I/O will happen?
— The simple answer is, whenever CPython would release the GIL.
What’s the speed difference between PyPy-STM and PyPy for single threaded applications?
— It’s about 20% to 30% slower.

Lightning talks

@nailor showed us an introspection library called python-manhole that allows inspecting running processes. http://python-manhole.readthedocs.com

PyCon PL was announced for the 13th Oct 2015: 190 Euro including accommodation and food.

EuroPython 2015 - Holger Krekel’s Keynote about the interplanetary filesystem - Wed 22nd July 2015

Keynote from Holger Krekel - towards a more effective, decentralized web

Holger started out telling us that for about a year he’s been talking to many people
all around the technology scene: not just in a single community but across communities,
to people who are interested in solutions, not implementations.

Since 1969, we’ve seen a stagnation in the number of papers being written; he took that as a sign that
mankind’s rate of achievement is slowing down.

Margaret Hamilton

Specifically, back in the day, the people who were said to do rocket science did actual rocket science.
For example, Margaret Hamilton built the control code for the Apollo mission that went to the moon. Interestingly, the percentage of women working in technology in 1969 was much greater than nowadays. Unfortunately, this changed as the PC industry became more successful and was marketed specifically to males.

But, let’s back up a little bit.

Where does the internet and everything we do come from?

In 1936, this telephone used pulse dialing to switch cables all the way
to the other endpoint.
Later, in 1974, packet switching with TCP/IP was introduced and we moved from cable switching
via pulse to data packet switching.

A large free network to freely share information

The advantages of this method were obvious; no more setup cost, the line was already established;
being able to route around nodes; and the idea that many individual nodes will comprise a
large free network and freely share information.
Unfortunately, this was a little bit of a hippy dream.

The internet usage in Great Britain 2009

What actually happened is that lots of star networks emerged, each accessed by individuals; some stars are bigger than others.
A very obvious example of a big star is the one that helps you find websites. Another big star tries to help you stay in touch with your friends from university.

This is about 2009.

The complexity tax is regressive

This development allowed the bigger stars, the ones being accessed countless times, to track who’s calling and what they’re looking for.
They understood quickly that they’re able to make money out of this knowledge in various ways. Additionally, the setup and operational complexity doesn’t increase per person. In simple terms, this means that after the first million users, income keeps growing with every user while the cost per user gets lower and lower.

The best minds in IT are focusing on how to make people click more ads and unfortunately not on how to build better rockets.

This is called the “million to one” architecture, “big data” computing large amounts of data from human interaction.

A famous exception is Elon Musk, who aims to get us to Mars by 2026. Now, this raises the question:

Can Elon Musk use this on Mars?

Does TCP/IP still work on Mars? - No. Sure, we can rebuild a Mars version of our current infrastructure but what we’ll really need is a different way to connect all together.

This is already happening, in the IoT (Internet of Things) space, however, most of the protocols are proprietary.
Devices interconnect directly with each other since uploading to somewhere in California would be
an unnecessary waste of bandwidth; (direct links).

One nice initiative for this is offlinefirst.org: applications written to prefer local or offline storage/state first and to sync when they go online.

To ensure that the sync works as expected you need to distribute data in a secure way.
More recent examples of technologies that allow us to do that are: Git, BitTorrent, ZFS, Bitcoin, Tahoe-LAFS, Cassandra and Riak.
Interestingly, all these are based on Merkle trees.

The Merkle Tree

A Merkle tree is built from one-way hash computations, which allows it to be used as a distributed hash table. All the previously presented
implementations are datastores and databases.
We’ve not yet used DHTs to implement a web protocol.
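To make the idea concrete, here is a minimal sketch of computing the root hash of such a tree. It is an illustration under assumptions, not code from the talk: SHA-256 is chosen arbitrarily, and duplicating the last hash on odd-sized levels is only one of several conventions. The function names are made up.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Return the root hash for a non-empty list of byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    # Leaves are the hashes of the data blocks.
    level = [sha256(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2:
            # Odd number of nodes: pair the last one with itself (assumed convention).
            level.append(level[-1])
        # Each parent is the hash of its two concatenated children.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Because every parent hash covers its children, changing a single block changes the root, so two nodes can compare whole datasets by comparing one hash.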

In recent history, more and more programmers have used immutability to allow them to reason about programs better. Immutability helps to ensure correctness not only on multi-threaded machines but also in massively distributed systems.

Examples of languages and implementations are: Haskell, Scala, Clojure, Immutable.JS and Pyrsistent.

One last concept to discuss before we move on are Namespaces.

“Namespaces are one honking great idea -- let’s do more of those!” (Zen of Python, Tim Peters)

Namespaces allow programmers to address individual functionality or data in a more organized and obvious way.

Say you're in England. You hear someone talking about "boots".
You think of what Americans call "trunks". That's namespaces.

The idea of namespaces are that you give things names, and then package up
those names in something called a namespace. And then, down the road, you
open up that package and start referring to those things by name again.

(http://tech.jonathangardner.net/wiki/Namespace)

If one combines all the above concepts, you’ll arrive at the need for a massively scalable distributed information exchange system that uses namespaces to address immutable data, the Interplanetary Filesystem.

IPFS - The Interplanetary Filesystem


IPFS is exactly that: a distributed, immutable Merkle tree of data that can be addressed directly.

In contrast to HTTP, content is fetched based on a Merkle tree hash that references the data itself, not the address of another computer.
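The difference between location addressing and content addressing can be shown with a toy in-memory store. This is a simplified sketch and not how IPFS encodes its addresses: the class name ContentStore is hypothetical, and a plain SHA-256 hex digest stands in for the real address format.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address is derived from the data itself."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Anyone can verify the content against its address, whoever served it.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data
```

Identical content always gets the same address and any change produces a new one, which is why the data behind an address is immutable.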

Mutating IPFS - updating data in a DHT

These hashes can be quite complicated, hence IPFS uses DNS for easy discovery.
Currently the system is in development, but for now we’d be using it as follows:

ipns://example.com/ep2015  -> using DNS resolves to
ipns://<hash value>/ep2015 -> using a special lookup resolves to
ipfs://<data hash value>/ep2015 (the actual address of your specific content)

The naming is done via a self-certified distributed entity.

IPFS - current architecture

The actual data transfer is accomplished via bittorrent-ish data transfer (the distributed hash table) and the networking is simply IP based, because it works.

The Blind Idiot God


Unfortunately, there is a problem within the discovery phase. If everything is distributed and nodes randomly go online and offline, how does one ensure that a new joiner or a rejoiner sees reality as it is when coming online?

In current implementations this is achieved via stable nodes, which have knowledge about the majority of the tree or even own a full tree snapshot. This is a problem: how can we trust them? They might lie to us.

The open question is therefore, how can we fix this?

How can we give updates of the Merkle tree to new joiners/connectors?

Currently, we’ve got one gateway at gateway.ipfs.io, and the current version is implemented in Go. We’re also not enforcing any kind of versioning: simply publishing a new version and referencing that would make the old one go away.

EuroPython 2015 - Day 3 - Wed 22nd July 2015

Keynote from Holger Krekel - towards a more effective, decentralized web

Holger’s amazing keynote on a new way to exchange data deserved its own blog post, which can be found here.

Python Cryptography & Security

@jmortegac spoke about OWASP and how to secure a Python application. You can find his talk at speakerdeck.com/jmortegac.

Use PBKDF2 for everything involving passwords; Django has that built in. Check whether everything is OK in your Django application via ponycheckup.com.
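Outside of Django, the standard library offers PBKDF2 via hashlib.pbkdf2_hmac. The following is a minimal sketch; the iteration count, the salt length and the helper names hash_password and verify_password are assumptions for illustration, not recommendations from the talk.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it for your hardware


def hash_password(password, salt=None):
    """Derive a key from the password; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)
```

Store the salt and the digest together; the plain password is never stored.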

MongoDB tricks

@apotoc shared

MongoDB best practices and tips

Some MongoDB best practices.

MongoDB useful resources

And further useful resources.

Type Hints for Python 3 - PEP 484 - Guido Van Rossum

Guido went over all the new type annotation options and how to define stub files.

Unfortunately, I can’t find a link to the slides. I’ll update this post when I find them.

The reason the syntax is clunky is that it needs to be easy to parse and needs to support all the other syntax commitments from Python 3.

If you’re very academic, you’re probably very upset about this proposal; however, this is a good start, and now we can iterate on it until we’re happy with it.

After a little bit of pondering over different implementations, they came to the conclusion that, for Python at least, type checking isn’t the job of the Python compiler. Instead we’ll be using a static type checker that can be run alongside compilation, similar to pylint.

Simply because many organizations are already doing that, and it works.

QnA:

Why are there two ways to define annotations?

There are always quite a few downsides to stub files; however, if you force annotations into every definition, the function declarations become much longer.

Why square brackets?

Parameterization is different, hence

  • they stand out
  • and they are already part of current Python syntax

Invariant, Covariant etc.

Yes, the default is invariant but it’s all in the PEP.

Float vs. int: int is treated as compatible with float, so an int is acceptable where a float is expected.
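As a small illustration of the PEP 484 syntax discussed above, here is a sketch with made-up function names. The annotations are not enforced at runtime; they are meant for an external static checker such as mypy.

```python
from typing import Dict, List, Optional


def mean(values: List[float]) -> float:
    # The square brackets parameterize the container type.
    return sum(values) / len(values)


def find_user(users: Dict[int, str], user_id: int) -> Optional[str]:
    # Optional[str] means "str or None".
    return users.get(user_id)
```

Calling mean([1, 2, 3]) passes ints where floats are annotated, which a PEP 484 checker accepts because int is considered acceptable for float.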

Playing with CPython object internals

Very entertaining talk about everything that one shouldn’t do with CPython. Within the process context, we’re able to mess around with the CPython internals and fix it in funny and entertaining ways.

We saw that very bad things become possible if you play with CPython internals.

Asyncio - Look Ma I’ve built a distributed hash table

Asyncio is great for networking related tasks.
He used asyncio to implement a distributed hash table, or simply speaking a key/value store.

@ntoll spoke to us about how Asyncio helped him solve some of the concurrency issues he faced
while implementing a massively scalable distributed hash table.


EuroPython 2015 - Day Two - Tuesday 21st July 2015

Keynote by Guido

Guido receiving a thank you gift.

He thanked Ola + Ola for their talk and noted that it’s impressive that they managed to
create a really strong brand in about one year.

Then he moved on to discuss why one should switch to Python 3,
especially since:

  • porting takes time
  • Python 2.7.x is a fine language that will be supported for the next 5 years

The simple answer is Python 3 is just a much better language.

Python 2.7.x will simply be a dead end; there will not be a 2.8 version.

The unicode support alone should make you move to Python 3.

His favorite feature in Python 3 is asyncio and everything connected to it;
in Python 3.5, async with and the other new async statements are simply
a much better, more natural way of writing coroutines.

Why is there such a huge list of open bugs?
It’s a simple fact of life: any large project has many open bugs, but they should mostly be harmless edge cases.

An interesting development for the future is cross-compiling Python to Android and iOS.

During the QnA, Guido noted that he really likes Swift as a language, as well as the 2015 version of C++.

Salting things up in the sysadmin’s world

From @godlike64.

Salt is a configuration management system:

ensure that the state of your system is consistent through time

  • Quantity (we’re dealing with more and more machines at the same time)
  • Increasing complexity of systems
  • System-Admins are generally lazy

He prefers Salt because:

  • written in Python
  • yaml configuration files
  • Jinja Templating

Master - and - Minions

  • Minions are defined by:
    • ID (the hostname)
    • they are part of a nodegroup
    • they provide grains of information

State file - defines the state

  • reside under the file_roots configured in /etc/salt/master (/srv/salt by default)
  • yaml
  • top.sls
    • which host matches the environment
    • entry point
  • supports jinja templating that can be used to replace data from salt minion or master

highstate - state that all minions will adhere to

  • state that is supposed to be applied to all systems controlled by the salt master

matching and nodegroups

  • ID (hostname)
  • nodegroup
    • defined by master
    • matches on all grain data

grains and pillars

  • grains
    • informational bits about the current minion system
    • generated on minion start
    • you can write your own grain generating functions in Python
  • pillars
    • data defined by admin in the master configuration in /etc/salt/master
  • general rules
    • grains are data from minions
    • pillar is data sent to the minions
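A custom grain is just a Python function that returns a dictionary. The following is a minimal sketch: the function name site_info and the grain keys are hypothetical, and it assumes the module is placed in a _grains directory under the master's file roots and synced to the minions.

```python
import os
import platform


def site_info():
    """Return custom grains as a plain dict (hypothetical keys)."""
    grains = {}
    # Informational bits about the minion, computed on the minion itself.
    grains["python_impl"] = platform.python_implementation()
    grains["cpu_count"] = os.cpu_count() or 1
    return grains
```

The returned keys can then be used for matching minions, just like the built-in grains.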

QnA

Communication between master and minion is encrypted.
Pillar data is not stored on the minion.

Lessons learned about testing and TDD

From Marco Buttu.

Radio Telescope

They learned the hard way that tests can keep you sane.
Now they’re happily embracing TDD and don’t have to
deal with sudden outages anymore.

Image recognition using OpenCV

OpenCV detecting in B/W

We saw how OpenCV can be used to recognize a painting from a library of paintings, based on Google Glass input, while a tourist wearing the Google Glass walks through a museum.

Python and PyPy performance (not) for dummies

To measure Python performance with a profiler you’ve got the following options:

  • statistical
    • plop (by Dropbox)
    • vmprof
  • event based
    • cProfile
    • runSnakeRun

The ones supported by PyPy are vmprof and cProfile.
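Of the tools listed, cProfile ships with the standard library. The following is a minimal sketch of profiling a single call; slow_sum and profile_call are made-up names for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time and keep only the top five entries.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_sum, 100000)
    print(report)
```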

VMProf is inspired by gperftools; it was needed because the C stack output isn’t very useful.

The reason why PyPy is faster than CPython in some cases is that it analyzes the Python code and
compiles the hot spots into assembler.

This specialized version is protected by guards that check whether the previous assumptions still hold; if not, PyPy compiles another version.

EuroPython 2015 - Day One


Below are my notes; they might be wrong, so please check the actual coverage.

Djangogirls - Keynote

Ola & Ola from the DjangoGirls foundation are writing a book called yay python.

Asyncio

We learned how easy it is to do asynchronous programming in Python 3.4+ using asyncio.

Check out the aiohttp documentation.

The following is a quick untested example.

pip install asyncio aiohttp

import asyncio
from aiohttp import web

@asyncio.coroutine
def index(request):
    ...
    data = yield from request.post()  # use yield from to avoid blocking the thread

You can add routes like this:

app.router.add_route( ... )

The library support is quite nice at the moment; you can see
all of the libraries at asyncio.org.

Featuring:

  • aiopg
  • aiozmq
  • asyncio-redis
  • aiomemcache

Container based Linux flavours

Presenter: @hguemar

Presenting an overview of container-focused OSes.

Container OS's

All of the OS’s share the following (mostly):

  • SystemD
  • etcd
  • cloud-init
  • Kubernetes

The OS’s

  • CoreOS
    • based on ChromiumOS that is itself based on Gentoo
    • requires running a toolbox container, which is based on Fedora
  • Project Atomic by Red Hat
    • not yum but rpm-ostree (from the GNOME CI)
    • SELinux secured containers by Dan Walsh
  • Snappy Ubuntu
    • uses AppArmor and LXD
    • based on Canonical’s work on phones and JeOS
  • Photon by VMware
    • based on Fedora
    • uses rpm-ostree and a yum compatible tdnf
    • not production ready yet (July 2015)
  • Rancher OS
    • radically different
    • very minimal footprint (~20MB)
    • runs a Docker instance as PID1
    • doesn’t use SystemD, but instead
    • uses a System container and a user container
    • basically docker inception
    • targets Embedded Devices, IoT

A generic API wrapper

Presenter: @xima

Made a universal API wrapper library called tapioca (on GitHub).

It tries to make it easy to wrap RESTful HTTP APIs and bind them in a Pythonic way.

Does pagination.

Currently supports:

  • Facebook,
  • Twitter,
  • Parse.com
  • Mandrill

Combining Rust and Python

Dmitry Trofimov wanted to try Rust and used it to build a Python profiler.

Rust started 2010 - now version 1.x since May 2015.

KEEP CALM AND SEGFAULT

Now, normally when you search how to integrate Rust with Python, you’ll come across
FFI. However, for a profiler this isn’t enough.

When creating a Python profiler there are two options:

  • tracing
  • statistical (periodically captures frames of function stacks)
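The tracing option can be sketched in pure Python with sys.setprofile, which calls a hook on every function call and return. This only illustrates the mechanism and is not the profiler from the talk; trace_calls and fib are made-up names. A statistical profiler would instead sample the running stacks at a fixed interval, trading precision for much lower overhead.

```python
import sys
from collections import Counter


def trace_calls(func, *args):
    """Run func and count every Python-level function call it makes."""
    counts = Counter()

    def hook(frame, event, arg):
        # "call" fires for Python functions; C functions report "c_call".
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(hook)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return result, counts


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

Tracing sees every call, which is exact but slows the program down noticeably.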

Datatypes aren’t easily exchangeable between Python and Rust; fortunately,
there is a nice library, mio (on GitHub).

extern crate cpython;

use cpython::{PythonObject, Python};

fn main() {
    let gil_guard = Python::acquire_gil();
    let py = gil_guard.python();
    let sys = py.import("sys").unwrap();
    let version = sys.get("version").unwrap().extract::<String>().unwrap();
    println!("Hello Python {}", version);
}

Basic Server Security Steps


DigitalOcean is a great compute provider, easy to use and not in your way.
Unfortunately, their default Ubuntu server setup isn’t quite as secure as it
should or could be.

Therefore, I’ll be describing some simple steps to make your new server
much more secure.

add a management user

We need to avoid logging in as root; that’s the first target for anyone.
In this example, we’ll be creating another user called managethis; I’d recommend
using your own name.

useradd managethis
mkdir /home/managethis
mkdir /home/managethis/.ssh
chmod 700 /home/managethis/.ssh

ensure that the management user is able to login via ssh

DigitalOcean already added my ssh key to the authorized_keys list; all I need to do is copy it over to
the new user and make sure that OpenSSH is able to access it.

We’ll be using sudo su - to access the root account later on. To add another level of security, let’s add a user password.
You won’t be able to use this to log in via OpenSSH; it’s just for sudo confirmation.

# either
cp ~/.ssh/authorized_keys /home/managethis/.ssh/
# or
vim /home/managethis/.ssh/authorized_keys

chmod 400 /home/managethis/.ssh/authorized_keys
chown managethis:managethis /home/managethis -R
passwd managethis

setup logwatch and fail2ban

From Wikipedia: “Fail2ban is an intrusion prevention software framework which protects computer servers from brute-force attacks.”
Simply put, people will not be able to brute-force their way into your server that easily.

apt-get update
apt-get install fail2ban logwatch

To allow the management user to use sudo, create a sudoers file for it:

vim /etc/sudoers.d/managethis

and add the following line:

managethis  ALL=(ALL) ALL

Logwatch will send you all your logfiles via email; replace test@example.com with your own email
in 00logwatch.

vim /etc/cron.daily/00logwatch

/usr/sbin/logwatch --output mail --mailto test@example.com --detail high

deactivate root login via ssh

Open the sshd_config file and change the following options. If your file doesn’t have an option yet, add it.

vim /etc/ssh/sshd_config

PermitRootLogin no
PasswordAuthentication no
AllowUsers managethis@(your-ip) managethis@(another-ip-if-any)
Port 50683

You can pick any port number for the ssh port; however, the following is advised:

The Internet Assigned Numbers Authority (IANA) is responsible for the global coordination of the DNS Root, IP addressing, and other Internet protocol resources. It is good practice to follow their port assignment guidelines. Having said that, port numbers are divided into three ranges: Well Known Ports, Registered Ports, and Dynamic and/or Private Ports. The Well Known Ports are those from 0 through 1023 and SHOULD NOT be used. Registered Ports are those from 1024 through 49151 and should
also be avoided. Dynamic and/or Private Ports are those from 49152 through 65535 and can be used. Though nothing is stopping you from using reserved port numbers, our suggestion may help avoid technical issues with port allocation in the future.
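The three ranges from the quote can be captured in a few lines of Python, for example to pick a random ssh port from the dynamic range. This is a small sketch; classify_port and pick_ssh_port are made-up helper names.

```python
import random

# Port ranges as described in the quote above.
WELL_KNOWN = range(0, 1024)
REGISTERED = range(1024, 49152)
DYNAMIC = range(49152, 65536)


def classify_port(port):
    """Name the range a TCP/UDP port number falls into."""
    if port in WELL_KNOWN:
        return "well-known"
    if port in REGISTERED:
        return "registered"
    if port in DYNAMIC:
        return "dynamic"
    raise ValueError("port must be between 0 and 65535")


def pick_ssh_port(rng=random):
    """Pick a random port from the dynamic/private range."""
    return rng.randrange(DYNAMIC.start, DYNAMIC.stop)
```

The port 50683 used in this post falls into the dynamic range.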

Also, please make sure that you open the ssh port in the firewall via ufw allow 50683 (change the number according to the port that you chose), and
remember that from now on you need to specify the port when connecting: ssh managethis@YOURSERVERIPADDRESS -p 50683. You can avoid having to type all this every time
by defining it in your ~/.ssh/config file on your own machine.

To apply the changes we need to restart OpenSSH.

service ssh restart

enable the ubuntu firewall

The Uncomplicated Firewall (ufw) is a frontend for iptables. Using it is exactly that, uncomplicated; all you need to do is:

ufw allow 22
ufw logging on
ufw enable

Please don’t forget to allow the custom ssh port you’ve set earlier.

In case you’re planning to run a webserver:

ufw allow 22
ufw allow 80
ufw allow 443
ufw logging on
ufw enable

[optional] unattended-upgrades setup

apt-get install unattended-upgrades

Open the 10periodic apt file and

vim /etc/apt/apt.conf.d/10periodic

add this:

APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::Unattended-Upgrade "1";

Ensure that you’re happy with the packages it’ll install for you:

vim /etc/apt/apt.conf.d/50unattended-upgrades

thanks to

This post is a summary of security steps that I found around the internet. Please let me know
if I’ve forgotten to link back.

Replacing Dropbox with OpenSource Self-Hosted Owncloud

Information

This is a work in progress post.

TL;DR;

Alright, you got here via search (hopefully using DuckDuckGo) and don’t want me
to lament on and on about why this is a good idea.

Just go and paste the code-only sections of this post one by one; it won’t be the best setup,
but it’ll get you started.

Preparation

You need the following:

  • 1) A DigitalOcean account. Trust me they are brilliant.
  • 2) (for backup, advised) an Amazon AWS account and access keys / credentials
  • 3) A way to access the server’s SSH console. I’ll be using the built-in Terminal emulator. On Windows you’ll probably want to use PuTTY.
  • 4) And a set of SSH keys to authenticate yourself without a password; the best tutorial for that I know of is from Github: Generating SSH Keys (https://help.github.com/articles/generating-ssh-keys/)
  • 5) An available domain or subdomain, in any case use namecheap if you need a new domain.

Create a new Droplet on DigitalOcean

For this step you’ll need (from the preparation list):

  • 1) the DigitalOcean account
  • 2) working SSH setup
  • 3) and a set of SSH keys

DigitalOcean makes the next couple of steps really easy.
To ensure that you’re able to follow along I’ve added screenshots for each step.

start creating your new Droplet

Once you’re logged in to your DigitalOcean account click on “Create Droplet”.

Start creating a new Droplet

This will open the page that takes all the information about our new Owncloud instance.

Supplying all necessary information to create the Owncloud server

Name your Server

Any name will do; in my example I’ll be naming it myowncloud.

Name your Owncloud Instance

Select the right size

The absolute minimum for this is 1GB, please don’t select 512MB. However, you’re free to select anything higher. I’m using 1GB
happily at the moment, which should be sufficient for up to 10 users.

Select the 1GB instance for Owncloud for a minimum deployment

Select the Owncloud Application Image

Fortunately, DigitalOcean provides us with an Owncloud image. The image contains almost everything we need out of the box,
so let’s select it.

Select Image - Next Step

Click on Applications.

Select Image - Click Applications

And select Owncloud 8.0.4 on 14.04. DigitalOcean updates these application images quite frequently,
so your version number should be 8.0.4 or higher. The 14.04 stands for
Ubuntu 14.04, the current LTS (Long Term Support) release; it should stay the same until the
next LTS release comes out.

Select Image - Click the Owncloud Image

Pick the region

The region is the physical location of our instance. For me in the UAE, the Amsterdam location has
shown the best access performance. I’d advise you to pick the
city closest to your current location.

Select a Region close to you

Adding SSH keys

We’re almost done with this part of the tutorial.
Let’s add your previously generated SSH keys.

Adding your SSH keys - Next Step

This is not optional and a very important step. The SSH key is the only way that you’ll be
able to access your server. Please make sure you’re following this properly.

Please press “Add SSH Key”.

Adding your SSH keys - Press 'Add SSH Key'

Now, please proceed with Step 4 from the GitHub tutorial.
I’ve added this step below so that you don’t need to jump back to the other site.

$ pbcopy < ~/.ssh/id_rsa.pub
# Copies the contents of the id_rsa.pub file to your clipboard

And paste the contents into the text field.
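Before pasting, it can be worth checking that what you are about to copy really is the public key and not the private one. The following is only a minimal sketch of such a check: looks_like_public_key is a hypothetical helper name of mine (it is not part of OpenSSH or the GitHub tutorial), and it only looks at the key type prefix on the first line.

```shell
# Hypothetical helper (my own name, not an OpenSSH tool): succeeds if the
# first line of the given file starts like an OpenSSH public key,
# e.g. "ssh-rsa AAAA... comment".
looks_like_public_key() {
    # Fail early if the file is missing or unreadable.
    [ -r "$1" ] || return 1
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        ssh-rsa\ *|ssh-ed25519\ *|ecdsa-sha2-*\ *) return 0 ;;
        *) return 1 ;;  # e.g. a private key starting with "-----BEGIN"
    esac
}
```

A possible way to use it, assuming your key lives at the default location (pbcopy is OS X only):

$ looks_like_public_key ~/.ssh/id_rsa.pub && pbcopy < ~/.ssh/id_rsa.pub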

Adding your SSH keys - Paste your previously generated SSH key into the text box 'SSH Key content'

Confirming all above information and creating the Owncloud instance

We’re done on DigitalOcean now. Just press Create Droplet and your new Owncloud instance
will be created within 30 seconds.

Press 'Create Droplet' to create your Owncloud instance

Now creating your Owncloud Instance

After about 30 seconds, you’ll be redirected to the Droplet’s main page, where you should see the following.

The Owncloud Droplet page

Congratulations, you’re the proud owner of your own Dropbox alternative using Owncloud 8.
In the next steps we’ll make sure that your shiny new Owncloud instance is safe and secure.

To be able to continue you need to take note of the IP address of your new Owncloud instance.
You’ll find it on the same page.

The Owncloud Instance IP Address

It’s the first set of numbers; in my case it is 178.62.187.178.
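To avoid typing the IP address every time you connect, you can give it a short name in your SSH client configuration. This is just a sketch under a few assumptions: add_ssh_host is a hypothetical helper of mine, I assume you log in as root (which is what I’d expect on a fresh Droplet created with an SSH key), and that your private key is at ~/.ssh/id_rsa.

```shell
# Hypothetical helper: append a Host entry to an SSH client config file.
# Usage: add_ssh_host ALIAS IP [CONFIG_FILE]
add_ssh_host() {
    alias_name=$1
    ip=$2
    config=${3:-$HOME/.ssh/config}
    # Do nothing if an entry for this alias already exists.
    if [ -f "$config" ] && grep -q "^Host $alias_name\$" "$config"; then
        return 0
    fi
    # User and IdentityFile are assumptions; adjust them to your setup.
    cat >> "$config" <<EOF
Host $alias_name
    HostName $ip
    User root
    IdentityFile ~/.ssh/id_rsa
EOF
}
```

With my example values this would look like:

$ add_ssh_host myowncloud 178.62.187.178
$ ssh myowncloud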

Owncloud Initial Setup

Let’s head over to your new Owncloud instance for about 5 minutes of initial setup.

Visit your Owncloud Instance

You should be redirected automatically to:

Owncloud Landing Page

Don’t just go and download those Apps, we’ve got some more stuff to do. :)

Create your own admin user account

Unfortunately, admin is not a good username for Owncloud’s main administration account.
We need to change it to something better. I’ll be using myownclouduser as the name for the admin
user account.

Select Users from the dropdown menu in the top right corner.

Add your own admin user - change to the user admin panel

Add the username in the first box, the password in the second box and select admin as the group.

Add your own admin user - enter the user details

If it all worked, this is what you should see.

Add your own admin user - see that the user has been added

Let’s try it out: please select Log out from the top right dropdown menu.

Add your own admin user - confirm by logging out and ...

This will redirect you to the generic log in page. Please enter your username and password.

Add your own admin user - ... logging back in with your new user.

If it all worked you should find yourself back in the Owncloud interface.

Switch to HTTPS only

Now, if you go and switch to the Admin page, you’ll be greeted with the following message in nice bold red letters.

Switch to HTTPS only - switch to the admin page

HTTPS is a secure transport protocol. It is used to ensure that people can’t read your data while it’s in transit between
your device and the server. To use HTTPS you have two options: you can purchase a certificate (we’ll be doing just that later on in this blog post)
or you can use the certificate that was generated with the server. It’s OK to use the generated certificate for now. You’ll get some
security warnings, but besides that, it’s perfectly fine.
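If you’re wondering why the browser complains about a generated certificate: such a certificate is self-signed, meaning its issuer is the same as its subject, so no authority your browser already trusts has vouched for it. The sketch below illustrates this with a throwaway certificate created locally. It assumes the openssl command line tool is installed; the name myowncloud.example is made up, and this is not the certificate that the Droplet image generated for you.

```shell
# Create a throwaway self-signed certificate in a temporary directory.
# "myowncloud.example" is a made-up name, used only for illustration.
workdir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout "$workdir/key.pem" -out "$workdir/cert.pem" \
    -days 30 -subj "/CN=myowncloud.example" 2>/dev/null

# Read subject and issuer back from the certificate. For a self-signed
# certificate both fields name the same entity.
subject=$(openssl x509 -in "$workdir/cert.pem" -noout -subject | sed 's/^subject= *//')
issuer=$(openssl x509 -in "$workdir/cert.pem" -noout -issuer | sed 's/^issuer= *//')
echo "subject: $subject"
echo "issuer:  $issuer"

# Remove the throwaway key and certificate again.
rm -rf "$workdir"
```

To look at the certificate your own server presents, something along these lines should work once HTTPS is reachable (using my example IP address):

$ openssl s_client -connect 178.62.187.178:443 < /dev/null | openssl x509 -noout -subject -issuer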

Switch to HTTPS only - on there you should see this security warning

To fix this using the generated certificate, we first need to switch to HTTPS as the protocol.

Switch to HTTPS only - fixing it we need to switch to HTTPS first

This will make your browser display the following warning.

Switch to HTTPS only - don’t worry, this is a normal warning because we are using our own certificate; we will fix this later by using an official one

Don’t worry, everything is fine: this is our own certificate, so we can trust it. Select Advanced.

Switch to HTTPS only - in chrome click on Advanced and select Proceed to ...

And click “Proceed to … (unsafe)”.

Switch to HTTPS only - now you select the admin page again

Now we’re back in the Owncloud interface, using our own certificate. Let’s head over to the admin page and change some settings.

Switch to HTTPS only - switch to the security part of the page

On the Admin page, switch to the security part of the page by selecting Security in the left side menu,
and select the options Enforce HTTPS and Enforce HTTPS for subdomains.

Switch to HTTPS only - and select both options
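One way to confirm that the setting took effect is to request the plain HTTP address and check that the answer points you to HTTPS. This is a rough sketch, assuming curl is installed and that Owncloud answers plain HTTP requests with a redirect once HTTPS is enforced; is_https_redirect is a hypothetical helper of mine that reads the response headers from standard input.

```shell
# Hypothetical helper: read HTTP response headers on stdin and succeed only
# if they contain a Location header that points at an https:// URL.
is_https_redirect() {
    # tr strips the carriage returns (HTTP header lines end in CRLF);
    # grep -i is used because header names are case-insensitive.
    tr -d '\r' | grep -qi '^location: *https://'
}
```

With my example IP address, the check could be run like this:

$ curl -sI http://178.62.187.178/ | is_https_redirect && echo "HTTP redirects to HTTPS"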

Great, now we have a reasonably safe initial setup. Let’s continue with some basic server security steps.

Basic Server Security Steps

Follow the steps in this post: Basic Server Security Steps

Backup

We’re still missing a nice backup solution for our data.

I’ll be using:

  • Amazon S3 in Amsterdam
  • Duplicity

I’ll be adding this section soon.

Domain Setup

Either get an SSL certificate from namecheap
or use a self-signed one.

… to be continued later