essay.dev: A real-time blog from emacs magit-forge based on GitHub issues

Sean Grove, Dec 3rd, 2020

So @dwwoelfel and I have been working on a powerful blogging system that keeps all of your data inside of GitHub issues - you can see the result (and post yourself) live on essay.dev - or you can fork the open-source repo and deploy your own instance, and all the instructions below will work just fine on your own repo.

Watch me create a blog post from inside magit-forge

GitHub-issue powered blogging and commenting

The entire site is powered by GitHub issues and next.js (and hosted on Vercel). Any issue with a Publish label will be made publicly available immediately (and can be similarly unpublished by removing the Publish label).
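As a rough sketch of that publishing rule (not the site's actual code, which talks to the GitHub API from next.js), here's the label check modelled in OCaml. The `issue` record and the sample data are hypothetical and only there for illustration:

```ocaml
(* Hypothetical, simplified view of a GitHub issue; the real site
   reads issues through the GitHub API. *)
type issue = {
  number : int;
  title : string;
  labels : string list;
}

(* An issue counts as a published post when it carries the "Publish" label. *)
let is_published issue = List.mem "Publish" issue.labels

let published_posts issues = List.filter is_published issues

(* Made-up sample data: two published posts and one draft. *)
let sample_issues = [
  { number = 1; title = "Hello magit-forge"; labels = [ "Publish" ] };
  { number = 2; title = "Draft: unfinished thoughts"; labels = [] };
  { number = 3; title = "Switching site to Docusaurus"; labels = [ "Publish"; "meta" ] };
]

let () =
  published_posts sample_issues
  |> List.iter (fun post -> Printf.printf "#%d %s\n" post.number post.title)
```

Removing the label from an issue simply makes `is_published` false for it, which is all "unpublishing" amounts to in this model.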

That's pretty fantastic for lots of reasons - your posts are now in an API that's easy to slice and dice so there's no lock-in to your content or comments, it's a familiar place for devs to work, etc.

There are hundreds of features and bits of polish in essay.dev, but importantly for me, it's compatible with emacs' magit-forge!

magit-forge, I choose you!

magit is the famous git control system for emacs, and it has an equally powerful integration to manage GitHub issues called magit-forge.

Preview of reading a rich post on `essay.dev` in `magit-forge`

You can do all the normal CRUD operations on GitHub issues inside a familiar emacs workflow - which means we can do the same for our posts!

Creating a post on essay.dev

First make sure you've installed magit and magit-forge (or for spacemacs users, just add the GitHub layer).

Now, let's clone the essay.dev repo:

git clone https://github.com/OneGraph/essay.dev.git
cd essay.dev
emacs README.md

Next we'll connect forge with our GitHub repository via M-x forge-add-repository - and now we're ready to see a list of all of the posts, so run M-x forge-list-issues:

`magit-forge` listing posts on `essay.dev`

If we hit Enter on any of the issues, we'll see the content and the comments:

Look at this excellent post - we'll have to up our game from now on

Create a new post

Running M-x forge-create-issue will create a new buffer pre-filled via the default new-post template:

We're ready to write our next great post

Simply fill out the title and the body, and when you're ready, "commit" the new post via C-c C-c. Forge will commit it to a local database first for safe-keeping, and then create an issue on GitHub! Back in the *forge-issue-list...* buffer, hit g to refresh the list of posts, with your newest one at the top. Hit Enter on it to view the contents.

Your post is ready!

A few seconds later, run M-x forge-pull to update your local copy - you should find there's a new comment waiting for you from onegraph-bot:

View your post at https://.essay.dev/post//

Our post is all grown up and ready for the world

That's it, your post is available to the world.

What's a post without comments?

You can also leave comments on your posts (and others') with M-x forge-create-post:

Why leave emacs to leave a comment?

It'll show up instantly on your post (both in forge and on the site):

Thanks to the API-based backend (and some clever engineering), posts and comments show up everywhere seamlessly

What's next?

Your content belongs to you, and is easily accessible through the GitHub API - here's an example query that'll pull out the posts for you:

query MyPostsOnGitHub(
  $owner: String = "onegraph"
  $name: String = "essay.dev"
  $createdBy: String = "sgrove"
) {
  gitHub {
    repository(name: $name, owner: $owner) {
      issues(
        first: 10
        orderBy: { field: CREATED_AT, direction: DESC }
        filterBy: { createdBy: $createdBy }
      ) {
        edges {
          node {
            body
            number
            title
          }
        }
      }
    }
  }
}

Try it out here

And again, note that this setup will work with any repo, so if you want to self-host your content it's as easy as using the deploy on vercel link.


Hello magit-forge

Really an impressive setup, if I'm honest!

Switching site to Docusaurus

Sean Grove, Nov 18th, 2020

I've switched my personal site (riseos.com) over to Docusaurus from a Mirage unikernel for a few reasons. First, I was putting off writing blog posts because of the amount of yak shaving I was doing. Second, the dependency situation never really got to the point I felt it was worth the effort. And third, some projects I've been working on have pushed me to get a lot more familiar with frontend topics, especially static-sites that are rendered with React.

I've deployed a few sites with Gatsby, and was looking for something significantly simpler and more reliable. At the recommendation of the ReasonML team, I gave docusaurus a shot on another site, and it worked out nicely. I appreciate that it's limited enough to encourage you not to yak-shave too much (which is good from time to time, but not for my personal site at this time).

Anyway, certainly recommend giving Docusaurus + netlify a shot, worked like a charm for me.

Continuously Deploying Mirage Unikernels to Google Compute Engine using CircleCI

Sean Grove, Nov 18th, 2020

Trying to blow the buzzword meter with that title...

Note of Caution!

This never made it quite 100% of the way, it was blocked largely on account of me not being able to get the correct version of the dependencies to install in CI. Bits and pieces of this may still be useful for others though, so I'm putting this up in case it helps out.

Also, I really like the PIC bug, it tickles me how far down the stack that ended up being. It may be the closest I ever come to being vaguely involved (as in having stumbled across, not having diagnosed/fixed) in something as interesting as Dave Baggett's hardest bug ever.

Feel free to ping me on the OCaml discourse, though I'll likely just point you at the more experienced and talented people who helped me put this all together (in particular Martin Lucina, an absurdly intelligent and capable OG hacker and a driving force behind Solo5).

Topics

  • What are unikernels?
    • What's MirageOS?
  • Public hosting for unikernels
    • AWS
    • GCE
    • DeferPanic
    • Why GCE?
  • Problems
    • Xen -> KVM (testing kernel output via QEMU)
    • Bootable disk image
    • Virtio problems
      • DHCP lease
      • TCP/IP stack
      • Crashes
  • Deployment
    • Compiling an artifact
    • Initial deploy script
    • Zero-downtime instance updates
    • Scaling based on CPU usage (how cool are the GCE suggestions to downsize an under-used image?)
    • Custom deployment/infrastructure with Jitsu

Continuously Deploying Mirage Unikernels to Google Compute Engine using CircleCI

Or "Launch your unikernel-as-a-site with zero-downtime rolling updates, health-check monitors that'll restart any crashed instance every 30 seconds, and a load balancer that'll auto-scale based on CPU usage with every git push"

This post talks about achieving a production-like deploy pipeline for a publicly-available service built using Mirage, specifically using the fairly amazing Google Compute Engine infrastructure. I'll talk a bit about the progression to the current setup, and some future platforms that might be usable soon.

What are unikernels?

Unikernels are specialised, single-address-space machine images constructed by using library operating systems.

Easy! ...right?

The short, high-level idea is that unikernels are the equivalent of opt-in operating systems, rather than opt-out-if-you-can-possibly-figure-out-how.

For example, when we build a virtual machine using a unikernel, we only include the code necessary for our specific application. Don't use a block-storage device for your Heroku-like application? The code to interact with block-devices won't be run at all in your app - in fact, it won't even be included in the final virtual machine image.

And when your app is running, it's the only thing running. No other processes vying for resources, threatening to push your server over in the middle of the night even though you didn't know a service was configured to run by default.

There are a few immediately obvious advantages to this approach:

  • Size: Unikernels are typically microscopic as deployable artifacts
  • Efficiency: When running, unikernels only use the bare minimum of what your code needs. Nothing else.
  • Security: Removing millions of lines of code and eliminating the inter-process protection model from your app drastically reduces attack surface
  • Simplicity: Knowing exactly what's in your application, and how it's all running considerably simplifies the mental model for both performance and correctness

What's MirageOS?

MirageOS is a library operating system that constructs unikernels for secure, high-performance network applications across a variety of cloud computing and mobile platforms

Mirage (which is a very clever name once you get it) is a library to build clean-slate unikernels using OCaml. That means to build a Mirage unikernel, you need to write your entire app (more or less) in OCaml. I've talked quite a bit now about why OCaml is pretty solid, but I understand if some of you run away screaming now. No worries, there are other approaches to unikernels that may work better for you. But as for me and my house, we will use Mirage.
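To make the "opt-in" idea a little more concrete: Mirage apps are written as OCaml functors over the devices they need. The snippet below is a minimal, dependency-free sketch of that shape, not real Mirage code. The `CONSOLE` signature and both console modules are made up for illustration and are much simpler than Mirage's actual module types:

```ocaml
(* The only "device" this app asks for. The signature is illustrative,
   not Mirage's actual console module type. *)
module type CONSOLE = sig
  val log : string -> unit
end

(* The application is a functor over the devices it needs. It never
   mentions block storage, so the build has no reason to include any. *)
module App (C : CONSOLE) = struct
  let start () = C.log "hello from a unikernel-style app"
end

(* One possible implementation for local development: stdout. *)
module Stdout_console : CONSOLE = struct
  let log line = print_endline line
end

(* Another implementation that just records lines, handy for tests. *)
module Recording_console = struct
  let lines : string list ref = ref []
  let log line = lines := line :: !lines
end

module Dev_app = App (Stdout_console)
module Test_app = App (Recording_console)

let () =
  Dev_app.start ();
  Test_app.start ()
```

The same `App` code runs against whichever implementation it's handed. That's roughly how one codebase can target Unix during development and a hypervisor in production.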

There are some great talks that go over some of the cool aspects of Mirage in much more detail, but it's unclear whether unikernels are actually usable in any major way yet. There are even companies that take out ads against unikernels, highlighting many of the ways in which they're (currently) unsuitable for production:

Anti-unikernel ads

Bit weird, that.

But I suspect that bit by bit this will change, assuming sufficient elbow grease and determination on our parts. So with that said, let's roll up our sleeves and figure out one of the biggest hurdles to using unikernels in production today: deploying them!

Public hosting for unikernels

Having written our app as a unikernel, how do we get it up and running in a production-like setting? I've used AWS fairly heavily in the past, so it was my initial go-to for this site.

AWS runs on the Xen hypervisor, which is the main non-unix target Mirage was developed for. In theory, it should be the smoothest option. Sadly, the primitives and API that AWS exposes just don't match well. The process is something like this:

  1. Download the AWS command line tools
  2. Start an instance
  3. Create, attach, and partition an EBS volume (we'll turn this into an AMI once we get our unikernel on it)
  4. Copy the Xen unikernel over to the volume
  5. Create the GRUB entries... blablabla
  6. Create a snapshot of the volume ohmygod
  7. Register your AMI using the pv-grub kernel id what was I doing again
  8. Start a new instance from the AMI

Unfortunately #3 means that we need to have a build machine that's on the AWS network so that we can attach the volume, and we need to SSH into the machine to do the heavy lifting. Also, we end up with a lot of leftover detritus - the volume, the snapshot, and the AMI. It could be scripted at some point though.

GCE to the rescue!

GCE is Google's public computing offering, and I currently can't recommend it highly enough. The per-minute pricing model is a much better match for instances that boot in less than 100ms, the interface is considerably nicer and offers the equivalent REST API call for most actions you take, and the primitives exposed in the API mean we can much more easily deploy a unikernel. Win, win, win!

GCE Challenges
Xen -> KVM

There is a big potential show-stopper though: GCE uses the KVM hypervisor instead of Xen, which is much, much nicer, but not supported by Mirage as of the beginning of this year. Luckily, some fairly crazy heroes (Dan Williams, Ricardo Koller, and Martin Lucina, specifically) stepped up and made it happen with Solo5!

Solo5 Unikernel implements a unikernel base, or the lowest layer of code inside a unikernel, which interacts with the hardware abstraction exposed by the hypervisor and forms a platform for building language runtimes and applications. Solo5 currently interfaces with the MirageOS ecosystem, enabling Mirage unikernels to run on either Linux KVM/QEMU

I highly recommend checking out a replay of the great webinar the authors gave on the topic (https://developer.ibm.com/open/solo5-unikernel/). It'll give you a sense of how much room for optimization and cleanup there is as our hosting infrastructure evolves.

Now that we have KVM kernels, we can test them locally fairly easily using QEMU, which shortened the iterations while we dealt with teething problems on the new platform.

Bootable disk image

This was just on the other side of my experience/abilities, personally. Constructing a disk image that would boot a custom (non-Linux) kernel isn't something I've done before, and I struggled to remember how the pieces fit together. Once again, @mato came to the rescue with a lovely little script that does exactly what we need, no muss, no fuss.

Virtio driver

Initially we had booting unikernels that printed to the serial console just fine, but didn't seem to get any DHCP lease. The unikernel was sending DHCP discover broadcasts, but not getting anything in return, poor lil' fella. I then tried with a hard-coded IP literally configured at compile time, and booted an instance on GCE with a matching IP, and still nothing. Nearly the entire Mirage stack is in plain OCaml though, including the TCP/IP stack, so I was able to add in plenty of debug log statements and see what was happening. Finally tracked everything down to problems with the Virtio implementation, quoting @ricarkol:

The issue was that the vring sizes were hardcoded (not the buffer length as I mentioned above). The issue with the vring sizes is kind of interesting, the thing is that the virtio spec allows for different sizes, but every single qemu we tried uses the same 256 len. The QEMU in GCE must be patched as it uses 4096 as the size, which is pretty big, I guess they do that for performance reasons. - @ricarkol
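Here's a toy model of the kind of mismatch a hard-coded ring size can cause. It's an illustration of the idea only, not Solo5's driver code (which is C). The `device` record, its `queue_size` field and the helper names are all assumptions made up for this sketch; 256 and 4096 are the two sizes mentioned in the quote:

```ocaml
(* Hypothetical stand-in for the queue size a virtio device reports;
   a real driver reads this from the device, not from a record. *)
type device = { name : string; queue_size : int }

let hardcoded_ring_size = 256

(* Before: every ring is assumed to have 256 slots. *)
let make_ring_hardcoded (_ : device) : int array =
  Array.make hardcoded_ring_size 0

(* After: size the ring from what the device actually reports. *)
let make_ring (dev : device) : int array = Array.make dev.queue_size 0

(* Does slot [idx], as the device would number it, exist in our ring? *)
let slot_exists ring idx = idx >= 0 && idx < Array.length ring

let local_qemu = { name = "local qemu"; queue_size = 256 }
let gce = { name = "gce"; queue_size = 4096 }

let () =
  List.iter
    (fun dev ->
      (* The last slot the device believes it can use. *)
      let idx = dev.queue_size - 1 in
      Printf.printf "%s: slot %d exists? hardcoded=%b device-sized=%b\n"
        dev.name idx
        (slot_exists (make_ring_hardcoded dev) idx)
        (slot_exists (make_ring dev) idx))
    [ local_qemu; gce ]
```

With every local QEMU reporting 256, the two versions are indistinguishable, which fits with the problem only showing up on GCE.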

I tried out the fixes, and we had a booting, publicly accessible unikernel! However, it was extremely slow, with no obvious reason why. Looking at the logs however, I saw that I had forgotten to remove a ton of logging per-frame. Careful what you wish for with accessibility, I guess!

Position-independent Code

This was a deep rabbit hole. The bug manifested as Fatal error: exception (Invalid_argument "equal: abstract value"), which seemed strange since the site worked on Unix and Xen backends, so there shouldn't have been anything logically wrong with the OCaml types, despite what the exception message hinted at. Read this comment for the full, thrilling detective work and explanation, but a simplified version seems to be that portions of the OCaml/Solo5 code were placed in between the bootloader and the entry point of the program, and the bootloader zero'd all the memory in-between (as it should) before handing control over to our program. So eventually our program did some comparison of values, and a portion of the value had at compile/link time been relocated and destroyed, and OCaml threw the above error.

Crashes

Finally, we have a booting, non-slow, publicly-accessible Mirage instance running on GCE! Great! However, every ~50 http requests, it panics and dies:

[11] serving //104.198.15.176/stylesheets/normalize.css.
[12] serving //104.198.15.176/js/client.js.
[13] serving //104.198.15.176/stylesheets/foundation.css.
[10] serving //104.198.15.176/images/sofuji_black_30.png.
[10] serving //104.198.15.176/images/posts/riseos_error_email.png.
PANIC: virtio/virtio.c:369
assertion failed: "e->len <= PKT_BUFFER_LEN"

Oh no! However, being a bit of a kludgy-hacker desperate to get a stable unikernel I can show to some friends, I figured out a terrible workaround: GCE offers fantastic health-check monitors that'll restart an instance if it crashes because of a virtio (or whatever) failure every 30 seconds. Problem solved, right? At least I don't have to restart the instance personally...

And that was an acceptable temporary fix until @ricarkol was once again able to track down the cause of the crashes, which had to do with a GCE/Virtio IO buffer descriptor wrinkle, and fix things up:

The second issue is that Virtio allows for dividing IO requests in multiple buffer descriptors. For some reason the QEMU in GCE didn't like that. While cleaning up stuff I simplified our Virtio layer to send a single buffer descriptor, and GCE liked it and let our IOs go through - @ricarkol
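The difference between the two styles can be sketched in a few lines of OCaml. This is a model of the idea only, not the Solo5 code: the `descriptor` type and function names are hypothetical, and `pkt_buffer_len` is an assumed value standing in for the real `PKT_BUFFER_LEN` from the panic message, whose actual value isn't given here:

```ocaml
(* Assumed limit for illustration; the real PKT_BUFFER_LEN is defined
   in Solo5's virtio.c and may differ. *)
let pkt_buffer_len = 2048

(* Hypothetical descriptor: a buffer plus its length. *)
type descriptor = { data : Bytes.t; len : int }

(* Scatter-gather style: one descriptor per fragment of the request. *)
let as_chain (fragments : Bytes.t list) : descriptor list =
  List.map (fun data -> { data; len = Bytes.length data }) fragments

(* Single-descriptor style: copy every fragment into one contiguous
   buffer, refusing anything that would not fit. *)
let as_single (fragments : Bytes.t list) : (descriptor, string) result =
  let data = Bytes.concat Bytes.empty fragments in
  let len = Bytes.length data in
  if len <= pkt_buffer_len then Ok { data; len }
  else
    Error
      (Printf.sprintf "request of %d bytes exceeds buffer of %d" len
         pkt_buffer_len)

let () =
  let header = Bytes.of_string "HDR:" and payload = Bytes.of_string "hello" in
  Printf.printf "chain: %d descriptors\n"
    (List.length (as_chain [ header; payload ]));
  match as_single [ header; payload ] with
  | Ok d -> Printf.printf "single: %d bytes %S\n" d.len (Bytes.to_string d.data)
  | Error msg -> prerr_endline msg
```

The single-buffer version costs a copy, but it hands the hypervisor the simplest possible request.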

So now Solo5 unikernels seem fairly stable on GCE as well! Looks like it's time to wrap everything up into a nice deploy pipeline.

Deployment

With the help of the GCE support staff and the Solo5 authors, we're now able to run Mirage apps on GCE. The process in this case looks like this:

  1. Compile our unikernel
  2. Create a tar'd and gzipped bootable disk image locally with our unikernel
  3. Upload said disk image (should be ~1-10MB, depending on our contents. Right now this site is ~6.6MB)
  4. Create an image from the disk image
  5. Trigger a rolling update

Importantly, because we can simply upload bootable disk images, we don't need any specialized build machine, and the entire process can be automated!

One time setup

We'll create two abstract pieces that'll let us continually deploy and scale: An instance group, and a load balancer.

Creating the template and instance group

First, two quick definitions...

Managed instance groups:

A managed instance group uses an instance template to create identical instances. You control a managed instance group as a single entity. If you wanted to make changes to instances that are part of a managed instance group, you would apply the change to the whole instance group.

And templates:

Instance templates define the machine type, image, zone, and other instance properties for the instances in a managed instance group.

We'll create a template with

FINISH THIS SECTION(FIN)

Setting up the load balancer

Honestly there's not much to say here, GCE makes this trivial. We simply say what class of instances we want (vCPU, RAM, etc.), what the trigger/threshold to scale is (CPU usage or request amount), and the image we want to boot as we scale out.

In this case, I'm using a fairly small instance with the instance group we just created, and I want another instance whenever we sustain CPU usage over 60% for more than 30 seconds:

`PUT THE BASH CODE TO CREATE THAT HERE`(FIN)
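Setting the GCE configuration itself aside, the scaling rule described above ("over 60% CPU, sustained for more than 30 seconds") can be sketched in OCaml. This is only a model of the rule as stated in the text, not how GCE's autoscaler computes anything. The `sample` type and `should_scale_out` are hypothetical names:

```ocaml
(* One CPU reading: seconds since some start time, and utilisation
   as a fraction between 0 and 1. *)
type sample = { at : float; cpu : float }

let threshold = 0.60     (* scale out above 60% CPU... *)
let sustain_secs = 30.0  (* ...once it has lasted more than 30 seconds *)

(* Walk samples in chronological order, remembering when the current
   over-threshold run began. True if any run lasts > sustain_secs. *)
let should_scale_out (samples : sample list) : bool =
  let rec go run_start = function
    | [] -> false
    | s :: rest ->
      if s.cpu > threshold then
        let start = match run_start with Some t -> t | None -> s.at in
        if s.at -. start > sustain_secs then true else go (Some start) rest
      else go None rest
  in
  go None samples

let () =
  let spike =
    [ { at = 0.; cpu = 0.9 }; { at = 10.; cpu = 0.9 };
      { at = 20.; cpu = 0.3 }; { at = 30.; cpu = 0.9 } ]
  in
  let sustained =
    [ { at = 0.; cpu = 0.7 }; { at = 15.; cpu = 0.8 }; { at = 31.; cpu = 0.65 } ]
  in
  Printf.printf "spike: %b, sustained: %b\n"
    (should_scale_out spike) (should_scale_out sustained)
```

A short spike resets as soon as usage drops, so only genuinely sustained load triggers another instance.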

Subsequent deploys

The actual CLI to do everything looks like this:

    mirage configure -t virtio --dhcp=true \
            --show_errors=true --report_errors=true \
            --mailgun_api_key="<>" \
            --error_report_emails=sean@bushi.do
    make clean
    make
    bin/unikernel-mkimage.sh tmp/disk.raw mir-riseos.virtio
    cd tmp/
    tar -czvf mir-riseos-01.tar.gz disk.raw
    cd ..

    # Upload the file to Google Cloud Storage
    # as the original filename
    gsutil cp tmp/mir-riseos-01.tar.gz  gs://mir-riseos

    # Copy/Alias it as *-latest
    gsutil cp gs://mir-riseos/mir-riseos-01.tar.gz \
              gs://mir-riseos/mir-riseos-latest.tar.gz
    
    # Delete the image if it exists (pipe in "yes" to answer the
    # confirmation prompt)
    yes | gcloud compute images delete mir-riseos-latest
    
    # Create an image from the new latest file
    gcloud compute images create mir-riseos-latest \
       --source-uri gs://mir-riseos/mir-riseos-latest.tar.gz
    
    # Updating the mir-riseos-latest *image* in place will mutate the
    # *instance-template* that points to it.  To then update all of
    # our instances with zero downtime, we now just have to ask gcloud
    # to do a rolling update to a group using said
    # *instance-template*.

    gcloud alpha compute rolling-updates start \
        --group mir-riseos-group \
        --template mir-riseos-1 \
        --zone us-west1-a

Or, after splitting this up into two scripts:

    export NAME=mir-riseos-1 CANONICAL=mir-riseos GCS_FOLDER=mir-riseos
    bin/build_kvm.sh
    gce_deploy.sh

Not too shabby to - once again - launch your unikernel-as-a-site with zero-downtime rolling updates, health-check monitors that'll restart any crashed instance every 30 seconds, and a load balancer that auto-scales based on CPU usage. The next step is to hook up CircleCI so we have continuous deploy of our unikernels on every push to master.

CircleCI

The biggest blocker here, and one I haven't been able to solve yet, is the OPAM switch setup. My current docker image has (apparently) a hand-selected list of packages and pins that is nearly impossible to duplicate elsewhere.

Email reports on error in OCaml via Mailgun

Sean Grove, Nov 18th, 2020

The OCaml web-situation is barren. Really barren.

I'm not sure if it's because the powers-that-be in the OCaml world are simply uninterested in the domain, or if it's looked down upon as "not-real development" by established/current OCaml devs, but it's a pretty dire situation. There's some movement in the right direction between Opium and OCaml WebMachine, but both are 1.) extremely raw and 2.) pretty much completely incompatible. There's no middleware standard (Rack, Connect, or the one I'm most familiar with, Ring), so it's not easy to layer in orthogonal-but-important pieces like session-management, authentication, authorization, logging, and - relevant for today's post - error reporting.

I've worked over the past few years on ever-increasingly useful error reporting, in part because it was so terrible before, especially compared to error reports from the server-side. A few years ago, you probably wouldn't even know if your users had an error. If you worked hard, you'd get a rollbar notification that "main.js:0:0: undefined is not a function". How do you repro this case? What did the user do? What path through a (for a human) virtually unbounded state-space led to this error? Well friend, get ready to play computer in your head, because you're on your own. I wanted to make it better, and so I worked on it in various ways, including improved source-map support in the language I was using at the time (ClojureScript), user session replay in development, predictive testing, automated repro cases, etc., until it was so nice that getting server-side errors was a terrible drag because it didn't have any of the pleasantries that I had come to be used to on the frontend.

Fast forward to this week in OCaml, when I was poking around my site, and hit a "Not found" error. The url was correct, I had just previously made a top-level error handler in my Mirage code return "Not found" on any error, because I was very new to OCaml in general and that seemed to work to the extent I needed that day. But today I wanted to know what was going on - why did this happen? Googling a bit for "reporting OCaml errors in production" brought back that familiar frustration of working in an environment where devs just don't care (let's assume they're capable). Not much for the web, to say the least.

So I figured I would cobble together a quick solution. I didn't want to pull in an SMTP library (finding that 1. the namespacing in OCaml is fucking crazy and 2. some OPAM packages don't work with Mirage only when compiling for a non-Unix backend after developing a full feature has led me to be very cautious about any dependency) - but no worries, the ever-excellent Mailgun offers a great service to send emails via HTTP POSTs. Sadly, Cohttp can't handle multipart (e.g. form) posts (another sign of the weakness of OCaml's infrastructure compared to the excellent clj-http), so I had to do that on my own. I ended up copying the curl examples from Mailgun's docs, but directing the url to an http requestbin, so I could see exactly what the post looked like. Then, it was just a matter of building up the examples in utop with Cohttp bit by bit until I was able to match the exact data sent over by the curl example. From there, the last bit was to generate a random boundary to make sure there would never be a collision between form values. It's been a while since I had to work at that level (I definitely prefer to just focus on my app and not constantly be sucked down into implementing this kind of thing), but luckily it still proved possible, if unpleasant. Here's the full module in all its glory currently:

(* Renamed from http://www.codecodex.com/wiki/Generate_a_random_password_or_random_string#OCaml *)
let gen_boundary length =
    let gen() = match Random.int(26+26+10) with
        n when n < 26 -> int_of_char 'a' + n
      | n when n < 26 + 26 -> int_of_char 'A' + n - 26
      | n -> int_of_char '0' + n - 26 - 26 in
    let gen _ = String.make 1 (char_of_int(gen())) in
    String.concat "" (Array.to_list (Array.init length gen))

let helper boundary key value =
  Printf.sprintf "%s\r\nContent-Disposition: form-data; name=\"%s\"\r\n\r\n%s\r\n" boundary key value

let send ~domain ~api_key params =
  let authorization = "Basic " ^ (B64.encode ("api:" ^ api_key)) in
  let _boundary = gen_boundary 24 in 
  let header_boundary = "------------------------" ^ _boundary in
  let boundary = "--------------------------" ^ _boundary in
  let content_type = "multipart/form-data; boundary=" ^ header_boundary in
  let form_value = List.fold_left (fun run (key, value) ->
      run ^ helper boundary key value) "" params in
  let headers = Cohttp.Header.of_list [
      ("Content-Type", content_type);
      ("Authorization", authorization)
    ] in
  let uri = (Printf.sprintf "https://api.mailgun.net/v3/%s/messages" domain) in
  let body = Cohttp_lwt_body.of_string (Printf.sprintf "%s\r\n%s--" form_value boundary) in
  Cohttp_mirage.Client.post ~headers ~body (Uri.of_string uri)
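To see what actually goes over the wire, here's a dependency-free sketch that rebuilds the same body with a fixed boundary token instead of a random one. It mirrors `helper` and the body construction above, but `part`, `token` and `body` are names introduced just for this illustration, and the parameters are made up. Note that the delimiter used inside the body is the header boundary with two extra leading dashes, which is what multipart/form-data expects:

```ocaml
(* Same shape as [helper] above, repeated so this snippet stands alone. *)
let part boundary key value =
  Printf.sprintf
    "%s\r\nContent-Disposition: form-data; name=\"%s\"\r\n\r\n%s\r\n"
    boundary key value

(* Fixed token instead of [gen_boundary 24], so the output is reproducible. *)
let token = "abc123"

(* The value advertised in the Content-Type header. *)
let header_boundary = String.make 24 '-' ^ token

(* The delimiter written before each part: "--" plus the header value. *)
let boundary = "--" ^ header_boundary

(* Concatenate every part, then close with the delimiter plus "--". *)
let body params =
  let parts =
    List.fold_left
      (fun acc (key, value) -> acc ^ part boundary key value)
      "" params
  in
  Printf.sprintf "%s\r\n%s--" parts boundary

let () =
  print_string (body [ ("to", "someone@example.com"); ("subject", "Hello") ])
```

Printing the body like this and diffing it against what curl sends to a requestbin is a quick way to spot a stray dash or missing CRLF.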

Perhaps I should expand it a bit so that it could become an OPAM package?

From there, I changed the error-handler for the site dispatcher to catch the error and send me the top level message. A bit more work, and I had a stack trace. It still wasn't quite right though, because to debug an error like this, you often need to know the context. With some help from @das_cube, I was able to serialize the request, with info like the headers, URI, etc. and send it along with the error report. The final step was to use @Drup's bootvar work (or is it Functoria? I'm not sure what the line is here) to make all of the keys configurable, so that I only send emails in production, and to a comma-separated list of emails supplied either at compile- or boot-time:

let report_error exn request =
  let error = Printexc.to_string exn in
  let trace = Printexc.get_backtrace () in
  let body = String.concat "\n" [error; trace] in
  let req_text = Format.asprintf "%a@." Cohttp.Request.pp_hum request in
  ignore(
    let emails = Str.split (Str.regexp ",") (Key_gen.error_report_emails ())
                 |> List.map (fun email -> ("to", email)) in
    let params = List.append emails [
        ("from", "RiseOS (OCaml) <errors@riseos.com>");
        ("subject", (Printf.sprintf "[%s] Exception: %s" site_title error));
        ("text", (Printf.sprintf "%s\n\nRequest:\n\n%s" body req_text))
      ]
    in
    (* TODO: Figure out how to capture context (via
       middleware?) and send as context with error email *)
    ignore(Mailgun.send ~domain:"riseos.com" ~api_key:(Key_gen.mailgun_api_key ()) params))

let dispatcher fs c request uri =
  let open Lwt.Infix in
  Lwt.catch
    (fun () ->
       let (lwt_body, content_type) = get_content c fs request uri in
       lwt_body >>= fun body ->
       S.respond_string
         ~status:`OK
         ~headers: (Cohttp.Header.of_list [("Content-Type", content_type)]) ~body ())
    (fun exn ->
       let status = `Internal_server_error in
       let error = Printexc.to_string exn in
       let trace = Printexc.get_backtrace () in
       let body = String.concat "\n" [error; trace] in
       ignore(match (Key_gen.report_errors ()) with
           | true -> report_error exn request
           | false -> ());
       match (Key_gen.show_errors ()) with
       | true -> S.respond_error ~status ~body ()
       (* If we're not showing a stacktrace, then show a nice html
          page *)
       | false -> read_fs fs "error.html" >>=
         fun body ->
         S.respond_string
           ~headers:(Cohttp.Header.of_list [("Content-Type", Magic_mime.lookup "error.html")])
           ~status
           ~body ())
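The comma-separated recipient list is the one piece of this that's easy to try in isolation. Here's a small stdlib-only variant of that step, as a sketch rather than a drop-in replacement: `recipients_of_csv` is a hypothetical helper, and unlike the `Str.split` call above it also trims whitespace and drops empty entries, which is an added assumption about what's wanted:

```ocaml
(* Turn "a@x.com, b@y.com" into Mailgun-style ("to", address) params,
   using only the standard library (no Str dependency). *)
let recipients_of_csv (csv : string) : (string * string) list =
  String.split_on_char ',' csv
  |> List.map String.trim
  |> List.filter (fun email -> email <> "")
  |> List.map (fun email -> ("to", email))

let () =
  (* Example addresses are placeholders. *)
  recipients_of_csv "sean@example.com, ops@example.com,"
  |> List.iter (fun (key, email) -> Printf.printf "%s=%s\n" key email)
```

The resulting list can be prepended to the other form parameters in the same way `report_error` does with `List.append`.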

It's still not anywhere near what you get for free in Rails, Clojure, etc. - and definitely not close to session-replay, predictive testing, etc. - but it's a huge step up from before!

An example error email, in all its glory:

riseos_error_email

Mirage Unikernel build via Docker

Sean Grove, Nov 18th, 2020

As part of due diligence before introducing OCaml to our company, I've been building this site and exploring what OCaml has to offer on a lot of fronts. Now that I have a basic (sometimes terribly painful) flow in place, I've wanted to move on to slimming it down quite a bit. Especially the Mirage build + deploy process. Right now it looks like this:

  1. Dev on OSX (for minutes, hours, days, weeks) until happy with the changes
  2. Git push everything to master
  3. Start up VirtualBox, ssh in
  4. Type history to find the previous incantation
  5. Build Xen artifacts
  6. scp artifacts to an EC2 build machine
  7. ssh into build machine.
  8. Run a deploy script to turn the Xen artifacts into a running server
  9. Clean up left over EC2 resources

As nice as the idea is that I can "just develop" Mirage apps on OSX, it's actually not quite true. Particularly as a beginner, it's easy to add a package as a dependency, and get stuck in a loop between steps 1 (which could be a long time depending on what I'm hacking on) and 3, as you find out that - aha! - the package isn't compatible with the Mirage stack (usually because of the dreaded unix transitive dependency).

Not only that, but I have quite a few pinned packages at this point, and I build everything in step 3 in a carefully hand-crafted virtualbox machine. The idea of manually keeping my own dev envs in sync (much less coworkers!) sounded tedious in the extreme.

At a friend's insistence I've tried out Docker for OSX. I'm very dubious about this idea, but so far it seems like it could help a bit for providing a stable dev environment for a team.

To that end, I updated to Version 1.10.3-beta5 (build: 5049), and went to work trying random commands. It didn't take too long thanks to a great overview by Amir Chaudhry that saved a ton of guesswork (thanks Amir!). I started with a Mirage Docker image, unikernel/mirage, exported the opam switch config from my virtualbox side, imported it in the docker image, installed some system dependencies (openssl, dbm, etc.), and then committed the image. Seems to work a charm, and I'm relatively happy with sharing the file system across Docker/OSX (eliminates step 2 of the dev iteration process). I may consider just running the server on the docker instance at this point, though that's sadly losing some of the appeal of the Mirage workflow.

Another problem with this workflow is that mirage configure --xen screws up the same makefile I use for OSX-side dev (due to the shared filesystem). So flipping back and forth isn't as seamless as I want.

So now the process is a bit shorter:

  1. Dev on OSX/Docker until happy with the changes
  2. Build Xen artifacts
  3. scp artifacts to an EC2 build machine
  4. ssh into build machine.
  5. Run a deploy script to turn the Xen artifacts into a running server
  6. Clean up left over EC2 resources

Already slimmed down! I'm in the process of converting the EC2 deploy script from bash to OCaml (via the previous post, Install OCaml AWS and dbm on OSX), so soon I'd like it to look like:

  1. Dev on OSX/Docker until happy with the changes
  2. git commit code, push
  3. CI system picks up the new code + artifact commit, tests that it boots and binds to a port, then runs the EC2 deploy script.

I'll be pretty close to happy once that's the loop, and the last step can happen within ~20 seconds.
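The "boots and binds to a port" check in step 3 could be sketched roughly like this. It's only a sketch, not the actual CI script: the function name, the timeout, and the polling interval are all made up for illustration, and it assumes the CI job has already started the unikernel somewhere reachable over TCP.

```python
import socket
import time


def wait_for_port(host, port, timeout=20.0, interval=0.25):
    """Return True once a TCP connection to (host, port) succeeds,
    or False if nothing accepts a connection before the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A CI step would call something like `wait_for_port(host, 443)` right after booting the image, and only run the EC2 deploy script if it returns True.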

Babysteps to OCaml on iOS

Sean Grove
Nov 18th, 2020

Early this morning I was able to get some very, very simple OCaml code running on my physical iPhone 6+, which was pretty exciting for me.

I had been excited about the idea since seeing a post on Hacker News. Reading through, I actually expected the whole process to be beyond-terrible, difficult, and buggy - to the point where I didn't even want to start on it. Luckily, Edgar Aroutiounian went well beyond the normal open-source author's limits and actually sat down with me and guided me through the process. Being in-person and able to quickly ask questions, explore ideas, and clear up confusion is so strikingly different to chatting over IRC/Slack. I'll write a bit more about the process later, but here's an example of the entire dev flow right now: edit OCaml (upper left), recompile and copy the object file, and hit play in Xcode.

ocaml_on_ios

The next goal is to incorporate the code into this site's codebase, to build a native iOS app for this site as an example (open source) iOS client with a unikernel backend. I'm very eager to try to use ReactNative, for:

  1. The fantastic state models available (just missing a pure-OCaml version of DataScript)
  2. Code sharing between the ReactJS and ReactNative portions
  3. Hot-code loading
  4. Tons of great packages, like ReactMotion that just seem like a blast to play with

Acknowledgements

I'd really like to thank Edgar Aroutiounian and Gina Maini for helping me out, and for being so thoughtful about what's necessary to smooth out the rough (or dangerously sharp) edges in the OCaml world. Given that tooling is a multiplicative force to make devs more productive, I often complain about the lack of thoughtful, long-term investment in it. Edgar (not me!) is stepping up to the challenge and actually making very impressive progress on that front, both in terms of code and in documenting/blogging.

As a side note, he even has an example native OSX app built using OCaml, tallgeese.

Install OCaml's AWS & DBM libraries on OSX

Sean Grove
Nov 18th, 2020

I'm toying with the idea of rewriting the deploy script I cribbed from @yomimono for this blog from bash to OCaml (there are some features I'd like to make more robust, so that the full deploy is automated and resources are cleaned up), and came across the OCaml AWS library. Unfortunately, installing it was a bit frustrating on OSX. I kept hitting:

NDBM not found, the "camldbm" library cannot be built.

After a bit of googling around, the fix turned out to be fairly simple: install the Command Line Tools, and you should have the right header files, etc., so that opam install aws or opam install dbm should work. Hope that helps someone who runs into a similar problem!

Happy hacking!

Let's Encrypt SSL

Sean Grove
Nov 18th, 2020

I used Let's Encrypt (LE) to get a nice SSL cert for www.riseos.com (and riseos.com, though I really would like that to simply redirect to www. Someday I'll wrap up all the loose ends).

Going through the process wasn't too bad, but unfortunately it was a bit tedious with the current flow. To pass the automated LE checks, you're supposed to place a random string at a random URL (thus demonstrating that you have control over the domain and are therefore the likely owner). I thought I would do this by responding to the url in my existing OCaml app, but

  1. The deploy feedback cycle is just too long
  2. The SSL cert generated by make secrets doesn't work for the check.

In the end I simply switched the DNS records to point to my local machine, opened up my router, and copy/pasted the example python code. Because I use Route53, the DNS change was instantaneous. Then after a bit of mucking about with permissions, I copied fullchain1.pem -> secrets/server.pem, and privkey.pem -> secrets/server.key, restored the DNS records, redeployed (now a single script on a local vm + a single script on an EC2 vm), et voila, a working SSL site!
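For reference, the "random string at a random URL" part can be sketched in a few lines of Python. This isn't the example code I pasted, just a minimal illustration: the token and response values below are placeholders (the real ones come from the Let's Encrypt client), the helper name is made up, and it assumes the HTTP-style challenge, where the file lives under /.well-known/acme-challenge/.

```python
import http.server
import threading

# Placeholder challenge (assumption): the real path and body are handed
# to you by the Let's Encrypt client during the check.
CHALLENGES = {
    "/.well-known/acme-challenge/example-token": "example-token.example-thumbprint",
}


class ChallengeHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = CHALLENGES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        data = body.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def serve_challenges(host="127.0.0.1", port=0):
    """Start the challenge server in a background thread and return it.
    port=0 picks a free port; a real check needs the domain's port 80."""
    server = http.server.HTTPServer((host, port), ChallengeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Anything not listed in CHALLENGES gets a 404, so the server exposes nothing else while the router is open.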

There are some problems with the Let's Encrypt certificate however. The JVM SSL libraries will throw an error when trying to connect to it, saying something like, "unable to find valid certification path to requested target". That transitively affects Apache HttpClient, and therefore clj-http. In the end, I had to pull the cert and insert it into the keystore.

As a side note, the deploy cycle is still too long, and still too involved, but it's hugely better than just a week or two ago. I expect to soon be able to remove the EC2 vm entirely, and to be able to run a full, unattended deploy from my VM - or even better, from CircleCI after every push to master. After those sets of paper cuts are healed, I want to do a full deploy on a fresh account, and get the time from initial example-mirage git checkout to a running, publicly accessible server (possibly with a valid https cert) to under three minutes, on either EC2, Prgmr, or Google Cloud (or Linode/Digital Ocean if anyone knows how to get xen images booting there).

RiseOS TODOs

Sean Grove
Nov 18th, 2020

This site has been a very incremental process - lots and lots of hard-coding where you'd expect more data-oriented, generalized systems. For example, the post title, recent posts, etc. are all produced in OCaml, rather than liquid. I'd like to change that, and bit by bit I'm getting closer to that.

In fact there's a whole list of things I'd like to change:

  • Routing is hard-coded. I want to bring in Opium to be able to use the nice routing syntax, and middleware for auth, etc. However, its dependency on unix means that it can't be used with the Mirage backend. Definitely keeping an eye on the open PRs here.
  • Every page is fully re-rendered on each request - Reading the index.html (template file), searching through it for the targets to replace, reading the markdown files, rendering them into html and inserting them into the html, and finally serving the page. For production, this should be memoized.
  • Posts can't specify their template file - everything is just inserted into index.html. Should be trivial to change.
  • The liquid parser mangles input html to the point where it significantly changes index.html. It needs to be fixed up.
  • Similarly, I want to move more (e.g. some) logic into the liquid templates, for things like conditionals, loops, etc.
  • Along those lines, the ReactJS bindings are very primitive. I need to come up with a small app in this site (perhaps logging in) to start exercising and building them out (with ppx extensions at some points, etc.)
  • An application I'm considering is to first expose an API to update posts in dev-mode, then build a ReactJS-based editor on the frontend (draft.js is obviously a very cool tool that could be used). That way editing is a live, in-app experience, and then rendering is memoized in production. Production could even have a flag to load the dev tools given the right credentials, and allow for a GitHub PR to be created off of the changes.
  • Possibly use Irmin as a storage interface for the posts.
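To make the memoization item above a bit more concrete, here's a rough sketch of the idea in Python rather than OCaml. Everything in it is an assumption for illustration: the render_markdown stand-in, the {{ content }} target, and the function names are not what the site actually uses.

```python
import functools


def render_markdown(text):
    # Stand-in for a real markdown renderer (assumption):
    # wrap every non-empty line in a <p> tag.
    return "".join("<p>%s</p>" % line for line in text.splitlines() if line.strip())


@functools.lru_cache(maxsize=None)
def render_page(template, markdown, target="{{ content }}"):
    """Render the markdown and splice it into the template.
    Results are cached per (template, markdown, target) combination,
    so a repeated request skips the render and the search/replace."""
    return template.replace(target, render_markdown(markdown))
```

The cache here is keyed on the contents, so editing a post or the template naturally produces a fresh render. Keying on file paths instead would also skip the file reads, at the cost of needing some invalidation story for dev-mode.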

Plenty of other things as well. I'll update this as I remember them.