# raft archive format

by [Charles Iliya Krempeaux](http://changelog.ca/)

The **raft format** is a very simple and easy to understand **archive format** and **container format** that can combine multiple files into a single aggregate file.

If you are _not_ familiar with **archive formats** and **container formats**, they have many use-cases:

* backups,
* eBooks,
* file-systems,
* image galleries,
* journals,
* music albums,
* photo albums,
* software packages,
* website archives,
* _etc_.

Basically, any use-case where you need to combine multiple files into a single aggregate file.

The **raft format** is similar to other **archive formats**, such as the **ar format**, the **cpio format**, the **shar format**, the **tar format**, and the **WARC format**, but is designed to be easier to understand and implement than most (probably all) of the other **archive formats** and **container formats**.

In fact, one of the main reasons the **raft format** exists is that it was designed to be easy for programmers to understand and implement.
The **raft format** is meant to be both programmer-legible and programmer-friendly.

## Sample

Here is an example **raft** file with 3 files embedded in it. ``` RAFT/1 README.md 12 Hello world! article.txt 1573 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Interdum velit laoreet id donec ultrices tincidunt arcu non sodales. Cras semper auctor neque vitae tempus quam pellentesque nec nam. Cursus turpis massa tincidunt dui ut. Diam vel quam elementum pulvinar etiam non quam. Gravida neque convallis a cras semper. Ornare massa eget egestas purus. Tempor id eu nisl nunc mi ipsum faucibus vitae aliquet. Fames ac turpis egestas maecenas pharetra. Arcu bibendum at varius vel pharetra vel turpis nunc. Integer quis auctor elit sed vulputate mi. Eget velit aliquet sagittis id consectetur purus ut faucibus. Sapien pellentesque habitant morbi tristique senectus.
Lorem mollis aliquam ut porttitor leo a diam sollicitudin tempor. Quis commodo odio aenean sed adipiscing. Commodo quis imperdiet massa tincidunt nunc. Quam quisque id diam vel quam elementum pulvinar etiam non. Elit ut aliquam purus sit amet luctus venenatis lectus. Sit amet mauris commodo quis. Placerat vestibulum lectus mauris ultrices eros in. Tristique sollicitudin nibh sit amet commodo nulla facilisi nullam vehicula. Augue interdum velit euismod in. Tellus pellentesque eu tincidunt tortor. Commodo viverra maecenas accumsan lacus vel facilisis. Venenatis a condimentum vitae sapien pellentesque habitant morbi. Et ligula ullamcorper malesuada proin libero nunc consequat interdum varius. Tellus integer feugiat scelerisque varius. Bibendum enim facilisis gravida neque convallis. Nisl nisi scelerisque eu ultrices vitae auctor eu. images/logo.svg 1819 ```

The files inside of this **raft** file are named:

* `README.md`
* `article.txt`
* `images/logo.svg`

The **raft** file also specifies the **file size** of each of these embedded files.

| File Name         | File Size |
|-------------------|-----------|
| `README.md`       | 12        |
| `article.txt`     | 1573      |
| `images/logo.svg` | 1819      |

Each of these **file sizes** lets us know how many bytes to read (starting at the next line) for the embedded file.

The content of the embedded file named `README.md` is only 12 bytes long, and is: `Hello world!`

The content of the embedded file named `article.txt` is 1573 bytes long, and is: ``` Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Interdum velit laoreet id donec ultrices tincidunt arcu non sodales. Cras semper auctor neque vitae tempus quam pellentesque nec nam. Cursus turpis massa tincidunt dui ut. Diam vel quam elementum pulvinar etiam non quam. Gravida neque convallis a cras semper. Ornare massa eget egestas purus. Tempor id eu nisl nunc mi ipsum faucibus vitae aliquet. Fames ac turpis egestas maecenas pharetra.
Arcu bibendum at varius vel pharetra vel turpis nunc. Integer quis auctor elit sed vulputate mi. Eget velit aliquet sagittis id consectetur purus ut faucibus. Sapien pellentesque habitant morbi tristique senectus. Lorem mollis aliquam ut porttitor leo a diam sollicitudin tempor. Quis commodo odio aenean sed adipiscing. Commodo quis imperdiet massa tincidunt nunc. Quam quisque id diam vel quam elementum pulvinar etiam non. Elit ut aliquam purus sit amet luctus venenatis lectus. Sit amet mauris commodo quis. Placerat vestibulum lectus mauris ultrices eros in. Tristique sollicitudin nibh sit amet commodo nulla facilisi nullam vehicula. Augue interdum velit euismod in. Tellus pellentesque eu tincidunt tortor. Commodo viverra maecenas accumsan lacus vel facilisis. Venenatis a condimentum vitae sapien pellentesque habitant morbi. Et ligula ullamcorper malesuada proin libero nunc consequat interdum varius. Tellus integer feugiat scelerisque varius. Bibendum enim facilisis gravida neque convallis. Nisl nisi scelerisque eu ultrices vitae auctor eu. ```

The content of the embedded file named `images/logo.svg` is 1819 bytes long, and is:

```
```

## Motivation

There are many use-cases where multiple files are combined into a single file.
For example:

* backups,
* eBooks,
* file-systems,
* image galleries,
* journals,
* music albums,
* photo albums,
* software packages,
* website archives,
* _etc_.

Many of these use-cases use the **cpio format**, the **iso format**, the **rar format**, the **tar format**, the **zip format**, or some other **archive format** or **container format**.

While all of these formats work acceptably as an **archive format** and a **container format**, none of them is **easy** for a programmer with 3 to 10 years of experience to implement an encoder and a decoder for.
Also, none of them supports a ‘**view-source**’ learning style (as none of them is text-based, for some definition of "text").

That is why the **raft format** exists.
The **raft format** is a text-based format (in the same way the HTTP/1.1 protocol is text-based), so a programmer can look at **raft** files (i.e., ‘**view-source**’) to understand the format.

The **raft format** is simple to create, thus making it easy to create an encoder.

The **raft format** is simple to parse, thus making it easy to create a decoder.

## File Extension

Although **raft** does _not_ require an extension (since it has magic-bytes), if a file-extension is used for a **raft** file, it should use the `.raft` extension (on systems where file-extensions are necessary).

For example: `stuff.raft`

## MIME Type

Although **raft** does _not_ require a MIME-type (since it has magic-bytes), if a MIME-type is used for a **raft** file, it should use the `multipart/raft` MIME-type (on systems where MIME-types are necessary).

For example:

```
Content-Type: multipart/raft
```

## Name

The name “**raft**” derives from 3 meanings:

* it is a recursive acronym for “**R**aft **A**rchive **F**orma**T**”,
* it is an English noun for a flat-bottomed boat used to **transport things together**, and
* it also happens to be a Persian stem word for ‘**to go**’.

## File Format

A **raft** file is a single file that contains multiple other files.

Or said more formally, the **raft format** is an **archive format** and **container format** that can combine multiple files into a single aggregate file.

One of the main points of the **raft format** is that it was designed to be easy for programmers to understand and implement.
The **raft format** is meant to be both programmer-legible and programmer-friendly.

The common way to store and think about **multiple files** is as part of a directory system.
For example:

* readme.xhtml
* LICENSE
* images/logo.png
* images/banner.png
* images/figures/figure1.jpeg
* images/figures/figure2.jpeg
* images/figures/figure3.png

This type of thing (and the files' contents) is what is inside of a **raft** file.
One way of thinking about this is that it is a **hierarchical key-value format** similar to (**but not the same as**) JSON, INI, and other similar formats.

For example, in JSON the preceding file system would probably look like:

```json
{
  "readme.xhtml": "...",
  "LICENSE": "...",
  "images": {
    "logo.png": "...",
    "banner.png": "...",
    "figures": {
      "figure1.jpeg": "...",
      "figure2.jpeg": "...",
      "figure3.png": "..."
    }
  }
}
```

(Note that we are using `"..."` in the examples because we aren't listing the contents of the files.)

Also for example, in INI the preceding file system would look like:

```ini
readme.xhtml = ...
LICENSE = ...

[images]
logo.png = ...
banner.png = ...

[images.figures]
figure1.jpeg = ...
figure2.jpeg = ...
figure3.png = ...
```

(Again note that we are using `"..."` in the examples because we aren't listing the contents of the files.)

### Example

The same as a **raft** file would be: ``` RAFT/1 readme.xhtml 14 ... LICENSE 1053 ... images/logo.png 17365 ... images/banner.png 5550 ... images/figures/figure1.jpeg 132441 ... images/figures/figure2.jpeg 814532 ... images/figures/figure3.png 28389 ... ```

(And again note that we are using `"..."` in the examples because we aren't listing the contents of the files.)

Now let's look at a **raft** file that actually includes each file's contents (instead of `"..."`) so we can see a real example.

We are going to use a different directory structure for this example though.
We will use this one:

* README.md
* article.txt
* images/logo.svg

And here is the example **raft** file that includes each file's contents (instead of `"..."`): ``` RAFT/1 README.md 12 Hello world! article.txt 1573 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Interdum velit laoreet id donec ultrices tincidunt arcu non sodales. Cras semper auctor neque vitae tempus quam pellentesque nec nam. Cursus turpis massa tincidunt dui ut.
Diam vel quam elementum pulvinar etiam non quam. Gravida neque convallis a cras semper. Ornare massa eget egestas purus. Tempor id eu nisl nunc mi ipsum faucibus vitae aliquet. Fames ac turpis egestas maecenas pharetra. Arcu bibendum at varius vel pharetra vel turpis nunc. Integer quis auctor elit sed vulputate mi. Eget velit aliquet sagittis id consectetur purus ut faucibus. Sapien pellentesque habitant morbi tristique senectus. Lorem mollis aliquam ut porttitor leo a diam sollicitudin tempor. Quis commodo odio aenean sed adipiscing. Commodo quis imperdiet massa tincidunt nunc. Quam quisque id diam vel quam elementum pulvinar etiam non. Elit ut aliquam purus sit amet luctus venenatis lectus. Sit amet mauris commodo quis. Placerat vestibulum lectus mauris ultrices eros in. Tristique sollicitudin nibh sit amet commodo nulla facilisi nullam vehicula. Augue interdum velit euismod in. Tellus pellentesque eu tincidunt tortor. Commodo viverra maecenas accumsan lacus vel facilisis. Venenatis a condimentum vitae sapien pellentesque habitant morbi. Et ligula ullamcorper malesuada proin libero nunc consequat interdum varius. Tellus integer feugiat scelerisque varius. Bibendum enim facilisis gravida neque convallis. Nisl nisi scelerisque eu ultrices vitae auctor eu. images/logo.svg 1819 ```

It is a simple format.
You might even be able to figure out the format just by looking at this (and other) examples.

Now that we have a real example of a **raft** file, let's look at the structure of it.

### Magic-Bytes

You can tell whether a file is a **raft** file just by looking at the first 5 bytes of the file.

For a file to be a **raft** file, it MUST begin with the bytes:

```go
"RAFT/"
```

I.e., in hexadecimal this would be:

```
0x52 0x41 0x46 0x54 0x2F
```

### Version

What comes immediately after that is the version.
So with this first line of a **raft** file:

```go
"RAFT/1"
```

What comes immediately after the `"RAFT/"` is:

```go
"1"
```

Or in hexadecimal this would be:

```
0x31
```

For now the only version of the **raft format** is version 1.
So you should just look for the "1" character (i.e., hexadecimal `0x31`).

### Writing The First And Second Lines

If you are creating a **raft** file, then you can create the first and second lines of a **raft** file with code like the following.

In the Go programming language, it would look like:

```go
var writer io.Writer = os.Stdout // you can change this from os.Stdout to a file

// ...

fmt.Fprintln(writer, "RAFT/1") // first line: magic-bytes followed by the version
fmt.Fprintln(writer)           // second line: empty
```