Monthly Archives: July 2014

File Systems: The Semester Project

This eighteenth article, which is part of the series on Linux device drivers, kick starts with introducing the concept of a file system, by simulating one in user space.

<< Seventeenth Article

If you have not yet guessed the semester project topic, then you should have a careful look at the table of contents of the various books on Linux device drivers. You’ll find that no book lists a chapter on file systems. Not because they are not used or not useful. In fact, there are close to 100 different file systems existing, as of today, and possibly most of them in active use. Then, why are they not talked in general? And are so many file systems required? Which one of them is the best? Let’s understand these one by one.

In reality, there is no best file system, and in fact there may not be one to come, as well. This is because a particular file system suits best only for a particular set of requirements. Thus, different requirements ask for different best file systems, leading to so many active file systems and many more to come. So, basically they are not generic enough to be talked generically, rather more of a research topic. And the reason for them being the toughest of all drivers is simple: Least availability of generic written material – the best documentation being the code of various file systems. Hmmm! Sounds perfect for a semester project, right?

In order to get to do such a research project, a lot of smaller steps has to be taken. The first one being understanding the concept of a file system itself. That could be done best by simulating one in user space. Shweta took the ownership of getting this first step done, as follows:

  • Use a regular file for a partition, that is for the actual data storage of the file system (Hardware space equivalent)
  • Design a basic file system structure (Kernel space equivalent) and implement the format/mkfs command for it, over the regular file
  • Provide an interface/shell to type commands and operate on the file system, similar to the usual bash shell. In general, this step is achieved by the shell along with the corresponding file system driver in kernel space. But here that translation is embodied/simulated into the user space interface, itself. (Next article shall discuss this in detail)

The default regular file chosen is .sfsf (simula-ting file system file) in the current directory. Starting . (dot) is for it to be hidden. The basic file system design mainly contains two structures, the super block containing the info about the file system, and the file entry structure containing the info about each file in the file system. Here are Shweta’s defines and structures defined in sfs_ds.h

#ifndef SFS_DS_H
#define SFS_DS_H

#define SIMULA_FS_TYPE 0x13090D15 /* Magic Number for our file system */
#define SIMULA_FS_BLOCK_SIZE 512 /* in bytes */
#define SIMULA_FS_ENTRY_SIZE 64 /* in bytes */
#define SIMULA_FS_DATA_BLOCK_CNT ((SIMULA_FS_ENTRY_SIZE - (16 + 3 * 4)) / 4)

#define SIMULA_DEFAULT_FILE ".sfsf"

typedef unsigned int uint4_t;

typedef struct sfs_super_block
{
	uint4_t type; /* Magic number to identify the file system */
	uint4_t block_size; /* Unit of allocation */
	uint4_t partition_size; /* in blocks */
	uint4_t entry_size; /* in bytes */ 
	uint4_t entry_table_size; /* in blocks */
	uint4_t entry_table_block_start; /* in blocks */
	uint4_t entry_count; /* Total entries in the file system */
	uint4_t data_block_start; /* in blocks */
	uint4_t reserved[SIMULA_FS_BLOCK_SIZE / 4 - 8];
} sfs_super_block_t; /* Making it of SIMULA_FS_BLOCK_SIZE */

typedef struct sfs_file_entry
{
	char name[16];
	uint4_t size; /* in bytes */
	uint4_t timestamp; /* Seconds since Epoch */
	uint4_t perms; /* Permissions for user */
	uint4_t blocks[SIMULA_FS_DATA_BLOCK_CNT];
} sfs_file_entry_t;

#endif

Note that Shweta has put some redundant fields in the sfs_super_block_t rather than computing them every time. And practically that is not space inefficient, as any way lot of empty reserved space is available in the super block, which is expected to be the complete first block of a partition. For example, entry_count is a redundant field as it is same as the entry table’s size (in bytes) divided by the entry’s size, which both are part of the super block structure. Moreover, the data_block_start number could also be computed by how many blocks have been used before it, but again not preferred.

Also, note the hard coded assumptions made about the block size of 512 bytes, and some random magic number for the simula file system. This magic number is used to verify that we are dealing with the right file system, when operating on it.

Coming to the file entry, it contains the following for every file:

  • Name of up to 15 characters (1 byte for ”)
  • Size in bytes
  • Time stamp (creation or modification? Not yet decided by Shweta)
  • Permissions (just one set, for the user)
  • Array of block numbers for up to 9 data blocks. Why 9? To make the complete entry of SIMULA_FS_ENTRY_SIZE (64).

With the file system design ready, the make file system (mkfs) or more commonly known format application is implemented next, as format_sfs.c:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#include "sfs_ds.h"

#define SFS_ENTRY_RATIO 0.10 /* 10% of all blocks */
#define SFS_ENTRY_TABLE_BLOCK_START 1

sfs_super_block_t sb =
{
	.type = SIMULA_FS_TYPE,
	.block_size = SIMULA_FS_BLOCK_SIZE,
	.entry_size = SIMULA_FS_ENTRY_SIZE,
	.entry_table_block_start = SFS_ENTRY_TABLE_BLOCK_START
};
sfs_file_entry_t fe; /* All 0's */

void write_super_block(int sfs_handle, sfs_super_block_t *sb)
{
	write(sfs_handle, sb, sizeof(sfs_super_block_t));
}
void clear_file_entries(int sfs_handle, sfs_super_block_t *sb)
{
	int i;

	for (i = 0; i < sb->entry_count; i++)
	{
		write(sfs_handle, &fe, sizeof(fe));
	}
}
void mark_data_blocks(int sfs_handle, sfs_super_block_t *sb)
{
	char c = 0;

	lseek(sfs_handle, sb->partition_size * sb->block_size - 1, SEEK_SET);
	write(sfs_handle, &c, 1); /* To make the file size to partition size */
}

int main(int argc, char *argv[])
{
	int sfs_handle;

	if (argc != 2)
	{
		fprintf(stderr, "Usage: %s <partition size in 512-byte blocks>\n",
			argv[0]);
		return 1;
	}
	sb.partition_size = atoi(argv[1]);
	sb.entry_table_size = sb.partition_size * SFS_ENTRY_RATIO;
	sb.entry_count = sb.entry_table_size * sb.block_size / sb.entry_size;
	sb.data_block_start = SFS_ENTRY_TABLE_BLOCK_START + sb.entry_table_size;

	sfs_handle = creat(SIMULA_DEFAULT_FILE,
		S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
	if (sfs_handle == -1)
	{
		perror("No permissions to format");
		return 2;
	}
	write_super_block(sfs_handle, &sb);
	clear_file_entries(sfs_handle, &sb);
	mark_data_blocks(sfs_handle, &sb);
	close(sfs_handle);
	return 0;
}

Note the heuristic of 10% of total space to be used for the file entries, defined by SFS_ENTRY_RATIO. Apart from that, this format application takes the partition size as input and accordingly creates the default file .sfsf, with:

  • Its first block as the super block, using write_super_block()
  • The next 10% of total blocks as the file entry’s zeroed out table, using clear_file_entries()
  • And the remaining as the data blocks, using mark_data_blocks(). This basically writes a zero at the end, to actually extend the underlying file .sfsf to the partition size

As any other format/mkfs application, format_sfs.c is compiled using gcc, and executed on the shell as follows:

$ gcc format_sfs.c -o format_sfs
$ ./format_sfs 1024 # Partition size in blocks of 512-bytes
$ ls -al # List the .sfsf created with a size of 512 KiBytes

Figure 33 shows the .sfsf pictorially for the partition size of 1024 blocks of 512 bytes each.

Figure 33: Simula file system on 512 KiB of partition (.sfsf)

Figure 33: Simula file system on 512 KiB of partition (.sfsf)

Summing up

With the above design of simula file system (sfs_ds.h), along with the implementation for its format command (format_sfs.c), Shweta has thus created the empty file system over the simulated partition .sfsf. Now, as listed earlier, the final step in simulating the user space file system is to create the interface/shell to type commands and operate on the empty file system just created on .sfsf. Let’s watch out for Shweta coding that as well, completing the first small step towards understanding their project.

Nineteenth Article >>

   Send article as PDF   

Fundamentals of Set Theory

This eighteenth article of the mathematical journey through open source, digs into the fundamentals of the set theory through Maxima.

<< Seventeenth Article

Most of you would have definitely heard about the word “set” as in “set theory”, and many of you may already know quite a bit about the sets. Then what’s so special about it. The specialty lies in the whole idea of getting our familiar maths being done by the computer. In this series, we have been mostly picking up familiar maths and then figuring out how that can be done by computer, and that also with no or very little programming knowledge. The same holds true for the sets as well. So, let’s get started with its fundamentals, starting from their creation.

Creation of sets

Just for the beginners of set theory, a set is an unordered collection of distinct items – any item, in any order, but unique. An item is commonly referred to as an element, as well. If an item is contained in a set, it is commonly referred as a member of the set. A set is typically represented by its members enclosed in braces {} and separated by commas. {6, -5, 9, 0}, {dog, cat, donkey, cow, buffalo}, {Kiran, Kasturi, Karan}, {6, horse, sapphire} are some examples. Notice that the first three sets have related items in them, but the last one doesn’t – and that’s perfectly fine. However, if the items in a set have relation(s) or condition(s), the set can also be expressed with that relation(s) or condition(s) mentioned within braces {}. For example, {All human beings younger than 35 years}, {All positive even numbers, All multiples of 3}. In Maxima, we can straight away represent the sets in the first notation, as follows:

$ maxima -q
(%i1) {6, -5, 9, 0};
(%o1)                           {- 5, 0, 6, 9}
(%i2) {dog, cat, donkey, cow, buffalo};
(%o2)                  {buffalo, cat, cow, dog, donkey}
(%i3) {Kiran, Kasturi, Karan};
(%o3)                       {Karan, Kasturi, Kiran}
(%i4) {6, horse, sapphire};
(%o4)                        {6, horse, sapphire}
(%i5) {axe, knife, spear, axe, scissor};
(%o5)                    {axe, knife, scissor, spear}
(%i6) quit();

Note that as the order of items in the set doesn’t matter, Maxima internally keeps them sorted, and hence displayed accordingly, in the above examples. Also note the last example – the duplicates are treated as a single item.

Sets can also be created from ordered lists using setify(). Items of sets could be expressions, but may not be automatically simplified. Check out the following:

$ maxima -q
(%i1) setify([x, y, z]);
(%o1)                              {x, y, z}
(%i2) [x, y, z, x]; /* Ordered list */
(%o2)                            [x, y, z, x]
(%i3) setify([x, y, z, x]);
(%o3)                              {x, y, z}
(%i4) string({x^2 - 1, (x + 1) * (x -1)});
(%o4)                         {(x-1)*(x+1),x^2-1}

string() has been used in %i4, to just have the output on a single line. But the important thing to note is that though the two items of the list are mathematically identical, they have been preserved as two distinct items – and thus not a set in real sense. Such cases can be actually setified by simplifying the individual items of the set using corresponding simplification function, e.g. rat() for rational expressions. And operating any function on every item of a set can be achieved using map(). Here’s an to example to set all those straight, continuing from the above one:

(%i5) string(map(rat, {x^2 - 1, (x + 1) * (x -1)}));
(%o5)                               {x^2-1}
(%i6) string(rat((x + 1) * (x -1)));
(%o6)                                x^2-1
(%i7) quit();

%i6 and %o6 above is just to demonstrate how rat() works. I know, you are still wondering what this wierd map() is and how it is working. So, here are a few more examples for the same:

$ maxima -q
(%i1) trigreduce(2 * sin(x) * cos(x));
(%o1)                              sin(2 x)
(%i2) {sin(2 * x), 2 * sin(x) * cos(x)};  /* Identical items */
(%o2)                     {2 cos(x) sin(x), sin(2 x)}
(%i3) map(trigreduce, {sin(2 * x), 2 * sin(x) * cos(x)});
(%o3)                             {sin(2 x)}
(%i4) string({apple / fruit + mango / fruit, (apple + mango) / fruit});
(%o4)            {(mango+apple)/fruit,mango/fruit+apple/fruit}
(%i5) string(map(rat, {apple / fruit + mango / fruit, (apple + mango) / fruit}));
(%o5)                        {(mango+apple)/fruit}
(%i6) quit();

In fact, the power of map() lies in its ability that it can take a function created on the fly, meaning using the lambda notation. Here goes few examples to demonstrate the lamda() first, and then map() using lambda():

$ maxima -q
(%i1) f: lambda([x], x^3)$
(%i2) f(5);
(%o2)                                 125
(%i3) lambda([x], x^3)(5);
(%o3)                                 125
(%i4) lambda([x, y], x+y)(4, 6);
(%o4)                                 10
(%i5) map(f, {0, 1, 2, 3});
(%o5)                            {0, 1, 8, 27}
(%i6) map(lambda([x], x^3), {0, 1, 2, 3});
(%o6)                            {0, 1, 8, 27}
(%i7) map(lambda([x, y], x+y), {a}, {3});
(%o7)                               {a + 3}
(%i8) map(lambda([x], x^4), {-2, -1, 0, 1, 2});
(%o8)                            {0, 1, 16}
(%i9) map(g, {-2, -1, 0, 1, 2});
(%o9)                {g(- 2), g(- 1), g(0), g(1), g(2)}
(%i10) quit();

lambda() takes two arguments. First a list of arguments of the function being defined, and second the expression for the return value of the function using those arguments. %i1 defines a function f with one argument, returning its cube. %i2 calls f(). However, the whole point of using lambda is using it without defining an explicit function like f(). So, %i3 & %i4 demonstrate exactly that. %i6, %i7, %i8 shows using lambda() with map(). Note the elimination of duplicates in %o8. %i9 is an another example of map().

Basic set operations

Enough with varieties of set creation. Let’s do some set operations. For the starters: union of sets is defined as a set with items of all the sets; intersection of sets is defined as a set with items common in all the sets; difference of two sets is defined as a set with items from the first set, not in the second set. And here goes the demonstration:

$ maxima -q
(%i1) union({1, 2}, {1, 3, 4}, {1, 2, 6, 7});
(%o1)                         {1, 2, 3, 4, 6, 7}
(%i2) intersection({1, 2}, {1, 3, 4}, {1, 2, 6, 7});
(%o2)                                 {1}
(%i3) setdifference({1, 2}, {1, 3, 4});
(%o3)                                 {2}
(%i4) quit();

Other basic set operations provided by Maxima are:-

  • cardinality() – returns the number of distinct items in a set
  • elementp() – checks for an item to be a member of a set
  • emptyp() – checks for the emptiness of a set
  • setequalp() – compares two sets for equality
  • disjointp() – checks for no common items in two sets
  • subsetp() – checks for the first set to be a subset of the second set

 

The following walk through demonstrates all of these:

$ maxima -q
(%i1) S1: {}$
(%i2) S2: {1, 2, 3}$
(%i3) S3: {3, 1, 5-3}$ /* Same as S2 */
(%i4) S4: {a, b, c}$
(%i5) S5: {2, 1, 2}$
(%i6) cardinality(S1);
(%o6)                                  0
(%i7) cardinality(S2);
(%o7)                                  3
(%i8) cardinality(S3);
(%o8)                                  3
(%i9) cardinality(S4);
(%o9)                                  3
(%i10) cardinality(S5);
(%o10)                                 2
(%i11) elementp(b, S3);
(%o11)                               false
(%i12) elementp(b, S4);
(%o12)                               true
(%i13) emptyp(S1);
(%o13)                               true
(%i14) emptyp(S2);
(%o14)                               false
(%i15) setequalp(S1, S2);
(%o15)                               false
(%i16) setequalp(S2, S3);
(%o16)                               true
(%i17) disjointp(S1, S2);
(%o17)                               true
(%i18) disjointp(S2, S3);
(%o18)                               false
(%i19) disjointp(S3, S4);
(%o19)                               true
(%i20) disjointp(S3, S5);
(%o20)                               false
(%i21) subsetp(S1, S2);
(%o21)                               true
(%i22) subsetp(S2, S3);
(%o22)                               true
(%i23) subsetp(S3, S2);
(%o23)                               true
(%i24) subsetp(S3, S4);
(%o24)                               false
(%i25) subsetp(S5, S3);
(%o25)                               true
(%i26) subsetp(S3, S5);
(%o26)                               false
(%i27) quit();

Playing with set elements

After clearing the fundamentals, mostly through numerical examples, it is now time to have some fun with symbol substitution of Maxima. And here’s some playing around:

$ maxima -q
(%i1) S: {a, b, c, a};
(%o1)                             {a, b, c}
(%i2) S: {a+b, b+c, c+d, d+a};
(%o2)                   {b + a, c + b, d + a, d + c}
(%i3) subst(a=c, S);
(%o3)                          {c + b, d + c}
(%i4) subst([a=c, b=d], S);
(%o4)                              {d + c}
(%i5) subst([a=c, b=d, c=-d], S);
(%o5)                                {0}
(%i6) subst([a=1, b=2, c=-3], S);
(%o6)                       {- 1, 3, d - 3, d + 1}
(%i7) T: {S, {S}};
(%o7)   {{b + a, c + b, d + a, d + c}, {{b + a, c + b, d + a, d + c}}}
(%i8) subst([a=c, b=d, c=-d], T);
(%o8)                            {{0}, {{0}}}
(%i9) subst([a=1, b=2, c=-3], T);
(%o9)         {{- 1, 3, d - 3, d + 1}, {{- 1, 3, d - 3, d + 1}}}
(%i10) quit();

Nineteenth Article >>

   Send article as PDF