One of the side projects I’ve been working on lately is helping to shepherd the Semantic Versioning specification (SemVer) along to its 2.0.0 release. I want to thank everyone who sent pull requests and engaged in thoughtful, critical, spirited feedback about the spec. Your involvement has made it better!
I also want to thank Tom for creating SemVer in the first place and trusting me to help move it along.
I’ve mentioned SemVer in the past as it relates to NuGet. The 2.0.0 release of SemVer addresses some of the issues I raised.
What’s Changed?
Not much has changed. Most of the changes are clarifications.
Build metadata
Perhaps the biggest change is the addition of optional build metadata (what we used to call a build number). This simply allows you to add a bit of metadata to a version in a manner that’s compliant with SemVer.
The metadata does not affect version precedence. It’s analogous to a code comment.
It’s useful for internal package feeds and for being able to tie a specific version to some mechanism that generated it.
For existing package managers that choose to be SemVer 2.0 compliant, the logic change needed is minimal. Instead of reporting an error when encountering a version with build metadata, all they need to do is ignore or strip the build metadata. That’s pretty much it.
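To make the "ignore or strip" step concrete, here is a minimal sketch in JavaScript. The helper names `stripBuildMetadata` and `samePrecedence` are made up for illustration and do not come from any package manager. The sketch assumes build metadata is everything from the first "+" onward, and that both version strings are otherwise already in canonical form.

```javascript
// Hypothetical helper: drop SemVer build metadata (everything from the first '+').
function stripBuildMetadata(version) {
  const plus = version.indexOf('+');
  return plus === -1 ? version : version.slice(0, plus);
}

// Build metadata does not affect precedence, so two versions that differ
// only in their metadata compare as equal once it is stripped.
// Assumption: both inputs are otherwise canonical version strings.
function samePrecedence(a, b) {
  return stripBuildMetadata(a) === stripBuildMetadata(b);
}

console.log(stripBuildMetadata('1.0.0-rc.1+build.42'));                 // 1.0.0-rc.1
console.log(samePrecedence('1.0.0+20130313', '1.0.0+exp.sha.5114f85')); // true
```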
Some package managers may choose to do more with it (for internal feeds for example) but that’s up to them.
Pre-release identifiers
Pre-release labels have a little more structure to them now. For example, they can be separated into identifiers using the “.” delimiter and identifiers that only contain digits are compared numerically instead of lexically. That way, 1.0.0-rc.1 < 1.0.0-rc.11 as you might expect. See the specification for full details.
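As a rough illustration of that comparison, here is a small JavaScript sketch. The function name `comparePrerelease` is hypothetical, and it only handles the pre-release part of a version (for example "rc.1"), not a full version string. It follows the rules in the specification: digit-only identifiers compare numerically, other identifiers compare in ASCII order, a numeric identifier sorts before a non-numeric one, and a longer list of identifiers wins when all earlier ones are equal.

```javascript
// Hypothetical helper: compare two pre-release strings such as "rc.1" and "rc.11".
// Returns -1, 0, or 1. Assumes both inputs are valid pre-release labels.
function comparePrerelease(a, b) {
  const xs = a.split('.');
  const ys = b.split('.');
  const isNumeric = (s) => /^[0-9]+$/.test(s);

  for (let i = 0; i < Math.min(xs.length, ys.length); i++) {
    const x = xs[i];
    const y = ys[i];
    if (x === y) continue;

    const xNum = isNumeric(x);
    const yNum = isNumeric(y);
    if (xNum && yNum) {
      // Digit-only identifiers are compared as numbers, not as strings.
      const bx = BigInt(x);
      const by = BigInt(y);
      if (bx !== by) return bx < by ? -1 : 1;
      continue;
    }
    // A numeric identifier has lower precedence than a non-numeric one.
    if (xNum !== yNum) return xNum ? -1 : 1;
    // Both non-numeric: plain ASCII ordering.
    return x < y ? -1 : 1;
  }
  // All shared identifiers are equal: the longer list has higher precedence.
  return Math.sign(xs.length - ys.length);
}

console.log(comparePrerelease('rc.1', 'rc.11')); // -1
console.log('rc.11' < 'rc.2');                   // true (what a purely lexical compare would say)
console.log(comparePrerelease('rc.11', 'rc.2')); // 1
```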
Clarifications
The rest of the changes to the specification are concerned with clarifications and resolving ambiguities. For example, we clarified that leading zeroes are not allowed in the Major, Minor, or Patch version nor in pre-release identifiers that only contain digits. This makes a canonical form for a version possible.
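A check for the "no leading zeroes" rule might look like the following JavaScript sketch. The names `NUMERIC_ID` and `isCanonicalCore` are invented for this example, and the check only covers the Major.Minor.Patch part of a version, not pre-release labels or build metadata.

```javascript
// A numeric identifier is either "0" or a digit string that does not start with "0".
const NUMERIC_ID = /^(0|[1-9][0-9]*)$/;

// Hypothetical helper: is this Major.Minor.Patch triple in canonical form?
function isCanonicalCore(version) {
  const parts = version.split('.');
  return parts.length === 3 && parts.every((part) => NUMERIC_ID.test(part));
}

console.log(isCanonicalCore('1.2.3'));  // true
console.log(isCanonicalCore('1.02.3')); // false
```

With leading zeroes ruled out, "1.2.3" has exactly one valid spelling, which is what makes a canonical form possible.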
If you find an ambiguity, feel free to report it.
What’s Next?
As SemVer matures, we expect the specification to become a little more formal in nature as a means of removing ambiguities. One such effort underway is to include a BNF grammar for the structure of a version number in the spec. This should hopefully be part of SemVer 2.1.
So, here I was, writing some really fun code, when I found out that I was running into deadlocks. I activated emergency protocols and went into deep debugging mode.
After being really thorough in figuring out several possible causes, I was still left with what is effectively a WTF @!(*!@ DAMN !(@*#!@* YOU !@*!@( outburst and a sudden longing for something to repeatedly hit.
Eventually, however, I figured out what was going on.
I have the following method: Aggregator.AggregateAsync(), inside which we have a call to the PulseAll method. That method will then go and execute the following code:
public void PulseAll()
{
    Interlocked.Increment(ref state);
    TaskCompletionSource<object> result;
    while (waiters.TryDequeue(out result))
    {
        result.SetResult(null);
    }
}
After that, I return from the method. In another piece of the code (Aggregator.Dispose) I am waiting for the task that is running the AggregateAsync method to complete.
Nothing worked! It took me a while before I figured out that I wanted to check the stack, where I found this:
Basically, I had a deadlock because when I called SetResult on the completion source (which freed the Dispose code to run), I actually switched over to that task and allowed it to run. Still on the same thread, but in a different task, I ran through the rest of the code and eventually got to Aggregator.Dispose(). Now, I could only get to it if the PulseAll() method was called. But, because we are on the same thread, the AggregateAsync task that called PulseAll() hasn’t completed yet!
In the end, I “solved” that by introducing a DisposeAsync() method, which allowed us to yield the thread, and then the AggregateAsync task was completed, and then we could move on.
But I am really not very happy about this. Any ideas about the proper way to handle async & IDisposable?
Code is unforgiving. As the reasonable human beings that we are, when we review code we both know what the author intends. But computers can’t wait to Well, Actually all over that code like a lonely Hacker News commenter:
Well Actually, Dave. I'm afraid I can’t do that.
Hal, paraphrased from 2001: A Space Odyssey
As an aside, imagine the post-mortem review of that code!
Code review is a tricky business. Code is full of hidden mines that lie dormant while you test, only to explode in a debris of stack trace at the most inopportune time – when it’s in the hands of your users.
The many times I’ve run into such mines just reinforce how important it is to write code that is intention revealing and to make sure assumptions are documented via asserts.
Such devious code is often the most innocuous looking code. Let me give one example I ran into the other day. I was fortunate to defuse this mine while testing.
This example makes use of the Enumerable.ToDictionary method that turns a sequence into a dictionary. You supply an expression to produce a key for each element. In this example, loosely based on the actual code, I am using the CloneUrl property of Repository as the key of the dictionary.
IEnumerable<Repository> repositories = GetRepositories();
repositories.ToDictionary(r => r.CloneUrl);
It’s so easy to gloss over this line during a code review and not think twice about it. But you probably see where this is going.
While I was testing I was lucky to run into the following exception:
System.ArgumentException: An item with the same key has already been added.
Doh! There’s an implicit assumption in this code – that two repositories cannot have the same CloneUrl. In retrospect, it’s obvious that’s not the case.
Let’s simplify this example.
var items = new[]
{
    new {Id = 1},
    new {Id = 2},
    new {Id = 2},
    new {Id = 3}
};
items.ToDictionary(item => item.Id);
This example attempts to create a dictionary of anonymous types using the Id property as a key, but we have a duplicate, so we get an exception.
What are our options?
Well, it depends on what you need. Perhaps what you really want is a dictionary where each value contains every item with the given key. The Enumerable.GroupBy method comes in handy here.
Or perhaps you only care about the first value for a given key and want to ignore any others. The Enumerable.GroupBy method comes in handy in this case too.
In the following example, we use this method to group the items by Id. This results in a sequence of IGrouping elements, one for each Id. We can then take advantage of a second parameter of ToDictionary and simply grab the first item in the group.
items.GroupBy(item => item.Id)
    .ToDictionary(group => group.Key, group => group.First());
This feels sloppy to me. There is too much potential for this to cover up a latent bug. Why should the other items be ignored? Perhaps, as in my original example, it’s fully normal to have more than one element for the key and you should handle that properly. Instead of grabbing the first item from the group, we retrieve an array.
items.GroupBy(item => item.Id)
    .ToDictionary(group => group.Key, group => group.ToArray());
In this case, we end up with a dictionary of arrays.
What if having more than one element with the same key is not expected and should throw an exception? Well, you could just use the normal ToDictionary method, since it will throw an exception. But that exception is unhelpful. It doesn’t have the information we probably want. For example, you might want to know which key was already added, as the following demonstrates:
items.GroupBy(item => item.Id)
    .ToDictionary(group => group.Key, group =>
    {
        try
        {
            // Single() throws if the group holds more than one element.
            return group.Single();
        }
        catch (InvalidOperationException)
        {
            throw new InvalidOperationException(
                "Duplicate item with the key '" + group.First().Id + "'");
        }
    });
In this example, if a key has more than one element associated with it, we throw a more helpful exception message.
System.InvalidOperationException: Duplicate item with the key '2'
In fact, we can encapsulate this into our own better extension method.
public static Dictionary<TKey, TSource> ToDictionaryBetter<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector)
{
    return source.GroupBy(keySelector)
        .ToDictionary(group => group.Key, group =>
        {
            try
            {
                return group.Single();
            }
            catch (InvalidOperationException)
            {
                throw new InvalidOperationException(
                    string.Format("Duplicate item with the key '{0}'",
                        keySelector(group.First())));
            }
        });
}
Code mine mitigated!
This is just one example of a potential code mine that might go unnoticed during a code review if you’re not careful.
Now, when I review code and see a call to ToDictionary, I make a mental note to verify the assumption that the key selector must never lead to duplicates.
When I write such code, I’ll use one of the techniques I mentioned above to make my intentions more clear. Or I’ll embed my assumptions into the code with a debug assert that proves that the items cannot have a duplicate key. This makes it clear to the next reviewer that this code will not break for this reason. This code still might not open the hatch, but at least it won’t have a duplicate key exception.
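The "assert the assumption" idea is not specific to C#. As a rough sketch, here is what it could look like in JavaScript. The helper `assertUniqueKeys` is a made-up name for illustration, and the sketch assumes keys are primitives that can be compared with a Set.

```javascript
// Hypothetical helper: fail loudly, naming the offending key, if two items share a key.
function assertUniqueKeys(items, keySelector) {
  const seen = new Set();
  for (const item of items) {
    const key = keySelector(item);
    if (seen.has(key)) {
      throw new Error("Duplicate item with the key '" + key + "'");
    }
    seen.add(key);
  }
}

const items = [{ Id: 1 }, { Id: 2 }, { Id: 2 }, { Id: 3 }];

try {
  // State the assumption up front, then build the lookup.
  assertUniqueKeys(items, (item) => item.Id);
  const byId = new Map(items.map((item) => [item.Id, item]));
  console.log(byId.size);
} catch (err) {
  console.log(err.message); // Duplicate item with the key '2'
}
```

The point is the same as with ToDictionaryBetter: the next reviewer can see the assumption, and a violation names the key instead of leaving you to guess.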
If I search through my code, I will find many other examples of potential code mines. What are some examples that you can think of? What mines do you look for when reviewing code?
As I mentioned, I ran into a very nasty issue with the TPL. I am not sure if it is me doing things wrong, or an actual issue.
Let us look at the code, shall we?
We start with some very simple code:
public class AsyncEvent
{
    private volatile TaskCompletionSource<object> tcs = new TaskCompletionSource<object>();

    public Task WaitAsync()
    {
        return tcs.Task;
    }

    public void PulseAll()
    {
        var taskCompletionSource = tcs;
        tcs = new TaskCompletionSource<object>();
        taskCompletionSource.SetResult(null);
    }
}
This is effectively an auto-reset event. All the waiters will be released when PulseAll is called. Then we have this runner, which just executes work:
public class Runner : IDisposable
{
    private readonly ConcurrentQueue<TaskCompletionSource<object>> items =
        new ConcurrentQueue<TaskCompletionSource<object>>();
    private readonly Task<Task> _bg;
    private readonly AsyncEvent _event = new AsyncEvent();
    private volatile bool _done;

    public Runner()
    {
        _bg = Task.Factory.StartNew(() => Background());
    }

    private async Task Background()
    {
        while (_done == false)
        {
            TaskCompletionSource<object> result;
            if (items.TryDequeue(out result) == false)
            {
                await _event.WaitAsync();
                continue;
            }

            //work here, note that we do NOT use await!

            result.SetResult(null);
        }
    }

    public Task AddWork()
    {
        var tcs = new TaskCompletionSource<object>();
        items.Enqueue(tcs);

        _event.PulseAll();

        return tcs.Task;
    }

    public void Dispose()
    {
        // Signal the background loop to stop, then wake it up and wait for it.
        _done = true;
        _event.PulseAll();
        _bg.Wait();
    }
}
And finally, the code that causes the problem:
public static async Task Run()
{
    using (var runner = new Runner())
    {
        await runner.AddWork();
    }
}
So far, it is all pretty innocent, I think you would agree. But this code hangs with a deadlock. Here is why:
Because tasks can share threads, we are in the Background task thread, and we are trying to wait on that background task completion.
Result, deadlock.
If I add the following, the deadlock goes away:
await Task.Yield();
That is because it forces this method to be completed on another thread, but it looks more like something that you add after you discover the bug, to be honest.
And… this test just passed!
Just to give you some idea, this is sitting on top of RavenDB’s implementation of leveldb. In fact, I have been using this code to test out the leveldb implementation.
But this actually stores all the events, runs the aggregation over them and gives you the aggregated results. And the entire thing works quite nicely, even if I say so myself.
Starting with build 2603, we are now considering 2.5 to be a release candidate. It is feature frozen, and only bug fixes are going in.
You can download the new version here: http://hibernatingrhinos.com/builds/ravendb-unstable-v2.5, and it is also available on nuget as an unstable release.
There have been a lot of changes, most of which were recorded on this blog. Next week I’ll do a proper post about what is new, but in the meantime, you can check the release notes.
I urge you to download it and take it for a spin. We really need feedback from users about how it works, and any problems that you might run into.
This week we released some great updates to Windows Azure that make it significantly easier to develop mobile applications that use the cloud. These new capabilities include:
- Mobile Services: Custom API support
- Mobile Services: Git Source Control support
- Mobile Services: Node.js NPM Module support
- Mobile Services: A .NET API via NuGet
- Mobile Services and Web Sites: Free 20MB SQL Database Option for Mobile Services and Web Sites
- Mobile Notification Hubs: Android Broadcast Push Notification Support
All of these improvements are now available to use immediately (note: some are still in preview). Below are more details about them.
Mobile Services: Custom APIs, Git Source Control, and NuGet
Windows Azure Mobile Services provides the ability to easily stand up a mobile backend that can be used to support your Windows 8, Windows Phone, iOS, Android and HTML5 client applications. Starting with the first preview we supported the ability to easily extend your data backend logic with server-side scripting that executes as part of client-side CRUD operations against your cloud-backed data tables.
With today’s update we are extending this support even further and introducing the ability for you to also create and expose Custom APIs from your Mobile Service backend, and easily publish them to your Mobile clients without having to associate them with a data table. This capability enables a whole set of new scenarios – including the ability to work with data sources other than SQL Databases (for example: Table Services or MongoDB), broker calls to 3rd party APIs, integrate with Windows Azure Queues or Service Bus, work with custom non-JSON payloads (e.g. Windows Periodic Notifications), route client requests to services back on-premises (e.g. with the new Windows Azure BizTalk Services), or simply implement functionality that doesn’t correspond to a database operation. The custom APIs can be written in server-side JavaScript (using Node.js) and can use Node’s NPM packages. We will also be adding support for custom APIs written using .NET in the future as well.
Creating a Custom API
Adding a custom API to an existing Mobile Service is super easy. Using the Windows Azure Management Portal you can now simply click the new “API” tab with your Mobile Service, and then click the “Create a Custom API” button to create a new Custom API within it:
Give the API whatever name you want to expose, and then choose the security permissions you’d like to apply to the HTTP methods you expose within it. You can easily lock down the HTTP verbs to your Custom API to be available to anyone, only those who have a valid application key, only authenticated users, or administrators. Mobile Services will then enforce these permissions without you having to write any code:
When you click the ok button you’ll see the new API show up in the API list. Selecting it will enable you to edit the default script that contains some placeholder functionality:
Today’s release enables Custom APIs to be written using Node.js (we will support writing Custom APIs in .NET as well in a future release), and the Custom API programming model follows the Node.js convention for modules, which is to export functions to handle HTTP requests.
The default script above exposes functionality for an HTTP POST request. To support a GET, simply change the export statement accordingly. Below is an example of some code for reading and returning data from Windows Azure Table Storage using the Azure Node API:
After saving the changes, you can now call this API from any Mobile Service client application (including Windows 8, Windows Phone, iOS, Android or HTML5 with CORS).
Below is the code for how you could invoke the API asynchronously from a Windows Store application using .NET and the new InvokeApiAsync method, and data-bind the results to a control within your XAML:
private async void RefreshTodoItems() {
var results = await App.MobileService.InvokeApiAsync<List<TodoItem>>("todos", HttpMethod.Get, parameters: null);
ListItems.ItemsSource = new ObservableCollection<TodoItem>(results);
}
Integrating authentication and authorization with Custom APIs is really easy with Mobile Services. Just like with data requests, custom API requests enjoy the same built-in authentication and authorization support of Mobile Services (including integration with Microsoft ID, Google, Facebook and Twitter authentication providers), and it also enables you to easily integrate your Custom API code with other Mobile Service capabilities like push notifications, logging, SQL, etc.
Check out our new tutorials to learn more about how to use the new Custom API support, and start adding Custom APIs to your app today.
Mobile Services: Git Source Control Support
Today’s Mobile Services update also enables source control integration with Git. The new source control support provides a Git repository as part of your Mobile Service, and it includes all of your existing Mobile Service scripts and permissions. You can clone that Git repository on your local machine, make changes to any of your scripts, and then easily deploy the mobile service to production using Git. This enables a really great developer workflow that works on any developer machine (Windows, Mac and Linux).
To use the new support, navigate to the dashboard for your mobile service and select the Set up source control link:
If this is your first time enabling Git within Windows Azure, you will be prompted to enter the credentials you want to use to access the repository:
Once you configure this, you can switch to the configure tab of your Mobile Service and you will see a Git URL you can use to access your repository:
You can use this URL to clone the repository locally from your favorite command line:
> git clone https://scottgutodo.scm.azure-mobile.net/ScottGuToDo.git
Below is the directory structure of the repository:
As you can see, the repository contains a service folder with several subfolders. Custom API scripts and associated permissions appear under the api folder as .js and .json files respectively (the .json files persist a JSON representation of the security settings for your endpoints). Similarly, table scripts and table permissions appear as .js and .json files, but since table scripts are separate per CRUD operation, they follow the naming convention of <tablename>.<operationname>.js. Finally, scheduled job scripts appear in the scheduler folder, and the shared folder is provided as a convenient location for you to store code shared by multiple scripts and a few miscellaneous things such as the APNS feedback script.
Let’s modify the table script todos.js file so that we have slightly better error handling when an exception occurs when we query our Table service:
todos.js
tableService.queryEntities(query, function(error, todoItems){
if (error) {
console.error("Error querying table: " + error);
response.send(500);
} else {
response.send(200, todoItems);
}
});
Save these changes, and now back in the command line prompt commit the changes and push them to the Mobile Services:
> git add .
> git commit -m "better error handling in todos.js"
> git push
Once deployment of the changes is complete, they will take effect immediately, and you will also see the changes be reflected in the portal:
With the new Source Control feature, we’re making it really easy for you to edit your mobile service locally and push changes in an atomic fashion without sacrificing ease of use in the Windows Azure Portal.
Mobile Services: NPM Module Support
The new Mobile Services source control support also allows you to add any Node.js module you need in the scripts beyond the fixed set provided by Mobile Services. For example, you can easily switch to using Mongo instead of Windows Azure Table storage in our example above. Set up MongoDB by either purchasing a MongoLab subscription (which provides MongoDB as a Service) via the Windows Azure Store or setting it up yourself on a Virtual Machine (either Windows or Linux). Then go to the service folder of your local Git repository and run the following command:
> npm install mongoose
This will add the Mongoose module to your Mobile Service scripts. After that you can use and reference the Mongoose module in your custom API scripts to access your Mongo database:
var mongoose = require('mongoose');
var schema = mongoose.Schema({ text: String, completed: Boolean });
exports.get = function (request, response) {
    mongoose.connect('<your Mongo connection string>');
    // Declared with var so the model does not leak into the global scope.
    var TodoItemModel = mongoose.model('todoitem', schema);
    TodoItemModel.find(function (err, items) {
        if (err) {
            console.log('error:' + err);
            return response.send(500);
        }
        response.send(200, items);
    });
};
Don’t forget to push your changes to your mobile service once you are done:
> git add .
> git commit -m "Switched to use Mongo Labs"
> git push
Now our Mobile Service app is using Mongo DB!
Note, with today’s update usage of custom Node.js modules is limited to Custom API scripts only. We will enable it in all scripts (including data and custom CRON tasks) shortly.
New Mobile Services NuGet package, including .NET 4.5 support
A few months ago we announced a new pre-release version of the Mobile Services client SDK based on portable class libraries (PCL).
Today, we are excited to announce that this new library is now a stable .NET client SDK for mobile services and is no longer a pre-release package. Today’s update includes full support for Windows Store, Windows Phone 7.x, and .NET 4.5, which allows developers to use Mobile Services from ASP.NET or WPF applications.
You can install and use this package today via NuGet.
Mobile Services and Web Sites: Free 20MB Database for Mobile Services and Web Sites
Starting today, every customer of Windows Azure gets one free 20MB database to use for 12 months (for both dev/test and production) with Web Sites and Mobile Services.
When creating a Mobile Service or a Web Site, simply choose the new “Create a new Free 20MB database” option to take advantage of it:
You can use this free SQL Database together with the 10 free Web Sites and 10 free Mobile Services you get with your Windows Azure subscription, or from any other Windows Azure VM or Cloud Service.
Notification Hubs: Android Broadcast Push Notification Support
Earlier this year, we introduced a new capability in Windows Azure for sending broadcast push notifications at high scale: Notification Hubs.
In the initial preview of Notification Hubs you could use this support with both iOS and Windows devices. Today we’re excited to announce new Notification Hubs support for sending push notifications to Android devices as well.
Push notifications are a vital component of mobile applications. They are critical not only in consumer apps, where they are used to increase app engagement and usage, but also in enterprise apps where up-to-date information increases employee responsiveness to business events. You can use Notification Hubs to send push notifications to devices from any type of app (a Mobile Service, Web Site, Cloud Service or Virtual Machine).
Notification Hubs provide you with the following capabilities:
- Cross-platform Push Notifications Support. Notification Hubs provide a common API to send push notifications to iOS, Android, or Windows Store at once. Your app can send notifications in platform specific formats or in a platform-independent way.
- Efficient Multicast. Notification Hubs are optimized to enable push notification broadcast to thousands or millions of devices with low latency. Your server back-end can fire one message into a Notification Hub, and millions of push notifications can automatically be delivered to your users. Devices and apps can specify a number of per-user tags when registering with a Notification Hub. These tags do not need to be pre-provisioned or disposed, and provide a very easy way to send filtered notifications to an infinite number of users/devices with a single API call.
- Extreme Scale. Notification Hubs enable you to reach millions of devices without you having to re-architect or shard your application. The pub/sub routing mechanism allows you to broadcast notifications in a super-efficient way. This makes it incredibly easy to route and deliver notification messages to millions of users without having to build your own routing infrastructure.
- Usable from any Backend App. Notification Hubs can be easily integrated into any back-end server app, whether it is a Mobile Service, a Web Site, a Cloud Service or an IAAS VM.
It is easy to configure Notification Hubs to send push notifications to Android. Create a new Notification Hub within the Windows Azure Management Portal (New->App Services->Service Bus->Notification Hub):
Then register for Google Cloud Messaging using https://code.google.com/apis/console and obtain your API key, then simply paste that key on the Configure tab of your Notification Hub management page under the Google Cloud Messaging Settings:
Then just add code to the OnCreate method of your Android app’s MainActivity class to register the device with Notification Hubs:
gcm = GoogleCloudMessaging.getInstance(this);
String connectionString = "<your listen access connection string>";
hub = new NotificationHub("<your notification hub name>", connectionString, this);
String regid = gcm.register(SENDER_ID);
hub.register(regid, "myTag");
Now you can broadcast notifications from your .NET backend (or Node, Java, or PHP) to any Windows Store, Android, or iOS device registered for the “myTag” tag via a single API call (you can literally broadcast messages to millions of clients you have registered with just one API call):
var hubClient = NotificationHubClient.CreateClientFromConnectionString(
    "<your connection string with full access>",
    "<your notification hub name>");
hubClient.SendGcmNativeNotification("{ 'data' : {'msg' : 'Hello from Windows Azure!' } }", "myTag");
Notification Hubs provide an extremely scalable, cross-platform, push notification infrastructure that enables you to efficiently route push notification messages to millions of mobile users and devices. It will make enabling your push notification logic significantly simpler and more scalable, and allow you to build even better apps with it.
Learn more about Notification Hubs here on MSDN.
Summary
The above features are now live and available to start using immediately (note: some of the services are still in preview). If you don’t already have a Windows Azure account, you can sign-up for a free trial and start using them today. Visit the Windows Azure Developer Center to learn more about how to build apps with it.
Hope this helps,
Scott
P.S. In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu
Because, clearly, that is what is missing. RavenDB GetAll extension method

As mentioned earlier, HyperDex has made some changes to LevelDB to make it work faster for their scenarios. I was curious to see what changed, so I took a look at the code. In my previous post, I dealt with compaction, but now I want to deal exclusively with the changes that were made to leveldb to make writes more concurrent.
Another change that I am really not sure I am following is the notion of a concurrent log writer. This relies on the pwrite() method, which allows you to write a buffer to a file at a specified position. I have not been able to figure out what happens if you have concurrent writes to that file. The HyperDex modifications include synchronization on the offset where they will actually make the write, but after that, they make concurrent calls. It makes sense, I guess, but I have several problems with that. I was unable to find any details about the behavior of the system when making concurrent calls to pwrite() at the end of the file, especially since your position might be well beyond the current end of file.
I couldn’t figure out what the behavior was supposed to be under those conditions, so I fired up a linux machine and wrote the following code:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main() {
    const char* str1 = "1234\n";
    const char* str2 = "5678\n";
    int file_descriptor;
    int ret;
    char fn[]="test";

    if ((file_descriptor = creat(fn, S_IRUSR | S_IWUSR)) < 0)
    {
        perror("creat() error");
        return -1;
    }
    else {
        /* Write past the current end of file, at offset 5, then at offset 10. */
        ret = pwrite(file_descriptor, str2, 5, 5);
        printf("Wrote %d\n", ret);

        ret = pwrite(file_descriptor, str1, 5, 10);
        printf("Wrote %d\n", ret);

        if (close(file_descriptor)!= 0)
            perror("close() error");

    }

    return 0;
}
I’ll be the first to admit that this is ugly code, but it gets the job done, and it told me that you could issue those sorts of writes, and it would do the expected thing. I am going to assume that it would still work when used concurrently on the same file.
That said, there is still a problem, let us assume the following sequence of events:
- Write A
- Write B
- Write A is allocated range [5 – 10] in the log file
- Write B is allocated range [10 – 15] in the log file
- Write B is done and returns to the user
- Write A is NOT done, and we have a crash
Basically, we have here a violation of durability, because when we read from the log, we will get to the A write, see it is corrupted and stop processing the rest of the log. Effectively, you have just lost a committed transaction. Now, the log format actually allows that to happen, and a sophisticated reader can recover from that, but I haven’t seen yet any signs that that was implemented.
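To make the failure mode concrete, here is a toy JavaScript model of that sequence. It is only a sketch of the argument, not LevelDB's or HyperDex's actual log reader: the record shape and the `naiveReplay` function are invented for illustration, and `ok: false` stands in for a record whose checksum fails after a crash mid-write.

```javascript
// Hypothetical model: each log record carries an id and a flag saying whether
// it would pass its checksum on recovery.
function naiveReplay(log) {
  const recovered = [];
  for (const record of log) {
    // A simple reader stops at the first corrupted record it sees.
    if (!record.ok) break;
    recovered.push(record.id);
  }
  return recovered;
}

const log = [
  { id: 'earlier', ok: true },
  { id: 'A', ok: false }, // crash happened while A was still being written
  { id: 'B', ok: true },  // B was fully written and already returned to the user
];

console.log(naiveReplay(log)); // [ 'earlier' ]
```

B was acknowledged, yet it never comes back from recovery, because the reader gives up at A.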
To be fair, I think that the log reader should be able to handle zero'ed data and continue forward. There is some sort of a comment about that. But that isn't supported by what I saw in my brief glance, and more importantly, it won't help if you crashed midway through writing A, so you have corrupted (not zero'ed) data in the file. This would also cause the B transaction to be lost.
The rest appears to be just thread safety and then allowing concurrent writes to the log, which I have an issue with, as I mentioned. But I am assuming that this will generate a high rate of writes, since there is a lot less waiting. That said, I am not sure how useful that would be. There is still just one physical needle that writes to the disk. I am guessing that it really depends on whether or not you need the call to be synced. If you do, there are going to be a lot more fsync() calls than before, when they were merged into a single call.
It appears that in my previous post I have had an issue with how I read the code. In particular, I looked at the commit log and didn’t look at the most recent changes with regards to how HyperLevelDB does the writes. Robert Escriva has been kind enough to point me in the right direction.
The way that this works is a lot more elegant, I think.
When you want to make a write to a file, you ask for a segment at a particular offset. If we have that offset already mapped, we give it to the caller. Otherwise, we increase the file size if needed, then map the next segment. That part is done under a lock, so there isn’t an issue of contention over the end of the file. That is much nicer than the pwrite method.
That said, however, I am still not sure about the issue with the two concurrent transactions. What actually happens here is that while we gained concurrency in the IO, there is still some serialization going on. In other words, even though transaction B was actually flushed to disk before transaction A, it would still wait for transaction A to complete, alleviating the concern that I have had about it.
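A small sketch of that kind of serialization, under my own assumptions (this is not how HyperLevelDB implements it, and `OrderedCommitter` is a made-up name): each transaction takes a ticket when it starts, does its IO whenever it likes, but is only reported as complete once every earlier ticket has completed.

```python
import threading


class OrderedCommitter:
    """Hypothetical helper: IO may finish in any order, commits happen in ticket order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0
        self._next_to_commit = 0
        self.commit_order = []

    def begin(self):
        # Hand out tickets in the order the transactions started.
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            return ticket

    def finish(self, ticket, name):
        # Called after the transaction's own IO is done. Blocks until all
        # earlier transactions have finished as well.
        with self._cond:
            self._cond.wait_for(lambda: self._next_to_commit == ticket)
            self.commit_order.append(name)
            self._next_to_commit += 1
            self._cond.notify_all()


committer = OrderedCommitter()
a = committer.begin()
b = committer.begin()

# B finishes its IO first, but has to wait for A before it is acknowledged.
b_thread = threading.Thread(target=committer.finish, args=(b, "B"))
b_thread.start()
committer.finish(a, "A")
b_thread.join()
```

Whatever the thread timing, `committer.commit_order` ends up as `["A", "B"]`, so a caller never sees B acknowledged while A is still in flight.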
As mentioned earlier, HyperDex has made some changes to LevelDB to make it work faster for their scenarios. I was curious to see what changed, so I took a look at the code.
The fun part is that I can easily check just the things that were changed by HyperDex; I don’t need to go read and understand a completely new engine. I’m currently just looking at the logs, and noting important stuff.
A lot of the work appears to have been concentrated on the notion of compaction. In particular, compaction means that we need to merge multiple files into a single file at a higher level. That is great, until you realize that this may mean that you’ll be writing a lot more data than you thought you would, because you keep copying the data from one file to another, as it bubbles deeper into the levels.
The first thing that seems to have happened is that when you get to a deep enough level, the file sizes that you are allowed to have become much larger. That means that you’ll have more data at the higher levels, and will likely reduce the number of times that you need to compact things.
The next step in the process was to create two parallel processes. One to handle the common compaction from memory to level-0 and the other to handle level-0 and down. I assume that this was done to avoid contention between the two. It is far more common to have a memory to level-0 compaction than level-0 down, and having to sync between the two is likely causing some slowdown. The next change I noticed was tightening locks. Instead of having a single lock that was used for most writes, there now appear to be more granular locks, so you gain more concurrency under parallel load.
Next, a change was made to the compaction heuristics. I am not sure that I understand the changes yet, especially since I am just going through the commit log now. One thing I noticed right away is that they removed the seek budget that you had for files. It appears that compactions triggered by reads were a source of too much pain for HyperDex, so it was removed. Note that HyperDex appears to be very interested in consistent performance for high write scenarios.
Looking at the logs, it appears that there were a lot of strategies attempted for getting a better compaction strategy for what HyperDex is doing. Interestingly enough (and good for them) I see a lot of back & forth, trying something out, then backing out of it, etc. Sometimes over the course of several weeks or months. A lot of that appears to be more like heuristically setting various things, like number of compacting threads, the various config options, etc. Trying to find the best match.
Another thing that I guess improved performance is that leveldb will intentionally slow down writes when it is doing compaction, or just about to, to reduce the load when heavy background operations are running. HyperLevelDB removed that, so you will have more load on the system, but no artificial governors on the system performance.
But coming back to compactions, it appears that what they ended up doing is to have three levels of background compactions.
- Memory Table –> Level0 files.
- Optimistic Compaction
- Background Compaction
To be honest, I think that at least between the last two there is a lot of code that can be shared, so it is a bit hard to read, but basically, we have an optimistic compaction thread running that waits for enough data to come through to level 0, at which point it will try to do a low cost, high yield compaction. If that isn’t enough, we have the background compaction that will kick in, but that seems to be for when there is really no other alternative.
Given the importance of compaction to performance, it would probably make more sense to split them into a separate concept, but I guess that they didn’t want to diverge too much from the leveldb codebase.
After having struggled with understanding Paxos for a while, I ran into the Raft draft paper, and I was very impressed. To start with, I have read at least four or five papers about Paxos, read 2 – 5 implementations of the algorithm in different languages, implemented it myself, and I am still not really comfortable with it.
On the other hand, after doing a single pass through the Raft paper, I have a much greater sense of understanding what it is about, how it works and even how I can implement that, if I want to. Hell, I fully expect to be able to hand that paper to a passerby CS student and get a working implementation without needing to get Sheldon Cooper involved. One thing to note, this paper and algorithm were heavily focused on making this understandable, and I think that they were quite successful in doing that. For that matter, I wish that other papers were this easy to read and follow.
Very interesting, and unlike the Paxos paper, immediately and almost painfully obvious how you would actually make use of something like that.
I was running into this problem. We were using phantomjs to run speed tests and weight tests for our web pages, and it all worked well on my machine.
Then it worked VERY slowly (five times slower!) on the build agent.
The fix was simple, once I single handedly googled it.
Go to Internet Explorer on that machine.
- Click Internet Options
- Click Connections
- Click LAN Settings
- UNCHECK “Automatically detect settings”
As an extra measure, you might want to disable IPv6 on the Ethernet adapter. Seems to fix it for some people.
Kellabyte rightfully points out that leveldb was designed primarily for mobile devices, and that there are many things that could have been done that weren’t because of that design goal. I was already pointed to HyperDex’s port of leveldb, and I am also looking into what Basho is doing with it.
You can read a bit about what HyperDex did to improve things here. And Basho’s challenges are detailed in the following presentation. The interesting thing about this is why people want to go to leveldb to start with.
To put it simply, it is a really good product, in the sense that it is doing what it needs to do, and it is small enough so you can understand it in reasonably short order. The fun starts when you disagree with the decisions made for you by leveldb: it is actually quite easy to make changes to the root behavior of the project without making significant changes to the overall architecture. Looking over the HyperDex changes overview, it appears that they were able to make very targeted changes. All the hard problems (storing, searching, atomicity, etc) were already solved, now the problem is dealing with optimizing them.
I’m going to review the changes made by both Basho & HyperDex to leveldb, because I think that it would be a fascinating object of study. In particular, I was impressed with how leveldb merged transactions, and I am looking forward to seeing how HyperDex parallelized them.
I was pointed to this article about friction in software, in particular, because I talk a lot about zero friction development. Yet the post shows a totally different aspect of friction.
I would actually agree with that definition of friction. The problem is that there are many types of friction. There is the friction that you have to deal with when you build a new project, in a new domain or using new technologies, so you need to figure out how to make those things work, and you spend a lot of your time just getting things working. Then there is the friction of the process you are using. If you need to have a work item associated with each commit (and only one commit), it means that you are going to either commit more slowly, or spend a lot more time just writing effectively useless work items. Then you have the friction of the common stuff. If your users require fairly elaborate validation, even a simple date entry form can become an effort taking multiple days.
All of those are various types of friction. And all of those add up to the time & cost of building software. I tend to divide them in my head into friction & taxes. Friction is everything that gets in the way unnecessarily. For example, if I need to spend a day making a release, that is friction that I can easily automate. If I need to spend a lot of time on the structure of my software, that is friction. If I have checklists that I need to go through, that is usually friction.
Then there are the taxes, stuff that you do because you have to. In my case, this is usually error handling and production readiness, failure analysis and recovery, etc. In other cases this can be i18n, validation or the like. Those are things that you can streamline, but you can’t really reduce. If you need to run in a HA environment, you are going to be needing to write the code to do that, and test that. And that ain’t friction, even though it slows you down. It is just part of the cost of doing business.
And, of course, we have the toll trolls. That is what I call those things that you don’t have to do, but are usually forced upon you. Sometimes it is for a valid reason, but a lot of the time it is Just Because. Examples of tolls that are being laid upon software include such things as having to integrate with the CRM / ERP / TLD of the day. Being forced to use a specific infrastructure, even though it is wildly inappropriate. And, possibly the best thing ever: “I went golfing with Andrew this weekend, and I have great news. We are a Java shop now!”