A few weeks ago, I took a shallow dive into Swift programming on Linux. I managed to set up my environment, learned the basics of the language and coded a few trivial beginner-level programs.
Today, I want to take the next step on my Swift journey and try my hand at a more serious challenge. To that end, I've picked an excellent programming interview question presented by Oren Eini in this blog post.
We have the following file (the full data set is 276 MB), that contains the entry / exit log to a parking lot.
The first value is the entry time, the second is the exit time and the third is the car id.
Details about this file: This is a UTF8 text file with space separated values using Windows line endings.
What we need to do is to find out how much time a car spent in the lot based on this file.
Ayende (Oren's blogging handle) puts a lot of emphasis on optimizing the code, but I am less interested in that. I picked this particular task because it's a good reflection of the kind of programming tasks I'm used to solving in real-life backend programming:
- Perform IO operation(s) to read and parse the data from an external source (file, database, API)
- Chew through data to extract meaning according to business logic
- Present results to the end user
Of course, each of these steps is much simpler here than in reality. But each one is present nonetheless, and that's what makes this an excellent learning exercise.
So let's take it one step at a time.
Step 1: Input
I want my program to accept input in two ways. If the user supplies a command line argument, I will treat it as the path to the target file and try to load it from there. Otherwise, I'll read from standard input, so the user can pipe the data in (or type it themselves, if they are particularly bored).
Reading from standard input
This turned out to be a breeze. Swift's standard library comes with the `readLine()` function, which reads standard input one line at a time and returns `nil` once it reaches the end.
I also created a little stub function, `handleLine`, which will do the data crunching later.
```swift
func handleLine(_ line: String) {
    print(line)
}

func readFromStdIn() {
    print("Reading from STDIN:")
    var line: String?
    while true {
        line = readLine()
        if (line != nil) {
            handleLine(line!)
        } else {
            break
        }
    }
}

readFromStdIn()
```
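For comparison, the same loop is usually written with `while let`, which unwraps each line and stops at the first `nil` without a manual `break`. This is a sketch of a variant, not the code used in the rest of the post: `drainLines` is a hypothetical helper that takes the line source as a parameter, so it can be fed from `readLine()` or, for testing, from an array.

```swift
/// Hypothetical helper: calls `onLine` for every line produced by `next`
/// until `next` returns nil (end of input).
func drainLines(_ next: () -> String?, _ onLine: (String) -> Void) {
    // `while let` binds each non-nil value and exits on the first nil,
    // replacing the `if line != nil { ... } else { break }` pattern.
    while let line = next() {
        onLine(line)
    }
}

func readFromStdIn(_ onLine: (String) -> Void) {
    // readLine() is wrapped in a closure because its default argument
    // (strippingNewline: true) is not applied through a bare function reference.
    drainLines({ readLine() }, onLine)
}
```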
Reading from file
This one was a lot more difficult. At the time of writing this article (2017/03, v3.0.2), Swift's IO capabilities seem woefully unequipped for serious IO work outside of iOS. Its `FileManager` class is a minefield of `not implemented` errors and unhandled edge cases (e.g. passing `"~"` to `fileManager.contentsOfDirectory` instead of an absolute path causes an uncatchable segfault).
Swift does offer a way to load a file directly into a string using a special `String` initializer (as an aside, I really dislike APIs like this. Why would `String` know about files!?). This allows me to write a naive solution like this:
```swift
func readFromFile(_ rawFilename: String) {
    let filename = NSString(string: rawFilename).expandingTildeInPath
    print("Reading from file: \(filename)")
    let content = try? String(contentsOfFile: filename, encoding: String.Encoding.utf8)
    if content == nil {
        print("Couldn't read input")
        return
    }
    let data = content!.components(separatedBy: "\r\n")
    for line in data {
        handleLine(line)
    }
}
```
Unfortunately, this will not fly with arbitrarily sized datasets, like the 200MB file that Oren provided. I needed a way to stream the data through instead of loading it all at once.
Swift's standard library had nothing to offer here. Looking around, I found a decent-looking third-party library which offered to solve all my streaming needs. That felt like overkill, so I opted instead for Martin R's accepted answer on Stack Overflow.
Note
This didn't turn out to be such a great idea. This `StreamReader` class seems to mangle my console output in some cases. Oh well. There is still stdin.
After I copy-pasted his `StreamReader` implementation into a separate file in my project, my code came down to this:
```swift
func readFromFile(_ rawFilename: String) {
    let filename = NSString(string: rawFilename).expandingTildeInPath
    print("Reading from file: \(filename)")
    if let sr = StreamReader(path: filename) {
        defer {
            sr.close()
        }
        for line in sr {
            handleLine(line)
        }
    }
}
```
Not bad.
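For anyone who would rather not copy a class from Stack Overflow, the same idea can be sketched with nothing but Foundation's `FileHandle` and `Data`: read a fixed-size chunk, cut complete lines out of a buffer, and read more when no delimiter is left. This is my own illustration, not Martin R's code; the name `ChunkedLineReader` and the 64 KB default chunk size are arbitrary choices, and I haven't checked whether it avoids the console mangling mentioned above.

```swift
import Foundation

/// Hypothetical minimal streaming reader: reads the file `chunkSize` bytes at
/// a time and yields one line per `next()` call, splitting on `delimiter`.
final class ChunkedLineReader: Sequence, IteratorProtocol {
    private let handle: FileHandle
    private let delimiter: Data
    private let chunkSize: Int
    private var buffer = Data()
    private var reachedEOF = false

    init?(path: String, delimiter: String = "\r\n", chunkSize: Int = 64 * 1024) {
        guard let handle = FileHandle(forReadingAtPath: path) else { return nil }
        self.handle = handle
        self.delimiter = Data(delimiter.utf8)
        self.chunkSize = chunkSize
    }

    deinit {
        handle.closeFile()
    }

    func next() -> String? {
        while true {
            // A complete line is already buffered: cut it out and return it.
            if let range = buffer.range(of: delimiter) {
                let line = buffer.subdata(in: buffer.startIndex..<range.lowerBound)
                buffer.removeSubrange(buffer.startIndex..<range.upperBound)
                return String(decoding: line, as: UTF8.self)
            }
            // End of file: flush a last line that has no trailing delimiter.
            if reachedEOF {
                guard !buffer.isEmpty else { return nil }
                defer { buffer.removeAll() }
                return String(decoding: buffer, as: UTF8.self)
            }
            // Otherwise read another chunk; an empty chunk means EOF.
            let chunk = handle.readData(ofLength: chunkSize)
            if chunk.isEmpty {
                reachedEOF = true
            } else {
                buffer.append(chunk)
            }
        }
    }
}
```

Because the class is its own iterator, it drops into the loop above unchanged: `if let reader = ChunkedLineReader(path: filename) { for line in reader { handleLine(line) } }`. A delimiter split across two chunks is still found, since the whole buffer is searched again after each read.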
More than one code file
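It seems Swift will automatically compile all `.swift` files in the `Sources` directory. File names don't matter, with one exception: top-level code (like the `readFromStdIn()` call above) is only allowed in `main.swift`. `import` statements are only used for external modules, which you declare as dependencies inside `Package.swift`.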
On the downside, once I added multiple files, I could no longer run my code directly with the Swift interpreter (`swift main.swift`). I had to add an actual compilation step (`swift build`) to my code/run/debug loop.
Choosing between inputs
My final `main.swift` structure looks like this:
```swift
import Foundation

func handleLine(_ line: String) {
    print(line)
}

func readFromStdIn(_ onLine: (String) -> Void) {
    //...
}

func readFromFile(_ rawFilename: String, _ onLine: (String) -> Void) {
    //...
}

if (CommandLine.arguments.count > 1) {
    readFromFile(CommandLine.arguments[1], handleLine)
} else {
    readFromStdIn(handleLine)
}
```
As per the spec, if the program is given a CLI argument, it treats that as the source file path. Otherwise, it reads from `stdin`.
I've also removed the hard dependency on `handleLine` from the read functions, and am now passing it in as an `onLine` handler. That refactoring went very smoothly. First-class functions in Swift are intuitive and powerful.
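A small self-contained sketch of what "first-class" buys here: any value of type `(String) -> Void` can serve as the handler, whether it's a named function or a closure literal. `forEachLine` and `collectAll` are hypothetical names; `forEachLine` stands in for the two read functions and takes an array instead of a file.

```swift
// Hypothetical stand-in for readFromFile/readFromStdIn: lines come from an
// array, but `onLine` has the same (String) -> Void type.
func forEachLine(in lines: [String], _ onLine: (String) -> Void) {
    for line in lines {
        onLine(line)
    }
}

func collectAll() -> [String] {
    var collected: [String] = []

    // A named function matching (String) -> Void, passed by name like handleLine.
    func collect(_ line: String) {
        collected.append(line)
    }
    forEachLine(in: ["a", "b"], collect)

    // A closure literal works too, here in trailing-closure syntax.
    forEachLine(in: ["c"]) { line in
        collected.append(line.uppercased())
    }
    return collected
}
```

Calling `collectAll()` returns `["a", "b", "C"]`.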
Aside - structs and classes
I've been using various classes and class-like things without paying them much attention. It's time to take a brief break and summarize my findings so far.
- Structs are value types, like arrays, enums, dictionaries and pretty much everything else. They are copied on assignment.
- Classes are reference types. They support inheritance. You can use `===` to check whether two references point to the same instance.
- Memory for classes is managed by the Swift runtime using automatic reference counting (ARC). In practice, this is (almost) as simple as garbage collection: an instance is freed as soon as no strong references to it remain (for example, when the last one goes out of scope or is set to `nil`). The main thing to watch out for is reference cycles, which you break with `weak` or `unowned` references.
- Values inside classes and structs are called properties. You can have stored properties, which are just plain values (like fields in C# and Java), and computed properties, which are your usual functions that pretend to be values (defined with `get` and an optional `set`).
- Constructors are called initializers in Swift, and they are declared with the `init` keyword. You can have many overloaded initializers and call them from one another.
- You can use `self` to refer to the current instance, or to the type itself inside static methods.
- All members are accessed using "dot" notation. The syntax is the same for reference and value types.
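The value/reference distinction from the first two points is easiest to see side by side. A minimal sketch with two made-up types that differ only in being a `struct` or a `class`:

```swift
// A value type: assigning it copies the whole struct.
struct PointValue {
    var x = 0
}

// A reference type: assigning it copies only the reference.
final class PointRef {
    var x = 0
}

let original = PointValue()
var copy = original     // independent copy
copy.x = 42             // `original.x` is still 0

let first = PointRef()
let second = first      // both constants refer to the same object
second.x = 42           // visible through `first` too; `let` only pins the reference
```

Note that `second.x = 42` compiles even though `second` is a `let`: for a class, `let` fixes which object the constant points to, not the object's contents. The same assignment on a `let` struct would be a compile error.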
```swift
struct Address {
    var street = ""
    var city = ""
    var country = ""
}

class Person {
    var name = ""
    var surname = ""
    var address = Address()

    var fullName: String {
        return name + " " + surname
    }

    init(name: String, surname: String) {
        self.name = name
        self.surname = surname
    }
}

var homer = Person(
    name: "Homer",
    surname: "Simpson"
)
homer.address.city = "Springfield"
print("\(homer.fullName) lives in \(homer.address.city)") //> Homer Simpson lives in Springfield
```
Interfaces are called protocols. They are used exactly how you'd expect. For example, I can make my `Person` class conform to the `CustomStringConvertible` protocol, so it knows how to stringify itself.
```swift
class Person: CustomStringConvertible {
    //...
    var description: String {
        return "\(fullName) lives in \(address.city)"
    }
    //...
}
//...
print(homer) //> Homer Simpson lives in Springfield
```
Swift classes support single inheritance, with the usual overriding mechanisms (an overriding member must be marked with `override`). You can use `super` to refer to properties, methods and initializers of the parent class.
```swift
protocol Talker {
    var saying: String { get set }
    func talk()
}

class SimpsonsCharacter: Person, Talker {
    var saying = ""

    init(name: String, surname: String, saying: String) {
        super.init(name: name, surname: surname)
        self.saying = saying
    }

    func talk() {
        print("\(fullName) says \(saying)")
    }
}

var homer = SimpsonsCharacter(
    name: "Homer",
    surname: "Simpson",
    saying: "Doh!"
)
homer.talk() //> Homer Simpson says Doh!
```
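The example above doesn't override anything, so here is a separate sketch of `override` and `super` together (the class names are made up). One Swift-specific rule shows up in the initializer: a subclass has to initialize its own stored properties *before* calling `super.init`. `SimpsonsCharacter` gets away with calling `super.init` first only because `saying` has a `""` default.

```swift
class BaseCharacter {
    let name: String

    init(name: String) {
        self.name = name
    }

    func greet() -> String {
        return "Hi, I'm \(name)"
    }
}

final class CatchphraseCharacter: BaseCharacter {
    let catchphrase: String

    init(name: String, catchphrase: String) {
        // Subclass properties must be set before calling the parent initializer.
        self.catchphrase = catchphrase
        super.init(name: name)
    }

    // `override` is required when replacing an inherited method;
    // `super.greet()` calls the parent's version.
    override func greet() -> String {
        return super.greet() + ". " + catchphrase
    }
}

// Declared as the base type, but dispatch still picks the override.
let bart: BaseCharacter = CatchphraseCharacter(name: "Bart", catchphrase: "Eat my shorts!")
```

Here `bart.greet()` returns `"Hi, I'm Bart. Eat my shorts!"`.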
There are a lot more details (including some actual innovations) in the Swift docs, but this is the gist of it.
In general, the entire class system is very similar to C#. Every time I wondered how some OOP-related thing in Swift worked, I just asked myself "how would C# do it?", and I was right more often than not.
Step 2: Processing
Back to the interview question.
For the processing part, I've decided to take the most straightforward approach possible: parse the lines one by one as they come in, calculate the time difference, and tally it up in a dictionary. Concurrency can wait until Apple figures it out in Swift 4.
Since this code will involve shared state, I moved my handler into a class.
```swift
class Accumulator {
    private var data = [String: TimeInterval]()

    func submitLine(_ line: String) {
        // TODO
    }
}

//...

let accumulator = Accumulator()
if (CommandLine.arguments.count > 1) {
    readFromFile(CommandLine.arguments[1], accumulator.submitLine)
} else {
    readFromStdIn(accumulator.submitLine)
}
```
By the way, you can totally pass a method around the same way you would a function, and it will bring its instance along. None of that JavaScript `bind()` nonsense.
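A minimal sketch of that behaviour, with a hypothetical `LineCounter` class shaped like `Accumulator`. Referencing `counter.submitLine` without calling it produces a `(String) -> Void` value that keeps a reference to `counter`, so calls through it update that specific instance.

```swift
// Hypothetical handler class, shaped like Accumulator.
final class LineCounter {
    private(set) var count = 0

    func submitLine(_ line: String) {
        count += 1
    }
}

// Same handler signature as readFromFile/readFromStdIn.
func feed(_ lines: [String], to onLine: (String) -> Void) {
    for line in lines {
        onLine(line)
    }
}

let counter = LineCounter()
// A bound method: no explicit bind step, `counter` travels with it.
let handler: (String) -> Void = counter.submitLine
feed(["x", "y", "z"], to: handler)
```

After the `feed` call, `counter.count` is 3.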
Next, parsing the lines.
```swift
class Accumulator {
    //...
    private let dateFormatter: DateFormatter = {
        let df = DateFormatter()
        df.dateFormat = "yyyy-MM-dd'T'HH:mm:ss"
        df.timeZone = TimeZone(secondsFromGMT: 0)
        return df
    }()

    func submitLine(_ line: String) {
        let parts = line.components(separatedBy: " ")
        if parts.count < 3 {
            print("Invalid line: \(line)")
            return
        }
        guard let from = dateFormatter.date(from: parts[0]), let to = dateFormatter.date(from: parts[1]) else {
            print("Invalid dates: \(parts[0]), \(parts[1])")
            return
        }
        let car = parts[2]
        //...
```
First, I create a `DateFormatter` instance that will be used to parse dates. If I were on iOS, I could have used the premade `ISO8601DateFormatter`. On Linux, I had to set up my own using this nifty closure-based property initializer (note the `()` after the closing brace!).
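Here is the same pattern in isolation, as a sketch. `TimestampParser` is a hypothetical name, and setting `locale` to `en_US_POSIX` is my addition rather than something the code above does: it's Apple's recommendation for parsing fixed-format date strings, so the user's locale settings can't change how the format is interpreted.

```swift
import Foundation

final class TimestampParser {
    // The closure runs once per instance; the trailing () calls it and
    // stores the configured formatter in the property.
    private let formatter: DateFormatter = {
        let df = DateFormatter()
        // Fixed-format parsing: pin the locale so user settings can't interfere.
        df.locale = Locale(identifier: "en_US_POSIX")
        df.dateFormat = "yyyy-MM-dd'T'HH:mm:ss"
        df.timeZone = TimeZone(secondsFromGMT: 0)
        return df
    }()

    func parse(_ text: String) -> Date? {
        return formatter.date(from: text)
    }
}
```

Without the `()`, the property would have to be declared with a closure type and would hold the closure itself, not a `DateFormatter`.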
Then I split the line on spaces (I assume well-formed input, so there's no significant error handling) and parse the dates. Each `dateFormatter.date(from:)` call returns an optional date (`Date?`). I used the `guard` statement to unwrap them into ordinary `Date`s. If either fails to parse, execution drops into the `else` block, which has to exit the scope (here, with `return`).
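The `guard let` pattern on its own, as a small sketch. To keep it self-contained it parses plain integers instead of dates, and `elapsedSeconds` is a made-up name.

```swift
/// Hypothetical helper: returns `to - from`, or nil if either string isn't an integer.
func elapsedSeconds(from fromText: String, to toText: String) -> Int? {
    // Both conversions must succeed; otherwise the else branch runs.
    guard let from = Int(fromText), let to = Int(toText) else {
        // A guard's else block must leave the scope (return, throw, break, ...).
        return nil
    }
    // After the guard, `from` and `to` are ordinary non-optional Ints.
    return to - from
}
```

Unlike `if let`, the unwrapped names stay in scope for the rest of the function, which keeps the happy path unindented.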
Finally, calculate the duration and tally it up:
```swift
func submitLine(_ line: String) {
    //...
    let elapsed = to.timeIntervalSince(from)
    data[car] = (data[car] ?? 0) + elapsed
}
```
`TimeInterval` is just a type alias for `Double`. By the way, I hate this type aliasing stuff. Just call things what they are (and yes, I know about different type sizes on different architectures; it doesn't apply here for `Double`).
`??` is the nil-coalescing (value-or-fallback) operator, once again lifted straight from C#.
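Both ways of tallying into a dictionary, side by side (the car ids are made up). The second form, a subscript with a default value, arrived in Swift 4, so it wasn't an option in the 3.0.2 toolchain used here.

```swift
var totals: [String: Double] = [:]

// The approach used above: `??` supplies 0 when the key is missing.
totals["car-1"] = (totals["car-1"] ?? 0) + 60

// Swift 4 and later: a subscript with a default value,
// which allows updating in place with +=.
totals["car-1", default: 0] += 30
totals["car-2", default: 0] += 15
```

After these three updates, `totals["car-1"]` is 90 and `totals["car-2"]` is 15.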
Step 3: Output
Outputting the data is straightforward enough. Or at least it would be if Swift's built-in functionality worked on Linux.
`DateComponentsFormatter` is another class that is not available on Linux at the moment, so I had to quickly put together my own time interval formatter.
```swift
let MINUTE = 60
let HOUR = MINUTE * 60

func formatInterval(_ interval: TimeInterval) -> String {
    let hr = Int(interval / Double(HOUR))
    let min = Int((interval - Double(hr * HOUR)) / Double(MINUTE))
    let sec = Int(interval - Double(hr * HOUR + min * MINUTE))
    let ms = Int((interval - Double(hr * HOUR + min * MINUTE + sec)) * 1000)
    return "\(hr):\(min):\(sec):\(ms)"
}
```
I used UPPER_CASE style for time duration constants. I am not sure how idiomatic this is in Swift, but it makes sense to me.
Notice all those `Int()` and `Double()` casts? I found Swift to be waaay too fussy with type conversions between integers and floating-point numbers. You basically have to cast everything, even when it feels like the language should just do the right thing (e.g. subtracting an `Int` from a `Double`). I've heard other people complain about this, and now I see why.
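Most of those casts can be avoided by converting to an integer once and doing the rest of the arithmetic in `Int`. A sketch (the names `formatIntervalInt` and `zeroPad` are hypothetical). Unlike the version above, it rounds to the nearest millisecond instead of truncating and zero-pads the fields. It assumes a non-negative interval.

```swift
import Foundation  // for TimeInterval

/// Zero-pads a non-negative integer to `width` digits.
func zeroPad(_ value: Int, _ width: Int) -> String {
    let digits = String(value)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

/// Alternative sketch of formatInterval: one Double -> Int conversion,
/// then only integer division and remainder.
func formatIntervalInt(_ interval: TimeInterval) -> String {
    let totalMs = Int((interval * 1000).rounded())
    let hours = totalMs / 3_600_000
    let minutes = totalMs / 60_000 % 60
    let seconds = totalMs / 1_000 % 60
    let ms = totalMs % 1_000
    return "\(hours):\(zeroPad(minutes, 2)):\(zeroPad(seconds, 2)):\(zeroPad(ms, 3))"
}
```

For example, `formatIntervalInt(3723.5)` returns `"1:02:03:500"`.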
Here's the final code structure.
```swift
class Accumulator {
    //...
    func printResults() {
        print("Result count: \(data.count)")
        for (car, duration) in data {
            print("\(car) \(formatInterval(duration))")
        }
    }
}

//...

let startedAt = Date()
let accumulator = Accumulator()
if (CommandLine.arguments.count > 1) {
    readFromFile(CommandLine.arguments[1], accumulator.submitLine)
} else {
    readFromStdIn(accumulator.submitLine)
}
accumulator.printResults()

let elapsed = Date().timeIntervalSince(startedAt)
print("-------------------------")
print("Total time \(formatInterval(elapsed))")
```
The code will just print the entire output into the terminal. Nothing too wild here.
I have also added some basic time tracking. I didn't even try to implement memory usage metrics from the OP.
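One thing to be aware of in `printResults()`: a Swift `Dictionary` has no defined iteration order, so the cars can come out in a different order from run to run. If a stable order matters (for diffing outputs, say), sorting by car id before printing is a small change. A sketch with a hypothetical `resultLines` helper that takes the formatter as a parameter:

```swift
/// Hypothetical helper: turns accumulated totals into output lines,
/// sorted by car id so the order is the same on every run.
func resultLines(_ data: [String: Double], format: (Double) -> String) -> [String] {
    var lines: [String] = []
    for (car, duration) in data.sorted(by: { $0.key < $1.key }) {
        lines.append("\(car) \(format(duration))")
    }
    return lines
}
```

Inside `printResults()`, that would be `for line in resultLines(data, format: formatInterval) { print(line) }`. Sorting adds an O(n log n) step over the distinct car ids.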
I compiled the program, piped the 200MB file into it and was hit with an unpleasant surprise.
Performance
Eight. Freaking. Minutes.
Compare that with the 30-second runtime claimed in the OP. And that was for an unoptimized solution.
I compiled Ayende's C# code using Mono and, on the same VM, it produced results in the 75-80 second range. Not as good as real .NET on bare-metal Windows on a fast computer, but still a lot better than Swift.
So what gives?
The first thing I tried was switching to a release build (`swift build -c release`). Maybe the debug build was doing some amazing instrumentation that was killing performance?
Nope. It still took more than 8 minutes.
Switching between `stdin` and the file reader didn't seem to make much difference either. Just to make sure, I implemented and instrumented a naive read-all version. It loaded the entire dataset into RAM in 12 seconds, then proceeded to chew on it for 8 minutes. So IO isn't the culprit.
The next potential hotspot was date parsing, so I hardcoded it away:
```swift
/*guard let from = dateFormatter.date(from: parts[0]), let to = dateFormatter.date(from: parts[1]) else {
    print("Invalid dates: \(parts[0]), \(parts[1])")
    return
}*/
//let elapsed = to.timeIntervalSince(from)
let elapsed: Double = 60
```
Even with this, the code took between 59 and 61 seconds to run.
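Since every timestamp in the file has the same fixed layout, one way to keep real parsing without going through `DateFormatter` is to decode the digits by hand. This is a sketch only, assuming UTC, ASCII input and the `yyyy-MM-ddTHH:mm:ss` layout used above. It doesn't validate the separator characters, the calendar math follows Howard Hinnant's `days_from_civil` algorithm, and I haven't benchmarked it against the timings in this post.

```swift
/// Days since 1970-01-01 for a Gregorian calendar date
/// (after Howard Hinnant's days_from_civil algorithm).
func daysFromCivil(_ year: Int, _ month: Int, _ day: Int) -> Int {
    let y = month <= 2 ? year - 1 : year   // Jan/Feb count as months 11/12 of the previous year
    let era = (y >= 0 ? y : y - 399) / 400
    let yearOfEra = y - era * 400          // 0...399
    let shiftedMonth = (month + 9) % 12    // March = 0 ... February = 11
    let dayOfYear = (153 * shiftedMonth + 2) / 5 + day - 1
    let dayOfEra = yearOfEra * 365 + yearOfEra / 4 - yearOfEra / 100 + dayOfYear
    return era * 146_097 + dayOfEra - 719_468
}

/// Parses "yyyy-MM-ddTHH:mm:ss" (assumed UTC) into seconds since the Unix epoch.
/// Returns nil if the length is wrong or a digit position holds a non-digit.
func parseTimestamp<S: StringProtocol>(_ text: S) -> Int? {
    let bytes = Array(text.utf8)
    guard bytes.count == 19 else { return nil }

    // Reads `length` ASCII digits starting at byte offset `start`.
    func number(_ start: Int, _ length: Int) -> Int? {
        var value = 0
        for byte in bytes[start..<start + length] {
            guard byte >= 48 && byte <= 57 else { return nil }  // "0"..."9"
            value = value * 10 + Int(byte - 48)
        }
        return value
    }

    guard let year = number(0, 4), let month = number(5, 2), let day = number(8, 2),
          let hour = number(11, 2), let minute = number(14, 2), let second = number(17, 2)
    else { return nil }

    return daysFromCivil(year, month, day) * 86_400 + hour * 3_600 + minute * 60 + second
}
```

In `submitLine`, the two parsed values could then be subtracted directly (`Double(to - from)`) instead of going through `Date`.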
Finally, I removed the `Accumulator` class and the dynamic `onLine` delegation, and did all the work inside one function. That brought it down to 45 seconds, which, considering this code does little more than iterate through lines, is... pretty terrible.
In the end, I failed to find any one silver bullet chokepoint I could get rid of and make this program perform. Swift was just slower than C#, no matter what I tried.
But what about that article claiming that Swift web frameworks are already faster than Node.js? Well, I put together a quick Node.js solution, and it chewed through the entire list in 40 seconds. So much for that.
Maybe I am just doing something incredibly stupid here, but I can't imagine what that could be. Feel free to clone the code and check it out yourselves.
Conclusion
The wild love affair with Swift is over, and warts are starting to show. Linux support is lackluster. APIs are incomplete. Documentation, outside of the excellent language guide, is too abstract and fragmented across different versions.
The language may be better than Objective-C, but it's still way too fussy and verbose compared to its closest inspiration, C#. Both Ayende's C# implementation and my Node.js version look (in my eyes) leaner and cleaner than the Swift one.
Worst of all, Swift on Linux is slow. Whatever magic Apple is doing on iOS, so far it doesn't seem to translate to my VM.
Server-side Swift in 2017 just doesn't seem ready for prime time.
But that doesn't mean it never will be. I still like some of its innovations and the clean-slate optimism that every new language brings. Time is surely on Swift's side, and I will definitely keep an eye on it over the coming months and years.