You are viewing totherme

totherme - "Thinking in types" or "The mind of a programmer" [entries|archive|friends|userinfo]
totherme

[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

"Thinking in types" or "The mind of a programmer" [Apr. 27th, 2007|04:38 pm]
Previous Entry Share Next Entry
[Tags|, , , ]

In a previous post, pozorvlak and I wondered about the differences between the thought processes that goes into writing good static code, and those that go into good dynamic code. We figured that there wasn't a lot out there to help dynamic programmers get the hang of static style thinking, so what follows is a simple little toy example, solved in what I think is probably a fairly static typey style.

This is not a beginners haskell tutorial - I'm going to assume that you know about lists, higher order functions and fold in particular. This is for people that understand functional programming, but are frustrated by strongly statically typed functional programming. Actually, if there's anyone other than pozorvlak reading this who's in that position, I'd appreciate a comment - I've no idea how many of you exist, or if this is the sort of thing that helps.


The problem is to write a function that works in a similar way to the perl "split" function. The one I choose to write behaves slightly differently from the perl one - I'll work with polymorphic lists rather than just strings of characters (there's no reason not to), and I'll restrict the argument I split on to being a singleton in the list. I appreciate that no true perl programmer would forego this functionality, but I think it makes this little toy far easier to understand, and the function is still useful even with that restriction.

There are lots of ways to write this function, and this is how I happened to hack it together earlier today, when I needed to use use it for something. There are certainly better ways to write the function - my intention here is to try to give voice to as many as possible of the little subconscious thoughts that make programming happen.

This is statically typed programming, so we start with the type we want:
split :: b -> [b] -> [[b]]

It's clearly a list function, and map probably isn't going to do it because ther's no obvious relationship between the length of the output list and the length of the input list. Will fold work?
foldl :: (a -> b -> a) -> a -> [b] -> a

In the case where a is equal to [[b]] we have:
foldl :: ([[b]] -> b -> [[b]]) -> [[b]] -> [b] -> [[b]]

and I'll be able to do:
split x = foldl f y

Which looks like it'll typecheck for some approproate f and y.
Let's choose the simplest thing first - y. This is the bounds case of the fold - the thing that gets returned of the final argument is just an empty list. "" won't type check as [[b]], so let's try [[]], and see what happens. Now, can we find the f? If we can, it will certainly be of type:
f :: [[b]] -> b -> [[b]]

If y is [[]], then f doesn't need to worry about the [] case for the first arg, so we can have (remembering that x is in scope already):
f (acc@(y:ys)) z = if (z==x) then []:acc
                             else (z:y):ys

Notice that we've used == in there, so we'll need our list elements to be in class Eq. The function as a whole, then, will look like:
split :: (Eq a) => a -> [a] -> [[a]]
split x = foldl f [[]] where
        f (acc@(y:ys)) z = if (z==x) then []:acc
                                     else (z:y):ys

Let's try testing it :)
*Main> split ',' "a list, of, wordey thingies "
[" seigniht yedrow ","fo ","tsil a"]
*Main>


lol :)

*Main> reverse $ map reverse $ split ',' "a list, of, wordey thingies "
["a list"," of"," wordey thingies "]
*Main>

That's better.

The behaviour on the empty list case is still slightly annoying:
*Main> reverse $ map reverse $ split ',' ""
[""]
*Main>

...but at this stage I had what I needed for my own little hack, so I was happy. This isn't exercise-code that's being marked - it's not particularly pretty or general or fast. It's just a little snippit of something I wanted to run. If you've been coding in the same paradigm (scripts, OO, whatever) for a long time, then I imagine this thought process will seem like an awful lot of effort, for very little result. I don't think it's a lot of effort though - if I hadn't made the decision to self-analyse while I was writing this little function, I'd have done it in a few seconds without thinking about it. I think that all programmers do a comparable amount of work whenever they write a little function or procedure like this, but it's all subconscious, and we rarely talk or even think about it.

I'd be interested to see how other people solve similarly simple problems. I'm not interested in the solution itself, I'm interested in the thoughts that lead you to that solution. How do the rest of you do it?
linkReply

Comments:
From: twanvl
2007-04-27 05:22 pm (UTC)

(Link)

First of all, a very nice introduction on type based programming!

In your implementation you use foldl, which causes your reverse problems. This happens foldl basicly has a built-in reverse, so you have to undo it afterwards. It is better to use foldr (when in doubt, always use foldr), this gives you
split x = foldr f [[]] where
        f z (acc@(y:ys)) = if (z==x) then []:acc
                                     else (z:y):ys

Or maybe using guards instead of if (which is IMO more Haskell like)
split x = foldr f [[]]
  where  f z (acc@(y:ys))
           | z == x     =  []:acc
           | otherwise  =  (z:y):ys
[User Picture]From: totherme
2007-04-27 05:43 pm (UTC)

(Link)

Thank you :)

Yes, foldr and guards do look better - cheers :)
From: (Anonymous)
2007-04-27 05:56 pm (UTC)

(Link)

Found this off the Planet Haskell blogs. Being fairly new to FP myself, I still don't think much in types, and hadn't really thought of starting with a matching type first then filling in the blanks. I think other beginners might really like some elaboration on the thought process of matching the types up; it took me a few moments to realize first that split was point-free, that split was matching the types from the right, that the 'y' parameter was needed to fill in the second parameter of foldl to make the types perfectly fit, etc.

Then there was the natural tendency for me to want to make the polymorphic type names correspond to each other; things were made easy due to your declaration of split looking like:

split :: b -> [b] -> [[b]]

Which made matching it up with foldl easy by just substituting [[b]] for a and not having to touch b. Whereas the natural tendency would be to write:

split :: a -> [a] -> [[a]]

Thus making the substitution in one's head a little trickier for a beginner when looking at foldl. I find that writing my own types using x/y/z for type parameters is easier.

There's precious few "thinking in types" tutorials that are actually accessible to beginners, so reading this was quite eye-opening. Keep it up! :)
From: (Anonymous)
2007-04-27 06:26 pm (UTC)

(Link)

Sorry that wasn't as clear as I wanted to make it... I should say I use x/y/z in my types when I'm looking for a match, and then I convert them back to a/b/c afterward. As for "matching from the right", that is to say that split's type matches the right side of foldl.

I also discovered hoogle, which looks fantastic for finding matching types .. though I should probably get good at doing it in my head first. Still, I can't wait for an IDE with built in hooglitude :)
[User Picture]From: totherme
2007-04-27 06:45 pm (UTC)

(Link)

Hoogle is fantastic - and I use it often :)

I think the closest we have to and IDE with built in hooglitude is GHCi on Acid as mentioned here. There's also vim integration, if you want it :)
[User Picture]From: totherme
2007-04-27 06:36 pm (UTC)

(Link)

Thanks - it's nice to know that this stuff is useful to someone :)

I think other beginners might really like some elaboration on the thought process of matching the types up; it took me a few moments to realize first that split was point-free, that split was matching the types from the right, that the 'y' parameter was needed to fill in the second parameter of foldl to make the types perfectly fit, etc.


split sort of came out point free once I'd made the connection between its type signature and the fold one... First, I'll mention the "matching from the right":

In general, I think if you're writing function f such that f :: SomeInputs -> SomeMoreInputs -> AnOutput and you happen to have a function g lying around with type g :: SomeDifferentInputs -> AnOutput then g is a potential candidate for being the last thing called in your definition of f.

As for being point free, if that g has the type g :: SomeDifferentInputs -> SomeMoreInputs -> AnOutput then there's a chance you can make your f point free on SomeMoreInputs - if those latter inputs that g needs happen to coincide with what you're expecting in f. If that's the case, you've changed the nature of the problem - you don't have to write f :: SomeInputs -> SomeMoreInputs -> AnOutput anymore - you just need to write some (simpler) function h :: SomeInputs -> SomeDifferentInputs.

There's nothing to stop you from doing all this matching from the left though - if g happened to be of type g :: SomeInputs -> SomeMoreInputs -> IntermediateType and you can see that it'd be easy to write h :: IntermediateType -> AnOutput, then your definition of f is
f = g.h

Hm - I hope that didn't just get a whole lot less clear in the more general case ;) Maybe I should think about writing a concrete example of writing a function by matching types from the left?

Oh yes - you're right - I did choose the letter "b" in the type signature of split so as to make the connection to the type of foldl a little more obvious. I think this is similar to when you're writing untyped functions or procedures and you choose to make the parameter names match up:
//Forward declarations
void setBrushColour(colour);
void drawCircle(coords, radius);

void drawPrettyCircle(coords, colour, radius) {
    setBrushColour(colour);
    drawCircle(coords, radius);
}

...the fact that the parameter names match up makes the whole thing prettier, and easier to read and understand in a forum like this. But, once you've got a bit of practice in the language, third party libraries that use different names for the parameters in their documentation (position and size, perhaps) won't phase you at all.

Thanks for the feedback :)
[User Picture]From: pozorvlak
2007-04-27 10:24 pm (UTC)

(Link)

Excellent, glad it's not just me that finds this stuff hard :-)
From: (Anonymous)
2007-04-27 09:09 pm (UTC)

unfolding

(Link)

Personally, when I see something like '[b] -> [[b]]', the first thing I think is unfoldr:

unfoldr :: (b -> Maybe (a, b)) -> b -> [a]

unfoldr takes a seed (in this case, b = [c]), and a function from a seed to an element and a new seed (here, [c] -> Maybe ([c], [c])), and builds a list ([[c]]). The maybe tells it when to stop (end on Nothing, keep going on Just).

Here you'd get, at first:

split c l = unfoldr (f c) l

So you need to find an f that either returns Nothing to stop (at the seed []) or splits off a portion of the seed list otherwise. It so happens an appropriate function is:

f _ [] = Nothing
f c l = let (h, t) = break (==c) l in Just (h, drop 1 t)

So that's another possible way of solving the problem.
[User Picture]From: totherme
2007-04-27 10:57 pm (UTC)

Re: unfolding

(Link)

Ah - unfoldr is one of those things that I've written little toy exercises in, but haven't properly assimilated into my programming yet. Thanks for the pointer - I'll keep an eye out for increasing listeyness in my type signatures in future :)
[User Picture]From: ari_rahikkala
2007-05-01 07:19 am (UTC)

Re: unfolding

(Link)

(gah. Exhaustion and XHTML don't go well together. I deleted an earlier post of mine here, different in the respect that it had some broken italics elements...)

Anonymous parent poster: Your solution is pretty similar to my hacked-up-in-ghci one, except yours uses unfoldr while mine recurses explicitly (I have not yet attained the level of enlightenment where the former feels more natural than the latter):

split _ [] = [[]];
split at xs = let (this, rest) = break (==at) xs in this : if (null rest) then [] else split at (tail rest)


My process to arrive at this went something like...

let split _ [] = [[]]; split at (x:xs) = if (x == at) then

... wait, what do I put in here? I can't think of a way to say "start a list at this element and then when you get to some certain element end the list" that does not involve a call to a different function that I'd have to... no, actually, I wouldn't need to write it after all, I think it's in the Prelude... span? *checks type* ah, yes, that's the right type, except that what I want actually called break. So, let's see...

let split at [] = [[]]; split at xs = let (this, rest) = break (==at) xs in this : split at rest

break returns a tuple of a "this" part and a "rest" part (I don't want to confuse those with head and tail so I use different nomenclature). I return the "this" part and cons it to recursion on the "rest" part. I've used this pattern before and simply matched the problem to it - no complex thought was involved here. Works so beautifully, too, in lazy languages :).

So let's try it out...


split 1 [1,2,3]
*screen fills with an infinite list of empty lists, I interrupt ghci (and then, by the way, fight for a while with the strange glitchy behaviour that ^C sometimes causes in it)*

Ouch. So what am I doing wrong... hm... well, I'm recursing unconditionally... but if I were calling myself with an empty list I'd hit the first definition and the list would end...

* several minutes pass (hey, I was up all night) before I come to a realisation *

Oh, hm. Let's see what break actually does...


break (==1) [2,3,1,4]
([2,3],[1,4])


Ah, so that's obviously what must be up. I was in the thinking of split, of course, and expecting that break wouldn't return the element that satisfied the predicate. So, I guess I should fix it by taking the tail of the second element before recursing on it (a good example of the loose reasoning used in coding here: When I realised that an assumption of mine was wrong, I immediately went on to fix things so that the assumption would be right without actually thinking of whether it has anything to do with how the code is breaking. In this case it was the right thing, not a surprise in a program this short, but in my experience it's usually *not* the right thing to do when working with code that's even slightly more complex)

let split _ [] = [[]]; split at xs = let (this, rest) = break (==at) xs in this : split at (tail rest)
split 1 [1,2,3,1,2]
[[],[2,3],[2]*** Exception: Prelude.tail: empty list


Ah, so it's almost right... except... *a few minutes pass again* ah, when rest is an empty list I should just stop recursing there because I can't take the tail of an empty list (and because that means there is no need to recurse any longer anyway - I didn't think of this consciously). I hate introducing conditionals but can't think of any better way to do this...

let split _ [] = [[]]; split at xs = let (this, rest) = break (==at) xs in this : if (null rest) then [] else split at (tail rest)
split 1 [1,2,3,1,2]
[[],[2,3],[2]]


Ah, good. It passed a single test on a list with a couple of elements. Obviously it must be right.

Also, I must say I did slap my forehead when I saw the parent's solution to the same problem that avoids the conditional by just calling drop. Hopefully I will remember this in the future.
[User Picture]From: totherme
2007-05-01 10:21 am (UTC)

Re: unfolding

(Link)

(a good example of the loose reasoning used in coding here: When I realised that an assumption of mine was wrong, I immediately went on to fix things so that the assumption would be right without actually thinking of whether it has anything to do with how the code is breaking. In this case it was the right thing, not a surprise in a program this short, but in my experience it's usually *not* the right thing to do when working with code that's even slightly more complex)

That interests me...

The theorist in me wants to say "If there's a false assumption in your code, then it's wrong. The bug may not have manifested yet, but Murphey says it will if you ignore it, so you'd best fix it now whether that's the bug you thought you were working on or not."

Having said that, the engineer in me is more like "Oh man, that's gonna open up a huge can of worms: If I fix it so that the assumption holds, then something else will turn out to rely on the thing I changed, and if I try to remove the assumption from the code, then I'll probably have to follow through every dependency from this point out... And I'll have to look see if this assumption is made elsewhere... Maybe it'd be best to just fix the bug I was working on, and see if the code works ok for the cases I'm interested in?"

I'm not sure yet what I think about the relationship between these two guys in my head (though now I look at what I just wrote, I think it's interesting that the theorist in me talked about someone else's code, while the engineer in me talked about his own code ^_^ ).

I think the latter solution would be a lot easier to do if there was an array of tests on hand, defining the cases I'm interested in. Having said that, I think you might well be able to do a similar trick with theory - if you can see what effects the failed assertion will have on the output - you just change the specification to match what you can actually do ;) I guess I kinda did that when I noted that in my implementation split _ [] = [[]] and I didn't care...

Does haskell help reduce the pain of the engineer, and encourage coding in the style of the theorist? I wonder what working haskellers out there do in real-life situations.
From: (Anonymous)
2007-04-27 10:38 pm (UTC)

nit

(Link)

Your split isn't very lazy. Whenever you are matching against only
one pattern, it's usually a good idea to use irrefutable patterns:

split x = foldr f [[]] where
f z (acc@(~(y:ys))) = if (z==y) then []:acc
else (z:y):ys

This way, you can split infinite lists!

-- Stefan
[User Picture]From: totherme
2007-04-27 11:05 pm (UTC)

Re: nit

(Link)

I hadn't come across irrefutable patterns before - that's very cool, thanks :)

From: cinimod.myopenid.com
2007-04-28 10:29 am (UTC)

Looks like an unfold to me

(Link)

My thought process would be whenever I see a list type as a result then I think unfold.


b -> [b] -> [[b]]
From: (Anonymous)
2007-04-29 12:11 am (UTC)

(Link)

splitWith :: (a -> Bool)-> [a] ->[[a]]
splitWith fn lst =
    case (break fn lst) of
            (a, [])   -> [a]
            (a, b:cs) -> a : splitWith fn cs

split :: (Eq a) => a -> [a] -> [[a]]
split = splitWith . (==)

I knew break :: (a -> Bool) -> [a] -> ([a], [a]) from the Prelude, so I went looking for a way to recurse into the second element and drop its head (the match, traditionally dropped in a split). De-structuring pattern match looked the simplest.
[User Picture]From: ari_rahikkala
2007-05-01 07:27 am (UTC)

(Link)

Just a couple of things I didn't mention in http://totherme.livejournal.com/3845.html?thread=13829#t13829 due to the comment character limit (exhaustion makes me tirelessly verbose):

On types: I don't want this to come out as an insult, but... You should know better than to present the type of a function like that without predicating it on Eq. That you left it out is actually the reason why I went ahead and wrote a solution of my own in the first place: When people blog about Haskell and name their posts like "thinking in types", it's because they've come up with something really elegant, and therefore you know ahead of time that it's pointless to try to implement whatever they did because their implementiation is going to be much prettier anyway. In this case I saw the type you gave to split and immediately knew you can't have enough experience of Haskell to be able to come up with an incredibly beautiful split :).

On folds: You should read http://haskell.org/haskellwiki/Stack_overflow . That page is named rather too humbly... it's not just about stack overflows, rather, once you've read it, you'll never have to think about which direction to fold in. The rules are really so simple and obvious: If the combining function is strict in its second argument, use foldl', if not (i.e. if it can return at least something before evaluating its second argument), use foldr.
[User Picture]From: totherme
2007-05-01 09:59 am (UTC)

(Link)

lol - you're right, I should have seen immediately that a split function would require Eq. But hey - I didn't, so I didn't want to pretend on-line that I did. Hopefully next time I need to write something like this, I'll spot the Eq and the resemblance to unfoldr right away :)

Thanks for the fold link - I'll take a look.
[User Picture]From: pozorvlak
2007-05-01 03:01 pm (UTC)

(Link)

I saw the type you gave to split and immediately knew you can't have enough experience of Haskell to be able to come up with an incredibly beautiful split :)

Well, speaking as someone who really doesn't have enough experience of Haskell, I found totherme's more detailed thought processes (and many of the comments here, including yours) to be more helpful than any number of "beautiful", impenetrable programs plucked whole from the skull of Zeus - Haskell has quite enough of those already :-)

That stack overflow page is great - thanks!