Perhaps we need to start over from the top:
@MuchLearning, why do you want to know? I am not saying you shouldn't want to know, but the intent is relevant here, because (as I said earlier) this is the sort of micro-optimization that is entirely irrelevant in actual practice - any gain or loss from it is going to be swamped by other factors unless it is a bottleneck in a very tight loop.
If you're just curious, fine; the answers already given are more than suitable, with Geri's and Brendan's answers being pretty much equivalent in terms of cycles used (assuming that the compiler uses 8-bit arguments for both the shift and the mask, which I would surely hope it would, but I wouldn't count on it).
If the intent is to break a bottleneck, then we'd need a lot more detail as to what the actual code is doing, rather than just a mocked-up test. The test serves to illustrate the question, but to fix the live code, we'd need to see it.
Beyond that, meh. It really isn't something that you get a real advantage on one way or the other. How you do it is up to you. That having been said, I would be curious as to what you are using it for, in case there is some aspect of it that could make it amenable to an entirely different solution (one which your question doesn't give traction for, making this a shoe or bottle problem).