Friday Mornings…

July 10, 2009 by thisaintnogame

For some reason, I am more tired than usual this morning and it seems that I am not able to wake myself up and actually do work. So far I have gotten iced coffee (which never tastes as good as regular coffee but it is too hot to purchase a very hot drink), checked email five thousand times, and started writing this post. I did change something in a program that I am working on, but that change caused it to explode with bugs which I am too tired/lazy to look at it. What a good use of a Friday morning.

In other news, the people at the Freakonomics blog have had their say on Swoopo, which is cool since I wrote about it a couple days before they did. A commenter (the only one) on my last post advertised this site called pricedrip.com as an alternative to Swoopo. The mechanism for this site is that you place a single bid, naming a price that you would like to pay for the item and then the price is lowered periodically until someone’s bid is met. Of course, the site charges you for each bid that you placed, but it would seem that an individual would be less likely to place more than a couple of bids per item with this method. This type of auction is known as a Dutch auction, where the auctioneer starts high and then lowers the price until someone is willing to buy it, with the difference that you must pay for your bid on this site. Admittedly, I haven’t really watched any of these auctions, but I guess the real question is how far does the price drop below the market price (the best price that you could purchase it for on the internet). It seems like you have a decent chance of getting a marginal savings on a product. As long as your savings is greater than the price of the bid, you would probably make money. Of course, if everyone uses that same strategy, then everyone gets screwed because everyone will have to bid higher to get that discount, thus decreasing the value of the discount. The thing that makes this less “evil” than Swoopo is that you do not gain any more information over the course of the auction that would cause you to change your bid. Say your bid is X and the auctioneer announces the current price is Y such that X < Y. If no one buys it at Y and the price moves to Y-1, then you gained no information because you implicitly assumed that no one was going to buy it at Y (if you had assumed that, you would have never spent money placing a losing bid of X). If someone does win it at Y, you gained information, but the auction is over so that information is useless now. Since you gain no new information, it never really makes any sense to place more than one bid.

You might conclude that if everyone else is bidding such as you are, that is trying to get a small discount from the item, then they might outbid you on it, so you should increase your bid by a little bit to win the item. Of course then they might have the same idea and increase their bid, and so on and so forth. Eventually you might reach a conclusion that you should bid more than the market value of the item to win the auction, but that is irrational. While that might happen at Swoopo (you would continue bidding so as to not lose the investment that you had already put in), that wont happen here since all that calculation would go in before you place your bid. Once you realize that the only winning strategy is to overbid the item, you would just go to Amazon and buy it and thus never lose any money on the auction. Thus my claim is that this site is probably less evil than Swoopo. I think some work has been done on auctions where there is an entry fee to the auction (not per bid), but considering it never makes any sense to bid more than once in this auction, the same strategies can be applied. I will try to look into that.

I started reading “Linked” by Barabasi which is a popSci book about the science of networks. Barabasi is a pretty big guy in the field but based on some videos that I have seen with him, I think he is prone to overestimating the importance of network theory. Still, it should be an interesting read and is only about 200 pages. I kind of want to do some type of empirical study in network theory (maybe a project like this or this), but I have yet to come up with a good, feasible idea (good means it has to be interesting and its conclusions have to be worth the effort and feasbile means that data has to be relatively easy collect). Oh well, back to doing some work (more likely, reading random news stories before lunch).

Swoopo

July 7, 2009 by thisaintnogame

After having Google Ad-sense advertise this site to me approximately 100 times, I finally visited swoopo.com the other day. The site advertises that it sells electronics for absurdly cheap prices, such as 20 dollars for an ipod touch or 200 dollars for a new macbook, so I was obviously curious if this was true and how could they afford to undercharge for such items. The first thing that struck me was that the site was an auction site, so there is no notion of directly purchasing something. After I looked around a while more, I discovered the trick for why they can afford to charge very low prices.

Here is the rundown of a standard swoopo auction: The item, lets say a PS3, starts off at some very low price, say 30 dollars. Then every time a user submits a bid, the price is increased by a predetermined amount, usually 15 cents and the time until the auction closes increases by a certain amount, usually 15 seconds (at some points, the time left jumped up dramatically, like by 20 minutes, but I couldn’t figure out the pattern there). At some point, someone will have placed an unchallenged bid and the auction will end. This part is relatively simple, just the standard ascending bid English auction (that is some moderator will name the next price and bidders will agree to pay that price, as opposed to having bidders name their own prices), but this doesn’t explain how this site makes any money.

The real trick is that every time you bid, you must spend one of your “bid tickets” and you have to buy bid tickets from the site. In other words, they charge you $1 for the right to submit a bid. So while one person might get an item that is way below market price, there are many more people who have essentially spent money and have nothing to show for it. So if the site is selling an item worth $1000 dollars and the bid price starts at $0, they must get 1000 bids on the item to make it a profitable venture for them (let’s ignore overhead costs right now). Those 1000 bids will only drive the price up to $150 which still makes it an amazing price to get that product at, so it is pretty likely that people will continue to bid on the item until it hits a point where it is reaching some reasonable approximation of market value. Even though Swoopo probably lost money on selling the laptop to the winner, they have made money off of all the users who submitted bids and lost.

Although this format strikes me as kind of gimmicky, the site is very upfront about the process of buying bid tickets and that you will lose a ticket every time you place a bid, regardless of whether or not it was the winning bid. Also, it seems that representatives from the site have been very good about responding to interviews and being honest about what they do. In one of the interviews that I read with a PR rep from Swoopo, he said something like “we don’t want people to feel like it is impossible, we want people to win” which sounds very reminiscent of rhetoric from casinos or from “Thank You for Smoking.”

One thing that I do find to be questionable about the site is this automatic bidding bot that it provides free of charge. Essentially, you tell the software what item you would like to bid on, how much you are willing to pay, and how many bids you are willing to use towards that item and then it will automatically submit bids very close to the end of the auction on your behalf. After watching an auction on a PS3 that started off at 37 dollars (or rather that was when I started watching the auction), it became very clear that many people were using this bidding robot since with about 5 seconds left in the auction, four or five bids would go in automatically (it would also say if the bid was placed by their bid butler as it is called). That PS3 ended up selling for somewhere in the 200 range, so I cant even imagine how many people lost a ton of their bidding tickets because of the straightforward strategy used by this bot.

Looking at this from an analytic point of view, we can see that we only really need one user to “bite the bullet” and submit the bid that will prevent the current bidder from winning and prolong the auction, so there is no need to submit a bid if you are sure that someone else is going to submit one. In fact, not submitting the bid there is to your advantage since you will have more bids later in the auction if we assume that everyone has finite resources. One could probably do some statistical analysis to estimate the probability that someone will submit a bid given what percentage of market value the current price is. If we had access to that, we could layback and wait for all the “noise traders” (people who use no intelligent strategy) to waste their bids when it is extremely improbable that anyone will win the auction, and then enter the auction later on when its more likely the auction will end within a reasonable number of bids. For instance, if we knew that product X has a 90% chance of getting a bid at price Y, then we would submit a bid at that price only 10% of the time (determined randomly). Following such a strategy would certainly not guarantee a win, but it would definitely result in using less bid tickets (in the above case, assuming that the 90% chance held for a while, we could save 90% of our bids) than using their bid butler. Obviously, Swoopo has no incentive to improve their software since their current bid butler is probably making them tons by wasting users’ bids, but it might provide a market for a third party software company to make a bidding bot that saves users lots of bids over times. Of course, if everyone used such a process, it would come down to the random numbers generated by the software, making it more of an explicit gambling game, but would save everyone money. I would also guess that it would decrease Swoopo’s profit by a decent amount since there wouldn’t be as many traders submitting simultaneous bids and thus keep the price a little bit lower.

I highly doubt that I will be making any bids on Swoopo in the future, but it does offer an interesting application of auction theory (according to my advisor, this “pay per bid” auction has not been studied too much). And if I am ever given a bunch of money and told to start a company, I might pursue that more intelligent bidding software option as a product (would it be better to charge a subscription fee or a certain percentage of the price of an item that a client of ours has won?).

Getting Past Brute Force

July 1, 2009 by thisaintnogame

I have been playing around with some questions on Project Euler over the past couple of days. In general, I like PE for a number of reasons. First, its a site that challenges people to think mathematically, which is always good. It appeals to a couple of my friends who aren’t very strong coders, probably because you dont need a huge coding or algorithms background to attack first 20 problems or so. Hopefully it will lead to them getting more comfortable with brain teaser/coding questions and they can start doing things like Topcoder or the problems on the UVA judge (can’t find a working link right now). Second, not everything requires code to answer, sometimes its just a simple equation evaluation (if you know the right equation) and in general it forces coders to get closer to the math. A lot of times I feel that we lose sight of the fact that we are applied mathematicians and should look at some of the mathematical properties of a problem before we start banging away at the keys or looking for a recurrence relation that we can DP on. Actually, in my Combinatorics class in the fall, I suggested we could solve the lattice path counting problem using a recurrence relation and received a response of “ah, yes that is how a computer scientist would think, but we mathematicians can do better” (it was said in a loving and European way). However, I have some reservations in the fact that most of the problems, at least on the first page, seem to involve simple computations, but adds in difficulty by putting in very large numbers. I guess it’s just a little annoying to have the right algorithm, but have to change data types from “int” to “long long” or just give up on the problem because it requires the use of some big int class.

Anyway, I find one problem that I really did like (problem 24 for those who are interested). The generalized version of the problem is as follows: Given N digits, determine the Kth permutation of the digits (that number K is determined by its position in the lexicographic ordering of all the permutations). So for example, if we were given 1 2 3 and told to find the 4th permutation, we would get 123 as the first, 132  as the second, 213 as the third and 231 as the 4th (the 5th would be 312 and 321 for the 6th). Now, I would like to give this problem at one of our programming competitions and make N semi-large (but small enough such that N! fits with in a 64-bit int). This is one of those problems that has a very obvious brute force solution, but the lack of a small bound on N destroys that approach, thus forcing the programmers to actually think about the problem. The BF approach would be to iterate through every permutation until you hit K (some programmers may not know a standard library call to a next_permutation function, so the problem might be a little bit tougher). Still, it isn’t that hard to come up with a good solution for this problem, so let me talk about that.

One of the nice things about permutation questions is that they usually lend nice recursive solutions, which is my favorite thing to work with. We notice that if we have N elements and we fix the first element, then we have (N-1)! permutations that we can construct. So if K <= (N-1)! then we know that the first element in the solution is the first element in the list. So now it really comes down to finding the Kth permutation of the remaining N-1 digits. If K > (N-1)! , then we know that the first element in the list cannot be the first element in the solution, so we must now switch elements. Since digit D can have (N-1)! permutations constructed with D as the first element, we must find the smallest position i such that i*(N-1)! >= K, and then switch the first element with that ith element. Now its a matter of finding the (K- ((i-1)*(N-1)!))th permutation of the remaining N-1 digits. That expression is a little scary, so let me explain it. Notice that if we had the digits 12345, the first 4! = 24 permutations would all begin with 1. So if we wanted to find the 25th permutation, we realize that 1 cant be the first permutation since it only heads the first 24 permutations, so we switch with the next lexicographic digit, which is 2 in this case. Since we already covered 24 permutations, we really only need to find the 1st permutation with 2 at the beginning, or the first permutation of the digits 1345.

So we modified the permutation number we were looking for by subtracting (i-1)*(N-1)! or the number of permutations that we already ruled out cant be the solution permutation, from the permutation that we are looking for. Again, i is the smallest position such that i*(N-1)! >= K. I am much better at explaining things in person, I swear.

Anyway, if we had put all of this mess together in an algorithm, we get a running time of O(nlgn) if we search for that i cleverly (binary search), which significantly kicks the ass of the brute force O(n!) algorithm. Regardless of this convoluted explanation, its a pretty simple algorithm but it makes you feel nice and special after you are done. I think its a pretty gentle problem to get people to think beyond the obvious, which is something that I strive for with the problems that I write. I have previously done that by beating them over the head with 2 dimensional DP problems, but then that gives people with no algorithms experience no chance at them. This type of problem gives anyone with a knowledge of permutations and basic recursion a decent shot. Since no one reads this blog anyway, I think I should be safe in giving it as a problem in one of this year’s competitions.

Why we do algorithms wrong

June 14, 2009 by thisaintnogame

So by we, I mean my school’s CS department. After only a few meetings with my advisor here, I see the difference between the teaching/advising styles and why one is far better than the other. In our department, the algorithms class pretty much proceeds in a manner than an algorithm is taught, maybe there is some discussion about the run time of it, and then the students usually have to implement it as an assignment. The textbook used is the Cormen textbook, which is pretty much like the webster’s dictionary of algorithms. So what are we missing?

First, we are missing the very important step of the proof of the algorithm. Yes I know that most students don’t care for proofs and find them difficult and useless, but too bad. The fact of the matter is that it is hard to prove the correctness of algorithms, but that is exactly what we need. The proofs are not necessarily important in the content of the proof in itself, but the process that one would go about reasoning the correctness of an algorithm. If you do enough proofs, you start to develop an intuition for the correctness of an algorithm long before you even finish it. We could all benefit by saving time on an approach that wont yield a correct answer.

But the most important thing that we are missing is the emphasis that the algorithms that are taught should help us develop an intuition on how to go about coming up with an algorithm. I know plenty of people who have gone through our algorithms course but still have little idea about how to start developing an algorithm for a Topcoder problem. This is because we teach our class more as a reference class for algorithms, not to teach the paradigm. A good manifest of this concept is the textbook used. Sure, the Cormen book is an excellent reference if you already have a great understanding of all the math and you are looking to read some material on an individual problem. But it doesn’t do that well actually teaching the concepts when compared to the Kleinberg algorithms book or the Papadimitrou/Vazirani book. Those books actually teach the concepts and walk through the development of these algorithms. The Kleinberg book also does a great job of showing how to arrive at correctness proofs (it even teaches about commonly used proof techniques such as exchange arguments). Plus the books are generally easier to read, which is a big plus when starting to teach more advanced concepts such as approximation and randomized algorithms.

I realize that our department isn’t hoping to raise the next generations of algorithmists or send people to do work in CS theory, but we should be trying for this approach. Even if it does go over the heads of most students, there is no harm to trying it this way since they can always go back to reading the book for all the answers they need. If our students are too lazy/incompetent to get all their answers from the book and still need a professor to hold their hand in the class, we have a bigger problem than we know.

Hopefully I end up with a really solid foundation for doing advanced work in algorithms after this summer. I am interested to see how our advanced algorithms seminar is going to turn out. I have some doubts, but I’ll keep those private (as if publishing a thought really constitutes it as a public thought).

REU 3.0

June 5, 2009 by thisaintnogame

Well, here I am at my third and final shot at making an REU worthwhile. My first one was a great learning experience, but I really didn’t know enough at that point to really take advantage of the “research” opportunity. I think I was ready to do some serious stuff for the second one, but it turned out to be a terrible experience in almost all facets. This one is starting strong. We have 13 students so far (we have one more to come) and we seem to have a good social dynamic going on. Im excited to see where the summer leads us around LA.

My project this year is infinitely better than what it was last year. My chief complaint from last summer is that my advisor treated me like a small child with minimal knowledge of what programming was; my advisor this year says he’s going to treat me like a 5th year PhD student who has passed all exams and has read all the literature unless I tell him to slow down and explain something. From our early meetings, its pretty clear that I will be taking advantage of the “slow down and explain something” option quite frequently.

So I’m getting my start in CS theory research this summer. We are considering the problem of maximizing the spread of your influence on a graph. We have a model where at each time step, a node will choose one of its neighbors at random and change to whatever color that node is in the next time step. Before this process begins, you are given an amount of money M that you can spend on nodes to color them your color, assuming that each node has its own activation cost. We would like to find a set of nodes S, such that the total initial cost of those nodes < M that will maximize the expected number of nodes that will be your color after T timesteps. This problem is motivated by the case where a company is looking to pay some people to initially adopt their product in order to spread the adoption of their product throughout the network. Two natural examples of this would be publishers giving out free textbooks to professors in hopes that they will adopt the textbook for their class and other professors will do the same after seeing the original professor use that textbook or paying certain people to start using a social networking site (like Twitter) in hopes that others will join because of the influence of those people.

When we are done researching that simple version, we will explore what happens when there are two companies, A and B, with the same finite amount of resources and nodes will adopt technology A or B with probability proportional to how much money each company has spent on them. In this scenario, I have already looked at a simple probability function where A has already made all of its moves and found an optimal algorithm for B (of course, it is not a polynomial time algorithm, but its better than the “try everything” approach). In the future, we are going to look at equilibria in the game. That is, if A knows how B will react given a strategy for A, what can player A do to preempt B and choose a strategy that will be worse for B when B reacts to it.

There is a fair bit more technical detail to what is going on and I have much more to learn, but I am really enjoying what I have done so far. Hopefully, it will remain that way.

For Loops are kind of essential to programming, if you know what I mean…

May 14, 2009 by thisaintnogame

More Madness…

March 22, 2009 by thisaintnogame

I found some articles having to do with strategies for choosing the NCAA basketball tournament brackets. In particular, I thought this article was pretty good. The article basically states that a contrarian strategy is a solid strategy if all you care about is making money on pool entries in the long run. With the contrarian strategy, you find teams which have a higher chance of winning and moving into later rounds than most people give them credit for and choose them. Notice, this doesn’t imply that they are favorites, even by any statistic, but as long as most people will choose against them, there will be a high payoff if they do win, since you will move to the top of your pool.

This makes the most difference when choosing teams that will make a win over a favorite and move on the top the Final Four. For an example from this year, Gonzaga will be playing UNC in the sweet sixteen. Many people have UNC winning the entire tournament and a large percentage of entries on ESPN have UNC making it to the final four. However, Gonzaga has a higher chance than those numbers suggest. Based on this fact, choosing Gonzaga over UNC will offer a huge advantage if Gonzaga does win since all those people just lost the opportunity to make points in the later rounds, where each game is worth more points (each game is worth twice the amount of points they were worth than the previous round, ensuring that in every round there is 320 possible points, assuming that the first round games were worth 10).

One of the interesting parts of this approach is that it completely contradicts the basic wisdom I was taught, which was to randomly choose some upsets in the first rounds, but then to go with who you thought the winner was going to be in the later rounds. This advice makes sense because of the relatively small information we usually know about the more obscure teams in the 8-9, 7-10 games, etc compared to the fact that we usually recognize the names of the #1 and #2 seeds and might even know some of their stats. But if we assume that everyone else in our pool follows that strategy, then the winner out of them would be largely based on the random happenings of the first round games, because their later round brackets would be too similar for one person to gain any significant point advantage. We also did this since we knew that losing a Final Four team almost guaranteed a near-last place finishing. But in reality, if you’re not in the money, does it really matter where you were?  Knowing that,  you would probably be better off simply betting the better seeded teams in the early rounds and then starting to make some “upsets” in the sweet sixteen when the differences between seeded teams will be smaller. By doing this, we increase the likelihood and the relative values of our upsets.

Looking at it this briefly with some math, choosing Gonzaga to win their division and make the Final four is worth total of 150 points (10 for their first round win, then 20 for the next, then 40, then 80). If they do make it, then against someone who chose UNC to make it to the Final Four (which most people are betting on), you would have a 120 point advantage, which is the equivalent of calling 12 first round upsets correctly. So basically, if you are looking to make some ridiculous calls and go against the grain, it makes more sense to call a later round game than an earlier round game. But it is important to note that this situation only applies when you have good reason to believe that the popular chances of the underdog winning are lower than they really are, which goes back to the root of problem of estimating probabilities of teams winning.

In the article I linked to above, there is another link to an academic paper that someone published on the matter and there was some other discussion on the blog of the guy who wrote Freakonomics. Two things there…

1. When I was looking early for literature on NCAA strategies, I should have known to look to economists rather than mathematicians since economists are much more likely to be cool and care about this type of thing.

2. All this literature illustrates one of the reasons that I want to remain around thinkers for the rest of my life; Even common events become extremely interesting when approached in an intelligent manner and reading the ideas of others just makes it one thousand times more interesting.

p.s. Apparently the problem of determing whether or not you still have a chance in the NCAA pool is NP-complete if we don’t fix the number of teams that enter it. Pretty cool.

March Madness

March 21, 2009 by thisaintnogame

So the NCAA tournament has rolled around again and I have entered into a small pool with some family and friends (its a tradition in my family). However there are two things that set this year’s tournament apart from others.

1. Binghamton made the big dance. It was pretty cool to see us play on national television, but we had an unfortunate first round matchup against Duke, who almost always makes it to my final four. Regardless of my school spirit, I had to go against them.

2. I wrote an algorithm to help me choose teams. I got a bunch of statistics on all division 1 teams from the NCAA website (they provide 16 stats, like points scored, points against, average margin of victory, average turn overs, etc) and was planning on using those statistics and a schedule of win/losses for the year to create a Naieve Bayes Classifier. The basic idea is I train the classifier on all of the regular season games, where each data point is a vector of pairs, where each pair is the ith statistic for each team. So for Duke vs. Binghamton it would have been a data point like this

< (ppg for Duke, ppg for Bing), (apg for Duke, apg for Bing)….>

and then include a win or lose label for each point. Based on that training data, for the games in the tournament, I create the data point and then see if that point is more likely given a win (this is the bayesian part of it) or more likely given a loss. I choose the label which maximizes the probability of that data point. There would have been some more details to work out, but that is the basic idea. But only after I had written all the code to parse the statistic files, I found that I could not find a schedule for the season in plain text, so I couldn’t parse that information and making everything I did pretty much useless.

Being that I had already collected all the statistics for each team, I decided that I would simply cluster all the teams using k-means just to get an idea of what teams were statistically similar. Then using those clusters and my limited knowledge of the NCAA teams, I could infer the quality of a team. For instance, I know that Duke is always a good team (and were also ranked #2), so I can reasonably infer that teams in the same cluster as them will have similar stats and will probably be good teams as well. There are a number of flaws in that reasoning, but I did this all in an afternoon and this algorithm was meant to provide basis to my picks, which would otherwise be arbitrary. So how did this work out? I set k = 33 since there are roughly 330 division 1 teams and I figured that if it were evenly distributed, 10 teams per cluster was a good amount.  Well 7/8 of the top ranked teams in the tournament were placed in the same cluster (UConn being the odd one out), along with some other teams in the tournament, notably gonzaga and LSU. Once I saw the resulting clusters, I ordered them based on their euclidean distance from the “best” cluster, or the one that contained all the top seeds. Then, when making picks, I would choose the team that was in a higher-ranked cluster.

So what are the problems with this algorithm? Most obviously, I have no way to choose between teams in the same clusters, which is a problem since I should expect that the later rounds will come down to teams in the same cluster (assuming that a 1 or 2 seed will be the winner of each regional conference, this algorithm is useless in choosing final four games). The next problem, which is less obvious but more important, is that a cluster being closer to the best cluster does not actually imply that is a better cluster. For instance, lets assume that we only consider average points scored and average points allowed. Lets say the best cluster has an average of 80 points scored per game and lets in an average of 70. Well a cluster whose median team scored 75 points per game and allows 75 points per game will be “closer” than a cluster whose median team scores 90 points a game and allows 60 points against. Clearly, the second cluster would be a better choice for winning, but the first cluster would be ranked higher since it has a smaller euclidean distance. I haven’t yet thought of how to fix this problem, but I assume its possible if I used a different distance metric. Of course, this whole algorithm is extremely ad hoc and based upon weak theoretical foundations, so trying to fix just that problem would be like rearranging deck chairs on the titanic.

So, in practical terms, how is this algorithm doing? My algorithm didn’t choose too many upsets (an upset being defined as a higher seed beating a lower seed, so a 9 seed beating an 8 seed), but about as many as a human would. Well I am near last in my bracket, but there were some true upsets and there were two games that my algorithm choose a 12 seed over a 5 seed and got it right (I choose Wisconsin over Florida St and West Kentucky over Illinois). Many people had those 5 seeds advancing to the sweet sixteen, so I will make up some points there hopefully. In terms of the 8-9 games, I got two right and two wrong, which is about what most people did. It also has some irony that I did no better than flipping coins since I created this algorithm for the purposes of choosing games like that. The big triumph of my algorithm will be if LSU beats UNC, since an 8 seed would knock out a favorite to win the entire tournament. Looking at it after I made that choice, it might make some sense that LSU has a chance since UNC is an entirely offense based team (averaging about 90 points a game), but doesn’t seem to do too much on defense (let’s in about 72 points again which puts them somewhere in 200’s in terms of defensive rankings).

Whether this whole thing turns out good or bad, it was a fun project to do for an afternoon, and I plan on expanding the project for the next season. One fundamental difference between my algorithm and others that I have seen is that my algorithm doesn’t really look to take a chance factor into play. Since I always chose the better ranked team, there is not much difference between what I did and what someone who always chooses the better seeded team to win. So by following this algorithm, I basiscally assume that I have come up with a better ranking than the NCAA selection committee, which is an interesting idea. I feel that better way to do it would be to estimate probabilities of each team winning and then running a simulation, so that I can actually take into account the chance factor. There is also a lot I could do with making the statistics more meaningfully, such as weighting more recent statistics more heavily. Whatever I do, its a cool application for some stuff that I have learned and maybe I can make a research project out of it next year.

Let’s go LSU!

What I did on my Summer Vacation

July 31, 2008 by thisaintnogame

Well technically, I could also phrase it “What am I doing on my Summer Vacation” and then reference writing this blog post, but the self-reference is too much for me. And self-reference is way beyond third graders; you need at least 5th graders to understand the concept.

 

From many perspectives, my time at Umass was probably definitely useless. My project, while sounding cool in an elevator pitch, turned out to be completely boring. Im not really sure where it went wrong or who is to blame, but all I know is that I asked several times to do something more interesting and I kept getting the response of “well we might be able to look at that after we are done”. The unfortunate part was that I really had no incentive to get the boring stuff done in an efficient manner because I never believed he would have me do something interesting after I was done. Most of it is probably my fault, but some of it certainly related to him. I also get the strong feeling that he was not expecting to be an advisor for this program, so he just wanted something that would keep me busy. It probably would have been all the same to him if I had showed up on day 1, took my paychecks, and went back to Long Island. Not too much I can do it about it now, but it really does make me pretty angry. I also fully resent being told that I was going to work with one professor upon acceptance to the program and then being given a professor who doesn’t match my listed interests at all. I get a lot angrier when I look at a few other projects which are really cool. Perhaps the grass is always greener on the other side though. 

 

I was really hoping to get involved in a field and learn some new theories this summer. Machine learning sounded like a good one to get into; at least I know it has a ton of jargon and methods that I could learn. And even if my project had turned out to be a complete bust, I would have learned some stuff. Last summer, I didn’t learn a whole lot about a specific field, but I did learn a lot of computational complexity, approximations, and algorithms in general. Plus, Madden gave me a pretty darn good project and is a good advisor.

 

If what I took away from this summer was solely based upon my project and interactions with my advisor, the probability that I go to grad school would be significantly lower than what it was in May. 

There is one thing to be said for this summer though and that is that it gave me a lot of time to read. And the things that I did read are keeping my grad school desires as high and perhaps even higher than when the summer began. Since I basically had nothing else to do while my 5 hour SQL queries were running, I read a lot of blogs and essays about theoretical computer science/algorithms/and computation in general. Particularly, I have found this blog to be amazing. Its all about theoretical computer science, with a focus on quantum computation. His writing is awesome and there’s a pretty good community of people who comment and fuel further discussions. I’ve also been reading a lot of other stuff, which unfortunately I dont have links for, about computation in other fields of science. Here is a presentation that was given at a “Computer Science: Past, Present, and Future” type of conference that talks about viewing the world through an algorithmic lens (thats pretty much the title of the talk).

 

I chose computer science in college because I started programming robots and it was awesome to see my model of the world (aka the code which reacts to sensor input) actually perform in some real environment. It was a great experience. I learned later that what I really like are the algorithms (though, robots are pretty damn cool too). Had it not been for my involvement with the ACM and the programming competitions, I can’t really say if I would have stuck with computer science, because the intro classes are so boring and don’t talk about all these grand sounding things like modeling the world or understanding the universe as a a basic set of operations. At Binghamton, it is truly amazing what a disservice we do to computer science. But I will have to leave that rant to another post.

 

Hopefully my next research experience thing will be more fruitful than Umass (it really cant be any worse). Ideally, it would be something interesting, my advisor will be famous, they will pay me gobs of money, and at the end of the summer I get a free ride into space and a trip around the world. But I will just settle for something interesting and a good community of people to work with.

 

Well, time for the poster fair aka convincing people that I wasn’t a waste of NSF money (read: lying to them).

What I believe, but cannot prove

June 13, 2008 by thisaintnogame

Recently (in fact, right after I cashed my first paycheck here at Umass), I walked into a book store and found of copy of John Brockman’s “What We Believe but Cannot Prove”. For those of you who dont know, John Brockman is not a writer, but rather an editor who focuses a lot on the “Third Culture” (his term). The third culture refers to the scientists in the world who are taking the forefront of public interest by publishing their ideas in ways that are accessible to most people (popSci books). He runs this site, edge.org, which always has a bunch of articles related to science, mathematics, or the culture surrounding the two. As part of the edge’s activities, every year he poses a question for which he collects responses from leading thinkers in a lot of fields and publishes their responses in a book. Before this one I just got, I had purchased “What is Your Dangerous Idea”, which turned out to be pretty good. The responses to these questions range from half of a page to about three pages, with most responses being a page and a half. Obviously, these works aren’t intended to give you any real scientific thought, but they leave you with some cool things to think about.

 

While reading “What We Believe but Cannot Prove”, I noticed that many authors chose to write about the afterlife (or rather the lack thereof). Now, I suppose this is the ultimate example of something that we believe but cannot prove, but I was very discontented by these responses. First of all, the response is a cop-out from writing something more interesting and intellectual. Secondly, many of the responses went on to implicitly, if not explicitly, insult those who believe in a higher power or an afterlife. It almost seems that the authors would welcome a continuation of the schism that rests between “persons of reason” and “persons of faith” (quotations meant to emphasize the fact that I find both of these terms to be an unfortunate use of language). I believe that the world would be a better place if everyone valued the scientific method and the knowledge derived from it. Certainly our nation would be better off if we had more people pursuing science as a career. However, those in the scientific community who help drive the wedge between the two existing cultures do nothing but hurt that goal. By publishing books with titles such as “The God Delusion” and telling people they are stupid for believing in a god will never get us anywhere. In fact, I understand the need for a god in people’s lives. I understand the fact that people want to believe in the afterlife and want to believe that humanity is not just some chance happening. Its really not that unreasonable (of course, there are those in religion that take it too far, but I am talking about faith not religion). People who hold faith as their #1 belief are going to be reluctant to join a community full of people who will consider them inherently stupid for their belief. 

 

What I believe, but cannot prove is that those in science who feel that they must ridicule a belief in a higher power are only following a crude and primitive human desire for conflict. Furthermore, I believe they are necessarily stupid in some ways for viciously fighting over something that is inherently unknowable. But perhaps they need that. To quote John Lennon, “whatever gets you through the night, it’s alright.”