<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Dataists - Latest Comments</title><link xmlns="http://www.w3.org/2005/Atom" rel="http://api.friendfeed.com/2008/03#sup" href="http://disqus.com/sup/all.sup#forumcomments-4bad095b" type="application/json"/><link>http://dataists.disqus.com/</link><description></description><atom:link href="http://dataists.disqus.com/comments.rss" rel="self"></atom:link><language>en</language><lastBuildDate>Mon, 04 Feb 2013 01:43:35 -0000</lastBuildDate><item><title>Re: Best Boxes in a Super Bowl Pool</title><link>http://www.dataists.com/2011/02/best-boxes-in-a-super-bowl-pool/#comment-787863273</link><description>&lt;p&gt;Drew, I created a quick mobile app today to tell me my odds during the game in case I forgot.  It is based on 7 years of NFL scores gathered by &lt;a href="http://caseyshead.com/2013-super-bowl-squares-odds/" rel="nofollow"&gt;http://caseyshead.com/2013-sup...&lt;/a&gt;.  What I was specifically interested in was their "per quarter" data, which I think is very important: as you mention above, the variance in possible outcomes increases with time, and in that sense it is an abstraction of the second law of thermodynamics.  The app I put together is pretty rough (I did it in between lunch and the Super Bowl) &lt;a href="http://footballsquares.azurewebsites.net/" rel="nofollow"&gt;http://footballsquares.azurewe...&lt;/a&gt; and takes your team's single-digit score number as input.&lt;/p&gt;

&lt;p&gt;I would like to enhance this next year in my spare time, so I would love to gather some ideas.  Some things I am thinking about doing are adding support for multiple squares, calculating total risk vs. potential reward, and incorporating real-time scores into the calculations.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jonnyslide</dc:creator><pubDate>Mon, 04 Feb 2013 01:43:35 -0000</pubDate></item><item><title>Re: Best Boxes in a Super Bowl Pool</title><link>http://www.dataists.com/2011/02/best-boxes-in-a-super-bowl-pool/#comment-787860898</link><description>&lt;p&gt;Hey, just a quick correction here: 17 = 2 TDs (14) and 1 FG (3)&lt;br&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jonnyslide</dc:creator><pubDate>Mon, 04 Feb 2013 01:34:45 -0000</pubDate></item><item><title>Re: What&amp;#8217;s the use of sharing code nobody can read?</title><link>http://www.dataists.com/2010/10/whats-the-use-of-sharing-code-nobody-can-read/#comment-763560580</link><description>&lt;p&gt;Thank you for this - absolutely agree. Maybe you would be interested in my replication blog (about political science, but the basics are the same). I recently received an email saying that someone did not keep their R code for a published study!!! &lt;a href="http://politicalsciencereplication.wordpress.com" rel="nofollow"&gt;politicalsciencereplication.wo...&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">PolSci Replication</dc:creator><pubDate>Fri, 11 Jan 2013 05:08:51 -0000</pubDate></item><item><title>Re: Snippet: Where the F**k Was I?</title><link>http://www.dataists.com/2011/06/snippet-where-the-fk-was-i/#comment-734659930</link><description>&lt;p&gt; Great post. 
Thank you very much &lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Sieuthimuachung02</dc:creator><pubDate>Thu, 13 Dec 2012 03:37:53 -0000</pubDate></item><item><title>Re: Ranking the popularity of programming languages</title><link>http://www.dataists.com/2010/12/ranking-the-popularity-of-programming-langauges/#comment-713487901</link><description>&lt;p&gt;For my part, when I have seen Vim scripting questions on Stack Overflow, they are tagged with 'vim' and almost never 'viml', even though they have to do with the scripting language. This confusion of SO tags might be what we see with viml. I even have my .vimrc file on GitHub, but I don't think of it as a 'viml' repo, but as a 'vim' repo.&lt;br&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Haskin</dc:creator><pubDate>Sun, 18 Nov 2012 18:46:17 -0000</pubDate></item><item><title>Re: Snippet: Where the F**k Was I?</title><link>http://www.dataists.com/2011/06/snippet-where-the-fk-was-i/#comment-576723828</link><description>&lt;p&gt;Let's make language more sweet than uncouth.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Shravangulla</dc:creator><pubDate>Thu, 05 Jul 2012 04:41:57 -0000</pubDate></item><item><title>Re: Ranking the popularity of programming languages</title><link>http://www.dataists.com/2010/12/ranking-the-popularity-of-programming-langauges/#comment-574677236</link><description>&lt;p&gt;AFAICT the "popularity" of most of those languages is being grossly distorted when you convert the "# of Tags" and "# of Projects" data to rankings. The range in rank value for the Stack Overflow tags is from 1 to 56, but the range in "# of Tags" that the rank is based upon is from 0 to 82,923, and the data is so skewed that only 11 of 56 languages have an above-average "# of Tags". &lt;/p&gt;

&lt;p&gt;Haskell is well below average for  "# of Tags" and Java is well above average for "# of Tags" --&lt;/p&gt;

&lt;p&gt;#56 Java = 82,923&lt;br&gt;&amp;gt;&amp;gt;&amp;gt; mean = 18,770 &amp;lt;&amp;lt;&amp;lt;&lt;br&gt;#40 Haskell = 1,896&lt;br&gt;# 1 F# = 0&lt;/p&gt;

&lt;p&gt;(The story seems to be the same for the GitHub "# of Projects" rank numbers -- see &lt;a href="http://www.r-chart.com/2010/08/github-stats-on-programming-languages.html" rel="nofollow"&gt;http://www.r-chart.com/2010/08...&lt;/a&gt;)&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Isaac Gouy</dc:creator><pubDate>Mon, 02 Jul 2012 17:15:29 -0000</pubDate></item><item><title>Re: What&amp;#8217;s the use of sharing code nobody can read?</title><link>http://www.dataists.com/2010/10/whats-the-use-of-sharing-code-nobody-can-read/#comment-552613190</link><description>&lt;p&gt;&lt;br&gt;&lt;a href="http://www.raybanstore.us" rel="nofollow"&gt;Ray Ban Highstreet&lt;/a&gt;&lt;br&gt;&lt;br&gt;&lt;a href="http://www.raybanoutlet.us" rel="nofollow"&gt;Ray Bans For Cheap&lt;/a&gt;&lt;br&gt;&lt;a href="http://www.wholesaleraybansunglasses.us" rel="nofollow"&gt;Cheap Ray Ban Sunglasses&lt;/a&gt;&lt;br&gt; &lt;br&gt;&lt;a href="http://www.raybanforcheap.us" rel="nofollow"&gt;Ray Ban Clubmaster Sunglasses&lt;/a&gt;&lt;br&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Wholesale Oakley Sunglasses</dc:creator><pubDate>Fri, 08 Jun 2012 22:56:41 -0000</pubDate></item><item><title>Re: Accentuate.us: Machine Learning for Complex Language Entry</title><link>http://www.dataists.com/2011/04/accentuateus/#comment-540243799</link><description>&lt;p&gt;Amazing post written for languages. I properly read your article which was very interesting and excellent. Keep sharing such interesting articles.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Cross Cultural Training</dc:creator><pubDate>Mon, 28 May 2012 08:17:49 -0000</pubDate></item><item><title>Re: The Data Science Venn Diagram</title><link>http://www.dataists.com/2010/09/the-data-science-venn-diagram/#comment-484103065</link><description>&lt;p&gt;I will definitely reuse this (with proper attribution). 
Thank you!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">David Douglas</dc:creator><pubDate>Tue, 03 Apr 2012 00:15:03 -0000</pubDate></item><item><title>Re: A Taxonomy of Data Science</title><link>http://www.dataists.com/2010/09/a-taxonomy-of-data-science/#comment-478423727</link><description>&lt;p&gt;Thanks for a great post.&lt;/p&gt;

&lt;p&gt;Another activity I find useful to consider is Iteration.  It often takes many runs at a problem to get to a solution.  Becoming efficient at iteration is a great skill. Making iterations shorter and getting to simple or partial answers quickly helps move analysis in a good direction earlier in the process. &lt;/p&gt;

&lt;p&gt;While it is a waste of time to formalize much exploration, I find myself regretting decisions that don't allow me to quickly redo what I did yesterday with a slight change.  A couple of tricks I use all of the time: 'history | tail &amp;gt; my.bash', or 'head -q -n100' instead of 'cat' to sample a small data set until I am ready for the final answer. I am sure people have used many others.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Scott Hendrickson</dc:creator><pubDate>Wed, 28 Mar 2012 10:50:13 -0000</pubDate></item><item><title>Re: Best Boxes in a Super Bowl Pool</title><link>http://www.dataists.com/2011/02/best-boxes-in-a-super-bowl-pool/#comment-425755856</link><description>&lt;p&gt;Your friend is correct.  However, he is setting himself up for more extreme odds after the numbers are drawn.  For example, if his single row is for a 2, then that will give him some of the worst odds on the chart.  However, if his single row number is 7, then that will give him some of the best odds on the chart.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Justaguest</dc:creator><pubDate>Tue, 31 Jan 2012 16:26:26 -0000</pubDate></item><item><title>Re: Best Boxes in a Super Bowl Pool</title><link>http://www.dataists.com/2011/02/best-boxes-in-a-super-bowl-pool/#comment-420043903</link><description>&lt;p&gt;A guy at work picked 5 Super Bowl squares in the same row. So while he'll have five different numbers for the Giants, he will only have one number for the Pats.&lt;/p&gt;

&lt;p&gt;I said that's a bad strategy, because you're only giving yourself one chance at nailing the Pats' score.&lt;/p&gt;

&lt;p&gt;He says that's nonsense because it's all random and only one square wins anyway. (I say that's true, but not all squares have the same probability, as they would if the sport were, say, basketball. Obviously, numbers like 4, 7 and 0 are better than 9 and 2.)&lt;/p&gt;

&lt;p&gt;I liked this example: "Pick two people to guess a number between 1 and 10, but give one of them five guesses and the other guy only one guess." That didn't sway him.&lt;/p&gt;

&lt;p&gt;Thoughts? &lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mattdemazza</dc:creator><pubDate>Tue, 24 Jan 2012 19:42:36 -0000</pubDate></item><item><title>Re: Snippet: Where the F**k Was I?</title><link>http://www.dataists.com/2011/06/snippet-where-the-fk-was-i/#comment-330982407</link><description>&lt;p&gt;We wanted to let you know that your blog was included in our list of the top 50 statistics blogs of 2011. Our goal was to highlight blogs that students and prospective students will find useful and interesting in their exploration of the field. You can view the entire list at &lt;a href="http://www.thebestcolleges.org/best-statistics-blogs/" rel="nofollow"&gt;http://www.thebestcolleges.org...&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Congratulations!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">The Best Colleges</dc:creator><pubDate>Mon, 10 Oct 2011 11:39:54 -0000</pubDate></item><item><title>Re: A Taxonomy of Data Science</title><link>http://www.dataists.com/2010/09/a-taxonomy-of-data-science/#comment-319973557</link><description>&lt;p&gt;OSEMN rhymes with possum, but it is a homophone for awesome.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mike</dc:creator><pubDate>Mon, 26 Sep 2011 01:00:06 -0000</pubDate></item><item><title>Re: A Taxonomy of Data Science</title><link>http://www.dataists.com/2010/09/a-taxonomy-of-data-science/#comment-319971148</link><description>&lt;p&gt;I like that the Machine Learning steps are "E-M", how appropriate!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Dataist</dc:creator><pubDate>Mon, 26 Sep 2011 00:51:26 -0000</pubDate></item><item><title>Re: The Data Science Venn Diagram</title><link>http://www.dataists.com/2010/09/the-data-science-venn-diagram/#comment-318671200</link><description>&lt;p&gt;Having taught data mining before and teaching it again, this is a great way of putting things into perspective. I could not agree more with the observation that science can only arise when you bring all three of them together!&lt;/p&gt;

&lt;p&gt;PS: May I please use it to warn my students what they have signed up for?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Claudia Perlich</dc:creator><pubDate>Fri, 23 Sep 2011 18:57:19 -0000</pubDate></item><item><title>Re: Accentuate.us: Machine Learning for Complex Language Entry</title><link>http://www.dataists.com/2011/04/accentuateus/#comment-314669566</link><description>&lt;p&gt;Thanks so much, Franking, for the kind words. I'm glad you enjoyed our post! Please let me know if you have any questions; I'd be happy to answer.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Schade</dc:creator><pubDate>Sun, 18 Sep 2011 23:44:34 -0000</pubDate></item><item><title>Re: Accentuate.us: Machine Learning for Complex Language Entry</title><link>http://www.dataists.com/2011/04/accentuateus/#comment-314666004</link><description>&lt;p&gt;This is a nice content. There is exactly one candidate unicodification in the training data: nios. I like this one. This is a great article. The written skill is so good. I appreciate to this well informative blog. Keep sharing.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Franking Machine</dc:creator><pubDate>Sun, 18 Sep 2011 23:31:50 -0000</pubDate></item><item><title>Re: Ranking the popularity of programming languages</title><link>http://www.dataists.com/2010/12/ranking-the-popularity-of-programming-langauges/#comment-306283208</link><description>&lt;p&gt;This is for all you Developers and Hackers! 
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Andres Castañeda</dc:creator><pubDate>Fri, 09 Sep 2011 18:34:51 -0000</pubDate></item><item><title>Re: Ranking the popularity of programming languages</title><link>http://www.dataists.com/2010/12/ranking-the-popularity-of-programming-langauges/#comment-300577803</link><description>&lt;p&gt;Would it be possible to use the same methodology to rank testing frameworks? (He says, having written JUnit).&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Kent Beck</dc:creator><pubDate>Thu, 01 Sep 2011 15:08:06 -0000</pubDate></item><item><title>Re: Snippet: The Popularity of Data Analysis Software</title><link>http://www.dataists.com/2011/04/snippet-the-popularity-of-data-analysis-software/#comment-269617203</link><description>&lt;p&gt;Interesting.  One factor with regard to Minitab use:  it's commonly used by Six Sigma consultancies and for training in Six Sigma.  That puts it in Big Corp land, of course. &lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">BuggyFunBunny</dc:creator><pubDate>Sat, 30 Jul 2011 22:07:20 -0000</pubDate></item><item><title>Re: Accentuate.us: Machine Learning for Complex Language Entry</title><link>http://www.dataists.com/2011/04/accentuateus/#comment-256108685</link><description>&lt;p&gt;Philip, there was certainly no intent to misrepresent prior work on this problem. I'll rephrase the comment following the citation to Yarowsky's paper and the others - I was trying to distinguish earlier approaches from ones based on character n-grams.  "These papers all rely on pre-existing NLP resources..." 
is, I agree, incorrect as stated.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Kevin Scannell</dc:creator><pubDate>Mon, 18 Jul 2011 12:32:20 -0000</pubDate></item><item><title>Re: Accentuate.us: Machine Learning for Complex Language Entry</title><link>http://www.dataists.com/2011/04/accentuateus/#comment-255984628</link><description>&lt;p&gt;I'm afraid this post, and the paper, are misrepresenting the prior art.  I don't doubt that there is practical value in the solution promoted here, but Yarowsky's work, cited briefly in the paper, applied purely data-driven machine learning techniques to this problem on a large scale more than 15 years ago.  In their literature review, the authors lump that paper (Yarowsky 1994, &lt;a href="http://bit.ly/poeEkP" rel="nofollow"&gt;http://bit.ly/poeEkP&lt;/a&gt;) in with others that "rely on pre-existing NLP resources such as electronic dictionaries and part-of-speech taggers", but this suggests they may not have read that paper with care.  Yarowsky's paper utilized absolutely no pre-existing dictionaries, taggers, or any other resources; the range of possible diacritizations and contextual evidence for their disambiguation were discovered solely from statistical analysis of 49 million words of monolingual Spanish text and 20 million words of monolingual French text, which was very large coverage for the era.  
Again, I'm sure there is significant value in the scaling up that's done here, but I do hope the authors will amend their attribution of credit when it comes to innovations in approaches to this problem.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Philip Resnik</dc:creator><pubDate>Mon, 18 Jul 2011 08:58:59 -0000</pubDate></item><item><title>Re: Accentuate.us: Machine Learning for Complex Language Entry</title><link>http://www.dataists.com/2011/04/accentuateus/#comment-222904907</link><description>&lt;p&gt;You mention you are interested in hearing from people in the DM field.  What is the best way to get in contact with you guys?&lt;/p&gt;

&lt;p&gt;Thanks, and nice blog.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Matthew R. Goodman</dc:creator><pubDate>Thu, 09 Jun 2011 21:55:55 -0000</pubDate></item></channel></rss>