Statistics 101: Introduction to the Central Limit Theorem (with implementation i ...

Source: 分析大师 | 2019-05-03 | Published by: 经管之家

What is one of the most important, core concepts of statistics that enables us to do predictive modeling, and yet often confuses aspiring data scientists? Yes, I'm talking about the central limit theorem. It is a powerful statistical concept that every data scientist MUST know. Now, why is that?

Well, the central limit theorem (CLT) is at the heart of hypothesis testing, a critical component of the data science lifecycle. That's right: the idea that lets us explore the vast possibilities of the data we are given springs from the CLT. It's actually a simple notion to understand, yet most data scientists flounder at this question during interviews.

In this article we will understand the concept of the central limit theorem. We'll see why it's important and where it's used, and then learn how to apply it in R.

Let's understand the central limit theorem with the help of an example. This will help you intuitively grasp how the CLT works underneath.

Consider that there are 15 sections in the science department of a university and each section hosts around 100 students. Our task is to calculate the average weight of students in the science department. Sounds simple, right?

The approach I get from aspiring data scientists is to simply calculate the average: measure the weight of every student in the department, add up all the weights, and divide the total by the number of students (roughly 15 × 100 = 1,500).

But what if the size of the data is humongous? Does this approach make sense? Not really: measuring the weight of every student would be a tiresome and lengthy process. So, what can we do instead? Let's look at an alternate approach. Instead of weighing everyone, we draw several random groups (samples) of students, calculate the mean weight of each sample, and then take the mean of those sample means as our estimate of the department's average weight.

This, in a nutshell, is what the central limit theorem is all about.
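The two approaches can be compared on made-up numbers. The sketch below is a minimal illustration, not the author's code. It assumes hypothetical weights (in kg) drawn from a right-skewed gamma distribution for 15 sections of 100 students; no real measurements are involved. The settings `sample_size = 30` and `n_samples = 200` are arbitrary choices.

```r
# Minimal sketch of the "mean of sample means" idea, using HYPOTHETICAL,
# randomly generated weights (kg); these are not real measurements.
set.seed(42)

n_sections <- 15
students_per_section <- 100

# Hypothetical population: 1,500 weights from a right-skewed gamma
# distribution (mean 64, sd 16), so the population itself is not normal.
weights <- rgamma(n_sections * students_per_section, shape = 16, scale = 4)

# Approach 1: weigh everyone and take the plain average.
population_mean <- mean(weights)

# Approach 2: draw repeated random samples (without replacement),
# average each one, then average the sample means.
sample_size <- 30
n_samples <- 200
sample_means <- replicate(n_samples, mean(sample(weights, sample_size)))
estimated_mean <- mean(sample_means)

cat("Population mean:     ", round(population_mean, 2), "\n")
cat("Mean of sample means:", round(estimated_mean, 2), "\n")
```

Each sample contains only 30 of the 1,500 simulated students, so the mean of sample means will not match the full average exactly, but it should be close. Increasing `n_samples` or `sample_size` tightens the agreement.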
Let's put a formal definition to the CLT:

Given a dataset with an unknown distribution (it could be uniform, binomial or completely random), the distribution of the sample means will approximate the normal distribution, provided the population has a finite variance.

These samples should be sufficiently large. The distribution of sample means, calculated from repeated sampling, tends to normality as the size of your samples gets larger.

The central limit theorem has both statistical significance and practical applications in many fields. Isn't that the sweet spot we aim for when we're learning a new concept? On the statistical side, it is what lets us apply hypothesis tests and confidence intervals that assume normality to sample means, even when the population itself is not normal. Can you think of other applications? Let me know in the comments section below the article; I will include them here.

Before we dive into the implementation of the central limit theorem, it's important to understand the assumptions behind this technique: the samples must be drawn randomly and independently of one another, and each sample must be large enough. In general, a sample size of 30 is considered sufficient when the population is symmetric; skewed populations need larger samples.

The mean of the sample means equals the population mean:

$$\mu_{\bar{X}} = \mu$$

where $\mu_{\bar{X}}$ is the mean of the sample means and $\mu$ is the population mean.

The standard deviation of the sample mean (also called the standard error) is:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

where $\sigma$ is the population standard deviation and $n$ is the sample size.

And that's it for the concept behind the central limit theorem. Time to fire up RStudio and see how to code the CLT in R!

A pipe manufacturing organization produces different kinds of pipes. We are given monthly data on the wall thickness of certain types of pipes.
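Before turning to the pipe data, the two results stated above (the mean of the sample means equals $\mu$, and their standard deviation equals $\sigma/\sqrt{n}$) can be checked with a quick simulation. This is a minimal sketch, not the article's original code. It assumes a hypothetical exponential population with `rate = 0.5`, which is strongly right-skewed and has mean and standard deviation both equal to 1/rate = 2. The sketch draws 10,000 samples of size 30 and compares the mean and standard deviation of the resulting sample means with µ and σ/√n.

```r
# Simulation of the sampling distribution of the mean, assuming a
# HYPOTHETICAL exponential population (strongly right-skewed, not normal).
set.seed(1)

rate  <- 0.5
mu    <- 1 / rate   # population mean of an exponential distribution
sigma <- 1 / rate   # population standard deviation (equals the mean here)

n      <- 30        # size of each sample
n_reps <- 10000     # number of repeated samples

# Each replicate draws n observations and records their mean.
sample_means <- replicate(n_reps, mean(rexp(n, rate = rate)))

cat("Mean of sample means:", round(mean(sample_means), 3),
    " vs mu =", mu, "\n")
cat("SD of sample means:  ", round(sd(sample_means), 3),
    " vs sigma/sqrt(n) =", round(sigma / sqrt(n), 3), "\n")
```

With these settings the printed values should land close to 2 and to 2/√30 ≈ 0.365. A histogram of `sample_means` (for example `hist(sample_means, breaks = 50)`) should look roughly bell-shaped even though the population is skewed.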
The organization wants to analyze the data by performing hypothesis tests and constructing confidence intervals to inform future strategies. The challenge is that the distribution of the data is not normal.

Note: These methods rely on a few assumptions, one of which is that the data are normally distributed.

The central limit theorem will help us get around this problem: even though the population is not normal, the distribution of its sample means will be approximately normal. Therefore, we will simulate the central limit theorem on the given dataset in R, step by step. So, let's get started.

First, import the CSV file into R and validate the data for correctness. Next, calculate the population mean and plot all the observations of the data.
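A minimal sketch of these two steps follows. It is not the author's code. It assumes a hypothetical single numeric column named `Wall.Thickness`. Because the real dataset is not reproduced here, the sketch first writes 1,000 simulated, right-skewed readings to a temporary CSV so that it runs on its own. With the real file, replace the simulation lines with `read.csv()` on the downloaded CSV.

```r
# Steps 1-2 on HYPOTHETICAL data: the column name `Wall.Thickness` and the
# simulated values are assumptions, not taken from the original dataset.
set.seed(7)

# Simulate right-skewed wall-thickness readings (shifted log-normal) and save
# them to a temporary CSV that stands in for the downloaded file.
csv_path <- tempfile(fileext = ".csv")
fake <- data.frame(Wall.Thickness = 10 + rlnorm(1000, meanlog = 0, sdlog = 0.6))
write.csv(fake, csv_path, row.names = FALSE)

# Step 1: import the CSV and validate it.
pipes <- read.csv(csv_path)
str(pipes)                        # column types and number of rows
summary(pipes$Wall.Thickness)     # range, quartiles and mean
stopifnot(is.numeric(pipes$Wall.Thickness),
          !anyNA(pipes$Wall.Thickness))   # numeric, no missing values

# Step 2: population mean and a plot of every observation.
population_mean <- mean(pipes$Wall.Thickness)
cat("Population mean:", round(population_mean, 4), "\n")

hist(pipes$Wall.Thickness, breaks = 50,
     main = "Wall thickness (hypothetical data)",
     xlab = "Wall thickness")
abline(v = population_mean, col = "red", lwd = 2)  # mark the population mean
```

Because the simulated readings come from a shifted log-normal distribution, the histogram is visibly right-skewed, mirroring the situation described above where the raw data is not normal.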

This article has been optimized for display. To view the original, follow the link below:

View the original: https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/



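Beijing ICP filing No. 11001960 | Beijing ICP license No. 090565 | Beijing public security network filing No. 1101084107 | Forum legal counsel: Attorney 王进 | Intellectual property protection statement | Disclaimer and privacy statement | Sponsor: 人大经济论坛. All rights reserved.

Contact QQ: 2881989700 | Email: service@pinggu.org

Partnership inquiries: (010)62719935 | Advertising: 13661292478 (Teacher Liu)

Complaints: (010)68466864 | Reporting harmful content: (010)68466864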
