What is data sampling?

  • Satish Vadlamani

    Administrator
    September 2, 2020 at 10:34 am

    Please add to this or explain it better.

    What is data sampling?

    It is a statistical analysis technique in which we take a representative sample of a larger data set to understand it and recognize patterns. When there is a huge amount of data to analyze, we analyze a sample of it instead of the entire dataset. If the sample is chosen well, it reflects the characteristics of the entire data set, so the patterns found in it can be generalized to the whole.
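    To illustrate the idea, here is a minimal Python sketch using only the standard library. The data set is synthetic (100,000 normally distributed values generated with a fixed seed), which is an assumption purely for illustration. A random sample of 1,000 points gives a mean close to the mean of the full data set.

```python
import random
import statistics

# Synthetic stand-in for a large data set (assumption for illustration only):
# 100,000 values drawn from a normal distribution with mean 50 and sd 10.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Analyze a simple random sample of 1,000 points instead of all 100,000.
sample = rng.sample(population, k=1_000)

pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
print(f"population mean: {pop_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

    The two means will not match exactly; how close they get depends on the sample size and the spread of the data.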

    Types of sampling?

    Random sampling: Every data point in the larger dataset has an equal chance of being chosen. In Python we can use random.sample(); in pandas we have df.sample().
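    A minimal sketch of random sampling with Python's standard library, on a hypothetical list of 20 record IDs. random.sample draws without replacement (no record repeats), while random.choices draws with replacement. In pandas, df.sample(n=5, random_state=7) does the same for the rows of a DataFrame, and replace=True switches it to sampling with replacement.

```python
import random

# Hypothetical data set of 20 record IDs (illustrative only).
records = list(range(1, 21))

rng = random.Random(7)  # fixed seed so the sample is reproducible

# Simple random sample WITHOUT replacement: each record appears at most once.
without_replacement = rng.sample(records, k=5)

# Sampling WITH replacement: the same record can be drawn more than once.
with_replacement = rng.choices(records, k=5)

print(without_replacement)
print(with_replacement)
```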

    Stratified sampling: The data set is divided into subgroups (strata) that share a common characteristic, and samples are drawn randomly from each subgroup, usually in proportion to its size, so that every group is represented. Examples: sklearn.model_selection.train_test_split(*arrays, **options) when its stratify option is set to the labels (for example, stratify=y), and sklearn.model_selection.StratifiedShuffleSplit(n_splits=10, *, test_size=None, train_size=None, random_state=None).
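    Here is a sketch of proportional stratified sampling in plain Python. The function name stratified_sample and the 80/20 labelled data are hypothetical. The code groups rows by a key, then draws the same fraction at random from each group, so a 10% sample of 80 "A" rows and 20 "B" rows keeps the 8:2 ratio. In scikit-learn, passing stratify=y to train_test_split has the same effect for a train/test split.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=None):
    """Randomly draw the same fraction of rows from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)  # group rows by their stratum label

    sample = []
    for members in strata.values():
        # Round per stratum, but keep at least one row so no group is left out.
        k = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data: 80 rows labelled "A" and 20 rows labelled "B".
rows = [{"id": i, "label": "A" if i < 80 else "B"} for i in range(100)]
sample = stratified_sample(rows, key=lambda r: r["label"], fraction=0.1, seed=0)
print(sum(r["label"] == "A" for r in sample), sum(r["label"] == "B" for r in sample))  # 8 2
```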

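    Cluster sampling: The larger data set is divided into clusters (buckets) based on a factor, often a natural grouping such as geography. A random sample of clusters is then selected, and the records in the selected clusters are analyzed. Unlike stratified sampling, which draws from every group, cluster sampling uses only some of the groups.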
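    A sketch of one-stage cluster sampling, assuming hypothetical customer records grouped by city (10 cities with 20 customers each). Whole clusters are chosen at random, and every record in a chosen cluster goes into the sample.

```python
import random
from collections import defaultdict

# Hypothetical records: 200 customers spread evenly over 10 cities (the clusters).
records = [{"customer": i, "city": f"city_{i % 10}"} for i in range(200)]

# Group the records into clusters by city.
clusters = defaultdict(list)
for rec in records:
    clusters[rec["city"]].append(rec)

rng = random.Random(3)
# Randomly choose 3 whole clusters (sorted() gives a stable order to sample from)...
chosen_cities = rng.sample(sorted(clusters), k=3)
# ...and keep every record that belongs to a chosen cluster.
sample = [rec for city in chosen_cities for rec in clusters[city]]
print(chosen_cities, len(sample))
```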

    Multistage sampling: A more complex version of cluster sampling. In this method we divide the larger population into a number of clusters, as in cluster sampling, and randomly select some of them. In the next stage, the selected clusters are further broken into smaller clusters based on another factor, and a random sample of these new clusters is taken. This process can continue for several stages until the final units are sampled.
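    A sketch of three-stage sampling on a hypothetical hierarchy of schools, classes and students (8 schools, 5 classes per school, 30 students per class). Each stage samples at random only from inside the units picked in the previous stage.

```python
import random

rng = random.Random(11)

# Hypothetical hierarchy: 8 schools, each with 5 classes of 30 students.
schools = {
    f"school_{s}": {
        f"class_{s}_{c}": [f"student_{s}_{c}_{i}" for i in range(30)]
        for c in range(5)
    }
    for s in range(8)
}

# Stage 1: randomly select 3 schools (primary clusters).
stage1 = rng.sample(sorted(schools), k=3)

# Stage 2: within each selected school, randomly select 2 classes (secondary clusters).
stage2 = [
    (school, cls)
    for school in stage1
    for cls in rng.sample(sorted(schools[school]), k=2)
]

# Stage 3: within each selected class, randomly select 5 students.
sample = [
    student
    for school, cls in stage2
    for student in rng.sample(schools[school][cls], k=5)
]
print(len(sample))
```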

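    Systematic sampling: We select every k-th record from an ordered population, starting at a randomly chosen position among the first k records, where k is the population size divided by the desired sample size. For example, to pull 100 rows from a 10,000-row data frame or Excel sheet, pick a random start between 1 and 100 and then take every 100th row. Simply taking the first 100 rows is not systematic sampling; it is closer to convenience sampling.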
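    In its usual textbook form, systematic sampling takes every k-th record after a random start. Below is a minimal sketch, where systematic_sample is a hypothetical helper and k is computed as len(population) // n; in pandas, df.iloc[start::k] selects the same rows.

```python
import random


def systematic_sample(population, n, seed=None):
    """Return every k-th item after a random start, with k = len(population) // n."""
    if n <= 0 or n > len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n
    start = random.Random(seed).randrange(k)  # random start in [0, k)
    return population[start::k][:n]  # trim the possible extra item at the end


rows = list(range(10_000))  # hypothetical 10,000-row data set (row positions)
sample = systematic_sample(rows, n=100, seed=5)
print(sample[:3], len(sample))
```

    One caveat: if the data has a repeating pattern whose period matches k, a systematic sample can be biased.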

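    The methods above are probability sampling methods: every data point has a known, nonzero chance of being selected. We can also use nonprobability sampling, in which selection is not random; instead, the analyst decides which data to include based on knowledge, experience, or availability.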

    Nonprobability data sampling methods include:

    Convenience sampling: Data is collected from a group that is convenient or easily available.

    Consecutive sampling: Every record that meets the given criteria is collected, in order, until the predetermined sample size is reached.
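    A minimal sketch of consecutive sampling over a hypothetical stream of patient ages, where the criterion (age 65 or older) and the helper name consecutive_sample are assumptions for illustration. Records are taken in arrival order, with no randomness, until the sample size is reached.

```python
def consecutive_sample(stream, criterion, sample_size):
    """Collect every arriving record that meets the criterion until sample_size is reached."""
    sample = []
    if sample_size <= 0:
        return sample
    for record in stream:
        if criterion(record):
            sample.append(record)
            if len(sample) == sample_size:
                break  # predetermined sample size reached
    return sample


# Hypothetical stream of patient ages, in the order the patients arrive.
arrivals = [34, 71, 45, 68, 80, 22, 90, 65, 59, 77]
sample = consecutive_sample(arrivals, criterion=lambda age: age >= 65, sample_size=4)
print(sample)  # [71, 68, 80, 90]
```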

    Purposive (judgmental) sampling: The data is selected based on the analyst's judgment.

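    Quota sampling: The analyst sets a quota for each group so that the sample matches the population's makeup (for example, its gender or age split), then fills each quota non-randomly, for instance with the first available records.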
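    A sketch of quota sampling, assuming hypothetical records that are 60% "F" and 40% "M" and a helper named quota_sample. The quotas (6 and 4 for a sample of 10) mirror the population split, but records are taken in the order they appear rather than at random, which is what makes this nonprobability sampling.

```python
from collections import defaultdict


def quota_sample(records, key, quotas):
    """Fill a fixed quota per group, taking records in the order they appear (non-random)."""
    taken = defaultdict(list)
    for rec in records:
        group = key(rec)
        if len(taken[group]) < quotas.get(group, 0):
            taken[group].append(rec)
        if all(len(taken[g]) >= q for g, q in quotas.items()):
            break  # every quota is filled
    return [rec for g in quotas for rec in taken[g]]


# Hypothetical population: 60% "F" and 40% "M"; a sample of 10 should mirror that split.
records = [{"id": i, "gender": "F" if i % 5 < 3 else "M"} for i in range(50)]
sample = quota_sample(records, key=lambda r: r["gender"], quotas={"F": 6, "M": 4})
print([r["id"] for r in sample])  # [0, 1, 2, 5, 6, 7, 3, 4, 8, 9]
```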

  • Trever Ehrlich

    Administrator
    September 2, 2020 at 4:39 pm

    @satishvadlamani I really should learn this, thanks for posting!

  • Sai Gowtham Babu AMBURI

    Member
    September 10, 2020 at 4:31 am

    Thanks for posting, sir.

    I really didn’t know that sampling had this many topics involved in it, and now I do.

    Also, sir, can you please use @ to tag all of us so that we get notifications in our email?

    Thank you 😊
