2.(1) A weight loss company claims that, on average, a client will lose 10 pounds over the first 2 weeks. A sample of 50 people who joined the programme shows a mean weight loss of 9 pounds with a standard deviation of 2.8 pounds. Can we conclude at the 0.05 significance level that a person joining the programme will lose less than 10 pounds?
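One way to set up this test is sketched below. It assumes a lower-tailed z-test is acceptable because the sample is large (n = 50), with the sample standard deviation standing in for the population value; a t-test with 49 degrees of freedom would be the stricter alternative. The function name lower_tail_z_test is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_z_test(sample_mean, claimed_mean, sample_sd, n, alpha=0.05):
    """Test H0: mu = claimed_mean against H1: mu < claimed_mean.

    Assumption: n is large enough for the normal approximation.
    """
    standard_error = sample_sd / sqrt(n)
    z = (sample_mean - claimed_mean) / standard_error
    critical_z = NormalDist().inv_cdf(alpha)  # lower-tail cutoff
    p_value = NormalDist().cdf(z)             # P(Z <= z) under H0
    reject_h0 = z < critical_z
    return z, critical_z, p_value, reject_h0


z, critical_z, p_value, reject_h0 = lower_tail_z_test(9, 10, 2.8, 50)
print(f"z = {z:.3f}, critical z = {critical_z:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The null hypothesis is rejected when the test statistic falls below the critical value, which is the same as the p-value falling below 0.05.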
(2) The following is a random sample of 90-day futures prices in dollars for 1 troy oz. of silver from The Wall Street Journal issues in May and June of 1997: 4.74, 4.77, 4.87, 4.91, 4.83, 4.72, 4.92, 4.86, 4.97, 4.71, 4.90, 4.93, 4.75, 4.88, 4.79, 4.83, 4.89.
a. Calculate the mean
c. Calculate the standard deviation of the 90-day futures prices of silver.
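The two statistics can be computed with Python's standard library, as in the minimal sketch below. It assumes the sample (n - 1) standard deviation is wanted, since the data are described as a random sample; statistics.pstdev would give the population version instead.

```python
from statistics import mean, stdev

# 90-day silver futures prices (dollars per troy oz.) from the problem
prices = [4.74, 4.77, 4.87, 4.91, 4.83, 4.72, 4.92, 4.86, 4.97,
          4.71, 4.90, 4.93, 4.75, 4.88, 4.79, 4.83, 4.89]

sample_mean = mean(prices)
sample_sd = stdev(prices)  # divides by n - 1

print(f"n = {len(prices)}")
print(f"mean = {sample_mean:.4f}")
print(f"sample standard deviation = {sample_sd:.4f}")
```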
(3) A mining company needs to estimate the average amount of copper ore per ton mined. A random sample of 50 tons gives a sample mean of 146.75 pounds. The population standard deviation is assumed to be 35.2 pounds.
a. Give a 95% confidence interval for the average amount of copper in the population of tons mined.
b. Give a 90% confidence interval for the average amount of copper per ton.
c. Give a 99% confidence interval for the average amount of copper per ton.
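Because the population standard deviation is given, all three intervals can use the normal-based formula: sample mean plus or minus z* times sigma / sqrt(n). The sketch below assumes the sample mean is 146.75 pounds (reading the figure in the problem as a decimal); z_interval is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for a mean when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


intervals = {level: z_interval(146.75, 35.2, 50, level)
             for level in (0.90, 0.95, 0.99)}

for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

A higher confidence level gives a wider interval, so the 90% interval sits inside the 95% interval, which sits inside the 99% interval.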
(4) An e-commerce website gets 2,385 visitors on a particular day. Among these, 1,790 visitors explore the products by looking at more pages on the site. Among these 1,790 visitors who explore the products, 387 make a purchase.
a. If a visitor is chosen at random from all those who visited the site, what is the probability that the visitor explored the products?
b. If a visitor is chosen at random from all those who visited the site, what is the probability that the visitor made a purchase?
c. If a visitor is chosen at random from all those who explored the products, what is the probability that the visitor made a purchase?
d. Which of the preceding three probabilities is relevant to the design of the home page that leads to the product page?
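The three probabilities in parts a to c are simple ratios of the counts given, as the short sketch below shows. It assumes that every purchaser is also counted among the visitors who explored the products, as the problem statement implies.

```python
visitors = 2385
explorers = 1790   # visitors who looked at more pages
purchasers = 387   # explorers who made a purchase

p_explore = explorers / visitors                  # part a
p_purchase = purchasers / visitors                # part b
p_purchase_given_explore = purchasers / explorers  # part c (conditional)

print(f"P(explore) = {p_explore:.4f}")
print(f"P(purchase) = {p_purchase:.4f}")
print(f"P(purchase | explore) = {p_purchase_given_explore:.4f}")
```

Multiplying the results for parts a and c reproduces the result for part b, which is a quick consistency check.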
3.In this problem and the next one, we’re going to make a very simple spam checker program by just looking at how likely a given email is to be spam based on the words it contains. In particular, in this problem we’re going to count how often words are present in spam emails within some set of training data (which here means a set of emails that have already been marked as spam or not spam manually).
We have already started to write a function spam_score(spam_file, not_file, word), which takes in two filenames, along with a target word (a lowercase string). Both filenames refer to text files which must be in the same directory as hw07.py (we’ve provided several such files in hw07files.zip). The text files contain one email per line (really just the subject line to keep things simple) - you can assume that these emails will be a series of words separated by spaces with no punctuation. The first file contains emails that have been identified as spam, the second contains emails that have been identified as not spam.
Since you haven’t learned File I/O yet, we’ve provided code that opens the two files and puts the data into two lists of strings (where each element is one line - that is, one email). You then must complete the function, so that it returns the spam score for the target word. The spam score is an integer representing the total number of times the target word occurs across all the spam emails, minus the total number of times the word occurs in not-spam emails. Convert all words to lowercase before counting, to ensure capitalization does not throw off the count.
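The counting step can be sketched on its own, independent of the provided file-reading code. The helpers below are hypothetical: they take the two lists of email strings directly instead of filenames, and the sample emails are invented for illustration. This is one possible approach, not the reference solution.

```python
def count_word(emails, word):
    """Count occurrences of word across all emails, ignoring case."""
    total = 0
    for line in emails:
        # Lowercase first so "FREE" and "free" are counted together
        total += line.lower().split().count(word)
    return total


def spam_score_from_lines(spam_emails, not_emails, word):
    """Occurrences in spam emails minus occurrences in not-spam emails."""
    return count_word(spam_emails, word) - count_word(not_emails, word)


# Invented sample data, one email subject per string
spam_emails = ["Free money now", "FREE free prizes"]
not_emails = ["lunch is free today"]

print(spam_score_from_lines(spam_emails, not_emails, "free"))
```

Splitting on whitespace and counting whole words avoids matching the target inside longer words, which a plain substring count would do.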
4.1. Choose some var columns as features. Explain why you chose those columns. The justification can be common sense or statistics.
2. Be careful with missing data, which can appear as an empty string, -1, -98, -99, etc. You will need to check the data and the var dictionary and use your best judgement.
2. Use those selected columns to predict the "loan_default" column. You will try 3 machine learning algorithms:
* Logistic regression
* Naive Bayes Classifier
3. For each algorithm, you should select features and fit the model, then predict and evaluate.
4. Try different techniques to improve the model score. You can choose different columns, transform data and normalize data. Show your improvements.
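The missing-data step above can be sketched in plain Python before any model is fitted. The sentinel codes below are the ones named in the task (-1, -98, -99, and the empty string); whether each code really means "missing" for a given column is an assumption to confirm against the var dictionary. The names is_missing and missing_rate are illustrative.

```python
def is_missing(raw, sentinels=(-1, -98, -99)):
    """Return True if raw looks like a missing-value marker.

    Assumption: the sentinel codes apply to this column.
    """
    if raw is None:
        return True
    if isinstance(raw, str):
        raw = raw.strip()
        if raw == "":
            return True
        try:
            raw = float(raw)  # "-99" and " -99.0 " both become -99.0
        except ValueError:
            return False      # ordinary text is not a sentinel
    return raw in sentinels


def missing_rate(column):
    """Fraction of values in a column flagged as missing."""
    return sum(is_missing(v) for v in column) / len(column)


# Invented example column mixing real values and sentinel codes
column = ["", 5, -99, "7"]
print(missing_rate(column))
```

A column with a high missing rate is a candidate for dropping or imputation before feature selection.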
6.This first part of the Individual Research Project is an Outline and Annotated Bibliography. The
Outline should provide a very brief overview of what you think you will do in the Policy Brief.
The Annotated Bibliography requires you to summarize at least three peer-reviewed scholarly
sources you will cite in the Policy Brief.
This assignment is designed to get you thinking about your topic in a way that clearly anticipates
the writing you will do for the Policy Brief. We want you to brainstorm and do a bit of research
well in advance of the deadline for the Policy Brief and, most importantly, we want you to put
your ideas down on paper so that we can give you feedback before writing the actual Policy
Brief. In other words, we are asking you to submit an Outline and Annotated Bibliography so
that we can help you write the best Policy Brief possible.
Your Outline should be divided into the following five sections and should be written in
I. Audience: Identify the audience you are addressing and consider what that audience
is interested in. Who are you talking to in the Policy Brief and what does this suggest
about the approach you should take? (75-100 words).
II. Problem: State how you know the issue exists. What is the proof that students need
to improve this skill? (125-150 words).
III. Importance of Problem: Indicate why this problem matters. What are the
consequences of the problem not being addressed? Why do students need to improve
this skill? (100 words)
IV. Solution: Identify your preferred solution. What solution will work in your context
and why? (75-100 words)
V. Alternative Solution: Identify at least one other possible solution. What other
solutions did you consider? (75-100 words)
The total length of the Outline should be between 450 and 550 words.
When you submit your Outline, you must also include an Annotated Bibliography. An Annotated
Bibliography is an alphabetical list of research sources that provides bibliographical data (the
title, author, date, publisher, etc.) and a short summary or annotation of the source.
Your Annotated Bibliography should contain a minimum of three scholarly or peer-reviewed
sources, each with an accompanying annotation that is between 150 and 250 words long. The
annotations must summarize the research question or thesis, research methodology, results, and
conclusion. Annotations must include summaries and paraphrased information, NOT quotations.
A good annotation will include two separate paragraphs: 1) a paragraph summarizing the
research question or thesis, research methodology, results and conclusion; and 2) a paragraph
commenting on why this source is relevant for your research.