1. In this problem and the next one, we’re going to make a very simple spam checker program by just looking at how likely a given email is to be spam based on the words it contains. In particular, in this problem we’re going to count how often words are present in spam emails within some set of training data (which here means a set of emails that have already been marked as spam or not spam manually).
We have already started to write a function spam_score(spam_file, not_file, word), which takes in two filenames, along with a target word (a lowercase string). Both filenames refer to text files which must be in the same directory as hw07.py (we’ve provided several such files in hw07files.zip). The text files contain one email per line (really just the subject line to keep things simple) - you can assume that these emails will be a series of words separated by spaces with no punctuation. The first file contains emails that have been identified as spam, the second contains emails that have been identified as not spam.
Since you haven’t learned File I/O yet, we’ve provided code that opens the two files and puts the data into two lists of strings (where each element is one line - that is, one email). You then must complete the function, so that it returns the spam score for the target word. The spam score is an integer representing the total number of times the target word occurs across all the spam emails, minus the total number of times the word occurs in not-spam emails. Convert all words to lowercase before counting, to ensure capitalization does not throw off the count.
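The definition of the spam score above can also be written as a formula. The symbols are introduced here only for illustration and are not part of the assignment: S is the list of spam emails, N is the list of not-spam emails, and count(w, e) is the number of times the word w appears in email e after every word has been converted to lowercase.

```latex
\mathrm{score}(w) \;=\; \sum_{e \in S} \mathrm{count}(w, e) \;-\; \sum_{e \in N} \mathrm{count}(w, e)
```

As a made-up example, suppose the spam file holds the two lines "FREE money free" and "win money", the not-spam file holds the single line "free lunch today", and the target word is "free". The word occurs twice in the spam emails and once in the not-spam emails, so the score is 2 - 1 = 1. A word that is more common in not-spam emails gets a negative score.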
2. Students are expected to research and compose a paper based on the application of concepts and theories examined in class. This paper is not a literature review, though a literature review is part of your work. As this course takes place in a compressed timeline, I have provided some suggestions for research topics. Feel free to use one of these as a springboard or propose your own.
At the end of the second week of class, students submit a three-page research paper prospectus. A research prospectus is a preliminary plan for conducting a study. This is not a detailed and technical research proposal, but rather, an analysis of the issues likely confronted in such a study. In essence, it is a preliminary proposal of work.
Research Paper Prospectus Elements
To complete the Research Paper Prospectus, consider the following elements. While the prospectus is limited to three pages of body content, remember, students must cover each of these areas as relevant to the plan for research:
Research Problem. What is the research problem? A problem is a situation that, when left untreated, produces a negative consequence for a group, an institution, or one or more individuals. What makes it a problem? For whom? Who says so?
Assumptions. On what assumptions is the work based? Which assumptions are verifiable in literature? Which assumptions are speculative?
Theoretical Issues. What theoretical issues arise from the study? For example, "theoretically," how are the problem and the suspected results explained to other scholars? Is there a behavioral view? A social systems view? Are there other theoretical orientations to consider in the study's design?
Literature Review. What, in general, does the literature say about the topic? While more development is expected for the final paper, a review of major theories, research, and writers in the field is needed.
Research Questions. Based on the problem, what are the research questions to be answered? How and why will answering the questions contribute to solving the research problem? Remember: a research question can only be answered with empirical data or information.
General Research Plan. In general, what research is necessary to answer the research question? What kind of data is needed? Specify the type, such as surveys, observations, or interviews. Who is to be studied and why? How is the data reduced and made sense of? How is the quality of the data assured?
Anticipated Difficulties and Pitfalls. What kind of difficulties and pitfalls are expected in a study of this nature? What can be done to prevent them or minimize their effects?
Anticipated Benefits. Who will benefit from the fact this research is undertaken? How? Why? Who might be disturbed by this proposed study? How? Why?
Paper Format Requirements
The Research Paper Prospectus is presented in standard APA 7 format, with a cover page, running head, body, and references list. The cover page and references do not count toward the three-page requirement. The body uses headers and in-text citations in the manner prescribed by APA. Students should include any references they know at the time they submit the prospectus, though it is expected the references may change or increase in number. Full and complete adherence to APA is required.
As APA format is the rule, remember the formatting rules shown on the Sample Paper:
Times New Roman, 12pt
1" margins on all sides
Double spaced, with extra line spaces removed (see below)
Page numbers in the upper right
Two spaces after concluding punctuation
150-250 word abstract with keywords
APA-style in-text citations and quote format. Use the Purdue OWL in-text citation information to help you.
Alphabetical (by author) reference page with correct reference format. DO NOT trust the reference generator in your word processing program. It is WRONG! Use the Purdue OWL references information to correctly structure references and do so manually.
You are asked to carry out a study on behalf of a specialised business analytics consultancy on a subsample of weekly data from Randall’s Supermarket, one of the biggest in the UK. Randall’s marketing management team wishes to identify trends and patterns in a sample of weekly data collected for a number of their loyalty cardholders during a 26-week period. The data includes information on the customers’ gender, age, shopping frequency per week and shopping basket price. Randall’s operates two different types of stores (convenience stores and superstores), but they also sell to customers via an online shopping platform. The collected data come from all three sales channels. Finally, the data provides information on the composition of the customer’s shopping basket regarding the type of products purchased. These range from value products to branded products, as well as the supermarket’s own high-quality product series, Randall’s Top. As a business analyst you are required to analyse those data and make any necessary modifications in order to determine whether it is possible to predict the value of any single customer’s shopping basket.
Randall’s marketing management team is only interested in identifying which of three possible groups the spending of a potential customer will fall into:
• Low spender (shopping basket value of £25 or less)
• Medium spender (shopping basket value between £25.01 and £70)
• High spender (shopping basket value greater than £70)
For the purpose of your analysis you are provided with the data set Randall’s.xls. You have to decide which method is appropriate to apply for the problem under consideration and undertake the necessary analysis. Once you have completed this analysis, write a report for the Randall’s marketing management team summarising your findings but also describing all necessary steps undertaken in the analysis. The manager is a competent business analyst himself/herself so the report can include technical terms, although you should not exceed five pages. Screenshots and supporting materials can be included in the appendix.
After completing your analysis, you should submit a report that consists of two parts: Part A, a non-technical summary of your findings, and Part B, a detailed report of the analysis undertaken.
Part A: A short report for the Head of Randall’s Marketing Management (20 per cent). This should briefly explain the aim of the project, a clear summary and justification of the methods considered as well as an overview of the results.
Although the Head of Randall’s Marketing Management team who will receive this summary is a competent business analytics practitioner, the majority of the other team members have little knowledge of statistical modelling and want to know nothing about the technical and statistical underpinning of the techniques used in this analysis. This report should be no more than two sides of A4 including graphs, tables, etc. In this report you should include all the objectives of this analysis, a summary of data and results as well as your recommendations (if any).
Part B: A technical report on the various stages of the analysis (80 per cent).
The analysis should be carried out using the range of analytics tools discussed:
• SPSS Statistics
Ensure that the exercise references:
• Binary and multinomial logistic regression
• Linear vs Logistic regression
• Logit model with odds ratio
• Coefficients and chi-squared
• MLR coefficients
• Assessing usefulness of MLR model
• Interpreting a model
• Assessing overall model fit with pseudo R-squared measures
• Classification accuracy (Hit Ratio)
• Wald Statistic
• Odds ratio exp(B)
• Ratio of the probability of an event happening vs not happening
• Ratio of the odds after a unit change in the predictor to the original odds
• Residuals analysis
• Cook’s distance
• Adequacy (with variance inflation factor VIF and tolerance statistic)
• Outliers and influential points cannot just be removed. We need to check them first (is the value a typo, or is it genuinely unusual data?)
• Check for multicollinearity
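The two ratio definitions in the list above can be written out explicitly. The sketch below assumes a binary logistic model with a single predictor x, intercept B_0 and coefficient B; these symbols are chosen here for illustration and do not come from the brief.

```latex
\mathrm{odds}(x) = \frac{P(\text{event} \mid x)}{1 - P(\text{event} \mid x)},
\qquad
\ln \mathrm{odds}(x) = B_0 + B\,x

\frac{\mathrm{odds}(x + 1)}{\mathrm{odds}(x)}
  = \frac{e^{B_0 + B(x + 1)}}{e^{B_0 + B x}}
  = e^{B} = \exp(B)
```

The first line is the ratio of the probability of the event happening to the probability of it not happening. The second line is the ratio of the odds after a unit change in the predictor to the original odds, which is why SPSS labels the odds ratio exp(B): a value above 1 means the odds rise with the predictor, and a value below 1 means they fall.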
Write a short and concise report to explain the technical detail of what you have done for each step of the analysis.
The report should also cover the following information:
• Any type of analysis that might be useful and check whether the main assumptions behind the analyses do not hold or cannot be
• Give evidence of the understanding of the statistical tools that you are using. For example, comment on the model selection procedure and the coefficient interpretation, e.g. comment on the interpretation of the logistic regression coefficients if such a method is used and provide an example of
• Conclusions and explanation, in non-technical terms, of the main points
Design and write pseudocode using a repetition structure
Design and write Java for a class, including attributes, accessors, mutators, and constructors.
Design and write Java for an application program that instantiates and uses objects of a user-defined class.
Use the repetition structure in class methods and application program modules.
Perform error checking.
Use a graphical drawing program (ArgoUML) to create class diagrams.
Directions for completing and submitting the homework:
You will be submitting the following files:
Pseudocode written with Word, Notepad++, or similar application
The application class created in 3b below
The UML class diagram created in ArgoUML, Raptor, or similar application
Write the pseudocode needed to complete Chapter 5, number 9 – Pennies for Pay.
Implement Pennies for Pay in Java.
The Secondhand Rose Resale Shop is having a seven-day sale during which the price of any unsold item drops 10 percent each day. Design a class diagram showing the class, the application program, the relationship between the two, and multiplicity. Then write the Java code as described below. Be sure to follow the CSI 117 Style Criteria for naming conventions, class diagrams, pseudocode, keywords, and operators.
An Inventory class that contains:
an item number and the original price of the item. Include the following:
A default constructor that initializes each attribute to some reasonable default value for a non-existent inventory item.
Another constructor method that has a parameter for each data member, called the overloaded constructor. This constructor initializes each attribute to the value provided when an object of this type is instantiated. Be sure to incorporate adequate error checking for all numeric attributes.
Accessor and mutator methods for each attribute. Be sure to incorporate adequate error checking for all numeric attributes.
Extra credit for including Javadoc comments.
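One possible shape for the Inventory class is sketched below. It is not the required solution. The field names, the default values of 0 and 0.0, and the choice to replace invalid numbers with those defaults are all assumptions, since the assignment does not say how errors should be handled.

```java
/**
 * Sketch of an Inventory class with two attributes.
 * Assumption: invalid numeric values are replaced by 0 / 0.0.
 */
class Inventory {
    private int itemNumber;       // 0 stands for "no item" (assumed default)
    private double originalPrice; // 0.0 stands for "no price" (assumed default)

    /** Default constructor: represents a non-existent inventory item. */
    public Inventory() {
        this(0, 0.0);
    }

    /** Overloaded constructor: one parameter per data member. */
    public Inventory(int itemNumber, double originalPrice) {
        // Reuse the mutators so the error checking lives in one place.
        setItemNumber(itemNumber);
        setOriginalPrice(originalPrice);
    }

    /** Accessor for the item number. */
    public int getItemNumber() {
        return itemNumber;
    }

    /** Accessor for the original price. */
    public double getOriginalPrice() {
        return originalPrice;
    }

    /** Mutator: stores the item number, or 0 if the argument is negative. */
    public void setItemNumber(int itemNumber) {
        this.itemNumber = (itemNumber >= 0) ? itemNumber : 0;
    }

    /** Mutator: stores the price, or 0.0 if it is negative, NaN, or infinite. */
    public void setOriginalPrice(double originalPrice) {
        boolean valid = Double.isFinite(originalPrice) && originalPrice >= 0.0;
        this.originalPrice = valid ? originalPrice : 0.0;
    }
}
```

With this sketch, new Inventory(42, 10.0) keeps both values, while calling setOriginalPrice(-5.0) on the same object resets the price to 0.0. Throwing an exception or asking the user again would be equally reasonable ways to handle bad input.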
An application program that contains two methods: the main() module and the printSaleData() module.
The main() module must do the following:
create an Inventory object using the default constructor
use a loop to get inventory items from the user. The user should enter the item number and the original price of the item. This loop should continue until the user indicates that they have no more items to enter. For each item entered by the user, the code inside the loop should do the following 2 items:
set the attributes of the Inventory object by calling the appropriate method in the Inventory class for each item entered by the user
send the Inventory items, one at a time, to the printSaleData() module for processing
Extra credit for including Javadoc comments.
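The input loop described for main() could look roughly like the sketch below. Several details are assumptions made only for illustration: the class name SaleEntryLoop, the helper readItems(), the use of item number 0 as the "no more items" signal, and the decision to stop on non-numeric input. The nested Inventory class and the one-line printSaleData() are minimal stand-ins so that the sketch compiles by itself.

```java
import java.util.Scanner;

class SaleEntryLoop {
    /** Minimal stand-in for the real Inventory class (no error checking). */
    static class Inventory {
        private int itemNumber;
        private double originalPrice;
        void setItemNumber(int n) { itemNumber = n; }
        void setOriginalPrice(double p) { originalPrice = p; }
        int getItemNumber() { return itemNumber; }
        double getOriginalPrice() { return originalPrice; }
    }

    /** Stand-in for the reporting module; here it only echoes the item. */
    static void printSaleData(Inventory item) {
        System.out.printf("Item %d, original price %.2f%n",
                item.getItemNumber(), item.getOriginalPrice());
    }

    /** Reads items until the sentinel 0 is entered; returns how many were processed. */
    static int readItems(Scanner in) {
        Inventory item = new Inventory(); // one object, reused for every entry
        int count = 0;
        while (true) {
            System.out.print("Item number (0 to quit): ");
            if (!in.hasNextInt()) {
                break; // non-numeric input or end of input: stop (assumed policy)
            }
            int number = in.nextInt();
            if (number == 0) {
                break; // sentinel: the user has no more items
            }
            System.out.print("Original price: ");
            if (!in.hasNextDouble()) {
                break;
            }
            double price = in.nextDouble();
            // Set the attributes through the mutators, then send the item on.
            item.setItemNumber(number);
            item.setOriginalPrice(price);
            printSaleData(item);
            count++;
        }
        return count;
    }
}
```

A main() method would simply call readItems(new Scanner(System.in)). Keeping the loop in a method that takes a Scanner makes it easy to try with prepared input, for example new Scanner("101 10\n102 25\n0\n"), which processes two items before the sentinel ends the loop.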
The printSaleData() module must accept an Inventory object and produce a report that shows the item number and the price of an inventory item on each day of the sale, one through seven, using a loop. For example, an item with an original price of $10.00 costs 10 percent less, or $9.00, on the first day of the sale. On the second day of the sale, the same item is 10 percent less than $9.00, or $8.10.
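The daily price calculation can be sketched as a loop that takes 10 percent off the previous day's price. The class name SaleReport and the helpers salePrices() and formatReport() are hypothetical, and for brevity they take the item number and original price directly, whereas the assignment's printSaleData() receives an Inventory object and would read the same two values through its accessors.

```java
import java.util.Locale;

class SaleReport {
    static final int SALE_DAYS = 7;
    static final double DAILY_DISCOUNT = 0.10;

    /** Returns the price on each sale day; index 0 holds day one. */
    static double[] salePrices(double originalPrice) {
        double[] prices = new double[SALE_DAYS];
        double price = originalPrice;
        for (int day = 0; day < SALE_DAYS; day++) {
            price -= price * DAILY_DISCOUNT; // 10 percent off the previous day's price
            prices[day] = price;
        }
        return prices;
    }

    /** Builds the report text: the item number, then one line per sale day. */
    static String formatReport(int itemNumber, double originalPrice) {
        StringBuilder report = new StringBuilder();
        report.append(String.format(Locale.US, "Item %d%n", itemNumber));
        double[] prices = salePrices(originalPrice);
        for (int day = 0; day < prices.length; day++) {
            report.append(String.format(Locale.US, "Day %d: $%.2f%n", day + 1, prices[day]));
        }
        return report.toString();
    }
}
```

Calling System.out.print(SaleReport.formatReport(101, 10.00)) prints the item number followed by seven lines, starting with "Day 1: $9.00" and "Day 2: $8.10", which matches the example in the assignment. The discount compounds, so each day's reduction is taken from the already reduced price, not from the original price.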