1.a) Carbon tetrachloride is an organic compound with the chemical formula CCl4. It is a
colorless liquid with a “sweet” smell.
i. Using the ground-state and excited-state electron configurations,
explain the hybridization of the central carbon (C) atom. (7 Marks)
ii. Identify the orbitals that overlap to form the C-Cl bond. Draw a diagram to show
the orbital overlap. (3 Marks)
iii. What is the bond angle of CCl4? (1 Mark)
b) Consider a Fluorine atom (F) and a Fluorine anion (F-). Which of these two species would
you expect to have a larger radius? Explain your answer. (5 Marks)
c) Explain why the first ionization energy of Aluminum (Al) is less than that of Magnesium
(Mg). (4 Marks)
d) Assume that an oxygen atom (O) can form both a cation (O+) and an anion (O-). Place
the following species in order of increasing first ionization energy, starting with the lowest.
O+, O, O-
a) Sea water contains roughly 28.0 g of NaCl per liter (NaCl molar mass = 58.44 g mol-1).
i. Calculate the number of moles of NaCl in a liter of sea water. (2 Marks)
ii. Calculate the molarity of NaCl in sea water. (4 Marks)
iii. Calculate the mass by volume percent (w/v) of NaCl in sea water. (4 Marks)
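The arithmetic behind parts i to iii can be sketched in a few lines of Python. This is a minimal sketch that uses only the figures given in the question and assumes the solution volume is exactly one liter; the function names are illustrative, not part of the question. Mass by volume percent is taken as grams of solute per 100 mL of solution.

```python
# Figures given in the question.
MASS_NACL_G = 28.0            # grams of NaCl in one liter of sea water
MOLAR_MASS_G_PER_MOL = 58.44  # molar mass of NaCl
VOLUME_L = 1.0                # assumed: exactly one liter of solution


def moles(mass_g: float, molar_mass: float) -> float:
    """Amount of substance: n = m / M."""
    return mass_g / molar_mass


def molarity(n_mol: float, volume_l: float) -> float:
    """Concentration in mol/L: c = n / V."""
    return n_mol / volume_l


def mass_by_volume_percent(mass_g: float, volume_ml: float) -> float:
    """% (w/v): grams of solute per 100 mL of solution."""
    return mass_g / volume_ml * 100


n = moles(MASS_NACL_G, MOLAR_MASS_G_PER_MOL)
c = molarity(n, VOLUME_L)
wv = mass_by_volume_percent(MASS_NACL_G, VOLUME_L * 1000)
print(f"moles = {n:.3f} mol, molarity = {c:.3f} M, w/v = {wv:.2f} %")
```

Because the volume is one liter, the number of moles and the molarity are numerically the same; only the units differ.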
You are asked to carry out a study, on behalf of a consultancy specialising in business analytics, on a subsample of weekly data from Randall’s Supermarket, one of the biggest supermarkets in the UK. Randall’s marketing management team wishes to identify trends and patterns in a sample of weekly data collected for a number of their loyalty cardholders over a 26-week period. The data include information on the customers’ gender, age, shopping frequency per week and shopping basket price. Randall’s operates two types of store (convenience stores and superstores) and also sells to customers via an online shopping platform; the data cover all three channels. Finally, the data provide information on the composition of each customer’s shopping basket in terms of the type of products purchased, which ranges from value products to branded products and the supermarket’s own high-quality range, Randall’s Top. As a business analyst, you are required to analyse these data, make any necessary modifications, and determine whether it is possible to predict the value of any single customer’s shopping basket.
Randall’s marketing management team is only interested in identifying which of three groups a potential customer’s spending will fall into:
• Low spender (shopping basket value of £25 or less)
• Medium spender (shopping basket value between £25.01 and £70)
• High spender (shopping basket value greater than £70)
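One way to prepare the target variable is to band each basket value into the three groups above, which turns a continuous outcome into a three-category one (the kind of outcome that multinomial logistic regression, listed later in the brief, is designed for). The sketch below is a hypothetical helper, not part of Randall’s.xls or SPSS; it assumes basket values are recorded to the nearest penny, so any value above £25.00 is at least £25.01.

```python
def spending_group(basket_value_gbp: float) -> str:
    """Map a basket value in pounds to the brief's three spending groups.

    Assumes values are recorded to the nearest penny, so the gap between
    £25.00 and £25.01 in the brief never occurs in the data.
    """
    if basket_value_gbp <= 25.00:
        return "Low spender"
    if basket_value_gbp <= 70.00:
        return "Medium spender"
    return "High spender"


# Check the boundary values stated in the brief.
for value in (12.50, 25.00, 25.01, 70.00, 70.01):
    print(f"£{value:.2f} -> {spending_group(value)}")
```

In SPSS the same banding would normally be done with Transform > Recode into Different Variables before fitting the model.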
For the purpose of your analysis, you are provided with the data set Randall’s.xls. You have to decide which method is appropriate for the problem under consideration and undertake the necessary analysis. Once you have completed this analysis, write a report for the Randall’s marketing management team summarising your findings and describing all the steps undertaken in the analysis. The manager is a competent business analyst, so the report can include technical terms, although it should not exceed five pages. Screenshots and supporting materials can be included in the appendix.
After completing your analysis, you should submit a report that consists of two parts: Part A, a non-technical summary of your findings, and Part B, a detailed report of the analysis undertaken.
Part A: A short report for the Head of Randall’s Marketing Management (20 per cent). This should briefly explain the aim of the project, give a clear summary and justification of the methods considered, and provide an overview of the results.
Although the Head of Randall’s Marketing Management team, who will receive this summary, is a competent business analytics practitioner, most of the other team members have little knowledge of statistical modelling and do not want to know about the technical and statistical underpinnings of the techniques used in this analysis. This report should be no more than two sides of A4, including graphs, tables, etc. In this report you should include all the objectives of this analysis, a summary of the data and results, and your recommendations (if any).
Part B: A technical report on the various stages of the analysis (80 per cent).
The analysis should be carried out using the range of analytics tools discussed:
• SPSS Statistics
Ensure that the exercise references:
• Binary and multinomial logistic regression
• Linear vs logistic regression
• Logit model with odds ratio
• Coefficients and chi-squared
• MLR coefficients
• Assessing usefulness of MLR model
• Interpreting a model
• Assessing overall model fit with pseudo R-squared measures
• Classification accuracy (hit ratio)
• Wald statistic
• Odds ratio, Exp(B)
• Ratio of the probability of an event happening vs not happening
• Ratio of the odds after a unit change in the predictor to the original odds
• Residuals analysis
• Cook’s distance
• Adequacy (variance inflation factor, VIF, and tolerance statistic)
• Outliers and influential points cannot simply be removed; they need to be checked (is the value a typo or genuinely unusual data?)
• Check for multicollinearity
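Several quantities in the list above reduce to short formulas, sketched here in Python to make the definitions concrete. This is an illustration under stated assumptions, not SPSS output: the coefficients and class labels are hypothetical, and the one-predictor logit model stands in for the full multinomial model (where each non-reference group has its own set of B coefficients relative to the reference group). The Wald statistic shown is the single-coefficient form (B / S.E.)², and the VIF takes R²_j from regressing predictor j on the other predictors.

```python
import math
from typing import Sequence, Tuple


def odds(p: float) -> float:
    """Odds: probability of the event happening divided by it not happening."""
    return p / (1 - p)


def logistic_prob(b0: float, b1: float, x: float) -> float:
    """P(event) under a one-predictor logit model, logit(p) = b0 + b1 * x."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


def odds_ratio_unit_change(b0: float, b1: float, x: float) -> float:
    """Odds after a one-unit increase in x divided by the original odds.

    For a logit model this equals exp(b1), which SPSS reports as Exp(B).
    """
    return odds(logistic_prob(b0, b1, x + 1)) / odds(logistic_prob(b0, b1, x))


def wald_statistic(b: float, se: float) -> float:
    """Wald statistic for one coefficient: (B / S.E.)^2, chi-squared with 1 df."""
    return (b / se) ** 2


def hit_ratio(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Classification accuracy: share of cases whose predicted group is correct."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def vif_and_tolerance(r_squared_j: float) -> Tuple[float, float]:
    """Return (VIF, tolerance), where tolerance = 1 - R^2_j and VIF = 1 / tolerance."""
    tolerance = 1 - r_squared_j
    return 1 / tolerance, tolerance


# Hypothetical numbers for illustration only, not results from Randall's data.
b0, b1 = -1.0, 0.4
print(odds_ratio_unit_change(b0, b1, 2.0), math.exp(b1))  # both about 1.492
print(wald_statistic(0.4, 0.1))  # about 16
print(hit_ratio(["Low", "Medium", "High", "Low"],
                ["Low", "High", "High", "Low"]))  # 0.75
print(vif_and_tolerance(0.8))  # VIF about 5, tolerance about 0.2
```

The first print line shows numerically that the ratio of the odds after a unit change in the predictor to the original odds does not depend on x and equals exp(B).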
Write a short and concise report to explain the technical detail of what you have done for each step of the analysis.
The report should also cover the following information:
• Any type of analysis that might be useful and check whether the main assumptions behind the analyses do not hold or cannot be
• Give evidence of the understanding of the statistical tools that you are using. For example, comment on the model selection procedure and the coefficient interpretation, e.g. comment on the interpretation of the logistic regression coefficients if such a method is used and provide an example of
• Conclusions and explanation, in non-technical terms, of the main points
4. Your boss at the bank finally gives you the bank’s current rough estimate of its average cost for each type of classification error.
[Note that all bank models here include only profits and losses within three years of when a card is issued, so the impact of out-years (years beyond 3) can be ignored.]
Cost Per False Negative: $5000
Cost Per False Positive: $2500
For the 600 individuals who were automatically given cards without being classified, the total cost of the experiment turned out to be 25% × $5,000 × 600 = $750,000, i.e. $1,250 per event.
Only models with a cost per event below $1,250 add any value.
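The baseline cost per event can be reproduced with a small helper. This sketch assumes "default" is the positive class (consistent with the hint below), so a false negative is a defaulter who was given a card and a false positive is a non-defaulter who was refused one. It also reads the 25% in the text as the share of the 600 cardholders who defaulted, which makes every default a false negative and leaves no false positives. The function name is hypothetical.

```python
# Costs supplied by the bank, per classification error.
COST_PER_FALSE_NEGATIVE = 5000  # assumed: defaulter classified "no default", given a card
COST_PER_FALSE_POSITIVE = 2500  # assumed: non-defaulter classified "default", refused


def cost_per_event(false_negatives: int, false_positives: int, n_events: int) -> float:
    """Total misclassification cost divided by the number of cases scored."""
    total = (false_negatives * COST_PER_FALSE_NEGATIVE
             + false_positives * COST_PER_FALSE_POSITIVE)
    return total / n_events


# Baseline from the text: 600 unclassified cards, 25% read as the default rate.
n_cards = 600
defaults = round(0.25 * n_cards)
baseline = cost_per_event(defaults, 0, n_cards)
print(defaults * COST_PER_FALSE_NEGATIVE, baseline)  # 750000 1250.0
```

Any candidate model can be costed the same way from its confusion-matrix counts and compared against the $1,250 baseline.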
Question: What is the threshold score on the Training Set data for your model that minimizes Cost per Event? You will need this number to answer later questions.
Hint: Using the AUC Calculator Spreadsheet, identify which column displays the same cost per event (row 17) as the overall minimum cost per event shown in cell J2. The threshold is shown in row 10 of that column. This threshold means that every case scoring at or above it is classified as a "default."
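The spreadsheet search described in the hint amounts to a threshold sweep, sketched below in Python as a stand-in for the AUC Calculator Spreadsheet (it does not reproduce that spreadsheet's layout). Every observed score is tried as a cut-off, cases scoring at or above it are classified as "default", the errors are costed at the bank's rates, and the lowest cost per event wins. A final candidate of infinity represents classifying nobody as a default, which is the no-model baseline. The scores and labels are hypothetical, and ties go to the lower threshold.

```python
from typing import List, Sequence, Tuple

COST_PER_FALSE_NEGATIVE = 5000
COST_PER_FALSE_POSITIVE = 2500


def cost_per_event_at(scores: Sequence[float], is_default: Sequence[bool],
                      threshold: float) -> float:
    """Classify score >= threshold as "default", then cost the errors per case."""
    fn = sum(1 for s, d in zip(scores, is_default) if d and s < threshold)
    fp = sum(1 for s, d in zip(scores, is_default) if not d and s >= threshold)
    return (fn * COST_PER_FALSE_NEGATIVE + fp * COST_PER_FALSE_POSITIVE) / len(scores)


def best_threshold(scores: Sequence[float],
                   is_default: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, cost per event) for the cheapest cut-off."""
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    best_cost, best_t = min((cost_per_event_at(scores, is_default, t), t)
                            for t in candidates)
    return best_t, best_cost


# Hypothetical training-set scores and actual outcomes.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
is_default = [False, False, False, True, False, True, True, True]
print(best_threshold(scores, is_default))  # (0.4, 312.5)
```

Because a false negative costs twice as much as a false positive, the cheapest cut-off tends to sit lower than the one that maximises plain accuracy.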