
This
The field of information retrieval is concerned with finding relevant electronic documents based upon a query. For example, given a group of keywords (the query), a search engine retrieves Web pages (documents)and displays them sorted by relevance to the query. This technology requires a way to compare a document with the query to see which is most relevant to the query.
A simple way to make this comparison is to compute the binary cosine coefficient. The coefficient is a value between 0 and 1, where 1 indicates that the query is very similar to the document and 0 indicates that the query has no keywords in common with the document. This approach treats each document as a set of words. For example, given the following sample document:
“Chocolate ice cream, chocolate milk, and chocolate bars are delicious.” This document would be parsed into keywords where case is ignored and punctuation discarded and turned into the set containing the words {chocolate, ice, cream, milk, and, bars, are, delicious}. An identical processis performed on the query to turn it into a set of strings. Once we have a query Q represented as a set of words and a document D represented as a set of words, the similarity between Q and D is computed by:
Sim=| Q∩D || Q | | D |
Modify the StringSet from Programming Project 12 by adding an additional member function that computes the similarity between the current StringSet and an input parameter of type StringSet. The sqrt function is in the cmath library.
Create two text files on your disk named Document1.txt and Document2.txt. Write some text content of your choice in each file, but make sure that each file contains different content. Next, write a program that allows the user to input from the keyboard a set of strings that represents a query. The program should then compare the query to both text files on the disk and output the similarity to each one using the binary cosine coefficient. Test your program with different queries to see if the similarity metric is working correctly.

Want to see the full answer?
Check out a sample textbook solution
Chapter 11 Solutions
Problem Solving with C++ (10th Edition)
Additional Engineering Textbook Solutions
Starting Out with Programming Logic and Design (5th Edition) (What's New in Computer Science)
Database Concepts (8th Edition)
Modern Database Management
Java: An Introduction to Problem Solving and Programming (8th Edition)
Electric Circuits. (11th Edition)
Thinking Like an Engineer: An Active Learning Approach (4th Edition)
- Please no AI! Or if you do use AI, Check the work please! Thank you!arrow_forward(Dynamic Programming.) Recall the problem presented in Assign- ment 3 where given a list L of n ordered integers you're tasked with removing m of them such that the distance between the closest two remaining integers is maxi- mized. See Assignment 1 for further clarification and examples. As it turns out there is no (known) greedy algorithm to solve this problem. However, there is a dynamic programming solution. Devise a dynamic programming solution which determines the maximum distance between the closest two points after removing m numbers. Note, it doesn't need to return the resulting list itself. Hint 1: Your sub-problems should be of the form S(i, j), where S(i, j) returns the maximum distance of the closest two numbers when only considering removing j of the first i numbers in L. As an example if L [3, 4, 6, 8, 9, 12, 13, 15], then S(4, 1) = 2, since the closest two values of L' = [3,4,6,8] are 6 and 8 after removing 4 (note, 8-6 = = 2). = Hint 2: For the sub-problem S(i, j),…arrow_forward
- (Greedy Algorithms) Describe an efficient algorithm that, given a set {x1, x2, . . ., xn} of points on the real line, determines the smallest set of unit-length closed intervals that contains all of the given points. Argue that your algorithm is correct.arrow_forwardWhat does the value of the top variable indicate in this ArrayStack implementation? What will happen if we call pop on this stack? What value will be returned, and what changes will occur in the array and the top variable? 3. If we push the value "echo" onto the stack, where will it be stored in the array, and what will be the new value of top? 4. Explain why index 0 contains the string "alpha" even though top is currently 3. 5. What would the state of the stack look like (values in the array and value of top) after two consecutive pop 0 operations?arrow_forwardPlease solve and show all work. Suppose there are four routers between a source and a destination hosts. Ignoring fragmentation, an IP datagram sent from source to destination will travel over how many interfaces? How many forwarding tables will be indexed to move the datagram from the source to the destination?arrow_forward
- Please solve and show all work. When a large datagram is fragmented into multiple smaller datagrams, where are these smaller datagrams reassembled into a single large datagram?arrow_forwardPlease solve and show all steps. True or false? Consider congestion control in TCP. When the timer expires at the sender, the value of ssthresh is set to one-half of the last congestion window.arrow_forwardPlease solve and show all work. What are the purposes of the SNMP GetRequest and SetRequest messages?arrow_forward
- Please solve and show all steps. Three types of switching fabrics are discussed in our course. List and briefly describe each type. Which, if any, can send multiple packets across the fabric in parallel?arrow_forwardPlease solve and show steps. List the four broad classes of services that a transport protocol can provide. For each of the service classes, indicate if either UDP or TCP (or both) provides such a service.arrow_forwardPlease solve and show all work. What is the advantage of web caches, and how does it work?arrow_forward
- C++ Programming: From Problem Analysis to Program...Computer ScienceISBN:9781337102087Author:D. S. MalikPublisher:Cengage LearningC++ for Engineers and ScientistsComputer ScienceISBN:9781133187844Author:Bronson, Gary J.Publisher:Course Technology PtrNew Perspectives on HTML5, CSS3, and JavaScriptComputer ScienceISBN:9781305503922Author:Patrick M. CareyPublisher:Cengage Learning
- Programming Logic & Design ComprehensiveComputer ScienceISBN:9781337669405Author:FARRELLPublisher:CengageEBK JAVA PROGRAMMINGComputer ScienceISBN:9781337671385Author:FARRELLPublisher:CENGAGE LEARNING - CONSIGNMENTMicrosoft Visual C#Computer ScienceISBN:9781337102100Author:Joyce, Farrell.Publisher:Cengage Learning,




