Asked in: Flipkart
```python
def textQueries(sentences, queries):
    h = {}
    for index, sentence in enumerate(sentences):
        words = sentence.split()
        # ... rest of solution available after purchase
```
To solve this problem effectively, the first step is to clearly understand what the queries are asking and the constraints of exact word matching. Each query is a list of words, and for each query you need to find all sentences that contain every word from that query as an exact token. The result is a list of indices of sentences that satisfy this condition, or [-1] if none do.
Step 1: Understand the problem requirements
- You have an array of sentences, where each sentence is a string of words separated by spaces.
- You have an array of queries, each query is a string of one or more words separated by spaces.
- For each query, you want to find all sentence indices where every word in the query appears exactly (case-sensitive and exact match).
- If no sentence matches all words in a query, output [-1] for that query.
- Words in queries must match words in sentences exactly; partial matches and substrings do not count.
- You need to output a 2D list of matching indices for each query.
Step 2: Decompose the problem into manageable parts
- Splitting sentences and queries into lists of words is essential.
- You need a way to quickly determine if a sentence contains a certain word.
- Since queries can have multiple words, you must verify all words are present in the sentence.
- Doing this for each query by scanning all sentences repeatedly can be inefficient.
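To make the inefficiency concrete, here is a minimal brute-force sketch (hypothetical helper name and sample data; it is a presence-only check, so the occurrence-count refinement discussed in Step 5 is deliberately left out):

```python
def text_queries_naive(sentences, queries):
    """Brute force: rescan every sentence for every query.

    Roughly O(Q * S * W) work for Q queries, S sentences,
    and W words per sentence.
    """
    results = []
    for query in queries:
        q_words = query.split()
        # a sentence matches only if every query word is an exact token in it
        matches = [i for i, s in enumerate(sentences)
                   if all(w in s.split() for w in q_words)]
        results.append(matches if matches else [-1])
    return results

print(text_queries_naive(["i like tea", "we like cake", "no tea"],
                         ["like", "tea", "coffee"]))
# → [[0, 1], [0, 2], [-1]]
```

The repeated `s.split()` inside the inner loop is exactly the redundant work the inverted index below removes.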
Step 3: Consider data structures for fast lookup
- The major bottleneck is to determine for each query which sentences contain all query words.
- Preprocessing sentences to map each word to a list of sentence indices containing that word can drastically speed up lookups.
- For example, create a dictionary (hash map) where keys are words, and values are sets or lists of sentence indices that contain that word.
- This inverted index allows quick retrieval of candidate sentences for each query word.
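The preprocessing described above can be sketched as follows (the function name and sample sentences are illustrative, not from the original solution):

```python
from collections import defaultdict

def build_inverted_index(sentences):
    """Map each word to the set of indices of sentences containing it."""
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for word in sentence.split():
            index[word].add(i)
    return index

index = build_inverted_index(["i like tea", "we like cake"])
print(sorted(index["like"]))  # → [0, 1]
```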
Step 4: Process each query using the inverted index
- For each query, split it into words.
- For each word, get the set/list of sentence indices from the inverted index.
- To find sentences containing all words, compute the intersection of these sets.
- The intersection will give sentence indices that contain every word in the query.
- If intersection is empty, return [-1].
- If non-empty, return the list of indices in sorted order (to maintain order consistency).
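A minimal sketch of this intersection step, assuming the index maps each word to a set of sentence indices (the hand-built index literal here is hypothetical):

```python
def query_with_index(index, query):
    """Sorted sentence indices containing every query word, or [-1]."""
    word_sets = [index.get(w, set()) for w in query.split()]
    # an empty query or any word missing from the index means no match
    if not word_sets or not all(word_sets):
        return [-1]
    matched = set.intersection(*word_sets)
    return sorted(matched) if matched else [-1]

index = {"like": {0, 1}, "tea": {0, 2}}
print(query_with_index(index, "like tea"))  # → [0]
print(query_with_index(index, "coffee"))   # → [-1]
```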
Step 5: Handle duplicates as per example
- The example shows duplicate indices in the output when the same sentence contains the query word multiple times.
- However, the problem states "determine which sentences contain all the words," suggesting each sentence should appear once per query.
- But since the example includes duplicates (like [0, 1, 1] for the third query), this implies you must count the number of occurrences of each query word in each sentence.
- For instance, for the query word "like," sentence 1 has "like" twice, so the index 1 is repeated twice.
- Therefore, for each query word, count how many times it appears in each sentence.
- For each sentence that contains all words, determine the minimum count of any query word in that sentence (i.e., how many times the sentence contains all query words simultaneously).
- Add that sentence index to the result list as many times as the minimum count.
- This detail makes the problem more complex than just checking presence.
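The minimum-count idea for a single sentence can be sketched like this (hypothetical helper name and data; `Counter` returns 0 for absent words, which conveniently signals "does not contain all query words"):

```python
from collections import Counter

def match_multiplicity(sentence, query):
    """Minimum occurrence count of any query word in the sentence.

    A result of 0 means the sentence does not contain all query words;
    a result of k means its index is repeated k times in the output.
    """
    counts = Counter(sentence.split())
    return min(counts[w] for w in query.split())

print(match_multiplicity("i like cake and like tea", "like"))  # → 2
print(match_multiplicity("i like tea", "like coffee"))         # → 0
```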
Step 6: Efficient counting of word frequencies per sentence
- While preprocessing, instead of only storing sentence indices per word, store counts per sentence as well.
- For each word, maintain a dictionary mapping sentence index to the number of occurrences of that word in that sentence.
- Then, for each query, collect the count dictionaries for all query words.
- For each sentence index that appears in all query word dictionaries, find the minimum count across the query words.
- Add the sentence index to the result list that many times.
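The enriched preprocessing might look like this sketch (names and data are illustrative):

```python
from collections import defaultdict

def build_count_index(sentences):
    """Map word -> {sentence index: occurrences of that word there}."""
    index = defaultdict(lambda: defaultdict(int))
    for i, sentence in enumerate(sentences):
        for word in sentence.split():
            index[word][i] += 1
    return index

idx = build_count_index(["we like like cake", "i like tea"])
print(dict(idx["like"]))  # → {0: 2, 1: 1}
```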
Step 7: Edge cases and correctness
- If a query word is not found in any sentence, immediately return [-1].
- For words that appear in some sentences but not others, only sentences present in all query words’ dictionaries are candidates.
- Make sure to handle the case when a sentence has zero count for a word (meaning it is excluded).
- Take care to process queries that contain repeated words; for example, query "like like" requires sentences to contain at least two occurrences of "like".
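One way to handle repeated query words, sketched for a single sentence (this interprets the requirement as "at least as many occurrences as the query repeats the word"; the helper name and data are hypothetical):

```python
from collections import Counter

def satisfies_repeats(sentence, query):
    """True if the sentence contains each query word at least as many
    times as the query repeats it (e.g. "like like" needs two "like"s)."""
    s_counts = Counter(sentence.split())
    return all(s_counts[w] >= n for w, n in Counter(query.split()).items())

print(satisfies_repeats("i like it like that", "like like"))  # → True
print(satisfies_repeats("i like it", "like like"))            # → False
```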
Step 8: Summary of the approach
- Preprocess sentences:
- For each sentence, tokenize into words.
- For each word, update the inverted index dictionary with counts per sentence.
- For each query:
- Tokenize into words.
- For each query word, get the dictionary of sentence counts.
- Find sentences that appear in all query words’ dictionaries.
- For each such sentence, determine the minimum count of any query word.
- Add the sentence index to the result list that many times.
- If no sentence matches, return [-1].
- Return a 2D list containing results for each query.
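Putting the steps above together, here is a complete sketch keeping the original function name `textQueries` (the sample sentences and queries are hypothetical, chosen to exercise the duplicate-index case):

```python
from collections import defaultdict

def textQueries(sentences, queries):
    # Preprocess: word -> {sentence index: occurrence count}
    index = defaultdict(lambda: defaultdict(int))
    for i, sentence in enumerate(sentences):
        for word in sentence.split():
            index[word][i] += 1

    results = []
    for query in queries:
        words = query.split()
        per_word = [index.get(w, {}) for w in words]
        # empty query, or a word absent from every sentence: no match
        if not per_word or any(not d for d in per_word):
            results.append([-1])
            continue
        # candidates must appear in every query word's count dictionary
        candidates = set(per_word[0])
        for d in per_word[1:]:
            candidates &= set(d)
        row = []
        for i in sorted(candidates):
            # repeat the index by the minimum count across query words
            row.extend([i] * min(d[i] for d in per_word))
        results.append(row if row else [-1])
    return results

print(textQueries(["i like tea", "we like like cake", "no tea here"],
                  ["like", "tea", "missing"]))
# → [[0, 1, 1], [0, 2], [-1]]
```

Sentence 1 contains "like" twice, so its index appears twice for the query "like", matching the duplicate-index behavior described in Step 5.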
Step 9: Performance considerations
- Using dictionaries and sets allows for efficient lookups.
- The preprocessing step takes O(total words across all sentences).
- Query processing depends on the number of query words and how many sentences contain those words.
- Intersection and minimum count computations are efficient with proper data structures.
- The approach avoids scanning all sentences for every query and leverages precomputed data.
By carefully organizing the data and leveraging word counts per sentence, you can efficiently determine which sentences satisfy each query’s conditions, even accounting for multiple occurrences of query words within sentences. This approach balances time complexity and memory use, making it feasible for large inputs.