Project SayDo brings your favorite features from Google Home, Amazon Alexa, and Siri to OpenHome! Winner of the “Best Capability” Prize, this project enables the community to immediately leverage highly useful features on OpenHome.
Introducing Browser, or Project SayDo, the all-in-one capabilities bundle. Browser enables OpenHome to:
- Look up directions on Google Maps, including a choice of transport (walk, public transit, car)
- Call/text someone from your contacts
- Open an app on your phone
- Play/search music on Spotify
- Search movies on Netflix
- Search/play videos on YouTube
- Set a timer
Most people only use Amazon Alexa or Google Home to control the lights or play music. The full suite of capabilities enables users to completely replace their existing home speakers with OpenHome SpeakerOS. The capabilities in Browser represent some of the most practical and most-used smart speaker functions, and mark a huge milestone in the development of a voice-first OS.
Remember how Amazon had 10,000 people on their Alexa team and it still couldn't understand even the most basic queries?
Remember how AWFUL Siri still is?
Remember what a joke "Hey Google" still is?
Had you asked one of the tens of thousands of employees working on these…
Ben Buchanan (@01Core_Ben), March 1, 2024
Demo
About the Developer
Diego is the founder of Domo.ai, a San Francisco-based company building voice assistants for wearables. His background is in convolutional neural networks. Congrats to Diego on winning the “Best Capability” prize at the Building Voice Experiences Hackathon!
About the Project
Diego’s goal was to use OpenHome’s SDK to hack together a smart speaker with the same functionality as the existing smart speakers on the market, namely Google Home and Amazon Alexa. During the hackathon, which lasted about 8 hours, he was able to implement several capabilities, including making calls, sending texts, getting directions on Google Maps, and searching for YouTube videos (see his methodology below).
Methodology: Capability 101
Each capability has the following components:
- Definition: Define the keywords that will trigger the capability to be executed.
- Prompting: The program will guide the user through a series of prompts to get the information it needs to execute the function.
- Execution: The program will call the relevant API with the information the user requested.
Below is the code for how to implement the “search song on Spotify” capability:
Definition
import logging
import urllib.parse
import webbrowser

class SpotifySearchCapability(Capability):
    @classmethod
    def register_capability(cls):
        return cls(unique_name="spotify_music_search", hotwords=["play some music", "play some vibes"])
Each capability starts by defining a class for that specific capability. Within the class definition, the developer specifies `hotwords`: the keywords that OpenHome will listen for in order to trigger the capability.
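To make the trigger step concrete, here is a minimal sketch of how hotword matching could work. The `match_capability` helper, the `HOTWORDS` dictionary, and the `set_timer` entry are hypothetical and exist only for illustration; OpenHome's actual matching logic is not shown in this post and may differ.

```python
from typing import Optional

# Hypothetical registry mapping capability names to their hotwords.
# The first entry mirrors the Spotify example; the second is made up.
HOTWORDS = {
    "spotify_music_search": ["play some music", "play some vibes"],
    "set_timer": ["set a timer"],
}


def match_capability(utterance: str) -> Optional[str]:
    """Return the name of the first capability whose hotword appears in the utterance."""
    text = utterance.lower()
    for name, hotwords in HOTWORDS.items():
        if any(hotword in text for hotword in hotwords):
            return name
    return None
```

Substring matching like this is the simplest option. It also means hotwords should be specific enough that they do not show up in unrelated sentences.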
Prompt Scripting
    def call(self, agent):
        # Ask the user what they want to hear
        initial_message = "Tell me the mood or type of music you're interested in."
        agent.speak(response=initial_message)
        music_request = agent.listen().strip()
        if not music_request:
            agent.speak(response="I didn't catch that. Could you please repeat?")
            return "Failed to get music request."
        # Ask the LLM to turn the request into the name of a single song
        prompt = "Produce the name of one song, just and only one song, according to what the user wants, and do not say anything else, just the name of the song: \n\n" + music_request
        suggestion = text_to_text(prompt)
        if not suggestion:
            agent.speak(response="I couldn't find any good matches for your request.")
            return "No suggestions generated."
The capability then prompts the user for more information about their song request. It sends that request to the LLM, asking it to interpret what the user wants and return the name of a single song.
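The prompting step can be split into two small, testable pieces: building the prompt and tidying up whatever the LLM sends back. The sketch below assumes the LLM returns plain text that may contain extra lines or quotation marks. Both helper names (`build_song_prompt`, `clean_song_name`) are hypothetical and not part of the OpenHome SDK.

```python
def build_song_prompt(music_request: str) -> str:
    """Combine the fixed instructions with the user's request."""
    instructions = (
        "Produce the name of one song, just and only one song, according to "
        "what the user wants, and do not say anything else, just the name of "
        "the song:"
    )
    return f"{instructions}\n\n{music_request.strip()}"


def clean_song_name(llm_output: str) -> str:
    """Keep the first non-empty line and strip surrounding quotes and spaces."""
    for line in llm_output.splitlines():
        cleaned = line.strip().strip("\"'“”").strip()
        if cleaned:
            return cleaned
    return ""
```

Cleaning the output this way keeps a stray explanation or a quoted title from ending up in the search query.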
API Integration
        # Use the LLM's suggested song as the search query
        search_query = suggestion
        search_url = f"https://open.spotify.com/search/{urllib.parse.quote_plus(search_query)}"
        try:
            # Attempt to open the Spotify search results in a new browser tab
            webbrowser.open(search_url, new=2)
            logging.info(f"Searched Spotify for: {search_query}")
            agent.speak(response=f"Trying to play {search_query} on Spotify.")
            return f"Searched for {search_query} on Spotify."
        except Exception as e:
            logging.error(f"Failed to search Spotify for {search_query}: {e}")
            agent.speak(response="Failed to search on Spotify. Please try again.")
            return "Failed to initiate Spotify search."
Once the song name is identified, it is a simple matter of searching for it on Spotify. Browser’s capabilities emphasize opening web browsers to execute the function, so rather than calling the Spotify API directly, Diego’s implementation builds a Spotify search URL for the song and opens it in a new browser tab.
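The same open-a-URL pattern extends naturally to other capabilities in the bundle, such as directions. The sketch below shows URL builders only, with no browser call, so they are easy to test. It is an illustration, not Diego's code: the function names and the `TRAVEL_MODES` mapping are hypothetical, the directions URL follows the public Google Maps URLs scheme, and the Spotify builder uses `quote` where the original snippet uses `quote_plus`, on the assumption that percent-encoded spaces are the safer choice inside a URL path.

```python
from urllib.parse import quote, urlencode

# Transport choices from the capability list, mapped to the values that the
# Google Maps URLs scheme accepts for its `travelmode` parameter.
TRAVEL_MODES = {"walk": "walking", "public transit": "transit", "car": "driving"}


def spotify_search_url(query: str) -> str:
    """Build a Spotify web search URL; the query is part of the URL path."""
    return f"https://open.spotify.com/search/{quote(query, safe='')}"


def maps_directions_url(origin: str, destination: str, transport: str = "car") -> str:
    """Build a Google Maps directions URL for the given transport choice."""
    params = {
        "api": "1",
        "origin": origin,
        "destination": destination,
        "travelmode": TRAVEL_MODES[transport],
    }
    return "https://www.google.com/maps/dir/?" + urlencode(params)
```

Either URL can then be handed to `webbrowser.open(url, new=2)`, the same call used in the Spotify capability.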
Build with OpenHome
OpenHome is an open-source AI smart speaker. From creating novel personalities to building useful capabilities, our voice operating system enables developers to build epic voice-first experiences.
Join the vibrant open-source community today and explore the future of voice AI. Start building everything from custom personalities, to truly intelligent smart home assistants, to voice-enabled apps.
Apply for a $10,000 grant today!
- Apply for a $10,000 grant to build an app!
- Join our Discord!
- Check out our Website
- Get access to our GitHub