Whiteboard: An Effortless Smart Notetaker on OpenHome

We are thrilled to spotlight this community project from developer Seena Pourzand and introduce Whiteboard, your intelligent voice-based notetaker.

A whiteboard session with collaborators is one of the best ways to get ideas from mind to paper. But there isn’t always a whiteboard or paper nearby, and the speed of conversation often outpaces the ability to write ideas down.

Collaborative note-taking and brainstorming apps like Miro and Apple’s Freeform have exploded in popularity, but they are still largely manual and keep users behind a screen, which is not conducive to true collaboration and limitless creativity.

Demo

About the Project

Whiteboard is your intelligent AI notetaker. It uses the power of large language models to summarize complex ideas and draw out their main points, and the ability of generative AI to turn natural language into rich images, to enhance collaboration and improve your note-taking experience.

About the Developer

Seena studied Computer Science at the University of Southern California. He is an active member of the OpenHome community and has been hacking on voice-powered therapist chatbots.

Whiteboard: Design

Google API integration

Python
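import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Capability is the OpenHome base class; its import is not shown in the original post.

# Assumed values: the original post does not show these constants.
TOKEN_FILE = 'token.pickle'
CLIENT_SECRETS_FILE = 'credentials.json'
SCOPES = ['https://www.googleapis.com/auth/documents']


class WhiteboardCapability(Capability):

    # ID of the Google Doc that Whiteboard writes into.
    document_id: str = '1hF8UWkIfZ79sSDcL3lcUa86YSLoTzysID54cAoGIQwA'

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.initialize_google_services()

    def initialize_google_services(self):
        """Initializes Google Docs API service."""
        creds = self.get_credentials()
        # Set through __dict__ as in the original, presumably to sidestep
        # attribute checks on the base class.
        self.__dict__['docs_service'] = build('docs', 'v1', credentials=creds)

    def get_credentials(self):
        """Handles the OAuth 2.0 flow for Google API credentials."""
        creds = None
        # Reuse cached credentials from a previous run, if any.
        if os.path.exists(TOKEN_FILE):
            with open(TOKEN_FILE, 'rb') as token:
                creds = pickle.load(token)

        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                # Expired but refreshable: no user interaction needed.
                creds.refresh(Request())
            else:
                # First run or unusable token: open the browser consent flow.
                flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRETS_FILE, SCOPES)
                creds = flow.run_local_server(port=0)

            # Cache the new or refreshed credentials for the next run.
            with open(TOKEN_FILE, 'wb') as token:
                pickle.dump(creds, token)

        return creds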

To begin a Whiteboard session, create a new Google Doc and copy its unique document ID. Whiteboard takes that ID and auto-populates the document with content from the user. The code above shows how to handle Google API credentials.
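The document ID is the long token that appears after `/document/d/` in a Google Docs URL. As a minimal sketch, the helper below pulls it out with a regular expression; the function name `extract_document_id` is hypothetical and not part of Whiteboard or the OpenHome SDK.

```python
import re
from typing import Optional

# Google Docs URLs embed the document ID right after "/document/d/".
_DOC_ID_PATTERN = re.compile(r"/document/d/([a-zA-Z0-9_-]+)")


def extract_document_id(url: str) -> Optional[str]:
    """Return the document ID embedded in a Google Docs URL, or None.

    Hypothetical helper for illustration only.
    """
    match = _DOC_ID_PATTERN.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    url = ("https://docs.google.com/document/d/"
           "1hF8UWkIfZ79sSDcL3lcUa86YSLoTzysID54cAoGIQwA/edit")
    print(extract_document_id(url))
```

The returned string is what would be assigned to `document_id` on the capability class.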

Prompting: Note-Taking

Python
summarization = "Could you please provide a concise and comprehensive summary of the given text? The summary should capture the main points and key details of the text while conveying the author's intended meaning accurately. Please ensure that the summary is well-organized and easy to read, with clear headings and subheadings to guide the reader through each section. The length of the summary should be appropriate to capture the main points and key details of the text, without including unnecessary information or becoming overly long. Summarize the following: "

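# Inside a WhiteboardCapability method; msgs holds the conversation transcript.
text = text_to_text(summarization + msgs)
print(f"Attempting to append text to document {self.document_id}: '{text}'")
requests = [
    {
        'insertText': {
            # Index 1 is the start of the document body, so the newest
            # notes are inserted at the top of the document.
            'location': {
                'index': 1,
            },
            'text': f'{text}\n',
        },
    },
]
response = self.docs_service.documents().batchUpdate(
    documentId=self.document_id, body={'requests': requests}).execute()

print("Append text response:", response)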

To turn conversation into notes, Whiteboard prompts the LLM to summarize a block of text. That block is the transcription of a conversation, usually a brainstorming session or a discussion.
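The prompt assembly and the Docs request can be separated into two small pure functions, which makes them easy to test without calling the LLM or the Google API. This is a sketch under assumptions: `build_summary_prompt`, `build_insert_text_requests`, and the shortened `SUMMARY_INSTRUCTION` are hypothetical names, not Whiteboard's own. The `at_end` option uses the Docs API's `endOfSegmentLocation` field to append to the end of the body instead of inserting at index 1.

```python
from typing import Any, Dict, List

# Shortened stand-in for the full summarization prompt shown above.
SUMMARY_INSTRUCTION = "Summarize the following: "


def build_summary_prompt(transcript: str,
                         instruction: str = SUMMARY_INSTRUCTION) -> str:
    """Concatenate the instruction and the (trimmed) transcript."""
    return instruction + transcript.strip()


def build_insert_text_requests(text: str,
                               at_end: bool = False) -> List[Dict[str, Any]]:
    """Build the batchUpdate request list for one block of notes.

    at_end=False inserts at index 1 (top of the body), as in the post.
    at_end=True appends to the end of the body; an empty segmentId
    refers to the document body.
    """
    if at_end:
        where: Dict[str, Any] = {"endOfSegmentLocation": {"segmentId": ""}}
    else:
        where = {"location": {"index": 1}}
    return [{"insertText": {**where, "text": f"{text}\n"}}]


if __name__ == "__main__":
    print(build_summary_prompt("  We agreed to ship the demo on Friday.  "))
    print(build_insert_text_requests("Ship the demo on Friday.", at_end=True))
```

The resulting list can be passed as `body={'requests': ...}` to `documents().batchUpdate(...)` in the same way as the snippet above.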

Generate Images in your Notes

Python
def insert_image_to_doc(self, image_url):
        """Inserts the image found at image_url at the top of the Google Doc."""
        requests = [
            {
                "insertInlineImage": {
                    "uri": image_url,
                    "location": {
                        "index": 1,
                    },
                    "objectSize": {
                        "height": {
                            "magnitude": 200,
                            "unit": "PT"
                        },
                        "width": {
                            "magnitude": 200,
                            "unit": "PT"
                        }
                    }
                },
            },
            {
                "updateParagraphStyle": {
                    "range": {
                        "startIndex": 1,
                        "endIndex": 3,  
                    },
                    "paragraphStyle": {
                        "alignment": "CENTER"
                    },
                    "fields": "alignment"
                }
            }
        ]

        response = self.docs_service.documents().batchUpdate(
            documentId=self.document_id, body={"requests": requests}).execute()

        print("Insert image response:", response)

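# Elsewhere in the capability: turn the user's spoken request (msgs)
# into a description that can be fed to the image generator.
image_prompt = "The following message is meant to be translated into a concise yet descriptive description of whatever is being said, meant to be fed as a prompt for AI-powered image generation. Here is the message: "

image_descript = text_to_text(image_prompt + msgs)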

In a similar fashion, users can dictate an idea for an image – “Can you sketch me a Figma mockup of an iOS app showing the locations of the nearest hackathons” – and Whiteboard puts the generated image in the document too.
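The image flow has three steps: rewrite the spoken request as an image prompt, generate an image, and build the Docs requests that place it. The sketch below wires those steps together with injected callables so it runs without any external service. It is an illustration under assumptions: `describe` stands in for `text_to_text`, `generate` is a hypothetical image generator that returns a URL, and the function names are not Whiteboard's own.

```python
from typing import Any, Callable, Dict, List

IMAGE_PROMPT = (
    "The following message is meant to be translated into a concise yet "
    "descriptive prompt for AI-powered image generation. "
    "Here is the message: "
)


def build_image_requests(image_url: str,
                         size_pt: int = 200) -> List[Dict[str, Any]]:
    """Build requests that insert a square image at the top and center it."""
    side = {"magnitude": size_pt, "unit": "PT"}
    return [
        {
            "insertInlineImage": {
                "uri": image_url,
                "location": {"index": 1},
                "objectSize": {"height": side, "width": side},
            }
        },
        {
            "updateParagraphStyle": {
                "range": {"startIndex": 1, "endIndex": 3},
                "paragraphStyle": {"alignment": "CENTER"},
                "fields": "alignment",
            }
        },
    ]


def utterance_to_image_requests(
    utterance: str,
    describe: Callable[[str], str],   # stand-in for text_to_text
    generate: Callable[[str], str],   # hypothetical: prompt -> image URL
) -> List[Dict[str, Any]]:
    """Run the three steps: describe, generate, build Docs requests."""
    description = describe(IMAGE_PROMPT + utterance)
    image_url = generate(description)
    return build_image_requests(image_url)


if __name__ == "__main__":
    # Stubs so the sketch runs offline; real code would call the LLM
    # and an image-generation service here.
    fake_describe = lambda prompt: "iOS map mockup of nearby hackathons"
    fake_generate = lambda desc: "https://example.com/mockup.png"
    requests = utterance_to_image_requests(
        "Can you sketch me a mockup of an iOS app?", fake_describe, fake_generate
    )
    print(requests[0]["insertInlineImage"]["uri"])
```

Passing the callables in as arguments keeps the request-building logic testable on its own, separate from whichever image service is used.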

Future Roadmap

  • Multiple collaborators: One of the best parts of Google Docs is the ability to collaborate in real time. Future versions of Whiteboard will take advantage of this capability and attribute notes to each user.
  • Intelligent prompting for note-taking: The current Whiteboard prototype simply summarizes the user’s conversation. Future versions would integrate pedagogical and note-taking techniques – like Socratic seminar, Cornell notes, active recall – and develop prompting infrastructure to generate more structured notes.
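One way the structured-notes idea could look is a small registry of prompt templates keyed by technique. This is purely a sketch of a possible direction, not the planned design: the template wording, the `NOTE_TEMPLATES` dictionary, and `build_notes_prompt` are all hypothetical.

```python
from typing import Dict

# Hypothetical prompt templates, one per note-taking technique.
NOTE_TEMPLATES: Dict[str, str] = {
    "summary": "Summarize the following conversation: ",
    "cornell": (
        "Rewrite the following conversation as Cornell notes with three "
        "sections titled Cues, Notes, and Summary: "
    ),
    "active_recall": (
        "Turn the following conversation into question-and-answer pairs "
        "that test recall of its key points: "
    ),
}


def build_notes_prompt(transcript: str, technique: str = "summary") -> str:
    """Pick the template for a technique and attach the transcript."""
    try:
        template = NOTE_TEMPLATES[technique]
    except KeyError:
        known = ", ".join(sorted(NOTE_TEMPLATES))
        raise ValueError(
            f"Unknown technique {technique!r}; expected one of: {known}"
        ) from None
    return template + transcript.strip()


if __name__ == "__main__":
    print(build_notes_prompt("We compared three onboarding flows.", "cornell"))
```

The returned prompt would be sent to the LLM in place of the single summarization prompt used by the current prototype.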

Building with OpenHome

OpenHome is an open-source AI smart speaker. From creating novel personalities to building useful capabilities, our voice operating system enables developers to build epic voice-first experiences.

Join the vibrant open-source community today and explore the future of voice AI. Start building everything from custom personalities, to truly intelligent smart home assistants, to voice-enabled apps.

Join our Discord!

Apply for a $10,000 grant today!