function ResearchAloeVera() {
    return <section className="container section">
        <h2 className="is-size-3 mb-5">Motivation</h2>
        <div className="columns is-multiline mb-4 display-linebreak">
            <p align="justify">
            This project aims to create a Vision Language Model (VLM) for the medical domain that provides
            high-quality textual answers to medical questions. This includes the ability to handle medical images and
            bounding-box localization as both inputs and outputs. The model is intended to be proficient in a wide variety of languages, open, free to use,
            reliable, and aligned with fundamental ethical principles.
            <br /><br />
            The medical domain is one where AI is highly useful and critical. 
            Through foundational and multimodal models, AI can provide solutions capable of processing a wide variety of inputs
            (radiology such as CT scans and MRIs, visible-light photography such as dermatology and endoscopy, microscopy, and others), while offering
            users an intuitive means of interaction through conversational Large Language Models (LLMs). Adding localization
            skills for both inputs and outputs increases the number of tasks the model can be applied to and improves the
            interpretability and reliability of its results. Among similar open options within the family of VLMs, LLaVA-Med and
            Med-Flamingo lack the localization component, which is key in the medical domain. Med-PaLM M seems to be the closest model,
            but unfortunately it is entirely private. An open and accessible medical VLM could have a significant
            impact on decision support for healthcare, particularly in places where access to clinical experts is limited.
             </p>
        </div>
        <h2 className="is-size-3 mb-5">Aloe and Aloe Vera</h2>
        <div className="columns is-multiline mb-4 display-linebreak">
            <p align="justify">
            In a coordinated effort to push for more accurate open-source medical models, we introduce Aloe and Aloe Vera:


            <ul>
                <li><strong>Aloe</strong>, our dedicated <strong>Medical Language Model (MLM)</strong>, processes textual information and provides insightful answers to complex medical questions.</li>
                <li><strong>Aloe Vera</strong>, the <strong>Medical Vision Language Model (Med-VLM)</strong>, is a multimodal model that goes further by incorporating medical images, bounding-box localization, and improved comprehension, using Aloe as its core MLM.</li>
            </ul>

            <br /><br />

            <figure>
                <embed src="/images/work/aloevera_poster.png" className="responsive-image"/>
            </figure>

            </p>
        </div>
        <h2 className="is-size-3 mb-5">Architecture</h2>
        <div className="columns is-multiline mb-4 display-linebreak">
            <p align="justify">
            Following the standard VLM architecture based on LLaVA, we consider two main components:
            <ul>
                <li>The image encoder extracts visual representations from images, which are then aligned with textual representations, enabling multimodality.</li>
                <li>The language model handles both the textual information and the visual representations. Since text is the binding modality,
                the LLM plays a central role.</li>
            </ul>

            <br /><br />
            <figure>
                <center>
                    <embed src="/images/work/LLaVA-network-architecture.png" className="responsive-image"/>
                    <figcaption>Liu, Haotian, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. "Visual instruction tuning." arXiv preprint arXiv:2304.08485 (2023).</figcaption>
                </center>
            </figure>
            <br /><br />
            Several recent multimodal models demonstrate localization capabilities by outputting bounding boxes or segmentation
            masks alongside the textual response. Despite promising results, none of these models is applicable out of the box to the medical
            domain. We address this by training the model on text that already contains bounding boxes.
            Our model architecture is closely inspired by LLaVA, with some specific changes.
            </p>
        </div>
        <h2 className="is-size-3 mb-5">Data</h2>
        <div className="columns is-multiline mb-4 display-linebreak">
            <p align="justify">
            One of our main priorities has been gathering a significant amount of reliable data. We do
            this by searching for publicly accessible medical datasets and applying a combination of manual and automated processing
            techniques (cleaning, formatting, templating, etc.) to build our final datasets, so that the collected data is of the highest possible quality.
            <br /><br />
            In our effort to increase the potential of medical VLMs, we have curated specialized datasets for both Aloe and Aloe Vera. The former includes text for general-purpose question answering, medical question answering, and Direct Preference Optimization (DPO); the latter includes bounding
            boxes, images, and text for visual question answering. We believe these datasets, once released, will contribute to more refined and accurate results in
            the medical context.
            </p>
        </div>
        <h2 className="is-size-3 mb-5">Evaluation</h2>
        <div className="columns is-multiline mb-4 display-linebreak">
            <p align="justify">
            Regarding evaluation, we have designed a comprehensive strategy that encompasses several medical-specific benchmarks, a general
            multimodal benchmark, and a set of further benchmarks that together form an LLM evaluation strategy covering bias, toxicity, fairness, and other essential safety issues.
            This provides a holistic view of the overall performance and potential harm of the model.
            <br /><br />
            Specifically for the medical evaluation, we use a range of text-based and multimodal data with the aim of assessing the following tasks:
            Question Answering, Visual Question Answering, Report Summarization, Report Generation and Image Classification. Following 
            the existing literature, we employ several quantitative metrics to evaluate the responses to open and closed-ended
            questions.
            <br /><br />
            In addition, we are carrying out a thorough <strong>human evaluation</strong>, which consists of collecting pairwise comparisons between our model's answers and those of other models along several axes, as well as medical assessments of the generated outputs.
            This will involve collaboration with many doctors and medical experts from different hospitals and health institutions in Catalonia. We also aim to run a final lay-person evaluation to assess the quality of the answers generated by our model.
            </p>
        </div>
        <h2 className="is-size-3 mb-5">Alignment</h2>
        <div className="columns is-multiline mb-4 display-linebreak">
            <p align="justify">
            We are committed to ensuring that our advancements are ethical and responsible. It's not just about innovation for us; 
            it's about making AI a positive force in healthcare. We are aware of the growing array of
            responsibilities involved, ranging from the cost and footprint associated with the training of large models to copyright conflicts
            within training data and the critical risks associated with a model applied to clinical data.
            <br /><br />
            Concerning the proposed model(s), we have already compiled an extensive list of datasets with friendly licenses, avoiding
            those with privacy issues. We meticulously curate datasets, address potential biases, and have mechanisms in place to coordinate with the local Data Protection Officer in compliance with GDPR.
            Furthermore, we acknowledge that for any multimodal medical model to be useful, extensive and careful work on alignment must
            be carried out. We intend to make this one of the pillars of our project, conducting extensive reinforcement learning with 
            AI feedback (RLAIF) and policy-driven alignment.
            <br /><br />
            We're also working on multimodal alignment with specialized DPO techniques and proper red-teaming.
            </p>

        </div>
        <h2 className="is-size-3 mb-5">Preliminary Results</h2>
        <div className="columns is-multiline mb-4 display-linebreak">
            <p align="justify">
            We compared Aloe v0.1, our latest version of the Medical LLM, with other popular general and medical LLMs. Our model achieves
            competitive performance and outperforms existing 7B open models. These results are from the early alpha version, and we expect them to change in the future.
            We're looking forward to sharing more details about our model in the upcoming weeks, once it has gone through human evaluation and red-teaming.
            <br /><br />
            <center>
                <embed src="/images/work/aloe_results.png" className="responsive-image"/>
            </center>
            </p>
        </div>

        <div className="box-container">
            <a href="https://aloe.atalaya.at/" className="box-demo"><strong>VLM Demo</strong></a>
        </div>
        
        
    </section>;
}

export default ResearchAloeVera;
