miriad/miriad-4.4M
How to use yasserrmd/oncology-gemma-300m-emb with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("yasserrmd/oncology-gemma-300m-emb")
sentences = [
    "What are the criteria for evaluating the therapeutic efficacy of tumor treatment?\n",
    "Detecting the sentinel node in gastric cancer is challenging due to the complex lymphatic drainage of the stomach. The lymphatic network in the stomach is considerably more complex than that of ectodermal organs like breast and skin, making it difficult to identify the sentinel node accurately. This complexity is attributed to the complex embryological development of the stomach.",
    "The criteria for evaluating the therapeutic efficacy of tumor treatment include measuring and calculating the sum of the longest diameter of all target lesions and comparing it with the baseline sum of longest diameters. The objective tumor evaluation criteria include complete remission, partial remission, progressive disease, and stable disease.",
    "Low levels of GAS7C mRNA expression have been frequently detected in lung cancer samples, particularly in stage IV and metastatic patients. This suggests an association between low GAS7C expression and cancer progression. Additionally, low GAS7C expression has been correlated with poorer survival in late-stage lung cancer patients from Asian and Caucasian populations. These findings indicate that GAS7C may serve as a prognostic biomarker in lung cancer patients with metastasis. Furthermore, it is possible that GAS7C may act as a metastasis suppressor in other types of cancer."
]
# Encode each sentence into a 768-dimensional embedding
embeddings = model.encode(sentences)
# Pairwise similarity matrix between all sentences
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([4, 4])
```

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
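For intuition about what `model.similarity` returns, here is a minimal pure-Python sketch of cosine similarity, assuming the model uses the default cosine similarity function (the training loss is also configured with `cos_sim`). The helpers `cosine_similarity` and `similarity_matrix` are hypothetical illustrations, not part of the library, and the 3-dimensional toy vectors stand in for real 768-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """All pairwise cosine similarities, analogous in shape to model.similarity(e, e)."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Toy 3-dimensional vectors standing in for 768-dimensional embeddings
toy = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 2.0]]
sims = similarity_matrix(toy)
print(len(sims), len(sims[0]))  # 3 3
print(round(sims[0][1], 2))     # 0.6
```

Because the model's pipeline ends with a `Normalize()` module, its embeddings have unit length, so for this model cosine similarity and a plain dot product give the same scores.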
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
```
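The module list reads as a pipeline: token embeddings from the Gemma3 transformer are mean-pooled over non-padding positions (`pooling_mode_mean_tokens: True`), projected 768 → 3072 → 768 by two bias-free `Dense` layers with identity activation, and then L2-normalized. The pure-Python sketch below mirrors those steps on toy dimensions (4 → 8 → 4 instead of 768 → 3072 → 768) with random weights. It illustrates the computation only; it is not the library's implementation, and all function names are hypothetical.

```python
import math
import random


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, counting only positions where the mask is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total]


def dense_no_bias(x, weight):
    """y = W x with identity activation; weight is [out_features][in_features]."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def l2_normalize(x):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden, expanded = 4, 8  # toy stand-ins for 768 and 3072
w1 = random_matrix(expanded, hidden, rng)  # Dense(4 -> 8), no bias
w2 = random_matrix(hidden, expanded, rng)  # Dense(8 -> 4), no bias

tokens = [[rng.gauss(0.0, 1.0) for _ in range(hidden)] for _ in range(5)]
mask = [1, 1, 1, 0, 0]  # the last two positions are padding

pooled = mean_pool(tokens, mask)
embedding = l2_normalize(dense_no_bias(dense_no_bias(pooled, w1), w2))
print(len(embedding), round(sum(v * v for v in embedding), 6))  # 4 1.0
```

Because neither projection applies a non-linearity, the two `Dense` layers compose into a single 768 × 768 linear map at inference time.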
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("yasserrmd/oncology-gemma-300m-emb")

# Run inference
queries = [
    "What are the current standard treatments for glioblastoma multiforme (GBM) and why is recurrence almost unavoidable?\n",
]
documents = [
    'The current standard treatment for GBM includes surgery, radiotherapy, and chemotherapy. However, complete surgical resection is not possible, and GBM is resistant to chemotherapy, including the commonly used drug temozolomide (TMZ). This resistance and the inability to completely remove the tumor during surgery contribute to the high recurrence rate of GBM.',
    'The overexpression of GALNT2 in oral squamous cell carcinoma (OSCC) cells can promote their invasive potential. GALNT2 modifies the O-glycosylation of proteins and increases the activity of epidermal growth factor receptor (EGFR), which plays a crucial role in the invasive behavior of OSCC cells. This suggests that GALNT2 may be involved in the occurrence and development of OSCC.',
    'The main mechanisms responsible for oncogene-mediated drug resistance in ovarian cancer include deregulation of apoptosis, altered phosphorylation (intracellular signaling), and metabolic pathways. Activation of the PI3K/AKT cell survival pathway, as well as deregulation of growth factor receptors mediated by NF-kB and STAT3, plays a pivotal role in drug resistance. Additionally, alterations in DNA damage and repair mechanisms, impaired apoptotic machinery, and epithelial-to-mesenchymal transition (EMT) have been implicated in drug resistance. Wnt signaling, particularly the β-catenin-independent pathway via Wnt5a/ROR1/ROR2, is also involved in EMT and chemoresistance. Targeting these pathways may offer potential means to overcome drug resistance in ovarian cancer.',
]
# encode_query / encode_document require sentence-transformers v5 or later
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.7010, 0.0508, -0.0444]])
```
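For semantic search, a row of similarity scores is typically turned into a ranked list of documents. A minimal sketch, reusing the three scores printed above as input; `rank_documents` is a hypothetical helper and the short document labels are stand-ins for the full passages. The library also provides `sentence_transformers.util.semantic_search` for the same task on embedding tensors.

```python
def rank_documents(scores, documents, top_k=None):
    """Return (score, document) pairs sorted from most to least similar."""
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k] if top_k is not None else ranked


# Scores for the single GBM query against the three documents, copied from the output above
scores = [0.7010, 0.0508, -0.0444]
docs = ["GBM treatment", "GALNT2 in OSCC", "Drug resistance in ovarian cancer"]

for score, doc in rank_documents(scores, docs, top_k=2):
    print(f"{score:.4f}  {doc}")
# 0.7010  GBM treatment
# 0.0508  GALNT2 in OSCC
```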
Columns: `sentence_0` and `sentence_1`

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | | |
Samples:

| sentence_0 | sentence_1 |
|---|---|
| Is there a way to prevent PTLD in high-risk patients? | Currently, there is no convincing data for the prophylaxis of PTLD. However, the case mentioned suggests that early use of rituximab after HSCT (Hematopoietic Stem Cell Transplantation) could be a good way to prevent PTLD in high-risk patients, especially those who are serum EBV (Epstein-Barr Virus) positive. Early recognition of PTLD, early lymph node biopsy, and early diagnosis are key factors in the successful treatment of PTLD. |
| How does the 34-gene 'CTC profile' contribute to the prognostic power of breast cancer patients? | The 34-gene 'CTC profile' has been found to be predictive of CTC status in breast cancer patients. It demonstrated a classification accuracy of 82% in the training cohort and 67% in an independent microarray dataset. Furthermore, it has been shown to be prognostic in both independent datasets, with a hazard ratio (HR) of 10 in the first validation dataset and a HR of 3.2 in the second validation dataset. Importantly, multivariate analysis confirmed that the CTC profile provided prognostic information independent of other clinical variables in both patient cohorts. |
| How are beauty care services for cancer patients organized and provided? | Beauty care services for cancer patients are not standardized or evaluated and vary from one establishment to another. In the case of the IGR, consultations on image advice and socio-aesthetics are provided by a socio-aesthetician who has been trained as a personal image advisor. These consultations are offered to women with breast cancer or young adults and adolescents with cancer who are referred by medical units. The consultations take place in a dedicated area with three rooms: an office, make-up parlor, and beauty care salon. Patients are usually seen multiple times during their treatment period. The socio-aesthetician is paid by the hospital and is part of the Onco-hematology Interdisciplinary Supportive Care Directorate. |
Loss: `MultipleNegativesRankingLoss` with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
```
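`MultipleNegativesRankingLoss` treats each (`sentence_0`, `sentence_1`) pair in a batch as a positive and every other `sentence_1` in the same batch as a negative: the cosine similarities are multiplied by `scale` (20.0 here) and scored with cross-entropy, where the matching index is the target. The pure-Python sketch below follows that description on 2-dimensional toy vectors; it is not the library's code, and the function names are hypothetical.

```python
import math


def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy in which anchor i should score highest against positive i."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Every positive in the batch is a candidate; only index i is correct
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.1], [0.1, 1.0]]
aligned = [[0.9, 0.0], [0.0, 0.9]]   # each positive matches its own anchor
shuffled = [[0.0, 0.9], [0.9, 0.0]]  # positives swapped between pairs
aligned_loss = multiple_negatives_ranking_loss(anchors, aligned)
shuffled_loss = multiple_negatives_ranking_loss(anchors, shuffled)
print(aligned_loss < shuffled_loss)  # True
```

Larger batches supply more in-batch negatives per anchor, which is one reason this loss is commonly paired with the largest batch size that fits in memory (here `per_device_train_batch_size` is 4).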
Non-default hyperparameters:

- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

All hyperparameters:

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training logs:

| Epoch | Step | Training Loss |
|---|---|---|
| 0.1 | 500 | 0.0144 |
| 0.2 | 1000 | 0.0293 |
| 0.3 | 1500 | 0.0128 |
| 0.4 | 2000 | 0.0153 |
| 0.5 | 2500 | 0.0182 |
| 0.6 | 3000 | 0.008 |
| 0.7 | 3500 | 0.0098 |
| 0.8 | 4000 | 0.0044 |
| 0.9 | 4500 | 0.0024 |
| 1.0 | 5000 | 0.0019 |
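With `lr_scheduler_type: linear`, `warmup_steps: 0`, and `learning_rate: 5e-05`, the learning rate decays linearly from 5e-05 toward zero over training. The sketch below assumes 5000 total optimizer steps (the last logged step, at epoch 1.0) and follows the usual linear-warmup-then-linear-decay formula; `linear_schedule_lr` is a hypothetical helper, not the Trainer's own function.

```python
def linear_schedule_lr(step, base_lr=5e-05, total_steps=5000, warmup_steps=0):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero.

    total_steps=5000 is an assumption taken from the last logged step.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1000, 2500, 5000):
    print(step, linear_schedule_lr(step))
```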
Sentence Transformers:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = {Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
    author = {Reimers, Nils and Gurevych, Iryna},
    booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
    month = {11},
    year = {2019},
    publisher = {Association for Computational Linguistics},
    url = {https://arxiv.org/abs/1908.10084},
}
```

MultipleNegativesRankingLoss:

```bibtex
@misc{henderson2017efficient,
    title = {Efficient Natural Language Response Suggestion for Smart Reply},
    author = {Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year = {2017},
    eprint = {1705.00652},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}
```
Base model: google/embeddinggemma-300m