(modal) fme:/mnt/c/temp/modal$ modal run openllama.py
✓ Initialized. View app at https://modal.com/apps/ap-9...
✓ Created objects.
├── 🔨 Created download_models.
├── 🔨 Created mount /mnt/c/temp/modal/openllama.py
├── 🔨 Created OpenLlamaModel.generate.
└── 🔨 Created mount /mnt/c/temp/modal/openllama.py
Downloading shards: 100%|██████████| 2/2 [00:00<00:00, 1733.54it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:12<00:00, 6.23s/it]
Building a website can be done in 10 simple steps:
1. Choose a domain name. 2. Choose a web hosting service. 3. Choose a web hosting package. 4. Choose a web hosting plan. 5. Choose a web hosting package. 6. Choose a web hosting plan. 7. Choose a web hosting package. 8. Choose a web hosting plan. 9. Choose a web hosting package. 10. Choose a web hosting plan. 11. Choose a web hosting package. 12. Choose a web hosting package. 13. Choose a web hosting package. 14. Choose a web hosting
✓ App completed.

2-3 cents per run seems very high. That's probably the cost of spinning up a new container. If a container typically serves just one request, you can shorten its idle timeout. If it serves more requests, the startup cost and the idle time before shutdown are amortized over all of them :)
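To make the amortization point concrete, here is a minimal Python sketch. It assumes a simple cost model that is not Modal's actual billing: a fixed cold-start cost, a fixed compute cost per request, and a single idle period of `idle_timeout_s` seconds after the last request before the container shuts down. The `ContainerCosts` class, the `cost_per_request` function, and all the numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ContainerCosts:
    """Hypothetical cost model for one container lifetime (all costs in cents).

    Assumes requests arrive back to back, so the container idles only once,
    for the full idle timeout, after its last request. Not Modal's billing.
    """
    startup: float          # cold start: boot the container and load the weights
    per_request: float      # compute spent serving a single request
    idle_per_second: float  # cost of keeping the warm container alive while idle
    idle_timeout_s: float   # seconds the container waits before shutting down


def cost_per_request(c: ContainerCosts, requests_per_container: int) -> float:
    """Average cost per request when one container serves `requests_per_container` requests."""
    if requests_per_container < 1:
        raise ValueError("a container must serve at least one request")
    # Fixed cost paid once per container, no matter how many requests it serves.
    fixed = c.startup + c.idle_per_second * c.idle_timeout_s
    total = fixed + c.per_request * requests_per_container
    return total / requests_per_container


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    long_idle = ContainerCosts(startup=1.5, per_request=0.2, idle_per_second=0.005, idle_timeout_s=60)
    short_idle = replace(long_idle, idle_timeout_s=10)
    for n in (1, 10, 100):
        print(
            f"{n:>3} requests/container: "
            f"{cost_per_request(long_idle, n):.3f}c (60 s idle), "
            f"{cost_per_request(short_idle, n):.3f}c (10 s idle)"
        )
```

With these made-up numbers, a single request carries the whole cold start and idle tail, while at 100 requests per container the average approaches the bare per-request compute cost. Shortening the idle timeout helps most in the one-request case, because that is where the idle tail is not shared with other requests.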