
fix: improve ollama workflow from CI #53

Merged · 3 commits into main from baxen/remove-ollama-workflow · Sep 25, 2024
Conversation

@baxen (Collaborator) commented Sep 24, 2024

It'd be great to run this in CI; however, the smaller model sizes end up very flaky at consistently using tools, and using larger model sizes makes this take a long time. I think we can revisit this when we get a more consistent small tool-use model.

@codefromthecrypt (Contributor) commented:

I'm not sure the flake is something we want to skip, though. Both failures weren't due to the LLM, but to something causing it to look for the openai model rather than the model that's passed. This seems like something to investigate, right?

{"error":{"message":"model \"gpt-4o-mini\" not found, try pulling it first","type":"api_error","param":null,"code":null}}

@baxen (Collaborator, Author) commented Sep 24, 2024

> I'm not sure the flake is something we want to skip, though. Both failures weren't due to the LLM, but to something causing it to look for the openai model rather than the model that's passed. This seems like something to investigate, right?
>
> {"error":{"message":"model \"gpt-4o-mini\" not found, try pulling it first","type":"api_error","param":null,"code":null}}

Yup, you're totally right - that's where I went first too. The changes to test_integration.py here fix this problem, which was showing up in some PRs. Now that it's fixed, though, I'm finding locally that, when running small models, the test_tools function fails pretty often:

AssertionError: assert 'gandalf' in 'the most famous wizard in the lord of the rings canon is aragorn breezy, the son of king elrond and grandson of legol...n he assists merry and pippin in obtaining elwater, the treasured horse that saved aragorn from the hands of the orcs.'
AssertionError: assert 'hello exchange' in 'sure! please provide me with the filename.'

We could consider passing through a seed and temperature to get more consistency here, though!
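
For illustration, a minimal sketch of that idea against Ollama's OpenAI-compatible endpoint; the model, prompt, and URL below are placeholder assumptions, not this repo's actual configuration:

```python
import requests

# Hypothetical sketch: pin seed and temperature so repeated test runs
# sample the model more deterministically. Model, prompt, and URL are
# placeholders, not the values used in this workflow.
resp = requests.post(
    "http://localhost:11434/v1/chat/completions",  # Ollama's OpenAI-compatible API
    json={
        "model": "mistral-nemo",
        "messages": [
            {"role": "user", "content": "Who is the most famous wizard in Lord of the Rings?"}
        ],
        "seed": 42,          # fixed seed: repeatable sampling across runs
        "temperature": 0.0,  # low temperature: fewer creative detours
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```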

@codefromthecrypt (Contributor) commented:

So do we feel deleting the tests is better than using a larger model, like qwen2.5, or just not overriding it and using mistral-nemo? Larger ones take 4-5 minutes total, but on the other hand that's more telling about the integration story.

The reason I ask is that all models can hallucinate or give an imprecise response; the likelihood is just lower with larger ones. I had chatted with @michaelneale about whether it would be better to use a higher-level retry concept than HTTP 5xx handling, etc., and then handle the unexpected results. Deleting the test doesn't change that, is what I mean, and we could at least learn whether the default model is routinely problematic.
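
To make that concrete, here's a hypothetical sketch of such a higher-level retry: retry on unexpected content, not just on transport errors. ask_model and is_expected are placeholder names for this example, not exchange APIs:

```python
from typing import Callable

# Hypothetical sketch of retrying on *content*, not just HTTP 5xx.
# ask_model and is_expected are placeholders, not exchange APIs.
def ask_with_retry(
    ask_model: Callable[[str], str],
    is_expected: Callable[[str], bool],
    prompt: str,
    attempts: int = 3,
) -> str:
    reply = ""
    for _ in range(attempts):
        reply = ask_model(prompt)  # one full LLM round trip
        if is_expected(reply):     # semantic check, e.g. "did it use the tool?"
            return reply
    return reply  # surface the last unexpected reply to the failing assertion
```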

@michaelneale (Collaborator) commented:

I am ok with either temporarily disabling it or making it the larger one (if that is worth it) - I guess @baxen may be interested: is it worth a 4-minute concurrent test run for a larger LLM to test things a bit more end to end?

@codefromthecrypt (Contributor) commented:

The other food for thought is that if/when we have an integration test in goose, there's possibly less pressure here. It will still be that 5-minute overhead to run, though. Personally, I have projects whose integration tests take a lot longer than 5 minutes, but on the other hand that doesn't make this part acceptable.

@michaelneale (Collaborator) commented:

We can also do integration tests with a GH secret and an API call, but if it can work in similar time with a local model, that's "nice"?

@codefromthecrypt (Contributor) commented:

> We can also do integration tests with a GH secret and an API call, but if it can work in similar time with a local model, that's "nice"?

Only from protected branches, though, right (not forks, as that would let folks steal the secrets)? I guess since most changes are from the same team, testing public providers on those types of branches is better than not testing them.

That said, I think we can explore that independently of the local tests, which eventually need to be solved even if they aren't prioritized versus, I guess, openai.

@michaelneale (Collaborator) commented:

Yeah, ideally we wouldn't - GH redacts secrets, etc., but there's no way to be 100% sure with forks (and we have to bless runs anyway).

@michaelneale (Collaborator) commented:

Is there a way to run this without it holding back the checks? The larger model does make the check slower, but exchange doesn't change that much, does it? I think it may still be worthwhile.
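
One standard way to do that, sketched below with an assumed job name and test command, is GitHub Actions' job-level continue-on-error, which keeps a red run from failing the workflow:

```yaml
# Hypothetical sketch: the job name and test command are assumptions;
# continue-on-error is a standard GitHub Actions job-level field.
jobs:
  ollama-integration:
    runs-on: ubuntu-latest
    continue-on-error: true  # a failure here won't mark the run as failed
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/test_integration.py  # assumed invocation
```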

@codefromthecrypt (Contributor) left a comment

👍 to removing this, as it seems more folks are interested in it, and it conserves time for other things we need to look at. Plus, it unblocks some changes I have that were waiting on a decision here. We can revisit this later and meanwhile test ad hoc.

@codefromthecrypt (Contributor) commented:

#54 swaps back to the default model, which we can use until/if we merge the deletion of the workflow.

@baxen force-pushed the baxen/remove-ollama-workflow branch from 593b9cf to 9ee368f on September 25, 2024 at 00:18
@baxen (Collaborator, Author) commented Sep 25, 2024

@codefromthecrypt @michaelneale, take a look at the updates? I think - let's confirm - that setting the seed like this might fix the issue while still letting us use the small, fast model.

@michaelneale (Collaborator) commented:

I think the problem was more that the summarizer would default to gpt-4o-mini - was there also some additional flake? If so, yeah, this looks good as well.

@baxen force-pushed the baxen/remove-ollama-workflow branch from 9ee368f to f5a693b on September 25, 2024 at 00:22
@michaelneale (Collaborator) left a comment

Yep, I think this is good.

@michaelneale changed the title from "fix: Remove ollama workflow from CI" to "fix: improve ollama workflow from CI" on Sep 25, 2024
@codefromthecrypt (Contributor) left a comment

coolio!

tests/test_integration.py: review thread resolved
Co-authored-by: Adrian Cole <64215+codefromthecrypt@users.noreply.github.com>
@michaelneale (Collaborator) commented:

Yes, this works well: #59. I tested it on top of @lukealvoeiro's changes and it seems good - I think we can move ahead with this.

@baxen merged commit 9feb7ec into main on Sep 25, 2024 (5 checks passed)
@michaelneale deleted the baxen/remove-ollama-workflow branch on September 25, 2024 at 02:52
@codefromthecrypt (Contributor) commented:

thanks @baxen!
