Trace model outputs to a binary file #477

Closed

Piezoid
Contributor

@Piezoid Piezoid commented Mar 24, 2023

This adds a --trace option that exports the decoder activations to a file. The format can be read with:

python -m examples.traceparser trace.bin

It's a basic app that comes with a parser, designed to help build analysis tools and tests. For now, it only replicates the softmax and the top-k and nucleus filtering.
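
For readers unfamiliar with the three steps named above, here is a minimal pure-Python sketch of softmax, top-k and nucleus (top-p) filtering. It is not the code from `examples.traceparser`; the function names, the order of the filters and the final renormalization are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Turn raw logits into probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Return the k most probable (token_id, prob) pairs, most probable first."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

def nucleus_filter(ranked, top_p):
    """Keep the shortest prefix of `ranked` whose cumulative probability reaches top_p."""
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

def renormalize(pairs):
    """Rescale the surviving probabilities so they sum to 1 again."""
    total = sum(p for _, p in pairs)
    return [(t, p / total) for t, p in pairs]

# Toy logits standing in for one decoder step (made-up values).
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = softmax(logits)
candidates = renormalize(nucleus_filter(top_k_filter(probs, k=4), top_p=0.9))
```

With these toy values, the three most probable tokens survive both filters, and a sampler would then draw from `candidates`.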

The format is versioned and should be easily extensible with more data (e.g. embeddings, insertions in caches).
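
As an illustration of what "versioned and extensible" can mean in practice, here is a sketch of a container where each record carries a type tag and a length, so an older reader can skip record types it does not know. The layout, the magic bytes `TRCE`, the version number and the record type ids are all hypothetical and do not describe the actual format in this PR.

```python
import io
import struct

MAGIC = b"TRCE"   # hypothetical magic bytes
VERSION = 1       # hypothetical format version
REC_LOGITS = 1    # hypothetical record type id

def write_header(f):
    """Write the magic bytes followed by a little-endian uint32 version."""
    f.write(MAGIC + struct.pack("<I", VERSION))

def write_record(f, rec_type, payload):
    """Write one record: uint32 type, uint32 payload size, then the payload."""
    f.write(struct.pack("<II", rec_type, len(payload)) + payload)

def read_records(f, known_types):
    """Read all records, silently skipping types the reader does not know."""
    if f.read(4) != MAGIC:
        raise ValueError("not a trace file")
    (version,) = struct.unpack("<I", f.read(4))
    if version > VERSION:
        raise ValueError(f"unsupported version {version}")
    records = []
    while True:
        head = f.read(8)
        if len(head) < 8:
            break  # end of file
        rec_type, size = struct.unpack("<II", head)
        payload = f.read(size)
        if rec_type in known_types:
            records.append((rec_type, payload))
    return records

# Round trip through an in-memory file, with one record type from "the future".
buf = io.BytesIO()
write_header(buf)
write_record(buf, REC_LOGITS, struct.pack("<3f", 2.0, 1.0, 0.5))
write_record(buf, 99, b"future data")
buf.seek(0)
records = read_records(buf, known_types={REC_LOGITS})
```

The length prefix is what makes skipping possible: the reader never needs to understand a payload to step over it.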

I'm using it for the same purpose as #246 (which is retired now) - to perform numerical analysis in Python for exploring #331.

Is this approach acceptable and useful to others?
This might become redundant once Python bindings allow doing the same thing in process memory, without going through a serialized format.

@thement
Collaborator

thement commented Mar 25, 2023

I ran a test dump and then parsed it with the analyzer script, and it works.
I would maybe think about making a separate directory for all the analysis-related scripts and tools (maybe make a tools directory?).

@anzz1
Contributor

anzz1 commented Mar 26, 2023

I ran a test dump and then parsed it with the analyzer script, and it works. I would maybe think about making a separate directory for all the analysis-related scripts and tools (maybe make a tools directory?).

I think this is a great idea. Make it a "trace.cpp" tool, like the existing "quantize.cpp" and "perplexity.cpp".

@Piezoid
Contributor Author

Piezoid commented Mar 27, 2023

@anzz1 Thank you for expressing your interest.

I'm replying to #331 (comment) here:

I'm also rooting for you finishing the trace tool at some point.

Yes, I plan to finalize a v1 format once the specifications are better defined. Right now it's pretty basic: there is a single record type, for a specific application.

I try to keep in mind that this should be kept as simple as possible while still being a helpful debugging tool. I see two sides to this: the --trace option in the main example, which could perform a few basic data extraction tasks, and trace_*, intended for debugging numerical code in an ad hoc fashion.

I have a lot of ideas for improving the format, but have not yet weighed them properly. The activations could be encoded in f16, or made sparse by removing the tail. In any case, it depends a lot on the downstream task.
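
To make the two size-reduction ideas concrete, here is a sketch using only Python's `struct` module, whose `e` format character is IEEE 754 half precision. The helper names and the sparse layout (a count followed by index/value pairs) are assumptions for illustration, not part of the trace format.

```python
import struct

def encode_f32(values):
    """Baseline: little-endian 32-bit floats, 4 bytes per value."""
    return struct.pack(f"<{len(values)}f", *values)

def encode_f16(values):
    """Half precision: 2 bytes per value. Raises OverflowError above ~65504."""
    return struct.pack(f"<{len(values)}e", *values)

def decode_f16(blob):
    """Inverse of encode_f16."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))

def encode_sparse_top(values, keep):
    """Drop the tail: store only the `keep` largest values as (uint32 index, f16 value)."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:keep]
    out = struct.pack("<I", len(top))
    for i in top:
        out += struct.pack("<Ie", i, values[i])
    return out

def decode_sparse_top(blob):
    """Inverse of encode_sparse_top: returns a list of (index, value) tuples."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return [struct.unpack_from("<Ie", blob, 4 + 6 * n) for n in range(count)]

# Toy logits (made-up values that happen to be exactly representable in f16).
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
dense16 = encode_f16(logits)
sparse = encode_sparse_top(logits, keep=2)
```

The sparse form only pays off when the vocabulary is large and few entries are kept, since each kept entry costs 6 bytes instead of 2. Whether the lost precision or the dropped tail matters depends on the downstream task, as noted above.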

To ensure extensibility, we could use a generic object format like BSON. A simple serializer can be made from a few functions, without requiring tagged unions or dynamic types. This might also be a candidate for the ggml model header, but it is not as easy to write a deserializer in a lean way.
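
A sketch of the "few functions" idea, in Python for brevity: one function per BSON element type, each returning bytes, plus one function that wraps elements into a document. The type bytes and framing follow the BSON specification; the function names and the field names `version`, `kind` and `data` are hypothetical.

```python
import struct

def _cstring(name):
    """BSON element names are NUL-terminated UTF-8 strings."""
    return name.encode("utf-8") + b"\x00"

def bson_double(name, value):
    """Type 0x01: 64-bit little-endian float."""
    return b"\x01" + _cstring(name) + struct.pack("<d", value)

def bson_string(name, value):
    """Type 0x02: int32 byte length (including the trailing NUL), then the bytes."""
    data = value.encode("utf-8") + b"\x00"
    return b"\x02" + _cstring(name) + struct.pack("<i", len(data)) + data

def bson_binary(name, payload):
    """Type 0x05: int32 length, subtype byte (0x00 = generic), then raw bytes."""
    return b"\x05" + _cstring(name) + struct.pack("<i", len(payload)) + b"\x00" + payload

def bson_int32(name, value):
    """Type 0x10: 32-bit little-endian signed integer."""
    return b"\x10" + _cstring(name) + struct.pack("<i", value)

def bson_document(*elements):
    """A document is an int32 total size, the elements, and a terminating NUL."""
    body = b"".join(elements) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body

# A hypothetical trace record holding three float32 activations.
record = bson_document(
    bson_int32("version", 1),
    bson_string("kind", "logits"),
    bson_binary("data", struct.pack("<3f", 2.0, 1.0, 0.5)),
)
```

Writing is simple because the caller already knows each field's type. Reading is the harder half mentioned above, since a general deserializer has to dispatch on the type byte of every element.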

I see that it could be highly valuable. Especially for larger scale testing and graphing, but even for smaller cases like debugging and general interest it's a cool feature to have to be able to see how and why the decisions were made and which tokens are what, etc.

The main value is indeed post-mortem analysis. However, Transformers' outputs are easily reproduced by feeding back the prompt and inferred text. Essentially, it's largely about ease of use and the amount of CPU time wasted on a replay. Also, I have yet to see tools that bring llama activations and weights into a REPL for numerical analysis (but Python bindings may eventually get there too).

I had the idea some time ago to have a command line option to output that stuff to console, maybe redirect stderr but tbh your idea is a lot better.

I did something very similar in #246. You can already get a long way by uncommenting the debug code, but some of it broke during refactoring.

  1. my CPU is too low powered to do any proper quantitative analysis like using perplexity tool.

I have the same issue: my Westmere-era machine lacks AVX, so I have to connect to a friend's WSL2 system.

  1. To be perfectly honest I dont really know wtf i'm doing half the time

Same here 😄 I feel like I'm shooting in the dark with such a complex, noise-sensitive system. I'm sure that C gurus are doing fine with printf and awk. Honestly, I'm just not feeling very confident in what I'm doing without better access to the data.

@Piezoid Piezoid closed this May 26, 2023
Labels
enhancement New feature or request

3 participants