
Two-phase fine-tuning
Here’s another version of the fine-tuned Phi-2 on the NextStep 3.3 Network System Admin manual data. The idea I’ve tested here is a two-phase fine-tuning.
1) In the first phase I’ve “injected” new knowledge into the model by showing it the full manual’s content in chunks. LoRA config: r = 32, alpha = 64, 50 epochs.
The training shows overfitting, but I wanted good memorization, so this time I used the last checkpoint. I can still try all checkpoints and test their performance on memorization.

2) In the second phase I’ve used question-answer pairs that were generated by a local LLM (Orca2 13B) based on 2% of the NextStep 3.3 Network System Admin documents. LoRA config: r = 32, alpha = 64, 10 epochs.
The lower number of epochs prevented overfitting, and the eval curve looks nice.
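Roughly, the two phases can be wired together like this with Hugging Face transformers + peft. This is only a minimal sketch, not my actual training script: chunk_dataset and qa_dataset are placeholders for the tokenized datasets, the precision and DDP settings are omitted, and I’m assuming here that phase 2 picks up the phase 1 adapter and keeps training it.

# Minimal sketch of the two-phase LoRA setup (transformers + peft).
# chunk_dataset / qa_dataset are placeholders for tokenized datasets
# with input_ids and labels; precision and DDP settings are omitted.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from peft import LoraConfig, PeftModel, get_peft_model

lora = LoraConfig(r=32, lora_alpha=64, task_type="CAUSAL_LM")

# Phase 1: knowledge injection on the raw manual chunks (50 epochs, overfits on purpose).
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
phase1 = get_peft_model(base, lora)
Trainer(model=phase1,
        args=TrainingArguments(output_dir="phase1", num_train_epochs=50),
        train_dataset=chunk_dataset).train()
phase1.save_pretrained("phase1-adapter")

# Phase 2: continue from the phase 1 adapter on the generated Q-A pairs (10 epochs).
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
phase2 = PeftModel.from_pretrained(base, "phase1-adapter", is_trainable=True)
Trainer(model=phase2,
        args=TrainingArguments(output_dir="phase2", num_train_epochs=10),
        train_dataset=qa_dataset).train()
phase2.save_pretrained("phase2-adapter")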

Additionally, I’ve improved the training data generation. To minimize splitting of the context, I’ve used nltk to split the input into sentences and applied the chunking on the sentences.
https://www.nltk.org/api/nltk.tokenize.sent_tokenize.html#nltk-tokenize-sent-tokenize
sentences = sent_tokenize(text)
This has been used both for chunking the documents for training and for chunking the documents that are fed to the local LLM for question-answer generation.
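Roughly, the chunking looks like this (a minimal sketch; max_chars is just an illustrative budget, not the exact chunk size I used):

# Sentence-aware chunking: split into sentences first, then pack whole
# sentences into chunks so sentence boundaries are never cut.
import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt", quiet=True)

def chunk_text(text, max_chars=1500):
    sentences = sent_tokenize(text)
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding the next sentence would exceed the budget.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks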
Evaluation results
My evaluation is based on perplexity, which is calculated as the exponential of the loss obtained from the model. I’ve measured this on a test set of LLM-generated question-answer pairs based on 1% of the NextStep 3.3 Network System Admin documents (1,652 data points). The test data has been generated with a different model (LLaMA 2 7B) to ensure a diverse set of test data that differs from the training set.
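For reference, this is roughly how the numbers below are computed (a minimal sketch; texts stands in for the test set, and any prompt formatting is omitted):

# Perplexity = exp(mean cross-entropy loss) over the test set.
# `texts` stands in for the 1,652 generated Q-A test examples; this version
# averages the per-example losses rather than weighting by token count.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2").cuda().eval()
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

losses = []
with torch.no_grad():
    for text in texts:
        ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.cuda()
        loss = model(ids, labels=ids).loss  # mean cross-entropy over the sequence
        losses.append(loss.item())

perplexity = math.exp(sum(losses) / len(losses))
print(f"Perplexity: {perplexity:.2f}")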
Base model
- microsoft/phi-2
- Perplexity: 13.86
First fine-tuned model
- DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2
- base model: microsoft/phi-2
- LoRA config: r = 128, alpha = 128
- Trained on LLM-generated question-answer pairs based on the NextStep 3.3 Network System Admin manual (35,439 data points)
- Perplexity: 11.25
V2 fine-tune, phase 1
- base model: microsoft/phi-2
- LoRA config: r = 32, alpha = 64
- Trained on the chunked content of the NextStep 3.3 Network System Admin manual (116,482 data points)
- Perplexity: 38.54
V2 fine-tune, phase 2
- base model: microsoft/phi-2
- LoRA config: r = 32, alpha = 64
- Trained on LLM-generated question-answer pairs based on 2% of the NextStep 3.3 Network System Admin documents (5,391 data points)
- Perplexity: 5.88
There are still issues
The local model struggles to follow instructions regarding the generated question-answer format. Because of this, the regular expressions I’ve used to parse the Q-A pairs from the model response can fail, so the training data contains answers that themselves contain further Q-A pairs. By spot-checking and trying the tuned model, I’ve seen examples of this behavior.
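For illustration, a simplified version of that parsing plus the kind of filter that would catch these cases (the actual regular expressions differ; this only shows the shape of the idea):

# Simplified Q-A extraction from the generator model's raw response.
# Blocks that still embed extra Q-A tags are dropped instead of being
# written into the training data.
import re

def extract_qa_pairs(response: str):
    pairs = []
    # Each block starts at a 'Question:' tag and runs until the next one.
    blocks = re.split(r"(?=Question\s*:)", response)
    for block in blocks:
        match = re.match(r"Question\s*:\s*(.+?)\s*Answer\s*:\s*(.+)", block, re.DOTALL)
        if not match:
            continue
        question, answer = match.group(1).strip(), match.group(2).strip()
        # The generator sometimes ignores the requested format and nests
        # further Q-A tags inside an answer; skip those pairs.
        if re.search(r"(Question|Answer)\s*:", answer):
            continue
        pairs.append({"input": question, "output": answer})
    return pairs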
When the document contains examples, like the example output of a command, the model treats that data as fact, which results in less useful data being trained into the model. Example:
{"input": "How many NetInfo server processes are running on the machine you are currently using?", "output": "According to the output from the ps command, there are two NetInfo server processes running on the machine I am currently using, one for the local domain and one for the network domain. The process IDs are 63 and 65 respectively.", "context": "NeXTStep 3.3 nsa 03_SetUpNet", "data": "NetInfo Server Processes\nEach NetInfo database needs a process to serve it; that is, a program running all the time that can get information from the database to provide to other programs that need it.For example, the login process needs to check the user name and password against information in the database.The login program contacts the database server, which returns the necessary information.Each NetInfo database must be accessed through the netinfod server process, which runs constantly.1.Open a terminal window on a machine running a network-wide NetInfo server process and enter the following command:\nps -ax\nThe ps command displays process information.For more information about ps, see the UNIX manual page and Chapter 14, \"General Troubleshooting.\"2.Find lines in the output with netinfod in them, like these:\n63 ?S 2:02 /usr/etc/netinfod local\n65 ?S 16:00 /usr/etc/netinfod network\nThese are the two NetInfo server processes, one for the local domain and the other for the network domain.", "chunk_id": 1513, "file": "/home/csaba/Downloads/next_dataset_prefiltered/html/NeXTStep/3.3/nsa/03_SetUpNet.htmld/index.html"}
The model is published on Hugging Face:
https://huggingface.co/DevQuasar/vintage-nextstep_os_systemadmin-ft-phi2_v2
Next steps
- Fix data generation issues and build in some self-supervised aspect to ensure better data quality.
- Automate checkpoint selection based on evaluation results.
- I’m still using DDP and a half-precision model. There are options to test training with QLoRA and FSDP (Fully Sharded Data Parallel), so I can use a bigger context length in training without hitting the memory limits on my local GPUs.