the goal and start testing, we need to

rifattryo.ut11
Posts: 31
Joined: Mon Dec 23, 2024 6:09 am

Data Cleaning

Remove duplicate records: check the document for duplicate lines and delete them to avoid bias in the analysis.
Handle missing values: identify missing values in the document and decide whether to fill them, delete them, or keep them.
Correct errors and outliers: identify entry errors and outliers in the document and correct or delete them to ensure data accuracy.
Unify formats: ensure that content in the document follows a unified format, for example for dates and times.
Clean text data: for text data, remove meaningless filler words such as "ah" and "um" and unneeded punctuation marks, or perform stemming and lemmatization.
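As a rough illustration of the cleaning steps above, here is a minimal sketch using only the Python standard library. The record fields (id, date, text), the sample data, the list of accepted date formats, and the choice to drop records with missing text are all assumptions made for the example, not part of the original workflow. Outlier handling is left out because it depends on the data.

```python
import re
from datetime import datetime

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": "1", "date": "2024-12-23", "text": "Um, the invoice was, ah, paid."},
    {"id": "1", "date": "2024-12-23", "text": "Um, the invoice was, ah, paid."},  # duplicate
    {"id": "2", "date": "23/12/2024", "text": None},  # missing value
    {"id": "3", "date": "2024.12.23", "text": "Delivery confirmed!"},
]

# Assumed input formats; extend this tuple to match the real data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d")
FILLER_WORDS = {"ah", "um"}


def unify_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_text(text):
    """Lowercase the text, strip punctuation, and drop filler words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(w for w in words if w not in FILLER_WORDS)


def clean(rows):
    """Deduplicate, drop rows with missing text, and normalize the rest."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["id"], row["date"], row["text"])
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if row["text"] is None:
            continue  # policy chosen for this sketch: delete missing values
        cleaned.append({
            "id": row["id"],
            "date": unify_date(row["date"]),
            "text": clean_text(row["text"]),
        })
    return cleaned


result = clean(records)
```

Whether to fill, delete, or keep missing values is a decision for each dataset; the sketch deletes them only to keep the example short.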



Textualization: remove images from documents and add their content to the documents as text.
Word segmentation: break the sentences in text data into words or phrases.
Stop word filtering: remove common words such as "的", "和", and "是" that carry little meaning for analysis.
Bag-of-words model: convert text into a bag-of-words representation, that is, the frequency with which each word occurs in the text.
Word importance: calculate the importance of each word in the document to evaluate its relevance.

Model 2 Evaluation

The actual evaluation part is relatively simple.
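The text preprocessing steps listed earlier (word segmentation, stop word filtering, bag of words, word importance) can be sketched roughly as follows. This is only an illustration under assumptions: it uses English text with a regex tokenizer and a tiny made-up stop word list, and it uses TF-IDF as one common way to weight word importance, which the post does not name explicitly. Chinese text would need a dedicated word segmenter instead of the regex.

```python
import math
import re
from collections import Counter

# Illustrative stop word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "is", "and", "of", "to"}


def tokenize(text):
    """Word segmentation for English: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text):
    """Count how often each non-stop word occurs in the text."""
    return Counter(t for t in tokenize(text) if t not in STOP_WORDS)


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (term count / words in document) * log(N / documents containing term)
    """
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # each document counts once per term
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


docs = [
    "The invoice is paid",
    "The invoice is overdue and the customer is notified",
]
scores = tf_idf(docs)
```

With this weighting, a word that appears in every document (here "invoice") gets a weight of zero, while words specific to one document score higher.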



After setting up the process, you can put your own questions to the large model and then score its answers. This part mainly covers the platform we use, -y. It is an open-source large language model (LLM) application development platform that lets developers create and manage model applications through an intuitive interface or through code, upload documents to form a knowledge base, create custom tools, and expose services externally. Developers get a high degree of customization and control over their projects. It suits professional developers who want flexible solutions, and the fees for enterprise use are not high.
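To make the "ask questions, then score" loop concrete, here is a small sketch. Everything in it is hypothetical: ask_model stands in for whatever call reaches the deployed application (the platform's real API is not shown here), and the keyword-coverage score is just one simple scoring rule chosen for illustration. Scoring by hand or with a rubric would fit the same loop.

```python
from typing import Callable


def keyword_score(answer: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the answer (case-insensitive)."""
    if not expected_keywords:
        return 0.0
    answer_lower = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in answer_lower)
    return hits / len(expected_keywords)


def evaluate(ask_model: Callable[[str], str], cases: list[dict]) -> float:
    """Ask every question and return the average keyword score."""
    scores = [
        keyword_score(ask_model(case["question"]), case["keywords"])
        for case in cases
    ]
    return sum(scores) / len(scores) if scores else 0.0


def fake_model(question: str) -> str:
    """Stand-in for the real model call; always returns the same answer."""
    return "Duplicate records are removed before analysis."


# Hypothetical test cases: a question plus keywords a good answer should contain.
cases = [
    {"question": "How are duplicate records handled?", "keywords": ["duplicate", "removed"]},
    {"question": "How are missing values handled?", "keywords": ["missing", "fill"]},
]

average = evaluate(fake_model, cases)
```

Swapping fake_model for a function that queries the actual application is the only change needed to run the same cases against a real knowledge base.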