How to make this glossary usable?
Thread starter: Hans Lenting

Hans Lenting (Netherlands; Member (2006); German > Dutch)
There is this glossary on the internet:
It has a silly, "space-saving" paper dictionary layout that cannot be used for automatic searching in CAT tools.
How to convert this glossary to this layout?
One could use a spreadsheet, but that will still require a lot of manual editing.

Tony M (France; Member; French > English + ...; SITE LOCALIZER)
Start by using "search and replace" | May 31, 2024
I would start by trying to identify what the actual delimiters used are, and then use S&R to replace them with e.g. TAB; then you can format into a table, if necessary, and then easily delete superfluous columns.
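A minimal Python sketch of that first step, for anyone who prefers a script to a word processor. It counts how many lines contain each candidate delimiter, then swaps the chosen one for a TAB. The candidate list and the function names are assumptions made for illustration; nothing here is taken from the glossary itself.

```python
from collections import Counter

# Hypothetical candidates; extend this tuple after looking at the real file.
CANDIDATES = ("\t", " - ", " – ", ";", "|", " = ")


def count_delimiters(lines, candidates=CANDIDATES):
    """Count in how many non-empty lines each candidate delimiter occurs."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        for delim in candidates:
            if delim in line:
                counts[delim] += 1
    return counts


def to_tab_separated(lines, delimiter):
    """Replace the chosen delimiter with a TAB in every non-empty line."""
    return [line.strip().replace(delimiter, "\t") for line in lines if line.strip()]
```

A delimiter that shows up in nearly every line is a good candidate for the replacement; one that appears only now and then is more likely part of a term.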
This is more or less what I do a lot of the time to convert miscellaneous customer glossaries into CAT-compatible ones.

Philippe Locquet (Portugal; Member (2013); English > French + ...)
Hans Lenting wrote:
There is this glossary on the internet:
It has a silly, "space-saving" paper dictionary layout that cannot be used for automatic searching in CAT tools.
How to convert this glossary to this layout?
One could use a spreadsheet, but that will still require a lot of manual editing.
If that's not confidential, I would use an LLM; with the right prompt, it would make short work of this. I've had similar tasks in the past and it worked for me.

Stepan Konev (Russian Federation; English > Russian)
If all the terms are consistently separated with [space hyphen space], you can use two replacements with wildcards checked:
**Before running the replacements, select all in MS Word, press Ctrl+E, Ctrl+L. This would remove all extra spaces at the beginning of each line.**
Then replace as follows:
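1) Find what:
-[^32]{2;}
(The semicolon inside the braces is the list separator of my locale; if your list separator is a comma, use -[^32]{2,} instead.)
Replace with: [nothing, leave this field blank]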
This would remove the [hyphen space] combination at the beginning of each line.
2) Find what: [space hyphen space]
Replace with: [tab]
===
Now select all and press one button after the other: Alt, C, 4, G
This would convert text to table¹. Click OK.
That's it.
¹Probably the combination of buttons to convert text to table is different on non-QWERTY keyboards. If this is your case, just use the MS Word menu (Insert - Table - Convert to table).
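For readers without MS Word, the same steps can be approximated in Python. This is a rough sketch, not a tested conversion of the actual glossary: the function name `word_steps` is made up, and the regular expression `- {2,}` is only my reading of the wildcard pattern above (a hyphen followed by two or more spaces).

```python
import re


def word_steps(text):
    """Approximate the Word steps above on plain text, line by line."""
    out = []
    for line in text.splitlines():
        # Ctrl+E, Ctrl+L in Word: here simply drop leading spaces.
        line = line.lstrip(" ")
        # Replacement 1: remove a hyphen followed by two or more spaces.
        line = re.sub(r"- {2,}", "", line)
        # Replacement 2: [space hyphen space] becomes a TAB.
        line = line.replace(" - ", "\t")
        out.append(line)
    return "\n".join(out)
```

The result is tab-separated text, which a spreadsheet or CAT tool can usually import directly, so the "convert text to table" step is not needed.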

Tony M (France; Member; French > English + ...; SITE LOCALIZER)
The real problem here... | May 31, 2024
...is that the format used replaces the 'headword' in each entry with symbols, and in some cases, a succession of words.
This seems to be the asker's principal headache, and I for one can't think of a solution without, as they point out, major manual editing.

Stepan Konev (Russian Federation; English > Russian)
Glossary in table format is available too | May 31, 2024
I̶ s̶u̶p̶p̶o̶s̶e̶ t̶h̶e̶ g̶l̶o̶s̶s̶a̶r̶y̶ i̶s̶ n̶o̶t̶ c̶o̶n̶f̶i̶d̶e̶n̶t̶i̶a̶l̶ b̶e̶c̶a̶u̶s̶e̶ i̶t̶ i̶s̶ a̶v̶a̶i̶l̶a̶b̶l̶e̶ o̶n̶ t̶h̶e̶ i̶n̶t̶e̶r̶n̶e̶t̶, r̶i̶g̶h̶t̶? C̶a̶n̶ y̶o̶u̶ s̶h̶a̶r̶e̶ t̶h̶e̶ f̶i̶l̶e̶ o̶r̶ l̶i̶n̶k̶ t̶o̶ t̶h̶a̶t̶ g̶l̶o̶s̶s̶a̶r̶y̶? I̶ c̶a̶n̶'t̶ s̶e̶e̶ a̶n̶y̶ p̶r̶o̶b̶l̶e̶m̶ h̶e̶r̶e̶ b̶a̶s̶e̶d̶ o̶n̶ y̶o̶u̶r̶ s̶c̶r̶e̶e̶n̶s̶h̶o̶t̶.
Update: ah, ok, I see now. The "-" char stands for the parent entry...
Ok then just use this link
https://www.gerritspeek.nl/auto/autowoordenboek/autowoordenboek-l.html
[Edited at 2024-05-31 21:00 GMT]

Hans Lenting (Netherlands; Member (2006); German > Dutch; THREAD STARTER)
VB.NET-inspired pseudocode - untested! | Jun 1, 2024
I've been faced with many similar tasks over the years, and if the volume of data justifies the time spent on writing a bit of program code, then I prefer to do that and avoid error-prone and time-consuming manual fiddling with Word tables, spreadsheets, etc.
We all have our 'pet' programming languages, and mine happens to be VB.NET - but I don't doubt that Hans and others will have no problem following the logic.
************
DIM an empty DataTable 'DT' with 10 columns (0 - 9) 'column count >= max. number of header (sub)levels
OPEN plain text file for read-only
WHILE NOT EOF
    strLine = READLINE from file 'read one line at a time
    Replace all 'space hyphen space' in strLine with '|¿' 'pipe avoids confusion with wanted hyphens, and '¿' flags the need to substitute words from the previous line
    SPLIT strLine on '|' --> arrWords() 'variable-length string array
    DIM an empty DataRow 'DR' with 10 columns (0 - 9)
    FOR i = 0 TO Ubound(arrWords) - 1 'copy all except last item from arrWords() into DR; Ubound is the highest index
        DR(i) = TRIM(arrWords(i)) 'trim leading/trailing spaces
    NEXT
    DR(9) = TRIM(arrWords(Ubound(arrWords))) 'copy NL term (the last item) to last column
    Add DR to DT
END WHILE
FOR EACH Row IN DT
    FOR EACH Column IN Row
        IF thisRow/Column = '¿' THEN
            REPLACE thisRow/Column with PreviousRow/SameColumn
        ELSE
            REPLACE '¿' in thisRow/Column with nothing
        END IF
    NEXT
NEXT
Each row in the table should now contain one or more EN words, zero or more empty columns, and the NL term is in the last column.
The required output format can be built in various ways, depending on the final destination, knowing that in each Row:
English term = TRIM(JOIN Row/Columns(0-8) with 'space' separator)
Dutch term = Row/Column(9)
**********************
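The same idea can be sketched in Python in a single pass, without a data table. This is a variant of the logic above, not a translation of it, and it rests on assumptions about the layout that I have not checked against the real glossary: fields are separated by a free-standing hyphen, the last field is the Dutch term, and an empty English field means "same as the field in that position on the previous line". The function names and the sample lines in the usage note are invented for illustration.

```python
def split_fields(line):
    """Split a line on free-standing hyphens; hyphenated words stay intact."""
    fields, current = [], []
    for token in line.split():
        if token == "-":
            fields.append(" ".join(current))
            current = []
        else:
            current.append(token)
    fields.append(" ".join(current))
    return fields


def expand(lines):
    """Return (English, Dutch) pairs with inherited headwords filled in."""
    previous = []  # English parts of the previous entry
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        *english, dutch = split_fields(line)
        filled = [
            part if part else (previous[i] if i < len(previous) else "")
            for i, part in enumerate(english)
        ]
        previous = filled
        rows.append((" ".join(p for p in filled if p), dutch))
    return rows
```

With the invented input `["lamp - lamp", "- holder - lamphouder"]`, `expand` yields `("lamp", "lamp")` and `("lamp holder", "lamphouder")`. Writing the pairs out as tab-separated text would then give a file most CAT tools can import.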
HTH
JL
[Edited at 2024-06-01 16:03 GMT]
[Edited at 2024-06-01 20:22 GMT]

Hans Lenting (Netherlands; Member (2006); German > Dutch; THREAD STARTER)
Thank you all for your input. The case is solved. A kind person wrote a JavaScript script.

Hans Lenting (Netherlands; Member (2006); German > Dutch; THREAD STARTER)
Community Project | Jun 3, 2024
See here.