What vector databases are, how they work, and the potential market

A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes.

These vectors are usually generated by applying some kind of embedding function to raw data such as text, images, audio, video and more.

A vector database can be defined as a tool that indexes and stores vector embeddings for fast retrieval and similarity search, with features such as metadata filtering and horizontal scaling.

Growing Investor Interest

In recent weeks, investor interest in vector databases has grown. Since the beginning of 2023 we have seen that:

  • the vector database startup Weaviate raised $50 million in Series B funding;
  • Pinecone raised $100 million in Series B funding at a valuation of $750 million;
  • Chroma, an open-source project, raised $18 million for its embedding database.

Let's look in more detail at what a vector database is.

Vectors as a data representation

Vector databases rely heavily on vector embeddings, a type of data representation that carries the semantic information an AI needs to gain understanding and to maintain a long-term memory it can draw on when carrying out complex tasks.

Vector embeddings

Vector embeddings are like a map, but instead of showing us where things are in the world, they show us where things are in something called vector space. Vector space is a kind of big playground where everything has its own place to play. Suppose you have a group of animals: a cat, a dog, a bird and a fish. We can create a vector embedding for each image by giving it a special position in the playground. The cat might be in one corner, the dog on the other side. The bird could be in the sky and the fish in the pond.

This place is a multidimensional space. Each dimension corresponds to a different aspect of the animals: for example, fish have fins, birds have wings, and cats and dogs have legs. Another aspect might be that fish belong in the water, birds mostly in the sky, and cats and dogs on the ground. Once we have these vectors, we can use mathematical techniques to group them by similarity, based on the information we have.

So vector embeddings are like a map that helps us find similarities between things in vector space. Just as a map helps us navigate the world, vector embeddings help us navigate the vector playground.

The key idea is that embeddings that are semantically similar to each other have a smaller distance between them. To find out how similar they are, we can use vector distance functions such as Euclidean distance, cosine distance, and so on.
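To make the distance idea concrete, here is a minimal sketch in Python. The feature vectors are hand-made and purely hypothetical (real embeddings come from an embedding model and have many more dimensions), and the names `animals`, `euclidean_distance`, `cosine_distance` and `nearest` are assumptions introduced for this illustration.

```python
import math

# Hypothetical hand-made features:
# [has_legs, has_wings, has_fins, lives_on_land, lives_in_air, lives_in_water]
animals = {
    "cat":  [1.0, 0.0, 0.0, 0.9, 0.1, 0.0],
    "dog":  [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    "bird": [0.3, 1.0, 0.0, 0.2, 0.8, 0.0],
    "fish": [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
}

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(name, metric):
    """Return the other animal whose vector is closest to `name` under `metric`."""
    others = [other for other in animals if other != name]
    return min(others, key=lambda other: metric(animals[name], animals[other]))

print(nearest("cat", euclidean_distance))
print(nearest("cat", cosine_distance))
```

With these made-up features, the cat ends up closer to the dog than to the bird or the fish under both metrics, which is the behavior the playground analogy describes.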

Vector databases vs. vector libraries

Vector libraries store vector embeddings in in-memory indexes in order to perform similarity searches. Vector libraries have the following characteristics and limitations:

  1. Store vectors only: Vector libraries store only the vector embeddings, not the associated objects they were generated from. This means that when we query, a vector library responds with the relevant vectors and object IDs. This is limiting, because the actual information is stored in the object and not in the ID. To solve this problem, we have to keep the objects in a secondary store. We can then take the IDs returned by the query and match them to the objects to make sense of the results.
  2. Index data is immutable: Indexes produced by vector libraries are typically immutable. This means that once we have imported our data and built the index, we cannot make any changes (no new inserts, deletes or updates). To change our index, we have to rebuild it from scratch.
  3. Querying during import is limited: Most vector libraries cannot be queried while data is being imported. We need to import all our data objects first, so the index is created only after the objects have been imported. This can be a problem for applications that need to import millions or even billions of objects.
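The pattern described in the three points above can be sketched as follows. This is a toy stand-in written for illustration, not the API of FAISS, Annoy or any other real library: the class `FrozenVectorIndex`, the `objects` dictionary acting as the secondary store, and the sample vectors are all hypothetical.

```python
import math

class FrozenVectorIndex:
    """Hypothetical stand-in for a vector library index.

    It keeps only IDs and vectors, is built once, and offers no add/update/delete.
    """

    def __init__(self, ids, vectors):
        self._ids = tuple(ids)
        self._vectors = tuple(tuple(v) for v in vectors)

    def search(self, query, k=1):
        """Return the k closest (id, distance) pairs by Euclidean distance."""
        scored = sorted(
            (math.dist(query, vec), item_id)
            for item_id, vec in zip(self._ids, self._vectors)
        )
        return [(item_id, dist) for dist, item_id in scored[:k]]

# Secondary store: the actual objects live outside the index, keyed by ID.
objects = {
    101: "Photo of a cat on a sofa",
    102: "Article about cat breeds",
    103: "Video of a fish tank",
}
ids = [101, 102, 103]
vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

index = FrozenVectorIndex(ids, vectors)

# The index answers with IDs only; we look the objects up ourselves.
hits = index.search([1.0, 0.0], k=2)
results = [objects[item_id] for item_id, _ in hits]
print(results)

# Adding one item means rebuilding the whole index from scratch.
objects[104] = "Photo of a dog in a park"
index = FrozenVectorIndex(ids + [104], vectors + [[0.95, 0.05]])
```

The extra lookup step and the full rebuild are exactly the costs that the next section says a vector database removes.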

There are many vector search libraries available: Facebook's FAISS, Annoy by Spotify and ScaNN by Google. FAISS uses clustering, Annoy uses trees and ScaNN uses vector compression. Each comes with its own performance tradeoffs, so we can choose among them based on our application and performance metrics.

CRUD

The main feature that distinguishes vector databases from vector libraries is the ability to store, update and delete data. Vector databases have full CRUD support (create, read, update and delete), which resolves the limitations of a vector library.

  1. Store vectors and objects: Databases can store both the data objects and the vectors. Since both are stored, we can combine vector search with structured filters. Filters let us make sure that the nearest neighbors also match a metadata filter.
  2. Mutability: Since vector databases fully support CRUD, we can easily add, remove or update entries in our index after it has been created. This is especially useful when working with constantly changing data.
  3. Real-time search: Unlike vector libraries, databases let us query and modify our data during the import process. As we load millions of objects, the data already imported stays fully accessible and operational, so there is no need to wait for the import to finish before working with what is already there.
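A minimal in-memory sketch of these three properties is shown below. It is a toy model under stated assumptions, not how any particular vector database is implemented: the class `ToyVectorStore`, its method names and the `where` filter argument are hypothetical, and the search is a plain brute-force scan rather than a real index.

```python
import math

class ToyVectorStore:
    """Hypothetical toy store keeping vectors and objects together, with CRUD."""

    def __init__(self):
        self._rows = {}  # item_id -> (vector, object)

    def upsert(self, item_id, vector, obj):
        """Create a new entry or overwrite an existing one."""
        self._rows[item_id] = (list(vector), dict(obj))

    def get(self, item_id):
        """Read the stored object for an ID."""
        return self._rows[item_id][1]

    def delete(self, item_id):
        """Remove an entry; later searches no longer see it."""
        self._rows.pop(item_id, None)

    def search(self, query, k=3, where=None):
        """Return up to k IDs nearest to the query, restricted by an optional metadata filter."""
        candidates = [
            (math.dist(query, vec), item_id)
            for item_id, (vec, obj) in self._rows.items()
            if where is None or where(obj)
        ]
        return [item_id for _, item_id in sorted(candidates)[:k]]

store = ToyVectorStore()
store.upsert("a", [0.9, 0.1], {"title": "Cat photo", "type": "image"})
store.upsert("b", [0.8, 0.2], {"title": "Cat article", "type": "text"})
store.upsert("c", [0.1, 0.9], {"title": "Fish photo", "type": "image"})

query = [1.0, 0.0]
unfiltered = store.search(query, k=2)
images_only = store.search(query, k=2, where=lambda obj: obj["type"] == "image")

store.delete("a")                       # delete: takes effect immediately
after_delete = store.search(query, k=1)

store.upsert("c", [0.95, 0.05], {"title": "Cat photo (new)", "type": "image"})
after_update = store.search(query, k=1)  # update: no rebuild needed
```

Because every write is visible to the next search, the same object can be queried while more data is still being loaded, which is the real-time behavior described in point 3.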

In short, a vector database provides a better solution for handling vector embeddings, because it addresses the limitations of standalone vector indexes discussed in the points above.

But what makes vector databases superior to traditional databases?

Vector databases vs. traditional databases

Traditional databases are designed to store and retrieve structured data using relational models, which means they are optimized for queries based on the columns and rows of the data. While it is possible to store vector embeddings in traditional databases, these databases are not optimized for vector operations and cannot efficiently perform similarity searches or other complex operations on large datasets.

This is because traditional databases use indexing techniques based on simple data types, such as strings or numbers. These indexing techniques are not suited to vector data, which is high-dimensional and requires specialized indexing techniques such as inverted indexes or spatial trees.

In addition, traditional databases are not designed to handle the large amounts of unstructured or semi-structured data often associated with vector embeddings. For example, an image or audio file can contain millions of data points, which a traditional database cannot handle efficiently.

Vector databases, on the other hand, are designed specifically to store and retrieve vector data and are optimized for similarity search and other complex operations on large datasets. They use specialized indexing techniques and algorithms designed to work with high-dimensional data, which makes them much more efficient than traditional databases at storing and retrieving vector embeddings.

Now that you have read more about vector databases, you may be wondering how they work. Let's take a look.

How does a vector database work?

We all know how relational databases work: they store strings, numbers and other types of scalar data in rows and columns. A vector database, on the other hand, operates on vectors, so the way it is optimized and queried is quite different.

In a traditional database, we usually query for rows where the value exactly matches our query. In a vector database, we apply a similarity metric to find the vectors that are most similar to our query.

A vector database uses a combination of several algorithms that all take part in approximate nearest neighbor (ANN) search. These algorithms optimize the search through hashing, quantization or graph-based search.

These algorithms are assembled into a pipeline that provides fast and accurate retrieval of the neighbors of a queried vector. Since the vector database returns approximate results, the main tradeoff we consider is between accuracy and speed. The more accurate the result, the slower the query will be. However, a good system can provide very fast search with near-perfect accuracy.
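The accuracy versus speed tradeoff can be illustrated with a small clustering-style index. This is a simplified sketch under loud assumptions: the data is random, the four centroids are fixed by hand rather than learned, and the parameter name `nprobe` (how many buckets to scan) is borrowed from common usage only as an illustration. All function names are hypothetical.

```python
import math
import random

# Hypothetical toy data: 200 random 2-D vectors (seeded so runs are repeatable).
rng = random.Random(0)
data = [[rng.random(), rng.random()] for _ in range(200)]

# Assumption: four fixed centroids instead of centroids learned by k-means.
centroids = [[0.25, 0.25], [0.25, 0.75], [0.75, 0.25], [0.75, 0.75]]

# Indexing: assign every vector to the bucket of its closest centroid.
buckets = {c: [] for c in range(len(centroids))}
for idx, vec in enumerate(data):
    closest = min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
    buckets[closest].append(idx)

def exact_search(query, k):
    """Brute force: compare the query with every stored vector."""
    return sorted(range(len(data)), key=lambda i: (math.dist(query, data[i]), i))[:k]

def approximate_search(query, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    top = sorted(candidates, key=lambda i: (math.dist(query, data[i]), i))[:k]
    return top, len(candidates)

def recall(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

query = [0.48, 0.52]
truth = exact_search(query, k=10)
for nprobe in (1, 2, 4):
    found, scanned = approximate_search(query, k=10, nprobe=nprobe)
    print(f"nprobe={nprobe} scanned={scanned} recall={recall(found, truth):.2f}")
```

Scanning fewer buckets means fewer distance computations but a higher chance of missing true neighbors that sit just across a bucket boundary; scanning all buckets reproduces the exact result at brute-force cost.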

Such a pipeline typically consists of the following steps:

  • Indexing: The vector database indexes the vectors using an algorithm such as PQ, LSH or HNSW. This step maps the vectors to a data structure that enables faster searching.
  • Querying: The vector database compares the query vector against the indexed vectors in the dataset to find the nearest neighbors (applying the similarity metric used by that index).
  • Post-processing: In some cases, the vector database retrieves the final nearest neighbors from the dataset and post-processes them before returning the final results. This step can include re-ranking the nearest neighbors using a different similarity measure.
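The three steps above can be sketched end to end as follows. This is an illustrative toy, not PQ, LSH or HNSW: the indexing step uses a crude grid quantization as a stand-in for a real compression scheme, and the class `TinyPipeline`, its methods and the sample vectors are all hypothetical.

```python
import math

def quantize(vec, step=0.25):
    """Crude stand-in for vector compression: snap each coordinate to a coarse grid."""
    return tuple(round(x / step) for x in vec)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

class TinyPipeline:
    def __init__(self, items, step=0.25):
        self.step = step
        self.items = dict(items)  # id -> original vector
        # Indexing: map every vector to a compact code that is cheap to compare.
        self.codes = {i: quantize(v, step) for i, v in self.items.items()}

    def query(self, vec, shortlist=3):
        """Querying: rank by Euclidean distance between codes, keep a shortlist."""
        code = quantize(vec, self.step)
        ranked = sorted(self.codes, key=lambda i: (math.dist(code, self.codes[i]), i))
        return ranked[:shortlist]

    def post_process(self, vec, candidates, k=2):
        """Post-processing: re-rank the shortlist with a different metric (cosine)."""
        ranked = sorted(candidates, key=lambda i: -cosine_similarity(vec, self.items[i]))
        return ranked[:k]

pipeline = TinyPipeline({
    "a": [0.9, 0.1],
    "b": [0.7, 0.3],
    "c": [0.2, 0.9],
    "d": [0.3, 0.04],
})

q = [0.8, 0.1]
shortlist = pipeline.query(q, shortlist=3)
final = pipeline.post_process(q, shortlist, k=2)
print(shortlist, final)
```

In this example the coarse codes rank item "d" last in the shortlist, while the cosine re-ranking on the original vectors moves it to the top, which shows why a post-processing step with a different similarity measure can change the final result.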

Advantages

Vector databases are a powerful tool for similarity search and other complex operations on large datasets, operations that cannot be performed efficiently with traditional databases. To build a functional vector database, embeddings are essential, because they capture the semantic meaning of the data and enable accurate similarity searches. Unlike vector libraries, vector databases are designed to scale with our use case, which makes them well suited to applications where performance and scalability are critical. With the rise of machine learning and artificial intelligence, vector databases are becoming increasingly important for a wide range of applications, including recommender systems, image search, semantic similarity, and the list goes on. As the field continues to evolve, we can expect to see new applications of vector databases in the future.

Ercole Palmeri
